Embedding runnable Python in PDFs

Tired of distributing code with your manuscript? How about distributing your code as your manuscript!
code
fun
dumb stuff with computers
Published October 12, 2024

One reason I am excited about starting this blog is that I have a long backlog of “dumb stuff I have done with computers” that I want to share with the world. Here’s one of them.

To show you the idea, check out this PDF.

How it works

A little-known (I think) feature of Python is that you can run code directly from a .zip file that contains a __main__.py at its top level. Check this out:

```console
$ echo 'print("it works!")' > __main__.py
$ zip -q code.zip __main__.py
$ python code.zip
it works!
```

(See this part of the Python docs.)
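For anyone who would rather stay inside Python, here is a minimal sketch of the same experiment using only the standard library: `zipfile` writes an archive whose top-level member is `__main__.py`, and `subprocess` runs it with the current interpreter. The helper names `make_runnable_zip` and `run_zip` are made up for this sketch, not part of any library.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def make_runnable_zip(zip_path: Path, source: str) -> None:
    """Write a ZIP archive whose only member is __main__.py containing `source`."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Python looks for a top-level __main__.py when asked to run a ZIP file.
        zf.writestr("__main__.py", source)


def run_zip(zip_path: Path) -> str:
    """Run the archive with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(zip_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    path = Path("code.zip")
    make_runnable_zip(path, 'print("it works!")\n')
    print(run_zip(path), end="")  # prints: it works!
```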

Now consider the following two facts about the structure of ZIP and PDF files:

  1. PDF files start with a header line (such as %PDF-1.7) that identifies the format version, and end with %%EOF to mark the end of the file.
  2. ZIP files end with a central directory describing, via relative byte offsets, where each part of the data in the file is stored; readers locate it by searching backward from the end of the file.
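Here is a small sketch that makes those two facts concrete by poking at raw bytes. It checks for the `%PDF-` header at the start, finds the last `%%EOF`, and locates the ZIP “end of central directory” record (signature `PK\x05\x06`, 22 bytes when there is no comment) by searching backward from the end. It then compares where the central directory actually sits with the offset stored in that record; as far as I understand, CPython’s `zipfile` does a similar subtraction to cope with data placed in front of an archive. The helper `describe_layout` is hypothetical, and the sketch ignores ZIP64 archives and comments that happen to contain the signature.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"            # "end of central directory" record
EOCD_FORMAT = "<4s4H2LH"                  # signature, 4 shorts, CD size, CD offset, comment length
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def describe_layout(data: bytes) -> dict:
    """Report where the PDF markers and the ZIP end record sit in `data`."""
    eocd_pos = data.rfind(EOCD_SIGNATURE)  # search backward from the end
    layout = {
        "starts_with_pdf_header": data.startswith(b"%PDF-"),
        "last_eof_marker": data.rfind(b"%%EOF"),
        "eocd_position": eocd_pos,
    }
    if eocd_pos >= 0 and eocd_pos + EOCD_SIZE <= len(data):
        fields = struct.unpack(EOCD_FORMAT, data[eocd_pos:eocd_pos + EOCD_SIZE])
        cd_size, cd_offset = fields[5], fields[6]
        layout["cd_size"] = cd_size
        layout["cd_offset"] = cd_offset
        # The central directory sits immediately before the end record, so any
        # gap between its real position and the stored offset is prepended data.
        layout["prepended_bytes"] = eocd_pos - cd_size - cd_offset
    return layout


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("__main__.py", 'print("it works!")\n')
    zip_bytes = buf.getvalue()
    print(describe_layout(zip_bytes)["prepended_bytes"])                          # 0
    print(describe_layout(b"%PDF-1.4\n%%EOF\n" + zip_bytes)["prepended_bytes"])  # 15
```

On a plain ZIP the difference is 0; with 15 bytes of PDF-style text in front, it is exactly those 15 bytes, which is what lets a reader still find the ZIP data after something has been glued on before it.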

But wait. This means that if we just concatenate a PDF and a ZIP file, the result still starts with the PDF header and is a PDF up to %%EOF, while the last part of the file is the ZIP directory, which describes (in relative offsets) where to find the ZIP data.

👀

So it’s a valid file, in both formats!

In short, you can make a file that is both a PDF and a ZIP file by simply concatenating the two, and since Python can run ZIP files directly, you can create a file that is both a valid PDF and a runnable Python program!
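Putting it together, here is a sketch of building such a file from scratch. To keep it self-contained, `minimal_pdf` writes a tiny blank one-page PDF by hand (with a correct cross-reference table) instead of using a real manuscript; in practice you would read your own PDF’s bytes. The helper names and the output name `polyglot.pdf` are made up for illustration, and the sketch relies, like the post, on PDF viewers tolerating extra bytes after `%%EOF`.

```python
import io
import subprocess
import sys
import zipfile
from pathlib import Path


def minimal_pdf() -> bytes:
    """Build a tiny one-page blank PDF with a correct cross-reference table."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += b"%d 0 obj\n%b\nendobj\n" % (number, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def make_polyglot(pdf_bytes: bytes, main_source: str) -> bytes:
    """Append a ZIP archive holding __main__.py to the end of a PDF."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("__main__.py", main_source)
    # Plain concatenation: the PDF part comes first (up to %%EOF) and the
    # ZIP central directory ends up at the very end of the file.
    return pdf_bytes + buf.getvalue()


if __name__ == "__main__":
    out = Path("polyglot.pdf")  # hypothetical output name
    out.write_bytes(make_polyglot(minimal_pdf(), 'print("hello from inside a PDF")\n'))
    subprocess.run([sys.executable, str(out)], check=True)  # prints: hello from inside a PDF
```

Because the PDF part is untouched, the same bytes open in a viewer as the original document, while `python polyglot.pdf` runs the embedded `__main__.py`.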

Addendum

A surprising thing that somewhat undermines the argument above about file structure is that (at least on my machine) it still works if you concatenate them in the “wrong” order, with the ZIP first and then the PDF! I guess the libraries for reading PDFs and ZIP files are very robust to weird or corrupt file structures. I wonder what possibilities this opens up. Could we make a single file that is a PNG, a PDF, and a ZIP file? That is left to the reader to explore…
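One way to poke at this is to build both orders and simply ask Python whether each one still runs. In this sketch `STAND_IN_PDF` only carries PDF-style markers (it is not a viewable document), and `runnable_zip` and `runs_as_python` are hypothetical helpers. My reading of recent CPython sources is that both `zipfile` and `zipimport` search roughly the last 64 KiB of a file for the end-of-central-directory record, which would explain why a small PDF after the ZIP is harmless; the third case pads the trailing part past that window to check whether a larger PDF breaks it.

```python
import io
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path

# A stand-in with PDF-style markers only; it is not a viewable PDF.
STAND_IN_PDF = b"%PDF-1.4\n% stand-in body\n%%EOF\n"


def runnable_zip(source: str) -> bytes:
    """Build an in-memory ZIP whose __main__.py contains `source`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("__main__.py", source)
    return buf.getvalue()


def runs_as_python(data: bytes, expected: str) -> bool:
    """Write `data` to a temp file, run it with Python, and compare stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "polyglot.pdf"
        path.write_bytes(data)
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            errors="replace",  # a failed run may emit odd bytes on stderr
        )
    return result.returncode == 0 and result.stdout == expected


if __name__ == "__main__":
    expected = "still works\n"
    zip_bytes = runnable_zip('print("still works")\n')
    cases = [
        ("PDF then ZIP", STAND_IN_PDF + zip_bytes),
        ("ZIP then PDF", zip_bytes + STAND_IN_PDF),
        # Pad the trailing "PDF" past 64 KiB to imitate a larger document.
        ("ZIP then large PDF", zip_bytes + STAND_IN_PDF + b"%" * 100_000),
    ]
    for name, data in cases:
        print(f"{name}: {runs_as_python(data, expected)}")
```

On a recent CPython I would expect the first two cases to run and the padded one to fail, but treat that as something to verify on your own machine rather than a guarantee.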