We’re given a ZIP file
containing the first 26 chapters of “Gadsby: A Story of Over 50,000 Words Without Using the Letter
“E”” by Ernest Vincent Wright.
The text of the book in the ZIP file is entirely in uppercase, but it is identical to the text on Project
Gutenberg if converted to uppercase. This led us to
believe that the challenge is about the ZIP file itself, not the contents. We examined the file using Kaitai
Struct’s WebIDE and tools like
zipinfo. After some time we realized that we got two
hints from the challenge’s name and description:
- “what’s more important is not what you have, but what you’re missing”: we’re looking for something missing, the ZIP’ed book omits the letter ‘E’.
- “David and the Tree”: the ZIP file is compressed using the DEFLATE algorithm which uses (David) Huffman trees to compress the data.
This led us to believe that the letter
E, together with its byte code representing its compressed version,
might be present in the Huffman tree for each of the deflated files. Each byte code for the compressed letter
E could then encode a character of the flag. Note that normally there would be no reason to include the
E in the Huffman tree as the uncompressed text does not contain the letter
After some really ugly hacking, we indeed got the flag this way:
In case you’re interested in the quick and dirty way we solved this challenge: we wrote a simple script that
gets the byte code for the letter
E from the Huffman tree in two steps. First, we extract the DEFLATE bytes
from the ZIP file using Kaitai Struct’s ZIP parser. Next, we
parse the DEFLATE bytes using a
deflatedecompress.py script found on
Github to get
to its Huffman tree. We hacked our way into the
deflatedecompress.py script to access the Huffman tree
by adding a global variable
hacky = litlencode right after the line
litlencode = CanonicalCode(codelens[ : numlitlencodes]).
While the byte codes associated with the letter
E are 9 bits long and do not directly encode ASCII
characters, we knew we were on the right track as the letter
E should normally not have appeared in the
Huffman tree. It turns out that we have to drop the most significant bit and reverse order of the remaining
bits to get to the character of the flag.
#!/usr/bin/env python3 import io import deflatedecompress # modified from https://github.com/nayuki/Simple-DEFLATE-decompressor/blob/master/python/deflatedecompress.py import zip # from https://formats.kaitai.io/zip/python.html parsed_zip = zip.Zip.from_file('challenge.zip') flag =  for section_number, section in enumerate(parsed_zip.sections): if isinstance(section.body, zip.Zip.LocalFile): deflated_file = section.body.body else: continue deflated_buf = deflatedecompress.BitInputStream(io.BytesIO(deflated_file)) inflated_buf = deflatedecompress.Decompressor.decompress_to_bytes(deflated_buf) table = deflatedecompress.hacky._code_bits_to_symbol for k, v in table.items(): if v == ord('E'): k = k & 0xFF # reverse bit order rk = 0 for _ in range(8): rk = rk << 1 if k & 1 == 1: rk = rk | 1 k = k >> 1 flag.append(rk) print(bytes(flag).decode())