Polyglot-HTML-ZIP-PNG

submited by
Style Pass
2024-12-27 23:30:03

This article is a summary of the presentation available here. The resulting demo file can be downloaded at the end of the article.

SingleFile, a tool for web archiving, commonly stores web page resources as data URIs. While effective, this approach can be inefficient for large resources. A more elegant solution emerges through combining the ZIP format’s flexible structure with HTML. We’ll then take it a step further by encapsulating this entire structure within a PNG file.

The ZIP format provides an organized structure for storing multiple files. It is based on a structure with file entries followed by a central directory. The central directory acts as a table of contents, containing headers with metadata about each file entry. These headers include crucial information such as file names, sizes, checksums, and file entry offsets. What makes ZIP particularly versatile is its flexibility with data placement. The format enables data to be prepended before the ZIP content by setting an offset greater than 0 for the first file entry, while allowing up to 64KB of data to be appended afterward (i.e. ZIP comment). This feature makes it well-suited for creating polyglot files.

Based on this knowledge, we can create a self-extracting archive that works in web browsers. The page to be displayed and its resources are stored in a ZIP file. By storing the ZIP data in an HTML comment, we can create a self-extracting page that extracts and displays the contents of the ZIP file.

Leave a Comment