Using Nix to Fuzz Test a PDF Parser (Part Two)

2024-11-10 11:00:06

At this point, I can run honggfuzz against pdftotext, but it takes a bit of manual effort to get things started. I promised in part one that I’d get all of the installation and fuzzing down to a single command.

While I’m automating, I can probably do better than a single PDF. For fuzzing, my goal is to have an expansive variety of PDFs that exercise different parts of the PDF file format.

The best collection of difficult-to-parse PDFs I found was in Mozilla’s pdf.js project. It contains 700 PDFs that have caused parsing bugs in their tool, so it’s likely that these same PDFs will trip up other PDF parsers.

Sidenote: In addition to the PDFs themselves, the pdf.js repo contains several hundred .link files that contain URLs of external PDFs. I can’t think of a simple way of pulling those external PDFs into a Nix pipeline. I welcome suggestions on integrating them, as they would increase fuzzing coverage.
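
A build step along these lines could gather the PDFs checked into pdf.js into a single corpus directory. This is a minimal sketch rather than the exact derivation: the `rev` and `hash` values are placeholders, and the `pdfjs-src` and `pdf-corpus` names are assumptions chosen for illustration.

```nix
# Sketch only: rev and hash are placeholders, and the names are hypothetical.
{ pkgs ? import <nixpkgs> { } }:

let
  # Fetch the pdf.js repository, which keeps its test PDFs under test/pdfs/.
  pdfjs-src = pkgs.fetchFromGitHub {
    owner = "mozilla";
    repo = "pdf.js";
    rev = "master";            # assumption: pin a specific commit in practice
    hash = pkgs.lib.fakeHash;  # placeholder: the first build fails and reports the real hash
  };
in
# Copy only the checked-in PDFs into the output; the .link files are skipped.
pkgs.runCommand "pdf-corpus" { } ''
  mkdir -p "$out"
  cp ${pdfjs-src}/test/pdfs/*.pdf "$out"/
''
```

Pinning `rev` to a fixed commit keeps the corpus reproducible, since a moving branch name would eventually stop matching the recorded hash.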

The new build step gives me an initial corpus of edge-case PDFs that will hopefully exercise the less frequently used code paths of any PDF parser.