Converting a 700 Page PDF to Excel — Martin Klepsch

submitted by
Style Pass
2025-01-17 09:30:02

The other day, my flatmate mentioned that a colleague had a 700 page PDF with a bunch of tables that would be nice to have in a spreadsheet.

Convinced that problems like this are a thing of the past, and with poor impulse control on top, we went ahead and started playing with Gemini Advanced and other LLMs to try to extract the data. This blog post details some of the steps that were necessary to do that reliably.

I’d recently been toying with Gemini Flash 2.0, but it didn’t take long to bump against its limits. No matter how many things I called out in the prompt, I couldn’t get it to write empty cells into the CSV as an empty string "". Instead, it would simply omit the cell, so every following cell in that row slid left, corrupting the entire CSV.
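To make the failure mode concrete, here is a minimal sketch of how one might detect rows where a cell was dropped. It is not the approach from this post: the function name `find_short_rows` and the sample data are made up for illustration, and it assumes you know how many columns the table should have.

```python
import csv
import io


def find_short_rows(csv_text, expected_columns):
    """Return (line_number, row) pairs whose field count differs from expected_columns.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        if len(row) != expected_columns:
            # reader.line_num is the number of source lines read so far
            bad.append((reader.line_num, row))
    return bad


# Invented sample data: the empty "qty" cell is written explicitly as "".
good = 'name,qty,note\nbolt,"",M8\n'
# Same row, but the empty cell was omitted, so "M8" slides into the qty column.
shifted = 'name,qty,note\nbolt,M8\n'

print(find_short_rows(good, 3))
print(find_short_rows(shifted, 3))
```

A check like this only flags that a row is too short. It cannot tell which cell went missing, so it is useful for spotting bad output but not for repairing it.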

For fun, I even tried adding more vertical cell separation by quickly drawing in some black lines, but that didn’t help much. Not that it would’ve been a feasible approach anyway.
