tabulapdf / tabula-java

submitted by
Style Pass
2021-06-07 06:00:04

tabula-java is a library for extracting tables from PDF files — it is the table extraction engine that powers Tabula (repo). You can use tabula-java as a command-line tool to programmatically extract tables from PDFs.

Download a version of the tabula-java jar, with all dependencies included, that works on Mac, Windows and Linux from our releases page.

It also includes a debugging tool. For the available options, run: java -cp ./target/tabula-1.0.2-jar-with-dependencies.jar technology.tabula.debug.Debug -h

JVM start-up time is a lot of the cost of the tabula command, so if you're trying to extract many tables from PDFs, you have a few options for speeding it up:
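As an illustration of paying the JVM start-up cost only once, the sketch below builds a single tabula-java invocation that covers a whole directory of PDFs, rather than launching one JVM per file. It assumes the command-line tool's --batch option (which takes a directory of PDFs to convert). The class name TabulaBatch, the method batchCommand, and the jar and directory paths are hypothetical names chosen for this example and are not part of tabula-java.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of tabula-java: runs the CLI once per directory.
public class TabulaBatch {

    // Build the argument list for one JVM launch over every PDF in pdfDir.
    // Assumes the tabula-java CLI accepts "--batch <DIRECTORY>".
    static List<String> batchCommand(Path jar, Path pdfDir) {
        return List.of("java", "-jar", jar.toString(), "--batch", pdfDir.toString());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // Nothing is executed without explicit paths.
            System.out.println("usage: TabulaBatch <tabula-jar> <pdf-directory>");
            return;
        }
        List<String> cmd = batchCommand(Paths.get(args[0]), Paths.get(args[1]));
        // Start tabula-java once and forward its output to this console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

With this shape, the start-up cost is incurred once per directory instead of once per PDF; the extraction options (pages, area, output format) would be appended to the same argument list.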

You can also support our continued work on tabula-java with a one-time or monthly donation on OpenCollective. Organizations who use tabula-java can also sponsor the project for acknowledgement on our official site and this README.
