Aryn’s goal is to provide analytics over unstructured documents, typically PDFs. We believe that integral to that goal is the ability to break down these documents into a more structured form that we can use to extract more information than you might be able to by simply stripping out all of the text and embedding it with a language model. Since documents are typically intended for human consumption, and humans use eyeballs to ingest information from documents, we believe that the first step of computer document processing is fundamentally a computer vision problem. To that end, we created a computer vision model to segment documents into a collection of physical, typed elements representing how a real person would likely parse the document.
With many documents, a large portion of customer questions can be answered by a table or set of tables within the document. If I have Boeing’s 10-K form from 2024 and I want to know how many airplanes they sold, the answer is in a table. If I have the municipal budget report of Wilmington, NC, and I want to know what percent of their budget they spent on policing, the answer is in a table. Like with documents, the tables themselves are laid out to optimize for human consumption - that is, visually. I can’t ask a PDF for an indexable data structure representing a particular table on a particular page because all the PDF sees is a collection of rectangles with text inside them. For Aryn to query these tables, we first need a way to turn a picture of a table into such a data structure.