But But, You Were Supposed to Be a GPT Wrapper?!

Submitted by Style Pass on 2025-01-25 00:00:12

My team and I are building Fintool, Warren Buffett as a service. It's a set of AI agents that analyze massive amounts of financial data and documents to help institutional investors make investment decisions. To keep things simple for customers, we describe Fintool as a sort of ChatGPT for SEC filings and earnings calls.

We got our fair share of "yOU aRe JuST a GPT wRapPer" from people who had no clue what they were talking about but wanted to sound smart and provocative. Anyway! For the more serious readers, I thought it would be nice to disclose our infrastructure and the unique challenges we face.

Our goal is to ingest as much financial data as possible: news, management presentations, internal notes, broker research, market data, rating agency reports, alternative data, internal data, and much more. We started with SEC filings, but our infrastructure is designed to scale and adapt, with no built-in limit on the types of data sources it can handle.

Our data ingestion pipeline uses Apache Spark to process large volumes of structured and unstructured data. The primary data source is the SEC database, which provides around 3,000 filings per day on average. We built a custom Spark job that pulls data from the SEC, processes the HTML files, and distributes the workload across our Spark cluster for real-time ingestion. With SEC filings and earnings calls alone, we manage 70 million chunks, 2 million documents, and around 5 TB of data in Databricks for every ten years of history.
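
To give a feel for what "process HTML files" into "chunks" can look like, here is a minimal sketch in plain Python. It is not our production Spark job: the function names (`html_to_text`, `chunk_words`, `filing_to_chunks`), the word-based chunking, and the default chunk size and overlap are all assumptions made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse every run of whitespace into a single space.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html):
    """Strip tags from one HTML document and return its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into word windows of chunk_size, sharing `overlap` words.

    The sizes are illustrative assumptions, not values from our pipeline.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def filing_to_chunks(doc_id, html, chunk_size=200, overlap=20):
    """Turn one filing into a list of chunk records (hypothetical schema)."""
    text = html_to_text(html)
    return [
        {"doc_id": doc_id, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_words(text, chunk_size, overlap))
    ]
```

Because `filing_to_chunks` depends only on its inputs, a function of this shape is easy to distribute: in a Spark job it could be applied per document, for example with a `flatMap` over an RDD of filings, so that each worker handles its own slice of the daily filings.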
