Disclaimer: I am by no means affiliated with Apache Iceberg or any vendor that sells a hosted Iceberg service. My only connection is that I build RisingWave, which offers real-time SQL stream processing solutions for Iceberg and more.
Earlier this week, I attended the Small Data Conference in San Francisco. Even if you're unfamiliar with the conference itself, you've probably heard of the popular small data projects of recent years, like DuckDB. The Small Data Conference is an event organized by enthusiasts and contributors behind these kinds of projects.
At the conference, people discussed projects like DuckDB and SQLite, but there was also plenty of talk about globally distributed systems like Tigris and Fly.io, new BI tools like Outerbase, Evidence, and Fabi.ai, and even AI systems like Fireworks.ai and LangChain. Yet one system kept coming up in discussions across the event, even though no speaker was there to advocate for it. That system is Apache Iceberg.
Iceberg's popularity was expected, but I was surprised to see it come up so often at a conference focused on small data. After all, Iceberg is typically associated with big data workloads. So why is it being discussed here? And more importantly, is Iceberg really ready to fit into a small data stack?