Using DuckDB+dbt, FastAPI for real-time analytics

submitted by
Style Pass
2024-07-09 11:00:09

In this post I’ll demonstrate how to use DuckDB, an in-process SQL engine optimized for analytical queries over large datasets on a single machine such as your laptop, for a real-time analytics use case. The results are served by FastAPI, and dbt is the data build tool that manages the pipeline.

DuckDB is the component that does the actual work. It is a fast (and getting faster) in-process database whose SQL dialect closely follows PostgreSQL, and its Python client has some nice features that we will take advantage of in this article.

dbt is a tool for the transform step in Extract, Load, Transform (ELT). It lets you write a set of SQL queries and deploy them into a database, and it adds several quality-of-life features that bring SQL into the 21st century with respect to the Software Development Life Cycle (SDLC). In this project it is used to deploy a set of views to a DuckDB database file that the API will consume.

FastAPI is a great web framework for building REST APIs in Python. It integrates well with pydantic, which lets you write fast, well-typed APIs quickly. We will POST the source data to an endpoint, which will invoke DuckDB to execute the pipeline defined with dbt.
