Apache DataFusion — Apache DataFusion documentation

Submitted by Style Pass, 2025-01-12 19:00:04

The documentation on this site is for the core DataFusion project, which contains libraries and binaries for developers building fast, feature-rich database and analytic systems customized to particular workloads. See use cases for examples.

“Out of the box,” DataFusion offers SQL and DataFrame APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community. Python bindings are also available.

DataFusion features a full query planner; a columnar, streaming, multi-threaded, vectorized execution engine; and partitioned data sources. You can customize DataFusion at almost all points, including additional data sources, query languages, functions, custom operators, and more. See the Architecture section for more details.
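
To make the terms "columnar", "streaming", "vectorized", and "partitioned" concrete, here is a toy sketch in plain Python. It is not DataFusion code and does not use the DataFusion API; the names `Batch`, `scan`, `filter_batches`, and `sum_column` are hypothetical and exist only for this illustration. It assumes each partition is a small in-memory table stored column by column, and it processes partitions one after another, whereas a multi-threaded engine could run them in parallel.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-in for a record batch: column name -> list of values.
# Storing each column contiguously is what "columnar" refers to.
Batch = Dict[str, List[int]]


def scan(partition: Batch, batch_size: int) -> Iterator[Batch]:
    """Stream one partition as a sequence of small columnar batches."""
    num_rows = len(next(iter(partition.values())))
    for start in range(0, num_rows, batch_size):
        yield {
            name: values[start:start + batch_size]
            for name, values in partition.items()
        }


def filter_batches(batches: Iterator[Batch], column: str, threshold: int) -> Iterator[Batch]:
    """Keep rows where `column` > threshold, working on a whole batch at a time."""
    for batch in batches:
        # Build one boolean mask per batch, then apply it to every column.
        mask = [value > threshold for value in batch[column]]
        yield {
            name: [v for v, keep in zip(values, mask) if keep]
            for name, values in batch.items()
        }


def sum_column(batches: Iterator[Batch], column: str) -> int:
    """Consume the stream and add up one column."""
    return sum(sum(batch[column]) for batch in batches)


# Two partitions of the same logical table (made-up sample data).
partitions = [
    {"id": [1, 2, 3, 4], "amount": [10, 25, 5, 40]},
    {"id": [5, 6, 7], "amount": [30, 8, 12]},
]

# Similar in spirit to: SELECT sum(amount) FROM t WHERE amount > 9
total = sum(
    sum_column(filter_batches(scan(p, batch_size=2), "amount", 9), "amount")
    for p in partitions
)
print(total)
```

Because every stage is a generator, no stage holds more than one batch at a time, which is the essence of streaming execution. Operating on whole columns of a batch rather than on individual rows is the idea behind vectorization, although a real engine would use typed arrays and optimized kernels instead of Python lists.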