
Submitted by Style Pass on 2024-04-25 22:00:03

Nimble (formerly known as “Alpha”) is a new columnar file format for large datasets created by Meta. Nimble is meant to be a replacement for file formats such as Apache Parquet and ORC. 

Wide: Nimble is better suited for workloads that are wide in nature, such as tables with thousands of columns (or streams), which are commonly found in feature engineering workloads and training tables for machine learning. 

Extensible: Since the state-of-the-art in data encoding evolves faster than the file layout itself, Nimble decouples stream encoding from the underlying physical layout. Nimble allows encodings to be extended by library users and recursively applied (cascading). 
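To make the idea of cascading concrete, here is a minimal Python sketch in which one encoding is applied to the output of another. It is not Nimble's API: the `ENCODINGS` registry, the `encode_cascade` and `decode_cascade` helpers, and the choice of delta and run-length encodings are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]

# Hypothetical registry: a library user could add entries here
# without touching the code that stores the encoded result.
ENCODINGS: Dict[str, Tuple[Callable, Callable]] = {
    "delta": (delta_encode, delta_decode),
    "rle": (rle_encode, rle_decode),
}

def encode_cascade(values, names: List[str]):
    """Apply the named encodings in order, each on the previous output."""
    for name in names:
        values = ENCODINGS[name][0](values)
    return values

def decode_cascade(encoded, names: List[str]):
    """Undo the cascade by applying the decoders in reverse order."""
    for name in reversed(names):
        encoded = ENCODINGS[name][1](encoded)
    return encoded

if __name__ == "__main__":
    column = [100, 101, 102, 103, 104, 105]
    encoded = encode_cascade(column, ["delta", "rle"])
    print(encoded)  # [(100, 1), (1, 5)]
    print(decode_cascade(encoded, ["delta", "rle"]) == column)  # True
```

In this sketch, delta encoding turns a steadily increasing column into a run of identical differences, which run-length encoding then compresses. Neither encoder knows how the result is laid out on disk, which is the kind of separation between encoding and physical layout described above.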

Parallel: Nimble is meant to fully leverage highly parallel hardware by providing encodings which are SIMD and GPU friendly. Although this is not implemented yet, we intend to expose metadata to allow developers to better plan decoding trees and schedule kernels without requiring the data streams themselves. 

Unified: More than a specification, Nimble is a product. We strongly discourage developers from (re-)implementing Nimble’s spec, to prevent the environmental fragmentation issues observed with similar projects in the past. We encourage developers to leverage the single unified Nimble library and to create high-quality bindings to other languages as needed.
