
Vectorized SQL for JSON at scale: fast, simple, schemaless

submitted by
Style Pass
2022-05-13 13:30:05

Sneller is a high-performance vectorized SQL engine for JSON that runs directly on top of object storage. It is optimized for terabyte-scale JSON (and semi-structured data more generally), including deeply nested structures and fields, and does not require a schema to be specified up front. It is particularly well suited to the rapidly growing world of event data, such as data from Security, Observability, Ops, Product Analytics and Sensor/IoT pipelines. Under the hood, Sneller operates on Ion, a compact binary representation of the original JSON data.

Sneller's query performance derives from pervasive use of SIMD, specifically AVX-512 assembly in its 250+ core primitives. The main engine can process many lanes in parallel per core, which gives very high processing throughput. That throughput eliminates the need to pre-process JSON data into an alternate representation, such as search indices (Elasticsearch and variants) or columnar formats like Parquet (as is commonly done with SQL-based tools). Combined with the fact that Sneller's main 'API' is SQL (with JSON as the primary output format), this greatly simplifies processing pipelines built around JSON data.

Sneller extends standard SQL syntax via PartiQL by supporting path expressions that reference nested fields and structures in JSON. For example, the "." operator dereferences fields within structures. In combination with normal SQL functions and operators, this makes for a far more ergonomic way to query deeply nested JSON than non-standard SQL extensions. Additionally, Sneller implements a large (and growing!) number of built-in functions from other SQL implementations.
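
To make the path-expression idea concrete, here is a minimal Python sketch of what dereferencing a dotted path means on schemaless records. It is not Sneller's implementation (which runs on Ion with AVX-512 primitives); the helper deref, the sample records, their field names, and the query in the comment are all hypothetical, and returning None for an absent field is a simplification chosen for this sketch.

```python
import json


def deref(value, path):
    """Follow a dotted path such as 'repo.owner.login' through nested dicts.

    Returns None as soon as a step is missing, so records that lack a
    field are handled without any schema. (Hypothetical helper, not a
    Sneller API.)
    """
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical event records; the field names are made up for illustration.
events = [
    json.loads('{"type": "push", "repo": {"name": "a/b", "owner": {"login": "alice"}}}'),
    json.loads('{"type": "fork", "repo": {"name": "c/d"}}'),
]

# Roughly the per-record work implied by a (hypothetical) query such as
#   SELECT e.repo.owner.login FROM events e WHERE e.type = 'push'
logins = [
    deref(e, "repo.owner.login")
    for e in events
    if deref(e, "type") == "push"
]
print(logins)
```

The second record has no owner at all, yet the same path can be evaluated against it without error. That is the practical benefit of combining path expressions with schemaless data: the query names the field it wants, and records that lack it simply yield no value.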
