Build a serverless ACID database with this one neat trick (atomic PutIfAbsent)

submitted by
Style Pass
2024-09-29 20:00:08

Delta Lake is an open protocol for serverless ACID databases. Due to its simplicity, scalability, and the number of open-source implementations, it's quickly becoming the DuckDB of serverless transactional databases. Iceberg is a contender too, and is similar in many ways. But since Delta Lake is simpler (simple != better), that's where we'll focus in this post.

Thanks to its simplicity, in this post we'll implement a Delta Lake-inspired serverless ACID database in 500 lines of Go code with zero dependencies. It will support creating tables, inserting rows into a table, and scanning all rows in a table. All while allowing concurrent readers and writers and achieving snapshot isolation.

There are other critical parts of Delta Lake we'll ignore: updating rows, deleting rows, checkpointing the transaction metadata log, compaction, and probably much more I'm not aware of. We must start somewhere.

Delta Lake writes immutable data files to blob storage. It stores the names of new data files for a transaction in a metadata file. It handles concurrency (i.e. achieves snapshot isolation) with an atomic PutIfAbsent operation on the metadata file for the transaction.
