A Log-Structured Merge tree, or LSM, is a popular data structure for storage engines. It’s what is used by RocksDB, which is sort of the poster chil

Fine! I'll Play With Skiplists

submited by
Style Pass
2024-11-25 20:00:06

A Log-Structured Merge tree, or LSM, is a popular data structure for storage engines. It’s what is used by RocksDB, which is sort of the poster child LSM. At a high level, the way an LSM works is that new writes get added to a memory-only index called a memtable and a durable file called a write-ahead log, or WAL. The memtable is used to serve queries and the WAL is used to provide durability. When one of those things gets too big, the entire in-memory index gets flushed out to an immutable indexed file called a sorted string table, or sstable, or sst. Since that file is durable, we no longer need it to live in the WAL, and since it is indexed, we no longer need it to live in the memtable, so we can clear both of those out and reclaim their space.

I have a personal, pedagogical interest in trying to express the idea of an LSM in the simplest possible terms. It's generally my view that from a bird's eye view LSMs are simpler than on-disk Btrees, largely because they are more composable and a lot more of their data is immutable (so outside of the one really mutable part, the memtable, concurrency is easy). This also makes them harder to tune for particular workloads since there’s a lot of moving pieces. But generally, you can abstract the various subcomponents of an LSM pretty well.

Leave a Comment