Today's blog is announcing SwissMap, a new Golang hash table based on SwissTable that is faster and uses less memory than Golang's built-in map. We'll cover the motivation, design and implementation of this new package and give you some reasons to try it. This blog is part of our deep-dive series on the Go programming language. Past iterations include posts about concurrency, "inheritance", and managing processes with Golang.
At DoltHub, we love Golang and have used it to build DoltDB, the first and only SQL database you can branch, diff and merge. Through our experience building Dolt, we've gained some expertise in the language. We found a lot of features we appreciate and a few more sharp edges that have bitten us. One of the hallmarks of the Go language is its focus on simplicity. It strives to expose a minimal interface while hiding a lot of complexity in the runtime environment. Golang's built-in map is a great example of this: its read and write operations have dedicated syntax and its implementation is embedded within the runtime. For most use cases, map works great, but its opaque implementation makes it largely non-extensible. Lacking alternatives, we decided to roll our own hash table.
Hash tables are used heavily throughout the Dolt codebase, but they become particularly performance-critical at the lower layers of the stack that deal with data persistence and retrieval. The abstraction responsible for data persistence in Dolt is called a ChunkStore. There are many ChunkStore implementations, but they share a common set of semantics: variable-length byte buffers called "chunks" are stored and fetched using a content-addressable hash. Dolt's table indexes are stored in Prolly Trees, a tree-based data structure composed of these variable-sized chunks. Higher nodes in a Prolly Tree reference child nodes by their hash. To dereference this hash address, a ChunkStore must use a "chunk index" to map hash addresses to physical locations on disk. In contrast, traditional B-tree indexes use fixed-sized data pages, and parent nodes reference children directly by their physical location within an index file.
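To make the role of the chunk index concrete, here is a minimal sketch of a content-addressed store backed by Go's built-in map. It is not Dolt's ChunkStore: the `memStore`, `addr`, and `location` names are hypothetical, the "disk" is just an in-memory byte slice, and SHA-1 is an assumption chosen only because it ships with the standard library.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// addr is a content address. Using a SHA-1 digest is an assumption
// made for this sketch, not a statement about Dolt's hash function.
type addr [sha1.Size]byte

// location records where a chunk lives in a hypothetical append-only file.
type location struct {
	offset uint64
	length uint32
}

// memStore is a toy stand-in for a ChunkStore. The data slice plays the
// role of the file on disk, and index is the "chunk index" hash table.
type memStore struct {
	data  []byte
	index map[addr]location
}

func newMemStore() *memStore {
	return &memStore{index: make(map[addr]location)}
}

// Put appends the chunk if it is not already present, records its
// location in the index, and returns the chunk's content address.
func (s *memStore) Put(chunk []byte) addr {
	a := addr(sha1.Sum(chunk))
	if _, ok := s.index[a]; !ok {
		s.index[a] = location{offset: uint64(len(s.data)), length: uint32(len(chunk))}
		s.data = append(s.data, chunk...)
	}
	return a
}

// Get dereferences a content address: one hash table lookup to find the
// physical location, then one read of that byte range.
func (s *memStore) Get(a addr) ([]byte, bool) {
	loc, ok := s.index[a]
	if !ok {
		return nil, false
	}
	return s.data[loc.offset : loc.offset+uint64(loc.length)], true
}

func main() {
	s := newMemStore()
	a := s.Put([]byte("leaf node bytes"))
	chunk, ok := s.Get(a)
	fmt.Println(ok, string(chunk))
}
```

Every hop from a parent node to a child in a tree stored this way goes through a lookup like `Get`, which is why the speed and memory footprint of the hash table behind the index matter so much at this layer.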