LFFS: Simplicity vs Efficiency

submited by
Style Pass
2024-12-27 03:00:07

I knew it was gonna happen, just knew it: I finally hit the first part of writing the book where I have to make a choice between something being easy to describe and understand up front vs long-term efficiency or flexibility.

When you have a state-based CRDT, you have a merge operation. The details don't really matter for this explanation, but basically if merge is commutative, idempotent, and associative, you can make use it for a CRDT and syncing state is as simple as:Exchanging states with a peer.Both sides call merge to get the final state.

With a few more supporting details, that's easy to understand and explain! (Even easier because in the book the only peer is a sync server.)

However this sync protocol gives up efficiency for that simplicity: you have to send all the data back and forth over the network. Most of the time, you're gonna be sending a bunch of duplicate data with a few changes, which means you're actually sending all the data twice.

You can improve efficiency by sending only the things that changed: if I've got a grow-only set, and I add 1 to it, I only send 1 to my peer. In fact, you can keep track of all the tiny pieces of data as separate instances of the CRDT, and only send the ones you care about. For example, the set {1, 2, 3} and the sets {1}, {2}, and {3} (when merged) represent the same data.

Leave a Comment