A couple of weeks ago I was a guest on The Geek Narrator to talk about formal verification. I spoke a lot about how modeling and simulation are tremendously powerful tools, whether you use a formal verification language (such as TLA+) or just a Python script.
This post goes through a real world example of how I used modeling and simulation to understand the statistical properties of a proposed distributed system protocol, using both Python and TLA+. There is a talk version of this post from TLA+ Conf 2022.
During the process of writing and executing a model of a design, the system designer can uncover a number of things that they may not have considered while writing the design document. Optimizations can arise, as well as unexpected gotchas. But while TLA+ is amazing at finding correctness and liveness bugs in designs, it hasn’t always been useful for understanding other properties. There can be a host of other statistical properties that remain unexplored, which can lead to undesirable surprises later on.
A model can be written in any language, including TLA+. Traditionally, TLA+ has not been very good at answering these questions, but its model checker, TLC, now has the necessary capabilities to run simulations to obtain statistical data.