You Ran the Operational Database on What?! Testing Spot Instances

2024-09-22 16:30:03

When building database software that will run in some of the world’s most demanding environments, testing it in ways that simulate those customer environments is non-negotiable.

Several classes of problems reveal themselves only at larger scale. For example, importing 100 GiB of data may work well, while importing 100 TiB may fail to perform. Similarly, achieving balanced performance across the nodes in a cluster may be straightforward with a three-node cluster, but could run into problems when the cluster has 30 or 300 nodes, especially if multiple nodes are dropped or added concurrently.

Testing large clusters is critical, but doing so at a scale that matches some of our largest customer environments can get costly. What if a totally different, highly cost-effective approach to large-cluster testing revealed itself? At Cockroach Labs we decided to take a non-traditional path, and things got very interesting.

We build CockroachDB, a distributed database, and many of our customers run clusters with dozens of nodes; some of our largest customers run clusters with hundreds of nodes. While scale testing on hundreds of nodes is feasible for short durations, large clusters can get expensive very quickly. For example, here are the costs of a moderately sized 40-node cluster on Google Cloud:
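To see why cluster size dominates the bill, a back-of-the-envelope sketch helps: compute cost scales linearly with node count and with the hours the cluster stays up. The per-node hourly rate below is a hypothetical placeholder for illustration, not an actual Google Cloud price, and real costs would also include disks, network egress, and licensing.

```python
# Rough monthly cost estimate for an always-on test cluster.
# Assumption: a flat hypothetical rate per node per hour; real cloud
# pricing varies by machine type, region, and discounts.
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)

def monthly_cluster_cost(nodes: int, hourly_rate_per_node: float) -> float:
    """Estimate monthly compute cost: nodes x rate x hours."""
    return nodes * hourly_rate_per_node * HOURS_PER_MONTH

# Example: a 40-node cluster at an assumed $2.00 per node per hour.
cost = monthly_cluster_cost(40, 2.00)
print(f"${cost:,.2f}/month")  # 40 * 2.00 * 730 = $58,400.00/month
```

The linear scaling is the point: tripling the node count triples the compute bill, which is why running hundreds of nodes for long-duration tests gets expensive so fast.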
