Software applications with growing data eventually reach a point where they experience memory, storage, or network limitations that impact the overall performance and availability of the system: the data no longer fits on the existing node(s). To overcome these limitations, data must either be moved to a bigger machine (vertical scaling) or split into chunks and distributed across multiple machines, called shards (sharding). The goal of sharding is to distribute data across enough machines/shards that no single resource constraint limits the performance of data operations.
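As a minimal sketch of how such a split can work, the snippet below routes keys to shards with a stable hash. The function name `shard_for`, the shard count, and the `acct-` key format are illustrative assumptions, not a specific system's API; a stable hash (rather than Python's per-process randomized `hash`) matters so every node routes a given key to the same shard.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard via a stable hash, so routing is
    consistent across processes and restarts. Illustrative only."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Route a few hypothetical account IDs across 4 shards.
for account in ["acct-1001", "acct-1002", "acct-1003"]:
    print(account, "->", shard_for(account, 4))
```

Note that plain modulo hashing reshuffles most keys when `num_shards` changes; real systems often use consistent hashing or range-based shard maps to limit data movement during resharding.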
Once data is sharded across multiple nodes, scenarios such as the thundering herd problem or the co-location of high-throughput accounts can concentrate most read or write requests on a single shard. This can lead to the following situations:
Memory Saturation: The impacted shard runs out of RAM to hold its warmed (frequently accessed) data, so the operating system starts paging memory out to disk. Reading that data later requires pulling those pages back from disk into memory. This constant movement between memory and disk degrades the shard's overall read and write performance, creating a backlog of requests or, even worse, making the shard unresponsive.
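One way to spot this paging pressure on a Linux shard host is to watch the `pswpin`/`pswpout` counters in `/proc/vmstat`: sustained growth means pages are actively moving between RAM and disk. A minimal parsing sketch (the sample text below is fabricated for illustration; on a real host you would read the file itself):

```python
SAMPLE = """nr_free_pages 102400
pswpin 1200
pswpout 4800
pgfault 987654
"""

def swap_activity(vmstat_text: str) -> dict:
    """Extract cumulative swap-in/swap-out page counts from
    /proc/vmstat-formatted text ("name value" per line)."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters

# On a real Linux host: swap_activity(open("/proc/vmstat").read())
print(swap_activity(SAMPLE))  # {'pswpin': 1200, 'pswpout': 4800}
```

Since these counters are cumulative since boot, alerting would compare deltas between samples rather than absolute values.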