Clear picture of how well RavenDB performs crucial tasks

Postmortem: 250% cluster-wide transaction speed improvement

Submitted by
Style Pass
2024-07-08 13:30:04

This tale starts with a customer opening a support ticket: his application was struggling to keep up with the load during peak hours. We take tickets like that very seriously, so we sat down to understand exactly what was going on. The scenario was simple: under load, the customer started getting errors similar to this:

That isn’t a normal timeout of the kind that usually indicates slow I/O or network problems. This one came specifically from the cluster-wide transaction code. The scenario was now clearer: under heavy concurrent load, RavenDB would start timing out on cluster-wide transactions.

A cluster-wide transaction is more expensive than a single-node transaction, but the numbers we were seeing in this scenario were a lot lower than what we would normally expect. We dug deeper, trying to figure out what was actually going on. A number of documents were involved in those transactions, and when we looked into them, we figured out at least part of the problem.
