
Why Durable Execution Should Be Lightweight

Submitted by
Style Pass
2024-11-05 21:00:03

Everyone knows serious programs have to make data durable. You persist data on disk or in a database so it doesn’t disappear the second your program crashes or your server is restarted. But we take it for granted that programs themselves aren’t durable. When you restart your server, your data might be safe in the database, but any programs you were running are gone, and if you want them back, you have to restart them yourself.

Now, restarting your programs might be fine if they’re short-lived and stateless, but what if they’re long-lived or stateful? Let’s say your server handles hotel reservations, and it’s halfway through processing a reservation when it’s restarted. What happens to the reservation? Someone has to find the unfinished reservation and finish processing it, or a customer might find that a room they paid for wasn’t actually booked. To use a more modern example, what if your server is indexing a batch of 10K documents for RAG, but is restarted after finishing only 4K? Someone has to go back and index the other 6K documents, ideally without re-indexing the ones that have already completed.

Durable execution is a powerful solution to this problem of building highly reliable programs. At a high level, the idea is to persist the execution state of your program as it runs. That way, if your program ever crashes, is interrupted, or is restarted, it can automatically recover and resume from where it left off. Currently, durable execution is usually implemented “as a service” by external orchestrators that manage your program’s control flow. In this post, we’ll propose a new lightweight implementation of durable execution contained entirely in an open-source library you can add to your program.
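To make the idea of persisting execution state concrete, here is a minimal sketch in Python. It is not the API of the library discussed in this post; `DurableWorkflow`, `step`, and `index_documents` are hypothetical names invented for illustration, and SQLite is assumed as the store only because it ships with Python. Each step's result is checkpointed as soon as it completes, so a rerun after a crash skips the finished steps and picks up at the first unfinished one.

```python
import json
import sqlite3


class DurableWorkflow:
    """Hypothetical helper: checkpoints each step's result in SQLite."""

    def __init__(self, db_path, workflow_id):
        self.workflow_id = workflow_id
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            "workflow_id TEXT, step_name TEXT, result TEXT, "
            "PRIMARY KEY (workflow_id, step_name))"
        )
        self.conn.commit()

    def step(self, name, func, *args):
        """Run func once per (workflow, step name); reuse the saved result after that."""
        row = self.conn.execute(
            "SELECT result FROM steps WHERE workflow_id = ? AND step_name = ?",
            (self.workflow_id, name),
        ).fetchone()
        if row is not None:
            # The step finished in an earlier run, so return its checkpoint.
            return json.loads(row[0])
        result = func(*args)
        # Persist the result before moving on to the next step.
        self.conn.execute(
            "INSERT INTO steps VALUES (?, ?, ?)",
            (self.workflow_id, name, json.dumps(result)),
        )
        self.conn.commit()
        return result


def index_documents(workflow, docs, index_one):
    """Index each document as its own checkpointed step."""
    return [
        workflow.step(f"index-{i}", index_one, doc)
        for i, doc in enumerate(docs)
    ]
```

Calling `index_documents(DurableWorkflow("state.db", "batch-1"), docs, index_one)` again after a restart, with the same workflow ID, re-runs only the documents that have no checkpoint yet. One caveat of this sketch: a crash between a step finishing and its checkpoint being committed causes that step to run again on recovery, so steps should be idempotent.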
