At Community.com, we had a problem where a bunch of workers needed to pick up and process a large amount of data from the same DB table in a highly scalable manner, with high throughput. We wanted to be able to:
I came up with a solution that has been in production, at scale, for quite a while now. While it was my design, other great engineers at Community deserve the credit for the excellent implementation and for rounding off some of the rough edges. It was a team effort! There are undoubtedly other ways to solve this problem, but I thought this one was pretty interesting, and I have never seen anyone else do it before. I am not claiming it’s entirely novel, but this is what we did to solve the problem nicely.
We have workers that process outbound SMS campaigns. They need to be able to take a single message and dispatch it to a million-plus phone numbers. And they need to do that at most once for each number. These workers have access to a data store that maps some data related to the campaign to a set of recipients. The workers don’t actually send the SMS messages, but they do the heavy lifting: the expansion of one message to millions.
We wanted to be able to divide that audience up into chunks of a size large enough to be efficient for querying and processing, and small enough that a worker could shut down mid-campaign and not lose anything. Ideally each worker would pick up an amount of work, crank through it, and then process another piece of work, without checking in with anyone else.
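To make the chunking goal concrete, here is a minimal sketch of splitting an audience into fixed-size pieces. It only illustrates the idea of bounded units of work and is not the design we ended up with. The function name `chunk_ranges`, the audience size, and the chunk size of 10,000 are all hypothetical, and the sketch deliberately ignores how a worker would claim a chunk.

```python
from typing import Iterator, Tuple


def chunk_ranges(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) offsets that together cover `total` recipients."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, total)


# Hypothetical numbers: a campaign audience of 1,000,050 recipients,
# split into chunks of 10,000 (an assumed size, not a recommendation).
chunks = list(chunk_ranges(total=1_000_050, chunk_size=10_000))
```

The trade-off is in the chunk size: a bigger chunk means fewer queries per campaign, while a smaller chunk means less work is in flight if a worker shuts down partway through.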