While considering concurrent queue design I came up with a generic, lock-free queue that fits in a 32-bit integer. The queue is “generic” in that a single implementation supports elements of any arbitrary type, despite an implementation in C. It’s lock-free in that there is guaranteed system-wide progress. It can store up to 32,767 elements at a time — more than enough for message queues, which must always be bounded. I will first present a single-consumer, single-producer queue, then expand support to multiple consumers at a cost. Like my lightweight barrier, I’m not presenting this as a packaged solution, but rather as a technique you can apply when circumstances call.
How can the queue store so many elements when it’s just 32 bits? It only handles the indexes of a circular buffer. The caller is responsible for allocating and manipulating the queue’s storage, which, in the single-consumer case, doesn’t require anything fancy. Synchronization is managed by the queue.
Like a typical circular buffer, it has a head index and a tail index. The head is the next element to be pushed, and the tail is the next element to be popped. The queue storage must have a power-of-two length, but the capacity is one less than the length. If the head and tail are equal then the queue is empty. This “wastes” one element, which is why the capacity is one less than the length of the storage. So already there are some notable constraints imposed by this design, but I believe the main use case for such a queue — a job queue for CPU-bound jobs — has no problem with these constraints.