If you've never heard of "NUMA," you aren't alone – it's a bit of an IYKYK topic that very few engineers encounter in their entire careers. But if you've ever worked on an extremely high core-count server, these four letters will send shivers down your spine.
NUMA is the dark magic that powers extremely high-performance applications on extremely large servers – and the people who work on those systems rarely have the time to write blog posts explaining how it works. As a result, there are very few good introductory materials on it, which adds to the air of mystery and leaves the whole topic kind of inscrutable.
But I find myself with some holiday time on my hands, so here you go: a deep dive into how Expensify's core hardware works – down to the chip level – with an explanation of how that enables a critical feature of our software to work. I hope you'll find it useful as you architect your own NUMA-balanced applications on your own high-density servers!
(Though a bit hard to visualize at first, the connections are easier to understand if you imagine this as a 2x2x2 cube: CPUs 0-3 are on the "bottom" and CPUs 4-7 are on "top" – imagine you just slide the right tray of 4 CPUs on top of the left tray. In reality, they aren't physically organized like that, but it helps explain why the CPUs are connected in this fashion. At least, I think it does.)