We talk about scale a lot here at The Next Platform, but there are many different aspects to this beyond lashing a bunch of nodes together and counting aggregate peak flops.
For instance, for the past decade, and certainly since the rise of GPU-accelerated AI training in the cloud, the most powerful HPC system in the United States at any given time could fit in the back pocket of a single datacenter in one of the multi-datacenter regions that any one of the big three cloud providers operates (two of which are also hyperscalers serving applications to billions of users). Those would be Amazon Web Services, Microsoft Azure, and Google Cloud, in order of infrastructure size.
It is natural enough to wonder why all HPC is not done in the cloud, or at least all in one place for those organizations that want or need to control their own data and infrastructure.
If you pushed economics to its logical extreme, and if you assumed that latency did not matter and that network security for external users could be guaranteed – both dubious assumptions, but this is a thought experiment – then the US federal government would build, or rather would probably pay UT-Battelle to build and manage, one giant HPC system located in Lawrence, Kansas or Omaha, Nebraska, which the five HPC centers of the US Department of Defense, the seventeen labs of the US Department of Energy, and the six labs of the National Science Foundation would all share.