Tesla and xAI, Elon Musk's companies, will bring online $10 billion worth of AI training compute capacity by the end of this year, according to Sawyer Merritt, a co-founder of TwinBirch and a Tesla investor. Even so, that figure likely means both companies will be somewhat behind the schedule Elon Musk originally set.
Musk and his companies have been making a steady stream of announcements about AI supercomputers lately, and the sums involved are indeed enormous.
In July, xAI began AI training on its Memphis Supercluster, which is set to integrate 100,000 liquid-cooled Nvidia H100 GPUs. The system requires a gargantuan amount of power, drawing at least 150 MW, as the 100,000 H100 GPUs alone account for around 70 MW. Its total cost has not been disclosed, but the GPUs alone would run to around $2 billion at roughly $20,000 per unit, and since AI GPUs typically account for about half the cost of a complete system, the total is likely in the region of $4 billion.
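For readers who want to check the arithmetic, the back-of-envelope estimate below uses only the figures quoted above: 100,000 GPUs, roughly 70 MW for the GPUs alone (which implies an assumed ~700 W per H100), an assumed purchase price of about $20,000 per unit, and the rule of thumb that GPUs make up around half of a complete system's cost. It is a rough sketch, not a statement of the cluster's actual bill of materials.

```python
# Back-of-envelope estimate for the Memphis Supercluster,
# based on the publicly quoted figures above.

NUM_GPUS = 100_000
GPU_POWER_W = 700            # assumed per-GPU draw (~70 MW / 100,000 GPUs)
GPU_UNIT_PRICE_USD = 20_000  # assumed price per H100

gpu_power_mw = NUM_GPUS * GPU_POWER_W / 1_000_000
gpu_cost_usd = NUM_GPUS * GPU_UNIT_PRICE_USD
# Rule of thumb: GPUs are roughly half of the total system cost.
system_cost_usd = gpu_cost_usd * 2

print(f"GPU power draw:    ~{gpu_power_mw:.0f} MW")                 # ~70 MW
print(f"GPU cost:          ~${gpu_cost_usd / 1e9:.1f} billion")     # ~$2.0 billion
print(f"Est. system cost:  ~${system_cost_usd / 1e9:.1f} billion")  # ~$4.0 billion
```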
In late August, Tesla unveiled its Cortex AI cluster, equipped with an impressive 50,000 Nvidia H100 GPUs and 20,000 of Tesla's own wafer-scale Dojo AI chips. The cluster is slated to train Tesla's Full Self-Driving (FSD) software, making it strategically vital for the company.