The future of AI needs more flexible GPU capacity | Modal Blog


The last couple of years of Gen AI frenzy have brought us some undeniably cool new products, like Copilot and Suno. One thing they all have in common is that they demand lots of compute – GPUs in particular. But the supply of GPUs is constrained, and this supply-demand imbalance has made the market for cloud GPUs behave very differently from other cloud markets.

Why is that? What should we do? And what can we expect going forward? Should startups keep buying long-term GPU reservations from cloud vendors? Or will there be other options in the future?

Today, most of that GPU demand comes from training. But training is a cost center, and eventually you need to recoup that cost through real revenue. How do you do that? Enter inference – the less sexy but money-making sibling of training.

So why is most GPU demand driven by training even though inference is where you make the money? I think a lot of it reflects where we are in the cycle – there's an expectation that the revenue potential of inference is big, but to capture it, you first have to spend a lot of money on training.

The economics of this – high upfront capital, but a large potential payoff – is something VCs understand quite well, which I think explains why model builders have had no trouble raising lots of money. But for the economics to work out eventually, we need to see a much larger share of GPU spend going toward inference.
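To make the shape of that trade concrete, here's a toy back-of-envelope sketch of when inference pays back a training run. Every number in it – the training cost, GPU-hour price, throughput, and per-request revenue – is a made-up assumption for illustration, not data from this post or from the market.

```python
# Toy break-even sketch. All numbers below are illustrative
# assumptions, not real market figures.

TRAINING_COST = 100e6           # hypothetical one-time training spend, USD
GPU_HOUR_COST = 4.0             # hypothetical cloud price per GPU-hour, USD
REQUESTS_PER_GPU_HOUR = 10_000  # hypothetical inference throughput
REVENUE_PER_REQUEST = 0.002     # hypothetical revenue per request, USD

# Margin earned per GPU-hour spent serving inference.
margin_per_gpu_hour = REQUESTS_PER_GPU_HOUR * REVENUE_PER_REQUEST - GPU_HOUR_COST

# GPU-hours of inference needed to pay back the training run.
breakeven_gpu_hours = TRAINING_COST / margin_per_gpu_hour

print(f"Margin per inference GPU-hour: ${margin_per_gpu_hour:.2f}")
print(f"GPU-hours to recoup training:  {breakeven_gpu_hours:,.0f}")
```

Under these made-up numbers it takes about 6.25 million GPU-hours of inference to recoup a single training run – which is exactly the point: the payoff only materializes if a large share of GPU spend shifts to inference.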
