
Inference Time Scaling Laws - by Tanay Jaipuria

2024-11-02 21:30:04

This is a weekly newsletter about the business of the technology industry. To receive Tanay’s Newsletter in your inbox, subscribe here for free:

This week I’ll be discussing OpenAI’s recently released o1 model, which represents a new leap in the reasoning capabilities of LLMs, and the consequences of the new inference-time scaling laws that underpin it. But first, a bit of background.

Scaling laws for LLMs are well understood at this point: as training compute, dataset size, and model parameter count increase, model performance improves predictably.
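This relationship is often expressed as a power law in parameters and data. A minimal sketch below uses the approximate fitted constants from the Chinchilla paper (Hoffmann et al., 2022) as an illustration; the specific numbers are from that paper, not from this article:

```python
def pretraining_loss(params: float, tokens: float) -> float:
    """Predicted pre-training loss L(N, D) = E + A/N^alpha + B/D^beta,
    where N is parameter count and D is training tokens.
    Constants are the approximate Chinchilla fits (illustrative only)."""
    E = 1.69                 # irreducible loss
    A, alpha = 406.4, 0.34   # parameter-count term
    B, beta = 410.7, 0.28    # dataset-size term
    return E + A / params**alpha + B / tokens**beta

# Scaling up both model size and data lowers the predicted loss:
small = pretraining_loss(1e9, 20e9)     # ~1B params, 20B tokens
large = pretraining_loss(70e9, 1.4e12)  # ~70B params, 1.4T tokens
assert large < small
```

Under a fit like this, loss falls smoothly as either axis grows, which is why labs keep scaling both; the irreducible term E is the floor that no amount of pre-training compute removes.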

Hundreds of millions of dollars are already being spent on pre-training models, and the expectation is that this number will only go up, as noted by Mark Zuckerberg on Meta’s spend and by Dario Amodei of Anthropic.

“The amount of compute needed to train Llama 4 will likely be almost 10 times more than what we used to train Llama 3, and future models will continue to grow beyond that.” — Mark Zuckerberg, Meta CEO
