Scaling Laws – O1 Pro Architecture, Reasoning Training Infrastructure, Orion and Claude 3.5 Opus “Failures”

In our pursuit of becoming a better full service research firm, we’ve moved off Substack. For any questions please read https://semianalysis.com/faq/#substack

There has been an increasing amount of fear, uncertainty and doubt (FUD) regarding AI scaling laws. A cavalcade of part-time AI industry prognosticators have latched onto any bearish narrative they can find, declaring the end of the scaling laws that have driven the rapid improvement in Large Language Model (LLM) capabilities over the last few years. Journalists have joined the dogpile, supporting these narratives with noisy leaks full of vague claims that models have failed to scale successfully due to alleged underperformance. Other skeptics point to saturated benchmarks, with newer models showing little sign of improvement on said benchmarks. Critics also point to the exhaustion of available training data and slowing hardware scaling for training.

Despite this angst, the accelerating datacenter buildouts and capital expenditure of the large AI labs and hyperscalers speak for themselves. From Amazon investing considerable sums to accelerate its Trainium2 custom silicon and preparing 400k chips for Anthropic at an estimated cost of $6.5B in total IT and datacenter investment, to Meta’s 2GW datacenter plans for 2026 in Louisiana, to OpenAI and Google’s aggressive multi-datacenter training plans to overcome single-site power limitations – key decision makers appear to be unwavering in their conviction that scaling laws are alive and well. Why?
