Last week, DeepSeek unveiled their V3 model, trained on just 2,048 H800 GPUs - a fraction of the hardware used by OpenAI or Meta. DeepSeek claims V3 matches or exceeds GPT-4 and Claude on several benchmarks.
Recent research shows model training costs growing roughly 2.4x annually since 2016. Everyone assumed you needed massive GPU clusters to compete at the frontier. DeepSeek suggests otherwise.
The U.S. banned exports of high-end GPUs to China to slow its AI progress. DeepSeek had to work with H800s - handicapped versions of the H100 with interconnect bandwidth cut roughly in half. But this constraint may have inadvertently spurred innovation.
Without unlimited hardware to throw at the problem, they made the hardware they had work smarter. In effect, they were forced to solve a different, and potentially more valuable, problem.
Context matters, though. DeepSeek isn't a typical startup - they're backed by High-Flyer, an $8B quant fund. Their CEO, Liang Wenfeng, built High-Flyer from scratch and seems focused on foundational research over quick profits: