AI capabilities have improved remarkably quickly, fuelled by the explosive scale-up of resources being used to train the leading models. But if you examine the scaling laws that inspired this rush, they actually show extremely poor returns to scale. What’s going on?
The era of LLMs has seen remarkable improvements in AI capabilities over a very short time. This is often attributed to the AI scaling laws: statistical relationships that govern how AI capabilities improve with more parameters, compute, or data. Indeed, AI thought leaders such as Ilya Sutskever and Dario Amodei have said that the discovery of these laws led them to the current paradigm of rapid AI progress via a dizzying increase in the size of frontier systems.
Before the 2020s, most AI researchers were looking for architectural changes to push the frontiers of AI forwards. The idea that scale alone was sufficient to provide the entire range of faculties involved in intelligent thought was unfashionable and seen as simplistic.
A key reason scaling worked was the tremendous versatility of text. As Turing had noted more than 60 years earlier, almost any challenge that one could pose to an AI system can be posed in text. The single metric of human-like text production could therefore assess the AI’s intellectual competence across a huge range of domains. The next-token prediction scheme was also an instance of both sequence prediction and compression, two tasks that were long hypothesized to be what intelligence is fundamentally about.
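To make the link between prediction and compression concrete, here is a minimal sketch, not anything taken from an actual LLM. It relies on a standard result from information theory: a symbol that a model predicts with probability p can be encoded in about -log2(p) bits, so a better predictor yields a shorter encoding. The adaptive character-level bigram model, the function names `code_length_bits` and `uniform_bits`, and the toy input are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def code_length_bits(text, alphabet):
    """Ideal code length of `text` in bits under an adaptive bigram model.

    Each character is predicted from the previous one using add-one
    smoothed counts of what has been seen so far. The counts are updated
    only after a character is scored, so a decoder could rebuild the same
    predictions and recover the text.
    """
    counts = defaultdict(Counter)  # previous character -> counts of next character
    total_bits = 0.0
    prev = None  # no context before the first character
    for ch in text:
        ctx = counts[prev]
        p = (ctx[ch] + 1) / (sum(ctx.values()) + len(alphabet))
        total_bits += -math.log2(p)  # cost of a symbol predicted with probability p
        ctx[ch] += 1
        prev = ch
    return total_bits


def uniform_bits(text, alphabet):
    """Code length when every character is treated as equally likely."""
    return len(text) * math.log2(len(alphabet))


# Toy input (an assumption for illustration): a highly predictable string.
text = "ab" * 50
alphabet = sorted(set(text))
model_bits = code_length_bits(text, alphabet)
baseline_bits = uniform_bits(text, alphabet)
print(f"{model_bits:.1f} bits with prediction, {baseline_bits:.1f} bits without")
```

On this toy string the predictive model needs far fewer bits than the uniform baseline, because it quickly learns that "a" is followed by "b" and vice versa. The same quantity, summed over tokens rather than characters, is the cross-entropy loss that next-token prediction minimizes.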