In AI research, Artificial General Intelligence (AGI) describes intelligence that could match or surpass human abilities across a wide range of tasks. Achieving AGI would mark a major milestone in AI engineering and in the capability of autonomous systems. To gauge how close we are, researchers use benchmarks such as the Abstraction and Reasoning Corpus (ARC), which challenges AI systems to solve puzzles that demand the kind of creativity and flexibility humans take for granted but current AI struggles with.
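To make that concrete, here is a minimal Python sketch of the ARC task format. The JSON-style layout, with “train” demonstration pairs and “test” pairs the solver must complete, follows the structure of the public ARC dataset; the tiny task and the mirror rule below are invented purely for illustration and do not come from the real corpus.

```python
import json

# A toy task in ARC's format: grids are 2D lists of integers 0-9 (colors).
# The hidden rule in this invented example is "mirror the grid horizontally".
task = {
    "train": [
        {"input": [[1, 0], [0, 2]], "output": [[0, 1], [2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [6, 0]]},  # the solver must predict this output
    ],
}

def mirror(grid):
    """Candidate rule inferred from the demonstrations: flip each row."""
    return [list(reversed(row)) for row in grid]

# Verify the inferred rule against every training pair before trusting it.
assert all(mirror(pair["input"]) == pair["output"] for pair in task["train"])

# Apply the rule to the held-out test input.
print(json.dumps(mirror(task["test"][0]["input"])))  # [[5, 0], [0, 6]]
```

The point of the benchmark is that each task has its own novel rule, given with only a handful of demonstrations, so a solver cannot rely on patterns memorized from training data.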
On Friday, December 20th, François Chollet, co-founder of the ARC Prize, said that the performance of OpenAI’s new o3 model “represents a significant breakthrough in getting AI to adapt to novel tasks.” However, he also cautioned that “while the new model is very impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI.” He noted that the model still fails some relatively simple tasks in the first level of these tests, ARC-AGI-1, and is unlikely to tackle the tougher challenges of the next level, ARC-AGI-2. His comments have sparked intense debate in the AI research community, and to understand why, it helps to look at what these benchmarks reveal about how far we have come, and how far we still have to go, in the quest for AGI.