(If you’ve been hiding under a rock, or I suppose if you somehow don’t spend every waking minute following AI Twitter: OpenAI recently announced o3 [1], their latest “reasoning” model. It’s not yet available to the public, but they released some astonishing benchmark figures. These included rapid progress on some tasks that had previously been very difficult for AIs, as well as advanced math and programming problems that demand elite human-level performance. o3 is the successor to “o1”, itself only a few months old. If you’re wondering what happened to o2, the official answer is that a British telecom company owns the trademark, but I’m confident the real reason is the well-known fact that AI developers have an aversion to logical or even comprehensible names for anything.)
Some early analysis of o3’s capabilities shines new light on the strengths and weaknesses of LLMs [2], and on where things might go from here. In particular, it’s dawning on me that we live in a world designed to support human cognition, full of tools optimized to help meatbrains think. Most of these tools are not available to LLMs (or not well designed to work with them). Which raises a new question: what happens when we start giving AIs a toolkit as rich as the one we give ourselves?