o1 reasoners are the most exciting models since the original GPT-4. They bear out what I predicted earlier this year in my paper “AI Search: The Bitter-er Lesson”: we can make models think for longer instead of building them bigger.
Before we built large language models, we had reinforcement learning (RL). The years 2015–2020 were the golden age of RL, and anyone remotely interested in AI stared at the ceiling wondering how you could build “AlphaZero, but general.”
RL is magic: build an environment, set an objective, and watch a cute little agent ascend past human skill. When I was younger, I wanted to RL everything: Clash Royale, investment analysis, algebra. Whatever interested me, I daydreamed about an RL agent that could do it better.
Why is RL so fun? Unlike today’s chatbots, it gives you a glimpse into the mind of God. AlphaZero was artistic, satisfying, and right; its subtle strategy echoed nature’s truths, and its daring tactics told stories of the cosmos. Today’s smartest chatbots feel like talking to a hyper-educated human with 30th-percentile intelligence. AI can be beautiful, and we’ve forgotten that.
AlphaZero (black) plays Ng5, sacrificing its knight for no immediate material gain. This move was flippant but genius; it changed chess forever. After seeing it, I realized I wanted to research AI.