o1 is out of preview, and OpenAI has also introduced a new tier, the “Pro plan,” which costs $200 a month. Sam Altman says it’s for researchers, engineers, and power users who want a model that thinks harder and are willing to pay more to work without rate limits. Honestly, it kind of reminds me of myself.
Since I can’t afford ChatGPT Pro, I’ll stick to the o1 model in this article. I won’t repeat what’s already in the System Card; instead, I’ll share a few interesting observations and some commentary from people around the internet.
I have also collected a personal stash of math, reasoning, and coding questions, and I will run them on o1 to see whether it is a significant upgrade over the preview model and how it fares against the updated Claude 3.5 Sonnet (informally called “3.6 Sonnet”). I have already tried a few of these questions on o1-preview. Do check out the blog for all the test cases.
The System Card covers a lot of red teaming, safety, and related topics. What really caught my attention, though, were the SWE-bench and MLE-bench results. Honestly, you’ve got to give them credit for their transparency: on MLE-bench, o1 scored lower than its preview counterpart at both the Kaggle bronze and silver medal levels.