Two weeks ago, OpenAI released the o1 family of models along with a graph showing scaling laws for inference-time compute. Using only the public o1-mini API, I tried to reconstruct the graph as closely as possible. The original is on the left; my attempt is on the right.

We evaluate on the 30 questions that make up the 2024 American Invitational Mathematics Examination (AIME). These are kindly provided by Project Numina here.
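For reference, here is a minimal loading sketch. The Hugging Face dataset ID `AI-MO/aimo-validation-aime` and its column names are my guesses at how the Numina set is published, not something specified above:

```python
# A sketch of loading the evaluation set. The dataset ID
# "AI-MO/aimo-validation-aime" and its fields are assumptions.
from datasets import load_dataset

aime = load_dataset("AI-MO/aimo-validation-aime", split="train")

# Keep only the 30 problems from the two 2024 AIME exams.
aime_2024 = [row for row in aime if "2024" in row["url"]]
assert len(aime_2024) == 30
```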

The OpenAI API does not allow you to easily control how many tokens to spend at test-time. I hack my way around this by telling o1-mini how long I want it to think for. Afterwards, I can figure out how many tokens were actually used based on how much the query cost!
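In code, the hack might look like the sketch below. The prompt wording is my own invention, and the pricing constants assume o1-mini's launch rates ($3 per million input tokens, $12 per million output tokens); check current pricing before relying on them.

```python
# Sketch of the token-budget hack: ask o1-mini to think for a fixed number
# of tokens in the prompt, then back out actual usage from the query cost.
# Prompt wording and pricing are assumptions, not the author's exact code.
from openai import OpenAI

client = OpenAI()

INPUT_RATE = 3.00 / 1_000_000    # $ per input token (assumed launch pricing)
OUTPUT_RATE = 12.00 / 1_000_000  # $ per output token (assumed launch pricing)

def ask_with_budget(question: str, token_budget: int) -> tuple[str, int]:
    """Ask o1-mini to think for roughly `token_budget` tokens."""
    response = client.chat.completions.create(
        model="o1-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Think for at most {token_budget} tokens before answering.\n\n"
                f"{question}"
            ),
        }],
    )
    # completion_tokens counts visible output plus hidden reasoning tokens.
    return response.choices[0].message.content, response.usage.completion_tokens

def tokens_from_cost(query_cost: float, prompt_tokens: int) -> float:
    """Back out output tokens from a query's dollar cost, as described above."""
    return (query_cost - prompt_tokens * INPUT_RATE) / OUTPUT_RATE
```

Since the API bills every output token (including the hidden reasoning ones), dividing the output portion of the cost by the output rate recovers the same count that `completion_tokens` reports.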

Here's a plot of how many tokens we ask o1-mini to think for against how many it actually uses. If you request a very small token budget, it often refuses to listen, and the same goes for very large budgets. But requesting between $2^4$ and $2^{11}$ tokens seems to work reasonably well.
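One way to generate the data behind such a plot is to sweep the requested budget over powers of two and record each (requested, used) pair; a sketch, reusing the hypothetical `ask_with_budget` helper from above:

```python
# Sweep requested thinking budgets over powers of two and record what
# o1-mini actually uses; ask_with_budget is the helper sketched earlier.
budgets = [2**k for k in range(2, 14)]  # 4 .. 8192 requested tokens

question = aime_2024[0]["problem"]  # one AIME problem (placeholder choice)
pairs = []
for budget in budgets:
    _, used_tokens = ask_with_budget(question, budget)
    pairs.append((budget, used_tokens))

for requested, used in pairs:
    print(f"requested {requested:>5} -> used {used}")
```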

When restricting to just that region of $2^4$ to $2^{11}$, we get the following curve. Note that o1-mini doesn't really "listen" to the precise number of tokens we ask it to use. In fact, in this region, it seems to consistently use ~8 times as many tokens as we ask for!
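One way to pin down that multiplier is the geometric mean of the used-to-requested ratios inside the well-behaved region; a sketch, assuming the `pairs` collected above:

```python
# Estimate the requested-to-actual multiplier inside the 2^4..2^11 region,
# using the (requested, used) pairs collected above.
import math

in_region = [(r, u) for r, u in pairs if 2**4 <= r <= 2**11]
log_ratios = [math.log(u / r) for r, u in in_region]
multiplier = math.exp(sum(log_ratios) / len(log_ratios))
print(f"o1-mini uses about {multiplier:.1f}x the requested tokens")  # roughly 8
```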
