Clarifying the Creation and Use of the FrontierMath Benchmark

We clarify that OpenAI commissioned Epoch AI to produce 300 math questions for the FrontierMath benchmark. They own these and have access to the statements and solutions, except for a 50-question holdout set.

FrontierMath is a benchmark we created to evaluate the mathematical capabilities of frontier AI models. We saw a need for high-quality, challenging mathematical problems that could meaningfully test the limits of these systems. This remains our core mission—to help the AI community and the public at large accurately understand and measure AI capabilities.

Building high-quality evaluations at this scale requires substantial resources. After approaching several potential funders, we partnered with OpenAI, who provided both the necessary funding and technical expertise to develop the benchmark.1 Working with industry sponsors helps make the benchmark more impactful for the AI field.

However, we recognize we have not communicated clearly enough about the relationship between FrontierMath and OpenAI, leading to questions and concerns among contributors, researchers, and the public. To address these issues, here are the facts:
