Submit Your Toughest Questions for Humanity's Last Exam

Submitted by Style Pass, 2024-09-16 18:00:03

CAIS and Scale AI are excited to announce the launch of Humanity's Last Exam, a project aimed at measuring how close we are to achieving expert-level AI systems. The project aims to build the world's most difficult public AI benchmark by gathering questions from experts across all fields. People who submit successful questions will be invited as coauthors on the paper for the dataset and will have a chance to win money from a $500,000 prize pool.

AI is developing at a rapid pace. Just a few years ago, AI systems performed no better than random chance on MMLU, the AI community’s most-downloaded benchmark (developed by CAIS). But just last week, OpenAI’s newest model performed near the ceiling on all of the most popular benchmarks, including MMLU, and received top scores on a variety of highly competitive STEM olympiads. Humanity must maintain a good understanding of the capabilities of AI systems. Existing tests have now become too easy, so we can no longer track AI developments well or tell how far AI systems are from becoming expert-level.

Despite these advances, AI systems are still far from being able to answer difficult research questions and other intellectual questions. To keep track of how far AI systems are from expert-level capabilities, we are developing Humanity’s Last Exam, which aims to be the world’s most difficult AI test.
