
Scientists design new 'AGI benchmark' that indicates whether any future AI model could cause 'catastrophic harm'

submitted by
Style Pass
2024-10-21 19:30:04

OpenAI scientists have designed MLE-bench — a compilation of 75 extremely difficult tests that can assess whether a future advanced AI agent is capable of modifying its own code and improving itself.

Scientists have designed a new set of tests that measure whether artificial intelligence (AI) agents can modify their own code and improve their own capabilities without human instruction.

The benchmark, dubbed "MLE-bench," is a compilation of 75 Kaggle competitions, each one a challenge in machine learning engineering. This work involves training AI models, preparing datasets, and running scientific experiments, and the Kaggle competitions measure how well machine learning algorithms perform at those specific tasks.
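To make the Kaggle-competition format concrete, here is a minimal sketch of how such a contest works: a submission is scored against hidden ground truth with a fixed metric, and the resulting leaderboard rank maps to a medal tier. The metric and the medal cutoffs below are illustrative assumptions, not MLE-bench's actual grading harness.

```python
# Toy sketch of Kaggle-style scoring (illustrative only, not MLE-bench's harness).

def score_submission(predictions, ground_truth):
    """Toy metric: classification accuracy of a submission against hidden labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("submission length must match ground truth")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)

def medal(rank, n_teams):
    """Hypothetical medal cutoffs, loosely modeled on leaderboard tiers."""
    if rank <= max(1, n_teams // 10):
        return "gold"
    if rank <= max(1, n_teams // 5):
        return "silver"
    if rank <= max(1, n_teams // 3):
        return "bronze"
    return "none"
```

An agent that can autonomously produce submissions earning medals across many such competitions is, in effect, doing end-to-end machine learning engineering.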

OpenAI scientists designed MLE-bench to measure how well AI models perform at "autonomous machine learning engineering" — which is among the hardest tests an AI can face. They outlined the details of the new benchmark Oct. 9 in a paper uploaded to the arXiv preprint database.

Any future AI that scores well on the 75 tests that make up MLE-bench may be considered powerful enough to be an artificial general intelligence (AGI) system — a hypothetical AI that is much smarter than humans — the scientists said.
