Submitted by Style Pass on 2024-10-09 19:30:08

Code for the paper "MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering". We have released the code used to construct the dataset, the evaluation logic, as well as the agents we evaluated for this benchmark.

The MLE-bench dataset is a collection of 75 Kaggle competitions which we use to evaluate the ML engineering capabilities of AI systems.

Since Kaggle does not provide the held-out test set for each competition, we provide preparation scripts that split the publicly available training set into a new training and test set.
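To make the idea concrete, here is a minimal sketch of such a split. It is not the repository's preparation code: the `split_train_test` helper, the 10% hold-out fraction, the fixed seed, and the `id`/`feature`/`label` column names are all assumptions chosen for illustration, and the real scripts are competition-specific.

```python
import random


def split_train_test(rows, test_fraction=0.1, seed=0):
    """Deterministically split rows into (new_train, new_test).

    Hypothetical helper: the fraction and seed are illustrative defaults,
    not values taken from the MLE-bench preparation scripts.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows standing in for a competition's public training file.
public_train = [{"id": i, "feature": i * 0.5, "label": i % 2} for i in range(100)]
new_train, new_test = split_train_test(public_train, test_fraction=0.1, seed=0)

# The new test set is exposed without labels; the labels are kept aside for grading.
test_features = [{k: v for k, v in row.items() if k != "label"} for row in new_test]
answers = [{"id": row["id"], "label": row["label"]} for row in new_test]
```

Fixing the seed means that anyone re-running the preparation gets the same held-out rows, which is what makes scores comparable across runs.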

We use the Kaggle API to download the raw datasets. Ensure that you have downloaded your Kaggle credentials file (kaggle.json) and placed it in the ~/.kaggle/ directory (this is the default location where the Kaggle API looks for your credentials). To download and prepare the MLE-bench dataset in your system's default cache directory, run the following. Note that we've found this to take two days when running from scratch:

Answers for competitions must be submitted in CSV format; the required format is described in each competition's description, or shown in the competition's sample submission file. You can grade multiple submissions by using the mlebench grade command. Given a JSONL file, where each line corresponds to a submission for one competition, mlebench grade will produce a grading report for each competition. The JSONL file must contain the following fields: