How far are we from having competent AI co-workers that can perform tasks as varied as software development, project management, administration, and data science? Put more bluntly, how far are human jobs from being replaced by AI?
> AGI will automate all human work in the next few years. A report by Goldman Sachs in 2023 said AI could replace the equivalent of 300 million full-time jobs.
To answer these questions, we need benchmarks that measure AI’s rapidly evolving capabilities. You have probably heard of a few famous ones, such as WebArena, which tests an AI’s ability to use a browser, and SWE-bench, which tests its ability to code. But these are still far from real-world tasks, which are often ambiguous, long-horizon, and interactive, and which require a diversity of skills and tool use.
Recently, a group of researchers from CMU and industry released a new benchmark targeting diverse work-related tasks: The Agent Company. It is a simulated software company whose tasks are inspired by real-world work and cover software engineering (SWE), data science (DS), project management (PM), human resources (HR), administration, and finance. Some tasks are easy for human beings, while others are either extremely complicated or very time-consuming.