
o3 “ARC AGI” postmortem megathread: why things got heated, what went wrong, and what it all means

Submitted by Style Pass on 2024-12-22 17:30:06

Kevin Roose, of Hard Fork and NYT, was so impressed with OpenAI’s rollout that he joked “of course they have to announce AGI the day my vacation starts”.

For many people, what sealed the deal, or led them to conclude (wrongly) that o3 necessarily “must be a step to AGI”, was o3’s performance on @fchollet’s ARC-AGI.

1. As NYU prof Brenden Lake pointed out, the test should never have been called ARC-AGI. Even Chollet acknowledged this in his blog, saying “it’s not an acid test for AGI”. At *most* the test is necessary for AGI; it certainly isn’t sufficient. Critical things like factuality, compositionality, and common sense aren’t even addressed.

2. The video should have been much clearer about what was actually tested and what was actually trained. To the average listener it may have sounded like the AI took the test cold, with a few sample items, like a human would, but that’s not actually what happened.

3. What was actually done (pretraining on what I believe was hundreds of public examples) is NOT comparable to what humans require. Such pretraining is not uncommon in the field, but this was not made clear in the video. Altman saying that the test wasn’t “targeted” added to the confusion.
