The real data wall is billions of years of evolution

Submitted by Style Pass, 2024-10-03 18:30:04

Say you have a time machine. You can only use it once, to send a single idea back to 2005. If you wanted to speed up the development of AI, what would you send back? Many people suggest attention or transformers. But I’m convinced that the answer is “brute-force”—to throw as much data at the problem as possible.

AI has recently been improving at a harrowing rate. If trends hold, we are in for quite a show. But some suggest AI progress might falter due to a “data wall”. Current language models are trained on datasets fast approaching “all the text, ever”. What happens when that text runs out?

Many argue this data wall won’t be a problem, because humans have excellent language and reasoning despite seeing far less language data. They say that humans must be leveraging visual data and/or using a more data-efficient learning algorithm. Whatever trick humans are using, they say, we can copy it and avoid the data wall.

Every day, an average person reads a few thousand words, and hears perhaps 16 to 40 thousand. So a well-educated 40-year-old might have encountered 5×10⁸ words in their lifetime. Recent language models are trained on upwards of 10¹³ words, roughly 20,000 times more. It’s not even close.
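To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes 365 days per year, constant exposure from birth, and takes “a few thousand” words read per day to mean 2,000 to 4,000; those assumptions and the function names are mine, not part of the original estimate.

```python
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def lifetime_words(words_per_day, years=40):
    """Words encountered over a lifetime, assuming constant daily exposure."""
    return words_per_day * DAYS_PER_YEAR * years


def corpus_ratio(corpus_words, human_words=5e8):
    """How many times larger a training corpus is than one human's exposure."""
    return corpus_words / human_words


# Assumed daily totals: words read (2,000 to 4,000, my guess at "a few
# thousand") plus words heard (16,000 to 40,000, from the text).
low = lifetime_words(2_000 + 16_000)
high = lifetime_words(4_000 + 40_000)
print(f"lifetime exposure: {low:.1e} to {high:.1e} words")

# Ratio to the 5×10⁸ human estimate for two corpus sizes.
for corpus in (1e12, 1e13):
    print(f"corpus of {corpus:.0e} words: {corpus_ratio(corpus):,.0f}x")
```

Under these assumptions the 5×10⁸ figure falls inside the low-to-high range, and a 20,000-fold gap corresponds to a corpus of about 10¹³ words.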
