Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?
If you answered “190,” congratulations: You did as well as the average grade school kid by getting it right. (Friday’s 44 plus Saturday’s 58 plus Sunday’s 44 multiplied by 2, or 88, equals 190.)
You also did better than more than 20 state-of-the-art artificial intelligence models tested by an AI research team at Apple. The AI bots, they found, consistently got it wrong.
The Apple team found “catastrophic performance drops” by those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn’t understand that the size of the kiwis have nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered “185.”
Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.