
If you’re interested in large language models, then you should care about their ability to do arithmetic. On the road to AGI, arithmetic problems provide a neat microcosm of more general multi-step reasoning problems. Here’s why:

Arithmetic problems can be solved by simple algorithms: rules that must be applied consistently. And you can arbitrarily increase the difficulty of a problem by increasing the number of times a rule must be applied.
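To make this concrete, here is a rough sketch of what "difficulty as number of rule applications" might look like in code. The article itself includes no code; the choice of Python, the function name, and the use of chained addition as the rule are all mine.

```python
import random

def make_problem(num_steps: int, digits: int = 2) -> tuple[str, int]:
    """Generate an arithmetic problem whose difficulty scales with
    num_steps: the number of times the addition rule must be applied."""
    terms = [random.randint(10 ** (digits - 1), 10 ** digits - 1)
             for _ in range(num_steps + 1)]
    expression = " + ".join(str(t) for t in terms)
    return expression, sum(terms)

# A 1-step problem vs. a 10-step problem: same rule, more applications.
easy, easy_answer = make_problem(num_steps=1)
hard, hard_answer = make_problem(num_steps=10)
print(easy, "=", easy_answer)
print(hard, "=", hard_answer)
```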

This means that arithmetic provides good windows into both single-step and multi-step (i.e. chain-of-thought) reasoning tasks. We can evaluate both individual calls to LLMs, as well as compositions of such calls.
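As a sketch of the distinction, evaluating a single call versus a composition of calls might look like the following. Here `ask` is a hypothetical stand-in for whatever LLM API you use, not a real library function.

```python
def ask(prompt: str) -> str:
    # Hypothetical placeholder: plug in your actual LLM client here.
    raise NotImplementedError

def solve_single_step(expression: str) -> str:
    # One call: the model must produce the final answer directly.
    return ask(f"Compute: {expression}. Answer with just the number.")

def solve_multi_step(terms: list[int]) -> str:
    # Composition of calls: apply the addition rule once per call,
    # mirroring a chain-of-thought style decomposition.
    running = str(terms[0])
    for t in terms[1:]:
        running = ask(f"Compute: {running} + {t}. Answer with just the number.")
    return running
```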

State-of-the-art LLMs still struggle with simple arithmetic problems, even as they have scaled up dramatically in size and in performance on standard evaluation benchmarks.

In this article, I try to summarize everything that we know about LLMs doing arithmetic. I’ll point you to all the interesting papers that I’m aware of, and draw some observations of my own.
