I recently listened to an episode of Gradient Dissent (the Weights & Biases podcast) featuring Emily Bender, in which they discussed language models (LMs) and the dangers of making them ever larger. The discussion centered on the paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. They also discussed ideas from Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (E. Bender & A. Koller, 2020). Here I want to present a few of the ideas discussed in the podcast episode and the referenced papers.
Every time a new, larger language model is released, articles appear claiming that the model 'understands' language. 'Understanding' is a loaded term, and it needs to be examined carefully before we claim that a machine learning model possesses it. Because language models are trained only on text (form), a language model cannot learn meaning from the data the way a person would. It can learn to associate inputs with outputs, and to pick out clusters of words that co-occur with each other, but that is not the same as understanding meaning.
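To make the distinction concrete, here is a minimal sketch of the kind of association a model can extract from form alone: a bigram counter over a toy corpus. The corpus and function names are my own illustration, not something from the papers; real language models are vastly more sophisticated, but they too are ultimately fit to co-occurrence patterns in text.

```python
from collections import Counter, defaultdict

# Toy corpus: the model only ever sees token sequences (form), never meaning.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count bigram co-occurrences: for each word, which words follow it and how often.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation observed after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("sat"))  # "on" - a purely statistical association
print(predict_next("on"))   # "the"
```

The predictions can look competent, yet nothing in the model connects "cat" or "mat" to anything in the world; it has only tabulated which forms tend to follow which.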
Getting an accurate and consistent definition of 'meaning' and 'knowledge' is surprisingly difficult. Here are some useful definitions from Bender & Koller1