Nine days before the paper was accepted for the conference, Timnit Gebru, one of its authors, was fired from her position as co-director of Google's Ethics team after refusing to remove her name from the paper. Another author and Gebru's Ethics team co-director, Margaret Mitchell, was fired two months later. Google attracted a media backlash for these actions, for example in The New York Times and Wired. 2 Since then, colleagues both within the company and across the profession have shown strong support for Gebru and Mitchell. 3
Thus the paper’s significance extends beyond its scientific conclusions: it initiated a public discourse on the societal implications of natural language processing (NLP) technology.
A language model (LM) is a record, for a given body of text (a corpus), of the probabilities of words appearing in particular contexts in the corpus. 4 For example, given an English corpus consisting of the Bible, an LM would indicate that the context “the beginning of” was followed most often by the words “creation”, “reign”, and “world”. It is straightforward to write a program that goes through a corpus and counts all of the occurrences of words and their contexts, resulting in a system called an n-gram model.
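The counting procedure just described can be sketched in a few lines of Python. This is an illustrative toy, not the paper's code; the miniature "corpus" below is a stand-in, since counting over a real Bible corpus would produce the "creation/reign/world" figures mentioned above.

```python
# A minimal n-gram model: map each fixed-length context (a tuple of
# words) to a Counter of the words that follow it in the corpus.
from collections import Counter, defaultdict

def build_ngram_model(words, context_size=3):
    """Count, for every context of `context_size` words, how often
    each word immediately follows that context."""
    model = defaultdict(Counter)
    for i in range(len(words) - context_size):
        context = tuple(words[i:i + context_size])
        model[context][words[i + context_size]] += 1
    return model

# Toy stand-in for a corpus (hypothetical, for illustration only).
corpus = ("in the beginning of creation . in the beginning of "
          "the reign . in the beginning of the world").split()

model = build_ngram_model(corpus, context_size=3)
# Words most often following the context "the beginning of":
print(model[("the", "beginning", "of")].most_common())
```

Looking up a context in `model` then gives the raw follower counts, which can be normalised into the probabilities an LM records.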
However, many LMs nowadays are neural networks. 5 These can efficiently learn to associate relatively long contexts – of two thousand words or more, compared to around eight for the most complex n-gram models – with the probabilities of different words following them. To do this, they need to be presented with (“trained on”) a very large corpus of hundreds of billions of words. The associations between contexts and following words are implemented in the network in the form of parameters. These models are “big” in the sense that they scale up to trillions of parameters.
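The idea that trained parameters, rather than explicit counts, encode the association between a context and its likely next word can be made concrete with a deliberately tiny sketch. This is nothing like a large transformer LM: it uses a one-word context, a handful of parameters, and plain gradient descent, purely to show parameters being adjusted during training until they assign high probability to the words that actually follow each context. All names and the miniature corpus are illustrative assumptions.

```python
# Toy "neural" bigram LM in pure Python: a single weight matrix W is
# the model's parameters; training nudges W so that softmax(W[context])
# assigns high probability to the observed next word.
import math

corpus = "in the beginning of creation in the beginning of the world".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Parameters: one row of logits per context word, one column per next word.
W = [[0.0] * V for _ in range(V)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Training pairs: (context word, following word).
pairs = [(idx[a], idx[b]) for a, b in zip(corpus, corpus[1:])]

# Gradient descent on cross-entropy loss for each observed pair.
for epoch in range(200):
    for c, n in pairs:
        probs = softmax(W[c])
        for j in range(V):
            grad = probs[j] - (1.0 if j == n else 0.0)
            W[c][j] -= 0.5 * grad

# After training, the parameters encode that "beginning" is always
# followed by "of" in this corpus.
probs = softmax(W[idx["beginning"]])
print(vocab[probs.index(max(probs))])
```

Real neural LMs differ in every dimension that matters here: they condition on thousands of preceding words rather than one, and the single matrix `W` is replaced by trillions of parameters arranged in deep layers, but the training loop above is the same principle at miniature scale.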