The ability to communicate with advanced natural language is arguably the thing that makes us human. It’s how we convey ideas, build relationships, start wars, end wars, plan, learn, teach… The list goes on.
And yet, despite all of its power and flexibility, here’s the most remarkable thing about language: it developed without anyone ever sitting down to define its rules. The complexity of the world’s tongues emerged over time through the interactions of individuals and societies, entirely unsupervised. This should sound familiar to anyone interested in either evolution or machine learning. Language is probably the all-time greatest human experiment in unsupervised learning and emergent behavior.
Several years ago, I stumbled across a book called The Unfolding of Language by the linguist Guy Deutscher. It’s dense and technical, certainly not a book I’d recommend to just anyone, but few books have changed my worldview as much as this one. That’s because it makes a brilliant and compelling argument for how language evolved without any supervision. How did we go from our prehistoric days of identifying objects by pointing at them to what we have today: a rich and infinitely adaptable ability to describe anything, tangible or intangible, as combinations of sounds?
First, Latin is the common ancestor of many of Europe’s most widely spoken languages. Yet virtually none of these languages share its inflections, its cases, its three genders, and so on. On top of that, despite developing in geographic proximity, these descendant languages are extremely dissimilar to one another, not only in grammar and spelling but also in pronunciation. If you didn’t already know that these “Romance languages” all evolved from Latin, it would be very difficult to tell. For instance, here is how the English sentence “the beautiful birds sing in the gardens” is translated into several other languages. Notice how different they are from one another.