Surprising shared word etymologies - Daniel de Haas

I find word etymologies fascinating. Every word we speak or write sits unassumingly on the surface of a rich history, sometimes spanning millennia.

A little while ago I bought a book called Dictionary of Word Origins which details the history of thousands of words, and in reading it I'm always delighted to learn about the various historical connections between words, especially when the modern forms of the words have little to do with each other. The book even mentions a few particularly surprising examples of this in its introduction, and to this day I use one of those examples ("bacteria" and "imbecile" are etymologically related!) as a go-to fun fact when the need arises. I'm great at parties.

Recently I realized that with the use of a few publicly available datasets I might be able to write a program that would automatically identify surprising shared word etymologies. After a bit of trial, error, and data-massaging, I was able to produce some results. If you're interested in the journey there, keep reading. If you just want to see the results, you can skip ahead to them.

I defined a "surprising" pair as two words that have orthogonal definitions but a shared etymological history. "Orthogonal definitions" here means the words relate to two very different things (like "bacteria" and "imbecile"), not just that they have opposite meanings (like "anything" and "nothing"). Another way of phrasing this is that the two words are semantically very different.
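To make that definition concrete, here is a minimal sketch of how it could be written as a predicate. It rests on assumptions that are not spelled out above: that each word has some numeric vector standing in for its meaning, that each word has a set of etymological ancestors, and that "semantically very different" can be approximated by a low cosine similarity. The names `is_surprising`, `TOY_VECTORS`, `TOY_ANCESTORS`, and the `max_similarity` threshold are all hypothetical, and the vectors and ancestor labels are made-up toy values, not real data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_surprising(word_a, word_b, vectors, ancestors, max_similarity=0.2):
    """True if the words share an ancestor but are semantically far apart.

    The 0.2 threshold is an arbitrary illustrative choice.
    """
    shares_root = bool(ancestors[word_a] & ancestors[word_b])
    similarity = cosine_similarity(vectors[word_a], vectors[word_b])
    return shares_root and similarity <= max_similarity


# Toy meaning vectors (invented for illustration). Opposites such as
# "anything" and "nothing" are placed close together on purpose, since
# opposite meanings are still closely related meanings.
TOY_VECTORS = {
    "bacteria": [1.0, 0.0],
    "imbecile": [0.0, 1.0],
    "anything": [0.8, 0.6],
    "nothing": [0.6, 0.8],
}

# Placeholder ancestor labels, not real etymological roots.
TOY_ANCESTORS = {
    "bacteria": {"shared_root_a"},
    "imbecile": {"shared_root_a"},
    "anything": {"shared_root_b"},
    "nothing": {"shared_root_b"},
}

print(is_surprising("bacteria", "imbecile", TOY_VECTORS, TOY_ANCESTORS))
print(is_surprising("anything", "nothing", TOY_VECTORS, TOY_ANCESTORS))
```

With these toy values the first pair counts as surprising (shared ancestor, similarity 0) and the second does not (shared ancestor, but similarity 0.96), which mirrors the distinction between "very different" and merely "opposite".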

