Damerau-Levenshtein Edit Distance Explained

submited by
Style Pass
2021-06-27 17:00:02

For my master's studio, I implemented the Wagner-Fischer algorithm for finding the Levenshtein edit distance between two protein sequences to find the closest match from a database of protein sequences to an input sequence. It's a very elegant algorithm, and has a straightforward explanation that can be easily found with a quick web search.

Now, Wagner-Fischer has a cousin, the algorithm for calculating the Damerau-Levenshtein edit distance, which does not have nearly as many resources available to those wanting to learn it. The pseudocode on Wikipedia is not very expressive. I couldn't find a good explanation of how it works. So what I did was implemented the pseudocode in Python and walked through it, renaming the confusing variables as I figured out their role. You can find my Python code here. (I've also implemented it in C99.) First Steps: Restricted Edit Distance

To understand Damerau-Levenshtein, it helps to to understand a simple variant on Wagner-Fischer for calculating the "restricted edit distance."

Leave a Comment