A surprisingly common problem in both application development and analysis is: given an input name, find the database record it most likely refers to.

Fuzzy Name Matching in Postgres

submited by
Style Pass
2024-04-19 11:00:36

A surprisingly common problem in both application development and analysis is: given an input name, find the database record it most likely refers to. It's common because databases of names and people are common, and it's a problem because names are a very irregular identifying token.

The page "Falsehoods Programmers Believe About Names" covers some of the ways names are hard to deal with in programming. This post will ignore most of those complexities, and deal with the problem of matching up loose user input to a database of names.

The data for this post are 50,000 Westernized names created by the Fake Name Generator, then loaded into PostgreSQL using the COPY command.

Suppose we are working with a form input field, and want to return names that match user keystrokes in real time. If a user types "Back" what query can find all the surnames that start with "Back"?

Now, when we run the "LIKE" query, the result is ... 8 names in about 11ms. Wait, what? The index didn't make the query any faster!

Leave a Comment