I wrote another post about Understanding the BM25 full text search algorithm and had initially included comparisons with two other algorithms. However

Comparing full text search algorithms: BM25, TF-IDF, and Postgres | Evan Schwartz

submited by
Style Pass
2024-11-19 17:00:14

I wrote another post about Understanding the BM25 full text search algorithm and had initially included comparisons with two other algorithms. However, that post was already quite long so here are the brief comparisons between BM25, TF-IDF, and PostgreSQL's full text search.

TF-IDF was the main model that was used prior to the development of BM25 (it was used in Lucene and Elasticsearch until 2016).

While TF-IDF also tries to use Term Frequency (TF) and the Inverse Document Frequency (IDF) to estimate the relevance of documents, BM25 introduced some key improvements:

As far as I understand, Postgres' built-in search results may be somewhat worse than a system using BM25, though that may not be an issue for systems already using Postgres where simplicity is more important than search quality.

It is also worth mentioning that ParadeDB has a Postgres extension called pg_bm25 (formerly called pg_search), which adds support for BM25-based ranking.

Leave a Comment