Semantic search as an alternative to keyword search

submitted by
Style Pass
2021-06-16 00:30:10

When most computer scientists think of information retrieval, statistical keyword algorithms like TF-IDF and BM25 probably come to mind. These are deployed in open-source systems such as Apache Lucene and Apache Solr. Hosted versions of keyword search are available from companies like Elastic (the maker of Elasticsearch) and Algolia.
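To make the keyword approach concrete, here is a minimal sketch of BM25 scoring in plain Python. It is an illustration under stated assumptions, not the implementation used by Lucene or any other system: the whitespace tokenizer, the parameter defaults k1=1.5 and b=0.75, the function names, and the three toy documents are all chosen for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch;
    # real systems also apply stemming, stop-word removal, and so on).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a common BM25 variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF (the "+ 1" inside the log keeps it positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical): the second document is about the same topic
# as the query but shares no words with it.
docs = [
    "how to repair a flat bicycle tire",
    "fixing a punctured bike wheel at home",
    "best pasta recipes for dinner",
]
scores = bm25_scores("repair bicycle tire", docs)
```

In this toy corpus only the first document receives a positive score. The second one describes the same task in different words, shares no term with the query, and so scores zero, exactly like the unrelated pasta document. That vocabulary mismatch is the limitation of purely keyword-based ranking.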

What you may not know is that advancements in natural language processing (NLP), particularly the introduction of transformers in 2017, have ushered in a new flavor of information retrieval known variously as neural information retrieval (neural IR for short) or semantic search. The defining characteristic of these systems is that they apply neural networks to understand language at a deeper level than keyword search, matching queries to content by meaning rather than by exact term overlap. This enables them to surface a broader variety of relevant content while showing results with greater precision.

Neural IR systems are still in their infancy. If you’re interested in gaining a deeper appreciation of the field, I would recommend An Introduction to Neural Information Retrieval by Mitra and Craswell. Amazon Kendra, released early in 2020, is the first commercial example of such a system, while Microsoft Semantic Search, released in April 2021, and ZIR Semantic Search are more recent.
