Imagine you were tasked with building a search system for wikipedia.org in 2024. Before ChatGPT changed the industry’s perceptions of what’s possi

How We Built Search Over All Of Wikipedia in 30 minutes with 34% Better Relevance

submited by
Style Pass
2024-07-10 18:00:14

Imagine you were tasked with building a search system for wikipedia.org in 2024. Before ChatGPT changed the industry’s perceptions of what’s possible with AI, you would have likely reached for Elasticsearch first. Lucene-based search engines like Elasticsearch, OpenSearch and Solr have been the primary workhorses of the search industry for a long time. In fact, Wikipedia.org itself uses Elasticsearch to power their own search experience. But in the last few years, the innovation & developer tooling options around semantic search have exploded. There are now lots of options for developers, from embedding models (OpenAI, together.ai, etc.) to vector databases (Pinecone, Qdrant, etc.), and choices at every other layer of the stack. But, it means every developer is now faced with cobbling a solution together after answering a mountain of questions:

Many search problems show characteristics that are similar to Wikipedia search: millions of articles, many queries per second, information-dense, long-form text, etc. If your search problem has any of these characteristics, you probably will have to answer all of the questions above, and more. If doing all this work sounds like fun to you, see you in a few months (and we recommend buying a good pager)! However, if you want to ship something today instead of Q4, keep on reading.

Leave a Comment