Meilisearch Indexes Embeddings 7x Faster with Binary Quantization

Submitted by
Style Pass
2024-11-29 12:00:04

For the past year at Meilisearch, we’ve been working hard on hybrid search, which mixes keyword search with semantic search using arroy, our vector store based on a Spotify technology. A vector store is a data structure that efficiently stores embeddings (vectors) so that the most relevant ones can be retrieved quickly, a technique called Approximate Nearest Neighbors (ANN) search.

When our customers started to use arroy heavily, some of them began hitting the limits of their machines. For example, for a customer using a 768-dimension model, we found that we couldn’t index more than 15M embeddings on a machine with 64GiB of RAM.

The concept behind the elaborate term "binary quantization" is that we are going to quantize the embeddings. Quantizing a number means mapping it onto a fixed, limited set of values; binary quantization uses only two values, which can be represented on a single bit, e.g., 0.2561 becomes 1, and -0.568 becomes -1.
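To make the idea concrete, here is a minimal Python sketch of sign-based binary quantization followed by bit packing. It is an illustration only, not arroy's actual implementation: the function names `binary_quantize` and `pack_bits` are hypothetical, and treating exactly 0.0 as positive is an assumption made for this example.

```python
def binary_quantize(embedding):
    """Map each float to +1 or -1 based on its sign.

    Assumption for this sketch: 0.0 is treated as positive.
    """
    return [1 if x >= 0.0 else -1 for x in embedding]


def pack_bits(quantized):
    """Pack +1/-1 values into bytes, one bit per dimension.

    +1 sets the bit, -1 leaves it cleared. Bit i of the vector lands in
    byte i // 8 at position i % 8 (least significant bit first).
    """
    packed = bytearray((len(quantized) + 7) // 8)
    for i, q in enumerate(quantized):
        if q == 1:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


embedding = [0.2561, -0.568, 0.0031, -1.2]
quantized = binary_quantize(embedding)  # one sign value per dimension
packed = pack_bits(quantized)           # 4 dimensions fit in a single byte
```

With this layout, a 768-dimension embedding packs into 96 bytes instead of the 3,072 bytes needed for 32-bit floats.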

This implies that we could convert each 32-bit float into a 1-bit number, which would divide the disk and RAM usage of the embeddings by 32. In theory, that would let us index up to 480M embeddings (15M × 32) with 64GiB of RAM, instead of being limited to 15M.
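The arithmetic behind these figures can be checked with a short Python sketch. It counts only the raw vector payload; the rest of the index and any per-item overhead are deliberately ignored, which is an assumption of this example and one reason the 15M limit observed in practice does not simply equal 64GiB divided by the vector size. The helper names are hypothetical.

```python
DIMENSIONS = 768      # model size from the example above
BYTES_PER_F32 = 4     # a 32-bit float takes 4 bytes
BITS_PER_BYTE = 8


def f32_bytes(count):
    """Raw payload of `count` embeddings stored as 32-bit floats."""
    return count * DIMENSIONS * BYTES_PER_F32


def binary_bytes(count):
    """Raw payload of `count` embeddings stored at 1 bit per dimension."""
    return count * DIMENSIONS // BITS_PER_BYTE


full_precision = f32_bytes(15_000_000)    # 46,080,000,000 bytes, about 42.9 GiB
quantized = binary_bytes(480_000_000)     # the same number of bytes
ratio = f32_bytes(1) // binary_bytes(1)   # 3,072 bytes vs 96 bytes per vector
```

In other words, 480M binary-quantized embeddings occupy the same raw space as 15M full-precision ones, which is where the 32x figure comes from.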
