
Billion-scale vector search with Vespa - part two


In the first post in this series, we introduced compact binary-coded vector representations that can reduce the storage and computational complexity of both exact and approximate nearest neighbor search. This second post covers an experiment using a binary-coded representation of 1 billion vectors derived from a dataset used in the big-ann-benchmark challenge. The purpose of this post is to highlight some of the trade-offs involved in approximate nearest neighbor search, focusing especially on serving performance versus accuracy.
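
To make the binary-coded representation concrete, the following is a minimal, illustrative sketch (not taken from the experiment itself) of how float vectors can be thresholded into packed bit codes and compared with hamming distance. The 128-dimensional vectors and the NumPy-based packing are assumptions for illustration only; the experiment's actual coding scheme is described in part one.

```python
# Illustrative sketch: threshold each float dimension at 0, pack the sign bits
# into int8 codes, and compare codes with hamming distance. Dimensions and
# helper names are assumptions, not taken from the post.
import numpy as np

def binarize(vectors: np.ndarray) -> np.ndarray:
    """Pack the sign bit of each dimension into int8 codes (dim/8 bytes per vector)."""
    bits = (vectors > 0).astype(np.uint8)   # 1 where the component is positive
    packed = np.packbits(bits, axis=1)      # 8 dimensions per byte
    return packed.view(np.int8)             # int8 cells, matching a compact tensor layout

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two binary codes."""
    xor = np.bitwise_xor(a.view(np.uint8), b.view(np.uint8))
    return int(np.unpackbits(xor).sum())

# Example: two random 128-dimensional float vectors become 16-byte binary codes.
rng = np.random.default_rng(0)
data = rng.standard_normal((2, 128), dtype=np.float32)
codes = binarize(data)
print(hamming_distance(codes[0], codes[1]))
```

The point of the coding is that distance computations become cheap bitwise operations over a fraction of the original storage, which matters at billion scale.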

Vespa implements a version of the HNSW (Hierarchical Navigable Small World) algorithm for approximate vector search. Before diving into this post, we recommend reading the HNSW in Vespa blog post, which explains why we chose the HNSW algorithm.
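
As an illustration of the serving-time trade-off this post examines, the sketch below shows one way to query Vespa's nearestNeighbor operator over an HNSW-indexed field and toggle between approximate (graph-based) and exact (brute-force) search. The endpoint, document type, field names (binary_code, q) and rank profile are assumptions for illustration, not the configuration used in the experiment.

```python
# Hedged sketch: query a Vespa nearestNeighbor operator over an HNSW-indexed field.
# Endpoint, document type, field names and rank profile are illustrative assumptions.
import requests

def query_neighbors(query_tensor: str, target_hits: int = 10, approximate: bool = True):
    # The 'approximate' annotation selects HNSW graph search (true) or exact search (false).
    yql = (
        "select * from doc where "
        f"{{targetHits: {target_hits}, approximate: {str(approximate).lower()}}}"
        "nearestNeighbor(binary_code, q)"
    )
    body = {
        "yql": yql,
        "hits": target_hits,
        "input.query(q)": query_tensor,   # the binary-coded query vector as a tensor literal
        "ranking.profile": "hamming",     # assumed rank profile using the hamming distance metric
    }
    response = requests.post("http://localhost:8080/search/", json=body)
    response.raise_for_status()
    return response.json()

# approximate=True traverses the HNSW graph for low latency but imperfect recall;
# approximate=False scans all candidates for exact results at higher cost.
```

Comparing the two modes over the same field is one way to measure the accuracy loss introduced by the approximate search against the latency it saves.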

When evaluating approximate nearest neighbor search algorithms, it is important to use realistic vector data. If you don't have the vectors for your problem at hand, use a publicly available dataset.
