Powering AI With Vector Databases: A Benchmark - Part I

Submitted by
Style Pass
2022-09-23 22:00:10

The algorithm used to build an index affects the quality of the results: not only data quality (accuracy) but also system performance (memory usage and speed). More information on the different approaches can be found in this Pinecone blog article. An up-to-date ANN benchmark can also be found in the well-known GitHub repository by Erik Bernhardsson, with graphical quality/speed representations for popular public datasets.
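To make the accuracy side of this trade-off concrete, here is a minimal sketch of recall@k, a metric commonly used in ANN benchmarks to compare an approximate index against exact (brute-force) search. The function name `recall_at_k` and the example IDs are illustrative assumptions, not taken from the benchmark itself.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that the approximate index returned.

    approx_ids: result IDs from the ANN index, best match first (hypothetical input).
    exact_ids:  result IDs from an exact brute-force search, best match first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one ID")
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)


# Illustrative IDs only: the index found 3 of the 4 true nearest neighbours.
score = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4)
```

A recall close to 1.0 means the index rarely misses a true neighbour; faster or more memory-efficient index settings typically lower this value.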

Qdrant and Weaviate natively implement only HNSW. Thus, the experiment uses HNSW exclusively as the de facto indexing algorithm [2]. The HNSW configuration parameters have also been fixed for all engines:

Following the principles of statistical analysis, we want every scenario to execute at least 30 times. When querying the index, it is vital to vary the queries so that the engines cannot rely on implicit result caches, which would artificially improve query speed. Thus, we feed the following queries sequentially:
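The repetition scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `search` is a hypothetical stand-in for an engine's client call, the query vectors are made up, and cycling through the query list is our reading of "sequentially", not the benchmark's actual harness.

```python
import itertools
import statistics
import time

MIN_RUNS = 30  # minimum repetitions per scenario, as stated in the text


def benchmark(search, queries, runs=MIN_RUNS):
    """Time `search` over `runs` executions, rotating through `queries`.

    Consecutive runs use different queries (when more than one is given),
    so an engine cannot answer from a cache of the previous result.
    """
    if runs < MIN_RUNS:
        raise ValueError(f"need at least {MIN_RUNS} runs per scenario")
    if not queries:
        raise ValueError("need at least one query")
    latencies = []
    # cycle() restarts the query list once it is exhausted
    for query in itertools.islice(itertools.cycle(queries), runs):
        start = time.perf_counter()
        search(query)
        latencies.append(time.perf_counter() - start)
    return {
        "runs": len(latencies),
        "mean_s": statistics.mean(latencies),
        "stdev_s": statistics.stdev(latencies),
    }


# Hypothetical stand-in for a real engine call: just does some arithmetic.
def fake_search(vector):
    return sum(x * x for x in vector)


report = benchmark(fake_search, [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
```

Reporting the standard deviation alongside the mean makes it easier to spot runs distorted by warm-up or caching effects.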

Milvus is an open-source vector database built to manage vector data and power embedding search. It originated in October 2019 and is a graduate project of the LF AI & Data Foundation. The latest version at the time of writing is Milvus 2.0.0, which is in steady development; its eighth release candidate was released only recently, on 5-11-21. However, while trying to set up Milvus, the team encountered multiple challenges:
