submited by

Style Pass

In the past, hybrid search combined two search methods: traditional keyword-based search and vector-based similarity search. However, sparse vector can act as a substitute for keyword search, unlocking the full potential of the data pipeline with pure vector search.

This post explores how to use how BGE-M3 and pgvecto.rs to exploit the potential of hybrid search using both sparse and dense vectors.

Sparse vectors are high-dimensional vectors that contain few non-zero values. They are suitable for traditional information retrieval use cases. For example, a vector with 32,000 dimensions but only 2 non-zero elements is a typical sparse vector.

$$S[1\times 32000]=\left[ \begin{matrix} 0 & 0 & 0.015 & 0 & 0 & 0 & 0 & \cdots & 0.543 & 0 \end{matrix} \right]$$

Dense vectors are embeddings from neural networks. They are generated by text embedding models and have most or all elements non-zero. They have fewer dimensions, such as 256 or 1536, much less than sparse vectors.

Read more blog.pgvecto...