
How We Made PostgreSQL as Fast as Pinecone for Vector Data

Submitted by
Style Pass
2024-06-11 19:00:09

We recently announced the open-sourcing of pgvectorscale, a new PostgreSQL extension that provides advanced indexing techniques for vector data. Pgvectorscale adds a new index method for pgvector data that significantly improves the search performance of approximate nearest neighbor (ANN) queries. These queries are key to using modern vector embedding techniques for semantic search, which finds items whose meaning is similar to that of a query. That, in turn, enables applications like retrieval-augmented generation (RAG), summarization, clustering, or general search.

In our announcement post, we described how our new StreamingDiskANN vector index allows us to perform vector search faster than purpose-built vector databases like Pinecone. We also observed that if such databases aren’t faster, then there is no reason to use them, because they can’t compete with the rich feature set and ecosystem of general-purpose databases like PostgreSQL.

In this article, we’ll go into the technical contributions that allowed us to “break the speed barrier” and create a fast vector index in PostgreSQL. We’ll cover three technical improvements we made:
