The 150x pgvector speedup: a year-in-review

submited by
Style Pass
2024-05-01 01:30:04

I wanted to write a “year-in-review” covering all the performance pgvector has made (with significant credit to Andrew Kane), highlighting specific areas where pgvector has improved (including one 150x improvement!) and areas where we can continue to do better.

A few weeks ago, I started outlining this post and began my data collection. While I was working on this over a two week period, no fewer than three competitive benchmarks against pgvector published. To me, this is a testament both how well pgvector is at handling vector workloads (and by extension, PostgreSQL too) that people are using it as the baseline to compare it to their vector search systems.

Some of these benchmarks did contain info that identified areas we can continue to improve both PostgreSQL and pgvector, but I was generally disappointed in the methodology used to make these comparisons. Of course I’d like to see pgvector perform well in benchmarks, but it’s important to position technologies fairly and be vocally self-critical on where your system can improve to build trust in what you’re building.

I have a separate blog post planned for how to best present benchmark studies between different systems for vector similarity search (it’s a topic I’m interested in). Today though, I want to compare pgvector against itself, and highlight areas it’s improved over the past year, and where the project can continue to go and grow.

Leave a Comment