voyage-large-2-instruct: Instruction-tuned and rank 1 on MTEB – Voyage AI

submitted by
Style Pass
2024-05-10 12:00:08

TL;DR – Voyage AI’s latest general-purpose text embedding model, voyage-large-2-instruct, now tops the overall MTEB leaderboard, outperforming OpenAI v3 large and Cohere English v3 on key tasks such as retrieval, classification, clustering, and reranking.

The Massive Text Embedding Benchmark (MTEB), hosted by HuggingFace, is the de facto community benchmark for measuring the quality of text embedding models. As world-class experts and providers of embedding models, we have submitted several models over the past year. Our recently released legal embedding model, voyage-law-2, tops the retrieval leaderboard for law. Our voyage-lite-02-instruct, previously ranked #3 overall, had 6x fewer parameters and 4x smaller embedding dimensions than the other models in the top five.

Now, we are thrilled to announce that our latest general-purpose text embedding model, voyage-large-2-instruct, ranks #1 on the overall MTEB leaderboard. With a 16K context window, voyage-large-2-instruct incorporates instruction tuning and all our learnings from developing our other second-generation models. As shown in the following table, a simplified version of the overall MTEB leaderboard, voyage-large-2-instruct outperforms all other competing commercial models in five of the seven benchmarked tasks (including retrieval, classification, clustering, and reranking).
