
Static Embeddings: should you pay attention?


In the world of resource-constrained computing, a quiet revolution is taking place. While transformers dominate leaderboards with their impressive capabilities, static embeddings are making an unexpected comeback, offering remarkable speed improvements with surprisingly small quality trade-offs. We evaluated how Qdrant users can benefit from this renaissance, and the results are promising.

Transformers are often seen as the only way to go when it comes to embeddings. The attention mechanism captures relationships between input tokens, so each token gets a vector representation that is context-aware, defined not only by the token itself but also by the tokens surrounding it. Transformer-based models easily beat the quality of older methods such as word2vec or GloVe, which could only produce a single vector embedding for each word. As a result, the word “bank” would have an identical representation in the context of “river bank” and “financial institution”.
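To make the contrast concrete, here is a minimal sketch (not from the original evaluation) that extracts the contextual vector of the token “bank” from two different sentences with a transformer encoder. The two vectors differ, whereas a static model would map “bank” to the same vector in both cases. The choice of `bert-base-uncased` is an illustrative assumption.

```python
# Sketch: the token "bank" gets a different vector in each sentence
# when a transformer computes its contextual embedding.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumed model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = [
    "She sat on the river bank and watched the water.",
    "He deposited the check at the bank downtown.",
]

bank_vectors = []
for text in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Locate the "bank" token and take its contextual vector
    token_ids = inputs["input_ids"][0].tolist()
    bank_pos = token_ids.index(tokenizer.convert_tokens_to_ids("bank"))
    bank_vectors.append(outputs.last_hidden_state[0, bank_pos])

# The two vectors differ; a static model would return the same vector twice
similarity = torch.cosine_similarity(bank_vectors[0], bank_vectors[1], dim=0)
print(f"cosine similarity between the two 'bank' vectors: {similarity.item():.3f}")
```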

Transformer-based models, by contrast, represent the word “bank” differently in each of these contexts. However, transformers come at a cost. They are computationally expensive and usually require a lot of memory, and although embedding models typically have fewer parameters than Large Language Models, GPUs are still preferred, even for inference.
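A static embedding model, on the other hand, needs little more than an embedding-table lookup and pooling, so it runs comfortably on CPU. The following hedged sketch loads a static model through sentence-transformers and indexes a couple of documents in Qdrant; the model name, collection name, and in-memory client are illustrative assumptions, not necessarily the setup used in the evaluation described here.

```python
# Sketch: CPU-only static embeddings feeding a Qdrant collection.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

# Assumed static embedding model; runs on CPU, no GPU required
model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1", device="cpu"
)

documents = [
    "Static embeddings trade a bit of quality for a large speed-up.",
    "Transformer encoders produce context-aware token representations.",
]
vectors = model.encode(documents)

client = QdrantClient(":memory:")  # in-memory instance, just for the sketch
client.create_collection(
    collection_name="static_demo",
    vectors_config=VectorParams(size=vectors.shape[1], distance=Distance.COSINE),
)
client.upsert(
    collection_name="static_demo",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": doc})
        for i, (vec, doc) in enumerate(zip(vectors, documents))
    ],
)

# Query with a vector produced by the same static model
hits = client.query_points(
    collection_name="static_demo",
    query=model.encode("fast embeddings on CPU").tolist(),
    limit=1,
)
print(hits)
```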
