Part 1: What is SemScore and why would I care?
Embedding conversational data
Embedding LLM answers to recreate the arena leaderboard
Part 2: Implementation - Bringing SemScore to Life
Prerequisites
Hello world example
Evaluate a finetuned model on any dataset
Evaluate while training
Summary

Accurately assessing the performance of Large Language Models (LLMs) is crucial but hard. Current evaluation methods come with significant limitations:
Embeddings are numerical representations of text which carry semantic meaning. The transformation from text to embedding vectors is done using embedding models.
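As a rough sketch of that text-to-vector step, the snippet below assumes the `sentence-transformers` Python package is installed and uses the model named in the SemScore paper. The helper name `embed_texts` and its optional `model` parameter are made up for illustration and are not part of any library.

```python
# Model used in the SemScore paper; it is downloaded on first use.
MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"


def embed_texts(texts, model=None):
    """Return one embedding vector per input text.

    `embed_texts` is a hypothetical helper, not a library function.
    `model` may be any object with an `encode(list_of_str)` method; if it
    is omitted, the SentenceTransformer model named above is loaded.
    """
    if model is None:
        # Imported lazily so the heavy dependency is only needed for real use.
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer(MODEL_NAME)
    # For a SentenceTransformer, encode() returns one row per input text.
    return model.encode(texts)
```

Calling `embed_texts(["orange", "lemon", "car", "money"])` should then return four vectors, one per word, each with 768 entries for this model.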
To illustrate, consider the words orange, lemon, car, and money. Embedding the word orange with sentence-transformers/all-mpnet-base-v2 (the model used in the SemScore paper) yields a 768-dimensional vector: