Beating Proprietary Models with a Quick Fine-Tune | Modal Blog

2024-04-29 17:30:05

With just a handful of examples, a fine-tuned open source embedding model can provide greater accuracy at a lower price than proprietary models like OpenAI’s text-embedding-3-small. In this article, we’ll explain how to create one with Modal. First, we’ll cover the basics of fine-tuning. Then, we’ll talk about an experiment we ran to determine how much fine-tuning data we needed for a simple question-answering use case.
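Fine-tuning an embedding model for question answering usually means contrastive training on (question, answer) pairs: matched pairs are pulled together in embedding space while mismatched pairs in the same batch are pushed apart. Here is a minimal, self-contained sketch of that idea in PyTorch. The linear adapter, synthetic embeddings, and hyperparameters are illustrative assumptions for the sketch, not the setup used in this article.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for frozen base-model embeddings of (question, answer) pairs.
# In practice these would come from a pretrained open source encoder.
dim = 32
n_pairs = 64
q_base = torch.randn(n_pairs, dim)
a_base = q_base + 0.1 * torch.randn(n_pairs, dim)  # answers correlate with questions

# Fine-tune a small linear adapter on top of the frozen embeddings.
adapter = torch.nn.Linear(dim, dim)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-2)

def info_nce(q, a, temperature=0.05):
    # In-batch negatives: each question's positive is its own answer;
    # every other answer in the batch serves as a negative.
    q = F.normalize(adapter(q), dim=-1)
    a = F.normalize(adapter(a), dim=-1)
    logits = q @ a.T / temperature
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

losses = []
for step in range(50):
    opt.zero_grad()
    loss = info_nce(q_base, a_base)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

With only a handful of real pairs, the same loop applies: the tiny dataset updates the adapter (or the full encoder) so that retrieval for your domain improves over the generic base model.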

Custom models matter. That’s how Netflix keeps suggesting better movies and how Spotify manages to find a new anthem for your daylist. By tracking whether you finish the movie you chose or skip a song, these companies accumulate data. They use that data to improve their internal embedding models and recommendation systems, which leads to better suggestions and a better experience for you. Better experiences drive more engagement from more users, which generates more data, which produces better models, in a virtuous cycle known as the data flywheel.

Large, ML-forward organizations like Netflix and Spotify have used data flywheels to build their own models from scratch, and they now sit on enormous amounts of data. But when you’re just starting a new company or project, you don’t yet have the data you need. In the 2010s, bootstrapping a data flywheel required substantial creativity or resource investment.
