Request for Product: Pipeline Replay - by Kasra Kyanzadeh

2023-03-18 23:00:09

I’ve been working on extractGPT, a tool powered by large language models (LLMs) that extracts structured data from web pages.

I recently wanted to switch the underlying model from OpenAI’s old GPT-3 model to the more affordable ChatGPT model. But first, I needed to make sure that the new model would perform just as well for my use case.

There are dozens[1] of startups building tools for instrumenting API calls to large language models. They let you do useful things: see cost and latency, collect user feedback, evaluate different prompts, and gather examples for fine-tuning. But because they only instrument the API call itself, they can't answer the question I ultimately care about: if I change part of the pipeline, will users get better or worse results?
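The "pipeline replay" idea in the title can be sketched in a few lines: record each stage's input and output during a live run, then re-run the pipeline from a changed stage onward using the recorded inputs. This is a minimal sketch of the concept, not any existing tool's API; all names and signatures here are hypothetical:

```python
def run(stages, inp):
    """Run the pipeline, recording (name, input, output) for each stage."""
    trace = []
    x = inp
    for name, fn in stages:
        y = fn(x)
        trace.append((name, x, y))
        x = y
    return x, trace

def replay(stages, trace, changed):
    """Re-run from the first changed stage, feeding it the input that
    was recorded during the original run."""
    idx = next(i for i, (name, _) in enumerate(stages) if name == changed)
    x = trace[idx][1]  # recorded input to the changed stage
    for name, fn in stages[idx:]:
        x = fn(x)
    return x
```

With real traces from production, you could swap out one stage (say, the model call) and compare the replayed outputs against the originals, which is exactly the before/after comparison that API-level instrumentation can't give you.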

LLMs are great, but to build a useful production app, you often need to do a bunch of pre-processing before you call them, and post-processing on the results to get an acceptable output.
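As a concrete illustration of that pre- and post-processing, here is a toy extraction pipeline. This is my own hypothetical sketch, not extractGPT's actual code: `call_llm` is a stub standing in for a real model call, and the field names are made up.

```python
import json
import re

def preprocess(html: str) -> str:
    """Strip tags and collapse whitespace so the prompt stays small."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (e.g. an OpenAI API request).
    # Returns a canned response so the sketch runs offline.
    return '{"title": "Acme Widget", "price": "$9.99"}'

def postprocess(raw: str) -> dict:
    """Parse and validate the model's output before showing it to users."""
    data = json.loads(raw)
    if "title" not in data:
        raise ValueError("model output missing required field: title")
    return data

def extract(html: str) -> dict:
    prompt = f"Extract title and price as JSON from: {preprocess(html)}"
    return postprocess(call_llm(prompt))
```

The quality a user sees depends on all three stages together, so evaluating only the model call in the middle tells you little about the end-to-end result.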
