AlignEval: Building an App to Make Evals Easy, Fun, and Automated

submitted by

Style Pass

2024-11-02 19:00:04

[ llm eval learning 🛠 🩷 ] · 13 min read

Every AI-powered product needs evals. But let’s face it—they’re a pain to build, hard to scale, and most teams get them wrong. As a result, many AI-powered experiences are bottlenecked on evals, sometimes delaying launches by weeks or even months.

I’ve spent the past year or so wrestling with product/task-specific evals: testing different ways to detect hallucinations, finetune evaluators, and evaluate LLM-based evaluators. There were dead ends. There were rabbit holes. But I learned what works and what doesn’t.

And that’s why I’m excited to introduce AlignEval, an app that makes evals easy and… fun? (Okay, I’ll settle for less painful.) It also tries to automate part of the process. AlignEval makes building LLM-evaluators as straightforward as four simple steps: