Humanloop is moving to General Availability

2024-11-21 19:00:05

When we first launched early access, the ecosystem was incredibly young – ChatGPT hadn't been released, and the most advanced model available was GPT-3. We had the privilege of working closely with pioneering teams building the first generation of LLM products at companies like Duolingo, Gusto, and Vanta.

We used the experience to deeply understand the challenges of developing with this new technology and design a new way of building software.

Humanloop has evolved from a simple system for A/B testing prompts into a fully fledged evals platform. At its core is a new workflow based on systematic evaluation, coupled with a collaborative development environment for AI engineering and tools for observability in production.

I'd like to walk you through what we've learned and how we can help you build production-ready AI products. For more detail on our engineering and product velocity over the year, read the accompanying post from Peter, our CTO.

Large Language Models (LLMs) make writing tests extremely challenging. They're inherently stochastic – the same input can produce different outputs. Without good tests, it’s easy to accidentally introduce regressions or to endlessly tweak prompts without measurable improvement.
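To make that concrete, here is a minimal sketch of one way to test a stochastic system: sample the same prompt many times and compare pass rates rather than asserting on a single output. This is an illustration only, not Humanloop's API. The names `pass_rate`, `make_fake_model`, and `mentions_paris` are hypothetical, and the "model" is a seeded random stand-in for a real LLM call.

```python
import random
from typing import Callable


def pass_rate(
    generate: Callable[[str], str],
    check: Callable[[str], bool],
    prompt: str,
    n_samples: int = 20,
) -> float:
    """Call a (possibly stochastic) generator n_samples times and
    return the fraction of outputs that satisfy `check`."""
    passed = sum(check(generate(prompt)) for _ in range(n_samples))
    return passed / n_samples


def make_fake_model(seed: int, failure_rate: float) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: returns an unhelpful answer
    with probability `failure_rate`, and a correct one otherwise."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        if rng.random() < failure_rate:
            return "I'm not sure."
        return "The capital of France is Paris."

    return generate


def mentions_paris(output: str) -> bool:
    """A simple code-based evaluator: does the answer mention Paris?"""
    return "paris" in output.lower()


PROMPT = "What is the capital of France?"

# Two prompt versions, simulated here as models with different failure rates.
baseline = make_fake_model(seed=0, failure_rate=0.05)
candidate = make_fake_model(seed=1, failure_rate=0.5)

baseline_rate = pass_rate(baseline, mentions_paris, PROMPT, n_samples=200)
candidate_rate = pass_rate(candidate, mentions_paris, PROMPT, n_samples=200)

# Flag a regression only if the drop exceeds a tolerance, so that
# ordinary sampling noise does not fail the check.
TOLERANCE = 0.1
regressed = candidate_rate < baseline_rate - TOLERANCE
print(f"baseline={baseline_rate:.2f} candidate={candidate_rate:.2f} regressed={regressed}")
```

In a real setup, the sample count and tolerance would need tuning to the noise in your own model and evaluator. The point is only that a change is judged against a measured baseline rather than a single run.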