A lot of models have been released in the last few weeks: Kimi K2, Qwen3-Coder, GLM-4.5, gpt-oss, Claude Opus 4.1, diffusion models, and there's no end in sight.
In Amp, a new model isn't just another entry in a model-selection dropdown. It's part of a whole in which different models have different jobs to do, and for each job we want to use the best model, regardless of cost or deployment concerns. So when a new model comes along, we ask where it fits.
With this post, we want to show you a week in the life of the Amp team as we evaluate new models. Impressions, ideas, tips — we'll share what we discover.
This morning I took GPT-5 out for a proper spin, not just testing it but actually putting it to use to fix a bug in the Amp CLI. This time I did something I usually don't: I used voice dictation and ended up with a long, rambly prompt full of redundant information.
To my surprise (and that of everyone within hearing distance here in the office), GPT-5 fixed the bug in a single turn. Flawlessly. I committed the code just as GPT-5 wrote it.