TL;DR >> Built AgentProbe to test how AI agents interact with CLI tools. Even simple commands like 'vercel deploy' show massive variance: 16-33 turns across runs, 40% success rate. The tool reveals specific friction points and grades CLI 'agent-friendliness' from A to F. Now available for Claude Code MAX subscribers. <<
This wasn’t a complex multi-step deployment. This was the simplest possible case. And it revealed something broken about how we’re building for the AI-native era.
I’ve shipped 50+ projects with AI agents in recent months. The pattern became undeniable: agents don’t fail because they’re dumb. They fail because our tools are hostile.
Watch Claude spiral for hours clicking an unclickable interface. Watch it misinterpret error messages written for humans who can read between the lines. Watch it retry the same failing command because the output gives zero actionable feedback.
Each scenario gets an AX Score (Agent Experience Score), graded A through F. Just like school, but for how well your CLI plays with artificial intelligence.
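To make the grading concrete, here's a minimal sketch of how a composite AX Score could map run metrics to a letter grade. The `ScenarioRuns` shape, the 70/30 weighting, and the cutoffs are all illustrative assumptions, not AgentProbe's actual formula:

```python
from dataclasses import dataclass


@dataclass
class ScenarioRuns:
    # Hypothetical run metrics for one scenario (not AgentProbe's real schema).
    successes: int
    total: int
    turns: list[int]  # turns taken per run, e.g. [16, 21, 33]


def ax_score(runs: ScenarioRuns, max_reasonable_turns: int = 10) -> str:
    """Map success rate and turn efficiency to a letter grade (illustrative)."""
    success_rate = runs.successes / runs.total
    avg_turns = sum(runs.turns) / len(runs.turns)
    # Penalize scenarios where the agent burns far more turns than a
    # scripted invocation would need: efficiency is clamped to [0, 1].
    efficiency = min(1.0, max_reasonable_turns / avg_turns)
    # Assumed weighting: success matters more than efficiency.
    composite = 0.7 * success_rate + 0.3 * efficiency
    for grade, cutoff in [("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6)]:
        if composite >= cutoff:
            return grade
    return "F"


# The 'vercel deploy' numbers from the TL;DR (40% success, 16-33 turns)
# land squarely in failing territory under this scheme.
print(ax_score(ScenarioRuns(successes=2, total=5, turns=[16, 21, 25, 30, 33])))  # -> "F"
```

The point of a composite like this is that a CLI can't grade well by accident: it has to both succeed reliably and do so without forcing the agent into long trial-and-error loops.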