Hitting your Claude Code limit mid-sprint feels less like a feature and more like a bug you can't fix. One minute you're deep in the flow, the next you're staring at a lockout that can last for hours or even an entire week, just two prompts short of a breakthrough. This isn't just an inconvenience; it's a major roadblock that kills momentum.
This creates a difficult choice. Pay-per-token API access gives you a limitless runway but threatens a surprise bill. Meanwhile, the Pro and Max plans promise predictability but deliver opaque, shifting limits that create an unpredictable barrier to getting work done. Whether you're a solo dev rationing prompts or a manager trying to forecast a budget, you're flying blind without a way to monitor usage.
There are several ways to monitor Claude Code, but most are either too manual or too heavy. You could write a custom script that monitors local usage, but it's a clunky, manual solution that doesn't scale. You could deploy a full-blown observability stack with Docker, Prometheus, and Grafana, which is powerful but complex. Another common approach is routing traffic through a proxy like LiteLLM, but for just one model, it's like renting a crane to hang a picture.
Here's the better way: Claude Code natively supports OpenTelemetry (OTEL), a vendor-neutral standard for collecting telemetry such as metrics and logs from your applications. Because the support is built in, no extra services, wrappers, or hacks are needed. You just enable it in the config and send the telemetry directly to a collector. In my case, that's a Grafana Cloud endpoint, which is free to start and requires no servers to manage.
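To make "enable it in the config" concrete, here is a minimal sketch of the environment variables involved, as I understand them from the Claude Code and OpenTelemetry documentation. The helper name `build_telemetry_env`, the endpoint URL, and the credentials are placeholders I made up for illustration, and the Basic-auth header format is an assumption about how a Grafana Cloud OTLP endpoint authenticates. Check your own stack's connection details before relying on it.

```python
import base64
import shlex


def build_telemetry_env(endpoint: str, instance_id: str, api_token: str) -> dict[str, str]:
    """Return environment variables that turn on OTLP export for Claude Code.

    Hypothetical helper: the variable names follow the Claude Code and
    OpenTelemetry docs as I understand them; verify against current docs.
    """
    # Assumption: the endpoint expects HTTP Basic auth built from
    # "<instance id>:<API token>", base64-encoded.
    credentials = base64.b64encode(f"{instance_id}:{api_token}".encode()).decode()
    return {
        "CLAUDE_CODE_ENABLE_TELEMETRY": "1",       # master switch for telemetry
        "OTEL_METRICS_EXPORTER": "otlp",           # export metrics over OTLP
        "OTEL_LOGS_EXPORTER": "otlp",              # export logs/events over OTLP
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        # Some OpenTelemetry SDKs want the space written as "%20"; adjust if
        # your collector rejects the header.
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {credentials}",
    }


if __name__ == "__main__":
    # Placeholder values only; substitute the ones from your own account.
    env = build_telemetry_env(
        endpoint="https://otlp.example.invalid/otlp",
        instance_id="123456",
        api_token="YOUR_API_TOKEN",
    )
    # Print shell-ready export lines to paste into your shell profile.
    for name, value in env.items():
        print(f"export {name}={shlex.quote(value)}")
```

Running the script prints one `export` line per variable. Setting those in the shell before launching Claude Code should be enough for it to start sending metrics and logs to the endpoint, with no sidecar or proxy in between.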