As part of the Brokk Power Ranking of coding models coming next week, we’re pleased to present the first independent numbers for GPT-OSS performance!
To put it in context, we’ve included the performance of the other recent open model releases, as well as o4-mini and Gemini Flash 2.0 as known-quantity comparisons.
“Roughly as good as Flash 2.0” is a disappointing result, but let’s put it in context: as a 120 billion parameter model quantized to FP4, it’s roughly 1/16 the size of Qwen 3 Coder, DeepSeek-V3, or Kimi K2. Alas, it seems that size still matters: GPT-OSS has a ton of trouble generating valid edit blocks, which makes it hard to get anything across the finish line.
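To make the failure mode concrete, here is a minimal sketch of what validating an edit block can look like. This assumes an aider-style SEARCH/REPLACE format as a hypothetical stand-in (Brokk’s actual format is not shown here); the point is that all the markers must appear exactly once, in order, or the edit can’t be applied.

```python
# Hypothetical validator for aider-style SEARCH/REPLACE edit blocks.
# Weaker models often emit the markers mismatched, truncated, or out of
# order, which makes the proposed edit unusable.
MARKERS = ("<<<<<<< SEARCH", "=======", ">>>>>>> REPLACE")

def is_valid_edit_block(text: str) -> bool:
    """Return True if each marker appears exactly once, in order."""
    positions = []
    for marker in MARKERS:
        idx = text.find(marker)
        if idx == -1 or text.count(marker) != 1:
            return False
        positions.append(idx)
    return positions == sorted(positions)

good = "<<<<<<< SEARCH\nold line\n=======\nnew line\n>>>>>>> REPLACE"
bad = "<<<<<<< SEARCH\nold line\n>>>>>>> REPLACE"  # missing divider

print(is_valid_edit_block(good))  # True
print(is_valid_edit_block(bad))   # False
```

A model that frequently drops the divider or garbles the markers fails this kind of check over and over, so even plausible-looking patches never land.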
Kimi K2 doesn’t have that excuse and it’s still bad. You may remember Kimi showcasing K2 handily beating DeepSeek-V3 at other coding benchmarks. When the discrepancy between performance on an older benchmark and a new one is this large, it’s hard to avoid the conclusion that Kimi trained K2 against the test.
By contrast, Qwen 3 Coder (480B, unquantized) is the real deal and finally dethrones DeepSeek-V3 as the best non-thinking model for coding.