LRMs Are Interpretable

Submitted by Style Pass, 2024-11-21 22:00:22

A year ago I wrote a post called LLMs Are Interpretable. The gist is that LLMs were the closest thing to “interpretable machine learning” that we’ve seen from ML so far. Today, I think it’s fair to say that LRMs (Large Reasoning Models) are even more interpretable.

Most people will (should) do a double take, and then give up. It's a nonsense question. Even if you try to estimate the sizes of doghouses and pancakes, there's so much contention about both that the estimates are meaningless. This is a test of a highly ambiguous situation: how does the model handle it?

The transcripts are fascinating; I'll quote some passages here, but really you should go read the full reasoning trace. The final answer isn't terribly interesting; tl;dr, the model figures out that it's a nonsense question.

First, “flying over a desert in a canoe.” Well, canoes are typically used on water, not in the air or over deserts. So that’s already a bit odd. Maybe it’s a metaphor or a riddle that plays on words. Then it says, “your wheels fall off.” Canoes don’t have wheels, so that’s another strange part. Maybe the wheels are part of something else, or maybe it’s just adding to the confusion.
