Deriving RoPE the proper way


RoPE has become the de facto positional embedding for transformer models. Its popularity mainly stems from its performance, but the “derivation” in the paper is also quite elegant, if flawed.

Implementing high-dimensional RoPE also pushes us to think about generalizing the underlying ideas as far as possible (alongside using signal-processing intuition) - there’s code at the end of the post that implements things based on the ideas we develop here.

Upon a closer look, the original derivation is unfortunately not rigorous - the paper solves the problem for 2 head dimensions (not position dimensions), and then generalizes it to any higher (even) number of dimensions with a similar-looking form. That does provide a solution, but it leaves open whether there are other solutions. There is another attempt at a proof here that addresses completeness, but it makes some assumptions that, while accidentally benign, leave the proofs incomplete.
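To make the construction being generalized concrete, here is a minimal sketch (in Python/NumPy, with hypothetical function names) of the standard block-diagonal form from the paper: consecutive pairs of head dimensions are treated as 2D planes, and each pair is rotated by an angle proportional to the token position, with one frequency per pair. Note that the pairing convention (interleaved vs. split-in-half) varies between implementations.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply the standard block-diagonal RoPE rotation to a vector `x`
    of even head dimension d at integer position `pos`.
    Each pair (x[2i], x[2i+1]) is rotated by pos * theta_i,
    with theta_i = base**(-2i / d) as in the original paper."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE requires an even head dimension"
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per 2D block
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin  # 2D rotation in each plane
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

The property this is built to satisfy is that the dot product between a rotated query at position m and a rotated key at position n depends on the positions only through m - n; the question the rest of the post tackles is whether this block-of-2D-rotations form is the only (or the most expressive) way to get that property.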

So I decided to settle this question and show that RoPE is actually optimally expressive under these conditions. Well, not quite - a couple of small things make it slightly suboptimal, but increasing the head dimension just works.
