n times faster than C, Arm edition

submitted by
Style Pass
2024-11-18 01:30:05

The other day I read a two-parter blog post by Owen Shepherd, {n} times faster than C. In it, he takes a simple algorithm and optimizes it as far as he can, dropping down to raw assembly along the way.

I love this sort of thing, though I’m not very good at it. I decided I’d try my own hand at it anyway, but for the A64 instruction set. I’ve tried to make this post understandable even if you haven’t read Owen’s, though I highly recommend you do, if only for how much fun it is to read.

All code (plus a nifty benchmarking harness!) is available in this GitHub repo. I’ve matched the inputs generated by Owen’s original benchmarking setup as closely as possible.

Problem statement: Initialize a counter variable to zero. Iterate over a null-terminated array of bytes; whenever you see an 's', increment the counter; whenever you see a 'p', decrement the counter.

Let’s begin with the most trivial solution to the problem, written in C, as given by Owen’s blog post (plus some minor stylistic changes):
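A minimal sketch of such a baseline, written directly from the problem statement above. The function name `run_switches` and its exact signature are assumptions for illustration, and the listing is not guaranteed to match the original one character for character.

```c
#include <stdbool.h>

/* Baseline sketch: walk the null-terminated byte string once,
 * adding 1 for every 's', subtracting 1 for every 'p',
 * and ignoring every other byte.
 * The name run_switches is an assumption for illustration. */
int run_switches(const char *input) {
    int res = 0;
    while (true) {
        char c = *input++;
        switch (c) {
        case '\0':
            /* End of input: return the running count. */
            return res;
        case 's':
            res += 1;
            break;
        case 'p':
            res -= 1;
            break;
        default:
            break;
        }
    }
}
```

For example, `run_switches("ssp")` counts two increments and one decrement, so it returns 1.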
