CPU performance · CuriousCoding

submited by
Style Pass
2025-01-09 03:30:21

This (planned) series of posts has the aim to write a high performance search algorithm for suffix arrays. We will start with a classic binary search implementation and make incremental improvements to it. But that is planned for Part 3.

Before that, we will review existing methods for binary searching plain integer arrays as a warm up exercise. There is a great paper by Khuong and Morin (2017), “Array Layouts for Comparison-Based Searching”, and an algorithmica.org case study based on it, that will form the basis of that. See Part 2.

First, (in a typical case of nerd-sniping,) we must understand our hardware. In this part, we will look at some simple benchmarks to understand the capabilities and limits of a modern CPU. This part is very much inspired by Chapter 9 of algorithmica.org. In fact, I will probably end up directly replicating some of the experiments from there for educational purposes (at least for myself). Algorithmica also lists this post by Igor Ostrovsky and What Every Programmer Should Know About Memory by Ulrich Drepper as useful resources, but I haven’t read them closely myself.

In this post, Part 1, we will mostly investigate latency and bandwidth of L1/L2/L3 caches and main memory in different workloads.

Leave a Comment