Large projects written in C, especially those written close to the hardware layer like Linux, have long treated the language as a high-level assembler. Using C allowed for abstracting away much of the difficulty of writing directly in machine code while still providing easy low-level access to memory, registers, and CPU features. However, C has matured over the last half century, and many language features that improve robustness go unused in older codebases. This is especially true for arrays, where the historical lack of bounds checking has been a consistent source of security flaws.
Converting such codebases to use “modern” language features, like those in C99 (still from the prior millennium), can be a major challenge, but it is an entirely tractable problem. This post is a deep dive into an effort underway in the Linux kernel to make array index overflows (and more generally, buffer overflows) a thing of the past, where they belong. Our success hinges on replacing anachronistic array definitions with well-defined C99 flexible arrays. This approach can be used by developers to refactor C code, making it possible to leverage 21st century mitigations (like -fsanitize=bounds and FORTIFY_SOURCE), since such things can finally be cleanly applied to the modernized codebase.
For the compiler to successfully apply array index bounds checking, array sizes must be defined unambiguously, which is not always easy in C. Depending on the array definition, bounds checking falls roughly into three categories: fixed-sized arrays, dynamically-sized arrays, and pointer offsets. Each category of array definitions must be made unambiguous before the next, as they mostly build on top of each other. For example, if the compiler cannot protect a fixed-sized array, it certainly cannot protect a dynamically-sized array, and array indexing is just a specialized case of calculating a memory pointer offset.