Legalizations in LLVM Backend

submitted by
Style Pass
2024-05-16 04:30:02

Ideally, compilers can build a program for a wide variety of hardware without the need to change a single line of its source code. While there are exceptions and corner cases, this holds in the majority of cases. This means that if the input code uses features that are not directly available on the hardware, the compiler has to figure out a way to effectively emulate them.

This might sound a little distant from our typical software development experience, but I’m not even talking about a problem that only happens on some exotic proprietary ML accelerator or whatnot. There is a good example of this in something you use almost every day: boolean variables. I’m pretty sure no modern processor provides 1-bit registers¹ (or 1-bit addressable memory), yet we still use boolean variables extensively. Other common examples include hardware that lacks double-precision floating point operations, or even lacks a floating point unit altogether, as in some embedded devices.
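To make the boolean case concrete, here is a minimal C++ sketch of the idea, not LLVM's actual implementation. It assumes the narrowest register available is 8 bits wide and emulates a 1-bit addition (which wraps modulo 2) by doing the arithmetic in the wider type and then throwing away the extra bits. The function name `add_i1_promoted` is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: emulate a 1-bit add on hardware whose smallest
// register is assumed to be 8 bits wide.
// Inputs are expected to hold a 1-bit value (0 or 1) in the low bit.
std::uint8_t add_i1_promoted(std::uint8_t a, std::uint8_t b) {
    // Do the arithmetic in the wider type the hardware actually supports.
    std::uint8_t wide = static_cast<std::uint8_t>(a + b);
    // Keep only the lowest bit so the result wraps like a real 1-bit add.
    return static_cast<std::uint8_t>(wide & 1u);
}
```

With this sketch, `add_i1_promoted(1, 1)` yields 0, just as a true 1-bit addition would overflow back to zero. The masking step is what keeps the wider register from leaking bits that a 1-bit value could never hold.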

The process that “reshapes” input programs into using what’s available on the target hardware is called legalization in LLVM², and it’s done in LLVM’s code generation (codegen) pipeline, a.k.a. the backend. In this post, I’m going to give an overview of how it works.
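As a rough illustration of what such reshaping can look like, the C++ sketch below splits a 64-bit addition into two 32-bit additions plus a carry, as a compiler might do for a target that is assumed to have only 32-bit registers. This is a hand-written analogy rather than code taken from LLVM; the `U64Parts` struct and the `add_u64_expanded` function are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical representation of a 64-bit value as two 32-bit halves.
struct U64Parts {
    std::uint32_t lo;  // low 32 bits
    std::uint32_t hi;  // high 32 bits
};

// Hypothetical helper: a 64-bit add built only from 32-bit operations.
U64Parts add_u64_expanded(U64Parts a, U64Parts b) {
    U64Parts r;
    // Add the low halves; unsigned arithmetic wraps modulo 2^32.
    r.lo = a.lo + b.lo;
    // If the sum is smaller than an operand, the low add overflowed.
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
    // Add the high halves and propagate the carry from the low half.
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

For example, adding `{0xFFFFFFFF, 0}` and `{1, 0}` gives `{0, 1}`, which matches the native 64-bit result 0x100000000. The source program still "sees" a single 64-bit add; only the operations used to implement it have changed.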
