Generic GPU Kernels – Mike Innes

Is this better than writing CUDA C? At first, it’s easy to mistake this for simple syntactic convenience, but I’m convinced that it brings something fundamentally new to the table. Julia’s powerful array abstractions turn out to be a great fit for GPU programming, and it should be of interest to GPGPU hackers regardless of whether they use the language already.

For numerics experts, one of Julia’s killer features is its powerful N-dimensional array support. This extends not just to high-level “vectorised” operations like broadcasting arithmetic, but also to the inner loops in the lowest-level kernels. For example, take a CPU kernel that adds two 2D arrays:
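
A minimal sketch of such a kernel (the exact body, loop order, and @inbounds annotation are assumptions; the add! name comes from the discussion below):

    function add!(C, A, B)
        for j in 1:size(C, 2), i in 1:size(C, 1)  # column-major: row index innermost
            @inbounds C[i, j] = A[i, j] + B[i, j]
        end
        return C
    end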

This kernel is fast, but hard to generalise across different numbers of dimensions. The change needed to support 3D arrays, for example, is small and mechanical (add an extra inner loop), but we can’t write it using normal functions.
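
For illustration, a hypothetical 3D variant (add3d! is an invented name) differs only by that extra loop:

    function add3d!(C, A, B)
        for k in 1:size(C, 3), j in 1:size(C, 2), i in 1:size(C, 1)
            @inbounds C[i, j, k] = A[i, j, k] + B[i, j, k]
        end
        return C
    end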

The @generated annotation allows us to hook into Julia's code specialisation: when the function receives matrices as input, our custom code generation creates and runs a twice-nested loop. This behaves the same as the add! function above, but works for arrays of any dimension. If you remove @generated, the function returns the generated expression instead of running it, so you can see the internals.
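
A sketch of how this could be written using Base.Cartesian, whose @nloops and @nref macros emit one loop and one index expression per dimension (the exact body is an assumption; only the add! name and its behaviour come from the text):

    using Base.Cartesian

    @generated function add!(C, A, B)
        N = ndims(C)  # here C is the argument's *type*, so N is known at compile time
        quote
            @nloops $N i C begin  # N nested loops, one per dimension of C
                @inbounds (@nref $N C i) = (@nref $N A i) + (@nref $N B i)
            end
            return C
        end
    end

With matrix arguments this specialises to a twice-nested loop, exactly as described; 3D arrays get a triply nested one, and so on for any dimension.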
