A few days ago I posted a screenshot of the long shader ISA code produced by the RGA compiler for a single atan2() instruction. The post got quite a lot of engagement, and it felt like many people were surprised by this, so I decided to write a post to discuss the “hidden” cost of shader instructions a bit more.
In what follows I am referring to the GCN/RDNA architectures, and most of the ISA was produced using https://godbolt.org/. To aid the discussion I have, quite unscientifically, grouped the causes of the “hidden” cost of shader instructions into 3 broad categories:
Let’s start with the first category: an instruction doesn’t have a hardware (native) implementation and needs to be implemented using a, sometimes large, number of native instructions. This is a very common cause of “hidden” cost and can take people by surprise. Inverse trigonometric functions (acos, asin, atan, atan2) don’t have a native implementation; this is, for example, the RDNA ISA code produced for a single atan2:
Admittedly this is one of the most extreme examples; not all inverse trigonometric functions expand to so many instructions. It is not only inverse trigonometric instructions that are expanded into many native ones: tan() has no native implementation either. It is calculated using the cos and sin instructions, which do have one:
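Conceptually, that lowering looks something like the C++ sketch below. It is only an illustration of the idea, not the compiler's actual output. The name `tan_expanded` is made up for this example, `std::sin` and `std::cos` stand in for the native sin and cos instructions, and I am assuming the division ends up as a reciprocal followed by a multiply.

```cpp
#include <cmath>

// Hypothetical helper (not a real intrinsic): mimics how tan(x) could be
// expanded when only sin, cos and reciprocal exist as native operations.
float tan_expanded(float x) {
    float s  = std::sin(x);  // stands in for the native sin instruction
    float c  = std::cos(x);  // stands in for the native cos instruction
    float rc = 1.0f / c;     // assumed to map to a reciprocal instruction
    return s * rc;           // final multiply gives sin(x) / cos(x)
}
```

So a single tan() in the shader source is already at least four operations here, before counting any extra work the hardware needs to prepare the inputs to sin and cos.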