Performance tuning at the edge using Cache Allocation Technology

submitted by

Style Pass

2024-10-21 05:30:04

Industrial environments often need deterministic instruction execution, which poses a challenge for general-purpose compute systems. Although instructions appear to execute instantly, a user process receives only a commitment that it will run as soon as a CPU becomes available. In a general-purpose system, that moment depends on the other processes competing for the same resources (CPU, memory, and others).

Although this is usually acceptable, highly critical operations, such as stopping an industrial process or relocating a robotic arm, require a guarantee of when execution will occur. This problem is usually solved with special-purpose compute systems, which might have purpose-built hardware and a tightly controlled software environment.

In electronics, jitter is the deviation of a periodic signal from true periodicity. For CPUs, the analogous measure is the variation in execution time of a test workload. Kernel interrupts affect the performance of the workload, so minimizing them also minimizes the jitter that applications could experience. Intel’s caterpillar benchmark measures the execution time variation of a memory-intensive workload. Cache Allocation Technology (CAT) can improve application performance by associating CPU cores with specific cache ways, which can then be dedicated to real-time applications.
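To make the idea of dedicating cache ways more concrete, here is a minimal sketch of how the capacity bitmasks could be computed and applied. It assumes the Linux kernel's resctrl filesystem is mounted at `/sys/fs/resctrl`, which is one common interface to CAT and may not be the one used in the setup described here. The helper names (`contiguous_mask`, `split_ways`, `schemata_line`, `apply_group`), the group name, and the way counts are all hypothetical.

```python
import os


def contiguous_mask(num_ways: int, first_way: int = 0) -> int:
    """Return a bitmask with num_ways contiguous bits set, starting at bit first_way.

    CAT capacity bitmasks must consist of contiguous set bits.
    """
    if num_ways < 1 or first_way < 0:
        raise ValueError("num_ways must be >= 1 and first_way must be >= 0")
    return ((1 << num_ways) - 1) << first_way


def split_ways(total_ways: int, rt_ways: int) -> tuple[int, int]:
    """Split the cache into two non-overlapping masks: (real-time, everything else)."""
    if not 0 < rt_ways < total_ways:
        raise ValueError("rt_ways must be between 1 and total_ways - 1")
    rt_mask = contiguous_mask(rt_ways, total_ways - rt_ways)  # upper ways
    shared_mask = contiguous_mask(total_ways - rt_ways)       # lower ways
    return rt_mask, shared_mask


def schemata_line(cache_id: int, mask: int, resource: str = "L3") -> str:
    """Format one line in resctrl schemata syntax, e.g. 'L3:0=f00'."""
    return f"{resource}:{cache_id}={mask:x}"


def apply_group(name: str, schemata: str, cpus_list: str,
                root: str = "/sys/fs/resctrl") -> None:
    """Create a resctrl group and assign it a schemata and a CPU list.

    Assumption: resctrl is mounted at `root` and the caller has root privileges.
    """
    group = os.path.join(root, name)
    os.makedirs(group, exist_ok=True)  # creating the directory creates the group
    with open(os.path.join(group, "schemata"), "w") as f:
        f.write(schemata + "\n")
    with open(os.path.join(group, "cpus_list"), "w") as f:
        f.write(cpus_list + "\n")
```

For a hypothetical 12-way L3 cache with 4 ways reserved for the real-time workload, the masks work out as follows:

```python
rt, shared = split_ways(total_ways=12, rt_ways=4)
print(schemata_line(0, rt))      # L3:0=f00
print(schemata_line(0, shared))  # L3:0=ff
```

For the reservation to be exclusive, the remaining groups (including the default one) would also need their masks narrowed to the shared value. The real number of ways is hardware-specific; on resctrl systems it can be derived from `info/L3/cbm_mask`.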

The caterpillar binary runs a function pointer chasing test and measures the variation in execution time as it reads data from memory. As long as the data stays in the cache, memory access is fast and execution time is short. If another process shares the cache and accesses memory heavily, it evicts the benchmark's cache lines (also known as "thrashing the cache"), and the resulting cache misses degrade the performance of the benchmark workload. The benchmark must then wait until the data is fetched from RAM, which significantly slows memory access and increases execution time. A Python script launches caterpillar alongside a stress application (stress-ng, available in the Red Hat Enterprise Linux (RHEL) and Red Hat Device Edge standard repositories) to simulate a noisy neighbor scenario.
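The measurement idea can be illustrated with a small pointer-chasing loop. The sketch below is not caterpillar's implementation: it is a simplified Python analogue in which each array element holds the index of the next element to visit, so every access depends on the previous one. The function names and parameters are hypothetical. Interpreter overhead dominates in Python, so the absolute numbers mean little; the point is how execution time variation can be summarized.

```python
import random
import statistics
import time


def build_chain(n: int, seed: int = 0) -> list[int]:
    """Build a random single-cycle permutation (Sattolo's algorithm).

    Following chain[i] repeatedly from any start visits all n elements
    before returning to the start, in a cache-unfriendly order.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i guarantees a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain for `steps` dependent loads and return the final index."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


def measure_jitter(chain: list[int], steps: int, runs: int) -> dict[str, float]:
    """Time `runs` repetitions of the chase and summarize the variation."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        chase(chain, steps)
        samples.append(time.perf_counter_ns() - t0)
    return {
        "min_ns": min(samples),
        "max_ns": max(samples),
        "jitter_ns": max(samples) - min(samples),  # worst-case spread
        "stdev_ns": statistics.pstdev(samples),
    }


if __name__ == "__main__":
    chain = build_chain(1 << 20)
    print(measure_jitter(chain, steps=len(chain), runs=20))
```

Running this once on an idle system and once while a cache-heavy load is active on a core that shares the same cache should, in principle, show a wider spread in the second case.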