A group of users were trying to implement a simple, terminal-based video game and found the performance under Windows Terminal to be entirely unsuitable for such a task. The performance issue could be replicated by repeatedly drawing a “rainbow” and measuring how many frames per second (FPS) we can achieve. The one below has 20 distinct colors and could be drawn at around 30 FPS on my Surface Book with an Intel i7-6700HQ CPU. However, if we draw the same rainbow with 21 or more distinct colors it would drop down to less than 10 FPS. This drop is consistent and doesn’t get worse even with thousands of distinct colors.
Initially the culprit wasn’t immediately obvious of course. Does the performance drop because we’re misusing Direct2D or DirectWrite? Does our virtual terminal (VT) sequence parser have any issues with quickly processing colors? We usually begin any performance investigations with Windows Performance Analyzer (WPA). It requires us to create an “.etl” trace file, which we can be done using Windows Performance Recorder (WPR).
The “Flame by Process” view inside WPA is my personal favorite. In a flame graph each horizontal bar represents a specific function call. The widths of the bars correspond to the total CPU time spent within that function, including time spent in all functions it calls recursively. This makes it trivial to visually spot changes between two flame graphs of the same application, or to find outliers which are easily visible as overly wide bars.