That led me to an unreasonable amount of overhead in ibus-daemon. It seems odd that something so critical to our typing latency would require so many CPU samples.
The first thing I do when diving into a problem like this is to record a few things with Sysprof and just have a look around the flamegraphs. That helps me get a good feel for the code and how the components interact.
The next thing I do, after finding what I think is a major culprit, is to rewrite a minimal version of it so that I understand the problem with some level of expertise.
Last time, I did this by writing a minimal terminal emulator so I could improve VTE (and apparently that made people … mad?). This time, my eyes were set on GVariantBuilder.
I was surprised to learn that GVariantBuilder does not write to a serialized buffer while building. Instead, it builds a tree of GVariant*.
That sets off my antennae, because I know from experience that GVariant uses GBytes and GBytes uses malloc. That right there is three separate memory allocations (each aligned to 2*sizeof(void*)) just to maybe store a 4-byte int32.
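To make that allocation count concrete, here is a rough model in plain C. It does not use GLib at all: `Value`, `Bytes`, `Buffer`, and the counting helpers are hypothetical stand-ins invented for illustration, and the real GVariant and GBytes structures hold more state than this. The sketch only assumes the shape described above, a value node that points at a bytes wrapper that points at a heap payload, and compares it with appending straight into one growable buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Counts heap allocations made by the two models below. */
static size_t n_allocs;

static void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        abort();
    n_allocs++;
    return p;
}

/* Hypothetical stand-ins, NOT GLib's real structs. */
typedef struct { void *data; size_t len; } Bytes;  /* models GBytes   */
typedef struct { Bytes *bytes; } Value;            /* models GVariant */

/* Tree-style model: every int32 becomes its own heap-allocated node. */
static Value *value_new_int32(int32_t v)
{
    Value *value = counted_malloc(sizeof *value);        /* 1: value node    */
    value->bytes = counted_malloc(sizeof *value->bytes); /* 2: bytes wrapper */
    value->bytes->len = sizeof v;
    value->bytes->data = counted_malloc(sizeof v);       /* 3: 4-byte payload */
    memcpy(value->bytes->data, &v, sizeof v);
    return value;
}

static void value_free(Value *value)
{
    free(value->bytes->data);
    free(value->bytes);
    free(value);
}

/* Flat model: values are appended to a single growable buffer. */
typedef struct { uint8_t *data; size_t len; size_t cap; } Buffer;

static void buffer_append_int32(Buffer *b, int32_t v)
{
    if (b->len + sizeof v > b->cap) {
        size_t cap = b->cap ? b->cap * 2 : 64;  /* arbitrary starting size */
        uint8_t *data = realloc(b->data, cap);
        if (data == NULL)
            abort();
        n_allocs++;                             /* count each (re)allocation */
        b->data = data;
        b->cap = cap;
    }
    assert(b->len + sizeof v <= b->cap);
    memcpy(b->data + b->len, &v, sizeof v);
    b->len += sizeof v;
}

/* Allocations needed to create n int32 values as individual nodes. */
size_t count_tree_allocs(int n)
{
    n_allocs = 0;
    for (int i = 0; i < n; i++) {
        Value *v = value_new_int32(i);
        value_free(v);
    }
    return n_allocs;
}

/* Allocations needed to append n int32 values to one flat buffer. */
size_t count_flat_allocs(int n)
{
    Buffer b = {0};
    n_allocs = 0;
    for (int i = 0; i < n; i++)
        buffer_append_int32(&b, i);
    free(b.data);
    return n_allocs;
}
```

Under this model, `count_tree_allocs(8)` returns 24 while `count_flat_allocs(8)` returns 1, because eight 4-byte values fit in the first 64-byte buffer. The model ignores allocator rounding, so the per-value memory cost of the tree version is understated here.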