In the previous post, we started covering some parser optimizations. There’s just a handful more to cover until we reach the state of the currently released version of ruby/json.
Something that was bothering me in that profile was the whopping 26.6% of time spent in rb_hash_aset, which is the C API for Hash#[]=.
It wasn’t really surprising to me though. I’m sure you’ve heard about some super fast JSON parsers like simdjson, RapidJSON, etc. Some of you may have wondered why I didn’t just write a binding for one of these to make ruby/json faster. Aside from many technical and legal restrictions, a big reason is that actually parsing JSON isn’t that much of a bottleneck. Even the fairly naive Ragel parser in ruby/json isn’t that slow (it could be way better though, but more on that later).
No, the really expensive part is building the Ruby object tree, as evidenced by the time spent in rb_hash_aset on that flame graph, but also in rb_ary_push on benchmarks that use a lot of arrays. So a custom parser can potentially end up faster overall by being better tailored to make efficient use of Ruby APIs, such as how we unescape directly inside Ruby strings to avoid extra copies, and how we cache string keys.
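To get a feel for how much object building even a tiny document involves, here is a rough Ruby sketch. It walks an already-parsed document and counts the insertions needed to build it, assuming the parser inserts one pair or element at a time (which is what the profile above suggests, not something this snippet proves). The `count_insertions` helper is made up for illustration and isn’t part of ruby/json.

```ruby
require "json"

# Hypothetical helper, not part of ruby/json: counts how many Hash#[]=
# (rb_hash_aset) and Array#<< (rb_ary_push) calls would be needed to build
# `node`, assuming one call per key/value pair and one per array element.
def count_insertions(node, counts = { hash_aset: 0, ary_push: 0 })
  case node
  when Hash
    node.each_value do |value|
      counts[:hash_aset] += 1
      count_insertions(value, counts)
    end
  when Array
    node.each do |element|
      counts[:ary_push] += 1
      count_insertions(element, counts)
    end
  end
  counts
end

doc = JSON.parse('{"users":[{"id":1,"name":"a"},{"id":2,"name":"b"}],"total":2}')
counts = count_insertions(doc)
# 6 hash insertions and 2 array pushes for a 62-byte document.
puts "hash insertions: #{counts[:hash_aset]}, array pushes: #{counts[:ary_push]}"
```

That’s eight insertions for a document this small, and each hash insertion also has to hash and compare its key, which is why these calls can dominate a profile even when the parsing itself is cheap.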