At Glance, we run recommender systems that rank content on the lock screens of more than 150 million users. Not all users are served by the same recommendation algorithm; we call each recommendation algorithm a Prediction Service (PS).
The first is straightforward but also extremely expensive. The second is much harder, as no silver bullet exists to solve it. Note also that our Prediction Services are written in Python, which leaves you with only a handful of tricks for squeezing out more speed.
To solve for the second, I decided to reimplement one of our largest PSs (an LR model that serves ~1.5 million predictions/second on 20% of traffic) in a compiled language. After a bit of research, I settled on Rust. Why? Because:
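At serving time, an LR model boils down to very little compute per request, which is part of why a compiled language is appealing here. Below is a minimal sketch of such a scorer, assuming the model reduces to a weight vector, a bias, and a sigmoid over a dot product; the struct and field names are illustrative, not Glance's actual code.

```rust
/// Illustrative logistic-regression scorer: sigmoid(w · x + b).
/// Names and shapes are assumptions for this sketch, not production code.
struct LrModel {
    weights: Vec<f32>,
    bias: f32,
}

impl LrModel {
    /// Score one feature vector.
    fn predict(&self, features: &[f32]) -> f32 {
        let z: f32 = self
            .weights
            .iter()
            .zip(features)
            .map(|(w, x)| w * x)
            .sum::<f32>()
            + self.bias;
        1.0 / (1.0 + (-z).exp())
    }
}

fn main() {
    let model = LrModel {
        weights: vec![0.5, -0.25, 1.0],
        bias: 0.1,
    };
    // z = 0.5 - 0.5 + 0.5 + 0.1 = 0.6; sigmoid(0.6) ≈ 0.6457
    println!("{:.4}", model.predict(&[1.0, 2.0, 0.5]));
}
```

The per-prediction work is just a dot product and an `exp`, so at ~1.5 million predictions/second the serving cost is dominated by everything around the model (I/O, serialization, feature fetching) rather than the math itself.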
This did not make sense! Past a certain load, the model's latencies started rising exponentially. Note that the Python PS was able to comfortably do ~160 RPS.
I spent a couple of days digging deeper and found an epic blog post by ScyllaDB on their experience debugging Rust. I now had a shiny new tool in my arsenal: flamegraphs!
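For anyone who wants to try the same thing, here is one common way to generate a flamegraph for a Rust binary using the `cargo-flamegraph` tool; the binary name below is a placeholder, and this sketch assumes a Linux box with `perf` installed.

```shell
# One-time install of the cargo subcommand
cargo install flamegraph

# Profile the service under load; the binary name is illustrative.
# This writes flamegraph.svg to the working directory.
cargo flamegraph --bin prediction_service
```

Opening the resulting `flamegraph.svg` in a browser shows which call stacks the process spends its time in, which is exactly what you need when latencies blow up under load for no obvious reason.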