These days, my day job is all about optimizing email deliverability, but I am fascinated by making computers faster. Originally, that was even the focus of the company I cofounded (until we pivoted to emails—got to pay the rent!). I still believe that the way we’ve built the computing ecosystem is fundamentally flawed and, in many ways, disempowering.
My focus is on the JVM (even though many people undervalue it), but the techniques are independent of architecture and runtime (e.g. x86, ARM, Python, JavaScript).
This post shares a few reasons why I think the ecosystem is flawed: the small bits of evidence that keep me fascinated by performance and make me question the status quo.
One of my early wins was cutting Uber’s app startup time by 30% a few years ago (app startup is a key metric for every Android app). I achieved this with an automated tool I developed to identify slow sections of the code. The tool could then prove (or disprove) that certain code sections weren’t needed “soon” (a complicated question that would take a few pages to answer) and could therefore be moved to run asynchronously.
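To make the idea of moving work off the startup path concrete, here is a minimal Java sketch. It is not the tool described above and it does not do the hard part (proving that a section isn't needed "soon"). It only shows what the resulting change can look like once that is known. The names `DeferredStartup`, `initUi`, and `initAnalytics` are hypothetical, and the boolean flags stand in for real initialization work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class DeferredStartup {
    // Hypothetical flags standing in for real initialization work.
    static final AtomicBoolean uiReady = new AtomicBoolean(false);
    static final AtomicBoolean analyticsReady = new AtomicBoolean(false);

    // Assumed to be needed before the first frame: stays on the startup path.
    static void initUi() {
        uiReady.set(true);
    }

    // Assumed NOT to be needed "soon": a candidate for deferral.
    static void initAnalytics() {
        analyticsReady.set(true);
    }

    // Runs the critical work synchronously and hands the rest to a
    // background executor. The returned future lets callers wait for the
    // deferred work if they ever turn out to need it.
    static CompletableFuture<Void> startup(ExecutorService background) {
        initUi();
        return CompletableFuture.runAsync(DeferredStartup::initAnalytics, background);
    }

    public static void main(String[] args) {
        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Void> deferred = startup(background);
            System.out.println("UI ready: " + uiReady.get());
            deferred.join(); // only code that depends on analytics waits here
            System.out.println("Analytics ready: " + analyticsReady.get());
        } finally {
            background.shutdown();
        }
    }
}
```

Returning the future rather than firing and forgetting is a deliberate choice in this sketch: if the "not needed soon" assumption turns out to be wrong for some code path, that path can block on the future instead of reading uninitialized state.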
I applied the same core algorithm to detecting “too coarse” synchronized blocks (e.g. synchronizing on a whole instance instead of on a field of a field of a field). The result: a 3x improvement in Android display speed. The cause was a “too coarse” synchronized block in a key Android class that most apps use heavily (i.e. all Android apps except those built on Flutter or Unity).
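For readers who haven't run into this, here is a small Java sketch of what "too coarse" can mean. It is an illustration only, not the Android class in question: the classes `Coarse` and `Fine` and their frame and touch counters are hypothetical, and dedicated lock objects are used here as one simple way to narrow a lock.

```java
class LockGranularity {
    // Coarse: every method locks the whole instance, so a thread counting
    // frames blocks a thread counting touches even though the two fields
    // are unrelated.
    static class Coarse {
        private long frames;
        private long touches;
        synchronized void onFrame() { frames++; }
        synchronized void onTouch() { touches++; }
        synchronized long frames() { return frames; }
        synchronized long touches() { return touches; }
    }

    // Fine: each independent piece of state gets its own lock, so the two
    // threads no longer contend with each other.
    static class Fine {
        private final Object frameLock = new Object();
        private final Object touchLock = new Object();
        private long frames;
        private long touches;
        void onFrame() { synchronized (frameLock) { frames++; } }
        void onTouch() { synchronized (touchLock) { touches++; } }
        long frames() { synchronized (frameLock) { return frames; } }
        long touches() { synchronized (touchLock) { return touches; } }
    }

    // Drives both counters from two threads and returns {frames, touches}.
    static long[] runFine(int n) {
        Fine state = new Fine();
        Thread a = new Thread(() -> { for (int i = 0; i < n; i++) state.onFrame(); });
        Thread b = new Thread(() -> { for (int i = 0; i < n; i++) state.onTouch(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { state.frames(), state.touches() };
    }

    public static void main(String[] args) {
        long[] counts = runFine(100_000);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Both versions are correct. The difference is only in how much unrelated work ends up serialized behind one lock. Narrowing a lock like this is safe only when the fields really are independent, and establishing that is the kind of question an automated analysis has to answer.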