Here at Outschool, our primary transactional data store is a relational Postgres database. As such, it is not uncommon that we encounter optimization problems involving database queries.
At a recent company lunch-n-learn, we unpacked a few basic concepts to help dismantle the bogey man that is the database optimizer. To some engineers, the optimizer is a mysterious black box that often has to be cajoled through sleight of hand to do the right thing. However, given a set of inputs, the optimizer is actually predictably deterministic. An understanding of how the optimizer makes decisions empowers us to make better decisions ourselves while fine-tuning query performance.
Recently, our team launched a popular feature that allows learners to self-discover classes. Learners can now search for classes and save them to their Favorites list. Imagine that we wanted to query for a list of classes saved by learners. Our query might look something like this:
Which query do you think would perform better? When I posed this question at the lunch-n-learn, the consensus was that the second query should perform better than the first, because it was restricting the users table before joining it with user_favorites. Sounds logical; and yet surprisingly, both queries actually generate the exact same query plan. Here’s what that plan looks like: