Recently, I have been reading papers on optimization strategies for databases, and every so often I come across fascinating new research in this domain. This blog post is about one such topic.
We will explore a 2021 SIGMOD paper by Marcus et al. that presents Bao, a learned system for query optimization.
One of my favorite textbooks for studying DBMS concepts is "Database System Concepts" by Silberschatz et al. Chapter 13 of the book focuses on query optimization. Additionally, Andy Pavlo's lecture notes on query optimization (lectures 14 and 15) provide a good overview of the topic.
According to the book, query optimization happens at two levels: the relational algebra level and the query processing level. At the relational algebra level, the system tries to find an expression that is equivalent to the given one but more efficient to execute. At the query processing level, it composes a detailed strategy, choosing which indices to use and which algorithm to run for each operation.
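To make the relational algebra idea concrete, here is a small sketch of two equivalent plans for the same query: one joins first and filters afterwards, the other pushes the filter below the join. The tables, column names, and helper functions are all made up for illustration, and the "cost" is just a count of tuple comparisons in a naive nested-loop join, not a real optimizer's cost model.

```python
# Toy relations (hypothetical data, for illustration only).
employees = [
    {"emp_id": 1, "dept_id": 10, "salary": 50_000},
    {"emp_id": 2, "dept_id": 20, "salary": 90_000},
    {"emp_id": 3, "dept_id": 10, "salary": 120_000},
    {"emp_id": 4, "dept_id": 30, "salary": 70_000},
]
departments = [
    {"dept_id": 10, "name": "Research"},
    {"dept_id": 20, "name": "Sales"},
    {"dept_id": 30, "name": "Support"},
]


def select(rows, predicate):
    """Relational selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def nested_loop_join(left, right, key):
    """Naive equi-join on `key`; also returns the number of tuple comparisons."""
    joined, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons


def is_research(row):
    return row["name"] == "Research"


def plan_join_then_filter():
    """sigma_{name='Research'}(employees JOIN departments)"""
    joined, comparisons = nested_loop_join(employees, departments, "dept_id")
    return select(joined, is_research), comparisons


def plan_filter_then_join():
    """employees JOIN sigma_{name='Research'}(departments)"""
    research_only = select(departments, is_research)
    return nested_loop_join(employees, research_only, "dept_id")


rows_a, cost_a = plan_join_then_filter()
rows_b, cost_b = plan_filter_then_join()

print(rows_a == rows_b)  # both plans return the same rows
print(cost_a, cost_b)    # comparisons needed by each plan
```

Both plans produce the same answer, but the second one compares far fewer tuples because the join sees a smaller input. The second half of the optimizer's job, picking indices and a join algorithm for the chosen expression, is not modeled here.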