A 7x efficiency improvement: in-depth practice and improvement of Apache Spark Adaptive Query Execution at NetEase

Submitted by
Style Pass
2021-05-19 12:58:25

Based on Apache Spark 3.1.1, this article describes the principles of Adaptive Query Execution (AQE), along with the pain points NetEase Shufan encountered and the lessons it drew while putting AQE into practice.

Adaptive Query Execution (AQE) is one of the major features introduced in Spark 3.0. It re-optimizes a SQL query's execution plan at runtime, using statistics collected while the query runs, which can greatly improve the performance and stability of Spark jobs. AQE includes several sub-features, such as dynamically coalescing shuffle partitions, automatically handling data skew in joins, and dynamically selecting the join strategy. These features spare users much of the painful work of hand-tuning each job for its workload, or even modifying business logic, and they greatly improve the ease of use and flexibility of Spark itself.
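To make the sub-features above concrete, here is a minimal sketch of the Spark 3.1.x configuration keys that control them, rendered as SQL SET statements of the kind a user could submit at the start of a session. The keys are standard Spark SQL settings. The values, the AQE_SETTINGS name, and the to_set_statements helper are illustrative assumptions, not NetEase's production configuration. Note that spark.sql.adaptive.enabled still defaults to false in Spark 3.1.1, so AQE has to be switched on explicitly.

```python
# Illustrative sketch: AQE-related settings in Spark 3.1.x.
# The keys are real Spark SQL configuration keys; the values are
# example choices, not the settings used by NetEase.
AQE_SETTINGS = {
    # Master switch for AQE (defaults to false in Spark 3.1.1).
    "spark.sql.adaptive.enabled": "true",
    # Coalesce small shuffle partitions after each shuffle stage.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Advisory target size of a shuffle partition after coalescing.
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64m",
    # Split skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Read shuffle data locally where possible, for example after a
    # sort-merge join is converted to a broadcast join at runtime.
    "spark.sql.adaptive.localShuffleReader.enabled": "true",
}


def to_set_statements(settings):
    """Render a dict of settings as SQL SET statements (hypothetical helper)."""
    return [f"SET {key}={value};" for key, value in settings.items()]


for statement in to_set_statements(AQE_SETTINGS):
    print(statement)
```

The same key/value pairs could equally be passed as --conf options to spark-submit or placed in spark-defaults.conf. Which of them to enable, and with what thresholds, depends on the workload.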

As the creator of NetEase's basic big data software, the NetEase YouShu team under NetEase Shufan has followed AQE and its application since the feature first appeared. The first system to adopt AQE was Kyuubi, an enterprise-level data lake exploration platform open sourced by NetEase that implements a multi-tenant SQL-on-Hadoop query engine on top of Spark SQL. Within NetEase, Kyuubi's client/server architecture lets the server side upgrade the Spark version smoothly, provided SQL compatibility is preserved, and quickly deliver the latest optimizations and enhancements from both the community and internal teams to users. Starting with Spark 3.0.2, NetEase gradually trialed and rolled out AQE features in production. After Spark 3.1.1 was released, AQE became the default execution mode for users in the Kyuubi production environment. Along the way, we also helped one business migrate more than 1,500 historical Hive tasks to Spark 3.1.1 end to end, which halved resource usage and shortened total execution time by more than 70%, for an overall efficiency improvement of more than 7 times.

Of course, as a "new" feature, AQE has proven unsatisfactory in many respects in practice, and it still leaves a lot of room for optimization. In keeping with its open source strategy, NetEase has worked to share the problems the team encountered with the Spark community and to contribute our optimizations upstream. In the following chapters, we introduce our practical experience with AQE at NetEase over the past six months and the improvements we made.
