At Airtable, MySQL replication plays a crucial role in scaling our reads and achieving our long-term vision for high availability. Having low-lag repl

Optimizing MySQL Replication Lag with Parallel Replication and Writeset-based Dependency Tracking

submited by
Style Pass
2024-09-23 19:00:05

At Airtable, MySQL replication plays a crucial role in scaling our reads and achieving our long-term vision for high availability. Having low-lag replicas allows us to offload a significant chunk of read traffic to replicas and eases the operational burden of performing lag-sensitive tasks such as failovers and blue-green deployment switchover.

Our largest database is a monolithic MySQL 8 instance (we previously blogged about upgrading main from MySQL 5.6 to 8), deployed on AWS RDS as a Multi-AZ deployment with one standby. It handles approximately 50,000 read queries per second (QPS) and 150 writes per second, with occasional spikes reaching up to 1,000 writes per second. While this is a very manageable load for the instance type we are using, we have had repeated hours-long replication lag spikes, making it unsuitable for live interactive queries. We wanted to solve this problem to effectively scale to meet our growing customer usage. Over the last two quarters, we spent a significant amount of time investigating and tuning our replication configuration and successfully brought the replication lag down to consistent single-digit seconds. We also built a binary log parser, which we used to guide this tuning process.

Replication is the mechanism used by MySQL to propagate changes from a primary instance to replica instances. Let’s look at how a single transaction is replicated from a MySQL primary to a replica.

Leave a Comment