MySQL CDC and Database Replication for the Data Lake Age

submitted by
Style Pass
2021-07-06 20:30:06

Data replication has been around almost since the birth of databases. Early uses included integration between heterogeneous systems and data centralization (via the enterprise data warehouse). 

Change Data Capture (CDC) is not a recent invention either. It was devised to identify and capture changes as they happen in the database. Much has already been said about it, and more will surely follow, but the main idea remains the same: CDC is an efficient way to unsilo and democratize an organization’s data, so the enterprise can derive business value from that data rather than just hoarding it or keeping a copy to please the auditors. Heck, in a lot of cases CDC has even brought enterprise teams closer: instead of fighting over the scope of responsibilities, people change their mindset and start collaborating to maximize the value of their shared data.

What is new is the paradigm shift from a locked and heavily guarded data warehouse (DW) to a data lake (DL), or a combination of the two in a data lakehouse architecture. Moving from batch-oriented data distribution to streaming data pipelines opens up more ways of leveraging this data, since both change propagation and event-based architectures can be built on top of the same stream. A CDC capability thus becomes essential to any modern data platform.
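To make the "same stream, two uses" idea concrete, here is a minimal Python sketch. It is not tied to any particular CDC tool: the `ChangeEvent` shape, the `orders` table, and both consumer functions are hypothetical names chosen for illustration, and the stream is an in-memory list rather than a real database log.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, in the hypothetical shape used by this sketch."""
    op: str               # "insert", "update" or "delete"
    table: str
    key: int
    row: Optional[dict]   # new row image; None for deletes


def apply_to_replica(replica: Dict[str, dict], event: ChangeEvent) -> None:
    """Change propagation: keep a downstream copy of the table in sync."""
    table = replica.setdefault(event.table, {})
    if event.op == "delete":
        table.pop(event.key, None)
    else:
        # Inserts and updates both overwrite the row with its latest image.
        table[event.key] = event.row


def to_business_events(event: ChangeEvent) -> List[str]:
    """Event-based use: derive a notification from the same change."""
    if event.table == "orders" and event.op == "insert":
        return [f"order_created:{event.key}"]
    return []


# An illustrative, in-memory stand-in for a captured change stream.
stream = [
    ChangeEvent("insert", "orders", 1, {"status": "new"}),
    ChangeEvent("update", "orders", 1, {"status": "paid"}),
    ChangeEvent("insert", "orders", 2, {"status": "new"}),
    ChangeEvent("delete", "orders", 2, None),
]

replica: Dict[str, dict] = {}
notifications: List[str] = []
for ev in stream:
    apply_to_replica(replica, ev)                 # consumer 1: replication
    notifications.extend(to_business_events(ev))  # consumer 2: events
```

Both consumers read the same ordered sequence of changes. The first ends up with the current state of the table, while the second sees every intermediate change, which is what a batch snapshot would lose.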
