
ClickHouse — moving data between clusters without adding load

Submitted by Style Pass
2024-11-27 15:30:25

There are a number of reasons one might need to move data off of a cluster, such as migrating to a different ClickHouse cluster or DB. In our case, we were migrating from an unsharded to a sharded ClickHouse cluster. I am a software engineer at Triple Whale, and we use ClickHouse as our main production database.

There are a few options for moving the data (see this Altinity post). However, each of them puts read load on the cluster you are migrating from, whether the tool operates at the DB level, such as clickhouse-copier (deprecated), or at the file level, such as clickhouse-backup.

Since our cluster serves production traffic, adding this read load was not practical. Instead, we created a new instance from a disk image of the existing one, so the new instance started with a full copy of the data. Below are the steps we took (we use GCP, so the details are GCP-specific).
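As a rough illustration of the disk-image idea (a minimal sketch, not necessarily our exact procedure), the Python below assembles the two gcloud calls involved: creating an image from the production VM's disk, then booting a new VM from that image. It assumes the ClickHouse data lives on the disk being imaged (treated here as the boot disk), and every project, zone, disk, image, and instance name is a hypothetical placeholder. By default it only prints the commands.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class CloneConfig:
    """Hypothetical parameters; every value is a placeholder."""
    project: str
    zone: str
    source_disk: str   # disk attached to the existing ClickHouse VM (assumed boot disk)
    image_name: str    # image to create from that disk
    new_instance: str  # new VM that receives the full copy
    machine_type: str = "n2-standard-8"  # placeholder machine type


def build_commands(cfg: CloneConfig) -> List[List[str]]:
    """Return the gcloud invocations, in order, as argument lists."""
    return [
        # 1. Create an image from the source disk. --force lets gcloud
        #    proceed even though the disk is attached to a running VM.
        [
            "gcloud", "compute", "images", "create", cfg.image_name,
            f"--project={cfg.project}",
            f"--source-disk={cfg.source_disk}",
            f"--source-disk-zone={cfg.zone}",
            "--force",
        ],
        # 2. Create a new VM whose boot disk is built from that image.
        [
            "gcloud", "compute", "instances", "create", cfg.new_instance,
            f"--project={cfg.project}",
            f"--zone={cfg.zone}",
            f"--machine-type={cfg.machine_type}",
            f"--image={cfg.image_name}",
        ],
    ]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cfg = CloneConfig(
        project="my-project",
        zone="us-central1-a",
        source_disk="clickhouse-prod",
        image_name="clickhouse-prod-image",
        new_instance="clickhouse-migration",
    )
    run(build_commands(cfg))  # dry run: prints the commands only
```

Because `--force` images a disk that is still in use, the copy captures whatever is on disk at that moment (crash-consistent at best), so it is worth verifying the copied data on the new instance before relying on it.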
