1.1 Billion Taxi Rides in ClickHouse on DoubleCloud

submitted by
Style Pass
2022-09-22 06:30:34

DoubleCloud is a newly launched managed database provider. With a headcount of 50, they have development and sales offices in Berlin, Prague and Tel Aviv. Further office openings in the US, UK and France are expected in the next few months. DoubleCloud lets you run ClickHouse and Kafka on your own cloud account. They provide a UI to set up and scale clusters, while software and operating system updates are handled automatically.

DoubleCloud also offers consulting services to help migrate existing workloads to their offering. Earlier this year, I teamed up with them to help Paris-based Enterprise SEO provider Botify understand what it would take to move their 5 PB data warehouse from BigQuery to DoubleCloud.

The key to improving Botify's workload performance while saving ~5x on storage costs was the S3 storage engine for ClickHouse. This engine allows storage to scale automatically, minimises maintenance and typically only incurs a 2-3x performance penalty over DoubleCloud's EBS-based storage option.

The S3 support in DoubleCloud includes a Hybrid storage engine. When you launch a cluster, an S3 bucket is set up automatically. When your cluster detects it's running low on local storage space, it'll automatically transfer your data off to S3. This migration won't require any existing queries to be rewritten. Communication with S3 is optimised with a caching layer which can speed up queries by 4.5x in some cases once the cache has been warmed up.
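
To make the tiering idea concrete, here is a minimal Python sketch of a threshold-based move from local disk to an object store. It is a toy model, not DoubleCloud's implementation: the `HybridStore` and `Part` classes, the 10% free-space threshold and the "oldest parts move first" rule are all assumptions made for illustration, since the details of the Hybrid storage engine aren't described here.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """A chunk of table data (hypothetical stand-in for a data part)."""
    name: str
    size_gb: float


@dataclass
class HybridStore:
    """Toy model of local disk plus an S3 tier. Illustrative only."""
    local_capacity_gb: float
    min_free_ratio: float = 0.1  # assumed threshold, not a documented value
    local: list = field(default_factory=list)   # oldest part first
    remote: list = field(default_factory=list)  # parts moved to S3

    def local_used_gb(self) -> float:
        return sum(p.size_gb for p in self.local)

    def free_ratio(self) -> float:
        return 1.0 - self.local_used_gb() / self.local_capacity_gb

    def insert(self, part: Part) -> None:
        self.local.append(part)
        self.rebalance()

    def rebalance(self) -> None:
        # Move the oldest parts off local disk until enough space is free.
        while self.local and self.free_ratio() < self.min_free_ratio:
            self.remote.append(self.local.pop(0))

    def locate(self, name: str) -> str:
        # Callers look parts up by name only, so nothing on the query side
        # changes when a part moves between tiers.
        if any(p.name == name for p in self.local):
            return "local"
        if any(p.name == name for p in self.remote):
            return "s3"
        raise KeyError(name)


store = HybridStore(local_capacity_gb=100)
for i in range(5):
    store.insert(Part(name=f"part_{i}", size_gb=25))

print([p.name for p in store.local])
print([p.name for p in store.remote])
```

The lookup in `locate` is the point of the sketch: because readers address data by name rather than by location, moving a part to the remote tier doesn't require callers to change anything, which mirrors the claim that existing queries don't need to be rewritten.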
