Faster ClickHouse Imports

Submitted by
Style Pass
2021-10-20 19:30:10

ClickHouse is the workhorse of many services at Yandex and at several other large Internet firms in Russia. These companies serve an audience of 258 million Russian speakers worldwide and have some of the most demanding requirements for distributed OLAP systems in Europe.

This year has seen good progress in ClickHouse's development and stability. New additions include support for HDFS, ZFS and Btrfs for both reading datasets and storing table data; a T64 codec that can significantly improve Zstandard compression; faster LZ4 performance; and tiered storage.

Anyone uncomfortable with the number of moving parts in a typical Hadoop setup might find reassurance in ClickHouse being a single piece of software rather than a loose collection of several different projects. For anyone unwilling to pay for cloud hosting, ClickHouse can run on a laptop running macOS; paired with Python and Tableau, there would be little reason to connect to the outside world for most analytical operations. Because ClickHouse is written in C++, there is no JVM to configure when running in standalone mode.

ClickHouse relies heavily on third-party libraries, which helps keep its C++ code base at around 300K lines of code. By contrast, PostgreSQL's current master branch has about 800K lines of C code, and MySQL has 2M lines of C++. Thirteen developers have made at least 100 commits to ClickHouse this year; only five PostgreSQL developers and only two MySQL developers have reached the same mark. ClickHouse's engineers have delivered a new release every ten days on average over the past 2.5 years.
