
From Shared Nothing to Shared Disk: Build a Fully Flexible Data System on Cloud Services

Submitted by
Style Pass
2024-10-23 15:30:04

Over the last few decades, the volume of data to be processed has kept growing, from megabytes to gigabytes, terabytes, and now petabytes. One petabyte holds 1,000,000,000,000,000 bytes (1e15 bytes). Roughly, a petabyte is the equivalent of 20 million tall filing cabinets or 500 billion pages of standard printed text.

Such a massive amount of data cannot be processed, or even stored, on a single machine. To address this issue, two distributed architectures have been proposed: Shared Nothing and Shared Disk.

This article explains the Shared Nothing and Shared Disk architectures and how they perform in a cloud environment. Based on that experience, I will provide a design blueprint for a fully flexible database built on cloud services.

Big data technologies address the massive data problem by sharding data across multiple storage machines and distributing computation across those machines. Such architectures are typically referred to as “Shared Nothing.”
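
To make the sharding idea concrete, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not how any particular system implements it: the function names (shard_for, partition, distributed_sum) are hypothetical, the "machines" are simulated as in-memory lists, and hash-modulo placement is just one simple way to assign keys to shards.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard id with a stable hash (hypothetical helper).

    A cryptographic hash is used only because it is stable across
    processes; Python's built-in hash() is randomized per run for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, num_shards):
    """Split (key, value) records into shards; each shard stands in for
    one machine that owns its slice of the data and shares nothing."""
    shards = defaultdict(list)
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


def distributed_sum(shards):
    """Compute a partial sum on each shard, then combine the partials.

    In a real cluster each partial sum would run on the machine that
    stores the shard, and only the small partial results would cross
    the network.
    """
    partials = {sid: sum(v for _, v in rows) for sid, rows in shards.items()}
    return sum(partials.values()), partials


if __name__ == "__main__":
    records = [(f"user-{i}", i) for i in range(1000)]
    shards = partition(records, num_shards=4)
    total, partials = distributed_sum(shards)
    print("rows per shard:", {sid: len(rows) for sid, rows in shards.items()})
    print("total:", total)
```

The point of the sketch is that each shard owns both its data and the computation over that data, so only the small partial results have to be combined. One known drawback of hash-modulo placement is that changing the number of shards reassigns most keys to different shards.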

Many other big data systems, including Apache Hadoop, Apache HBase, Apache Cassandra, and Apache Kafka, follow the Shared Nothing architecture.
