
Scaleout Metadata File Systems already store much of your data. What are they?

Submitted by Style Pass, 2021-06-05 05:00:06

TL;DR: A new class of hierarchical distributed file system with scaleout metadata has taken over at Google, Facebook, and Microsoft. It provides a single centralized file system that manages the data for an entire data center and scales to exabytes in size. The common architectural feature of these systems is scaleout metadata, so we call them scaleout metadata file systems.

Hierarchical file systems typically provide well-defined behaviour (a POSIX API) for how a client can securely create, read, write, modify, delete, organize, and find files.
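As a concrete illustration of the POSIX-style operations listed above, the sketch below exercises them through Python's standard `os` module against a local temporary directory. A distributed file system that exposes a POSIX (or POSIX-like) API accepts the same kinds of calls; the directory and file names here are hypothetical, and the permission check assumes a POSIX host.

```python
import os
import stat
import tempfile


def demo_posix_ops() -> list:
    """Create, write, read, modify, rename, secure, find and delete via POSIX-style calls."""
    log = []
    with tempfile.TemporaryDirectory() as root:
        # Organize: create a directory (hypothetical name).
        data_dir = os.path.join(root, "data")
        os.mkdir(data_dir)

        # Create and write a file.
        path = os.path.join(data_dir, "part-0000")
        with open(path, "w") as f:
            f.write("hello")

        # Read it back.
        with open(path) as f:
            log.append(f.read())

        # Modify: append to the file.
        with open(path, "a") as f:
            f.write(" world")

        # Move/rename within the namespace, then read the renamed file.
        new_path = os.path.join(data_dir, "part-0001")
        os.rename(path, new_path)
        with open(new_path) as f:
            log.append(f.read())

        # Secure: restrict permissions to owner read/write only.
        os.chmod(new_path, stat.S_IRUSR | stat.S_IWUSR)
        log.append(oct(stat.S_IMODE(os.stat(new_path).st_mode)))

        # Find: list the directory contents.
        log.append(",".join(sorted(os.listdir(data_dir))))

        # Delete the file and confirm the directory is empty.
        os.remove(new_path)
        log.append(str(os.listdir(data_dir)))
    return log
```

Note that only the write and read calls touch file contents; mkdir, rename, chmod, listdir and remove are all metadata operations.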

The data in such file systems is stored in files as blocks or extents. Distributed file systems spread and replicate these blocks/extents across many servers for improved performance and high availability. However, the data describing which files, directories, and blocks exist in the system, and which permissions apply to them, has historically been stored on a single server called the metaserver or namenode. We call this data about file system objects metadata. In file systems like HDFS, the namenode keeps its metadata in memory to lower latency and to raise throughput, that is, the number of metadata operations it can serve per second. Example metadata operations are: create a directory, move or rename a file or directory, and change file permissions or ownership.
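To make the split between data and metadata concrete, here is a toy, single-process metadata server in the spirit of a namenode: it keeps the whole namespace (directories, files, block IDs, owners, permissions) in memory and implements the example operations named above. All class and method names are hypothetical; this is a sketch, not how HDFS actually structures its namenode, and a real system would also persist every change (for example to a write-ahead log) for durability.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Metadata for one file or directory; the data blocks themselves live on data servers."""
    is_dir: bool
    owner: str
    mode: int
    blocks: list = field(default_factory=list)    # block IDs only, never block contents
    children: dict = field(default_factory=dict)  # name -> Inode, used by directories


class ToyMetaServer:
    """Hypothetical single-server, in-memory metadata store (namenode-like)."""

    def __init__(self) -> None:
        self.root = Inode(is_dir=True, owner="root", mode=0o755)

    def stat(self, path: str) -> Inode:
        # Walk the in-memory tree one path component at a time.
        node = self.root
        for name in filter(None, path.split("/")):
            if not node.is_dir or name not in node.children:
                raise FileNotFoundError(path)
            node = node.children[name]
        return node

    def _parent_and_name(self, path: str):
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self.stat(parent_path or "/")
        if not parent.is_dir:
            raise NotADirectoryError(parent_path)
        return parent, name

    def mkdir(self, path: str, owner: str) -> None:
        parent, name = self._parent_and_name(path)
        if name in parent.children:
            raise FileExistsError(path)
        parent.children[name] = Inode(is_dir=True, owner=owner, mode=0o755)

    def create(self, path: str, owner: str, blocks: list) -> None:
        parent, name = self._parent_and_name(path)
        if name in parent.children:
            raise FileExistsError(path)
        parent.children[name] = Inode(is_dir=False, owner=owner, mode=0o644, blocks=list(blocks))

    def rename(self, src: str, dst: str) -> None:
        # Pure metadata operation: only directory entries change, no data blocks move.
        # (No check against moving a directory into its own subtree; kept minimal.)
        src_parent, src_name = self._parent_and_name(src)
        dst_parent, dst_name = self._parent_and_name(dst)
        if src_name not in src_parent.children:
            raise FileNotFoundError(src)
        if dst_name in dst_parent.children:
            raise FileExistsError(dst)
        dst_parent.children[dst_name] = src_parent.children.pop(src_name)

    def chmod(self, path: str, mode: int) -> None:
        self.stat(path).mode = mode

    def chown(self, path: str, owner: str) -> None:
        self.stat(path).owner = owner
```

Because every one of these calls is served by the same process and the same in-memory tree, that one server's CPU and memory cap the namespace size and the metadata operations per second.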

As the volume of data managed by distributed file systems grew, it quickly became clear that the metadata server was a bottleneck. For example, HDFS could scale to a petabyte at a push, but it could handle no more than 100K metadata reads/sec and only a few thousand metadata writes/sec.
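The bottleneck follows from every metadata operation funnelling through one server. One simple way to spread that load, assumed here purely for illustration (production scaleout metadata file systems use their own, more elaborate designs), is to hash-partition metadata across several servers, for example by each entry's parent directory so that listing a directory touches a single shard. The function names and shard count below are hypothetical.

```python
import hashlib
import posixpath
from collections import Counter


def shard_for(path: str, num_shards: int) -> int:
    """Map a path to a metadata shard by hashing its parent directory (hypothetical scheme)."""
    parent = posixpath.dirname(path.rstrip("/")) or "/"
    # A stable hash (unlike Python's built-in hash()) so every client computes the same shard.
    digest = hashlib.sha256(parent.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(paths, num_shards: int) -> Counter:
    """Count how many metadata entries each shard would own."""
    return Counter(shard_for(p, num_shards) for p in paths)
```

If each shard could sustain roughly the single-namenode figures quoted above and the load were evenly balanced, aggregate metadata throughput would grow roughly linearly with the number of shards. The catch is that operations spanning shards, such as renaming a file between two directories that hash to different servers, need cross-shard coordination to stay atomic.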
