Representing filesystems in databases efficiently with Hierarchical Ordering

submitted by
Style Pass
2024-11-16 18:30:04

The problem is that the tree structure and the file path listing just below it are two completely different sort orders. Filesystems walk the tree level by level, whereas databases walk their B-trees in key order (and those trees are constantly rebalanced).
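To make the difference concrete, here is a minimal Python sketch that sorts the same handful of paths both ways. The paths and the function names `key_order` and `level_order` are made up for illustration, and a plain lexicographic sort stands in for what a B-tree index would return.

```python
# Hypothetical paths; a trailing slash marks a "directory".
paths = ["a/", "b/", "a/x.txt", "a/sub/", "a/sub/y.txt", "b/z.txt"]


def key_order(paths):
    """Order a key-sorted index would return: plain lexicographic sort."""
    return sorted(paths)


def level_order(paths):
    """Order of a level-by-level walk: shallower entries first, then by name."""
    def depth(path):
        # Number of separators once a trailing "/" is ignored.
        return path.rstrip("/").count("/")

    return sorted(paths, key=lambda p: (depth(p), p))


print(key_order(paths))
print(level_order(paths))
```

In key order, everything under a/ comes out before b/ is reached. In level order, a/ and b/ come out together before any of their children.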

S3 isn’t just blobs of your files; there’s also a key-value database that keeps track of them. It’s the thing you hit first every time you make a request to S3.

The problem is that if we wanted to treat S3 as a filesystem and ran ls /, we’ve potentially just asked our S3 client to make an unbounded number of requests.

S3 lists keys in lexicographic order, which means that all children of the first “directory” that appears have to be paged through before we get to the next one.

Or in other words, since each ListObjectsV2 request returns at most 1,000 keys, if a/ has 1 million files beneath it, then we have to make 1,000 ListObjectsV2 requests before we even see the b/ paths.
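The arithmetic can be checked with a small simulation. This sketch does not call S3: an in-memory sorted list stands in for the key index, and `pages_before_prefix`, the key names, and `b/file.txt` are all hypothetical. The only S3 detail it assumes is the 1,000-key cap per ListObjectsV2 response.

```python
PAGE_SIZE = 1000  # ListObjectsV2 returns at most 1,000 keys per request


def pages_before_prefix(sorted_keys, prefix, page_size=PAGE_SIZE):
    """Count the pages read, in key order, before the first page
    that contains a key starting with `prefix`."""
    pages = 0
    for start in range(0, len(sorted_keys), page_size):
        page = sorted_keys[start:start + page_size]
        if any(key.startswith(prefix) for key in page):
            return pages
        pages += 1
    return pages  # prefix never seen: every page was read


# One million keys under a/, then a single key under b/ (already in key order).
keys = [f"a/{i:07d}" for i in range(1_000_000)] + ["b/file.txt"]

wasted_requests = pages_before_prefix(keys, "b/")
print(wasted_requests)  # 1000
```

Every one of those 1,000 pages contains only a/ keys, so none of them tells a directory listing anything new about the root.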

If you’re trying to mount S3 as a filesystem, this could result in atrocious performance (among the other shortfalls of using S3 as a filesystem).
