In the enterprise world, when someone thinks of storage, usually the first two questions are capacity and performance. Do you need a lot of storage space? Or perhaps less capacity but fast I/O? And what does “fast” mean? Bandwidth? Random writes? Reads? etc…
If your budget is unlimited and you need tons of terabytes, just get a huge HDD array. If you need the fastest storage possible, you can get NVMe drives or even some RAM-based solutions.
One possible solution is hybrid storage: you mix fast SSD or NVMe drives with good old HDD rust. One way of doing that is the ZFS file system, which has built-in “cache” mechanisms called the ZIL (for writes) and the L2ARC (for reads). However, that “cache” is not really a cache. The ZIL (ZFS Intent Log) has a specific use case: it is a safety log for synchronous writes, which are still buffered in RAM and then flushed to the backing HDD drives periodically; asynchronous writes bypass it entirely. The L2ARC is a level-2 cache for the ARC, which itself lives in RAM, and the ZFS algorithm decides what data should and should not be cached. All that means that despite having an SSD or NVMe drive used as “cache”, with ZFS your performance improvement may not be that great after all and depends on your specific workload.
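For reference, attaching such devices to a ZFS pool is a one-liner per device. A minimal sketch, assuming a pool named tank and NVMe partition paths that are purely illustrative:

```shell
# Attach a fast NVMe partition as a dedicated log device (SLOG),
# which holds the ZIL for synchronous writes.
# Pool name "tank" and the device paths are examples only.
zpool add tank log /dev/nvme0n1p1

# Attach another NVMe partition as an L2ARC read cache device.
zpool add tank cache /dev/nvme0n1p2

# Inspect the pool layout; log and cache devices are listed
# in their own sections of the output.
zpool status tank
```

Note that a SLOG only helps synchronous write latency, and the L2ARC only helps once the ARC in RAM is under pressure, which matches the caveat above about workload-dependent gains.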
At the beginning of 2023 I decided to work on a project that relies on OpenStreetMap data. While using OSM for a specific region, like a country or even a continent, is relatively doable on average hardware, importing data for the entire planet is a challenge in itself. Compressed planet data in the .pbf format is around 70GB, updated weekly. An import into PostgreSQL/PostGIS can take days(!) and the database grows to almost 2TB. Moreover, you need high-performance storage with lots of random IOPS (input/output operations per second) to make it usable.
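To give a sense of what such an import looks like, here is a sketch of a planet-scale osm2pgsql invocation. The database name, cache size, and file paths are assumptions for illustration, not values from this project:

```shell
# Import the full planet into PostGIS. --slim keeps intermediate
# node/way data in the database (required for applying weekly updates),
# and --flat-nodes stores node locations in a file on fast storage
# instead of a database table.
# Database name "osm", the 32GB cache, and the paths are examples only.
osm2pgsql --create --slim \
  -d osm \
  --cache 32000 \
  --flat-nodes /fast-storage/nodes.bin \
  planet-latest.osm.pbf
```

The node cache and the flat-nodes file are exactly the pieces that hammer storage with random I/O, which is why the drive backing them matters so much here.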