Part 1: Comparing the pricing models of modern data warehouses

submitted by
Style Pass
2024-09-26 23:00:03

While developing Universql, I had a chance to look closely at how today's data warehouses compare. Benchmarking databases is hard, but comparing their pricing is harder. There are 3 major things you need to consider:

Providers such as Redshift and Snowflake often use their own proprietary database formats, so although storage pricing is based on data volume, it's hard to estimate how much space you will need before actually using them. However, looking at the storage pricing for Snowflake, S3, and BigQuery, the margin on storage is pretty small. From Snowflake, we know that storage is usually less than 10% of the total cost of a data warehouse. Also, according to Clickbench, file compaction doesn't make a huge difference (~30%) in storage size.
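To see why the storage format matters so little for the total bill, here is a rough back-of-the-envelope sketch. It assumes storage cost scales linearly with stored size and that everything else on the bill stays the same. The function name is made up for illustration, and the 10% and 30% inputs are just the approximate figures mentioned above.

```python
def total_cost_change(storage_share: float, storage_size_change: float) -> float:
    """Fractional change in the total bill when only the stored size changes.

    storage_share: storage as a fraction of the total bill (e.g. 0.10)
    storage_size_change: relative difference in stored size (e.g. 0.30)
    Assumes storage cost is linear in size and compute cost is unaffected.
    """
    return storage_share * storage_size_change


# Storage at ~10% of the bill, formats differing by ~30% in size.
impact = total_cost_change(0.10, 0.30)
print(f"{impact:.1%}")  # 3.0%
```

Under these assumptions, even a 30% difference in stored size moves the total bill by only about 3%, which is why the rest of the comparison focuses on compute.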

Some providers charge based on the data your queries process, and some charge based on the compute units you use under the hood (warehouse, slot, etc.). The performance of specific operations, such as ingestion, transformation, querying small tables vs. big tables, and specific SQL syntax such as WINDOW functions, has a huge impact on cost because of the way the underlying engines implement them. Performance and cost also change over time with software updates.
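The difference between the two billing models is easier to see with numbers. The sketch below is only an illustration: the two rates are hypothetical placeholders rather than any provider's actual prices, and the compute model ignores idle time, minimum billing increments, and concurrency.

```python
from dataclasses import dataclass

# Hypothetical rates for illustration only; not real provider prices.
PRICE_PER_TB_SCANNED = 5.0    # dollars per terabyte processed
PRICE_PER_COMPUTE_HOUR = 3.0  # dollars per hour of one compute unit


@dataclass
class Query:
    name: str
    tb_scanned: float     # data processed by the query, in terabytes
    runtime_hours: float  # time the query keeps a compute unit busy


def scan_based_cost(query: Query) -> float:
    """Cost when the provider bills for data processed."""
    return query.tb_scanned * PRICE_PER_TB_SCANNED


def compute_based_cost(query: Query) -> float:
    """Cost when the provider bills for compute time used."""
    return query.runtime_hours * PRICE_PER_COMPUTE_HOUR


queries = [
    # Little data, but an expensive operation (e.g. a heavy window function).
    Query("small table, heavy window function", tb_scanned=0.1, runtime_hours=2.0),
    # A lot of data, but a cheap operation that finishes quickly.
    Query("big table, simple filter", tb_scanned=4.0, runtime_hours=0.25),
]

for q in queries:
    print(f"{q.name}: scan-based ${scan_based_cost(q):.2f}, "
          f"compute-based ${compute_based_cost(q):.2f}")
```

With these made-up rates, the first query is much cheaper under scan-based billing and the second is much cheaper under compute-based billing, so which model wins depends on the workload rather than on the price list alone.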
