
10 million records per second, on-premises to the cloud

By now, it is well known that Spectral Core database migration tooling is the most performant on the market, by some margin. Benchmarks would be good to back up such bold claims, but benchmarks are tricky and take time, so let's use the next best thing: some screenshots.

When migrating data from on-premises servers to the cloud, we are typically talking about a lot of data. Tens of terabytes are not uncommon at all, nor are thousands of tables in a database.

Omni Loader is designed to remove as much complexity as possible from the migration process. It creates the target schema and loads all the data with no option tweaking required, even for the largest and most complex projects.

We fight WAN latency with high parallelism. Multiple tables are loaded in parallel, and each table is itself dynamically sliced into many parts that are copied in parallel. As disks are much slower than CPU and RAM, we avoid spilling to disk during data transformation entirely; data is processed completely in memory.
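
To illustrate the idea, here is a minimal Python sketch of that two-level parallelism: several tables in flight at once, each sliced into key ranges copied concurrently. This is not Omni Loader's actual implementation; the table names, slice count, and the copy_slice stub are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

SLICES_PER_TABLE = 8  # assumed per-table parallelism degree

def copy_slice(table, lo, hi):
    # Stand-in for the real work: stream rows with keys in [lo, hi)
    # from source to target, buffering only in memory (no disk spill).
    print(f"{table}: copying keys [{lo}, {hi})")

def copy_table(table, min_key, max_key):
    # Split the table's key range into slices and copy them concurrently.
    step = max(1, (max_key - min_key + 1) // SLICES_PER_TABLE)
    with ThreadPoolExecutor(max_workers=SLICES_PER_TABLE) as pool:
        for lo in range(min_key, max_key + 1, step):
            pool.submit(copy_slice, table, lo, min(lo + step, max_key + 1))

# Several tables in flight at once, each itself sliced:
with ThreadPoolExecutor(max_workers=4) as tables:
    for name in ("orders", "invoices", "customers"):
        tables.submit(copy_table, name, 1, 1_000_000)
```

Because every slice is an independent transfer, dozens of streams can be kept busy at once, which hides per-request WAN latency behind sheer concurrency.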

To ingest data into warehousing solutions such as Microsoft Fabric, Google BigQuery, Snowflake, and others, one needs to prepare it for efficient ingestion. Parquet is a very good choice for the intermediate data format because it is columnar and highly compressed. We choose our slice sizes heuristically so that the data warehouse can ingest the data quickly. As data is compressed on-premises (Omni Loader can run in a walled garden), only compressed data travels to the cloud. With a columnar format, compression is extremely efficient: typically you end up with just 20% of the original data size. As data upload is a bottleneck (right after CPU), moving 5x less data is a good thing.
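
As a rough sketch of that intermediate-format step, assuming pyarrow is available: a table slice is written as a compressed Parquet file, and only that compressed file needs to cross the WAN. The column names, values, and file name here are illustrative, and the actual compression ratio depends entirely on the data.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A toy batch standing in for one dynamically sized table slice.
slice_table = pa.table({
    "id":     pa.array(range(1_000_000)),
    "status": pa.array(["active", "closed"] * 500_000),
})

# Columnar layout plus compression shrinks repetitive columns dramatically;
# only this compressed file has to travel to the cloud warehouse.
pq.write_table(slice_table, "slice_0001.parquet", compression="zstd")
```

Columnar formats compress so well because each column stores many values of the same type together; a low-cardinality column like status above collapses to almost nothing.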
