Syncing Postgres Partitions to Your Data Lake in Bridge for Analytics

One of the unique characteristics of the recently launched Crunchy Bridge for Analytics is that it is effectively a hybrid between a transactional and an analytical database system. That is a powerful combination for data-intensive applications that may, for example, require low-latency, high-throughput insertion, efficient lookup of recent data, and fast interactive analytics over historical data.

A common source of large data volumes is append-mostly time series or event data generated by an application. PostgreSQL has various tools to optimize your database for time series, such as partitioning, BRIN indexes, and time functions, and its native heap storage format is well-suited for bulk writes. However, there is a limit to what PostgreSQL can do with large data volumes, especially in terms of the performance of analytical queries over large data sets and the operational overhead of keeping a large amount of historical data in your database.
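To make that tooling concrete, here is a minimal sketch of a range-partitioned event table with a BRIN index on the timestamp column. The table and column names are hypothetical, not taken from this post.

```sql
-- Hypothetical append-mostly event table, range-partitioned by day.
CREATE TABLE events (
    event_id   bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per day; in practice these are usually created ahead of
-- time by a scheduled job or an extension such as pg_partman.
CREATE TABLE events_2024_05_08 PARTITION OF events
    FOR VALUES FROM ('2024-05-08') TO ('2024-05-09');

-- BRIN indexes stay tiny for naturally time-ordered data and speed up
-- range scans over recent events.
CREATE INDEX ON events_2024_05_08 USING brin (created_at);
```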

This blog post describes an end-to-end solution for storing recent event data in PostgreSQL using time partitioning, copying those time partitions into your data lake, and running fast analytical queries over them, all on the same Bridge for Analytics instance.
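As a rough illustration of that workflow, the sketch below copies a closed-off daily partition to object storage as Parquet and then queries the historical files through an analytics foreign table. The COPY-to-S3 and foreign table syntax, the crunchy_lake_analytics server name, and the bucket path are assumptions based on Bridge for Analytics' Parquet/S3 integration, not verbatim from this post.

```sql
-- Assumption: Bridge for Analytics extends COPY to write Parquet to S3.
COPY events_2024_05_08
    TO 's3://mybucket/events/2024-05-08.parquet' WITH (format 'parquet');

-- Assumption: a foreign table over the Parquet files in the data lake,
-- served by the analytics engine for fast interactive queries.
CREATE FOREIGN TABLE events_historical (
    event_id   bigint,
    created_at timestamptz,
    payload    jsonb
)
SERVER crunchy_lake_analytics
OPTIONS (path 's3://mybucket/events/*.parquet');

-- Interactive analytics over the full history in the lake.
SELECT date_trunc('day', created_at) AS day, count(*)
FROM events_historical
GROUP BY 1
ORDER BY 1;
```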
