pg_parquet - Postgres To Parquet Interoperability

submitted by
Style Pass
2024-11-28 21:00:04

pg_parquet is a new extension by Crunchy Data that allows a PostgreSQL instance to work with Parquet files. With pg_duckdb, pg_analytics and pg_mooncake, all of which can already access Parquet files, is there a need for yet another extension?

Yes: if you don't need the full strength of DuckDB under the covers and just want to import a Parquet file into a Postgres table and work with it, then this is the appropriate extension for you. In other words, pg_parquet is the lightweight counterpart to the extensions mentioned above, one that does not spin up a DuckDB instance. Instead it does just a few things, but does them right:

-- Copy a query result into Parquet in S3
COPY (SELECT * FROM table) TO 's3://mybucket/data.parquet' WITH (format 'parquet');
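The same COPY syntax also works in the other direction and with local files. Below is a minimal sketch of a round trip, not a definitive recipe: the table names (events, events_copy) and the path /tmp/events.parquet are hypothetical, and it assumes the extension is already built, installed, and preloaded, and that the role running it is allowed to read and write files on the server.

```sql
-- Assumption: pg_parquet is installed and listed in shared_preload_libraries.
CREATE EXTENSION IF NOT EXISTS pg_parquet;

-- Hypothetical table, used only for illustration.
CREATE TABLE events (id bigint, name text, created_at timestamptz);
INSERT INTO events VALUES (1, 'signup', now()), (2, 'login', now());

-- Export the table to a Parquet file on the server's filesystem.
-- The path is an assumption; it must be writable by the Postgres server process.
COPY events TO '/tmp/events.parquet' WITH (format 'parquet');

-- Import the file into a second table with the same column layout.
CREATE TABLE events_copy (LIKE events);
COPY events_copy FROM '/tmp/events.parquet' WITH (format 'parquet');

-- Both tables should now hold the same rows.
SELECT count(*) FROM events_copy;
```

Once the rows are in events_copy they are an ordinary Postgres table, so indexes, joins and constraints work as usual.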

After installing a Postgres instance, you need to set up rustup and cargo-pgrx to build the extension; full instructions are in its GitHub repo.
