Enabling Highly Available Trino Clusters at Goldman Sachs

submitted by
Style Pass
2022-05-14 01:00:06

We have been invited to chat about the content in this blog post on the Trino Community Broadcast. Enjoy the live stream on February 17, 2022.

As one of the Data Platform teams at Goldman Sachs (GS), we are responsible for making accurate and timely data available to our analytics and application teams. At GS, we work with various types of data, such as transaction-related data, valuations, and product reference data from external vendors. These datasets can reside in multiple heterogeneous data sources, such as HDFS, S3, Oracle, Snowflake, Sybase, Elasticsearch, and MongoDB. Each source exposes its data differently and must be handled individually. The challenge we encountered was how to make these varied datasets consistently and centrally available to our data science team for analytics.

Trino, an open source distributed SQL query engine, gives users the ability to query various datasets through a single uniform platform. Its connector-based architecture lets users connect to multiple data sources and perform federated joins across them, eliminating ETL development and maintenance costs.
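To make the idea of a federated join concrete, here is a minimal sketch in Python that builds one. It relies only on Trino's `catalog.schema.table` naming, where each catalog maps to a configured connector. The catalog, schema, table, and column names (`hive.finance.trades`, `oracle.refdata.products`, `product_id`) and the helper names are hypothetical, chosen for illustration; they are not taken from our setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """A Trino table addressed as catalog.schema.table."""
    catalog: str  # name of a configured Trino catalog (one per data source)
    schema: str
    name: str

    def qualified(self) -> str:
        # Assumes simple identifiers that need no quoting.
        return f"{self.catalog}.{self.schema}.{self.name}"


def federated_join_sql(left: Table, right: Table, key: str) -> str:
    """Build a join across two tables that may live in different catalogs."""
    return (
        f"SELECT l.*, r.*\n"
        f"FROM {left.qualified()} AS l\n"
        f"JOIN {right.qualified()} AS r\n"
        f"  ON l.{key} = r.{key}"
    )


# Hypothetical example: trades stored in HDFS (Hive catalog) joined with
# product reference data stored in Oracle.
trades = Table("hive", "finance", "trades")
products = Table("oracle", "refdata", "products")
query = federated_join_sql(trades, products, "product_id")
print(query)
```

The resulting statement is ordinary SQL: the two tables differ only in their catalog prefix, and Trino resolves each prefix to the matching connector. Any Trino client could submit the string as-is.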
