
Which Open-source Data Integration Tool Is Right for Your Project?

submitted by
Style Pass
2021-05-27 20:30:08

At Preset, I'm focused on better understanding how healthy communities grow around popular open-source projects as they evolve. To reach a better quantitative understanding of these communities, I first need to regularly ingest data from many different sources into a platform I'm building. Writing custom scripts from scratch to extract data out of SaaS tools through their APIs is often a huge pitfall. While it may seem easy to write a small Python script that simply gets the information you need, things get challenging as your needs and the APIs evolve, and doing efficient incremental loads can be tricky.
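To make the incremental-load point concrete, here is a minimal sketch of cursor-based ("bookmark") incremental extraction, the kind of logic a hand-written script ends up owning. It assumes a hypothetical API that can filter records by an `updated_at` timestamp, simulated here by the in-memory `FAKE_API` list and the hypothetical helper `fetch_updated_since`. Real SaaS APIs add pagination, rate limits, and inconsistent filtering, which is exactly where such scripts start to get complicated.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a SaaS API; every record carries an `updated_at` timestamp.
# ISO-8601 UTC strings in one fixed format sort correctly as plain strings.
FAKE_API: List[dict] = [
    {"id": 1, "updated_at": "2021-05-01T10:00:00Z", "stars": 10},
    {"id": 2, "updated_at": "2021-05-02T09:30:00Z", "stars": 4},
    {"id": 3, "updated_at": "2021-05-03T12:15:00Z", "stars": 7},
]


def fetch_updated_since(cursor: Optional[str]) -> List[dict]:
    """Hypothetical API call: records updated after `cursor`, oldest first."""
    rows = [r for r in FAKE_API if cursor is None or r["updated_at"] > cursor]
    return sorted(rows, key=lambda r: r["updated_at"])


def incremental_sync(state: Dict[str, str], destination: Dict[int, dict]) -> Dict[str, str]:
    """Load only new or changed records and return the updated state bookmark."""
    cursor = state.get("updated_at")  # None on the very first run, i.e. a full load
    for row in fetch_updated_since(cursor):
        destination[row["id"]] = dict(row)  # upsert a copy, keyed on the primary key
        cursor = row["updated_at"]  # advance the bookmark as rows land
    # Caveat: a strict ">" comparison can skip a late record that shares the
    # final timestamp; production tools typically need extra handling for this.
    return {"updated_at": cursor} if cursor is not None else {}


if __name__ == "__main__":
    warehouse: Dict[int, dict] = {}
    state = incremental_sync({}, warehouse)  # first run loads all three records
    print(state, len(warehouse))

    # Simulate an upstream change to record 2, then run again:
    # only that record is fetched, and the bookmark moves forward.
    FAKE_API[1].update(updated_at="2021-05-04T08:00:00Z", stars=5)
    state = incremental_sync(state, warehouse)
    print(state, warehouse[2]["stars"])
```

Persisting `state` between runs, and deciding what to do when the cursor field is missing or unreliable for a given API, is the part that tends to grow with every new source.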

Rather than write extract, load, and transform (ELT) scripts for each data source, I wanted to see if any open-source projects out there could make this task easier. Coincidentally, a number of folks from the Superset community have reached out to me looking for help understanding the differences between two up-and-coming open-source data integration tools we have written about in the past (here and here), both of which I'm very excited about: Airbyte and Meltano. I configured, tested, and compared both of these tools for use in my data ingestion project, and I have some thoughts.

While no single open-source project yet meets all these needs, Airbyte and Meltano both cover enough to be a potentially huge time saver. Data integration tools like these are a still-emerging part of the modern data stack. They are designed to make data ingestion easier by consolidating the core functionality of an ingestion layer into a single tool and providing low-code or no-code ways of manipulating it. There are closed-source SaaS offerings like Fivetran that fit most of my wishlist, but they all require a budget and reduce the portability of your data pipelines by locking them into a closed ecosystem. There are certainly valid reasons to make that trade-off, not the least of which is top-notch support for a long list of high-quality connectors. In the spirit of Preset's values, my intention is instead to build a reusable solution on freely available open-source tools that can serve as a reference for organizations looking to collect and analyze data about their own communities.
