
You Need An Orchestrator In Your Data Stack

Submitted by
Style Pass
2023-01-24 22:00:05

There seems to be a knowledge gap around Data Orchestration tools such as Airflow and Dagster in the Data Engineering community.

Though most Data Engineers have a surface-level understanding of what these tools do, many don't fully grasp their value: they either leave them out of their stack entirely or use them as dumb script runners.

Scheduling jobs - You can schedule jobs without cron, triggering them on a time schedule or on events such as new data arriving. This moves you from fixed batch runs to more frequent, dynamic pipeline runs.
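
To make the event-driven idea concrete, here is a minimal, standard-library-only Python sketch of a polling "sensor" that requests one pipeline run per newly arrived file. It is not Airflow's or Dagster's API: `detect_new_files`, `run_pipeline` and the temporary landing-zone directory are hypothetical stand-ins for what an orchestrator's own sensors handle for you.

```python
import os
import tempfile


def detect_new_files(directory: str, seen: set[str]) -> list[str]:
    """Hypothetical sensor: return files in `directory` not yet processed.

    Each returned file would trigger one pipeline run; `seen` is updated
    so the same file never triggers a second run.
    """
    current = set(os.listdir(directory))
    new = sorted(current - seen)
    seen.update(new)
    return new


def run_pipeline(path: str) -> str:
    # Placeholder for the real pipeline logic (hypothetical).
    return f"processed {os.path.basename(path)}"


if __name__ == "__main__":
    seen: set[str] = set()
    with tempfile.TemporaryDirectory() as landing_zone:
        # Nothing has arrived yet, so the sensor requests no runs.
        print(detect_new_files(landing_zone, seen))
        # Simulate new data landing in the directory.
        for name in ("orders_2023-01-24.csv", "customers.csv"):
            open(os.path.join(landing_zone, name), "w").close()
        for name in detect_new_files(landing_zone, seen):
            print(run_pipeline(os.path.join(landing_zone, name)))
        # A second poll finds nothing new, so no duplicate runs.
        print(detect_new_files(landing_zone, seen))
```

A real orchestrator adds what this loop lacks: persisted state across restarts, retries, and visibility into which event triggered which run.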

Clean and maintainable code - You can move your code out of proprietary tools and define it in Python, with classes, modules, unit tests and so on. That code can be checked into source control, versioned, and included in CI/CD pipelines.
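
As a rough illustration of pipeline logic living in plain, testable Python, the sketch below wires three hypothetical tasks (`extract`, `transform`, `load`) into a small dependency graph and runs them in order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). In practice the orchestrator would own scheduling and executing such a graph; the point here is only that each task is an ordinary function you can version and unit test.

```python
from graphlib import TopologicalSorter


def extract() -> list[dict]:
    # Hypothetical source rows; a real task would query an API or database.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4"}]


def transform(rows: list[dict]) -> list[dict]:
    """Cast string amounts to floats so downstream tasks get typed data."""
    return [{**row, "amount": float(row["amount"])} for row in rows]


def load(rows: list[dict]) -> float:
    # Stand-in for a warehouse write; returns the total so it is easy to check.
    return sum(row["amount"] for row in rows)


# Hypothetical DAG: each task lists the upstream tasks whose outputs it takes.
TASKS = {"extract": extract, "transform": transform, "load": load}
DEPS = {"extract": [], "transform": ["extract"], "load": ["transform"]}


def run_dag() -> dict:
    """Run every task after its dependencies, passing upstream outputs in."""
    outputs = {}
    for name in TopologicalSorter(DEPS).static_order():
        upstream = [outputs[dep] for dep in DEPS[name]]
        outputs[name] = TASKS[name](*upstream)
    return outputs


def test_transform_casts_amounts():
    # A plain pytest-style unit test that a CI/CD pipeline can run.
    assert transform([{"id": 1, "amount": "2"}]) == [{"id": 1, "amount": 2.0}]


if __name__ == "__main__":
    print(run_dag()["load"])  # 14.5
```

Because the tasks are just functions, the `test_` function can be collected by pytest on every commit, long before the pipeline is deployed.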

Separate environments from logic - You can separate your pipeline logic from environment-specific details such as connections and credentials, making it easier to run the same code in dev, test and prod.
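
A minimal sketch of keeping environment details apart from logic, assuming a hypothetical `PIPELINE_ENV` environment variable selects among per-environment settings. The pipeline function `target_table` is identical everywhere; only the injected config changes. Orchestrators offer their own mechanisms for this (for example, Airflow connections or Dagster resources), which this stdlib example does not try to reproduce.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WarehouseConfig:
    """Environment-specific details, kept out of the pipeline logic."""
    database: str
    schema: str


# Hypothetical per-environment settings; real credentials would live in a
# secrets manager or the orchestrator's own configuration, not in code.
ENVIRONMENTS = {
    "dev": WarehouseConfig(database="analytics_dev", schema="scratch"),
    "test": WarehouseConfig(database="analytics_test", schema="ci"),
    "prod": WarehouseConfig(database="analytics", schema="public"),
}


def load_config(env: Optional[str] = None) -> WarehouseConfig:
    # PIPELINE_ENV is an assumed variable name; falls back to "dev".
    name = env or os.environ.get("PIPELINE_ENV", "dev")
    return ENVIRONMENTS[name]


def target_table(config: WarehouseConfig, table: str) -> str:
    """Pipeline logic: identical in every environment; only config differs."""
    return f"{config.database}.{config.schema}.{table}"


if __name__ == "__main__":
    for env in ("dev", "test", "prod"):
        print(env, target_table(load_config(env), "orders"))
```

Promoting a pipeline from dev to prod then means changing one variable, not editing the code.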
