Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It was originally developed by the engineering team at Airbnb but was given to the Apache Software Foundation where it is licensed under Apache 2.0.
Airflow is commonly used in data engineering and data science pipelines to automate the execution of tasks, such as data transformation, loading and analysis. It is also used in other industries, such as finance, healthcare and e-commerce, to automate business processes.
Airflow is very flexible with regard to what it can connect to. This includes data lakes, data warehouses, databases, APIs and, of course, object stores. It excels in those use cases that benefit from data-pipelines-as-code, such as:
Airflow is written in Python and uses a directed acyclic graph (DAG) to represent the workflow. Each node in the DAG represents a task, and the edges between the nodes represent dependencies between the tasks. A DAG does not care about the tasks themselves, just the order, number of retries etc. A complex DAG can become brittle and difficult to troubleshoot, particularly if there are dozens of tasks that must be managed by the architect.