
Building a Custom DAG Orchestration System for Experimentation


This is the second post in a three-part series on our migration from Airflow to our own custom DAG orchestration system. For more context regarding how we use Airflow, the challenges we faced with it, and the process that led us to decide to write our own replacement, check out this post.

As we set out to build a replacement for Airflow, capable of handling 50,000 concurrent experiments running on the Eppo platform, we identified a number of requirements:

We envisioned a solution built on top of queues that takes inspiration from Airflow’s scheduler and workers, while taking advantage of the fact that we do not need the flexibility of computation environments that Airflow provides. We embrace the monolith at Eppo, and we push as much computation as possible to the data warehouse, which reduces the need to support computational environments other than our NodeJS monolith.

At the heart of every scalable orchestration system is some sort of queue for tracking which tasks need to be executed. We decided to use Redis to store the contents of our queue, as it is blazing fast and battle-tested. We interface with Redis via the BullMQ library, which has a great set of APIs for using queues in Redis.
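To make this concrete, here is a minimal sketch of how a BullMQ queue and worker backed by Redis fit together. The queue name, job name, payload, and connection settings below are illustrative placeholders, not our actual production configuration.

```typescript
import { Queue, Worker } from 'bullmq';

// Connection details for the Redis instance backing the queue (illustrative).
const connection = { host: 'localhost', port: 6379 };

// A queue holding experiment computation tasks waiting to be executed.
const experimentQueue = new Queue('experiment-tasks', { connection });

// Enqueue a task; the payload identifies which experiment to process
// (hypothetical job name and payload shape).
await experimentQueue.add('refresh-experiment', { experimentId: 'exp_123' });

// A worker picks tasks off the queue and runs them. In a setup like ours,
// most of the heavy lifting is pushed down to the data warehouse, so the
// handler would mostly issue SQL and wait for results.
const worker = new Worker(
  'experiment-tasks',
  async (job) => {
    console.log(`Processing ${job.name} for ${job.data.experimentId}`);
  },
  { connection },
);
```

In this model, enqueueing work and executing it are decoupled: any process in the monolith can add a job, and a pool of workers drains the queue at whatever concurrency the warehouse can sustain.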
