daskqueue is small python library built on top of Dask and Dask Distributed that implements a very lightweight Distributed Task queue. Think of this l

GitHub - AmineDiro/daskqueue: Distributed Task Queue based Dask

submited by
Style Pass
2023-03-14 15:00:05

daskqueue is small python library built on top of Dask and Dask Distributed that implements a very lightweight Distributed Task queue.

Think of this library as a simpler version of Celery built entirely on Dask. Running Celery on HPC environnment (for instance) is usually very tricky whereas spawning a Dask Cluster is a lot easier to manage, debug and cleanup.

Dask is an amazing library for parallel computing written entirely in Python. It is easy to install and offer both a high level API wrapping common collections (arrays, bags, dataframes) and a low level API for written custom code (Task graph with Delayed and Futures).

For all its greatness, Dask implements a central scheduler (basically a simple tornado eventloop) involved in every decision, which can sometimes create a central bottleneck. This is a pretty serious limitation when trying use Dask in high throughput situation. A simple Task Queue is usually the best approach when trying to distribute millions of tasks.

The daskqueue python library leverages Dask Actors to implement distributed queues with a simple load balancer QueuePool and a Consummer class to consumme message from these queues.

Leave a Comment