The Climate Data Science Lab (funded by the Gordon and Betty Moore Foundation) was founded to create a positive feedback loop between practicing clima

Dask.distributed and Pangeo: Better performance for everyone thanks to science / software collaboration

submited by
Style Pass
2024-10-15 08:00:05

The Climate Data Science Lab (funded by the Gordon and Betty Moore Foundation) was founded to create a positive feedback loop between practicing climate scientists and developers of open-source scientific software. Motivated by the explosion in the size of scientific datasets needing analysis, the idea was to improve the software tools available in the Pangeo stack by creating a group with people who could both use the software to do scientific research and write the necessary code.

For some software packages, this approach of having scientist-developer contributors has worked pretty well. In particular Xarray and Zarr have both benefited hugely from having a tight loop between users and developers, or even a core development team containing users who are practicing scientists.

Central to the Pangeo vision was the ability for users to easily scale up analysis from data small enough to fit on a single machine to workloads requiring distributed systems, analyzing many terabytes or even petabytes of data. As something of a scientist myself, this would be transformative — all we really want out of computers is to write down our analysis method clearly, and then simply apply it to all of our available data.

Leave a Comment
Related Posts