UMAP is a popular dimension reduction algorithm used in fields like bioinformatics, NLP topic modeling, and ML preprocessing. It works by creating a k

Even Faster and More Scalable UMAP on the GPU with RAPIDS cuML

submited by
Style Pass
2024-11-02 19:00:06

UMAP is a popular dimension reduction algorithm used in fields like bioinformatics, NLP topic modeling, and ML preprocessing. It works by creating a k-nearest neighbors (k-NN) graph, which is known in literature as an all-neighbors graph, to build a fuzzy topological representation of the data, which is used to embed high-dimensional data into lower dimensions. 

RAPIDS cuML already contained an accelerated UMAP, which provided significant speed improvements over the original CPU-based UMAP. As we demonstrate in this post, there was still room for improvement. 

In this post, we explore how to use the new features introduced in RAPIDS cuML 24.10. We also dive into the details of the NN-descent algorithm and the batching process. Finally, we share benchmark results to highlight possible performance gains. By the end of this post, we hope you are excited about the benefits that RAPIDS’ faster and scalable UMAP can provide.

One challenge we faced is that the all-neighbors graph-building phase takes a long time, especially in comparison to the other steps in the UMAP algorithm. 

Leave a Comment