
Distributed out-of-memory NMF on CPU/GPU architectures

Submitted by
Style Pass
2023-09-16 19:30:03

We propose an efficient distributed out-of-memory implementation of the non-negative matrix factorization (NMF) algorithm for heterogeneous high-performance-computing systems. The proposed implementation is based on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse matrix operations on multi-node, multi-GPU systems. The resulting algorithm is optimized for out-of-memory problems, where the memory required to factorize a given matrix exceeds the available GPU memory. Memory complexity is reduced by batching/tiling strategies, and sparse and dense matrix operations are significantly accelerated with GPU cores (or tensor cores, when available). Input/output latency associated with batch copies between host and device is hidden using CUDA streams to overlap data transfers and computation asynchronously, and latency associated with collective communications (both intra-node and inter-node) is reduced using optimized communicators based on the NVIDIA Collective Communication Library (NCCL). Benchmark results show a significant improvement, from 32x to 76x speedup, for the new GPU implementation over the CPU-based NMFk. Good weak scaling was demonstrated on up to 4096 multi-GPU cluster nodes (approximately 25,000 GPUs) when decomposing a dense 340-terabyte matrix and an 11-exabyte sparse matrix of density \(10^{-6}\).
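To make the batching/tiling idea concrete, here is a minimal CPU-side sketch in numpy, assuming standard multiplicative NMF updates (the abstract does not specify the update rule, so this is illustrative). The key point is that the row tiles of X can be streamed in one at a time: the H update only needs the accumulated products WᵀX and WᵀW, and each row block of W can be updated using only its own tile of X. The function name `nmf_tiled` and all parameters are hypothetical, not from the paper.

```python
import numpy as np

def nmf_tiled(X, k, batch=64, iters=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF (X ~= W @ H) processing X in row tiles.

    Only one tile of X needs to be resident at a time, mimicking the
    out-of-memory setting where a tile would be copied to the GPU,
    used, and discarded. Illustrative sketch, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    tiles = [slice(i, min(i + batch, m)) for i in range(0, m, batch)]
    for _ in range(iters):
        # H update: accumulate W^T X and W^T W one tile at a time.
        # In a distributed run these partial sums would be reduced
        # across ranks (e.g. with NCCL all-reduce).
        WtX = np.zeros((k, n))
        WtW = np.zeros((k, k))
        for t in tiles:
            WtX += W[t].T @ X[t]
            WtW += W[t].T @ W[t]
        H *= WtX / (WtW @ H + eps)
        # W update: each row block depends only on its own tile of X,
        # so tiles can be processed (or streamed) independently.
        HHt = H @ H.T
        for t in tiles:
            W[t] *= (X[t] @ H.T) / (W[t] @ HHt + eps)
    return W, H
```

In the actual implementation the tiles live in device memory, host-to-device copies are overlapped with compute via CUDA streams, and the cross-node reductions of the WᵀX/WᵀW-style partial sums use NCCL; the sketch above shows only the numerics that make tiling possible.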

NMF is a popular unsupervised learning method that extracts sparse and explainable latent features [1], which are often used to reveal low-dimensional hidden structures that represent and classify the elements of the whole dataset [2]. NMF is used in big-data analysis, which plays a crucial role in many problems, including human health, cyber security, economic stability, emergency response, and scientific discovery. With increased access to data and technology, datasets continue to grow in size and complexity, and at the same time the operational value of the information hidden in their patterns continues to grow in significance. Extracting explainable hidden features from large datasets, whether collected experimentally or computer-generated, is vital because the data presumably carries essential (but often previously unknown) information about the investigated phenomenon's causality, relationships, and mechanisms. Discovering meaningful hidden patterns in data is not a trivial task because datasets are formed only from directly observable quantities, while the underlying processes or features generally remain unobserved, latent, or hidden [3].
