nim Composition Space Optimization is a high-performance tool implementing several methods for selecting components (data dimensions) in compositional

Search code, repositories, users, issues, pull requests...

submited by
Style Pass
2024-10-10 00:30:03

nim Composition Space Optimization is a high-performance tool implementing several methods for selecting components (data dimensions) in compositional datasets, which optimize the data availability and density for applications such as machine learning (ML) given a constraint on the number of components to be selected, so that they can be designed in a way balancing their accuracy and domain of applicability. Making said choice is a combinatorically hard problem when data is composed of a large number of independent components due to the interdependency of components being present. Thus, efficiency of the search becomes critical for any application where interaction between components is of interest in a modeling effort, ranging:

We are particularily interested in the latter case of materials science, where we utilize nimCSO to optimize ML deployment over our datasets on Compositionally Complex Materials (CCMs) which are largest ever collected (from almost 550 publications) spanning up to 60 dimensions and developed within the ULTERA Project (ultera.org) carried under the US DOE ARPA-E ULTIMATE program which aims to develop a new generation of ultra-high temperature materials for aerospace applications, through generative machine learning models 10.20517/jmi.2021.05 driving thermodynamic modeling and experimentation 10.2139/ssrn.4689687.

At its core, nimCSO leverages the metaprogramming ability of the Nim language to optimize itself at the compile time, both in terms of speed and memory handling, to the specific problem statement and dataset at hand based on a human-readable configuration file. As demonstrated later in benchamrks, nimCSO reaches the physical limits of the hardware (L1 cache latency) and can outperform an efficient native Python implementation over 400 times in terms of speed and 50 times in terms of memory usage (not counting interpreter), while also outperforming NumPy implementation 35 and 17 times, respectively, when checking a candidate solution.

Leave a Comment