Quantile-Based Representative Subsampling (QBRS) is a method for selecting a representative subset from a given superset by dividing the data into quantiles across multiple dimensions. The algorithm takes the quantile boundaries of each dimension, combines them across dimensions, and picks the sample closest to each combination. The resulting subset is intended to capture the overall distribution of the original data across the chosen dimensions.

Calculate the number of quantiles per dimension (q), given the number of dimensions (d), such that q^d is approximately equal to the desired number of samples (n). In this example, q = ⌊n^(1/d)⌋.
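As a rough illustration of this step, the sketch below computes q = ⌊n^(1/d)⌋ in Python. The function name `quantiles_per_dimension` is hypothetical and not taken from the original project. The two correction loops are an added safeguard, since a plain floating-point root can land just below an exact integer (for instance when n is a perfect d-th power).

```python
def quantiles_per_dimension(n: int, d: int) -> int:
    """Return floor(n ** (1/d)) for n >= 1 and d >= 1 (hypothetical helper)."""
    # Start from the rounded floating-point root, then correct with exact
    # integer arithmetic so that q**d <= n < (q + 1)**d.
    q = max(int(round(n ** (1.0 / d))), 1)
    while q > 1 and q ** d > n:
        q -= 1
    while (q + 1) ** d <= n:
        q += 1
    return q


if __name__ == "__main__":
    # 50 desired samples over 2 dimensions: 7 quantiles each, 49 combinations.
    print(quantiles_per_dimension(50, 2))
```

Because q is rounded down, q^d can be noticeably smaller than n when d is large, which is why the text says "approximately equal".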

For each combination of quantile boundaries (one boundary value per dimension), find the sample from the superset that is closest to the combination's boundary values across all dimensions (e.g., using Euclidean distance or another appropriate metric).
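The steps described so far can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the function name `qbrs_sketch` is hypothetical, the quantile levels are assumed to be evenly spaced between 0 and 1, distances are plain Euclidean on unscaled data, and a sample that is nearest to several combinations is kept only once (so the result may contain fewer than q^d samples).

```python
import itertools

import numpy as np


def qbrs_sketch(data: np.ndarray, n: int) -> np.ndarray:
    """Return sorted row indices of up to q**d representative samples.

    `data` has shape (num_samples, d); `n` is the desired subset size.
    """
    data = np.asarray(data, dtype=float)
    _, d = data.shape

    # q = floor(n ** (1/d)), corrected for floating-point rounding.
    q = max(int(round(n ** (1.0 / d))), 1)
    while q > 1 and q ** d > n:
        q -= 1
    while (q + 1) ** d <= n:
        q += 1

    # Assumption: q evenly spaced quantile levels (the median if q == 1).
    levels = np.linspace(0.0, 1.0, q) if q > 1 else np.array([0.5])

    # boundaries[j] holds the q quantile values of dimension j.
    boundaries = np.quantile(data, levels, axis=0).T  # shape (d, q)

    chosen = []
    # Every combination takes one boundary value from each dimension.
    for combo in itertools.product(*boundaries):
        distances = np.linalg.norm(data - np.asarray(combo), axis=1)
        chosen.append(int(np.argmin(distances)))

    # Drop duplicates: one sample may be nearest to several combinations.
    return np.unique(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    superset = rng.normal(size=(1000, 2))
    subset_indices = qbrs_sketch(superset, n=25)
    print(superset[subset_indices])
```

If the dimensions have very different scales, standardizing each column before computing distances is probably worthwhile, since unscaled Euclidean distance lets the widest dimension dominate the nearest-sample search.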

Quantile-Based Representative Subsampling (QBRS) is useful in various domains for selecting a representative subset from a given superset. Some applications include:

Read more github.com/s...