We design a classification algorithm called Clustering Aware Classification (CAC), to find clusters in data that are tailor made to be easily classifi

shivin9 / CAC

submited by
Style Pass
2021-06-23 11:00:07

We design a classification algorithm called Clustering Aware Classification (CAC), to find clusters in data that are tailor made to be easily classifiable when used as training datasets by classifiers for each underlying subpopulation. CAC is theoretically motivated, efficient, convergent and provably guaranteed to improve the performance of classifiers using the Logistic Loss functions. The CAC framework improves the performance of 9 different Machine Learning classifiers on 5 standard and 1 large real world dataset.

CAC problem setting. Data points (here p1) are selected iteratively and assigned to clusters (here C1) based on the cluster update equations. At testing time, x* is assigned to the cluster that lies nearest to x*.

CAC expects every dataset in its separate folder within the data folder. X.csv denotes the comma-separated data file and y.csv contains the corresponding binary labels for the data points.

Leave a Comment
Related Posts