K-Means Clustering in Klong

submitted by
Style Pass
2024-02-27 17:30:13

As I continue to spiral out of control with my exploration of array programming, I've recreated some rudimentary "unsupervised machine learning" from scratch.

Because I am not an expert in either K-Means Clustering or Klong, I thought to start with at least one known quantity, the data set. I'll be using Fisher's iris data because it is so widely known and studied. I thought (correctly!) that I might need to fall back to checking my work before moving on to better or more interesting instances of clustering.
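For checking work of this kind, a reference version in a more familiar language can be handy. The following is a minimal sketch of plain k-means (Lloyd's algorithm) in Python, not the Klong implementation. The function name `kmeans`, the toy points, and the hand-picked starting centroids are all assumptions made for illustration. None of it is iris data.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update until nothing moves."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point gets the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: no centroid changed
        centroids = updated
    return centroids, labels

# Hypothetical toy data: two visibly separate blobs in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

Starting centroids are passed in rather than drawn at random so that a run is repeatable, which makes it easier to compare against another implementation step by step.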

The use of this data set in cluster analysis, however, is not common, since the data set only contains two clusters with rather obvious separation. One of the clusters contains Iris setosa, while the other cluster contains both Iris virginica and Iris versicolor and is not separable without the species information Fisher used. This makes the data set a good example to explain the difference between supervised and unsupervised techniques in data mining.

While there is technically a CSV reader in Klong's "standard library", I couldn't work out the necessary incantation to read anything but string values. Instead I wound up with the hacky solution below to prepare the data in a compatible format:
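The original snippet is not reproduced here. As a stand-in, this is a rough sketch in Python of one way such a conversion could work: read the CSV, drop the species column, and emit the measurements as a nested list literal. It assumes that Klong writes lists as space-separated values in square brackets and defines names with `::`. The function name `csv_to_klong`, the variable name `iris`, and the header names in the sample are hypothetical.

```python
import csv
import io

def csv_to_klong(text, name="iris"):
    """Render numeric CSV columns as a Klong-style nested list definition."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if not rec:
            continue  # skip blank lines
        try:
            # Assumption: the last column is the species label, which is dropped.
            nums = [float(x) for x in rec[:-1]]
        except ValueError:
            continue  # skips the header row (and any other non-numeric row)
        rows.append("[" + " ".join(repr(n) for n in nums) + "]")
    return name + "::[" + " ".join(rows) + "]"

# The first two rows of Fisher's iris data, under an assumed header.
sample = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
"""
klong_source = csv_to_klong(sample)
print(klong_source)
```

Generating source text this way sidesteps the string-only CSV reader entirely, at the cost of an extra build step whenever the data changes.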
