The K Means algorithm is by far the most popular, by far the most widely used clustering algorithm, and in this video I would like to tell you what the K Means Algorithm is and how it works. The K means clustering algorithm is best illustrated in pictures.

In case of k-means clustering, the curse of dimensionality results in difficulty in clustering data due to vast data space. For example, with Euclidean space as a proximity measure, two data points that may be very dissimilar could be grouped together because, due to too many dimensions, somehow, their net distance from the centroid is comparable.

K-Means finds the best centroids by alternating between (1) assigning data points to clusters based on the current centroids (2) chosing centroids (points which are the center of a cluster) based on the current assignment of data points to clusters. Figure 1: K-means algorithm.

In k-means clustering, each cluster is represented by its center (i.e, centroid) which corresponds to the mean of points assigned to the cluster.

The cluster centers are also called means because it can be shown that, when the clustering is optimal, the centers are the means of the corresponding data points. The cluster centers and assignments can be visualized as follows.

Hi We will start with understanding how k-NN, and k-means clustering works. K-Nearest Neighbors (K-NN) k-NN is a supervised algorithm used for classification. What this means is that we have some labeled data upfront which we provide to the model.

Clustering and the K-means algorithm Yihui Saw 18.304 Seminar Talk I March 6, 2013 Saturday, March 16, 13.