1
5.2kviews
Explain K Means clustering algorithm? Apply K Means algorithms for the following data set with two clusters. Data set = {1, 2, 6, 7, 8, 10, 15, 17, 20}
1 Answer
0
451views

K Means Clustering

  • K-Means clustering is an unsupervised iterative clustering technique.
  • It partitions the given data set into k predefined distinct clusters.
  • A cluster is defined as a collection of data points exhibiting certain similarities.

K Means Clustering

  • It partitions the data set in such a way that -
    • Each data point belongs to a cluster with the nearest mean.
    • Data points belonging to one cluster have a high degree of similarity.
    • Data points belonging to different clusters have a high degree of dissimilarity.

Algorithm for K-Means Clustering -

Step 1 -

  • Choose the number of clusters K.

Step 2 -

  • Randomly select any K data points as cluster centers.
  • Select cluster centers in such a way that they are as farther as possible from each other.

Step 3 -

  • Calculate the distance between each data point and each cluster center.
  • The distance may be calculated either by using the given Distance Function or by using Euclidean Distance.

Step 4 -

  • Assign each data point to some cluster.
  • A data point is assigned to that cluster whose center is nearest to that data point.

Step 5 -

  • Re-compute the center of newly formed clusters.
  • The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

Step 6 -

  • Keep repeating the procedure from Step 3 to Step 5 until any of the following stopping criteria is met:
    • Center of newly formed clusters does not change.
    • Data points remain present in the same cluster.
    • A maximum number of iterations are reached.

The given Data set = {1, 2, 6, 7, 8, 10, 15, 17, 20}

K = 2


Iteration - 1

Step 1 -

  • Randomly select cluster centers in such a way that they are as farther as possible from each other.
  • Therefore,

$$m1 = 6, m2 = 17$$

Step 2 -

  • Calculate the distance between each data point and each cluster center.
  • A data point is assigned to that cluster whose center is nearest to that data point.
Data Points Distance from center m1 = 6 for Cluster 1 Distance from center m2 = 17 for Cluster 2 Data Points belongs to cluster
1 |6 - 1| = 5 |17 - 1| = 16 K1
2 |6 - 2| = 4 |17 - 2| = 15 K1
6 |6 - 6| = 0 |17 - 6| = 11 K1
7 |7 - 6| = 1 |17 - 7| = 10 K1
8 |8 - 6| = 2 |17 - 8| = 9 K1
10 |10 - 6| = 4 |17 - 10| = 7 K1
15 |15 - 6| = 9 |17 - 15| = 2 K2
17 |17 - 6| = 11 |17 - 17| = 0 K2
20 |20 - 6| = 14 |20 - 17| = 3 K2
  • Therefore, two clusters K1 and K2 can be formed as follows:

$$ K1 = \{1, 2, 6, 7, 8, 10\}$$

$$K2 = \{15, 17, 20\}$$

Step 3 -

  • Re-compute the center of newly formed clusters.
  • The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
  • Therefore,

$$m1 = \frac {(1 + 2 + 6 + 7 + 8 + 10)}{6} = \frac {34}{6} = 5.67$$

$$m2 = \frac{(15 + 17 + 20)}{3} = \frac{52}{3} = 17.34$$


Iteration - 2

Step 4 -

  • Repeat the procedures of Step 2 and Step 3.
  • Therefore,
Data Points Distance from center m1 = 5.67 for Cluster 1 Distance from center m2 = 17.34 for Cluster 2 Data Points belongs to cluster
1 |5.67 - 1| = 4.67 |17.34 - 1| = 16.34 K1
2 |5.67 - 2| = 3.67 |17.34 - 2| = 15.34 K1
6 |6 - 5.67| = 0.33 |17.34 - 6| = 11.34 K1
7 |7 - 5.67| = 1.33 |17.34 - 7| = 10.34 K1
8 |8 - 5.67| = 2.33 |17.34 - 8| = 9.34 K1
10 |10 - 5.67| = 4.33 |17.34 - 10| = 7.34 K1
15 |15 - 5.67| = 9.33 |17.34 - 15| = 2.34 K2
17 |17 - 5.67| = 11.33 |17.34 - 17| = 0.34 K2
20 |20 - 5.67| = 14.33 |20 - 17.34| = 2.66 K2
  • Therefore, we again get the two similar clusters K1 and K2 as follows:

$$ K1 = \{1, 2, 6, 7, 8, 10\}$$

$$K2 = \{15, 17, 20\}$$

  • Center of these newly formed clusters is also the same as the previous one.

$$m1 = \frac {(1 + 2 + 6 + 7 + 8 + 10)}{6} = \frac {34}{6} = 5.67$$

$$m2 = \frac{(15 + 17 + 20)}{3} = \frac{52}{3} = 17.34$$


  • Here we stopped after the 2 - Iterations because

    • The Center of newly formed clusters does not change
    • Data points remain present in the same clusters.
  • After 2 - Iterations we get the 2 - Clusters with their Center Points are as follows:

$$ K1 = \{1, 2, 6, 7, 8, 10\} with\ center\ m1 = 5.67$$

$$K2 = \{15, 17, 20\} with\ center\ m2 = 17.34$$

Please log in to add an answer.