What is clustering? Explain k-means clustering algorithm.
1 Answer
  • Clustering is a data mining technique used to place data elements into related groups without advance knowledge of the group definitions.
  • Clustering partitions a set of data into a set of meaningful sub-classes, called clusters.
  • A cluster is therefore a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters.

[Figure: data points that can be divided into 4 clusters]

  • In this case, we easily identify the 4 clusters into which the data can be divided.

k-means algorithm:

  • K-means clustering is an algorithm that groups (classifies) objects into K groups based on their attributes or features.
  • K is a positive integer, which can be chosen by the user.
  • Define K centroids, one for each of the K clusters, generally placed far away from each other.
  • Then assign each element to the cluster whose centroid is nearest to it.
  • After this first step, recalculate the centroid of each cluster from the elements now in that cluster.
  • Repeat the same method, regrouping the elements based on the new centroids.
  • In each step, the centroids may change and elements may move from one cluster to another.
  • Continue until no element moves from one cluster to another.
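The two repeated steps can also be written compactly. This is only a sketch: it assumes Euclidean distance, which the description above does not specify, and the set notation $C_i$ (the elements currently in cluster $i$) is introduced here for illustration. With $x_j$ a data element and $m_i$ the centroid of cluster $i$:

$$C_i = \{\, x_j : \lVert x_j - m_i \rVert \le \lVert x_j - m_l \rVert \ \text{for all } l = 1, \ldots, K \,\}$$

$$m_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j$$

The first line is the grouping step (each element goes to its nearest centroid) and the second is the centroid update. The process stops when the sets $C_i$ no longer change.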


    k: number of clusters

    n: number of sample feature vectors $x_1, x_2, \ldots, x_n$

    $m_i$: the mean of the vectors in cluster i

    Assume $k < n$

    • Make initial guesses for the means $m_1, m_2, \ldots, m_k$

    • Until there are no changes in any mean

      • Use the estimated means to classify the samples into clusters.

      • For i from 1 to k

        Replace $m_i$ with the mean of all of the samples for cluster i

      • End for
    • End until
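The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the parameters `init_means`, `max_iter` and `seed`, the use of squared Euclidean distance, and the choice to keep the old mean when a cluster ends up empty are all assumptions made here.

```python
import random


def kmeans(samples, k, init_means=None, max_iter=100, seed=0):
    """Cluster `samples` (a list of equal-length tuples) into k groups.

    Returns (means, clusters), where clusters[i] lists the samples
    assigned to means[i].
    """
    if init_means is None:
        # Initial guesses: k samples picked at random (an assumption).
        means = random.Random(seed).sample(samples, k)
    else:
        means = list(init_means)

    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Use the current means to classify the samples into clusters.
        clusters = [[] for _ in range(k)]
        for x in samples:
            dists = [sum((a - b) ** 2 for a, b in zip(x, m)) for m in means]
            clusters[dists.index(min(dists))].append(x)

        # Replace each mean with the mean of the samples in its cluster.
        new_means = []
        for i, members in enumerate(clusters):
            if members:
                new_means.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_means.append(means[i])  # empty cluster: keep old mean

        # Stop once no mean changes.
        if new_means == means:
            break
        means = new_means
    return means, clusters
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2, init_means=[(0, 0), (10, 10)])` separates the three points near the origin from the three points near (10, 10).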

    • Suppose the data for clustering is 2, 4, 10, 12, 3, 20, 30, 11, 25, with k = 2.

    1. Randomly assign the initial means $m_1 = 3$ and $m_2 = 4$.
    2. The numbers closer to $m_1 = 3$ are grouped into cluster $k_1$, and the numbers closer to $m_2 = 4$ are grouped into cluster $k_2$.
    3. Recalculate the mean of each new cluster, then regroup and repeat.
    4. $k_1 = \{2, 3\}$, $k_2 = \{4, 10, 12, 20, 30, 11, 25\}$, $m_1 = 2.5$, $m_2 = 16$
    5. $k_1 = \{2, 3, 4\}$, $k_2 = \{10, 12, 20, 30, 11, 25\}$, $m_1 = 3$, $m_2 = 18$
    6. $k_1 = \{2, 3, 4, 10\}$, $k_2 = \{12, 20, 30, 11, 25\}$, $m_1 = 4.75$, $m_2 = 19.6$
    7. $k_1 = \{2, 3, 4, 10, 11, 12\}$, $k_2 = \{20, 30, 25\}$, $m_1 = 7$, $m_2 = 25$
    8. $k_1 = \{2, 3, 4, 10, 11, 12\}$, $k_2 = \{20, 30, 25\}$
    9. Stop, because the clusters in steps 7 and 8 are the same.
    10. So the final answer is $k_1 = \{2, 3, 4, 10, 11, 12\}$, $k_2 = \{20, 30, 25\}$.
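The iterations above can be checked with a short Python sketch. It uses the nine values that appear in the clusters above (2, 4, 10, 12, 3, 20, 30, 11, 25) and the initial means 3 and 4. The helper name `kmeans_1d_trace` is hypothetical, and two details are assumptions of this sketch: a value equally close to both means goes to the first cluster, and neither cluster is expected to become empty.

```python
def kmeans_1d_trace(data, m1, m2):
    """Run 2-means on a list of numbers and record every iteration.

    Returns a list of (k1, k2, new_m1, new_m2) tuples, one per iteration.
    Assumes neither cluster ever becomes empty.
    """
    history = []
    while True:
        # Group each number with the nearer mean (ties go to cluster 1).
        k1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
        k2 = [x for x in data if abs(x - m1) > abs(x - m2)]
        # Recalculate the mean of each cluster.
        new_m1 = sum(k1) / len(k1)
        new_m2 = sum(k2) / len(k2)
        history.append((k1, k2, new_m1, new_m2))
        # Stop when the means (and hence the clusters) no longer change.
        if (new_m1, new_m2) == (m1, m2):
            return history
        m1, m2 = new_m1, new_m2


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
for k1, k2, m1, m2 in kmeans_1d_trace(data, 3, 4):
    print(sorted(k1), sorted(k2), m1, m2)
```

Each printed line corresponds to one of steps 4 to 8: the two clusters followed by their recalculated means.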
