Question: What is clustering? Explain the k-means clustering algorithm.


- Clustering is a data mining technique used to place data elements into related groups without advance knowledge of the group definitions.
- Clustering is the process of partitioning a set of data into a set of meaningful subclasses, called clusters.
- A cluster is therefore a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters.

- In this case, we can easily identify the 4 clusters into which the data can be divided.
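To make the "similar within, dissimilar across" idea concrete, the sketch below compares the average pairwise distance inside each of two groups with the average distance between the groups. The two groups are made-up 1-D values (not from the original answer), and absolute difference is assumed as the distance measure.

```python
from itertools import combinations, product
from statistics import mean

def avg_within(cluster):
    """Average distance over all pairs of points inside one cluster."""
    return mean(abs(a - b) for a, b in combinations(cluster, 2))

def avg_between(c1, c2):
    """Average distance over all pairs with one point from each cluster."""
    return mean(abs(a - b) for a, b in product(c1, c2))

# Hypothetical 1-D data forming two well-separated groups
group_a = [1.0, 1.5, 2.0]
group_b = [8.0, 8.5, 9.5]

print("within A:", avg_within(group_a))
print("within B:", avg_within(group_b))
print("between A and B:", avg_between(group_a, group_b))
```

For a good clustering, both "within" values are much smaller than the "between" value.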

**k-means algorithm:**

- K-means clustering is an algorithm that classifies, or groups, objects into K groups based on their attributes or features.
- K is a positive integer, which is chosen by the user.
- Define K centroids, one for each of the K clusters; they are generally chosen far away from each other.
- Then assign each element to the cluster whose centroid is nearest to it.
- After this first step, recalculate the centroid of each cluster from the elements in that cluster.
- Follow the same method again, regrouping the elements based on the new centroids.
- In every step, the centroids change and elements may move from one cluster to another.
- Repeat this process until no element moves from one cluster to another.
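Assuming Euclidean distance, the two repeated steps above (assign, then recompute) can be written as follows, where $C_i$ (notation introduced here) is the set of elements currently in cluster $i$ and $m_i$ is its centroid:

$$x_j \text{ is assigned to cluster } \arg\min_{1 \le i \le K} \lVert x_j - m_i \rVert$$

$$m_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j$$

Neither step can increase the within-cluster sum of squared distances $J = \sum_{i=1}^{K} \sum_{x_j \in C_i} \lVert x_j - m_i \rVert^2$, which is why, with a consistent tie-breaking rule, the process stops changing after finitely many steps.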

**Algorithm:**

- $k$: number of clusters
- $n$: number of sample feature vectors $x_1, x_2, \ldots, x_n$
- $m_i$: the mean of the vectors in cluster $i$
- Assume $k < n$.

Make initial guesses for the means $m_1, m_2, \ldots, m_k$.

Until there are no changes in any mean:

- Use the estimated means to classify the samples into clusters.
- For $i$ from 1 to $k$:
    - Replace $m_i$ with the mean of all of the samples for cluster $i$.
- end for

end until
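A minimal runnable Python sketch of the pseudocode, assuming samples are tuples of numbers and "closest" means Euclidean distance (`math.dist`, Python 3.8+). The function names `k_means` and `assign`, the `max_iter` safety limit, the tie-breaking rule, and keeping the old mean when a cluster becomes empty are illustrative assumptions, not part of the original algorithm statement.

```python
from math import dist  # Euclidean distance between two points

def assign(samples, means):
    """Classify each sample into the cluster of its nearest mean.
    Ties go to the lower-numbered cluster (an assumption)."""
    clusters = [[] for _ in means]
    for x in samples:
        nearest = min(range(len(means)), key=lambda i: dist(x, means[i]))
        clusters[nearest].append(x)
    return clusters

def k_means(samples, initial_means, max_iter=100):
    """Repeat classify/recompute until no mean changes (or max_iter is hit)."""
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iter):
        clusters = assign(samples, means)
        new_means = []
        for i, cluster in enumerate(clusters):
            if cluster:
                # Replace m_i with the coordinate-wise mean of cluster i
                dim = len(cluster[0])
                new_means.append(tuple(
                    sum(p[d] for p in cluster) / len(cluster) for d in range(dim)
                ))
            else:
                new_means.append(means[i])  # assumption: keep mean of an empty cluster
        if new_means == means:  # no change in any mean: stop
            break
        means = new_means
    # Return the means together with the clusters they induce
    return means, assign(samples, means)
```

For example, `k_means([(x,) for x in [2, 4, 10, 12, 3, 20, 30, 11, 25]], [(3,), (4,)])` treats the numbers as 1-D points and returns the means `[(7.0,), (25.0,)]`.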

Suppose the data for clustering is 2, 4, 10, 12, 3, 20, 30, 11, 25, with $k = 2$.

- Randomly choose the initial means $m_1 = 3$ and $m_2 = 4$.
- The numbers that are closer to mean $m_1 = 3$ are grouped into cluster $k_1$, and the numbers that are closer to mean $m_2 = 4$ are grouped into cluster $k_2$.
- Then calculate the new mean of each new cluster and regroup the numbers around the new means:
- $k_1 = \{2, 3\}$, $k_2 = \{4, 10, 12, 20, 30, 11, 25\}$, $m_1 = 2.5$, $m_2 = 16$
- $k_1 = \{2, 3, 4\}$, $k_2 = \{10, 12, 20, 30, 11, 25\}$, $m_1 = 3$, $m_2 = 18$
- $k_1 = \{2, 3, 4, 10\}$, $k_2 = \{12, 20, 30, 11, 25\}$, $m_1 = 4.75$, $m_2 = 19.6$
- $k_1 = \{2, 3, 4, 10, 11, 12\}$, $k_2 = \{20, 30, 25\}$, $m_1 = 7$, $m_2 = 25$
- $k_1 = \{2, 3, 4, 10, 11, 12\}$, $k_2 = \{20, 30, 25\}$
- Stop, because the last two steps produce the same clusters, so the means no longer change.
- So the final answer is $k_1 = \{2, 3, 4, 10, 11, 12\}$, $k_2 = \{20, 30, 25\}$.
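The worked example can be re-run step by step with the short Python script below. It hard-codes two clusters and the starting means $m_1 = 3$, $m_2 = 4$ from the example, uses the nine numbers that appear in the cluster listings (including 30), and sends a number that is equally close to both means to $k_1$. That tie rule is an assumption; no tie actually occurs with this data.

```python
data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
m1, m2 = 3, 4  # initial means from the example

history = []  # (k1, k2) produced at each step
while True:
    # Group each number with the nearer mean (ties go to k1: an assumption)
    k1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
    k2 = [x for x in data if abs(x - m1) > abs(x - m2)]
    history.append((k1, k2))
    # Recompute the means (neither cluster becomes empty for this data)
    new_m1 = sum(k1) / len(k1)
    new_m2 = sum(k2) / len(k2)
    print(f"k1={sorted(k1)} k2={sorted(k2)} m1={new_m1:g} m2={new_m2:g}")
    if (new_m1, new_m2) == (m1, m2):  # means unchanged: clusters are stable
        break
    m1, m2 = new_m1, new_m2
```

Each printed row corresponds to one step of the list above, and the loop ends once the means stop changing.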
