Explain the K-means clustering algorithm. Apply the K-means algorithm to the following data set: {15, 15, 16, 19, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}

K-means Clustering:

In the K-means approach, data objects are grouped into k clusters based on their attributes or features. The number of clusters, k, is an input supplied by the user. K-means is one of the simplest unsupervised learning algorithms.

K-means Algorithm:

$k$: the number of clusters

$n$: the number of sample vectors $x_{1}, x_{2}, \ldots, x_{n}$

$m_{i}$: the mean of the vectors in cluster $i$

1. Assume $k < n$.
2. Make initial guesses for the means $m_{1}, m_{2}, \ldots, m_{k}$.
3. Until there are no changes in any mean:
   a. Use the estimated means to classify the samples into clusters.
   b. For $i = 1$ to $k$, replace $m_i$ with the mean of all of the samples in cluster $i$.

Put another way, the following three steps are repeated until convergence, that is, until no object moves to a different group:

1. Find the centroid coordinates.
2. Find the distance of each object to each centroid.
3. Group the objects based on minimum distance.
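
The loop described above can be sketched in a few lines of Python for one-dimensional data. This is only a minimal illustration, not a reference implementation: the function name `kmeans_1d`, the `max_iter` safety limit, the rule that an empty cluster keeps its old mean, and the toy data are all assumptions made for this example.

```python
def kmeans_1d(points, initial_means, max_iter=100):
    """Plain K-means on 1-D data; returns (means, clusters)."""
    means = list(initial_means)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest mean.
        clusters = [[] for _m in means]
        for x in points:
            nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[nearest].append(x)
        # Update step: replace each mean with the mean of its cluster.
        # An empty cluster keeps its old mean (an assumption of this sketch).
        new_means = [
            sum(c) / len(c) if c else m
            for c, m in zip(clusters, means)
        ]
        if new_means == means:  # no mean changed, so stop
            break
        means = new_means
    return means, clusters


# Hypothetical toy data, chosen only to show the call.
toy_means, toy_clusters = kmeans_1d([1, 2, 3, 10, 11, 12], initial_means=[2, 10])
print(toy_means)     # [2.0, 11.0]
print(toy_clusters)  # [[1, 2, 3], [10, 11, 12]]
```

The stopping test compares the means directly, which mirrors step 3 ("until there are no changes in any mean"); a tolerance-based comparison would be a reasonable alternative for noisier data.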

Advantages:

1. When the number of variables is large, K-means is usually computationally faster than hierarchical clustering, provided k is kept small.

2. K-means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.

Disadvantages:

1. It is difficult to predict the value of k.
2. It does not work well when the clusters are not globular.
3. It does not work well when the clusters in the original data have different sizes and different densities.

Numerical

Given:

Dataset = {15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}

Solution:

Assume K=2

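So two clusters are formed. As an initial guess (from the user's point of view), split the distinct values into a lower and an upper group and pick a starting centroid for each:

Cluster 1 = 15, 16, 19, 20, 21, 22, 28 $\therefore C_1 = 20$ (initial centroid)

Cluster 2 = 35, 40, 41, 42, 43, 44, 60, 61, 65 $\therefore C_2 = 43$ (initial centroid)

These starting values are only guesses (each is the middle value of its listed group). They need not equal the exact cluster means, which are computed in the first update step.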

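For each data point, calculate its distance to $C_1$ and to $C_2$ as the absolute difference between the data point and the centroid.

For example, for the data point 15: $|20 - 15| = 5$ and $|43 - 15| = 28$.

Then choose the minimum value (5) and assign the point to that cluster (cluster 1). The table below lists the result for every data point.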

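| Data point | Distance to $C_1$ = 20 | Distance to $C_2$ = 43 | Cluster assigned |
|---|---|---|---|
| 15 | 5 | 28 | 1 |
| 15 | 5 | 28 | 1 |
| 16 | 4 | 27 | 1 |
| 19 | 1 | 24 | 1 |
| 19 | 1 | 24 | 1 |
| 20 | 0 | 23 | 1 |
| 20 | 0 | 23 | 1 |
| 21 | 1 | 22 | 1 |
| 22 | 2 | 21 | 1 |
| 28 | 8 | 15 | 1 |
| 35 | 15 | 8 | 2 |
| 40 | 20 | 3 | 2 |
| 41 | 21 | 2 | 2 |
| 42 | 22 | 1 | 2 |
| 43 | 23 | 0 | 2 |
| 44 | 24 | 1 | 2 |
| 60 | 40 | 17 | 2 |
| 61 | 41 | 18 | 2 |
| 65 | 45 | 22 | 2 |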
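
The first-iteration table can be reproduced with a short Python loop. This is a sketch: the variable names are illustrative, and sending ties to cluster 1 is an assumption (no tie occurs for this data).

```python
data = [15, 15, 16, 19, 19, 20, 20, 21, 22, 28,
        35, 40, 41, 42, 43, 44, 60, 61, 65]
c1, c2 = 20, 43  # initial centroids used in the table

rows = []
for x in data:
    d1, d2 = abs(x - c1), abs(x - c2)
    # Ties go to cluster 1 here; no tie occurs for this data.
    cluster = 1 if d1 <= d2 else 2
    rows.append((x, d1, d2, cluster))

# One line per data point: point, distance to C1, distance to C2, cluster.
for row in rows:
    print(*row)
```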

The new centroid of each cluster is calculated by taking the average of the data points assigned to it.

| Old centroid | New centroid |
|---|---|
| 20 | 19.5 |
| 43 | 47.88 |
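
Written out, the arithmetic behind the new centroids is as follows, using the cluster assignments from the first iteration:

$$C_1 = \frac{15+15+16+19+19+20+20+21+22+28}{10} = \frac{195}{10} = 19.5$$

$$C_2 = \frac{35+40+41+42+43+44+60+61+65}{9} = \frac{431}{9} = 47.888\ldots$$

The value 47.88 used in this answer is $431/9$ truncated to two decimal places; rounded, it would be 47.89. The difference does not affect any cluster assignment.
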
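
Repeat the distance calculation with the new centroids:

| Data point | Distance to $C_1$ = 19.5 | Distance to $C_2$ = 47.88 | Cluster assigned |
|---|---|---|---|
| 15 | 4.5 | 32.88 | 1 |
| 15 | 4.5 | 32.88 | 1 |
| 16 | 3.5 | 31.88 | 1 |
| 19 | 0.5 | 28.88 | 1 |
| 19 | 0.5 | 28.88 | 1 |
| 20 | 0.5 | 27.88 | 1 |
| 20 | 0.5 | 27.88 | 1 |
| 21 | 1.5 | 26.88 | 1 |
| 22 | 2.5 | 25.88 | 1 |
| 28 | 8.5 | 19.88 | 1 |
| 35 | 15.5 | 12.88 | 2 |
| 40 | 20.5 | 7.88 | 2 |
| 41 | 21.5 | 6.88 | 2 |
| 42 | 22.5 | 5.88 | 2 |
| 43 | 23.5 | 4.88 | 2 |
| 44 | 24.5 | 3.88 | 2 |
| 60 | 40.5 | 12.12 | 2 |
| 61 | 41.5 | 13.12 | 2 |
| 65 | 45.5 | 17.12 | 2 |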
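
No data point changes cluster, so recalculating the centroids gives the same values:

| Old centroid | New centroid |
|---|---|
| 19.5 | 19.5 |
| 47.88 | 47.88 |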

Since the old centroids are the same as the new centroids, stop the iteration.

$\therefore$ The final clusters are $k_1$ = {15, 15, 16, 19, 19, 20, 20, 21, 22, 28} and $k_2$ = {35, 40, 41, 42, 43, 44, 60, 61, 65}.
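
As a quick sanity check on the final answer: with one-dimensional data and two centroids, assigning by minimum distance is the same as splitting the data at the midpoint between the centroids. The sketch below applies that to the final centroids; the variable names are illustrative and the exact value $431/9$ is used for the second centroid.

```python
data = [15, 15, 16, 19, 19, 20, 20, 21, 22, 28,
        35, 40, 41, 42, 43, 44, 60, 61, 65]
c1, c2 = 19.5, 431 / 9  # final centroids

# Points below the midpoint are closer to c1, points above are closer to c2.
boundary = (c1 + c2) / 2
k1 = [x for x in data if x < boundary]
k2 = [x for x in data if x >= boundary]

print(round(boundary, 2))
print(k1)
print(k2)
```

The boundary falls between 28 and 35, which matches the split between the two final clusters.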