Explain K-means clustering algorithm? Apply K-Means algorithm for the following Data Set = { 15, 15, 16, 19, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}

2.3kviews

written 5.0 years ago by

modified 13 months ago by

K-means Clustering:

In K-means approach the data objects are classified based on their attributes or features into k number of clusters. The number of clusters i.e K is an input given by the user. K-means is one of the simplest unsupervised learning algorithms.

K-means Algorithm:

K- number of clusters

n- sample vectors $x_{1},x_{2},....x_{n}$

$m_{i}$- The mean of vectors in cluster i

Assume k < n
Make initial guesses for the means $m_{1}, m_{2},...m_{k}$
Until there are no changes in any mean
Use the estimated means to classify the samples into clusters.

for i = 1 to k

Replace $m_i$ with the mean of all of the samples for cluster i

end_for

end_until
Following three steps are repeated until convergence:
Iterate till no object moves to a different group:
Find the centroid coordinate.
Find the distance of each object to the centroids
Based on minimum distance group the objects.

K-Means Advantages :

If variables are huge, then K-Means most of the times computationally faster than hierarchical clustering, if we keep k smalls.
K-Means produce tighter clusters than hierarchical clustering, especially if the clusters are globular.

K-Means Disadvantages :

Difficult to predict K-Value.
With global cluster, it didn't work well.
It does not work well with clusters (in the original data) of Different size and Different density.

Numerical

Given:

Dataset -{15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65}

Solution:

Assume K=2

So, two clusters are formed (Users point of view) i.e

$Cluster 1$ = 15, 16, 19, 20, 21, 22, 28 $\therefore C_1$ = 20 (mean value)

$Cluster 2$ = 35, 40, 41, 42, 43, 44, 60, 61, 65 $\therefore C_2$ = 43 (mean value)

Below table:

Calculate Cluster 1 and Cluster 2 values, by subtracting the datapoint from $C_1$ and $C_2$

i.e (20 - 15) = 5 and (43 - 15) = 28

then choose min value (5) and assign that cluster.

Data point	Cluster 1	Cluster 2	Cluster assign
15	5	28	1
15	5	28	1
16	4	27	1
19	1	24	1
19	1	24	1
20	0	23	1
20	0	23	1
21	1	22	1
22	2	21	1
28	8	15	1
35	15	8	2
40	20	3	2
41	21	2	2
42	22	1	2
43	23	0	2
44	24	1	2
60	40	17	2
61	41	18	2
65	45	22	2

New centroid is calculated by taking average value of clusters.

Old centroid	New centroid
20	19.5
43	47.88

Data point	Cluster 1	Cluster 2	Cluster assign
15	4.5	32.8	1
15	4.5	32.8	1
16	3.5	31.8	1
19	0.5	28.8	1
19	0.5	28.8	1
20	0.5	27.8	1
20	0.5	27.8	1
21	1.5	26.8	1
22	2.5	25.8	1
28	8.5	19.8	1
35	15.5	12.8	2
40	20.5	7.8	2
41	21.5	6.8	2
42	22.5	5.8	2
43	23.5	4.8	2
44	24.5	3.8	2
60	40.5	12.1	2
61	41.5	13.1	2
65	45.5	17.1	2

Old centroid	New centroid
19.5	19.5
47.88	47.88

Since, Old centriod is same as New centroid stop the iteration.

$\therefore$ Final answer is $k_1$ = {15,15,16,19,19,20,20,21,22,28} $k_2$ = {35,40,41,42,43,44,60,61,65}

ADD COMMENT EDIT

Data point	Cluster 1	Cluster 2	Cluster assign
15	5	28	1
15	5	28	1
16	4	27	1
19	1	24	1
19	1	24	1
20	0	23	1
20	0	23	1
21	1	22	1
22	2	21	1
28	8	15	1
35	15	8	2
40	20	3	2
41	21	2	2
42	22	1	2
43	23	0	2
44	24	1	2
60	40	17	2
61	41	18	2
65	45	22	2

Data point	Cluster 1	Cluster 2	Cluster assign
15	5	28	1
15	5	28	1
16	4	27	1
19	1	24	1
19	1	24	1
20	0	23	1
20	0	23	1
21	1	22	1
22	2	21	1
28	8	15	1
35	15	8	2
40	20	3	2
41	21	2	2
42	22	1	2
43	23	0	2
44	24	1	2
60	40	17	2
61	41	18	2
65	45	22	2

Data point	Cluster 1	Cluster 2	Cluster assign
15	5	28	1
15	5	28	1
16	4	27	1
19	1	24	1
19	1	24	1
20	0	23	1
20	0	23	1
21	1	22	1
22	2	21	1
28	8	15	1
35	15	8	2
40	20	3	2
41	21	2	2
42	22	1	2
43	23	0	2
44	24	1	2
60	40	17	2
61	41	18	2
65	45	22	2