written 5.2 years ago by | modified 10 months ago by |

**Subject:** Data Mining And Business Intelligence

**Topic:** Clustering

**Difficulty:** High

**1 Answer**

0

1.3kviews

Use K-means to cluster the following data set into 3 clusters.

written 5.2 years ago by | modified 10 months ago by |

**Subject:** Data Mining And Business Intelligence

**Topic:** Clustering

**Difficulty:** High

ADD COMMENT
EDIT

0

12views

written 10 months ago by | • modified 10 months ago |

**The Given Dataset =** *{(20, 9), (21, 9), (15, 7), (22, 17), (20, 8), (25, 12), (26, 14), (18, 9)}*

**Number of Clusters = K = 3**

*Step 1 -*

- Randomly select any 3 data points as cluster centers because the number of clusters is 3.
- Select cluster centers in such a way that they are as farther as possible from each other.
- So here we choose 3 random initial cluster centers as

$$C1 = (15, 7), C2 = (22, 17), C3 = (25, 12)$$

*Step 2 -*

- Calculate the distance between each data point and each cluster center.
- The distance may be calculated either by using the
**Distance Function**or by using the**Euclidean distance**formula. - Here, we calculate the distance by using the
between two points*Distance Function*and*a = (x1, y1)*as follows:*b = (x2, y2)*

$$P(a,b)=|x2–x1|+|y2–y1|$$

*Step 3 -*

- Calculate the distance between each data point and each cluster center.
- A data point is assigned to that cluster whose center is nearest to that data point.

Given Points |
Distance from center (15, 7) of Cluster - 1 | Distance from center (22, 17) of Cluster - 2 | Distance from center (25, 12) of Cluster - 3 | Point belongs to Cluster |
---|---|---|---|---|

(20, 9) |
|20 - 15| + |9 - 7| = 7 |
|22 - 20| + |17 - 9| = 10 | |25 - 20| + |12 - 9| = 8 | C1 |

(21, 9) |
|21 - 15| + |9 - 7| = 8 | |22 - 21| + |17 - 9| = 9 | |25 - 21| + |12 - 9| = 7 |
C3 |

(15, 7) |
|15 - 15| + |7 - 7| = 0 |
|22 - 15| + |17 - 7| = 17 | |25 - 15| + |12 - 7| = 15 | C1 |

(22, 17) |
|22 - 15| + |17 - 7| = 17 | |22 - 22| + |17 - 17| = 0 |
|25 - 22| + |17 - 12| = 8 | C2 |

(20, 8) |
|20 - 15| + |8 - 7| = 6 |
|22 - 20| + |17 - 8| = 11 | |25 - 20| + |12 - 8| = 9 | C1 |

(25, 12) |
|25 - 15| + |12 - 7| = 15 | |25 - 22| + |17 - 12| = 8 | |25 - 25| + |12 - 12| = 0 |
C3 |

(26, 14) |
|26 - 15| + |14 - 7| = 18 | |26 - 22| + |17 - 14| = 7 | |26 - 15| + |14 - 12| = 3 |
C3 |

(18, 9) |
|18 - 15| + |9 - 7| = 5 |
|22 - 18| + |17 - 9| = 12 | |25 - 18| + |12 - 9| = 10 | C1 |

- Therefore, three clusters K1, K2, and K3 can be formed as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9)\} $$ $$ K2 = \{(22, 17)\} $$ $$ K3 = \{(21,9), (25, 12), (26, 14)\} $$

*Step 4 -*

- Re-compute the center of newly formed clusters.
- The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

**1] Center for Cluster-1 (K1) -**

$$ X = \frac {(20 + 15 + 20 + 18)}{4} = 18.25 $$

$$ Y = \frac {(9 + 7 + 8 + 9)}{4} = 8.25 $$

Therefore, new **Clsuter Center C1 = (18.25, 8.25)**

**2] Center for Cluster-2 (K2) -**

Cluster-2 (K2) has only 1 data point.

Therefore, new **Clsuter Center C2 = (22, 17)**

**3] Center for Cluster-3 (K3) -**

$$ X = \frac {(21 + 25 + 26)}{3} = 24 $$

$$ Y = \frac {(9 + 12 + 14)}{3} = 11.67 $$

Therefore, new **Clsuter Center C3 = (24, 11.67)**

*This is the completion of Iteration 1.*

*Again Repeat steps 3 and 4 same as performed in Iteration - 1.*

Therefore,

Given Points |
Distance from center (18.25, 8.25) of Cluster - 1 | Distance from center (22, 17) of Cluster - 2 | Distance from center (24, 11.67) of Cluster - 3 | Point belongs to Cluster |
---|---|---|---|---|

(20, 9) |
|20 - 18.25| + |9 - 8.25| = 2.5 |
|22 - 20| + |17 - 9| = 10 | |24 - 20| + |11.67 - 9| = 6.67 | C1 |

(21, 9) |
|21 - 18.25| + |9 - 8.25| = 3.5 |
|22 - 21| + |17 - 9| = 9 | |24 - 21| + |11.67 - 9| = 5.67 | C1 |

(15, 7) |
|18.25 - 15| + |8.25 - 7| = 4.5 |
|22 - 15| + |17 - 7| = 17 | |24 - 15| + |11.67 - 7| = 13.67 | C1 |

(22, 17) |
|22 - 18.25| + |17 - 8.25| = 12.5 | |22 - 22| + |17 - 17| = 0 |
|24 - 22| + |17 - 11.67| = 7.33 | C2 |

(20, 8) |
|20 - 18.25| + |8.25 - 8| = 2 |
|22 - 20| + |17 - 8| = 11 | |24 - 20| + |11.67 - 8| = 7.67 | C1 |

(25, 12) |
|25 - 18.25| + |12 - 8.25| = 10.5 | |25 - 22| + |17 - 12| = 8 | |25 - 24| + |12 - 11.67| = 1.33 |
C3 |

(26, 14) |
|26 - 18.25| + |14 - 8.25| = 13.5 | |26 - 22| + |17 - 14| = 7 | |26 - 24| + |14 - 11.67| = 4.44 |
C3 |

(18, 9) |
|18.25 - 18| + |9 - 8.25| = 1 |
|22 - 18| + |17 - 9| = 12 | |24 - 18| + |11.67 - 9| = 8.67 | C1 |

- Therefore, three clusters K1, K2, and K3 can be formed as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9), (21,9)\} $$ $$ K2 = \{(22, 17)\} $$ $$ K3 = \{(25, 12), (26, 14)\} $$

- Re-compute the center of newly formed clusters.
- The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

**1] Center for Cluster-1 (K1) -**

$$ X = \frac {(20 + 15 + 20 + 18 + 21)}{5} = 18.8 $$

$$ Y = \frac {(9 + 7 + 8 + 9 + 9)}{5} = 8.4 $$

Therefore, new **Clsuter Center C1 = (18.8, 8.4)**

**2] Center for Cluster-2 (K2) -**

Cluster-2 (K2) has only 1 data point.

Therefore, new **Clsuter Center C2 = (22, 17)**

**3] Center for Cluster-3 (K3) -**

$$ X = \frac {(25 + 26)}{2} = 25.5 $$

$$ Y = \frac {(12 + 14)}{2} = 13 $$

Therefore, new **Clsuter Center C3 = (25.5, 13)**

*This is the completion of Iteration 2.*

*Again Repeat steps 3 and 4 same as performed in Iteration - 1.*

Therefore,

Given Points |
Distance from center (18.8, 8.4) of Cluster - 1 | Distance from center (22, 17) of Cluster - 2 | Distance from center (25.5, 13) of Cluster - 3 | Point belongs to Cluster |
---|---|---|---|---|

(20, 9) |
|20 - 18.8| + |9 - 8.4| = 1.8 |
|22 - 20| + |17 - 9| = 10 | |25.5 - 20| + |13 - 9| = 9.5 | C1 |

(21, 9) |
|21 - 18.8| + |9 - 8.4| = 2.8 |
|22 - 21| + |17 - 9| = 9 | |25.5 - 21| + |13 - 9| = 8.5 | C1 |

(15, 7) |
|18.8 - 15| + |8.4 - 7| = 5.2 |
|22 - 15| + |17 - 7| = 17 | |25.5 - 15| + |13 - 7| = 16.5 | C1 |

(22, 17) |
|22 - 18.8| + |17 - 8.4| = 11.8 | |22 - 22| + |17 - 17| = 0 |
|25.5 - 22| + |17 - 13| = 7.5 | C2 |

(20, 8) |
|20 - 18.8| + |8.4 - 8| = 1.6 |
|22 - 20| + |17 - 8| = 11 | |25.5 - 20| + |13 - 8| = 10.5 | C1 |

(25, 12) |
|25 - 18.8| + |12 - 8.4| = 9.8 | |25 - 22| + |17 - 12| = 8 | |25.5 - 25| + |13 - 12| = 1.5 |
C3 |

(26, 14) |
|26 - 18.8| + |14 - 8.4| = 12.8 | |26 - 22| + |17 - 14| = 7 | |26 - 25.5| + |14 - 13| = 1.5 |
C3 |

(18, 9) |
|18.8 - 18| + |9 - 8.4| = 1.4 |
|22 - 18| + |17 - 9| = 12 | |25.5 - 18| + |13 - 9| = 11.5 | C1 |

- Therefore, three clusters K1, K2, and K3 can be formed as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9), (21,9)\} $$ $$ K2 = \{(22, 17)\} $$ $$ K3 = \{(25, 12), (26, 14)\} $$

- Re-compute the center of newly formed clusters.
- The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

**1] Center for Cluster-1 (K1) -**

$$ X = \frac {(20 + 15 + 20 + 18 + 21)}{5} = 18.8 $$

$$ Y = \frac {(9 + 7 + 8 + 9 + 9)}{5} = 8.4 $$

Therefore, new **Clsuter Center C1 = (18.8, 8.4)**

**2] Center for Cluster-2 (K2) -**

Cluster-2 (K2) has only 1 data point.

Therefore, new **Clsuter Center C2 = (22, 17)**

**3] Center for Cluster-3 (K3) -**

$$ X = \frac {(25 + 26)}{2} = 25.5 $$

$$ Y = \frac {(12 + 14)}{2} = 13 $$

Therefore, new **Clsuter Center C3 = (25.5, 13)**

*This is the completion of Iteration 3.*

Here we stopped after the 3 - Iterations because

*The Center of newly formed clusters does not change**Data points remain present in the same clusters.*

After 3 - Iterations we get the 3 - Clusters with their Center Points are as follows:

$$ K1 = \{(20, 9), (15, 7), (20, 8), (18, 9), (21,9)\}\ with\ Center\ C1 = (18.8, 8.4) $$ $$ K2 = \{(22, 17)\}\ with\ Center\ C2 = (22, 17) $$ $$ K3 = \{(25, 12), (26, 14)\} \ with\ Center\ C3 = (25.5, 13) $$

*Important Point -*

*Here, we see Cluster K2 holds only one data point.**Such, 1 data point clusters occur quite frequently in k-means because of dirty data.**That indicates K-means is highly sensitive to noise.**K-means minimizes squared errors by assigning outlier points to their own cluster and gives "optimal" results for the squared errors objective.**So this is often the correct result.**This makes k-means sensitive to the noises.*

ADD COMMENT
EDIT

Please log in to add an answer.