0
1.4kviews
Assume that the database D is given by the table below. Follow single link technique to find clusters in D. Use Euclidean distance measure.

Subject: Data Mining And Business Intelligence

Topic: Clustering

Difficulty: High

1
41views

• Single linkage technique comes under Hierarchical Agglomerative Clustering (HAC).
• It follows bottom-up approaches.
• This technique starts with every single data point as a single cluster.
• Then similar clusters are successively merged until all clusters have merged into one cluster and the result is represented in Dendrogram.

Let's solve the given problem using the Euclidean Distance.

$$Euclidean\ Distance [(x,y),(a,b)] = \sqrt{ (x - a)^2 + (y - b)^2}$$

Step 1 - Plot the given dataset in graph format for better understanding.

Step 2 - Create a Distance Matrix by using the Euclidean Distance formula.

$$Distance\ (P1,P2) = \sqrt {(0.40 - 0.22)^2 + (0.53 - 0.38)^2}$$

$$= \sqrt {(0.18)^2 + (0.15)^2}$$

$$= \sqrt {0.0324 + 0.0225}$$

$$= \sqrt {0.0549}$$

$$= 0.23$$

Similarly, find out the distance for the (P1,P3), (P1,P4), (P1,P5),(P1,P6)

Distance (P1,P3) = 0.22

Distance (P1,P4) = 0.37

Distance (P1,P5) = 0.34

Distance (P1,P6) = 0.23

$$Distance\ (P2,P3) = \sqrt {(0.22 - 0.35)^2 + (0.38 - 0.32)^2}$$

$$= \sqrt {(-0.13)^2 + (0.06)^2}$$

$$= \sqrt {0.0169 + 0.0036}$$

$$= \sqrt {0.0205}$$

$$= 0.15$$

Similarly, find out the distance for the (P2,P4), (P2,P5),(P2,P6)

Distance (P2,P4) = 0.20

Distance (P2,P5) = 0.14

Distance (P2,P6) = 0.25

$$Distance\ (P3,P4) = \sqrt {(0.35 - 0.26)^2 + (0.32 - 0.19)^2}$$

$$= \sqrt {(0.09)^2 + (0.13)^2}$$

$$= \sqrt {0.0081 + 0.0169}$$

$$= \sqrt {0.025}$$

$$= 0.15$$

Similarly, find out the distance for the (P3,P5),(P3,P6)

Distance (P3,P5) = 0.28

Distance (P3,P6) = 0.11

$$Distance\ (P4,P5) = \sqrt {(0.26 - 0.08)^2 + (0.19 - 0.41)^2}$$

$$= \sqrt {(0.18)^2 + (-0.22)^2}$$

$$= \sqrt {0.0324 + 0.0484}$$

$$= \sqrt {0.0808}$$

$$= 0.29$$

Similarly, find out the distance for the (P4,P6)

Distance (P4,P6) = 0.22

$$Distance\ (P5,P6) = \sqrt {(0.08 - 0.45)^2 + (0.41 - 0.30)^2}$$

$$= \sqrt {(- 0.37)^2 + (0.11)^2}$$

$$= \sqrt {0.1369 + 0.0121}$$

$$= \sqrt {0.149}$$

$$= 0.39$$

Based on the above-computed values Distance Matrix can be formed as follows:

. P1 P2 P3 P4 P5 P6
P1 0
P2 0.23 0
P3 0.22 0.15 0
P4 0.37 0.20 0.15 0
P5 0.34 0.14 0.28 0.29 0
P6 0.23 0.25 0.11 0.22 0.39 0

Step 3 - Find the minimum value element from the distance matrix.

The minimum value element is (P3,P6) with a value is 0.11 - 1st Cluster (P3,P6)

Step 4 - Now, Recalculate or update the distance matrix for the cluster (P3,P6)

$$Min[dist(point 1, point 2)]$$ $$Min[dist((P3,P6),P1)]$$ $$Min[dist((P3,P1),(P6,P1))]$$ $$Min[dist(0.22,0.23)]$$ $$= 0.22$$

Similarly, perform for P2, P4, and P5.

Therefore, the updated Distance Matrix looks as follows:

. P1 P2 P3,P6 P4 P5
P1 0
P2 0.23 0
P3,P6 0.22 0.15 0
P4 0.37 0.20 0.15 0
P5 0.34 0.14 0.28 0.29 0

Step 5 - Repeat steps 3 and 4.

The minimum value element is (P2,P5) with a value is 0.14 - 2nd Cluster (P2,P5)

Recalculate or update the distance matrix for the cluster (P2,P5)

$$Min[dist(point 1, point 2)]$$ $$Min[dist((P2,P5),P1)]$$ $$Min[dist((P2,P1),(P5,P1))]$$ $$Min[dist(0.23,0.34)]$$ $$= 0.23$$

Similarly, perform for (P3,P6), and P4.

Therefore, the updated Distance Matrix looks as follows:

. P1 P2,P5 P3,P6 P4
P1 0
P2,P5 0.23 0
P3,P6 0.22 0.15 0
P4 0.37 0.20 0.15 0

Step 6 - Repeat steps 3 & 4.

The minimum value element is (P2,P5,P3,P6) with a value is 0.15 - 3rd Cluster (P2,P5,P3,P6)

Here 2 values are the same then the first element is chosen as the minimum value element

Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6)

$$Min[dist(point 1, point 2)]$$ $$Min[dist((P2,P5,P3,P6),P1)]$$ $$Min[dist((P2,P5),P1),((P3,P6),P1))]$$ $$Min[dist(0.23,0.22)]$$ $$= 0.22$$

Similarly, perform for P4.

Therefore, the updated Distance Matrix looks as follows:

. P1 P2,P5,P3,P6 P4
P1 0
P2,P5,P3,P6 0.22 0
P4 0.37 0.15 0

Step 7 - Repeat steps 3 & 4.

The minimum value element is (P2,P5,P3,P6,P4) with a value is 0.15 - 4th Cluster (P2,P5,P3,P6,P4)

Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6,P4)

$$Min[dist(point 1, point 2)]$$ $$Min[dist((P2,P5,P3,P6,P4),P1)]$$ $$Min[dist((P2,P5,P3,P6),P1),(P4,P1)]$$ $$Min[dist(0.22,0.37)]$$ $$= 0.22$$

Therefore, the updated Distance Matrix looks as follows:

. P1 P2,P5,P3,P6,P4
P1 0
P2,P5,P3,P6,P4 0.22 0

Step 8 - Repeat the step 3 & 4.

The minimum value element is (P2,P5,P3,P6,P4,P1) with a value is 0.22 - 5th Cluster (P2,P5,P3,P6,P4,P1)

Recalculate or update the distance matrix for the cluster (P2,P5,P3,P6,P4,P1)

In this step, only 1 value is remaining so it is by default cluster.

Step 9 - Finally, draw the Dendrogram.