What is hierarchical clustering? Explain any two techniques for finding distance between the clusters in hierarchical clustering

Clustering

  • Clustering is an unsupervised Machine Learning technique that searches for similarity and relationship patterns among data samples and then groups similar samples into clusters based on a similarity measure.
  • This clustering mechanism is further classified into various types, such as Density-based, Hierarchical, Partitioning-based, and Grid-based; a minimal sketch comparing some of these families is shown below.
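
As a quick illustration, the snippet below is a minimal sketch (assuming scikit-learn and its toy make_blobs data; the parameter values are illustrative, not prescriptive) that runs a partitioning, a density-based, and a hierarchical algorithm on the same unlabeled samples.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

# Unlabeled toy data: 150 samples drawn around 3 centres
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=42)

labels_partitioning = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)  # Partitioning
labels_density      = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)                    # Density-based
labels_hierarchical = AgglomerativeClustering(n_clusters=3).fit_predict(X)             # Hierarchical

print(labels_partitioning[:10], labels_density[:10], labels_hierarchical[:10])
```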

Hierarchical Clustering

  • This type of clustering groups together the unlabeled data points having similar characteristics.
  • Hierarchical clustering treats every data point as a separate cluster.
  • Then, it repeatedly executes two steps: identify the two clusters that are closest together, and merge these two most similar clusters.
  • This process continues until all the clusters have been merged into one.
  • Hence, this method creates a hierarchical decomposition of the given set of data objects.
  • Based on how the hierarchical decomposition is formed, this clustering is further classified into two types:
    • Agglomerative Approach
    • Divisive Approach
  • Hierarchical clustering typically works by sequentially merging similar clusters. This is known as agglomerative hierarchical clustering.
  • In theory, it can also be done by initially grouping all the observations into one cluster, and then successively splitting these clusters. This is known as divisive hierarchical clustering.
  • Divisive clustering is rarely done in practice; a minimal code sketch of the bottom-up (agglomerative) variant follows this list.
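
The sketch below (assuming SciPy is installed) shows this bottom-up merging: linkage() records the sequence of merges and fcluster() cuts the resulting hierarchy into flat clusters. The method argument (e.g., 'single' or 'complete') chooses how the distance between two clusters is measured.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])

# Bottom-up merge history: each row records which two clusters were merged
# and at what distance. method= selects the between-cluster distance
# ('single' = closest pair of points, 'complete' = farthest pair of points).
Z = linkage(X, method='single')

# Cut the hierarchy into (at most) 3 flat clusters
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)
```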

Agglomerative Approach

  • This approach is also known as the Bottom-Up Approach.
  • This approach starts with each object forming a separate group.
  • It keeps on merging the objects or groups that are close to one another.
  • It keeps on doing so until all of the groups are merged into one or until the termination condition holds.
  • Algorithm for Agglomerative Hierarchical Clustering is:
    • Step 1 - Consider every data point as an individual cluster.
    • Step 2 - Calculate the similarity (proximity) of each cluster with all the other clusters, i.e., compute the proximity matrix.
    • Step 3 - Merge the two clusters that are closest (most similar) to each other.
    • Step 4 - Recalculate the proximity matrix for the new set of clusters.
    • Step 5 - Repeat Steps 3 and 4 until only a single cluster remains; a rough sketch of these steps is given after this list.
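
The rough sketch below follows these five steps directly in plain NumPy (illustrative only, and assuming single linkage, i.e., the proximity of two clusters is taken as the smallest pairwise distance between their points).

```python
import numpy as np

def agglomerative(points, n_clusters=1):
    clusters = [[i] for i in range(len(points))]              # Step 1: one cluster per point
    while len(clusters) > n_clusters:                          # Step 5: repeat until done
        best = (None, None, np.inf)
        for a in range(len(clusters)):                         # Steps 2 & 4: proximity between clusters
            for b in range(a + 1, len(clusters)):
                # single-linkage proximity: smallest pairwise distance between the two clusters
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] = clusters[a] + clusters[b]                # Step 3: merge the closest pair
        del clusters[b]
    return clusters

X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]])
print(agglomerative(X, n_clusters=2))   # e.g. [[0, 1], [2, 3]]
```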

Divisive Approach

  • This approach is also known as the Top-Down Approach.
  • This approach starts with all of the objects in the same cluster.
  • In each successive iteration, a cluster is split up into smaller clusters.
  • This is done until each object is in its own cluster or the termination condition holds.
  • This method is rigid, i.e., once a merging or splitting is done, it can never be undone; a short sketch of the top-down idea is given below.
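
As an illustration of the top-down idea, the sketch below (assuming scikit-learn for the 2-means split; bisecting k-means is just one possible way to choose a split, not the only one) starts with every point in one cluster and repeatedly bisects the largest cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(points, n_clusters=3):
    clusters = [list(range(len(points)))]                      # start: one cluster with everything
    while len(clusters) < n_clusters:
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(largest)                            # split the largest cluster
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points[idx])
        clusters.append([i for i, l in zip(idx, labels) if l == 0])
        clusters.append([i for i, l in zip(idx, labels) if l == 1])
    return clusters

X = np.random.RandomState(0).rand(12, 2)
print(divisive(X, n_clusters=3))
```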