0
Clearly explain the working of the DB_SCAN algorithm using appropriate diagrams.

Mumbai University > Information Technology > Sem6 > Data Mining and Business Intelligence

Marks: 10M

Year: May 2015

0

DBSCAN: Density Based Clustering of Applications with noise

• DBSCAN is a density-based algorithm.
• DBSCAN requires two parameters: epsilon (Eps) and minimum points (MinPts).It starts with an arbitrary starting point that has not been visited .It then finds all the neighbour points within distance Eps of the starting point.
• If the number of neighbours is greater than or equal to MinPts, a cluster is formed. The starting point and its neighbours are added to this cluster and the starting point is marked as visited. The algorithm then repeats the evaluation process for all the neighbours recursively.
• If the number of neighbours is less than MinPts, the point is marked as noise.

• If a cluster is fully expanded (all points within reach are visited) then the algorithm proceeds to iterate through the remaining unvisited points in the dataset.

• Major features:

Discover clusters of arbitrary shape

Handle noise

One scan

Need density parameters

• Basic concept:

For any cluster we have:

• A central point (p) i.e. core point

• A distance from the core point(Eps)

• Minimum number of points within the specified distance (MinPts)

DBSCAN Algorithm

1. Create a graph whose nodes are the points to be clustered
2. For each core-point c create an edge from c to every point p in the -neighborhood of c
3. Set N to the nodes of the graph;
4. If N does not contain any core points terminate
5. Pick a core point c in N
6. Let X be the set of nodes that can be reached from c by going forward;

a. create a cluster containing X$\cup${c}

b. N=N/(X $\cup$ {c})

7. Continue with step 4

• MinPts: Minimum number of points in any cluster
• $\epsilon$ : For each point in cluster there must be another point in it less than this distance away.

• -neighborhood: Points within  distance of appoint

• N $\epsilon$(p) :{q belongs to D |dist(p,q) <=$\epsilon$)
• Core point: $\epsilon$- neighborhood dense enough (MinPts)
• Conditions: p belongs to N $\epsilon$(q)

|N$\epsilon$(q)| > = MinPts