What is a concept hierarchy? How is a concept hierarchy generated for numerical and categorical data?
1 Answer

A concept hierarchy reduces the data by collecting low-level concepts (such as numeric values for the attribute age) and replacing them with higher-level concepts (such as young, middle-aged, or senior).

Concept hierarchies for numeric data can be generated by the following methods, each described below:

  • Binning
  • Histogram analysis
  • Clustering analysis
  • Entropy-based discretization
  • Segmentation by natural partitioning

  • Binning

    • First sort the data and partition it into (equi-depth) bins; the values can then be smoothed by bin means, bin medians, bin boundaries, etc.
  • Histogram analysis

    • A histogram is a popular data reduction technique
    • Divide the data into buckets and store the average (or sum) for each bucket
    • Can be constructed optimally in one dimension using dynamic programming
    • Related to quantization problems.
  • Clustering analysis

    • Partition the data set into clusters and store only the cluster representations
    • Can be very effective if the data is clustered, but not if the data is “smeared”
    • Clustering can be hierarchical, and the clusters can be stored in multi-dimensional index tree structures
  • Entropy-based discretization

    • Given a set of samples S, if S is partitioned into two intervals S1 and S2 using boundary T, the entropy after partitioning is E(T,S) = (|S1|/|S|) Ent(S1) + (|S2|/|S|) Ent(S2)

      – S1 and S2 correspond to the samples in S satisfying the conditions A < v and A ≥ v, respectively, where v is the value of the boundary T on attribute A

    • The boundary that minimizes the entropy function over all possible boundaries is selected as a binary discretization.

    • The process is applied recursively to the resulting partitions until some stopping criterion is met, e.g., until the information gain Ent(S) − E(T,S) no longer exceeds a threshold δ
    • Experiments show that it may reduce data size and improve classification accuracy
  • Segmentation by natural partitioning

    • The 3-4-5 rule can be used to segment numeric data into relatively uniform, “natural” intervals.
    • If an interval covers 3, 6, 7, or 9 distinct values at the most significant digit, partition the range into 3 intervals (equi-width for 3, 6, and 9; grouped 2-3-2 for 7)
    • If it covers 2, 4, or 8 distinct values at the most significant digit, partition the range into 4 equi-width intervals
    • If it covers 1, 5, or 10 distinct values at the most significant digit, partition the range into 5 equi-width intervals
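
The 3-4-5 rule above can be sketched in Python roughly as follows. This is a minimal, single-level sketch, not a definitive implementation: the function name natural_partition and the table EQUAL_WIDTH_PARTS are hypothetical, and it assumes the endpoints are first rounded outward to the unit of the most significant digit before the distinct values are counted.

```python
import math

# Number of equal-width intervals for each count of distinct values
# at the most significant digit (7 is handled separately below).
EQUAL_WIDTH_PARTS = {3: 3, 6: 3, 9: 3, 2: 4, 4: 4, 8: 4, 1: 5, 5: 5, 10: 5}


def natural_partition(low, high):
    """Apply one level of the 3-4-5 rule to the range [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Unit of the most significant digit, e.g. 1000 for 1838 (assumption).
    msd = 10 ** math.floor(math.log10(max(abs(low), abs(high))))
    lo = math.floor(low / msd) * msd   # round low down to that unit
    hi = math.ceil(high / msd) * msd   # round high up to that unit
    distinct = round((hi - lo) / msd)  # distinct values covered at the msd
    if distinct == 7:
        # Three intervals grouped 2-3-2.
        edges = [lo + c * msd for c in (0, 2, 5, 7)]
    elif distinct in EQUAL_WIDTH_PARTS:
        n = EQUAL_WIDTH_PARTS[distinct]
        width = (hi - lo) / n
        edges = [lo + i * width for i in range(n + 1)]
    else:
        # Simplification: counts outside the rule are not handled here.
        raise ValueError(f"the rule does not cover {distinct} distinct values")
    # Return the intervals as (lower edge, upper edge) pairs.
    return list(zip(edges[:-1], edges[1:]))


print(natural_partition(-159, 1838))
```

For the range -159 to 1838 the rounded range is -1000 to 2000, which covers 3 distinct values at the most significant digit, so the sketch returns three intervals with edges -1000, 0, 1000, and 2000. Applying the function again to each interval would give the next, lower level of the hierarchy.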

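The entropy-based discretization step listed above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names entropy, best_boundary, and discretize are hypothetical, candidate boundaries are taken as midpoints between adjacent distinct values, and the recursion continues only while the gain Ent(S) − E(T,S) exceeds the threshold delta.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_boundary(values, labels):
    """Return (T, E(T, S)) for the boundary T with the lowest entropy, or None."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary fits between two equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2   # midpoint (assumption)
        s1 = [lab for _, lab in pairs[:i]]        # samples with A < T
        s2 = [lab for _, lab in pairs[i:]]        # samples with A >= T
        e = len(s1) / n * entropy(s1) + len(s2) / n * entropy(s2)
        if best is None or e < best[1]:
            best = (t, e)
    return best


def discretize(values, labels, delta=0.1):
    """Return sorted boundaries, splitting while the gain exceeds delta."""
    found = best_boundary(values, labels)
    if found is None:
        return []
    t, e = found
    if entropy(labels) - e <= delta:
        return []  # gain too small: stop splitting this partition
    left = [(v, lab) for v, lab in zip(values, labels) if v < t]
    right = [(v, lab) for v, lab in zip(values, labels) if v >= t]
    return discretize(*zip(*left), delta) + [t] + discretize(*zip(*right), delta)


ages = [1, 2, 3, 10, 11, 12]
classes = ["a", "a", "a", "b", "b", "b"]
print(discretize(ages, classes))
```

In this toy input the two classes are perfectly separated between 3 and 10, so the only boundary chosen is the midpoint 6.5; both resulting partitions are pure and are not split further.
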
Concept hierarchy generation for categorical data is as follows:

  • Specification of a set of attributes, but not of their partial ordering

    • Automatically generate the attribute ordering based on the observation that an attribute defining a high-level concept has fewer distinct values than an attribute defining a lower-level concept
    • Example: country (15), state_or_province (365), city (3567), street (674,339), which gives the hierarchy street < city < state_or_province < country
  • Specification of only a partial set of attributes

    • Try to parse the database schema to determine the complete hierarchy
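
The automatic ordering of categorical attributes by number of distinct values, described above, might look like this in Python. It is a small sketch under assumptions: the helper names distinct_counts and order_attributes are hypothetical, rows are assumed to be dictionaries keyed by attribute name, and the counts in the usage lines are the ones from the example above.

```python
def distinct_counts(rows, attributes):
    """Count the distinct values of each attribute over a list of dict rows."""
    return {a: len({row[a] for row in rows}) for a in attributes}


def order_attributes(counts):
    """Order attributes from highest level (fewest distinct values) to lowest."""
    return sorted(counts, key=counts.get)


# Distinct-value counts taken from the example in the answer.
counts = {
    "street": 674339,
    "country": 15,
    "city": 3567,
    "state_or_province": 365,
}
print(" > ".join(order_attributes(counts)))
```

The heuristic is not foolproof, so the generated ordering should be reviewed: an attribute with few distinct values is not always a higher-level concept.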