Explain various attribute selection measures
1 Answer

Attribute Selection Measures

  • An attribute selection measure is a heuristic for selecting the splitting criterion that “best” separates a given data partition, D, of class-labeled training tuples into individual classes.
  • It determines how the tuples at a given node are to be split.
  • The attribute selection measure provides a ranking for each attribute describing the given training tuples.
  • Three measures are commonly used for attribute selection:

    • Information Gain
    • Gain Ratio
    • Gini Index

Information Gain

  • Information gain is used to select the splitting attribute at each node of the decision tree.
  • It is based on entropy: each split aims to reduce the entropy of the class labels, from the root node down to the leaf nodes.
  • The attribute with the highest information gain is chosen as the splitting attribute for the current node.
  • It is biased toward multi-valued attributes, that is, attributes with many distinct values.
  • The information gain on attribute A is the mutual information between the attribute Class and attribute A.
  • It is defined as follows:

$$ Information\ Gain\ (A) = H(Class) - H(Class | A) $$
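
The formula above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names `entropy` and `information_gain` and the toy columns `play`, `outlook`, and `windy` are made up for this example, and entropy is measured in bits (log base 2).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(attribute_values, class_labels):
    """H(Class) - H(Class | A) for a single attribute A."""
    total = len(class_labels)
    # Group the class labels by the value that A takes on each tuple.
    groups = {}
    for value, label in zip(attribute_values, class_labels):
        groups.setdefault(value, []).append(label)
    # H(Class | A): entropy of each group, weighted by the group's share of tuples.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(class_labels) - conditional

# Hypothetical toy data: 'outlook' separates the classes perfectly, 'windy' not at all.
play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["true", "false", "true", "false"]

print(information_gain(outlook, play))
print(information_gain(windy, play))
```

On this toy data `outlook` receives the higher gain, so it would be chosen as the splitting attribute for the node.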

Gain Ratio

  • The gain ratio normalizes information gain to reduce its bias toward multi-valued attributes.
  • It tends to prefer unbalanced splits, in which one partition is much smaller than the others.
  • The gain ratio on attribute A is the ratio of the information gained on A to the expected information (entropy) of A itself, which normalizes the uncertainty across attributes.
  • It is defined as follows:

$$ Gain\ Ratio\ (A) = \frac {H(Class) - H(Class | A)}{ H(A)} $$
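
As a rough sketch of this formula, the snippet below divides the information gain by H(A). The names `gain_ratio` and `tuple_id` and the toy data are assumptions made for illustration. Returning 0 when H(A) is zero is a choice made here to avoid division by zero, not something stated in the definition above.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy of a list of values, in bits."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(attribute_values, class_labels):
    """(H(Class) - H(Class | A)) / H(A) for a single attribute A."""
    total = len(class_labels)
    groups = {}
    for value, label in zip(attribute_values, class_labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(class_labels) - conditional
    split_info = entropy(attribute_values)  # H(A)
    # Assumption: an attribute with a single value (H(A) = 0) scores 0.
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: 'tuple_id' is unique per tuple, 'outlook' has two values.
play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
tuple_id = ["1", "2", "3", "4"]

print(gain_ratio(outlook, play))
print(gain_ratio(tuple_id, play))
```

Both attributes have the same information gain on this data, but the identifier-like attribute has a larger H(A), so its gain ratio is lower. This is the normalizing effect described above.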

Gini Index

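  • The Gini index considers a binary split for each attribute.
  • It tends to favor splits that produce partitions of equal size.
  • The attribute whose split leaves the minimum weighted impurity, which is equivalently the maximum reduction in impurity as defined below, is selected as the splitting attribute.
  • Like information gain, it is biased toward multi-valued attributes.
  • It has difficulty when the number of classes is large.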
  • The Gini function measures the impurity of an attribute with respect to classes.
  • The impurity function is defined as follows, where p_i is the proportion of tuples that belong to class i:

$$ Gini\ (Class) = 1 - \sum p_i^2 $$
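
As a worked example, assume a hypothetical partition of 14 tuples in which 9 belong to one class and 5 to the other (these counts are chosen only for illustration). The impurity is then:

$$ Gini\ (Class) = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = 1 - \frac{81}{196} - \frac{25}{196} = \frac{90}{196} \approx 0.459 $$

A pure partition (all tuples in one class) gives 0, and an even two-class partition gives 0.5.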

  • The Gini index of A, defined below, is the difference between the impurity of Class and the weighted average impurity of the partitions induced by A; it represents the reduction in impurity obtained by splitting on attribute A.
  • The Gini index is defined as follows:

$$ Gini\ Index\ (A) = Gini\ (Class) - \sum_{j = 0}^m P(c_j )\ Gini\ (A = c_j ) $$
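
The reduction in impurity can be sketched as follows. This is a minimal illustration under stated assumptions: the names `gini` and `gini_index` and the toy data are hypothetical, and P(c_j) is estimated as the fraction of tuples for which A takes the value c_j.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum(p_i^2) of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_index(attribute_values, class_labels):
    """Impurity of Class minus the weighted impurity after splitting on A."""
    total = len(class_labels)
    # Group the class labels by the value that A takes on each tuple.
    groups = {}
    for value, label in zip(attribute_values, class_labels):
        groups.setdefault(value, []).append(label)
    # Weight each partition's impurity by its share of the tuples.
    weighted = sum(len(g) / total * gini(g) for g in groups.values())
    return gini(class_labels) - weighted

# Hypothetical toy data: 'outlook' yields pure partitions, 'windy' yields mixed ones.
play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["true", "false", "true", "false"]

print(gini_index(outlook, play))
print(gini_index(windy, play))
```

With this definition a larger value means a larger reduction in impurity, so `outlook` would be preferred over `windy` on this data.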
