Explain various attribute selection measures

## Attribute Selection Measures

• An attribute selection measure is a heuristic for choosing the splitting criterion that “best” separates a given data partition, D, of class-labeled training tuples into individual classes.
• It determines how the tuples at a given node are to be split.
• The attribute selection measure provides a ranking for each attribute describing the given training tuples.
• Three measures are commonly used for attribute selection:

• Information Gain
• Gain Ratio
• Gini Index

## Information Gain

• Information gain is used to select the splitting attribute at each node of the decision tree.
• It is based on entropy: each split aims to reduce the entropy of the class labels, moving from the root node toward the leaf nodes.
• The attribute with the highest information gain is chosen as the splitting attribute for the current node.
• It is biased toward multi-valued attributes, that is, attributes with many distinct values.
• The information gained on attribute A is the mutual information between the attribute Class and attribute A.
• It is defined as follows:

$$Information\ Gain\ (A) = H(Class) - H(Class | A)$$
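The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`entropy`, `conditional_entropy`, `information_gain`) and the toy `outlook`/`play` data are assumptions made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(X): Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """H(Class | A): entropy of the labels inside each partition of A,
    weighted by the fraction of tuples that fall in that partition."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(values, labels):
    """Information Gain(A) = H(Class) - H(Class | A)."""
    return entropy(labels) - conditional_entropy(values, labels)


# Hypothetical toy data: one attribute column and the class column.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play = ["no", "no", "yes", "yes", "no", "yes"]

print(information_gain(outlook, play))
```

With several candidate attributes, the one with the largest `information_gain` value would be chosen for the current node.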

## Gain Ratio

• The gain ratio corrects the bias of information gain toward multi-valued attributes.
• It tends to prefer unbalanced splits, in which one partition is much smaller than the others.
• The gain ratio on attribute A is the ratio of the information gained on A to the expected information (entropy) of A itself, which normalizes the uncertainty across attributes.
• It is defined as follows:

$$Gain\ Ratio\ (A) = \frac {H(Class) - H(Class | A)}{ H(A)}$$
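As a rough sketch of the normalization, the snippet below compares an ID-like attribute with a three-valued attribute. The function names, the `row_id` column, and the toy data are hypothetical, and returning 0 for a constant attribute (where H(A) = 0) is a choice made for this example only.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """H(Class) - H(Class | A)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by H(A), the entropy of the attribute."""
    split_info = entropy(values)
    if split_info == 0:  # constant attribute: nothing to split on
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy data.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
row_id = [1, 2, 3, 4, 5, 6]  # unique per tuple, useless for prediction
play = ["no", "no", "yes", "yes", "no", "yes"]

for name, column in [("outlook", outlook), ("row_id", row_id)]:
    print(name, information_gain(column, play), gain_ratio(column, play))
```

On this data `row_id` has the higher information gain, because every partition is trivially pure, but the lower gain ratio, because its own entropy H(A) is large.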

## Gini Index

• The Gini index considers a binary split for each attribute.
• It tends to favor splits that produce partitions of equal size.
• The attribute whose split leaves the minimum weighted impurity, and therefore gives the maximum reduction in impurity, is selected as the splitting attribute.
• It is also biased toward multi-valued attributes.
• It has difficulty when the number of classes is large.
• The Gini function measures the impurity of an attribute with respect to classes.
• The impurity function is defined as:

$$Gini\ (Class) = 1 - \sum p_i^2$$

• The Gini index of A, defined below, is the difference between the impurity of Class and the weighted average impurity of the partitions induced by A. It represents the reduction of impurity obtained by choosing attribute A.
• The Gini index is defined as follows:

$$Gini\ Index\ (A) = Gini\ (Class) - \sum_{j = 0}^m P(A = c_j )\ Gini\ (A = c_j )$$
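The two Gini formulas above can be sketched as follows. This is an illustrative sketch only: `gini` and `gini_reduction` are hypothetical helper names, the toy data is invented, and the split here is multiway (one partition per attribute value) to match the formula, rather than the binary split mentioned earlier.

```python
from collections import Counter


def gini(labels):
    """Gini(Class) = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_reduction(values, labels):
    """Impurity of the class labels minus the weighted average impurity
    of the partitions obtained by splitting on each value of A."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Hypothetical toy data: one attribute column and the class column.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play = ["no", "no", "yes", "yes", "no", "yes"]

print(gini(play), gini_reduction(outlook, play))
```

A larger reduction means the partitions are purer, so under this definition the attribute with the largest `gini_reduction` would be preferred.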