Why is tree pruning useful in decision tree induction?


written 10.3 years ago by

teamques10 ★ 70k

When decision trees are built, many of the branches may reflect noise or outliers in the training data.

Tree pruning methods address this problem of overfitting the data.

Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.

Decision trees can suffer from repetition and replication, making them overwhelming to interpret.

Repetition occurs when an attribute is repeatedly tested along a given branch of the tree.

In replication, duplicate subtrees exist within the tree.

These situations can impede the accuracy and comprehensibility of a decision tree.
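
As a rough illustration of the two problems above, the following sketch detects repetition and replication in a toy tree. The tuple encoding, the attribute names, and both helper functions are assumptions made for this example, not part of any particular library.

```python
from collections import Counter

# Hypothetical encoding: an internal node is (attribute, threshold, left, right);
# a leaf is simply a class label string.
tree = (
    "age", 60,
    ("income", 40,
        ("age", 45, "no", "yes"),      # "age" is tested again on this branch: repetition
        ("credit", 1, "no", "yes")),
    ("student", 1,
        ("credit", 1, "no", "yes"),    # identical to the subtree above: replication
        "yes"),
)

def repeated_attributes(node, path=()):
    """Return attributes that are tested more than once on some root-to-leaf path."""
    if not isinstance(node, tuple):
        return set()
    attr, _, left, right = node
    found = {attr} if attr in path else set()
    for child in (left, right):
        found |= repeated_attributes(child, path + (attr,))
    return found

def replicated_subtrees(root):
    """Return internal subtrees that appear more than once in the tree."""
    counts = Counter()

    def visit(node):
        if isinstance(node, tuple):
            counts[node] += 1
            visit(node[2])
            visit(node[3])

    visit(root)
    return [subtree for subtree, n in counts.items() if n > 1]

print(repeated_attributes(tree))
print(replicated_subtrees(tree))
```

Here "age" is reported as a repeated attribute and the "credit" subtree as a replicated one.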

Pruned trees

• These tend to be smaller and less complex and, thus, easier to comprehend.

• They are usually faster and better at correctly classifying independent test data than unpruned trees.

• Pruned trees tend to be more compact than their unpruned counterparts.

There are two common approaches to tree pruning:

Prepruning:

 In the prepruning approach, a tree is “pruned” by halting its construction early (e.g., by deciding not to further split or partition the subset of training tuples at a given node).

 When constructing a tree, measures such as statistical significance, information gain, Gini index, and so on can be used to assess the goodness of a split.

 If partitioning the tuples at a node would result in a split that falls below a prespecified threshold, then further partitioning of the given subset is halted.

 There are difficulties, however, in choosing an appropriate threshold.

 High thresholds could result in oversimplified trees, whereas low thresholds could result in very little simplification.
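
The threshold test described above can be sketched as follows, using information gain as the goodness measure. This is a minimal sketch: the function names, the `min_gain` parameter, and its default of 0.1 are assumptions chosen for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = len(parent)
    remainder = sum(len(part) / total * entropy(part) for part in partitions)
    return entropy(parent) - remainder

def should_split(parent, partitions, min_gain=0.1):
    """Prepruning test: split only if the gain reaches the (assumed) threshold."""
    return information_gain(parent, partitions) >= min_gain

parent = ["yes"] * 5 + ["no"] * 5
good_split = [["yes"] * 5, ["no"] * 5]
weak_split = [["yes", "yes", "yes", "no", "no"], ["yes", "yes", "no", "no", "no"]]

print(should_split(parent, good_split))   # the split separates the classes cleanly
print(should_split(parent, weak_split))   # the split barely changes the class mix
```

With `min_gain=0.1` the weak split is rejected and the node becomes a leaf; lowering the threshold to 0.01 would accept it, which shows how sensitive the resulting tree is to this choice.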

Postpruning:

 The second and more common approach is postpruning, which removes subtrees from a “fully grown” tree.

 A subtree at a given node is pruned by removing its branches and replacing it with a leaf.

 The leaf is labeled with the most frequent class among the tuples in the subtree being replaced.

 The cost complexity pruning algorithm used in CART is an example of the postpruning approach.

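 Alternatively, a tree can be pruned based on the number of bits required to encode it, following the Minimum Description Length (MDL) principle. The basic idea is that the simplest solution is preferred.

 Unlike cost complexity pruning, this approach does not require an independent set of tuples.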

 Postpruning requires more computation than prepruning, but generally leads to a more reliable tree.
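
Subtree replacement and the cost complexity trade-off can be sketched as follows. The `Node` class and all function names are hypothetical. The "effective alpha" used here is the increase in error rate per leaf removed, as in CART-style weakest-link pruning. Whether the class counts come from the training data or from a separate pruning set is left open in this sketch.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    counts: Counter                                # class label -> tuples reaching this node
    children: list = field(default_factory=list)   # empty list means the node is a leaf

    def is_leaf(self):
        return not self.children

    def majority_class(self):
        return self.counts.most_common(1)[0][0]

    def errors_as_leaf(self):
        # Tuples misclassified if this node predicted its majority class.
        return sum(self.counts.values()) - max(self.counts.values())

def leaves(node):
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def subtree_errors(node):
    return sum(leaf.errors_as_leaf() for leaf in leaves(node))

def effective_alpha(node, n_total):
    """Increase in error rate per leaf removed if this subtree becomes a leaf."""
    extra_error = (node.errors_as_leaf() - subtree_errors(node)) / n_total
    return extra_error / (len(leaves(node)) - 1)

def prune_to_leaf(node):
    """Replace the subtree rooted at node with a leaf; return the leaf's label."""
    node.children = []
    return node.majority_class()

left = Node(Counter(yes=9, no=3), [Node(Counter(yes=8, no=1)), Node(Counter(yes=1, no=2))])
right = Node(Counter(yes=1, no=7))
root = Node(Counter(yes=10, no=10), [left, right])

# The internal node with the smallest effective alpha is the weakest link.
internal = [root, left]
weakest = min(internal, key=lambda n: effective_alpha(n, n_total=20))
label = prune_to_leaf(weakest)
print(label, len(leaves(root)))
```

In this toy tree the left subtree is the weakest link: collapsing it costs one extra misclassified tuple out of 20 while removing one leaf, so it is replaced by a leaf labeled "yes".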
