This lesson first reviews the topics from the previous lesson: how to train a decision tree using scikit-learn, and how to handle a decision tree model that does not generalize well because it overfits the data.
In this lesson, we learn how to best split a decision tree and which classification criteria can be used to split a tree. We work through an example of splitting trees with the misclassification
criterion. Additionally, different stopping criteria that end the iterative tree splitting are discussed.
-
Structure of a decision tree: A decision tree is a data structure composed of nodes (which contain conditions) and branches (which represent the values for a condition: True or False). The tree starts with a root node, which is the parent of two other nodes, and each of these nodes can also be the parent of others. At the last level of the tree, there are terminal nodes, also called leaves.
-
Depth of a decision tree: The depth of a tree is the number of levels it has, or simply the length of the longest path from the root node to a leaf node.
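As a refresher from the previous lesson, here is a minimal sketch (the dataset and parameters are illustrative assumptions, not the lesson's code) that trains a small tree with scikit-learn and inspects its nodes, branches, leaves, and depth:

```python
# Illustrative sketch: train a tiny tree and print its structure and depth.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text prints the root condition, the True/False branches,
# and the leaves with their predicted classes.
print(export_text(clf, feature_names=load_iris().feature_names))

# get_depth() returns the length of the longest root-to-leaf path.
print("depth:", clf.get_depth())
```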
-
Rules & Conditions, Thresholds: The learning algorithm for a decision tree involves determining the best conditions to split the data at each node in order to achieve the best possible classifier. When there are many features, the algorithm considers each feature with its optimized threshold to determine the best feature for splitting at a particular node. In essence, at each node, the algorithm evaluates all possible thresholds for every feature and calculates the resulting misclassification rate. It then selects the condition (feature and threshold) that yields the lowest impurity.
-
Misclassification rate: After each split, the goal is to divide the data into two sets that are as pure as possible. This means that the data within each set should belong predominantly to one class or the other. Another way to describe this is to aim for the lowest possible misclassification rate (impurity). The misclassification rate is a weighted average of the error rates obtained after splitting the data into two sets. The predicted class for each set is determined by the majority class in that set.
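As an illustrative example (not from the lesson): suppose a split sends 8 samples to the left group (6 of class A, 2 of class B) and 4 samples to the right group (1 of class A, 3 of class B). The left group predicts A and errs on 2/8 = 0.25 of its samples; the right group predicts B and errs on 1/4 = 0.25 of its samples; the weighted misclassification rate of the split is (8 × 0.25 + 4 × 0.25) / 12 = 0.25.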
-
Impurity criteria: Common impurity measures are Gini impurity and entropy. For regression problems, MSE (mean squared error) can be used instead.
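For reference, a small illustrative sketch of how these two criteria can be computed (the functions below are assumptions for illustration; in scikit-learn they are selected with the criterion parameter, e.g. 'gini' or 'entropy' for classifiers, and 'squared_error' for regressors in recent versions):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2, where p_k are class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum over classes of p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array([0, 0, 0, 1])  # a 75% / 25% group
print(gini(labels))              # 0.375
print(entropy(labels))           # ~0.811
```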
-
Decision trees can be used to solve regression problems: While we focused on decision tree classifiers, it's important to note that decision trees can also be applied to regression problems using decision tree regressors.
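A minimal illustrative sketch of a decision tree regressor in scikit-learn (the dataset and parameters are assumptions, not from the lesson):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# For regression, each leaf predicts the mean target value of its samples,
# and splits are chosen to minimize the squared error (MSE).
reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)
print(reg.score(X_test, y_test))  # R^2 on held-out data
```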
-
Stopping Criteria: The process of recursively splitting the data at each child node eventually stops based on certain stopping criteria. These criteria help prevent the model from overfitting and include (a scikit-learn sketch of the corresponding hyperparameters follows the list):
- The group is already pure: 0% impurity.
- The maximum depth has been reached.
- The group is smaller than the minimum size set for groups.
- The maximum number of leaves/terminal nodes has been reached.
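As an illustration, these stopping criteria map onto scikit-learn hyperparameters roughly as follows (the values below are arbitrary examples, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,           # stop when the maximum depth is reached
    min_samples_split=20,  # do not split groups smaller than this
    min_samples_leaf=10,   # each leaf must keep at least this many samples
    max_leaf_nodes=16,     # cap the total number of leaves/terminal nodes
)
```

The first criterion needs no parameter: a node whose group is already pure is never split further.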
-
Summary of the algorithm (an illustrative sketch follows this list):
- At a node, find the best split.
- Stop if max_depth is reached.
- For each child node, if it is sufficiently large and not yet pure, repeat the process from the beginning.
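Putting the pieces together, here is a simplified, self-contained sketch of this recursive procedure (an illustration under these assumptions, not scikit-learn's actual implementation; it reuses the weighted misclassification rate idea from the split-search sketch above):

```python
import numpy as np

def misclassification(labels):
    """Error rate when predicting the majority class of the group (integer labels)."""
    return 1.0 - np.bincount(labels).max() / len(labels)

def grow_tree(X, y, depth=0, max_depth=3, min_size=5):
    # Stopping criteria: the group is pure, too small, or max_depth is reached.
    if misclassification(y) == 0.0 or len(y) < min_size or depth >= max_depth:
        return {"leaf": True, "prediction": int(np.bincount(y).argmax())}

    # Find the best split (feature + threshold) by weighted misclassification rate.
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            mask = X[:, f] <= t
            if mask.all():
                continue  # skip splits that leave one side empty
            score = (mask.sum() * misclassification(y[mask])
                     + (~mask).sum() * misclassification(y[~mask])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, mask)

    if best is None:  # no useful split exists, so make a leaf
        return {"leaf": True, "prediction": int(np.bincount(y).argmax())}

    _, f, t, mask = best
    # Repeat the same process on each child node.
    return {"leaf": False, "feature": f, "threshold": float(t),
            "left": grow_tree(X[mask], y[mask], depth + 1, max_depth, min_size),
            "right": grow_tree(X[~mask], y[~mask], depth + 1, max_depth, min_size)}

# Hypothetical usage on toy data (y must contain integer class labels):
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([0, 0, 1, 1])
print(grow_tree(X_toy, y_toy, min_size=2))
```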
The notes are written by the community. If you see an error here, please create a PR with a fix.