The Misclassification Cost vs Number of Terminal Nodes Plot displays the
misclassification cost for each tree in the sequence that produces the optimal
tree. By default, the initial optimal tree is the smallest tree with a
misclassification cost within one standard error of the tree that minimizes the
misclassification cost. When the analysis uses cross-validation or a test data
set, the misclassification cost is from the validation sample. The
misclassification costs for the validation sample typically level off and
eventually increase as the tree grows larger.
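The one-standard-error rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: the pruning sequence, its costs, and its standard errors are hypothetical numbers, and the helper name `one_se_tree` is made up for this example.

```python
# Hypothetical pruning sequence: (terminal nodes, validation cost, standard error).
# These values are illustrative only and do not come from a real analysis.
sequence = [
    (2, 0.42, 0.03),
    (4, 0.31, 0.03),
    (7, 0.27, 0.03),
    (10, 0.26, 0.03),
    (15, 0.28, 0.03),
    (21, 0.30, 0.03),
]


def one_se_tree(seq):
    """Return the smallest tree whose cost is within one SE of the minimum cost."""
    best = min(seq, key=lambda t: t[1])          # tree with the lowest cost
    threshold = best[1] + best[2]                # minimum cost plus one standard error
    candidates = [t for t in seq if t[1] <= threshold]
    return min(candidates, key=lambda t: t[0])   # fewest terminal nodes among candidates


min_cost_tree = min(sequence, key=lambda t: t[1])
chosen = one_se_tree(sequence)
print("minimum-cost tree:", min_cost_tree[0], "terminal nodes")
print("one-SE tree:", chosen[0], "terminal nodes")
```

In this made-up sequence the cost levels off and then rises, so the rule picks a tree that is smaller than the minimum-cost tree while staying within one standard error of it.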
Click Select an Alternative Tree to open an interactive plot that includes a
table of model summary statistics.
Use the plot to investigate alternative trees with similar performance.
Typically, you select an alternative tree for one of the following two reasons:
The optimal tree is part of a
pattern when the misclassification costs decrease. One or more trees that have
a few more nodes are part of the same pattern. Typically, you want to make
predictions from a tree with as much prediction accuracy as possible. If the
tree is simple enough, you can also use it to understand how each predictor
variable affects the response values.
The optimal tree is part of a
pattern when the misclassification costs are relatively flat. One or more trees
with similar model summary statistics have far fewer nodes than the optimal
tree. Typically, a tree with fewer terminal nodes gives a clearer picture of
how each predictor variable affects the response values. A smaller tree also
makes it easier to identify a few target groups for further studies. If the
difference in prediction accuracy for a smaller tree is negligible, you can
also use the smaller tree to evaluate the relationships between the response
and the predictor variables.
Step 2: Investigate the purest terminal nodes on the tree diagram
After you select a tree, investigate the purest terminal nodes on the diagram. Blue represents the event level, and red represents the nonevent level.
You can right-click the tree diagram to show the Node Split View of the tree. This view is helpful when you have a large tree and want to see only the variables that split the nodes.
Nodes continue to split until the terminal nodes cannot be split into further groupings. The nodes that are
mostly blue indicate a strong proportion of the event level. The nodes that are
mostly red indicate a strong proportion of the nonevent level.
At the root node, 139 cases are Yes and 164 cases are No. The root node is split on the variable THAL. When THAL = Normal, go to the left node (Node 2). When THAL = Fixed or Reversible, go to the right node (Node 5).
Node 2: THAL was Normal for 167 cases. Of the 167 cases, 38 or 22.8% are Yes, and 129 or 77.2% are No.
Node 5: THAL was Fixed or Reversible for 136 cases. Of the 136 cases, 101 or 74.3% are Yes, and 35 or 25.7% are No.
The next splitter for both the left child node and the right child node is
Chest Pain Type, where pain is rated as 1, 2, 3, or 4. Node 2 is the parent to Terminal Node 1, and Node 5 is the parent to Terminal Node 7.
At the root node, 45.9% of the cases are Yes and 54.1% are No. The following terminal nodes are the purest and show good separation of cases:
Terminal Node 1: For 100 cases, THAL was Normal, and Chest Pain was 2 or 3. Of the 100 cases, 9 or 9% are Yes, and 91 or 91% are No.
Terminal Node 7: For 90 cases, THAL was Fixed or Reversible, and Chest Pain was 4. Of the 90 cases, 80 or 88.9% are Yes, and 10 or 11.1% are No.
The ranking of terminal nodes from most pure to least pure is: 1, 7, 2, 3, 6, 4, and 5.
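The percentages quoted for each node can be checked from the case counts. The sketch below uses the counts given in the example above and, as an assumption for illustration, measures purity as the proportion of the majority class in a node; Minitab may define and rank purity differently.

```python
# (Yes, No) case counts taken from the example above.
nodes = {
    "Root": (139, 164),
    "Node 2": (38, 129),
    "Node 5": (101, 35),
    "Terminal Node 1": (9, 91),
    "Terminal Node 7": (80, 10),
}


def purity(yes, no):
    """Proportion of the majority class (an assumed, simple purity measure)."""
    return max(yes, no) / (yes + no)


for name, (yes, no) in nodes.items():
    total = yes + no
    print(f"{name}: n={total}, Yes={yes / total:.1%}, "
          f"No={no / total:.1%}, purity={purity(yes, no):.1%}")

# Rank the listed nodes from most pure to least pure.
ranked = sorted(nodes, key=lambda n: purity(*nodes[n]), reverse=True)
print(ranked)
```

Under this measure, Terminal Node 1 and Terminal Node 7 rank ahead of their parent nodes, and the root node is the least pure of the five.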
Step 3: Determine the important variables
Use the relative variable importance chart to determine which predictors are the most important variables to the tree.
Important variables are primary or surrogate splitters in the tree. The variable with the highest improvement score is set as the most important variable, and the other variables are ranked accordingly. Relative variable importance standardizes the
importance values for ease of interpretation. Relative importance is defined as the percent
improvement with respect to the most important predictor.
Relative variable importance values range from 0% to 100%. The most important variable always has a relative
importance of 100%. If a variable is not in the tree, that variable is not important.
Step 4: Evaluate the predictive power of your tree
The most accurate tree is the one with the lowest misclassification cost. Sometimes, simpler trees with slightly higher misclassification costs work just as well. You can use the Misclassification Cost vs Number of Terminal Nodes Plot to identify alternative trees.
The Receiver Operating Characteristic (ROC) Curve shows how well a tree classifies the data. The ROC curve plots the true positive rate on the y-axis and the false positive rate on the x-axis. The true positive rate is also known as power. The false positive rate is also known as the Type I error rate.
When a classification tree can perfectly separate categories in the response variable, then the area under the ROC curve is 1, which is the best possible classification model. Alternatively, if a classification tree cannot distinguish categories and makes assignments completely randomly, then the area under the ROC curve is 0.5.
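The two reference values for the area under the ROC curve can be reproduced with a small pairwise calculation: the area equals the probability that a randomly chosen event case receives a higher predicted score than a randomly chosen nonevent case, with ties counted as one half. The function name `auc` and the scores below are assumptions for illustration, and a constant score stands in for a tree that cannot distinguish the classes.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # event cases
    neg = [s for y, s in zip(labels, scores) if y == 0]  # nonevent cases
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]

# Every event case scores higher than every nonevent case: perfect separation.
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

# Every case gets the same score: the model cannot tell the classes apart.
constant_scores = [0.5] * 6

auc_perfect = auc(labels, perfect_scores)
auc_constant = auc(labels, constant_scores)
print(auc_perfect, auc_constant)
```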
When you use a validation technique to build the tree, Minitab provides information about the performance of the tree on the training and validation (test) data. When the curves are close together, you can be more confident that the tree is not overfit. The performance of the tree with the test data indicates how well the tree can predict new data.
The Confusion Matrix also shows how well the tree separates the classes using these metrics:
True positive rate (TPR) — the probability that an event case is predicted correctly
False positive rate (FPR) — the probability that a nonevent case is predicted incorrectly
False negative rate (FNR) — the probability that an event case is predicted incorrectly
True negative rate (TNR) — the probability that a nonevent case is predicted correctly
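The four rates follow directly from the cells of a confusion matrix. The counts below are hypothetical and serve only to show the arithmetic; the variable names are chosen for this sketch.

```python
# Hypothetical confusion matrix counts (not from a real analysis).
tp = 80   # event cases predicted as event
fn = 20   # event cases predicted as nonevent
fp = 30   # nonevent cases predicted as event
tn = 170  # nonevent cases predicted as nonevent

tpr = tp / (tp + fn)  # event cases predicted correctly
fnr = fn / (tp + fn)  # event cases predicted incorrectly
fpr = fp / (fp + tn)  # nonevent cases predicted incorrectly
tnr = tn / (fp + tn)  # nonevent cases predicted correctly

print(f"TPR={tpr:.1%}  FPR={fpr:.1%}  FNR={fnr:.1%}  TNR={tnr:.1%}")
```

Because each pair of rates shares a denominator, TPR and FNR sum to 1, as do FPR and TNR.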