Tree diagram for CART® Regression

By default, Minitab displays the smallest tree with an R² value that is within one standard error of the maximum R² value.
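The selection rule above can be sketched in a few lines of Python. This is only an illustration of the one-standard-error idea, not Minitab's implementation: the function name, the candidate list, and all of the numbers are hypothetical, and the sketch assumes the standard error that matters is the one reported for the tree with the maximum value.

```python
def smallest_tree_within_one_se(candidates):
    """Pick the smallest tree whose R-squared is within one standard
    error of the best R-squared.

    candidates: list of (terminal_nodes, r_squared, standard_error).
    Assumption: the standard error of the best tree sets the threshold.
    """
    best = max(candidates, key=lambda c: c[1])
    threshold = best[1] - best[2]
    eligible = [c for c in candidates if c[1] >= threshold]
    # Among the eligible trees, prefer the one with the fewest terminal nodes.
    return min(eligible, key=lambda c: c[0])


# Hypothetical sequence of candidate trees (made-up values).
candidates = [
    (2, 0.52, 0.03),
    (4, 0.68, 0.03),
    (7, 0.74, 0.02),
    (12, 0.75, 0.02),
    (20, 0.70, 0.02),
]

chosen = smallest_tree_within_one_se(candidates)
print(chosen[0], "terminal nodes, R-squared =", chosen[1])
```

In this made-up sequence the 12-node tree has the maximum value, but the 7-node tree is within one standard error of it, so the smaller tree is the one selected.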

Right-click the tree diagram to perform the following interactions:
  • Highlight the 5 nodes with the least variation from the fitted value for the node. These nodes are the optimal nodes.
  • Highlight the 5 nodes with the highest means or medians, depending on the criterion for the tree.
  • Highlight the 5 nodes with the lowest means or medians, depending on the criterion for the tree.
  • Copy the values of the predictors that lead to a node that you select. These values are the node rules.
  • Show the Node Split View. This view is helpful when you have a large tree and want to see only which variables split the nodes.
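As a rough illustration of what the highlight and copy interactions compute, the following Python sketch ranks a set of terminal nodes and assembles the rules for one node. The node values, the predictor names X1 to X3, and the "AND"-joined rule text are all hypothetical. The sketch also assumes that the standard deviation is the measure of variation from the fitted value, which holds for a least squares tree but may not match every criterion.

```python
from heapq import nlargest, nsmallest

# Hypothetical terminal nodes; every number and rule here is made up.
terminal_nodes = [
    {"id": 1, "mean": 12.4, "std_dev": 1.1,
     "rule": ["X1 <= 5.0", "X2 <= 3.5"]},
    {"id": 2, "mean": 18.9, "std_dev": 2.7,
     "rule": ["X1 <= 5.0", "X2 > 3.5", "X3 <= 40"]},
    {"id": 3, "mean": 25.3, "std_dev": 0.8,
     "rule": ["X1 <= 5.0", "X2 > 3.5", "X3 > 40"]},
    {"id": 4, "mean": 9.6, "std_dev": 3.2,
     "rule": ["X1 > 5.0", "X2 <= 6.0"]},
    {"id": 5, "mean": 31.0, "std_dev": 1.9,
     "rule": ["X1 > 5.0", "X2 > 6.0", "X3 <= 25"]},
    {"id": 6, "mean": 15.2, "std_dev": 0.5,
     "rule": ["X1 > 5.0", "X2 > 6.0", "X3 > 25", "X1 <= 8.0"]},
    {"id": 7, "mean": 21.7, "std_dev": 2.2,
     "rule": ["X1 > 5.0", "X2 > 6.0", "X3 > 25", "X1 > 8.0"]},
]


def node_ids(nodes):
    """Return the IDs of the given nodes, preserving their order."""
    return [node["id"] for node in nodes]


def node_rules(node):
    """Join the predictor conditions on the path from the root to the node."""
    return " AND ".join(node["rule"])


# The three "top 5" rankings described in the list above.
highest_means = nlargest(5, terminal_nodes, key=lambda n: n["mean"])
lowest_means = nsmallest(5, terminal_nodes, key=lambda n: n["mean"])
least_variation = nsmallest(5, terminal_nodes, key=lambda n: n["std_dev"])

print("Highest means:  ", node_ids(highest_means))
print("Lowest means:   ", node_ids(lowest_means))
print("Least variation:", node_ids(least_variation))
print("Rules for node 3:", node_rules(terminal_nodes[2]))
```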

The root node is the top node of the tree and is the only node that does not have a parent. At each node, the data are split between 2 child nodes, depending on the characteristics of the data. Splitting continues until the nodes cannot be split into further groupings. The nodes that remain unsplit are the terminal nodes.
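The parent and child structure described above can be sketched as a small recursive routine. This is a simplified illustration, not the algorithm that Minitab uses: it assumes a single numeric predictor, a least squares criterion, and a stopping rule that ends splitting when no split reduces the squared error. The function names and the data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    center = sum(values) / len(values)
    return sum((v - center) ** 2 for v in values)


def best_split(rows):
    """Find the threshold on x that minimizes the combined SSE of the children.

    rows: list of (x, y) pairs. Returns (cost, threshold, left, right),
    or None when every case has the same x value.
    """
    best = None
    xs = sorted({x for x, _ in rows})
    for low, high in zip(xs, xs[1:]):
        threshold = (low + high) / 2
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, threshold, left, right)
    return best


def grow(rows):
    """Split recursively; a node without a "left" key is a terminal node."""
    ys = [y for _, y in rows]
    node = {"count": len(ys), "mean": sum(ys) / len(ys)}
    split = best_split(rows)
    if split is None or split[0] >= sse(ys):
        return node  # no split improves the fit, so this node is terminal
    _, threshold, left, right = split
    node["threshold"] = threshold
    node["left"] = grow(left)
    node["right"] = grow(right)
    return node


def count_terminal(node):
    """Count the terminal nodes below and including this node."""
    if "left" not in node:
        return 1
    return count_terminal(node["left"]) + count_terminal(node["right"])


# Hypothetical data: the response jumps between x = 2 and x = 3.
tree = grow([(1, 10.0), (2, 10.0), (3, 20.0), (4, 20.0)])
print(tree["threshold"], count_terminal(tree))
```

In practice, tree-growing software also applies limits such as a minimum number of cases per node, which this sketch leaves out for brevity.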

When you validate the performance of the tree with a test data set, the tree diagram shows cases from the training data set.

Interpretation

Each node shows its node ID, mean, standard deviation, and total count.

Node ID
The node ID gives the designation of the node, either as a node or as a terminal node. A terminal node is a final node that cannot be split further.
Mean
The mean is the average response value of the cases in the node. The mean is also the predicted value for new data that fall in the same node. Terminal nodes whose means differ the most from the means of the other terminal nodes can be of special interest because the predictions for cases in those nodes are the most different. For example, the terminal nodes with the largest or smallest means can be of special interest.
Standard deviation
The standard deviation is the standard deviation of the response values in the node. Terminal nodes with smaller standard deviations can be of special interest because the predictions from these nodes are more precise than the predictions from terminal nodes with larger standard deviations.
Total Count
The total count is the number of cases in the node. If the analysis includes weights, then the count is the total weighted count. Terminal nodes with many cases can be of special interest because these nodes typically represent more common cases.
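To make the three node statistics concrete, here is a minimal Python sketch that computes them for one node. The function name, the response values, and the weights are hypothetical. The sketch assumes the sample standard deviation (n - 1 denominator), and it applies weights only to the count because the text above does not say how weights enter the mean or the standard deviation.

```python
from statistics import mean, stdev


def summarize_node(responses, weights=None):
    """Return the mean, standard deviation, and total count for one node.

    Assumptions: sample standard deviation, and weights affect only the
    count. With no weights, each case counts once.
    """
    if weights is None:
        weights = [1] * len(responses)
    return {
        "mean": mean(responses),
        "std_dev": stdev(responses),
        "total_count": sum(weights),
    }


# Hypothetical response values for the cases in one terminal node.
responses = [10.0, 12.0, 14.0]

unweighted = summarize_node(responses)
weighted = summarize_node(responses, weights=[1, 2, 1])

print(unweighted)
print(weighted["total_count"])
```

With the weights 1, 2, and 1, the three cases contribute a total weighted count of 4 instead of 3.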