Fits and error statistics for best and worst terminal nodes for CART® Regression

Use the fits and error statistics to characterize the terminal nodes that are of special interest because they perform best or worst.


Each row of the table shows the fit and error statistics for a node. The best nodes are in order from least error to greatest error. The worst nodes are in order from greatest error to least error.
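As a rough illustration of this ordering, the following Python sketch sorts a small table of terminal nodes. The node IDs and error values are made up, the helper name best_and_worst is hypothetical, and the sketch assumes that MSE is the error used for ranking; this page does not say which error statistic determines the order.

```python
# Hypothetical terminal-node summary: (node ID, MSE). The values are made up.
nodes = [(3, 4.2), (7, 0.9), (8, 12.5), (11, 2.1), (12, 7.3)]


def best_and_worst(nodes, k=2):
    """Return the k best nodes (least error first) and the k worst nodes (greatest error first)."""
    by_error = sorted(nodes, key=lambda node: node[1])  # ascending error
    best = by_error[:k]
    worst = by_error[::-1][:k]  # descending error
    return best, worst


best, worst = best_and_worst(nodes)
print("Best: ", best)
print("Worst:", worst)
```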

When you use a test data set, Minitab calculates separate statistics for the training and test data. You can compare the statistics to examine the relative performance of the tree on the training data and on new data. The test statistics are usually a better measure of how the tree performs for new data.

The fit is the mean response value of the cases in the node. The fit is also the predicted value for new data that fall in the same node. Terminal nodes whose fits differ markedly from the fits of the other terminal nodes can be of special interest because they identify groups of cases with unusually high or low predicted responses.

The count is the number of cases in the node. If the analysis includes weights, then the count is the weighted count. Terminal nodes with many cases can be of special interest because these nodes typically represent more common cases.

The standard deviation is the standard deviation of the response values in the node. Terminal nodes with smaller standard deviations can be of special interest because the predictions from these nodes are more precise than the predictions from terminal nodes with larger standard deviations.

The mean squared error (MSE) measures the accuracy of the node as the average squared difference between the response values and the fit. Smaller values indicate a better fit. Because the errors are squared, outliers have a greater effect on MSE than on MAD and MAPE.

The mean absolute deviation (MAD) expresses accuracy in the same units as the data, which helps conceptualize the amount of error. Outliers have less of an effect on MAD than on MSE.

The mean absolute percent error (MAPE) expresses the error as a percentage of the actual response value. Because the MAPE is a percentage, it can be easier to understand than the other accuracy statistics. For example, if the MAPE is 5, then the fit is off by 5% on average. Outliers have less of an effect on MAPE than on MSE.
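The statistics described above can be sketched in Python for a single node. This is a minimal, unweighted illustration with made-up response values, not Minitab's implementation. The function name node_statistics is hypothetical, and the divisor n - 1 for the standard deviation is an assumption.

```python
import numpy as np


def node_statistics(y):
    """Compute illustrative fit and error statistics for the responses in one node."""
    y = np.asarray(y, dtype=float)
    fit = y.mean()                # fit: mean response of the cases in the node
    resid = y - fit               # error of the fit for each case
    return {
        "fit": float(fit),
        "count": int(y.size),     # unweighted count of cases
        # Assumption: sample standard deviation (divisor n - 1).
        "stdev": float(y.std(ddof=1)),
        "mse": float(np.mean(resid ** 2)),             # squared units of the response
        "mad": float(np.mean(np.abs(resid))),          # same units as the response
        "mape": float(100 * np.mean(np.abs(resid / y))),  # percent; undefined if any y is 0
    }


# Made-up response values for one terminal node.
stats = node_statistics([10.0, 12.0, 14.0])
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

Because MSE squares each error, replacing one response with an extreme value changes the MSE far more than it changes the MAD, which is the outlier sensitivity noted above.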

However, the MAPE can sometimes be very large even though the node appears to fit the data well. Examine the fitted vs. actual response value plot to see whether any actual response values are close to 0. Because MAPE divides each absolute error by the actual response value, values close to 0 can greatly inflate the MAPE.
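The following Python sketch illustrates this effect with two made-up nodes that have the same absolute errors. The helper names mad and mape are hypothetical, and the formulas are the standard unweighted definitions, which are assumed here rather than taken from Minitab's documentation.

```python
import numpy as np


def mad(y, fit):
    """Mean absolute deviation of the responses from the fit."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(y - fit)))


def mape(y, fit):
    """Mean absolute percent error: each absolute error divided by the actual value."""
    y = np.asarray(y, dtype=float)
    return float(100 * np.mean(np.abs((y - fit) / y)))


# Two made-up nodes whose errors around the fit are identical (-1, 0, +1).
far_from_zero = [9.0, 10.0, 11.0]   # fit (mean) is 10.0
near_zero = [0.1, 1.1, 2.1]         # fit (mean) is 1.1; one response is close to 0

mad_far, mad_near = mad(far_from_zero, 10.0), mad(near_zero, 1.1)
mape_far, mape_near = mape(far_from_zero, 10.0), mape(near_zero, 1.1)

print(f"MAD:  far from zero = {mad_far:.3f}, near zero = {mad_near:.3f}")
print(f"MAPE: far from zero = {mape_far:.1f}%, near zero = {mape_near:.1f}%")
```

The two nodes have the same MAD, but the response of 0.1 in the second node makes its MAPE many times larger.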