Interpretation
Each row of the table shows the fit and error statistics for a node. The
best nodes are in order from least error to greatest error. The worst nodes are
in order from greatest error to least error.
When you use a test data set, Minitab calculates separate statistics for
the training and test data. You can compare the statistics to examine the
relative performance of the tree on the training data and on new data. The test
statistics are usually a better measure of how the tree performs for new data.
- Fit
- The fit is the mean response value of the cases in the node. The fit
is also the predicted value for new data that fall in the same node. Terminal
nodes with fits that differ markedly from those of the other terminal nodes can
be of special interest because they identify groups of cases with unusually
high or low predicted responses.
- Count
- The count is the number of cases in the node. If the analysis
includes weights, then the count is the weighted count. Terminal nodes with
many cases can be of special interest because these nodes typically represent
more common cases.
- StDev
- StDev is the standard deviation of the response values in the node.
Terminal nodes with smaller standard deviations can be of special interest
because the response values in these nodes vary less around the fit, so the
predictions from these nodes are more precise than for terminal nodes with
larger standard deviations.
- MSE
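- The mean square error (MSE) measures the accuracy of the fit for the
node as the average of the squared differences between the response values and
the fit. Because the errors are squared, outliers have a greater effect on MSE
than on MAD and MAPE.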
- MAD
- The mean absolute deviation (MAD) expresses accuracy in the same
units as the data, which helps conceptualize the amount of error. Outliers have
less of an effect on MAD than on MSE.
- MAPE
- The mean absolute percent error (MAPE) expresses the size of the error
as a percentage of the actual response value. Because the MAPE is a percentage,
it can be easier to understand than the other accuracy statistics. For example,
if the MAPE is 5, then the fit is off by 5% on average. Outliers have less of an
effect on MAPE than on MSE.
- However, sometimes you may see a very large value of MAPE even
though the node appears to fit the data well. Examine the fitted vs. actual
response value plot to see if any data values are close to 0. Because MAPE
divides each absolute error by the actual value, actual values close to 0 can
greatly inflate the MAPE.
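
As a rough illustration of the statistics described above, the following Python sketch computes them for the response values in a single node. It is not Minitab's implementation: the function name node_statistics and the sample data are hypothetical, NumPy is assumed to be available, weights are ignored, and the use of the sample standard deviation is an assumption. The two example nodes have the same spread around their fits, so they share the same MSE and MAD, but the node with an actual value near 0 has a much larger MAPE.

```python
import numpy as np

def node_statistics(y):
    """Illustrative node statistics (hypothetical helper, unweighted)."""
    y = np.asarray(y, dtype=float)
    fit = y.mean()                       # fit = mean response in the node
    abs_err = np.abs(y - fit)            # absolute error of each case
    return {
        "fit": fit,
        "count": y.size,
        "stdev": y.std(ddof=1),          # sample standard deviation (assumed)
        "mse": np.mean((y - fit) ** 2),  # squaring amplifies large errors
        "mad": abs_err.mean(),           # same units as the response
        # Each error is divided by the actual value, so actual values near 0
        # inflate the result (an actual value of exactly 0 is undefined).
        "mape": 100.0 * np.mean(abs_err / np.abs(y)),
    }

# Hypothetical node with one actual value close to 0.
near = node_statistics([0.01, 1.0, 2.0, 1.5, 0.5])
# The same responses shifted by 10: identical errors, no values near 0.
far = node_statistics([10.01, 11.0, 12.0, 11.5, 10.5])

for name, stats in (("near zero", near), ("shifted", far)):
    print(name, {k: round(float(v), 3) for k, v in stats.items()})
```

Comparing the two printed rows shows why a large MAPE alone does not mean that a node fits poorly: MSE and MAD agree for both nodes, and only the percentage-based statistic changes.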