Use the percent of error statistics to examine how much of the total error in the tree comes from the worst fits. When the analysis uses a validation technique, you can also compare the statistics of the tree for the training data and the test data.

Each row of the table shows the error statistics for the given percentage of
residuals. The percent of the Mean Squared Error (MSE) that comes from the
largest residuals is usually higher than the percent for the other two
statistics. MSE uses the squares of the errors in the calculations, so the most
extreme observations typically have the greatest influence on the statistic.
Large differences between the percent of error for MSE and the other two
measures can indicate that the tree is sensitive to the choice of
node-splitting criterion: least squared error versus least absolute deviation.
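The effect of squaring can be sketched with simulated residuals. This is a hypothetical illustration, not Minitab's internal calculation: the residual values, the 1% cutoff, and the mix of mild and extreme errors are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical residuals: mostly small errors plus a few extreme ones.
residuals = np.concatenate([rng.normal(0, 1, 990), rng.normal(0, 10, 10)])

# Sort by absolute residual, largest first.
r = residuals[np.argsort(-np.abs(residuals))]

top = int(0.01 * len(r))  # the worst 1% of fits

# Share of each error statistic that comes from the worst 1%.
pct_mse = (r[:top] ** 2).sum() / (r ** 2).sum() * 100
pct_mad = np.abs(r[:top]).sum() / np.abs(r).sum() * 100

print(f"Worst 1% share of squared error:  {pct_mse:.1f}%")
print(f"Worst 1% share of absolute error: {pct_mad:.1f}%")
```

Because squaring amplifies the extreme residuals, the worst 1% accounts for a much larger share of the squared error than of the absolute error, which is why the MSE row of the table usually shows the highest percent.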

When you use a validation technique, Minitab calculates separate statistics
for the training data and for the test data. You can compare the statistics to
examine the relative performance of the tree on the training data and on new
data. The test statistics are usually a better measure of how the tree will
perform for new data.

A possible pattern is that a small percentage of the residuals accounts for a
large portion of the error in the data. For example, in the following table,
the total size of the data set is about 4,500, and from the perspective of
MSE, 1% of the data accounts for about 12% of the error. In such a case, the
roughly 45 cases that contribute most of the error to the tree represent the
most natural opportunity to improve the tree. Finding a way to improve the
fits for those cases leads to a relatively large increase in the overall
performance of the tree.

This condition can also indicate that you can have greater confidence in
nodes of the tree that do not have cases with the largest errors. Because most
of the error comes from a small number of cases, the fits for the other cases
are relatively more accurate.