Percent of error statistics due to largest residuals for Fit Model and Discover Key Predictors with TreeNet® Regression

Note

This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

Use the percent of error statistics to examine how much of the error in the model comes from the worst fits. When the analysis uses a validation technique, you can also compare the statistics of the model for the training and test data.

Each row of the table shows the error statistics for the given percentage of residuals. The percent of the mean squared error (MSE) that comes from the largest residuals is usually higher than the percent for the other two statistics, the mean absolute deviation (MAD) and the mean absolute percent error (MAPE). MSE uses the squares of the errors in the calculations, so the most extreme observations typically have the greatest influence on the statistic. Large differences between the percent of error for MSE and the other two measures can indicate that the model is more sensitive to the choice between least squared error and least absolute deviation as the criterion for splitting the nodes.
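To make the calculation behind each row concrete, the following Python sketch computes the share of each error statistic that comes from the largest residuals. It is an illustration under stated assumptions, not Minitab's documented method: the function name `percent_error_from_largest` is hypothetical, cases are ranked by absolute residual for all three statistics, and the count is rounded up, which is consistent with the counts in the example table.

```python
import math


def percent_error_from_largest(actual, fitted, pct):
    """Return (count, % MSE, % MAD, % MAPE) due to the largest pct% of residuals.

    Assumptions (not confirmed by the documentation): cases are ranked by
    absolute residual, and the number of cases is rounded up.
    """
    residuals = [a - f for a, f in zip(actual, fitted)]

    # Rank case indices from largest to smallest absolute residual.
    order = sorted(range(len(residuals)),
                   key=lambda i: abs(residuals[i]), reverse=True)
    count = math.ceil(len(residuals) * pct / 100)
    top = order[:count]

    squared = [r * r for r in residuals]
    absolute = [abs(r) for r in residuals]
    # Absolute percent error; requires nonzero actual values.
    abs_pct = [abs(r) / abs(a) for r, a in zip(residuals, actual)]

    def share(values):
        # The 1/n factor of each mean cancels, so sums are enough.
        return 100 * sum(values[i] for i in top) / sum(values)

    return count, share(squared), share(absolute), share(abs_pct)


# Ten cases: nine residuals of size 1 and one residual of size 10.
actual = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
fitted = [11, 19, 31, 39, 51, 59, 71, 79, 91, 90]
count, pct_mse, pct_mad, pct_mape = percent_error_from_largest(actual, fitted, 10)
print(count, round(pct_mse, 2), round(pct_mad, 2), round(pct_mape, 2))
```

In this small example, the single worst case accounts for a much larger share of the MSE than of the MAD, because squaring magnifies the one extreme residual.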

When you use a validation technique, Minitab calculates separate statistics for the training data and for the test data. You can compare the statistics to examine the relative performance of the model on the training data and on new data. The test statistics are usually a better measure of how the model will perform for new data.

A possible pattern is that a small percentage of the residuals accounts for a large portion of the error in the data. For example, in the following table, the total size of the data set is about 4400. For the training data, the 31 cases with the largest residuals are 1% of the cases but account for about 13% of the MSE. In such a case, those 31 cases can represent the most natural opportunity to improve the model. Finding a way to improve the fits for those cases can lead to a relatively large increase in the overall performance of the model.

This condition can also indicate that you can have greater confidence in nodes of the model that do not have cases with the largest errors. Because most of the error comes from a small number of cases, the fits for the other cases are relatively more accurate.

TreeNet® Regression: Loan Amount vs Annual Income, Income Ratio, ...

Percent of Error Statistics Due to Largest Residuals

| % of Largest Residuals | Training Count | Training % MSE | Training % MAD | Training % MAPE | Test Count | Test % MSE | Test % MAD | Test % MAPE |
|---|---|---|---|---|---|---|---|---|
| 1.0 | 31 | 13.2824 | 4.9997 | 8.0885 | 14 | 21.6989 | 6.9082 | 9.0517 |
| 2.0 | 62 | 21.3764 | 8.9374 | 12.9910 | 27 | 31.9396 | 11.6377 | 14.0987 |
| 2.5 | 77 | 24.7125 | 10.6967 | 14.9989 | 33 | 35.7935 | 13.6106 | 16.1761 |
| 3.0 | 93 | 27.9315 | 12.4817 | 17.0128 | 40 | 39.8022 | 15.7838 | 18.4925 |
| 4.0 | 123 | 33.2979 | 15.6372 | 20.4671 | 53 | 45.8259 | 19.4124 | 22.4744 |
| 5.0 | 154 | 38.1707 | 18.6937 | 23.7785 | 66 | 50.8291 | 22.7194 | 25.9526 |
| 7.5 | 231 | 47.9001 | 25.4954 | 31.0104 | 98 | 59.7000 | 29.6264 | 33.2548 |
| 10.0 | 307 | 55.3764 | 31.4216 | 37.0787 | 131 | 66.4339 | 35.7333 | 39.2610 |
| 15.0 | 461 | 66.7462 | 41.8167 | 47.2740 | 196 | 75.4853 | 45.6703 | 48.6658 |
| 20.0 | 614 | 74.8066 | 50.5429 | 55.5443 | 261 | 81.6292 | 53.8603 | 56.3489 |
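One way to read these results is to compute the gap between the percent of MSE and the percent of MAD, and the shift in the percent of MSE from training to test. The short Python sketch below does this for four rows copied from the table above. The helper name `gaps` is hypothetical, and the sketch only illustrates the comparison. It is not part of the Minitab output.

```python
# Rows copied from the table:
# (% of largest residuals, training % MSE, training % MAD, test % MSE, test % MAD)
rows = [
    (1.0, 13.2824, 4.9997, 21.6989, 6.9082),
    (5.0, 38.1707, 18.6937, 50.8291, 22.7194),
    (10.0, 55.3764, 31.4216, 66.4339, 35.7333),
    (20.0, 74.8066, 50.5429, 81.6292, 53.8603),
]


def gaps(rows):
    """Return (pct, training MSE-MAD gap, test MSE-MAD gap, test minus training % MSE)."""
    return [
        (pct, tr_mse - tr_mad, te_mse - te_mad, te_mse - tr_mse)
        for pct, tr_mse, tr_mad, te_mse, te_mad in rows
    ]


for pct, train_gap, test_gap, mse_shift in gaps(rows):
    print(f"{pct:5.1f}%  training gap {train_gap:6.2f}  "
          f"test gap {test_gap:6.2f}  test - training % MSE {mse_shift:6.2f}")
```

For every row shown, the percent of MSE exceeds the percent of MAD, and the largest residuals account for a greater share of the MSE in the test data than in the training data.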