Percent of error statistics due to largest residuals for Fit Model and Discover Key Predictors with TreeNet® Regression

Note

This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

Use the percent of error statistics to examine how much of the error in the model comes from the worst fits. When the analysis uses a validation technique, you can also compare the statistics of the model for the training and test data.

Each row of the table shows the error statistics for the given percentage of residuals. The percent of the Mean Squared Error (MSE) that comes from the largest residuals is usually higher than the percent for the other two statistics, the mean absolute deviation (MAD) and the mean absolute percent error (MAPE). MSE uses the squares of the errors in the calculations, so the most extreme observations typically have the greatest influence on the statistic. Large differences between the percent of error for MSE and the other two measures can indicate that the model is more sensitive to the choice between splitting the nodes with least squared error and splitting them with least absolute deviation.
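
The calculation behind one row of the table can be sketched in Python. This is an illustration under stated assumptions, not Minitab's documented method: the function name `pct_error_from_largest` is hypothetical, all three statistics are assumed to rank cases by absolute residual, and the case count is assumed to round up to a whole case. Because each statistic is a mean over the same number of cases, the share of the mean equals the share of the sum.

```python
import math

def pct_error_from_largest(actual, fitted, pct):
    """Return (count, % MSE, % MAD, % MAPE) attributable to the pct% largest residuals."""
    resid = [a - f for a, f in zip(actual, fitted)]
    n = len(resid)
    # Assumed rule: round the number of cases up to a whole case.
    count = math.ceil(round(pct * n / 100, 9))
    # Indices of the cases with the largest absolute residuals.
    top = sorted(range(n), key=lambda i: abs(resid[i]), reverse=True)[:count]

    def share(terms):
        # Percent of the statistic's total that comes from the selected cases.
        return 100 * sum(terms[i] for i in top) / sum(terms)

    squared = [r * r for r in resid]
    absolute = [abs(r) for r in resid]
    abs_pct = [abs(r / a) for r, a in zip(resid, actual)]  # assumes no actual value is 0
    return count, share(squared), share(absolute), share(abs_pct)


# Hypothetical data: nine residuals of 1 and one residual of 10.
actual = [10.0] * 10
fitted = [9.0] * 9 + [0.0]
count, pct_mse, pct_mad, pct_mape = pct_error_from_largest(actual, fitted, 10.0)
print(count, round(pct_mse, 2), round(pct_mad, 2), round(pct_mape, 2))
```

In this made-up example, the single worst case carries about 92% of the squared error but only about 53% of the absolute error, which is the kind of gap between % MSE and the other two statistics that the paragraph above describes.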

When you use a validation technique, Minitab calculates separate statistics for the training data and for the test data. You can compare the statistics to examine the relative performance of the model on the training data and on new data. The test statistics are usually a better measure of how the model will perform for new data.

A possible pattern is that a small percentage of the residuals accounts for a large portion of the error in the data. For example, in the following table, the total size of the data set is about 4400. From the perspective of the MSE, the largest 1% of the training residuals account for about 13% of the error. In such a case, the 31 cases that contribute the most error to the model can represent the most natural opportunity to improve the model. Finding a way to improve the fits for those cases leads to a relatively large increase in the overall performance of the model.

This condition can also indicate that you can have greater confidence in nodes of the model that do not have cases with the largest errors. Because most of the error comes from a small number of cases, the fits for the other cases are relatively more accurate.

Percent of Error Statistics Due to Largest Residuals

	Training				Test			
% of Largest Residuals	Count	% MSE	% MAD	% MAPE	Count	% MSE	% MAD	% MAPE
1.0	31	13.2824	4.9997	8.0885	14	21.6989	6.9082	9.0517
2.0	62	21.3764	8.9374	12.9910	27	31.9396	11.6377	14.0987
2.5	77	24.7125	10.6967	14.9989	33	35.7935	13.6106	16.1761
3.0	93	27.9315	12.4817	17.0128	40	39.8022	15.7838	18.4925
4.0	123	33.2979	15.6372	20.4671	53	45.8259	19.4124	22.4744
5.0	154	38.1707	18.6937	23.7785	66	50.8291	22.7194	25.9526
7.5	231	47.9001	25.4954	31.0104	98	59.7000	29.6264	33.2548
10.0	307	55.3764	31.4216	37.0787	131	66.4339	35.7333	39.2610
15.0	461	66.7462	41.8167	47.2740	196	75.4853	45.6703	48.6658
20.0	614	74.8066	50.5429	55.5443	261	81.6292	53.8603	56.3489
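
One way to read the table is to divide each % MSE value by the percentage of cases that produces it. The Python sketch below does this for a few rows copied from the table. The helper name `concentration` is hypothetical, and the sample sizes are only approximations inferred from the 20% row, because the table does not state them.

```python
# (% of largest residuals, training count, training % MSE, test count, test % MSE)
rows = [
    (1.0, 31, 13.2824, 14, 21.6989),
    (5.0, 154, 38.1707, 66, 50.8291),
    (10.0, 307, 55.3764, 131, 66.4339),
    (20.0, 614, 74.8066, 261, 81.6292),
]

def concentration(pct_cases, pct_error):
    """Ratio of the error share to the case share; 1.0 would mean an even spread."""
    return pct_error / pct_cases

for pct, n_train, mse_train, n_test, mse_test in rows:
    print(f"{pct:5.1f}%  training x{concentration(pct, mse_train):.2f}"
          f"  test x{concentration(pct, mse_test):.2f}")

# Approximate sample sizes implied by the 20% row (an inference, not a stated value).
pct, n_train, _, n_test, _ = rows[-1]
approx_train = round(n_train * 100 / pct)
approx_test = round(n_test * 100 / pct)
print(approx_train, approx_test, approx_train + approx_test)
```

The ratio is largest in the 1% row and shrinks as more cases are included, and it is higher for the test data than for the training data in every row shown here.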
