Percent of error statistics due to largest residuals for Random Forests® Regression

Note

This command is available with the Predictive Analytics Module.

Use the percent of error statistics to examine how much of the error in the model fits comes from the worst fits.

Each row of the table shows the share of each error statistic that comes from the given percentage of the largest residuals. The percent of the Mean Squared Error (MSE) that comes from the largest residuals is usually higher than the percent for the other two statistics, the Mean Absolute Deviation (MAD) and the Mean Absolute Percent Error (MAPE). MSE uses the squares of the errors in the calculations, so the most extreme observations typically have the greatest influence on the statistic.
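The calculation behind each row can be sketched in Python as follows. This is a minimal illustration, not the software's documented algorithm. The function name `percent_error_from_largest` is hypothetical, and the sketch assumes that cases are ranked by absolute residual for all three statistics, that the number of cases in each row is rounded up, and that no actual response value is zero.

```python
import math

def percent_error_from_largest(actual, fitted, percents):
    """Percent of MSE, MAD, and MAPE contributed by the largest residuals.

    Hypothetical helper for illustration only. Assumes cases are ranked by
    absolute residual and that no actual value equals zero.
    """
    # Pair each absolute residual with the absolute actual value, then sort
    # from the largest residual to the smallest.
    pairs = sorted(
        ((abs(a - f), abs(a)) for a, f in zip(actual, fitted)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    squared = [r * r for r, _ in pairs]       # terms that make up MSE
    absolute = [r for r, _ in pairs]          # terms that make up MAD
    pct_error = [r / a for r, a in pairs]     # terms that make up MAPE

    rows = []
    for p in percents:
        # Number of cases in the top p percent (rounded up, an assumption).
        k = math.ceil(len(pairs) * p / 100)
        rows.append((
            p,
            k,
            100 * sum(squared[:k]) / sum(squared),
            100 * sum(absolute[:k]) / sum(absolute),
            100 * sum(pct_error[:k]) / sum(pct_error),
        ))
    return rows

# Made-up data: one case is fit much worse than the others.
actual = [10.0, 20.0, 40.0, 50.0]
fitted = [11.0, 19.0, 38.0, 60.0]
for row in percent_error_from_largest(actual, fitted, [25, 50, 100]):
    print(row)
```

In this made-up data, the single worst case contributes a larger share of the MSE than of the MAD or the MAPE, because squaring magnifies the largest residual.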

If you select validation with a test set in addition to out-of-bag validation, then the table displays results for both the out-of-bag data and the test set data.

A possible pattern is that a small percentage of the residuals accounts for a large portion of the error in the data. For example, in the following table, the total size of the data set is about 2930. From the perspective of the MSE, the first row indicates that 1% of the data account for about 36% of the error. In such a case, the 30 cases with the largest residuals can represent the most natural opportunity to improve the model. Finding a way to improve the fits for those cases can lead to a relatively large increase in the overall performance of the model.

This condition can also indicate that you can have greater confidence in nodes of the model that do not have cases with the largest errors. Because most of the error comes from a small number of cases, the fits for the other cases are relatively more accurate.

Percent of Error Statistics Due to Largest Residuals

Out-of-Bag
% of Largest Residuals	Count	% MSE	% MAD	% MAPE
1.0	30	36.3855	9.5840	13.0409
2.0	59	46.9434	14.8347	18.0932
2.5	74	50.3622	16.9953	20.2317
3.0	88	53.1701	18.8880	22.0186
4.0	118	58.0879	22.5527	25.4151
5.0	147	62.0425	25.7845	28.3840
7.5	220	69.7824	32.9504	34.8161
10.0	293	75.0273	38.8507	40.2386
15.0	440	82.2816	48.6881	49.2733
20.0	586	86.9557	56.5610	56.7304
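
As a quick arithmetic check on the table, the following Python sketch compares the Count column with the percentages applied to the data set size. It assumes the data set has exactly 2930 cases (the text says only "about 2930") and that counts are rounded up. Both are assumptions made for illustration, not documented behavior. The sketch also computes how concentrated the MSE is in each row.

```python
import math

n = 2930  # assumption: the text gives the size only as "about 2930"

# Values copied from the table: percent -> (count, % MSE)
table = {
    1.0: (30, 36.3855),
    2.0: (59, 46.9434),
    2.5: (74, 50.3622),
    3.0: (88, 53.1701),
    4.0: (118, 58.0879),
    5.0: (147, 62.0425),
    7.5: (220, 69.7824),
    10.0: (293, 75.0273),
    15.0: (440, 82.2816),
    20.0: (586, 86.9557),
}

# Counts implied by rounding up (an assumption about the rounding rule).
computed = {p: math.ceil(n * p / 100) for p in table}
mismatches = [p for p, (count, _) in table.items() if computed[p] != count]
print("rows where the rounded-up count differs:", mismatches)

# Ratio of the share of MSE to the share of cases in each row.
concentration = {p: pct_mse / p for p, (_, pct_mse) in table.items()}
for p, ratio in concentration.items():
    print(f"{p:>5}% of cases: MSE share is {ratio:.1f} times the case share")
```

Under these assumptions, every count in the table matches the rounded-up value, and the concentration ratio is largest for the 1% row, which is the pattern that the text describes.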
