Methods and formulas for the model summary in Random Forests® Regression

Note

This command is available with the Predictive Analytics Module.

In This Topic

Important variables
Out-of-bag and test predictions
R-squared
Root mean squared error (RMSE)
Mean squared error (MSE)
Mean absolute deviation (MAD)
Mean absolute percent error (MAPE)
Notation

Important variables

Minitab Statistical Software determines the importance of a variable in Random Forests® Regression with the permutation method. The permutation method uses the out-of-bag data. For a given tree, j, in the analysis, predict the out-of-bag data with the tree. Repeat the prediction for every tree in the forest. Then, for each row that appears at least once in the out-of-bag data, compute the average of its out-of-bag predictions. Use the averaged predictions to compute the mean squared error for the out-of-bag data:

$$\text{MSE}_{\text{OOB}} = \frac{1}{n_{\text{OOB}}} \sum_{i=1}^{n_{\text{OOB}}} \left( y_i - \hat{y}_{i,\text{OOB}} \right)^2$$

where

Term	Description
$y_i$	value of the response variable for row i
$n_{\text{OOB}}$	number of rows that appear in the out-of-bag data over the entire forest
$\hat{y}_{i,\text{OOB}}$	out-of-bag prediction for row i

Then, randomly permute the values of a variable, x_m, through the out-of-bag data. Leave the response values and the other predictor values the same. Then, use the same steps to calculate the mean squared error for the permuted data, $\text{MSE}_{\text{perm},m}$.

The importance of variable x_m is the difference between the permuted and the unpermuted out-of-bag mean squared errors:

$$\text{Importance}(x_m) = \text{MSE}_{\text{perm},m} - \text{MSE}_{\text{OOB}}$$

Minitab rounds values smaller than 10^–7 to 0.
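The permutation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Minitab's implementation: each tree is a hypothetical callable that maps a row to a prediction, `oob_flags[j][i]` is a hypothetical flag that is true when row i is out-of-bag for tree j, and the function names are invented for the sketch. It permutes x_m among the rows that are out-of-bag at least once, which is one reading of "through the out-of-bag data".

```python
import random


def oob_mse(trees, oob_flags, X, y):
    """Return the out-of-bag MSE and the rows that are out-of-bag at least once."""
    squared_errors = {}
    for i, row in enumerate(X):
        # Use only the trees for which row i is out-of-bag.
        votes = [tree(row) for tree, flags in zip(trees, oob_flags) if flags[i]]
        if votes:  # rows that are never out-of-bag do not contribute
            prediction = sum(votes) / len(votes)
            squared_errors[i] = (y[i] - prediction) ** 2
    mse = sum(squared_errors.values()) / len(squared_errors)
    return mse, sorted(squared_errors)


def permutation_importance(trees, oob_flags, X, y, m, rng=None, tol=1e-7):
    """Importance of column m: permuted OOB MSE minus unpermuted OOB MSE."""
    if rng is None:
        rng = random.Random(0)  # fixed seed, an assumption for repeatability
    baseline, oob_rows = oob_mse(trees, oob_flags, X, y)

    # Shuffle column m among the out-of-bag rows; y and other columns stay put.
    shuffled = [X[i][m] for i in oob_rows]
    rng.shuffle(shuffled)
    X_perm = [list(row) for row in X]  # copy so the caller's data is untouched
    for i, value in zip(oob_rows, shuffled):
        X_perm[i][m] = value

    permuted, _ = oob_mse(trees, oob_flags, X_perm, y)
    importance = permuted - baseline
    # Values smaller than the tolerance (including negatives) are set to 0.
    return 0.0 if importance < tol else importance
```

A variable that no tree uses leaves the predictions unchanged after permutation, so its importance in this sketch is 0.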

Repeat this process for every variable in the analysis. The variable with the highest importance is the most important variable. The relative variable importance scores are scaled by the importance of the most important variable:

$$\text{Relative importance}(x_m) = \frac{\text{Importance}(x_m)}{\max_{k} \text{Importance}(x_k)} \times 100\%$$
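As a small illustration of the scaling step, the following Python sketch divides each importance by the largest one. The function name, the dictionary input, and the expression of the result as a percentage are assumptions made for the example.

```python
def relative_importance(importances):
    """Scale raw importances so the most important variable scores 100."""
    top = max(importances.values())
    if top <= 0:
        # No variable has positive importance, so there is nothing to scale by.
        return {name: 0.0 for name in importances}
    return {name: 100.0 * value / top for name, value in importances.items()}
```

For example, raw importances of 4.0, 1.0, and 0.0 scale to 100, 25, and 0.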

Out-of-bag and test predictions

The predictions used in the following measures of model accuracy depend on the validation method. For a given row, the out-of-bag prediction comes only from the trees for which that row is out-of-bag. For a given tree, j, in the analysis, predict the out-of-bag data with the tree. Repeat the prediction for every tree in the forest. Then, for each row that appears at least once in the out-of-bag data, compute the average of its out-of-bag predictions. For the evaluation of the model with the out-of-bag data, the average of the response variable is the average across all rows in the out-of-bag data.

For the test data set, use each tree in the forest to predict each value in the test data set. Then, average the predictions from all the trees to get the prediction for the model. For the evaluation of the model with the test set, the average response is the average of the rows in the test set.

R-squared

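The calculation of R² uses the out-of-bag data or the test data. The predictions differ in these two cases. In general, the formula for R² has the following form:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}$$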