This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

Find definitions and interpretation guidance for the Model summary table. If you add validation with a test set in addition to validation with the out-of-bag data, Minitab displays results for both validation methods.

The number of total predictors available for the Random Forests® model. The total is the sum of the continuous and categorical predictors that you specify.

The number of important predictors in the Random Forests® model. Important predictors have importance scores greater than 0. You can use the Relative Variable Importance chart to display the order of relative variable importance. For instance, if 10 of 20 predictors are important in the model, the Relative Variable Importance chart displays those 10 variables in order of importance.

Minitab calculates the average of the negative log-likelihood when the response is binary. Compare the average –log-likelihood values from different models to determine the model with the best fit. You can also use this statistic to compare models from other commands, such as CART® Classification and TreeNet® Classification. Lower average –log-likelihood values indicate a better fit.
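As a sketch of the statistic (not Minitab's internal computation), the average –log-likelihood for a binary response can be computed from the observed classes and the predicted event probabilities. The function name and the clipping constant `eps` below are illustrative assumptions:

```python
import math

def average_neg_log_likelihood(y, p, eps=1e-15):
    """Average negative log-likelihood for a binary response.

    y: observed classes (0 or 1); p: predicted probability of class 1.
    Probabilities are clipped away from 0 and 1 to keep the log finite.
    """
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)

# A model whose probabilities track the outcomes closely scores lower
good = average_neg_log_likelihood([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
poor = average_neg_log_likelihood([1, 0, 1, 0], [0.6, 0.5, 0.5, 0.4])
```

Because the statistic averages over cases, you can compare it across models fit to the same data regardless of which command produced them.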

The ROC curve plots the true positive rate (TPR), also known as power, on the y-axis and the false positive rate (FPR), also known as the type I error rate, on the x-axis. The area under the ROC curve indicates whether the model is a good classifier.

For classification trees, values for the area under the ROC curve typically range from 0.5 to 1. Larger values indicate a better classification model. When the model can perfectly separate the classes, the area under the curve is 1. When the model cannot separate the classes better than a random assignment, the area under the curve is 0.5.
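One way to see why the area under the ROC curve lands between 0.5 and 1 is the rank-sum identity: the AUC equals the probability that a randomly chosen event receives a higher predicted score than a randomly chosen nonevent. This sketch (an assumption about the computation, not Minitab's implementation) applies that identity directly:

```python
def auc_from_scores(y, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the fraction of event/nonevent pairs where the event outranks the
    nonevent, with ties counted as half."""
    pairs = 0
    wins = 0.0
    for yi, si in zip(y, scores):
        if yi != 1:
            continue
        for yj, sj in zip(y, scores):
            if yj != 0:
                continue
            pairs += 1
            if si > sj:
                wins += 1.0
            elif si == sj:
                wins += 0.5
    return wins / pairs

# Perfect separation: every event scores above every nonevent
perfect = auc_from_scores([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2])
# No separation: all scores tied, no better than random assignment
random_like = auc_from_scores([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```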

Minitab displays lift when the response is binary. The lift is the cumulative lift for the 10% of the data with the best chance of correct classification.

Lift is the ratio of the target response to the average response. When lift is greater than 1, that segment of the data has a greater than expected response.
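The cumulative lift for the top 10% can be sketched as the event rate among the 10% of cases with the highest predicted event probability, divided by the overall event rate. The function and data below are illustrative assumptions, not Minitab's exact procedure:

```python
def cumulative_lift(y, p, fraction=0.10):
    """Cumulative lift for the top `fraction` of cases ranked by predicted
    event probability: the event rate in that slice divided by the
    overall event rate."""
    ranked = sorted(zip(p, y), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(yi for _, yi in ranked[:k]) / k
    overall_rate = sum(y) / len(y)
    return top_rate / overall_rate

# 20 cases with 5 events; the top 10% (2 cases) are both events,
# so the lift is 1.0 / 0.25 = 4.0
y = [1, 1, 1, 1, 1] + [0] * 15
p = [0.95, 0.90, 0.40, 0.35, 0.30] + [0.1] * 15
lift = cumulative_lift(y, p)
```

A lift of 4 here means the best-scored decile contains events at four times the overall rate, which is what "greater than expected response" quantifies.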

The misclassification rate indicates how often the model incorrectly classifies the events and nonevents. Smaller values indicate better performance.
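The misclassification rate is simply the fraction of cases whose predicted class differs from the observed class. A minimal sketch, with illustrative names and data:

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    wrong = sum(1 for a, b in zip(actual, predicted) if a != b)
    return wrong / len(actual)

# 1 of 5 cases is classified incorrectly, so the rate is 0.20
rate = misclassification_rate([1, 0, 1, 0, 1], [1, 0, 0, 0, 1])
```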