Model summary table for Fit Model and Discover Key Predictors with TreeNet® Classification

Note

This command is available with the Predictive Analytics Module.

Find definitions and interpretation guidance for the Model summary table.
Note

Minitab displays results for both the training and test data sets. The test results indicate whether the model can adequately predict the response values for new observations, or properly summarize the relationships between the response and the predictor variables. Compare the training and test results to evaluate whether the model overfits the training data.

Total predictors

The total number of predictors available for the TreeNet® model. The total is the sum of the continuous and categorical predictors that you specify.

Important predictors

The number of important predictors in the TreeNet® model. Important predictors have importance scores greater than 0. You can use the Relative Variable Importance chart to display the order of relative variable importance. For instance, if 10 of 20 predictors are important in the model, the Relative Variable Importance chart displays those 10 variables in order of their importance.

Number of trees grown

By default, Minitab grows 300 small CART® trees to produce the TreeNet® model. While this value works well for exploration of the data, consider whether to grow more trees to produce a final model. To change the number of trees grown, go to the Options subdialog box.

Optimal number of trees

The optimal number of trees corresponds to the lowest value of average negative log-likelihood or misclassification rate, or the highest value of the area under the ROC curve.

When the optimal number of trees is close to the maximum number of trees that the model grows, consider an analysis with more trees. For example, if you grow 300 trees and the optimal number is 298, rebuild the model with more trees. If the optimal number continues to be close to the maximum number, continue to increase the number of trees.
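A minimal sketch of how the optimal number is selected, assuming you already have the per-tree sequence of test criterion values (the names below are illustrative, not Minitab syntax):

    # auc_by_tree[j] holds the test area under the ROC curve for the model
    # built from the first j + 1 trees.
    def optimal_number_of_trees(auc_by_tree):
        best_index = max(range(len(auc_by_tree)), key=lambda j: auc_by_tree[j])
        return best_index + 1  # tree counts start at 1

    # For average -loglikelihood or misclassification rate, take the
    # minimum of the sequence instead of the maximum.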

Average −Loglikelihood

Minitab calculates the average of the negative log-likelihood function when the response is binary. Compare the average −log-likelihood values for the test data from different models to determine which model fits best. Lower average −log-likelihood values indicate a better fit.
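For a binary response, the statistic is the mean of −[y × ln(p) + (1 − y) × ln(1 − p)] over the rows, where y is the 0/1 response and p is the predicted event probability. A minimal sketch of that calculation, assuming lists of responses and predicted probabilities (illustrative names, not Minitab output):

    import math

    def average_negative_loglikelihood(responses, probabilities):
        """Mean of -[y*ln(p) + (1 - y)*ln(1 - p)] over all rows."""
        total = sum(-(y * math.log(p) + (1 - y) * math.log(1 - p))
                    for y, p in zip(responses, probabilities))
        return total / len(responses)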

Area under ROC curve

The ROC curve plots the True Positive Rate (TPR), also known as power, on the y-axis and the False Positive Rate (FPR), also known as the type I error rate, on the x-axis. The area under the ROC curve indicates whether the model is a good classifier.

For classification trees, the values of the area under the ROC curve typically range from 0.5 to 1. Larger values indicate a better classification model. When the model can perfectly separate the classes, the area under the curve is 1. When the model cannot separate the classes better than a random assignment, the area under the curve is 0.5.
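The area under the curve equals the probability that a randomly chosen event receives a higher predicted probability than a randomly chosen nonevent. A minimal sketch of that calculation, assuming 0/1 class labels and predicted event probabilities (illustrative names only):

    def area_under_roc_curve(labels, probabilities):
        """Probability that an event outscores a nonevent; ties count as 0.5."""
        events = [p for y, p in zip(labels, probabilities) if y == 1]
        nonevents = [p for y, p in zip(labels, probabilities) if y == 0]
        wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
                   for e in events for n in nonevents)
        return wins / (len(events) * len(nonevents))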

Lift

Minitab displays lift when the response is binary. The lift is the cumulative lift for the 10% of the data with the best chance of correct classification.

Lift represents the ratio of the target response to the average response. When lift is greater than 1, a segment of the data has a greater-than-expected response.
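A minimal sketch of the cumulative lift for the top 10% of the data, assuming 0/1 responses and predicted event probabilities (illustrative names only):

    def cumulative_lift_top_decile(responses, probabilities):
        """Event rate in the top 10% of rows, ranked by predicted
        probability, divided by the overall event rate."""
        ranked = sorted(zip(probabilities, responses), reverse=True)
        k = max(1, int(0.10 * len(ranked)))  # top 10% of the data
        top_rate = sum(y for _, y in ranked[:k]) / k
        overall_rate = sum(responses) / len(responses)
        return top_rate / overall_rate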

Misclassification rate

The optimal misclassification rate occurs at the number of trees with the optimal area under the ROC curve. The misclassification rate indicates how often the model incorrectly classifies the events and nonevents.

Smaller values indicate better performance.
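A minimal sketch of the calculation, assuming lists of actual and predicted classes (illustrative names only):

    def misclassification_rate(actual, predicted):
        """Fraction of rows where the predicted class differs from the actual class."""
        wrong = sum(a != p for a, p in zip(actual, predicted))
        return wrong / len(actual)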