Model evaluation by eliminating unimportant or important predictors for Discover Key Predictors with TreeNet® Classification

Find definitions and interpretation guidance for the model evaluation table.
Note

This command is available with the Predictive Analytics Module.

Note

When you specify the options for Discover Key Predictors, you can choose to display model selection results for both training and test data. The test results indicate whether the model can adequately predict the response values for new observations or properly summarize the relationships between the response and the predictor variables. The training results are generally for reference only.

Use the results to compare the models from different steps. To further explore an alternative model from the table, click Select an Alternative Model. Minitab produces a full set of results for the alternative model. You can tune the hyperparameters and make predictions accordingly.

Optimal number of trees

The optimal number of trees usually differs at each step. If the optimal number is close to the total number of trees that the analysis grows, the model is more likely to improve if you increase the number of trees. You can consider whether to further explore an alternative model that seems likely to improve.
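As a rough illustration of this guidance, the sketch below flags a step whose optimal tree count sits near the ceiling for the analysis. The function name and the 0.95 threshold are illustrative assumptions, not Minitab rules.

```python
# Hypothetical heuristic: flag a step whose optimal tree count is near the
# maximum number of trees the analysis grew, since such a model might
# improve with more trees. The 0.95 ratio is an illustrative assumption.
def near_tree_ceiling(optimal_trees: int, max_trees: int, ratio: float = 0.95) -> bool:
    return optimal_trees >= ratio * max_trees

print(near_tree_ceiling(298, 300))  # True: consider growing more trees
print(near_tree_ceiling(120, 300))  # False: converged well short of the limit
```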

Average –loglikelihood

The average –loglikelihood is a measure of model accuracy. Smaller values indicate a better fit.

When the response is binary, you can use the maximum loglikelihood as the criterion for the selection of the best model; the table then includes the average –loglikelihood for each model. The full results that follow the table are for the model with the smallest average –loglikelihood. If a model with fewer predictors has an average –loglikelihood that is close to the optimal value, then consider whether to further explore that alternative model. A model with fewer predictors is easier to interpret and allows you to work with fewer variables.
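For a binary response, the average –loglikelihood can be computed from the predicted event probabilities. The sketch below uses the standard binary log-loss formula; the function name and data are illustrative and do not reproduce Minitab's internal calculations.

```python
import math

def average_negative_loglikelihood(y, p):
    """Average -loglikelihood (binary log loss) for observed classes y
    (0 or 1) and predicted event probabilities p. Smaller is better."""
    total = 0.0
    for yi, pi in zip(y, p):
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)

# Illustrative values: a model whose probabilities track the classes well
# scores lower than one whose probabilities hover near 0.5.
y = [1, 0, 1, 1, 0]
good = [0.9, 0.1, 0.8, 0.7, 0.2]
vague = [0.6, 0.5, 0.5, 0.6, 0.4]
print(average_negative_loglikelihood(y, good))   # ~0.20
print(average_negative_loglikelihood(y, vague))  # ~0.58
```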

Area under ROC curve

The ROC curve plots the true positive rate (TPR), also known as power, on the y-axis and the false positive rate (FPR), also known as type I error, on the x-axis. The area under an ROC curve indicates whether the model is a good classifier.

For classification trees, values of the area under the ROC curve typically range from 0.5 to 1. Larger values indicate a better classification model. When the model can perfectly separate the classes, the area under the curve is 1. When the model cannot separate the classes better than a random assignment, the area under the curve is 0.5.
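To make the 0.5-to-1 range concrete, the following sketch computes the area under the ROC curve with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen event receives a higher score than a randomly chosen nonevent, with ties counted as half. The data are illustrative, and this is a generic calculation rather than Minitab's implementation.

```python
def area_under_roc(y, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]  # event scores
    neg = [s for yi, s in zip(y, scores) if yi == 0]  # nonevent scores
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 0, 1, 1, 0]
scores = [0.9, 0.1, 0.8, 0.7, 0.2]
print(area_under_roc(y, scores))     # 1.0: perfect separation of the classes
print(area_under_roc(y, [0.5] * 5))  # 0.5: no better than random assignment
```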

When you use the maximum area under the ROC curve as the criterion for the selection of the best model, the table includes the area under the ROC curve for each model. The full results that follow the table are for the model with the largest area under the ROC curve. If a model with fewer predictors has a value that is close to the optimal value, then consider whether to further explore that alternative model. A model with fewer predictors is easier to interpret and allows you to work with fewer variables.

Misclassification rate

The misclassification rate indicates how often the model classifies the response values incorrectly. Smaller values indicate better performance.

When you use the minimum misclassification rate as the criterion for the selection of the best model, the table includes the misclassification rate for each model. The full results that follow the table are for the model with the smallest misclassification rate. If a model with fewer predictors has a value that is close to the optimal value, then consider whether to further explore that alternative model. A model with fewer predictors is easier to interpret and allows you to work with fewer variables.
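The misclassification rate is simply the fraction of observations that the model assigns to the wrong class. A minimal sketch, with illustrative data:

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the
    actual class. Smaller values indicate better performance."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

actual = ["Yes", "No", "Yes", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "No"]
print(misclassification_rate(actual, predicted))  # 0.2: 1 of 5 misclassified
```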

Predictor count

The predictor count is the number of predictors in the model. The number of predictors in the first row of the table is always the full set of predictors that the analysis considers. After the first row, the number of predictors depends on whether the analysis eliminates unimportant predictors or important predictors.

When the analysis removes the least important predictors, the number of predictors decreases at each step by the specified number of predictors, plus any predictors that have importance scores of 0. For example, suppose the analysis eliminates 10 predictors per step, the initial model has 900 predictors, and 450 of those predictors have importance scores of 0. The first row of the table shows 900 predictors. The second row shows 440 predictors because the analysis removes the 450 predictors with importance scores of 0 and the 10 least important of the remaining predictors.

When the analysis removes the most important predictors, the number of predictors decreases by the specified number of predictors at each step. Predictors that have importance scores of 0 remain in the model.
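A minimal sketch of the bookkeeping in the example above; the function and its arguments are illustrative only.

```python
def next_predictor_count(current, per_step, zero_importance, remove_least=True):
    """Predictor count for the next step. When the analysis removes the
    least important predictors, it drops the per-step quota plus every
    predictor whose importance score is 0. When it removes the most
    important predictors, zero-importance predictors stay in the model."""
    removed = per_step + (zero_importance if remove_least else 0)
    return current - removed

# The example from the text: 900 predictors, 450 with importance scores of 0,
# and 10 predictors eliminated per step.
print(next_predictor_count(900, 10, 450))                      # 440
print(next_predictor_count(900, 10, 450, remove_least=False))  # 890
```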

Eliminated predictors

This column shows the predictors that the analysis eliminates at each step. The list shows at most 25 predictor names at a step. The first row always shows "None" because the first model has all the predictors. After the first row, the contents of the list depend on whether the analysis eliminates unimportant predictors or important predictors.

When the analysis removes the least important predictors, the list shows the specified number of least important predictors, plus any predictors that have importance scores of 0. Predictors with importance scores of 0 appear first in the list. When the analysis eliminates more than one predictor in either category, the names within each category appear in the order of the predictors in the worksheet.

When the analysis removes the most important predictors, the list shows the predictors that the analysis eliminates at each step. When the analysis eliminates more than one important predictor at a step, the names appear in the order of the predictors in the worksheet.
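The ordering rules for the least-important case can be mimicked with a short sketch. The function below assumes a worksheet-ordered mapping from predictor names to importance scores; the names, scores, and function are hypothetical.

```python
def eliminated_list(importance, per_step, max_names=25):
    """Names eliminated in one step when removing the least important
    predictors: zero-importance predictors come first, then the per-step
    quota of least important survivors; within each category the names
    keep their worksheet order. At most max_names names are shown.
    Relies on dicts preserving insertion order (Python 3.7+)."""
    zero = [name for name, score in importance.items() if score == 0]
    rest = [name for name, score in importance.items() if score > 0]
    quota = sorted(rest, key=lambda name: importance[name])[:per_step]
    ordered_quota = [name for name in rest if name in quota]  # worksheet order
    return (zero + ordered_quota)[:max_names]

# Worksheet order X1..X5; X2 and X4 have importance scores of 0.
importance = {"X1": 5.0, "X2": 0.0, "X3": 1.5, "X4": 0.0, "X5": 3.0}
print(eliminated_list(importance, per_step=1))  # ['X2', 'X4', 'X3']
```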