This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.
Use the results to compare how well models perform with different settings for the hyperparameters. Click Tune Hyperparameters to evaluate additional values of the hyperparameters.
The optimal number of trees usually differs at each step. When the optimal number is close to the maximum number of trees for the analysis, the model is more likely to improve if you increase the number of trees than a model whose optimal number is far from the maximum. Consider whether to further explore a model that seems likely to improve.
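For readers who want to reproduce this check outside the analysis, the sketch below finds the optimal number of trees by scoring the model after each tree on holdout data and compares that optimum to the maximum. It uses scikit-learn's GradientBoostingClassifier as a stand-in for TreeNet® (whose implementation differs); the data, parameter values, and the 90% closeness threshold are illustrative assumptions, not defaults of the analysis.

```python
# A minimal sketch, assuming scikit-learn's GradientBoostingClassifier as a
# stand-in for TreeNet(R). Data, parameter values, and the 90% threshold
# are illustrative, not defaults of the analysis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

max_trees = 300  # the maximum number of trees for the analysis
model = GradientBoostingClassifier(n_estimators=max_trees, random_state=0)
model.fit(X_train, y_train)

# Score the model after each tree on holdout data; the optimal number of
# trees minimizes the average -loglikelihood (log loss).
losses = [log_loss(y_test, proba)
          for proba in model.staged_predict_proba(X_test)]
optimal_trees = int(np.argmin(losses)) + 1
print(f"optimal number of trees: {optimal_trees} of {max_trees}")

if optimal_trees > 0.9 * max_trees:
    print("optimum is close to the maximum; consider more trees")
```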
The average –loglikelihood is a measure of model accuracy. Smaller values indicate a better fit.
When the response is binary, you can use the maximum loglikelihood as the criterion for the selection of the best model. The full results that follow the table are for the model with the smallest value of the average –loglikelihood.
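As an illustration, for a binary response the average –loglikelihood is the mean of –[y ln(p) + (1 – y) ln(1 – p)] over all rows, where y is the observed response and p is the predicted probability of the event. The short Python sketch below uses made-up values; the analysis computes this statistic for you.

```python
# A minimal sketch of the average -loglikelihood for a binary response.
# The observed responses and predicted probabilities are made-up values.
import numpy as np

y = np.array([1, 0, 1, 1, 0])            # observed binary response
p = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # predicted event probabilities

avg_neg_loglik = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(avg_neg_loglik)  # smaller values indicate a better fit
```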
The ROC curve plots the True Positive Rate (TPR), also known as power, on the y-axis and the False Positive Rate (FPR), also known as type I error, on the x-axis. The area under an ROC curve indicates how well the model performs as a classifier.
For classification trees, values of the area under the ROC curve typically range from 0.5 to 1. Larger values indicate a better classification model. When the model can perfectly separate the classes, then the area under the curve is 1. When the model cannot separate the classes better than a random assignment, then the area under the curve is 0.5.
When you use the maximum area under the ROC curve as the criterion for the selection of the best model, then the table includes the area under the ROC curve for each model. The full results that follow the table are for the model with the largest area under the ROC curve.
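The following sketch shows how the TPR, FPR, and area under the ROC curve relate, using scikit-learn's metrics functions; the responses and scores are made-up values, and the analysis computes these results for you.

```python
# A minimal sketch of the ROC quantities, assuming scikit-learn.
# The responses and predicted scores are made-up values.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # FPR on x, TPR on y
auc = roc_auc_score(y_true, y_score)
print(f"area under the ROC curve: {auc:.3f}")  # 1 = perfect, 0.5 = random
```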
The misclassification rate indicates how often the model inaccurately classifies the response values. Smaller values indicate better performance.
When you use the minimum misclassification rate as the criterion for the selection of the best model, then the table includes the misclassification rate for each model. The full results that follow the table are for the model with the smallest misclassification rate.
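As an illustration, the misclassification rate is the fraction of rows where the predicted class differs from the observed class. The sketch below uses made-up values; the analysis computes this statistic for you.

```python
# A minimal sketch of the misclassification rate with made-up values.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0])  # observed classes
y_pred = np.array([1, 0, 0, 1, 0, 1])  # predicted classes

misclassification_rate = np.mean(y_pred != y_true)
print(misclassification_rate)  # 2 of 6 rows misclassified, so 0.333...
```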
A low learning rate gives each new tree less weight in the model than a higher learning rate does. A model with a low learning rate is less likely to overfit the training data set, but it generally requires more trees to reach the optimal number of trees.
The subsample fraction is the proportion of the data that the analysis uses to build each tree.
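The sketch below illustrates how a lower learning rate tends to move the optimal number of trees upward and where the subsample fraction enters. It again uses scikit-learn's GradientBoostingClassifier as a stand-in for TreeNet® (the implementations differ); all parameter values are illustrative assumptions.

```python
# A minimal sketch, assuming scikit-learn's GradientBoostingClassifier as a
# stand-in for TreeNet(R). Parameter values are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for rate in (0.3, 0.01):
    model = GradientBoostingClassifier(
        n_estimators=500,
        learning_rate=rate,  # weight given to each new tree
        subsample=0.5,       # subsample fraction: half the data per tree
        random_state=0,
    ).fit(X_train, y_train)
    losses = [log_loss(y_test, p) for p in model.staged_predict_proba(X_test)]
    print(f"learning rate {rate}: optimal trees = {int(np.argmin(losses)) + 1}")
```

On typical runs, the lower learning rate reaches its best holdout score with more trees, which matches the behavior described above.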
TreeNet® Classification combines many small CART® trees into a powerful model. The table includes whichever hyperparameter is in the analysis: the maximum number of terminal nodes per tree or the maximum tree depth. Trees with more terminal nodes can model more complex interactions. In general, more than 12 terminal nodes can slow the analysis without much benefit to the model.
TreeNet® Classification combines many small CART® trees into a powerful model. You can specify either the maximum number of terminal nodes or the maximum tree depth for these smaller CART® trees. Deeper trees can model more complex interactions. Depths from 4 to 6 are adequate for many data sets.
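The sketch below shows the two ways to limit the size of each small tree, again with scikit-learn's GradientBoostingClassifier as a stand-in for TreeNet®; the values 6 and 4 are illustrative assumptions.

```python
# A minimal sketch of the two tree-size limits, assuming scikit-learn's
# GradientBoostingClassifier as a stand-in for TreeNet(R).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Cap the number of terminal nodes per tree (depth left unconstrained)...
by_nodes = GradientBoostingClassifier(
    max_leaf_nodes=6, max_depth=None, random_state=0).fit(X, y)

# ...or cap the tree depth (a depth-4 tree has at most 2**4 = 16 leaves).
by_depth = GradientBoostingClassifier(max_depth=4, random_state=0).fit(X, y)
```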