This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.
Use the results to compare how well models perform with different settings for the hyperparameters. Click Tune Hyperparameters to evaluate additional values of the hyperparameters.
The optimal number of trees usually differs at each step. When the optimal number is close to the maximum number of trees for the analysis, the model is more likely to improve if you increase the number of trees than a model whose optimal number is far from the maximum. Consider whether to further explore a model that seems likely to improve.
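For readers who want to reproduce this check outside the analysis, the sketch below finds the optimal number of trees by scoring the model after each tree on holdout data and compares that optimum to the maximum. It uses scikit-learn's GradientBoostingClassifier as a stand-in for TreeNet® (whose implementation differs); the data, parameter values, and the 90% closeness threshold are illustrative assumptions, not defaults of the analysis.

```python
# A minimal sketch, assuming scikit-learn's GradientBoostingClassifier as a
# stand-in for TreeNet(R). Data, parameter values, and the 90% threshold
# are illustrative, not defaults of the analysis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

max_trees = 300  # the maximum number of trees for the analysis
model = GradientBoostingClassifier(n_estimators=max_trees, random_state=0)
model.fit(X_train, y_train)

# Score the model after each tree on holdout data; the optimal number of
# trees minimizes the average -loglikelihood (log loss).
losses = [log_loss(y_test, proba)
          for proba in model.staged_predict_proba(X_test)]
optimal_trees = int(np.argmin(losses)) + 1
print(f"optimal number of trees: {optimal_trees} of {max_trees}")

if optimal_trees > 0.9 * max_trees:
    print("optimum is close to the maximum; consider more trees")
```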
The average –loglikelihood is a measure of model accuracy. Smaller values indicate a better fit.
When the response is binary, you can use the maximum loglikelihood as the criterion for the selection of the best model. The full results that follow the table are for the model with the smallest value of the average –loglikelihood.
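As an illustration, for a binary response the average –loglikelihood is the mean of –[y ln(p) + (1 – y) ln(1 – p)] over all rows, where y is the observed response and p is the predicted probability of the event. The short Python sketch below uses made-up values; the analysis computes this statistic for you.

```python
# A minimal sketch of the average -loglikelihood for a binary response.
# The observed responses and predicted probabilities are made-up values.
import numpy as np

y = np.array([1, 0, 1, 1, 0])            # observed binary response
p = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # predicted event probabilities

avg_neg_loglik = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(avg_neg_loglik)  # smaller values indicate a better fit
```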
The ROC curve plots the True Positive Rate (TPR), also known as power, on the y-axis and the False Positive Rate (FPR), also known as type I error, on the x-axis. The area under an ROC curve indicates how well the model performs as a classifier.
For classification trees, values of the area under the ROC curve typically range from 0.5 to 1. Larger values indicate a better classification model. When the model can perfectly separate the classes, then the area under the curve is 1. When the model cannot separate the classes better than a random assignment, then the area under the curve is 0.5.
When you use the maximum area under the ROC curve as the criterion for the selection of the best model, then the table includes the area under the ROC curve for each model. The full results that follow the table are for the model with the largest area under the ROC curve.
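The following sketch shows how the TPR, FPR, and area under the ROC curve relate, using scikit-learn's metrics functions; the responses and scores are made-up values, and the analysis computes these results for you.

```python
# A minimal sketch of the ROC quantities, assuming scikit-learn.
# The responses and predicted scores are made-up values.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # FPR on x, TPR on y
auc = roc_auc_score(y_true, y_score)
print(f"area under the ROC curve: {auc:.3f}")  # 1 = perfect, 0.5 = random
```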
The misclassification rate indicates how often the model inaccurately classifies the response values. Smaller values indicate better performance.
When you use the minimum misclassification rate as the criterion for the selection of the best model, then the table includes the misclassification rate for each model. The full results that follow the table are for the model with the smallest misclassification rate.
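As an illustration, the misclassification rate is the fraction of rows where the predicted class differs from the observed class. The sketch below uses made-up values; the analysis computes this statistic for you.

```python
# A minimal sketch of the misclassification rate with made-up values.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0])  # observed classes
y_pred = np.array([1, 0, 0, 1, 0, 1])  # predicted classes

misclassification_rate = np.mean(y_pred != y_true)
print(misclassification_rate)  # 2 of 6 rows misclassified, so 0.333...
```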
A low learning rate gives each new tree less weight in the model than a higher learning rate does. A model with a low learning rate is less likely to overfit the training data set, but it generally requires more trees to reach the optimal number of trees.
The subsample fraction is the proportion of the data that the analysis uses to build each tree.
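The sketch below illustrates how a lower learning rate tends to move the optimal number of trees upward and where the subsample fraction enters. It again uses scikit-learn's GradientBoostingClassifier as a stand-in for TreeNet® (the implementations differ); all parameter values are illustrative assumptions.

```python
# A minimal sketch, assuming scikit-learn's GradientBoostingClassifier as a
# stand-in for TreeNet(R). Parameter values are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for rate in (0.3, 0.01):
    model = GradientBoostingClassifier(
        n_estimators=500,
        learning_rate=rate,  # weight given to each new tree
        subsample=0.5,       # subsample fraction: half the data per tree
        random_state=0,
    ).fit(X_train, y_train)
    losses = [log_loss(y_test, p) for p in model.staged_predict_proba(X_test)]
    print(f"learning rate {rate}: optimal trees = {int(np.argmin(losses)) + 1}")
```

On typical runs, the lower learning rate reaches its best holdout score with more trees, which matches the behavior described above.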
TreeNet® Classification combines many small CART® trees into a powerful model. The table includes whichever hyperparameter is in the analysis: the maximum number of terminal nodes per tree or the maximum tree depth. Trees with more terminal nodes can model more complex interactions. In general, more than 12 terminal nodes can slow the analysis without much benefit to the model.
TreeNet® Classification combines many small CART® trees into a powerful model. You can specify either the maximum number of terminal nodes or the maximum tree depth for these smaller CART® trees. Deeper trees can model more complex interactions. Depths from 4 to 6 are adequate for many data sets.
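The sketch below shows the two ways to limit the size of each small tree, again with scikit-learn's GradientBoostingClassifier as a stand-in for TreeNet®; the values 6 and 4 are illustrative assumptions.

```python
# A minimal sketch of the two tree-size limits, assuming scikit-learn's
# GradientBoostingClassifier as a stand-in for TreeNet(R).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Cap the number of terminal nodes per tree (depth left unconstrained)...
by_nodes = GradientBoostingClassifier(
    max_leaf_nodes=6, max_depth=None, random_state=0).fit(X, y)

# ...or cap the tree depth (a depth-4 tree has at most 2**4 = 16 leaves).
by_depth = GradientBoostingClassifier(max_depth=4, random_state=0).fit(X, y)
```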