This command is available with the Predictive Analytics Module.
Minitab displays results for both the training and test data sets. The test results indicate whether the model can adequately predict the response values for new observations, or properly summarize the relationships between the response and the predictor variables. Use the training results to evaluate overfitting of the model.
The number of total predictors available for the TreeNet® model. The total is the sum of the continuous and categorical predictors that you specify.
The number of important predictors in the TreeNet® model. Important predictors have importance scores greater than 0. You can use the Relative Variable Importance chart to display the order of relative variable importance. For instance, if 10 of 20 predictors are important in the model, the Relative Variable Importance chart displays the variables in order of importance.
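As a minimal sketch of the counting rule described above, the following Python snippet keeps the predictors whose importance score is greater than 0 and sorts them from most to least important. The function name `rank_important` and the example scores are hypothetical and only illustrate the rule; they are not Minitab output or Minitab's implementation.

```python
def rank_important(importance_scores):
    """Return (name, score) pairs with score > 0, most important first."""
    important = [(name, score) for name, score in importance_scores.items() if score > 0]
    return sorted(important, key=lambda pair: pair[1], reverse=True)

# Hypothetical importance scores for four predictors (assumed values).
scores = {"age": 100.0, "income": 42.5, "region": 0.0, "tenure": 7.1}

ranked = rank_important(scores)
print("Total predictors:", len(scores))
print("Important predictors:", len(ranked))
for name, score in ranked:
    print(name, score)
```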
By default, Minitab grows 300 small CART® trees to produce the TreeNet® model. While this value works well for exploration of the data, consider whether to grow more trees to produce a final model. To change the number of trees grown, go to the Options subdialog box.
The optimal number of trees corresponds to the lowest value of average negative log-likelihood or misclassification rate, or the highest value of the area under the ROC curve.
When the optimal number of trees is close to the maximum number of trees that the model grows, consider an analysis with more trees. For example, if you grow 300 trees and the optimal number comes back as 298, rebuild the model with more trees. If the optimal number continues to be close to the maximum number, continue to increase the number of trees.
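The selection rule and the "close to the maximum" check can be sketched in Python as follows. This is an illustration under stated assumptions, not Minitab's code: the function names, the example criterion values, and the 5% tolerance used to decide what counts as "close" are all hypothetical choices.

```python
def optimal_tree_count(criterion_by_tree, higher_is_better=False):
    """Return the 1-based number of trees with the best criterion value.

    Use higher_is_better=True for the area under the ROC curve and the
    default for average -log-likelihood or misclassification rate.
    """
    pick = max if higher_is_better else min
    best_index = pick(range(len(criterion_by_tree)), key=criterion_by_tree.__getitem__)
    return best_index + 1

def near_maximum(optimal, max_trees, tolerance=0.05):
    """True when the optimal count is within `tolerance` of the maximum (assumed rule)."""
    return optimal >= max_trees * (1 - tolerance)

# Hypothetical test-set average -log-likelihood after growing 1..6 trees.
losses = [0.69, 0.61, 0.55, 0.52, 0.50, 0.49]
best = optimal_tree_count(losses)
if near_maximum(best, max_trees=len(losses)):
    print("Optimal number of trees is", best, "- consider growing more trees")
```

In this made-up series the criterion is still improving at the last tree, which is the situation where an analysis with more trees is worth trying.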
Minitab calculates the average of the negative log-likelihood function when the response is binary. Compare the average –log-likelihood values for the test data from different models to determine the model with the best fit. Lower average –log-likelihood values indicate a better fit.
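To make the comparison concrete, here is a minimal Python sketch that assumes the standard binary form of the statistic, the mean of -[y log(p) + (1 - y) log(1 - p)], where y is 1 for an event and p is the predicted event probability. The function name and the two sets of predicted probabilities are hypothetical, and the exact formula Minitab uses may differ in detail.

```python
import math

def average_neg_log_likelihood(y_true, p_event):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all rows (assumed formula)."""
    eps = 1e-15  # clip probabilities so log(0) never occurs
    total = 0.0
    for y, p in zip(y_true, p_event):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical test data: 1 = event, 0 = nonevent.
y = [1, 0, 1, 0]
model_a = [0.9, 0.1, 0.8, 0.2]  # confident and correct predictions
model_b = [0.6, 0.4, 0.6, 0.4]  # correct but less certain predictions

nll_a = average_neg_log_likelihood(y, model_a)
nll_b = average_neg_log_likelihood(y, model_b)
print("Model A:", round(nll_a, 4))
print("Model B:", round(nll_b, 4))
```

Model A gets the lower value here because its probabilities sit closer to the observed outcomes.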
The ROC curve plots the True Positive Rate (TPR), also known as power, on the y-axis and the False Positive Rate (FPR), also known as type I error, on the x-axis. The area under an ROC curve indicates whether the model is a good classifier.
For classification trees, values of the area under the ROC curve typically range from 0.5 to 1. Larger values indicate a better classification model. When the model can perfectly separate the classes, the area under the curve is 1. When the model cannot separate the classes better than a random assignment, the area under the curve is 0.5.
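The two reference values, 1 and 0.5, can be checked with a short Python sketch. It uses the rank interpretation of the area under the ROC curve: the probability that a randomly chosen event receives a higher score than a randomly chosen nonevent, with ties counted as one half. The function name and data are hypothetical, and this is one common way to compute the area rather than a description of Minitab's algorithm.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of events and nonevents."""
    events = [s for y, s in zip(y_true, scores) if y == 1]
    nonevents = [s for y, s in zip(y_true, scores) if y == 0]
    # Each event/nonevent pair scores 1 for a correct ordering, 0.5 for a tie.
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))

y = [0, 0, 1, 1]
perfect = roc_auc(y, [0.1, 0.2, 0.8, 0.9])        # classes fully separated
uninformative = roc_auc(y, [0.5, 0.5, 0.5, 0.5])  # same score for every row
partial = roc_auc(y, [0.1, 0.4, 0.35, 0.8])       # one event ranked below a nonevent
print(perfect, uninformative, partial)
```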
Minitab displays lift when the response is binary. The lift is the cumulative lift for the 10% of the data with the best chance of correct classification.
Lift represents the ratio of the target response divided by the average response. When lift is greater than 1, a segment of the data has a greater than expected response.
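As a rough illustration of the ratio, the Python sketch below ranks rows by predicted event probability, takes the top 10%, and divides the event rate in that segment by the event rate in all the data. Reading "best chance of correct classification" as "highest predicted event probability" is an assumption, as are the function name and the example data.

```python
def cumulative_lift(y_true, p_event, fraction=0.10):
    """Event rate in the top `fraction` of rows divided by the overall event rate."""
    ranked = sorted(zip(p_event, y_true), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    overall_rate = sum(y_true) / len(y_true)
    return top_rate / overall_rate

# Hypothetical data: 20 rows, 4 events, so the average response is 0.2.
y = [1, 1, 1, 0, 1] + [0] * 15
p = [0.95, 0.90, 0.80, 0.70, 0.60] + [0.1] * 15

lift_10 = cumulative_lift(y, p)
print("Lift for the top 10%:", lift_10)
```

Both rows in the top 10% are events, so the segment's event rate is 1.0 against an average of 0.2, which gives a lift well above 1.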
The optimal misclassification rate occurs at the tree with the optimal area under the ROC curve. The misclassification rate indicates how often the model incorrectly classifies the events and nonevents.
Smaller values indicate better performance.
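A minimal Python sketch of the calculation follows. It assigns a row to the event class when the predicted event probability is at least 0.5 and reports the share of rows whose predicted class differs from the observed class. The 0.5 threshold, the function name, and the data are assumptions for illustration only.

```python
def misclassification_rate(y_true, p_event, threshold=0.5):
    """Fraction of rows whose predicted class differs from the observed class."""
    predicted = [1 if p >= threshold else 0 for p in p_event]
    errors = sum(y_hat != y for y_hat, y in zip(predicted, y_true))
    return errors / len(y_true)

# Hypothetical test data: 1 = event, 0 = nonevent.
y = [1, 0, 1, 0, 1]
p = [0.9, 0.2, 0.4, 0.6, 0.7]

rate = misclassification_rate(y, p)
print("Misclassification rate:", rate)
```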