Example of Discover Key Predictors with TreeNet® Classification

Note

This command is available with the Predictive Analytics Module.

A team of researchers collects data about factors that affect a quality characteristic of baked pretzels. Variables include process settings, like Mix Tool, and grain properties, like Flour Protein.

As part of the initial exploration of the data, the researchers decide to use Discover Key Predictors, which compares models built as unimportant predictors are sequentially removed. They hope to identify key predictors that have large effects on the quality characteristic and to gain insight into the relationships between the quality characteristic and those predictors.

  1. Open the sample data, PretzelAcceptability.MTW.
  2. Choose Predictive Analytics Module > TreeNet® Classification > Discover Key Predictors.
  3. From the drop-down list, select Binary response.
  4. In Response, enter Acceptable Pretzel.
  5. In Response event, select 1 to indicate that the pretzel is acceptable.
  6. In Continuous predictors, enter Flour Protein-Bulk Density.
  7. In Categorical predictors, enter Mix Tool-Kiln Method.
  8. Click Discover Key Predictors.
  9. In Maximum number of elimination steps, enter 29.
  10. Click OK in each dialog box.

Interpret the results

For this analysis, Minitab Statistical Software compares 28 models. The number of steps is less than the maximum number of steps because the Foam Stability predictor has an importance score of 0 in the first model, so the algorithm eliminates 2 variables in the first step. The asterisk in the Model column of the Model Evaluation table shows that the model with the smallest value of the average –loglikelihood statistic is model 23. The results that follow the model evaluation table are for model 23.

Although model 23 has the smallest value of the average –loglikelihood statistic, other models have similar values. The team can click Select Alternative Model to produce results for other models from the Model Evaluation table.
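Minitab does not expose the elimination procedure as code, but the loop the results describe can be sketched in Python. The sketch below is a minimal, illustrative assumption: it uses scikit-learn's gradient boosting in place of TreeNet, a synthetic data set in place of the pretzel data, and the same general rule described above, where each step removes the least important predictor plus any predictor with zero importance, and each candidate model is scored by its average -loglikelihood on a test set.

```python
# Hedged sketch of a Discover-Key-Predictors-style elimination loop.
# Assumptions: scikit-learn GradientBoostingClassifier stands in for
# TreeNet; data, settings, and variable names are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
# 70/30 training/test split, matching the Method table in the example
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

cols = list(range(X.shape[1]))  # start with all predictors
results = []                    # (predictor subset, test average -loglik)
while cols:
    model = GradientBoostingClassifier(
        learning_rate=0.05, n_estimators=100, random_state=1)
    model.fit(X_train[:, cols], y_train)
    # log_loss with default normalize=True is the average -loglikelihood
    avg_nll = log_loss(y_test, model.predict_proba(X_test[:, cols]))
    results.append((tuple(cols), avg_nll))
    imp = model.feature_importances_
    # keep predictors with nonzero importance, sorted least important first
    keep = sorted((c for c, w in zip(cols, imp) if w > 0),
                  key=lambda c: imp[cols.index(c)])
    if not keep:
        break
    cols = sorted(keep[1:])  # also drop the single least important survivor

best_subset, best_nll = min(results, key=lambda r: r[1])
```

The selected model is simply the step with the minimum test average -loglikelihood, which is how model 23 earns its asterisk in the Model Evaluation table.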

In the results for model 23, the Average –Loglikelihood vs. Number of Trees Plot shows that the optimal number of trees (299) is close to the total number of trees in the analysis (300). The team can click Tune Hyperparameters to increase the number of trees and to see whether changes to other hyperparameters improve the performance of the model.

The Relative Variable Importance graph plots the predictors in order of their effect on model improvement when splits are made on a predictor over the sequence of trees. The most important predictor is Mix Time. Its importance is scaled to 100%, and the next most important variable, Kiln Temperature, has a relative importance of 93.9%; that is, Kiln Temperature is 93.9% as important as Mix Time.
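The scaling behind relative importance is straightforward: each raw improvement score is divided by the top score so the most important predictor reads 100%. The raw scores below are hypothetical, chosen only to reproduce the 93.9% ratio from the example.

```python
# Hypothetical raw improvement scores (not from Minitab's output);
# only the ratios matter for the relative importance display.
raw = {"Mix Time": 4.27, "Kiln Temperature": 4.01, "Bake Time": 2.10}

top = max(raw.values())
relative = {name: 100 * score / top for name, score in raw.items()}
# Mix Time scales to 100.0; Kiln Temperature scales to about 93.9
```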

Use the partial dependence plots to gain insight into how the important variables or pairs of variables affect the fitted response values. For a binary response, the fitted response values are on the 1/2 log odds scale. The partial dependence plots show whether the relationship between the response and a variable is linear, monotonic, or more complex.

The one-predictor partial dependence plots show that medium values for Mix Time, Kiln Temperature, and Bake Time increase the odds of an acceptable pretzel. A medium value of Dry Time decreases the odds of an acceptable pretzel. The researchers can select One-Predictor Plots to produce plots for other variables.

The two-predictor partial dependence plot of Mix Time and Kiln Temperature shows a more complex relationship between the two variables and the response. While a medium value of either variable individually increases the odds of an acceptable pretzel, the plot shows that the best odds occur when both variables are at medium values simultaneously. The researchers can select Two-Predictor Plots to produce plots for other pairs of variables.
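The idea behind a one-predictor partial dependence curve can be sketched in a few lines: sweep one predictor over a grid while averaging the model's predictions over the observed values of every other predictor. The sketch below is an assumption-laden stand-in, not Minitab's implementation: it uses scikit-learn on synthetic data, and it averages predicted event probabilities rather than TreeNet's 1/2 log odds scale.

```python
# Minimal partial dependence sketch. Assumptions: scikit-learn model on
# synthetic data; curve is on the probability scale, not TreeNet's
# 1/2 log odds scale.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """Average predicted event probability as `feature` sweeps the grid."""
    curve = []
    for value in grid:
        Xg = X.copy()
        Xg[:, feature] = value  # hold the chosen predictor fixed...
        # ...and average the model's predictions over all other predictors
        curve.append(model.predict_proba(Xg)[:, 1].mean())
    return np.array(curve)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
pd_curve = partial_dependence(model, X, 0, grid)
```

A two-predictor plot is the same computation over a two-dimensional grid, which is what reveals interactions like the Mix Time by Kiln Temperature surface described above.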

Method

Criterion for selecting optimal number of trees     Maximum loglikelihood
Model validation                                    70/30% training/test sets
Learning rate                                       0.05
Subsample selection method                          Completely random
    Subsample fraction                              0.5
Maximum terminal nodes per tree                     6
Minimum terminal node size                          3
Number of predictors selected for node splitting    Total number of predictors = 29
Rows used                                           5000

Binary Response Information



                               Training               Test
Variable            Class      Count       %      Count       %
Acceptable Pretzel  1 (Event)   2160   61.82        943   62.62
                    0           1334   38.18        563   37.38
                    All         3494  100.00       1506  100.00

Model Selection by Eliminating Unimportant Predictors

                        Test
        Optimal    Average         Number of
Model   Number     -Loglikelihood  Predictors  Eliminated Predictors
        of Trees
  1        268       0.273936          29      None
  2        268       0.274186          27      Foam Stability, Bulk Density
  3        234       0.273843          26      Least Gelation Concentration
  4        233       0.274350          25      Oven Mode 2
  5        232       0.274943          24      Kiln Method
  6        273       0.275553          23      Oven Mode 1
  7        244       0.274811          22      Mix Speed
  8        268       0.274258          21      Oven Mode 3
  9        272       0.274185          20      Resting Surface
 10        232       0.274077          19      Bake Temperature 3
 11        287       0.273598          18      Mix Tool
 12        227       0.274358          17      Bake Temperature 1
 13        276       0.275374          16      Rest Time
 14        272       0.276082          15      Water
 15        268       0.275595          14      Caustic Concentration
 16        268       0.277810          13      Swelling Capacity
 17        253       0.276436          12      Emulsion Stability
 18        231       0.276159          11      Emulsion Activity
 19        268       0.273537          10      Water Absorption Capacity
 20        260       0.273455           9      Oil Absorption Capacity
 21        299       0.272848           8      Flour Protein
 22        278       0.272629           7      Foam Capacity
 23*       299       0.267184           6      Flour Size
 24        297       0.288621           5      Bake Temperature 2
 25        234       0.330342           4      Dry Time
 26        290       0.305993           3      Gelatinization Temperature
 27        245       0.534345           2      Bake Time
 28        146       0.599837           1      Kiln Temperature
The algorithm removed one predictor and any predictors with 0 importance at each step.
* Selected model has minimum average -loglikelihood. Output for the selected model follows.

Model Summary

Total predictors           6
Important predictors       6
Number of trees grown      300
Optimal number of trees    299

Statistics                        Training              Test
Average -loglikelihood              0.2418            0.2672
Area under ROC curve                0.9661            0.9412
    95% CI                (0.9608, 0.9713)  (0.9295, 0.9529)
Lift                                1.6176            1.5970
Misclassification rate              0.0970            0.0963

Confusion Matrix


                     Predicted Class (Training)        Predicted Class (Test)
Actual Class   Count     1      0  % Correct     Count    1    0  % Correct
1 (Event)       2160  1942    218      89.91       943  846   97      89.71
0               1334   121   1213      90.93       563   48  515      91.47
All             3494  2063   1431      90.30      1506  894  612      90.37
Assign a row to the event class if the event probability for the row exceeds 0.5.

Statistics                                   Training (%)   Test (%)
True positive rate (sensitivity or power)           89.91      89.71
False positive rate (type I error)                   9.07       8.53
False negative rate (type II error)                 10.09      10.29
True negative rate (specificity)                    90.93      91.47
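The rate statistics follow directly from the confusion-matrix counts. As a quick check, the test-set rates can be recomputed from the counts in the table above:

```python
# Test-set counts from the confusion matrix above
tp, fn = 846, 97   # actual events: predicted 1 / predicted 0
tn, fp = 515, 48   # actual non-events: predicted 0 / predicted 1

sensitivity = 100 * tp / (tp + fn)  # true positive rate: ~89.71
specificity = 100 * tn / (tn + fp)  # true negative rate: ~91.47
fpr = 100 * fp / (tn + fp)          # type I error rate:  ~8.53
fnr = 100 * fn / (tp + fn)          # type II error rate: ~10.29
```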

Misclassification


                         Training                         Test
Actual Class    Count  Misclassed  % Error     Count  Misclassed  % Error
1 (Event)        2160         218    10.09       943          97    10.29
0                1334         121     9.07       563          48     8.53
All              3494         339     9.70      1506         145     9.63
Assign a row to the event class if the event probability for the row exceeds 0.5.
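The overall % Error figures are just the misclassified counts divided by the row totals, which can be verified directly from the table:

```python
# Overall misclassification rates recomputed from the counts above
train_err = 100 * 339 / 3494  # training: ~9.70%
test_err = 100 * 145 / 1506   # test: ~9.63%
```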