Example of Discover Key Predictors with TreeNet® Classification

Note

This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

A team of researchers collects data about factors that affect a quality characteristic of baked pretzels. Variables include process settings, like Mix Tool, and grain properties, like Flour Protein.

As part of the initial exploration of the data, the researchers decide to use Discover Key Predictors to compare models by sequentially removing unimportant predictors to identify key predictors. The researchers hope to identify key predictors that have large effects on the quality characteristic and to gain more insights into the relationships among the quality characteristic and the key predictors.

  1. Open the sample data, PretzelAcceptability.MTW.
  2. Choose Predictive Analytics Module > TreeNet® Classification > Discover Key Predictors.
  3. From the drop-down list, select Binary response.
  4. In Response, enter Acceptable Pretzel.
  5. In Response event, select 1 to indicate that the pretzel is acceptable.
  6. In Continuous predictors, enter Flour Protein-Bulk Density.
  7. In Categorical predictors, enter Mix Tool-Kiln Method.
  8. Click Discover Key Predictors
  9. In Maximum number of elimination steps enter 29.
  10. Click OK in each dialog box.

Interpret the results

For this analysis, Minitab Statistical Software compares 28 models. The number of steps is less than the maximum number of steps because the Foam Stability predictor has an importance score of 0 in the first model, so the algorithm eliminates 2 variables in the first step. The asterisk in the Model column of the Model Evaluation table shows that the model with the smallest value of the average –loglikelihood statistic is model 23. The results that follow the model evaluation table are for model 23.

Although model 23 has the smallest value of the average –loglikelihood statistic, other models have similar values. The team can click Select an Alternative Model to produce results for other models from the Model Evaluation table.

In the results for Model 23, the Average –Loglikelihood vs. Number of Trees Plot shows that the optimal number of trees is almost the number of trees in the analysis. The team can click Tune Hyperparameters to Identify a Better Model to increase the number of trees and to see whether changes to other hyperparameters improve the performance of the model.

The Relative Variable Importance graph plots the predictors in order of their effect on model improvement when splits are made on a predictor over the sequence of trees. The most important predictor variable is Mix Time. If the importance of the top predictor variable, Mix Time, is 100%, then the next important variable, Kiln Temperature, has a contribution of 93.9%. This means that Kiln Temperature is 93.9% as important as Mix Time.

Use the partial dependency plots to gain insight into how the important variables or pairs of variables affect the fitted response values. The fitted response values are on the 1/2 log scale. The partial dependence plots show whether the relationship between the response and a variable is linear, monotonic, or more complex.

The one predictor partial dependence plots show that medium values for Mix Time, Kiln Temperature and Bake Time increase the odds of an acceptable pretzel. A medium value of Dry Time decreases the odds of an acceptable pretzel. The researchers can click Select More Predictors to Plot to produce plots for other variables.

The two-predictor partial dependence plot of Mix Time and Kiln Temperature shows a more complex relationship between the two variables and the response. While medium values of Mix Time and Kiln Temperature increase the odds of an acceptable pretzel, the plot shows that the best odds occur when both variables are at medium values. The researchers can click Select More Predictors to Plot to produce plots for other pairs of variables.

TreeNet® Classification: Acceptable P vs Flour Protei, Water, Mix Time, ...

Method Criterion for selecting optimal number of trees Maximum loglikelihood Model validation 70/30% training/test sets Learning rate 0.05 Subsample selection method Completely random Subsample fraction 0.5 Maximum terminal nodes per tree 6 Minimum terminal node size 3 Number of predictors selected for node splitting Total number of predictors = 29 Rows used 5000
Binary Response Information Training Test Variable Class Count % Count % Acceptable Pretzel 1 (Event) 2160 61.82 943 62.62 0 1334 38.18 563 37.38 All 3494 100.00 1506 100.00

One Predictor Partial Dependence Plots

Select More Predictors to Plot...

Two Predictor Partial Dependence Plots

Select More Predictors to Plot...

TreeNet® Classification - Discover Key Predictors: Acceptable P vs Flour Protei, Water, ...

Predictor Elimination Plot

Model Selection by Eliminating Unimportant Predictors Test Optimal Number Average Number of Model of Trees -Loglikelihood Predictors Eliminated Predictors 1 268 0.273936 29 None 2 268 0.274186 27 Foam Stability, Bulk Density 3 234 0.273843 26 Least Gelation Concentration 4 233 0.274350 25 Oven Mode 2 5 232 0.274943 24 Kiln Method 6 273 0.275553 23 Oven Mode 1 7 244 0.274811 22 Mix Speed 8 268 0.274258 21 Oven Mode 3 9 272 0.274185 20 Resting Surface 10 232 0.274077 19 Bake Temperature 3 11 287 0.273598 18 Mix Tool 12 227 0.274358 17 Bake Temperature 1 13 276 0.275374 16 Rest Time 14 272 0.276082 15 Water 15 268 0.275595 14 Caustic Concentration 16 268 0.277810 13 Swelling Capacity 17 253 0.276436 12 Emulsion Stability 18 231 0.276159 11 Emulsion Activity 19 268 0.273537 10 Water Absorption Capacity 20 260 0.273455 9 Oil Absorption Capacity 21 299 0.272848 8 Flour Protein 22 278 0.272629 7 Foam Capacity 23* 299 0.267184 6 Flour Size 24 297 0.288621 5 Bake Temperature 2 25 234 0.330342 4 Dry Time 26 290 0.305993 3 Gelatinization Temperature 27 245 0.534345 2 Bake Time 28 146 0.599837 1 Kiln Temperature The algorithm removed one predictor and any predictors with 0 importance at each step. * Selected model has minimum average -loglikelihood. Output for the selected model follows.

One Predictor Partial Dependence Plots

Select More Predictors to Plot...

Two Predictor Partial Dependence Plots

Select More Predictors to Plot...

TreeNet® Classification: Acceptable P vs Mix Time, Bake Time, Dry Time, ...

Model Summary Total predictors 6 Important predictors 6 Number of trees grown 300 Optimal number of trees 299 Statistics Training Test Average -loglikelihood 0.2418 0.2672 Area under ROC curve 0.9661 0.9412 95% CI (0.9608, 0.9713) (0.9295, 0.9529) Lift 1.6176 1.5970 Misclassification rate 0.0970 0.0963

One Predictor Partial Dependence Plots

Select More Predictors to Plot...

Two Predictor Partial Dependence Plots

Select More Predictors to Plot...

TreeNet® Classification: Acceptable P vs Mix Time, Bake Time, Dry Time, ...

Confusion Matrix Predicted Class (Training) Predicted Class (Test) Actual Class Count 1 0 % Correct Count 1 0 % Correct 1 (Event) 2160 1942 218 89.91 943 846 97 89.71 0 1334 121 1213 90.93 563 48 515 91.47 All 3494 2063 1431 90.30 1506 894 612 90.37 Assign a row to the event class if the event probability for the row exceeds 0.5.
Statistics Training (%) Test (%) True positive rate (sensitivity or power) 89.91 89.71 False positive rate (type I error) 9.07 8.53 False negative rate (type II error) 10.09 10.29 True negative rate (specificity) 90.93 91.47
Misclassification Training Test Actual Class Count Misclassed % Error Count Misclassed % Error 1 (Event) 2160 218 10.09 943 97 10.29 0 1334 121 9.07 563 48 8.53 All 3494 339 9.70 1506 145 9.63 Assign a row to the event class if the event probability for the row exceeds 0.5.

One Predictor Partial Dependence Plots

Select More Predictors to Plot...

Two Predictor Partial Dependence Plots

Select More Predictors to Plot...