Overview of Fit Model and Discover Key Predictors for TreeNet® Classification

Note

This command is available with the Predictive Analytics Module.

Use TreeNet® Classification to produce gradient boosted classification trees for a categorical response with many continuous and categorical predictor variables. TreeNet® Classification is a data mining technology developed by Jerome Friedman, a leading data mining researcher. The tool is flexible, generates accurate models quickly, and tolerates messy and incomplete data well.

For example, a market researcher can use TreeNet® Classification to identify customers who have higher response rates to specific initiatives and to predict those response rates.

CART® Classification is a good tool for exploratory data analysis and provides an easy-to-understand model that identifies important predictors quickly. After initial exploration with CART® Classification, consider TreeNet® Classification as a follow-up step. TreeNet® Classification provides a higher-performance, more complex model that can consist of several hundred small trees, each of which contributes a small amount to the overall model. From the TreeNet® Classification results, you can gain insight into the relationship between a categorical response and the important predictors among many candidate predictors, and you can predict response class probabilities for new observations accurately.
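The idea that many small trees each contribute a small amount can be illustrated with a minimal sketch of gradient boosting for a binary response. This is not the TreeNet® implementation. It assumes a single continuous predictor, one-split trees (stumps), leaf values equal to the mean residual, and a fixed learning rate. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the single split on xs
    that best fits the residuals in the least-squares sense."""
    best = None
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

def fit_boosted_stumps(xs, ys, n_trees=200, learning_rate=0.1):
    """Fit an additive model on the log-odds scale, one small tree at a time."""
    base_rate = sum(ys) / len(ys)
    base_score = math.log(base_rate / (1.0 - base_rate))  # start at the overall log-odds
    scores = [base_score] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the logistic loss.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        # Each tree moves the scores by only a small, shrunken step.
        scores = [s + learning_rate * (left_value if x <= threshold else right_value)
                  for x, s in zip(xs, scores)]
    return {"base_score": base_score, "learning_rate": learning_rate, "stumps": stumps}

def predict_probability(model, x, n_trees=None):
    """Sum the contributions of the first n_trees stumps (all stumps if None)."""
    score = model["base_score"]
    for threshold, left_value, right_value in model["stumps"][:n_trees]:
        score += model["learning_rate"] * (left_value if x <= threshold else right_value)
    return sigmoid(score)

# Illustrative data only: the event becomes more likely as x grows.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
model = fit_boosted_stumps(xs, ys)
print(predict_probability(model, 1), predict_probability(model, 8))
```

Because each stump is scaled by the learning rate, no single tree dominates the prediction. The fitted probability comes from the accumulated sum of all the small contributions.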

The TreeNet® Classification analysis provides one-predictor and two-predictor partial dependence plots. These plots help you evaluate how changes in key predictor variables affect the predicted response. This information can be useful for choosing the settings that lead to optimal production outcomes.
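As a rough illustration of what a one-predictor partial dependence curve represents, the sketch below fixes one predictor at each grid value in every row, then averages the model's predictions. The function name, the row layout, and the stand-in prediction function are assumptions for illustration. The scale that the software plots may differ.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """For each grid value, overwrite one predictor in every row and
    average the resulting predictions."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the original data is unchanged
            modified[feature_index] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve

# Hypothetical stand-in for a fitted model's prediction function.
def toy_predict(row):
    return 2.0 * row[0] + row[1]

rows = [[0.0, 1.0], [0.0, 3.0]]
curve = partial_dependence(toy_predict, rows, feature_index=0, grid=[0.0, 1.0])
print(curve)
```

A two-predictor version follows the same pattern with a grid over pairs of values and two overwritten columns.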

The TreeNet® Classification analysis also lets you try different hyperparameter values for a model. The learning rate and the subsample fraction are examples of hyperparameters. Exploring different values is a common way to improve model performance.
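Exploring hyperparameter values can be pictured as a simple grid search. The sketch below assumes a caller-supplied function, here named fit_and_score, that fits a model for one combination and returns a score where larger is better. The toy scoring function is only a placeholder for a real fit-and-validate cycle and does not reflect actual model behavior.

```python
from itertools import product

def grid_search(fit_and_score, learning_rates, subsample_fractions):
    """Evaluate every combination and return the best one with all results."""
    results = []
    for learning_rate, fraction in product(learning_rates, subsample_fractions):
        score = fit_and_score(learning_rate, fraction)
        results.append(((learning_rate, fraction), score))
    best = max(results, key=lambda item: item[1])
    return best, results

# Placeholder score surface (an assumption, not a real model fit).
def toy_fit_and_score(learning_rate, fraction):
    return -((learning_rate - 0.1) ** 2) - (fraction - 0.5) ** 2

best, results = grid_search(toy_fit_and_score, [0.01, 0.1, 0.5], [0.5, 0.7, 1.0])
print(best[0])
```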

For a more complete introduction to the CART® methodology, see Breiman, Friedman, Olshen and Stone (1984)¹ and Zhang and Singer (2010)².

Fit Model

Use Fit Model to build a single gradient boosted classification tree model for a categorical response with many continuous and categorical predictor variables. The results are for the model from the learning process that has the maximum log-likelihood, the maximum area under the ROC curve, or the minimum misclassification rate.
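For a binary response, the three criteria can be computed from predicted event probabilities as in the sketch below. The function names, the 0.5 classification threshold, and the table of predictions by number of trees are illustrative assumptions, not output from the software.

```python
import math

def log_likelihood(y_true, probs, eps=1e-12):
    """Sum of log predicted probabilities of the observed classes."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def roc_auc(y_true, probs):
    """Fraction of (event, non-event) pairs ranked correctly; ties count half."""
    events = [p for y, p in zip(y_true, probs) if y == 1]
    non_events = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in non_events)
    return wins / (len(events) * len(non_events))

def misclassification_rate(y_true, probs, threshold=0.5):
    """Share of rows where the thresholded prediction differs from the class."""
    errors = sum(1 for y, p in zip(y_true, probs) if (p >= threshold) != (y == 1))
    return errors / len(y_true)

# Made-up predicted probabilities after different numbers of trees.
y = [0, 0, 1, 1]
staged = {
    50: [0.40, 0.45, 0.55, 0.60],
    200: [0.10, 0.40, 0.35, 0.80],
    400: [0.05, 0.70, 0.30, 0.95],
}
best_by_loglik = max(staged, key=lambda n: log_likelihood(y, staged[n]))
best_by_auc = max(staged, key=lambda n: roc_auc(y, staged[n]))
best_by_error = min(staged, key=lambda n: misclassification_rate(y, staged[n]))
print(best_by_loglik, best_by_auc, best_by_error)
```

As the made-up numbers show, the criteria do not always agree on the same model, so the choice of criterion can change which model the results describe.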

Discover Key Predictors

For a dataset with many predictors, where some predictors have less effect on the response than others, consider using Discover Key Predictors to eliminate unimportant predictors from the model. Removing the unimportant predictors helps to clarify the effects of the most important predictors and can improve the prediction accuracy. The algorithm removes the least important predictors sequentially, shows results that let you compare models with different numbers of predictors, and produces results for the set of predictors with the best value of the accuracy criterion.
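The sequential removal described above can be sketched as a backward elimination loop. This is a simplified illustration, not the software's algorithm. It assumes one predictor is removed per step and that a caller-supplied evaluate function returns an accuracy score (larger is better) and an importance value for each predictor. The stand-in evaluator and predictor names are hypothetical.

```python
def discover_key_predictors(predictors, evaluate, min_predictors=1):
    """Repeatedly drop the least important predictor, record each model's
    score, and return the predictor set with the best score."""
    current = list(predictors)
    history = []
    while True:
        score, importance = evaluate(current)
        history.append((list(current), score))
        if len(current) <= min_predictors:
            break
        least_important = min(current, key=lambda name: importance[name])
        current.remove(least_important)
    best_predictors, best_score = max(history, key=lambda item: item[1])
    return best_predictors, best_score, history

# Stand-in evaluator (an assumption): two predictors carry signal and each
# noise predictor slightly lowers the score. A real evaluator would fit a model.
SIGNAL = {"age": 0.5, "income": 0.3}
NOISE = {"noise_a": 0.03, "noise_b": 0.02, "noise_c": 0.01}

def toy_evaluate(names):
    score = sum(SIGNAL.get(n, 0.0) for n in names)
    score -= 0.02 * sum(1 for n in names if n in NOISE)
    importance = {n: SIGNAL.get(n, NOISE.get(n, 0.0)) for n in names}
    return score, importance

all_names = list(SIGNAL) + list(NOISE)
best_predictors, best_score, history = discover_key_predictors(all_names, toy_evaluate)
for names, score in history:
    print(len(names), round(score, 2), names)
```

The recorded history plays the role of the comparison across models with different numbers of predictors, and the returned set is the one with the best score.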

For example, a market researcher uses Discover Key Predictors to automatically identify a dozen or so predictors from a set of 500 predictors that effectively model which customers have higher response rates to specific initiatives.

Discover Key Predictors can also remove the most important predictors to quantitatively assess the effect of each important predictor on the prediction accuracy of a model.

Where to find this analysis

To use Fit Model, choose Predictive Analytics Module > TreeNet® Classification > Fit Model.

To use Discover Key Predictors, choose Predictive Analytics Module > TreeNet® Classification > Discover Key Predictors.

When to use an alternate analysis

If you want to try a parametric regression model with a binary response variable, use Fit Binary Logistic Model.

To compare the performance with that of a Random Forests® Classification model, use Random Forests® Classification.

1 Breiman, Friedman, Olshen & Stone. (1984). Classification and Regression Trees. Boca Raton, Florida: Chapman & Hall/CRC.
2 Zhang & Singer. (2010). Recursive Partitioning and Applications. New York, New York: Springer.