Overview of Fit Model and Discover Key Predictors for TreeNet® Regression

Note

This command is available with the Predictive Analytics Module.

Use TreeNet® Regression to produce gradient boosted regression trees for a continuous response with many continuous and categorical predictor variables. TreeNet® Regression is a revolutionary advance in data mining technology developed by Jerome Friedman, one of the world's outstanding data mining researchers. This flexible and powerful data mining tool consistently generates highly accurate models with exceptional speed and a high tolerance for messy and incomplete data.

For example, a medical researcher can use TreeNet® Regression to identify patients who have higher response rates to specific treatments and to predict those response rates.

CART® Regression is a good exploratory data analysis tool and provides an easy-to-understand model for quickly identifying important predictors. However, after initial exploration with CART® Regression, consider TreeNet® Regression as a necessary follow-up step. TreeNet® Regression provides a high-performance and more complex model that can consist of several hundred small trees, each of which contributes a small amount to the overall model. From the TreeNet® Regression results, you can gain insight into the relationship between a continuous response and the most important of many candidate predictors, and predict responses for new observations with high accuracy.
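
TreeNet® Regression's boosting algorithm is proprietary, but the general idea of a sequential ensemble of small trees can be sketched with the open-source scikit-learn library. The following Python sketch uses hypothetical simulated data and the analogous GradientBoostingRegressor, not TreeNet® itself:

    # A minimal sketch of gradient boosted regression trees (an open-source
    # analogue of the TreeNet approach, not Minitab's proprietary algorithm).
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    # Hypothetical data: a continuous response and many candidate predictors.
    X, y = make_regression(n_samples=1000, n_features=50, noise=10.0, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Several hundred shallow trees; the learning rate scales each tree's
    # small contribution to the overall model.
    model = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                      learning_rate=0.1, random_state=1)
    model.fit(X_train, y_train)
    print("Test R-squared:", model.score(X_test, y_test))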

The TreeNet® Regression analysis provides one and two predictor partial dependency plots. These plots help you evaluate how changes in key predictor variables affect response values. This information may be useful for controlling the settings that produce an optimal outcome.
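
Minitab produces the partial dependency plots in the analysis output. As a rough open-source analogue, scikit-learn can draw one and two predictor partial dependence plots for the model fitted in the sketch above; the feature indices below are hypothetical:

    # Sketch: one and two predictor partial dependence plots, continuing from
    # the fitted `model` and `X_train` in the previous sketch.
    import matplotlib.pyplot as plt
    from sklearn.inspection import PartialDependenceDisplay

    # One-predictor plots for features 0 and 3, plus a two-predictor plot
    # for their joint effect (the feature indices are hypothetical).
    PartialDependenceDisplay.from_estimator(model, X_train, features=[0, 3, (0, 3)])
    plt.show()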

The TreeNet® Regression analysis also lets you try different hyperparameters for a model. The learning rate and the subsample fraction are examples of hyperparameters. Exploring different values is a common way to improve model performance.
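
In the scikit-learn analogue, trying different values of these hyperparameters can be sketched with a small cross-validated grid search; the candidate values below are arbitrary illustrations:

    # Sketch: explore learning rates and subsample fractions by grid search,
    # continuing from `X_train` and `y_train` in the first sketch.
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "learning_rate": [0.01, 0.05, 0.1],  # smaller rates usually need more trees
        "subsample": [0.5, 0.8, 1.0],        # fraction of rows sampled for each tree
    }
    search = GridSearchCV(GradientBoostingRegressor(n_estimators=300, random_state=1),
                          param_grid, scoring="r2", cv=5)
    search.fit(X_train, y_train)
    print("Best hyperparameters:", search.best_params_)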

For a more complete introduction to the CART® methodology, see Breiman, Friedman, Olshen, and Stone (1984)1 and Zhang and Singer (2010)2.

Fit Model

Use Fit Model to build a single gradient boosted regression tree model for a continuous response with many continuous and categorical predictor variables. The results are for the model from the learning process with the maximum R² value or the minimum least absolute deviation.
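
Keeping the model from the best iteration of the learning process can be illustrated with scikit-learn's staged predictions (again an analogue, not Minitab's implementation):

    # Sketch: compute test-set R-squared after each boosting iteration and
    # report the best one, continuing from the first sketch's `model` and data.
    from sklearn.metrics import r2_score

    r2_by_iteration = [r2_score(y_test, y_pred)
                       for y_pred in model.staged_predict(X_test)]
    best = max(range(len(r2_by_iteration)), key=r2_by_iteration.__getitem__)
    print("Best number of trees:", best + 1)
    print("Maximum R-squared:", r2_by_iteration[best])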

Discover Key Predictors

For a dataset with many predictors, where some predictors have less effect on the response than others, consider the use of Discover Key Predictors to eliminate unimportant predictors from the model. The removal of the unimportant predictors helps to clarify the effects of the most important predictors and improves the prediction accuracy. The algorithm sequentially removes the least important predictors, shows results that let you compare models with different numbers of predictors, and produces results for the set of predictors with the best value of the accuracy criterion.
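
This sequential elimination resembles recursive feature elimination. A hedged scikit-learn sketch, not Minitab's exact algorithm, continuing from the first sketch's training data:

    # Sketch: drop the least important predictor one at a time and keep the
    # predictor set with the best cross-validated R-squared (an RFE-style analogue).
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.feature_selection import RFECV

    selector = RFECV(GradientBoostingRegressor(n_estimators=100, random_state=1),
                     step=1, cv=5, scoring="r2")
    selector.fit(X_train, y_train)
    print("Number of predictors kept:", selector.n_features_)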

For example, a chemist uses Discover Key Predictors to automatically identify a dozen or so predictors from a set of 500 predictors that effectively model viscosity in a new fuel blend.

Discover Key Predictors can also remove the most important predictors to quantitatively assess the effect of each important predictor on the prediction accuracy of a model.
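
One way to picture this assessment, under the same scikit-learn analogue, is to refit the model without its most important predictor and compare the accuracy criterion:

    # Sketch: quantify the top predictor's effect by refitting without it,
    # continuing from the first sketch's `model`, training, and test data.
    import numpy as np

    top = int(np.argmax(model.feature_importances_))  # most important predictor
    reduced = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                        learning_rate=0.1, random_state=1)
    reduced.fit(np.delete(X_train, top, axis=1), y_train)
    loss = model.score(X_test, y_test) - reduced.score(np.delete(X_test, top, axis=1), y_test)
    print("R-squared lost by removing predictor", top, ":", round(loss, 3))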

Where to find this analysis

To perform Fit Model, choose Predictive Analytics Module > TreeNet® Regression > Fit Model.

To perform Discover Key Predictors, choose Predictive Analytics Module > TreeNet® Regression > Discover Key Predictors.

When to use an alternate analysis

If you want to try a parametric regression model with a continuous response variable, use Fit Regression Model.

To compare performance with a Random Forests® Regression model, use Random Forests® Regression.

1 Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and Regression Trees. Boca Raton, Florida: Chapman & Hall/CRC.
2 Zhang, H., & Singer, B.H. (2010). Recursive Partitioning and Applications. New York, New York: Springer.