Overview for Random Forests® Classification

Note

This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

Use Random Forests® Classification to create a high-performance prediction model for a categorical response with many continuous and categorical predictor variables. Random Forests® Classification combines information from many CART® trees to provide a substantial advance in data mining technology.

Random Forests® Classification provides insights for a wide range of applications, including manufacturing quality control, drug discovery, fraud detection, credit scoring, and churn prediction. Use the results to identify important variables, to identify groups in the data with desirable characteristics, and to predict response values for new observations. For example, a market researcher can use Random Forests® Classification to identify customers that have higher response rates to specific initiatives and to predict those response rates.

CART® Classification is a good data exploratory analysis tool and provides an easy-to-understand model to identify important predictors quickly. However, after initial exploration with CART® Classification, consider TreeNet® Classification or Random Forests® Classification as a necessary follow-up step.

Random Forests® Classification output includes relative variable importance charts, ROC curves, and lift and gain charts. These plots help you to evaluate whether the variables in the model predict the response classes with high accuracy and help you to identify the most important predictors for prediction accuracy. This information is useful when you want to control the settings that enable an optimal production outcome.

The method was developed by Leo Breiman and Adele Cutler of the University of California, Berkeley.

Where to find this analysis

To perform a Random Forests® Classification, choose Predictive Analytics Module > Random Forests® Classification.

When to use an alternate analysis

If you want to try a parametric regression model with a categorical response variable, use Fit Binary Logistic Model.