Method table for Random Forests® Classification

Note

This command is available with the Predictive Analytics Module.

Find definitions and interpretation guidance for the Method table.

Model validation

Random Forests® Classification uses out-of-bag validation for every analysis. If you select validation with a test set in addition to out-of-bag validation, then the table displays the column that identifies the test set or the percentage of the data in the test and training sets.

Number of bootstrap samples

The number of bootstrap samples indicates the number of trees in the analysis. When you use only out-of-bag validation, the size of each bootstrap sample is the same as the number of rows in the analysis. When you use validation with a test set, the default sample size is the same as the training data size. If you choose to use a sample size smaller than the training data size, the table displays that size.
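As a rough illustration of how one bootstrap sample relates to its out-of-bag rows, the following Python sketch draws row indices with replacement. The function name `bootstrap_and_oob`, its parameters, and the fixed seed are assumptions made for this example; it is not the software's implementation.

```python
import random

def bootstrap_and_oob(n_rows, sample_size=None, seed=0):
    """Draw one bootstrap sample of row indices.

    Returns (in_bag, out_of_bag). By default the sample size equals
    the number of rows, matching the default described above.
    """
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = n_rows
    # Sampling with replacement: the same row can be drawn more than once.
    in_bag = [rng.randrange(n_rows) for _ in range(sample_size)]
    drawn = set(in_bag)
    # Rows that were never drawn are "out of bag" for this tree.
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag

in_bag, out_of_bag = bootstrap_and_oob(n_rows=10)
print("in-bag rows:", sorted(in_bag))
print("out-of-bag rows:", out_of_bag)
```

Each tree gets its own bootstrap sample, so the rows that are out of bag differ from tree to tree.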

Number of predictors selected for node splitting

This row indicates whether the node splitting considers every predictor at each node or a random subset of the predictors. If the node splitting uses a random subset, this row indicates the choice for the number of predictors to consider.

If you use all the predictors initially, consider whether to use a subset of predictors in subsequent models to compare the performance of the models.
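The difference between considering every predictor and considering a random subset at a node can be sketched as follows. The helper `predictors_for_node`, the predictor names, and the choice of `k=2` are hypothetical and serve only to illustrate the idea.

```python
import random

def predictors_for_node(predictors, k=None, rng=None):
    """Return the predictors that one node may consider for its split.

    k=None means the node considers every predictor; otherwise the
    node considers a random subset of k predictors.
    """
    if k is None or k >= len(predictors):
        return list(predictors)
    rng = rng or random.Random()
    # Sample without replacement so no predictor appears twice.
    return rng.sample(list(predictors), k)

predictors = ["age", "income", "region", "tenure"]  # hypothetical names
rng = random.Random(1)

subset = predictors_for_node(predictors, k=2, rng=rng)
everything = predictors_for_node(predictors)
print("random subset:", subset)
print("all predictors:", everything)
```

A fresh subset is drawn at every node, so different nodes in the same tree can consider different predictors.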

Minimum internal node size

The minimum internal node size indicates the minimum number of cases a node can have and still split into more nodes. If the model performance is inadequate, consider whether to increase this value to see the effect on the performance.
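The rule can be written as a single comparison. This is a minimal sketch that assumes the threshold is inclusive, as the definition above suggests; the function name `can_split` and the node sizes are made up for illustration.

```python
def can_split(n_cases, min_internal_node_size):
    """A node may split only if it has at least the minimum number of cases."""
    return n_cases >= min_internal_node_size

node_sizes = [2, 5, 12]  # hypothetical numbers of cases in three nodes
splittable = [n for n in node_sizes if can_split(n, min_internal_node_size=5)]
print(splittable)
```

Raising the minimum stops splitting earlier, which tends to produce smaller trees.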

Missing value penalty

By default, the analysis does not have a missing value penalty and this row is not present. The missing value penalty penalizes a predictor variable for the proportion of missing values. A variable with a high penalty is less likely to become the splitter for a node.
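To make the effect concrete, the sketch below scales a predictor's split improvement down in proportion to its share of missing values. The formula, the function name `penalized_improvement`, and all the numbers are assumptions chosen for illustration; the exact penalty calculation that the software uses is not given here.

```python
def penalized_improvement(improvement, missing_fraction, penalty):
    """Hypothetical penalty: shrink the improvement as missingness grows.

    penalty=0 leaves the improvement unchanged (no penalty).
    The result is floored at 0 so it never becomes negative.
    """
    factor = max(0.0, 1.0 - penalty * missing_fraction)
    return improvement * factor

# Two hypothetical candidate splitters with the same raw improvement.
complete = penalized_improvement(0.5, missing_fraction=0.0, penalty=1.0)
half_missing = penalized_improvement(0.5, missing_fraction=0.5, penalty=1.0)
print(complete, half_missing)
```

Under this assumed formula, the predictor with more missing values ends up with the lower adjusted improvement, so it is less likely to be chosen as the splitter.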

High level category penalty

By default, the analysis does not have a high level category penalty and this row is not present. At each node, the high level category penalty penalizes a categorical predictor based on its number of levels relative to the size of the node. Thus, a competitor with many levels in a node is less likely to become the splitter for that node.

Rows used

The number of response observations that are in the analysis.

Rows unused

The number of missing response observations.
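The two counts can be reproduced from a response column as in the sketch below. It assumes that a missing response is represented by `None`; the function name `count_rows` and the data are hypothetical.

```python
def count_rows(responses):
    """Return (rows_used, rows_unused) for a response column.

    A row is unused when its response value is missing (None here).
    """
    used = sum(1 for value in responses if value is not None)
    unused = len(responses) - used
    return used, unused

responses = ["Yes", "No", None, "Yes", None]  # hypothetical response column
rows_used, rows_unused = count_rows(responses)
print("Rows used:", rows_used)
print("Rows unused:", rows_unused)
```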