Specify the validation method for Random Forests® Classification

Predictive Analytics Module > Random Forests® Classification > Validation

This command is available with the Predictive Analytics Module.

Select whether to validate with a test data set in addition to the out-of-bag validation.

Validation with out-of-bag data

Random Forests® Classification builds every tree from a bootstrap sample. Each row in a bootstrap sample is selected at random, with replacement, from the original data set, so some rows are selected more than once and others are not selected at all. The rows that are left out of a tree's bootstrap sample form the out-of-bag data set for that tree. Because the tree was not fit to those rows, they can be used to validate it.
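The sampling described above can be sketched in a few lines of Python. This is an illustration only, not Minitab's implementation: the function name bootstrap_with_oob and the seed parameter are hypothetical, and the sketch assumes that each bootstrap sample has as many rows as the original data set. Under that assumption, roughly 1/e (about 37%) of the rows are out-of-bag for any one tree.

```python
import random

def bootstrap_with_oob(n_rows, seed=None):
    """Return (in_bag, out_of_bag) row indices for a single tree.

    Hypothetical helper for illustration; assumes the bootstrap sample
    has the same number of rows as the original data set.
    """
    rng = random.Random(seed)
    # Draw n_rows indices with replacement: this is the bootstrap sample.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows that were never drawn form this tree's out-of-bag data set.
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag

in_bag, oob = bootstrap_with_oob(1000, seed=1)
# The out-of-bag share is usually close to 1/e, about 0.37.
print(f"out-of-bag fraction: {len(oob) / 1000:.3f}")
```

Each tree gets its own bootstrap sample, so a row that is in-bag for one tree is often out-of-bag for another.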

Validation with a test set in addition to out-of-bag data

Complete the following steps to specify which rows to use for training and which to use for testing. If you select Validation with a test set in addition to out-of-bag data, by default, Minitab uses 30% of the data for testing.

  1. From the drop-down list, select Validation with a test set in addition to out-of-bag data.
  2. Choose one of the following to specify how to define the test set: with a randomly selected fraction of rows or with an ID column.
    • Randomly select a fraction of rows as a test set: Select this option to have Minitab randomly select a fraction of rows for testing. You can specify the fraction. The default value of 0.3 works well in most cases. For larger data sets, you may want to increase the fraction of data used for testing. You can also set a base for the random number generator so that the same rows are selected each time you run the analysis.
    • Define training/test split by ID column: Select this option to choose the rows to include in the test sample yourself. In ID column, enter the column that indicates which rows to use for the test sample. The ID column must contain exactly 2 distinct values. In Level for test set, select the value that identifies the rows of the test sample.
  3. (Optional) Check Store ID column for training/test split to save the ID column.
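The two ways to define the split can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Minitab's procedure: the function names, the "Training" and "Test" labels, and the rounding of the test-set size are all hypothetical choices made for the example. The first function mimics a random split with a fixed base (seed) and returns a label per row, similar in spirit to a stored ID column. The second function splits rows by an existing two-level ID column.

```python
import random

def random_test_split(n_rows, test_fraction=0.3, seed=None):
    """Label each row "Test" or "Training" at random (hypothetical helper).

    Assumes the test-set size is n_rows * test_fraction, rounded.
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    n_test = round(n_rows * test_fraction)
    test_rows = set(rng.sample(range(n_rows), n_test))
    return ["Test" if i in test_rows else "Training" for i in range(n_rows)]

def split_by_id_column(id_column, test_level):
    """Return (training_rows, test_rows) from a two-level ID column."""
    levels = set(id_column)
    if len(levels) != 2:
        raise ValueError("The ID column must contain exactly 2 distinct values.")
    if test_level not in levels:
        raise ValueError("The test level must be a value in the ID column.")
    training_rows = [i for i, v in enumerate(id_column) if v != test_level]
    test_rows = [i for i, v in enumerate(id_column) if v == test_level]
    return training_rows, test_rows

# Random split of 10 rows, then recover the row indices from the labels.
labels = random_test_split(10, test_fraction=0.3, seed=42)
training_rows, test_rows = split_by_id_column(labels, test_level="Test")
print(len(training_rows), len(test_rows))
```

Keeping the labels from a random split and reusing them as an ID column is one way to apply the same training/test split to several analyses.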