Specify the validation method for CART® Classification

Predictive Analytics Module > CART® Classification > Validation

Choose the validation method to test your model. Usually, with smaller samples, the K-fold cross-validation method is appropriate. With larger samples, you can set aside a fraction of the cases as a test set and use the remaining cases for training.
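The default choice between the two methods depends on the number of rows, using the 5000-row threshold stated in the sections below. A minimal sketch of that rule follows. The function name and its parameters are hypothetical and are not part of Minitab.

```python
def default_validation_method(n_rows: int, threshold: int = 5000) -> str:
    """Return the name of the default validation method for a row count.

    Hypothetical helper: it only mirrors the documented rule and is not
    a Minitab function.
    """
    # At or below the threshold, K-fold cross-validation is the default.
    if n_rows <= threshold:
        return "K-fold cross-validation"
    # Above the threshold, validation with a test set is the default.
    return "Validation with a test set"


print(default_validation_method(1200))
print(default_validation_method(80000))
```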

K-fold cross-validation

Complete the following steps to validate the model with the K-fold cross-validation method. K-fold cross-validation is the default method when the number of rows is ≤ 5000.

  1. From the drop-down list, select K-fold cross-validation.
  2. Choose one of the following to specify whether to assign folds randomly or with an ID column.
    • Randomly assign rows of each fold: Select this option to have Minitab randomly select rows for each fold. You can specify the number of folds. The default value of 10 works well in most cases. A lower value of K may introduce more bias, while a larger value of K may introduce more variability. You can also set a base for the random number generator so that the fold assignment is repeatable.
    • Assign rows of each fold by ID column: Select this option to choose the rows to include in each fold. In ID column, enter the column that identifies the fold for each row.
  3. (Optional) Check Store ID column for K-fold cross-validation to save the ID column.
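The two ways of defining folds can be illustrated outside Minitab with a short sketch. It assumes a simple shuffle of nearly equal fold sizes, which is not necessarily how Minitab assigns rows. All function names are hypothetical, and the seed argument plays the role of the base for the random number generator.

```python
import random
from collections import defaultdict


def assign_folds_randomly(n_rows, k=10, seed=1):
    """Give each row a fold ID from 1 to k, with fold sizes as equal as possible."""
    # Start with fold IDs 1..k repeated in order, then shuffle them.
    fold_ids = [(i % k) + 1 for i in range(n_rows)]
    random.Random(seed).shuffle(fold_ids)  # same seed gives the same assignment
    return fold_ids


def folds_from_id_column(id_column):
    """Group row indices by the fold label stored in an ID column."""
    folds = defaultdict(list)
    for row, label in enumerate(id_column):
        folds[label].append(row)
    return dict(folds)


def train_test_rows(folds, held_out):
    """Rows in the held-out fold are tested; all other rows are used for training."""
    test = folds[held_out]
    train = [r for label, rows in folds.items() if label != held_out for r in rows]
    return train, test


# Random assignment produces a list that can itself be used as an ID column.
fold_ids = assign_folds_randomly(n_rows=25, k=5, seed=42)
folds = folds_from_id_column(fold_ids)
for label in sorted(folds):
    train, test = train_test_rows(folds, label)
    print(f"fold {label}: {len(train)} training rows, {len(test)} test rows")
```

Each row is held out exactly once, so every row contributes to the test results for one fold and to training for the other K − 1 folds.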

Validation with a test set

Complete the following steps to specify a fraction of the data to use for training and testing. Validation with a test set is the default method when the number of rows is > 5000. In many cases, 70% of the data is used for training, and 30% of the data is used for testing.

  1. From the drop-down list, select Validation with a test set.
  2. Choose one of the following to specify whether to select a fraction of rows randomly or with an ID column.
    • Randomly select a fraction of rows as a test set: Select this option to have Minitab randomly select a fraction of rows for testing. You can specify the fraction. The default value of 0.3 works well in most cases. For larger data sets, you may want to increase the fraction of data used for testing. You can also set a base for the random number generator.
    • Define training/test split by ID column: Select this option to choose the rows to include in the test sample. In ID column, enter the column that indicates which rows to use for the test sample. The ID column must contain exactly 2 distinct values. In Level for test set, select which level to use as the test sample.
  3. (Optional) Check Store ID column for training/test split to save the ID column.
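Both ways of defining the split can be sketched in a few lines. This is an illustration under simple assumptions (a plain shuffle and a rounded test-set size), not Minitab's implementation, and the function names are hypothetical.

```python
import random


def random_test_split(n_rows, test_fraction=0.3, seed=1):
    """Return (train_rows, test_rows) with about test_fraction of rows held out."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)  # same seed gives the same split
    n_test = round(n_rows * test_fraction)
    return sorted(rows[n_test:]), sorted(rows[:n_test])


def split_by_id_column(id_column, test_level):
    """Split rows by an ID column that holds exactly 2 distinct values."""
    levels = set(id_column)
    if len(levels) != 2:
        raise ValueError("The ID column must contain exactly 2 distinct values.")
    if test_level not in levels:
        raise ValueError("The test level must be one of the ID column values.")
    test = [i for i, value in enumerate(id_column) if value == test_level]
    train = [i for i, value in enumerate(id_column) if value != test_level]
    return train, test


# Random split: 100 rows with the default fraction of 0.3.
train, test = random_test_split(100)
print(len(train), len(test))

# ID-column split: the "Test" level marks the test sample.
id_column = ["Train", "Test", "Train", "Test", "Train"]
print(split_by_id_column(id_column, test_level="Test"))
```

The model is fit on the training rows only, and the test rows are used afterward to evaluate it.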

None

If None is selected, no additional validation is performed.