Specify the validation method for Fit Model and Discover Key Predictors with TreeNet® Classification

Predictive Analytics Module > TreeNet® Classification > Fit Model > Validation

Predictive Analytics Module > TreeNet® Classification > Discover Key Predictors > Validation

Note

This command is available with the Predictive Analytics Module. For more information about how to activate the module, see the Predictive Analytics Module documentation.

Choose the validation method to test your model. Usually, with smaller samples, the K-fold cross-validation method is appropriate. With larger samples, you can select a fraction of cases to use for training and testing.

K-fold cross-validation

Complete the following steps to validate the model with the K-fold cross-validation method. K-fold cross-validation is the default method when the number of rows is ≤ 2000.

  1. From the drop-down list, select K-fold cross-validation.
  2. Choose one of the following to specify whether to assign folds randomly or with an ID column.
    • Randomly assign rows of each fold: Select this option to have Minitab randomly select rows for each fold. You can specify the number of folds. For Fit Model, the default value of 5 works well in most cases. For Discover Key Predictors, the default value of 3 keeps the calculations relatively fast. In either case, a larger number of folds increases the chance of selecting a more reliable predictive model, especially for data sets with fewer rows, but can significantly increase the calculation time.
    • Assign rows of each fold by ID column: Select this option to choose the rows to include in each fold. In ID column, enter the column that contains the rows for each fold.
  3. (Optional) Check Store ID column for K-fold cross-validation to save the ID column.
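Minitab performs the fold assignment internally; the following is only a conceptual sketch, in Python, of what a randomly assigned K-fold ID column represents. The function name and seed parameter are illustrative assumptions, not Minitab's algorithm.

```python
import random

def assign_folds(n_rows, k=5, seed=1):
    """Randomly assign each row to one of k folds, as balanced as possible.

    Conceptual illustration only (not Minitab's implementation): the returned
    list plays the role of a stored K-fold ID column, where ids[row] is the
    fold in which that row is held out for testing.
    """
    ids = [i % k + 1 for i in range(n_rows)]  # balanced fold labels 1..k
    random.Random(seed).shuffle(ids)          # randomize which rows get which fold
    return ids

fold_ids = assign_folds(10, k=5)

# Each fold serves once as the holdout set while the rest train the model.
for fold in range(1, 6):
    test_rows = [r for r, f in enumerate(fold_ids) if f == fold]
    train_rows = [r for r, f in enumerate(fold_ids) if f != fold]
```

With 5 folds, every row is used for testing exactly once and for training four times, which is why more folds tend to give a more reliable estimate at the cost of more model fits.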

Validation with a test set

Complete the following steps to specify a fraction of the data to use for training and testing. Validation with a test set is the default method when the number of rows is > 2000. A common split uses 70% of the data for training and 30% for testing.

  1. From the drop-down list, select Validation with a test set.
  2. Choose one of the following to specify whether to select a fraction of rows randomly or with an ID column.
    • Randomly select a fraction of rows as a test set: Select this option to have Minitab randomly select a fraction of rows for testing. You can specify the fraction. The default value of 0.3 works well in most cases. For larger data sets, you may want to increase the fraction of data used for testing. You can also set a base for the random number generator.
    • Define training/test split by ID column: Select this option to choose the rows to include in the test sample. In ID column, enter the column that indicates which rows to use for the test sample. The ID column must contain only 2 values. In Level for test set, select which level to use as the test sample.
  3. (Optional) Check Store ID column for training/test split to save the ID column.
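Again, Minitab performs the split internally; as a conceptual sketch (not Minitab's implementation), a stored training/test ID column with a 0.3 test fraction and a base for the random number generator can be illustrated in Python. The function name and labels are illustrative assumptions.

```python
import random

def split_ids(n_rows, test_fraction=0.3, base=1):
    """Mark a random test_fraction of rows as 'Test' and the rest as 'Train'.

    Conceptual illustration only: the returned list plays the role of a
    stored training/test ID column with exactly two levels, and `base`
    seeds the random number generator so the split is reproducible.
    """
    rng = random.Random(base)
    n_test = round(n_rows * test_fraction)
    test_rows = set(rng.sample(range(n_rows), n_test))
    return ["Test" if r in test_rows else "Train" for r in range(n_rows)]

id_col = split_ids(10, test_fraction=0.3)
```

Because the column has only two values, choosing "Test" as the level for the test set sends those rows to testing and the "Train" rows to model fitting, matching the ID-column option described above.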

None

If you select None, Minitab does not perform additional validation.