Specify the validation method for Discover Best Model (Binary Response)

Predictive Analytics Module > Automated Machine Learning > Discover Best Model (Binary Response) > Validation
Note

This command is available with the Predictive Analytics Module, which must be activated before use.

Choose the validation method to determine the best type of model. Usually, with smaller samples, the K-fold cross-validation method is appropriate. With larger samples, you can instead set aside a fraction of the cases as a test set and use the remaining cases for training.

The selections that Minitab presents depend on the size of the data set. The selections combine with selections on the Terms subdialog to provide an analysis that balances rigor and calculation speed:
N < 1,000
The validation method on the Validation subdialog is K-fold cross-validation. The number of folds is 5. The Logistic regression model selection method on the Terms subdialog is Stepwise.
1,000 ≤ N < 1,500
The validation method on the Validation subdialog is K-fold cross-validation. The number of folds is 3. The Logistic regression model selection method on the Terms subdialog is Stepwise.
1,500 ≤ N
The validation method on the Validation subdialog is Validation with a test set. The proportion of data in the test set is 0.3. The Logistic regression model selection method on the Terms subdialog is Forward selection with validation, which uses the test set.
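
The size rules above can be written as a small lookup. The following Python sketch is illustrative only: the function name and dictionary keys are hypothetical, N is assumed to be the number of rows in the data set, and the code is not part of Minitab.

```python
def default_validation(n_rows: int) -> dict:
    """Return the default selections listed above for a data set with n_rows rows.

    The function name and the dictionary keys are hypothetical; only the
    thresholds and the values come from the list above.
    """
    if n_rows < 1000:
        return {
            "validation_method": "K-fold cross-validation",
            "folds": 5,
            "logistic_selection": "Stepwise",
        }
    if n_rows < 1500:
        return {
            "validation_method": "K-fold cross-validation",
            "folds": 3,
            "logistic_selection": "Stepwise",
        }
    return {
        "validation_method": "Validation with a test set",
        "test_fraction": 0.3,
        "logistic_selection": "Forward selection with validation",
    }


# Example: look up the defaults for a data set with 1,200 rows.
settings = default_validation(1200)
```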

K-fold cross-validation

Complete the following steps to use the K-fold cross-validation method to validate the model.

  1. From the drop-down list, select K-fold cross-validation.
  2. Specify the number of folds. The default value works well in most cases. A larger number of folds increases the chance of selecting a more reliable predictive model, especially for data sets with fewer rows. However, a larger number of folds can also significantly increase the calculation time.
  3. (Optional) Select Store ID column for K-fold cross-validation to save the ID column.
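
To illustrate the idea behind the steps above, the following Python sketch assigns each row a fold ID and then forms the training and test rows for each fold. It is a general illustration of K-fold cross-validation, not Minitab's procedure: the function names, the seed parameter, and the random assignment scheme are all assumptions.

```python
import random


def assign_folds(n_rows, k=5, seed=1):
    """Assign each row a fold ID from 1 to k (hypothetical helper).

    Fold sizes differ by at most one row. The IDs are shuffled with a
    seeded generator so that the assignment is reproducible.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    fold_ids = [(i % k) + 1 for i in range(n_rows)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids


def kfold_splits(fold_ids):
    """Yield (fold, training_rows, test_rows) for each fold.

    Each fold serves once as the test rows while the remaining folds
    form the training rows.
    """
    for fold in sorted(set(fold_ids)):
        test_rows = [i for i, f in enumerate(fold_ids) if f == fold]
        training_rows = [i for i, f in enumerate(fold_ids) if f != fold]
        yield fold, training_rows, test_rows


# Example: 23 rows split into 5 folds.
fold_ids = assign_folds(23, k=5, seed=7)
for fold, training_rows, test_rows in kfold_splits(fold_ids):
    print(fold, len(training_rows), len(test_rows))
```

Because every fold requires the model to be fit once on its training rows, the number of fits grows with the number of folds, which is why more folds lengthen the calculation.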

Validation with a test set

Complete the following steps to specify a fraction of the data to use for training and testing. In many cases, 70% of the data is used for training, and 30% of the data is used for testing.

  1. From the drop-down list, select Validation with a test set.
  2. Specify the fraction of the data for the test set. The default value of 0.3 works well in most cases. For larger data sets, you may want to increase the fraction of data used for testing. You can also set a base for the random number generator. When you enter the same base in different runs of the analysis, the assignment of rows to the test set is the same.
  3. (Optional) Select Store ID column for training/test split to save the ID column.
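
As a rough illustration of the steps above, the following Python sketch marks a fraction of the rows as the test set and shows how a fixed base for the random number generator reproduces the same assignment. It is not Minitab's implementation: the function name, the "Training" and "Test" labels, and the rounding of the test set size are assumptions.

```python
import random


def split_ids(n_rows, test_fraction=0.3, base=1):
    """Label each row "Training" or "Test" (hypothetical helper).

    The same base always selects the same rows for the test set.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = round(n_rows * test_fraction)
    # A generator seeded with the base makes the selection reproducible.
    test_rows = set(random.Random(base).sample(range(n_rows), n_test))
    return ["Test" if i in test_rows else "Training" for i in range(n_rows)]


# Two runs with the same base give identical assignments.
first_run = split_ids(2000, test_fraction=0.3, base=42)
second_run = split_ids(2000, test_fraction=0.3, base=42)
print(first_run == second_run)
```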