Specify the validation method for Discover Best Model (Continuous Response)

Predictive Analytics Module > Automated Machine Learning > Discover Best Model (Continuous Response) > Validation
Note

This command is available with the Predictive Analytics Module. For more information about how to activate the module, see the Minitab documentation.

Choose the validation method to determine the best type of model. K-fold cross-validation is usually appropriate for smaller samples. With larger samples, you can reserve a fraction of the cases for a separate test set.

The selections that Minitab presents depend on the size of the data set. These selections combine with the settings on the Terms subdialog to provide an analysis that balances rigor and calculation speed:
N < 1,500
The validation method on the Validation subdialog is K-fold cross-validation. The number of folds is 5. The Regression model selection method on the Terms subdialog is Stepwise.
1,500 ≤ N < 2,000
The validation method on the Validation subdialog is K-fold cross-validation. The number of folds is 5. The Regression model selection method on the Terms subdialog is Forward selection with validation.
2,000 ≤ N
The validation method on the Validation subdialog is Validation with a test set. The proportion of data in the test set is 0.3. The Regression model selection method on the Terms subdialog is Forward selection with validation.
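The defaults above follow a simple rule based on the number of rows. The sketch below restates that rule as code; the function and field names are illustrative only and are not part of Minitab's interface.

```python
def default_validation(n_rows):
    """Return the default validation and model-selection settings for a
    given sample size, mirroring the table above (illustrative only)."""
    if n_rows < 1500:
        return {"method": "K-fold cross-validation", "folds": 5,
                "selection": "Stepwise"}
    elif n_rows < 2000:
        return {"method": "K-fold cross-validation", "folds": 5,
                "selection": "Forward selection with validation"}
    else:
        return {"method": "Validation with a test set",
                "test_fraction": 0.3,
                "selection": "Forward selection with validation"}
```

Note that the boundaries are inclusive on the right side of each range, so a data set with exactly 1,500 rows uses forward selection with validation, and one with exactly 2,000 rows uses a test set.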

K-fold cross-validation

Complete the following steps to validate the model with the K-fold cross-validation method.

  1. From the drop-down list, select K-fold cross-validation.
  2. Specify the number of folds. The default value of 5 works well in most cases. A larger number of folds increases the chance of selecting a more reliable predictive model, especially for data sets with fewer rows. A larger number can significantly increase the calculation time.
  3. (Optional) Select Store ID column for K-fold cross-validation to save the ID column.
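The steps above can be illustrated with a short sketch of how K-fold cross-validation partitions the rows. This is a generic, pure-Python illustration of the technique, not Minitab's internal implementation; the function names are hypothetical.

```python
import random

def k_fold_ids(n_rows, k=5, seed=1):
    """Assign each row a fold ID from 1..k in roughly equal counts,
    similar to what a stored ID column records (illustrative only)."""
    ids = [(i % k) + 1 for i in range(n_rows)]
    random.Random(seed).shuffle(ids)
    return ids

def k_fold_splits(ids, k=5):
    """Yield (train_rows, test_rows) index lists, one pair per fold.
    Each fold serves as the test set once while the other k-1 folds train."""
    for fold in range(1, k + 1):
        test = [i for i, f in enumerate(ids) if f == fold]
        train = [i for i, f in enumerate(ids) if f != fold]
        yield train, test
```

Because every row belongs to exactly one fold, each row appears in a test set exactly once across the k folds, which is why more folds give a more reliable estimate at the cost of more model fits.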

Validation with a test set

Complete the following steps to specify the fraction of the data to use for training and testing. A common split uses 70% of the data for training and 30% for testing.

  1. From the drop-down list, select Validation with a test set.
  2. Specify the fraction of the data for the test set. The default value of 0.3 works well in most cases. For larger data sets, you may want to increase the fraction of data used for testing. You can also set a base for the random number generator. When you enter the same base in different runs of the analysis, the assignment of rows to the test set is the same.
  3. (Optional) Select Store ID column for training/test split to save the ID column.
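The steps above can be sketched as a reproducible train/test split. The sketch below is a generic illustration using Python's standard library, with the seed playing the role of the base for the random number generator; it is not Minitab's implementation, and the names are hypothetical.

```python
import random

def train_test_ids(n_rows, test_fraction=0.3, base=1):
    """Label each row 'Training' or 'Test', similar to a stored ID column.
    Reusing the same base (seed) reproduces the same assignment of rows
    to the test set across runs (illustrative only)."""
    rng = random.Random(base)
    n_test = round(n_rows * test_fraction)
    test_rows = set(rng.sample(range(n_rows), n_test))
    return ["Test" if i in test_rows else "Training" for i in range(n_rows)]
```

With the default fraction of 0.3, roughly 30% of the rows receive the Test label, and calling the function twice with the same base yields identical labels.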