Specify the validation method for Fit Regression Model

Stat > Regression > Regression > Fit Regression Model > Validation

Choose the validation method to test your model. Usually, with smaller samples, the K-fold cross-validation method is appropriate. With larger samples, you can set aside a fraction of the rows as a test set and use the remaining rows for training.

K-fold cross-validation

Complete the following steps to use K-fold cross-validation.

  1. From the drop-down list, select K-fold cross-validation.
  2. Choose one of the following to specify whether to assign folds randomly or with an ID column.
    • Randomly assign rows of each fold: Select this option to have Minitab randomly select rows for each fold. You can specify the number of folds. The default value of 10 works well in most cases. A lower value of K may introduce more bias; however, a larger value of K may introduce more variability. You can also set a base for the random number generator so that the fold assignment is repeatable.
    • Assign rows of each fold by ID column: Select this option to choose the rows to include in each fold. In ID column, enter the column that identifies the folds. Each row with the same value in the ID column is in the same fold.
  3. (Optional) Check Store ID column for K-fold cross-validation to save the ID column.
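The two ways of assigning folds can be illustrated outside of Minitab. The following is a minimal Python sketch, not Minitab's implementation: the function names are hypothetical, the fold labels 1 to K are an assumption, and the seed argument only plays the role of the random number generator base.

```python
import random
from collections import defaultdict


def assign_folds_randomly(n_rows, k=10, seed=None):
    """Return one fold label (1..k) per row, with fold sizes as equal as possible.

    Hypothetical helper: `seed` stands in for the random number generator base.
    """
    rng = random.Random(seed)
    labels = [(i % k) + 1 for i in range(n_rows)]  # balanced labels 1..k
    rng.shuffle(labels)                            # randomize which row gets which fold
    return labels


def folds_from_id_column(id_column):
    """Group row indices by ID value; rows with the same ID share a fold."""
    folds = defaultdict(list)
    for row, fold_id in enumerate(id_column):
        folds[fold_id].append(row)
    return dict(folds)


def train_test_pairs(folds):
    """Yield (held_out_id, training_rows, test_rows), holding out each fold once."""
    for held_out, test_rows in folds.items():
        train_rows = [
            row
            for fold_id, rows in folds.items()
            if fold_id != held_out
            for row in rows
        ]
        yield held_out, sorted(train_rows), test_rows


# Random assignment with a fixed seed, then the same grouping an ID column would give.
labels = assign_folds_randomly(n_rows=25, k=5, seed=1)
folds = folds_from_id_column(labels)
```

Storing `labels` is the analogue of storing the ID column: passing the saved labels back through `folds_from_id_column` reproduces the same folds in a later analysis.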

Validation with a test set

Complete the following steps to divide the data into a training data set and a test data set.

  1. From the drop-down list, select Validation with a test set.
  2. Choose one of the following to specify whether to select a fraction of rows randomly or with an ID column.
    • Randomly select a fraction of rows as a test set: Select this option to have Minitab randomly select the test data set. You can specify how much data to use in the test data set. The default value of 0.3 works well in most cases. You want to include enough data in the test data set to evaluate the model well. If you are unsure about the form of the model, a larger test data set provides stronger validation. You also want enough data in the training data set to estimate the model well. Typically, models with more predictors require more training data to estimate.
    • Define training/test split by ID column: Select this option to select the rows to include in the test sample yourself. In ID column, enter the column that indicates which rows to use for the test sample. The ID column must contain only 2 values. In Level for test set, select which level to use as the test sample.
  3. (Optional) Check Store ID column for training/test split to save the ID column.
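The two ways of defining the split can also be sketched in Python. This is an illustration under stated assumptions, not Minitab's implementation: the function names, the "Training" and "Test" labels, and the rounding of the test-set size are all hypothetical choices.

```python
import random


def random_test_split(n_rows, test_fraction=0.3, seed=None):
    """Return one label per row: "Test" for a random fraction, "Training" otherwise.

    Assumption: the test-set size is n_rows * test_fraction, rounded to an integer.
    """
    rng = random.Random(seed)
    n_test = round(n_rows * test_fraction)
    test_rows = set(rng.sample(range(n_rows), n_test))
    return ["Test" if row in test_rows else "Training" for row in range(n_rows)]


def split_by_id_column(id_column, test_level):
    """Return (training_rows, test_rows) from an ID column with exactly 2 values."""
    levels = set(id_column)
    if len(levels) != 2:
        raise ValueError("ID column must contain exactly 2 distinct values")
    if test_level not in levels:
        raise ValueError("test_level must be one of the values in the ID column")
    training_rows = [row for row, value in enumerate(id_column) if value != test_level]
    test_rows = [row for row, value in enumerate(id_column) if value == test_level]
    return training_rows, test_rows


# Random 70/30 split of 20 rows, then recover the same split from the stored labels.
split_labels = random_test_split(n_rows=20, test_fraction=0.3, seed=7)
training_rows, test_rows = split_by_id_column(split_labels, test_level="Test")
```

With 20 rows and a fraction of 0.3, the sketch holds out 6 rows and keeps 14 for training. Saving `split_labels` corresponds to storing the ID column so that the same split can be reused.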

None

If you select None, Minitab performs no additional validation.