Choose the validation method to test your model. Usually, with smaller
samples, the K-fold cross-validation method is appropriate. With larger
samples, you can set aside a fraction of the cases for testing and use the
remaining cases for training.
K-fold cross-validation
Complete the following steps to use K-fold cross-validation.
From the drop-down list, select K-fold cross-validation.
Choose one of the following to specify whether to assign folds
randomly or with an ID column.
Randomly assign rows of each fold:
Select this option to have Minitab randomly select rows for each fold. You can
specify the number of folds. The default value of 10 works well in most cases.
Lower values of K may introduce more bias; however, larger values of K may
introduce more variability. You can also set a base for the random number
generator so that the fold assignment is repeatable.
Assign rows of each fold by ID column:
Select this option to choose the rows to include in each fold. In ID column,
enter the column that identifies the folds. Rows with the same value in the
ID column are in the same fold.
(Optional) Select Store ID column for K-fold cross-validation to save the ID
column.
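The two ways of assigning folds can be sketched in Python. This is only an illustration of the idea, not Minitab's algorithm: the function names (random_folds, folds_from_id_column, train_test_pairs) are hypothetical, and the base argument simply seeds Python's own random number generator.

```python
import random
from collections import defaultdict


def random_folds(n_rows, k=10, base=None):
    """Return one fold label (1..k) per row, assigned at random."""
    rng = random.Random(base)  # a fixed base makes the assignment repeatable
    # Cycle the labels 1..k so fold sizes differ by at most one row, then shuffle.
    labels = [i % k + 1 for i in range(n_rows)]
    rng.shuffle(labels)
    return labels


def folds_from_id_column(id_column):
    """Group row indices by ID value: rows that share a value share a fold."""
    folds = defaultdict(list)
    for row, fold_id in enumerate(id_column):
        folds[fold_id].append(row)
    return dict(folds)


def train_test_pairs(folds):
    """Yield (fold_id, training_rows, test_rows), holding out each fold once."""
    for held_out, test_rows in folds.items():
        train_rows = [
            row
            for fold_id, rows in folds.items()
            if fold_id != held_out
            for row in rows
        ]
        yield held_out, sorted(train_rows), test_rows


# Random assignment of 25 rows to 5 folds, then reuse the labels as an ID column.
labels = random_folds(n_rows=25, k=5, base=1)
folds = folds_from_id_column(labels)
pairs = list(train_test_pairs(folds))
```

Storing the labels, as the optional step above does for the ID column, makes it possible to reuse exactly the same folds in a later analysis.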
Validation with a test set
Complete the following steps to divide the data into a
training data set and a test data set.
From the drop-down list, select Validation with a test set.
Choose one of the following to specify whether to select a fraction of
rows randomly or with an ID column.
Randomly select a fraction of rows as a test set:
Select this option to have Minitab randomly select the test data set. You can
specify how much data to use in the test data set. The default value of 0.3
works well in most cases. You want to include enough data in the test data set
to evaluate the model well. If you are unsure about the form of the model, a
larger test data set provides stronger validation. You also want enough data in
the training data set to estimate the model well. Typically, models with more
predictors require more training data to estimate.
Define training/test split by ID column:
Select this option to choose the rows to include in the test sample yourself.
In ID column, enter the column that indicates which rows to use for the test
sample. The ID column must contain exactly 2 distinct values. In Level for
test set, select which level to use as the test sample.
(Optional) Select Store ID column for training/test split to save the ID
column.
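As a rough sketch of the two ways to define a test set, the following Python shows a random split by fraction and a split driven by a two-level ID column. It is not Minitab's implementation: the function names (random_test_split, split_by_id_column) and the "Train"/"Test" level names are assumptions made for illustration.

```python
import random


def random_test_split(n_rows, test_fraction=0.3, base=None):
    """Return (training_rows, test_rows) with about test_fraction of rows held out."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(base)  # a fixed base makes the split repeatable
    n_test = round(n_rows * test_fraction)
    test_rows = sorted(rng.sample(range(n_rows), n_test))
    held_out = set(test_rows)
    train_rows = [row for row in range(n_rows) if row not in held_out]
    return train_rows, test_rows


def split_by_id_column(id_column, test_level):
    """Return (training_rows, test_rows) using a two-level ID column."""
    levels = set(id_column)
    if len(levels) != 2:
        raise ValueError("the ID column must contain exactly 2 distinct values")
    if test_level not in levels:
        raise ValueError("test_level must be one of the values in the ID column")
    train_rows = [row for row, value in enumerate(id_column) if value != test_level]
    test_rows = [row for row, value in enumerate(id_column) if value == test_level]
    return train_rows, test_rows


# Random split: 30% of 20 rows go to the test set.
train_rows, test_rows = random_test_split(n_rows=20, test_fraction=0.3, base=1)

# ID-column split: the level "Test" marks the test rows.
id_train, id_test = split_by_id_column(
    ["Train", "Test", "Train", "Train", "Test"], test_level="Test"
)
```

In both cases every row ends up in exactly one of the two sets, so the model is never evaluated on rows that were used to estimate it.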