Complete the following steps to divide the data into a
training data set and a test data set.
- From the drop-down list, select Validation with a test set.
- Choose one of the following to specify whether to select a fraction of
rows randomly or with an ID column.
  - Randomly select a fraction of rows as a test set:
    Select this option to have Minitab randomly select the test data set. You
    can specify the fraction of the data to use in the test data set. The
    default value of 0.3 works well in most cases. Include enough data in the
    test data set to evaluate the model well. If you are unsure about the form
    of the model, a larger test data set provides stronger validation. Also
    leave enough data in the training data set to estimate the model well.
    Typically, models with more predictors require more training data to
    estimate.
  - Define training/test split by ID column:
    Select this option to choose for yourself which rows to include in the test
    sample. In ID column, enter the column that indicates which rows to use for
    the test sample. The ID column must contain exactly 2 distinct values. In
    Level for test set, select which level to use as the test sample.
- (Optional) Check Store ID column for training/test split to save the ID
column.
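
The two ways of defining the split can be illustrated outside Minitab with a minimal Python sketch. This is not Minitab's implementation: the function names `random_test_ids` and `split_by_id`, the labels "Test" and "Training", and the rounding of the test-set size are all assumptions made for illustration. The first function marks a random fraction of rows as the test set, which resembles the kind of ID column the Store ID column option saves. The second splits rows by an ID column that has exactly 2 distinct values.

```python
import random


def random_test_ids(n_rows, test_fraction=0.3, seed=1):
    """Return one label per row: "Test" for a random fraction, else "Training".

    Hypothetical helper; the rounding rule for the test-set size is an assumption.
    """
    rng = random.Random(seed)
    n_test = round(n_rows * test_fraction)
    test_rows = set(rng.sample(range(n_rows), n_test))
    return ["Test" if i in test_rows else "Training" for i in range(n_rows)]


def split_by_id(rows, ids, test_level):
    """Split rows into (training, test) using an ID column with 2 distinct values.

    Hypothetical helper; rows whose ID equals test_level go to the test set.
    """
    levels = set(ids)
    if len(levels) != 2:
        raise ValueError("The ID column must contain exactly 2 distinct values.")
    if test_level not in levels:
        raise ValueError("The test level must be one of the ID column values.")
    training = [row for row, label in zip(rows, ids) if label != test_level]
    test = [row for row, label in zip(rows, ids) if label == test_level]
    return training, test


# Example: 10 rows, default test fraction of 0.3.
rows = list(range(10))
ids = random_test_ids(len(rows), test_fraction=0.3, seed=1)
training, test = split_by_id(rows, ids, test_level="Test")
print(len(training), len(test))  # prints: 7 3
```

Fixing the seed makes the random split repeatable, which helps when you want to compare models on the same test rows.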