This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

The analysis builds as many trees as you specify. Each tree contributes a small modification to the model. If the analysis includes a validation method, then the analysis calculates the value of the model selection criterion for the training data and the test data for each number of trees. The optimal value from the test data determines the number of trees in the optimal model.

Optimization criteria, such as the maximum R², tend to be optimistic when you calculate them with the same data that you use to fit a model. Model validation methods leave a portion of the data out of the model fitting process, then calculate statistics that evaluate the performance of the model on the omitted data. Model validation techniques provide a better estimate of how well models perform on new data. Depending on your selection of the loss function for the analysis, the criterion is the maximum R² or the least Mean Absolute Deviation (MAD). Minitab offers two validation methods: K-fold cross-validation and validation with a separate test set.
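
As an illustration, the two criteria can be computed from actual and predicted response values. The following is a minimal sketch that assumes the standard textbook definitions (R² as 1 minus the ratio of error to total sum of squares, MAD as the mean absolute difference between actual and predicted values). The function names are hypothetical and are not part of Minitab.

```python
def r_squared(actual, predicted):
    """R-squared as 1 - SSE/SST, with SST taken around the mean of the actual values."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - sse / sst


def mean_absolute_deviation(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

A larger R² is better and a smaller MAD is better, which is why the criterion is the maximum of one and the least of the other.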

K-fold cross-validation is the default method in Minitab when the data have 2,000 cases or fewer. Because the process repeats K times, cross-validation is usually slower than validation with a test set.

To complete K-fold cross-validation, Minitab Statistical Software uses the following steps:

1. Partition the data into *K* random subsets of as nearly equal size as possible. The subsets are called folds.
2. For fold *k*, *k* = 1, ..., *K*, grow the sequence of trees using the remaining *K* – 1 folds of data. Calculate the value of the model selection criterion for each tree with the data in the *k*th fold.
3. Repeat step 2 for all *K* folds.
4. Average the values of the model selection criterion across the *K* folds for each number of trees. The number of trees with the best average value makes the optimal model.
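
The K-fold steps above can be sketched in code. This is an illustration under stated assumptions, not Minitab's implementation: the booster is a simple stand-in that adds one-split regression stumps on a single predictor, the learning rate of 0.1 is an arbitrary choice, the criterion is assumed to be the maximum R², and all function names are hypothetical.

```python
import random

LEARN_RATE = 0.1  # assumed value for this sketch


def fit_stump(x, residuals):
    """Least-squares one-split stump: returns (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # splits that leave both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    return best[1:]


def step(stump, xi):
    """Small modification that one stump adds to the prediction for xi."""
    threshold, lm, rm = stump
    return LEARN_RATE * (lm if xi <= threshold else rm)


def grow_sequence(x, y, n_trees):
    """Grow n_trees stumps, each fit to the residuals of the model so far."""
    start = sum(y) / len(y)
    pred = [start] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + step(stump, xi) for xi, pi in zip(x, pred)]
    return start, stumps


def r2_by_tree_count(model, x, y):
    """R-squared on (x, y) after 1, 2, ..., n_trees trees."""
    start, stumps = model
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    pred = [start] * len(y)
    scores = []
    for stump in stumps:
        pred = [pi + step(stump, xi) for xi, pi in zip(x, pred)]
        sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
        scores.append(1.0 - sse / sst)
    return scores


def kfold_optimal_trees(x, y, k=5, n_trees=30, seed=1):
    """Return (optimal number of trees, average R-squared per number of trees)."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # step 1: sizes differ by at most one
    totals = [0.0] * n_trees
    for fold in folds:  # steps 2 and 3: hold out each fold in turn
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = grow_sequence([x[i] for i in train], [y[i] for i in train], n_trees)
        scores = r2_by_tree_count(model, [x[i] for i in fold], [y[i] for i in fold])
        totals = [t + s for t, s in zip(totals, scores)]
    averages = [t / k for t in totals]  # step 4: average across the K folds
    best = max(range(n_trees), key=lambda j: averages[j])
    return best + 1, averages
```

Calling `kfold_optimal_trees(x, y, k=5, n_trees=30)` on two equal-length lists of numbers returns the selected number of trees together with the averaged criterion for each tree count. For a MAD criterion, the selection would use the minimum average instead of the maximum.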

In validation with a test set, a portion of the data is set aside for validation. The remaining data is the training set. First, Minitab grows the sequence of trees with the training set. Then, Minitab calculates the values of the model selection criterion for each number of trees using the test set. The number of trees with the best value makes the optimal model.
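
Test-set validation can be sketched in the same spirit. The example below is a hedged illustration, not Minitab's code: the 0.3 test fraction is an assumed value, the function names are hypothetical, and `staged_predictions` stands for test-set predictions that some fitted tree sequence has already produced for each number of trees.

```python
import random


def train_test_indices(n_rows, test_fraction=0.3, seed=1):
    """Randomly set aside a fraction of the row indices as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]  # (training rows, test rows)


def criterion_value(actual, predicted, criterion):
    """Mean absolute deviation for "mad", otherwise R-squared."""
    if criterion == "mad":
        return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - sse / sst


def optimal_tree_count(staged_predictions, y_test, criterion="mad"):
    """staged_predictions[j] holds the test-set predictions after j + 1 trees."""
    values = [criterion_value(y_test, p, criterion) for p in staged_predictions]
    pick = min if criterion == "mad" else max  # least MAD or maximum R-squared
    return values.index(pick(values)) + 1, values
```

The training rows would be used to grow the sequence of trees, and only the test rows enter `optimal_tree_count`, so the selected number of trees reflects performance on data that the model did not see during fitting.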

Without any validation, Minitab uses the entire data set to fit the model. The final model contains the largest number of trees.