This command is available with the Predictive Analytics Module.

Select the criteria to determine the best model and specify options for the different model types. You can also specify a base for the random number generator and when to assign a prediction to the event class.

Choose the criterion that determines the best model. You can compare the
results from several criteria to determine the best choice for your application.

- Maximum loglikelihood: Select this option to display results for the model with the largest loglikelihood, that is, the model under which the observed data are most probable.
- Maximum area under ROC curve: The maximum area under ROC curve method works well across many applications. The area under the ROC curve measures how well the model ranks rows from most likely to produce an event to least likely to produce an event.
- Minimum misclassification rate: Select this option to display results for the model that minimizes the misclassification rate. The misclassification rate is based on a simple count of how often the model predicts a case correctly or incorrectly.
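
To make the three options above concrete, the following minimal sketch computes each quantity for a small set of made-up predictions. The function names, the example data, and the 0.5 classification threshold are assumptions for illustration; this is not Minitab's implementation.

```python
import math

def log_likelihood(y, p):
    """Sum of the log-probabilities that the model gives to the observed outcomes."""
    return sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))

def area_under_roc(y, p):
    """Share of (event, nonevent) pairs where the event row has the higher
    predicted probability; ties count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in nonevents
    )
    return wins / (len(events) * len(nonevents))

def misclassification_rate(y, p, threshold=0.5):
    """Simple count of wrong predictions divided by the number of rows."""
    predicted = [1 if pi > threshold else 0 for pi in p]
    return sum(a != b for a, b in zip(y, predicted)) / len(y)

# Hypothetical observed classes (1 = event) and predicted event probabilities.
y = [1, 0, 1, 1, 0, 0]
p = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

loglik = log_likelihood(y, p)
auc = area_under_roc(y, p)
rate = misclassification_rate(y, p)
```

A model with a higher loglikelihood, a higher area under the ROC curve, or a lower misclassification rate is preferred under the corresponding option. The three criteria do not always agree on the same model.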

Specify options for the TreeNet® model.

- Number of trees
- Enter a value between 1 and 5000 to set the number of trees to build. The default value of 300 provides useful initial results.
- Maximum terminal nodes per tree and Maximum tree depth
- Choose one of the following options to limit the size of the trees.
- Maximum terminal nodes per tree: Enter a value between 2 and 2000 to represent the maximum number of terminal nodes of a tree. Usually, the default value of 6 provides a good balance between calculation speed and the investigation of interactions among variables. A value of 2 eliminates the investigation of interactions.
- Maximum tree depth: Enter a value between 2 and 1000 to represent the maximum depth of a tree. The root node corresponds to a depth of 1. The default depth is 4. In many applications, depths from 4 to 6 give reasonably good models.

- Learning rate
- Specify up to 10 learning rates.
- Subsample fraction
- Specify up to 10 subsample fractions. At each iteration, the procedure selects a different subset that contains this fraction of the data to construct a tree. Subsampling helps protect against overfitting. Subsample fractions must be greater than 0 and less than or equal to 1. The default values are 0.5 and 0.7.
- Number of predictors for node splitting
- Specify the number of predictors to consider for each node split.
Typically, the analysis works well when you consider all the predictors at
every node. However, some data sets have associations among the predictors that
lead to improved model performance when the analysis considers a different
random subset of predictors at each node. For such cases, the square root of
the total number of predictors is a typical starting point. After you use the
square root and view the model, you can consider whether to specify a larger or
smaller number of predictors with a percentage of the total.
- Total number of predictors: Select to use all the predictors for splitting nodes.
- Square root of the total number of predictors: Select to use the square root of the total number of predictors for splitting nodes.
- K percent of the total number of predictors; K =: Select to use a percentage of predictors for splitting nodes.
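
The three choices for the number of predictors can be sketched as a small calculation. The function name and the rounding rule (round to the nearest whole number, with a minimum of 1) are assumptions for illustration; Minitab's exact rounding is not stated here.

```python
import math

def predictors_per_split(total, rule="all", k_percent=None):
    """Return how many predictors to consider at each node split.

    rule is "all", "sqrt", or "percent" (hypothetical labels for the
    three dialog options). Rounding to the nearest integer is an assumption.
    """
    if rule == "all":
        count = total
    elif rule == "sqrt":
        count = round(math.sqrt(total))
    elif rule == "percent":
        if k_percent is None or not 0 < k_percent <= 100:
            raise ValueError("k_percent must be greater than 0 and at most 100")
        count = round(total * k_percent / 100.0)
    else:
        raise ValueError("unknown rule: " + rule)
    # Always consider at least one predictor and never more than exist.
    return max(1, min(total, count))

all_count = predictors_per_split(20)
sqrt_count = predictors_per_split(20, "sqrt")
pct_count = predictors_per_split(20, "percent", k_percent=25)
```

With 20 predictors, the square root option considers about 4 predictors at each node, and K = 25 considers 5.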

Specify options for the Random Forests® model.

- Number of bootstrap samples to grow trees
- Enter a value between 3 and 3000 to set the number of bootstrap samples, which is also the number of trees that the analysis produces.
- Specify a bootstrap sample size less than the training data size
- Select to enter a value that sets the bootstrap sample size. You must enter a value greater than or equal to 5. If you enter a size that is greater than the training data size, Minitab uses a sample size equal to the training data size.
- Number of predictors for node splitting
- Specify the number of predictors to consider for each node split.
Typically, the analysis works well when you consider the square root of the
total number of predictors. However, some data sets have associations among the
predictors that lead to improved model performance when the analysis considers
a larger or smaller number of predictors for each node. After you use the
square root and view the model, consider whether to change the number of
predictors to try to improve the performance of the model.
- Total number of predictors: Select to use all the predictors for splitting nodes. The forest created by this option is called a bootstrap forest.
- Square root of the total number of predictors: Select to use the square root of the total number of predictors for splitting nodes.
- K percent of the total number of predictors; K =: Select to use a percentage of predictors for splitting nodes.

- Minimum number of cases to split an internal node
- Specify from 1 to 3 minimum numbers. By default, the analysis evaluates 2, 5, and 8. When the number is 2, all nodes can be split into smaller nodes until another split is impossible. If the model performance is inadequate, consider whether to try other values to determine the effect on the performance.
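
The bootstrap sample size rule described above can be sketched as follows. The function names and the default base value are assumptions for illustration, and the sampling shown is ordinary sampling with replacement rather than Minitab's own routine.

```python
import random

def effective_bootstrap_size(requested, n_train):
    """Apply the documented limits: at least 5, and capped at the training data size."""
    if requested < 5:
        raise ValueError("bootstrap sample size must be at least 5")
    return min(requested, n_train)

def bootstrap_rows(n_train, size, base=1):
    """Draw row indices with replacement; base seeds the generator (assumed default)."""
    rng = random.Random(base)
    return rng.choices(range(n_train), k=size)

capped = effective_bootstrap_size(500, 200)   # larger than the training data
smaller = effective_bootstrap_size(100, 200)  # smaller than the training data
rows = bootstrap_rows(200, smaller)
```

Each tree in the forest would be grown from one such set of rows, so the number of bootstrap samples equals the number of trees.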

Specify options for the CART® model.

- Node splitting method
- Choose the splitting method to generate your decision tree. You can
compare the results from several splitting methods to determine the best choice
for your application.
- Gini: The Gini method is the default method. The Gini method works well across many applications. The Gini method usually generates trees that include small nodes with a high concentration of the response of interest.
- Entropy: The Entropy method is proportional to the maximum of certain likelihood functions for the node.

- Criterion for selecting optimal tree
- Choose between the following criteria to select the tree in the
results. You can compare the results from different trees to determine the best
choice for your application.
- Minimum misclassification cost: Select this option to display results for the tree that minimizes the misclassification cost.
- Within K standard errors of minimum misclassification cost; K=: Select this option to display results for the smallest tree with a misclassification cost within K standard errors of the minimum misclassification cost.

- Minimum number of cases to split an internal node
- Enter the minimum number of cases a node can have and still be split into more nodes. The default is 10. With larger sample sizes, you may want to increase this minimum. For example, with the default, if an internal node has 10 or more cases, Minitab tries to perform a split. If the internal node has 9 or fewer cases, Minitab does not try to perform a split.
- Minimum number of cases allowed for a terminal node
- Enter the minimum number of cases that can be in a terminal node. The default is 3. With larger sample sizes, you may want to increase this minimum. For example, with the default, if a split would create a node with fewer than 3 cases, Minitab does not perform the split.
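
Two of the CART settings above lend themselves to a short sketch: the rule that selects a tree within K standard errors of the minimum misclassification cost, and the node size limits. The function names, the dictionary keys, and the candidate trees are hypothetical. The sketch assumes that the standard error used is the one for the minimum-cost tree, which is the usual convention but is not confirmed here.

```python
def select_tree(candidates, k=0.0):
    """Pick the smallest tree whose cost is within k standard errors of the minimum.

    With k = 0 this reduces to the minimum misclassification cost tree.
    """
    best = min(candidates, key=lambda t: t["cost"])
    limit = best["cost"] + k * best["se"]  # assumption: SE of the minimum-cost tree
    eligible = [t for t in candidates if t["cost"] <= limit]
    return min(eligible, key=lambda t: t["nodes"])

def can_split(n_cases, left_cases, right_cases, min_split=10, min_terminal=3):
    """Check both size limits for a proposed split of one node into two."""
    if n_cases < min_split:
        return False  # too few cases to attempt a split
    return left_cases >= min_terminal and right_cases >= min_terminal

# Hypothetical sequence of candidate trees: terminal nodes, cost, standard error.
trees = [
    {"nodes": 2, "cost": 0.40, "se": 0.03},
    {"nodes": 4, "cost": 0.31, "se": 0.03},
    {"nodes": 7, "cost": 0.29, "se": 0.02},
    {"nodes": 12, "cost": 0.28, "se": 0.02},
    {"nodes": 20, "cost": 0.33, "se": 0.02},
]

min_cost_tree = select_tree(trees)        # 12 terminal nodes
one_se_tree = select_tree(trees, k=1)     # 7 terminal nodes
two_se_tree = select_tree(trees, k=2)     # 4 terminal nodes
```

Larger values of K trade a small increase in misclassification cost for a smaller, easier to interpret tree.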

You can specify a base for the random number generator to randomly select the subsamples and the subset of predictors. Typically, you do not need to change the base. You can change the base to explore how sensitive the results are to the random selections or to ensure the same random selection for repeated analyses.
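
The effect of the base can be illustrated with Python's standard random number generator, which stands in for Minitab's generator here. The function name and the rounding of the subsample size are assumptions for illustration.

```python
import random

def draw_subsample(n_rows, fraction, base):
    """Select a random subset of row indices; the same base repeats the same selection."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    size = max(1, round(n_rows * fraction))  # rounding rule is an assumption
    rng = random.Random(base)
    return sorted(rng.sample(range(n_rows), size))

first = draw_subsample(100, 0.5, base=7)
repeat = draw_subsample(100, 0.5, base=7)
other = draw_subsample(100, 0.5, base=8)  # a different base usually gives a different subset
```

Repeating an analysis with the same base reproduces the same random selections. Changing the base shows how much the results depend on those selections.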

Specify the threshold to assign a case to the event class. This option affects TreeNet® models and binary logistic regression models.

- Event probability exceeds specified value: Specify the minimum predicted probability to assign a case to the event class. For example, a value of 0.5 means that Minitab assigns a case to the event class when the probability of the event is higher than 0.5.
- Event probability exceeds sample event rate: Select to use the sample event rate from the training data as the threshold to assign the predicted class for a case. When the sample event rate is greater than 0.50, this option makes cases less likely to be classified as the event and more likely to be classified as the nonevent. Typically, you consider this option when you want to balance the misclassification rates of the events and nonevents compared to what they would be with a threshold of 0.50.
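
The two threshold options can be compared with a minimal sketch. The function names and the data are hypothetical; the comparison uses a strict "greater than" because a case is assigned to the event class when the probability is higher than the threshold.

```python
def sample_event_rate(y_train):
    """Proportion of training cases that are events (coded as 1)."""
    return sum(y_train) / len(y_train)

def classify(probabilities, threshold):
    """Assign a case to the event class (1) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]

y_train = [1, 1, 1, 0]       # hypothetical training responses
probs = [0.6, 0.8, 0.5]      # hypothetical predicted event probabilities

rate = sample_event_rate(y_train)          # 0.75
fixed_classes = classify(probs, 0.5)       # [1, 1, 0]
rate_classes = classify(probs, rate)       # [0, 1, 0]
```

Because the sample event rate here is greater than 0.50, fewer cases are assigned to the event class than with a threshold of 0.50.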