Choose the method to generate your optimal model. You can compare the results from several methods to determine the best choice for your application.
Maximum loglikelihood: The maximum likelihood method finds the maximum of the likelihood functions for the data. This is the default with a binary response.
Maximum area under ROC curve: The maximum area under ROC curve method works well across many applications. The area under the ROC curve measures how well the model ranks rows from most likely to produce an event to least likely to produce an event. This option is available with a binary response.
Minimum misclassification rate: Select this option to display results for the model that minimizes the misclassification rate. The misclassification rate is based on a simple count of how often the model predicts a case correctly or incorrectly. This is the default with a multinomial response.
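The three criteria above can be compared on a small set of predictions. The following is a minimal sketch for a binary response, not Minitab's implementation: the function names are hypothetical, the response is assumed to be coded 1 for an event and 0 for a nonevent, and the 0.5 classification threshold is an assumption.

```python
import math

def log_likelihood(y_true, p_event):
    """Sum of the log-probabilities that the model assigns to the observed outcomes."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for y, p in zip(y_true, p_event):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def roc_auc(y_true, p_event):
    """Fraction of (event, nonevent) pairs that the model ranks correctly.

    Ties count as half a correct ranking.
    """
    events = [p for y, p in zip(y_true, p_event) if y == 1]
    nonevents = [p for y, p in zip(y_true, p_event) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    wins = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                wins += 1.0
            elif pe == pn:
                wins += 0.5
    return wins / (len(events) * len(nonevents))

def misclassification_rate(y_true, p_event, threshold=0.5):
    """Share of cases whose predicted class differs from the observed class."""
    wrong = sum(1 for y, p in zip(y_true, p_event) if (p >= threshold) != (y == 1))
    return wrong / len(y_true)

# Hypothetical observed responses and predicted event probabilities.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.6]

print(log_likelihood(y, p))
print(roc_auc(y, p))
print(misclassification_rate(y, p))
```

The sketch shows why the criteria can disagree: the area under the ROC curve depends only on how the cases are ranked, the misclassification rate depends on which side of the threshold each prediction falls, and the loglikelihood depends on the predicted probabilities themselves.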
Enter a value between 1 and 5000 to set the number of trees to build. The default value of 300 provides useful initial results.
If the number of trees in the initially selected model is close to the number of trees that you specify, consider increasing the number of trees to look for a better model.
Maximum terminal nodes per tree and Maximum tree depth
You can also limit the size of the trees. Choose one of the following options.
Maximum terminal nodes per tree: Enter a value between 2 and 2000 to represent the maximum number of terminal nodes of a tree. Usually, the default value of 6 provides a good balance between calculation speed and the investigation of interactions among variables. A value of 2 eliminates the investigation of interactions.
Maximum tree depth: Enter a value between 2 and 1000 to represent the maximum depth of a tree. The root node corresponds to a depth of 1. The default depth is 4. In many applications, depths from 4 to 6 give reasonably good models.
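The two size limits are related. As a rough sketch, assuming that every split is binary and that the root node has a depth of 1 as stated above, a tree of depth d has at most 2^(d - 1) terminal nodes. The function name is hypothetical.

```python
def max_terminal_nodes_for_depth(depth):
    """Upper bound on terminal nodes for a binary tree whose root is at depth 1."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 ** (depth - 1)

# Upper bounds for a few depths, including the default depth of 4.
for depth in (2, 4, 6):
    print(depth, max_terminal_nodes_for_depth(depth))
```

Under this assumption, a depth of 2 allows a single split, which matches the case of 2 terminal nodes where no interactions are investigated.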
Minimum number of cases allowed for a terminal node
Enter the minimum number of cases for a terminal node. For example, if the minimum size is 3 and a split would create a node with fewer than 3 cases, then Minitab does not perform the split.
Overfitting protection
Use the following options to minimize overfitting of the model.
Learning rate
The learning rate is one of the two extremely important hyperparameters that you can tune to identify an optimal model for your data.
By default, if the number of cases in your training data is 1000 or fewer, Minitab uses 0.01 as the learning rate. For data sets with more than 1000 cases, the default learning rate is max[0.01, 0.1 * min(1.0, N/10000)], where N is the number of cases. For example, when the data set has 9000 cases, the learning rate is 0.1 * 0.9 = 0.09.
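The default rule can be written as a short function. This is a sketch of the formula as stated above, not Minitab code, and the function name is hypothetical.

```python
def default_learning_rate(n_cases):
    """Default learning rate as a function of the number of training cases."""
    if n_cases <= 1000:
        return 0.01
    return max(0.01, 0.1 * min(1.0, n_cases / 10000))

# The rate grows with the number of cases and is capped at 0.1
# once the data set reaches 10000 cases.
for n in (500, 2000, 9000, 50000):
    print(n, default_learning_rate(n))
```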
If the initial model doesn't predict your data well, consider increasing or decreasing the learning rate 5-fold or 10-fold to see whether you can get a better model.
Randomize subsample selection
Choose whether to build each tree in the analysis from a subsample from the entire training data set or from subsamples within each response level.
Within entire data set: Select a random sample from the entire training data set. Usually, the fraction of 0.5 works well. Consider increasing the fraction from the default value of 0.5 to 0.70 or higher if the initial model doesn't fit your data well.
Within each response level: Take a subsample from the event class cases in the training data and a subsample from the nonevent class cases in the training data. You can use this option to ensure that enough cases of a rare class are in each subsample. If a class is rare enough, you can enter 1 to include all of its cases in every subsample.
Subsample fraction
Specify the proportion of the training data to randomly select to build each tree in the analysis. Usually, the fraction of 0.5 works well. Consider increasing the fraction from the default value of 0.5 to 0.70 or higher if the initial model doesn't fit your data well.
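The difference between the two subsample selections can be illustrated with a small sketch. It is not Minitab's implementation: the function names, the level labels, and the rounding of the subsample size are assumptions made for the example.

```python
import random

def subsample_entire(rows, fraction, rng):
    """Draw one random subsample from all training rows."""
    k = round(fraction * len(rows))  # rounding is an assumption
    return rng.sample(rows, k)

def subsample_by_level(rows, labels, fractions, rng):
    """Draw a separate random subsample within each response level.

    `fractions` maps each response level to its own subsample fraction.
    """
    chosen = []
    for level, fraction in fractions.items():
        members = [r for r, lab in zip(rows, labels) if lab == level]
        k = round(fraction * len(members))  # rounding is an assumption
        chosen.extend(rng.sample(members, k))
    return chosen

# Hypothetical training data: 10 rare events among 100 cases.
rows = list(range(100))
labels = ["event"] * 10 + ["nonevent"] * 90
rng = random.Random(1)

whole = subsample_entire(rows, 0.5, rng)
by_level = subsample_by_level(rows, labels, {"event": 1.0, "nonevent": 0.5}, rng)

print(len(whole), sum(1 for r in whole if r < 10))
print(len(by_level), sum(1 for r in by_level if r < 10))
```

With a fraction of 1 for the event level, all 10 rare cases appear in every subsample. With a single fraction over the entire data set, the number of rare cases varies from one subsample to the next.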
Number of predictors for node splitting
Specify the number of predictors to consider for each node split. Usually, the analysis works well when you consider all the predictors at every node. However, some data sets have associations among the predictors that lead to improved model performance when the analysis considers a different random subset of predictors at each node. For such cases, the square root of the total number of predictors is a typical starting point. After you use the square root and view the model, you can consider whether to specify a larger or smaller number of predictors with a percentage of the total.
Total number of predictors: Select to use all the predictors for splitting nodes.
Square root of the total number of predictors: Select to use the square root of the total number of predictors for splitting nodes.
K percent of the total number of predictors; K =: Select to use a percentage of predictors for splitting nodes.
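The three options translate into a predictor count for each node split roughly as follows. This is a sketch only: the function and option names are hypothetical, and the rounding to the nearest whole number is an assumption because the rounding rule is not stated here.

```python
import math

def predictors_per_split(total, option="all", k_percent=None):
    """Number of predictors to consider at each node split."""
    if option == "all":
        return total
    if option == "sqrt":
        count = round(math.sqrt(total))  # rounding is an assumption
    elif option == "percent":
        if k_percent is None or not 0 < k_percent <= 100:
            raise ValueError("k_percent must be in (0, 100]")
        count = round(total * k_percent / 100)  # rounding is an assumption
    else:
        raise ValueError("unknown option: " + str(option))
    # Keep the result between 1 and the total number of predictors.
    return max(1, min(total, count))

print(predictors_per_split(25, "all"))
print(predictors_per_split(25, "sqrt"))
print(predictors_per_split(20, "percent", k_percent=50))
```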
Base for random number generator
You can specify a base for the random number generator to randomly select the subsamples and the subset of predictors. Typically, you do not need to change the base. You can change the base to explore how sensitive the results are to the random selections or to ensure the same random selection for repeated analyses.
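The effect of a fixed base can be illustrated with Python's standard random number generator, which stands in here for Minitab's generator. The function name and the example values are hypothetical.

```python
import random

def draw_subsample(rows, fraction, base):
    """Draw a random subsample using a generator seeded with `base`."""
    rng = random.Random(base)
    return rng.sample(rows, round(fraction * len(rows)))

rows = list(range(100))

first = draw_subsample(rows, 0.5, base=17)
repeat = draw_subsample(rows, 0.5, base=17)
other = draw_subsample(rows, 0.5, base=18)

# The same base reproduces the same selection.
print(first == repeat)
# A different base generally produces a different selection.
print(first == other)
```

Repeating an analysis with several different bases is one way to see how much the results depend on the random selections.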
Weights
Enter a column that contains the case weights. The column must have the same number of rows as the response column. Values must be ≥ 0. Minitab omits rows that contain missing values or zeros from the analysis.