This command is available with the Predictive Analytics Module.
Minitab uses one of three criteria to select the optimal number of trees: the maximum loglikelihood (default), the maximum area under the ROC curve, or the minimum misclassification rate.
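The selection rule can be sketched as follows. This is a minimal illustration, not Minitab's implementation: the function name, the criterion labels, and the tie-breaking rule (the smallest number of trees wins) are assumptions made for the example.

```python
def optimal_number_of_trees(scores, criterion="loglikelihood"):
    """Return the 1-based number of trees with the best validation score.

    scores[i] is the validation value of the model that contains i + 1 trees.
    The criterion names are hypothetical labels for this sketch.
    """
    if criterion in ("loglikelihood", "roc_area"):
        best = max(scores)  # larger is better for these two criteria
    elif criterion == "misclassification":
        best = min(scores)  # smaller is better for the misclassification rate
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    # Assumption: on ties, keep the model with the fewest trees.
    return scores.index(best) + 1
```

For example, with loglikelihood values of -0.70, -0.52, -0.48, and -0.50 for models with 1 to 4 trees, the sketch selects 3 trees.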
Minitab uses the cross-validation method or uses a separate test set to validate the model. With cross-validation, you can specify the rows for each fold, or allow a random selection. With a separate test set, you can specify the rows for both training and test sets or allow a random selection.
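A random assignment of rows to folds can be sketched as shown below. The function name, the number of folds, and the seed are hypothetical values for illustration. The sketch does not reproduce how Minitab assigns rows.

```python
import random


def assign_folds(n_rows, k=5, seed=0):
    """Randomly assign each row to one of k folds of nearly equal size."""
    # Build a balanced list of fold labels, then shuffle it so that
    # fold membership is random but the fold sizes differ by at most 1.
    fold_ids = [i % k for i in range(n_rows)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids
```

Each fold serves once as the validation data while the remaining folds are used to fit the model.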
A low learning rate gives each new tree less weight in the model than a higher learning rate does, and sometimes produces more trees for the model. A model with a low learning rate has less chance of overfitting the training data set.
The default learning rate = max[0.01, 0.1 * min(1.0, N/10000)]. If you use a low learning rate, you might want to increase the maximum number of trees in the model so that the optimal number of trees is less than the maximum number of trees.
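The default formula can be evaluated directly. In this sketch, the function name is hypothetical, and N is assumed to be the number of rows in the data because the text does not define it.

```python
def default_learning_rate(n):
    """Evaluate max[0.01, 0.1 * min(1.0, N/10000)] for a given N.

    Assumption: n is the number of rows in the data.
    """
    return max(0.01, 0.1 * min(1.0, n / 10000))
```

Under this formula, the rate is 0.1 when N is 10,000 or more, scales down in proportion to N for smaller data sets (0.05 at N = 5,000), and does not go below 0.01.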
The subsample selection method shows the fraction of the data that the analysis uses to build each tree. Adjust this parameter if overfitting is a concern. If the analysis specifies a separate fraction for each class in a binary response variable, then the method shows both values. The option to specify the fraction for each response level ensures that the trees contain a minimal amount of each response value when one of the values is rare.
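The effect of a separate fraction for each response level can be sketched as shown below. The function name, the arguments, and the rounding rule are assumptions for illustration. The sketch is not Minitab's sampling procedure.

```python
import random


def subsample_rows(labels, fractions, seed=0):
    """Draw a random subsample of row indices with one fraction per class.

    labels: the response value for each row.
    fractions: maps each response value to the fraction of its rows to keep.
    """
    rng = random.Random(seed)
    chosen = []
    for level, fraction in fractions.items():
        rows = [i for i, y in enumerate(labels) if y == level]
        n_take = round(fraction * len(rows))  # assumption: round to nearest
        chosen.extend(rng.sample(rows, n_take))
    return sorted(chosen)
```

With 10 event rows and 90 nonevent rows, fractions of 1.0 and 0.5 keep every event row and 45 nonevent rows, so each tree still sees all of the rare class.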
Indicates the minimum number of cases for a terminal node. For example, if the minimum size is 3 and a split would create a node with fewer than 3 cases, then Minitab does not perform the split.
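The rule in the example can be written as a simple check. The function name is hypothetical, and the default of 3 in the sketch mirrors the example in the text rather than a documented software default.

```python
def split_allowed(left_count, right_count, min_terminal_size=3):
    """Return True only if both child nodes meet the minimum node size."""
    return left_count >= min_terminal_size and right_count >= min_terminal_size
```

A split that sends 2 cases to one node and 10 to the other is rejected, while a split that sends 3 cases to each node is allowed.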
This row indicates whether the node splitting considers every predictor at each node or a random subset of the predictors. If the node splitting uses a random subset, this row indicates the choice for the number of predictors to consider.
If you use all the predictors initially, consider whether to use a subset of predictors in subsequent models to compare the performance of the models.
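The two choices for node splitting can be sketched as shown below. The function and argument names are hypothetical. The sketch shows only how the candidate predictors for one node might be chosen, not how Minitab evaluates the splits.

```python
import random


def candidate_predictors(predictors, n_subset=None, rng=None):
    """Return the predictors to consider at one node.

    n_subset=None means every predictor is considered. Otherwise, a random
    subset of n_subset predictors is drawn without replacement.
    """
    predictors = list(predictors)
    if n_subset is None or n_subset >= len(predictors):
        return predictors
    rng = rng or random.Random()
    return rng.sample(predictors, n_subset)
```

Because a new subset is drawn at each node, a predictor that is left out at one node can still become the splitter at another node.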
By default, the analysis does not have a missing value penalty, and this row is not present. The missing value penalty penalizes a predictor variable for the proportion of missing values. A variable with a high penalty is less likely to become the splitter for a node.
By default, the analysis does not have a high-level category penalty, and this row is not present. At each node, the high-level category penalty penalizes a variable based on its number of categorical levels relative to the size of the node. Thus, a competitor with many levels is less likely to become the splitter for a node.
Indicates the column that is used to weight the response.
The number of response observations that the analysis uses to fit and evaluate the model.
The number of missing response observations. This count also includes rows that have missing values or zeros in the weight column.
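The two counts can be illustrated with a short sketch. The function name is hypothetical, and `None` stands in for a missing value, which is an assumption of the example and not how Minitab stores data.

```python
def count_rows(responses, weights=None):
    """Return (rows_used, rows_missing) for a response and optional weights.

    A row counts as missing when the response is missing, or when the
    weight is missing or zero.
    """
    used = missing = 0
    for i, y in enumerate(responses):
        w = 1 if weights is None else weights[i]
        if y is None or w is None or w == 0:
            missing += 1
        else:
            used += 1
    return used, missing
```

For responses `["Yes", None, "No", "Yes", "No"]` and weights `[1, 1, 0, None, 2]`, the sketch counts 2 rows as used and 3 rows as missing.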