This command is available with the Predictive Analytics Module.
The loss function that Minitab uses to create the model. Minitab uses the squared error (default), the absolute deviation, or the Huber loss function.
Compared to the squared error loss function, the absolute deviation loss function reduces the influence of the points that the model fits least well. The Huber loss function compromises between the other two: it uses the squared error loss function for smaller absolute residuals and the absolute deviation loss function for the largest absolute residuals.
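To make the difference between the three loss functions concrete, the following is a minimal sketch of each one applied to a single residual. The threshold `delta` and the factor of 0.5 are common textbook conventions assumed here for illustration; this page does not state how Minitab chooses the switch point between the two regimes.

```python
def squared_error(residual):
    # Grows quadratically, so large residuals dominate the fit.
    return 0.5 * residual ** 2


def absolute_deviation(residual):
    # Grows linearly, so poorly fit points have less influence.
    return abs(residual)


def huber(residual, delta=1.0):
    # delta is a hypothetical threshold (an assumption, not a Minitab setting).
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2              # squared error for small residuals
    return delta * (a - 0.5 * delta)     # linear growth for large residuals


for r in (0.5, 3.0):
    print(r, squared_error(r), absolute_deviation(r), huber(r))
```

For a residual of 3.0 with `delta = 1.0`, the squared error loss is 4.5 while the Huber loss is 2.5, which shows how the Huber function limits the contribution of a large residual.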
Minitab validates the model with either cross-validation or a separate test set. With cross-validation, you can specify the rows for each fold, or allow a random selection. With a separate test set, you can specify the rows for both training and test sets or allow a random selection.
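As an illustration of random fold selection, the sketch below assigns each row to one of k folds and then separates training rows from held-out rows for one fold. The function names, the default of 5 folds, and the seed are assumptions for the example and do not describe Minitab's internal procedure.

```python
import random


def assign_folds(n_rows, k=5, seed=0):
    """Return a fold id (0..k-1) for each row, with near-equal fold sizes."""
    folds = [i % k for i in range(n_rows)]
    random.Random(seed).shuffle(folds)
    return folds


def train_test_rows(folds, held_out):
    """Split row indices into training rows and the held-out fold."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    test = [i for i, f in enumerate(folds) if f == held_out]
    return train, test


folds = assign_folds(10, k=5)
train, test = train_test_rows(folds, held_out=0)
print(len(train), len(test))
```

Each row is held out exactly once across the k rounds, so every observation contributes to the validation estimate.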
Low learning rates weight each new tree in the model less than higher learning rates do, and sometimes produce more trees for the model. A model with a low learning rate is less likely to overfit the training data set.
The default learning rate = max[0.01, 0.1 * min(1.0, N/10000)]. If you use a low learning rate, you might want to increase the maximum number of trees in the model so that the optimal number of trees is less than the maximum number of trees.
The subsample fraction shows the fraction of the data that the analysis uses to build each tree. Adjust this parameter if overfitting is a concern.
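One way to picture the subsample fraction is as a random draw of rows before each tree is built. The sketch below assumes sampling without replacement and a hypothetical function name; the page does not describe how Minitab draws the subsample.

```python
import random


def subsample_rows(n_rows, fraction=0.5, seed=0):
    # Draw round(fraction * n_rows) distinct row indices (assumed scheme).
    size = max(1, round(fraction * n_rows))
    return sorted(random.Random(seed).sample(range(n_rows), size))


rows = subsample_rows(100, fraction=0.5)
print(len(rows))
```

Because each tree sees a different part of the data, a smaller fraction adds randomness between trees, which is why the parameter is relevant when overfitting is a concern.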
Indicates the minimum number of cases for a terminal node. For example, if the minimum size is 3 and a split would create a node with fewer than 3 cases, then Minitab does not perform the split.
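The minimum terminal node size acts as a simple check on each candidate split. The following sketch, with a hypothetical function name, expresses the rule from the example above for a split into two child nodes.

```python
def split_allowed(left_count, right_count, min_node_size=3):
    # Reject the split if either child would fall below the minimum size.
    return left_count >= min_node_size and right_count >= min_node_size


print(split_allowed(2, 10))  # one child has fewer than 3 cases
print(split_allowed(3, 3))   # both children meet the minimum
```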
This row indicates whether the node splitting considers every predictor at each node or a random subset of the predictors. If the node splitting uses a random subset, this row indicates the choice for the number of predictors to consider.
If you use all the predictors initially, consider whether to use a subset of predictors in subsequent models to compare the performance of the models.
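The two node-splitting options can be sketched as a choice between the full predictor list and a random draw from it. The function name, the `n_select` parameter, and the predictor names are hypothetical and only illustrate the idea.

```python
import random


def predictors_for_node(predictors, n_select=None, rng=None):
    # n_select=None means every predictor is considered at the node.
    if n_select is None or n_select >= len(predictors):
        return list(predictors)
    rng = rng or random.Random()
    # Otherwise consider a random subset of n_select predictors.
    return rng.sample(list(predictors), n_select)


all_predictors = ["age", "dose", "site", "batch"]
print(predictors_for_node(all_predictors))
print(predictors_for_node(all_predictors, n_select=2, rng=random.Random(1)))
```

Fitting one model with all predictors and another with a subset, then comparing their validation results, is one way to carry out the comparison that the text suggests.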
By default, the analysis does not have a missing value penalty and this row is not present. The missing value penalty penalizes a predictor variable for the proportion of missing values. A variable with a high penalty is less likely to become the splitter for a node.
By default, the analysis does not have a high-level category penalty and this row is not present. The high-level category penalty penalizes a variable at each node based on its number of categorical levels relative to the size of the node. Thus, a competitor with many levels is less likely to become the splitter for a node.
Indicates the column that is used to weight the response.
The number of response observations that the analysis uses to fit and evaluate the model.
The number of missing response observations. This count also includes rows with missing values or zeros in the weight column.
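The two counts above can be reproduced with a short tally. This sketch assumes that missing values are represented as `None` and that a row with a missing or zero weight is excluded from the rows used; the function name is hypothetical.

```python
def count_rows(responses, weights=None):
    """Return (rows used, rows counted as missing)."""
    used = missing = 0
    for i, y in enumerate(responses):
        w = 1 if weights is None else weights[i]
        # A row is counted as missing if the response is missing,
        # or if its weight is missing or zero.
        if y is None or w is None or w == 0:
            missing += 1
        else:
            used += 1
    return used, missing


print(count_rows([1.0, None, 2.0, 3.0], weights=[1, 1, 0, None]))
```

In this example only the first row is used; the other three rows are counted as missing because of a missing response, a zero weight, and a missing weight.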