Method table for Fit Model and Discover Key Predictors with TreeNet® Regression

Note

This command is available with the Predictive Analytics Module.

In This Topic

Loss function
Model validation
Learning rate
Subsample fraction
Maximum terminal nodes per tree or Maximum tree depth
Minimum terminal node size
Number of predictors selected for node splitting
Missing value penalty
High level category penalty
Weights
Rows used
Rows unused

Loss function

The loss function that Minitab uses to create the model. Minitab uses one of three loss functions: squared error (the default), absolute deviation, or Huber.

Compared to the squared error loss function, the absolute deviation loss function reduces the influence of the points that fit least well. The Huber loss function is a compromise between the other two: it uses the squared error loss for smaller absolute residuals and the absolute deviation loss for the largest absolute residuals.
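The following sketch compares how the three loss functions score a single residual (observed minus fitted value). It is an illustration, not Minitab's implementation: the Huber threshold `delta` is a hypothetical parameter, because this table does not say how Minitab chooses the point where the Huber loss switches from squared to absolute behavior. The Huber form here is scaled so that it equals the squared error exactly below the threshold and grows linearly (with a continuous slope) above it.

```python
def squared_error(residual: float) -> float:
    """Squared error loss: large residuals dominate because they are squared."""
    return residual ** 2


def absolute_deviation(residual: float) -> float:
    """Absolute deviation loss: grows linearly, so poorly fit points count less."""
    return abs(residual)


def huber(residual: float, delta: float) -> float:
    """Huber-style loss with a hypothetical switch point `delta`.

    Equal to the squared error for |residual| <= delta and linear beyond it;
    the two pieces meet with the same value and slope at |residual| == delta.
    """
    r = abs(residual)
    if r <= delta:
        return r ** 2
    return 2 * delta * r - delta ** 2


for r in [0.5, 1.0, 4.0]:
    print(r, squared_error(r), absolute_deviation(r), huber(r, delta=1.0))
```

For the residual 4.0, the squared error is 16.0, while the absolute deviation is 4.0 and the Huber loss is 7.0, which shows how the last two limit the pull of a poorly fit point.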

Model validation

Minitab uses the cross-validation method or uses a separate test set to validate the model. With cross-validation, you can specify the rows for each fold, or allow a random selection. With a separate test set, you can specify the rows for both training and test sets or allow a random selection.
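As a rough illustration of the two random-selection options, the sketch below assigns rows to cross-validation folds and, separately, splits rows into a training set and a test set. The function names, the seed, the number of folds, and the 30% test fraction are all hypothetical choices for the example; they are not Minitab settings or defaults.

```python
import random


def assign_folds(n_rows: int, k: int, seed: int = 1) -> list[int]:
    """Randomly assign each row to one of k folds of near-equal size."""
    folds = [i % k for i in range(n_rows)]  # balanced fold labels 0..k-1
    random.Random(seed).shuffle(folds)      # shuffle so the assignment is random
    return folds


def train_test_split(n_rows: int, test_fraction: float, seed: int = 1):
    """Randomly split row indices into a training set and a test set."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    n_test = round(n_rows * test_fraction)
    return sorted(rows[n_test:]), sorted(rows[:n_test])


# Cross-validation: each fold serves once as the holdout set.
folds = assign_folds(n_rows=10, k=5)
for fold in range(5):
    holdout = [i for i, f in enumerate(folds) if f == fold]
    training = [i for i, f in enumerate(folds) if f != fold]
    print(f"fold {fold}: train on {len(training)} rows, validate on {holdout}")

# Separate test set: a hypothetical 30% of the rows are held out.
train_rows, test_rows = train_test_split(n_rows=10, test_fraction=0.3)
```

Specifying the rows yourself corresponds to replacing the random assignment with a fixed fold or set label for each row.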

Learning rate

A low learning rate gives each new tree less weight in the model than a higher learning rate does, and sometimes produces a model with more trees. A model with a low learning rate is less likely to overfit the training data set.

The default learning rate is max[0.01, 0.1 * min(1.0, N/10000)]. If you use a low learning rate, you might want to increase the maximum number of trees in the model so that the optimal number of trees is less than the maximum number of trees.
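The sketch below evaluates the default learning rate formula and shows, under the usual gradient boosting assumption, how the learning rate scales each tree's contribution before it is added to the model. The additive update in `boosted_prediction` is a generic illustration of shrinkage, not a description of TreeNet's internal code. The argument `n` stands for N in the formula; which row count Minitab uses for N is not stated in this table.

```python
def default_learning_rate(n: int) -> float:
    """Evaluate max[0.01, 0.1 * min(1.0, N/10000)] for N = n."""
    return max(0.01, 0.1 * min(1.0, n / 10000))


def boosted_prediction(tree_outputs: list[float], learning_rate: float,
                       initial: float = 0.0) -> float:
    """Generic shrinkage: each tree's output is multiplied by the learning rate."""
    prediction = initial
    for output in tree_outputs:
        prediction += learning_rate * output
    return prediction


for n in [100, 2500, 5000, 50000]:
    print(n, default_learning_rate(n))
```

Small data sets hit the 0.01 floor, and the rate grows linearly with N until it reaches 0.1 at N = 10000. With a lower rate, each tree moves the prediction less, which is why more trees may be needed.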

Subsample fraction

The subsample fraction is the fraction of the data that the analysis uses to build each tree. If overfitting is a concern, consider adjusting this parameter.
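To make the idea concrete, this sketch draws a new random subset of rows for each tree. It assumes sampling without replacement and a fresh draw per tree, as in stochastic gradient boosting; this table does not state exactly how Minitab draws the subsample, and `subsample_rows` is a hypothetical helper.

```python
import random


def subsample_rows(n_rows: int, fraction: float, rng: random.Random) -> list[int]:
    """Draw the row indices used to build one tree (without replacement)."""
    k = max(1, round(n_rows * fraction))  # at least one row per tree
    return sorted(rng.sample(range(n_rows), k))


rng = random.Random(42)
for tree in range(3):
    rows = subsample_rows(n_rows=10, fraction=0.5, rng=rng)
    print(f"tree {tree}: rows {rows}")
```

Because each tree sees a different random portion of the data, no single tree can fit every quirk of the full training set.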

Maximum terminal nodes per tree or Maximum tree depth

TreeNet® Regression combines many small CART® trees into a powerful model. You can specify either the maximum number of terminal nodes or the maximum tree depth for these smaller CART® trees.

Maximum terminal nodes per tree: The default maximum number of terminal nodes is 6. While a larger maximum number of terminal nodes per tree can improve the ability to detect interactions, values above 12 could slow the analysis without much benefit to the model.
Maximum tree depth: The default maximum tree depth is 4. If the initial fitted model doesn't perform well, consider a larger maximum tree depth, such as 5 or 6, to see whether a larger maximum tree depth improves the model.

Minimum terminal node size

Indicates the minimum number of cases in a terminal node. For example, if the minimum size is 3 and a split would create a node with fewer than 3 cases, then Minitab does not perform the split.
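The rule in the example can be written as a simple check on the two child nodes that a candidate split would create. The function name `split_allowed` is hypothetical; it only restates the rule described above.

```python
def split_allowed(left_size: int, right_size: int, min_node_size: int) -> bool:
    """Return True only if both child nodes meet the minimum terminal node size."""
    return left_size >= min_node_size and right_size >= min_node_size


print(split_allowed(10, 3, 3))  # True: both children have at least 3 cases
print(split_allowed(11, 2, 3))  # False: the right child would have only 2 cases
```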

Number of predictors selected for node splitting

This row indicates whether the node splitting considers every predictor at each node or a random subset of the predictors. If the node splitting uses a random subset, this row shows the number of predictors that are considered at each node.

If you use all the predictors initially, consider whether to use a subset of predictors in subsequent models to compare the performance of the models.
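The sketch below shows the difference between the two options for a single node: either every predictor is a candidate splitter, or a random subset of a chosen size is drawn for that node. The predictor names and the helper `candidate_predictors` are hypothetical, and the assumption that each node draws its own subset follows from "at each node" above rather than from a documented algorithm.

```python
import random
from typing import Optional


def candidate_predictors(
    predictors: list[str], n_selected: Optional[int], rng: random.Random
) -> list[str]:
    """Return the predictors that one node's split search considers.

    n_selected=None means every predictor is considered; otherwise a fresh
    random subset of size n_selected is drawn for this node.
    """
    if n_selected is None or n_selected >= len(predictors):
        return list(predictors)
    return rng.sample(predictors, n_selected)


predictors = ["Temperature", "Pressure", "Speed", "Humidity"]  # hypothetical names
rng = random.Random(7)
for node in range(3):
    print(f"node {node}: {candidate_predictors(predictors, 2, rng)}")
```

Comparing a model that uses all predictors with one that uses a subset amounts to running the analysis with `n_selected=None` and then with a smaller value.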

Missing value penalty

By default, the analysis does not have a missing value penalty and this row is not present. The missing value penalty penalizes a predictor based on its proportion of missing values. A predictor with a high penalty is less likely to become the splitter for a node.

High level category penalty

By default, the analysis does not have a high level category penalty and this row is not present. The high level category penalty penalizes a categorical predictor at each node based on its number of levels relative to the size of that node. Thus, a predictor with many levels is less likely to become the splitter for a node.

Weights

Indicates the column that is used to weight the response.

Rows used

The number of response observations that the analysis uses to fit and evaluate the model.

Rows unused

The number of rows that the analysis excludes because the response is missing. This count also includes rows with a missing value or a zero in the weight column.
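As a sketch of how the two counts relate, the example below classifies each row by the rules in this table: a row is unused if its response is missing or if its weight is missing or zero. Missing values are represented as `None`, and `count_rows` is a hypothetical helper; the sketch covers only the response and weight rules stated here.

```python
from typing import Optional, Sequence


def count_rows(responses: Sequence[Optional[float]],
               weights: Optional[Sequence[Optional[float]]] = None) -> tuple[int, int]:
    """Return (rows used, rows unused) using the response and weight rules."""
    used = unused = 0
    for i, y in enumerate(responses):
        w = 1.0 if weights is None else weights[i]  # no weight column: every row weighs 1
        if y is None or w is None or w == 0:
            unused += 1  # missing response, or missing/zero weight
        else:
            used += 1
    return used, unused


y = [3.2, None, 4.1, 5.0, 2.7]
w = [1.0, 1.0, 0.0, None, 2.0]
print(count_rows(y, w))  # (2, 3)
print(count_rows(y))     # (4, 1)
```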
