Select hyperparameter values to evaluate from the results for TreeNet® Regression

Run Predictive Analytics Module > TreeNet® Regression > Fit Model. Click the Tune Hyperparameters to Identify a Better Model button after the Model Summary table.

Run Predictive Analytics Module > TreeNet® Regression > Discover Key Predictors. Click the Tune Hyperparameters to Identify a Better Model button after the Model Summary table.

Run Predictive Analytics Module > Automated Machine Learning > Discover Best Model (Continuous Response). Click the Select an Alternative Model button after the Model Selection table.

Note

This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

Overview

The performance of TreeNet® models is generally sensitive to the values of the learning rate, the subsample fraction, and the complexity of the individual trees that form the model. In the results for a model, click Tune Hyperparameters to Identify a Better Model to evaluate multiple values of these hyperparameters and learn which combination produces the best value of an accuracy criterion, such as the maximum R2 value. Better values of these hyperparameters have the potential to significantly improve prediction accuracy, so the exploration of different values is a common step in the analysis.

You can also adjust the number of predictors for node splitting and the number of trees that the model includes. Usually, the analysis works well when you consider all the predictors at every node. However, some data sets have associations among the predictors that lead to improved model performance when the analysis considers a different random subset of predictors at each node.

In general, 300 trees is enough to distinguish among values of the hyperparameters. Consider an increase in the number of trees when the optimal number of trees for one or more models of interest is close to the maximum number of trees. The closer the optimal number of trees is to the maximum, the more likely that an increase in the number of trees improves the performance of the model.
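
TreeNet® is a gradient boosting method, so the hyperparameters described on this page have rough analogues in open-source boosting libraries. The following Python sketch is an illustration only, not Minitab's TreeNet engine: it uses scikit-learn's GradientBoostingRegressor on synthetic data as a stand-in, and the argument names (learning_rate, subsample, max_depth, max_features, n_estimators) are the scikit-learn counterparts of the learning rate, subsample fraction, maximum tree depth, number of predictors for node splitting, and number of trees.

    # Rough open-source analogue of the hyperparameters on this page.
    # Illustration only; Minitab's TreeNet engine is not scikit-learn.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    # Synthetic data in place of a real worksheet.
    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    model = GradientBoostingRegressor(
        learning_rate=0.1,    # learning rate
        subsample=0.5,        # subsample fraction
        max_depth=4,          # maximum tree depth (individual tree complexity)
        max_features=None,    # predictors for node splitting (None = all predictors)
        n_estimators=300,     # number of trees
        random_state=1,
    )
    model.fit(X_train, y_train)
    print("Test R-squared:", round(r2_score(y_test, model.predict(X_test)), 3))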

Overfitting Protection Parameters

Specify one or more values for each hyperparameter to evaluate. The analysis evaluates the hyperparameters to find the combination with the best value of the accuracy criterion. If you enter no values for a hyperparameter, the evaluation uses the value for that hyperparameter from the model in the results. If the response is binary and the original model specifies the proportion of events and nonevents to sample, the evaluation always uses the proportions from the original model.

Learning rate

Enter up to 10 values. Eligible values are from 0.0001 to 1. Unless you select Evaluate complete parameter combinations, the evaluation of the learning rate is first. If the evaluation happens first, then the evaluation of the learning rate uses the least values of the subsample fraction and the individual tree complexity parameter.

Subsample fraction

Enter up to 10 values. Eligible values are greater than 0 and less than or equal to 1. Unless you select Evaluate complete parameter combinations, the evaluation of the subsample fraction is second. If the evaluation happens second, then the evaluation of the subsample fraction uses the best value that the analysis found for the learning rate and the least value of the individual tree complexity parameter.

Subsample fraction is disabled when the original model specifies the proportion of events and nonevents to sample for a binary response.

Individual tree complexity parameter

Choose whether to evaluate the Maximum terminal nodes or the Maximum tree depth. Usually, either choice is a reasonable way to identify a useful model, and the selection depends only on individual preference. Unless you select Evaluate complete parameter combinations, the evaluation of the complexity parameter is last. If the evaluation happens last, then the evaluation uses the best values that the analysis already found for the learning rate and the subsample fraction.
Maximum terminal nodes
Enter up to 3 values. Eligible values are between 2 and 2000. Usually, the default value of 6 provides a good balance between calculation speed and the investigation of interactions among variables. A value of 2 eliminates the investigation of interactions.
Maximum tree depth
Enter up to 3 values. Eligible values are between 2 and 1000 and represent the maximum depth of a tree. The root node corresponds to a depth of 1. In many applications, depths from 4 to 6 give reasonably good models.

Number of predictors for node splitting

Enter up to 3 values. Eligible values are between 1 and the total number of predictors. Usually, the analysis works well when you consider the total number of predictors. However, some data sets have associations among the predictors that lead to improved model performance when the analysis considers a smaller number of predictors for each node.

Number of trees

Enter a value between 1 and 5000 to set the maximum number of trees to build. The default value of 300 usually provides useful results for the evaluation of the hyperparameter values.

If the optimal number of trees for one or more models of interest is close to the number of trees that you specify, then consider whether to increase the number of trees. The closer the optimal number of trees is to the maximum, the more likely that an increase in the number of trees improves the performance of the model.
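
As an informal illustration of this guidance, the sketch below (again using scikit-learn's GradientBoostingRegressor on synthetic data as a stand-in for TreeNet®, not Minitab's implementation) finds the number of trees that maximizes test R2 within the current maximum and flags the case where that optimum lands near the maximum, which is the case where raising the maximum number of trees is most likely to help.

    # Illustrative check: is the optimal number of trees near the maximum?
    # scikit-learn stand-in; not Minitab's TreeNet implementation.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=2)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

    max_trees = 300
    model = GradientBoostingRegressor(n_estimators=max_trees, learning_rate=0.01,
                                      subsample=0.5, max_depth=4, random_state=2)
    model.fit(X_train, y_train)

    # staged_predict yields test predictions after 1, 2, ..., max_trees trees.
    scores = [r2_score(y_test, pred) for pred in model.staged_predict(X_test)]
    optimal_trees = scores.index(max(scores)) + 1
    print("Optimal number of trees:", optimal_trees, "of", max_trees)
    if optimal_trees > 0.9 * max_trees:
        print("The optimum is near the maximum; consider more trees.")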

Evaluate complete parameter combinations

If you specify values for more than one hyperparameter, then the models in the evaluation table depend on whether you evaluate the complete combinations of the hyperparameters.
  • If you select Evaluate complete parameter combinations, then the algorithm evaluates every combination of the hyperparameters. This option generally takes longer to calculate.
  • Otherwise, the algorithm evaluates the hyperparameters in this order:
    1. Learning rate
    2. Subsample fraction
    3. Individual tree complexity parameter
    For example, suppose that the algorithm receives the following hyperparameters:
    • Learning rates: 0.001, 0.01, 0.1
    • Subsample fractions: 0.4, 0.5, 0.7
    • Maximum numbers of terminal nodes: 4, 6
    1. The algorithm sets the subsample fraction to 0.4 and the maximum number of terminal nodes to 4. Then, the algorithm evaluates the learning rates in order from least to greatest: 0.001, 0.01, 0.1.
    2. Suppose that the algorithm identifies 0.01 as the best learning rate. Then the algorithm sets the learning rate to 0.01 and the maximum number of terminal nodes to 4. Then, the algorithm evaluates the subsample fractions of 0.4, 0.5, and 0.7.
    3. Suppose that the algorithm identifies 0.5 as the best subsample fraction. Then the algorithm sets the learning rate to 0.01 and the subsample fraction to 0.5. Then, the algorithm evaluates the maximum numbers of terminal nodes of 4 and 6.
    4. Suppose that the algorithm identifies 6 as the best maximum number of terminal nodes. Then Minitab produces the evaluation table and the results for the model with a learning rate of 0.01, a subsample fraction of 0.5, and a maximum number of terminal nodes of 6.

    In this example, the analysis that does not evaluate the complete set of parameter combinations includes 8 models in the evaluation table. An analysis of all the parameter combinations has 3 × 3 × 2 = 18 combinations and takes longer to calculate.
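
The following Python sketch mimics the two search strategies in this example. It is a simplified reconstruction on synthetic data, with scikit-learn's GradientBoostingRegressor standing in for TreeNet® rather than Minitab's code: the sequential pass holds the not-yet-evaluated hyperparameters at their least values, carries the best value found so far forward, and fits 3 + 3 + 2 = 8 candidate models, while the complete grid fits 3 × 3 × 2 = 18.

    # Simplified reconstruction of the two search strategies in the example above.
    # scikit-learn stand-in for TreeNet; not Minitab's implementation.
    from itertools import product
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=3)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

    def score(learning_rate, subsample, max_leaf_nodes):
        # Fit one candidate model and return its test R-squared.
        model = GradientBoostingRegressor(
            learning_rate=learning_rate, subsample=subsample,
            max_leaf_nodes=max_leaf_nodes, n_estimators=300, random_state=3)
        model.fit(X_train, y_train)
        return r2_score(y_test, model.predict(X_test))

    rates = [0.001, 0.01, 0.1]
    fractions = [0.4, 0.5, 0.7]
    nodes = [4, 6]

    # Sequential evaluation: 3 + 3 + 2 = 8 models.
    best_rate = max(rates, key=lambda r: score(r, fractions[0], nodes[0]))
    best_fraction = max(fractions, key=lambda f: score(best_rate, f, nodes[0]))
    best_nodes = max(nodes, key=lambda n: score(best_rate, best_fraction, n))
    print("Sequential best:", best_rate, best_fraction, best_nodes)

    # Complete parameter combinations: 3 x 3 x 2 = 18 models.
    best_combo = max(product(rates, fractions, nodes), key=lambda c: score(*c))
    print("Complete-grid best:", best_combo)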

Display Results

After you specify the values to examine, click Display Results. In a new set of results, Minitab produces a table that compares the accuracy criterion for the hyperparameter combinations and the results for the model with the best value of the accuracy criterion.

Minitab recreates the same tables and graphs for the new model as for the original model. The tables and graphs for the new model are in a new set of results. Storage is the same as for the original analysis, and the storage columns are in the same worksheet. For example, if the original analysis stored the fitted values in a column titled "Fit," then the new analysis stores the fitted values in an empty column titled "Fit_1."