Select the options for Discover Best Model (Continuous Response)

Predictive Analytics Module > Automated Machine Learning > Discover Best Model (Continuous Response) > Options

Note

This command is available with the Predictive Analytics Module.

Select the criterion to determine the best model and specify options for the different model types. You can also specify a base for the random number generator.

Criterion for selecting the best model

Choose the criterion to determine the best type of model. You can compare the results from several methods to determine the best choice for your application.

Maximum R-squared: The default method works well across many applications. This method minimizes the sum of the squared errors.
Minimum mean absolute deviation: This method minimizes the sum of absolute values of errors.
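The two criteria can rank the same candidate models differently. The following minimal sketch, which uses hypothetical data and hypothetical function names rather than Minitab's implementation, computes both criteria for two sets of predictions. One model has a single large error and the other has several moderate errors.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - SSE / SST, so a larger R-squared means a smaller sum of squared errors."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


def mean_absolute_deviation(actual, predicted):
    """Average of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
model_a = [1.0, 2.0, 3.0, 6.0]      # one large error
model_b = [1.75, 2.75, 3.75, 4.75]  # four moderate errors

for name, predicted in (("A", model_a), ("B", model_b)):
    print(name, r_squared(actual, predicted), mean_absolute_deviation(actual, predicted))
```

In this example, model B has the larger R-squared value, but model A has the smaller mean absolute deviation, because squaring penalizes the single large error more heavily.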

Use Huber loss function with switching value K to fit TreeNet® Regression models: K =

The Huber function is a hybrid of the maximum R-squared and the minimum mean absolute deviation functions. With the Huber function, you specify a switching value, K. The loss function starts as the squared error and remains the squared error as long as the squared error is less than the switching value. If the squared error exceeds the switching value, then the loss function becomes the absolute deviation. If the absolute deviation becomes less than the switching value, then the loss function becomes the squared error again.
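As an illustration of the switching behavior, the following sketch implements the common textbook form of the Huber loss, which compares the absolute error to the switching value. This form is an assumption. This topic does not give the exact formula or the scale of K that the TreeNet® analysis uses, and the function name is hypothetical.

```python
def huber_loss(residual, k):
    """Textbook Huber loss (assumed form, not necessarily the one Minitab uses).

    Quadratic for small errors, linear for large errors, and continuous
    at the switching value k.
    """
    abs_residual = abs(residual)
    if abs_residual <= k:
        return 0.5 * residual ** 2
    return k * (abs_residual - 0.5 * k)


# Small errors are penalized like squared error, large errors like absolute deviation.
for residual in (0.5, 1.0, 3.0):
    print(residual, huber_loss(residual, k=1.0))
```

Because the linear part grows more slowly than the quadratic part, a few extreme observations have less influence on the fit than they do with the squared error alone.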

Options for TreeNet® Regression models

Specify options for the TreeNet® model.

Number of trees

Enter a value between 1 and 5000 to set the number of trees to build. The default value of 300 provides useful initial results.

If the number of trees in the initially selected model is close to the number of trees that you specify, then consider whether to increase the number of trees to look for a better model.

Maximum terminal nodes per tree and Maximum tree depth

You can also limit the size of the trees. Choose one of the following options.

Maximum terminal nodes per tree: Enter a value between 2 and 2000 to represent the maximum number of terminal nodes of a tree. Usually, the default value of 6 provides a good balance between calculation speed and the investigation of interactions among variables. A value of 2 eliminates the investigation of interactions.
Maximum tree depth: Enter a value between 2 and 1000 to represent the maximum depth of a tree. The root node corresponds to a depth of 1. The default depth is 4. In many applications, depths from 4 to 6 give reasonably good models.

Learning rate

Specify up to 10 learning rates.

By default, the analysis evaluates 3 learning rates. The analysis usually tunes the hyperparameters with these 3 learning rates: 0.001, 0.1, and max(0.01, 0.1 * min(1.0, N/10000)), where N is the number of rows in the response column. If max(0.01, 0.1 * min(1.0, N/10000)) equals 0.001 or 0.1, then the analysis tunes the hyperparameters with 0.001, 0.01, and 0.1.
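The default rule can be written as a short function. The following sketch follows the formula above. The function name is hypothetical, and the tolerance-based comparison is an implementation choice for floating-point arithmetic, not something the topic specifies.

```python
import math


def default_learning_rates(n_rows):
    """Return the 3 default learning rates for a response column with n_rows rows."""
    data_based = max(0.01, 0.1 * min(1.0, n_rows / 10000))
    # If the data-based rate duplicates a fixed rate, use 0.01 as the middle value.
    if any(math.isclose(data_based, fixed) for fixed in (0.001, 0.1)):
        return [0.001, 0.01, 0.1]
    return sorted([0.001, data_based, 0.1])


for n in (500, 5000, 20000):
    print(n, default_learning_rates(n))
```

For small data sets, the data-based rate is 0.01. For data sets with 10,000 or more rows, the data-based rate equals 0.1, so the analysis falls back to 0.001, 0.01, and 0.1.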

Subsample fraction

Specify up to 10 subsample fractions. At each iteration, the procedure selects a different subset that contains this fraction of the data to construct a tree. Subsampling helps protect against overfitting. Subsample fractions must be greater than 0 and less than or equal to 1. The default values are 0.5 and 0.7.

Number of predictors for node splitting

Specify the number of predictors to consider for each node split. Typically, the analysis works well when you consider all the predictors at every node. However, some data sets have associations among the predictors that lead to improved model performance when the analysis considers a different random subset of predictors at each node. For such cases, the square root of the total number of predictors is a typical starting point. After you use the square root and view the model, you can consider whether to specify a larger or smaller number of predictors with a percentage of the total.

Total number of predictors: Select to use all the predictors for splitting nodes.
Square root of the total number of predictors: Select to use the square root of the total number of predictors for splitting nodes.
K percent of the total number of predictors; K =: Select to use a percentage of predictors for splitting nodes.
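The three options translate into a count of predictors that the analysis considers at each node. The following sketch shows the arithmetic. The function name and option labels are hypothetical, and the rounding and the lower bound of 1 predictor are assumptions, because this topic does not state how Minitab rounds fractional counts.

```python
import math


def predictors_per_split(total, option="all", k_percent=None):
    """Number of predictors to consider at each node split (illustrative only)."""
    if option == "all":
        return total
    if option == "sqrt":
        count = round(math.sqrt(total))            # rounding is an assumption
    elif option == "percent":
        if k_percent is None:
            raise ValueError("k_percent is required for the 'percent' option")
        count = round(total * k_percent / 100)     # rounding is an assumption
    else:
        raise ValueError(f"unknown option: {option}")
    # Keep the count between 1 and the total number of predictors.
    return max(1, min(total, count))


print(predictors_per_split(25, "sqrt"))         # square root of 25 predictors
print(predictors_per_split(20, "percent", 30))  # 30 percent of 20 predictors
```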

Options for Random Forests® Regression models

Specify options for the Random Forests® model.

Number of bootstrap samples to grow trees

Enter a value between 3 and 3000 to determine the number of bootstrap samples and the number of trees produced by the analysis.

Specify a bootstrap sample size less than the training data size

Select to enter a value that sets the bootstrap sample size. You must enter a value greater than or equal to 5. If you enter a size that is greater than the training data size, Minitab uses a sample size equal to the training data size.
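The size rule can be summarized in a few lines. The following sketch uses a hypothetical function name and only restates the two conditions above: the requested size must be at least 5, and a size larger than the training data is replaced by the training data size.

```python
def effective_bootstrap_size(requested, n_train):
    """Bootstrap sample size that is actually used for a requested size."""
    if requested < 5:
        raise ValueError("the bootstrap sample size must be at least 5")
    # A request larger than the training data falls back to the training data size.
    return min(requested, n_train)


print(effective_bootstrap_size(50, 200))
print(effective_bootstrap_size(500, 200))
```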

Number of predictors for node splitting

Specify the number of predictors to consider for each node split. Typically, the analysis works well when you consider the square root of the total number of predictors. However, some data sets have associations among the predictors that lead to improved model performance when the analysis considers a larger or smaller number of predictors for each node. After you use the square root and view the model, consider whether to change the number of predictors to try to improve the performance of the model.

Total number of predictors: Select to use all the predictors for splitting nodes. The forest created by this option is called a bootstrap forest.
Square root of the total number of predictors: Select to use the square root of the total number of predictors for splitting nodes.
K percent of the total number of predictors; K =: Select to use a percentage of predictors for splitting nodes.

Minimum number of cases to split an internal node

Specify from 1 to 3 values for the minimum number of cases. By default, the analysis evaluates 2, 5, and 8. When the number is 2, all nodes can be split into smaller nodes until another split is impossible. If the model performance is inadequate, consider whether to try other values to determine the effect on the performance.

Options for CART® Regression models

Specify options for the CART® model.

Criterion for selecting optimal tree

Choose between these criteria to produce the tree in the results. You can compare results from different trees to determine the best choice for your application.

Within K standard errors of maximum R-squared; K=: Select this option to have Minitab choose the smallest tree with an R² value that falls within K standard errors of the tree with the maximum R² value. By default, K=1, so the tree in the results is the smallest regression tree with an R² value within 1 standard error of the maximum R² value.
Maximum R-squared: Select this option to display results for the tree with the maximum R-squared value.
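The following sketch illustrates the selection rule for the first criterion. The candidate trees are hypothetical, the function name is hypothetical, and the sketch assumes that the standard error in the rule is the one for the tree with the maximum R² value.

```python
def select_tree(trees, k=1.0):
    """Pick the smallest tree within k standard errors of the maximum R-squared.

    trees is a list of (terminal_nodes, r_squared, standard_error) tuples.
    """
    best = max(trees, key=lambda tree: tree[1])
    threshold = best[1] - k * best[2]  # assumes the standard error of the best tree
    candidates = [tree for tree in trees if tree[1] >= threshold]
    return min(candidates, key=lambda tree: tree[0])


# Hypothetical sequence of candidate trees for illustration only.
trees = [
    (2, 0.60, 0.03),
    (4, 0.72, 0.03),
    (7, 0.735, 0.03),
    (12, 0.74, 0.03),
]

print(select_tree(trees, k=1.0))  # smallest tree within 1 standard error
print(select_tree(trees, k=0.0))  # same as the maximum R-squared criterion
```

With K = 1, the rule selects the 4-node tree, which gives up a small amount of R² for a much simpler tree. With K = 0, the rule reduces to the maximum R-squared criterion.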

Minimum number of cases to split an internal node

Enter the minimum number of cases a node can have and still be split into more nodes. The default is 10. For example, with the default, if an internal node has 10 or more cases, Minitab tries to perform a split. If the internal node has 9 or fewer cases, Minitab does not try to perform a split. With larger sample sizes, you may want to increase this minimum.

The internal node limit is relevant only when the value is at least twice the terminal node limit. Internal node limits of at least 3 times the terminal node limits allow a reasonable number of splitters. Usually, larger limits are reasonable for larger data sets.

Minimum number of cases allowed for a terminal node

Enter the minimum number of cases that can be in a terminal node. The default is 3. For example, with the default, if a split would create a node with fewer than 3 cases, Minitab does not perform the split. With larger sample sizes, you may want to increase this minimum.
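The internal node limit and the terminal node limit work together. The following sketch uses hypothetical function names to restate the two checks, and it shows why the internal node limit matters only when it is at least twice the terminal node limit: a node needs at least twice the terminal node limit to produce two valid child nodes.

```python
def can_attempt_split(n_cases, min_internal=10):
    """A node is a candidate for splitting only if it has enough cases."""
    return n_cases >= min_internal


def split_is_allowed(n_left, n_right, min_terminal=3):
    """A split is rejected if either child node would have too few cases."""
    return n_left >= min_terminal and n_right >= min_terminal


def smallest_splittable_node(min_internal=10, min_terminal=3):
    """Smallest node size that satisfies both limits at the same time."""
    return max(min_internal, 2 * min_terminal)


print(can_attempt_split(10), can_attempt_split(9))
print(split_is_allowed(7, 3), split_is_allowed(8, 2))
print(smallest_splittable_node(10, 3), smallest_splittable_node(4, 3))
```

In the last line, an internal node limit of 4 has no effect with a terminal node limit of 3, because no node with fewer than 6 cases can be split into two nodes of at least 3 cases each.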

Options for MARS® Regression models

Specify options for the MARS® model.

Maximum number of basis functions

The default value of 30 works well in most cases. Consider a larger value when 30 basis functions seem too few for the data. For example, consider a larger value when you believe that more than 30 predictors are important.

If you are uncertain whether 30 is enough, review the initial results. For example, a larger value is more likely to improve the fit of the model if the R-squared value trends upwards as the analysis adds basis functions.

Minimum number of observations between knots

Allow MARS® to choose: The analysis uses sample size and model complexity to automatically select a value. The automatic value works well in most cases.
User specified: A value of 1 indicates that consecutive data points are eligible to be points where the basis function changes. The value of 1 allows the most rapid changes in the model predictions. Consider different values to see the effect on the fit of the model. For example, for some data, larger values create smoother models that are less likely to overfit the training data. Such smoother models are sometimes less accurate over certain ranges of the data.

Allowed predictor interactions

Allow predictor interactions up to the order that you specify. An interaction means that the effect of a predictor depends on the value of other predictors. For example, the rate at which grain dries in an oven depends on the time in the oven, but the effect of time depends on the temperature of the oven. The time and temperature variables interact.

Do not allow any interactions (Additive model)

Do not allow predictor interactions. In this case, Minitab uses the additive model where the basis functions do not interact.

Allow all interactions up to order 2

Order specifies the number of different predictors that can be in a basis function. For example, an order of 2 indicates that the effect of a predictor can depend on the value of 1 other predictor. The following basis functions are an example of an interaction of order 2:

BF1 = max(0, X₁ − 800)
BF2 = max(0, X₂ − 50) * BF1
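The two basis functions above can be evaluated directly. The following sketch uses hypothetical function names and arbitrary input values to show that BF2 is nonzero only when both X₁ exceeds 800 and X₂ exceeds 50, which is what makes it an interaction of order 2.

```python
def hinge(x, knot):
    """Hinge function max(0, x - knot)."""
    return max(0.0, x - knot)


def bf1(x1):
    """BF1 = max(0, X1 - 800)."""
    return hinge(x1, 800)


def bf2(x1, x2):
    """BF2 = max(0, X2 - 50) * BF1, so it depends on both predictors."""
    return hinge(x2, 50) * bf1(x1)


print(bf1(900))      # 100.0
print(bf2(900, 60))  # 1000.0
print(bf2(700, 60))  # 0.0, because X1 is below its knot
print(bf2(900, 40))  # 0.0, because X2 is below its knot
```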

Base for random number generator

You can specify a base for the random number generator that the analysis uses to randomly select the subsamples and the subsets of predictors. Typically, you do not need to change the base. You can change the base to explore how sensitive the results are to the random selections or to ensure the same random selection for repeated analyses.
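The effect of a fixed base can be illustrated with any seeded random number generator. The following sketch uses Python's standard random module, which is not the generator that Minitab uses. The function name is hypothetical, and sampling rows without replacement is an assumption made for the illustration.

```python
import random


def draw_subsample(n_rows, fraction, base):
    """Select a random subset of row indices with a generator seeded by base."""
    rng = random.Random(base)
    size = max(1, int(n_rows * fraction))
    return sorted(rng.sample(range(n_rows), size))


first = draw_subsample(100, 0.5, base=1)
repeat = draw_subsample(100, 0.5, base=1)
other = draw_subsample(100, 0.5, base=2)

print(first == repeat)  # the same base reproduces the same selection
print(first == other)   # a different base generally gives a different selection
```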