Models from predictive analytics provide insights for a wide range of applications, including manufacturing quality control, drug discovery, fraud detection, credit scoring, and churn prediction. Use the results to identify important variables, to identify groups in the data with desirable characteristics, and to predict response values for new observations. For example, a market researcher can use a predictive analytics model to identify customers who have higher response rates to specific initiatives and to predict those response rates.

In many applications, an important step in model construction is to consider various
types of models. Analysts find the best type of model for an application, find the
optimal version of that model, and use the model to generate the most accurate
predictions possible. To assist in the consideration of various models, Minitab
Statistical Software provides the capability to compare different model types in a
single analysis if you have a continuous response variable or a binary response
variable.

If you have a categorical response variable with more than 2 categories, create models one-by-one.

A multiple regression model assumes that the average response is a parametric function of the predictors. The model uses the least-squares criterion to estimate the parameters for a data set. If a parametric regression model fits the relationship between the response and its predictors, then the model accurately predicts the response values for new observations. For example, Hooke's Law in physics says that the force to extend a spring has a linear relationship with the distance of extension, so a regression model fits the relationship very well.
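As an illustration of the least-squares criterion outside of Minitab, the following Python sketch fits Hooke's Law to hypothetical spring data. The data values and the `predict` helper are invented for the example; this is not how Minitab performs the estimation.

```python
# Least-squares illustration (hypothetical data, not Minitab's implementation):
# fit Hooke's Law, F = k * x, to spring measurements. The slope k minimizes
# the sum of squared residuals; for a no-intercept line the closed form is
# k = sum(x * F) / sum(x * x).

extension = [0.10, 0.20, 0.30, 0.40, 0.50]  # meters (hypothetical data)
force = [2.1, 3.9, 6.2, 8.0, 9.9]           # newtons (hypothetical data)

k = sum(x * f for x, f in zip(extension, force)) / sum(x * x for x in extension)

def predict(x):
    """Predict the restoring force for a new extension value."""
    return k * x

print(round(k, 2))            # estimated spring constant, about 20.0
print(round(predict(0.25), 2))  # predicted force for a new observation
```

Because the true relationship is linear, the fitted line predicts new observations well, which is the point of the Hooke's Law example above.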

A multiple regression model simplifies the identification of optimal settings for the predictors. The effective fit also means that the fitted parameters and standard errors are useful for statistical inference, such as the estimation of confidence intervals for the predicted response values.

Multiple regression models are flexible and often fit the true form of relationships
in the data. Even so, sometimes a multiple regression model does not fit a data set
well or characteristics of the data prevent the construction of a multiple
regression model. The following are common cases where a multiple regression model
fits poorly:

- The relationships between the response and the predictors do not follow a functional form that a multiple regression model can fit.
- The data do not have enough observations to estimate the parameters of a multiple regression model that fits well.
- The predictors are random variables.
- The predictors contain many missing values.

In such cases, tree-based models are good alternative models to consider.

In the Predictive Analytics Module, Minitab Statistical Software fits multiple regression models to continuous and binary response variables with the Discover Best Model commands. For a list of other multiple regression models in Minitab Statistical Software, go to Which regression and correlation analyses are included in Minitab?.

CART^{®}, TreeNet^{®}, and Random Forests^{®} are 3
tree-based methods. Among the tree-based models, CART^{®} is easiest to
understand because CART^{®} uses a single decision tree. A single decision
tree starts from the entire data set as the first parent node. Then, the tree splits
the data into 2 more homogeneous child nodes using the node-splitting criterion. This
step repeats until every unsplit node meets a criterion to become a terminal
node. After that, cross-validation or validation with a separate test set is used to
trim the tree to obtain the optimal tree, which is the CART^{®} model.
Single decision trees are easy to understand and can fit data sets with a wide
variety of characteristics.
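The node-splitting step can be sketched in Python. This is an illustration of the general idea for a continuous response, not Minitab's CART^{®} implementation; the data values and the `best_split` helper are invented for the example.

```python
# Minimal sketch of the CART-style node-splitting step (illustration only,
# not Minitab's implementation). For a continuous response, each candidate
# split of the parent node is scored by the drop in the sum of squared
# errors (SSE) across the two child nodes; the best split is the one with
# the largest drop, i.e. the most homogeneous children.

def sse(values):
    """Sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return the split point on predictor x that most reduces SSE."""
    pairs = sorted(zip(x, y))
    parent = sse(y)
    best = (None, -1.0)  # (split point, SSE reduction)
    for i in range(1, len(pairs)):
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        reduction = parent - (sse(left) + sse(right))
        point = (pairs[i - 1][0] + pairs[i][0]) / 2
        if reduction > best[1]:
            best = (point, reduction)
    return best

x = [1, 2, 3, 10, 11, 12]
y = [5.0, 5.5, 5.2, 9.8, 10.1, 10.0]
point, gain = best_split(x, y)
print(point)  # the split separates the two clusters, at x = 6.5
```

A full tree applies this step recursively to each child node, which is the iteration the text describes; pruning with cross-validation then trims the tree back.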

Single decision trees can be less robust and less powerful than the other 2
tree-based methods. For example, a small change in the predictor values in a data
set can lead to a very different CART^{®} model. The TreeNet^{®} and
Random Forests^{®} methods use sets of individual trees to create models
that are more robust and more accurate than models from single decision trees.
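The ensemble idea can be sketched with a simple bagging procedure: each small tree sees a bootstrap resample of the data, and the predictions are averaged. This is an illustration only; Minitab's Random Forests^{®} method differs in its details, and TreeNet^{®} uses boosting rather than simple averaging. All helper names and data below are invented for the example.

```python
import random

# Bagging sketch (illustration only): fit many one-split "stumps", each to
# a bootstrap resample, and average their predictions. A single tree's
# prediction can jump when one observation changes; the average over many
# trees is more stable.

random.seed(1)

def fit_stump(x, y):
    """Fit a one-split regression stump: the mean of y on each side of the
    split point that minimizes the children's squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        err = (sum((v - sum(left) / len(left)) ** 2 for v in left)
               + sum((v - sum(right) / len(right)) ** 2 for v in right))
        point = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or err < best[0]:
            best = (err, point, sum(left) / len(left), sum(right) / len(right))
    _, point, left_mean, right_mean = best
    return lambda v: left_mean if v <= point else right_mean

def fit_forest(x, y, n_trees=50):
    """Each stump sees a bootstrap resample; predictions are averaged."""
    n = len(x)
    stumps = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return lambda v: sum(s(v) for s in stumps) / len(stumps)

x = [1, 2, 3, 4, 10, 11, 12, 13]
y = [5.0, 5.4, 5.1, 5.3, 9.9, 10.2, 10.0, 10.1]
forest = fit_forest(x, y)
print(forest(2), forest(12))  # near the cluster means, about 5.2 and 10.05
```

Because each tree sees a slightly different sample, no single perturbed observation dominates the averaged prediction, which is the robustness advantage the text describes.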

Minitab Statistical Software fits tree-based models to continuous response variables,
binary response variables, and nominal response variables. To see an example of each
model in Minitab Statistical Software, select a model type:

MARS^{®} Regression first constructs an extensive set of basis functions that fit the data as well as possible. After forming the extensive model, the analysis reduces the risk of overfitting by searching for an optimal subset of the basis functions. The reduced model remains adaptable to various non-linear dependencies in the data. The resulting model is a multiple linear regression model in the space of these basis functions.

The characteristic of searching for different fits for different regions of the data in a stepwise fashion connects MARS^{®} Regression to tree-based models. Because of the tree-based characteristics, MARS^{®} Regression provides some of the same advantages:

- Automatic detection of the model form
- Automatic handling of missing values
- Automatic selection of the most relevant predictors

The form of the model as a regression equation connects MARS^{®} Regression to multiple regression models. Because of the multiple regression characteristics, MARS^{®} Regression also provides some of the advantages of this model type:

- A regression equation makes the effects of the variables easy to understand.
- The continuous function means that small changes in the predictors result in small changes in the predictions.
- Even for small models, different values of the predictors yield different predictions.

Models from MARS^{®} Regression provide accurate predictions and can provide insights into the form of the model that improve the fit of other types of models. Minitab Statistical Software fits MARS^{®} Regression models to continuous response variables. To see an example of MARS^{®} Regression in Minitab Statistical Software, go to Example of MARS® Regression.
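A basis-function sketch in Python may help. MARS-style models are built from hinge functions such as max(0, x − t); a linear model in these basis functions bends at the knot t, so different regions of the data get different linear fits while the prediction remains continuous. This is an illustration with invented coefficients and helper names, not Minitab's MARS^{®} Regression implementation.

```python
# Hinge basis-function sketch (illustration only, invented coefficients).
# A pair of hinges mirrored around a knot gives a continuous piecewise-
# linear function: one slope to the left of the knot, another to the right.

def hinge(x, t):
    """The hinge basis function max(0, x - t)."""
    return max(0.0, x - t)

def mars_like_predict(x, intercept, coef_plus, coef_minus, knot):
    """Piecewise-linear prediction from a mirrored pair of hinge functions."""
    return (intercept
            + coef_plus * hinge(x, knot)     # active to the right of the knot
            + coef_minus * hinge(knot, x))   # active to the left of the knot

# A hypothetical fitted model with a knot at x = 5: slope -1 to the left of
# the knot, slope +2 to the right, and continuity at the knot itself.
f = lambda x: mars_like_predict(x, intercept=3.0, coef_plus=2.0,
                                coef_minus=1.0, knot=5.0)
print(f(3.0), f(5.0), f(7.0))  # 5.0 at x=3, 3.0 at the knot, 7.0 at x=7
```

The continuity of the hinge functions is why small changes in the predictors produce small changes in the predictions, one of the multiple-regression-style advantages listed above.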