Using stepwise regression and best subsets regression

What is stepwise regression?

Stepwise regression is an automated tool used in the exploratory stages of model building to identify a useful subset of predictors. The process systematically adds the most significant variable or removes the least significant variable during each step.

For example, a housing market consulting company collects data on home sales for the previous year with the goal of predicting future sales prices. With more than 100 predictor variables, finding a model can be a time-consuming task. Minitab's stepwise regression feature automatically identifies a sequence of models to consider. Statistics such as AICc, BIC, test R², R², adjusted R², predicted R², S, and Mallows' Cp help you compare models. Minitab displays complete results for the model that is best according to the stepwise procedure that you use.

The following analyses in Minitab can automatically perform stepwise selection so that you can evaluate model summary statistics for many potential models in one set of output.
  • Stat > Regression > Regression > Fit Regression Model
  • Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model
  • Stat > Regression > Poisson Regression > Fit Poisson Model
  • Stat > ANOVA > General Linear Model > Fit General Linear Model
  • Stat > DOE > Screening > Analyze Screening Design
  • Stat > DOE > Screening > Analyze Binary Response
  • Stat > DOE > Factorial > Analyze Factorial Design
  • Stat > DOE > Factorial > Analyze Binary Response
  • Stat > DOE > Response Surface > Analyze Response Surface Design
  • Stat > DOE > Response Surface > Analyze Binary Response

Problems with stepwise regression

Exercise caution when using variable selection procedures such as best subsets and stepwise regression. One problem is that these procedures cannot consider special knowledge the analyst might have about the data. The procedure cannot consider the practical importance of any of the predictors.

A related problem is that when two predictors are highly correlated, the procedure may select only one of them even though either could be important. For example, the procedure can remove a predictor that is cheap and easy to measure in favor of a correlated predictor that is difficult and expensive to measure. The analyst must use knowledge of the data to weigh criteria that the procedure cannot consider.

Another problem with stepwise procedures is that different criteria can favor different models. For example, the model with the highest adjusted R² value is not necessarily the model with the highest test R² value. The analyst has to weigh the different criteria to select a final model.
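
As a small illustration of criteria disagreeing, the sketch below compares two models by R² and by adjusted R². The summary statistics are hypothetical values chosen for the example, not output from Minitab, and the adjusted R² formula is the usual textbook form.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical summary statistics for two candidate models fit to 20 observations.
n_obs = 20
models = {
    "small": {"r_squared": 0.900, "n_predictors": 2},
    "large": {"r_squared": 0.905, "n_predictors": 6},
}

def adj(name):
    stats = models[name]
    return adjusted_r_squared(stats["r_squared"], n_obs, stats["n_predictors"])

best_by_r2 = max(models, key=lambda name: models[name]["r_squared"])
best_by_adjusted_r2 = max(models, key=adj)
print(best_by_r2, best_by_adjusted_r2)
```

Here the larger model wins on R² while the smaller model wins on adjusted R², because the four extra predictors buy very little additional fit.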

Also, when you fit a model to data, the goodness of the fit comes from two basic sources:
  • The underlying structure of the data (a structure that will apply to other data sets collected in the same way).
  • The peculiarities of the data set that you analyze.

To ensure that your model doesn't just fit one specific data set, you should verify the model found by the selection procedure on a new set of data. You can also take the original data set, randomly divide it into two parts, use one part to select a model, and then verify the fit on the second part. This procedure helps ensure that the model you select will apply to other data sets. Go to the section on stepwise procedures with automatic validation to learn about commands that can partition your data automatically and calculate validation statistics.
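
The split-and-verify idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Minitab's implementation: the data are hypothetical, the model is a single-predictor line, and R² on the second part is computed as 1 - SSE/SST around that part's own mean.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(y, predictions):
    """1 - SSE / SST, with SST taken around the mean of y."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Hypothetical data: a linear trend plus a small deterministic disturbance.
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi + 1.0 + 0.05 * ((7 * i) % 5 - 2) for i, xi in enumerate(x)]

# Randomly divide the row indices into two halves.
indices = list(range(len(x)))
random.Random(0).shuffle(indices)
train, holdout = indices[:10], indices[10:]

# Fit on the first half only, then score both halves with the same coefficients.
intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
r2_train = r_squared([y[i] for i in train], [intercept + slope * x[i] for i in train])
r2_holdout = r_squared([y[i] for i in holdout], [intercept + slope * x[i] for i in holdout])
print(round(r2_train, 4), round(r2_holdout, 4))
```

A holdout R² that is much lower than the training R² suggests that the model is fitting peculiarities of the training rows rather than the underlying structure.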

Stepwise procedures

Every Minitab analysis that offers automatic stepwise selection includes the following procedures. These methods let you quickly evaluate a large number of models in terms of their model summary statistics for the data that you use to build the model.

  • Standard stepwise regression adds or removes a predictor for each step. Minitab stops when all variables not in the model have p-values that are greater than the specified alpha-to-enter value and when all variables in the model have p-values that are less than or equal to the specified alpha-to-remove value.
  • The forward information criteria procedure adds the term with the lowest p-value to the model at each step. Additional terms can enter the model in 1 step if the settings for the analysis allow consideration of non-hierarchical terms but require each model to be hierarchical. Minitab calculates the information criteria for each step. In most cases, the procedure continues until one of the following conditions occurs:
    • The procedure does not find a new minimum of the criterion for 8 consecutive steps.
    • The procedure fits the full model.
    • The procedure fits a model that leaves 1 degree of freedom for error.
    If you specify settings for the procedure that require a hierarchical model at each step and allow only one term to enter at a time, then the procedure continues until it either fits the full model or fits a model that leaves 1 degree of freedom for error. Minitab displays the results of the analysis for the model with the minimum value of the selected information criterion, either AICc or BIC.
  • Forward selection starts with an empty model or a model with terms that you specify. Then, Minitab adds the most significant term for each step. Minitab stops when all variables not in the model have p-values that are greater than the specified alpha-to-enter value.
  • Backward elimination starts with all predictors in the model, and Minitab removes the least significant variable at each step. Minitab stops when all variables in the model have p-values that are less than or equal to the specified alpha-to-remove value.
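
The forward information criteria procedure can be sketched roughly as follows. This is a simplified illustration, not Minitab's algorithm. The assumptions are these:

  • Every candidate term is a single numeric column, so the term with the smallest p-value is the one that gives the smallest residual sum of squares.
  • AICc is computed up to an additive constant as n·ln(RSS/n) + 2k + 2k(k+1)/(n-k-1), with k counting the coefficients plus the error variance.
  • The loop stops while AICc is still defined, a stricter stand-in for the 1 error degree of freedom rule.

The helper names `ols_rss`, `aicc`, and `forward_aicc` and the data are hypothetical.

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares for least squares with an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]
    p = len(design)
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(u * v for u, v in zip(design[i], design[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(design[i], y))] for i in range(p)]
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(p):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [u - factor * v for u, v in zip(a[r], a[i])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(b * col[k] for b, col in zip(beta, design)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))

def aicc(rss, n, n_terms):
    """AICc up to an additive constant (assumed form, see the lead-in)."""
    k = n_terms + 2  # assumption: intercept + terms + error variance
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def forward_aicc(columns, y, patience=8):
    """Add one term per step; return the terms of the minimum-AICc model."""
    n = len(y)
    selected, remaining = [], list(range(len(columns)))
    best_terms, best_score = [], aicc(ols_rss([], y), n, 0)
    stale = 0
    # Stop at the full model, when AICc would be undefined for the next model,
    # or after `patience` consecutive steps without a new minimum.
    while remaining and n - len(selected) - 4 > 0 and stale < patience:
        rss_by_term = {j: ols_rss([columns[t] for t in selected + [j]], y)
                       for j in remaining}
        best_j = min(rss_by_term, key=rss_by_term.get)
        selected.append(best_j)
        remaining.remove(best_j)
        score = aicc(rss_by_term[best_j], n, len(selected))
        if score < best_score:
            best_terms, best_score, stale = list(selected), score, 0
        else:
            stale += 1
    return best_terms, best_score

# Hypothetical data: y depends on x1 and x2 only; x3 and x4 are unrelated patterns.
x1 = [float(i) for i in range(1, 13)]
x2 = [1.0 if i % 2 == 0 else -1.0 for i in range(12)]
x3 = [0.0, 0.0, 1.0, 1.0] * 3
x4 = [1.0, 0.0, 0.0] * 4
noise = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, -0.15, 0.1, 0.0, -0.1, 0.05]
y = [2.0 * a + 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

best_terms, best_score = forward_aicc([x1, x2, x3, x4], y)
print(best_terms)
```

The returned list holds column indices in order of entry. The procedure keeps stepping after the minimum so that a later, better model is not missed, which is the role of the 8-step rule described above.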

Stepwise regression procedures with automatic validation

For the following commands, the analysis in Minitab can include an automatic validation technique as well as a stepwise procedure. Automatic validation saves time for an analyst who would otherwise validate the model manually after a stepwise procedure. The following commands can divide your data into a training data set and a test data set during the stepwise procedure:

The stepwise procedure that Minitab can automatically perform with a test data set is called forward selection with validation with a test data set. In this procedure, the initial model is empty or includes model terms that you specifically select. Then, Minitab adds the potential term with the smallest p-value at each step. At each step, Minitab calculates the test R² as the R² value of the model on the test data set. Minitab presents results for the model with the maximum test R² value.

For Fit Regression Model, you can choose a second validation technique to perform with stepwise selection called forward selection with k-fold cross-validation. In k-fold cross-validation, Minitab divides the data set into k subsets. These subsets are called folds. Most often, validation uses 10 folds, but other numbers are possible. The folds have as close to equal numbers of observations as possible. Minitab performs forward selection k times. For each forward selection, k–1 folds form the training data set and the remaining fold is the test data set. As in other forward selection procedures, the initial model is empty or includes model terms that you specifically select. Then, Minitab adds the potential term with the smallest p-value at each step. For each step, Minitab calculates the k-fold stepwise R² value by combining the information from the different stepwise selection procedures.
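
The fold construction can be sketched as follows. This shows only how rows might be divided into folds of nearly equal size and how each fold takes a turn as the test data set. It does not reproduce how Minitab assigns rows or combines the k selection runs, and the function names are hypothetical.

```python
import random

def make_folds(n_rows, k=10, seed=0):
    """Randomly assign row indices to k folds whose sizes differ by at most one."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Dealing the shuffled rows out round-robin keeps the fold sizes balanced.
    return [indices[i::k] for i in range(k)]

def train_test_splits(folds):
    """Yield (train, test) index lists, holding out each fold exactly once."""
    for held_out, test in enumerate(folds):
        train = [row for i, fold in enumerate(folds) if i != held_out for row in fold]
        yield train, test

folds = make_folds(n_rows=23, k=10)
splits = list(train_test_splits(folds))
print([len(fold) for fold in folds])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

With 23 rows and 10 folds, three folds receive 3 rows and the other seven receive 2, so every row is used for testing exactly once.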

Hierarchy

A hierarchical model is a model in which, for each term in the model, all lower-order terms contained in that term are also in the model. For example, suppose a model has four factors: A, B, C, and D. If the term A*B*C is in the model, then the terms A, B, C, A*B, A*C, and B*C must also be in the model, though terms that include D do not have to be in the model.
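
The hierarchy rule can be checked mechanically. The sketch below represents each term as a tuple of factor names, an illustrative convention rather than anything Minitab uses, and handles only interactions of distinct factors, not power terms such as A*A.

```python
from itertools import combinations

def lower_order_terms(term):
    """All proper, non-empty sub-terms of an interaction such as ('A', 'B', 'C')."""
    factors = sorted(term)
    return {combo
            for size in range(1, len(factors))
            for combo in combinations(factors, size)}

def is_hierarchical(model_terms):
    """True if every term's lower-order terms are also in the model."""
    terms = {tuple(sorted(term)) for term in model_terms}
    return all(lower_order_terms(term) <= terms for term in terms)

required = lower_order_terms(("A", "B", "C"))
print(sorted(required, key=lambda term: (len(term), term)))
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C')]
```

For instance, `is_hierarchical([("A",), ("B",), ("A", "B")])` is true, while `is_hierarchical([("A",), ("A", "B")])` is false because B is missing.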

The terms that enter or leave a model at a step depend on the specifications for hierarchy. By default, Minitab Statistical Software requires a hierarchical model at each step, requires hierarchy for all terms, and allows only one term to enter the model at each step. These settings limit the terms that Minitab considers at each step. For example, a two-way interaction cannot enter the model unless both of the lower-order terms in the interaction are already in the model. You can adjust these settings by clicking Hierarchy when you select a stepwise method.

What is best subsets regression?

Best subsets regression is an automated tool used in the exploratory stages of model building to identify a useful subset of predictors. The procedure displays model summary results for the number of models that you request for each size: models with one predictor, models with two predictors, and so on. The displayed models have the highest R² values among the possible models of that size. To use best subsets regression in Minitab, choose Stat > Regression > Regression > Best Subsets.
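
The idea can be sketched as a brute-force search: fit every subset of each size and keep the ones with the highest R². This is an illustration only. Minitab's actual search algorithm is not described here, and the helper names and data are hypothetical.

```python
from itertools import combinations

def ols_rss(columns, y):
    """Residual sum of squares for least squares with an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]
    p = len(design)
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(u * v for u, v in zip(design[i], design[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(design[i], y))] for i in range(p)]
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(p):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [u - factor * v for u, v in zip(a[r], a[i])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(b * col[k] for b, col in zip(beta, design)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))

def best_subsets(columns, y, models_per_size=2):
    """Map each model size to its top (R^2, column-index subset) pairs."""
    y_bar = sum(y) / len(y)
    sst = sum((yk - y_bar) ** 2 for yk in y)
    results = {}
    for size in range(1, len(columns) + 1):
        scored = [(1.0 - ols_rss([columns[j] for j in subset], y) / sst, subset)
                  for subset in combinations(range(len(columns)), size)]
        scored.sort(reverse=True)  # highest R^2 first
        results[size] = scored[:models_per_size]
    return results

# Hypothetical data: y depends on x1 and x2 only; x3 and x4 are unrelated patterns.
x1 = [float(i) for i in range(1, 13)]
x2 = [1.0 if i % 2 == 0 else -1.0 for i in range(12)]
x3 = [0.0, 0.0, 1.0, 1.0] * 3
x4 = [1.0, 0.0, 0.0] * 4
noise = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, -0.15, 0.1, 0.0, -0.1, 0.05]
y = [2.0 * a + 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

results = best_subsets([x1, x2, x3, x4], y)
for size, models in results.items():
    print(size, [(subset, round(r2, 4)) for r2, subset in models])
```

The number of subsets doubles with every added predictor, which is why an exhaustive search is practical only for a limited number of free predictors.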

As an automatic selection procedure, best subsets regression shares many problems with stepwise regression. The procedure cannot use specialized knowledge that an analyst has, nor is there any guarantee that different criteria identify the same model. Correlations among the predictors can make the identification of the best models more difficult. Validation of the model with new data increases the confidence you can have in the performance of the model.

Comparison of best subsets regression and stepwise regression

Best subsets is an analysis in Minitab Statistical Software. Stepwise regression is an option in several analyses. Both of these automated model selection techniques provide information about the fit of several different models. From the different models, you can identify any models that deserve further exploration.

The differences between the techniques in Minitab can help you to decide whether to use one technique over the other or to use both techniques. The following are some general points to consider:
| Characteristic | Best subsets regression | Stepwise regression |
| --- | --- | --- |
| Models considered | All possible models for the predictors. | A sequence of models chosen by the statistical significance of the terms. |
| Number of predictors to consider | Up to 31 free predictors, plus any predictors that you require in every model. | No set limit. |
| Types of predictors | Numeric columns in the worksheet. | Text or numeric columns plus interaction terms and other higher-order terms. |
| Types of response variables | One numeric column. | Different analyses in Minitab can analyze different types of response variables. For stepwise regression, you can choose an analysis for a continuous response variable, a binary response variable, or a Poisson response variable. |
| Results | The results include model summary statistics that explore the fit of the data. To view full regression results, such as residual plots, explore your chosen model in an analysis like Fit Regression Model. | The analysis displays full regression results for the optimal model according to a criterion that you select. You can also choose to look at model summary statistics for each step in the procedure. |