What is stepwise regression?

Stepwise regression is an automated tool used in the exploratory stages of model building to identify a useful subset of predictors. The process systematically adds the most significant variable or removes the least significant variable during each step.

For example, a housing market consulting company collects data on home sales for the previous year with the goal of predicting future sales prices. With more than 100 predictor variables, finding a model can be a time-consuming task. Minitab's stepwise regression feature automatically identifies a sequence of models to consider. Statistics such as AICc, BIC, R², adjusted R², predicted R², S, and Mallows' Cp help you to compare models. Minitab displays complete results for the model that is best according to the stepwise procedure that you use.

Common stepwise regression procedures

  • Standard stepwise regression both adds and removes predictors as needed for each step. Minitab stops when all variables not in the model have p-values that are greater than the specified alpha-to-enter value and when all variables in the model have p-values that are less than or equal to the specified alpha-to-remove value.
  • Forward information criterion selection starts with an empty model and Minitab adds the term that has the smallest p-value at each step. Minitab stops when the model uses all degrees of freedom or when there are no other terms to add. The model results that Minitab presents are for the model with the minimum value of the information criterion that you select for the procedure. This information criterion is either AICc or BIC. The largest model from the final step does not necessarily have the smallest value of the criterion.
  • Forward selection starts with an empty model and Minitab adds the most significant term for each step. Minitab stops when all variables not in the model have p-values that are greater than the specified alpha-to-enter value.
  • Backward elimination starts with all predictors in the model and Minitab removes the least significant variable for each step. Minitab stops when all variables in the model have p-values that are less than or equal to the specified alpha-to-remove value.
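As an illustration of the forward information criterion procedure described above, the following is a minimal sketch in plain Python. It is not Minitab's implementation. It assumes every candidate term has a single degree of freedom, in which case the term with the smallest p-value is also the term that gives the smallest residual sum of squares, so the sketch ranks candidates by residual sum of squares and avoids computing p-values. It also assumes BIC is computed up to an additive constant as n·ln(SSE/n) + k·ln(n), which does not change how models are ranked. The function names (`sse_of_fit`, `bic`, `forward_ic_selection`) and the simulated data are hypothetical.

```python
import math
import random


def sse_of_fit(columns, y):
    """Residual sum of squares for y regressed on an intercept plus columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum(
        (yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y)
    )


def bic(sse, n, k):
    # Assumption: Gaussian BIC up to an additive constant; k counts coefficients.
    return n * math.log(sse / n) + k * math.log(n)


def forward_ic_selection(predictors, y):
    """Add one term per step, then report the step with the smallest BIC."""
    n = len(y)
    chosen, remaining = [], list(predictors)
    path = [([], bic(sse_of_fit([], y), n, 1))]  # step 0: intercept only
    # Stop when no terms remain or one more term would use all degrees of freedom.
    while remaining and len(chosen) + 2 < n:
        def sse_with(name):
            return sse_of_fit([predictors[m] for m in chosen + [name]], y)

        # Smallest SSE is equivalent to smallest p-value for single-df terms.
        best_name = min(remaining, key=sse_with)
        chosen.append(best_name)
        remaining.remove(best_name)
        sse = sse_of_fit([predictors[m] for m in chosen], y)
        path.append((list(chosen), bic(sse, n, len(chosen) + 1)))
    return min(path, key=lambda step: step[1]), path


# Hypothetical data: x1 and x2 drive the response; x3 to x5 are noise.
random.seed(0)
n = 60
data = {f"x{j}": [random.gauss(0, 1) for _ in range(n)] for j in range(1, 6)}
y = [3 + 2 * a - 1.5 * b + random.gauss(0, 1)
     for a, b in zip(data["x1"], data["x2"])]

best, path = forward_ic_selection(data, y)
print("terms in the minimum-BIC model:", best[0])
for terms, score in path:
    print(f"{score:8.2f}  {terms}")
```

The loop keeps adding terms until none remain, and only afterward picks the step with the smallest criterion value. This is why the reported model can be smaller than the largest model from the final step.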

Problems with stepwise regression

  • If two predictor variables are highly correlated, only one might end up in the model even though both might be important.
  • Because the procedure fits many models, it could select models that fit the data well because of chance alone.
  • Stepwise regression might not always stop with the model with the best value of any given criterion for a given set of predictors.
  • Automatic procedures cannot consider special knowledge the analyst might have about the data. Therefore, the model selected might not be the best from a practical point of view.
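The point about chance fits can be illustrated with a small simulation. The following is a rough sketch, not a Minitab feature: the sample size, the number of candidate predictors, the number of trials, and the helper names (`r_squared`, `noise`) are all assumptions chosen for illustration. Every predictor is pure noise, so none is truly related to the response. The sketch compares the R² of the best predictor found by searching 50 candidates with the R² of a single predictor that was not chosen by a search.

```python
import random


def r_squared(x, y):
    """Squared sample correlation, the R-squared of a one-predictor regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def noise(n):
    return [random.gauss(0, 1) for _ in range(n)]


random.seed(1)
n, n_predictors, trials = 30, 50, 200  # hypothetical settings
selected, unsearched = [], []
for _ in range(trials):
    y = noise(n)
    # Search: keep the best R-squared among many unrelated predictors.
    selected.append(max(r_squared(noise(n), y) for _ in range(n_predictors)))
    # No search: R-squared of one unrelated predictor.
    unsearched.append(r_squared(noise(n), y))

avg_selected = sum(selected) / trials
avg_unsearched = sum(unsearched) / trials
print(f"average R-squared, best of {n_predictors} noise predictors: {avg_selected:.3f}")
print(f"average R-squared, one noise predictor:          {avg_unsearched:.3f}")
```

In a run of this sketch, the average for the searched predictor should come out well above the average for the single predictor, even though neither has any real relationship to the response. The gap comes from the search alone.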