What is stepwise regression?

Stepwise regression is an automated tool used in the exploratory stages of model building to identify a useful subset of predictors. At each step, the process adds the most significant variable or removes the least significant variable, judged by the p-values of the terms.

For example, a housing market consulting company collects data on home sales for the previous year with the goal of predicting future sales prices. With more than 100 predictor variables, finding the most significant models manually could be time-consuming. Minitab's stepwise regression feature automatically outputs the most significant models along with R², adjusted R², predicted R², S, and Mallows' Cp, which provides a good first step.
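
The statistics listed above have standard textbook definitions. The sketch below computes them from sums of squares for one candidate model. The function `fit_summary` and its argument names are hypothetical, the formulas are the usual definitions rather than Minitab's source code, and Minitab typically displays the R² values as percentages. Here p counts the model's coefficients including the intercept, Mallows' Cp uses the mean squared error of the model that contains every candidate predictor, and predicted R² is computed from PRESS (the sum of squared leave-one-out prediction errors), which is passed in precomputed.

```python
import math


def fit_summary(sse, sst, n, p, mse_full, press):
    """Hypothetical helper: fit statistics for one candidate model.

    sse: error sum of squares of this model
    sst: total sum of squares of the response about its mean
    n: number of observations; p: coefficients in this model, intercept included
    mse_full: mean squared error of the model containing every candidate predictor
    press: PRESS, the sum of squared leave-one-out prediction errors
    """
    return {
        "R2": 1 - sse / sst,
        "adj R2": 1 - (sse / (n - p)) / (sst / (n - 1)),
        "pred R2": 1 - press / sst,
        "S": math.sqrt(sse / (n - p)),
        "Cp": sse / mse_full - (n - 2 * p),
    }


if __name__ == "__main__":
    # Made-up sums of squares, for illustration only.
    print(fit_summary(sse=20.0, sst=100.0, n=25, p=3, mse_full=0.8, press=30.0))
```

A quick sanity check on the Cp formula: for the model with every candidate predictor, SSE / MSE_full = n - p, so Cp equals p. Candidate models whose Cp is close to their p are usually the ones worth a closer look.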

Common stepwise regression procedures

  • Standard stepwise regression both adds and removes predictors as needed at each step. Minitab stops when all variables not in the model have p-values greater than the specified alpha-to-enter value and all variables in the model have p-values less than or equal to the specified alpha-to-remove value.
  • Forward selection starts with no predictors in the model, and at each step Minitab adds the most significant variable. Minitab stops when all variables not in the model have p-values greater than the specified alpha-to-enter value.
  • Backward elimination starts with all predictors in the model, and at each step Minitab removes the least significant variable. Minitab stops when all variables in the model have p-values less than or equal to the specified alpha-to-remove value.
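
The three procedures above can be sketched in plain Python using only the standard library. This is not Minitab's implementation. The function names (`select`, `forward_selection`, `backward_elimination`, `slope_pvalues`, `t_pvalue`) are hypothetical, the 0.15 thresholds are illustrative values rather than a statement about Minitab's defaults, and significance is judged by each slope's two-sided t-test p-value (for a single term this equals the partial F-test p-value). The sketch assumes the predictors are not exactly collinear and that there are more observations than model terms.

```python
import math
import random


def _betacf(a, b, x, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return h


def t_pvalue(t, df):
    """Two-sided p-value of a t statistic, via I_x(df/2, 1/2) with x = df/(df+t^2)."""
    a, b, x = df / 2.0, 0.5, df / (df + t * t)
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def slope_pvalues(cols, y):
    """Fit y on an intercept plus `cols` by least squares; return slope p-values."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    n, k = len(X), len(X[0])
    # Invert X'X by Gauss-Jordan elimination (assumes no exact collinearity).
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [float(i == j) for j in range(k)] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    inv = [row[k:] for row in A]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    mse = sse / (n - k)  # residual mean square on n - k degrees of freedom
    return [t_pvalue(beta[j] / math.sqrt(mse * inv[j][j]), n - k) for j in range(1, k)]


def select(X, y, alpha_enter=0.15, alpha_remove=0.15, start=(),
           add=True, remove=True, max_steps=100):
    """Hypothetical p-value-driven selection; X maps predictor name -> values."""
    model = list(start)
    for _ in range(max_steps):  # cap the steps in case adds and removes cycle
        changed = False
        outside = [v for v in X if v not in model]
        if add and outside:
            # p-value each outside predictor would have if it were added now
            p_in = {v: slope_pvalues([X[m] for m in model + [v]], y)[-1] for v in outside}
            best = min(p_in, key=p_in.get)
            if p_in[best] <= alpha_enter:
                model.append(best)
                changed = True
        if remove and model:
            p_cur = slope_pvalues([X[m] for m in model], y)
            worst = max(range(len(model)), key=p_cur.__getitem__)
            if p_cur[worst] > alpha_remove:
                model.pop(worst)
                changed = True
        if not changed:
            break
    return model


def forward_selection(X, y, alpha_enter=0.15):
    return select(X, y, alpha_enter=alpha_enter, remove=False)


def backward_elimination(X, y, alpha_remove=0.15):
    return select(X, y, alpha_remove=alpha_remove, start=list(X), add=False)


if __name__ == "__main__":
    rng = random.Random(1)
    X = {name: [rng.uniform(0, 10) for _ in range(40)] for name in ("x1", "x2", "x3")}
    # Hypothetical data: y depends on x1 and x2 only; x3 is pure noise.
    y = [3 * a - 2 * b + rng.gauss(0, 1) for a, b in zip(X["x1"], X["x2"])]
    print("stepwise:", select(X, y))
    print("forward: ", forward_selection(X, y))
    print("backward:", backward_elimination(X, y))
```

One shared loop covers all three procedures: standard stepwise both adds and removes, `remove=False` gives forward selection from an empty model, and `add=False` with `start=list(X)` gives backward elimination from the full model. The `max_steps` cap is a simple guard against a predictor being added and removed repeatedly. On the made-up data, x1 and x2 should be selected by every procedure, while x3 may or may not enter depending on the noise.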

Problems with stepwise regression

  • If two predictor variables are highly correlated, only one of them might end up in the model, even though either may be important.
  • Because the procedure fits many models, it may select models that fit the data well by chance alone.
  • Stepwise regression does not necessarily stop at the model with the highest R² possible for a given number of predictors.
  • Automatic procedures cannot take into account special knowledge the analyst might have about the data. Therefore, the selected model might not be the best from a practical point of view.
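
The chance-selection problem can be gauged with a back-of-the-envelope calculation. Suppose k candidate predictors are all unrelated to the response and, as an idealizing assumption, their p-values are independent and uniformly distributed. The probability that at least one of them passes an alpha-to-enter of α at the first step of forward selection is then 1 - (1 - α)^k. The function name below is hypothetical and α = 0.15 is only an illustrative value; real predictors are usually correlated, so treat the result as a rough guide.

```python
def chance_noise_enters(k, alpha):
    """Hypothetical helper: probability that at least one of k pure-noise
    predictors has p <= alpha, assuming independent, uniform p-values.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 10, 100):
        print(f"k={k:3d}: {chance_noise_enters(k, 0.15):.4f}")
```

With 100 such candidates the probability rounds to 1, so some irrelevant predictor is almost certain to look significant at the first step.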