# Stepwise Regression

## Summary

Stepwise regression provides a method of evaluating multiple process inputs without the use of a designed experiment. It is a highly automated, "black-box" procedure that determines which inputs should be included in a predictive model for the output. The resulting model also allows you to predict the value of the output Y for any combination of input (X) values within the sampled range.

Stepwise regression helps answer the following questions:

• Which process inputs have the largest effects on the process output (which inputs are the key inputs)?
• Do any important interactions exist between process inputs?
• How much of the variation in the process output can be explained by varying the process inputs?
• What is the equation (Y = f(X)) relating the process output to the settings of the inputs?
• What settings of the key inputs result in the optimal process output?

| When to Use | Purpose |
|---|---|
| Mid-project | In projects wherein you do not test multiple inputs using a designed experiment, you can use stepwise regression to determine which inputs are the key inputs, develop a predictive model using the key inputs, and find the optimal settings of the key inputs. |

### Data

Continuous Y, numeric X's (Note: You can convert categorical X's into indicator variables.)
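
As a minimal sketch of the conversion mentioned in the note, the following Python snippet builds 0/1 indicator columns from a categorical X using NumPy. The variable `machine`, its levels, and the `Machine_` column names are hypothetical and serve only as an illustration. The last step keeps all but one indicator, which is the usual choice when the model contains a constant term.

```python
import numpy as np

# Hypothetical categorical input with three levels (illustrative data only).
machine = np.array(["A", "B", "C", "A", "B"])

# One 0/1 indicator column per category level.
levels = sorted(set(machine.tolist()))
indicators = {f"Machine_{lvl}": (machine == lvl).astype(int) for lvl in levels}

# Keep all but one indicator for the model; the dropped level ("A" here)
# becomes the baseline that the other levels are compared against.
baseline = f"Machine_{levels[0]}"
model_indicators = {k: v for k, v in indicators.items() if k != baseline}

for name, column in model_indicators.items():
    print(name, column)
```

Which level serves as the baseline is arbitrary; it changes the interpretation of the coefficients but not the fitted values.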

## How-To

1. Verify that the measurement systems for the Y data and the process inputs are adequate.
2. Develop a data collection strategy (who should collect the data, as well as where and when; how many data values are needed; how precise the data must be; how to record the data; and so on).
3. Enter the Y data into a single column. These are your response data.
4. In other columns enter the input (X) data, one column for each X. These are the predictor data.
5. To include squared terms in the model, you must manually create the squared terms by multiplying an X-variable by itself and storing the result in a new column.
6. To include interactions between X-variables in the model, you must manually create them by multiplying the appropriate X-variables and storing the result in a new column. Repeat this for each desired interaction.
7. In Minitab, use Stat > Regression > Stepwise.
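
Outside Minitab, the worksheet layout from steps 3 through 6 can be sketched in Python with NumPy. This is an illustration only: the column names `Temp` and `Pressure` and all data values are hypothetical, and a plain dictionary of arrays stands in for the worksheet.

```python
import numpy as np

# Hypothetical worksheet: one array per column (names and values are illustrative).
worksheet = {
    "Temp": np.array([150.0, 160.0, 170.0, 180.0]),
    "Pressure": np.array([30.0, 35.0, 30.0, 35.0]),
}

# Squared term: multiply a column by itself and store the result in a new column.
worksheet["Temp*Temp"] = worksheet["Temp"] * worksheet["Temp"]

# Interaction term: multiply the two columns element by element.
worksheet["Temp*Pressure"] = worksheet["Temp"] * worksheet["Pressure"]

# Stack the predictor columns into a matrix with one column per model term.
terms = list(worksheet)
X = np.column_stack([worksheet[name] for name in terms])
print(terms)
print(X)
```

Each additional squared or interaction term follows the same pattern: compute the product, then store it under a new column name.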

## Guidelines

• Take samples across the entire inference space.
• Do not extrapolate; do not use the equation to predict Y values outside the range of sampled X's.
• The residuals must be independent, be reasonably normal, and have reasonably equal variances. To check these assumptions, rerun the model selected by stepwise regression using the multiple regression tool.
• When comparing models with different numbers of terms, use the r-squared (adjusted) value for comparison rather than the r-squared value.
• Stepwise regression does not identify outliers.
• The (default) stepwise method first determines which single X explains the most variation in Y. Given that this X is now included in the model, stepwise regression searches for the next best X to add as a second variable. It repeats this step until it can find no more X's that add statistically significant value. Then, stepwise regression does a backwards sweep as a check. The value of this method is that it can handle data sets with large numbers of inputs and high degrees of multicollinearity. It has two important drawbacks:
  • Given two highly correlated X's, stepwise regression may eliminate the X that a practical observer would have kept.
  • In some cases, it may result in a slightly suboptimal solution.
• You should generally run best subsets regression as a check to evaluate alternative solutions and to ensure that a practical equation has been selected. For example, suppose Car Weight and Engine Size are two highly correlated X's. Stepwise regression eliminates one of them, and its algorithm may choose to keep Car Weight. The analyst, however, may recognize that Engine Size is easier to interpret and applies more universally within the inference space. If best subsets regression shows that replacing Car Weight with Engine Size has minimal effect on the r-squared value, Engine Size is the more practical choice.
• If interactions or squared terms are to be evaluated, you must manually create them in the Minitab worksheet using Calc > Calculator.
• When you convert a categorical variable to indicator variables, you create one indicator for each category. To properly model differences between categories, you should use all but one of these indicator variables. If you use stepwise regression with indicator variables, you should be aware that the stepwise algorithm may not include the right number of indicator variables. If some are left out, you will not have complete information about the categorical variables. In that case, you should manually run the regression using the multiple regression tool.
• If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.
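
The add-then-check procedure described in the guidelines can be sketched in Python with NumPy. This is a simplified illustration and not Minitab's implementation: it uses partial F statistics with hypothetical thresholds `f_enter` and `f_remove` (both 4.0 here, an assumption), runs the backward check after every forward step, and assumes there are more observations than terms plus one. The function names and the simulated data are illustrative. The helper `adjusted_r2` reflects the guideline on comparing models with different numbers of terms.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for a least-squares fit of y on a constant plus cols."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X, y, reduced, full):
    """Partial F statistic for the one extra term in `full` compared with `reduced`."""
    sse_full = max(sse(X, y, full), 1e-12)  # guard against a perfect fit
    df = len(y) - len(full) - 1             # residual degrees of freedom
    return (sse(X, y, reduced) - sse_full) / (sse_full / df)

def adjusted_r2(X, y, cols):
    """R-squared penalized for the number of terms in the model."""
    n = len(y)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (sse(X, y, cols) / (n - len(cols) - 1)) / (sst / (n - 1))

def stepwise(X, y, f_enter=4.0, f_remove=4.0, max_steps=100):
    """Return the column indices chosen by a simple stepwise search."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest partial F, if large enough.
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if candidates:
            f_in = {j: partial_f(X, y, selected, selected + [j]) for j in candidates}
            best = max(f_in, key=f_in.get)
            if f_in[best] >= f_enter:
                selected.append(best)
                changed = True
        # Backward check: drop the weakest term if it no longer adds value.
        if len(selected) > 1:
            f_out = {j: partial_f(X, y, [c for c in selected if c != j], selected)
                     for j in selected}
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected

# Simulated data (illustrative): Y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected = stepwise(X, y)
print("Selected columns:", sorted(selected))
print("Adjusted R-squared:", round(adjusted_r2(X, y, selected), 3))
```

Because the search adds or removes only one term at a time, it can miss a better combination of terms, which is why the guidelines recommend best subsets regression as a check.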