# Multiple Regression

## Summary

Multiple regression provides a method of evaluating multiple process inputs without the use of a designed experiment. Regression is a mathematical method for establishing the best-fit relationship between a process output Y and multiple process inputs (X's, also called predictors). Multiple regression enables you to predict the output Y for any combination of input values (X's) within the range of the sampled data. It answers questions such as:

• Which process inputs have the largest effects on the process output (which inputs are the key inputs)?
• Do any important interactions exist between process inputs?
• How much variation in the process output can be explained by varying the process inputs?
• What is the equation (Y = f(X)) relating the process output to the settings of the inputs?
• What settings of the key inputs result in the optimal process output?
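
For reference, the equation Y = f(X) in its usual least-squares form is sketched below. Here k is the number of terms, including any squared or interaction columns you create (for example, a column holding X₁X₂ is simply one more X), and ε is random error:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon
$$

The coefficients are estimated by minimizing the sum of squared residuals over the n observations. With $\mathbf{X}$ the $n \times (k+1)$ matrix whose first column is all ones, and assuming $\mathbf{X}^{\top}\mathbf{X}$ is invertible (no perfect multicollinearity):

$$
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2 = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}
$$
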

| When to Use | Purpose |
|---|---|
| Mid-project | In projects where testing of multiple inputs is not done using a designed experiment, use multiple regression to determine which inputs are the key inputs, develop a predictive model using the key inputs, and find the optimal settings of the key inputs. |

### Data

Continuous Y, Numeric X's (Note: Categorical X's can be converted into indicator variables.)
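
The following is a minimal sketch of converting a categorical X into indicator (0/1) variables, in plain Python with no external libraries. The function `to_indicators` and the supplier data are hypothetical. Following the guideline to use all but one indicator, the sketch leaves out the chosen reference category.

```python
def to_indicators(values, reference):
    """Convert a categorical column into 0/1 indicator columns.

    One indicator is created for each category except `reference`; the omitted
    category is the baseline that the other indicators are compared against.
    """
    if reference not in values:
        raise ValueError(f"reference category {reference!r} does not occur in the data")
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical categorical input with three levels; "A" is the reference.
supplier = ["A", "B", "C", "B", "A"]
indicators = to_indicators(supplier, reference="A")
print(indicators)  # {'B': [0, 1, 0, 1, 0], 'C': [0, 0, 1, 0, 0]}
```

Each returned column would then be entered as its own predictor column, next to the numeric X's.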

## How-To

1. Verify that the measurement systems for the Y data and the process inputs are adequate.
2. Develop a data collection strategy (who should collect the data, as well as where and when; how many data values are needed; the precision of the data; how to record the data; and so on).
3. Enter the Y data into a single column. These are the response data.
4. In other columns, enter the input (X) data, one column for each X. These are the predictor data.
5. To include squared terms in the model, you must manually create the squared terms by multiplying an X-variable by itself and storing the result in a new column.
6. To include interactions between X-variables in the model, you must manually create them by multiplying the appropriate X-variables and storing the result in a new column. Repeat this step for each desired interaction.
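7. In Minitab, use Stat > Regression > Regression, specifying the Y column as the response and the X columns (including any squared or interaction columns) as the predictors.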
8. Reduce the model using p-values, variance inflation factor (VIF) values, and graphical analysis.
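
Steps 3 through 7 can be illustrated outside Minitab. Below is a minimal pure-Python sketch that builds a squared column and an interaction column by hand, fits the model by ordinary least squares through the normal equations, and predicts Y at a point inside the sampled range. It is not Minitab's algorithm and does not compute p-values. The helpers `solve`, `fit_ols`, and `predict` are hypothetical, and the data are noise-free values generated from a known equation so that the fitted coefficients can be checked.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(columns, y):
    """Return [b0, b1, ..., bk] minimizing the squared residuals of y = b0 + b1*X1 + ... + bk*Xk."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]  # design matrix X
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


def predict(coefs, xs):
    """Evaluate the fitted equation at one combination of predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))


# Hypothetical noise-free data from y = 5 + 2*x1 - x2 + 0.5*x1*x2 + 0.25*x1**2.
x1 = [1, 2, 3, 4, 5, 6, 7]
x2 = [3, 1, 4, 1, 5, 9, 2]
x1_x2 = [a * b for a, b in zip(x1, x2)]  # step 6: interaction column
x1_sq = [a * a for a in x1]              # step 5: squared column
y = [5 + 2 * a - b + 0.5 * ab + 0.25 * sq for a, b, ab, sq in zip(x1, x2, x1_x2, x1_sq)]

coefs = fit_ols([x1, x2, x1_x2, x1_sq], y)
print([round(c, 6) for c in coefs])  # [5.0, 2.0, -1.0, 0.5, 0.25]

# Predict inside the sampled range (x1 in 1..7, x2 in 1..9); do not extrapolate.
print(round(predict(coefs, [4.5, 3, 4.5 * 3, 4.5 ** 2]), 4))  # 22.8125
```

Solving the normal equations directly keeps the sketch short. For larger or badly conditioned problems, statistical packages generally use more numerically stable decompositions.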

## Guidelines

• Samples should be taken across the entire inference space.
• Do not extrapolate (that is, do not use the equation to predict Y values outside the range of the sampled X's).
• Check for possible outliers in the unusual observations table (Session window output).
• The residuals must be independent, reasonably normal, and have reasonably equal variances, although multiple regression is quite robust to nonnormality. The residuals are usually analyzed with a histogram, a normal probability plot, residuals versus fits, and residuals versus order. You can display all four graphs together using the Four in one option.
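• When comparing models with different numbers of terms, compare the adjusted R-squared (R-sq(adj)), not the R-squared. R-squared never decreases when a term is added, even an unimportant one, whereas adjusted R-squared accounts for the number of terms in the model.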
• It is generally good practice to look at all pairwise relationships among the X's for possible multicollinearity, using scatterplots, a matrix plot, or fitted line plots.
• Manually reducing to a final multiple regression model can be complex and, when there are many inputs with high degrees of multicollinearity, can easily lead to analysis errors. In these cases, you may want to use stepwise or best subsets regression analysis.
• To evaluate interactions or squared terms, you must first create them manually in the Minitab worksheet using Calc > Calculator.
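• When you convert a categorical variable to indicator variables, you create one indicator for each category. To properly model differences between categories, use all but one of these indicator variables. The omitted category serves as the reference, and including all of the indicators along with the constant term would make them perfectly collinear.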
• You can use one of three common methods to evaluate multiple regression results:
  ◦ Manual analysis: combine statistical measures (p-values to test statistical significance and variance inflation factor (VIF) values to check for multicollinearity) with graphical analysis of the correlations between the variables. Manually reducing to a final multiple regression model provides more understanding of the model and of the relationships among the various X's.
  ◦ Best subsets (separate tool; highly automated).
  ◦ Stepwise (separate tool; highly automated).
• If you have discrete numeric data that can take every equally spaced value in their range, and you have measured at least 10 distinct values, you can evaluate these data as if they were continuous.
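
The sketch below shows two of the checks in these guidelines: adjusted R-squared for comparing models with different numbers of terms, and VIF for spotting multicollinearity. It uses only the Python standard library. VIF for X_j is computed as 1/(1 − R_j²), where R_j² comes from regressing X_j on the other X's. Minitab reports both values in its regression output; this sketch only illustrates how they are defined. All function names and data are hypothetical. A rule of thumb often cited is that VIF values above about 5 to 10 indicate problematic multicollinearity.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(columns, y):
    """Least-squares coefficients [b0, b1, ..., bk], solved via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(columns, y):
    """Fraction of the variation in y explained by a least-squares fit on `columns`."""
    coefs = fit_ols(columns, y)
    fitted = [coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], columns)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(columns, y):
    """R-squared penalized for the number of terms k: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    n, k = len(y), len(columns)
    return 1 - (1 - r_squared(columns, y)) * (n - 1) / (n - k - 1)


def vif(columns):
    """VIF for each X: 1 / (1 - R^2) from regressing that X on all the other X's."""
    out = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        out.append(1 / (1 - r_squared(others, col)))
    return out


# Hypothetical data: x2 is nearly 2 * x1, so x1 and x2 are highly collinear.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]
x3 = [5, 3, 6, 2, 7, 4, 8, 1]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
y = [3 + 1.5 * a + 0.8 * c + e for a, c, e in zip(x1, x3, noise)]

print("VIF for x1, x2, x3:", [round(v, 1) for v in vif([x1, x2, x3])])
print("R-sq(adj) with x2:   ", round(adjusted_r_squared([x1, x2, x3], y), 4))
print("R-sq(adj) without x2:", round(adjusted_r_squared([x1, x3], y), 4))
```

In this example, x1 and x2 should show very large VIFs and x3 a VIF near 1. This pattern suggests dropping one of the collinear pair, and the adjusted R-squared values are the appropriate basis for comparing the two candidate models.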