A regression analysis generates an equation that describes the statistical relationship between one or more predictors and the response variable, and that can be used to predict new observations. Linear regression usually uses the ordinary least squares estimation method, which derives the equation by minimizing the sum of the squared residuals.

For example, suppose you work for a potato chip company that is analyzing the factors that affect the percentage of crumbled potato chips per container before shipping (the response variable). You conduct a regression analysis with two predictors: the percentage of potato relative to other ingredients and the cooking temperature (°C). The results follow.

Regression Equation

Broken Chips = 4.251 - 0.909 Potato Percentage + 0.02231 Cooking temperature

Coefficients

| Term | Coef | SE Coef | T-Value | P-Value | VIF |
|---|---|---|---|---|---|
| Constant | 4.251 | 0.659 | 6.45 | 0.000 | |
| Potato Percentage | -0.909 | 0.331 | -2.74 | 0.011 | 1.03 |
| Cooking temperature (°C) | 0.02231 | 0.00332 | 6.71 | 0.000 | 1.03 |

Model Summary

| S | R-sq | R-sq(adj) | R-sq(pred) |
|---|---|---|---|
| 0.115034 | 66.41% | 63.61% | 57.96% |
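To see how the T-Value column relates to the others: each T-value is the coefficient divided by its standard error. The minimal sketch below recomputes them from the rounded table entries; `t_value` and `coefficients` are illustrative names, not part of any library. The last digit may differ slightly from the table (for example 6.72 rather than 6.71), presumably because the software works from unrounded values.

```python
# Coefficient and SE Coef values copied from the example table above.
coefficients = {
    "Constant": (4.251, 0.659),
    "Potato Percentage": (-0.909, 0.331),
    "Cooking temperature": (0.02231, 0.00332),
}


def t_value(coef: float, se_coef: float) -> float:
    """T statistic for the null hypothesis that the coefficient equals zero."""
    return coef / se_coef


for term, (coef, se) in coefficients.items():
    print(f"{term}: T = {t_value(coef, se):.2f}")
```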

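The regression results show that both predictors are statistically significant, because their p-values (0.011 and 0.000) are below the commonly used 0.05 significance level. Together, the two predictors explain 66.41% of the variation in the percentage of broken potato chips (R-sq). Specifically: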

- For each 1 °C increase in cooking temperature, the percentage of broken chips is expected to increase by about 0.022 percentage points, holding the potato percentage constant.
- To predict the percentage of broken chips for a setting of 0.5 (50%) potato and a cooking temperature of 175 °C, substitute those values into the equation: 4.251 - 0.909 * 0.5 + 0.02231 * 175 = 7.70075, or about 7.7% broken potato chips.
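The two bullets above can be reproduced by coding the fitted equation directly. This is a minimal sketch; `predict_broken_chips` is a hypothetical helper, and it assumes the potato percentage is entered as a proportion (0.5 for 50%), as in the worked prediction.

```python
def predict_broken_chips(potato_fraction: float, cooking_temp_c: float) -> float:
    """Fitted percentage of broken chips from the example regression equation.

    potato_fraction is a proportion (0.5 means 50% potato);
    cooking_temp_c is the cooking temperature in degrees Celsius.
    """
    return 4.251 - 0.909 * potato_fraction + 0.02231 * cooking_temp_c


# Prediction for 50% potato cooked at 175 degrees C.
print(round(predict_broken_chips(0.5, 175), 5))

# Raising the temperature by 1 degree C while holding potato fixed
# changes the prediction by the temperature coefficient (0.02231).
print(predict_broken_chips(0.5, 176) - predict_broken_chips(0.5, 175))
```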

Regression results identify the direction, size, and statistical significance of the relationship between a predictor and response.

- The sign of each coefficient indicates the direction of the relationship.
- Coefficients represent the mean change in the response for one unit of change in the predictor while holding other predictors in the model constant.
- The p-value for each coefficient tests the null hypothesis that the coefficient equals zero (no effect). A low p-value (typically below 0.05) lets you reject that null hypothesis, which suggests the predictor is a meaningful addition to your model.
- The equation predicts new observations given specified predictor values.
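The first three points can be applied mechanically to each row of a coefficients table. The sketch below reads direction from the sign and significance from the p-value; the significance level `ALPHA = 0.05` is an assumption (the example does not state one), and `summarize` is an illustrative name.

```python
ALPHA = 0.05  # assumed significance level; not stated in the example output

# (coefficient, p-value) pairs from the example coefficients table.
terms = {
    "Potato Percentage": (-0.909, 0.011),
    "Cooking temperature": (0.02231, 0.000),
}


def summarize(coef: float, p_value: float, alpha: float = ALPHA) -> str:
    """Describe the direction and significance of one predictor."""
    direction = "positive" if coef > 0 else "negative" if coef < 0 else "none"
    verdict = "significant" if p_value < alpha else "not significant"
    return f"{direction}, {verdict}"


for name, (coef, p) in terms.items():
    print(f"{name}: {summarize(coef, p)}")
```

Note that a displayed p-value of 0.000 is rounded; it means the p-value is below 0.0005, not exactly zero.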

A model with one predictor is called simple linear regression. A model with more than one predictor is called multiple linear regression.

Simple linear regression examines the linear relationship between two continuous variables: one response (y) and one predictor (x). When the two variables are related, it is possible to predict a response value from a predictor value with better than chance accuracy.

Regression provides the line that "best" fits the data. This line can then be used to:

- Examine how the response variable changes as the predictor variable changes.
- Predict the value of the response variable (y) for a given value of the predictor variable (x), ideally within the range of the observed data.
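For simple linear regression, the least-squares line has a closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below assumes plain Python lists and uses made-up data; `fit_simple_ols` is an illustrative name.

```python
def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return intercept, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up illustrative data
b0, b1 = fit_simple_ols(x, y)
print(f"y = {b0:.3f} + {b1:.3f} x")

# Predict the response at a new predictor value inside the observed range.
print(b0 + b1 * 3.5)
```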

Multiple linear regression examines the linear relationships between one continuous response and two or more predictors.

If the number of predictors is large, then before fitting a regression model with all the predictors, you should use stepwise or best subsets model-selection techniques to screen out predictors that are not associated with the response.

In ordinary least squares (OLS) regression, the estimated equation is the one that minimizes the sum of the squared vertical distances (the residuals) between the sample's data points and the values predicted by the equation.
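The minimization can be sketched for any number of predictors by solving the normal equations (XᵀX)b = Xᵀy, where X holds a leading column of 1s for the constant. This is a teaching sketch in plain Python (Gauss-Jordan elimination with partial pivoting) under the assumption of small, well-conditioned data; statistical software typically uses more numerically stable decompositions. `fit_ols` and `sse` are illustrative names, and the data are made up.

```python
def fit_ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # constant column first
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (perfectly correlated predictors?)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):  # eliminate this column from every other row
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(rows: list[list[float]], y: list[float], coefs: list[float]) -> float:
    """Sum of squared residuals for the given coefficients."""
    total = 0.0
    for row, yi in zip(rows, y):
        fitted = coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))
        total += (yi - fitted) ** 2
    return total


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
print(fit_ols(rows, y))  # close to [1.0, 2.0, -3.0]

# With noisy data, nudging any coefficient should not lower the SSE.
noisy_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
noisy_y = [3.1, 4.2, 6.8, 7.9, 11.2]
b = fit_ols(noisy_rows, noisy_y)
print(sse(noisy_rows, noisy_y, b) <= sse(noisy_rows, noisy_y, [b[0] + 0.1, b[1], b[2]]))
```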

OLS regression provides the most precise, unbiased estimates only when the following assumptions are met:

- The regression model is linear in the coefficients. Least squares can model curvature by transforming the variables (instead of the coefficients). You must specify the correct functional form in order to model any curvature.
- Residuals have a mean of zero. Including a constant in the model forces the mean of the residuals to equal zero.
- All predictors are uncorrelated with the residuals.
- Residuals are not correlated with each other (no serial correlation, also called autocorrelation).
- Residuals have a constant variance.
- No predictor variable is perfectly correlated (r = ±1) with another predictor variable. It is best to avoid high, though imperfect, correlations between predictors (multicollinearity) as well.
- Residuals are normally distributed.
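Two of these assumptions are easy to check numerically. The sketch below fits a line with a constant, confirms that the residuals average to zero, and computes the Durbin-Watson statistic, whose values near 2 suggest no first-order serial correlation (values well below 2 suggest positive correlation). It is only meaningful when the observations have a natural order, such as time. The data and the names `residuals_simple_ols` and `durbin_watson` are illustrative; the remaining assumptions are usually assessed with residual plots.

```python
def residuals_simple_ols(x: list[float], y: list[float]) -> list[float]:
    """Residuals y - (b0 + b1*x) from a least-squares line that includes a constant."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def durbin_watson(resid: list[float]) -> float:
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.4, 7.7, 10.2, 11.9, 14.1, 15.8]  # made-up data in observation order
res = residuals_simple_ols(x, y)
print(f"mean residual = {sum(res) / len(res):.2e}")
print(f"Durbin-Watson = {durbin_watson(res):.2f}")
```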

Because OLS regression provides the best estimates only when all the assumptions are met, it is important to check them. Common approaches include examining residual plots, using lack-of-fit tests, and assessing the correlation among predictors with the Variance Inflation Factor (VIF).
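A predictor's VIF is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With exactly two predictors, that R² is just the squared Pearson correlation r² between them, so both predictors share one VIF, which matches the identical 1.03 values in the example table. The sketch below assumes that two-predictor case; `pearson_r` and `vif_two_predictors` are illustrative names. Working backward from VIF = 1.03 gives |r| of roughly 0.17, a weak correlation.

```python
import math


def pearson_r(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    saa = sum((u - a_bar) ** 2 for u in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(a: list[float], b: list[float]) -> float:
    """VIF shared by both predictors in a model with exactly two predictors."""
    r = pearson_r(a, b)
    if r ** 2 >= 1.0:
        raise ValueError("predictors are perfectly correlated; VIF is infinite")
    return 1.0 / (1.0 - r ** 2)


# Working backward from the example table, where both VIFs are 1.03.
implied_r = math.sqrt(1 - 1 / 1.03)
print(f"implied |r| between predictors: {implied_r:.2f}")
```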