Regression analysis generates an equation that describes the statistical relationship between one or more predictors and the response variable, and that can be used to predict new observations. Linear regression typically uses the ordinary least squares (OLS) estimation method, which derives the equation by minimizing the sum of the squared residuals.
For example, suppose you work for a potato chip company that is analyzing the factors that affect the percentage of crumbled potato chips per container before shipping (the response variable). You conduct a regression analysis with two predictors: the percentage of potato relative to other ingredients and the cooking temperature (in degrees Celsius). The following table shows the results.
Models with one predictor are referred to as simple linear regression; models with more than one predictor are known as multiple linear regression.
Simple linear regression examines the linear relationship between two continuous variables: one response (y) and one predictor (x). When the two variables are related, it is possible to predict a response value from a predictor value with better-than-chance accuracy.
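A minimal sketch of simple linear regression, using the closed-form OLS estimates: the slope is the sum of cross-products about the means divided by the sum of squares of x, and the intercept follows from the means. The data below are hypothetical, loosely mirroring the chip example, and are chosen only to illustrate the calculation.

```python
def simple_ols(x, y):
    """Return (intercept, slope) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: cooking temperature in Celsius (x)
# vs. percent crumbled chips per container (y)
x = [170, 175, 180, 185, 190]
y = [2.1, 2.4, 2.9, 3.2, 3.6]
b0, b1 = simple_ols(x, y)
y_hat = b0 + b1 * 182  # predicted crumbled percentage at 182 degrees
```

Once the line is fitted, prediction for a new observation is just a matter of plugging the new predictor value into the equation, as in the last line above.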
Multiple linear regression examines the linear relationships between one continuous response and two or more predictors.
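With more than one predictor, the fit is most naturally expressed in matrix form: a design matrix with an intercept column and one column per predictor, solved by least squares. The sketch below uses NumPy; the predictor names mirror the chip example, but the numbers are made up.

```python
import numpy as np

# Hypothetical predictors: percent potato and cooking temperature (Celsius)
potato = np.array([45.0, 47.0, 50.0, 52.0, 55.0, 57.0])
temp = np.array([170.0, 175.0, 180.0, 182.0, 188.0, 190.0])
# Hypothetical response: percent crumbled chips per container
crumbled = np.array([2.0, 2.3, 2.8, 3.0, 3.5, 3.7])

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(potato), potato, temp])

# Least-squares solution of X @ coef ~= crumbled
coef, residuals, rank, _ = np.linalg.lstsq(X, crumbled, rcond=None)
fitted = X @ coef
```

A defining property of the OLS fit is that the residuals are orthogonal to every column of the design matrix, which is a convenient way to check the solution.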
If the number of predictors is large, then before fitting a regression model with all of them, you should use stepwise or best subsets model-selection techniques to screen out predictors that are not associated with the response.
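One such screening technique can be sketched as forward stepwise selection: start from an intercept-only model and repeatedly add the predictor that most reduces the residual sum of squares, stopping when no candidate improves it meaningfully. This is a simplified illustration, not a replacement for the stopping criteria (such as adjusted R-squared or significance thresholds) used by statistical packages; the `min_improvement` cutoff is an assumption of this sketch.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares for an OLS fit of y on X (intercept included)."""
    Z = np.column_stack([np.ones(len(y)), X]) if X.size else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return float(r @ r)

def forward_select(X, y, min_improvement=1e-6):
    """Greedily add columns of X that most reduce the SSE; return their indices."""
    selected, remaining = [], list(range(X.shape[1]))
    current = sse(np.empty((len(y), 0)), y)  # intercept-only model
    while remaining:
        scores = [(sse(X[:, selected + [j]], y), j) for j in remaining]
        best_sse, best_j = min(scores)
        if current - best_sse < min_improvement:
            break  # no candidate is worth adding
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_sse
    return selected
```

Best subsets selection differs in that it fits every combination of predictors and compares them all, which is more thorough but grows exponentially in the number of predictors.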
In ordinary least squares (OLS) regression, the estimated equation is the one that minimizes the sum of the squared distances between the sample's data points and the values predicted by the equation.
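This minimizing property can be checked numerically: because the sum of squared residuals is a convex quadratic in the coefficients, perturbing the OLS solution in any direction can only increase it. A small sketch on simulated data (the true line y = 1.5 + 0.8x plus noise is an assumption of this example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=30)

# OLS fit via least squares on the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def ssr(beta):
    """Sum of squared residuals for a given coefficient vector."""
    r = y - X @ beta
    return float(r @ r)

# Any perturbation of the OLS coefficients yields at least as large an SSE
for delta in [np.array([0.1, 0.0]), np.array([0.0, 0.1]), np.array([-0.05, 0.02])]:
    assert ssr(beta_hat) <= ssr(beta_hat + delta)
```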
Because OLS regression provides the best estimates only when all of its assumptions are met, it is important to check them. Common approaches include examining residual plots, using lack-of-fit tests, and assessing the correlation between predictors with the Variance Inflation Factor (VIF).