Adjusted mean squares measure how much variation a term or a model explains, assuming that all other terms are in the model, regardless of the order they were entered. Unlike the adjusted sums of squares, the adjusted mean squares consider the degrees of freedom.
The adjusted mean square of the error (also called MSE or s^{2}) is the variance around the fitted values.
Minitab uses the adjusted mean squares to calculate the p-value for a term. Minitab also uses the adjusted mean squares to calculate the adjusted R^{2} statistic. Usually, you interpret the p-values and the adjusted R^{2} statistic instead of the adjusted mean squares.
Adjusted sums of squares are measures of variation for different components of the model. The order of the predictors in the model does not affect the calculation of the adjusted sum of squares. In the Analysis of Variance table, Minitab separates the sums of squares into different components that describe the variation due to different sources.
Minitab uses the adjusted sums of squares to calculate the p-value for a term. Minitab also uses the sums of squares to calculate the R^{2} statistic. Usually, you interpret the p-values and the R^{2} statistic instead of the sums of squares.
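As an illustration of the arithmetic behind these quantities (not Minitab's implementation), the error sum of squares and the mean square error can be sketched in a few lines of Python. The data and model size below are made up for demonstration:

```python
# Illustrative sketch: SS Error and MSE for a fitted model.
# The observed and fitted values are hypothetical.
observed = [3.1, 4.9, 7.2, 8.8, 11.1]
fitted   = [3.0, 5.0, 7.0, 9.0, 11.0]
n_params = 2  # intercept + one predictor (hypothetical model)

# SS Error: the variation that the model does not explain
ss_error = sum((y - f) ** 2 for y, f in zip(observed, fitted))

# MSE = SS Error / DF Error, where DF Error = n - number of parameters;
# unlike the sum of squares, the mean square accounts for degrees of freedom
df_error = len(observed) - n_params
mse = ss_error / df_error

print(ss_error, mse)
```

Dividing by the error degrees of freedom rather than by n is what makes MSE an estimate of the variance around the fitted values.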
A regression coefficient describes the size and direction of the relationship between a predictor and the response variable. Coefficients are the numbers by which the values of the term are multiplied in a regression equation.
The coefficient for a term represents the change in the mean response associated with a change in that term, while the other terms in the model are held constant. The sign of the coefficient indicates the direction of the relationship between the term and the response. The size of the coefficient is usually a good way to assess the practical significance of the effect that a term has on the response variable. However, the size of the coefficient does not indicate whether a term is statistically significant because the calculations for significance also consider the variation in the response data. To determine statistical significance, examine the p-value for the term.
For example, a manager determines that an employee's score on a job skills test can be predicted using the regression model, y = 130 + 4.3x_{1} + 10.1x_{2}. In the equation, x_{1} is the hours of in-house training (from 0 to 20). The variable x_{2} is a categorical variable that equals 1 if the employee has a mentor and 0 if the employee does not have a mentor. The response, y, is the test score. The coefficient for the continuous variable of training hours is 4.3, which indicates that, for every hour of training, the mean test score increases by 4.3 points. The coefficient for the categorical variable of mentoring indicates that employees with mentors have scores that are an average of 10.1 points greater than employees without mentors.
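The worked example above can be written as a small Python function. The function name is ours; the coefficients come straight from the equation y = 130 + 4.3x_{1} + 10.1x_{2}:

```python
# The example model y = 130 + 4.3*x1 + 10.1*x2 as a function
# (illustrative; the name predicted_score is ours).
def predicted_score(training_hours, has_mentor):
    """Predict a job-skills test score from the fitted equation."""
    x2 = 1 if has_mentor else 0
    return 130 + 4.3 * training_hours + 10.1 * x2

# An employee with 10 hours of training and a mentor: 130 + 43 + 10.1
print(predicted_score(10, True))
```

Note that the mentoring coefficient shifts every prediction by a constant 10.1 points, regardless of the training hours, because the model contains no interaction term.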
These confidence intervals (CI) are ranges of values that are likely to contain the true value of the coefficient for each term in the model.
Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. However, if you take many random samples, a certain percentage of the resulting confidence intervals contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.
Use the confidence interval to assess the estimate of the population coefficient for each term in the model.
For example, with a 95% confidence level, you can be 95% confident that the confidence interval contains the value of the coefficient for the population. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.
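The interval arithmetic itself is simple: estimate ± t* × standard error. A rough sketch with a hypothetical coefficient, standard error, and a standard table value for t* (Minitab derives t* from the error degrees of freedom):

```python
# Sketch of a confidence interval for a coefficient: estimate ± t* × SE.
# The coefficient and standard error are hypothetical.
coef = 4.3      # estimated coefficient (illustrative)
se = 0.9        # standard error of the coefficient (illustrative)
t_crit = 2.045  # t* for 95% confidence with 29 error DF (table value)

lower = coef - t_crit * se
upper = coef + t_crit * se
print((lower, upper))
```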
Cp (also known as Mallows' Cp) can help you choose between competing multiple regression models. Cp compares the full model to models with the best subsets of predictors. It helps you strike an important balance with the number of predictors in the model. A model with too many predictors can be relatively imprecise while a model with too few predictors can produce biased estimates. Using Cp to compare regression models is only valid when you start with the same complete set of predictors.
A Cp value that is close to the number of predictors plus the constant indicates that the model produces relatively precise and unbiased estimates.
A Cp value that is greater than the number of predictors plus the constant indicates that the model is biased and does not fit the data well.
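The two guidelines above can be checked against the usual formula for Mallows' Cp, Cp = SSE_p / MSE_full − n + 2p, where p counts the subset model's predictors plus the constant. A minimal Python sketch with made-up values:

```python
def mallows_cp(sse_subset, mse_full, n, n_params):
    """Mallows' Cp for a subset model.

    n_params counts the predictors in the subset model plus the constant.
    A Cp near n_params suggests relatively precise, unbiased estimates.
    """
    return sse_subset / mse_full - n + 2 * n_params

# For the full model itself, SSE_full = MSE_full * (n - n_params),
# so Cp equals n_params exactly -- a useful sanity check.
n, k_full = 30, 5          # 4 predictors + constant (illustrative)
mse_full = 2.0             # made-up full-model MSE
sse_full = mse_full * (n - k_full)
print(mallows_cp(sse_full, mse_full, n, k_full))
```

A biased subset model has SSE_p much larger than MSE_full × (n − p), which pushes Cp well above p.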
The total degrees of freedom (DF) are the amount of information in your data. The analysis uses that information to estimate the values of unknown population parameters. The total DF is determined by the number of observations in your sample. The DF for a term show how much information that term uses. Increasing your sample size provides more information about the population, which increases the total DF. Increasing the number of terms in your model uses more information, which decreases the DF available to estimate the variability of the parameter estimates.
If two conditions are met, then Minitab partitions the DF for error. The first condition is that there must be terms you can fit with the data that are not included in the current model. For example, if you have a continuous predictor with 3 or more distinct values, you can estimate a quadratic term for that predictor. If the model does not include the quadratic term, then a term that the data can fit is not included in the model and this condition is met.
The second condition is that the data contain replicates. Replicates are observations where each predictor has the same value. For example, if you have 3 observations where pressure is 5 and temperature is 25, then those 3 observations are replicates.
If the two conditions are met, then the two parts of the DF for error are lack-of-fit and pure error. The DF for lack-of-fit allow a test of whether the model form is adequate. The more DF for pure error, the greater the power of the lack-of-fit test.
Fitted values are also called fits or ŷ. The fitted values are point estimates of the mean response for given values of the predictors. The values of the predictors are also called x-values.
Fitted values are calculated by entering the specific xvalues for each observation in the data set into the model equation.
For example, if the equation is y = 5 + 10x, the fitted value for the x-value, 2, is 25 (25 = 5 + 10(2)).
Observations with fitted values that are very different from the observed value may be unusual or influential. Observations with unusual predictor values may be influential. If Minitab determines that your data include unusual values, your output includes the table of Fits and Diagnostics for Unusual Observations, which identifies the unusual observations. The observations that Minitab labels do not follow the proposed regression equation well. However, it is expected that you will have some unusual observations. For example, based on the criteria for large standardized residuals, you would expect roughly 5% of your observations to be flagged as having a large standardized residual. For more information on unusual values, go to Unusual observations.
Minitab uses the F-value to calculate the p-value, which you use to make a decision about the statistical significance of the terms and model. The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.
A sufficiently large F-value indicates that the term or model is significant.
If you want to use the F-value to determine whether to reject the null hypothesis, compare the F-value to your critical value. You can calculate the critical value in Minitab or find the critical value from an F-distribution table in most statistics books. For more information on using Minitab to calculate the critical value, go to Using the inverse cumulative distribution function (ICDF) and click "Use the ICDF to calculate critical values".
The histogram of the residuals shows the distribution of the residuals for all observations.
| Pattern | What the pattern may indicate |
| --- | --- |
| A long tail in one direction | Skewness |
| A bar that is far away from the other bars | An outlier |
Because the appearance of a histogram depends on the number of intervals used to group the data, don't use a histogram to assess the normality of the residuals. Instead, use a normal probability plot.
A histogram is most effective when you have approximately 20 or more data points. If the sample is too small, then each bar on the histogram does not contain enough data points to reliably show skewness or outliers.
The normal plot of the residuals displays the residuals versus their expected values when the distribution is normal.
Use the normal probability plot of residuals to verify the assumption that the residuals are normally distributed. The normal probability plot of the residuals should approximately follow a straight line.
If you see a nonnormal pattern, use the other residual plots to check for other problems with the model, such as missing terms or a time order effect. If the residuals do not follow a normal distribution, prediction intervals can be inaccurate. If the residuals do not follow a normal distribution and the data have fewer than 15 observations, then confidence intervals for predictions, confidence intervals for coefficients, and pvalues for coefficients can be inaccurate.
The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.
If the p-value is larger than the significance level, the test does not detect any lack-of-fit.
If the p-value is greater than the significance level, you cannot conclude that the model explains variation in the response. You may want to fit a new model.
R^{2} is the percentage of variation in the response that is explained by the model. It is calculated as 1 minus the ratio of the error sum of squares (which is the variation that is not explained by the model) to the total sum of squares (which is the total variation in the data).
Use R^{2} to determine how well the model fits your data. The higher the R^{2} value, the better the model fits your data. R^{2} is always between 0% and 100%.
R^{2} always increases when you add additional predictors to a model. For example, the best five-predictor model will always have an R^{2} that is at least as high as the best four-predictor model. Therefore, R^{2} is most useful when you compare models of the same size.
Small samples do not provide a precise estimate of the strength of the relationship between the response and predictors. If you need R^{2} to be more precise, you should use a larger sample (typically, 40 or more).
R^{2} is just one measure of how well the model fits the data. Even when a model has a high R^{2}, you should check the residual plots to verify that the model meets the model assumptions.
Adjusted R^{2} is the percentage of the variation in the response that is explained by the model, adjusted for the number of predictors in the model relative to the number of observations. Adjusted R^{2} is calculated as 1 minus the ratio of the mean square error (MSE) to the mean square total (MS Total).
Use adjusted R^{2} when you want to compare models that have different numbers of predictors. R^{2} always increases when you add a predictor to the model, even when there is no real improvement to the model. The adjusted R^{2} value incorporates the number of predictors in the model to help you choose the correct model.
| Step | % Potato | Cooling rate | Cooking temp | R^{2} | Adjusted R^{2} | P-value |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | X | | | 52% | 51% | 0.000 |
| 2 | X | X | | 63% | 62% | 0.000 |
| 3 | X | X | X | 65% | 62% | 0.000 |
The first step yields a statistically significant regression model. The second step adds cooling rate to the model. Adjusted R^{2} increases, which indicates that cooling rate improves the model. The third step, which adds cooking temperature to the model, increases the R^{2} but not the adjusted R^{2}. These results indicate that cooking temperature does not improve the model. Based on these results, you consider removing cooking temperature from the model.
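Both definitions reduce to a few lines of arithmetic. This Python sketch uses illustrative data (not the potato chip study) and shows that adjusted R^{2} sits at or below R^{2}:

```python
def r_squared(observed, fitted):
    """R^2 = 1 - SS Error / SS Total."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_error = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_error / ss_total

def adjusted_r_squared(observed, fitted, n_params):
    """Adjusted R^2 = 1 - MSE / MS Total; penalizes extra predictors."""
    n = len(observed)
    mean_y = sum(observed) / n
    ms_total = sum((y - mean_y) ** 2 for y in observed) / (n - 1)
    mse = sum((y - f) ** 2 for y, f in zip(observed, fitted)) / (n - n_params)
    return 1 - mse / ms_total

# Made-up observations and fitted values from a one-predictor model
observed = [3.0, 5.0, 7.0, 9.0, 11.0]
fitted   = [3.1, 4.9, 7.2, 8.8, 11.0]
print(r_squared(observed, fitted), adjusted_r_squared(observed, fitted, 2))
```

Because MSE divides by n − p while MS Total divides by n − 1, adding a useless predictor can raise R^{2} while lowering adjusted R^{2}, which is exactly the behavior in step 3 of the table above.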
Predicted R^{2} is calculated with a formula that is equivalent to systematically removing each observation from the data set, estimating the regression equation, and determining how well the model predicts the removed observation. The value of predicted R^{2} ranges between 0% and 100%. (While the calculations for predicted R^{2} can produce negative values, Minitab displays zero for these cases.)
Use predicted R^{2} to determine how well your model predicts the response for new observations. Models that have larger predicted R^{2} values have better predictive ability.
A predicted R^{2} that is substantially less than R^{2} may indicate that the model is overfit. An overfit model occurs when you add terms for effects that are not important in the population, although they may appear important in the sample data. The model becomes tailored to the sample data and therefore, may not be useful for making predictions about the population.
Predicted R^{2} can also be more useful than adjusted R^{2} for comparing models because it is calculated with observations that are not included in the model calculation.
For example, an analyst at a financial consulting company develops a model to predict future market conditions. The model looks promising because it has an R^{2} of 87%. However, the predicted R^{2} is only 52%, which indicates that the model may be overfit.
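The leave-one-out procedure described above can be sketched in plain Python for a one-predictor model. This illustrative version refits the model n times rather than using the equivalent closed-form PRESS shortcut, and it does not clamp negative values to zero the way Minitab's display does:

```python
def simple_fit(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope  # intercept, slope

def predicted_r_squared(xs, ys):
    """Leave each point out, refit, and predict the held-out response."""
    n = len(ys)
    my = sum(ys) / n
    ss_total = sum((y - my) ** 2 for y in ys)
    press = 0.0  # prediction sum of squares
    for i in range(n):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        b0, b1 = simple_fit(xs_i, ys_i)
        press += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return 1 - press / ss_total

# Perfectly linear data predicts every held-out point exactly,
# so predicted R^2 is 1; noisy data pushes it below R^2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
print(predicted_r_squared(xs, ys))
```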
Use the regression equation to describe the relationship between the response and the terms in the model. The regression equation is an algebraic representation of the regression line. The regression equation for the linear model takes the following form: y = b_{0} + b_{1}x_{1}. In the regression equation, y is the response variable, b_{0} is the constant or intercept, b_{1} is the estimated coefficient for the linear term (also known as the slope of the line), and x_{1} is the value of the term.
The regression equation with more than one term takes the following form:
y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + ... + b_{k}x_{k}
If the model contains both continuous and categorical variables, the regression equation table can display an equation for each level of the categorical variable. To use these equations for prediction, you must choose the correct equation, based on the values of the categorical variables, and then enter the values of the continuous variables.
A residual (e_{i}) is the difference between an observed value (y) and the corresponding fitted value (ŷ), which is the value predicted by the model.
Plot the residuals to determine whether your model is adequate and meets the assumptions of regression. Examining the residuals can provide useful information about how well the model fits the data. In general, the residuals should be randomly distributed with no obvious patterns and no unusual values. If Minitab determines that your data include unusual observations, it identifies those observations in the Fits and Diagnostics for Unusual Observations table in the output. The observations that Minitab labels as unusual do not follow the proposed regression equation well. However, it is expected that you will have some unusual observations. For example, based on the criteria for large residuals, you would expect roughly 5% of your observations to be flagged as having a large residual. For more information on unusual values, go to Unusual observations.
The residuals versus fits graph plots the residuals on the y-axis and the fitted values on the x-axis.
Use the residuals versus fits plot to verify the assumption that the residuals are randomly distributed and have constant variance. Ideally, the points should fall randomly on both sides of 0, with no recognizable patterns in the points.
| Pattern | What the pattern may indicate |
| --- | --- |
| Fanning or uneven spreading of residuals across fitted values | Nonconstant variance |
| Curvilinear | A missing higher-order term |
| A point that is far away from zero | An outlier |
| A point that is far away from the other points in the x-direction | An influential point |
If you identify any patterns or outliers in your residuals versus fits plot, consider the following solutions:
| Issue | Possible solution |
| --- | --- |
| Nonconstant variance | Transform the response variable. You can transform the variable in Minitab Statistical Software. |
| An outlier or influential point | |
| A missing higher-order term | Add the term and refit the model. |
The residual versus order plot displays the residuals in the order that the data were collected.
The residuals versus variables plot displays the residuals versus another variable. The variable could already be included in your model. Or, the variable may not be in the model, but you suspect it affects the response.
If the variable is already included in the model, use the plot to determine whether you should add a higherorder term of the variable. If the variable is not already included in the model, use the plot to determine whether the variable is affecting the response in a systematic way.
| Pattern | What the pattern may indicate |
| --- | --- |
| Pattern in residuals | The variable affects the response in a systematic way. If the variable is not in your model, include a term for that variable and refit the model. |
| Curvature in the points | A higher-order term of the variable should be included in the model. For example, a curved pattern indicates that you should add a squared term. |
S represents how far the data values fall from the fitted values. S is measured in the units of the response.
Use S to assess how well the model describes the response. S is measured in the units of the response variable and represents how far the data values fall from the fitted values. The lower the value of S, the better the model describes the response. However, a low S value by itself does not indicate that the model meets the model assumptions. You should check the residual plots to verify the assumptions.
For example, you work for a potato chip company that examines the factors that affect the percentage of crumbled potato chips per container. You reduce the model to the significant predictors, and S is calculated as 1.79. This result indicates that the standard deviation of the data points around the fitted values is 1.79. If you are comparing models, values that are lower than 1.79 indicate a better fit, and higher values indicate a worse fit.
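The calculation behind S is the square root of the mean square error. A short sketch with made-up residuals (not the potato chip data):

```python
import math

# S = sqrt(MSE): the standard deviation of the data points around the
# fitted values, in the units of the response. Residuals are illustrative.
residuals = [1.2, -0.8, 0.5, -1.5, 0.6]
n_params = 2  # intercept + one predictor (hypothetical model)

df_error = len(residuals) - n_params
s = math.sqrt(sum(e ** 2 for e in residuals) / df_error)
print(s)
```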
The standard error of the coefficient estimates the variability between coefficient estimates that you would obtain if you took samples from the same population again and again.
Use the standard error of the coefficient to measure the precision of the estimate of the coefficient. The smaller the standard error, the more precise the estimate. Dividing the coefficient by its standard error calculates a t-value. If the p-value associated with this t-statistic is less than your significance level (denoted as alpha or α), you conclude that the coefficient is statistically significant.
In this model, North and South measure the position of a focal point in inches. The coefficients for North and South are similar in magnitude. The standard error of the South coefficient is smaller than that of North. Therefore, the model is able to estimate the coefficient for South with greater precision.
The standard error of the North coefficient is nearly as large as the value of the coefficient itself. The resulting p-value is greater than common significance levels, so you cannot conclude that the coefficient for North differs from 0.
While the coefficient for South is closer to 0 than the coefficient for North, the standard error of the coefficient for South is also smaller. The resulting p-value is smaller than common significance levels. Because the estimate of the coefficient for South is more precise, you can conclude that the coefficient for South differs from 0.
Statistical significance is one criterion you can use to reduce a model in multiple regression. For more information, go to Model reduction.
The standardized residual equals the value of a residual (e_{i}) divided by an estimate of its standard deviation.
Use the standardized residuals to help you detect outliers. Standardized residuals greater than 2 or less than −2 are usually considered large. The Fits and Diagnostics for Unusual Observations table identifies these observations with an 'R'. The observations that Minitab labels do not follow the proposed regression equation well. However, it is expected that you will have some unusual observations. For example, based on the criteria for large standardized residuals, you would expect roughly 5% of your observations to be flagged as having a large standardized residual. For more information, go to Unusual observations.
Standardized residuals are useful because raw residuals might not be good indicators of outliers. The variance of each raw residual differs with the x-values associated with it, and this unequal variation makes it difficult to assess the magnitudes of the raw residuals. Standardizing the residuals solves this problem by converting the different variances to a common scale.
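For a one-predictor model, the standardization can be sketched directly: each raw residual is divided by s × sqrt(1 − h_i), where h_i is the leverage of observation i. The data below are made up, with one point that strays from the line:

```python
import math

def standardized_residuals(xs, ys):
    """Standardized residuals for a one-predictor least-squares fit.

    Each raw residual is divided by its own estimated standard deviation,
    s * sqrt(1 - h_i), where h_i is the leverage of observation i.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # sqrt(MSE)
    # Leverage for simple linear regression: h_i = 1/n + (x_i - mean)^2 / Sxx
    lev = [1 / n + (x - mx) ** 2 / sxx for x in xs]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 16.0]  # the last point strays from the line
std_res = standardized_residuals(xs, ys)
print([round(r, 2) for r in std_res])
```

The straying point gets the largest standardized residual, even though its leverage also inflates the raw residual's variance.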
The t-value is the ratio of the coefficient to its standard error.
Minitab uses the t-value to calculate the p-value, which you use to test whether the coefficient is significantly different from 0.
You can use the t-value to determine whether to reject the null hypothesis. However, the p-value is used more often because the threshold for the rejection of the null hypothesis does not depend on the degrees of freedom. For more information on using the t-value, go to Using the t-value to determine whether to reject the null hypothesis.
The variance inflation factor (VIF) indicates how much the variance of a coefficient is inflated due to the correlations among the predictors in the model.
Use the VIF to describe how much multicollinearity (which is correlation between predictors) exists in a regression analysis. Multicollinearity is problematic because it can increase the variance of the regression coefficients, making it difficult to evaluate the individual impact that each of the correlated predictors has on the response.
| VIF | Status of predictors |
| --- | --- |
| VIF = 1 | Not correlated |
| 1 < VIF < 5 | Moderately correlated |
| VIF > 5 | Highly correlated |
For more information on multicollinearity and how to mitigate the effects of multicollinearity, see Multicollinearity in regression.
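For the special case of exactly two predictors, the VIF reduces to 1 / (1 − r^{2}), where r is the correlation between them. A minimal sketch (the function name and data are ours):

```python
def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case: 1 / (1 - r^2).

    With more predictors, r^2 comes from regressing one predictor on all
    of the others; this is the simplest illustrative case.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    var1 = sum((a - m1) ** 2 for a in x1)
    var2 = sum((b - m2) ** 2 for b in x2)
    r_sq = cov ** 2 / (var1 * var2)
    return 1 / (1 - r_sq)

# Orthogonal predictors give VIF = 1 (no variance inflation)
print(vif_two_predictors([1.0, 2.0, 1.0, 2.0], [1.0, 1.0, 2.0, 2.0]))
```

As the predictors become more correlated, r^{2} approaches 1 and the VIF grows without bound, which is why coefficient variances blow up under severe multicollinearity.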