If the p-value is greater than the significance level, you cannot conclude that there is a statistically significant association between the response variable and the term. If you fit a quadratic model or a cubic model and the quadratic or cubic terms are not statistically significant, you may want to select a different model.
Source | DF | SS | MS | F | P |
---|---|---|---|---|---|
Regression | 2 | 12189.4 | 6094.70 | 106.54 | 0.000 |
Error | 26 | 1487.3 | 57.21 | | |
Total | 28 | 13676.7 | | | |

Source | DF | SS | F | P |
---|---|---|---|---|
Linear | 1 | 11552.8 | 146.86 | 0.000 |
Quadratic | 1 | 636.6 | 11.13 | 0.003 |
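In these results, the p-value for the linear term, Density, is 0.000 and the p-value for the quadratic term, Density², is 0.003. (A p-value displayed as 0.000 is smaller than 0.0005, not exactly zero.) Both values are less than the significance level of 0.05. These results indicate that the association between stiffness and density is statistically significant and that both the linear and the quadratic terms contribute to it.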
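The F-values in the two tables above can be rebuilt from the sums of squares and the degrees of freedom. The following Python sketch is an illustrative check, not the software's own routine. It assumes the usual definitions MS = SS / DF and F = MS(term) / MS(error), and it assumes that each sequential F-test uses the error mean square of the model that contains only the terms entered so far. All variable names are made up for the illustration.

```python
# Values copied from the two tables above.
ss_regression, df_regression = 12189.4, 2
ss_error, df_error = 1487.3, 26
ss_total, df_total = 13676.7, 28
ss_linear, ss_quadratic = 11552.8, 636.6

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Overall F-test for the quadratic model.
f_regression = ms_regression / ms_error

# Sequential F for the linear term (assumption: the error term comes from the
# linear-only model, which has one more error degree of freedom).
ms_error_linear = (ss_total - ss_linear) / (df_total - 1)
f_linear = ss_linear / ms_error_linear

# Sequential F for the quadratic term: the error term comes from the quadratic model.
f_quadratic = ss_quadratic / ms_error

print(f"{f_regression:.2f} {f_linear:.2f} {f_quadratic:.2f}")
```

Because the tabled sums of squares are rounded, the recomputed values can differ from the printed ones in the last digit (for example, the recomputed error mean square is about 57.20 rather than 57.21).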
If a term is statistically significant, you can examine the regression equation and the coefficients to understand how the term is related to the response.
Use the regression equation to describe the relationship between the response and the terms in the model. The regression equation is an algebraic representation of the regression line. The regression equation for the linear model takes the following form: Y = b₀ + b₁x₁. In the regression equation, Y is the response variable, b₀ is the constant or intercept, b₁ is the estimated coefficient for the linear term (also known as the slope of the line), and x₁ is the value of the term. A quadratic model adds a squared term to this form: Y = b₀ + b₁x₁ + b₂x₁².
The coefficient of the term represents the change in the mean response for a one-unit change in that term. The sign of the coefficient indicates the direction of the relationship between the term and the response. If the coefficient is negative, as the term increases, the mean value of the response decreases. If the coefficient is positive, as the term increases, the mean value of the response increases.
For example, a manager determines that an employee's score on a job skills test can be predicted using the regression model y = 130 + 4.3x. In the equation, x is the hours of in-house training (from 0 to 20) and y is the test score. The coefficient, or slope, is 4.3, which indicates that, for every additional hour of training, the mean test score increases by 4.3 points.
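The training example can be written as a small function. This is a minimal sketch: the function name `predicted_score` is hypothetical, and the range check simply reflects the 0 to 20 hours over which the model is described.

```python
def predicted_score(hours):
    """Mean test score predicted by y = 130 + 4.3x for x hours of training."""
    if not 0 <= hours <= 20:
        # The model is described only for 0 to 20 hours of training.
        raise ValueError("hours must be between 0 and 20")
    return 130 + 4.3 * hours


# Predicted mean score after 10 hours of training.
print(round(predicted_score(10), 1))

# One additional hour changes the prediction by the slope, 4.3 points.
print(round(predicted_score(11) - predicted_score(10), 1))
```

The difference between any two predictions that are one hour apart equals the slope, which is what the coefficient of 4.3 expresses.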
For more information on coefficients, go to Regression coefficients.
S | R-sq | R-sq(adj) |
---|---|---|
7.56342 | 89.13% | 88.29% |
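The coefficient for the predictor, Density, is -1.517 and the coefficient for Density² is 0.1622. Because the coefficient of the squared term is positive, the slope of the fitted curve, -1.517 + 2(0.1622)(Density), grows as density grows. Thus, with this quadratic relationship, the average stiffness of the particle board increases more rapidly at larger density values than at smaller density values.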
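To see how the rate of change depends on density, the slope of the fitted quadratic can be evaluated at a few points. The sketch below uses only the two coefficients quoted above; the intercept is not needed for the slope. The density values are hypothetical and chosen only for illustration, and `stiffness_slope` is an illustrative name.

```python
# Coefficients quoted in the text for Density and Density squared.
B1 = -1.517
B2 = 0.1622


def stiffness_slope(density):
    """Change in mean stiffness per unit of density (derivative of the fitted curve)."""
    return B1 + 2 * B2 * density


# Hypothetical density values, chosen only for illustration.
for density in (10, 15, 20):
    print(density, round(stiffness_slope(density), 3))
```

The slope is larger at each successive density value, which matches the statement that stiffness increases more rapidly at larger densities.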
To determine how well the model fits your data, examine the goodness-of-fit statistics in the Model Summary table.
R² is the percentage of variation in the response that is explained by the model. The higher the R² value, the better the model fits your data. R² is always between 0% and 100%.
R² never decreases when you add predictors to a model. For example, the best five-predictor model will always have an R² that is at least as high as that of the best four-predictor model. Therefore, R² is most useful when you compare models of the same size.
Use adjusted R² when you want to compare models that have different numbers of predictors. Because R² never decreases when you add a predictor, even when the predictor brings no real improvement, R² alone cannot show whether an extra term is worthwhile. The adjusted R² value incorporates the number of predictors in the model to help you choose the correct model.
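The comparison between models of different sizes can be sketched with the sums of squares from the tables shown earlier. The code assumes the standard formulas R² = 1 - SS(error) / SS(total) and adjusted R² = 1 - MS(error) / MS(total), takes the sample size as 29 (the total DF of 28 plus 1), and obtains the error sum of squares of the linear-only model by subtracting the linear sum of squares from the total. Function and variable names are illustrative.

```python
def r_squared(ss_error, ss_total):
    """Share of the total variation that the model explains."""
    return 1 - ss_error / ss_total


def adjusted_r_squared(ss_error, ss_total, n, n_terms):
    """R-squared penalized for model size: each term costs one error degree of freedom."""
    ms_error = ss_error / (n - n_terms - 1)
    ms_total = ss_total / (n - 1)
    return 1 - ms_error / ms_total


n = 29                                 # total DF (28) + 1
ss_total = 13676.7
ss_error_linear = ss_total - 11552.8   # error left after the linear term only
ss_error_quadratic = 1487.3            # error of the quadratic model

models = [("linear", ss_error_linear, 1), ("quadratic", ss_error_quadratic, 2)]
for label, ss_error, n_terms in models:
    r2 = r_squared(ss_error, ss_total)
    adj = adjusted_r_squared(ss_error, ss_total, n, n_terms)
    print(f"{label}: R-sq = {100 * r2:.2f}%, R-sq(adj) = {100 * adj:.2f}%")
```

Under these assumptions, adjusted R² is higher for the quadratic model than for the linear-only model, so the extra term improves the fit by more than its cost in degrees of freedom.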
Small samples do not provide a precise estimate of the strength of the relationship between the response and predictors. If you need R² to be more precise, use a larger sample (typically, 40 or more observations).
Goodness-of-fit statistics are just one measure of how well the model fits the data. Even when a goodness-of-fit statistic has a desirable value, you should check the residual plots to verify that the model meets the model assumptions.
S | R-sq | R-sq(adj) |
---|---|---|
7.56342 | 89.13% | 88.29% |
In these results, the density of the particle board explains approximately 89% of the variation in the stiffness of the boards. The R² value indicates that the model fits the data well.
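The S and R-sq entries of the Model Summary table can be cross-checked against the analysis of variance table. This sketch assumes that S is the square root of the error mean square and that R² is the regression sum of squares divided by the total sum of squares; the variable names are illustrative.

```python
import math

# Values copied from the analysis of variance table.
ss_regression = 12189.4
ss_error, df_error = 1487.3, 26
ss_total = 13676.7

# S: typical distance between the observed values and the fitted curve,
# in the units of the response.
s = math.sqrt(ss_error / df_error)

# R-sq: share of the total variation accounted for by the regression.
r_sq = ss_regression / ss_total

print(f"S = {s:.4f}, R-sq = {100 * r_sq:.2f}%")
```

The recomputed S agrees with the tabled 7.56342 to about three decimal places; the small gap comes from rounding in the tabled sums of squares.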
Use the residual plots to help you determine whether the model is adequate and meets the assumptions of the analysis. If the assumptions are not met, the model may not fit the data well and you should use caution when you interpret the results.
Use the residuals versus fits plot to verify the assumption that the residuals are randomly distributed and have constant variance. Ideally, the points should fall randomly on both sides of 0, with no recognizable patterns in the points.
Pattern | What the pattern may indicate |
---|---|
Fanning or uneven spreading of residuals across fitted values | Nonconstant variance |
Curvilinear | A missing higher-order term |
A point that is far away from zero | An outlier |
A point that is far away from the other points in the x-direction | An influential point |
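The curvilinear pattern in the table can be reproduced with a small, self-contained sketch. The data below are hypothetical (y is simply x squared), and `fit_line` is an illustrative helper that fits a straight line by ordinary least squares. Fitting a line to curved data leaves residuals that are positive at both ends and negative in the middle.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data with deliberate curvature (y = x squared).
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
fits = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fits)]

# Each pair is one point on a residuals versus fits plot.
for fi, ri in zip(fits, residuals):
    print(f"fit = {fi:6.1f}   residual = {ri:5.1f}")
```

A systematic run of signs like this, rather than a random scatter around 0, is the kind of pattern that suggests a missing higher-order term.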
Use the normal probability plot of the residuals to verify the assumption that the residuals are normally distributed. The normal probability plot of the residuals should approximately follow a straight line.
Pattern | What the pattern may indicate |
---|---|
Not a straight line | Nonnormality |
A point that is far away from the line | An outlier |
Changing slope | An unidentified variable |
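A normal probability plot pairs each sorted residual with the value expected at that rank under a normal distribution. The sketch below is one way to build those pairs with the Python standard library. The plotting positions (i - 0.375) / (n + 0.25) are an assumption; statistical packages may use a different formula. The residuals are hypothetical, and the correlation between the two columns is used here only as a rough numeric summary of how straight the plot is.

```python
from statistics import NormalDist


def normal_scores(n):
    """Expected standard-normal quantiles for n ordered values (assumed plotting positions)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_correlation(residuals):
    """Pearson correlation between the sorted residuals and their normal scores."""
    ordered = sorted(residuals)
    scores = normal_scores(len(ordered))
    mean_r = sum(ordered) / len(ordered)
    mean_s = sum(scores) / len(scores)
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(ordered, scores))
    var_r = sum((r - mean_r) ** 2 for r in ordered)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / (var_r * var_s) ** 0.5


# Hypothetical residuals: one roughly symmetric set, one with a single large outlier.
well_behaved = [-5.0, -3.0, -1.5, -0.5, 0.5, 1.5, 3.0, 5.0]
with_outlier = [-1.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 25.0]

print(round(probability_plot_correlation(well_behaved), 3))
print(round(probability_plot_correlation(with_outlier), 3))
```

A correlation close to 1 corresponds to points that lie near a straight line; the set with the outlier gives a noticeably lower value, in line with the "point that is far away from the line" row of the table.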
For more information on how to handle patterns in the residual plots, go to Residual plots for Fitted Line Plot.