Covariates are commonly used in ANOVA and DOE. In these models, a covariate is a continuous variable that usually is not controlled during data collection. Including covariates in the model allows you to adjust for input variables that were measured but not randomized or controlled in the experiment. Accounting for a covariate can reduce the error variance of the model, which increases the power of the tests for the factors.
For example, an engineer wants to study the level of corrosion on four types of iron beams. The engineer exposes each beam to a liquid treatment to accelerate corrosion, but cannot control the temperature of the liquid. Temperature is a covariate that should be considered in the model.
In a DOE comparing the drying times of two types of paint, an engineer may be interested in the effect of ambient temperature, a covariate.
A textile company uses three different machines to manufacture monofilament fibers. The company wants to determine whether the breaking strength of the fiber differs depending on which machine is used. They measure the strength of 5 randomly selected fibers from each machine. Because fiber strength is related to fiber diameter, they also record each fiber's diameter for use as a possible covariate.
Machine (C1) | Diameter (C2) | Strength (C3) |
---|---|---|
1 | 20 | 36 |
1 | 25 | 41 |
1 | 24 | 39 |
1 | 25 | 42 |
1 | 32 | 49 |
2 | 22 | 40 |
2 | 28 | 48 |
2 | 22 | 39 |
2 | 30 | 45 |
2 | 28 | 44 |
3 | 21 | 35 |
3 | 23 | 37 |
3 | 26 | 42 |
3 | 21 | 34 |
3 | 15 | 32 |
When the fiber data are analyzed with Machine as the factor and Diameter as a covariate, Minitab displays the following results:
The F-statistic for Machine is 2.61 and the p-value is 0.118. Because the p-value is greater than 0.05, you fail to reject, at the 5% significance level, the null hypothesis that fiber strength does not differ among machines. In other words, after adjusting for diameter, there is not enough evidence to conclude that the machines produce fibers of different strength. Notice that the F-statistic for Diameter (the covariate) is 69.97, with a p-value displayed as 0.000 (less than 0.0005 after rounding). This indicates that the covariate effect is significant: diameter has a statistically significant effect on fiber strength.
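The adjusted F-tests above can be reproduced by hand. The sketch below is a minimal pure-Python version of the classic one-covariate ANCOVA calculation. It assumes a common slope for diameter across all three machines. It is not Minitab's implementation, and the helper names (`corrected_cross`, `sums`, `ancova`, `f2_pvalue`) are hypothetical. The p-value uses the closed-form upper tail of an F distribution with 2 numerator degrees of freedom, which applies here only because there are three machines.

```python
# Fiber data from the table above: (machine, diameter, strength)
DATA = [
    (1, 20, 36), (1, 25, 41), (1, 24, 39), (1, 25, 42), (1, 32, 49),
    (2, 22, 40), (2, 28, 48), (2, 22, 39), (2, 30, 45), (2, 28, 44),
    (3, 21, 35), (3, 23, 37), (3, 26, 42), (3, 21, 34), (3, 15, 32),
]


def corrected_cross(pairs):
    """Return sum((u - mean_u) * (v - mean_v)) over (u, v) pairs."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in pairs)


def sums(pairs):
    """Corrected (Sxx, Sxy, Syy) for a list of (x, y) pairs."""
    return (
        corrected_cross([(x, x) for x, _ in pairs]),
        corrected_cross(pairs),
        corrected_cross([(y, y) for _, y in pairs]),
    )


def f2_pvalue(f, df2):
    """Upper-tail p-value of F(2, df2); this closed form holds only for numerator df = 2."""
    return (df2 / (df2 + 2 * f)) ** (df2 / 2)


def ancova(data):
    """Adjusted F statistics for machine and diameter (common-slope ANCOVA)."""
    machines = sorted({m for m, _, _ in data})
    a, n = len(machines), len(data)
    # Totals, ignoring machine
    sxx, sxy, syy = sums([(x, y) for _, x, y in data])
    # Within-machine sums, pooled across machines
    groups = [[(x, y) for m, x, y in data if m == k] for k in machines]
    within = [sums(g) for g in groups]
    exx = sum(s[0] for s in within)
    exy = sum(s[1] for s in within)
    eyy = sum(s[2] for s in within)

    sse_full = eyy - exy ** 2 / exx        # machine + diameter (common slope)
    sse_reduced = syy - sxy ** 2 / sxx     # diameter only
    df_error = n - a - 1
    mse = sse_full / df_error

    f_machine = (sse_reduced - sse_full) / (a - 1) / mse  # adjusted SS for machine
    f_diameter = (exy ** 2 / exx) / mse                   # adjusted SS for diameter, 1 df
    return f_machine, f_diameter, df_error


f_machine, f_diameter, df_error = ancova(DATA)
print(f"Machine:  F = {f_machine:.2f}, p = {f2_pvalue(f_machine, df_error):.3f}")
print(f"Diameter: F = {f_diameter:.2f}")
```

Running it prints F = 2.61 (p = 0.118) for Machine and F = 69.97 for Diameter, with 11 error degrees of freedom, which agree with the values quoted above. The covariate p-value is not computed because this closed form does not apply to 1 numerator degree of freedom.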
Now suppose you rerun the analysis without the covariate, so that Machine is the only term in the model. This produces the following output:
Notice that the F-statistic for Machine is now 4.09, with a p-value of 0.044. Without the covariate in the model, you reject the null hypothesis at the 5% significance level and conclude that fiber strength does differ depending on which machine is used.
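For comparison, here is a minimal pure-Python one-way ANOVA that ignores diameter entirely. It is a sketch, not Minitab's code; the helper names `one_way_anova` and `f2_pvalue` are hypothetical. The p-value formula is the closed-form upper tail of an F distribution and is valid only for 2 numerator degrees of freedom, which matches the three machines here.

```python
# Strength values from the table above, grouped by machine (diameter ignored)
STRENGTH = {
    1: [36, 41, 39, 42, 49],
    2: [40, 48, 39, 45, 44],
    3: [35, 37, 42, 34, 32],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_y = [y for g in groups for y in g]
    grand_mean = sum(all_y) / len(all_y)
    means = [sum(g) / len(g) for g in groups]
    # Between-machine and within-machine sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = len(all_y) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def f2_pvalue(f, df2):
    """Upper-tail p-value of F(2, df2); this closed form holds only for numerator df = 2."""
    return (df2 / (df2 + 2 * f)) ** (df2 / 2)


f, df_b, df_w = one_way_anova(list(STRENGTH.values()))
print(f"Machine (no covariate): F = {f:.2f}, p = {f2_pvalue(f, df_w):.3f}")
```

It prints F = 4.09 with p = 0.044 on 2 and 12 degrees of freedom. Without the covariate, the error term has one more degree of freedom but a much larger sum of squares (206 versus about 28), because the variation due to diameter stays in the error.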
This conclusion is the opposite of the one you reach when the covariate is included in the analysis. The example shows how omitting a relevant covariate can produce misleading results.
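One way to see why the two analyses disagree is to compare the raw machine means with means adjusted to the overall average diameter. The adjustment uses the pooled within-machine slope. This is a sketch under the same common-slope assumption, computed from the table data rather than taken from Minitab output; the function name `adjusted_means` is hypothetical.

```python
# Fiber data from the table above: (machine, diameter, strength)
DATA = [
    (1, 20, 36), (1, 25, 41), (1, 24, 39), (1, 25, 42), (1, 32, 49),
    (2, 22, 40), (2, 28, 48), (2, 22, 39), (2, 30, 45), (2, 28, 44),
    (3, 21, 35), (3, 23, 37), (3, 26, 42), (3, 21, 34), (3, 15, 32),
]


def adjusted_means(data):
    """Return (slope, {machine: (raw mean, mean adjusted to overall mean diameter)})."""
    machines = sorted({m for m, _, _ in data})
    x_bar = sum(x for _, x, _ in data) / len(data)
    exx = exy = 0.0
    stats = {}
    for k in machines:
        xs = [x for m, x, _ in data if m == k]
        ys = [y for m, _, y in data if m == k]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        # Accumulate within-machine sums for the pooled slope
        exx += sum((x - mx) ** 2 for x in xs)
        exy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        stats[k] = (mx, my)
    beta = exy / exx  # pooled within-machine slope of strength on diameter
    return beta, {k: (my, my - beta * (mx - x_bar)) for k, (mx, my) in stats.items()}


beta, means = adjusted_means(DATA)
print(f"slope = {beta:.3f}")
for k, (raw, adj) in means.items():
    print(f"Machine {k}: raw mean = {raw:.2f}, adjusted mean = {adj:.2f}")
```

The raw means range from 36.00 to 43.20, but the adjusted means range only from about 38.80 to 41.42. Machine 3 happened to produce the thinnest fibers (mean diameter 21.2 versus 25.2 and 26.0). Much of its lower raw strength is therefore explained by diameter rather than by the machine.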