By default, Minitab uses the (1, 0) coding scheme for regression, but you can change it to the (-1, 0, +1) coding scheme in the Coding subdialog box. For more information, go to Coding schemes for categorical predictors.
First, consider a balanced, one-factor design in which the factor has three levels.
C1 | C2-T |
---|---|
Response | Factor |
1 | A |
3 | A |
2 | A |
2 | A |
4 | B |
6 | B |
3 | B |
5 | B |
8 | C |
9 | C |
7 | C |
10 | C |
Examine the descriptive statistics, concentrating on the means.
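The level means that the discussion relies on can be recomputed from the worksheet data. The following is a minimal sketch in Python rather than Minitab output; the names `data` and `level_means` are illustrative assumptions, not part of Minitab.

```python
from collections import defaultdict

# (Response, Factor) pairs copied from the one-factor worksheet above.
data = [(1, "A"), (3, "A"), (2, "A"), (2, "A"),
        (4, "B"), (6, "B"), (3, "B"), (5, "B"),
        (8, "C"), (9, "C"), (7, "C"), (10, "C")]


def level_means(rows):
    """Return {level: mean response} for (response, level) pairs."""
    groups = defaultdict(list)
    for response, level in rows:
        groups[level].append(response)
    return {level: sum(values) / len(values) for level, values in groups.items()}


means = level_means(data)
overall_mean = sum(response for response, _ in data) / len(data)

print(means)         # {'A': 2.0, 'B': 4.5, 'C': 8.5}
print(overall_mean)  # 5.0
```

Because every level has four observations, the overall mean (5.0) is also the plain average of the three level means.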
With the (1, 0) coding scheme, the estimated regression equation is:
Response = 8.5 - 6.5 Factor_A - 4.0 Factor_B + 0.0 Factor_C
Level C is the baseline, and thus has a coefficient of 0. In the case of only one factor, the intercept is equal to the mean of the baseline level.
The coefficient corresponding to level A is -6.5. It is the difference between the mean for level A and the mean for the baseline level. If you add the intercept (the baseline mean) to the coefficient for level A, you get the mean for level A: -6.5 + 8.5 = 2.0
Similarly, the coefficient corresponding to level B is -4.0. It is the difference between the mean for level B and the mean for the baseline level. If you add the intercept to the coefficient for level B, you get the mean for level B: -4.0 + 8.5 = 4.5
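The arithmetic for the (1, 0) scheme can be sketched in a few lines. This is an illustration in Python, not Minitab's fitting procedure; it assumes the one-factor level means (2.0, 4.5, 8.5) computed from the data, and the variable names are hypothetical.

```python
# Level means for the one-factor example; C is the baseline level.
level_means = {"A": 2.0, "B": 4.5, "C": 8.5}
baseline = "C"

# Under (1, 0) coding with a single factor, the intercept is the baseline
# mean and each coefficient is that level's mean minus the baseline mean.
intercept = level_means[baseline]
coefficients = {level: m - intercept for level, m in level_means.items()}

# Adding a coefficient back to the intercept recovers the level mean.
recovered = {level: intercept + c for level, c in coefficients.items()}

print(intercept)     # 8.5
print(coefficients)  # {'A': -6.5, 'B': -4.0, 'C': 0.0}
print(recovered)     # {'A': 2.0, 'B': 4.5, 'C': 8.5}
```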
With the (-1, 0, +1) coding scheme, the regression equation is:
Response = 5.0 - 3.0 Factor_A - 0.5 Factor_B + 3.5 Factor_C
The intercept is the overall mean.
The coefficient for A is the effect for factor level A. It is the difference between the mean for level A and the overall mean.
The coefficient for B is the effect for factor level B. It is the difference between the mean for level B and the overall mean.
You can obtain the effect for level C by adding all the coefficients (excluding the intercept) and multiplying the sum by -1: -1 * [(-3.0) + (-0.5)] = 3.5
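The same relationships for the (-1, 0, +1) scheme can be checked numerically. The sketch below is an illustration in Python, not Minitab output. It assumes the level means from the one-factor data and a balanced design, so the overall mean equals the average of the level means; the names are hypothetical.

```python
# Level means for the one-factor example (equal sample sizes per level).
level_means = {"A": 2.0, "B": 4.5, "C": 8.5}

# In a balanced design the overall mean is the average of the level means.
overall_mean = sum(level_means.values()) / len(level_means)

# Each effect is the level mean minus the overall mean.
effects = {level: m - overall_mean for level, m in level_means.items()}

# The effect for the level coded -1 (here C) follows from the others:
# the effects sum to zero, so it is minus the sum of the A and B effects.
effect_c = -1 * (effects["A"] + effects["B"])

print(overall_mean)  # 5.0
print(effects)       # {'A': -3.0, 'B': -0.5, 'C': 3.5}
print(effect_c)      # 3.5
```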
Now consider a balanced, two-factor design with three levels for the first factor and two levels for the second factor.
C1 | C2-T | C3-T |
---|---|---|
Response | Factor 1 | Factor 2 |
1 | A | High |
3 | A | Low |
2 | A | High |
2 | A | Low |
4 | B | High |
6 | B | Low |
3 | B | High |
5 | B | Low |
8 | C | High |
9 | C | Low |
7 | C | High |
10 | C | Low |
Examine the descriptive statistics, concentrating on the means.
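For the two-factor worksheet, the means of interest are the marginal means for each level of each factor. The following is a minimal sketch in Python rather than Minitab output; the helper `means_by` and the other names are illustrative assumptions.

```python
from collections import defaultdict

# (Response, Factor 1, Factor 2) rows copied from the two-factor worksheet.
rows = [(1, "A", "High"), (3, "A", "Low"), (2, "A", "High"), (2, "A", "Low"),
        (4, "B", "High"), (6, "B", "Low"), (3, "B", "High"), (5, "B", "Low"),
        (8, "C", "High"), (9, "C", "Low"), (7, "C", "High"), (10, "C", "Low")]


def means_by(rows, column):
    """Mean response grouped by the factor stored at index `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    return {level: sum(v) / len(v) for level, v in groups.items()}


factor1_means = means_by(rows, 1)
factor2_means = means_by(rows, 2)
overall_mean = sum(row[0] for row in rows) / len(rows)

for level, m in {**factor1_means, **factor2_means}.items():
    print(f"{level}: {m:.4f}")
print(f"Overall: {overall_mean:.4f}")
# A: 2.0000, B: 4.5000, C: 8.5000, High: 4.1667, Low: 5.8333, Overall: 5.0000
# (printed one per line)
```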
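With the (1, 0) coding scheme, the estimated regression equation is:
Response = 9.333 - 6.5 Factor 1_A - 4.0 Factor 1_B + 0.0 Factor 1_C - 1.667 Factor 2_High + 0.0 Factor 2_Low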
Again, the coefficient corresponding to level A is -6.5. This is still the difference between the mean for level A and the mean for the baseline level (level C). If you subtract the mean for the baseline level from the mean for level A, you get the coefficient: 2.0 - 8.5 = -6.5.
Similarly, the coefficient corresponding to level B is still -4.0. It is the difference between the mean for level B and the mean for the baseline level of factor 1. If you subtract the mean for the baseline level from the mean for level B, you get the coefficient: 4.5 - 8.5 = -4.0.
Finally, the coefficient corresponding to the High level of factor 2 is the difference between the mean for High and the mean for the baseline level of factor 2 (Low). If you subtract the mean for the baseline level of factor 2 from the mean for the High level, you get the coefficient: 4.1667 - 5.8333 = -1.667.
Notice that with this coding scheme, and because the design is balanced, the coefficients for the levels of factor 1 have not changed from the one-factor model. You now have an additional coefficient for the second factor.
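One way to confirm that coefficients built from the marginal means really are the least-squares estimates is to check the normal equations: at a least-squares solution, the residuals are orthogonal to every column of the design matrix. The sketch below does this in plain Python. It is an illustration, not Minitab's algorithm, and the names (`design_row`, `beta`, `normal_eq`) are hypothetical.

```python
# (Response, Factor 1, Factor 2) rows from the two-factor worksheet.
rows = [(1, "A", "High"), (3, "A", "Low"), (2, "A", "High"), (2, "A", "Low"),
        (4, "B", "High"), (6, "B", "Low"), (3, "B", "High"), (5, "B", "Low"),
        (8, "C", "High"), (9, "C", "Low"), (7, "C", "High"), (10, "C", "Low")]


def design_row(f1, f2):
    """(1, 0) coding: constant, A indicator, B indicator, High indicator.
    C and Low are the baseline levels, so they get no column."""
    return [1.0, float(f1 == "A"), float(f1 == "B"), float(f2 == "High")]


X = [design_row(f1, f2) for _, f1, f2 in rows]
y = [float(response) for response, _, _ in rows]

# Candidate coefficients built from the marginal means.
mean_a, mean_b, mean_c = 2.0, 4.5, 8.5
mean_high, mean_low, overall = 25 / 6, 35 / 6, 5.0
beta = [mean_c + mean_low - overall,  # intercept: fitted value at (C, Low)
        mean_a - mean_c,              # A relative to baseline C
        mean_b - mean_c,              # B relative to baseline C
        mean_high - mean_low]         # High relative to baseline Low

residuals = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]

# Normal equations: each design column is orthogonal to the residuals.
normal_eq = [sum(xi[j] * e for xi, e in zip(X, residuals)) for j in range(4)]

print([round(b, 4) for b in beta])             # [9.3333, -6.5, -4.0, -1.6667]
print(all(abs(v) < 1e-9 for v in normal_eq))   # True
```

In this sketch the intercept is no longer the mean of level C alone: it is the fitted value for level C at the Low level of factor 2, while the factor 1 coefficients match the one-factor values.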
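With the (-1, 0, +1) coding scheme, the regression equation is:
Response = 5.0 - 3.0 Factor 1_A - 0.5 Factor 1_B + 3.5 Factor 1_C - 0.8333 Factor 2_High + 0.8333 Factor 2_Low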
When a factor has only two levels and equal sample sizes, the two level effects are equal in magnitude and opposite in sign, because the overall mean lies exactly midway between the two level means.
The intercept is the overall mean.
The coefficients are the effects for each factor level. Each one represents the difference between the mean for that level and the overall mean.
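The effects for both factors follow directly from the marginal means. The sketch below is an illustration in Python, not Minitab output. It assumes the balanced two-factor data above and an additive model with no interaction term; the names are hypothetical.

```python
# Marginal means from the two-factor worksheet (balanced design).
factor1_means = {"A": 2.0, "B": 4.5, "C": 8.5}
factor2_means = {"High": 25 / 6, "Low": 35 / 6}
overall_mean = 5.0  # intercept under (-1, 0, +1) coding

# Each effect is a level mean minus the overall mean.
effects1 = {level: m - overall_mean for level, m in factor1_means.items()}
effects2 = {level: m - overall_mean for level, m in factor2_means.items()}


def fitted(f1, f2):
    """Fitted response for one cell: overall mean plus one effect per factor."""
    return overall_mean + effects1[f1] + effects2[f2]


print(effects1)                                     # {'A': -3.0, 'B': -0.5, 'C': 3.5}
print({k: round(v, 4) for k, v in effects2.items()})  # {'High': -0.8333, 'Low': 0.8333}
print(round(fitted("C", "Low"), 4))                 # 9.3333
```

The High and Low effects cancel each other, as expected for a two-level factor with equal sample sizes.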