When you perform a regression analysis with categorical predictors, Minitab uses a coding scheme to create indicator variables from each categorical predictor. The interpretation of the coefficients is similar for more complicated models. If you add a covariate or have unequal sample sizes within each group, coefficients are based on weighted means for each factor level instead of the arithmetic mean (the sum of the observations divided by n), but the interpretation is usually the same:

- Using (1, 0) coding, coefficients represent the distance between factor levels and their baseline level.
- Using (-1, 0, +1) coding, coefficients represent the distance between factor levels and the overall mean.

By default, Minitab uses the (1, 0) coding scheme for regression, but you can change it to the (-1, 0, +1) coding scheme in the Coding subdialog box. For more information, go to Coding schemes for categorical predictors.

First, consider a balanced, one factor design with three levels for the factor.

C1 | C2-T |
---|---|
Response | Factor |
1 | A |
3 | A |
2 | A |
2 | A |
4 | B |
6 | B |
3 | B |
5 | B |
8 | C |
9 | C |
7 | C |
10 | C |

Examine the descriptive statistics, concentrating on the means.

Statistics

Variable | Total Count | Mean |
---|---|---|
Response | 12 | 5.000 |

Statistics

Variable | Factor | Total Count | Mean |
---|---|---|---|
Response | A | 4 | 2.000 |
Response | B | 4 | 4.500 |
Response | C | 4 | 8.500 |
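The means above can be checked outside Minitab. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and are not part of Minitab.

```python
import numpy as np

# Worksheet data: 12 responses, 4 per factor level (balanced design).
response = np.array([1, 3, 2, 2, 4, 6, 3, 5, 8, 9, 7, 10], dtype=float)
factor = np.array(list("AAAABBBBCCCC"))

# Overall mean: sum of all observations divided by n.
overall_mean = float(response.mean())

# Mean of the response within each factor level.
level_means = {
    str(level): float(response[factor == level].mean())
    for level in np.unique(factor)
}

print("Overall mean:", overall_mean)
print("Level means:", level_means)
```

The overall mean should come out as 5.0 and the level means as 2.0, 4.5, and 8.5 for A, B, and C.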

To get the following output:

- Choose .
- In Responses, enter Response.
- In Categorical predictors, enter Factor.
- Click Coding. Under Reference level, choose C.
- Click OK in each dialog.

Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 8.500 0.577 14.72 0.000
Factor
A -6.500 0.816 -7.96 0.000 1.33
B -4.000 0.816 -4.90 0.001 1.33

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 2 86.00 43.000 32.25 0.000
Factor 2 86.00 43.000 32.25 0.000
Error 9 12.00 1.333
Total 11 98.00

Remember that the factor level means are:

- A = 2.0
- B = 4.5
- C = 8.5

The estimated regression equation is:

Regression Equation
Response = 8.500 - 6.500 Factor_A - 4.000 Factor_B + 0.0 Factor_C

Level C is the baseline, and thus has a coefficient of 0. In the case of only one factor, the intercept is equal to the mean of the baseline level.

The coefficient corresponding to level A is -6.5. It is the difference between the mean for level A and the mean for the baseline level. If you add the intercept (the baseline mean) to the coefficient for A, you get the mean for level A: -6.5 + 8.5 = 2.0

Similarly, the coefficient corresponding to level B is -4.0. It is the difference between the mean for level B and the mean for the baseline level. If you add the intercept to the coefficient for level B, you get the mean for level B: -4.0 + 8.5 = 4.5
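To see where these coefficients come from, the (1, 0) coding can be reproduced by hand with ordinary least squares. This is a minimal sketch, assuming NumPy; it builds the indicator columns directly and is not Minitab's own implementation.

```python
import numpy as np

response = np.array([1, 3, 2, 2, 4, 6, 3, 5, 8, 9, 7, 10], dtype=float)
factor = np.array(list("AAAABBBBCCCC"))

# (1, 0) coding with C as the reference level:
# one indicator column per non-reference level, plus a constant column.
X = np.column_stack([
    np.ones(len(response)),          # Constant
    (factor == "A").astype(float),   # Factor_A: 1 for level A, 0 otherwise
    (factor == "B").astype(float),   # Factor_B: 1 for level B, 0 otherwise
])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, response, rcond=None)
intercept, coef_a, coef_b = coef

print("Constant:", round(intercept, 3))
print("A:", round(coef_a, 3), " mean of A:", round(intercept + coef_a, 3))
print("B:", round(coef_b, 3), " mean of B:", round(intercept + coef_b, 3))
```

Up to floating-point rounding, the fit should reproduce the Coef column (8.500, -6.500, -4.000), and adding the intercept to each coefficient should return the level means 2.0 and 4.5.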

To get the following output:

- Choose .
- In Responses, enter Response.
- In Categorical predictors, enter Factor.
- Click Coding. Under Coding for categorical predictors, choose (-1, 0, +1).
- Click OK in each dialog.

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 2 86.00 43.000 32.25 0.000
Factor 2 86.00 43.000 32.25 0.000
Error 9 12.00 1.333
Total 11 98.00

Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 5.000 0.333 15.00 0.000
Factor
A -3.000 0.471 -6.36 0.000 1.33
B -0.500 0.471 -1.06 0.316 1.33

Remember the overall mean and the factor level means:

- Overall Mean = 5.0
- A = 2.0
- B = 4.5
- C = 8.5

The regression equation is:

Regression Equation
Response = 5.000 - 3.000 Factor_A - 0.500 Factor_B + 3.500 Factor_C

The effect for any specific factor level is the Level Mean – Overall Mean. Thus,

- Level A effect = 2.0 - 5.0 = -3.0
- Level B effect = 4.5 - 5.0 = -0.5
- Level C effect = 8.5 - 5.0 = 3.5

The intercept is the overall mean.

The coefficient for A is the effect for factor level A. It is the difference between the mean for level A and the overall mean.

The coefficient for B is the effect for factor level B. It is the difference between the mean for level B and the overall mean.

Under this coding the level effects sum to zero, so you can obtain the effect for level C by adding all the coefficients (excluding the intercept) and multiplying the sum by -1: -1 * [(-3.0) + (-0.5)] = 3.5

You can get the level means by adding the overall mean to the effect for each level:

- Mean for Level A = coefficient for A + Intercept = -3.0 + 5.0 = 2.0
- Mean for Level B = coefficient for B + Intercept = -0.5 + 5.0 = 4.5
- Mean for Level C = Intercept - coefficient for A - coefficient for B = 5.0 - (-3.0) - (-0.5) = 5.0 + 3.0 + 0.5 = 8.5
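The same calculation can be sketched with the (-1, 0, +1) coding built by hand. This is a minimal illustration, assuming NumPy; the helper `effect_column` is a hypothetical name introduced here, and level C is assumed to be the level coded as -1, as in the output above.

```python
import numpy as np

response = np.array([1, 3, 2, 2, 4, 6, 3, 5, 8, 9, 7, 10], dtype=float)
factor = np.array(list("AAAABBBBCCCC"))

def effect_column(labels, level, last_level):
    """Return +1 for `level`, -1 for `last_level`, and 0 for other levels."""
    return np.where(labels == level, 1.0,
                    np.where(labels == last_level, -1.0, 0.0))

# (-1, 0, +1) coding: level C receives -1 in every factor column.
X = np.column_stack([
    np.ones(len(response)),
    effect_column(factor, "A", "C"),
    effect_column(factor, "B", "C"),
])

coef, *_ = np.linalg.lstsq(X, response, rcond=None)
intercept, effect_a, effect_b = coef

# The effects sum to zero, so the effect for C is minus the sum of the others.
effect_c = -(effect_a + effect_b)

# Level mean = overall mean (intercept) + level effect.
level_means = {
    "A": intercept + effect_a,
    "B": intercept + effect_b,
    "C": intercept + effect_c,
}

print("Constant:", round(intercept, 3))
print("Effects:", round(effect_a, 3), round(effect_b, 3), round(effect_c, 3))
print("Level means:", {k: round(v, 3) for k, v in level_means.items()})
```

For this balanced design the intercept should equal the overall mean (5.0), the effects should be -3.0, -0.5, and 3.5, and the recovered level means should be 2.0, 4.5, and 8.5.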

Now consider a balanced, two factor design with three levels for the first factor and two levels for the second factor.

C1 | C2-T | C3-T |
---|---|---|
Response | Factor 1 | Factor 2 |
1 | A | High |
3 | A | Low |
2 | A | High |
2 | A | Low |
4 | B | High |
6 | B | Low |
3 | B | High |
5 | B | Low |
8 | C | High |
9 | C | Low |
7 | C | High |
10 | C | Low |

Examine the descriptive statistics, concentrating on the means.

Rows: Factor 1, Columns: Factor 2

Factor 1 | High | Low | All |
---|---|---|---|
A | 1.500 | 2.500 | 2.000 |
B | 3.500 | 5.500 | 4.500 |
C | 7.500 | 9.500 | 8.500 |
All | 4.167 | 5.833 | 5.000 |

Cell Contents: mean of Response

To get the following output:

- Choose .
- In Responses, enter Response.
- In Categorical predictors, enter Factor 1 and Factor 2.
- Click Coding. Under Coding for categorical predictors, choose (1, 0).
- Under Reference level, choose C for Factor 1 and Low for Factor 2.
- Click OK in each dialog.

Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 9.333 0.391 23.88 0.000
Factor 1
A -6.500 0.479 -13.58 0.000 1.33
B -4.000 0.479 -8.36 0.000 1.33
Factor 2
High -1.667 0.391 -4.26 0.003 1.00

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 3 94.3333 31.4444 68.61 0.000
Factor 1 2 86.0000 43.0000 93.82 0.000
Factor 2 1 8.3333 8.3333 18.18 0.003
Error 8 3.6667 0.4583
Lack-of-Fit 2 0.6667 0.3333 0.67 0.548
Pure Error 6 3.0000 0.5000
Total 11 98.0000

Remember that the factor level means are:

- A = 2.0
- B = 4.5
- C = 8.5

The estimated regression equation is:

Regression Equation
Response = 9.333 - 6.500 Factor 1_A - 4.000 Factor 1_B + 0.0 Factor 1_C
- 1.667 Factor 2_High + 0.0 Factor 2_Low

Again, the coefficient corresponding to level A is –6.5. This is still the distance that level A is from the baseline level (Level C). If you take the mean for level A and subtract from it the mean for the baseline level, you get the coefficient: 2 – 8.5 = -6.5.

Similarly, the coefficient corresponding to level B is still –4.0. It is the distance that level B is from the baseline level for factor 1. If you take the mean for level B and subtract from it the mean for the baseline level, you get the coefficient: 4.5 - 8.5 = -4.0.

Finally, the coefficient corresponding to the High level of factor 2 is the distance that “High” is from the baseline level for factor 2 (Low). So, if you take the mean for the High level of factor 2 and subtract from it the mean for the baseline level for factor 2, you get the coefficient: 4.1667 – 5.8333 = -1.667.
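The two-factor fit with (1, 0) coding can be sketched the same way. This is a minimal illustration, assuming NumPy and an additive model with no interaction term; the variable names are illustrative.

```python
import numpy as np

response = np.array([1, 3, 2, 2, 4, 6, 3, 5, 8, 9, 7, 10], dtype=float)
factor1 = np.array(list("AAAABBBBCCCC"))
factor2 = np.array(["High", "Low"] * 6)  # alternates High, Low as in the worksheet

# (1, 0) coding: reference levels are C for Factor 1 and Low for Factor 2.
X = np.column_stack([
    np.ones(len(response)),             # Constant
    (factor1 == "A").astype(float),     # Factor 1_A
    (factor1 == "B").astype(float),     # Factor 1_B
    (factor2 == "High").astype(float),  # Factor 2_High
])

coef, *_ = np.linalg.lstsq(X, response, rcond=None)
intercept, coef_a, coef_b, coef_high = coef

# Differences between marginal means, for comparison with the coefficients.
diff_a = response[factor1 == "A"].mean() - response[factor1 == "C"].mean()
diff_high = response[factor2 == "High"].mean() - response[factor2 == "Low"].mean()

print("Coefficients:", np.round(coef, 3))
print("Mean(A) - Mean(C):", round(diff_a, 3))
print("Mean(High) - Mean(Low):", round(diff_high, 3))
```

Because the design is balanced, each coefficient should match the corresponding difference in marginal means (-6.5, -4.0, and about -1.667). The constant, about 9.333, is the fitted value for the combination of both reference levels (C and Low), which equals 8.5 + 0.833 under the additive model.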

To get the following output:

- Choose .
- In Responses, enter Response.
- In Categorical predictors, enter Factor 1 and Factor 2.
- Click Coding. Under Coding for categorical predictors, choose (-1, 0, +1).
- Click OK in each dialog.

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 3 94.3333 31.4444 68.61 0.000
Factor 1 2 86.0000 43.0000 93.82 0.000
Factor 2 1 8.3333 8.3333 18.18 0.003
Error 8 3.6667 0.4583
Lack-of-Fit 2 0.6667 0.3333 0.67 0.548
Pure Error 6 3.0000 0.5000
Total 11 98.0000

Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 5.000 0.195 25.58 0.000
Factor 1
A -3.000 0.276 -10.85 0.000 1.33
B -0.500 0.276 -1.81 0.108 1.33
Factor 2
High -0.833 0.195 -4.26 0.003 1.00

Notice that with this coding scheme the coefficients haven’t changed from the one factor model. You now have an additional coefficient for the second factor.

Now consider the overall mean and the factor level means:

- Overall Mean = 5.0
- A = 2.0
- B = 4.5
- C = 8.5
- High = 4.1667
- Low = 5.8333

The regression equation is:

Regression Equation
Response = 5.000 - 3.000 Factor 1_A - 0.500 Factor 1_B + 3.500 Factor 1_C
- 0.833 Factor 2_High + 0.833 Factor 2_Low

The effect for any specific factor level is the Level Mean – Overall Mean. Thus,

- Level A effect = 2.0 - 5.0 = -3.0
- Level B effect = 4.5 - 5.0 = -0.5
- Level C effect = 8.5 - 5.0 = 3.5
- Level High effect = 4.1667 – 5.0 = -0.833
- Level Low effect = 5.8333 – 5.0 = 0.833

When a factor has only two levels and equal sample sizes, the two level effects are equal in magnitude and opposite in sign because the overall mean lies exactly halfway between the two level means.

The intercept is the overall mean.

The coefficients are the effect for each factor level. They represent the difference between the mean for that level and the overall mean.
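As a check on the two-factor interpretation, the (-1, 0, +1) coding can be fitted by hand for both factors. This is a minimal sketch, assuming NumPy and an additive model; the helper `effect_column` is a hypothetical name, and the levels coded as -1 are assumed to be C for Factor 1 and Low for Factor 2.

```python
import numpy as np

response = np.array([1, 3, 2, 2, 4, 6, 3, 5, 8, 9, 7, 10], dtype=float)
factor1 = np.array(list("AAAABBBBCCCC"))
factor2 = np.array(["High", "Low"] * 6)

def effect_column(labels, level, last_level):
    """Return +1 for `level`, -1 for `last_level`, and 0 for other levels."""
    return np.where(labels == level, 1.0,
                    np.where(labels == last_level, -1.0, 0.0))

X = np.column_stack([
    np.ones(len(response)),
    effect_column(factor1, "A", "C"),
    effect_column(factor1, "B", "C"),
    effect_column(factor2, "High", "Low"),
])

coef, *_ = np.linalg.lstsq(X, response, rcond=None)
intercept, effect_a, effect_b, effect_high = coef

# Effects for the levels coded as -1 follow from the sum-to-zero constraint.
effect_c = -(effect_a + effect_b)
effect_low = -effect_high

print("Constant:", round(intercept, 3))
print("Factor 1 effects (A, B, C):",
      round(effect_a, 3), round(effect_b, 3), round(effect_c, 3))
print("Factor 2 effects (High, Low):",
      round(effect_high, 3), round(effect_low, 3))
```

For this balanced design the intercept should equal the overall mean (5.0), the Factor 1 effects should be unchanged from the one-factor model (-3.0, -0.5, 3.5), and the Factor 2 effects should be about -0.833 for High and 0.833 for Low.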