What are the coding schemes for categorical predictors?

When you perform least squares, logistic, or Poisson regression analysis with categorical predictors, Minitab uses a coding scheme to create indicator variables from each categorical predictor. Minitab offers two coding schemes: 1, 0 and -1, 0, 1. The default scheme, 1, 0 coding (also known as binary or dummy coding), is commonly used in regression analyses.
  • Using 1, 0 coding, coefficients represent the difference between each factor level and the reference level.
  • Using -1, 0, 1 coding, coefficients represent the difference between each factor level and the overall mean.
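The two interpretations can be checked numerically. The sketch below fits a one-factor model by ordinary least squares under each coding and compares the coefficients with the level means. It is a minimal illustration under stated assumptions, not Minitab's algorithm: the data, the level names A, B, C, and the helper names `solve`, `ols`, and `design` are hypothetical; the normal equations are solved exactly with `fractions.Fraction`; and the data are balanced, so the overall mean of the data equals the average of the level means.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def ols(x_rows, y):
    """Least squares coefficients from the normal equations X'X b = X'y (exact)."""
    p = len(x_rows[0])
    xtx = [[sum(Fraction(r[i]) * r[j] for r in x_rows) for j in range(p)]
           for i in range(p)]
    xty = [sum(Fraction(r[i]) * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)


def design(levels, obs, reference, scheme):
    """Intercept column plus one coded column per non-reference level."""
    others = [lv for lv in levels if lv != reference]
    rows = []
    for o in obs:
        if scheme == "-1,0,1" and o == reference:
            codes = [-1] * len(others)  # reference rows get -1 in every column
        else:
            codes = [1 if o == lv else 0 for lv in others]
        rows.append([1] + codes)
    return rows


# Hypothetical balanced data: level means are A = 5, B = 8, C = 11; overall mean = 8.
levels = ["A", "B", "C"]
group = ["A", "A", "B", "B", "C", "C"]
y = [4, 6, 7, 9, 10, 12]

dummy = ols(design(levels, group, "A", "1,0"), y)      # reference A (first level)
effect = ols(design(levels, group, "C", "-1,0,1"), y)  # reference C (last level)
print("1, 0 coding:    ", [str(b) for b in dummy])
print("-1, 0, 1 coding:", [str(b) for b in effect])
```

Under 1, 0 coding the intercept is the mean of the reference level A (5), and the coefficients for B and C are 3 and 6, their differences from A. Under -1, 0, 1 coding the intercept is the overall mean (8), and the coefficients for A and B are -3 and 0, their differences from that mean.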
For predictors with 1, 0 coding, by default, Minitab sets the following reference levels based on the data type:
  • For numeric categorical predictors, the reference level is the level with the least numeric value.
  • For date/time categorical predictors, the reference level is the level with the earliest date/time.
  • For text categorical predictors, the reference level is the level that is first in value order, which is alphabetical order, by default.
For predictors with -1, 0, 1 coding, by default, Minitab sets the following reference levels based on the data type:
  • For numeric categorical predictors, the reference level is the level with the largest numeric value.
  • For date/time categorical predictors, the reference level is the level with the latest date/time.
  • For text categorical predictors, the reference level is the level that is last in alphabetical order.
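The default reference-level rules above amount to "smallest value for 1, 0 coding, largest value for -1, 0, 1 coding". The hypothetical helper `default_reference` below sketches that rule. It assumes numbers and dates sort in their natural order and that Python's default string ordering (Unicode code points, which is case-sensitive) stands in for alphabetical order; a custom value order set in Minitab is not modeled.

```python
from datetime import date


def default_reference(levels, scheme):
    """Pick the default reference level for a categorical predictor.

    Sketch of the rules listed above: 1, 0 coding uses the smallest value
    (least number, earliest date/time, first text value); -1, 0, 1 coding uses
    the largest. Text is compared with Python's default string ordering,
    which only approximates alphabetical order, and a custom value order is
    not modeled here.
    """
    ordered = sorted(set(levels))
    return ordered[0] if scheme == "1,0" else ordered[-1]


print(default_reference([30, 10, 20], "1,0"))                            # numeric
print(default_reference([date(2024, 3, 1), date(2023, 7, 15)], "1,0"))   # date
print(default_reference(["London", "New York", "Hong Kong"], "-1,0,1"))  # text
```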

How to change the coding scheme

In regression analyses, including least squares regression, binary logistic regression, and Poisson regression, Minitab uses 1, 0 coding by default. To change the coding scheme to -1, 0, 1, use the Coding subdialog box. For partial least squares (PLS) regression, you can change the reference level in the Options subdialog box.

How coding schemes work

To include categorical predictors in your regression model, Minitab codes the categories so that they can enter the regression equation. Regression does this automatically, creating columns for each categorical predictor according to the coding scheme. One column of codes is created for each factor level except the reference level; no column is created for the reference level. In each column, Minitab assigns a 1 to rows that belong to that column's level and a 0 to rows at other levels, except that under -1, 0, 1 coding, rows at the reference level receive -1 in every column. For more information on the coding scheme and the design matrix, go to How Minitab uses the design matrix for regression.

The following examples show how the coding schemes work for a categorical predictor, Location, with three levels: Hong Kong, London, and New York. If the coding scheme is -1, 0, 1, the default reference level is New York, because it is last in alphabetical order. No column is created for New York, and no coefficient for New York appears in the coefficients table in the output. Columns are created for Hong Kong and London, and every row that corresponds to New York (the reference level) is assigned -1 in both columns.

If the location is   Hong Kong   London
Hong Kong            1           0
London               0           1
New York             -1          -1

If the coding scheme is 1, 0, the default reference level is Hong Kong, because it is first in alphabetical order. No column is created for Hong Kong, and no coefficient for Hong Kong appears in the coefficients table in the output. Columns are created for London and New York.

If the location is   London   New York
Hong Kong            0        0
London               1        0
New York             0        1
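Both Location tables follow mechanically from the rules in this section. The hypothetical function `code_levels` below is a minimal sketch of those rules (one column per non-reference level, 1 for the row's own level, 0 elsewhere, and -1 in every column for the reference level under -1, 0, 1 coding); it is not Minitab's implementation, and the scheme labels "1,0" and "-1,0,1" are illustrative strings.

```python
def code_levels(levels, reference, scheme):
    """Return (columns, {level: row of codes}) for one categorical predictor.

    One column per non-reference level, in the order the levels are given.
    "1,0": a row gets 1 in its own level's column and 0 elsewhere, so the
    reference level is all zeros. "-1,0,1": the same, except the reference
    level gets -1 in every column.
    """
    columns = [lv for lv in levels if lv != reference]
    table = {}
    for lv in levels:
        if lv == reference and scheme == "-1,0,1":
            table[lv] = [-1] * len(columns)
        else:
            table[lv] = [1 if lv == col else 0 for col in columns]
    return columns, table


levels = ["Hong Kong", "London", "New York"]
for reference, scheme in [("New York", "-1,0,1"), ("Hong Kong", "1,0")]:
    columns, table = code_levels(levels, reference, scheme)
    print(f"{scheme} coding (reference: {reference}), columns: {columns}")
    for lv in levels:
        print(f"  {lv:<10} {table[lv]}")
```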

For more information on interpreting the coefficients in Fit Regression Model, go to Interpreting categorical predictors.

For more information on interpreting the coefficients in binary logistic regression, go to Interpreting the estimated coefficients in binary logistic regression.
