About the "rank deficiency" message

Linear models are full rank when there is an adequate number of observations per factor level combination to estimate all terms included in the model. When the data do not contain enough observations to fit the model, Minitab removes terms until the model is small enough to fit. Other models may fit the data better.

Suppose you have a two-factor GLM model. You try to fit the model with terms A, B, and A*B and receive an error about "rank deficiency." This usually indicates that there are not enough observations per factor level combination to estimate the interaction. Try removing the interaction term (A*B).
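A quick way to check whether the A*B interaction can be estimated is to count the observations in every combination of factor levels and look for empty cells. The Python sketch below does that; the function name empty_cells and the small data set are hypothetical, and the check only covers levels that appear somewhere in the data.

```python
from collections import Counter
from itertools import product


def empty_cells(a_values, b_values):
    """Return the (A level, B level) combinations that have no observations."""
    counts = Counter(zip(a_values, b_values))  # observations per combination
    a_levels = sorted(set(a_values))
    b_levels = sorted(set(b_values))
    return [(a, b) for a, b in product(a_levels, b_levels) if counts[(a, b)] == 0]


# Hypothetical two-factor data: A has 2 levels, B has 3 levels
A = [1, 1, 1, 2, 2, 2, 2]
B = ["x", "y", "z", "x", "x", "y", "y"]

print(empty_cells(A, B))  # [(2, 'z')]
```

Here the combination A = 2, B = z has no observations, so the full A*B interaction cannot be estimated from these data.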

What is rank deficiency?

Rank deficiency is a condition that can prevent Minitab from performing matrix calculations. For example, consider the following data set with two predictor variables and one response variable:

C1 C2 C3
X1 X2 Y
1.5 9.7 15.0
1.4 8.4 14.0
1.6 8.6 16.0
1.7 8.9 17.0
1.7 8.1 14.5

X1 and X2 are the predictor variables and Y is the response variable. Regression analysis in Minitab uses least squares to calculate the estimated coefficients b0, b1, and b2 in the following linear equation:

$$Y = b_0 + b_1 X_1 + b_2 X_2$$

The least squares procedure is equivalent to solving the set of matrix equations

$$b = (X^{T}X)^{-1}X^{T}Y$$

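where b is a column vector containing the estimated model coefficients, X is a matrix whose first column is a column of ones (used to estimate the intercept, or constant) and whose remaining columns hold the predictor data (X1, X2,…), and Y is the column vector of response data. For the previous data set, the matrices are:

$$
X = \begin{bmatrix} 1 & 1.5 & 9.7 \\ 1 & 1.4 & 8.4 \\ 1 & 1.6 & 8.6 \\ 1 & 1.7 & 8.9 \\ 1 & 1.7 & 8.1 \end{bmatrix}, \quad
Y = \begin{bmatrix} 15.0 \\ 14.0 \\ 16.0 \\ 17.0 \\ 14.5 \end{bmatrix}, \quad
b = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
$$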
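To make the formula concrete, the sketch below builds X and Y from the data set above and solves the normal equations (X^T X)b = X^T Y with plain Gauss-Jordan elimination. It is an illustration only, and the helper names transpose, matmul, and solve are hypothetical; Minitab itself uses the QR decomposition, which avoids forming X^T X and is numerically more stable.

```python
# Hypothetical helpers: plain-Python matrix routines for illustration only.
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, rhs):
    """Solve the square system a @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # No usable pivot: the matrix is singular (rank deficient)
            raise ValueError("X'X is singular: the model is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Data set from the text: predictors X1, X2 and response Y
x1 = [1.5, 1.4, 1.6, 1.7, 1.7]
x2 = [9.7, 8.4, 8.6, 8.9, 8.1]
y = [15.0, 14.0, 16.0, 17.0, 14.5]

X = [[1.0, u, v] for u, v in zip(x1, x2)]  # first column of ones for the constant
Xt = transpose(X)
XtX = matmul(Xt, X)                                      # 3 x 3
XtY = [row[0] for row in matmul(Xt, [[v] for v in y])]   # length 3
b = solve(XtX, XtY)
print("b0, b1, b2 =", b)
```

At the least squares solution the residuals Y - Xb are orthogonal to every column of X, which is a convenient way to check the result.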

Minitab uses the QR decomposition to calculate the estimates of the parameters (b0, b1, and b2) and the standard errors of those estimates. The calculation depends on the eigenvalues of the $X^{T}X$ matrix. If some eigenvalues of $X^{T}X$ are essentially zero, this square matrix is singular or close to singular, and Minitab cannot complete the calculations.
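Because X = QR with Q having orthonormal columns, X^T X = R^T R, so X^T X is singular exactly when some diagonal entry of R is zero. The Python sketch below uses that fact: a hypothetical qr_rank helper runs modified Gram-Schmidt (not Minitab's own routine) and flags a column as dependent when its diagonal entry of R is tiny relative to the column's length. The tolerance of 1e-10 is an arbitrary choice.

```python
import math


def qr_rank(columns, tol=1e-10):
    """Estimate the rank of a matrix given as a list of columns.

    Uses modified Gram-Schmidt QR. A column whose remaining norm (its diagonal
    entry of R) falls below tol times its original norm is counted as dependent.
    Returns (rank, indices of dependent columns).
    """
    basis = []      # orthonormal columns of Q found so far
    dependent = []
    for j, col in enumerate(columns):
        v = [float(c) for c in col]
        original = math.sqrt(sum(c * c for c in v))
        for q in basis:
            r = sum(qi * vi for qi, vi in zip(q, v))   # off-diagonal entry of R
            v = [vi - r * qi for vi, qi in zip(v, q)]  # remove that direction
        norm = math.sqrt(sum(c * c for c in v))        # diagonal entry of R
        if original == 0 or norm <= tol * original:
            dependent.append(j)
        else:
            basis.append([c / norm for c in v])
    return len(basis), dependent


x1 = [1.5, 1.4, 1.6, 1.7, 1.7]
x2 = [9.7, 8.4, 8.6, 8.9, 8.1]
ones = [1.0] * 5
print(qr_rank([ones, x1, x2]))        # (3, [])
x3 = [a + b for a, b in zip(x1, x2)]  # hypothetical extra predictor X3 = X1 + X2
print(qr_rank([ones, x1, x2, x3]))    # (3, [3])
```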

What causes rank deficiency?

Rank deficiency occurs if any column of X can be written as a linear combination of the other columns of X (including the column of ones). The two examples below use C1, C2, and C3 as the predictor (X) variables:

Example 1

C1 C2 C3
X1 X2 X3
1 2 3
2 3 5
1.5 2.5 4

Example 2

C1 C2 C3
X1 X2 X3
1 2 3
2 4 5
1.5 3 4

In the first example, notice that C1 + C2 = C3.

In the second example, notice that 2*C1 = C2.

If you try to perform regression (or ANOVA) using these predictors, Minitab will remove terms from the model in order to perform the analysis.
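One simple way to decide which terms to remove is to walk through the columns in order and drop any column that is a linear combination of the columns already kept. The Python sketch below applies that rule to Examples 1 and 2; the function drop_dependent_terms is hypothetical, and its "drop the later column" rule is an illustration, not Minitab's documented procedure for choosing which term to remove. It looks only at the predictor columns, as the text does.

```python
import math


def drop_dependent_terms(columns, tol=1e-10):
    """Keep each column in order unless it is (numerically) a linear
    combination of the columns already kept. Returns (kept, dropped) names."""
    basis, kept, dropped = [], [], []
    for name, col in columns.items():
        v = [float(c) for c in col]
        original = math.sqrt(sum(c * c for c in v))
        for q in basis:  # remove the parts explained by kept columns
            r = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - r * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(c * c for c in v))
        if original == 0 or norm <= tol * original:
            dropped.append(name)
        else:
            basis.append([c / norm for c in v])
            kept.append(name)
    return kept, dropped


example1 = {"X1": [1, 2, 1.5], "X2": [2, 3, 2.5], "X3": [3, 5, 4]}
example2 = {"X1": [1, 2, 1.5], "X2": [2, 4, 3], "X3": [3, 5, 4]}
print(drop_dependent_terms(example1))  # (['X1', 'X2'], ['X3'])
print(drop_dependent_terms(example2))  # (['X1', 'X3'], ['X2'])
```

In these tiny three-row examples, adding the constant column exposes further dependencies (for instance, X2 = X1 + 1 in Example 1), so a fit that includes the intercept would drop even more terms.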

Rank deficiency can also occur with categorical data:

Example 3

C1 C2 C3
Machine Operator Response
1 Joel 15
1 Joel 18
1 Joel 17
2 Bill 14
2 Bill 15
2 Bill 16

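In this example, the Machine column follows exactly the same pattern as the Operator column: Joel always runs machine 1 and Bill always runs machine 2, so the effects of the two factors cannot be separated. If you perform ANOVA with this data set, Minitab will remove terms from the model in order to perform the analysis.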
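Categorical factors enter the model as coded columns, so the problem shows up as duplicate columns in X. The Python sketch below assumes simple 0/1 indicator coding with the first level of each factor as the reference (an assumption chosen for readability; other coding schemes show the same confounding for these data), and the helper name indicator is hypothetical.

```python
def indicator(values, level):
    """0/1 indicator column for one level of a categorical factor."""
    return [1 if v == level else 0 for v in values]


# Example 3 data
machine = [1, 1, 1, 2, 2, 2]
operator = ["Joel", "Joel", "Joel", "Bill", "Bill", "Bill"]

# Indicator columns for the non-reference level of each factor
machine_2 = indicator(machine, 2)
operator_bill = indicator(operator, "Bill")

print(machine_2)                   # [0, 0, 0, 1, 1, 1]
print(operator_bill)               # [0, 0, 0, 1, 1, 1]
print(machine_2 == operator_bill)  # True: the two columns are identical
```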

When you perform ANOVA, rank deficiency can also occur for the following reasons:
  • An interaction term in the model does not have at least one observation for each combination of its factor levels. For example, if A has 3 levels and B has 4 levels, including the A*B interaction requires at least one observation in each of the 3 × 4 = 12 level combinations.
  • There is unbalanced nesting.
  • A continuous variable in the model is not specified as a covariate.
  • The degrees of freedom for Error are negative.
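The degrees-of-freedom bookkeeping for a crossed two-factor model can be sketched as follows. The function error_df and the 10-observation design are hypothetical, and the count assumes every term is estimable (no empty cells), so a non-negative result is necessary but not sufficient for the model to fit.

```python
def error_df(n_obs, a_levels, b_levels, interaction=True):
    """Error degrees of freedom for a crossed two-factor model with a constant term.

    Assumes every term is estimable (for example, no empty cells when the
    A*B interaction is included).
    """
    model_df = (a_levels - 1) + (b_levels - 1)       # main effects A and B
    if interaction:
        model_df += (a_levels - 1) * (b_levels - 1)  # A*B interaction
    return n_obs - 1 - model_df                      # 1 df for the constant


# Hypothetical design: A has 3 levels, B has 4 levels, 10 observations in total
print(error_df(10, 3, 4))                     # -2: too few observations for A + B + A*B
print(error_df(10, 3, 4, interaction=False))  # 4: A + B alone leaves 4 df for error
```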