The maximum likelihood method estimates the factor loadings, assuming the data follow a multivariate normal distribution. As its name implies, this method finds estimates of the factor loadings and unique variances by maximizing the likelihood function associated with the multivariate normal model. Equivalently, this is done by minimizing an expression involving the variances of the residuals. The algorithm iterates until a minimum is found or until the maximum specified number of iterations (the default is 25) is reached.
Minitab uses an algorithm based on Jöreskog [1, 2], with some adjustments to improve convergence. We give a brief summary of the algorithm here.
Suppose we have p variables and want to fit a model with m factors. Let R be the p × p correlation matrix of the variables, L be the p × m matrix of factor loadings, and Ψ be a p × p diagonal matrix whose diagonal elements are the unique variances, Ψi. Then we need to find values for L and Ψ that maximize the likelihood function, f(L,Ψ). This involves a two-step procedure, first finding a value for Ψ, then for L.
You can indirectly specify the initial value of Ψ. In the Factor Analysis - Options subdialog box, enter the column containing the initial values for the communalities in Use initial communality estimates in. Minitab then calculates the diagonal elements of Ψ as (1 − communalities).
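The conversion from communalities to initial unique variances is simply Ψi = 1 − (communality)i. A minimal sketch in Python (the communality values here are made up for illustration):

```python
import numpy as np

# Hypothetical initial communality estimates for p = 4 variables,
# standing in for the column named in "Use initial communality
# estimates in".
communalities = np.array([0.7, 0.6, 0.8, 0.5])

# Initial unique variances: Psi_i = 1 - communality_i
psi = 1.0 - communalities

# Psi as a p x p diagonal matrix
Psi = np.diag(psi)
```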
For a fixed value of Ψ, we maximize f(L,Ψ) with respect to L. This is a simple matrix calculation. The value of L is then substituted into f(L,Ψ). Now f can be viewed as a function of Ψ. A simple transformation of this function gives
where λ1 < λ2 < ... < λp are the eigenvalues of Ψ½R⁻¹Ψ½. We then minimize g(Ψ) using a Newton-Raphson procedure. This gives an estimate of Ψ, which is substituted into the likelihood f(L,Ψ); the likelihood is again maximized with respect to L, a new value of g(Ψ) is calculated, and so on. By default, iterations continue for up to 25 steps. If the algorithm does not converge within 25 steps, you may want to increase the maximum number of iterations in the Options subdialog box.
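The L-for-fixed-Ψ step can be sketched as follows. This follows the standard eigen-based construction from the factor analysis literature under the model R = LLᵀ + Ψ; it illustrates the idea, not Minitab's exact implementation:

```python
import numpy as np

def conditional_loadings(R, psi, m):
    """Given unique variances psi (1-D array of the diagonal of Psi),
    return the conditional ML loadings L for an m-factor model.
    Standard eigen-based step: eigendecompose Psi^(-1/2) R Psi^(-1/2)
    and build L from the m largest eigenvalues."""
    d = 1.0 / np.sqrt(psi)
    Rstar = R * np.outer(d, d)              # Psi^(-1/2) R Psi^(-1/2)
    vals, vecs = np.linalg.eigh(Rstar)      # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder: largest first
    # Loadings from the m largest eigenvalues (floored at 1 so the
    # square root stays real)
    scale = np.sqrt(np.maximum(vals[:m] - 1.0, 0.0))
    return np.sqrt(psi)[:, None] * vecs[:, :m] * scale
```

When R exactly satisfies the one-factor model, this recovers the loadings up to sign, so LLᵀ is reproduced exactly.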
Convergence is reached at step n if either of the following is true:
- The function g(Ψ) does not change very much between consecutive steps. Specifically, if:
- | [g(Ψ) at step n] − [g(Ψ) at step (n − 1)] | < 10⁻⁶
- None of the unique variances change very much between consecutive steps. Specifically, if:
- | ln(Ψi at step n) − ln(Ψi at step (n − 1)) | < K2,
for all i = 1, ... , p, where Ψi, the ith diagonal element of Ψ, is the unique variance corresponding to variable i.
The value of K2 can be specified in Convergence in the Options subdialog box. By default, the value is 0.005.
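Put together, the two stopping rules can be sketched as follows (an illustration with the documented defaults, not Minitab's actual code):

```python
import numpy as np

def converged(g_now, g_prev, psi_now, psi_prev, K2=0.005):
    """True if either stopping rule holds: g(Psi) changes by less
    than 10^-6 between steps, or every ln(Psi_i) changes by less
    than K2 (default 0.005)."""
    rule1 = abs(g_now - g_prev) < 1e-6
    rule2 = np.all(np.abs(np.log(psi_now) - np.log(psi_prev)) < K2)
    return bool(rule1 or rule2)
```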
Choose All and MLE iterations in the Results subdialog box to display information on each iteration. The value of the objective function, g(Ψ), is displayed, followed by the maximum change in ln(Ψi). If the value of g(Ψ) does not decrease on an iteration, a step half the size is taken. Half-stepping continues until g(Ψ) decreases or 25 half-steps have been taken. The number of half-steps is displayed. If g(Ψ) does not decrease within 25 half-steps, the algorithm stops and a message is displayed.
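The half-stepping logic can be sketched like this. The quadratic g in the usage note is only a stand-in for Minitab's g(Ψ), and the function itself is an illustration of the behavior described above, not the actual implementation:

```python
def half_step(g, psi, step, max_half_steps=25):
    """Try the proposed Newton step; if g does not decrease, halve
    the step, up to max_half_steps times. Returns the accepted point
    and the number of halvings taken, or (None, max_half_steps) if g
    never decreased (the algorithm would stop with a message)."""
    g0 = g(psi)
    for n_half in range(max_half_steps + 1):
        trial = psi - step
        if g(trial) < g0:
            return trial, n_half
        step = step / 2.0
    return None, max_half_steps
```

For a one-dimensional g(x) = (x − 1)² starting at x = 3 with an overshooting step of 4, the full step does not decrease g, and one halving lands exactly on the minimum.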
A matrix of second derivatives is used in the minimization of g(Ψ). This matrix is not always positive definite. If it is not, an approximation is used. An asterisk is displayed on the results when Minitab uses the exact matrix.
When minimizing the function g(Ψ), it is possible to encounter values of the diagonal elements of Ψ that are 0 or negative. To prevent this, Minitab's algorithm bounds the diagonal elements of Ψ away from 0. Specifically, if a unique variance Ψi is less than K2, it is set equal to K2, where K2 is the value set in Convergence in the Options subdialog box.
When the algorithm converges, a final check is performed on the unique variances. If any unique variance is less than K2, it is set equal to 0, and the corresponding communality is then equal to 1. This result is called a Heywood case, and Minitab displays a message to inform you of it.

Optimization algorithms, such as the one used for maximum likelihood factor analysis, can give different answers with minor changes in the input. For example, if you change a few data points, change the starting values in Use initial communality estimates in, or change the convergence criterion in Convergence, you may see differences in the factor analysis results. This is especially true if the solution lies in a relatively flat region of the maximum likelihood surface.
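Both safeguards, flooring the unique variances at K2 during iterations and zeroing sub-K2 unique variances at convergence, can be sketched as:

```python
import numpy as np

K2 = 0.005  # the Convergence value; 0.005 is the documented default

def bound_psi(psi):
    """During iterations: keep each unique variance at least K2."""
    return np.maximum(psi, K2)

def final_heywood_check(psi):
    """After convergence: unique variances below K2 become 0, so the
    corresponding communalities equal 1 (Heywood cases)."""
    psi = psi.copy()
    heywood = psi < K2
    psi[heywood] = 0.0
    return psi, heywood
```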