Methods for Individual Distribution Identification

In This Topic

Maximum likelihood estimates
Goodness-of-fit test
Likelihood-ratio test

Maximum likelihood estimates

Maximum likelihood estimates of the distribution parameters are calculated by maximizing the likelihood function with respect to the parameters. For a given data set, the likelihood function describes how likely the observed data are under that distribution for given parameter values.

The Newton-Raphson algorithm is used to calculate the maximum likelihood estimates of the parameters that define the distribution. The Newton-Raphson algorithm is an iterative method for calculating the maximum of a function.¹ The percentiles are then calculated from the fitted distribution.
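As an illustration only, the following Python sketch applies Newton-Raphson to a 2-parameter Weibull distribution. It is not Minitab's implementation. The function names, starting value, tolerance, step safeguard, and sample data are all hypothetical. The sketch solves the profile score equation for the shape parameter, recovers the scale in closed form, and then calculates a percentile from the fitted distribution.

```python
import math

def weibull_score(data, k):
    """Profile score g(k) for the Weibull shape and its derivative g'(k)."""
    logs = [math.log(x) for x in data]
    xk = [x ** k for x in data]
    s0 = sum(xk)
    s1 = sum(w * l for w, l in zip(xk, logs))
    s2 = sum(w * l * l for w, l in zip(xk, logs))
    g = s1 / s0 - 1.0 / k - sum(logs) / len(data)        # zero at the MLE
    dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)  # always positive
    return g, dg

def weibull_mle(data, k0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson iteration for the shape; closed-form scale afterwards."""
    k = k0
    for _ in range(max_iter):
        g, dg = weibull_score(data, k)
        k_new = k - g / dg
        if k_new <= 0:           # assumed safeguard: keep the shape positive
            k_new = k / 2.0
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    scale = (sum(x ** k for x in data) / len(data)) ** (1.0 / k)
    return k, scale

def weibull_loglik(data, k, scale):
    """Log-likelihood of the 2-parameter Weibull distribution."""
    return sum(math.log(k) - k * math.log(scale) + (k - 1) * math.log(x)
               - (x / scale) ** k for x in data)

def weibull_percentile(p, k, scale):
    """Value below which a proportion p of the fitted distribution falls."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / k)

# Hypothetical sample data, for illustration only.
sample = [1.2, 0.8, 2.5, 1.9, 3.1, 0.5, 1.4, 2.2]
shape, scale = weibull_mle(sample)
median = weibull_percentile(0.5, shape, scale)
```

Solving the score equation (derivative of the log-likelihood set to zero) is equivalent to maximizing the likelihood here, because the profile score is strictly increasing in the shape parameter.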

Note

Minitab calculates the parameter estimates using the maximum likelihood method for all the distributions except the normal distribution and the lognormal distribution. For the normal distribution and the lognormal distribution, Minitab calculates unbiased parameter estimates.

Goodness-of-fit test

Minitab uses Anderson-Darling statistics to perform the goodness-of-fit test.

Let Z = F(X), where F(X) is the cumulative distribution function. Suppose that a sample X_1, ..., X_n gives values Z_i = F(X_i), i = 1, ..., n. Rearrange the Z_i in ascending order, Z_(1) < Z_(2) < ... < Z_(n). Then the Anderson-Darling statistic (A²) is calculated as follows:

A² = –n – (1/n) Σ_{i=1}^{n} [(2i – 1) log Z_(i) + (2n + 1 – 2i) log(1 – Z_(i))]
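The formula can be sketched in Python as follows. This is a minimal illustration, not Minitab's code: the function names and sample data are hypothetical, and the normal distribution with parameters estimated from the sample is used only as an example of F. The sketch computes the unmodified A² statistic and does not reproduce the table-based p-values.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def anderson_darling(data, cdf):
    """A² = -n - (1/n) * sum of (2i-1) log Z_(i) + (2n+1-2i) log(1 - Z_(i))."""
    z = sorted(cdf(x) for x in data)   # Z_(1) <= ... <= Z_(n)
    n = len(z)
    total = 0.0
    for i, zi in enumerate(z, start=1):
        total += (2 * i - 1) * math.log(zi)
        total += (2 * n + 1 - 2 * i) * math.log(1.0 - zi)
    return -n - total / n

# Hypothetical sample data, for illustration only.
sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.6]
mu = sum(sample) / len(sample)
sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (len(sample) - 1))
a2 = anderson_darling(sample, lambda x: normal_cdf(x, mu, sigma))
```

Smaller values of A² indicate that the fitted distribution follows the data more closely.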

The modified Anderson-Darling goodness-of-fit test statistic is calculated for each distribution. The p-values are based on Tables 4.8–4.22 in D'Agostino and Stephens.² If no exact p-value is found in the tables, Minitab calculates the p-value by interpolation, using the range of the p-value.

Note

P-values for the Anderson-Darling test are not available for 3-parameter distributions, except for the Weibull distribution.

Likelihood-ratio test

The likelihood-ratio test compares the fit of a larger distribution family with the fit of a subset of the same family and determines whether the larger family significantly improves the fit. For instance, for a 2-parameter exponential distribution, the likelihood-ratio test compares the fit of the 2-parameter exponential distribution family with the fit of the 1-parameter exponential distribution family (a subset in which the second parameter is 0). If the 2-parameter exponential distribution significantly improves the fit, then the p-value for the likelihood-ratio test statistic is very small.

The likelihood-ratio test statistic is calculated as follows.

Let A be the maximum likelihood estimate (MLE) of the parameter vector for the larger distribution family (for example, the 3-parameter distribution family), and L(A) be the log likelihood. Let B be the MLE of the parameter vector for the corresponding smaller distribution family (for example, the corresponding 2-parameter distribution family), and L(B) be the log likelihood.

Likelihood-ratio test statistic = 2 * L(A) – 2 * L(B).

The null hypothesis is that the smaller distribution family fits the data well. Under the null hypothesis, the likelihood-ratio test statistic is approximately chi-square distributed with df = dimension of vector (A) – dimension of vector (B).
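The calculation can be sketched in Python for the exponential example described earlier. This is an illustrative sketch, not Minitab's implementation: the function names and sample data are hypothetical, and the closed-form estimates (threshold = sample minimum, scale = mean minus threshold) are the textbook maximum likelihood estimates, which may differ from the estimates Minitab reports. Because df = 2 – 1 = 1, the chi-square p-value can be computed with the complementary error function.

```python
import math

def exp_loglik(data, scale, threshold=0.0):
    """Log-likelihood of an exponential distribution with optional threshold."""
    n = len(data)
    return -n * math.log(scale) - sum(x - threshold for x in data) / scale

def lr_test_exponential(data):
    """Compare the 2-parameter exponential (A) with the 1-parameter one (B)."""
    n = len(data)
    mean = sum(data) / n
    # Smaller family B: threshold fixed at 0, scale estimated by the mean.
    log_b = exp_loglik(data, mean)
    # Larger family A: threshold estimated by the minimum (assumed textbook MLE).
    threshold = min(data)
    log_a = exp_loglik(data, mean - threshold, threshold)
    stat = 2 * log_a - 2 * log_b
    df = 2 - 1
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, df, p_value

# Hypothetical samples, for illustration only.
shifted = [5.3, 6.1, 5.0, 7.4, 5.8, 9.2, 5.1, 6.6]    # values start well above 0
unshifted = [0.02, 1.5, 0.7, 2.3, 0.9, 3.1, 0.4, 1.8]  # values start near 0
stat_shifted, df_shifted, p_shifted = lr_test_exponential(shifted)
stat_unshifted, df_unshifted, p_unshifted = lr_test_exponential(unshifted)
```

For the first sample, the threshold parameter improves the fit substantially, so the p-value is small. For the second sample, the smallest observation is close to 0 and the p-value is large.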

¹ W. Murray, Ed. (1972). Numerical Methods for Unconstrained Optimization. Academic Press.

² M.A. Stephens (1986). Chapter 4: Tests based on EDF statistics. Goodness-of-Fit Techniques, ed. R.B. D'Agostino and M.A. Stephens. Marcel Dekker, Inc. 97-193.