Goodness of fit for Individual Distribution Identification

Find definitions and interpretation guidance for every goodness-of-fit statistic that is provided with individual distribution identification.

Probability plot

A probability plot displays each data point versus the percentage of values in the sample that are less than or equal to it.
The plot includes:
Middle line
The expected percentile from the distribution based on maximum likelihood parameter estimates.
Confidence bound lines
The left curved line indicates the lower bounds of the confidence intervals for the percentiles. The right curved line indicates the upper bounds of the confidence intervals for the percentiles.
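As a rough illustration of how the plotted points relate to the middle line, the following Python sketch computes, for a normal distribution only, the empirical cumulative percentage of each sorted value and the percentage expected from the fitted distribution. The function name and the median-rank plotting position (i - 0.3)/(n + 0.4) are assumptions made for this example; the text above does not state which plotting-position formula Minitab uses.

```python
from statistics import NormalDist, fmean, pstdev


def probability_plot_points(data):
    """Return (value, empirical fraction, fitted fraction) for each sorted value.

    Hypothetical helper for illustration, normal distribution only.
    """
    xs = sorted(data)
    n = len(xs)
    # Maximum likelihood estimates for a normal distribution:
    # the sample mean and the standard deviation that divides by n.
    fitted = NormalDist(fmean(xs), pstdev(xs))
    points = []
    for i, x in enumerate(xs, start=1):
        empirical = (i - 0.3) / (n + 0.4)  # assumed median-rank plotting position
        expected = fitted.cdf(x)           # fraction implied by the fitted distribution
        points.append((x, empirical, expected))
    return points


points = probability_plot_points([1, 2, 3, 4, 5])
for value, empirical, expected in points:
    print(f"{value:>4} empirical={empirical:.3f} fitted={expected:.3f}")
```

When the empirical and fitted fractions stay close across the whole range, the points fall near the middle line.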

Interpretation

Use the probability plot to assess how closely your data follow each distribution.

If the distribution is a good fit for the data, the points should fall closely along the fitted distribution line. Substantial departures from the fitted line indicate that the distribution fits the data poorly.

[Figures: example probability plots showing a good fit and a poor fit]

In addition to the probability plot, use the goodness-of-fit measures, such as the AD p-values and the LRT p-values, to evaluate the distribution fit.

When selecting a distribution to model your data, also rely on your process knowledge. If several distributions provide a good fit, use the following strategies to choose a distribution:
  • Choose the distribution that is most commonly used in your industry or application.
  • Choose the distribution that provides the most conservative results. For example, if you are performing capability analysis, you can perform the analysis using different distributions and then choose the distribution that produces the most conservative capability indices. For more information, go to Distribution percentiles for Individual Distribution Identification and click "Percents and percentiles".
  • Choose the simplest distribution that fits your data well. For example, if a 2-parameter and a 3-parameter distribution both provide a good fit, you might choose the simpler 2-parameter distribution.

P

For each distribution, Minitab reports a p-value (P) for the Anderson-Darling (AD) test. The p-value is a probability that measures the evidence against the null hypothesis. For an AD test, the null hypothesis is that the data follow the distribution. Therefore, lower p-values provide stronger evidence that the data do not follow the distribution.
Note

No p-value for the AD test is available for the 3-parameter distributions, except for the Weibull distribution.
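To make the AD statistic concrete, here is a minimal Python sketch for the normal distribution only. It uses the standard Anderson-Darling formula with parameters estimated from the sample. The p-value function uses a commonly cited piecewise approximation attributed to D'Agostino and Stephens; that choice is an assumption for illustration and is not confirmed by the text above as the method Minitab uses. Both function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(data):
    """Anderson-Darling statistic A^2 for a normal fit with estimated parameters."""
    xs = sorted(data)
    n = len(xs)
    # The approximation below is tabulated for the sample mean and sample SD.
    fitted = NormalDist(fmean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    total = 0.0
    for i in range(1, n + 1):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
    return -n - total / n


def approximate_p_value_normal(a2, n):
    """Approximate p-value for the normal case (assumed approximation)."""
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    if a > 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    if a > 0.2:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)
```

A larger AD statistic means the empirical distribution sits farther from the fitted one, especially in the tails, which corresponds to a smaller p-value.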

Interpretation

Use the p-value to assess the fit of the distribution.

Compare the p-value for each distribution or transformation to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the data do not follow the distribution when they actually do follow the distribution.
P ≤ α: The data do not follow the distribution (Reject H0)
If the p-value is less than or equal to the significance level, the decision is to reject the null hypothesis and conclude that your data do not follow the distribution.
P > α: Cannot conclude the data do not follow the distribution (Fail to reject H0)
If the p-value is greater than the significance level, the decision is to fail to reject the null hypothesis. There is not enough evidence to conclude that the data do not follow the distribution. This result does not prove that the data follow the distribution, but it is reasonable to proceed as if they do.
Important

Use caution when you interpret results from a very small or a very large sample. If you have a very small sample, a goodness-of-fit test may not have enough power to detect significant deviations from the distribution. If you have a very large sample, the test may be so powerful that it detects even small deviations from the distribution that have no practical significance. Use the probability plots in addition to the p-values to evaluate the distribution fit.

Goodness of Fit Test

Distribution              AD       P        LRT P
Normal                    0.754    0.046
Box-Cox Transformation    0.414    0.324
Lognormal                 0.650    0.085
3-Parameter Lognormal     0.341    *        0.017
Exponential               20.614   <0.003
2-Parameter Exponential   1.684    0.014    0.000
Weibull                   1.442    <0.010
3-Parameter Weibull       0.230    >0.500   0.000
Smallest Extreme Value    1.656    <0.010
Largest Extreme Value     0.394    >0.250
Gamma                     0.702    0.071
3-Parameter Gamma         0.268    *        0.006
Logistic                  0.726    0.034
Loglogistic               0.659    0.050
3-Parameter Loglogistic   0.432    *        0.027
Johnson Transformation    0.124    0.986

In these results, several distributions have a p-value greater than 0.05. Among the distributions, the 3-parameter Weibull distribution (P > 0.500) and the largest extreme value distribution (P > 0.250) have the largest p-values and appear to fit the sample data better than the other distributions. Also, the Box-Cox transformation (P = 0.324) and the Johnson transformation (P = 0.986) are effective in transforming the data to follow a normal distribution.
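The decision rule can be applied mechanically to the P column of the example table. The Python sketch below is illustrative only. The p-values are copied from the table, but the `decide` helper and its handling of bounded values such as "<0.010" and ">0.500" are assumptions, not part of Minitab.

```python
ALPHA = 0.05

# AD p-values as reported in the example table ("*" means not available).
reported_p = {
    "Normal": "0.046",
    "Box-Cox Transformation": "0.324",
    "Lognormal": "0.085",
    "3-Parameter Lognormal": "*",
    "Exponential": "<0.003",
    "2-Parameter Exponential": "0.014",
    "Weibull": "<0.010",
    "3-Parameter Weibull": ">0.500",
    "Smallest Extreme Value": "<0.010",
    "Largest Extreme Value": ">0.250",
    "Gamma": "0.071",
    "3-Parameter Gamma": "*",
    "Logistic": "0.034",
    "Loglogistic": "0.050",
    "3-Parameter Loglogistic": "*",
    "Johnson Transformation": "0.986",
}


def decide(p_text, alpha=ALPHA):
    """Hypothetical helper: map a reported p-value string to a decision."""
    text = p_text.strip()
    if text == "*":
        return "not available"
    if text.startswith("<"):
        # Reported as an upper bound: the true p-value is below this number.
        return "reject H0" if float(text[1:]) <= alpha else "undetermined"
    if text.startswith(">"):
        # Reported as a lower bound: the true p-value is above this number.
        return "fail to reject H0" if float(text[1:]) >= alpha else "undetermined"
    return "reject H0" if float(text) <= alpha else "fail to reject H0"


not_rejected = [name for name, p in reported_p.items() if decide(p) == "fail to reject H0"]
print(not_rejected)
```

Note that the loglogistic distribution, with P = 0.050, is rejected under this rule because the comparison is P ≤ α.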

Note

For several distributions, Minitab also displays results for the distribution with an additional parameter. For example, for the lognormal distribution, Minitab displays results for both the 2-parameter and 3-parameter versions of the distribution. For distributions that have additional parameters, use the likelihood-ratio test p-value (LRT P) to determine whether adding another parameter significantly improves the fit of the distribution. An LRT p-value that is less than 0.05 suggests that the improvement in fit is significant. For more information, see the section on LRT P.

LRT P

For several distributions, Minitab also displays results for the distribution with an additional parameter. For each extra-parameter version of a distribution, Minitab reports a p-value for the likelihood-ratio test (LRT P). A p-value is a probability that measures the evidence against the null hypothesis. For the likelihood-ratio test in individual distribution identification, the null hypothesis is that the data follow the smaller (lower parameter) distribution. Therefore, lower LRT p-values provide stronger evidence that the distribution fit is significantly improved by using an additional parameter.

Interpretation

Use the LRT p-value to determine whether adding the extra parameter significantly improves the fit over the distribution without the extra parameter.

For each distribution or transformation, compare the LRT p-value to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the extra parameter significantly improves the distribution fit when it actually does not.
P ≤ α: The larger (higher parameter) distribution provides a significantly better fit. (Reject H0)
If the p-value is less than or equal to the significance level, you reject the null hypothesis and conclude that the distribution fit is significantly improved by using an additional parameter.
P > α: Cannot conclude that the larger (higher parameter) distribution provides a significantly better fit (Fail to reject H0)
If the p-value is greater than the significance level, you fail to reject the null hypothesis. There is not enough evidence to conclude that the distribution fit is significantly improved by using an additional parameter.
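For readers who want to see where such a p-value can come from, the sketch below shows the textbook form of a likelihood-ratio test for one extra parameter. It assumes that the test statistic is twice the difference in maximized log-likelihoods and that it is compared to a chi-square distribution with 1 degree of freedom. The text above does not describe Minitab's exact computation, so treat this as an assumption. The function name and the log-likelihood inputs are hypothetical.

```python
import math


def lrt_p_value(loglik_smaller, loglik_larger):
    """Likelihood-ratio statistic and p-value for one extra parameter.

    Assumes the usual chi-square reference distribution with 1 degree of freedom.
    """
    # The larger model can never fit worse, so clamp tiny negative values to zero.
    statistic = max(0.0, 2.0 * (loglik_larger - loglik_smaller))
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for a 2-parameter and a 3-parameter fit.
statistic, p_value = lrt_p_value(-100.0, -98.0792)
print(f"LR statistic = {statistic:.4f}, LRT P = {p_value:.3f}")
```

A small LRT p-value means that the gain in log-likelihood from the extra parameter is larger than chance alone would usually produce.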

The LRT p-value is also useful for 3-parameter distributions for which there is no established method for calculating the p-value. In these cases, first examine the p-value for the corresponding 2-parameter distribution. Then examine the LRT p-value for the 3-parameter distribution to determine whether the 3-parameter distribution is significantly better than the 2-parameter distribution.

In the results below, the LRT p-values for the 3-parameter lognormal (0.017), 3-parameter Weibull (0.000), 3-parameter gamma (0.006), and 3-parameter loglogistic (0.027) distributions suggest that these distributions significantly improve the fit compared to their 2-parameter counterparts.
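The two-step check for 3-parameter distributions that have no AD p-value can be sketched as follows. The p-values are copied from the example table, while the `assess` helper and the 0.05 significance level used as its default are assumptions for illustration.

```python
ALPHA = 0.05

# (2-parameter name, its AD p-value, 3-parameter name, its LRT p-value),
# copied from the example table.
pairs = [
    ("Lognormal", 0.085, "3-Parameter Lognormal", 0.017),
    ("Gamma", 0.071, "3-Parameter Gamma", 0.006),
    ("Loglogistic", 0.050, "3-Parameter Loglogistic", 0.027),
]


def assess(base_p, lrt_p, alpha=ALPHA):
    """Hypothetical helper returning (base fit not rejected, extra parameter helps)."""
    base_not_rejected = base_p > alpha   # step 1: AD test on the 2-parameter fit
    extra_param_helps = lrt_p <= alpha   # step 2: LRT for the third parameter
    return base_not_rejected, extra_param_helps


results = {}
for base_name, base_p, extended_name, lrt_p in pairs:
    results[extended_name] = assess(base_p, lrt_p)
    print(extended_name, results[extended_name])
```

For the lognormal and gamma pairs, the 2-parameter fit is not rejected and the third parameter still improves the fit significantly. For the loglogistic pair, the 2-parameter fit is rejected at P = 0.050, so the LRT result is the main evidence in favor of the 3-parameter version.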

Goodness of Fit Test

Distribution              AD       P        LRT P
Normal                    0.754    0.046
Box-Cox Transformation    0.414    0.324
Lognormal                 0.650    0.085
3-Parameter Lognormal     0.341    *        0.017
Exponential               20.614   <0.003
2-Parameter Exponential   1.684    0.014    0.000
Weibull                   1.442    <0.010
3-Parameter Weibull       0.230    >0.500   0.000
Smallest Extreme Value    1.656    <0.010
Largest Extreme Value     0.394    >0.250
Gamma                     0.702    0.071
3-Parameter Gamma         0.268    *        0.006
Logistic                  0.726    0.034
Loglogistic               0.659    0.050
3-Parameter Loglogistic   0.432    *        0.027
Johnson Transformation    0.124    0.986