What to consider when you evaluate distribution fit

Selecting an appropriate distribution is an essential first step in performing reliability analyses. If the selected distribution does not fit the data well, then the reliability estimates will be inaccurate. A well-fitting distribution model is also needed to extrapolate beyond the range of the data. Consider the following criteria when choosing the most appropriate distribution for your reliability data:
  • Use engineering and historical knowledge of the situation. For example, do the data follow a symmetric distribution? Is the hazard constant, increasing, or decreasing? What distribution has worked historically for similar situations?
  • Perform a distribution analysis and use probability plots to compare the candidate distributions or to assess the appropriateness of the chosen distribution.
  • Evaluate the Anderson-Darling goodness-of-fit statistic and the Pearson correlation coefficient:
    • Substantially lower values of the Anderson-Darling statistic generally indicate a better-fitting distribution. The Anderson-Darling statistic is calculated for both the maximum likelihood estimation (MLE) method and the least squares estimation (LSE) method.
    • Substantially higher values of the Pearson correlation coefficient indicate a better-fitting distribution. The correlation coefficient is available for the LSE method.
  • Evaluate how different distributions affect your conclusions:
    • If several distributions provide an adequate fit to the data and result in similar conclusions, then it probably does not matter which distribution you choose.
    • If your conclusions depend on the distribution that you choose, you may want to report the most conservative conclusion or collect more information.
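
The comparison described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the calculation that any particular statistical package performs: it assumes complete (uncensored) data, uses the classical unadjusted Anderson-Darling formula, and computes the Pearson correlation on a linearized probability plot with median-rank plotting positions. The simulated failure times, the helper names (anderson_darling, plot_correlation), and the choice of the 10th percentile (B10) as the conclusion of interest are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

def anderson_darling(sorted_x, cdf):
    """Classical (unadjusted) Anderson-Darling statistic for a fitted CDF."""
    n = len(sorted_x)
    u = np.clip(cdf(sorted_x), 1e-12, 1 - 1e-12)  # guard against log(0)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))

def plot_correlation(sorted_x, standard_quantile):
    """Pearson r of the linearized probability plot (log time vs. quantile)."""
    n = len(sorted_x)
    p = (np.arange(1, n + 1) - 0.3) / (n + 0.4)  # median-rank positions
    return np.corrcoef(standard_quantile(p), np.log(sorted_x))[0, 1]

# Hypothetical complete failure times; replace with your own data.
rng = np.random.default_rng(0)
times = np.sort(
    stats.weibull_min.rvs(2.0, scale=100.0, size=60, random_state=rng)
)

# Maximum likelihood fits with the threshold fixed at zero (2-parameter models).
shape, _, scale = stats.weibull_min.fit(times, floc=0)
sigma, _, median = stats.lognorm.fit(times, floc=0)
weibull = stats.weibull_min(shape, 0, scale)
lognormal = stats.lognorm(sigma, 0, median)

summary = {}
for name, dist, quantile in [
    ("Weibull", weibull, lambda p: np.log(-np.log1p(-p))),
    ("Lognormal", lognormal, stats.norm.ppf),
]:
    summary[name] = {
        "AD": anderson_darling(times, dist.cdf),  # lower suggests better fit
        "r": plot_correlation(times, quantile),   # higher suggests better fit
        "B10": dist.ppf(0.10),                    # time by which 10% have failed
    }

for name, row in summary.items():
    print(f"{name:10s} AD={row['AD']:.3f}  r={row['r']:.4f}  B10={row['B10']:.1f}")
```

If the B10 estimates are close across the candidate distributions, the choice of distribution probably has little effect on that conclusion. If they differ substantially, the guidance above applies: report the most conservative value or collect more information.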

Distributions you can use to model skewed data or symmetric data

Frequently, you can model a set of data with more than one distribution, or with a distribution that has one, two, or three parameters. For example, several distributions may fit each of the following types of data:
Right-skewed data
Often, you can fit either the Weibull or the lognormal distribution and obtain a good fit to the data.
Symmetric data
Often, you can fit the Weibull or the lognormal distribution. Sometimes, you can fit the normal distribution (depending on the heaviness of the tails) and obtain similar results.
Left-skewed data
Often, you can fit the Weibull or the smallest extreme value distribution.
A particular set of data can sometimes be modeled using either two or three parameters. A 3-parameter model can provide a better fit for some data, but it can also overfit. Overfitting means that the model fits the sample data well, but would not fit another sample from the same population. Usually, experts advise choosing the simplest model that works.
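
The trade-off between 2-parameter and 3-parameter models can be illustrated with a small Python sketch. It assumes complete (uncensored) data simulated from a 2-parameter Weibull distribution, and it estimates the 3-parameter model by scanning a grid of candidate threshold values and keeping the one with the highest log-likelihood. The grid search, the sample sizes, and the helper name fit_weibull are choices made for this example, not a prescribed procedure.

```python
import numpy as np
from scipy import stats

def fit_weibull(x, loc):
    """MLE of shape and scale with the threshold (location) held fixed."""
    c, _, scale = stats.weibull_min.fit(x, floc=loc)
    loglik = stats.weibull_min.logpdf(x, c, loc, scale).sum()
    return loglik, (c, loc, scale)

# Two hypothetical samples from the same 2-parameter Weibull population.
rng = np.random.default_rng(1)
train = stats.weibull_min.rvs(1.5, scale=100.0, size=30, random_state=rng)
fresh = stats.weibull_min.rvs(1.5, scale=100.0, size=30, random_state=rng)

# 2-parameter model: threshold fixed at zero.
ll2, params2 = fit_weibull(train, 0.0)

# 3-parameter model: also choose the threshold, here from a grid that
# starts at zero and stays below the smallest observation.
candidates = np.linspace(0.0, 0.95 * train.min(), 20)
ll3, params3 = max(
    (fit_weibull(train, loc) for loc in candidates), key=lambda r: r[0]
)

# Log-likelihood of each fitted model on the second sample. A value of
# -inf means the second sample contains a time below the fitted threshold.
holdout2 = stats.weibull_min.logpdf(fresh, *params2).sum()
holdout3 = stats.weibull_min.logpdf(fresh, *params3).sum()

print(f"2-parameter: fitted sample {ll2:.2f}, second sample {holdout2:.2f}")
print(f"3-parameter: fitted sample {ll3:.2f}, second sample {holdout3:.2f}")
```

Because the grid includes a threshold of zero, the 3-parameter fit can never have a lower log-likelihood than the 2-parameter fit on the sample used for fitting. Whether that advantage carries over to the second sample is what distinguishes a genuinely better model from an overfitted one.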