Distribution percentiles for Individual Distribution Identification

Find definitions and interpretation guidance for every distribution percentile statistic that is provided with Individual Distribution Identification.

Percents and percentiles

If you choose to estimate percentiles for selected percents of data, Minitab displays a table of percentiles. For each distribution, the percentile for P percent is the value below which you can expect P percent of the population values to fall. By default, Minitab displays percentiles for 0.135%, 0.5%, 2%, and 5%.
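A percentile is the inverse of a distribution's cumulative distribution function (CDF): for percent P, it is the value x with F(x) = P/100. The Python sketch below applies the standard closed-form inverse CDFs for three of the listed distributions. The parameter values are hypothetical and chosen only for illustration. They are not the fitted estimates behind the table in this article, and the function names are illustrative, not a Minitab API.

```python
import math
from statistics import NormalDist

def normal_percentile(p, mean, sd):
    """Value below which a fraction p of a normal population falls."""
    return NormalDist(mean, sd).inv_cdf(p)

def largest_extreme_value_percentile(p, location, scale):
    """Inverse of the largest extreme value CDF
    F(x) = exp(-exp(-(x - location) / scale))."""
    return location - scale * math.log(-math.log(p))

def weibull3_percentile(p, shape, scale, threshold):
    """Inverse of the 3-parameter Weibull CDF
    F(x) = 1 - exp(-((x - threshold) / scale) ** shape), x > threshold."""
    return threshold + scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Default percents from the text; all parameter values below are hypothetical.
for pct in [0.135, 0.5, 2, 5]:
    p = pct / 100.0
    print(pct,
          round(normal_percentile(p, 50.0, 2.0), 4),
          round(largest_extreme_value_percentile(p, 49.0, 1.5), 4),
          round(weibull3_percentile(p, 2.0, 4.0, 46.0), 4))
```

Because each function inverts its CDF, plugging a returned percentile back into the corresponding CDF recovers P/100.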

Interpretation

Sometimes it can be difficult to determine the best distribution based only on the probability plot and goodness-of-fit measures. In that case, you can compare percentiles for selected percent values of each distribution to assess how using different distributions affects your conclusions.
  • If several distributions provide a reasonable fit to the data, and their percentile values are close enough that you are likely to draw similar conclusions using any of them, then it probably does not matter which distribution you choose.
  • If the percentiles for the distributions with a reasonable fit differ by an amount that could affect your analysis results, you may want to select the distribution that provides the most conservative results for your application.
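One way to make "close enough" concrete is to compare the spread of the candidate percentiles with a tolerance that matters for your application. In this minimal sketch the tolerance is a hypothetical value you would choose yourself, and the function name is illustrative. The percentiles are the 0.5% values for the 3-parameter Weibull and largest extreme value fits from the table of percentiles in this section.

```python
def percentiles_agree(percentiles, tolerance):
    """True if every candidate percentile lies within `tolerance` of the others,
    so the choice among these distributions is unlikely to change conclusions."""
    values = list(percentiles.values())
    return max(values) - min(values) <= tolerance

# 0.5% percentiles for two candidate fits, taken from the table of percentiles.
candidates = {"3-Parameter Weibull": 46.7913, "Largest Extreme Value": 45.8856}
print(percentiles_agree(candidates, tolerance=0.5))  # tolerance is hypothetical
```

Here the two percentiles differ by about 0.9, so with a tolerance of 0.5 the check fails and the second bullet applies.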

For example, suppose a process has a lower specification limit of 46.2. In that case, the largest extreme value distribution provides slightly more conservative results when you evaluate the capability of the process on the lower tail of the distribution. If the difference is important for the application, you might use the largest extreme value distribution to avoid overestimating the capability of the process.

Table of Percentiles

Distribution             Percent  Percentile  Standard Error  95.0% CI
Normal                       0.5     43.6604         0.81715  (42.1, 45.3)
Box-Cox Transformation       0.5      0.0000         0.00000  (0.0, 0.0)
Lognormal                    0.5     44.1612         0.70063  (42.8, 45.6)
3-Parameter Lognormal        0.5     46.3662         0.51400  (45.4, 47.4)
Exponential                  0.5      0.2545         0.03600  (0.2, 0.3)
2-Parameter Exponential      0.5     46.7391         0.00288  (46.7, 46.7)
Weibull                      0.5     38.7359         1.31065  (36.3, 41.4)
3-Parameter Weibull          0.5     46.7913         0.17247  (46.7, 47.1)
Smallest Extreme Value       0.5     36.5526         1.76758  (33.1, 40.0)
Largest Extreme Value        0.5     45.8856         0.43646  (45.0, 46.7)
Gamma                        0.5     44.0724         0.72433  (42.7, 45.5)
3-Parameter Gamma            0.5     46.4331         0.17091  (46.1, 46.8)
Logistic                     0.5     42.1299         1.03294  (40.1, 44.2)
Loglogistic                  0.5     42.8370         0.86658  (41.2, 44.6)
3-Parameter Loglogistic      0.5     46.2924         0.70522  (45.5, 47.7)
Johnson Transformation       0.5     -2.4771         0.28756  (-3.0, -1.9)

In these results, the 3-parameter Weibull distribution and the largest extreme value distribution both provide a reasonable fit for the data based on the probability plots and p-values (not shown). For the 3-parameter Weibull distribution, you can expect 1% of the data to fall below 46.8668. For the largest extreme value distribution, you can expect 1% of the data to fall below 46.1898. With a lower specification limit of 46.2, the largest extreme value distribution gives the lower, and therefore more conservative, estimate for the lower tail. If that difference matters for your application, you might select that distribution.
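The comparison in this example can be expressed as a small rule. For a lower specification limit, the candidate whose lower-tail percentile is smallest is the conservative choice, because it predicts a larger share of output in the lower tail near that percent. A sketch using the 1% values quoted in the paragraph above; the function name is hypothetical.

```python
def most_conservative_lower_tail(percentiles):
    """Return (name, value) for the distribution with the smallest lower-tail
    percentile: the cautious choice when checking against a lower spec limit."""
    return min(percentiles.items(), key=lambda item: item[1])

# 1% percentiles for the two well-fitting distributions, from the text.
one_percent = {"3-Parameter Weibull": 46.8668, "Largest Extreme Value": 46.1898}
lsl = 46.2  # lower specification limit from the example

name, value = most_conservative_lower_tail(one_percent)
print(name, value, value < lsl)  # Largest Extreme Value 46.1898 True
```

Only the largest extreme value fit puts its 1% point below the limit of 46.2, which is why it is the more cautious basis for a capability assessment here.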

Note

The values for the Box-Cox and Johnson transformations are based on the transformed values rather than on the raw data, which makes the percentiles difficult to interpret.
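If you need a transformed-scale percentile on the original scale, a monotone transformation can be undone directly, because percentiles pass through increasing transformations unchanged in rank. The sketch below assumes the Box-Cox form w = x**lam (and w = ln(x) when lam = 0). The lambda value and the function names are hypothetical, and the Johnson transformation, which has a more involved form, is not covered.

```python
import math

def box_cox(x, lam):
    """Assumed Box-Cox form: x**lam, or ln(x) when lam == 0."""
    return math.log(x) if lam == 0 else x ** lam

def back_transform_percentile(w_p, lam):
    """Map a percentile of the transformed data back to the original scale.
    For lam >= 0 the transform is increasing, so the P% point of the
    transformed data is the transform of the P% point of the raw data.
    For lam < 0 the transform is decreasing and the tails swap, so this
    sketch refuses that case."""
    if lam < 0:
        raise ValueError("sketch only handles lam >= 0")
    return math.exp(w_p) if lam == 0 else w_p ** (1.0 / lam)

lam = 0.5                   # hypothetical lambda
w_p = box_cox(46.0, lam)    # a percentile reported on the transformed scale
print(back_transform_percentile(w_p, lam))  # approximately 46.0
```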

Standard error of the percentile

The standard error of the percentile estimates the variability between the sample percentiles that you would obtain if you took repeated samples from the same population. Whereas a standard error estimates the variability between samples, the standard deviation measures the variability within a single sample.
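To see what this standard error measures, you can simulate the repeated sampling it describes. The sketch below draws many samples from a normal population, estimates the 1st percentile from each as mean + z_p * SD (a normal-model estimate chosen for simplicity; estimates for other distributions are computed differently), and reports the standard deviation of those estimates. The population mean, sample sizes, and number of repetitions are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def percentile_estimate(sample, p):
    """Normal-model percentile estimate: sample mean + z_p * sample SD."""
    return mean(sample) + NormalDist().inv_cdf(p) * stdev(sample)

def simulated_se(n, sigma, p=0.01, reps=500, seed=1):
    """Spread (standard deviation) of the percentile estimate across
    `reps` independent samples of size n from a normal(50, sigma) population."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(50.0, sigma) for _ in range(n)]
        estimates.append(percentile_estimate(sample, p))
    return stdev(estimates)

print(simulated_se(n=25, sigma=1.0))
print(simulated_se(n=400, sigma=1.0))   # larger samples: smaller spread
print(simulated_se(n=25, sigma=2.0))    # more variable data: larger spread
```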

Interpretation

Use the standard error of the percentile to determine how precisely the sample percentile estimates the population percentile for each distribution.

A smaller value of the standard error indicates a more precise estimate of the population percentile. Usually, a larger standard deviation results in a larger standard error and a less precise estimate of the population percentile. A larger sample size results in a smaller standard error and a more precise estimate of the population percentile.
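For a normal model, a standard large-sample approximation makes both effects explicit: the estimate mean + z_p * sigma has variance of roughly sigma**2 / n + z_p**2 * sigma**2 / (2n), so SE is about sigma * sqrt((1 + z_p**2 / 2) / n). This sketch assumes that normal-model formula; it is not necessarily the formula Minitab uses for every distribution in the table.

```python
import math
from statistics import NormalDist

def approx_se_normal_percentile(sigma, n, p):
    """Large-sample SE of (mean + z_p * SD) for normal data."""
    z = NormalDist().inv_cdf(p)
    return sigma * math.sqrt((1.0 + z * z / 2.0) / n)

base = approx_se_normal_percentile(sigma=1.0, n=100, p=0.01)
print(approx_se_normal_percentile(1.0, 400, 0.01) / base)  # 4x the data: half the SE
print(approx_se_normal_percentile(2.0, 100, 0.01) / base)  # 2x the SD: double the SE
```

The SE is proportional to sigma and to 1 / sqrt(n), which matches the direction of both statements above.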

Minitab uses the standard error of the percentile to calculate the confidence interval, which is a range of values for the population percentile.

CI of percentiles

The confidence interval provides a range of likely values for a population percentile. A confidence interval is defined by a lower bound and an upper bound. The bounds are calculated by determining a margin of error for the sample estimate of the percentile. The lower confidence bound defines a value that the percentile is likely to be greater than. The upper confidence bound defines a value that the percentile is likely to be less than.
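A common way to form the margin of error is z times the standard error, giving estimate ± z * SE. Applied to the Normal, Smallest Extreme Value, Largest Extreme Value, and Logistic rows of the table of percentiles, this reproduces the displayed bounds to one decimal place. For the Weibull, Lognormal, and 3-parameter rows it does not, so a different calculation presumably applies there. This is an inference from the published numbers, not documented Minitab behavior, and the function name is illustrative.

```python
from statistics import NormalDist

def normal_approx_ci(estimate, se, confidence=0.95):
    """Interval estimate +/- z * SE, with z the upper (1 - confidence)/2 point."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * se  # margin of error
    return estimate - margin, estimate + margin

# (percentile, standard error) for the 0.5% rows of the table of percentiles
rows = {
    "Normal": (43.6604, 0.81715),
    "Smallest Extreme Value": (36.5526, 1.76758),
    "Largest Extreme Value": (45.8856, 0.43646),
    "Logistic": (42.1299, 1.03294),
}
for name, (est, se) in rows.items():
    lo, hi = normal_approx_ci(est, se)
    print(f"{name}: ({lo:.1f}, {hi:.1f})")
```

The printed intervals are (42.1, 45.3), (33.1, 40.0), (45.0, 46.7), and (40.1, 44.2), the same as the table.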

Interpretation

Because samples of data are random, two samples collected from your process are unlikely to yield identical estimates of a percentile. To calculate the actual value of the percentile for your process, you would need to analyze the data for all the items that the process produces, which is not feasible. Instead, you can use a confidence interval to determine a range of likely values for the percentile.

At a 95% confidence level, you can be 95% confident that the actual value of the percentile is contained within the confidence interval. That is, if you collect 100 random samples from your process, you can expect approximately 95 of the samples to produce intervals that contain the actual value of the percentile.
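The repeated-sampling interpretation can be checked by simulation. This sketch assumes normal data with a hypothetical mean of 50, SD of 2, and sample size of 50. It estimates the 1st percentile from each sample, builds a 95% interval from the large-sample normal-model standard error, and counts how often the interval contains the true percentile. It uses 1,000 samples rather than 100 so that the fraction is less noisy; expect a value near, but not exactly, 0.95.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

def coverage(n=50, p=0.01, reps=1000, seed=7, mu=50.0, sigma=2.0):
    """Fraction of repeated samples whose 95% interval contains the true percentile."""
    rng = random.Random(seed)
    z_p = NormalDist().inv_cdf(p)
    z_ci = NormalDist().inv_cdf(0.975)
    true_value = mu + z_p * sigma
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m, s = mean(sample), pstdev(sample)              # maximum-likelihood estimates
        est = m + z_p * s                                # percentile estimate
        se = s * math.sqrt((1.0 + z_p ** 2 / 2.0) / n)   # large-sample SE
        if est - z_ci * se <= true_value <= est + z_ci * se:
            hits += 1
    return hits / reps

print(coverage())
```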

The width of a confidence interval tends to decrease with larger sample sizes or less variability in the data. A narrow confidence interval indicates that the sample estimate is reliable and not likely to be strongly influenced by variability due to random sampling. If the confidence interval for a percentile is wide, be cautious when using the percentile point estimate to draw conclusions about your process. If the confidence interval is wide, you may want to base your estimate of the percentile value on the lower bound or the upper bound of the confidence interval, whichever produces the more conservative results for your application.
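Which bound is "more conservative" depends on the side of the specification. For a lower limit, a smaller lower-tail percentile implies more product below the limit, so the lower bound is the cautious value; for an upper limit, the upper bound of an upper-tail percentile is. A sketch using the largest extreme value example from this section (1% estimate 46.1898, 95% CI (45.4, 47)); the function name is hypothetical.

```python
def conservative_bound(ci, spec_side):
    """Return the confidence bound to use for a cautious percentile estimate.
    spec_side is "lower" for a lower spec limit (lower-tail percentile) or
    "upper" for an upper spec limit (upper-tail percentile)."""
    lower, upper = ci
    if spec_side == "lower":
        return lower
    if spec_side == "upper":
        return upper
    raise ValueError("spec_side must be 'lower' or 'upper'")

print(conservative_bound((45.4, 47.0), "lower"))  # 45.4
```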


Table of Percentiles

Distribution             Percent  Percentile  Standard Error  95.0% CI
Normal                       0.5     43.6604         0.81715  (42.1, 45.3)
Box-Cox Transformation       0.5      0.0000         0.00000  (0.0, 0.0)
Lognormal                    0.5     44.1612         0.70063  (42.8, 45.6)
3-Parameter Lognormal        0.5     46.3662         0.51400  (45.4, 47.4)
Exponential                  0.5      0.2545         0.03600  (0.2, 0.3)
2-Parameter Exponential      0.5     46.7391         0.00288  (46.7, 46.7)
Weibull                      0.5     38.7359         1.31065  (36.3, 41.4)
3-Parameter Weibull          0.5     46.7913         0.17247  (46.7, 47.1)
Smallest Extreme Value       0.5     36.5526         1.76758  (33.1, 40.0)
Largest Extreme Value        0.5     45.8856         0.43646  (45.0, 46.7)
Gamma                        0.5     44.0724         0.72433  (42.7, 45.5)
3-Parameter Gamma            0.5     46.4331         0.17091  (46.1, 46.8)
Logistic                     0.5     42.1299         1.03294  (40.1, 44.2)
Loglogistic                  0.5     42.8370         0.86658  (41.2, 44.6)
3-Parameter Loglogistic      0.5     46.2924         0.70522  (45.5, 47.7)
Johnson Transformation       0.5     -2.4771         0.28756  (-3.0, -1.9)
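For the positive-valued Lognormal, Weibull, and Gamma rows of this table, a plain estimate ± z * SE interval does not reproduce the displayed bounds (for Weibull it gives (36.2, 41.3), not (36.3, 41.4)). An interval built on the log of the percentile and mapped back, estimate * exp(± z * SE / estimate), where SE / estimate approximates the standard error of the log percentile (delta method), does match these three rows to one decimal place. This is an inference from the published numbers, not documented Minitab behavior, and it does not reproduce the 3-parameter (threshold) rows.

```python
import math
from statistics import NormalDist

def log_scale_ci(estimate, se, confidence=0.95):
    """Interval on log(percentile), mapped back to the data scale.
    SE of the log is approximated by se / estimate; both bounds stay positive."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    factor = math.exp(z * se / estimate)
    return estimate / factor, estimate * factor

# (percentile, standard error) for the 0.5% rows of the table
rows = {
    "Lognormal": (44.1612, 0.70063),
    "Weibull": (38.7359, 1.31065),
    "Gamma": (44.0724, 0.72433),
}
for name, (est, se) in rows.items():
    lo, hi = log_scale_ci(est, se)
    print(f"{name}: ({lo:.1f}, {hi:.1f})")
```

The resulting interval is asymmetric around the estimate, which is why the table's bounds for these rows are not centered on the percentile.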

In these results, using the largest extreme value distribution, you can expect 1% of the data to fall below 46.1898 based on the sample estimate. The 95% confidence interval is (45.4, 47). Suppose the lower specification limit for a process is 47. To be cautious, you may want to use the lower bound (45.4) of the confidence interval as the percentile estimate. Using the lower bound, you can expect 1% of the data to fall below 45.4, which provides a more conservative estimate in this situation.

Note

The values for the Box-Cox and Johnson transformations are based on the transformed values rather than on the raw data, which makes the percentiles difficult to interpret.
