Interpret the key results for Histogram

Complete the following steps to interpret a histogram.

Step 1: Assess the key characteristics

Examine the distribution of your sample data, including the peaks, spread, and symmetry. Assess how the sample size may affect the appearance of the histogram.

Peaks and spread

Identify the peaks, which are the tallest clusters of bars. The peaks represent the most common values. Assess the spread of your sample to understand how much your data varies.

For example, in the following histogram of customer wait times, the peak of the data occurs at about 6 minutes. The data spread is from about 2 minutes to 12 minutes.

Investigate any surprising or undesirable characteristics on the histogram. For example, the histogram of customer wait times showed a spread that is wider than expected. An investigation revealed that a software update to the computers caused delays that lengthened customer wait times.
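
The peak and spread can also be read off numerically. The following is a minimal sketch that uses only the Python standard library. The wait times, the 1-minute bin width, and the names wait_times, bin_counts, and BIN_WIDTH are assumptions made for illustration; they are not taken from the example above and this is not how Minitab chooses bins.

```python
from collections import Counter

# Hypothetical wait times in minutes (illustrative data, not from the source).
wait_times = [2.1, 3.4, 4.2, 4.8, 5.1, 5.5, 5.7, 6.0, 6.1, 6.2,
              6.3, 6.4, 6.6, 6.8, 7.1, 7.5, 8.0, 8.9, 10.2, 11.8]

BIN_WIDTH = 1.0  # assumed bin width of 1 minute


def bin_counts(values, width):
    """Count how many values fall in each bin [k*width, (k+1)*width)."""
    return Counter(int(v // width) for v in values)


counts = bin_counts(wait_times, BIN_WIDTH)
peak_bin = max(counts, key=counts.get)        # index of the tallest bar
peak_range = (peak_bin * BIN_WIDTH, (peak_bin + 1) * BIN_WIDTH)
spread = (min(wait_times), max(wait_times))   # smallest and largest value

# Text rendering: one '#' per observation in each bin.
for k in range(min(counts), max(counts) + 1):
    low, high = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
    print(f"{low:4.0f} to {high:<4.0f} {'#' * counts.get(k, 0)}")

print("Peak bin:", peak_range)
print("Spread:", spread)
```

With this made-up sample, the tallest bar is the 6 to 7 minute bin and the values range from 2.1 to 11.8 minutes.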

Symmetry

When data are skewed, the majority of the data are located on one side of the histogram.
Right skew
The data in the following graph are right-skewed. Most of the sample values are clustered on the left side of the histogram, and a long tail extends to the right.
Left skew
The data in the following graph are left-skewed. Most of the sample values are clustered on the right side of the histogram, and a long tail extends to the left.
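
One way to put a number on the direction of skew is the moment-based sample skewness, which is positive when the longer tail is on the right and negative when it is on the left. The sketch below is illustrative only: the three samples and the function name sample_skewness are assumptions, and other skewness formulas (with small-sample corrections) give slightly different values.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples (illustrative only).
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]  # long tail toward high values
left_skewed = [-v for v in right_skewed]         # mirror image of the sample above
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for name, data in [("right", right_skewed),
                   ("left", left_skewed),
                   ("symmetric", symmetric)]:
    print(f"{name:9s} skewness = {sample_skewness(data):+.2f}")
```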

Some theoretical distributions, such as the normal distribution, are symmetric. Other theoretical distributions, such as the exponential distribution and the lognormal distribution, are right skewed. The Weibull distribution can be symmetric, right skewed, or left skewed. The skew of a Weibull distribution is determined by the value of the shape parameter. For more information, go to Weibull distribution.

Sample size (N)

A histogram works best when the sample size is at least 20. If the sample size is too small, each bar on the histogram may not contain enough data points to accurately show the distribution of the data. If the sample size is less than 20, consider using Individual Value Plot instead.

For example, although the following histograms seem quite different, both of them were created using randomly selected samples of data from the same population.

[Figures: histogram of a sample with N = 20; histogram of a sample with N = 100]
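
The effect of sample size can be reproduced with a small simulation. This is a sketch under stated assumptions: the population (normal with mean 50 and standard deviation 10), the seed, the bin width of 5, and the helper names draw_sample and text_histogram are all hypothetical choices for illustration.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def draw_sample(n):
    """Draw n values from the assumed population: normal, mean 50, sd 10."""
    return [rng.gauss(50, 10) for _ in range(n)]


def text_histogram(values, width=5):
    """Return {bin start: count} for bins of the given width, sorted by bin."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


small = draw_sample(20)
large = draw_sample(100)

for label, sample in [("N = 20", small), ("N = 100", large)]:
    print(label)
    for start, count in text_histogram(sample).items():
        print(f"{start:4d} {'#' * count}")
```

Both samples come from the same population, so differences in the printed shapes are due to sampling variation alone.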

Step 2: Look for multiple modes and outliers

Multiple peaks (also called modes) often indicate that important variables are not yet accounted for. Outliers may indicate data-entry errors, measurement errors, or abnormal one-time events.

Multiple modes

Multi-modal data have more than one peak. Multiple peaks often appear when the sample combines groups that differ from each other, which means that an important variable is not yet accounted for.

For example, a bank manager creates a histogram of customer wait times from two bank locations and notices that the histogram has two peaks. The manager creates another histogram to show the data for each location as a separate group. The histogram with groups confirms that the two peaks in the original histogram correspond to a difference in mean wait times between the two locations.
[Figures: histogram of wait times; histogram of wait times by location]
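
The same check can be sketched in code: plot the combined data, then plot each group separately and compare the group means. The wait times, location names, and the helper render below are hypothetical and chosen only to show two well-separated peaks.

```python
import statistics
from collections import Counter

# Hypothetical wait times (minutes) for two bank locations; illustrative only.
wait_by_location = {
    "Location A": [3, 4, 4, 5, 5, 5, 6, 6, 7],
    "Location B": [9, 10, 10, 11, 11, 11, 12, 12, 13],
}


def render(values):
    """Return a text histogram with one row per whole minute."""
    counts = Counter(values)
    return "\n".join(f"{v:3d} {'#' * counts.get(v, 0)}"
                     for v in range(min(values), max(values) + 1))


# Combined histogram: two peaks with an empty gap between them.
combined = [t for times in wait_by_location.values() for t in times]
print("All locations")
print(render(combined))

# Grouped histograms: each location has a single peak around its own mean.
group_means = {loc: statistics.fmean(times)
               for loc, times in wait_by_location.items()}
for loc, times in wait_by_location.items():
    print(f"{loc} (mean = {group_means[loc]:.1f} minutes)")
    print(render(times))
```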

Outliers

Outliers, which are data values that are far away from other data values, can strongly affect your results. Often, outliers are easiest to identify on a boxplot. On a histogram, isolated bars at the ends identify outliers.

Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values that are associated with abnormal, one-time events (special causes). Then, repeat the analysis.
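
Because boxplots are mentioned above, the sketch below flags candidate outliers with the 1.5 × IQR fences that boxplots commonly use. That rule, the data, and the variable names are assumptions made for illustration. A flagged value is only a candidate: remove it only after you find a cause such as an error or a special-cause event.

```python
import statistics

# Hypothetical measurements with one suspiciously large value (illustrative only).
values = [5, 6, 6, 7, 7, 7, 8, 8, 9, 25]

# Quartiles from the standard library (default "exclusive" method).
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Common boxplot convention: points beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
cleaned = [v for v in values if lower_fence <= v <= upper_fence]

print("Candidate outliers:", outliers)
print(f"Mean with all values:      {statistics.fmean(values):.2f}")
print(f"Mean without the outliers: {statistics.fmean(cleaned):.2f}")
```

Comparing the two means shows how strongly a single extreme value can affect a result.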

Step 3: Fit a theoretical distribution

You can add a fitted distribution line to assess whether your data follow a specific theoretical distribution, such as the normal distribution. For more information, go to Customize the histogram and click "Distribution Fit".

Tip

Use Distribution Plot to create and compare theoretical distributions and to see how changing the population parameters affects the shape of each distribution.

Distribution fit

Minitab uses the data in your sample to estimate the parameters for the fitted distribution line. For example, if you fit a normal distribution, Minitab estimates the mean and the standard deviation from your sample. Evaluate how closely the heights of the bars follow the shape of the line. Data that fit the distribution well have bars that closely follow the line.

[Figures: histogram with a fitted line showing a good fit; histogram with a fitted line showing a poor fit]
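
The comparison between bars and fitted line can be sketched numerically: estimate the mean and standard deviation from the sample, then compare the observed count in each bin with the count that the fitted normal distribution predicts. The data, the bin edges, and the variable names are assumptions for illustration, and this sketch does not reproduce Minitab's own calculations.

```python
from statistics import NormalDist

# Hypothetical sample (illustrative only).
data = [4.1, 4.9, 5.2, 5.6, 5.8, 6.0, 6.1, 6.3, 6.4, 6.5,
        6.6, 6.8, 6.9, 7.1, 7.2, 7.5, 7.7, 8.1, 8.6, 9.3]

# Fit a normal distribution using the sample mean and sample standard deviation.
fitted = NormalDist.from_samples(data)

edges = [4, 5, 6, 7, 8, 9, 10]  # assumed bin edges
rows = []
for lo, hi in zip(edges, edges[1:]):
    observed = sum(lo <= v < hi for v in data)            # bar height
    expected = len(data) * (fitted.cdf(hi) - fitted.cdf(lo))  # fitted-line height
    rows.append((lo, hi, observed, expected))
    print(f"[{lo:2d}, {hi:2d})  observed = {observed:2d}  expected = {expected:5.2f}")

print(f"Fitted mean = {fitted.mean:.3f}, fitted standard deviation = {fitted.stdev:.3f}")
```

Bins where the observed and expected counts differ by a large amount correspond to bars that do not follow the fitted line.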

Anderson-Darling test

To determine whether the data do not follow the specified distribution, compare the p-value to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the data do not follow the specified distribution when the data actually do follow it.
P-value ≤ α: The data do not follow the specified distribution (Reject H0)
If the p-value is less than or equal to the significance level, the decision is to reject the null hypothesis and conclude that your data do not follow the specified distribution.
P-value > α: Cannot conclude the data do not follow the specified distribution (Fail to reject H0)
If the p-value is larger than the significance level, the decision is to fail to reject the null hypothesis because you do not have enough evidence to conclude that your data do not follow the specified distribution. However, you cannot conclude that the data do follow the specified distribution.
[Output: Anderson-Darling Test table that lists the AD-Value and the P-Value. Key result: P-Value]

In these results, the null hypothesis states that the data follow a normal distribution. Because the p-value is 0.4631, which is greater than the significance level of 0.05, the decision is to fail to reject the null hypothesis. You cannot conclude that the data do not follow a normal distribution.
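
To make the decision rule concrete, the following sketch computes the Anderson-Darling statistic for a normal distribution whose mean and standard deviation are estimated from the sample, then applies a widely cited piecewise approximation for the p-value (attributed to D'Agostino and Stephens). Whether Minitab uses the same approximation is an assumption, so values may differ slightly from Minitab output. The function name anderson_darling_normal and both samples are hypothetical.

```python
import math
from statistics import NormalDist


def anderson_darling_normal(values):
    """Return (A2, approximate p-value) for H0: the data follow a normal distribution."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist.from_samples(x)  # sample mean and sample standard deviation
    cdf = [fitted.cdf(v) for v in x]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    # Small-sample adjustment used when the parameters are estimated.
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    # Piecewise approximation for the p-value (assumed, see the text above).
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2, p


ALPHA = 0.05
n = 50
# Hypothetical samples built from evenly spaced quantiles (illustrative only).
bell_shaped = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
right_skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]  # exponential shape

for name, sample in [("bell-shaped", bell_shaped), ("right-skewed", right_skewed)]:
    a2, p = anderson_darling_normal(sample)
    decision = "reject H0" if p <= ALPHA else "fail to reject H0"
    print(f"{name:12s} AD = {a2:.3f}  p = {p:.4f}  ->  {decision}")
```

Note that the sketch takes logarithms of fitted probabilities, so it fails for values so extreme that the fitted cumulative probability rounds to exactly 0 or 1.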

Step 4: Assess and compare groups

If your histogram has groups, assess and compare the center and spread of the groups.

Centers

Look for differences between the centers of the groups. For example, the following histograms show the completion time for three versions of a credit card application. The center for each version of the credit card application is in a different location. The differences in the locations indicate that the mean completion times are different.

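To determine whether a difference in means is statistically significant, do one of the following:
If you have only two groups, use a 2-sample t-test.
If you have three or more groups, use one-way ANOVA.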

Spreads

Look for differences between the spreads of the groups. For example, the following histograms show the weights of jars that were filled by three machines. Although the histograms have almost the same center, some histograms are wider and more spread out. The wider spread indicates that those machines fill jars less consistently.
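
A numeric summary per group can support the visual comparison of centers and spreads. The sketch below is illustrative only: the jar weights, machine names, and the variable summary are assumptions, and the standard deviation is just one possible measure of spread. The output describes the samples and does not test whether the differences are statistically significant.

```python
import statistics

# Hypothetical jar weights (grams) from three filling machines; illustrative only.
jar_weights = {
    "Machine 1": [498, 499, 500, 500, 500, 501, 502],
    "Machine 2": [494, 497, 499, 500, 501, 503, 506],
    "Machine 3": [488, 494, 498, 500, 502, 506, 512],
}

# Center (mean) and spread (sample standard deviation) for each group.
summary = {
    machine: (statistics.fmean(weights), statistics.stdev(weights))
    for machine, weights in jar_weights.items()
}

for machine, (mean, sd) in summary.items():
    print(f"{machine}: mean = {mean:.1f} g, standard deviation = {sd:.2f} g")
```

In this made-up data, all three machines have the same mean, but the standard deviation grows from Machine 1 to Machine 3, which matches a set of histograms with the same center and increasing width.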
