# Interpret all statistics and graphs for Graphical Summary

Find definitions and interpretation guidance for every statistic and graph that is provided with the graphical summary.

## A-Squared

The Anderson-Darling goodness-of-fit statistic (A-Squared) measures the area between the fitted line (based on the normal distribution) and the empirical distribution function (which is based on the data points). The Anderson-Darling statistic is a squared distance that is weighted more heavily in the tails of the distribution.
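
As a rough illustration of the definition above, the following sketch computes the textbook Anderson-Darling statistic against a normal distribution whose mean and standard deviation are estimated from the sample. The function name `anderson_darling_normal` and both samples are hypothetical, and Minitab may apply adjustments that this sketch does not reproduce.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Anderson-Darling statistic for a normal fit (parameters estimated from the data)."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    total = 0.0
    for i in range(1, n + 1):
        cdf_low = fitted.cdf(x[i - 1])   # fitted CDF at the i-th smallest value
        cdf_high = fitted.cdf(x[n - i])  # fitted CDF at the i-th largest value
        # The log terms grow quickly near 0 and 1, which weights the tails more heavily.
        total += (2 * i - 1) * (math.log(cdf_low) + math.log(1.0 - cdf_high))
    return -n - total / n


# Hypothetical samples: evenly spaced normal quantiles, and a right-skewed transform of them.
near_normal = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
right_skewed = [math.exp(v) for v in near_normal]

a2_normal = anderson_darling_normal(near_normal)
a2_skewed = anderson_darling_normal(right_skewed)
print(a2_normal, a2_skewed)
```

The near-normal sample should give the smaller statistic. Extreme values whose fitted CDF rounds to exactly 0 or 1 would make the logarithm fail, so a production version would need to guard against that.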

### Interpretation

Minitab uses the Anderson-Darling statistic to calculate the p-value. The p-value is a probability that measures the evidence against the null hypothesis. A smaller p-value provides stronger evidence against the null hypothesis. A smaller value for the Anderson-Darling statistic indicates that the data follow the normal distribution more closely.

## P-Value

The p-value is a probability that measures the evidence against the null hypothesis. A smaller p-value provides stronger evidence against the null hypothesis.

### Interpretation

Use the p-value to determine whether the data do not follow a normal distribution.

To determine whether the data do not follow a normal distribution, compare the p-value to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the data do not follow a normal distribution when the data do follow a normal distribution.

P-value ≤ α: The data do not follow a normal distribution (Reject H0)

If the p-value is less than or equal to the significance level, the decision is to reject the null hypothesis and conclude that your data do not follow a normal distribution.

P-value > α: You cannot conclude that the data do not follow a normal distribution (Fail to reject H0)

If the p-value is larger than the significance level, the decision is to fail to reject the null hypothesis. You do not have enough evidence to conclude that your data do not follow a normal distribution.
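
The decision rule above can be sketched as a small function. The name `normality_decision` and its return strings are hypothetical; only the comparison against α comes from the text.

```python
def normality_decision(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if p_value <= alpha:
        return "Reject H0: the data do not follow a normal distribution"
    return "Fail to reject H0: cannot conclude that the data do not follow a normal distribution"


print(normality_decision(0.03))
print(normality_decision(0.20))
```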

## Mean

The mean is the average of the data, which is the sum of all the observations divided by the number of observations.

For example, the wait times (in minutes) of five customers in a bank are: 3, 2, 4, 1, and 2. The mean waiting time is calculated as follows:

$$\bar{x} = \frac{3 + 2 + 4 + 1 + 2}{5} = \frac{12}{5} = 2.4$$

On average, a customer waits 2.4 minutes for service at the bank.

### Interpretation

Use the mean to describe the sample with a single value that represents the center of the data. Many statistical analyses use the mean as a standard measure of the center of the distribution of the data.

The median and the mean both measure central tendency. But unusual values, called outliers, typically affect the median less than they affect the mean. If your data are symmetric, the mean and median are similar.
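
To see how an outlier affects the two measures differently, the sketch below reuses the five bank wait times and appends one extreme value. The value 30 is a hypothetical outlier added only for illustration.

```python
from statistics import mean, median

wait_times = [3, 2, 4, 1, 2]      # wait times from the bank example
with_outlier = wait_times + [30]  # hypothetical extreme wait time

print(mean(wait_times), median(wait_times))
print(mean(with_outlier), median(with_outlier))
```

Adding the single extreme value moves the mean from 2.4 to 7, while the median moves only from 2 to 2.5.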

## StDev

The standard deviation is the most common measure of dispersion, or how spread out the data are about the mean. The symbol σ (sigma) is often used to represent the standard deviation of a population, while s is used to represent the standard deviation of a sample. Variation that is random or natural to a process is often referred to as noise.

Because the standard deviation is in the same units as the data, it is usually easier to interpret than the variance.

### Interpretation

Use the standard deviation to determine how spread out the data are from the mean. A higher standard deviation value indicates greater spread in the data. A good rule of thumb for a normal distribution is that approximately 68% of the values fall within one standard deviation of the mean, 95% of the values fall within two standard deviations, and 99.7% of the values fall within three standard deviations.
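
The rule of thumb above can be checked against the normal cumulative distribution function. This is a minimal sketch using Python's standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the rule of thumb rounds to 68%, 95%, and 99.7%.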

The standard deviation can also be used to establish a benchmark for estimating the overall variation of a process.

## Variance

The variance measures how spread out the data are about their mean. The variance is equal to the standard deviation squared.

### Interpretation

The greater the variance, the greater the spread in the data.

Because variance (σ²) is a squared quantity, its units are also squared, which may make the variance difficult to use in practice. The standard deviation is usually easier to interpret because it's in the same units as the data. For example, a sample of waiting times at a bus stop may have a mean of 15 minutes and a variance of 9 minutes². Because the variance is not in the same units as the data, the variance is often displayed with its square root, the standard deviation. A variance of 9 minutes² is equivalent to a standard deviation of 3 minutes.
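
The relationship between the two statistics can be sketched with a small sample. The three wait times below are hypothetical, chosen only so that the mean is 15 minutes and the sample variance is 9 minutes², as in the bus stop example.

```python
from statistics import mean, stdev, variance

bus_waits = [12, 15, 18]  # hypothetical wait times in minutes

sample_mean = mean(bus_waits)
sample_variance = variance(bus_waits)  # in squared units (minutes squared)
sample_stdev = stdev(bus_waits)        # square root of the variance, in minutes

print(sample_mean, sample_variance, sample_stdev)
```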

## Skewness

Skewness is the extent to which the data are not symmetrical.
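
As an illustration, the sketch below computes one common bias-adjusted sample skewness. The source does not state which formula Minitab uses, so the formula, the function name `sample_skewness`, and both samples are assumptions.

```python
from statistics import mean, stdev


def sample_skewness(data):
    """Bias-adjusted sample skewness (assumed formula, not confirmed by the source)."""
    n = len(data)
    center = mean(data)
    s = stdev(data)
    cubed = sum(((x - center) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric sample
right_skewed = [1, 2, 2, 3, 10]  # hypothetical sample with a long right tail

print(sample_skewness(symmetric), sample_skewness(right_skewed))
```

A value near zero suggests symmetric data, and a positive value suggests a longer right tail.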

## Kurtosis

Kurtosis indicates how the tails of a distribution differ from the normal distribution.

### Interpretation

Use kurtosis to initially understand general characteristics about the distribution of your data.
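
The sketch below computes one common bias-adjusted excess kurtosis, for which a normal distribution has a value near 0. The source does not state which formula Minitab uses, so the formula, the function name `sample_excess_kurtosis`, and both samples are assumptions.

```python
from statistics import mean, stdev


def sample_excess_kurtosis(data):
    """Bias-adjusted excess kurtosis (assumed formula, not confirmed by the source)."""
    n = len(data)
    center = mean(data)
    s = stdev(data)
    fourth = sum(((x - center) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


flat = list(range(1, 11))                        # hypothetical evenly spread sample
heavy_tails = [-10, -1, 0, 0, 0, 0, 0, 0, 1, 10]  # hypothetical sample with extreme tails

print(sample_excess_kurtosis(flat), sample_excess_kurtosis(heavy_tails))
```

The evenly spread sample gives a negative value (lighter tails than a normal distribution), and the sample with extreme values gives a positive value (heavier tails).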

## N

N is the number of non-missing values in the sample.

In this example, there are 141 recorded observations (N) and 8 missing values (N*), for a total count of 149.

| Total count | N | N* |
|---|---|---|
| 149 | 141 | 8 |
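
The three counts can be sketched as follows. The readings are hypothetical, and using `None` to mark a missing value is an assumption of this example.

```python
readings = [4.1, None, 3.8, 5.0, None, 4.4]  # hypothetical data; None marks a missing value

total_count = len(readings)
n = sum(1 for value in readings if value is not None)  # non-missing values (N)
n_missing = total_count - n                            # missing values (N*)

print(total_count, n, n_missing)
```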

## Minimum

The minimum is the smallest data value.

In these data, the minimum is 7.

 13 17 18 19 12 10 7 9 14

### Interpretation

Use the minimum to identify a possible outlier or a data-entry error. One of the simplest ways to assess the spread of your data is to compare the minimum and maximum. If the minimum value is very low, even when you consider the center, the spread, and the shape of the data, investigate the cause of the extreme value.

## 1st Quartile

Quartiles are the three values that divide a sample of ordered data into four equal parts: the 1st quartile at 25% (Q1), the 2nd quartile at 50% (Q2, or the median), and the 3rd quartile at 75% (Q3).

The 1st quartile is the 25th percentile and indicates that 25% of the data are less than or equal to this value.
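
A minimal sketch of computing all three quartiles with Python's standard library, using the nine values shown in the Minimum section. The `exclusive` method interpolates at positions based on N + 1; whether this matches Minitab's quartile method exactly is an assumption.

```python
from statistics import quantiles

data = [13, 17, 18, 19, 12, 10, 7, 9, 14]

# n=4 splits the ordered data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

print(q1, q2, q3)
```

For these values the cut points are 9.5, 13.0, and 17.5, so 13 is also the median.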

## Median

The median is the midpoint of the data set. This midpoint value is the point at which half the observations are above the value and half the observations are below the value. The median is determined by ranking the observations and finding the observation at position (N + 1) / 2 in the ranked order. If the number of observations is even, then the median is the average of the observations ranked at positions N / 2 and (N / 2) + 1.
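
The ranking rule above can be sketched directly. The function name `median_by_rank` is hypothetical; the odd-sized sample reuses the nine values from the Minimum section, and the even-sized sample is made up for illustration.

```python
def median_by_rank(data):
    """Median from ranked observations, following the (N + 1) / 2 rule."""
    ranked = sorted(data)
    n = len(ranked)
    if n % 2 == 1:
        return ranked[(n + 1) // 2 - 1]  # rank (N + 1) / 2, as a 0-based index
    lower = ranked[n // 2 - 1]           # rank N / 2
    upper = ranked[n // 2]               # rank (N / 2) + 1
    return (lower + upper) / 2


odd_sample = [13, 17, 18, 19, 12, 10, 7, 9, 14]
even_sample = [7, 9, 10, 12]  # hypothetical even-sized sample

print(median_by_rank(odd_sample), median_by_rank(even_sample))
```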

### Interpretation

The median and the mean both measure central tendency. But unusual values, called outliers, typically affect the median less than they affect the mean. If your data are symmetric, the mean and median are similar.

## 3rd Quartile

Quartiles are the three values that divide a sample of ordered data into four equal parts: the 1st quartile at 25% (Q1), the 2nd quartile at 50% (Q2, or the median), and the 3rd quartile at 75% (Q3).

The 3rd quartile is the 75th percentile and indicates that 75% of the data are less than or equal to this value.

## Maximum

The maximum is the largest data value.

In these data, the maximum is 19.

 13 17 18 19 12 10 7 9 14

### Interpretation

Use the maximum to identify a possible outlier or a data-entry error. One of the simplest ways to assess the spread of your data is to compare the minimum and maximum. If the maximum value is very high, even when you consider the center, the spread, and the shape of the data, investigate the cause of the extreme value.
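
Comparing the minimum and maximum, as suggested above, can be sketched with the nine example values. The variable names are hypothetical.

```python
data = [13, 17, 18, 19, 12, 10, 7, 9, 14]

smallest = min(data)
largest = max(data)
data_range = largest - smallest  # a simple first look at the spread

print(smallest, largest, data_range)
```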

## Confidence Interval

The confidence interval provides a range of likely values for the population parameter. Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. But, if you repeated your sample many times, a certain percentage of the resulting confidence intervals or bounds would contain the unknown population parameter. The percentage of these confidence intervals or bounds that contain the parameter is the confidence level of the interval. For example, a 95% confidence level indicates that if you take 100 random samples from the population, you could expect approximately 95 of the samples to produce intervals that contain the population parameter.

An upper bound defines a value that the population parameter is likely to be less than. A lower bound defines a value that the population parameter is likely to be greater than.

The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size. For more information, go to Ways to get a more precise confidence interval.
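
The meaning of the confidence level can be illustrated with a small simulation that repeatedly samples from a known population and counts how often the interval contains the true mean. This is only a sketch: it assumes a normal population with a known standard deviation and uses a z-interval, which is not necessarily the interval that Minitab reports. All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / len(sample) ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


def coverage(true_mean, sigma, sample_size, repetitions, seed=0):
    """Share of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sigma) for _ in range(sample_size)]
        low, high = z_interval(sample, sigma)
        if low <= true_mean <= high:
            hits += 1
    return hits / repetitions


observed = coverage(true_mean=50.0, sigma=5.0, sample_size=25, repetitions=2000)
print(f"{observed:.1%}")
```

The observed share should be close to 95%, with some variation from run to run if the seed changes.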

## Histogram

A histogram divides sample values into many intervals and represents the frequency of data values in each interval with a bar.
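
The counting behind a histogram can be sketched as follows, using the nine values from the Minimum section. The function name `histogram_counts` and the choice of three intervals of width 5 starting at 5 are hypothetical; Minitab chooses its own intervals.

```python
def histogram_counts(data, bin_start, bin_width, n_bins):
    """Count how many values fall in each interval [start, start + width)."""
    counts = [0] * n_bins
    for value in data:
        index = int((value - bin_start) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


data = [13, 17, 18, 19, 12, 10, 7, 9, 14]
counts = histogram_counts(data, bin_start=5, bin_width=5, n_bins=3)

# Print one text bar per interval.
for i, count in enumerate(counts):
    low = 5 + 5 * i
    print(f"[{low}, {low + 5}): {'#' * count}")
```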

### Interpretation

Use a histogram to assess the shape and spread of the data. Histograms are best when the sample size is greater than 20.

Skewed data

You can use a histogram of the data overlaid with a normal curve to examine the normality of your data. A normal distribution is symmetric and bell-shaped, as indicated by the curve. It is often difficult to evaluate normality with small samples. A probability plot is best for determining the distribution fit.

Outliers

Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.

Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis. For more information, go to Identifying outliers.

Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often indicate that important variables are not yet accounted for.

If you have additional information that allows you to classify the observations into groups, you can create a group variable with this information. Then, you can create the graph with groups to determine whether the group variable accounts for the peaks in the data.

## Boxplot

A boxplot provides a graphical summary of the distribution of a sample. The boxplot shows the shape, central tendency, and variability of the data.

### Interpretation

Use a boxplot to examine the spread of the data and to identify any potential outliers. Boxplots are best when the sample size is greater than 20.

Skewed data

Examine the spread of your data to determine whether your data appear to be skewed. When data are skewed, the majority of the data are located on the high or low side of the graph. Often, skewness is easiest to detect with a histogram or boxplot.

Outliers

Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.

Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis. For more information, go to Identifying outliers.
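
One widely used convention for flagging potential outliers on a boxplot is to mark values more than 1.5 times the interquartile range beyond the quartiles. The source does not state the rule that Minitab applies, so the 1.5 multiplier, the quartile method, the function name `flag_outliers`, and the added value 45 are all assumptions of this sketch.

```python
from statistics import quantiles


def flag_outliers(data, multiplier=1.5):
    """Return values beyond the quartiles by more than multiplier * IQR (assumed rule)."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [value for value in data if value < lower_fence or value > upper_fence]


data = [13, 17, 18, 19, 12, 10, 7, 9, 14, 45]  # example values plus a hypothetical extreme value
flagged = flag_outliers(data)

print(flagged)
```

A flagged value is only a candidate for investigation, not proof of an error.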
