Interpret all statistics and graphs for 1-Sample Sign

Find definitions and interpretation guidance for every statistic and graph that is provided with the 1-sample sign analysis.

N

The sample size (N) is the total number of observations in the sample.

Interpretation

The sample size affects the confidence interval and the power of the test.

Usually, a larger sample size results in a narrower confidence interval. A larger sample size also gives the test more power to detect a difference. For more information, go to What is power?.

Median

The median is the midpoint of the data set. This midpoint value is the point at which half the observations are above the value and half the observations are below the value. The median is determined by ranking the observations and finding the observation at position (N + 1) / 2 in the ranked order. If the number of observations is even, then the median is the average of the observations ranked at positions N / 2 and (N / 2) + 1.
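The ranking rule above can be sketched in a few lines of Python (the data values here are hypothetical, chosen only to show the odd and even cases):

```python
# Sketch of the median rule described above (hypothetical data).
def median_by_rank(values):
    x = sorted(values)                  # rank the observations
    n = len(x)
    if n % 2 == 1:                      # odd N: the (N + 1) / 2-th ranked value
        return x[(n + 1) // 2 - 1]      # -1 converts rank to 0-based index
    # even N: average of the observations ranked N/2 and (N/2) + 1
    return (x[n // 2 - 1] + x[n // 2]) / 2

print(median_by_rank([3, 1, 7, 5, 9]))   # odd N -> 5
print(median_by_rank([3, 1, 7, 5]))      # even N -> 4.0
```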

Interpretation

The median of the sample data is an estimate of the population median.

Because the median is based on sample data and not on the entire population, it is unlikely that the sample median equals the population median. To better estimate the population median, use the confidence interval.

Confidence interval (CI) and bounds

The confidence interval provides a range of likely values for the population median. Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. But, if you repeated your sample many times, a certain percentage of the resulting confidence intervals or bounds would contain the unknown population median. The percentage of these confidence intervals or bounds that contain the median is the confidence level of the interval. For example, a 95% confidence level indicates that if you take 100 random samples from the population, you could expect approximately 95 of the samples to produce intervals that contain the population median.

An upper bound defines a value that the population median is likely to be less than. A lower bound defines a value that the population median is likely to be greater than.

The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.

The 1-sample sign test does not always achieve the confidence level that you specify because the sign test statistic is discrete. Because of this, Minitab calculates three confidence intervals with varying levels of precision. You should use the shortest interval for which the achieved confidence level is closest to the target confidence level.
  • The first confidence interval has the highest achievable confidence level that is less than the confidence level that you specify. The position indicates which observations Minitab uses for the upper and lower bounds. For example, if the position is (7, 14), the confidence interval is between the 7th smallest observation and the 14th smallest observation.
  • The second confidence interval is always at the confidence level that you specify. The upper and lower bounds of the confidence interval are not actual observations from the sample, so there is no position. Minitab uses nonlinear interpolation (NLI) to calculate this confidence interval.
  • The third confidence interval has the lowest achievable confidence level that is greater than the confidence level that you specify. This is usually the widest interval.
Descriptive Statistics

N  Median   95% Confidence Interval for η   Achieved Confidence   Position
   17.7                                                           (4, 9)
            (17.43, 18.76)                  95.00%                Interpolation
                                                                  (3, 10)

In these results, the estimate of the population median for the percentage of chromium is 17.7. Use the second interval because it is the shortest interval with an achieved confidence level closest to the target of 95%. You can be 95% confident that the population median is between 17.43 and 18.76.
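The achieved confidence of a position-based interval follows from the Binomial(N, 0.5) distribution of the sign counts: an interval between the d-th smallest and the (N + 1 − d)-th smallest observation covers the population median with probability P(d ≤ X ≤ N − d). A minimal sketch, assuming for illustration a sample of N = 12 (the sample size is not shown in the output above):

```python
# Achieved confidence of a sign-test CI whose bounds are the order
# statistics at positions (d, N + 1 - d).  Coverage follows from the
# Binomial(N, 0.5) distribution of the sign counts.  N = 12 is an
# illustrative assumption, not a value taken from the output above.
from math import comb

def achieved_confidence(n, d):
    # P(d <= X <= n - d) for X ~ Binomial(n, 0.5)
    below = sum(comb(n, k) for k in range(d))   # P(X < d)
    return 1 - 2 * below / 2**n

print(f"{achieved_confidence(12, 4):.2%}")   # position (4, 9)
print(f"{achieved_confidence(12, 3):.2%}")   # position (3, 10)
```

With N = 12, the interval at position (4, 9) achieves less than 95% confidence and the interval at position (3, 10) achieves more, which is why Minitab interpolates between them for the exact 95% interval.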

Achieved Confidence

The achieved confidence level is the confidence level that the interval actually attains, which is usually slightly below or above the confidence level that you specify. The achieved confidence level indicates how likely it is that the population median is contained in the confidence interval. For example, a 95% confidence level indicates that if you take 100 random samples from the population, you could expect approximately 95 of the samples to produce intervals that contain the population median.

The 1-sample sign test does not always achieve the confidence level that you specify because the sign test statistic is discrete. Because of this, Minitab calculates three confidence intervals with varying levels of precision. You should use the shortest interval for which the achieved confidence level is closest to the target confidence level.
  • The first confidence interval has the highest achievable confidence level that is less than the confidence level that you specify. The position indicates which observations Minitab uses for the upper and lower bounds. For example, if the position is (7, 14), the confidence interval is between the 7th smallest observation and the 14th smallest observation.
  • The second confidence interval is always at the confidence level that you specify. The upper and lower bounds of the confidence interval are not actual observations from the sample, so there is no position. Minitab uses nonlinear interpolation (NLI) to calculate this confidence interval.
  • The third confidence interval has the lowest achievable confidence level that is greater than the confidence level that you specify. This is usually the widest interval.

Position

The position is the ordered rank of the data. The position indicates which observation Minitab uses for the upper and lower bound of the first and third confidence intervals. For example, if the position is (7,14), the confidence interval is between the 7th smallest observation and the 14th smallest observation.

For the second interval, Minitab uses nonlinear interpolation, which does not require a position.

Null hypothesis and alternative hypothesis

The null and alternative hypotheses are two mutually exclusive statements about a population. A hypothesis test uses sample data to determine whether to reject the null hypothesis.
Null hypothesis
The null hypothesis states that a population parameter (such as the mean, the standard deviation, and so on) is equal to a hypothesized value. The null hypothesis is often an initial claim that is based on previous analyses or specialized knowledge.
Alternative hypothesis
The alternative hypothesis states that a population parameter is smaller, greater, or different than the hypothesized value in the null hypothesis. The alternative hypothesis is what you might believe to be true or hope to prove true.

Interpretation

In the output, the null and alternative hypotheses help you to verify that you entered the correct value for the hypothesized median.

Significance level

The significance level (denoted as α or alpha) is the maximum acceptable level of risk for rejecting the null hypothesis when the null hypothesis is true (type I error). Usually, you choose the significance level before you analyze the data. In Minitab, you can select the significance level by specifying the Confidence level on the Options tab, because the significance level equals 1 minus the confidence level. Because the default confidence level in Minitab is 0.95, the default significance level is 0.05.

Interpretation

Compare the significance level to the p-value to decide whether to reject or fail to reject the null hypothesis (H0). If the p-value is less than the significance level, the usual interpretation is that the results are statistically significant, and you reject H0.

Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
  • Choose a higher significance level, such as 0.10, to be more certain that you detect any difference that possibly exists. For example, a quality engineer compares the stability of new ball bearings with the stability of current bearings. The engineer must be highly certain that the new ball bearings are stable because unstable ball bearings could cause a disaster. The engineer chooses a significance level of 0.10 to be more certain of detecting any possible difference in stability of the ball bearings.
  • Choose a lower significance level, such as 0.01, to be more certain that you detect only a difference that actually exists. For example, a scientist at a pharmaceutical company must be very certain about a claim that the company's new drug significantly reduces symptoms. The scientist chooses a significance level of 0.001 to be more certain that any significant difference in symptoms does exist.

Number >

This value is the number of values in the sample that are greater than the test median.

Interpretation

Minitab uses the number of values in the sample that are less than, equal to, and greater than the test median to calculate the p-value. Usually, larger differences between the number of observations that are greater than and less than the test median produce lower p-values. Minitab removes the observations that are equal to the test median and reduces the number of observations that it uses to calculate the p-value by the number of observations that it removed.

Number <

This value is the number of values in the sample that are less than the test median.

Interpretation

Minitab uses the number of values in the sample that are less than, equal to, and greater than the test median to calculate the p-value. Usually, larger differences between the number of observations that are greater than and less than the test median produce lower p-values. Minitab removes the observations that are equal to the test median and reduces the number of observations that it uses to calculate the p-value by the number of observations that it removed.

Number =

This value is the number of values in the sample that are equal to the test median.

Interpretation

Minitab uses the number of values in the sample that are less than, equal to, and greater than the test median to calculate the p-value. Usually, larger differences between the number of observations that are greater than and less than the test median produce lower p-values. Minitab removes the observations that are equal to the test median and reduces the number of observations that it uses to calculate the p-value by the number of observations that it removed.
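The three counts can be sketched directly (the data values and the test median of 12 here are hypothetical, chosen only to show the tallying and the removal of ties):

```python
# Sketch of the sign counts described above (hypothetical data and a
# hypothetical test median of 12).
data = [10.2, 11.5, 12.0, 13.1, 14.8, 9.7, 12.6]
eta0 = 12.0

n_below = sum(1 for x in data if x < eta0)   # Number <
n_equal = sum(1 for x in data if x == eta0)  # Number =
n_above = sum(1 for x in data if x > eta0)   # Number >

# Ties with the test median are dropped before the p-value is computed.
n_used = len(data) - n_equal
print(n_below, n_equal, n_above, n_used)     # -> 3 1 3 6
```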

P-Value

The p-value is a probability that measures the evidence against the null hypothesis. A smaller p-value provides stronger evidence against the null hypothesis.

Interpretation

Use the p-value to determine whether the population median is statistically different from the hypothesized median.

To determine whether the difference between the population median and the hypothesized median is statistically significant, compare the p-value to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
P-value ≤ α: The population median differs significantly from the hypothesized median (Reject H0)
If the p-value is less than or equal to the significance level, the decision is to reject the null hypothesis. You can conclude that the difference between the population median and the hypothesized median is statistically significant. Use your specialized knowledge to determine whether the difference is practically significant. For more information, go to Statistical and practical significance.
P-value > α: The population median does not differ significantly from the hypothesized median (Fail to reject H0)
If the p-value is greater than the significance level, the decision is to fail to reject the null hypothesis. You do not have enough evidence to conclude that the population median is significantly different from the hypothesized median. You should make sure that your test has enough power to detect a difference that is practically significant. For more information, go to Increase the power of a hypothesis test.
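Under H0, the number of observations above the hypothesized median follows a Binomial(n, 0.5) distribution, where n excludes ties. A minimal sketch of the two-sided p-value on that basis (the counts below are hypothetical; Minitab's exact computation may differ in detail):

```python
# Minimal sketch of a two-sided sign-test p-value: under H0 the number
# of observations above the hypothesized median follows Binomial(n, 0.5),
# where n excludes ties.  The counts below are hypothetical.
from math import comb

def sign_test_p(n_above, n_below):
    n = n_above + n_below                  # ties already removed
    k = min(n_above, n_below)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n   # one-tail prob.
    return min(1.0, 2 * tail)              # two-sided p-value

# Example: 2 observations below and 9 above the hypothesized median
print(round(sign_test_p(9, 2), 4))
```

With α = 0.05, a p-value such as this one above 0.05 would lead you to fail to reject H0.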

Boxplot

A boxplot provides a graphical summary of the distribution of a sample. The boxplot shows the shape, central tendency, and variability of the data.

Interpretation

Use a boxplot to identify any potential outliers. Boxplots are best when the sample size is greater than 20.

Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.

On a boxplot, asterisks (*) denote outliers.

Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis. For more information, go to Identifying outliers.

Histogram

A histogram divides sample values into many intervals and represents the frequency of data values in each interval with a bar.

Interpretation

Use the graphs to look for symmetry and to identify potential outliers.

Symmetric and skewed data

A distribution is symmetric when a vertical line can be drawn down the middle and the two sides will mirror each other. When the data are not symmetric, they are skewed to one side or the other.

Symmetrical distributions
(Figures: a symmetric normal distribution and a symmetric nonnormal distribution)

The normal distribution is the most common symmetric distribution, but the data do not have to be normal to be symmetric.

Right skewed distributions

Right-skewed data (also called positive-skewed data) are so named because the "tail" of the distribution points to the right. The histogram with right-skewed data shows wait times. Most of the wait times are relatively short, and only a few wait times are long.

Left skewed distributions

Left-skewed data (also called negative-skewed data) are so named because the "tail" of the distribution points to the left. The histogram with left-skewed data shows failure time data. A few items fail immediately, and many more items fail later.

If your data come from a symmetric distribution, consider using a 1-Sample Wilcoxon test instead, because it has more power.

Outliers

Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.

On a histogram, isolated bars at either end of the graph identify possible outliers.

Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis. For more information, go to Identifying outliers.

Individual value plot

An individual value plot displays the individual values in the sample. Each circle represents one observation. An individual value plot is especially useful when you have relatively few observations and when you also need to assess the effect of each observation.

Interpretation

Use an individual value plot to examine the spread of the data and to identify any potential outliers. Individual value plots are best when the sample size is less than 50.

Symmetric and skewed data

A distribution is symmetric when a vertical line can be drawn down the middle and the two sides will mirror each other. When the data are not symmetric, they are skewed to one side or the other.

Symmetrical distributions
(Figures: a symmetric normal distribution and a symmetric nonnormal distribution)

The normal distribution is the most common symmetric distribution, but the data do not have to be normal to be symmetric.

Skewed distributions
(Figures: a right-skewed distribution and a left-skewed distribution)

The individual value plot with right-skewed data shows wait times. Most of the wait times are relatively short, and only a few wait times are long. The individual value plot with left-skewed data shows failure time data. A few items fail immediately, and many more items fail later.

If your data come from a symmetric distribution, consider using a 1-Sample Wilcoxon test instead, because it has more power.

Outliers

Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.

On an individual value plot, unusually low or high data values indicate possible outliers.

Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis. For more information, go to Identifying outliers.
