The sample size (N) is the total number of observations in each group.
The sample size affects the confidence interval and the power of the test.
Usually, a larger sample yields a narrower confidence interval. A larger sample size also gives the test more power to detect a difference.
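The relationship between sample size and interval width can be sketched as follows. This is a minimal illustration rather than Minitab's own calculation: it assumes a fixed standard deviation of 3, uses the normal critical value 1.96 in place of the t critical value, and the function name `half_width` is hypothetical.

```python
import math

def half_width(stdev, n, critical_value=1.96):
    """Approximate half-width of a confidence interval for a mean."""
    return critical_value * stdev / math.sqrt(n)

# Assumed standard deviation of 3 at every sample size (illustrative only).
for n in (25, 100, 400):
    print(n, round(half_width(3.0, n), 3))
```

Under these assumptions the half-width shrinks in proportion to the square root of the sample size, so quadrupling N halves the width of the interval.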
The mean is the average of the observations within each group. The mean describes each group with a single value that identifies the center of the data. It is the sum of all the observations within a group divided by the number of observations in that group.
The mean of each sample provides an estimate of each population mean. The differences between sample means are the estimates of the difference between the population means.
Because the differences between the group means are based on data from samples and not the entire populations, you cannot be certain that they equal the population differences. To get a better sense of the population differences, you can use the confidence intervals.
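A small sketch of how the group means and their pairwise differences are computed. The group names and observations are hypothetical values chosen for illustration; they do not come from any Minitab data set.

```python
from statistics import mean

# Hypothetical observations for three groups (illustrative only).
groups = {
    "A": [9.0, 10.0, 11.0],
    "B": [12.0, 13.0, 14.0],
    "C": [14.0, 15.0, 19.0],
}

# Sample mean of each group: sum of the observations / number of observations.
group_means = {name: mean(values) for name, values in groups.items()}

# Estimated difference between population means for each pair of groups.
names = sorted(group_means)
differences = {
    (a, b): group_means[b] - group_means[a]
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

print(group_means)
print(differences)
```

Each difference is only a point estimate; on its own it says nothing about how far the population difference might be from that value.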
Standard Deviation (StDev)
The standard deviation is the most common measure of dispersion, or how spread out the data are around the mean. The symbol σ (sigma) is often used to represent the standard deviation of a population. The symbol s is used to represent the standard deviation of a sample.
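The distinction between s and σ can be illustrated with a short sketch. The data values are hypothetical. The sample standard deviation uses the usual n − 1 divisor, which is what Python's `statistics.stdev` computes; `statistics.pstdev` divides by n and treats the data as a complete population.

```python
import math
from statistics import mean, pstdev, stdev

# Hypothetical observations from one group (illustrative only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

m = mean(data)
sum_of_squares = sum((x - m) ** 2 for x in data)

# Sample standard deviation s: divide by n - 1.
s_manual = math.sqrt(sum_of_squares / (len(data) - 1))

# Population standard deviation sigma: divide by n.
sigma_manual = math.sqrt(sum_of_squares / len(data))

print(s_manual, stdev(data))       # the two values agree
print(sigma_manual, pstdev(data))  # the two values agree
```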
The standard deviation uses the same units as the variable. A higher standard deviation value indicates greater spread in the data. A good rule of thumb for a normal distribution is as follows:
Approximately 68% of the values fall within one standard deviation of the mean.
Approximately 95% of the values fall within two standard deviations of the mean.
Approximately 99.7% of the values fall within three standard deviations of the mean.
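These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The rule applies to normally distributed data; for skewed or heavy-tailed data the proportions can differ noticeably.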
The sample standard deviation of a group is an estimate of the population standard deviation of that group. The standard deviations are used to calculate the confidence intervals and the p-values. Larger sample standard deviations result in less precise (wider) confidence intervals and lower statistical power.
Analysis of variance assumes that the population standard deviations for all levels are equal. If you cannot assume equal variances, use Welch's ANOVA, which is an option for One-Way ANOVA.
Confidence Interval for group means (95% CI)
These confidence intervals (CI) are ranges of values that are likely to contain the true mean of each population. The confidence intervals are calculated using the pooled standard deviation.
Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. However, if you repeat the sampling many times, a certain percentage of the resulting confidence intervals contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.
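The repeated-sampling interpretation can be illustrated with a small simulation. This is a simplified sketch, not the pooled-standard-deviation interval that Minitab reports: it assumes normally distributed data with a known standard deviation, and the population mean, standard deviation, sample size, and seed are all arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def coverage(n_repeats=2000, n=30, mu=10.0, sigma=3.0, level=0.95, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / math.sqrt(n)  # known-sigma margin of error
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - half <= mu <= sample_mean + half:
            hits += 1
    return hits / n_repeats

observed = coverage()
print(observed)
```

The observed fraction should land close to the nominal confidence level, although it varies from run to run when the seed changes.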
The confidence interval is composed of the following two parts:
Point estimate
The point estimate is the estimate of the parameter that is calculated from the sample data. The confidence interval is centered around this value.
Margin of error
The margin of error defines the width of the confidence interval and is determined by the observed variability in the sample, the sample size, and the confidence level. To calculate the upper limit of the confidence interval, the error margin is added to the point estimate. To calculate the lower limit of the confidence interval, the error margin is subtracted from the point estimate.
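The two parts can be combined as in the sketch below. It is an approximation under stated assumptions: the margin of error is taken as a critical value times the pooled standard deviation divided by the square root of the group size, and a normal quantile is used as a large-sample stand-in for the t critical value that an ANOVA interval would use. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(point_estimate, pooled_stdev, n, level=0.95):
    """Return (lower, upper) limits: point estimate -/+ margin of error."""
    # Assumption: normal quantile in place of the t critical value.
    critical_value = NormalDist().inv_cdf(0.5 + level / 2)
    margin = critical_value * pooled_stdev / math.sqrt(n)
    return point_estimate - margin, point_estimate + margin

# Hypothetical group: mean 12.0, pooled standard deviation 5.0, 100 observations.
lower, upper = confidence_interval(point_estimate=12.0, pooled_stdev=5.0, n=100)
print(lower, upper)
```

For small error degrees of freedom the t critical value is noticeably larger than the normal quantile, so the interval from this sketch would be too narrow.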
Use the confidence interval to assess the estimate of the population mean for each group.
For example, with a 95% confidence level, you can be 95% confident that the confidence interval contains the population mean for the group. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.
Pooled StDev
The pooled standard deviation is an estimate of the common standard deviation for all levels. The pooled standard deviation is the standard deviation of all data points around their group mean (not around the overall mean). Larger groups have a proportionally greater influence on the overall estimate of the pooled standard deviation.
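Written as a formula, the usual definition is the following, where the symbols are notation introduced here for illustration: k is the number of groups, n_i and s_i are the size and sample standard deviation of group i, y_ij is observation j in group i, and N is the total number of observations.

```latex
S_p = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{\sum_{i=1}^{k} (n_i - 1)}}
    = \sqrt{\frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2}{N - k}}
```

Each group's variance is weighted by n_i − 1, which is why larger groups have more influence on the result.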
A higher pooled standard deviation indicates greater spread in the data. A higher value produces less precise (wider) confidence intervals and lower statistical power.
Minitab uses the pooled standard deviation to create the confidence intervals for both the group means and the differences between group means.
Example of a pooled standard deviation
Suppose your study has four groups, as shown in the following table.
The first three groups are equal in size (n=50) with standard deviations around 3. The fourth group is much larger (n=200) and has a higher standard deviation (6.8). Because the pooled standard deviation is based on a weighted average of the group variances, with larger groups weighted more heavily, its value (5.488) is closer to the standard deviation of the largest group.
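The weighting can be checked numerically with a short sketch. The group sizes and the fourth standard deviation come from the description above; the first three standard deviations (2.5, 2.9, 3.2) are assumed values near 3, chosen for illustration. The function name `pooled_stdev` is hypothetical.

```python
import math

def pooled_stdev(sizes, stdevs):
    """Pooled standard deviation from group sizes and sample standard deviations."""
    # Weight each group variance by its degrees of freedom (n - 1).
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sizes, stdevs))
    degrees_of_freedom = sum(sizes) - len(sizes)
    return math.sqrt(numerator / degrees_of_freedom)

sizes = [50, 50, 50, 200]
stdevs = [2.5, 2.9, 3.2, 6.8]  # first three are assumed values near 3

result = pooled_stdev(sizes, stdevs)
print(round(result, 3))  # 5.488
```

With these assumed inputs the result agrees with the quoted value of 5.488. It is well above the simple average of the four standard deviations (3.85) because the largest group carries most of the weight.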