Power for equivalence tests

Power is an important consideration for equivalence tests, just as it is for other statistical tests. However, the hypotheses for an equivalence test are different from the hypotheses for a typical test of population means.

Consider the difference between a 2-sample t-test and a 2-sample equivalence test. You use a 2-sample t-test to test whether the means of two populations are different. The hypotheses for the test are as follows:

Null hypothesis (H₀): The means of the two populations are the same.
Alternative hypothesis (H₁): The means of the two populations are different.

If the p-value for the test is less than alpha (α), then you reject the null hypothesis and conclude that the means are different.

In contrast, you use a 2-sample equivalence test to test whether the means of two populations are equivalent. Equivalence is defined by a range of values that you specify, called the equivalence interval. The hypotheses for the test are as follows:

Null hypothesis (H₀): The difference between the means is outside your equivalence interval. The means are not equivalent.
Alternative hypothesis (H₁): The difference between the means is inside your equivalence interval. The means are equivalent.
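In symbols, the two sets of hypotheses can be written side by side. The notation here is an assumption made for illustration and does not come from the text above: μ₁ and μ₂ are the two population means, δ = μ₁ − μ₂ is their difference, and L and U are the lower and upper limits of the equivalence interval. Treating a difference exactly at a limit as "not equivalent" is also an assumed convention.

```latex
\begin{align*}
\text{2-sample t-test:} \quad
  & H_0:\ \mu_1 - \mu_2 = 0
  && H_1:\ \mu_1 - \mu_2 \neq 0 \\
\text{2-sample equivalence test:} \quad
  & H_0:\ \delta \le L \ \text{ or } \ \delta \ge U
  && H_1:\ L < \delta < U
\end{align*}
```

Note that the roles are reversed: in the t-test, "no difference" is the null hypothesis, while in the equivalence test, "equivalent" is the alternative hypothesis that you must demonstrate.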

If the p-value for the test is less than α, then you reject the null hypothesis and conclude that the means are equivalent.
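The contrast between the two decision rules can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular software package: it uses the two one-sided tests (TOST) approach, which is one common way to carry out an equivalence test, and it replaces the t distribution with a large-sample normal approximation so that it runs with only the standard library. The function names and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used in place of the t distribution


def standard_error(sd1, sd2, n1, n2):
    """Standard error of the difference between two sample means."""
    return sqrt(sd1**2 / n1 + sd2**2 / n2)


def difference_p_value(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sided p-value for H0: the means are the same (normal approximation)."""
    z = (mean1 - mean2) / standard_error(sd1, sd2, n1, n2)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def equivalence_p_value(mean1, mean2, sd1, sd2, n1, n2, lower, upper):
    """TOST p-value for H0: the difference is outside (lower, upper)."""
    diff = mean1 - mean2
    se = standard_error(sd1, sd2, n1, n2)
    # One-sided test 1: H0 diff <= lower versus H1 diff > lower
    p_lower = 1 - STD_NORMAL.cdf((diff - lower) / se)
    # One-sided test 2: H0 diff >= upper versus H1 diff < upper
    p_upper = STD_NORMAL.cdf((diff - upper) / se)
    # Both one-sided tests must reject, so report the larger p-value
    return max(p_lower, p_upper)


# Hypothetical summary statistics, chosen only for illustration
alpha = 0.05
stats = dict(mean1=10.1, mean2=10.0, sd1=1.0, sd2=1.0, n1=100, n2=100)

p_diff = difference_p_value(**stats)
p_equiv = equivalence_p_value(**stats, lower=-0.5, upper=0.5)

print("t-test style conclusion: means differ?", p_diff < alpha)
print("equivalence conclusion: means equivalent?", p_equiv < alpha)
```

With these made-up numbers, the test for a difference does not reject its null hypothesis (p is roughly 0.48), while the equivalence test does reject its null hypothesis (p is roughly 0.002). Failing to find a difference is not the same as demonstrating equivalence; only the second test supports the claim that the means are equivalent.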

Thus, power for an equivalence test is the probability that you will conclude that the difference is within your equivalence limits when it actually is. If your test has low power, you may fail to demonstrate equivalence even though the difference is within your equivalence limits. The following factors affect the power of your test:

Sample size: Larger samples give your test more power.
Difference: The closer the true difference is to the center of the equivalence interval (midway between the two equivalence limits), the more power your test has.
Standard deviation: Lower variability gives your test more power.
Alpha: Higher values for α give your test more power. However, α is the maximum probability of a type I error that you are willing to accept, so increasing α also increases your chance of claiming equivalence when the means are not equivalent.
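The effect of each factor can be explored with a rough power calculation. The sketch below is an approximation under stated assumptions, not the calculation used by any specific software: it assumes two groups of equal size n, a common standard deviation that is treated as known, the two one-sided tests (TOST) approach, and a normal distribution in place of the t distribution. The function name `equivalence_power` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def equivalence_power(true_diff, sd, n, lower, upper, alpha=0.05):
    """Approximate power of a 2-sample equivalence test (TOST, normal approx.).

    Equivalence is claimed when the observed difference falls between
    lower + crit * se and upper - crit * se, so power is the probability
    that the observed difference lands in that range.
    """
    std_normal = NormalDist()
    se = sd * sqrt(2 / n)                  # equal group sizes, common sd
    crit = std_normal.inv_cdf(1 - alpha)   # one-sided critical value
    z_high = (upper - true_diff) / se - crit
    z_low = (lower - true_diff) / se + crit
    # If the acceptance range is empty, equivalence can never be claimed
    return max(0.0, std_normal.cdf(z_high) - std_normal.cdf(z_low))


# Hypothetical baseline scenario: limits of -0.5 and 0.5, centered difference
baseline = dict(true_diff=0.0, sd=1.0, n=50, lower=-0.5, upper=0.5, alpha=0.05)

scenarios = {
    "baseline": baseline,
    "larger sample (n=100)": {**baseline, "n": 100},
    "difference off center (0.3)": {**baseline, "true_diff": 0.3},
    "lower variability (sd=0.5)": {**baseline, "sd": 0.5},
    "higher alpha (0.10)": {**baseline, "alpha": 0.10},
}

powers = {name: equivalence_power(**args) for name, args in scenarios.items()}
for name, power in powers.items():
    print(f"{name}: power = {power:.3f}")
```

Changing one input at a time relative to the baseline shows the direction of each effect listed above: a larger sample, lower variability, or a higher α raises the power, while a true difference that sits closer to one of the equivalence limits lowers it.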