Example of Equivalence Test for a 2x2 Crossover Design

A quality engineer at a consumer healthcare company wants to determine whether their generic antacid is equivalent to a name-brand antacid. Two groups of participants receive a 5-day course of one antacid, followed by a 2-week washout period, and then a 5-day course of the other antacid. Group 1 receives the generic antacid (the test treatment) followed by the name-brand antacid (the reference treatment). Group 2 receives the name-brand antacid followed by the generic antacid. The engineer measures the gastric pH on the last day of each treatment. Because lower pH values are more acidic, higher values mean the drug is more effective. The engineer will consider the antacids equivalent if the test pH is within 10% of the reference pH.

The engineer performs an equivalence test for a 2x2 crossover design to determine whether the test pH is within 10% of the reference pH.

  1. Open the sample data, StomachAcid.MTW.
  2. Choose Stat > Equivalence Tests > 2x2 Crossover Design.
  3. From the drop-down list, select Data for two sequences are unstacked.
  4. From Treatment order for sequence 1, select Test, Reference.
  5. In Sequence 1, Period 1, enter Group 1, Generic. In Sequence 1, Period 2, enter Group 1, Brand.
  6. In Sequence 2, Period 1, enter Group 2, Brand. In Sequence 2, Period 2, enter Group 2, Generic.
  7. From Hypothesis about, select Test mean - reference mean.
  8. From What do you want to determine? (Alternative hypothesis), select Lower limit < test mean - reference mean < upper limit.
  9. In Lower limit, enter -0.1.
  10. In Upper limit, enter 0.1.
  11. Select Multiply by reference mean.
  12. Click Options.
  13. In Label for reference treatment, type Brand. In Label for test treatment, type Generic.
  14. Click OK in each dialog box.

Interpret the results


If either the carryover effect or the period effect is significant, then the results of the equivalence test may not be reliable.

The p-value for the carryover effect (0.498) and the p-value for the period effect (0.128) are both greater than 0.05. Thus, these effects are not significant at the 0.05 level.

The p-value for the treatment effect (0.000) is less than 0.05. Thus, the treatment effect is significant at the 0.05 level. The significant treatment effect indicates that one antacid is better than the other at raising gastric pH. The generic antacid did not raise gastric pH as much as the name-brand antacid. The mean gastric pH after using the generic antacid was approximately 0.321 less than the mean pH after using the name-brand antacid.
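As a rough check of these significance statements, each t-value in the output is the effect estimate divided by its standard error, and an effect is significant at the 0.05 level when the absolute t-value exceeds the two-sided critical value for 15 degrees of freedom. The following Python sketch is illustrative only and is not part of the Minitab procedure. The critical value 2.1314 is an assumption taken from a standard t table, and the function names are hypothetical.

```python
# Illustrative check, not Minitab output: t = effect / SE for each effect.
# Effect estimates and standard errors are copied from this example's output.
T_CRIT_TWO_SIDED = 2.1314  # assumption: t(0.975, 15 DF) from a standard t table

effects = {
    "Carryover": (0.45181, 0.64988),
    "Treatment": (-0.32104, 0.060641),
    "Period": (-0.097708, 0.060641),
}


def t_value(effect, se):
    """Return the t statistic for an effect estimate and its standard error."""
    return effect / se


def is_significant(effect, se, t_crit=T_CRIT_TWO_SIDED):
    """Two-sided test at the 0.05 level: compare |t| with the critical value."""
    return abs(t_value(effect, se)) > t_crit


for name, (effect, se) in effects.items():
    t = t_value(effect, se)
    print(f"{name}: t = {t:.4f}, significant = {is_significant(effect, se)}")
```

Under these assumptions, only the treatment effect exceeds the critical value, which agrees with the p-values reported in the output.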

The confidence interval for equivalence (−0.42735, 0) falls partly outside of the equivalence interval (−0.42503, 0.42503): its lower bound, −0.42735, is below the lower equivalence limit, −0.42503. Thus, the engineer cannot claim that the two antacids are equally effective at reducing stomach acid.
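The comparison between the confidence interval and the equivalence interval can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Minitab's implementation: it assumes the bounds are the difference plus or minus a one-sided t critical value times the standard error, that the interval is then extended to include 0 (which matches the reported upper bound of 0), and that the critical value 1.7531 comes from a standard t table. The function names are hypothetical.

```python
# Illustrative sketch, not Minitab's implementation.
T_CRIT_ONE_SIDED = 1.7531  # assumption: t(0.95, 15 DF) from a standard t table


def equivalence_ci(diff, se, t_crit=T_CRIT_ONE_SIDED):
    """Return (lower, upper) bounds for the difference, extended to include 0."""
    lower = diff - t_crit * se
    upper = diff + t_crit * se
    # Assumption: the interval always contains 0, as in the reported (-0.427349, 0).
    return min(0.0, lower), max(0.0, upper)


def within_limits(ci, limits):
    """True only if the whole confidence interval lies inside the limits."""
    return limits[0] < ci[0] and ci[1] < limits[1]


# Difference, SE, and equivalence limits copied from this example's output.
ci = equivalence_ci(diff=-0.32104, se=0.060641)
limits = (-0.425035, 0.425035)

print(f"CI for equivalence: ({ci[0]:.5f}, {ci[1]:.5f})")
print("Can claim equivalence:", within_limits(ci, limits))
```

With the values from this example, the lower bound falls just below the lower equivalence limit, so the check returns False.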


Treatment order for subjects in sequence 1: Generic, Brand
Treatment order for subjects in sequence 2: Brand, Generic
Lower equivalence limit = -0.1 × sample reference mean = -0.42503
Upper equivalence limit = 0.1 × sample reference mean = 0.42503
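Because Multiply by reference mean is selected, the limits entered in the dialog box are fractions of the sample reference mean. The following Python sketch shows the arithmetic. The reference mean of 4.25035 does not appear in the output; it is an assumption, back-calculated from the reported limit 0.425035 divided by 0.1, and the function name is hypothetical.

```python
# Illustrative arithmetic only; not part of the Minitab output.
REFERENCE_MEAN = 4.25035  # assumption: back-calculated as 0.425035 / 0.1


def equivalence_limits(reference_mean, lower_fraction=-0.1, upper_fraction=0.1):
    """Scale the fractional limits by the sample reference mean."""
    return lower_fraction * reference_mean, upper_fraction * reference_mean


lower, upper = equivalence_limits(REFERENCE_MEAN)
print(f"Lower equivalence limit = {lower:.6f}")
print(f"Upper equivalence limit = {upper:.6f}")
```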

Descriptive Statistics

Period 1  Period 2
Within-subject standard deviation = 0.08825


              Effect        SE  DF  T-Value  P-Value  95% CI for Equivalence
Carryover    0.45181   0.64988  15  0.69521    0.498  (-0.93339, 1.8370)
Treatment   -0.32104  0.060641  15  -5.2941    0.000  (-0.45030, -0.19179)
Period     -0.097708  0.060641  15  -1.6112    0.128  (-0.22696, 0.031546)

Difference: Mean(Generic) - Mean(Brand)

Difference        SE  95% CI for Equivalence  Equivalence Interval
  -0.32104  0.060641  (-0.427349, 0)          (-0.425035, 0.425035)
CI is not within the equivalence interval. Cannot claim equivalence.


Null hypothesis:         Difference ≤ -0.42503 or Difference ≥ 0.42503
Alternative hypothesis:  -0.42503 < Difference < 0.42503
α level:                 0.05

Null Hypothesis        DF  T-Value  P-Value
Difference ≤ -0.42503  15   1.7149    0.053
Difference ≥ 0.42503   15  -12.303    0.000
The greater of the two P-Values is 0.053. Cannot claim equivalence.
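The two one-sided tests in this table can be reproduced approximately from the difference, its standard error, and the equivalence limits. The Python sketch below is an illustration under stated assumptions, not Minitab's algorithm: it computes each t-value as (difference minus limit) divided by SE, and it approximates the t-distribution CDF by numerically integrating the density with Simpson's rule so that only the standard library is needed. All function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """Approximate CDF using composite Simpson's rule on [0, |t|] (steps is even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    cdf = 0.5 + area if t >= 0 else 0.5 - area
    return min(1.0, max(0.0, cdf))  # guard against tiny numerical overshoot


def tost(diff, se, df, lower, upper):
    """Two one-sided tests: return (t_lower, p_lower, t_upper, p_upper)."""
    t_lower = (diff - lower) / se      # tests H0: difference <= lower limit
    t_upper = (diff - upper) / se      # tests H0: difference >= upper limit
    p_lower = 1 - t_cdf(t_lower, df)   # upper-tail probability
    p_upper = t_cdf(t_upper, df)       # lower-tail probability
    return t_lower, p_lower, t_upper, p_upper


# Values copied from this example's output.
t_lo, p_lo, t_hi, p_hi = tost(-0.32104, 0.060641, 15, -0.425035, 0.425035)
p_max = max(p_lo, p_hi)

print(f"H0: Difference <= lower limit: t = {t_lo:.4f}, p = {p_lo:.3f}")
print(f"H0: Difference >= upper limit: t = {t_hi:.3f}, p = {p_hi:.3f}")
print("Can claim equivalence:", p_max < 0.05)
```

Equivalence can be claimed only when both null hypotheses are rejected, which is why the greater of the two p-values is compared with the α level.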