What is power?

The power of a hypothesis test is the probability that the test correctly rejects the null hypothesis, that is, the probability that it rejects the null hypothesis when the null hypothesis is false. Power is affected by the sample size, the size of the difference you want to detect, the variability of the data, and the significance level of the test.

If a test has low power, you might fail to detect an effect and mistakenly conclude that none exists. If a test has power that is too high, very small and possibly uninteresting effects might seem to be significant.

No test is perfect; there is always the possibility that the results of a test will lead you to reject the null hypothesis (H₀) when it is actually true (a type I error) or to fail to reject H₀ when it is actually false (a type II error). This is because in order to estimate population means, you have to take random samples, and random samples are just that, random. Thus, it is always possible that your sample mean will be very different from the population mean.

For example, suppose that a certain normally distributed population has a mean (μ) of 10 and a standard deviation (σ) of 2. This distribution indicates that about 95.45% of the values in this population are between 6 and 14 (within two standard deviations of the mean). However, it is always possible that you could select 10 observations at random and end up with a sample mean of 4. From such a sample you wouldn't guess that the population mean is actually 10!
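The figures in this example can be checked with a few lines of Python. The sketch below uses only the standard library; the variable names are illustrative and not part of the original example. It computes the share of the population between 6 and 14, and the chance that the mean of 10 random observations is 4 or less, using the fact that the sample mean of n observations is normally distributed with standard deviation σ/√n.

```python
from math import erfc, sqrt
from statistics import NormalDist

mu, sigma, n = 10.0, 2.0, 10  # population mean, population SD, sample size

# Share of individual values that fall between 6 and 14 (mu +/- 2 sigma).
population = NormalDist(mu, sigma)
share_within = population.cdf(14) - population.cdf(6)

# The sample mean of n observations has standard error sigma / sqrt(n).
standard_error = sigma / sqrt(n)
z = (4 - mu) / standard_error  # how many standard errors 4 lies below mu

# Lower-tail probability P(sample mean <= 4). erfc is used instead of
# 1 + erf so that the tiny result is not rounded to exactly zero.
p_mean_at_most_4 = 0.5 * erfc(-z / sqrt(2))

print(f"Share of values between 6 and 14: {share_within:.4f}")
print(f"P(sample mean <= 4) with n = {n}: {p_mean_at_most_4:.1e}")
```

The second probability comes out on the order of 10^-21, because 4 lies roughly 9.5 standard errors below the population mean.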

Of course, the odds of getting such a sample are incredibly small, but it is nevertheless possible. Sampling error can sometimes lead you to the wrong conclusion. While you can't know when this will occur, you can estimate how often it will occur. That's where power comes in.

For example, suppose you perform a 1-sample t-test to determine whether the mean volume of product dispensed into shampoo bottles in your factory is different from the target volume of 8 oz. You decide to take a random sample of 10 bottles. If μ is actually 7.5 oz (the bottles are being underfilled by 0.5 oz) and σ is actually 0.43 oz, then the test has a power of 0.9039.
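A rough check of this power value is possible without statistical software. The sketch below is an approximation, not an exact method: it replaces the noncentral t distribution with a standard normal approximation. It also assumes a two-sided test at a significance level of 0.05, which the example does not state, so the critical value 2.262157 (Student's t with 9 degrees of freedom) is hardcoded. The function name approx_power is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of
# freedom (n = 10). ASSUMPTION: the example does not state alpha.
T_CRIT_DF9 = 2.262157


def approx_power(difference, sigma, n, t_crit):
    """Approximate power of a two-sided 1-sample t-test.

    Uses a normal approximation to the noncentral t distribution.
    t_crit must be the critical value that matches n - 1 degrees of freedom.
    """
    df = n - 1
    delta = difference / sigma * sqrt(n)      # noncentrality parameter
    shrink = 1 - 1 / (4 * df)
    scale = sqrt(1 + t_crit ** 2 / (2 * df))
    phi = NormalDist().cdf
    upper_tail = 1 - phi((t_crit * shrink - delta) / scale)   # P(T > t_crit)
    lower_tail = phi((-t_crit * shrink - delta) / scale)      # P(T < -t_crit)
    return upper_tail + lower_tail


# Shampoo example: true mean 7.5 oz against a target of 8 oz, sigma 0.43 oz.
power_half_oz = approx_power(7.5 - 8.0, 0.43, n=10, t_crit=T_CRIT_DF9)

# A smaller underfill of 0.25 oz is harder to detect with the same sample.
power_quarter_oz = approx_power(7.75 - 8.0, 0.43, n=10, t_crit=T_CRIT_DF9)

print(f"Approximate power, 0.50 oz difference: {power_half_oz:.3f}")
print(f"Approximate power, 0.25 oz difference: {power_quarter_oz:.3f}")
```

Under these assumptions the first value lands close to the stated 0.9039, and the second is clearly lower, which illustrates how power depends on the size of the difference.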

A power value of 0.9039 means that if you go out and repeat the same experiment many times (taking a new random sample each time), about 90.39% of the time you will end up correctly rejecting the null hypothesis. The other 9.61% of the time, sampling error will cause you to fail to reject H₀, even though it is really false. Of course, you are not likely to go out and repeat the test more than one time, but it is good to know that the odds of getting a misleading sample are relatively small.
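The "repeat the same experiment many times" interpretation can be tried directly by simulation. The following is a minimal sketch using only the Python standard library. It assumes a two-sided test at a significance level of 0.05 (not stated in the example), so the critical value for 9 degrees of freedom is hardcoded. The function name simulate_power, the number of repeats, and the seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def simulate_power(true_mean=7.5, sigma=0.43, target=8.0, n=10,
                   t_crit=2.262157, repeats=20000, seed=1):
    """Estimate power by repeating the 1-sample t-test on fresh samples.

    t_crit is the two-sided 5% critical value of Student's t with n - 1 = 9
    degrees of freedom (ASSUMPTION: alpha = 0.05, two-sided).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(repeats):
        # Draw a new random sample of n bottles from the true population.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        # t statistic for H0: mean volume equals the target.
        t_stat = (sample_mean - target) / (stdev(sample, sample_mean) / sqrt(n))
        if abs(t_stat) > t_crit:
            rejections += 1  # H0 correctly rejected
    return rejections / repeats


estimated_power = simulate_power()
print(f"Share of simulated experiments that reject H0: {estimated_power:.3f}")
```

Because the estimate comes from random samples, it varies slightly with the seed and the number of repeats, but it should fall near the stated power of about 0.90.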