Chapter 5 Which Sample Size Do I Need? Power!
Key concepts: minimum sample size, practical relevance, unstandardized effect size, standardized effect size, Cohen’s d for means, Type I error, Type II error, test power.
Watch this micro lecture on test power for an overview of the chapter.
Summary
- How large should my sample be?
- What does it mean if a statistical test is not significant?
At the start of a quantitative research project, we are confronted with a seemingly simple practical question: How large should our sample be? In some cases, the statistical test that we plan to use comes with rules of thumb for the minimum sample size it requires.
Such rules of thumb tell us the minimum sample size but not necessarily the optimal sample size. Even if the test can technically be applied, the sample may be too small for the test to signal the population differences or associations, in short, the effect sizes, that we are interested in.
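To make the distinction concrete, here is a minimal Python sketch contrasting an unstandardized and a standardized effect size for two means. The group scores and their population values are made-up numbers, and the pooled-standard-deviation version of Cohen's d is used.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples: the difference between
    the means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1)
                  + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
group_a = rng.normal(loc=5.5, scale=2.0, size=50)  # hypothetical scores
group_b = rng.normal(loc=4.5, scale=2.0, size=50)  # hypothetical scores

print(np.mean(group_a) - np.mean(group_b))  # unstandardized effect: ~1 scale point
print(cohens_d(group_a, group_b))           # standardized effect: ~0.5 SDs
```

The unstandardized effect keeps the original measurement units, which is what matters for judging practical relevance; Cohen's d expresses the same difference in standard deviations, which is what power calculations work with.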
If we want to know the minimum sample size that we need to signal important effects in our data, things become rather complicated. We have to decide on the size of effects that we deem important. We also have to decide on the minimum probability that the statistical test will actually reject the hypothesis of no effect (the nil) if the true effect in the population has the selected interesting size.
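These two decisions, the smallest effect size that we deem important and the minimum probability of detecting it, are exactly the inputs of a power analysis. As a minimal sketch, assuming an independent-samples t test and the conventional example values of d = 0.5, a .05 significance level, and .80 power, we can let statsmodels solve for the minimum sample size per group.

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the minimum sample size per group of an independent-samples
# t test, given a smallest relevant effect size, a significance level,
# and a desired power (example values, not recommendations).
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # smallest Cohen's d we want to detect
    alpha=0.05,               # significance level (max Type I error rate)
    power=0.80,               # desired probability of rejecting a false nil
    alternative='two-sided',
)
print(round(n_per_group))     # about 64 participants per group
```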
The probability that a test rejects the null hypothesis of no effect when the population effect has a size interesting to us is called the power of the test. If we do not reject a false null hypothesis, we make a Type II error.
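A small simulation can make these two concepts tangible. The sketch below uses made-up values: a true population effect of d = 0.5, 64 participants per group, and a .05 significance level. It repeatedly draws samples and counts how often the t test rejects the nil; that proportion estimates the power, and its complement estimates the Type II error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, d, alpha, reps = 64, 0.5, 0.05, 10_000

# Simulate many studies in which the nil is false: the population
# means really differ by d standard deviations.
rejections = 0
for _ in range(reps):
    a = rng.normal(d, 1.0, size=n)
    b = rng.normal(0.0, 1.0, size=n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

power = rejections / reps
print(f"estimated power: {power:.2f}")                    # close to .80
print(f"estimated Type II error rate: {1 - power:.2f}")   # close to .20
```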
Thinking about sample size thus confronts us with a problem that we have hitherto neglected, namely the problem of not rejecting a false null hypothesis. This problem is especially important if the null hypothesis represents our research hypothesis, because then our expectations are confirmed precisely when we do not reject it.
However, if we do not reject the null hypothesis, we cannot make a Type I error, namely rejecting a true null hypothesis. As a consequence, the significance level of our test, which is the maximum probability of making a Type I error, is meaningless here. To express our confidence that our research hypothesis is true, we must instead know the probability of rejecting a false null hypothesis: the power of the test.
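For illustration, suppose a hypothetical study with 30 participants per group did not reject the nil. Assuming again an independent-samples t test and a smallest relevant effect of d = 0.5, a quick power check shows why this non-significant result is hard to interpret.

```python
from statsmodels.stats.power import TTestIndPower

# Power of a completed study: 30 participants per group,
# two-sided test at alpha = .05, smallest relevant effect d = 0.5.
power = TTestIndPower().solve_power(
    effect_size=0.5, nobs1=30, alpha=0.05, alternative='two-sided',
)
print(f"power: {power:.2f}")  # about .48
```

With power below .50, a true effect of the relevant size would be missed more often than detected, so failing to reject the nil says little about whether the effect exists.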
This chapter reviews the concepts that are central to understanding what the statistical significance of a test means: sampling distribution, hypotheses, statistical significance, and Type I error. It adds the concepts that we need to interpret a test that is not statistically significant: effect size, practical relevance, Type II error, and test power.