11.3 Sample Size, Effect Size, and Power
Finally, after all these sections, we can answer the question raised at the beginning of this chapter: How large should my sample be? To answer it, we must consider effect size, type of test, significance level, and test power.
Sample size, statistical significance, effect size, and test power are related. To determine the size of your sample, you have three sliders that you should adjust together: significance level, effect size, and test power. Significance level is the easiest slider to set; we usually leave it at .05. We do not choose a smaller value because, with the same sample size and effect size, it reduces the power of the test, as you may have noticed while answering Question 4.
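The trade-off between significance level and power can be checked numerically. This is a minimal sketch, assuming a two-sided test comparing two independent group means with equal group sizes, a standardized effect size (Cohen's d), and the normal approximation to the t distribution; the function name `approx_power` is hypothetical, not part of any package.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test (normal approximation).

    d: standardized effect size (difference in means / common SD)
    n_per_group: number of observations in each of the two groups
    alpha: significance level
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)     # critical value for a two-sided test
    shift = d * sqrt(n_per_group / 2)     # expected test statistic under H1
    # Probability that the test statistic lands in either rejection region
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

# Same effect size and sample size, two significance levels
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: power = {approx_power(0.5, 64, alpha):.2f}")
```

With d = 0.5 and 64 participants per group, lowering the significance level from .05 to .01 cuts the approximate power from about .81 to about .60. An exact t-test calculation gives slightly lower values, but the direction of the effect is the same.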
For effect size, we have to choose between a small, moderate, or large effect. Results of previous research similar to our project can help us decide whether we should expect a small effect or a larger one. If we have a concrete number for the (standardized) minimum effect size that is of practical relevance, we can use that number instead.
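If a previous study reports group means and standard deviations, we can turn them into a standardized effect size. A minimal sketch, assuming the effect of interest is a difference between two group means expressed as Cohen's d with a pooled standard deviation, and using Cohen's conventional benchmarks (0.2 small, 0.5 moderate, 0.8 large); the function names and the study figures are hypothetical.

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)

def label_effect(d):
    """Classify |d| with Cohen's conventional benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical figures from an earlier, similar study
d = cohens_d(mean1=6.1, sd1=1.5, n1=40, mean2=5.5, sd2=1.5, n2=40)
print(round(d, 2), label_effect(d))
```

Here the hypothetical earlier study yields d = 0.4, a small effect, which would suggest planning for a small rather than a moderate effect.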
For power, the conventional rule of thumb is that we want at least an 80 per cent probability of rejecting a false null hypothesis. Note that the probability of not rejecting a true null hypothesis is higher, namely 95 per cent: the probability of rejecting a true null hypothesis, which is the significance level, is usually set to five per cent, so the probability of not rejecting a true null hypothesis is 100 - 5 = 95 per cent.
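In symbols, writing α for the significance level and β for the probability of a Type II error (not rejecting a false null hypothesis), the conventional settings are:

```latex
\begin{aligned}
P(\text{reject } H_0 \mid H_0 \text{ true}) &= \alpha = .05, &
P(\text{not reject } H_0 \mid H_0 \text{ true}) &= 1 - \alpha = .95,\\
P(\text{not reject } H_0 \mid H_0 \text{ false}) &= \beta = .20, &
P(\text{reject } H_0 \mid H_0 \text{ false}) &= 1 - \beta = .80 \;\text{(power)}.
\end{aligned}
```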
Power is set to a lower level because the null hypothesis is usually assumed to reflect our best knowledge about the world. From this perspective, we are keener to avoid falsely rejecting the null hypothesis (our current best knowledge) than falsely not rejecting (accepting) it. This approach, however, is not without criticism, as we will discuss in Chapter 12. In any case, if you want to raise the power to .95, you can do so; it will require a larger sample.
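How much larger? A minimal sketch using the normal-approximation formula for a two-sided comparison of two independent means with equal group sizes, n per group ≈ 2((z_{1-α/2} + z_{1-β}) / d)², where d is Cohen's d. The function name is hypothetical, and exact t-test calculations, such as those done by dedicated software, give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate sample size per group for a two-sided two-sample test.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2 and rounds up,
    because we cannot sample a fraction of a participant.
    """
    z_alpha = Z.inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_power = Z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

for power in (0.80, 0.95):
    print(f"power = {power:.2f}: n per group = {approx_n_per_group(0.5, power)}")
```

For a moderate effect (d = 0.5) at α = .05, raising the power from .80 to .95 increases the approximate requirement from 63 to 104 participants per group.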
Unfortunately, test power receives little attention in several software packages for statistical analysis, which usually cannot calculate the required sample size from power and effect size. For this calculation, we need dedicated software, for example G*Power.
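When dedicated software is not at hand, power can also be estimated by simulation: draw many samples from populations in which the null hypothesis is false, run the test on each sample, and count how often it rejects. A minimal sketch, assuming two normally distributed groups with a known standard deviation of 1, so that a two-sided z-test can stand in for the t-test; the function name and the settings are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulated_power(d, n_per_group, alpha=0.05, reps=2000, seed=1):
    """Share of simulated studies in which a two-sided z-test rejects H0.

    Group 1 is drawn from N(d, 1) and group 2 from N(0, 1), so the true
    standardized effect size is d (the null hypothesis is false when d != 0).
    """
    rng = random.Random(seed)                     # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sqrt(2 / n_per_group)                    # SE of the mean difference when SD = 1
    rejections = 0
    for _ in range(reps):
        g1 = [rng.gauss(d, 1) for _ in range(n_per_group)]
        g2 = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(g1) - fmean(g2)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

print(simulated_power(0.5, 63))  # should land near the target power of .80
```

With d = 0 the same function estimates the Type I error rate, which should land near α = .05. Simulation is slower and less precise than an analytic calculation, but it extends easily to designs that power software does not cover.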
11.3.1 So how do we determine sample size?
All in all, using effect size and test power to determine the size of the sample requires several decisions on the part of the researcher.
Of course, it is important to ensure that our sample meets the minimum requirements of the tests that we want to use (Section ??). In practice, researchers often go well beyond this minimum, collecting as large a sample as is feasible just to be on the safe side.
Does this mean that all we have learned about effect size and test power is useless? Certainly not. First of all, we should have learned that effect size is more important than statistical significance because effect size relates to practical relevance.
Second, test power and Type II errors are important in situations in which we do not reject the null hypothesis. In that case, we should calculate test power to get an impression of how much confidence we can have in the result. Does our test have sufficient power to yield significant results if there is an effect in the population? This is the topic of the next section.
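As a preview, such a check might look like this. A minimal sketch, assuming a completed two-group study with 30 participants per group that did not reject the null hypothesis, a smallest effect of practical relevance of d = 0.5, and the normal approximation to a two-sided two-sample test; all figures and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    shift = d * sqrt(n_per_group / 2)     # expected test statistic under H1
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

# Hypothetical non-significant study: 30 participants per group,
# smallest effect of practical relevance d = 0.5
power = approx_power(0.5, 30)
print(f"power = {power:.2f}")
if power < 0.80:
    print("Underpowered: a relevant effect could easily have been missed.")
```

An approximate power of about .49 means the study had roughly a coin-flip chance of detecting an effect of that size, so the non-significant result says little about whether such an effect exists in the population.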