5.2 Hypothetical World Versus Imaginary True World

In the preceding paragraphs, we determined sample size using the effect size that we expect to find in our sample. We should realize, however, that we are interested in the effect size in the population: the ‘true’ effect size, so to speak.

The effect of a new medicine or media campaign in our sample is not important but the effect in the population is. This complicates the calculation of the sample size that we need. Instead of using the effect size in our (future!) sample, we must use the effect size in the population.

5.2.1 Imagining a population with a small effect

Our null hypothesis states that average candy weight in the population is 2.8 grams. Let us decide that a small effect size is practically relevant. We can think now of a population that could be the true population if the effect size is small. For example, a population in which average candy weight is 2.9 grams (and the standard deviation is 0.5).
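As a quick check on why this population counts as a small effect, assume that effect size is expressed as Cohen's d, the standardized difference between the two population means. The symbols \(\mu_{H_0}\), \(\mu_{H_1}\), and \(\sigma\) are introduced here for illustration only:

\[
d = \frac{\mu_{H_1} - \mu_{H_0}}{\sigma} = \frac{2.9 - 2.8}{0.5} = 0.2
\]

A value of 0.2 is the conventional rule-of-thumb threshold for a small effect.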

We do not know whether average candy weight is 2.9 grams in the true population. So we may regard the statement that average candy weight is 2.9 grams in the population as another hypothesis. Let us call this the alternative hypothesis H1. Note that this is not an ordinary alternative hypothesis because it does not include all outcomes not covered by the null hypothesis (H0). Instead, it represents only one value, which is an important value to us because it represents a population with an interesting effect size.

Figure 5.3 illustrates this situation. Before reading on, try to make sense of the steps in this figure. The questions accompanying the figure walk you through these steps.

Figure 5.3: The relation between significance tests, effect size, Type II error, and power.

5.2.2 The world of the researcher

We have two populations, a hypothetical population defined by our null hypothesis (H0) and an imaginary true population defined by the alternative hypothesis (H1). Once we have drawn our sample, we only deal with the hypothetical population as we have done in all preceding chapters.

Acting as if the null hypothesis is true, we determine how (un)likely the sample is that we have drawn. If it is very unlikely, we have a p value below the significance level and we reject the null hypothesis. We say: If the null hypothesis is true, our sample is too unlikely, so we reject the null hypothesis.

We can be wrong. Perhaps the null hypothesis is actually true and we were just very unfortunate to draw a sample that is very different from the population. If so, we make a Type I error (see Section ??). The probability that we will make this error is the significance level of the test, which is usually set to .05.

This is what we are doing once we have the sample. Let us call this the world of the researcher.
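The decision rule in the world of the researcher can be sketched in a few lines of Python. This is a minimal illustration, not the procedure behind Figure 5.3: it assumes a two-sided z test with a known population standard deviation of 0.5, and the sample size `n = 200` is a hypothetical value chosen for the example because the text does not state one.

```python
from statistics import NormalDist

# Values from the running candy example
mu_h0 = 2.8    # average candy weight according to H0 (grams)
sigma = 0.5    # population standard deviation (grams), assumed known
alpha = 0.05   # significance level

# Assumption for illustration only: the text does not give a sample size
n = 200

# Sampling distribution of the sample mean if H0 is true
standard_error = sigma / n ** 0.5
sampling_dist_h0 = NormalDist(mu=mu_h0, sigma=standard_error)

# Two-sided test: half of alpha in each tail
lower_critical = sampling_dist_h0.inv_cdf(alpha / 2)
upper_critical = sampling_dist_h0.inv_cdf(1 - alpha / 2)


def reject_h0(sample_mean: float) -> bool:
    """Return True if the sample mean falls in the rejection region."""
    return sample_mean < lower_critical or sample_mean > upper_critical


def p_value(sample_mean: float) -> float:
    """Two-sided p value of the sample mean, acting as if H0 is true."""
    z = abs(sample_mean - mu_h0) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


print(f"Rejection region: below {lower_critical:.3f} or above {upper_critical:.3f}")
print(f"Sample mean 2.90: reject = {reject_h0(2.90)}, p = {p_value(2.90):.4f}")
print(f"Sample mean 2.82: reject = {reject_h0(2.82)}, p = {p_value(2.82):.4f}")
```

Note that everything in this sketch uses only the null hypothesis. The true population plays no role in the researcher's decision.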

5.2.3 The alternative world of a small effect

Now let us ask ourselves: What is going to happen to our statistical test if the true population from which we draw our sample has average candy weight that is a bit higher (small effect) than candy weight according to our null hypothesis?

If we actually sample from this imaginary true population, the sampling distribution centered around the alternative hypothesis (H1) in Figure 5.3 represents our true sampling distribution. It shows us the true probabilities (areas under the curve) of drawing a sample with a mean of at least or at most a particular value. These are the probabilities of our sample if there is a small effect in the population.

Now that we know the true sampling distribution if there is a small true effect in the population, we can foresee what is going to happen when we enter the world of the researcher. The researcher is going to use the rejection region to decide on the null hypothesis. If the sample mean is in the rejection region, the researcher is going to reject the null hypothesis. Otherwise, the null hypothesis is not rejected.

5.2.4 Type II error

If there is a (small) true effect in the population, the null hypothesis is not true. For example, average candy weight in the population is not 2.8 grams but 2.9 grams. If our sample mean is close to 2.8 grams, we may not reject the null hypothesis even if it is not true. This is a Type II error: not rejecting a false null hypothesis.

The probability that we make a Type II error if there is a small effect is expressed by the yellow section in Figure 5.3, Steps 3 and 4. It is usually denoted by the Greek letter beta (\(\beta\)). The yellow section represents the probability of drawing a sample from the population with a small effect size that is not in the rejection region, so the null hypothesis is not rejected.
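The probability of a Type II error can be computed as an area under the true sampling distribution. The sketch below assumes a two-sided z test with a known standard deviation of 0.5 and a hypothetical sample size of `n = 200` (the text does not state one), so the number it yields illustrates the calculation rather than reproducing the figure.

```python
from statistics import NormalDist

mu_h0 = 2.8    # population mean according to H0 (grams)
mu_h1 = 2.9    # population mean in the imaginary true population (grams)
sigma = 0.5    # population standard deviation (grams), assumed known
alpha = 0.05   # significance level
n = 200        # hypothetical sample size, chosen for illustration

standard_error = sigma / n ** 0.5

# The researcher builds the rejection region from the H0 sampling distribution
dist_h0 = NormalDist(mu=mu_h0, sigma=standard_error)
lower_critical = dist_h0.inv_cdf(alpha / 2)
upper_critical = dist_h0.inv_cdf(1 - alpha / 2)

# The samples are actually drawn from the H1 sampling distribution
dist_h1 = NormalDist(mu=mu_h1, sigma=standard_error)

# Beta: probability that the sample mean lands between the critical values,
# that is, outside the rejection region, when the true mean is 2.9 grams
beta = dist_h1.cdf(upper_critical) - dist_h1.cdf(lower_critical)

print(f"Probability of a Type II error (beta): {beta:.3f}")
```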

Table 5.1 summarizes the four possible situations that may arise if we test a null hypothesis. The null hypothesis may be true or false and we may or may not reject the null hypothesis.

Table 5.1: Error types and their probabilities.
| | Null is true | Null is false |
|---|---|---|
| Null is rejected | Type I error, significance level (\(\alpha\)) | No error, power (1 - \(\beta\)) |
| Null is not rejected | No error (1 - \(\alpha\)) | Type II error (\(\beta\)) |
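The probabilities in Table 5.1 can be checked by simulation: draw many samples from a world in which the null hypothesis is true and from a world in which it is false, and count how often the null hypothesis is rejected. The sketch below assumes normally distributed candy weights, a two-sided z test with a known standard deviation of 0.5, and hypothetical values for the sample size, the number of replications, and the random seed.

```python
import random
from statistics import NormalDist, fmean

MU_H0 = 2.8          # population mean according to H0 (grams)
MU_H1 = 2.9          # population mean if there is a small effect (grams)
SIGMA = 0.5          # population standard deviation (grams), assumed known
ALPHA = 0.05         # significance level
N = 200              # hypothetical sample size
REPLICATIONS = 1000  # hypothetical number of simulated samples per world

standard_error = SIGMA / N ** 0.5
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)


def rejection_rate(true_mean: float, rng: random.Random) -> float:
    """Share of simulated samples in which the researcher rejects H0."""
    rejections = 0
    for _ in range(REPLICATIONS):
        sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
        z = (fmean(sample) - MU_H0) / standard_error
        if abs(z) > z_critical:
            rejections += 1
    return rejections / REPLICATIONS


rng = random.Random(2024)  # fixed seed so the run is repeatable

# World 1: the null hypothesis is true, so every rejection is a Type I error
type_1_rate = rejection_rate(MU_H0, rng)

# World 2: the null hypothesis is false, so every rejection is correct (power)
power = rejection_rate(MU_H1, rng)

print(f"Null is true:  rejected {type_1_rate:.3f}, not rejected {1 - type_1_rate:.3f}")
print(f"Null is false: rejected {power:.3f}, not rejected {1 - power:.3f}")
```

In the first world the rejection rate should be close to the significance level. In the second world the share of samples that are not rejected estimates the probability of a Type II error.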

5.2.5 Power of the test

The probability of not making a Type II error is called the power of the test. It is equal to one minus the probability of making a Type II error, that is, 1 - \(\beta\). The power of the test is represented by the green section in Figure 5.3, Step 4. It represents the probability of getting a sample that makes the researcher reject the null hypothesis. So a false null hypothesis is rejected and we do not make an error.

In the example of Figure 5.3 Step 4, the power of the test is 84 per cent (0.84). If average candy weight is 2.9 grams in the population, we have 84 per cent probability of rejecting the null hypothesis that it is 2.8 grams if we draw a sample (of the current size) from this population. This is usually considered an acceptable level of power (see Section 5.3).
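Power depends on sample size, which is why the phrase "of the current size" matters. The sketch below computes power for several sample sizes, assuming a two-sided z test with a known standard deviation of 0.5. The function name `power_two_sided_z` and the sample sizes are hypothetical choices for illustration. The results need not match the 84 per cent in Figure 5.3, which depends on the sample size and test used there.

```python
from statistics import NormalDist


def power_two_sided_z(n: int, mu_h0: float = 2.8, mu_h1: float = 2.9,
                      sigma: float = 0.5, alpha: float = 0.05) -> float:
    """Power of a two-sided z test of H0 when the true mean equals mu_h1."""
    standard_error = sigma / n ** 0.5
    dist_h0 = NormalDist(mu=mu_h0, sigma=standard_error)
    dist_h1 = NormalDist(mu=mu_h1, sigma=standard_error)

    # Rejection region boundaries come from the H0 sampling distribution
    lower_critical = dist_h0.inv_cdf(alpha / 2)
    upper_critical = dist_h0.inv_cdf(1 - alpha / 2)

    # Beta is the area of the H1 sampling distribution between the boundaries
    beta = dist_h1.cdf(upper_critical) - dist_h1.cdf(lower_critical)
    return 1 - beta


# Hypothetical sample sizes, chosen only to show the trend
for n in (50, 100, 200, 400):
    print(f"n = {n:3d}: power = {power_two_sided_z(n):.2f}")
```

With the effect size held constant, power increases with sample size because the two sampling distributions become narrower and overlap less.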

Note that the power of a test is defined in relation to the true value in the population, in this case an average candy weight of 2.9 grams, with a true effect size of 0.1 gram (2.9 - 2.8). In research we often have no idea about the actual value in the population. We therefore often report the post hoc power and the observed effect size.

5.2.6 Post hoc power

The post hoc power refers to the probability of rejecting the null hypothesis assuming that the alternative hypothesis has a population mean equal to the observed sample mean or, more accurately, the observed test statistic. Though this is an often used approach, multiple replications of a research study will yield different sample means and therefore different values of post hoc power. As the true population mean is not a random variable, the actual power of a test should not vary.
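The contrast between fixed actual power and varying post hoc power can be illustrated with a small simulation. The sketch below assumes normally distributed candy weights, a two-sided z test with a known standard deviation of 0.5, and hypothetical values for the sample size, the number of replications, and the random seed. The helper `power_at` is introduced here for illustration.

```python
import random
from statistics import NormalDist, fmean

MU_H0 = 2.8      # population mean according to H0 (grams)
MU_TRUE = 2.9    # true population mean in this illustration (grams)
SIGMA = 0.5      # population standard deviation (grams), assumed known
ALPHA = 0.05     # significance level
N = 200          # hypothetical sample size

standard_error = SIGMA / N ** 0.5
std_normal = NormalDist()
z_critical = std_normal.inv_cdf(1 - ALPHA / 2)


def power_at(assumed_mean: float) -> float:
    """Power of the two-sided z test if the population mean is assumed_mean."""
    shift = (assumed_mean - MU_H0) / standard_error
    upper_tail = 1 - std_normal.cdf(z_critical - shift)
    lower_tail = std_normal.cdf(-z_critical - shift)
    return upper_tail + lower_tail


# Actual power: computed once from the fixed true population mean
true_power = power_at(MU_TRUE)

# Post hoc power: recomputed in each replication from the observed sample mean
rng = random.Random(1)  # fixed seed so the run is repeatable
post_hoc_powers = []
for _ in range(10):
    sample = [rng.gauss(MU_TRUE, SIGMA) for _ in range(N)]
    post_hoc_powers.append(power_at(fmean(sample)))

print(f"Actual power (same in every replication): {true_power:.2f}")
print("Post hoc power per replication:",
      ", ".join(f"{p:.2f}" for p in post_hoc_powers))
```

Each replication plugs a different observed sample mean into the same formula, so post hoc power scatters around while the actual power stays constant.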