12.2 Statistical Tests

A statistical test determines whether a statement about a population is plausible given a sample drawn from that population. In essence, a statistical test answers the question: Is the sample that we have drawn sufficiently probable if the assumption about the population were true?

We need several ingredients to apply a statistical test:

  1. An assumption about a population.

  2. A criterion to decide if the assumption is sufficiently plausible.

  3. A sample from the population supplying information about the assumption.

  4. A probability for the sample showing how plausible the assumption is.

This section discusses these four ingredients of a statistical test. The assumption about a population is the null hypothesis of the test (Section 12.2.1). We select a significance level, usually five per cent, as the criterion for deciding whether the null hypothesis is sufficiently plausible. If it is not sufficiently plausible, we reject the null hypothesis. The values for which we reject the null hypothesis constitute the rejection region of the test (Section 12.2.2). We need a sample to test whether the assumption about the population is sufficiently plausible. Finally, we let the computer calculate the probability (p value) of drawing a sample that differs at least as much from the null hypothesis as the sample that we have drawn. If this probability is smaller than the selected significance level, the sample is in the rejection region, so we must reject the null hypothesis (Section ??). This concludes the statistical test.
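As a rough illustration of these ingredients, the Python sketch below computes a two-sided p value for a null hypothesis about a population mean and compares it to the significance level. It is a minimal sketch, not necessarily what statistical software does for you: it assumes a z test with a known population standard deviation, and the function names, the ten sample scores, and `sigma = 1.0` are hypothetical.

```python
from statistics import NormalDist, mean

def two_sided_p_value(sample, mu0, sigma):
    """Two-sided p value for H0: the population mean equals mu0.

    Assumes the population standard deviation `sigma` is known (a z test);
    this is a simplifying assumption for the sketch.
    """
    se = sigma / len(sample) ** 0.5      # standard error of the sample mean
    z = (mean(sample) - mu0) / se        # distance from H0 in standard errors
    # Probability of a sample mean at least this far from mu0, in either direction
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p, alpha=0.05):
    """Reject H0 if the p value is smaller than the significance level."""
    return "reject H0" if p < alpha else "do not reject H0"

# Hypothetical sample of ten scores; H0 states a population mean of 5.5
sample = [4.8, 5.1, 5.0, 4.6, 5.3, 4.9, 5.2, 4.7, 5.0, 4.4]
p = two_sided_p_value(sample, mu0=5.5, sigma=1.0)
print(round(p, 3), decide(p))
```

With these hypothetical numbers the p value is about .058, slightly above .05, so the sample is not in the rejection region and the null hypothesis is not rejected. In practice the population standard deviation is rarely known, and a t test would usually replace the z test.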

12.2.1 The null hypothesis

The most important statistical hypothesis is called the null hypothesis (H0). The null hypothesis specifies one value for a population statistic, for example, that average media literacy of children in the population is 5.5.

We can test this statement about the population by drawing a random sample of children from the population and measuring their media literacy. Once we have the measurements, we can calculate average media literacy in the sample and compare it to the hypothesized average media literacy in the population. If the two are not too far apart, we conclude that the null hypothesis is plausible. If they are too far apart, we consider the null hypothesis implausible and reject it.
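The comparison described above boils down to computing the sample average and its difference from the hypothesized value. In this minimal sketch, the scores and the helper name `compare_to_h0` are hypothetical, and 5.5 is the hypothesized population average used in this section's example. The code only measures the distance; deciding whether that distance is too large is the job of the significance level.

```python
from statistics import mean

H0_MEAN = 5.5  # hypothesized average media literacy in the population

def compare_to_h0(scores, mu0=H0_MEAN):
    """Hypothetical helper: return the sample average and its difference from mu0."""
    sample_mean = mean(scores)
    return sample_mean, sample_mean - mu0

# Hypothetical media literacy scores of a random sample of children
scores = [6.0, 5.2, 4.9, 5.8, 5.1, 5.4]
avg, diff = compare_to_h0(scores)
print(f"sample average {avg:.2f}, difference from H0 {diff:+.2f}")
```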

12.2.2 Significance level (\(\alpha\)), significance, rejection region, and Type I error

How far apart must the sample statistic value and the hypothesized population value be to conclude that the null hypothesis is not plausible? The null hypothesis is implausible if the sample that we have drawn is among the samples that are very unlikely if the null hypothesis is true. A commonly accepted threshold value is that the sample is among the five per cent most unlikely samples. This threshold is called the significance level of the test. It is often represented by the symbol \(\alpha\) (the Greek letter alpha). If our sample is among the five per cent most unlikely samples, we reject the null hypothesis and we say that the test is statistically significant.

Note that we can construct a sampling distribution for the null hypothesis only if the hypothesis specifies one value for the population statistic. If our null hypothesis contained multiple population values, for example, that average media literacy is 5.5, 5.0, or 4.5 in the population, we would have multiple sampling distributions, one for each value. This is why the null hypothesis must specify a single value.
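The need for a single value can be illustrated with a small sketch that builds one sampling distribution per hypothesized value and decides on each separately. It assumes normal sampling distributions, and the observed sample mean (5.0), the standard error (0.2), and the helper name `decisions` are hypothetical.

```python
from statistics import NormalDist

def decisions(sample_mean, se, hypothesized, alpha=0.05):
    """Hypothetical helper: decide on H0 separately for each hypothesized mean.

    Each value gets its own sampling distribution, assumed normal with that
    mean and standard deviation se (the standard error of the sample mean).
    """
    result = {}
    for mu0 in hypothesized:
        cdf = NormalDist(mu=mu0, sigma=se).cdf(sample_mean)
        p = 2 * min(cdf, 1 - cdf)   # two-sided p value under this distribution
        result[mu0] = p < alpha     # True means: reject H0
    return result

# Hypothetical values: observed sample mean 5.0, standard error 0.2
print(decisions(5.0, 0.2, hypothesized=(5.5, 5.0, 4.5)))
```

With these numbers, the same sample mean leads to rejection if the population value is 5.5 or 4.5, but not if it is 5.0, so a null hypothesis listing all three values would not yield one decision.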

According to our null hypothesis, the population average is 5.5. If average media literacy of children in the population really were 5.5, which sample averages would be most unlikely? We can answer this question with a hypothetical sampling distribution that has 5.5 as its mean.

Average media literacy can be too low to maintain the null hypothesis that it is 5.5 in the population, but it can also be too high. The significance level of five per cent is therefore divided into two halves of 2.5 per cent, one for each tail of the sampling distribution. Graphically speaking (Figure 4.8), the significance level cuts off a part of the left-hand tail and a part of the right-hand tail of the sampling distribution. Sample means in these tails are too unlikely to occur if the null hypothesis is true.

These values constitute the rejection region of the test. If the sample statistic is in the rejection region, we reject the null hypothesis. This is the rule of the game. However, rejecting the null hypothesis does not prove that it is wrong. Perhaps average media literacy really is 5.5 in the population, but we were unfortunate enough to draw a sample of children with very low media literacy scores. This error is called a Type I error: rejecting a null hypothesis that is actually true.
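The rejection region for the null hypothesis that the population average is 5.5 can be computed by cutting 2.5 per cent off each tail of the sampling distribution. This sketch assumes a normal sampling distribution with a hypothetical standard error of 0.2; the helper names `rejection_region` and `in_rejection_region` are illustrative.

```python
from statistics import NormalDist

def rejection_region(mu0, se, alpha=0.05):
    """Boundaries of the two-sided rejection region for a sample mean.

    The sampling distribution is assumed to be normal with mean mu0 and
    standard deviation se; alpha is split equally over the two tails.
    """
    dist = NormalDist(mu=mu0, sigma=se)
    lower = dist.inv_cdf(alpha / 2)       # cuts off the left-hand tail
    upper = dist.inv_cdf(1 - alpha / 2)   # cuts off the right-hand tail
    return lower, upper

def in_rejection_region(sample_mean, lower, upper):
    """True if the sample mean lies in either tail, so we reject H0."""
    return sample_mean < lower or sample_mean > upper

lower, upper = rejection_region(mu0=5.5, se=0.2)  # se = 0.2 is hypothetical
print(f"reject H0 if the sample mean is below {lower:.3f} or above {upper:.3f}")
```

Under these assumptions the region consists of sample means below about 5.108 or above about 5.892, so a sample mean of 5.0 leads us to reject the null hypothesis. If the population average really is 5.5, that decision is a Type I error.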

We don’t know whether or when we make this error. We cannot entirely avoid this error because samples can be very different from the population from which they are drawn, as we learned in Chapter 1. Thankfully, however, we know the probability that we make this error. This probability is the significance level.

You should understand the exact meaning of probabilities here. A significance level of .05 allows five per cent of all possible samples to be so different from the population that we reject the null hypothesis even if it is true.

In other words, if we draw many samples and decide on the null hypothesis for each sample, we would reject a true null hypothesis in five per cent of our decisions. So we have a five per cent chance of making a Type I error. We decide on that probability when we select the significance level of the test. We consider five per cent (.05) an acceptable probability of making this type of error.
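The long-run meaning of the significance level can be checked with a small simulation: draw many samples from a population in which the null hypothesis is true, test each one, and count how often the true null hypothesis is rejected. The normal population, its standard deviation of 1.0, the sample size of 30, the use of a z test, the function name, and the seed are all hypothetical choices for this sketch.

```python
import random
from statistics import NormalDist

def type_i_error_rate(n_samples=10_000, n=30, mu0=5.5, sigma=1.0,
                      alpha=0.05, seed=1):
    """Share of samples in which a TRUE null hypothesis is rejected.

    Hypothetical setup: the population is normal with mean mu0 (so H0 is
    true) and known standard deviation sigma; each sample is tested with a
    two-sided z test at significance level alpha.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if abs(z) > z_crit:
            rejections += 1  # Type I error: H0 is true but rejected
    return rejections / n_samples

print(type_i_error_rate())  # prints a value close to 0.05
```

The printed share varies slightly with the seed but stays close to the chosen significance level; calling the function with `alpha=0.10` raises it to about ten per cent.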