Chapter 11 Critical Discussion of Null Hypothesis Significance Testing
Key concepts: problems with null hypothesis significance testing, meta-analysis, replication, frequentist versus Bayesian inference, theoretical population, data generating process.
Watch this micro lecture on criticisms of null hypothesis significance testing for an overview of the chapter.
Summary
In the preceding chapters, we learned to test null hypotheses. Null hypothesis significance testing is widely used in the social and behavioral sciences. There are, however, problems with null hypothesis significance tests that are increasingly being recognized.
The statistical significance of a null hypothesis test depends strongly on the size of the sample (Chapter 4 and Section 4.2.3), so non-significance may merely mean that the sample is too small. In contrast, irrelevant tiny effects can be statistically significant in a very large sample. Finally, we normally test a null hypothesis that there is no effect whereas we have good reasons to believe that there is an effect in the population. What does a significant test result really tell us if we reject an unlikely null hypothesis?
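The dependence of significance on sample size can be illustrated with a small calculation. The sketch below is a minimal illustration, not part of the original chapter: it assumes a simple z test of a mean with a known standard deviation, and the effect sizes and sample sizes are invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(effect, sd, n):
    """Two-sided p value for a z test of H0: effect = 0, assuming sd is known."""
    z = effect / (sd / sqrt(n))  # effect divided by its standard error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Invented numbers: a substantial effect (0.5 SD) in a very small sample ...
p_small_sample = two_sided_p(effect=0.5, sd=1.0, n=10)
# ... and a negligible effect (0.02 SD) in a very large sample.
p_large_sample = two_sided_p(effect=0.02, sd=1.0, n=100_000)

print(f"Effect 0.50, n = 10:      p = {p_small_sample:.3f}")
print(f"Effect 0.02, n = 100000:  p = {p_large_sample:.2e}")
```

Under these assumptions, the substantial effect is not statistically significant at the 5% level in the small sample, whereas the negligible effect is highly significant in the large sample.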
Among the alternatives to null hypothesis significance testing, using a confidence interval to estimate effects in the population is easiest to apply. It is closely related to null hypothesis testing, as we have seen in Section ??, but it offers us information with which we can draw a more nuanced conclusion about our results.
11.0.1 Knocking down straw men (over and over again)
There is another aspect of the practice of null hypothesis significance testing that is not very satisfactory. Remember that null hypothesis testing was presented as a means for the researcher to use previous knowledge as input to her research (Section 4.1). The development of science requires us to expand existing knowledge. Does this really happen in the practice of null hypothesis significance testing?
Imagine that previous research has taught us that one additional unit of exposure to advertisements for a brand increases a person’s brand awareness on average by 0.1 unit if we use well-tested standard scales for exposure and brand awareness. If we want to use this knowledge in our own research, we would hypothesize that the regression coefficient of exposure is 0.1 in a regression model predicting brand awareness.
Well, try to test this null hypothesis in your favourite statistics software. Can you actually tell the software that the null hypothesis for the regression coefficient is 0.1? Most likely you can’t, because the software automatically tests the null hypothesis that the regression coefficient is zero in the population.
This approach is so prevalent that null hypotheses equating the population value of interest to zero have received a special name: the nil hypothesis or the nil for short (see Section ??). How can we include previous knowledge in our test if the software always tests the nil?
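If the software reports the estimated coefficient and its standard error, a null hypothesis other than the nil can be tested by hand: subtract the hypothesized value from the estimate and divide by the standard error. The sketch below is only an illustration under stated assumptions. The estimate (0.14) and standard error (0.03) are invented, the function name `coefficient_test` is hypothetical, and a normal approximation stands in for the t distribution, which is reasonable only for large samples.

```python
from statistics import NormalDist


def coefficient_test(estimate, std_error, null_value=0.0):
    """Test H0: coefficient = null_value using a large-sample normal approximation.

    Returns the test statistic and the two-sided p value.
    """
    z = (estimate - null_value) / std_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Invented regression output, for illustration only.
b, se = 0.14, 0.03

# The test that software usually reports: the nil hypothesis (coefficient = 0).
z_nil, p_nil = coefficient_test(b, se, null_value=0.0)

# The test suggested by previous research: coefficient = 0.1.
z_prior, p_prior = coefficient_test(b, se, null_value=0.1)

print(f"H0: b = 0.0  ->  z = {z_nil:.2f}, p = {p_nil:.4f}")
print(f"H0: b = 0.1  ->  z = {z_prior:.2f}, p = {p_prior:.4f}")
```

With these invented numbers, the nil is rejected, but the hypothesis based on previous knowledge is not: the same estimate yields a very different conclusion depending on which null hypothesis is tested.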
The null hypothesis that there is no association between the independent variable and the dependent variable in the population may be interesting to reject if you really have no clue about the association. But in the example above, previous knowledge makes us expect a positive association of a particular size. Here, it is not interesting to reject the null hypothesis of no association. The null hypothesis of no association is a straw man in this example. It is unlikely to stand the test and nobody should applaud if we knock it down. Rejecting an unlikely statement is called a straw man argument in rhetoric.
Rejecting the nil time and again should make us wonder about scientific progress and our contribution to it. Are we knocking down straw man hypotheses over and over again? Is there no way to accumulate our efforts?