4.8 Test Recipe and Rules for Reporting
Testing a null hypothesis consists of several steps, which are summarized below, much like a recipe in a cookbook.
- Specify the statistical hypotheses.
In the first step, translate the research hypothesis into a null hypothesis and an alternative hypothesis. This requires choosing the right statistic for testing the research hypothesis (Section 4.3.1) and, if applicable, choosing between a one-sided and a two-sided test (Section 4.5).
- Select the significance level of the test.
Before we execute the test, we have to choose the maximum probability of rejecting the null hypothesis if it is actually true. This is the significance level of the test. We almost always select .05 (5%) as the significance level. If we have a very large sample, e.g., several thousand cases, we may select a lower significance level, for instance, .01. See Chapter 4.1.6 for more details.
- Select how the sampling distribution is created.
Are you going to use bootstrapping, an exact approach, or a theoretical probability distribution? Theoretical probability distributions are the most common choice. If you are working with statistical software, you automatically select the correct probability distribution by selecting the correct test. For example, a test on the means of two independent samples in SPSS uses the t distribution.
- Execute the test.
Let your statistical software calculate the p value of the test and/or the value of the test statistic. It is important that this step comes after the first three steps. The choices in the first three steps should be made without knowledge of the results in the sample (see Section 4.10).
- Decide on the null hypothesis.
Reject the null hypothesis if the p value is lower than the significance level or if the value specified in the null hypothesis is outside the confidence interval.
- Report the test results.
The ultimate goal of the test is to increase our knowledge. To this end, we have to communicate our results both to fellow scientists and to the general reader who is interested in the subject of our research.
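The recipe above can be sketched in code. The example below is a minimal illustration, not a replacement for statistical software: it assumes a two-sided z test on one proportion that uses the normal approximation as the theoretical probability distribution. The function names and the sample counts (60 out of 100) are hypothetical and chosen only for illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def proportion_z_test(successes, n, p0=0.5, alpha=0.05):
    """Two-sided z test on one proportion (normal approximation).

    Step 1: H0 states that the population proportion equals p0;
            the alternative hypothesis is two-sided.
    Step 2: alpha is the significance level, chosen before the data are seen.
    Step 3: the sampling distribution is approximated by the standard
            normal distribution (an assumption of this sketch).
    """
    p_hat = successes / n
    # Standard error of the sample proportion if the null hypothesis is true.
    se_null = sqrt(p0 * (1.0 - p0) / n)
    # Step 4: execute the test.
    z = (p_hat - p0) / se_null
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    # Step 5: decide on the null hypothesis.
    reject = p_value < alpha
    return {"p_hat": p_hat, "z": z, "p_value": p_value, "reject": reject}


# Hypothetical sample: 60 of 100 households reached.
result_05 = proportion_z_test(60, 100, p0=0.5, alpha=0.05)
result_01 = proportion_z_test(60, 100, p0=0.5, alpha=0.01)
print(result_05)
print(result_01["reject"])
```

With these made-up counts, the same p value leads to rejection at the .05 level but not at the .01 level, which is why the significance level has to be fixed before the test is executed.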
4.8.1 Reporting to fellow scientists
Fellow scientists need to be able to see the precise statistical test results. According to the APA guidelines, we should report the name of the test statistic, the associated degrees of freedom (if any), the value of the test statistic, the p value of the test statistic, and the confidence interval (if any). APA requires a particular format for presenting statistical results, and it requires that the results be placed at the end of a sentence.
The statistical results for a t test on one mean, for example, would be:
The degrees of freedom are between parentheses directly after the name of the test statistic. Chi-squared tests add sample size to the degrees of freedom, for instance: chi-squared (12, N = 89) = 23.14, p = .027.
The value of the test statistic is 2.73 in this example.
The p value is .004. Note that we report all results with two decimal places except probabilities, which are reported with three decimal places. We are usually interested in small probabilities (less than .05), so we need the third decimal place here. If SPSS rounds the p value to .000, report: p < .001. Add (one-sided) after the p value if the test is one-sided.
The 95% confidence interval is 4.13 to 4.87, so with 95% confidence we state that the population mean is between 4.13 and 4.87. Add (bootstrapped) after the confidence interval if the confidence interval is bootstrapped.
Not all tests produce all results reported in the example above. For example, a z test does not have degrees of freedom and F or chi-squared tests do not have confidence intervals. Exact tests or bootstrap tests usually do not have a test statistic. Just report the items that your statistical software produces, and give them in the correct format.
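The formatting rules above can be collected in a small helper. This is a sketch under stated assumptions: the function names `format_p` and `format_t_result` are hypothetical, the degrees of freedom (20) are a placeholder because the text does not state them, and the bracket notation for the confidence interval follows common APA usage rather than anything specified in this section.

```python
def format_p(p, one_sided=False):
    """Format a p value with three decimals and no leading zero."""
    rounded = f"{p:.3f}"
    if rounded == "0.000":
        # A p value that rounds to .000 is reported as p < .001.
        text = "p < .001"
    else:
        text = "p = " + rounded.lstrip("0")
    if one_sided:
        text += " (one-sided)"
    return text


def format_t_result(df, t, p, ci_low, ci_high,
                    one_sided=False, bootstrapped=False):
    """Format a t test result: statistic, df, value, p value, 95% CI."""
    # Degrees of freedom go between parentheses after the test statistic name;
    # the test statistic and interval bounds get two decimal places.
    text = f"t ({df}) = {t:.2f}, {format_p(p, one_sided)}"
    text += f", 95% CI [{ci_low:.2f}, {ci_high:.2f}]"
    if bootstrapped:
        text += " (bootstrapped)"
    return text


# Values from the text; df = 20 is a hypothetical placeholder.
report = format_t_result(20, 2.73, 0.004, 4.13, 4.87)
print(report)
print(format_p(0.0000003))
```

For tests that lack some of these items, such as a z test without degrees of freedom, the corresponding part of the string would simply be left out.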
4.8.2 Reporting to the general reader
For fellow scientists and especially for the general reader, it is important to read an interpretation of the results that clarifies both the subject of the test and the test results. Make sure that you tell your reader who or what the test is about:
- What is the population that you investigate?
- What are the variables?
- What are the values of the relevant sample statistics?
- Which comparison(s) do you make?
- Are the results statistically significant and, if so, what are the estimates for the population?
- If the results are statistically significant, how large are the differences or associations?
A test on one proportion, for example, the proportion of all households reached by a television station, could be reported as follows:
The interpretation of this test tells us the population (“all households in Greece”), the variable (“reaching a household”) and the sample statistic of interest (61%, indicating a proportion). It tells us that the result is statistically significant, which a fellow scientist can check with the reported p value.
Note that the actual p value is well below .001. If we rounded it to three decimal places, it would become .000. This suggests that the probability is zero, but there is always some probability of rejecting the null hypothesis if it is true. For this reason, APA wants us to report p < .001 instead of p = .000.
Finally, the interpretation tells us that the difference from .5 is substantial. Sometimes, we can express the difference in a number, which is called the effect size, and give a more precise interpretation (see Chapter 4.1.6 for more information).