Alternatives to NHST

Klinkenberg

University of Amsterdam

2024-09-23

H0 and HA distribution

G*Power

Determine the required sample size for a desired test power, significance level, and effect size.

G*Power is a tool to compute statistical power analyses for many different \(t\) tests, \(F\) tests, \(\chi^2\) tests, \(z\) tests and some exact tests.

gpower.hhu.de

The problem with
P-values

There is no problem

The problem with P-values is that they are often misunderstood and misinterpreted. The P-value is the probability of observing a sample statistic as or more extreme as the one obtained, given that the null hypothesis is true. It is NOT the probability that the null hypothesis is true. The P-value is NOT a measure of the strength of the evidence against the null hypothesis.

The misinterpretation is the problem, and not adhering to the Nayman-Pearson paradigm

The dance of the P-value

Confidence Interval

Testing hypothesis with CI’s

Using confidence intervals to test hypotheses is a common practice in scientific research.

We can use this knowledge to test hypotheses. If the confidence interval does not contain the null hypothesis value, the null hypothesis can be rejected.

Note that this is not the same as our decision rule for NHST. In NHST, we compare the p-value to the alpha level. In the case of the confidence interval, we compare the confidence interval to the null hypothesis value.

Standard Error

95% confidence interval

\[SE = \frac{\text{Standard deviation}}{\text{Square root of sample size}} = \frac{s}{\sqrt{n}}\]

Lowerbound = \(\bar{x} - 1.96 \times SE\)
Upperbound = \(\bar{x} + 1.96 \times SE\)

Candy weight example

Plot CI

5 out of 100 samples

Common Misinterpretations

Confidence intervals and levels are frequently misunderstood, and published studies have shown that even professional scientists often misinterpret them (Wikipedia, 2024)

Hoekstra, Morey, Rouder, & Wagenmakers (2014) administerred the following questionair to 120 researchers.

Important

All of the statements are false

Researcher don’t know

Table 2 from Hoekstra et al. (2014)
#True	First-Year Students (n = 442)	Master Students (n = 34)	Researchers (n = 118)
0	2%	0%	3%
1	6%	24%	9%
2	14%	18%	14%
3	26%	15%	25%
4	30%	12%	22%
5	15%	21%	16%
6	7%	12%	11%

Conclusion

Use both NHST and confidence intervals

Example

We also studied the validity by comparing the mean ability ratings of children in different grades. We expected a positive relation between grade and ability. Figure 5 shows the average ability rating for each grade and domain. As expected, children in older age groups had a higher rating than children in younger age groups. In all four domains, there is an overall significant effect of grade: addition \(F(5,1456)=1091.4,p<.01,\omega^2=.78\), subtraction \(F(5,1363)=780.5,p<.01,\omega^2=.74\), multiplication \(F(5,1215)=409.6,p<.01,\omega^2=.62\), and \(F(5,973)=223.31,p<.01,\omega^2=.53\) for division. Levene’s tests show differences in variances for the domains multiplication and division. However, the non-parametric Kruskal-Wallis tests also show significant differences for these domains: \(\chi^2(5)=753.28,p<.01\) for multiplication and \(\chi^2(5)=505.17,p<.01\) for division. For all domains, post hoc analyses show significant differences between all grades, except for the differences between grades five and six (Klinkenberg, Straatemeier, & van der Maas, 2011).

End

Contact

References

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.

Klinkenberg, S., Straatemeier, M., & van der Maas, H. L. J. (2011). Computer adaptive practice of maths ability using a new item response model for on the fly ability and difficulty estimation. Computers & Education, 57(2), 1813–1824. https://doi.org/https://doi.org/10.1016/j.compedu.2011.02.003

Wikipedia. (2024). Confidence interval — Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Confidence%20interval&oldid=1207611938.

#True	First-Year Students (n = 442)	Master Students (n = 34)	Researchers (n = 118)
0	2%	0%	3%
1	6%	24%	9%
2	14%	18%	14%
3	26%	15%	25%
4	30%	12%	22%
5	15%	21%	16%
6	7%	12%	11%

#True	First-Year Students (n = 442)	Master Students (n = 34)	Researchers (n = 118)
0	2%	0%	3%
1	6%	24%	9%
2	14%	18%	14%
3	26%	15%	25%
4	30%	12%	22%
5	15%	21%	16%
6	7%	12%	11%

#True	First-Year Students (n = 442)	Master Students (n = 34)	Researchers (n = 118)
0	2%	0%	3%
1	6%	24%	9%
2	14%	18%	14%
3	26%	15%	25%
4	30%	12%	22%
5	15%	21%	16%
6	7%	12%	11%