12.3 Take-Home Points

  • We use a statistical test if we want to decide on a null hypothesis: reject or not reject?

  • The decision rules should be specified beforehand: Decide on the direction of the test (one-sided or two-sided) and the significance level.

  • The null and alternative hypotheses always concern a population statistic. Together they cover all possible outcomes for the statistic. The null hypothesis always specifies one (boundary) value for the population statistic.

  • We reject the null hypothesis if a test is statistically significant. This means that the p value is below the significance level. The p value is a conditional probability: assuming that the null hypothesis is true, it is the probability of drawing a sample with the current outcome or a more extreme outcome (one even more inconsistent with the null hypothesis).

  • A statistically significant test does not prove that the null hypothesis is false. We can make a Type I error: rejecting a true null hypothesis.

  • The 95% confidence interval includes all null hypotheses that would not be rejected by our current sample in a two-sided test at the five per cent significance level. It contains the population values that are not sufficiently contradicted by the sample data.

  • The calculated p value is only correct if the data is used for no more than one null hypothesis test and the null hypothesis was formulated beforehand.

  • If the same data is used for more than one null hypothesis test, the probability of a Type I error increases. We obtain too many significant results, which is called capitalization on chance.
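
Two of the points above can be checked with a few lines of arithmetic. The sketch below is an illustration, not the procedure of any particular statistics package. It rests on simplifying assumptions: a z test on a mean with a known population standard deviation, and multiple tests that are independent of each other with all null hypotheses true. The function names and the example numbers are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_value, sigma, n):
    """Two-sided p value of a z test on a mean.

    Assumes the population standard deviation `sigma` is known.
    """
    standard_error = sigma / n ** 0.5
    z = (sample_mean - null_value) / standard_error
    # Probability of an outcome at least this extreme in either tail,
    # given that the null hypothesis is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for the population mean (sigma assumed known)."""
    standard_error = sigma / n ** 0.5
    z_critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_critical * standard_error
    return sample_mean - margin, sample_mean + margin


def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error in `n_tests` tests.

    Assumes the tests are independent and every null hypothesis is true.
    """
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    # Hypothetical sample: mean 5.2, known sigma 1.0, 25 observations.
    mean, sigma, n = 5.2, 1.0, 25
    lower, upper = confidence_interval(mean, sigma, n)
    print(f"95% confidence interval: [{lower:.3f}, {upper:.3f}]")

    # Null values inside the interval are not rejected at the 5% level;
    # null values outside the interval are rejected.
    for null_value in (5.0, 4.7):
        p = two_sided_p_value(mean, null_value, sigma, n)
        inside = lower <= null_value <= upper
        print(f"H0 = {null_value}: p = {p:.4f}, inside interval: {inside}")

    # Capitalization on chance: the Type I error risk grows with each test.
    for k in (1, 5, 20):
        print(f"{k} tests: familywise error rate = "
              f"{familywise_error_rate(0.05, k):.3f}")
```

In this example, the null value 5.0 lies inside the confidence interval and its p value is above .05, while 4.7 lies outside the interval and its p value is below .05. Under the independence assumption, twenty tests at the five per cent level give a probability of roughly .64 of at least one Type I error.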
