12.5 Take-Home Points

  • True effect size is the difference between the hypothesized value and the true (population) value.

  • Observed effect size is the difference between the hypothesized value and the observed (sample) value.

  • Effect size is related to practical relevance. Effect sizes are expressed by (standardized) mean differences, regression coefficients, and measures of association such as the correlation coefficient, \(R^2\), and \(\eta^2\) (eta squared).

  • Statistical significance of a test depends on the observed effect size and sample size. Because sample size affects statistical significance, it is wrong to use significance or a p value as an indication of effect size.

  • If we do not reject a null hypothesis, this does not mean that the null hypothesis is true. We may make a Type II error: not rejecting a false null hypothesis. A researcher can make this error only if the null hypothesis is not rejected.

  • The probability of making a Type II error is commonly denoted with the Greek letter beta (\(\beta\)).

  • The probability of not making a Type II error, \(1 - \beta\), is the power of the test.

  • The power of a test tells us the probability that we reject the null hypothesis if there is an effect of a particular size in the population. The larger this probability, the more confident we are that we do not overlook an effect when we do not reject the null hypothesis.

  • A practical way to increase test power: Draw a larger sample.
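
The points about sample size, significance, and power can be illustrated with a small calculation. The sketch below is not taken from this chapter: it assumes a two-sided one-sample z test on a standardized effect size, and the function names `two_sided_p_value` and `z_test_power` as well as the example values (an effect of 0.3, a 5% significance level) are chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sided_p_value(observed_effect, n):
    """Two-sided p value of a one-sample z test.

    observed_effect is assumed to be standardized: (sample mean minus
    hypothesized mean) divided by the population standard deviation.
    """
    z = observed_effect * sqrt(n)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def z_test_power(true_effect, n, alpha=0.05):
    """Probability of rejecting the null hypothesis for a given true effect."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_effect * sqrt(n)
    # Probability that the test statistic falls in the upper or lower
    # rejection region when the true standardized effect is true_effect.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


# Same effect size, different sample sizes (illustrative values only).
for n in (20, 50, 100, 200):
    p = two_sided_p_value(0.3, n)
    power = z_test_power(0.3, n)
    print(f"n = {n:3d}  p = {p:.4f}  power = {power:.2f}  beta = {1 - power:.2f}")
```

With the effect size held constant, the p value shrinks and the power grows as the sample gets larger. This is why a p value says little about effect size on its own, and why drawing a larger sample is a practical way to reduce the probability of a Type II error.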
