- Sep 2019
[This was a peer review for the journal "Meta-Psychology", and I am posting it via hypothes.is at the journal's suggestion.]
I thank the authors for their response to our article. For full disclosure, I previously reviewed an earlier version of this manuscript. The present version of the manuscript shows improvement, but does not yet address several of my substantial concerns, each of which I believe should be thoroughly addressed if a revision is invited. My concerns are as follows:
1.) The publication bias corrections still rely on incorrect statistical reasoning, and using more appropriate methods yields quite different conclusions.
Regarding publication bias, the first analysis of the number of expected versus observed p-values between 0.01 and 0.05 that is presented on page 3 (i.e., “Thirty nine…should be approximately 4%”) cannot be interpreted as a test of publication bias, as described in my previous review. The p-values would only be uniformly distributed if the null were true for every study in the meta-analysis. If the null does not hold for every study in the meta-analysis, then we would of course expect more than 4% of the p-values to fall in [0.01, 0.05], even in the absence of any publication bias. I appreciate that the authors have attempted to address this by additionally assessing the excess of marginal p-values under two non-null distributions. However, these analyses are still not statistically valid in this context; they assume that every study in the meta-analysis has exactly the same effect size (i.e., that there is no heterogeneity), which is clearly not the case in the present meta-analyses. Effect heterogeneity can substantially affect the distribution and skewness of p-values in a meta-analysis (see Johnson & Yuan, 2007). To clarify the second footnote on page 3, I did not suggest this particular analysis in my previous review, but rather described why the analysis assuming uniformly distributed p-values does not serve as a test of publication bias.
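To make the point about p-value distributions concrete, here is a minimal sketch of the probability that a two-sided p-value falls in [0.01, 0.05] when there is no publication bias at all. It assumes each study's estimate is normal with mean `mu` and variance `tau**2 + se**2` (normally distributed true effects plus sampling error). The function name and every numeric input are hypothetical and chosen only for illustration; none are taken from the meta-analyses under discussion.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for critical values


def prob_p_in_interval(mu, tau, se, p_lo=0.01, p_hi=0.05):
    """Probability that a two-sided p-value lands in [p_lo, p_hi], absent publication bias.

    Assumption (illustrative): true effects are N(mu, tau^2) and each study has
    standard error se, so the z-statistic is N(mu / se, 1 + (tau / se)^2).
    """
    z_hi = STD_NORMAL.inv_cdf(1 - p_lo / 2)  # |z| at which p equals p_lo
    z_lo = STD_NORMAL.inv_cdf(1 - p_hi / 2)  # |z| at which p equals p_hi
    z_dist = NormalDist(mu / se, (1 + (tau / se) ** 2) ** 0.5)
    upper_tail = z_dist.cdf(z_hi) - z_dist.cdf(z_lo)    # significant positive results
    lower_tail = z_dist.cdf(-z_lo) - z_dist.cdf(-z_hi)  # significant negative results
    return upper_tail + lower_tail


# Hypothetical inputs: null everywhere, one common non-null effect, heterogeneous effects.
null_case = prob_p_in_interval(mu=0.0, tau=0.0, se=0.1)
homogeneous_case = prob_p_in_interval(mu=0.15, tau=0.0, se=0.1)
heterogeneous_case = prob_p_in_interval(mu=0.15, tau=0.1, se=0.1)
print(null_case, homogeneous_case, heterogeneous_case)
```

Only the first case recovers the 4% benchmark. The other two exceed it without any selective publication, and they also differ from each other, which is why an expected count computed under a single common effect size is not a valid reference point when effects are heterogeneous.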
I would instead suggest conducting publication bias corrections using methods that accommodate heterogeneity and allow for a realistic distribution of effects across studies. We did so in the Supplement of our PPS piece (https://journals.sagepub.com/doi/suppl/10.1177/1745691619850104) using a maximum-likelihood selection model that accommodates normally-distributed, heterogeneous true effects and essentially models a discontinuous “jump” in the probability of publication at the alpha threshold of 0.05. These analyses did somewhat attenuate the meta-analyses’ pooled point estimates, but suggested similar conclusions to those presented in our main text. For example, the Anderson (2010) meta-analysis had a corrected point estimate among all studies of 0.14 [95% CI: 0.11, 0.16]. The discrepancy between our findings and Drummond & Sauer’s arises partly because the latter analysis focuses only on pooled point estimates arising from bias correction, not on the heterogeneous effect distribution, which is the very approach that we described as having led to the apparent “conflict” between the meta-analyses in the first place. Indeed, as we described in the Supplement, publication bias correction for the Anderson meta-analyses still yields an estimated 100%, 76%, and 10% of effect sizes above 0, 0.10, and 0.20 respectively. Again, this is because there is substantial heterogeneity. If a revision is invited, I would (still) want the present authors to carefully consider the issue of heterogeneity and its impact on scientific conclusions.
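As a rough illustration of why the pooled point estimate alone does not characterize a heterogeneous effect distribution, the sketch below computes the share of true effects above a threshold q under normally distributed effects, i.e., 1 - Phi((q - mu) / tau). The value mu = 0.14 is the corrected point estimate quoted above; tau = 0.05 is a purely hypothetical heterogeneity value, and the function name is my own, so the output should not be expected to reproduce the percentages reported in our Supplement.

```python
from statistics import NormalDist


def proportion_above(q, mu, tau):
    """Share of true effects exceeding q, assuming effects are N(mu, tau^2)."""
    return 1 - NormalDist(mu, tau).cdf(q)


mu_hat = 0.14         # corrected pooled estimate quoted in the text
tau_assumed = 0.05    # HYPOTHETICAL heterogeneity SD, for illustration only

for threshold in (0.0, 0.10, 0.20):
    share = proportion_above(threshold, mu_hat, tau_assumed)
    print(f"share of effects above {threshold:.2f}: {share:.2f}")
```

The same pooled estimate is compatible with very different shares of effects above a given threshold depending on tau, so conclusions drawn from the point estimate alone can diverge from conclusions about the distribution of effects.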
2.) Experimental studies do not always yield higher-quality evidence than observational studies.
Additionally, the authors focus only on the subset of experimental studies in Hilgard’s analysis. Although I agree that “experimental studies are the best way to completely eliminate uncontrolled confounds”, it is not at all clear that experimental lab studies provide the overall strongest evidence regarding violent video games and aggression. Typical randomized studies in the video game literature consist, for example, of exposing subjects to violent video games for 30 minutes, then immediately having them complete a lab outcome measure operationalizing aggression as the amount of hot sauce a subject chooses to place on another subject’s food. It is unclear to what extent one-time exposures to video games and lab measures of “aggression” have predictive validity for real-world effects of naturalistic exposure to video games. In contrast, a well-conducted case-control study with appropriate confounding control and assessing violent video game exposure in subjects with demonstrated violent behavior versus those without might in fact provide stronger evidence for societally relevant causal effects (e.g., Rothman et al., 2008).
3.) Effect sizes are inherently contextual.
Regarding the interpretation of small effect sizes, we did indeed state several times in our paper that the effect sizes are “almost always quite small”. However, to universally dismiss effect sizes of less than d = 0.10 as less than “the smallest effect size of practical importance” is too hasty. Exposures, such as violent video games, that have very broad outreach can have substantial effects at the population level when aggregated across many individuals (VanderWeele et al., 2019). The authors are correct that small effect sizes are in general less robust to potential methodological biases than larger effect sizes, but to reiterate the actual claim we made in our manuscript: “Our claim is not that our re-analyses resolve these methodological problems but rather that widespread perceptions of conflict among the results of these meta-analyses—even when taken at face value without reconciling their substantial methodological differences—may in part be an artifact of statistical reporting practices in meta-analyses.” Additionally, the comparison to effect sizes for psychic phenomena does not strike me as particularly damning for the violent video game literature. The prior plausibility that psychic phenomena exist is extremely low, as the authors themselves describe, and it is surely much lower than the prior plausibility that video games might increase aggressive behavior. Extraordinary claims require extraordinary evidence, so any given effect size for psychic phenomena is much less credible than for video games.
Signed, Maya B. Mathur, Department of Epidemiology, Harvard University
Johnson, V., & Yuan, Y. (2007). Comments on ‘An exploratory test for an excess of significant findings’ by JPA Ioannidis and TA Trikalinos. Clinical Trials, 4(3), 254.
Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern epidemiology (3rd ed.). Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins.
VanderWeele, T. J., Mathur, M. B., & Chen, Y. (2019). Media portrayals and public health implications for suicide and other behaviors. JAMA Psychiatry.