1 Matching Annotation
  1. Jul 2018
    1. On 2017 Aug 02, Gregory Francis commented:

      The journal Perspectives on Psychological Science used to have an online commenting system, but it appears to have been discontinued and all past comments removed. In April 2013, I published a comment on this article; I reproduce it below.

      Simonsohn's arguments are without merit

      Simonsohn presents two arguments to discredit the publication bias analyses I have published over the past year. Neither of these arguments is convincing to me, but some people seem to be taking them seriously. I treat them in more depth in a recent paper (Francis, 2013), but it may be helpful to have a comment alongside the article itself.

      Whether to ignore data that appears to be biased

      I have argued that sets of experiments that appear to be biased should be treated as "non-scientific" and be replaced by new experiments. Such a recommendation is admittedly a cautious approach, but I feel that such caution is warranted for scientific investigations, especially when it is fairly easy to gather new data. For example, in my POPS article, to which Simonsohn directs his reply, the publication bias analysis suggests that the aversiveness memory effects reported by Galak & Meyvis (2010) appear to be biased. My recommendation to ignore their data is not so harsh, since it is quite easy for interested researchers to gather new (unbiased) data and determine the magnitude of the effect. There may be situations where gathering new data is more difficult, and such situations might justify efforts to mitigate the effects of bias. However, existing statistical methods are not very good at compensating for bias, and they do not attempt to compensate for other types of questionable research practices that can also trigger the bias analysis.

      Simonsohn raises an interesting issue about the relation between statistical significance and practical significance. His example of a literature of 100 studies, all with p < .05 and power of 97%, highlights a particularly bad property of bias. As Simonsohn describes it, bias appears to be present (the bias test gives 0.97^100 = 0.0476) but small because, given the power values, one would expect just three non-significant findings out of 100 experiments. Thus, Simonsohn concludes the bias is small because it appears to be small relative to what is published. Such a view would be fine if there were no reason to suspect bias, but it is difficult to maintain this view given that the analysis suggests there is bias. The bias could be small, but there may have been another 100 non-significant findings that were suppressed, in which case the bias is quite large. There is no way for a reader to know the extent of the bias.
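
      This arithmetic is easy to verify; a minimal Python sketch is given below. The simple product-of-powers form of the bias test is used only for illustration, with the power value and study count taken from Simonsohn's example:

          # Arithmetic from Simonsohn's example: 100 reported studies, each with
          # estimated power 0.97 and p < .05. The product-of-powers form below is a
          # simplification for illustration, not the full published procedure.
          power = 0.97       # assumed power of each reported study
          n_studies = 100    # number of studies, all reporting p < .05

          p_all_significant = power ** n_studies             # probability that every study is significant
          expected_nonsignificant = n_studies * (1 - power)  # expected count of non-significant results

          print(f"P(all {n_studies} studies significant) = {p_all_significant:.4f}")  # ~0.0476
          print(f"Expected non-significant results = {expected_nonsignificant:.0f}")  # ~3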

      One of the fundamental goals of science is to reduce uncertainty in measurements, but the appearance of bias introduces uncertainty about the experimental results and conclusions. The responsibility for making a strong scientific argument rests with the authors, and if their results appear to be biased, it is difficult to make such an argument.

      Cherry picking

      Simonsohn charges that my analyses are, ironically, biased by the very practice that produces the biased experiment sets I criticize. To this charge I reply "guilty", but I have an explanation. There are different types of bias: some misrepresent empirical data, while others are simple byproducts of scientific exploration. Understanding the difference between these types of bias is very important.

      Biases that misrepresent the data

      If a researcher runs 10 direct replication experiments, finds that six reject the null hypothesis and four do not, and reports only the six significant experiments, then bias is introduced. (Whether such bias will be detected by the bias analysis depends on the estimated power of those experiments.) A meta-analysis across the six published experiments will almost surely overestimate the true effect size, because the experiments whose samples happen to show small effect sizes tend not to reject the null. When the experiments are connected by a theoretical link (in this case, the idea that they measure the same effect size), this kind of bias misrepresents the true state of nature, which is clearly a problem for a scientific field.
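
      The overestimation described above can be illustrated with a small simulation that "publishes" only the significant replications of a single true effect. The true effect size, group size, and number of replications below are assumed values chosen for illustration, not taken from any particular study:

          # Simulate direct replications of one true effect and keep only the
          # significant results, as selective reporting would. The published-only
          # average effect size then overestimates the true value.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(0)
          true_d, n_per_group, n_replications = 0.3, 30, 10_000

          published_d = []
          for _ in range(n_replications):
              control = rng.normal(0.0, 1.0, n_per_group)
              treatment = rng.normal(true_d, 1.0, n_per_group)
              t, p = stats.ttest_ind(treatment, control)
              if p < 0.05 and t > 0:  # only significant, positive results are "published"
                  pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
                  published_d.append((treatment.mean() - control.mean()) / pooled_sd)

          print(f"true d = {true_d}, mean published d = {np.mean(published_d):.2f}")
          # The published-only average is noticeably larger than the true effect size.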

      Biases that do not misrepresent the data

      If a researcher runs 10 experiments on unconnected topics (separate studies on afterimages, reading rates, Stroop reaction times, aversiveness ratings of memories, etc.), finds that six reject the null and four do not, and publishes only the six significant experiments, then a bias is again introduced. However, this kind of bias is mostly benign because it does not misrepresent the published data. Suppose the study on afterimages finds a significant effect with a standardized effect size of 0.6. None of the other experiments tells us anything about the effect size of afterimages, so whether those experiments are published or not does not influence the scientific conclusions drawn about afterimages from this published study.

      The publication bias analyses I have reported over the past year have the latter type of bias. Whether other papers have bias or not is irrelevant to the conclusion about bias for Galak & Meyvis (or any other paper). This is not to say that the bias analysis cannot make a Type I error and conclude bias when it does not exist. If we cannot tolerate making a Type I error, then we should not make decisions.

      An important caveat is that the selective reporting of my analyses prohibits us from estimating the proportion of biased papers in the field. Trying to make such an estimate from the reported analyses changes the bias from being benign to one that misrepresents the data. Importantly, just because this particular interpretation is inappropriate does not mean that the individual analyses are inappropriate.

      In conclusion, Simonsohn's arguments are without merit. His claim that ignoring the data is unwise leads to scientifically vacuous arguments, and his charge of cherry picking is true but its meaning is misunderstood.

      Francis, G. (2013). Replication, statistical consistency, and publication bias. Journal of Mathematical Psychology. http://dx.doi.org/10.1016/j.jmp.2013.02.003

      Conflict of Interest: None declared


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
