On 2016 Mar 16, Daniel T Gilbert commented:
Our Technical Comment has elicited lengthy responses from several colleagues and counter-responses from us. For those who have not been following this conversation, here is a brief synopsis:
OPEN SCIENCE COLLABORATION: “We have provided a credible estimate of the reproducibility of psychological science.”
GILBERT ET AL: “No, you haven’t, because (1) you violated the basic rules of sampling when you selected studies to replicate, (2) you did unfaithful replications of many of the studies you selected, and (3) you made statistical errors.”
OPEN SCIENCE COLLABORATION & OTHERS: “We don't think we made statistical errors.”
Several colleagues wish to challenge our Point 3 while conveniently ignoring Points 1 and 2. But it requires no sophisticated mathematics to see that Points 1 and 2 are simple facts to which the OSC fully admits, and that these simple facts are by themselves sufficient to repudiate the OSC’s claim. We continue to believe that our Point 3 is correct, but even if it were entirely wrong, the conclusion that OSC2015 does not provide a credible estimate of the reproducibility of psychological science is inescapable, and it remains the one and only conclusion of our Technical Comment. Interested readers will find our full discussion HERE.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2016 Mar 09, Daniël Lakens commented:
Invalid statistical conclusions in Gilbert, King, Pettigrew, and Wilson (2016)
Gilbert, King, Pettigrew, and Wilson (GKPW; 2016) argue that the Reproducibility Project (Open Science Collaboration, 2015) provides no evidence for a ‘replication crisis’ in psychology. Their statistical conclusions are meaningless due to a crucial flaw in their understanding of confidence intervals. The authors assume that ‘based on statistical theory we know that 95% of replication estimates should fall within the 95% CI of the original results’. This is incorrect. When original and replication studies have identical sample sizes, the 95% confidence interval of an original study will capture the point estimate of a replication study only 83.4% of the time. This is known as the capture percentage (Cumming & Maillardet, 2006).
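[Editorial note: the 83.4% figure is easy to check by simulation. The sketch below is an editorial illustration rather than part of the original comment; it assumes normally distributed data, equal sample sizes, and 95% CIs built with the normal critical value 1.96.]

```python
# Monte Carlo sketch of the capture percentage (Cumming & Maillardet, 2006):
# how often does a replication's sample mean fall inside the 95% CI of an
# original study drawn from the same population?
import numpy as np

rng = np.random.default_rng(2016)

def capture_rate(n_orig, n_rep, n_sims=200_000, mu=0.0, sigma=1.0):
    """Fraction of simulated study pairs in which the replication mean
    lands inside the original study's 95% confidence interval."""
    orig = rng.normal(mu, sigma, size=(n_sims, n_orig))
    rep = rng.normal(mu, sigma, size=(n_sims, n_rep))
    orig_mean = orig.mean(axis=1)
    orig_se = orig.std(axis=1, ddof=1) / np.sqrt(n_orig)
    rep_mean = rep.mean(axis=1)
    inside = np.abs(rep_mean - orig_mean) <= 1.96 * orig_se
    return inside.mean()

print(capture_rate(n_orig=50, n_rep=50))  # roughly 0.83, not 0.95
```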
GKPW use data from Many Labs (another large-scale replication project, Klein et al., 2014) to estimate the expected capture percentage in the Reproducibility Project when allowing for random error due to infidelities in the replication study, and arrive at an estimate of 65.5%. They fail to realize that the capture percentage for studies with different sample sizes (in the Many Labs project ranging from 79 to 1329) can be any number between 0 and 1, and can’t be used to estimate ‘infidelities’ in replications in general. Most importantly, the capture percentage observed for replications in the Many Labs dataset does not generalize in any way to the expected capture percentages between original and replication studies in the Reproducibility Project.
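[Editorial note: the dependence on the ratio of sample sizes can be seen in closed form. The sketch below again assumes normality and a known standard deviation; the probability that the replication mean lands inside the original 95% CI is P(|Z| < 1.96 / √(1 + n_orig/n_rep)). The sample sizes simply reuse the Many Labs extremes quoted above as illustrative inputs.]

```python
# Capture probability under normality with known sigma:
# P(replication mean inside original 95% CI) = P(|Z| < 1.96 / sqrt(1 + n_orig / n_rep))
from scipy.stats import norm

def capture_prob(n_orig, n_rep):
    return 2 * norm.cdf(1.96 / (1 + n_orig / n_rep) ** 0.5) - 1

print(capture_prob(50, 50))     # equal sample sizes:                ~0.83
print(capture_prob(79, 1329))   # small original, large replication: ~0.94
print(capture_prob(1329, 79))   # large original, small replication: ~0.36
```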
Nevertheless, GKPW conclude that the capture percentage in a subset of Reproducibility Project studies overlaps with “the 65.5% replication rate that one would expect if every one of the original studies had reported a true effect.” Due to this basic statistical misunderstanding, the main claim by GKPW that ‘the reproducibility of psychological science is quite high’, based on the 65.5% estimate, lacks a statistical foundation and is not valid.
References
Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11(3), 217–227. http://doi.org/10.1037/1082-989X.11.3.217
Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on ‘Estimating the reproducibility of psychological science’. Science, 351(6277), 1037a–1037b.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., … Nosek, B. A. (2014). Investigating Variation in Replicability: A “Many Labs” Replication Project. Social Psychology, 45(3), 142–152. http://doi.org/10.1027/1864-9335/a000178
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. http://doi.org/10.1126/science.aac4716
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2016 Mar 06, Sanjay Srivastava commented:
I have written about the analyses in Gilbert et al. Technical Comment elsewhere. Some key points:
(1) The comment proposes to define a "successful" replication as one where the replication effect is contained within the original study's confidence interval. However, it interprets this based on an incorrect definition of a confidence interval. Even more seriously, in my view, the comment does not adequately address how using confidence intervals to gauge replication success will be affected by the power of original studies (see the sketch after this list).
(2) The comment claims that high-powered replications have a high success rate, and bases this claim on Many Labs 1 (Klein et al., 2014), stating that ML1 had a "heartening" 85% success rate. However, that is incorrect. Using the same replication metric that Gilbert et al. define at the start of their Technical Comment and use everywhere else in it, Many Labs 1 had only a 40% success rate, which is similar to the Reproducibility Project.
(3) The analysis of replication "fidelity" is based on original authors' judgments of how well replication protocols matched original protocols. However, the analysis by Gilbert et al. combines 18 nonresponses by original authors with 11 objections, labeling the combined group "unendorsed." We do not know whether all 18 nonresponders would have lodged objections; it seems implausible to assume that they would have.
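[Editorial note: a minimal numerical sketch of point (1), offered as an illustration rather than the commenter's own analysis. The half-width of an original study's 95% CI shrinks with √n, so a small, low-powered original sets a much laxer bar for "containment" than a large one.]

```python
# Half-width of a 95% CI around a sample mean, in SD units, as a function
# of the original study's sample size (sigma assumed equal to 1 for
# illustration). Smaller originals -> wider CIs -> easier "containment".
import math

for n_original in (20, 80, 320, 1280):
    half_width = 1.96 / math.sqrt(n_original)
    print(f"n_original = {n_original:4d}: 95% CI half-width ≈ ±{half_width:.3f} SD")
```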
In my view these and other issues seriously undermine the conclusions presented in the Gilbert et al. technical comment. Interested readers can see more here: Evaluating a New Critique of the Reproducibility Project
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2016 Mar 05, Dorothy V M Bishop commented:
My reading of this comment is that it maintains we should not expect high reproducibility for psychological studies because many are looking at effects that are small and/or fragile - in the sense that the result is found only in specific contexts. If that is so, then there is an urgent need to address these issues by doing adequately powered studies that can reliably detect small effects, and, once this is done, establishing the necessary and sufficient conditions for the effect to be observed. Unless we do that, it is very hard to distinguish false positives from effects that are genuine, but small in size and/or fragile - especially when we know that there are two important influences on the false positive rate, namely publication bias and p-hacking. I discuss these issues further on my blog here: http://deevybee.blogspot.co.uk/2016/03/there-is-reproducibility-crisis-in.html
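[Editorial note: to put a rough number on "adequately powered" for small effects, here is a back-of-the-envelope calculation, an illustration rather than part of the comment, using the standard normal approximation for a two-group comparison and an assumed small standardized effect of d = 0.2.]

```python
# Approximate per-group n for a two-sample comparison of means:
# n ≈ 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2  (normal approximation).
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

print(n_per_group(0.2))              # ~393 per group for 80% power
print(n_per_group(0.2, power=0.90))  # ~526 per group for 90% power
```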
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.