On 2013 Dec 30, Gregory Francis commented:

      In response to my critique of their original article, Elliot & Maier (E&M) raised three points. I respond in reverse order, because the first point is the most interesting.

      3) E&M propose that the discussion about bias should be at the field level rather than at the level of individual articles. I strongly disagree. A discussion at the field level dilutes responsibility, so researchers can simultaneously believe that the field is biased and believe that their own effects are sound. One gets a hint of such an attitude in E&M, where they describe the problems of bias in the field, but insist that their findings related to the red-attractiveness effect are valid. Scientists fundamentally care about individual effects, so that is the appropriate place to consider the influence of bias.

      2) E&M suggest that my article was incautious because it considered several different analyses of effects. Perhaps the text in Francis (2013) did not make clear that different bias tests were applied in order to see whether there might be some way to interpret the experiments so that they did not indicate bias. This exploratory search was unsuccessful; every combination of effects and measures led to a conclusion of bias for at least some of the effects.

      E&M also suggest that my use of the test is inappropriate because the analytical approach was originally developed by Ioannidis & Trikalinos (2007) to detect bias in large sets of experiments. This claim is entirely without justification. If E&M want to argue that my analysis is flawed, then they should demonstrate an error in the calculations or reasoning.

      1) E&M claim that there is independent evidence that the red-attractiveness effect is real. They cite additional studies (including some from other labs) that report the presence of the effect. These studies may indeed demonstrate the validity of the effect, but they do nothing to remove the appearance of bias in the study by Elliot et al. In fact, if E&M intend to use these additional studies as evidence for their theoretical claim, then the success of these experiments only makes the bias problem worse: the probability of success for all the experiments in Elliot et al., low as it is, must be larger than the probability of success for those experiments and the additional studies.

      E&M also reported a new experiment designed to check the validity of the red-attractiveness effect. In the original study, this experiment produced a standardized effect size of 0.83 (n1=16, n2=17). The new study found a smaller effect size of 0.25 (n1=75, n2=69), which was not statistically significant (p=.13). E&M concluded that the red-attractiveness effect is in the small-to-medium range rather than the large range. Most research psychologists would go further and suggest that there is insufficient evidence to interpret the red-attractiveness effect as being different from zero (for many researchers the null hypothesis is the default position until proven otherwise).
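
      As a check on the arithmetic, the reported p-value follows directly from the reported effect size and sample sizes. The sketch below assumes a two-tailed, two-sample t-test with equal variances; any small discrepancy in the final digit comes from d itself being rounded to 0.25.

        import math
        from scipy import stats

        # Reported values for the new experiment
        d, n1, n2 = 0.25, 75, 69

        # t statistic implied by Cohen's d for two independent groups
        t = d * math.sqrt(n1 * n2 / (n1 + n2))
        df = n1 + n2 - 2

        # Two-tailed p-value
        p = 2 * stats.t.sf(abs(t), df)
        print(round(t, 2), round(p, 3))  # ~1.50, ~0.136 (reported as .13)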

      Given that Francis (2013) argued that the original experiments were too successful, one might wonder if the new (non-significant) experiment defuses that critique. A new version of the analysis using the original five experiments in Elliot et al. and the new experiment produces a pooled effect size of 0.53. This value is substantially smaller than the pooled effect for the original studies (0.78) because the new experiment has a small effect size and carries substantial weight in the pooling due to its larger sample size.
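
      To illustrate how that weighting works, here is a minimal fixed-effect (inverse-variance) pooling sketch. It uses only the two experiments whose sample sizes are given above, so it shows the mechanism rather than reproducing the 0.53 figure, and the variance approximation is a standard one, not necessarily the exact estimator used in Francis (2013).

        # Fixed-effect (inverse-variance) pooling of standardized effect sizes.
        # Only the two studies with reported sample sizes are included here.
        studies = [
            (0.83, 16, 17),   # original experiment
            (0.25, 75, 69),   # new, larger experiment
        ]

        num = den = 0.0
        for d, n1, n2 in studies:
            # Approximate sampling variance of Cohen's d
            var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
            w = 1.0 / var
            num += w * d
            den += w

        print(round(num / den, 2))  # ~0.35: the larger study dominates the pool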

      With the new pooled effect size, the estimated power of each of the experiments (in order of publication) is 0.21, 0.30, 0.32, 0.49, 0.26, and 0.89. It is rather odd that the experiment with the largest power estimate is the only one that did not reject the null hypothesis. Moreover, the probability that at least five of six experiments like these would reject the null hypothesis is 0.03, which is low enough to indicate bias.
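
      That 0.03 can be reproduced by direct enumeration over the 2^6 possible outcome patterns (a minimal sketch; like the original analysis, it treats the experiments as independent):

        from itertools import product

        powers = [0.21, 0.30, 0.32, 0.49, 0.26, 0.89]

        # Probability that at least 5 of the 6 independent experiments
        # reject the null, summed over all outcome patterns
        # (the tail of a Poisson-binomial distribution).
        p_total = 0.0
        for outcome in product([0, 1], repeat=len(powers)):
            if sum(outcome) >= 5:
                prob = 1.0
                for rejected, p in zip(outcome, powers):
                    prob *= p if rejected else 1 - p
                p_total += prob

        print(round(p_total, 3))  # ~0.030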

      Since the new experiment presumably is not biased, it arguably gives the best estimate of the red-attractiveness effect. The power estimates for the original experiments, derived from an effect size of 0.25, are 0.08, 0.10, 0.11, 0.15, and 0.10. The probability that all five such experiments would reject the null hypothesis is the product of the power values, which is 0.000013. So, if the new experiment provides a valid estimate of the effect, readers should be very skeptical about the validity of the original experiments in Elliot et al.
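
      That figure is just the product of the five power values (again assuming independent experiments):

        from math import prod

        # Power of each original experiment if the true effect size is 0.25
        powers = [0.08, 0.10, 0.11, 0.15, 0.10]

        # Probability that all five independently reject the null
        print(f"{prod(powers):.6f}")  # 0.000013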

      Finally, it is worth noting that E&M never explicitly discussed the presence of bias in their reported experiment set. They did not disclose whether there were unreported null findings or whether some experiments used sampling methods that inflate the Type I error rate. It is possible that bias occurred without their awareness, but the absence of a clear statement highlights that readers should be skeptical about the reported experiments.

      Ultimately, the new experiment in E&M only strengthens the argument for some form of publication bias in Elliot et al. The other points deserve a frank and creative exchange of ideas, but they do nothing to alter the unbelievability of their original findings.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
