29 Matching Annotations
  1. Nov 2020
    1. statistical powe

      This is like saying that driving 200 miles per hour inside the city compares favorably on speed of getting to your destination. It is important, but meaningless if you do not also consider the number of people you kill along the way. The article misses a careful reflection on Type 1 error rates, and how these compare between tests - I think because Type 1 error rates look very bad for the Bayes factor approach, as it is well known to be biased towards the null - Figure 3 clearly shows Type 1 error rates are through the roof, and largely unacceptable for scientific purposes.

    2. TOST and HDI-ROPE have no discriminatory power form= 0:1

      So you have simulated badly designed studies, because this would not happen in practice after a power analysis. Are your recommendations on which test to use conditional on the requirement that researchers design bad studies, or do they hold more generally?
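A minimal sketch of why an underpowered design leaves TOST with no discriminatory power, under simplifying assumptions that are not taken from the article: two equal-sized groups, unit variance, and a z-based test. The function name `min_n_for_tost` and the example margins are hypothetical. TOST can only reject when the observed difference plus `z * SE` falls inside the margin, so if `z * SE` is at least as large as the margin, the test can never declare equivalence, whatever the data show.

```python
from math import floor
from statistics import NormalDist


def min_n_for_tost(margin: float, alpha: float = 0.05) -> int:
    """Smallest per-group n at which a two-sample z-based TOST can reject at all.

    Assumes unit-variance groups of equal size (an illustrative assumption),
    so the standard error of the mean difference is sqrt(2 / n).
    TOST rejects only if |observed difference| + z * SE < margin,
    which is impossible whenever z * SE >= margin.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    # Solve z * sqrt(2 / n) < margin for n, then take the next integer.
    return floor(2 * (z / margin) ** 2) + 1


# Hypothetical equivalence margins in standardized units.
for margin in (0.1, 0.5):
    print(margin, min_n_for_tost(margin))
```

Below this per-group sample size the power of TOST is exactly zero under these assumptions; a power analysis would call for a considerably larger sample than this minimum.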

    3. Equivalence margins larger than this come close to what Cohen labeled medium-sized effects and are in most contexts unreasonable large to demarcate equivalence

      Research on minimal clinically important differences suggests that, at the individual level, effects of d = 0.5 or smaller are typically too small to be subjectively noticed as an improvement. As this critical design choice lacks any cited justification or discussion of the literature, the authors might want to look into the literature on minimal clinically relevant effects. For such larger effect sizes, it would also be interesting to see whether the criticism that Bayes factors are biased towards the null leads to unacceptably high Type 1 error rates for an equivalence test.

  2. Apr 2020
    1. positive evidence for successful replicatio

      but the idea by Simonsohn is to combine an equivalence test with NHST - so this is not a real difference.

    2. about .7

      The small telescopes approach uses the effect size the original study had 33% power to detect; Lakens et al. (2018) suggest using what could have been detected in the original study (typically the effect size the study had 50% power for) - this is a very liberal criterion - what is the justification?
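As a rough illustration of how the two benchmarks differ, the sketch below computes the standardized effect size a two-group study had a given power to detect. It relies on a normal approximation to the two-sided test and ignores the negligible rejection probability in the opposite tail; the function name `detectable_d` and the sample size of 20 per group are hypothetical, not values from the annotated paper.

```python
from math import sqrt
from statistics import NormalDist


def detectable_d(n_per_group: int, power: float, alpha: float = 0.05) -> float:
    """Standardized effect size detectable with the given power.

    Normal approximation for a two-sided, two-sample test with equal group
    sizes: power ~= Phi(d * sqrt(n / 2) - z_crit), solved for d.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_crit + z_power) / sqrt(n_per_group / 2)


n_original = 20  # hypothetical per-group sample size of the original study
d33 = detectable_d(n_original, 0.33)  # small telescopes benchmark
d50 = detectable_d(n_original, 0.50)  # 50%-power benchmark
print(round(d33, 2), round(d50, 2))
```

Under these assumptions the 33% benchmark is always the smaller of the two, so it is the stricter criterion for a replication to rule out.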

    3. theoretically

      If the value is based on the sample size used by the original researchers, there is no good reason to call this 'theoretically interesting' - sample sizes are chosen based on feasibility, cost, and time - and only then to a certain extent based on what is expected.

  3. Mar 2019
    1. Abstrac

      In his commentary, Alex Holcombe makes the argument that only ‘one or two exemplars of a color category’ are typically examined in color studies, and this is problematic because a color such as ‘red’ is a category, not a single hue.

      Although in some fields it is very important to examine a range of stimuli, and in general examining the generalizability of findings has an important place in research lines, I do not think that currently this issue is a pressing concern in color psychology. Small variations in hue and brightness naturally occur in online studies, and these are assumed not to matter for the underlying mechanism. Schietecat, Lakens, IJsselsteijn, and De Kort (2018) write: “In addition, we conducted Experiments 1 and 3 in a laboratory environment, but Experiments 2, 4, and 5 were conducted in participants’ homes with an internet-based method. Therefore, we could not be completely sure that the presentation of the stimuli on their personal computers was identical for every participant in those experiments. However, we expected that the impact of these variations on our results is not substantial. The labels of the IAT (i.e., red vs blue) increased the salience of the relevant hue dimension, and we do not expect our results to hold for very specific hues, but for colors that are broadly categorized as red, blue, and green. The similar associative patterns across Experiments 2 and 3 seem to support this expectation.”

      We wrote this because there is nothing specific about the hue that is expected to drive the effects in association-based accounts of the psychological effects of colors. If the color ‘red’ is associated with specific concepts (and the work by Schietecat et al. supports the idea that red can activate associations related to either activity or evaluation, such as aggression or enthusiasm, depending on the context), then the crucial role of the stimulus is to activate the association with ‘red’, not the perceptual stimulation of the eye in any specific way. The critical manipulation check would thus be whether people categorize a stimulus as ‘red’. As long as this is satisfied, we can assume the concept ‘red’ is activated, which can then activate related associations, depending on the context.

      Obviously, the author is correct that there are benefits in testing multiple variations of the color ‘red’ to demonstrate the generalizability of observed effects. However, I fear the author is writing too much as a perception researcher. If there is a strong theoretical reason to assume slightly different hues and chromas will not matter (because as long as a color is recognized as ‘red’ it will activate specific associations), the research priority of varying colors is much lower than in other fields (e.g., research on human faces) where it is more plausible that the specifics of the stimuli matter. A similar argument holds for the question whether “any link is specifically to red, rather than extending to green, yellow, purple, and brown”. This is too a-theoretical: even though not all color research has been replicable, and many studies suffered from problems identified during the replication crisis, the theoretical models are still plausible, and make predictions specific to certain hues. We know quite a lot about prototypical colors in terms of their associations with valence and activity (e.g., Russell & Mehrabian, 1977), and this can be used to make more specific predictions than a-theoretically testing the entire color spectrum.

      Indeed, across the literature many slightly different variations of red are used, and some studies have been performed online (Schietecat et al., 2018), where different computer screens will naturally lead to some variation in the exact colors presented. This doesn’t mean that more dedicated exploration of the boundaries of these effects cannot be worthwhile in the future. But currently, the literature is more focused on examining whether these effects are reliable to begin with, and on answering basic questions about their context dependency, than on testing the range of hues for which effects can be observed. So, although in principle it is often true that the generalizability of effects is understudied and deserves more attention, it is not color psychology’s most pressing concern, because we have theoretical predictions about specific colors, and because theoretically, as long as a color activates the concept (e.g., ‘red’), the associated concepts that influence subsequent psychological responses are assumed to be activated, irrespective of minor differences in, for example, hue or brightness.

      Daniel Lakens


      Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11(3), 273–294. https://doi.org/10.1016/0092-6566(77)90037-X

      Schietecat, A. C., Lakens, D., IJsselsteijn, W. A., & Kort, Y. A. W. de. (2018). Predicting Context-dependent Cross-modal Associations with Dimension-specific Polarity Attributions. Part 2: Red and Valence. Collabra: Psychology, 4(1). https://doi.org/10.1525/COLLABRA.126

  4. Sep 2018
    1. This development may amuse those who, like me, have taken on board the lessons of the empirical turn in the study of science. One of those lessons after all was that science does not work the way Karl Popper thought it shoul

      It is worth pointing out that the reform movement is in no way guided by simplistic and orthodox Popperian views on science - this transition to a less strict falsificationist view on research has already occurred (see Meehl, 1990), so the author seems to be amused by a straw man. The current reform movement is more diverse, and less single-mindedly focused on Popper, than the author argues here.

  5. Aug 2018
    1. Equivalence Testing and the Second Generation P-Value

      We look forward to your comments!

  6. Jul 2018
    1. Making ‘Null Effects’ Informative:

      We look forward to any comments you might have!

    2. The 20% Statistician

      Feel free to leave comments!

  7. Apr 2015
    1. confidence

      This is closest to what modern CI advocates propose we use, correct? Perhaps you can explicitly mention this.

    2. known triangular

      It's really too bad you didn't choose an example with a normal distribution.

    3. intentionally simple

      Really? I doubt the average psychologist finds this simple.

    4. statistics

      I guess this shows you should always ask an engineer for help, who would just throw down 50 ropes. But ok. ;P

    5. probability

      So they do not follow the normal distribution, correct? This raises the question in my mind whether this matters or not. Isn't this criticism relevant when data are normally distributed? If not, please explicitly specify this. If it is, please use normally distributed data, which is closer to what psychologists deal with.

    6. one

      I guess just dropping down 50 lines at 20 cm distances did not occur to anyone? ;)

    7. procedure

      And this is not 'precision'?

    8. precision of

      Perhaps it is important to define 'precision' - I think the term means different things to different people.

    9. 349

      Again, Cumming & Maillardet (2006) explain this problem the best way, I think. They should be cited here as well.

    10. value

      There have been many, many papers explaining this to lay people (e.g., Lakens & Evers, 2014). It might be fair to acknowledge this.

    11. FCF

      You are not defining the mistake people make very clearly - instead, you use citations of statements by other people. Cumming & Maillardet also clearly show that a single 95% CI will, on average, capture the mean of a replication study only 83.4% of the time. It seems you try to judge Cumming on a common language statement, while he clearly would never make the FCF.
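The 83.4% figure can be reproduced with a short calculation, sketched here under simplifying assumptions (normally distributed data, known standard deviation, a replication of the same size as the original); the function name `capture_probability` is hypothetical. The difference between two independent sample means has a standard deviation of sqrt(2) standard errors, so the chance that a replication mean lands inside the original 95% CI is the probability that a standard normal value falls within 1.96 / sqrt(2) of zero.

```python
from math import sqrt
from statistics import NormalDist


def capture_probability(confidence: float = 0.95) -> float:
    """Average probability that a same-sized replication mean falls in the original CI.

    Assumes normal data with known standard deviation (z-based interval).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - confidence) / 2)
    # The original and replication means are independent, so their
    # difference has standard deviation sqrt(2) * SE.
    return 2 * z.cdf(z_crit / sqrt(2)) - 1


print(round(capture_probability(), 3))
```

This is a statement about where replication means fall, which is distinct from the 95% long-run coverage of the true parameter.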

    12. understanding

      To be clear: you mean all the three interpretations you specify above are not correct, right?

    13. CP

      in the CI context, the abbreviation CP reminds me strongly of a Capture Probability (or Capture Percentage) of a single CI (see Cumming & Maillardet, 2006). Perhaps better not to abbreviate?

    14. such

      which considerations?

    15. suggest

      Please start this paragraph by summarizing in 1 sentence how modern proponents suggest I use CI.

    16. may

      What exactly do you mean? Should not be used? You have not yet explained how modern proponents suggest we use CI, so I cannot understand this main point at this time.

    17. o

      Capital C