Hypothesis

47 Matching Annotations

Oct 2021
psyarxiv.com

759qy

13
1. Lakens 10 Oct 2021
  
  in Public
  
  It is not I who wishes to ignore these issues in favor of a dichotomous and ritualized decision to act as if one hypothesis or the other is true, based on a statistical test; it is the NHST aficionados who wish this lack of nuance upon science
  
  As explained before, the author misunderstands the philosophy of science underlying NP hypothesis testing - we have explained this here: https://psyarxiv.com/af9by/ and we recommend reading through this explanation to see the mistakes in this section.
2. Lakens 10 Oct 2021
  
  in Public
  
  ce, part 1).
  
  This should link to part 2
3. Lakens 10 Oct 2021
  
  in Public
  
  it would be necessary to employ limited sample sizes so as decrease the probability of obtaining statistical significance
  
  No, you should just perform range predictions, especially in huge sample sizes (see Lakens, 2021, where I explain why the rest of this paragraph is flawed).
4. Lakens 10 Oct 2021
  
  in Public
  
  Because the model is wrong, it should be obvious that provided a sufficiently large sample size is obtained, the null hypothesis will be rejected
  
  And yet, this often does not happen in huge randomized controlled trials. So, it seems the models are close enough to the truth that tiny differences are not detected with tens of thousands of people. If they are, the effect size (which also needs to be interpreted) will show the effect is trivially small. So this is a non-issue.
5. Lakens 10 Oct 2021
  
  in Public
  
  NHST is blatantly inadequate at the theory level,
  
  Every approach to statistics is - it must be so. So this is not a problem - it's philosophy of science.
6. Lakens 10 Oct 2021
  
  in Public
  
  Therefore, there is no way to know whether evidence that is unlikely in light of the model is because the model is slightly wrong, extremely wrong, or somewhere in-between.
  
  This is irrelevant, because the p-value is based on the assumption that the model is true. If you want to falsify that model, collect data and falsify it. You can not claim that all models are wrong - that is unscientific, in a methodological falsificationist perspective on science (which forms the basis of Neyman-Pearson hypothesis testing).
7. Lakens 10 Oct 2021
  
  in Public
  
  t cannot index how wrong M
  
  You might want to link this to the well-known problem of underdetermination, which seems to be what you actually want to discuss. There is a long literature on this - engage with it.
8. Lakens 10 Oct 2021
  
  in Public
  
  null hypothesis, but between observed data and the whole model in which the null hypothesis is embedded
  
  There is no difference here - the null hypothesis is a model of a data generating process.
9. Lakens 10 Oct 2021
  
  in Public
  
  Firstly, a p value is a measure, not of compatibility, but of incompatibility.
  
  This seems to be semantics - if p = 1, the model is perfectly compatible with the null. If it is .99, it is almost perfectly compatible with the null. Or rephrase it as only ever so slightly incompatible. More importantly, the author fails to see that I am citing Greenland and colleagues - the use of 'compatibility' is not my term, but theirs, when Greenland et al. write: "we will adopt a more general view of the P value as a statistical summary of the compatibility between the observed data and what we would predict or expect to see if we knew the entire statistical model". So please do not ascribe this mistake (which is not a mistake) to me - I am just citing Greenland.
10. Lakens 10 Oct 2021
  
  in Public
  
  they only have a 5% chance of having wrongly rejected the test hypothesis
  
  This is a misinterpretation of the meaning of a p-value. In any single study, this probability is either 0% or 100% - and it is not known which of these 2 it is. We can only say that in the long run we will not be wrong more than 5% of the time. In this study, they could have a 100% probability of being wrong.
11. Lakens 10 Oct 2021
  
  in Public
  
  has a known probability of wrongly rejecting the test hypothesis
  
  No - they have a MAXIMUM probability in the long run.
12. Lakens 10 Oct 2021
  
  in Public
  
  Lakens (2021) committed that a p value is based on a hypothesis rather than on a mode
  
  This is incorrect - see https://psyarxiv.com/af9by/ where we explain how a claim about the presence or absence of an effect is a basic statement, not a theoretical statement: "The hypothetico-deductive method in general and falsificationist hypothesis testing specifically rely on deductive logical relations between theoretical and basic statements. Simply put, theoretical statements express general conjectures and basic statements express singular facts that can contradict or support these conjectures. Basic statements are dichotomous in that they affirm or deny the existence of singular facts. In the philosophical framework of methodological falsificationism, these dichotomous empirical claims play a crucial role in testing theoretical claims through deductive reasoning."
  
  The same point is made in https://psyarxiv.com/tm8jy/ "The falsificationist strategy is summarized by Mook (1983, p.380): “We are not making generalizations, but testing them.” Falsificationists test predictions of a theory, along with a ceteris paribus clause which posits that "nothing else is at work except factors that are totally random" (Meehl, 1990, p.111). If the ceteris paribus clause holds, the claim is generalizable. Yarkoni is correct that ceteris paribus is often not literally true (cf. Meehl, 1990). Systematic non-trivial factors exist. However, all theories are necessarily simplifications: A map is never meant to be the territory (Bateson, 1972, p.459). The challenge is to identify, from an infinite set of possible factors that falsify the theory’s generalizability claim, which do so in a way that actually matters (Box, 1976)."
  
  Moreover, Neyman himself was very aware that the model could be false - statements are made conditional on the model. However, he had a pragmatic viewpoint on this. This means that given the inferences we draw we always have to add “granting that the model M of phenomena P is adequate (or valid, or satisfactory, etc.) the conclusion reached within M applies to P” (Neyman, 1955).
  
  Let me conclude this point by citing Hand (2014), citing Box: “In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality—a point well made by George Box in his oft-cited remark that "all models are wrong, but some are useful".” So we make claims about basic statements, not theories, and even for the assumptions we built the hypothesis test on, statements are conditional on the model being correct - anyone is free to collect data to falsify these assumptions.
13. Lakens 10 Oct 2021
  
  in Public
  
  properly
  
  this is not the correct title
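
Annotations 10 and 11 above turn on the difference between a single decision and a long-run error rate. The following is a minimal simulation sketch of that distinction, under assumptions that are not in the annotations: two groups, a known standard deviation of 1, a two-sided z-test at alpha = .05, and a null hypothesis that is true in every study. The function name `simulate_false_rejections` is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulate_false_rejections(n_studies=20000, n_per_group=20, alpha=0.05, seed=1):
    """Simulate studies in which the null is true; return one reject/retain decision per study."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = (2 / n_per_group) ** 0.5                 # SE of a mean difference with known sd = 1
    decisions = []
    for _ in range(n_studies):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / se
        # Each individual decision is simply wrong (True) or right (False).
        decisions.append(abs(z) > z_crit)
    return decisions


decisions = simulate_false_rejections()
long_run_rate = sum(decisions) / len(decisions)
print(f"long-run Type 1 error rate: {long_run_rate:.3f}")
```

Every entry in `decisions` is either a wrong rejection or a correct retention; only the proportion across many studies approaches alpha. In practice, where some tested nulls are false, the share of studies that end in a wrong rejection can only be lower, which is one way to read "maximum" in annotation 11.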

Annotators

Lakens

URL

psyarxiv.com/759qy
Jul 2021
osf.io

OSF | Home

5
1. Lakens 11 Jul 2021
  
  in Public
  
  If so, the landscape of open data may not be the democratisation of knowledge, but just a further mechanism whereby the rich get richer
  
  This paragraph is not convincing. The claim that open data is primarily accessible to institutes with more budget might be true - but this is irrelevant. You need to argue that the increase in accessibility has lower marginal utility for poorer institutes - but the opposite must be true. There is too much confirmation bias in this paragraph - you need to do a better job providing a fair cost-benefit analysis - it reads as if you were searching for arguments to support a preconceived notion - not like you tried to honestly weigh costs and benefits.
2. Lakens 11 Jul 2021
  
  in Public
  
  undermine their utility
  
  This is a very strong statement - in monetary terms, it is important to note that APC for people in some countries is 0 after a waiver, and in other countries it costs research budget. A strong statement requires a much better cost-benefit analysis here to be believable.
3. Lakens 11 Jul 2021
  
  in Public
  
  Open Science relies upon local training
  
  Why? I learned everything I know about Open Science online, through free online courses, and open access articles. This seems to be a very strong and unreasonable assumption.
4. Lakens 11 Jul 2021
  
  in Public
  
  equity is one aim of Open Science amongst other
  
  It would be useful to distinguish schools - there are people who do not believe in making 'Open Science' a container concept that means everything.
5. Lakens 11 Jul 2021
  
  in Public
  
  Open Science has been proposed at least in part as a corrective for some of these issues
  
  By whom? Where? Can you provide references? Regardless of who said this, there must be many people who disagree with such a broad definition of open science - open science is typically used to refer to science that is open - not science that is equitable. Equitable science is an orthogonal goal - important, but unrelated to Open Science.

Annotators

Lakens

URL

osf.io/preprints/metaarxiv/cd5j9/
Nov 2020
psyarxiv.com

LindeTendeiroSelkerWagenmakersVanravenzwaaij2020.pdf

3
1. Lakens 13 Nov 2020
  
  in Public
  
  statistical powe
  
  This is like saying that driving 200 miles per hour inside the city compares favorably on speed of getting to your destination. It is important, but meaningless if you do not also consider the number of people you kill along the way. The article misses a careful reflection on Type 1 error rates, and how these compare between tests - I think because Type 1 error rates look very bad for the Bayes factor approach, as it is well known to be biased towards the null - Figure 3 clearly shows Type 1 error rates are through the roof, and largely unacceptable for scientific purposes.
2. Lakens 13 Nov 2020
  
  in Public
  
  TOST and HDI-ROPE have no discriminatory power for m = 0.1
  
  So you have simulated badly designed studies, because this would not happen in practice after a power analysis. Are your recommendations on which test to use conditional on the requirement of researchers to design bad studies, or do they hold more generally?
3. Lakens 13 Nov 2020
  
  in Public
  
  Equivalence margins larger than this come close to what Cohen labeled medium-sized effects and are in most contexts unreasonable large to demarcate equivalence
  
  Research on minimal clinically important differences suggests that on an individual level effects of d = 0.5 or smaller are typically too small to be subjectively noticed as an improvement. As this critical design choice in this article lacks any cited justification or discussion of the literature, the authors might want to look into the literature on minimally clinically relevant effects. Also because for such larger effect sizes, it would be interesting to see if the criticism that Bayes factors are biased towards the null leads to unacceptably high Type 1 error rates for an equivalence test.
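
Annotation 2 above argues that a TOST with no discriminatory power reflects a design no power analysis would produce. As a rough illustration, here is a sketch of approximate TOST power when the true difference is zero. It assumes two equal groups, a known standard deviation of 1 (so the margin is in standardized units), and a z-test in place of the usual t-based TOST; it reads the margin in the highlighted passage as m = 0.1. The function name `tost_power_at_zero` and the sample sizes are hypothetical.

```python
from statistics import NormalDist


def tost_power_at_zero(n_per_group, margin, alpha=0.05):
    """Approximate power of a two-group TOST when the true difference is 0.

    Assumes a known sd of 1 and a z-test, a simplification of the t-based TOST.
    """
    norm = NormalDist()
    se = (2 / n_per_group) ** 0.5
    # Equivalence is declared only if |observed difference| < margin - z * se.
    cutoff = margin / se - norm.inv_cdf(1 - alpha)
    if cutoff <= 0:
        return 0.0  # empty rejection region: no outcome can conclude equivalence
    return 2 * norm.cdf(cutoff) - 1


for n in (50, 1000, 2000):
    print(n, round(tost_power_at_zero(n, margin=0.1), 3))
```

Under these assumptions a margin of 0.1 with 50 participants per group leaves no possible outcome that declares equivalence, while a sample size chosen by power analysis would be in the thousands per group.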

Annotators

Lakens

URL

psyarxiv.com/bh8vu
Apr 2020
psyarxiv.com

Replication 20-04-16-PD.pages

3
1. Lakens 26 Apr 2020
  
  in Public
  
  positive evidence for successful replicatio
  
  but the idea by Simonsohn is to combine an equivalence test with NHST - so this is not a real difference.
2. Lakens 26 Apr 2020
  
  in Public
  
  about .7
  
  The small telescopes approach uses the effect size the original study had .33 power to detect; Lakens et al (2018) suggest using what could have been detected in the original study (typically what the study had 50% power for). This is a very liberal criterion - what is the justification?
3. Lakens 26 Apr 2020
  
  in Public
  
  theoretically
  
  If the value is based on the sample size used by the original researchers, there is no good reason to call this 'theoretically interesting' - sample sizes are chosen based on feasibility, cost, and time - and only then to a certain extent based on what is expected.
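
Annotations 2 and 3 above contrast thresholds that are both derived from the original sample size: the effect the original study had 33% power to detect and the effect it had 50% power to detect. A small sketch of how such thresholds follow from the sample size alone, assuming a two-group design, a normal approximation to a two-sided test at alpha = .05, and a hypothetical original sample of 20 per group (the function name `detectable_effect` is also an assumption):

```python
from statistics import NormalDist


def detectable_effect(n_per_group, power, alpha=0.05):
    """Standardized difference d that a two-group study detects with the given power.

    Normal approximation to a two-sided test, ignoring the negligible
    probability of rejecting in the wrong direction.
    """
    norm = NormalDist()
    se = (2 / n_per_group) ** 0.5
    return se * (norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power))


n_original = 20  # hypothetical original sample size per group
d33 = detectable_effect(n_original, power=0.33)
d50 = detectable_effect(n_original, power=0.50)
print(f"d33 = {d33:.2f}, d50 = {d50:.2f}")
```

Both values shrink or grow with `n_original` only, which is the point of annotation 3: a threshold computed this way reflects feasibility choices made by the original researchers, not a theoretical prediction.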

Annotators

Lakens

URL

psyarxiv.com/dje96/
Mar 2019
psyarxiv.com

comment_elliotReviewOfGenPsyc.pdf

1
1. Lakens 07 Mar 2019
  
  in Public
  
  Abstrac
  
  In his commentary, Alex Holcombe makes the argument that only ‘one or two exemplars of a color category’ are typically examined in color studies, and this is problematic because a color such as ‘red’ is a category, not a single hue.
  
  Although in some fields it is very important to examine a range of stimuli, and in general examining the generalizability of findings has an important place in research lines, I do not think that currently this issue is a pressing concern in color psychology. Small variations in hue and brightness naturally occur in online studies, and these are assumed not to matter for the underlying mechanism. Schietecat, Lakens, IJsselsteijn, and De Kort (2018) write: “In addition, we conducted Experiments 1 and 3 in a laboratory environment, but Experiments 2, 4, and 5 were conducted in participants’ homes with an internet-based method. Therefore, we could not be completely sure that the presentation of the stimuli on their personal computers was identical for every participant in those experiments. However, we expected that the impact of these variations on our results is not substantial. The labels of the IAT (i.e., red vs blue) increased the salience of the relevant hue dimension, and we do not expect our results to hold for very specific hues, but for colors that are broadly categorized as red, blue, and green. The similar associative patterns across Experiments 2 and 3 seem to support this expectation.”
  
  We wrote this because there is nothing specific about the hue that is expected to drive the effects in association-based accounts of psychological effects of colors. If the color ‘red’ is associated with specific concepts (and the work by Schietecat et al. supports the idea that red can activate associations related to either activity or evaluation, such as aggression or enthusiasm, depending on the context), then the crucial role of the stimulus is to activate the association with ‘red’, not the perceptual stimulation of the eye in any specific way. The critical manipulation check would thus be whether people categorize a stimulus as ‘red’. As long as this is satisfied, we can assume the concept ‘red’ is activated, which can then activate related associations, depending on the context.
  
  Obviously, the author is correct that there are benefits in testing multiple variations of the color ‘red’ to demonstrate the generalizability of observed effects. However, the author is writing too much as a perception researcher, I fear. If there is a strong theoretical reason to assume slightly different hues and chromas will not matter (because as long as a color is recognized as ‘red’ it will activate specific associations) the research priority of varying colors is much lower than in other fields (e.g., research on human faces) where it is more plausible that the specifics of the stimuli matter. A similar argument holds for the question whether “any link is specifically to red, rather than extending to green, yellow, purple, and brown”. This is too a-theoretical, and even though not all color research has been replicable, and many studies suffered from problems identified during the replication crisis, the theoretical models are still plausible, and specific to predictions about certain hues. We know quite a lot about color associations for prototypical colors in terms of their associations with valence and activity (e.g., Russell & Mehrabian, 1977) and this can be used to make more specific predictions than to a-theoretically test the entire color spectrum.
  
  Indeed, across the literature many slightly different variations of red are used, and some studies have been performed online (Schietecat et al., 2018), where different computer screens will naturally lead to some variation in the exact colors presented. This doesn’t mean that more dedicated exploration of the boundaries of these effects cannot be worthwhile in the future. But currently, the literature is more focused on examining whether these effects are reliable to begin with, and on explaining basic questions about their context dependency, than on testing the range of hues for which effects can be observed. So, although in principle it is often true that the generalizability of effects is understudied and deserves more attention, it is not color psychology’s most pressing concern, because we have theoretical predictions about specific colors, and because theoretically as long as a color activates the concept (e.g., ‘red’), the associated concepts that influence subsequent psychological responses are assumed to be activated, irrespective of minor differences in for example hue or brightness.
  
  Daniel Lakens
  
  References
  
  Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11(3), 273–294. https://doi.org/10.1016/0092-6566(77)90037-X
  
  Schietecat, A. C., Lakens, D., IJsselsteijn, W. A., & Kort, Y. A. W. de. (2018). Predicting Context-dependent Cross-modal Associations with Dimension-specific Polarity Attributions. Part 2: Red and Valence. Collabra: Psychology, 4(1). https://doi.org/10.1525/COLLABRA.126
  
  #review #meta-psychology

Tags

#review #meta-psychology

Annotators

Lakens

URL

psyarxiv.com/4bth3
Sep 2018
psyarxiv.com

popperpractice.pdf

1
1. Lakens 01 Sep 2018
  
  in Public
  
  This development may amuse those who, like me, have taken on board the lessons of the empirical turn in the study of science. One of those lessons after all was that science does not work the way Karl Popper thought it shoul
  
  It is worth pointing out the reform movement is in no way guided by simplistic and orthodox Popperian views on science - this transition to a less strict falsificationist view on research has already occurred (see Meehl, 1990) so the author seems to be amused by a strawman. The current reform movement is more diverse, and less single-mindedly focused on Popper, than the author argues here.

Annotators

Lakens

URL

psyarxiv.com/vw39c
Aug 2018
psyarxiv.com

manuscript.pdf

1
1. Lakens 28 Aug 2018
  
  in Public
  
  Equivalence Testing and the Second Generation P-Value
  
  We look forward to your comments!

Annotators

Lakens

URL

psyarxiv.com/7k6ay
Jul 2018
f1000research.com

Feeling the future: A meta-analysis of 90 experiments on the anomalous anticipation of random future events

1
1. Lakens 31 Jul 2018
  
  in Public
  
  Amendments from Version 1
  
  For a blog post commenting on an earlier draft of this article (many points raised remain relevant) see here.
  
  meta-analysis blog

Tags

blog

meta-analysis

Annotators

Lakens

URL

f1000research.com/articles/4-1188/v2
psyarxiv.com

Informative_Null_Effects_Preprint.pdf

1
1. Lakens 31 Jul 2018
  
  in Public
  
  Making ‘Null Effects’ Informative:
  
  We look forward to any comments you might have!

Annotators

Lakens

URL

psyarxiv.com/48zca/
daniellakens.blogspot.com

The 20% Statistician

1
1. Lakens 31 Jul 2018
  
  in Public
  
  The 20% Statistician
  
  Feel free to leave comments!

Annotators

Lakens

URL

daniellakens.blogspot.com/
Apr 2015
learnbayes.org

fundamentalError.pdf

17
1. Lakens 23 Apr 2015
  
  in Public
  
  confidence
  
  This is closest to what modern CI advocates propose we use, correct? Perhaps you can explicitly mention this.
2. Lakens 23 Apr 2015
  
  in Public
  
  known triangular
  
  It's really too bad you didn't choose an example with a normal distribution.
3. Lakens 23 Apr 2015
  
  in Public
  
  intentionally simple
  
  Really? I doubt the average psychologist finds this simple.
4. Lakens 23 Apr 2015
  
  in Public
  
  statistics
  
  I guess this shows you should always ask an engineer for help, who would just throw down 50 ropes. But ok. ;P
5. Lakens 23 Apr 2015
  
  in Public
  
  probability
  
  So they do not follow the normal distribution, correct? This raises the question in my mind whether this matters or not. Isn't this criticism relevant when data is normally distributed? If not, please explicitly specify this. If it is, please use normally distributed data, that's closer to what psychologists deal with.
6. Lakens 23 Apr 2015
  
  in Public
  
  one
  
  I guess just dropping down 50 lines at 20 cm distances did not occur to anyone? ;)
7. Lakens 23 Apr 2015
  
  in Public
  
  procedure
  
  and this is not 'precision'?
8. Lakens 23 Apr 2015
  
  in Public
  
  precision of
  
  Perhaps it is important to define 'precision' - I think the term means different things to different people.
9. Lakens 23 Apr 2015
  
  in Public
  
  349
  
  Again, Cumming & Maillardet (2006) explain this problem the best way, I think. They should be cited here as well.
10. Lakens 23 Apr 2015
  
  in Public
  
  value
  
  There have been many, many papers explaining this to lay people (e.g., Lakens & Evers, 2014). It might be fair to acknowledge this.
11. Lakens 23 Apr 2015
  
  in Public
  
  FCF
  
  You are not defining the mistake people make very clearly - instead, you use citations of statements by other people. Cumming & Maillardet also clearly show that a single 95% CI will contain the mean of a replication study only 83.4% of the time. It seems you try to judge Cumming on a common language statement, while he clearly would never make the FCF.
12. Lakens 23 Apr 2015
  
  in Public
  
  understanding
  
  To be clear: you mean all the three interpretations you specify above are not correct, right?
13. Lakens 23 Apr 2015
  
  in Public
  
  CP
  
  In the CI context, the abbreviation CP reminds me strongly of a Capture Probability (or Capture Percentage) of a single CI (see Cumming & Maillardet, 2006). Perhaps better not to abbreviate?
14. Lakens 23 Apr 2015
  
  in Public
  
  such
  
  which considerations?
15. Lakens 23 Apr 2015
  
  in Public
  
  suggest
  
  Please start this paragraph by summarizing in 1 sentence how modern proponents suggest I use CI.
16. Lakens 23 Apr 2015
  
  in Public
  
  may
  
  What exactly do you mean? Should not be used? You have not yet explained how modern proponents suggest we use CI, so I can not understand this main point at this time.
17. Lakens 23 Apr 2015
  
  in Public
  
  o
  
  Capital C
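
The 83.4% figure attributed to Cumming & Maillardet (2006) in annotations 11 and 13 above can be reproduced, as far as I understand it, as a capture percentage: the probability that the mean of a replication falls inside the 95% CI of an original study. The sketch below assumes a known standard deviation and equal sample sizes, so that both means share the same standard error (set to 1 here). The function names are hypothetical.

```python
import random
from statistics import NormalDist


def capture_percentage_analytic(confidence=0.95):
    """Probability that a replication mean falls inside an original CI (known sd, equal n)."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - (1 - confidence) / 2)
    # The difference between two independent sample means has sd sqrt(2) * se,
    # while the CI half-width is z * se.
    return 2 * norm.cdf(z / 2 ** 0.5) - 1


def capture_percentage_simulated(n_pairs=100_000, confidence=0.95, seed=1):
    """Monte Carlo check: draw pairs of sample means and count captures."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(n_pairs):
        original_mean = rng.gauss(0, 1)      # sampling distribution with se = 1
        replication_mean = rng.gauss(0, 1)
        hits += abs(replication_mean - original_mean) < z
    return hits / n_pairs


analytic = capture_percentage_analytic()
simulated = capture_percentage_simulated()
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

The coverage of the true parameter remains 95% over repeated sampling; the lower figure arises because the replication mean carries its own sampling error on top of that of the original estimate.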

Annotators

Lakens

URL

learnbayes.org/talks/ConfidenceIntervals/paper_v2/fundamentalError.pdf