  1. Jul 2018
    1. On 2014 Sep 13, Vahid Rakhshan commented:

      Thanks to the authors for their clarifying response. I would like to reply to it; my answers follow each paragraph of the authors’ response. The authors’ response is in bold font, and my replies are in normal font.

      1. The estimation of power for this study was not simply based on a pilot study, but also based on the only other study carried out using MRI to assess condylar position/displacement, and where displacements of 1 mm were detected. More importantly, is Dr Rakhshan actually advising the readership that mean (absolute) differences of 0.141 mm (ranging from 0.01 to 0.36 mm) in the glenoid fossae are something of a clinical health concern?

      Dr Rakhshan is not advising anything regarding clinical concerns. The magnitude of the detectable error simply needs to be justified. Given that the observed mean difference was as small as 0.1 mm, powering the study to detect only a 1 mm difference can lead to serious problems.

      The calculated power was reportedly 80%, yet none of the comparisons was significant. This is exactly my point. There is a fundamental difference between clinical significance and the notion of statistical power calculation, which pertains to "statistical significance," not to clinical significance.

      Should we have actually adjusted the detection of differences to 0.1 mm? Would this have been clinically relevant?

      Yes: the detectable difference should have been set to about 0.1 mm, and the sample size increased accordingly, so as to obtain a real power above 80% rather than an inflated nominal power of 80%. Again, this has nothing to do with the notion of clinical significance [or “clinical health concern,” as the authors put it]. Invoking “clinical health concern” when discussing test power is misleading.
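To illustrate the point numerically (a hypothetical sketch of my own, not the authors' actual calculation; the 0.5 mm standard deviation below is an assumed value), a normal-approximation sample-size formula shows how sharply the required sample grows when the detectable difference shrinks from 1 mm to 0.1 mm:

```python
from statistics import NormalDist

def approx_n(delta_mm, sd_mm, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a paired comparison
    (one-sample z test on within-subject differences)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided critical value
    z_beta = z(power)            # quantile for the target power
    return ((z_alpha + z_beta) * sd_mm / delta_mm) ** 2

# Assumed SD of 0.5 mm (hypothetical, for illustration only)
n_1mm = approx_n(1.0, 0.5)    # powered to detect a 1 mm difference
n_01mm = approx_n(0.1, 0.5)   # powered to detect a 0.1 mm difference
print(round(n_1mm), round(n_01mm))
```

Under these assumed numbers, detecting 0.1 mm at 80% power requires about a hundred times the sample needed for 1 mm, since the required n scales with the inverse square of the detectable difference.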

      2. Repeated-measures ANOVA can be used to assess the variability of techniques. We also agree that with smaller samples a nonparametric test like the Friedman could have also been used, but we used parametric methods.

      I think reviewing the assumptions of repeated-measures ANOVA would help clarify my humble point. How was normality verified in your 2 × 3 sample?

      In addition, I think the other part of my paragraph 2 deserves re-reading as well; it concerned the lack of power of that ANOVA and the resulting unreliability of its non-significant result.
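For what it is worth, the Friedman test the authors mention is straightforward to compute. Below is a minimal pure-Python sketch of my own (ties are assumed absent, and the data shown are hypothetical, not the study's):

```python
def friedman_statistic(blocks):
    """Friedman chi-square statistic for k related samples.
    `blocks` is a list of rows; each row holds one subject's
    measurements under the k conditions (no ties assumed)."""
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # rank the k measurements within this subject (1 = smallest)
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)

# Hypothetical readings for 3 subjects under 3 registrations;
# the rank pattern is perfectly balanced, so the statistic is 0.0
print(friedman_statistic([[0.10, 0.25, 0.30],
                          [0.30, 0.10, 0.20],
                          [0.20, 0.30, 0.10]]))
```

Being rank-based, the test does not require the normality assumption that a 2 × 3 repeated-measures ANOVA does, which is why it is the safer choice at such small sample sizes.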

      3. In contradiction to the claims made by Dr Rakhshan, in our “Results” section, there is reference to the P values. All P values were lower than 0.05 and, as such, were not statistically significant. The Tukey was the post-hoc test to use if any significant differences were detected; however, there were none.

      First, P values smaller than 0.05 indicate statistical significance, contrary to what the authors stated.

      Second, I could not find any contradiction in my critique. The authors had used two different ANOVAs but did not specify which ANOVA was used for which comparison, or with which results.

      4. The underlying distribution was deemed normal, and as such, ANOVA was used. Variable results do not necessarily indicate nonnormal underlying distributions.

      Thanks for the clarification that the distribution was normal.

      5. Condylar positions were discussed in relation to CO. Because the Roth power and CR bite registrations were supposedly capable of positioning condyles in the glenoid fossae in relation to CO, we discussed the differences in relation to CO.

      I had no issue with the information presented. My concern was the lack of comparisons between the CR and Roth power registrations.

      Moreover, from the authors’ response it appears that they “supposed” the other two methods to be gold standards and therefore assessed CO only in relation to them. This should have been clarified in the original article [i.e., that these two methods were assumed to be gold standards] and substantiated with proper citations. Nevertheless, these methods are not true gold standards and cannot simply be presumed capable of positioning the condyles in the desired position. This is implied even by the introduction of the original article, which states that these methods are not verified by evidence.

      6. We do not agree that lines 3 to 8 are unclear; they are in fact quite the opposite. If there was any significant positioning difference that was detectable, we would have found it, but we did not. This was more than safe to infer.

      No; the important but often forgotten principle is that absence of evidence is not evidence of absence. This becomes more serious when power is low. When the sample size is small and the real power is not high enough, there is a substantial chance of a false-negative error: a real difference exists, but the data are inadequate to show it.
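The false-negative risk can be demonstrated with a small Monte Carlo sketch (all numbers hypothetical: a true 0.1 mm mean difference, 0.5 mm standard deviation, 10 subjects, and a simple z approximation rather than the study's actual analysis):

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)

def z_test_rejects(diffs, alpha=0.05):
    """Two-sided one-sample z test on within-subject differences
    (normal approximation, for illustration only)."""
    n = len(diffs)
    z = mean(diffs) / (stdev(diffs) / n ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha

# True difference 0.1 mm, SD 0.5 mm, n = 10 subjects (all hypothetical)
trials = 2000
rejections = sum(
    z_test_rejects([random.gauss(0.1, 0.5) for _ in range(10)])
    for _ in range(trials)
)
print(f"empirical power ≈ {rejections / trials:.2f}")
```

Under these assumptions the empirical power typically lands around 10%, far below the nominal 80%: roughly nine out of ten such studies would miss the real 0.1 mm difference and report a non-significant result.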

      7. It appears that Dr Rakhshan has sadly and simply fixated his whole critique on his own interpretation of the statistics, claiming that our findings were unreliable or unsubstantiated. We are acutely aware that limitations do exist with any research. Our study simply and openly shows that the differences detected between the registrations were so small and highly variable that using certain bite registrations to accurately and predictably position condyles into specific locations in the glenoid fossae is not evidence based. Our findings are reliable and substantiated.

      Your study showed a lack of evidence. Contrary to what the authors think, their results did not constitute evidence of absence.

      Although this result is valuable as a pilot finding, the authors needed to approach their conclusions with much greater caution. The limitations the authors are “acutely aware of” preclude such strong conclusions. As I stated in my original letter to the editor, “The authors needed to state their limitations and warn the reader that their results were inconclusive.” Being aware of limitations is one thing; clearly disclosing them is another.

      Any statistically non-significant result obtained in the absence of high power is inconclusive.

      I think if we study the notion of statistical significance more deeply, we would agree that my points were not my own interpretation of statistics, but were simply the very basics of statistics.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
