Matching Annotations
  1. Jul 2018
    1. On 2015 Jan 03, Harri Hemila commented:

      The review by Caruso TJ, 2007 is misleading about the effect of zinc on the common cold

      In their systematic review, Caruso TJ, 2007 identified 14 zinc trials. They used a quality scoring approach, awarding each trial 1 point for each of 11 quality items whose requirements were satisfied. In 2 tables and 1 figure, Caruso et al. described the distribution of quality scores and the individual quality features of the trials.

      Caruso TJ, 2007 considered that only studies with the full 11 points were valid: “Four studies met all 11 criteria. Three of these studies reported no therapeutic effect from zinc lozenge or nasal spray. One study reported positive results from zinc nasal spray.” On the basis of this 3 vs. 1 comparison (so-called “vote counting”), Caruso et al. concluded that “the therapeutic effectiveness of zinc lozenges has yet to be established.” They proposed that the positive findings for zinc could be explained by methodological faults in the trials.

      The approach of evaluating trial quality by a set of explicit criteria was initiated by Thomas Chalmers in the 1980s, see Chalmers TC, 1981. Dozens of quality scales have been developed since. However, the approach has not been successful, and indeed it is discouraged, e.g. in the Cochrane Handbook (sec 8.3.3; Jan 2015), which states that:

      “The use of [quality] scales for assessing quality or risk of bias is explicitly discouraged in Cochrane reviews. While the approach offers appealing simplicity, it is not supported by empirical evidence. Calculating a summary score inevitably involves assigning ‘weights’ to different items in the scale, and it is difficult to justify the weights assigned. Furthermore, scales have been shown to be unreliable assessments of validity and they are less likely to be transparent to users of the review. It is preferable to use simple approaches for assessing validity that can be fully reported (i.e. how each trial was rated on each criterion).”

      One major problem of quality scoring is its focus on reporting rather than on the scientific quality of the trial. For example, Caruso et al. gave 1 point if there was “measurement of dropout rate” in the trial. Thus a trial might report a high dropout rate, which means low scientific quality, yet it would still get 1 point from Caruso et al., because the high dropout rate was reported explicitly. Caruso et al. also gave 1 point for “sample size calculation”, which is important when a trial is planned, because it can show that the planned trial is too small. However, it is irrelevant after the trial is published, because then the confidence interval reveals the accuracy of the result. Most of Caruso et al.’s remaining 9 quality items have similar problems; see the detailed comments in a separate document.
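      The reporting-versus-quality problem can be sketched in a few lines. The scoring function and trial values below are hypothetical illustrations, not Caruso et al.’s actual procedure: an item that awards a point merely for *reporting* dropout gives a trial with heavy attrition the same credit as one with almost none.

      ```python
      def dropout_item_score(trial):
          """Award 1 point if the dropout rate was reported at all, else 0;
          the reported value itself does not affect the score."""
          return 1 if trial.get("dropout_rate") is not None else 0

      # Hypothetical trials (names and numbers invented for illustration):
      careful_trial = {"dropout_rate": 0.03}   # 3% dropout, reported
      leaky_trial   = {"dropout_rate": 0.45}   # 45% dropout, reported
      silent_trial  = {}                       # dropout rate not reported

      scores = [dropout_item_score(t)
                for t in (careful_trial, leaky_trial, silent_trial)]
      print(scores)  # the 45%-dropout trial scores as well as the 3% one
      ```

      The point: such an item measures transparency of reporting, not the validity of the trial, so a methodologically weaker trial can score as high as a stronger one.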

      Although it is important to consider the methods of a trial, there are no simple criteria that decide whether a trial is reliable or not. For example, in a meta-analysis of 276 RCTs, Balk EM, 2002 concluded that “double blinding and allocation concealment, two quality measures that are frequently used in meta-analyses, were not associated with the treatment effect” meaning that valid estimates of treatment effect can be reached without them. Furthermore, Glasziou P, 2007 pointed out that in some cases firm conclusions of treatment benefits can be drawn even without any control groups.

      Caruso TJ, 2007 did not present any numerical results of the trials, simply classifying them as positive (statistically significant effect) or negative (no statistically significant effect), even though such a “vote counting” approach is strongly discouraged, e.g. by the Cochrane Handbook (sec 9.4.11; Jan 2015), which states that:

      “Two problems can occur with vote counting, which suggest that it should be avoided whenever possible. Firstly, problems occur if subjective decisions or statistical significance are used to define ‘positive’ and ‘negative’ studies… Secondly, vote counting takes no account of the differential weights given to each study.”

      Vote counting can lead to false negative conclusions because it ignores the quantitative findings. For example, a large number of placebo-controlled trials on vitamin C and the common cold found non-significant effects on common cold duration, but the results consistently favoured vitamin C. Quantitative pooling of the results shows a statistically highly significant benefit from the vitamin, see Hemilä H, 2013 and Douglas RM, 2005. Vote counting would simply look at the proportion of studies with P<0.05, ignoring the actual mean and SD values; it would therefore lead to false negative conclusions about the effects of vitamin C.
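      The contrast between vote counting and quantitative pooling can be illustrated with a small sketch. The summary data below are invented for illustration, not taken from the vitamin C trials: eight identical small trials, each non-significant on its own, yield a highly significant estimate under simple inverse-variance (fixed-effect) pooling.

      ```python
      import math

      def p_two_sided(z):
          """Two-sided p-value from a standard-normal z statistic."""
          return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

      # Hypothetical summary data: 8 small trials, each estimating the same
      # reduction in cold duration (days) with the same standard error.
      effects = [-0.5] * 8   # mean difference, treatment minus placebo
      ses     = [0.3] * 8    # standard error of each trial's estimate

      # Vote counting: classify each trial by its own significance test.
      votes = ["positive" if p_two_sided(e / se) < 0.05 else "negative"
               for e, se in zip(effects, ses)]
      print(votes.count("negative"), "of", len(votes), "trials non-significant")

      # Inverse-variance (fixed-effect) pooling of the same data.
      weights   = [1.0 / se**2 for se in ses]
      pooled    = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
      pooled_se = math.sqrt(1.0 / sum(weights))
      pooled_p  = p_two_sided(pooled / pooled_se)
      print(f"pooled effect {pooled:.2f} days, p = {pooled_p:.2g}")
      ```

      Here every individual trial has p ≈ 0.10, so a vote counter records 8 “negative” studies, whereas pooling the same data gives p well below 0.001 — the false negative conclusion the paragraph above describes.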

      Caruso TJ, 2007 did not discuss the possibility that the dose of zinc or the lozenge composition might have an effect on trial results, nor did they refer to any of the numerous papers discussing the possibility that the level of free zinc ion might be an important variable in zinc lozenge trials: Godfrey JC, 1988; Eby GA, 1988; Martin RB, 1988; Eby GA, 1997; Eby GA, 2001; Eby GA, 2004.

      Although Caruso et al. focused on methodological features that are mostly irrelevant to trial validity, they stated that a “common deficiency [in the zinc trials] was proof of blinding which was lacking in 7 studies. The placebo effect in the treatment of colds was first shown >70 years ago and has since been demonstrated in subsequent studies.” As a justification for this statement, Caruso et al. referred to the review by Thomas Chalmers (1975), Chalmers TC, 1975, and the trial by Thomas Karlowski and Thomas Chalmers (1975), Karlowski TR, 1975.

      However, when Caruso TJ, 2007 wrote those sentences in their zinc review, they had already been informed that those 2 papers were erroneous, because I had pointed out problems with those 2 papers in a criticism of a previous Caruso TJ, 2005 review on echinacea and the common cold. In a letter-to-the-editor, I wrote: “The Chalmers review (1975) was shown to be erroneous a decade ago; it has data inconsistent with the original study publications, errors in calculations, and other problems”; see the letter Hemilä H, 2005.

      The Karlowski TR, 1975 trial found statistically significant benefits of vitamin C against the common cold, yet paradoxically Karlowski et al. concluded that “the effects [of vitamin C] demonstrated might be explained equally well by a break in the double blind.” My comments on the Caruso TJ, 2005 review on echinacea and the common cold briefly summarized the problems of the Karlowski trial (1975) as follows: “the [Karlowski] subgroup analysis excluded 105 episodes of common cold (42% of all episodes of cold), even though the 2 subgroups were presented as if they were complementary. There are numerous additional problems with Karlowski’s placebo effect explanation, and, consequently, it is not a valid interpretation to the study results.” See the letter Hemilä H, 2005.

      My letter-to-the-editor, Hemilä H, 2005, referred to studies that documented in detail the problems of the Chalmers review (1975), in Hemilä H, 1995, and the problems of the Karlowski trial (1975), in Hemilä H, 1996. Thus, in their 2007 review on zinc and the common cold, Caruso TJ, 2007 still referred to those old erroneous papers by Chalmers and Karlowski although they were aware of my criticisms, since they had read and responded to my letter-to-the-editor.

      See specific comments on the quality items of the Caruso et al. 11 point scoring system in a separate document.

      Another systematic review on zinc lozenges concluded that there is strong evidence that high-dose zinc acetate lozenges shorten the duration of colds by 42%, see Hemilä H, 2011.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
