  1. Jul 2018
    1. On 2013 Jun 23, Hilda Bastian commented:

      This trial bears the predominant weight for safety concerns about single-session debriefing in a subsequent influential systematic review (Rose S, 2002, of which the lead trialist here is an author). Its results are potentially affected by multiple serious biases.

      The trial had a high attrition rate (>22%): 23 lost to follow-up (p78 - participants) and 7 who left hospital before intervention (p78 - results). The number of events was low.

      This trial report does not include an intention-to-treat (ITT) analysis. ITT was imputed in the systematic review (Rose S, 2002) without a description of the additional data, the methods used, or whether sensitivity analyses were conducted.

      The intervention group was at higher risk of the event at baseline (25% of the intervention arm had others involved in the trauma vs 4% in the control arm, p=0.01; percentage of the body burned, life threat and past significant trauma were also higher, although not significantly so).

      There was a disproportionately large number in the intervention group (64 vs 46), due to the method of randomization and having stopped the trial early.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2018 Jan 01, Hilda Bastian commented:

      It is great to see randomized trials to test the effects of an infographic. However, I have concerns with the interpretation of the results of this set of 3 trials. The abstract states that these were randomized trials of 171 students, 99 consumers, and 64 doctors. However, those are the numbers of people who completed the knowledge and reading experience questions, not the number randomized: 171 students, 212 consumers, and 108 doctors were randomized. The extremely high dropout rate (e.g. 53% for consumers) leaves only the trial in students as a reliable base for conclusions. And for them, there was no difference in knowledge or reported reading experience - they did not prefer the infographic.

      The authors point out that the high dropout rate may have affected the results for consumers and doctors, especially as they faced a numeracy test after being given the infographic or summary to read. That must have skewed the results. In particular, since the infographic (here) has such different content to the plain language summary (here), this seems inevitably related to the issue of numeracy: the plain language summary is almost number-free, while the infographic is number-heavy (an additional 16 numerical expressions).

      The knowledge test comprised 10 questions, one of which related to the quality of the evidence included in the systematic review. The infographic and plain language summary contained very different information on this. The article's appendix suggests that the correct answer expected was included in the infographic but not in the plain language summary. It would be helpful to know whether this affected the knowledge scores for readers of the plain language summary.

      Cohen's d effect sizes are not reported for the 3 trials separately, and given the heterogeneity in those results, it is not accurate to use the combined result to conclude that all 3 participant groups preferred the infographic and the experience of reading it. (In addition, the method for the meta-analysis of effect sizes across the 3 trials is not reported.)

      The specific summary and infographic, although high quality, also point to some of the underlying challenges in communicating with consumers via these media. For example, the infographic uses a coffin as a pictograph for mortality, which I don't believe is appropriate in patient information. This highlights the risks inherent in using graphic elements where there aren't well-established conventions. Both the infographic and the plain language summary focus on information about the baby's wellbeing and the birth - but not the impact of the intervention on the pregnant woman, or her views of it. Whatever the format, issues remain with the process of determining the content of research summaries for consumers. (I have written more about the evidence on infographics and this study here.)

      Disclosure: The Cochrane (text) plain language summaries were an initiative of mine in the early days of the Cochrane Collaboration, when I was a consumer advocate. Although I wrote or edited most of those early Cochrane summaries, I had no involvement with the one studied here.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Oct 28, Hilda Bastian commented:

      It would be useful if the authors could provide detail on two key issues not described in the paper. The first is the method for excluding identified references that were published subsequent to the date of the original searches.

      The second is how eligibility for study inclusion was assessed for the ESM group, and by whom. This is a key outcome measure that is also highly susceptible to bias. One method for reducing this bias, for example, would be assessment by more than one assessor, independent of those conducting the searches and blinded to the search strategy by which the study had been identified.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Sep 03, Hilda Bastian commented:

      This is a useful addition on an important topic, and is a good resource for other similar search strategies. Given it is such a long search strategy, it would be useful if the authors could provide a cut and paste version. Small point: the KiMS search strategy cited with reference number 27 in the article is actually at reference number 28 (Wessels M, 2016).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Jun 18, Hilda Bastian commented:

      An assessment of a critical problem, with important conclusions. It would be helpful, though, if the scope of the 4 guidelines were shown. The inclusion criteria are not very specific on this matter, and the citations of the versions of the 4 included guidelines are not provided.

      In addition to the scope, the dating of the guidelines' last search for evidence (if available) with respect to the dates of the systematic reviews would be valuable. Gauging to what extent systematic reviews were not included because of being out of scope, out of date, or not yet published is important to interpreting these findings. Given how quickly systematic reviews can go out of date (Shojania KG, 2007), the non-inclusion of older systematic reviews may have been deliberate.

      The publisher of the article does not appear to have uploaded Appendix A, which includes the references to the systematic reviews. Further, confusion has been created by linking the citations of the first 44 systematic reviews to the references of the article's own text. The end result is that neither the 4 guidelines nor the 71 systematic reviews are identifiable. It would be helpful if the authors would post these 75 citations here.

      Disclosure: I work on PubMed Health, the PubMed resource on systematic reviews and information based on them.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Aug 02, Hilda Bastian commented:

      Thank you for the reply, Professor Sacks. However, the reply does not address the errors I pointed to, nor respond directly to the key problems I raised. Much of it is directed to rebutting claims I did not make.

      ... (1) Lack of reporting on the processes for selecting evidence

      My first point was that although the statement asserts that the totality of evidence and recent studies was reviewed, it does not report the process for identifying the systematic reviews it selected. No validated method for evaluating the systematic reviews is reported, and reasons for excluding each of the trials in the chosen systematic reviews are not reported either (with the exception of 6 trials, accounting for 10 trials in total). Hamley S, 2017, for example, lists 19 randomized trials on the question of replacing saturated with polyunsaturated fat, drawn from 8 systematic reviews/meta-analyses (Table 2). I stress that my point here is not related to the conclusions, but rather to the adequacy and transparency of the methodology.

      A "totality of evidence" approach that considers a variety of research types does not obviate the need to explain how the studies were sought, selected, and appraised (Institute of Medicine (US) Committee on Standards for Developing Trustworthy Clinical Practice Guidelines, 2011).

      ... (2) Singling out coconut oil

      The reply reiterates a statement based on a single survey of people's beliefs about coconut oil. But no data are offered to show that dietary coconut oil is consumed at levels that warrant this attention while palm oil, for example, does not. I am not sure whether the data I could find on this are an accurate reflection or not (Bastian, June 2017). If they are, however, then the issue of replacing palm oil in commercially produced food would have warranted more attention than coconut oil. Given the very different standards applied to studies of coconut oil, the question of why it was addressed at all, when so much else in scope was not, remains a relevant one.

      ... (3) Inadequacy of Eyres L, 2016 as a basis for wide-ranging conclusions on health effects of coconut oil

      I reiterate the point I made: the conclusions that clinical trials of effects on CVD measures have not been reported, and that there are "no known offsetting favorable effects", would require a high-quality systematic review of the effects of dietary coconut oil on both CVD and non-CVD health outcomes. Eyres L, 2016 is not that review. Whichever of the validated and accepted methodologies for assessing the quality of a systematic review you would use (Pussegoda K, 2017), the Eyres review would not fare well. It does not include elements required for a high-quality systematic review - such as reporting on the excluded studies and including a study-by-study assessment of the methodological characteristics and risk of bias of included studies. More importantly, its scope is too narrow.

      I identified the 8 trials in the 7 papers I mention in a quick search to test the adequacy of coverage of the Eyres review. I only included those on CVD outcomes. There are undoubtedly further relevant trials. That short search, though, established the limits of the scope of the Eyres review, even in CVD health.

      This is how the authors of the Eyres review characterize the evidence they found:

      "Much of the research has important limitations that warrant caution when interpreting results, such as small sample size, biased samples, inadequate dietary assessment, and a strong likelihood of confounding. There is no robust evidence on disease outcomes, and most of the evidence is related to lipid profiles."

      I agree with that assessment, and the reply offers no methodologically sound counter to this. Instead, the studies not in the Eyres review were critiqued. The reply cites these criteria for excluding all but 3 of the 8 studies as acceptable for consideration (presumably the 2 reported in a single paper were regarded as a single study):

      [A]mong the 7 studies...4 would appropriately be excluded as result of being non-randomized, uncontrolled, using a very small amount, not including a control group or not even being a trial of coconut oil.

      I don't really know what to make of "uncontrolled" and "not including a control group" as 2 criteria, given all these trials are controlled: the final 3 that aren't rejected don't make it clear to me either. No threshold is offered for what counts as a large enough dose, so I can't work with that either. However, I took the other 2 - randomized or not, and having a solely coconut oil arm - as objective criteria I could apply to the 8 trials within Eyres and the 8 trials outside it (and extracted some additional data). This is reported in full in a blog post (Bastian, August 2017). In summary:

      • The Eyres group has fewer randomized trials: 4/8 compared to 7/8 in the non-Eyres group (or 6/7 for non-Eyres after knocking out the trial with no separate coconut arm).
      • There are fewer randomized participants in the Eyres group: 143 compared to 234 in 6 non-Eyres randomized trials with a separate coconut arm.
      • All the trials in the Eyres group only look at blood lipid profiles whereas most in the non-Eyres group assess at least 1 non-blood-test outcome (5/8 or 4/7). That is in part because of the Eyres exclusion criteria (such as rejecting any trial in a specific population or clinical subgroup, such as overweight people).

      The Eyres group cannot be regarded as an adequate or representative subset of trials. And the same level of critique has not been applied even-handedly.

      ... (4) Errors in representation of the Eyres findings on coconut oil versus other saturated fats

      As this was not addressed in the reply, I'll reiterate it, with additional detail. This is what the Eyres review concludes on this question:

      "In comparison with other fat sources, coconut oil did not raise total or LDL cholesterol to the same extent as butter in one of the studies by Cox et al., but it did increase both measures to a greater extent than did cis unsaturated vegetable oils...[W]hen the data from the 5 trials that directly compared coconut oil with another saturated fat are examined collectively, the results are largely inconsistent".

      This is what the AHA writes:

      "The authors also noted that the 7 trials did not find a difference in raising LDL cholesterol between coconut oil and other oils high in saturated fat such as butter, beef fat, or palm oil".

      As there was no meta-analysis of these trials, there is no single estimate to discuss. Of the 5 that did include a comparison with other saturated fats, there were differences among their results: the AHA had pointed out 1 of them just a few sentences before their "no difference" statement. This is objectively a misstatement of the Eyres review's findings, which results in an exaggeration of the strength of the evidence.

      Nothing in the reply to my comment changes, for me, the conclusion I came to in my first blog post on this:

      "On coconut oil, the AHA has taken a stand on very shaky ground with some major claims – as though they had a very strong systematic review of reliable research on all possible health consequences of dietary coconut oil. They don’t. The people arguing the opposite – that coconut oil is so healthy you should try to use it every day – are also on shaky ground".

      Disclosure: I have no financial, livelihood, or intellectual conflicts of interest in relation to coconut or dietary fats. I discuss my personal, social, and professional biases in a blog post that discusses the AHA advisory on coconut oil in detail. (Bastian, August 2017). This PubMed Commons comment also contains some excerpts from that post.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2017 Jul 24, Frank M Sacks commented:

      On behalf of the authors, I respond to comments by Hilda Bastian about the American Heart Association Presidential Advisory on Dietary Fats and Cardiovascular Disease (Sacks FM, 2017).

      The comprehensive advisory includes (i) clinical trials that tested the effects of dietary saturated fat compared with unsaturated fat or carbohydrate on cardiovascular disease (CVD) events, e.g. heart attack; (ii) clinical trials that tested the effects of dietary fats on lipid risk factors, e.g. LDL-cholesterol; (iii) prospective epidemiological studies on dietary fats and carbohydrates and CVD; and (iv) animal models of diet and atherosclerosis. Thus, it reflects the “totality of evidence”. The confluence of findings provides a very strong scientific case for the recommendation that dietary saturated fat be replaced with unsaturated fat, especially polyunsaturated fat.

      Recent systematic reviews and meta-analyses (Mozaffarian D, 2010; Chowdhury R, 2014; Hooper, PMID: 26068959) used well-accepted methodologies, and included trials published up to 2009, 2013, and 2014. Only a small number of clinical trials evaluated direct effects of dietary fat on CVD. Most of these studies, and all that have an impact on the overall findings, were conducted years ago, and are well known. Contrary to Bastian’s comments, there are no more recent trials on this topic. These 3 meta-analyses each confirm the beneficial effect of replacing saturated with polyunsaturated fat. The similarity of findings lends robustness to the overall conclusions of the report. The meta-analyses and all the individual trials are discussed critically in detail in the advisory.

      Because the topic of the advisory is the effect of dietary fats on CVD, coconut oil is well within its scope. Coconut oil is currently rated as a healthy oil by 72% of the American public, despite being composed of 98% saturated fats, which increase the blood level of LDL-cholesterol, a cause of atherosclerosis and CVD. The meta-analysis by Mensink reports the quantitative effects on LDL-cholesterol of the saturated fats that are in coconut oil, mainly lauric, myristic, and palmitic acids. Each of these increases LDL-cholesterol compared to carbohydrate, and more so when compared to unsaturated fats. This is sufficient to warn the public about anticipated adverse effects of coconut oil on CVD.

      Some studies tested coconut oil itself, and found that it increases LDL-cholesterol as would be predicted by its saturated fat content. These studies were identified and summarized in the systematic review by Eyres L, 2016 which used rigorous, well-accepted methodology. The criteria for inclusion of an article in the systematic review were well conceived. Eyres et al. concluded, “Overall, the weight of the evidence from intervention studies to date suggests that replacing coconut oil with cis unsaturated fats would alter blood lipid profiles in a manner consistent with a reduction in risk factors for cardiovascular disease.” Bastian implies that this systematic review is composed of weak studies and omitted several studies that would affect the conclusion of the advisory to avoid eating coconut oil. This is not true. Eyres et al. identified eight studies; all were controlled clinical trials that used valid nutritional protocols and statistical analyses. All reported higher LDL-C levels when coconut oil was consumed compared to unsaturated oils, including olive, corn and soybean oils, statistically significantly in 7 of them. Together, these trials included populations from the US, Sri Lanka, New Zealand, Pacific Islands, and Malaysia, demonstrating generalizability. There is no objective scientific reason to disparage them. The only substantive criticisms mentioned by Bastian are a short duration and small sample. These criticisms are unwarranted. Effects of diet on blood lipids, especially LDL-cholesterol, are established quickly, by 2 weeks. A small sample, with careful dietary control and execution, can yield a well-powered trial with valid results. In summary, the 8 trials in the Eyres et al. systematic review provide strong evidence that coconut oil increases LDL-C levels compared with unsaturated oils.

      What about the 7 studies named by Bastian that were not included in the systematic review? McKenney JM, 1995 reported that coconut oil increased LDL-cholesterol significantly by 12% compared with canola oil in 11 patients with hypercholesterolemia. In a second study in 17 patients treated with lovastatin, LDL-C increased nonsignificantly in the coconut oil period. Thus, the results of this small study would add to the overall effects of coconut oil shown in the other studies to increase LDL-cholesterol. Ganji V, 1996 reported that coconut oil increased LDL-cholesterol compared to soybean oil in 10 normal participants. Assunção ML, 2009 reported no difference in the effects of coconut and soybean oils on LDL-cholesterol levels. However, LDL-cholesterol levels increased during the soybean oil period, clearly an anomalous result. Cardoso DA, 2015 conducted a nonrandomized study comparing coconut oil, 13 mL per day, with no supplemental oil. Because there is no control for the coconut oil, it is unclear how to interpret the lack of difference in LDL-cholesterol between the groups. de Paula Franco E, 2015 conducted a sequential study of a calorie-reduced diet followed by coconut flour, 26 g per day. This study was not randomized and did not have a control group. Enns reported in her Ph.D. degree dissertation at the University of Manitoba the results of a randomized trial that compared a 2:1:1 mix of butter, coconut oil, and high-linoleic safflower oil, 25 g per day, with canola oil, 25 g per day. This trial did not claim to be a study on the effects of coconut oil. Finally, Shedden reported in her M.S. degree thesis at Arizona State University the results of a placebo-controlled randomized trial of coconut oil, 2 g per day. This minuscule amount of coconut oil did not affect LDL-cholesterol. In summary, among the 7 studies cited by Bastian not in the Eyres review, 4 would appropriately be excluded as result of being non-randomized, uncontrolled, using a very small amount, not including a control group or not even being a trial of coconut oil. Among the 3 randomized trials, McKenney et al., Ganji et al. and Assunção et al., the first two found that coconut oil increased LDL-cholesterol levels. The trial of Assunção et al. would likely fail an outlier test because it is the only one among 12 studies in which LDL-C levels are lower on coconut than soybean oil. Given the differences in study designs, populations, and localities, the results of coconut oil trials are remarkably uniform showing that it increases LDL-cholesterol levels, an established cause of cardiovascular disease.

      Bastian employs a tactic in common with some other critics of good nutritional science, namely, to a) disparage and misrepresent high quality studies that show harmful effects of saturated fat; b) promote and misrepresent seriously flawed and irrelevant studies that report the opposite; and c) cite meta-analyses with faulty designs often based on inclusion of flawed studies. We offer a challenge to those who assert health benefits to coconut oil, or saturated fat, in general. Produce well-designed and executed studies that show that there are beneficial effects on a bona fide health outcome or a recognized surrogate, e.g., LDL-cholesterol.

      Frank M. Sacks, for the authors.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2017 Jun 30, Hilda Bastian commented:

      The authors state that this advisory "reviews and discusses the scientific evidence, including the most recent studies", and that its primary recommendation is made, "taking into consideration the totality of the scientific evidence, satisfying rigorous criteria for causality". They do not report what evidence was sought and how, or the basis upon which it was selected. There is little in this report to suggest that "the totality of scientific evidence" was considered.

      For example, four reviews of trials are referred to.

      However, the more recent systematic review and meta-analysis within Ramsden CE, 2016 (date of last search March 2015) was not mentioned. Nor are, for example, these systematic reviews: Skeaff CM, 2009; Stroster, 2013; National Clinical Guideline Centre (UK), 2014; Schwingshackl L, 2014; Pimpin L, 2016.

      The AHA advisory includes sections reviewing two specific sources of saturated fat, dairy and coconut oil. Dairy products are a major source of dietary saturated fats. However, no basis for singling out coconut oil is offered, or for not addressing evidence about other, and larger, sources of saturated fats in Americans' diets. The section concludes: "we advise against the use of coconut oil".

      There are three conclusions/statements leading to that recommendation:

      • Eyres L, 2016 "noted that the 7 trials did not find a difference in raising LDL cholesterol between coconut oil and other oils high in saturated fat such as butter, beef fat, or palm oil."
      • "Clinical trials that compared direct effects on CVD of coconut oil and other dietary oils have not been reported."
      • Coconut oil increases LDL cholesterol "and has no known offsetting favorable effects".

      The only studies of coconut oil cited by the advisory to support these conclusions are one review (Eyres L, 2016) - reasonably described by its authors as a narrative, not systematic, review - and 7 of the 8 studies included in that review. The date of the last search for that review was the end of 2013 (with an apparently abbreviated update search, not fully reported, in 2015). Not only is that too long ago to be reasonably certain there are no recent studies, but the review's inclusion and exclusion criteria are also too narrow to support broad conclusions about coconut oil and CVD or other health effects.

      The AHA's first statement - that Eyres et al noted no difference between 7 trials comparing coconut oil with other saturated fats - is not correct. Only 5 small trials included such comparisons, and their results were inconsistent (with 2 of the 3 randomized trials finding a difference). There was no meta-analysis, so there was no single summative finding. The trials in question are very small, none lasting longer than eight weeks, and have a range of methodological quality issues. The authors of the Eyres review caution about interpreting conclusions based on the methodologically limited evidence in their paper. In accepting these trials as a reliable basis for a strong recommendation, the AHA has not applied as rigorous a standard of proof as they did for the trials they designated as "non-core" and rejected for their meta-analysis on replacing dietary saturated fat with polyunsaturated fat.

      Further, even a rapid, unsystematic search shows that there are more participants in relevant randomized trials not included in the Eyres review than there are randomized participants within it. For example: McKenney JM, 1995; Ganji V, 1996; Assunção ML, 2009; Cardoso DA, 2015; de Paula Franco E, 2015; and Enns, 2015 (as well as another published since the AHA's panel finished its work, Shedden, 2017).

      The conclusions of the coconut oil section of the AHA advisory are not supported by the evidence it cites. A high quality systematic review that minimizes bias is required to draw any conclusion about the health effects of coconut oil.

      Disclosure: I have no financial, livelihood, or intellectual conflicts of interest in relation to coconut or dietary fats. I discuss my personal, social, and professional biases in a blog post that discusses the AHA advisory on coconut oil in detail (Bastian, June 2017).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 May 26, Hilda Bastian commented:

      These are interesting results, showing the critical importance of transparency about trials of pharmaceuticals. However, the study does not identify the trials it found, or the phases of those trials. It would be helpful if the authors were to release these data, for those interested in the results of this study, anyone interested in doing similar work, and those looking for trials of these particular drugs.

      The abstract reports the number of participants in the unpublished trials. It would be good to also provide the number of participants in the published trials.

      Note: I wrote a blog post about this study and its context.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 May 06, Hilda Bastian commented:

      The conclusion that implicit bias in physicians "does not appear to impact their clinical decision making" would be good news, but this systematic review does not support it. Coming to any conclusion at all on this question requires a strong body of high quality evidence, with representative samples across a wide range of representative populations, using real-life data not hypothetical situations. None of these conditions pertain here. I think the appropriate conclusion here is that we still do not know what role implicit racial bias, as measured by this test, has on people's health care.

      The abstract reports that "The majority of studies used clinical vignettes to examine clinical decision making". In this instance, "majority" means "all but one" (8 out of 9). And the single exception has a serious limitation in that regard, according to Table 1: "pharmacy refills are only a proxy for decision to intensify treatment". The authors' conclusions are thus related, not to clinical decision making, but to hypothetical decision making.

      Of the 9 studies, Table 1 reports that 4 had a low response rate (37% to 53%), and in 2 studies the response rate was unknown. As this is a critical point, and an adequate response rate was not defined in the report of this review, I looked at the 3 studies (albeit briefly). I could find no response rate in any of the 3. In 1 of these (Haider AH, 2014), 248 members of an organization responded. That organization currently reports having over 2,000 members (EAST, accessed 6 May 2017). (The authors report that only 2 of the studies had a sample size calculation.)

      It would be helpful if the authors could provide the full scoring: given the limitations reported, it's hard to see how some of these studies scored so highly. This accepted manuscript version reports that the criteria themselves are available in a supplement, but that supplement was not included.

      It would have been helpful if additional important methodological details of the included studies were reported. For example, 1 of the studies I looked at (Oliver MN, 2014) included an element of random allocation of race to patient photos in the vignettes: design elements such as this were not included in the data extraction reported here. Along with the use of a non-validated quality assessment method (9 of the 27 components of the instrument that was modified), these issues leave too many questions about the quality ratings of the included studies. Other elements missing from this systematic review (Shea BJ, 2007) are a listing of the excluded studies and an assessment of the risk of publication bias.

      The search strategy appears to be incompletely reported: it ends with an empty bullet point, and none of the previous bullet points refer to implicit bias or the implicit association test.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Apr 28, Hilda Bastian commented:

      This is an interesting methodological approach to a thorny issue. But the abstract and coverage (such as in Nature) gloss over the fact that the results measure the study method's biases more than they measure scientists on Twitter. I think the method is identifying a subset of people working in a limited set of science-based professions.

      The list of professions sought is severely biased. It includes 161 professional categories and their plural forms, in English only. It was based on a U.S. list of occupations (SOC) and an ad hoc Wikipedia list. A brief assessment of the 161 titles in comparison with an authoritative international list shows a strong skew towards social scientists and practitioners of some science-based occupations, and away from medical science, engineering, and more (United Nations Educational, Scientific and Cultural Organization (UNESCO)'s nomenclature for fields of science and technology, SKOS).

      Of the 161 titles, 17% are varieties of psychologist, for example, but psychiatry isn't there. Genealogists and linguists are there, but geometers, biometricians, and surgeons are not. The U.S. English-language bias is a major problem for a global assessment of a platform where people communicate with the general public.

      Influence is measured in 3 ways, but I couldn't find a detailed explanation of the calculations, or a reference to one, in the paper. It would be great if the authors could point to that here. More detail on the "Who is who" service used, in terms of how up to date it is, would be useful as well.

      I have written more about this paper at PLOS Blogs, and point to key numbers that aren't reported on who was excluded at different stages. The paper says that data sharing is limited by Twitter's terms of service, but it doesn't specify what that covers. Providing a full list of the proportions in the 161 titles, and descriptions of more than 15 of the communities they found (none of which appear to be medical science circles), seems unlikely to be affected by that restriction. More data would be helpful to anyone trying to make sense of these results, or to extend the work in ways that minimize the biases in this first study.

      There is no research cited that establishes the representativeness of data from a method that can only classify less than 2% of people who are on multiple lists. The original application of the method (Sharma, 2011) was aimed at a very different purpose, so representativeness was not such a big issue there. There was no reference in this article to data on list-creating behavior. There could be a reason historians came out on top in this group: list-curating is probably not a randomly-distributed proclivity.

      It might be possible with this method to better identify Twitter users who work in STEM fields. Aiming for "scientists", though, remains, it seems to me, unfeasible at scale. Methods described by the authors as product-centric (e.g. who is sharing links to scientific articles and/or discussing them, or discussing blogs where those articles are cited), and key nodes such as science journals and organizations seem essential.

      I would also be interested to know the authors' rationale for trying to exclude pseudonyms - as well as the data on how many were excluded. I can see why methods gathering citations for Twitter users exclude pseudonyms, but am not sure why else they should be excluded. A key reason for undertaking this kind of analysis is to understand to what extent Twitter expands the impact of scientific knowledge and research. That inherently means looking to wider groups, and the audiences for their conversations. Thank you to the authors, though, for a very interesting contribution to this complex issue.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Feb 01, Hilda Bastian commented:

      The authors raise interesting and important points about the quandaries and complexities involved in updating a systematic review and reporting the update. However, their review of the field - and their conclusion that, of the 250 journals they looked at, only BMC Systematic Reviews has guidance on the process of updating - is deeply flawed.

      One of the 185 journals in the original sample they included (Page MJ, 2016) is the Cochrane Database of Systematic Reviews. Section 3.4 of the Cochrane Handbook is devoted to updating, and updating is addressed within several other sections as well. The authors here refer to discussion of updating in Cochrane's MECIR standards. Even though this does not completely cover Cochrane's guidance to authors, it contradicts the authors' conclusion that BMC Systematic Reviews is the only journal with guidance on updating.

      The authors cite a recent useful analysis of guidance on updating systematic reviews (Garner P, 2016). Readers who are interested in this topic could also consider the broader systematic review community and methodological guidance. Garritty C, 2010 found 35 organizations that have policy documents at least on updating, and many of these have extensive methodological guidance, for example AHRQ (Tsertsvadze A, 2008). Recently, guidelines for updating clinical guidelines have also been published (Vernooij RW, 2017).

      The authors reference some studies that address updating strategies; however, this literature is quite extensive. You can use this filter in PubMed along with other search terms to find studies and guidance: sysrev_methods [sb] (example). (An explanation of this filter is on the PubMed Health blog.)
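
      As a rough sketch of how that filter can be combined with other search terms programmatically - assuming the sysrev_methods [sb] subset is still available, and using NCBI's E-utilities esearch endpoint; the topic terms and output handling here are illustrative, not part of the filter itself:

      ```python
      # Minimal sketch: combine the sysrev_methods[sb] subset filter with an
      # illustrative topic term via NCBI E-utilities (esearch).
      import json
      import urllib.parse
      import urllib.request

      ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

      def pubmed_count(term: str) -> int:
          """Return the number of PubMed records matching a search term."""
          params = urllib.parse.urlencode({"db": "pubmed", "term": term, "retmode": "json"})
          with urllib.request.urlopen(f"{ESEARCH}?{params}") as response:
              return int(json.load(response)["esearchresult"]["count"])

      # Studies and guidance on updating systematic reviews (illustrative terms).
      query = 'sysrev_methods[sb] AND (updating[tiab] OR "updated systematic review"[tiab])'
      print(query, "->", pubmed_count(query), "records")
      ```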

      Disclosure: I work on PubMed Health, the PubMed resource on systematic reviews and information based on them.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Sep 18, Hilda Bastian commented:

      Thanks, John - that's as close we'll get, and we do agree on far more than we disagree, as ever.

      I agree we should face the data, and be meticulous about it. I just don't agree that indexing has the same effect on a tagged category as it has for a filter: especially not when the filter is so broad that it encompasses the variety of terms people use to describe their work. I remain convinced that the appropriate time trend comparators are filter to filter, with triangulation of sources. I don't think it's highly likely that 90% of the RCTs are in the first 35% of tagged literature.

      I don't think people should hold off publishing a systematic review that was done before deciding to fund or run a trial, until a report of the trial or its methods is published - and ideally, they would be done by different people. Intellectual conflicts of interest can be as powerful as any other. And I don't think that trialists interpreting what their trial means in the context of other evidence meets the criterion of being unconflicted. Nor do I think the only systematic reviews we need are those with RCTs.

      I don't think Cochrane reviews are all good quality and unconflicted - in fact, the example of a conflicted review with quality issues in my comment was a Cochrane review. I agree there is no prestigious name that guarantees quality. (It's a long time since I left the Cochrane Collaboration, by the way.) My comments aren't because I disagree that there is a flood of bad quality "systematic" reviews and meta-analyses: the title of your article is one of the many things I agree with. See for example here, here, and quite a few of my comments on PubMed Commons.

      But the main reason for this reply is to add into this stream the reason I feel some grounds for optimism about something else we would both fervently agree on: the need to chip away at the problem of extensive under-reporting of clinical trials. As of January 2017, the mechanisms and incentives for reporting a large chunk of trials - those funded by NIH and affected by the FDA's scope - will change (NIH, 2016). Regardless of what happens with synthesis studies, any substantial uptick in trial reporting would be great news.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Sep 18, Hilda Bastian commented:

      Thanks, John, for taking this so seriously - that's extremely helpful, and I certainly agree with you that the rate of publication of systematic reviews is growing faster than that of RCTs. So the point you are talking about may well be reached at some point - unless the rate of growth equalizes, or unless the rate of RCTs that go unpublished drops substantially; both of those remain possible.

      This comparison is much better, but it can't solve the underlying issues. Human indexing resources did not increase exponentially along with the exponential increase in the literature. As of today, of the PubMed records with 2015 in the PubMed entry date [EDAT], only 35% also have a 2015 date for completed indexing [DCOM] (which, from PubMed Help, looks to me like the way you would check for that - but an information specialist may correct me here). That's roughly what I would expect to see: individually indexing well over a million records a year is a colossal undertaking. Being finished with 2015 in just a few months while 2016 priorities are pouring in would be amazing. And we know that no process of prioritizing journals will solve this problem for trials, because the scatter across journals is so great (Hoffmann T, 2012).
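
      To make that check concrete, here is a minimal sketch of the same comparison - assuming, as PubMed Help suggests, that [EDAT] and [DCOM] are both searchable date fields; the exact date ranges and E-utilities plumbing are my own illustration:

      ```python
      # Minimal sketch of the indexing-lag check described above: of the records
      # that entered PubMed in 2015 ([EDAT]), what share also had indexing
      # completed in 2015 ([DCOM])?
      import json
      import urllib.parse
      import urllib.request

      ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

      def pubmed_count(term: str) -> int:
          """Return the number of PubMed records matching a search term."""
          params = urllib.parse.urlencode({"db": "pubmed", "term": term, "retmode": "json"})
          with urllib.request.urlopen(f"{ESEARCH}?{params}") as response:
              return int(json.load(response)["esearchresult"]["count"])

      entered_2015 = pubmed_count("2015/01/01:2015/12/31[edat]")
      indexed_in_2015 = pubmed_count(
          "2015/01/01:2015/12/31[edat] AND 2015/01/01:2015/12/31[dcom]"
      )
      print(f"Indexing completed in-year: {100 * indexed_in_2015 / entered_2015:.0f}%")
      ```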

      So any comparison between a tagged set (RCTs) and a search based on a filter with text words (which includes systematic review or meta-analysis in the title or abstract) could generate potentially very biased estimates, no matter how carefully the results are analyzed. And good systematic reviews of non-randomized clinical trials - and indeed of other methodologies, such as systematic reviews of adverse events, qualitative studies, and more - are valuable too. Many systematic reviews would be "empty" of RCTs, but that doesn't make them useless by definition.

      I couldn't agree with you more enthusiastically, though, that we still need more, not fewer, well-done RCTs, systematic reviews, and meta-analyses by non-conflicted scientists. I do add a caveat, though, when it comes to RCTs. RCTs are human experimentation. It is not just that they are resource-intensive: unnecessary RCTs, and some of the ways that RCTs can be "bad", can cause direct harm to participants in a way that an unnecessary systematic review cannot. The constraints on RCTs are greater, so they need to be done on questions that matter the most and where they can genuinely provide better information. If good enough information can come from systematically reviewing other types of research, then that's a better use of scarce resources. And if only so many RCTs can be done, then we need to be sure we do the "right" ones.

      For over 20 years, Iain Chalmers has argued that an RCT should not be done without a systematic review to show the RCT is justified - and that there should be an update afterwards. Six years ago, he, Mike Clarke, and Sally Hopewell concluded that we were nowhere near achieving that (Clarke M, 2010). The point you make about the waste in systematic reviewing underscores that point, too. But in an ideal world, isn't a greater number of systematic reviews than RCTs just the way it should be?


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2016 Sep 16, Hilda Bastian commented:

      Thanks, John, for the reply - and for giving us all so much to think about, as usual!

      I agree that there are meta-analyses without systematic reviews, but the tagged meta-analyses are included in the filter you used: they are not additional (NLM, 2016). It also includes meta-analysis in the title, guidelines, validation studies, and multiple other terms that add non-systematic reviews, and even non-reviews, to the results.

      In Ebrahim S, 2016, 191 primary trials, in high-impact journals only, were studied. Whether they are typical of all trials is not clear: it seems unlikely that they are. Either way, hundreds of reports for a single trial is far from common: half the trials in that sample had no secondary publications, only 8 had more than 10, and none had more than 54. Multiple publications from a single trial can sometimes be on quite different questions, which might also need to be addressed in different systematic reviews.

      The number of trials has not been increasing as fast as the number of systematic reviews, but the number has not reached a definite ongoing plateau either. I have posted here an October 2015 update, using multiple ways to assess these trends, to the data in the 2010 paper by me, Paul Glasziou, and Iain Chalmers (Bastian H, 2010). Trials have tended to fluctuate a little from year to year, but the overall trend is growth. As the obligation to report trials grows more stringent, the trend in publication may be materially affected.

      Meanwhile, "systematic reviews" in the filter you used have not risen all that dramatically since February 2014. For the whole of 2014, there were 34,126, and in 2015 there were 36,017 (with 19,538 in the first half of 2016). It is not clear, without detailed analysis, which parts of that collection of paper types are responsible for the increase. The method used to support the conclusion here about systematic reviews of trials overtaking trials themselves was to restrict the systematic review filter to those mentioning trials or treatment - “trial* OR randomi* OR treatment*”. That does not mean the review is of randomized trials only: no randomized trial need be involved at all, and it doesn't have to be a review.

      Certainly, if you set the bar for a sizable randomized trial high, there will be fewer of them than of all possible types of systematic review: but then, there might not be all that many very sizable, genuinely systematic reviews either - and not all systematic reviews are influential (or even noticed). And yes, there are reviews called systematic that aren't - but there are RCTs called randomized that aren't, as well. What's more, an important response to the arrival of a sizable RCT may well be an updated systematic review.

      Double reports of systematic reviews are fairly common in the filter you used too, although far from half - and not more than 10. Still, the filter will be picking up protocols as well as their subsequent reviews, systematic reviews in both the article version and the coverage in ACP Journal Club, the full text of systematic reviews via PubMed Health as well as their journal versions (and the ACP Journal Club coverage too), individual patient data analyses based on other systematic reviews, and single systematic reviews split into multiple publications. The biggest issue remains, though, that as it is such a broad filter, casting its net so very wide across the evidence field, it's not an appropriate comparator for tagged sets, especially not in recent years.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    4. On 2016 Sep 16, Hilda Bastian commented:

      There are many important issues raised in this paper on which I strongly agree with John Ioannidis. There is a lot of research waste in meta-analyses and systematic reviews, and a flood of very low quality, and he points out the contributing factors clearly. However, there are some issues to be aware of in considering the analyses in this paper on the growth of these papers, and their growth in comparison with randomized and other clinical trials.

      Although the author refers to PubMed's "tag" for systematic reviews, there is no tagging process for systematic reviews, as there is for meta-analyses and trials. Although "systematic review" is available as a choice under "article types", that option is a filtered search using Clinical Queries (PubMed Help), not a tagging of publication type. Comparing filtered results to tagged results is not comparing like with like in 2 critical ways.

      Firstly, the proportion of non-systematic reviews in the filter is far higher than the proportion of non-meta-analyses and non-trials in the tagged results. And secondly, full tagging of publication types for MEDLINE/PubMed takes considerable time. When considering a recent year, the gulf between filtered and tagged results widens. For example, as of December 2015 when Ioannidis' searches were done, the tag identified 9,135 meta-analyses. Today (15 September 2016), the same search identifies 11,263. For the type randomized controlled trial, the number tagged increased from 23,133 in December to 29,118 today.

      In the absence of tagging for systematic reviews, the more appropriate comparisons are using filters for both systematic reviews and trials as the base for trends, especially for a year as recent as 2014. Using the Clinical Queries filter for both systematic reviews and therapy trials (broad), for example, shows 34,126 for systematic reviews and 250,195 trials. Page and colleagues estimate there were perhaps 8,000 actual systematic reviews according to a fairly stringent definition (Page MJ, 2016) and the Centre for Reviews and Dissemination added just short of 9,000 systematic reviews to its database in 2014 (PubMed Health). So far, the Cochrane Collaboration has around 38,000 trials in its trials register for 2014 (searching on the word trial in CENTRAL externally).

      The number of systematic reviews/meta-analyses has increased greatly, but not as dramatically as this paper's comparisons suggest, and the data do not tend to support the conclusion in the abstract here that "Currently, probably more systematic reviews of trials than new randomized trials are published annually".

      Ioannidis suggests some bases for some reasonable duplication of systematic reviews - these are descriptive studies, with many subjective choices along the way. However, there is another critical reason that is not raised: the need for updates. This can be by the same group publishing a new version of a systematic review or by others. In areas with substantial questions and considerable ongoing research, multiple reviews are needed.

      I strongly agree with the concerns raised about conflicted systematic reviews. In addition to the issues of manufacturer conflicts, it is important not to underestimate the extent of other kinds of bias (see for example my comment here). Realistically, though, conflicted reviews will continue, building in a need for additional reviewers to tackle the same ground.

      Systematic reviews have found important homes in clinical practice guidelines, health technology assessment, and reimbursement decision-making for both public and private health insurance. But underuse of high quality systematic reviews remains a more significant problem than is addressed here. Even when a systematic review does not identify a strong basis in favor of one option or another, that can still be valuable for decision making - especially in the face of conflicted claims of superiority (and wishful thinking). However, systematic reviews are still not being used enough - especially in shaping subsequent research (see for example Habre C, 2014).

      I agree with Ioannidis that collaborations working prospectively to keep a body of evidence up to date are an important direction to go in - and it is encouraging that the living cumulative network meta-analysis has arrived (Créquit P, 2016). That direction was also highlighted in Page and Moher's accompanying editorial (Page MJ, 2016). However, I'm not so sure how much of a solution this is going to be. The experience of the Cochrane Collaboration suggests this is even harder than it seems. And consider how excited people were back in 1995 at the groundbreaking publication of the protocol for prospective, collaborative meta-analysis of statin trials (Anonymous, 1995) - and the continuing controversy that swirls, tornado-like, around it today (Godlee, 2016).

      We need higher standards, and skills in critiquing the claims of systematic reviews and meta-analyses need to spread. Meta-analysis factories are a serious problem. But I still think the most critical issues we face are making systematic reviews quicker and more efficient to do, and to use good ones more effectively and thoroughly than we do now (Chalmers I, 2009, Tsafnat G, 2014).

      Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), including some aspects that relate to the inclusion of systematic reviews in PubMed. I co-authored a paper related to issues raised here several years ago (Bastian H, 2010), and was one of the founding members of the Cochrane Collaboration.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Jul 28, Isabelle Boutron commented:

      We would like to thank Hilda Bastian for her interest in our work. We fully agree that our systematic review has some limitations, and we acknowledged most of them in the paper. We also fully agree that the peer review system is a complex system, that we need different approaches to explore it, and that other study designs are also important for tackling this issue. We focused on randomised controlled trials as they provide a high level of evidence, and one important result of this systematic review is the appalling lack of randomised controlled trials in this field. Despite huge human and financial investments in the peer review process, and its essential role in biomedical research, only 7 RCTs have been published over the last 10 years. Yet the conduct of randomised controlled trials in this field does not raise any important ethical or methodological concerns. These results should be a call to action for editors to facilitate the conduct of research in this field and to give access to their data.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Jun 12, Hilda Bastian commented:

      This is a helpful broad-brush update on randomized controlled trials (RCTs) of peer review interventions in biomedical journals (see the older review, Jefferson T, 2007, and my comment on that review). However, while the authors list several limitations, including restricting the review to RCTs and to biomedical journals, there are other limitations that, in turn, highlight the impact of those restrictions.

      One of those is the outcomes addressed here. The focus is explicitly on the peer reviews themselves and the process, and not wider outcomes, such as potential benefits and harms to peer reviewers or the impact of policies such as open review on journals (e.g. level of unwillingness to review).

      In particular, the issue of harms brings us back to the limitations of looking only at RCTs, of restricting to the biomedical literature, and of the limited range of databases searched. The authors provide no rationale for limiting the review to biomedical publications. Given that there are so few eligible studies within the scope of this review, moving past this is essential. (In a blog post on anonymity, openness, and blinding of peer review in March 2015, in addition to the 11 RCTs identified in this systematic review, I identified a further 6 comparative studies, as well as other types of studies relevant to the questions around which known concerns exist.)

      Peer reviewers are not just a means to an end: biases of peer reviewers can have a major impact on the careers of others, and at a minimum, specifically addressing gender, seniority, and institutional/country/language impact is critical to further work on this topic. A more contextual approach is needed to grapple with the complex ecosystem involved here.

      A final point is less likely to have had an impact, but is worth consideration by others working on this issue. Limiting the search strategy to the single term "peer review" may affect searches, as terms such as open review and post-publication review become more widely used. Terms such as manuscript review, editorial review, and peer reviewers could also be considered in constructing a search strategy in this area. (To identify the studies to which I refer above, Google Scholar and a wider range of search terms were necessary.)

      (Disclosure: Part of my job includes working on PubMed Commons, which does not allow anonymous commenting.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Jun 11, Hilda Bastian commented:

      This is a vitally important issue to address, but the data presented in this study do not support the authors' conclusion that disclosure of information is decreasing.

      The study is characterized by a steeply declining availability of data for comparison - in both the response rate and amount of data provided by those clinics responding. From the first to the last year studied:

      • The response rate dropped from 65% to 31%.
      • The percentage of consent forms referring to other information sheets as the vehicles for informing women/couples rose from 55% to 100%.
      • The percentage of those information sheets available for the study dropped from 82% to 18%.

      This suggests that a principal finding, based on a minority of clinics in 2014, is a shift towards providing supplementary information to consent forms, rather than consent forms as the sole formal vehicle of disclosure. If those information sheets were available - and 82% were not in 2014 - the conclusion of this study could be very different.

      It would be useful to know if the consent forms indicated that the person signing had been provided with the supplementary information, as part of the formal disclosure. Clarification from the authors would also be useful on whether data from the information sheets were included in the results table, or whether only the consent forms themselves were the source.

      If both types of consent documents are included, then for 2014, complete consent documents were available for only 2 of 35 clinics (6%), compared with 9 of 17 clinics in 1991 (53%). And the increased coverage of items in the earlier years could be attributable to the enlarged scope of materials assessed.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Dec 07, Joanne Kamens commented:

      Thank you for this helpful set of definitions and clarifications. Common language will make the discussion more productive. I was prompted by an excellent blog post by Hilda Bastian (http://blogs.plos.org/absolutely-maybe/2016/12/05/reproducibility-crisis-timeline-milestones-in-tackling-research-reliability/) and the subsequent Twitter conversation to mention that these definitions don't address, or even seem to mention, the potential, influence, or use of reagent/materials reproducibility. Experimental results and interpretation can be dramatically enhanced by the use of the correct standards, materials and/or reagents to reproduce a study. Protocol and methods sections alone are not sufficient to account for this, as some reagents are not easily remade and are not always validated as being the same (unless subjected to quality control via repository storage or standard validation).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Aug 30, Hilda Bastian commented:

      Although the authors draw conclusions here about cost and effectiveness of simply offering badges if certain criteria are met, the study does not support these claims. There are, for example, no data on costs for the journal, peer reviewers, or authors. Any conclusions about effectiveness are hampered by the study's design, and the lack of consideration and assessment of any potentially negative repercussions.

      It was not possible for the authors to study the effects of offering badges alone, as badges were one component of a complex intervention: a package of 5 co-interventions, announced by the journal in November 2013 to begin taking effect from January 2014 (Eich E, 2014). All were designed to improve research transparency and/or reproducibility, and signaled a major change in editorial policy and practice. Any manuscript accepted for publication after 1 January, while being eligible for these badges, was also subject to additional editorial requirements of authors and reviewers. All authors submitting articles from 2014 faced additional reproducibility-related questions before submission, including data disclosure assurances. Other authors have shown that although these did not all lead to the changes sought, there was considerable impact on some measures (Giofrè D, 2017).

      Data on the impact on submissions, editorial rejections, and length of time until publication of accepted articles are not provided in this paper by Kidwell and colleagues. These would be necessary to gain perspective on the burdens and impact of the intervention package. I had a look at the impact on publications, though. It is clear from the data as collected in this study, and from a more extended timeframe based on analysis of date of e-publication, that the package of interventions appears to have led to a considerable drop in publication of articles (see my blog post, Absolutely Maybe, 2017). The number of articles receiving badges is small: in the year from the awarding of the first badge, it was about 4 articles a month. That rate first dropped and has since risen, while over the same period the number of articles published by Psychological Science fell to less than half its rate in the year before this package of interventions was introduced. The percentage of badged articles has therefore increased substantially, while the absolute number of compliant articles remains small.

      Taken together, there appears to have been a process of "natural selection", on the side of both the journal and authors, leading to more rigorous reporting and sharing among the reduced number of articles reaching publication. The part that badges alone played in this is unknowable. Higher rates of compliance with such standards have been achieved without badges at other journals (see the blog post for examples). There is some data to suggest that disinclination to data disclosure is part of a range of practices adopted together more by some psychology researchers than others, in one of the studies that spurred Psychological Science to introduce these initiatives (PMID: 26173121). The data in Giofrè D, 2017 tend to support the hypothesis that there is a correlation between some of the data disclosure requirements in the co-interventions and data-sharing (see my follow-up blog post).

      In addition to not considering a range of possible effects of the practices, and not being able to isolate the impact of one intervention within the set of co-interventions, the study used only one data extractor and coder for each article. This is a particularly critical potential source of bias, as assessors could not be blinded to the journals, and the badging intervention was developed and promoted from within the author group.

      It would be useful if the authors could report in more detail what was required for the early screening question of "availability statement, yes or no". Was an explicit data availability statement required here, whether or not there were indeed additional data beyond what was included in the paper and its supplementary materials?

      It would be helpful if the authors could confirm the percentage of articles eligible for badges, where the offer of a badge was rejected.

      At the heart of this badge approach for closed access journals is a definition of "openness" that enables potentially serious limitation of the methodological information and key explanatory data available outside paywalls. In de-coupling the part of the study included in the paper from the study's data, and allowing the data to be inaccessible to many who could potentially use it or offer useful critique, the intervention promotes a limited form of openness. The trade-off assumed is that this results in more openness than there otherwise would be. However, it may have the reverse effect: for example, by encouraging authors to think full open access doesn't matter and can be foregone with pride and without concern, and by encouraging journals to believe this "magic bullet" is an easy way out of more effective intensive intervention.

      Disclosures: I have a long relationship with PLOS (which has taken a different approach to increasing openness), including blogging at its Blog Network, and am a user of the Open Science Framework (which is produced by the group promoting the badges). My day job is at NCBI, which maintains literature and data repositories.

      This comment was updated with the two references and data on the question of correlation between data disclosure and data sharing on 1 September, after John Sakaluk tweeted the Giofrè paper to me.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Apr 11, Hilda Bastian commented:

      The strongly declarative title of this paper makes a claim that is not supported by its contents.

      The authors argue that only one study (Galesic M, 2009) has reported "a substantial benefit" of natural frequencies. That claim is not based on an up-to-date systematic review of the studies on this question. A systematic review is needed, because there are multiple studies now with varying methods, in various populations, and in relevant contexts, that need to be considered in detail.

      These authors cite some studies that support their position (all in the context of treatment decisions). However, this is only a part of the relevant evidence. Among the studies not cited in this paper, there is at least one looking at medical tests (Garcia-Retamero R, 2013), others at treatments (e.g. Cuite CL, 2008, Carling CL, 2009, Knapp P, 2009 and Sinayev A, 2015), and at least one in a different field (Hoffrage U, 2015). Some find in favor of natural frequencies, others for percentages, and others find no difference. I don't think it's possible to predict what a thorough systematic review would conclude on this question.

      This study by Pighin and colleagues was done among US residents recruited via Mechanical Turk, and includes some replication of Galesic M, 2009 (a study done in Germany). The authors attribute Galesic and colleagues' conclusion to the outcome measure they used (the "scoring artifact" referred to in the abstract here). However, their study comes to the same conclusion - better understanding with natural frequencies - when using the same outcome measure. They then applied more stringent outcome measures for "correct" answers, but the number of people scoring correctly was too small to allow for any meaningful conclusion. For their two studies, as well as for Galesic M, 2009, both methodological detail and data are thin. Neither the original study nor this replication and expansion provides "the answer" to the questions they address.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Mar 15, Hilda Bastian commented:

      Many thanks, Wichor and Dean - that's really helpful. Still not clear on whether there was a language restriction or not. I looked at a couple of the reviews you link to (thanks!), but couldn't see an answer in those either.

      On the question of implications for reviews: whether included studies are retrieved is a critical measure of the value of the search results, but with such major resource implications, it's not enough. One reason that more detail about the spread of topics, and the nature of what was not found, is important is to explain the difference between these results and those of other studies (for example, Waffenschmidt S, 2015, Halladay CW, 2015, Golder S, 2014, Lorenzetti DL, 2014).

      Even if studies like this don't go as far as exploring what it might mean for the conclusions of reviews, there are several aspects - like language - that matter. For example, the Cochrane trials register and other sources were searched as well. If studies from those sources were included based only on abstracts from conference proceedings, for example, then it's clear why they may not be found in EMBASE/MEDLINE. Methodological issues such as language restriction, or whether or not to include non-journal sources, are important questions for a range of reasons.

      One way that the potential impact of studies can be considered is through quality/risk of bias assessment of the studies that would not have been found. As Halladay CW, 2015 found, the impact of such studies on systematic reviews can be modest (if they have an impact at all).

      Disclosure: I am the lead editor of PubMed Health, a clinical effectiveness resource and project that adds non-MEDLINE systematic reviews to PubMed.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Mar 12, Hilda Bastian commented:

      An interesting and very useful study of Google Scholar (GS). I am unclear, though, about the methods used to compare it with other databases. The abstract includes this step after the systematic review authors had a final list of included studies: "All three databases were then searched post hoc for included references not found in the original search results". That step is clearly described in the article for GS.

      However, for the other 2 databases (EMBASE and MEDLINE Ovid), the article describes the step this way: "We searched for all included references one-by-one in the original files in Endnote". "Overall coverage" is reported only for GS. Could you clarify whether the post hoc search was carried out for all 3 databases?

      I am also unclear about the MEDLINE Ovid search. It is stated that there was also a search of "a subset of PubMed to find recent articles". Were articles retrieved in this way classified as from the MEDLINE Ovid search? And if recent articles from PubMed were searched, does that mean that the MEDLINE Ovid search was restricted to MEDLINE content only, and not additional PubMed records (such as those via PMC)?

      There is little description of the 120 systematic reviews and citations are only provided for 5. One of those (Bramer WM, 2015) is arguably not a systematic review. What kind of primary literature was being sought is not reported, nor whether studies in languages other than English were included. And with only 5 topics given, it is not clear what role the subject matter played here. As Hoffmann T, 2012 showed, research scatter can vary greatly according to the subject. It would be helpful to provide the list of 120 systematic reviews.

      No data or description is provided about the studies missed with each strategy. Firstly, that makes it difficult to ascertain to what extent this reflects the quality of the retrieval rather than the contents of the databases. And secondly, with numbers alone and no information about the quality of the studies missed, the critical issue of the value of the missing studies is a blank space.

      Disclosure: I am the lead editor of PubMed Health, a clinical effectiveness resource and project that adds non-MEDLINE systematic reviews to PubMed.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Dec 09, Hilda Bastian commented:

      This is an excellent trial on an important subject, but the authors go beyond what the data in this study can support here: "The overall conclusion is that supported computerised cognitive behaviour therapy confers modest or no benefit over usual GP care..."

      As others have pointed out in rapid responses at the BMJ, this study primarily shows that particularly low adherence to online CBT had little, if any, impact. The study was powered only to detect a difference at the effect sizes reported for supported computer-based/online CBT, while the type of support provided in this trial was minimal (and not clinician-delivered or content-related). The participants were more severely depressed than the groups for whom online CBT was offered in other trials (and in recommendations for its use), other care in each arm often included antidepressants, and the extent of use of CBT (online or otherwise) in the GP group is not known. The results are very relevant to policy on offering online CBT. But I don't think there is enough certainty from this one trial to support a blanket statement about the efficacy of the intervention rather than the potential impact of a policy of offering it.

      The size of this study, while large, is smaller than the other studies combined, and without a systematic review it is not clear that this study would shift the current weight of evidence. An important difference between this trial and studies in this field generally is that personal access to the internet was not required. I couldn't locate any data on this in the report. It would be helpful if the authors could provide information here on the level of personal, private access to the internet people had in each arm of the trial, so that it's possible to take this potential confounder into account in interpreting the results.

      Free online CBT is also an option for those who cannot (or will not) get in-person therapeutic care. Many people with mild or moderate depression do not get professional care for it, and it doesn't seem reasonable on the basis of this trial to discourage people from trying free online CBT. Yet the press release for this study was headlined, "Computer assisted cognitive behavioural therapy provides little or no benefits for depression" (PDF), setting off media reports with that message. That far exceeds what the data from this one trial can support.

      Disclosure: I have not been involved in the development of any online, or in-person, therapy for depression. I was co-author of a 2003 systematic review on the impact of the internet, which concluded that CBT-based websites for mental health issues at that time had mixed results (PDF), and I have since written favorably about online CBT.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 May 29, Michal Kicinski commented:

      I thank Dr. Hilda Bastian for her interest in our recent study (Kicinski M, 2015). I strongly believe that post-publication comments very often raise important issues and help readers to better understand the merits of a study and its limitations. However, I was disappointed to see that the comments of Dr. Hilda Bastian do not correspond with the content of our study. For this reason, I feel obliged to clarify a number of issues.

      The study of Ioannidis JP, 2007 points out one of the limitations of many publication bias methods based on the asymmetry of the funnel plot, namely that they do not take between-study heterogeneity into account. This is indeed an important limitation of these methods, as also discussed by other researchers (Song F, 2010). However, please note that we did not rely on the asymmetry of the funnel plot in our analysis. Additionally, please note that our model is just an extension of the standard random effects meta-analysis model, which is a valid approach when between-study variability is present. In fact, the study of Ioannidis JP, 2007 is one of the contributions that motivated our approach to modeling publication bias, since our model takes heterogeneity into account.

      Dr. Hilda Bastian correctly points out that our study is not the first study on publication bias. There are many valuable studies on this topic and we discussed those most relevant to our research questions in our article. The contribution of our study is that we analyzed a very large number of meta-analyses using a model with strong theoretical foundations. Our study is the largest study on publication bias in meta-analyses to date. Please note that previous studies, e.g., Ioannidis JP, 2007, which Dr. Hilda Bastian mentioned, considered small study effects, a phenomenon that may have many different causes, including publication bias (Song F, 2010, Sterne JA, 2011). Another merit of our study is that we estimated the association between the size of publication bias and the publication year of the studies included in the meta-analyses.

      I completely agree that the best solution to the problem of publication bias is the complete reporting of study results. In fact, our findings showing that publication bias is smaller in the meta-analyses of more recent studies support the effectiveness of the measures used to reduce publication bias in clinical trials. I strongly advocate the introduction of new policies aimed to completely eliminate reporting biases from clinical trials and, as written in our article, the implementation of measures to reduce publication bias in research domains other than clinical trials, such as observational studies and preclinical research.

      Although we did not investigate the use of publication bias methods in the meta-analyses from the Cochrane Library, it is clear from previous research that the potential presence of publication bias is often ignored by researchers performing meta-analyses and that the methods accounting for publication bias based on statistical significance are hardly ever used (Song F, 2010, Onishi A, 2014). When publication bias is present in a meta-analysis, ignoring the problem leads to biased estimates of the effect size (Normand SL, 1999). Therefore, similar to others (Sterne JA, 2011), we argue that researchers should investigate the presence of publication bias and perform sensitivity analyses taking publication bias into account. One difficulty with the use of publication bias methods is that they require researchers to make certain assumptions about the nature of publication bias. For example, the trim and fill method defines publication bias as suppression of a certain number of the most extreme negative studies (Duval S, 2000). The use of Egger's test (Egger M, 1997) as a publication bias detection tool requires researchers to make the assumption that publication bias leads to a negative association between effect size and precision. The performance of a certain publication bias method depends on whether or not the method's assumptions are met. For example, it has been demonstrated that publication bias detection tests based on the funnel plot have very low power when publication bias based on statistical significance is present and the mean effect size equals zero (Kicinski M, 2014). Publication bias based on statistical significance is the best-documented form of publication bias (Song F, 2009, Dwan K, 2013). The results of our study add to this body of evidence. Therefore, we argue that publication bias tools designed to handle publication bias based on statistical significance should be used by researchers.
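
      To make the assumption behind Egger's test concrete, a minimal, illustrative sketch in Python is given below. It assumes only per-study effect estimates and standard errors; the eggers_test helper and the simulated data are hypothetical, not taken from our study or from any published package, and in practice dedicated meta-analysis software should be used.

        import numpy as np
        from scipy import stats

        def eggers_test(effects, standard_errors):
            # Egger's regression test: regress standardized effects (effect / SE)
            # on precision (1 / SE); an intercept far from zero suggests
            # funnel-plot asymmetry, consistent with small-study effects.
            effects = np.asarray(effects, dtype=float)
            se = np.asarray(standard_errors, dtype=float)
            z = effects / se
            precision = 1.0 / se
            fit = stats.linregress(precision, z)
            # t-test for the intercept, computed from the OLS residuals
            n = len(z)
            resid = z - (fit.intercept + fit.slope * precision)
            s2 = resid.dot(resid) / (n - 2)
            sxx = ((precision - precision.mean()) ** 2).sum()
            se_intercept = np.sqrt(s2 * (1.0 / n + precision.mean() ** 2 / sxx))
            t_stat = fit.intercept / se_intercept
            p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
            return fit.intercept, p_value

        # Hypothetical example: 10 studies with simulated log odds ratios.
        rng = np.random.default_rng(1)
        se = rng.uniform(0.1, 0.5, 10)
        log_or = rng.normal(0.2, se)
        print(eggers_test(log_or, se))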

      In the tweet with the link to her comment on PubMed, Dr. Hilda Bastian wrote on the 25th of May: ‘27% of cochranecollab reviews over-estimate effects cos of publication bias? Hmm.’ Please note that our study did not investigate the proportion of meta-analyses that overestimate effects. In fact, the objectives of our study were completely different. We estimated the ratio of the probability of including statistically significant outcomes favoring treatment to the probability of including other outcomes in the meta-analyses of efficacy and the ratio of the probability of including results showing no evidence of adverse effects to the probability of including results demonstrating the presence of adverse effects in the meta-analyses of safety.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2015 May 25, Hilda Bastian commented:

      This is an interesting study. But it's a rather enthusiastic self-assessment of a method not validated by other researchers, and some perspective is useful in thinking about the conclusions.

      Kicinski M, 2015 is neither the first nor the largest study of publication bias (PB) in meta-analyses, and the presence of publication bias in them is well known. These authors used a scraper they have made available on GitHub to extract meta-analyses from Cochrane reviews. They looked at reviews with placebo or "no treatment" control groups and 10 or more included studies. Whether or not these results are applicable to interventions with active or usual care control groups is unknown.

      For perspective here: Ioannidis JP, 2007 considered PB in 1,669 Cochrane reviews, ultimately analyzing 6,873 meta-analyses. Half of the meta-analyses had no statistically significant results in them, so the problem identified here could not have applied to them. Ioannidis JP, 2007 concluded that only 5% of the full set of Cochrane reviews would qualify for the use of asymmetry tests, and only 12% of those with a larger number of events and participants. They found very little concordance between different asymmetry tests - only around 3-4%. A more important problem, according to Ioannidis JP, 2007, was the misapplication and misinterpretation of statistical tests, not underuse. False positives are a problem with tests for PB when there is clinical heterogeneity. Ioannidis JP, 2007 concluded that the only viable solution to the problem of PB is full reporting of results.

      Kicinski M, 2015 conclude that statistical tools for PB are under-utilized, but the extent to which PB is assessed was not part of their study. Although PB itself may be decreasing over time, assessment of PB is increasing, even if the methods for exploring it are still problematic:

      • Palma S, 2005 found that PB was assessed in 11% of trials between 1990 and 2002, increasing from 3% in 1998 to 19% in 2002 (less frequently in Cochrane reviews than others).
      • Moher D, 2007 found that about 23% of systematic reviews in 2004 assessed PB (32% in Cochrane reviews, 18% in others).
      • Riley RD, 2011 found that only 9% of reviews from one Cochrane group assessed PB.
      • van Enst WA, 2014 found that most systematic reviews of diagnostic test accuracy in 2011/2012 mentioned the issue, with 41% measuring PB.

      In assessing only the meta-analyses themselves, and not the reviews that included them, it's not possible to know, as the authors point out, to what extent other studies were included in the reviews but without data that could be pooled. An issue not raised by Kicinski M, 2015 is trials reported only in conference abstracts, and thus with minimal data. Cochrane reviews often include studies reported in conference abstracts only, and those are apparently more likely to have non-statistically significant results (Scherer RW, 2007) - as well as relatively little data for the multiple meta-analyses in a review.

      It's important to consider the review, and not just the effect summaries within meta-analyses, because the conclusions of the systematic review should reflect the body of the evidence, not only the meta-analyses. Over-favorable results in a meta-analysis shouldn't be equated with over-favorable conclusions about effectiveness in a review (although unfortunately they often will be). We shouldn't jump to conclusions about effect sizes from meta-analyses alone. They can be skewed by clinical heterogeneity and small study size as well as (or instead of) publication bias, and the devil may be more in the interpretation than the calculations.

      Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Feb 18, Hilda Bastian commented:

      This study states as its objective "to determine the efficacy and safety of varenicline" for quitting smoking via smoking reduction. The authors point out that one limitation of the study is its generalizability to a broad population, given its stringent and extensive exclusion criteria. However, it does not stress that both this and the size of the study very much preclude this single study from "determining" the safety of varenicline. The findings in relation to serious adverse events need to be considered in the light of the lower risk for serious adverse events in this study population.

      The paper does not refer readers to the safety warnings and concerns about varenicline issued by both the US FDA and European Medicines Agency (EMA), in relation both to psychiatric (FDA boxed warning) and cardiovascular events (FDA, 2012)(see also EMA). (Readers may also be interested in Singh S, 2011 on the issue of cardiovascular events.)

      UPDATE: On 9 March 2015, the FDA reviewed safety data on varenicline, retaining the boxed safety warning, and including a warning on interaction with alcohol. However, in March a large meta-analysis found that varenicline increased insomnia and bad dreams, but not depression, suicide, or suicidal ideation (Thomas KH, 2015).

      In terms of effectiveness, the authors rightly raise the issue of a lack of direct comparisons between varenicline and others options for smoking reduction. Readers might be interested in Asfar T, 2011, which finds that nicotine replacement therapy (NRT) achieved smoking reduction rates that were not dramatically dissimilar. Cahill K, 2010 found some (inconclusive) evidence that NRT and varenicline result in similar quit rates. Nor are pharmacological means the only successful options for reducing smoking without the risk of serious adverse events.

      Note also that this study was funded by Pfizer, manufacturer of the varenicline product marketed as Chantix in the US (Champix in Europe).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 May 13, Hilda Bastian commented:

      Since writing this editorial, I have expanded on two of my central concerns here in blog posts. One of those is on women scientists making their opinions public. The other is a deep dive into the literature on anonymity and openness in publication review.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Nov 30, Hilda Bastian commented:

      Thanks for the helpful and informative reply, Tetyana.

      While modeling can take account of some known variables, it can't overcome the limitations of measures based on these traditional theories. Other mechanisms that could explain the results remain. There are assumptions used to explain the results of these data simulations (such as that hiring and firing exposes people to conflict and hostility, but pay decisions do not) that remain open to question.

      The results do not exclude the possibility that the women did not have enough authority in comparison with the men with whom they were compared, or other associated (in)tangible benefits that the men could take for granted with the "hire/fire/influence pay" status. Having equal status may indeed have brought similar benefits. Adequate markers of a particular status attainment for the original in-group from whom the measures were derived may lack the power to discriminate unequal status for others. If so, then like is not necessarily being compared with like. It wasn't possible to "take all other job characteristics into account," because they weren't measured.

      Using unreported modifications of measurement tools for the key outcome makes it difficult for others to assess the validity of the data and its interpretation. It would be helpful if that were done within the larger project, and linked here. Depression implies an adverse mental health condition (both in the community and clinically), and the study's conclusions refer to health benefits, not happiness. The CESD has cut-offs for symptomatology that have no clinical relevance.

      While there's no doubt that workplace circumstances for women and other traditional "out groups" must change, I don't believe, on the basis of these data, that people should conclude that workplace authority over others per se makes women depressed. But the data are enormously valuable, and this work is indeed an important contribution to addressing an important social issue. Thank you for that, as well as for the additional information in your reply.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2014 Nov 29, Hilda Bastian commented:

      This paper uses the terms "depressive symptoms" and "depression" interchangeably. However, the relationship between the screening questions asked and the "clinical" condition of depression is unclear. A modified form of an unspecified version of the CESD screening tool was used. It included an unspecified 10 of the 16 CESD questions, applied only for the last week. In a further variation to the CESD, answers were scored with only a dichotomous outcome.

      The cut-off for determining "depression" was also not explained and the "clinical" relevance of the measure (and associated increase) is unclear. If there has been a validation of an association between the scores used here and depression, it was not referred to in the paper. More details on this would be helpful to people interested in interpreting the results of this study.

      The workplace situation was not equal between the men and women bracketed in the same job authority categories in this sample. The women worked fewer hours per week, earned less than the men of the same age, and were supervised more often. It's women's job authority with less pay and less freedom than men's job authority that is being compared. That would also be a function of the gender inequality the authors identify as a clear problem here. But it raises a question about the level of emphasis given to the psychological impact of having supervisory authority, and, therefore, about knowing what to do about it.

      The range of workplace factors addressed by this study includes the traditional ones related to autonomy. Those questions don't address the kinds of gender-related issues the authors point to in the literature as constituting psychological workplace adversity for women in management: such as endemic social exclusion by peers and supervisors, frequent slights from all directions, being judged more frequently as socially disruptive, unequal opportunity and status attainment, and harassment. More sensitive tools (and relevant data from before the age of 54) would have been needed to unpack what made that generation of women unhappier than the men. The underlying point these authors show, though - that psychological aspects of the workplace experience have serious bearing on women's happiness - is a critical one.

      The full text of this article is available here.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Dec 30, Kausik Datta commented:

      To add to Hilda Bastian's informative comment, the press release mentions the misleading statement not only in the title, but also in the first paragraph - stating definitively: "The study, published Nov. 17 by Proceedings of the National Academy of Sciences, shows that triclosan causes liver fibrosis and cancer in laboratory mice through molecular mechanisms that are also relevant in humans." (Emphasis mine.)

      This is, at best, irresponsible journalism (and at worst, a terrible disservice to people living with cancer). What seems particularly galling is the fact that this sacrifice of scientific accuracy at the altar of needless sensationalism in the press release was perpetrated by none other than the University (UCSD) at which the work was done. This brings to mind once again the age-old tussle in science communication, between science and journalism.

      At the same time, the authors cannot deflect the blame completely, especially since the lead author, quoted in the press release, didn't seem to emphasize at all the dosage effect of Triclosan administration and the exposure route - which is rather odd, given that the Triclosan was either fed to the mice or injected directly into their peritoneal cavity at a high enough amount, neither of which would apply to humans.

      I hope the authors pay heed to the most germane points raised by Hilda about the further inclusion of the data; I'd be most interested in the actual experimental outcomes.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2014 Nov 21, Hilda Bastian commented:

      The title and abstract of this article focus on the positive finding on tumor promotion, without emphasizing, in a way that is clearly accessible for non-specialist readers, that the findings on causation were negative. This is of particular importance, as a university press release issued for this study was headed with this misleading statement: "The dirty side of soap: Triclosan, a common antimicrobial in personal hygiene products, causes liver fibrosis and cancer in mice." This encouraged unwarranted alarm in the community (which I discuss further in this blog post).

      A 2010 inventory of animal and clinical studies of triclosan safety (Rodricks JV, 2010) found that oncogenicity studies to that point had not found cancer-related increases in any species, except for liver cancer in mice. Without pre-registration of studies on this question, we are unaware of what the outcomes have been for all oncogenicity studies on this substance, and thus whether there is publication bias.

      Further areas of uncertainty relate to the experiments here. The article does not report sufficient data and methodological information to enable adequate assessment of the level of uncertainty associated with the experiments (see the NIH's Proposed Principles and Guidelines for Reporting Preclinical Research). It would be helpful if the authors took the opportunity to include key data here, specifically:

      • how the sample size was determined;
      • the inclusion/exclusion criteria;
      • exact data on the experiments' results (including confidence intervals);
      • whether or not allocation of mice to the groups was random, and if so, details of the method of randomization (including whether or not there was blinding);
      • whether there was blinding in outcome assessment.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Nov 14, Hilda Bastian commented:

      Very useful data on an important issue, given the high proportion of the population using contact lenses (Swanson MW, 2012). On the issue of the level of individual risk, readers might find a review of large-scale epidemiological studies helpful (Stapleton F, 2013).

      The authors stress the importance of good lens hygiene to reduce the risk of infection. That's a critical issue, and people may well over-estimate the adequacy of their own lens care (Bui TH, 2010). Given the increased risk with extended wear (rising from 2-4 per 10,000 for daily use to about 20 per 10,000 for extended wear; Stapleton F, 2013), users being better informed about reduced wear as a way of lessening risks may also help (covered along with social and historical aspects in this blog post).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Oct 27, David Colquhoun commented:

      For all the reasons given by Hilda Bastian (and a few more, like P = 0.04 provides lousy evidence) it astonishes me that this study should have been trumpeted as though it represented a great advance. That's the responsibility of Nature Neuroscience (and, ultimately, of the authors).

      I wonder whether what happens is as follows. Authors do big fMRI study. Glamour journal refuses to publish without functional information. Authors tag on a small human study. Paper gets published. Hyped up press releases issued that refer mostly to the add on. Journal and authors are happy. But science is not advanced.

      I certainly got this impression from another recent fMRI paper in Science, in which brain stimulation was claimed to improve memory (P = 0.043).

      I guess these examples are quite encouraging for those who think that expensive glamour journals have had their day. Open access and open comments are the way forward.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2014 Oct 27, Hilda Bastian commented:

      This report of a very small, short-term trial in healthy adults does not meet the CONSORT standards for trial reporting in several key respects. It does not provide sufficient data on the cognitive outcomes assessed, nor an adequate flow chart of outcomes (despite considerable attrition). There is also very little detail provided in the record of this trial at ClinicalTrials.gov.

      The abstract does not make it clear that this is a dietary supplement and exercise trial (partially funded by a manufacturer). There were apparently two cognitive outcome measures on a ModBent task (an adapted test not elsewhere validated): immediate matching and delayed retention. Both relate to very specific functions, not an overall rating of cognitive abilities.

      No effect was found for the exercise component in the trial, and out of the two cognitive measures, some effect was found for one, but not the other. That this is a chance finding surely can't be ruled out.

      This report describes low vs high supplement groups. The study in ClinicalTrials.gov for the trial number they provide, however, was for a supplement and a placebo comparator.

      Despite the major limitations of this single trial to address the question, the "Newsroom" report for the trial claims that it shows that "dietary flavanols reverse age-related memory decline."

      It's good to see claims about dietary supplements tested. However, the results here rely on a chain of yet-to-be-validated assumptions that are still weakly supported at each point. In my opinion, the immodest title of this paper is not supported by its contents.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Sep 13, Hilda Bastian commented:

      The authors make an important point: just because a systematic review has not assessed publication bias (PB), it does not mean that there is none.

      However, in this study, there were only 36 reviews that did not assess publication bias, and nearly half of those were in the minority subset of reviews that didn't have a comprehensive search strategy. For many, a comprehensive search strategy is a defining characteristic of a systematic review (e.g. in DARE, the Database of Reviews of Effects). Those reviews may not be able to provide an adequate overview of published studies, either.

      The authors point out that a limitation of their study is that there were many (planned) subgroup analyses - and it's on a small number of reviews. Especially as the number of adequately systematic reviews was small, the exclusion of the Cochrane Database of Systematic Reviews - a journal that publishes systematic reviews and was eligible for their study - is disappointing. The reason given for the exclusion was that the results of the Moher D, 2007 study showed that assessment of publication bias "is regularly performed in articles published in this database." However, the authors of that study concluded that the assessment of publication bias was disappointing overall. For Cochrane reviews in that study, publication bias was assessed (or intended to be assessed) in only 32% of those reviews (and it was considered in another 39%).

      (Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine).)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Oct 14, Hilda Bastian commented:

      Thanks for drawing attention to this interesting article. Dancer SJ, 2009 argued that what's done now in an outbreak is "a veritable blunderbuss approach." Dancer's own cross-over study of enhanced cleaning addressed MRSA, although the study was too small to identify a definite impact on infection (Dancer SJ, 2009).

      Environmental strategies were included in a systematic review of measures to reduce the spread of VRE (with a search for evidence up to June 2012)(De Angelis G, 2014). De Angelis found only two studies, concluding that no definite impact on infection had been identified (Hayden MK, 2006; Williams VR, 2009).

      This new retrospective study (Everett BR, 2017) seems to be the second looking at the implementation of this particular set of strategies. The first was undertaken by the team that developed the method and runs the associated consultancy service (Watson PA, 2012).

      Both routinely and during outbreaks, Dancer SJ, 2009 concluded, "there is a lot of work still to do to establish cleaning as an evidence-based science." That still seems to be the case.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Aug 22, Hilda Bastian commented:

      It's great to see such a thorough and rigorous body of work on this subject. This group provides a good overview of the portion downsizing issue, and the limited evidence base on interventions, at Vermeer WM, 2014.

      A key part of the intervention in this trial (Poelman MP, 2015) is the interactive web-based PortionSize@warenessTool. Its development and trialing are described at Poelman MP, 2013, with these elements: background reading, an interactive flash game with photos of popular food products in the Netherlands, a flash game where you can upsize/downsize portions on screen, a self-test score, information on portions for children, and more.

      It would be helpful if details about the availability of this intervention could be provided (e.g. where it can be viewed, whether the code is open source, and whether the license allows translation). The TIDieR checklist (Hoffmann TC, 2014) - the template for intervention description and replication - is a good framework for this. More detail on the components of interventions is important for enabling better practice (Glasziou P, 2010).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Sep 02, Hilda Bastian commented:

      An excellent overview of the need for studying humor in science communication, and the academic challenges in it. While there’s some more evidence than that gathered here, I think Hauke Riesch’s conclusions about the uncertainties of benefit and harm are spot on.

      I found the review on studies of humor in teaching he points to (Banas, 2011) helpful as well. From children through to continuing education and the communication of science among peers (Rockwood K, 2004), there’s a lot to learn here.

      In describing the varying results of studies, Riesch doesn’t explicitly address a key confounder in communication research: the quality of the intervention. It’s hard to make sense of bodies of evidence in this field without quality assessments and being able to see the interventions (Glasziou P, 2010). Skill in using humor may account for some of the heterogeneity. And learning about the skills necessary for effectiveness – and how to acquire them – are key issues in this field, too.

      Riesch addresses well the potentially alienating and stereotyping effect of science humor, as well as the potential benefits of social group cohesion. In addition, though, satire in peer-to-peer communication and for policy-related issues is also a critical element of humor in science communication, as it is in other areas of community life (Zyglis, 2003).

      I welcome the author’s desire to “open a discussion” on humor in science communication. But this article being behind a paywall isn’t going to help that process. It would be great to know if the author is engaging with discussion in any other forum.

      I’ve blogged about the science of humor, and humor in science, in response to this article at Scientific American.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Aug 22, Mangesh Thorat commented:

      Response to Hilda Bastian’s recent comment:

      Thank you for the continued discussion. Individual studies like the PHS did report a 5-year follow-up, which is not uncommon. Rothwell's recent overview (Rothwell PM, 2012) did look at studies with shorter follow-up, but the central question in that overview was aspirin's effect on incidence. The effect on incidence starts to appear at 3 years, while that on mortality takes about 5 years. On the other hand, the endpoint Seshasai SR, 2012 used, for example, was mortality and not incidence, and therefore they could not observe a significant reduction. Sutcliffe P, 2013 looked at all these data and treated them as equal. Additionally, they did not have access to the updated WHS results that showed a significant reduction in CRC. This resulted in their excessive perception of uncertainty, which is prominently reflected in their interpretation.

      We believe that most experts agree that "the evidence supporting aspirin's benefits on cancer is now overwhelming"; differences in opinion probably exist only over the magnitude and site-specific effects (e.g. 3 of our co-authors). This is the reason we provide several sensitivity analyses that use a lower magnitude of benefits, a higher magnitude of harms, and also a lack of effect on certain cancer sites. All of these show a net benefit.

      We agree that long-term harms should not be easily dismissed, but we believe that the severity of harms also needs to be considered in any assessment. In our assessments, we have erred on the side of caution and very likely over-estimated the harms. Individual circumstances differ, and therefore we believe that a careful assessment by and an informed discussion with a healthcare professional is necessary.

      We also look forward to the new USPSTF review, as we have been informed that on this occasion the USPSTF will look at the overall picture by assessing the impact on all diseases/conditions affected by aspirin, and not just a single disease/disease group.

      Response to David Colquhoun’s comment:

      Please note that the NHS Choices comment has been amended to delete the unsubstantiated statement about our study being 'not reliable'. As stated in the paper, this was a benefit-harm analysis based on very recent systematic overviews by some of our co-authors, so it was not necessary to repeat them.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2014 Aug 16, Hilda Bastian commented:

      Thanks for replying, Mangesh Thorat. I didn't review the primary studies, so hadn't picked up the error in Sutcliffe P, 2013 with respect to the Women's Health Study (Cook NR, 2013). The concern remains valid, as it applies to most of the evidence.

      I disagree, though, that the Sutcliffe review has a "major flaw" in considering all studies equal irrespective of follow-up. Their analyses for duration of follow-up are front and center. And they specifically report on, and discuss, 20-year analyses on colorectal cancer in coming to their conclusions.

      Nor are they the only group in this field to consider studies with shorter follow-up (see for example Rothwell PM, 2012). And the Physicians' Health Study (Steering Committee of the Physicians' Health Study Research Group., 1989) had 5-year follow-up.

      Many people agree with your statement that "the evidence supporting aspirin's benefits on cancer is now overwhelming." But many do not. The National Cancer Institute's recent round-up (NCI, 2014) considers perspectives on the same body of evidence. NCI highlights "mixed opinions" and "reasons for caution."

      While the potential for important net benefit from daily low-dose aspirin for more people is vitally important, I don't think the issue of harms of long-term use should be too easily dismissed. People who have common conditions that are potentially affected by taking aspirin daily (like asthma (Morales DR, 2014)), or who are at high risk of developing ARMD from mid-life, or whose concomitant medication use may be a relevant consideration (such as with arthritis (Colebatch AN, 2011)), might well want less uncertainty about what this means for them.

      Given the differing interpretations of this body of evidence, the findings of the US Preventive Services Task Force review, expected this year, will be interesting (NCI, 2014).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2014 Aug 15, Mangesh Thorat commented:

      We thank Hilda Bastian for her comment, our response to the points raised is given below:

      Sutcliffe P, 2013's systematic review is not discussed in Cuzick J, 2015 because we believe that it has a major flaw: it considered all reviews to be equal irrespective of the length of follow-up. For example, the review by Seshasai SR, 2012 that failed to show any cancer benefit had a follow-up of only 6 years. As it takes 5 years for aspirin's beneficial effects on mortality to appear, inclusion of such data by Sutcliffe P, 2013 resulted in underestimation of the beneficial effects on cancer. The updated results of the WHS (Cook NR, 2013), which showed a 42% reduction in CRC incidence, were published almost at the same time as Sutcliffe P, 2013, and therefore were not included in this review. Sutcliffe P, 2013 based their interpretation on earlier WHS results (Cook NR, 2005), which did not show any reduction in CRC.

      Sutcliffe P, 2013 were also under the wrong impression that all the primary studies and meta-analyses for benefit "assessed reduction in cancer incidence and mortality retrospectively through re-analysis of RCTs of aspirin for primary prevention of CVD." Cancer incidence and mortality is one of the primary endpoints in the WHS (Cook NR, 2013). The importance of the WHS lies in the fact that it not only confirmed the benefit in cancer as a primary endpoint, even with an alternate-day low dose, but also confirmed that there is a long lead time and a prolonged carry-over benefit. This is where the recent WHS publication (Cook NR, 2013) differs from the results published earlier (Cook NR, 2005).

      We also disagree with the statement that "uncertainty around the cancer estimates remains high"; a very large body of evidence from observational studies (Bosetti C, 2012; Algra AM, 2012) is consistent with the findings from RCTs and should not be ignored as was done in Sutcliffe P, 2013. The evidence supporting aspirin's benefits on cancer is now overwhelming, with over 200 published studies, and those with adequate follow-up show very consistent evidence of reduced incidence and mortality for three major digestive tract cancers - colon, stomach and oesophagus.

      It is clear from the evidence that the harms associated with aspirin (and the cardiovascular benefits) begin at the time of use and cease when the drug is stopped. Cancer benefits, however, have a lead time before becoming apparent, but continue for a long period after stopping drug use - a long carry-over effect, as seen with other preventive drugs like tamoxifen. With this understanding, mere pooling of data from meta-analyses and trials with variable treatment durations and variable post-treatment follow-up to assess benefits and harms, as Sutcliffe P, 2013 have done, is not a reliable method for assessing the impact of aspirin. This is primarily where our work and therefore the results differ.

      In addition, we have modelled benefits and harms of aspirin for the average risk population using actual event rates in the general population to give estimates of the impact of aspirin specifically for this group, which is the major focus of our work.

      We accept that the question of aspirin’s impact on ARMD is unresolved, but ARMD is uncommon (National Eye Institute) below 70 years of age, which again is the group on which we have focussed our attention.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    4. On 2014 Aug 10, Hilda Bastian commented:

      These authors (Cuzick J, 2015) come to a more positive conclusion about the state of the evidence on routine aspirin use and cancer prevention than do Sutcliffe P, 2013 (also reported at Sutcliffe P, 2013).

      Sutcliffe P, 2013 undertook a thorough and well-reported systematic review of the evidence, based on previous systematic reviews, the primary studies in them, and the relevant RCTs published post-2008, re-analyzing the primary study data. They took into account the same individual patient data and other meta-analyses on which Cuzick J, 2015's interpretation of benefit rely. (Sutcliffe P, 2013's systematic review is not discussed in Cuzick J, 2015.)

      The main data included in Cuzick J, 2015 but unavailable to Sutcliffe P, 2013 appear to be an analysis of harms (where insufficient detail on the sources or selection process has been published), and a long-term follow-up report from the Women's Health Study (Cook NR, 2013). However, as Cook NR, 2013 shows a broadly similar outcome to the <10 year results (no effect on total cancers, but an effect on colorectal cancer only), this does not appear to account for the difference in interpretation of the state of the evidence by these two groups.

      The main data relied on in Sutcliffe P, 2013 that differ to those in key analyses of Cuzick J, 2015 are the Physicians' Health Study (Steering Committee of the Physicians' Health Study Research Group., 1989) and the Women's Health Study (Ridker PM, 2005). These are of long-term aspirin use on alternate days, rather than daily. These two studies include around 62,000 people, and Sutcliffe P, 2013's analyses show they dominate several calculations.

      Sutcliffe P, 2013 point to a critical issue: all the primary studies and meta-analyses for benefit "assessed reduction in cancer incidence and mortality retrospectively through re-analysis of RCTs of aspirin for primary prevention of CVD." They conclude that the uncertainty around the cancer estimates remains high, and the "long term all-cause mortality data does not provide a compelling case for aspirin protection against CVD and cancer mortality."

      With further trials underway, the picture may become clearer in the next few years. While previous trials and analyses address the major harms associated with long-term daily aspirin use (hemorrhagic stroke and gastrointestinal bleeding), many people considering this intervention may also be concerned about additional outcomes - for example, the still-unresolved question of any potential impact on neovascular age-related macular degeneration (Klein BE, 2012, Liew G, 2013, Christen WG, 2014).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Aug 02, Hilda Bastian commented:

      While safer driving by adolescents is a critically important issue, and further research in this area is definitely needed, the authors' conclusions about the effects of this intervention are overly positive.

      The parent in these trials was overwhelmingly the mother (over 80%), mostly college-educated - and non-white families appear to have been under-represented. The participants responded to hearing about the trial rather than being actively recruited, and so were particularly highly motivated - and the trial couldn't reach its recruitment goal. Further, 16% of the intervention group were lost to follow-up at the primary outcome measurement point (compared with 6% in the control group).

      Even with this highly motivated group in a trial setting, with an intervention more intensive than a large-scale program could be (Ramirez M, 2013), and with outcomes based solely on the adolescents' reports, the pre-specified primary outcomes (trial registration record) did not achieve statistical significance. While the authors fairly attribute this to low recruitment making the trial under-powered, it isn't very encouraging. As the authors point out, there's no strong effect apparent here.

      Presenting the adolescents' self-reported Risky Driving Score results as risk reduction percentages in the abstract risks giving people an exaggerated impression of effectiveness. The range of possible scores isn't very wide, so even a small difference can be a substantial percentage. It would have been good if more details about the score had been provided, given that it's a primary outcome measure and was a trial-specific adaptation of an existing score.

      It's great to see this trial published, even though it didn't meet its goals. But I don't agree with the authors' conclusion that statistical significance standards should, in effect, be relaxed because proven interventions are needed. The interventions that people would use need to make a real difference. As the authors point out, there is evidence that parents can make a difference to their adolescents' behaviors - to their list, I'd add influencing smoking (Thomas RE, 2007). But parents need to know where they could make the best effort, given the other options like Parent-Teen Driving Agreements (Zakrajsek JS, 2013) - or discouraging getting a license early (Ian R, 2001).

      The authors indicate that future research will integrate more objective data, which presumably refers to the unreported data from 2010 for driving citations and crashes in this trial. That will be vital to put these self-reported data on surrogate outcomes in perspective. Access to the intervention materials may also be important for others in the field (Glasziou P, 2010).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Jul 26, Hilda Bastian commented:

      A thorough and valuable breakdown of what could and should be automated in systematic reviewing. One additional important strategy lies in the hands of everyone doing (and publishing) clinical trials and systematic reviews: following the ICMJE recommendation to include the clinical trial registry identification number of every trial at the end of abstracts. This needs to be done using the specific, unaltered format for each registry in which a study is included, so that IDs are easily retrievable - and the IDs should be with every cited study inside the systematic review, too. Using the WHO's Universal Trial Number (UTN) would also help with the critical, and time-consuming, task of study de-duplication.
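      As a minimal sketch of why consistently formatted registry IDs matter for de-duplication (the record structure, function names, and example IDs below are hypothetical, not from any cited tool), grouping reports by the registration numbers cited in their abstracts reduces duplicate detection to simple text matching:

          import re
          from collections import defaultdict

          # Illustrative pattern for ClinicalTrials.gov identifiers: "NCT" followed by 8 digits.
          NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")

          def extract_registry_ids(abstract):
              """Return the set of NCT numbers cited in an abstract."""
              return set(NCT_PATTERN.findall(abstract))

          def flag_possible_duplicates(records):
              """Group article IDs by the registration numbers their abstracts cite.

              Any registry ID cited by more than one article flags a potential
              duplicate (or companion) report for manual checking.
              """
              by_trial = defaultdict(list)
              for article_id, abstract in records.items():
                  for nct in extract_registry_ids(abstract):
                      by_trial[nct].append(article_id)
              return {nct: ids for nct, ids in by_trial.items() if len(ids) > 1}

          # Hypothetical usage:
          records = {
              "article-1": "... Trial registration: NCT01234567.",
              "article-2": "... (ClinicalTrials.gov NCT01234567; secondary analysis).",
              "article-3": "... Registered as NCT07654321.",
          }
          print(flag_possible_duplicates(records))  # {'NCT01234567': ['article-1', 'article-2']}

      Altered or missing IDs defeat this kind of matching, which is why the unaltered registry format matters.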

      The issue raised in this article of some databases of manually extracted trial data not being publicly available is an important one. It's worth noting, though, that public availability is possible: systematic reviewers have the option of using the open and collaborative public infrastructure of the SRDR (Systematic Review Data Repository) (Ip S, 2012).

      Another option to add to the list of ways of improving the snowballing technique for identifying studies: using the related articles function in PubMed. That's been found to be useful in empirical studies of techniques for updating systematic reviews (Shojania KG, 2007).

      (Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), which is also responsible for ClinicalTrials.gov.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 May 11, Hilda Bastian commented:

      An important reminder that multiple publication bias - which can lead to double-counting of patients - in meta-analysis has not disappeared (PMC full text). Choi and colleagues point to a recent study suggesting the incidence of duplicate publication in the field of otolaryngology didn't change over 10 years (Cheung VW, 2014), although it may have reduced in some other fields.

      Systematic reviewers weed out most duplicate reporting, but as this new study shows, some still slip through. In a meta-analysis, the magnification of events can tip the balance of evidence. A study a decade ago showed that authorship was an unreliable criterion for detecting duplicate publication of trial data (von Elm E, 2004), and the publications don't cross-reference each other, either. Choi and colleagues don't raise the issue of the importance of clinical trial registration here: Antes and Dickersin pointed to this as a key strategy to address this problem (Antes G, 2004).

      There is also duplicate registration of trials in different registers, though (Zarin DA, 2007, Califf RM, 2012). ClinicalTrials.gov aims to identify and resolve duplicate registration of trials (Zarin DA, 2007), and most registered trials are included there. Consistent citation of trial registration numbers, especially the ClinicalTrials.gov identification (NCT number), in all systematic reviews of trials would be useful for readers and those trying to identify studies. It might help reduce reviewers' workload in weeding out duplicate reports, too.

      (I work on projects related to systematic reviews at NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), which is also responsible for ClinicalTrials.gov.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Apr 23, Hilda Bastian commented:

      This is a critical topic for a systematic review, given the potential for decision-making interventions to increase inequality in the community. The conclusions here seem to me over-optimistic. There's another way of putting this: the meta-analyses for most outcomes found no improvement. The weight of evidence for improvement was carried by fewer than a handful of studies - including some intensive interventions, such as a community outreach strategy (Wray RJ, 2011).

      It's striking that with so much research in this field, such a small proportion could be found that addresses such a critical question. The results here certainly point to the importance of doing more work on this subject, because the cause clearly is not hopeless. Beyond these studies, though, lies another critical question: who is adopting these practices in the community, and is it contributing to a lessening or an increase in inequity? Generally, only concerted effort can prevent those who already have more from getting more - in this case, information and clinicians' time (Bastian H, 2003).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Sep 06, Hilda Bastian commented:

      The authors of this paper state: “Our own findings as well as research by others show that the effect of children on women’s academic careers is so remarkable that it eclipses other factors in contributing to women’s underrepresentation in academic science”.

      This paper fails to support this contention in 5 ways:

      1. Addressing only a subset of the range of factors that potentially contribute to women’s underrepresentation.

      2. Relying on a selected set of literature that fails to discount alternative explanations, in particular that there is no single factor that accounts for the phenomenon of women’s underrepresentation in science. Multiple factors, even small ones, can contribute to cumulative advantage for men in science (National Academy of Sciences (US), National Academy of Engineering (US), and Institute of Medicine (US) Committee on Maximizing the Potential of Women in Academic Science and Engineering, 2007).

      3. Providing no method to quantify and comparatively weigh contributing factors that could underpin the single-remarkable-factor hypothesis.

      4. Not satisfactorily demonstrating that motherhood consistently results in high levels of underrepresentation across the disciplines of academic science, but not in other academic careers.

      5. Generalizing to all of academic science based exclusively on American data on family responsibilities and science careers.

      The authors rely heavily on their previous work: Ceci SJ, 2011. I have addressed that in a PubMed Commons comment (link to comment). That paper also does not contain adequate evidence to sustain the claim about the motherhood hypothesis presented here.

      The only data sets presented in support of this hypothesis are (in order of appearance):

      • A study including 586 graduate students in 1992 in the US, surveyed again in 2003 and 2004 (Lubinski D, 2006).

      • A figure of the number of ovarian follicles women have by age from birth to 51, overlaid with key scientists’ career stages.

      • A national faculty survey on career and family in 1998 (with more than 10,000 respondents across scientific and non-scientific disciplines) (Jacobs, 2004).

      • 2 examples of studies from their previous review, chosen to illustrate their argument that there is a level playing field for women in the science workforce, along with a blanket claim that I do not believe the evidence in their review supports (Ceci SJ, 2011).

      • A study that included 2 major components (Goulden, 2009):

        (a) Modeling of data from the Survey of Doctorate Recipients (SDR), which had limited data on potential contributing factors to women’s careers (see for example Bentley 2004). Women with young children had 4-13% lower odds of achieving tenure than women without, which is not a considerably higher contribution to gender differences than has been found in other studies. (Note that age of children is one of the areas with relatively high missing data in the SDR (Hoffer 2002).)

        (b) A survey of 45 female doctoral and postdoctoral researchers at the University of California, including 16 “new mothers”.

      • A survey with 2,503 respondents from 2008/2009, which found that women were more likely than men to wish they had more children (Ecklund EH, 2011) (although it is not included in the article’s list of references, the study was readily identifiable). Williams and Ceci report “Often this regret is associated with leaving the academy”. However, Ecklund and Lincoln report that there was no gender difference in the desire to leave academic science among these respondents. Further, they conclude, “the effect on life satisfaction of having fewer children than desired is more pronounced for male than female faculty, with life satisfaction strongly related to career satisfaction”.

      • A study of people early in their careers, graduating with MBAs from a single US business school between 1990 and 2006. It had a low response rate (31%) and included 629 women (Bertrand, 2010).

      This evidence base is inadequate to support the paper’s conclusions and consists of highly selected data. The article included a separate extended bibliography, but the basis for the identification and selection of the studies in the bibliography and in the article is not given. In relation to the major review on which they rely (Ceci SJ, 2011), an unsystematic approach and lack of methods to minimize bias have resulted in a very misleading sample of data, and in biased reporting and interpretation of those data (see my comment in PubMed Commons).

      Finally, central to the argument presented here is the hypothesis that, as societal and policy changes have reduced the impact of blatant and conscious discrimination, motherhood has assumed greater salience as a relative barrier to the progression of women’s scientific careers.

      However, those same societal changes have also been affecting how people manage and accommodate family responsibilities and careers. For example, later childbirth and fewer children is an ongoing trend in the US (Matthews TJ, 2009, Matthews TJ, 2014), which partially results from, and contributes to, changing attitudes to motherhood and parenting over time. Similarly, increasing workforce participation by women has been changing, and continues to rapidly change, men’s roles in parenting (Cabrera NJ, 2000). The authors acknowledge that there has been some accommodation by academic institutions, but their analysis remains largely one-sided.

      For example, this statement is made with neither current nor longitudinal data cited in support: “Men more often have stay-at-home spouses or spouses in flexible careers who bear and raise children while the men are free to focus on academic work”. Indeed, a study they cite in another context found that both men and women scientists with children worked fewer hours than those without children, but similar hours to each other (Ecklund EH, 2011).

      I agree with the authors that much remains to be done to accommodate family responsibilities of all types, not just motherhood. But that would not be a single magic bullet counteracting the cumulative impact of biases and barriers affecting women - related to gender, race, and more, as well as to family responsibilities. These authors have not made their case for the claim that, “It is when academic scientists choose to be mothers that their real problems start”.

      In addition to comments here on PubMed Commons on the previous review by these authors that supports this paper, I have discussed it on my blog.

      Disclosures: I work at the National Institutes of Health (NIH), but not in the granting or women in science policy spheres. The views I express are personal, and do not necessarily reflect those of the NIH. I am an academic editor at PLOS Medicine and on the human ethics advisory group for PLOS One. I am undertaking research in various aspects of publication ethics.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Feb 23, Hilda Bastian commented:

      This paper tackles an important issue. We definitely need better ways to keep up with the evidence - and the rate of growth of that evidence makes it both more difficult and more urgent (Bastian H, 2010). It's particularly helpful that the paper addresses the risks of multiple testing in continuous updating models.

      In calling for "a shift to continuous work process," though, it's important to remember that this shift has long occurred for many organizations and groups. A 2010 survey of agencies that sponsor and conduct systematic reviews (sometimes with clinical practice guidelines as well) found 66 that were already doing this to at least some extent (Garritty C, 2010).

      In this latest proposal for living systematic reviews, several issues reach Table 1 as key challenges that are unquestionably important. But "validation and acceptance by the academic community" and "ensuring conventional academic incentives are maintained" did not prevent the development of continuous updating models.

      The restriction of access to key databases does contribute to keeping many groups trapped in duplicative updating hamster wheels, though. Poor access leads to critical research waste (Glasziou P, 2014). Making the preservation of conventional academic incentives foundational in Table 1, rather than, say, opening databases, runs the risk of focusing us on technical issues within restricted models, slowing down and limiting both innovation and the entry of new players.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Feb 12, Hilda Bastian commented:

      It would be wonderful if as many post-stroke therapies were as effective, and the evidence for them as strong, as this review concludes. Unfortunately, that's not the case.

      The abstract of this review talks about trials in over 25,000 patients - but it doesn't point out that the numbers for individual interventions are, with only some exceptions, small. The review has several major flaws, in particular having no protocol to guard against problems caused by multiple testing and subgroup analyses. Crossover trials are pooled with parallel trials, and the effect of this on the various analyses is not clear: methodological characteristics of the individual trials are not reported. A scoring method is used for the individual trials, for which only the summary score is available.

      In addition, it's important to note that the search for this review was done in June of 2011. As well as using more robust methods, other reviews are significantly more up-to-date, e.g. systematic reviews on treadmills (Mehrholz J, 2014) and physical fitness training (Saunders DH, 2013).

      Although this review's abstract and conclusions are strongly positive about 30 interventions they consider, the authors do point out in the discussion that: "well controlled, dose-matched trials with significant effects in favor of the experimental intervention have been rather scarce."

      For a good overview to consider alongside well-conducted recent systematic reviews, see Langhorne P, 2011.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Jan 30, Hilda Bastian commented:

      "Urgent work to do," across disciplines and approaches, sums up where we are very well. It's an important debate to have, if public health interventions are to be effective.

      To develop practical methods and tools for measurement, it would be good to explicitly emphasize communities of shared identity more than Jewkes and Murcott's non-spatial definition of community did (Jewkes R, 1996). Whether it's gender, disability, indigenous identity, race, sexuality, illness or another shared identity, public health services can be particularly critical for those collectives (Bastian H, 1998).

      I was puzzled by the value judgment layer Allmark and colleagues placed on resilience, though. Arguing that a woman who grew up in an abusive household and became a wealthy and successful criminal should not be "judged" as resilient strikes me as a way to get tangled in knots, rather than helping us clarify concepts. This is not a particular weakness of resilience as a concept in relation to other concepts.

      Were she to form a gang, the members would likely be very strong in bonding social capital: the same issues arise irrespective of the measure, if value judgment is conflated with it. It's a little like arguing that the concept of literacy is flawed because of the consequences of what disadvantaged people might read.

      The example seems to me to speak instead to the value of a concept of community resilience, rather than being a conceptual challenge to the "resilience" part of the phrase. A key part of a community moving towards less discrimination and greater public safety involves strengthening procedural justice (Mazerolle, 2013). The theoretical abused girl who becomes a criminal may have had more respect for the laws of a community whose justice system had protected her as a child.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Jan 16, Hilda Bastian commented:

      When originally conceived, this trial was expected to show significant improvements in key functional abilities for independent living in older people (Jobe JB, 2001). It was planned to run for two years, with training in the experimental groups and one set of booster training (for about half of the people in the experimental groups). There would be 4 testing periods for a number of outcomes, reported as composite endpoints (baseline, after training, then after 1 and 2 years).

      After the trial's completion at 2 years, there was no impact on the primary functional outcomes (Ball K, 2002). A post hoc hypothesis by the authors for this lack of effect was that the testing in the control group may itself have had some cognitive effect. However, the people in the experimental groups received the same tests, so any positive effect would presumably have been experienced across the trial.

      The more likely reason for the lack of effect is that cognitive training in a specific cognitive function affects only that function, without a major practical impact (Melby-Lervåg M, 2013, Reijnders J, 2013). In a comment on that 2-year report, Brenes (Brenes GA, 2003) pointed out, among other issues, that it was not clear that the skills taught in the training (such as mnemonics) were in fact practiced by the participants in their daily lives.

      In this publication after 10 years, the authors write that no benefit in functional living had been expected before at least 5 years - a position stated subsequent to the finding of a lack of effect at 2 years. They added a set of booster training and an extra 3 testing periods.

      Results at 5 years found that one of the 3 experimental groups, and one of the 3 booster subgroups, each had an effect on one of the functional outcomes, but there were no effects on most of the functional outcomes for most of the groups and subgroups (Willis SL, 2006).

      The data for the individual components of the composites are not included in this report of 10-year results. The people in the experimental groups fared modestly better on the self-reported outcome among the 3 composite endpoints for primary outcomes, but not on the other 2. As the authors say in relation to those outcomes, "The current study showed weak to absent effects of cognitive training on performance-based measures of daily function."

      One of the groups showing a modest effect on that 1 outcome was the memory training group. However, it is not clear how memory training could be having an effect on function when the effect on memory had dissipated years before. Given the large number of subjective tests, the modest impact on one of the functional outcomes may be a chance finding.

      (The conflict of interest declaration for this paper discloses that the memory intervention in this trial is being developed commercially.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Dec 24, Hilda Bastian commented:

      An important initiative. There was animated discussion (and a fair amount of cringing) when this paper was presented at the Peer Review Congress earlier this year (see this blog post). The need to gather, adequately describe, and store the data we analyze in a way that others can use has major implications for the daily life of many researchers.

      Having a spotlight shone on the adequacy of data stewardship is important, but there are some issues to keep in mind. This study covers a very specific area of research, and some other fields have particular regulations about the retention, privacy and sharing of all, or some, data. See for example recent analyses of the availability of clinical trial data (Riveros C, 2013).

      The number of papers in this study dwindles in the earlier years: 26 in 1991 compared with 80 in 2011. The numbers within particular categories (such as definitely lost in any one year) are correspondingly small.

      It was interesting that only 2.4% of studies had made their data available at the time of publication. (Those studies were excluded.)

      The authors practice what they preach: the full data are in Dryad and there's a manuscript in arXiv.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Dec 21, Hilda Bastian commented:

      That's a useful list of reasons for value in social media for physicians. Further support for the "to learn" argument and finding good curators among peers comes from the growth of social media as a gateway to medical literature. James Colbert et al reported at the 2013 Peer Review Congress that for NEJM, Facebook and Twitter are both in the top 10 referring sites to the journal (at 6 and 10 respectively). Readers might also be interested in reports on the role of social media for journals and societies in dermatology (Amir M, 2014) and ophthalmology (Micieli R, 2012).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Dec 04, Hilda Bastian commented:

      This is a non-systematic review with unclear inclusion criteria, limited search strategy, and unclear methods for selection of studies. It includes reviews, primary studies and some animal studies. Important randomized trials in the area of dietary prevention of colorectal cancer have not been included. The conclusions of this paper are more positive about the potential for dietary prevention of colorectal cancer than conclusions from the National Cancer Institute.

      Evidence from randomized trials not considered in this paper includes a review showing that randomized trials have not demonstrated a benefit of dietary or supplemented fiber (Asano T, 2002), and a combined analysis of 3 large randomized trials showing no clear effect of folate supplementation (Figueiredo JC, 2011).

      The authors point out that their conclusions are largely based on non-randomized studies. Moorthy D, 2013 shows the extent to which results from epidemiological studies of nutrition can vary from randomized trial results. This blog post addresses aspects of the history of diet and the development of colorectal cancer.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Nov 18, James C Coyne commented:

      Yikes, this is a really badly conducted and interpreted meta analysis. Thanks, Hilda Bastian, for calling this to our attention and offering alternative references.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2013 Nov 18, Hilda Bastian commented:

      The conduct and reporting of this systematic review fall so far short of the standards and criteria covered by PRISMA for reporting (Moher D, 2009) and quality appraisal tools such as AMSTAR that this review does not meet current expectations of a systematic review.

      While conclusions about effectiveness are made, result data from the primary studies are not provided, nor are methods of data extraction and analysis discussed. Despite the large number of included trials, no meta-analyses of suitable data were performed and no reason for this was given.

      What constituted exercise was not specified and the reason for excluding studies prior to 2000 is not given. The reasons for inclusion and exclusion of studies are not entirely clear: for example, studies were excluded because of concomitant drug therapy, which, while a reasonable criterion, was not included in their list. A full list or explanation of exclusions is not provided.

      The search strategy as reported appears to be simplistic and does not include adequate search terms or key databases such as PEDro. The numbers of studies at the stages of PRISMA’s flow diagram are not provided (duplicates removed, records screened). The quality of included studies is not assessed.

      If you are interested in reading a systematic review on this question, consider Umpierre D, 2011 - see the DARE critical appraisal.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 23, Hilda Bastian commented:

      This paper by Jager and Leek (Jager LR, 2014) challenges Ioannidis' conclusion that "most published research papers are false" (Ioannidis JP, 2005). Ioannidis responds to this discussion, challenging the data and analytical approach here: (Ioannidis JP, 2014). The conclusions of this paper (Ioannidis JP, 2005) were also challenged by Goodman and Greenfield in 2007 (and responded to by Ioannidis JP, 2007). (I discuss this debate in a blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Aug 10, Hilda Bastian commented:

      This is an excellent overview of the research on the impact of celebrity and public figure announcements around cancer. The conceptual model proposed for studying impact on behavioral and disease outcomes is an important contribution, but I think it would benefit by being extended in several ways.

      The issue of potentially deepening inequalities in cancer (Lorenc T, 2013) is so critical here that equity needs to be considered at the outcome end of the picture: incorporating demographics and SES as a mediator/moderator isn't enough. Nor is age the only critical socioeconomic factor about the celebrity/public figure that should be taken into account. The authors point to some notable omissions among the cases that have drawn researcher interest. One of the most striking omissions, though, is the lack of study of non-Caucasian celebrities and public figures (for example Robin Roberts and Donna Summer on the timeline in this article).

      Also striking is the extent to which existing stigma around some cancers is reinforced, both in which cancers are publicly discussed by celebrities, and which are studied by researchers. Are we doing, at the community level (and in the researcher community), what we do in private life as well - reinforcing stigma and poor knowledge of critical diseases in our lives (Qureshi N, 2009)? Take colorectal cancer for example. Whether it's the cases in the timeline (which included only Farrah Fawcett) or the included studies (which included only Ronald Reagan), the under-representation of such a stigmatized condition points to a critical issue for research in this field. Impact on stigma would be a valuable addition to the outcomes in the conceptual model, to emphasize the importance of this dimension of belief.

      In general, it would be good if the potential for adverse effects were more explicit in the model. Critically, impact on over-diagnosis and screening/testing-related harm needs to be included - a key issue the paper discusses, for example, after Kylie Minogue's cancer (Kelaher M, 2008, Twine C, 2006). Accuracy in personal risk assessment, similarly, is an important outcome to consider.

      Focusing on behavioral and disease outcomes in the model leaves out the impact on resources, and ways systems can best respond to these unpredictable events. That was a major issue after Angelina Jolie's announcement (Evans DG, 2014).

      It would be helpful to understand the impact of famous family members' announcements and pleas around cancer as well: Katie Couric's public intervention (Cram P, 2003), for example, is relevant to this field.

      Finally, the model of considering these events only in terms of cancer prevention as the end interest risks missing potentially important impacts of these cultural events. They contribute to the complex ways we think about and deal with life-threatening illness, life, and death (Førde OH, 1998). The lack of studies that address these broader issues is striking, too.

      Note: Rick Nolan noted in a comment on my blog that Dan Fogelberg's death is wrongly attributed to pancreatic cancer in this article: he died of prostate cancer. I had discussed these issues, and the studies on Angelina Jolie that appeared after these reviewers completed their search, in this blog post.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Mar 23, Hilda Bastian commented:

      Readers of this Cochrane review (Gøtzsche PC, 2013) may also be interested in other key reviews that assess much the same body of evidence. One of these is the systematic review undertaken for the US Preventive Services Taskforce (USPSTF) (Nelson HD, 2009). Another is the review by the Independent UK Panel on Breast Cancer Screening (Independent UK Panel on Breast Cancer Screening., 2012; Marmot MG, 2013; full report). In addition, the Canadian Task Force on Preventive Care used the USPSTF review as the basis for its findings and recommendations (Canadian Task Force on Preventive Health Care., 2011).

      Update on 1 May 2014: Another review was published in JAMA in April 2014 (Pace LE, 2014). Its data on breast cancer death use the USPSTF review. The Swiss Medical Board published a review in April 2014 too: its findings and recommendations are based on interpreting the Cochrane, USPSTF and Independent UK Panel data. And I posted a guide to understanding mammography evidence on my blog at Scientific American.

      Update on 30 October 2014: The WHO published a review of systematic reviews of trials and observational studies, with a search date up to December 2012 (WHO, 2014). Their data interpretation is similar to that of the UK Independent Panel, and they recommend 2-yearly screening from 50 to 69 years of age, where there's a good screening program and informed decisions. Their estimates of harm are lower than those of some others, taking into account more recent practice.

      Differences in the estimates and conclusions about the effect of breast screening with mammography on breast cancer mortality between these reviews are not due to different trials being assessed. The differences principally arise from differing judgments on the strengths and limitations of individual trials, and a focus on local screening practices (which vary in terms of women's ages and whether screening is every one, two or three years).

      There were also some differences in methodologies for analyzing the data. The meta-analyses done for both the USPSTF review and the Independent UK Panel used random-effects models, given the differences between the trials; the USPSTF review also used a Bayesian analytic framework. The Cochrane review used a fixed-effect model, which assumes that the effect is consistent across trials.
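      (As a rough illustration of why the model choice matters - not the specific computation used by any of these reviews - the standard inverse-variance estimators differ only in their weights:

          \hat{\theta}_{FE} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i}, \qquad w_i = \frac{1}{v_i}

          \hat{\theta}_{RE} = \frac{\sum_i w_i^{*} \hat{\theta}_i}{\sum_i w_i^{*}}, \qquad w_i^{*} = \frac{1}{v_i + \hat{\tau}^2}

      where \hat{\theta}_i and v_i are each trial's estimate and within-trial variance, and \hat{\tau}^2 is the estimated between-trial variance. When \hat{\tau}^2 = 0 the two models coincide; the more heterogeneous the trials, the more the random-effects model evens out the weights and widens the pooled confidence interval.)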

      The Independent UK Panel by Marmot et al re-analyzed the trial data included in the 2011 version of the Cochrane review (with the same trials and estimates as this version). The Panel derived a comparison of the estimates of various authors, including the reviews discussed here (Independent UK Panel on Breast Cancer Screening., 2012). In order to prevent one woman's death from breast cancer, the number of women who would need to be invited for screening was estimated as follows (a note on the arithmetic behind estimates like these appears after the lists below):

      • Cochrane review: 2,000
      • USPSTF, for women aged 50 to 59: 1,339 and for women aged 60 to 69: 377
      • Independent UK Panel, for women aged 55 to 79: 235

      In order to prevent one woman's death from breast cancer, the number of women who would need to be screened was estimated as:

      • Canadian Task Force, for women aged 50 to 69: 720

      • Independent UK Panel, for women aged 55 to 79: 180
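      (A note on the arithmetic, using purely hypothetical numbers rather than figures from any of these reviews: the number needed to invite or screen is the reciprocal of the absolute risk reduction over the follow-up period,

          \text{NNS} = \frac{1}{\text{ARR}} = \frac{1}{p_{\text{no screening}} - p_{\text{screening}}}.

      For example, if breast cancer mortality over the follow-up period were 0.5% without screening and a 10% relative reduction were assumed, the absolute reduction would be 0.05% and the estimate would be 1/0.0005 = 2,000. Estimates of this kind therefore shift substantially with the assumed baseline risk - which depends on age range and length of follow-up - and with the assumed relative effect, which is part of why the figures above differ so much.)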

      The Independent UK Panel estimated that about 20% of breast cancers detected by mammography screening may represent over-diagnosis. They recommended screening only every 3 years to reduce the risk. The Cochrane review suggested this may be 30% or more.

      Longer term follow-up on one of the trials included in these reviews has subsequently been published (Canadian National Breast Screening Study, Miller AB, 2014). That trial is one of the trials judged by reviewers to be of high quality, and has consistently found no significant reduction in deaths attributed to breast cancer. It is the trial with results least favorable to mammography included in these meta-analyses.

      A further systematic review has looked at the question of non-breast cancer mortality in breast cancer screening trials (Erpeldinger S, 2013). The screening trials were not designed to answer this question. These authors conclude that the trials show neither a decrease nor an increase in non-breast cancer mortality associated with screening.

      The review for the USPSTF identified two systematic reviews relevant to the question of psychological harm from breast screening with mammography (Brewer NT, 2007, Brett J, 2005). The reviewers concluded that false-positives are associated with distress, but that no consistent effect on anxiety and depression has been shown for screening with mammography. A more recent systematic review has also looked at the impact of false-positive mammogram results, coming to similar conclusions (Bond M, 2013).

      Marmot pointed out that the members of the UK Independent Panel were chosen both for expertise and not having previously published on the subject, to minimize the risk of a biased approach to analyzing and interpreting evidence (Marmot MG, 2013). The USPSTF commissioned the independent Agency for Health Care Research and Quality (AHRQ) to conduct the review used for its decision-making (Nelson HD, 2009).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 29, Hilda Bastian commented:

      This systematic review identifies serious publication bias, along with small poor quality trials, as contributing to the over-estimation of benefit of viscosupplementation for osteoarthritis of the knee by some other groups (including Bellamy N, 2006). Rutjes and colleagues found that none of the multiple previous systematic reviews on the subject had included all the trial evidence available at the time.

      Relying primarily on the larger, better quality studies, the authors conclude that these intra-articular injections have an effect on pain that is not clinically relevant and no significant effect on function over time, but are associated with serious unexplained adverse events. It is not clear what effect long-term use of the intervention has on the risk of serious adverse events. Rutjes and colleagues discourage use of the intervention.

      An analysis of this systematic review in DARE goes into detail about the methods and data in this review. That assessment suggests that the pooling of baseline and end of treatment effects introduces minor uncertainty around the review's results.

      Some other systematic reviews had also failed to identify major clinical benefit from viscosupplementation of the knee (including Lo GH, 2003 and Samson DJ, 2007). Rutjes and colleagues conclude that an individual patient data meta-analysis would be required to clarify questions about serious adverse events.

      UPDATE: Bannuru RR, 2015 subsequently published an extensive network meta-analysis, which created a network for comparison that included intra-articular (IA) injection, IA placebo, and oral placebo. While their outcome for IA hyaluronic acid is similar to that in this analysis by Rutjes and colleagues, they identified a clinically relevant difference attributable to IA injection.

      (I discussed the implications of the 2015 review in a February 2015 blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Jun 15, Hilda Bastian commented:

      Assessment of the methodological quality of primary studies plays a central role in interpreting bodies of evidence. This evaluation of an extensively used method is of critical importance for systematic reviewing of clinical effectiveness research.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 11, Hilda Bastian commented:

      This review could not take into account lifestyle factors that often accompany healthier diets and lower risk of cancer, such as not smoking. Studies like those analyzed here probably aren't enough to establish that a nutrient can prevent disease: Moorthy D, 2013. In the case of fiber and colorectal cancer, a systematic review of randomized trials (Asano T, 2002) did not find a reduction of colorectal cancer either from fiber supplements or from dietary intake as in, for example, the large National Cancer Institute trial: Schatzkin A, 2000. This trial evidence is not discussed in this review by Aune and colleagues. Anyone interested in this subject would be better off starting with that systematic review of trials and the trials on fiber and resistant starch published since then: Ishikawa H, 2005, Burn J, 2011. Further discussion here.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Mar 09, Hilda Bastian commented:

      This is an excellent and important review, and its conclusions are likely to generally still be valid. The authors' reply to a 2011 comment about missing trials by Steven Woloshin and Lisa Schwartz indicates that a particularly key trial subsequent to the 2007 search is missing (Waters EA, 2006; Cuite CL, 2008; Woloshin S, 2011). There are likely to be many more. (Several are included in a post of mine on using NNTs, at PLOS Blogs.)

      The relatively small amount of data in this review on some comparisons is, I believe, becoming a problem as the conclusions of the authors are being too readily dismissed. If the update of this review is not likely to be soon, it may be useful to add a comment about the currency of the review, and highlight studies awaiting assessment to counteract the current impression.

      It would also be useful if the authors could clarify in their update the status of the "additional results" in Appendix 4. As these trials are also listed as excluded from the review, it is a little confusing. Indeed, at least some of those studies do seem to be eligible: the time-to-event measure, for example.

      Although I can understand the reasoning for including medical students as lay people rather than health professionals, I think that is potentially problematic, and they will require separate analysis as the quantity of such studies grows.

      I look forward to the update of this important review. Thanks!


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Oct 12, Stephen Ceci commented:

      Below, Hilda Bastian criticizes our 2011 article in the Proceedings of the National Academy of Sciences. The criticisms reflect a simplistic rendering of the rich data landscape on women in academic science. Our conclusion was valid in 2011, and since then new scholarship has continued to support it. Below is an abbreviated response to Bastian’s claims; a somewhat longer account can be found at: http://www.human.cornell.edu/hd/ciws/publications.cfm

      Claim 1: Our work failed to represent all research on the topic. This criticism does not take into account the quality of the research and the need to use judgment on study inclusion. Rather than calculate mean effect sizes based on all published studies, it is important to down-weight ones that have been refuted or supplanted. We did this in our narrative review in 2011. Nothing we wrote has changed, and the intervening research has reinforced our conclusion of gender-neutrality in journal reviews, grant reviews, and tenure-track hiring. For example, Marcia McNutt, editor of Science, wrote "there was some good news from a panel representing major journals…such as the American Chemical Society (ACS) and the American Geophysical Union (AGU)…female authors are published either at a rate proportional to that at which they submit to those journals, or at proportionally higher rates, as compared with their male colleagues." (McNutt, 2016, p. 1035) This may surprise those who read claims that women were selected as reviewers less often than their fraction of the submission pool, but it is true: women’s acceptance rates were, if anything, in excess of men’s. This is not cherry-picking, nor can it be erased by aberrations. These are large-scale analyses of acceptance rates of major journals, and they show the landscape is either gender-fair or women actually have an advantage - in contrast to what Dr. Bastian alleges.

      The same is true of funding. To illustrate why it is important to move beyond factoring all the studies into a mean effect size, we offer three examples at http://www.human.cornell.edu/hd/ciws/publications.cfm One is Bornmann et al.’s finding of gender bias in funding, based on a large sample of grant applications. However, Marsh et al. reanalyzed these findings using a multilevel measurement model and arrived at a different conclusion. Bornmann himself was a coauthor on the Marsh et al. publication and agreed that the new finding of gender-neutrality supplanted his earlier one of gender bias. Marsh et al. found that the mean of the weighted effect sizes based on the 353,725 applicants was actually +.02 - in favor of women! (see p. 1301): "The most important result of our study is that for grant applications that include disciplines across the higher education community, there is no evidence for any gender effects in favor of men, and even some evidence in favor of women…This lack of gender difference for grant proposals is very robust, as indicated by the lack of study-to-study variation in the results (nonsignificant tests of heterogeneity) and the lack of interaction effects. This non effect of gender generalized across discipline, the different countries (and funding agencies) considered here, and the publication year.” (p. 1311) Marsh, Bornmann, et al. (2009) (DOI: 10.3102/0034654309334143)

      The rest of our paper concerned hiring and journal publishing. We stand by our conclusion in these two domains as well, as the scientific literature since then has supported us. We do not have time or space here to describe in detail the evidence for this assertion, but the interested reader can find much of it in our over 200 analyses (http://psi.sagepub.com/content/15/3/75.abstract?patientinform-links=yes&legid=sppsi;15/3/75 DOI:10.1177/1529100614541236). Unsurprisingly, the PNAS reviewers were knowledgeable about these domains and agreed with our conclusion. It is incumbent on anyone arguing otherwise to subject their evidence to peer review and show how it overturns our conclusion. Does our claim that gender bias in hiring and publishing lacks support mean there are no gender barriers? Of course not; we have written frequently about them: we have discussed an article that Bastian appears to believe we are unaware of, showing differences in letters of recommendation written for women and men. And we have written about other barriers facing women scientists, such as their teaching ratings being downgraded and their lower tenure rates in biology and psychology. However, we stand by our claim that the domains of hiring, funding, and publications are largely gender-neutral. Unless peer reviewers who are experts in this area agree there is compelling counter-evidence, we believe our conclusion reflects the best scientific evidence.

      Claim 2: We failed to specify what we meant by “women”. Bastian points out differences between women of color, class, etc. We agree these are potentially important moderating factors and we applaud researchers who report their data broken down this way. But the literature on peer review, funding, and hiring rarely reports differences by ethnicity, class, or sexual orientation. Most of the few studies to do so emerged after our study was published.

      Claim 3: Bastian criticized us for not taking into consideration the size and trajectory of fields, suggesting those with large numbers of scholars may overwhelm smaller ones, or that the temporal trajectory of some fields is ahead of others. Field-specific gender differences are a valid consideration, but in funding they have been small or non-existent according to several large-scale analyses. Jayasinghe et al.’s (2004) comprehensive analysis of gender effects in reviews of grant proposals (10,023 reviews by 6,233 external assessors of 2,331 proposals from 9 different disciplines) found no gender unfairness in any discipline, nor any discipline x gender interaction. If anyone has compelling evidence of disciplinary bias against women authors and PIs, they should submit it and allow the peer review process to judge how compelling it is. As for differences among fields in their trajectories, we have done extensive analyses on this, which can be found at the same site above. In these analyses we examined temporal changes in 8 disciplines in salary, tenure, promotion, satisfaction, productivity, impact, etc. With some exceptions we alluded to above, the picture was mainly gender-fair.

      Finally, Bastian raises analytic issues. We agree these are central. This is why we minimized small-scale, poorly-analyzed reports. We gave more attention to large journals and grant agencies that allowed multilevel models, instead of or in addition to fixed- and random-effects analyses that sometimes violated fundamental statistical assumptions. Both fixed-effect and random-effects models have limitations. (The latter assumes features of the studies themselves contribute to variability in effect sizes independent of random sampling error, whereas multilevel models permit multiple outcomes to be included without violating statistical assumptions such as the independence of effect sizes from the same study due to using the same funding agency or multiple disciplines within the same funding agency.) Mean effect sizes are not the analytic endpoint when there is systematic variation among studies beyond that accounted for by sampling variability, which is omnipresent in these studies; it is important to determine which study characteristics account for study-to-study variation. In the past, some have cherry-picked aberrations to support claims of bias, and our 2011 report went beyond doing this to situate claims amidst large-scale, well-analyzed studies, minimizing problematic studies. Although women scientists continue to face challenges that we have written about elsewhere, these challenges are not in the three domains of tenure-track hiring, funding, and publishing.

      Steve Ceci and Wendy M. Williams


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Sep 06, Hilda Bastian commented:

      The conclusions of this review are not supported by the findings of the studies included in it, and much of the evidence cited contradicts the authors’ conclusions. The review suffers from extensive methodological weaknesses, particularly study selection bias and selective reporting. Out of hundreds of studies that were likely to be eligible in the 3 main areas they address (Dehdarirad, 2015), they include only 35. It is not a review of 20 years of data: it is a review based on selected data from the last 20 years. The basis for that selection is not reported.

      Their description of the results of these studies includes, in my opinion, severe levels of 2 key types of review spin (Yavchitz A, 2016): misleading reporting and misleading interpretation. The review contains numerous errors in key issues such as reporting numbers and the methodology of studies. Conclusions about the quality of some evidence are drawn by the authors, but the basis for these judgments is unclear and no methodical process for assessing quality is reported or evident.

      The 3 main areas covered by the review – journal publications, grant applications, and hiring – are also at high risk of publication bias, which is not addressed by the review. Discrimination against women is the subject of legislation in most, if not all, the countries in which these studies were done. Journals, funding agencies, and academic institutions may not be enthusiastic about broadcasting evidence of gender bias.

      For example, of the many thousands of science journals published in 2011, only 6 studies are cited, conducted in 8 to 13 journals in 2 areas of science. In one of those, the author approached 24 journals: only 5 agreed to participate (Tregenza, 2002).

      Ceci and Williams conclude that only 4 of the 35 unique studies they cited suggest the possibility of some gender bias. However, in my opinion an additional 7 studies clearly concluded gender bias remained a problem needing consideration, and others found signs suggesting bias may have been present. Altogether, in 19 studies (54%), there is either selective reporting and descriptions that spin study results in the direction of this review’s conclusions, or inaccurate reporting that could affect the weight placed on the evidence by a knowledgeable reader.

      I identified no instance of spin that did not favor the authors’ conclusions. Some of the studies referenced did not address the questions for which they were cited. Several are short reports in letters, 1 relies on a press release, and another is a news report of a talk.

      Variations in disciplines are not adequately addressed. The authors concentrate on time periods as critical, but the evidence shows that not all disciplines have reached the same level of development in relation to gender participation. Issues related to international differences, and different experiences for groups of women who may experience additional discrimination are not addressed. Although the conclusions are universally framed, they do not address women in science outside academia.

      The authors address only 3 possible explanations for women’s underrepresentation in science: discrimination, women’s choices and preferences (especially relating to motherhood), and gender differences in mathematics ability. They argue that only women’s choices, particularly in relation to family, are a big enough factor to explain women’s underrepresentation. What is arguably the dominant hypothesis in the field is not addressed: that men are overrepresented in science because of cumulative advantage. Advantages do not have to be large individually, to contribute to the end result of underrepresentation in elite institutions and positions. (I have also commented on another paper in which they advance their hypothesis about motherhood and women scientists (Williams WM, 2012) - link to comment.)

      In addition, they do not address the full range of issues within the 3 areas they consider. For example, in grants and hiring, they do not address analyses of potential bias in letters of recommendation (e.g. Van Den Brink, 2006, Schmader T, 2007).

      In my opinion, this review is irredeemably flawed and should be retracted.

      My methodological critique and individual notes on studies are included at my blog.

      Disclosures: I work at the National Institutes of Health (NIH), but not in the granting or women in science policy spheres. The views I express are personal, and do not necessarily reflect those of the NIH. I am an academic editor at PLOS Medicine and on the human ethics advisory group for PLOS One. I am undertaking research in various aspects of publication ethics.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Sep 16, Hilda Bastian commented:

      I have posted updated data on key charts here. The data are updated to 2013 for the charts with systematic reviews and 2014 for trials.

      If you are interested in this topic, the Page MJ, 2016 paper is a must read. We relied in large part on filters to chart trends: Page and colleagues rigorously studied a month's worth of systematic reviews from 2014.

      Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), including some aspects that relate to the inclusion of systematic reviews in PubMed.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2013 Nov 13, Hilda Bastian commented:

      Yes, that's an important point to keep in mind: this is just indicative of trends, not a way to find reviews that are rigorously systematic. The methods we used based on the Montori filter are detailed in the supporting information for the article here.

      Filters are not the only way to identify systematic reviews via PubMed services. There is information on ways of finding curated systematic reviews at PubMed Health as well as via PubMed here. (Disclosure: I work on the PubMed Health project.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2013 Nov 12, Hilda Bastian commented:

      We haven't updated the data on trials yet, but will. Trials are subject to some influences that differ from those on systematic reviews, such as the impact in recent years of trial registration on the proportion of conducted trials that are reported. Yes, the relationship between the two would be interesting to understand.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    4. On 2013 Nov 01, Hilda Bastian commented:

      Our estimate of "11 systematic reviews a day" is out of date. There has been a striking increase in systematic reviews since that original analysis (which ended with data from 2007). In August 2013, Paul Glasziou and I updated the data in Figure 3, which estimates the number of systematic reviews, to include data up to 2012.

      The update showed that by 2012, there were around 26 systematic reviews a day. The updated figure is available here. Addressing the challenges we identified in keeping up with the evidence has thus become more critical.

      (We are grateful to Claire Allen at the Cochrane Collaboration for providing the data on the number of Cochrane reviews.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Dec 17, Iain Chalmers commented:

      We agree with Hilda Bastian that poor recruitment leads to waste in research, and work to reduce barriers and improve recruitment is needed. We point this out in the book we co-authored for the public - Testing Treatments, http://en.testingtreatments.org/book/what-can-we-do-to-improve-tests-of-treatments/regulating-tests-of-treatments-help-or-hindrance/do-regulatory-systems-for-testing-treatments-get-it-right/. We wrote "And for researchers planning clinical trials, it can take several years to get from a trial idea to recruiting the first patient, and even then recruitment to trials can be slowed by regulatory requirements. But while researchers try to get studies through the system, people suffer unnecessarily and lives are being lost."

      These same barriers also act to inhibit even considering attempts to undertake trials to address uncertainties, with the result that "clinicians are discouraged from assessing treatments fairly, and instead can continue to prescribe treatments without committing to addressing any uncertainty about them."

      As Hilda rightly concludes, "the clinical trial project still has a lot of basic education to do". But informed recruitment to and retention in clinical trials will depend on far greater general knowledge about why it is important to address uncertainties about the effects of treatments, the adverse effects of failing to address uncertainties, and how uncertainties should be addressed. This implies responsibility for the educational challenge being taken up by educators way beyond "the clinical trials project" (see www.informedhealthchoices.org).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2017 Dec 02, Hilda Bastian commented:

      Another major area of research waste is the high rate of trials abandoned for poor recruitment. Briel M, 2017 suggests that about a quarter of all trials in Switzerland are stopped, generally because of poor recruitment. A study of phase II and III trials closed in 2011 in ClinicalTrials.gov found that 19% "either terminated for failed accrual or completed with less than 85% expected enrolment, seriously compromising their statistical power" (Carlisle B, 2015).

      Bower P, 2014 point to the need to develop more effective methods to increase recruitment and retention of participants. That is critical. We still don't know how to prevent all the waste associated with poor recruitment to clinical trials. However, the Swiss study of stakeholders makes it clear that there are serious inadequacies in coordination and preparedness for many trials (Briel M, 2017). The authors point to clear areas of responsibility for funders of trials and others. The NIHR's 70-day rule, a benchmark for the time to recruitment of the first patient, is one example of a funder trying to reduce this area of waste (NIHR).

      Briel M, 2017 also point to the contribution public negativity about clinical trials makes to poor recruitment. That is a problem for clinicians as well, and, too often, members of IRBs/research ethics committees. In every direction, the clinical trial project still has a lot of basic education to do.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 May 13, Hilda Bastian commented:

      This review contributed analysis that has been critical to the debate about editorial peer review. However, the last search for studies was in June 2004. As the eligible literature was sparse, even a few good studies can shift the picture on some questions.

      It seems to me likely that the number of relevant studies published in the last decade is now substantial. I include below studies that would be relevant to an update, particularly on the question of blinding/masking of authors' and/or peer reviewers' identities and affiliations, and on open publication of peer review reports. A recent systematic review on training is also relevant (Galipeau J, 2015).

      There are some issues that I believe would be helpful for a new/updated review on editorial peer review to address:

      (1) The scope of this review does not include potential sources of editorial/reviewer bias, in particular that related to racial background, gender, country of residence, and institutional prestige. The objective of the review is "to estimate the effect of processes in editorial peer review" and its key focus is the quality of published articles. However, the degree to which these types of biases are minimized in the scientific editorial process has important bearing on the fairness of the processes, as well as the overall quality of literature that may get the most attention in a field.

      "Soundness of ethics" is one of the outcome measures of concern, including the avoidance of harm to research subjects. I believe avoidance of harm to authors, who are in a subordinate power relationship in the editorial process, is also a matter of ethics. Publishing mediocre papers from some groups preferentially over higher quality submissions from others, would patently undermine both the fairness and the value of the peer review process at a journal. That may have the power to influence career progress.

      Systematic reviews should also point to key areas for further research. The lack of studies into methods to reduce editors’ biases is an important gap to point out, as so much of the literature is concerned primarily with peer reviewers’ bias.

      (2) This review did not report on the methods used a priori to systematically assess the risk of bias of included studies, a critical omission in reporting the results of a systematic review (see Oxman AD, 1991, Moher D, 1999, and Liberati A, 2009). A wide variety of study types are eligible for inclusion, raising particular issues specific to them. And studies in this field have a range of specific potential biases. It would be helpful if the experience gained in this review led to an explicit set of criteria for assessing the risk of bias of included studies.

      (3) The review is restricted to biomedical science. Given the similarities in editorial processes and challenges across scientific disciplines, I believe a systematic review without this restriction would be more valuable, even if the search strategy may have more limitations.

      Jefferson T, 2007 included studies with designs that were experimental and other comparative studies that included an attempt to control for confounding. I identified the following additional studies of blinding authors/peer reviewers or publishing peer review reports, that I think need to be considered by reviewers on these questions:

      Biomedical science

      In addition, Hopewell S, 2014, while addressing another objective in relation to the impact of peer review, was conducted on published pre-publication peer reviews and subsequent manuscript versions.

      Non-biomedical sciences

      I have written more about the evidence base on anonymity and openness in peer review in this blog post.

      Finally, a trial of blinding critical appraisers of clinical trials in the context of systematic reviewing was included in this systematic review (Jadad AR, 1996). That is not the context of peer reviewing for publication of those trials. (As it only involves 7 reviewers, including it or not has little effect on overall conclusions on this body of evidence.)

      (Disclosure: Part of my job includes working on PubMed Commons, which does not allow anonymous commenting.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 29, Hilda Bastian commented:

      The results of this systematic review are contradicted by a more recent and more comprehensive analysis of the methodologically more rigorous trials of viscosupplementation of the knee (Rutjes AW, 2012). In that more recent review, Rutjes and colleagues identify significant publication bias as a contributing factor in previous over-estimations of the benefit of intra-articular injections.

      Rutjes and colleagues conclude that the intervention has only a clinically irrelevant benefit on pain and no statistically significant effect on function, and is associated with serious unexplained adverse events. They discourage the use of the intervention and suggest that an individual patient data meta-analysis would be needed to explore the issue of adverse events.

      Doubts about the potential for viscosupplementation of the knee to do more good than harm were also expressed in another systematic review published after that by Bellamy and colleagues (Samson DJ, 2007).

      UPDATE: A network meta-analysis by Bannuru RR, 2015 was able to analyze intra-articular injections, intra-articular placebos, and oral placebos, as well as a range of active treatments. It found a clinically meaningful benefit from intra-articular hyaluronic acid injections, in large part attributable to the effects of intra-articular injections per se.

      (I discussed the implications of the 2015 review in a February 2015 blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 23, Hilda Bastian commented:

      The conclusions of this paper (Ioannidis JP, 2005) were challenged by Goodman and Greenfield in 2007 (and responded to by Ioannidis JP, 2007). They were also challenged by Jager and Leek (Jager LR, 2014). Those authors conclude, using a different analytical approach (false discovery rate), that the literature reliably charts scientific progress. Ioannidis then responds to this discussion, challenging the data and analytical approach here: (Ioannidis JP, 2014). (I discuss this debate in a blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 21, Hilda Bastian commented:

      The two ongoing studies discussed in this review have since been published. They are Ishikawa H, 2005 and Burn J, 2011, as discussed in this blog post.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 21, Hilda Bastian commented:

      The predominant weight for safety concerns about single-session debriefing in this systematic review is carried by a single trial, at high risk of bias. See discussion in my comment here: Bisson JI, 1997. The method of imputation of adverse events in this trial does not appear to be described in this review.

      A critical quality assessment of that trial was in part made by a systematic review author who was also an author of the trial. A more recent systematic review conducted by others (Gartlehner G, 2013), found the evidence on this intervention to be generally weak. There is further discussion in this blog post.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

  2. Feb 2018
    1. On 2013 Oct 21, Hilda Bastian commented:

      The predominant weight for safety concerns about single-session debriefing in this systematic review is carried by a single trial, at high risk of bias. See discussion in my comment here: Bisson JI, 1997. The method of imputation of adverse events in this trial does not appear to be described in this review.

      A critical quality assessment of that trial was in part made by a systematic review author who was also an author of the trial. A more recent systematic review conducted by others (Gartlehner G, 2013), found the evidence on this intervention to be generally weak. There is further discussion in this blog post.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 21, Hilda Bastian commented:

      The two ongoing studies discussed in this review have since been published. They are Ishikawa H, 2005 and Burn J, 2011, as discussed in this blog post.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 23, Hilda Bastian commented:

      The conclusions of this paper (Ioannidis JP, 2005) were challenged by Goodman and Greenfield in 2007 (and responded to by Ioannidis JP, 2007). They were also challenged by Jager and Leek (Jager LR, 2014). Those authors conclude, using a different analytical approach (false discovery rate), that the literature reliably charts scientific progress. Ioannidis then responds to this discussion, challenging the data and analytical approach here: (Ioannidis JP, 2014). (I discuss this debate in a blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 29, Hilda Bastian commented:

      The results of this systematic review are contradicted by a more recent and more comprehensive analysis of the methodologically more rigorous trials of viscosupplementation of the knee (Rutjes AW, 2012). In that more recent review, Rutjes and colleagues identify significant publication bias as a contributing factor in previous over-estimations of the benefit of intra-articular injections.

      Rutjes and colleagues conclude that the intervention has only a clinically irrelevant benefit on pain and no statistically significant effect on function, and is associated with serious unexplained adverse events. They discourage the use of the intervention and suggest that an individual patient data meta-analysis would be needed to explore the issue of adverse events.

      Doubts about the potential for viscosupplementation of the knee to do more good than harm were also expressed in another systematic review published after that by Bellamy and colleagues (Samson DJ, 2007).

      UPDATE: A network meta-analysis by Bannuru RR, 2015 was able to analyze intra-articular injections, intra-articular placebos, and oral placebos, as well as a range of active treatments. It found a clinically meaningful benefit from intra-articular hyaluronic acid injections, in large part attributable to the effects of intra-articular injections per se.

      (I discussed the implications of the 2015 review in a February 2015 blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 May 13, Hilda Bastian commented:

      This review contributed analysis that has been critical to the debate about editorial peer review. However, the last search for studies was in June 2004. As the eligible literature was sparse, even a few good studies can shift the picture on some questions.

      It seems to me likely that the number of relevant studies published in the last decade is now substantial. I include below studies that would be relevant to an update, particularly on the question of blinding/masking of authors' and/or peer reviewers' identities and affiliations, and on open publication of peer review reports. A recent systematic review on training is also relevant (Galipeau J, 2015).

      There are some issues that I believe would be helpful for a new/updated review on editorial peer review to address:

      (1) The scope of this review does not include potential sources of editorial/reviewer bias, in particular that related to racial background, gender, country of residence, and institutional prestige. The objective of the review is "to estimate the effect of processes in editorial peer review" and its key focus is the quality of published articles. However, the degree to which these types of biases are minimized in the scientific editorial process has important bearing on the fairness of the processes, as well as the overall quality of literature that may get the most attention in a field.

      "Soundness of ethics" is one of the outcome measures of concern, including the avoidance of harm to research subjects. I believe avoidance of harm to authors, who are in a subordinate power relationship in the editorial process, is also a matter of ethics. Publishing mediocre papers from some groups preferentially over higher quality submissions from others, would patently undermine both the fairness and the value of the peer review process at a journal. That may have the power to influence career progress.

      Systematic reviews should also point to key areas for further research. The lack of studies into methods to reduce editors’ biases is an important gap to point out, as so much of the literature is concerned primarily with peer reviewers’ bias.

      (2) This review did not report on the methods used a priori to systematically assess the risk of bias of included studies, a critical omission in reporting the results of a systematic review (see Oxman AD, 1991, Moher D, 1999, and Liberati A, 2009). A wide variety of study types are eligible for inclusion, raising particular issues specific to them. And studies in this field have a range of specific potential biases. It would be helpful if the experience gained in this review led to an explicit set of criteria for assessing the risk of bias of included studies.

      (3) The review is restricted to biomedical science. Given the similarities in editorial processes and challenges across scientific disciplines, I believe a systematic review without this restriction would be more valuable, even if the search strategy may have more limitations.

      Jefferson T, 2007 included studies with designs that were experimental and other comparative studies that included an attempt to control for confounding. I identified the following additional studies of blinding authors/peer reviewers or publishing peer review reports, that I think need to be considered by reviewers on these questions:

      Biomedical science

      In addition, Hopewell S, 2014, while addressing another objective in relation to the impact of peer review, was conducted on published pre-publication peer reviews and subsequent manuscript versions.

      Non-biomedical sciences

      I have written more about the evidence base on anonymity and openness in peer review in this blog post.

      Finally, a trial of blinding critical appraisers of clinical trials in the context of systematic reviewing was included in this systematic review (Jadad AR, 1996). That is not the context of peer reviewing for publication of those trials. (As it only involves 7 reviewers, including it or not has little effect on overall conclusions on this body of evidence.)

      (Disclosure: Part of my job includes working on PubMed Commons, which does not allow anonymous commenting.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Dec 02, Hilda Bastian commented:

      Another major area of research waste is the high rate of trials abandoned for poor recruitment. Briel M, 2017 suggests that about a quarter of all trials in Switzerland are stopped, generally because of poor recruitment. A study of phase II and III trials closed in 2011 in ClinicalTrials.gov found that 19% "either terminated for failed accrual or completed with less than 85% expected enrolment, seriously compromising their statistical power" (Carlisle B, 2015).

      Bower P, 2014 point to the need to develop more effective methods to increase recruitment and retention of participants. That is critical. We still don't know how to prevent all the waste associated with poor recruitment to clinical trials. However, the Swiss study of stakeholders makes it clear that there are serious inadequacies in coordination and preparedness for many trials (Briel M, 2017). The authors point to clear areas of responsibility for funders of trials and others. The NIHR's 70-day rule, a benchmark for the time to recruitment of the first patient, is one example of a funder trying to reduce this area of waste (NIHR).

      Briel M, 2017 also point to the contribution public negativity about clinical trials makes to poor recruitment. That is a problem for clinicians as well, and, too often, members of IRBs/research ethics committees. In every direction, the clinical trial project still has a lot of basic education to do.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2017 Dec 17, Iain Chalmers commented:

      We agree with Hilda Bastian that poor recruitment leads to waste in research, and work to reduce barriers and improve recruitment is needed. We point this out in the book we co-authored for the public - Testing Treatments, http://en.testingtreatments.org/book/what-can-we-do-to-improve-tests-of-treatments/regulating-tests-of-treatments-help-or-hindrance/do-regulatory-systems-for-testing-treatments-get-it-right/. We wrote "And for researchers planning clinical trials, it can take several years to get from a trial idea to recruiting the first patient, and even then recruitment to trials can be slowed by regulatory requirements. But while researchers try to get studies through the system, people suffer unnecessarily and lives are being lost."

      These same barriers also act to inhibit even considering attempts to undertake trials to address uncertainties, with the result that "clinicians are discouraged from assessing treatments fairly, and instead can continue to prescribe treatments without committing to addressing any uncertainty about them."

      As Hilda rightly concludes, "the clinical trial project still has a lot of basic education to do". But informed recruitment to and retention in clinical trials will depend on far greater general knowledge about why it is important to address uncertainties about the effects of treatments, the adverse effects of failing to address uncertainties, and how uncertainties should be addressed. This implies responsibility for the educational challenge being taken up by educators way beyond "the clinical trials project" (see www.informedhealthchoices.org).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Nov 01, Hilda Bastian commented:

      Our estimate of "11 systematic reviews a day" is out of date. There has been a striking increase in systematic reviews since that original analysis (which ended with data from 2007). In August 2013, Paul Glasziou and I updated the data in Figure 3, which estimates the number of systematic reviews, to include data up to 2012.

      The update showed that by 2012, there were around 26 systematic reviews a day. The updated figure is available here. Addressing the challenges we identified in keeping up with the evidence has thus become more critical.

      (We are grateful to Claire Allen at the Cochrane Collaboration for providing the data on the number of Cochrane reviews.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Sep 16, Hilda Bastian commented:

      I have posted updated data on key charts here. The data are updated to 2013 for the charts with systematic reviews and 2014 for trials.

      If you are interested in this topic, the Page MJ, 2016 paper is a must read. We relied in large part on filters to chart trends: Page and colleagues rigorously studied a month's worth of systematic reviews from 2014.

      Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), including some aspects that relate to the inclusion of systematic reviews in PubMed.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Sep 06, Hilda Bastian commented:

      The conclusions of this review are not supported by the findings of the studies included in it, and much of the evidence cited contradicts the authors’ conclusions. The review suffers from extensive methodological weaknesses, particularly study selection bias and selective reporting. Out of hundreds of studies that were likely to be eligible in the 3 main areas they address (Dehdarirad, 2015), they include only 35. It is not a review of 20 years of data: it is a review based on selected data from the last 20 years. The basis for that selection is not reported.

      Their description of the results of these studies includes, in my opinion, severe levels of 2 key types of review spin (Yavchitz A, 2016): misleading reporting and misleading interpretation. The review contains numerous errors in key issues such as reporting numbers and the methodology of studies. Conclusions about the quality of some evidence are drawn by the authors, but the basis for these judgments is unclear and no methodical process for assessing quality is reported or evident.

      The 3 main areas covered by the review – journal publications, grant applications, and hiring – are also at high risk of publication bias, which is not addressed by the review. Discrimination against women is the subject of legislation in most, if not all, the countries in which these studies were done. Journals, funding agencies, and academic institutions may not be enthusiastic about broadcasting evidence of gender bias.

      For example, of the many thousands of science journals published in 2011, only 6 studies are cited, conducted in 8 to 13 journals in 2 areas of science. In one of those, the author approached 24 journals: only 5 agreed to participate (Tregenza, 2002).

      Ceci and Williams conclude that only 4 of the 35 unique studies they cited suggest the possibility of some gender bias. However, in my opinion, an additional 7 studies clearly concluded that gender bias remained a problem needing consideration, and others found signs suggesting bias may have been present. Altogether, 19 studies (54%) show either selective reporting, with descriptions that spin study results in the direction of this review’s conclusions, or inaccurate reporting that could affect the weight a knowledgeable reader would place on the evidence.

      I identified no instance of spin that did not favor the authors’ conclusions. Some of the studies referenced did not address the questions for which they were cited. Several are short reports in letters, 1 relies on a press release, and another is a news report of a talk.

      Variation between disciplines is not adequately addressed. The authors treat time periods as critical, but the evidence shows that not all disciplines have reached the same level of development in relation to gender participation. Issues related to international differences, and to the different experiences of groups of women who may face additional discrimination, are not addressed. Although the conclusions are universally framed, they do not address women in science outside academia.

      The authors address only 3 possible explanations for women’s underrepresentation in science: discrimination, women’s choices and preferences (especially relating to motherhood), and gender differences in mathematics ability. They argue that only women’s choices, particularly in relation to family, are a big enough factor to explain women’s underrepresentation. What is arguably the dominant hypothesis in the field is not addressed: that men are overrepresented in science because of cumulative advantage. Advantages do not have to be large individually to contribute to the end result of underrepresentation in elite institutions and positions. (I have also commented on another paper in which they advance their hypothesis about motherhood and women scientists (Williams WM, 2012) - link to comment.)

      In addition, they do not address the full range of issues within the 3 areas they consider. For example, in grants and hiring, they do not address analyses of potential bias in letters of recommendation (e.g. Van Den Brink, 2006, Schmader T, 2007).

      In my opinion, this review is irredeemably flawed and should be retracted.

      My methodological critique and individual notes on studies are included at my blog.

      Disclosures: I work at the National Institutes of Health (NIH), but not in the granting or women in science policy spheres. The views I express are personal, and do not necessarily reflect those of the NIH. I am an academic editor at PLOS Medicine and on the human ethics advisory group for PLOS One. I am undertaking research in various aspects of publication ethics.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Oct 12, Stephen Ceci commented:

      Below, Hilda Bastian criticizes our 2011 article in the Proceedings of the National Academy of Sciences. The criticisms reflect a simplistic rendering of the rich data landscape on women in academic science. Our conclusion was valid in 2011, and since then new scholarship has continued to support it. Below is an abbreviated response to Bastian’s claims; a somewhat longer account can be found at: http://www.human.cornell.edu/hd/ciws/publications.cfm

      Claim 1: Our work failed to represent all research on the topic. This criticism does not take into account the quality of the research and the need to use judgment on study inclusion. Rather than calculate mean effect sizes based on all published studies, it is important to down-weight ones that have been refuted or supplanted. We did this in our narrative review in 2011. Nothing we wrote has changed, and the intervening research has reinforced our conclusion of gender-neutrality in journal reviews, grant reviews, and tenure-track hiring. For example, Marcia McNutt, editor of Science, wrote "there was some good news from a panel representing major journals…such as the American Chemical Society (ACS) and the American Geophysical Union (AGU)…female authors are published either at a rate proportional to that at which they submit to those journals, or at proportionally higher rates, as compared with their male colleagues." (McNutt, 2016, p. 1035) This may surprise those who read claims that women were selected as reviewers less often than their fraction of the submission pool, but it is true: women’s acceptance rates were, if anything, in excess of men’s. This is not cherry-picking, nor can it be erased by aberrations. These are large-scale analyses of acceptance rates at major journals, and they show the landscape is either gender-fair or women actually have an advantage, in contrast to what Dr. Bastian alleges.

      The same is true of funding. To illustrate why it is important to move beyond factoring all the studies into a mean effect size, we offer three examples at http://www.human.cornell.edu/hd/ciws/publications.cfm. One is Bornmann et al.’s finding of gender bias in funding, based on a large sample of grant applications. However, Marsh et al. reanalyzed these findings using a multilevel measurement model and arrived at a different conclusion. Bornmann himself was a coauthor on the Marsh et al. publication and agreed that the new finding of gender-neutrality supplanted his earlier one of gender bias. Marsh et al. found that the mean of the weighted effect sizes based on the 353,725 applicants was actually +.02--in favor of women! (see p. 1301): "The most important result of our study is that for grant applications that include disciplines across the higher education community, there is no evidence for any gender effects in favor of men, and even some evidence in favor of women…This lack of gender difference for grant proposals is very robust, as indicated by the lack of study-to-study variation in the results (nonsignificant tests of heterogeneity) and the lack of interaction effects. This non effect of gender generalized across discipline, the different countries (and funding agencies) considered here, and the publication year." (p. 1311) (Marsh, Bornmann, et al., 2009; DOI: 10.3102/0034654309334143)

      The rest of our paper concerned hiring and journal publishing. We stand by our conclusion in these two domains as well, as the scientific literature since then has supported us. We do not have time or space here to describe in detail the evidence for this assertion, but the interested reader can find much of it in our over 200 analyses (http://psi.sagepub.com/content/15/3/75.abstract?patientinform-links=yes&legid=sppsi;15/3/75 DOI:10.1177/1529100614541236). Unsurprisingly, the PNAS reviewers were knowledgeable about these domains and agreed with our conclusion. It is incumbent on anyone arguing otherwise to subject their evidence to peer review and show how it overturns our conclusion. Does our claim that gender bias in hiring and publishing lacks support mean there are no gender barriers? Of course not; we have written frequently about them: we have discussed an article that Bastian appears to believe we are unaware of, showing differences in letters of recommendation written for women and men. And we have written about other barriers facing women scientists, such as their teaching ratings being downgraded and their lower tenure rates in biology and psychology. However, we stand by our claim that the domains of hiring, funding, and publications are largely gender-neutral. Unless peer reviewers who are experts in this area agree there is compelling counter-evidence, we believe our conclusion reflects the best scientific evidence.

      Claim 2: We failed to specify what we meant by "women". Bastian points out differences between women of color, class, etc. We agree these are potentially important moderating factors, and we applaud researchers who report their data broken down this way. But the literature on peer review, funding, and hiring rarely reports differences by ethnicity, class, or sexual orientation. Most of the few studies to do so emerged after our study was published.

      Claim 3: Bastian criticized us for not taking into consideration the size and trajectory of fields, suggesting those with large numbers of scholars may overwhelm smaller ones, or that the temporal trajectory of some fields is ahead of others. Field-specific gender differences are a valid consideration, but in funding they have been small or non-existent according to several large-scale analyses. Jayasinghe et al.’s (2004) comprehensive analysis of gender effects in reviews of grant proposals (10,023 reviews by 6,233 external assessors of 2,331 proposals from 9 different disciplines) found no gender unfairness in any discipline, nor any discipline-by-gender interaction. If anyone has compelling evidence of disciplinary bias against women authors and PIs, they should submit it and allow the peer review process to judge how compelling it is. As for differences among fields in their trajectories, we have done extensive analyses on this, which can be found at the same site above. In these analyses we examined temporal changes in 8 disciplines in salary, tenure, promotion, satisfaction, productivity, impact, etc. With some exceptions we alluded to above, the picture was mainly gender-fair.

      Finally, Bastian raises analytic issues. We agree these are central. This is why we minimized small-scale, poorly-analyzed reports. We gave more attention to large journals and grant agencies that allowed multilevel models, instead of or in addition to fixed and random effects analyses that sometimes violated fundamental statistical assumptions. Both fixed effect and random-effects models have limitations. (The latter assumes features of the studies themselves contribute to variability in effect sizes independent of random sampling error, whereas multilevel models permit multiple outcomes to be included without violating statistical assumptions such as the independence of effect sizes from the same study, due to using the same funding agency or multiple disciplines within the same funding agency.) Mean effect sizes are not the analytic endpoint when there is systematic variation among studies beyond that accounted for by sampling variability, which is omnipresent in these studies; it is important to determine which study characteristics account for study-to-study variation. In the past, some have cherry-picked aberrations to support claims of bias; our 2011 report went beyond this, situating claims amidst large-scale, well-analyzed studies and minimizing problematic ones. Although women scientists continue to face challenges that we have written about elsewhere, these challenges are not in the three domains of tenure-track hiring, funding, and publishing.

      Steve Ceci and Wendy M. Williams


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Mar 09, Hilda Bastian commented:

      This is an excellent and important review, and its conclusions are likely to generally still be valid. The authors' reply to a 2011 comment about missing trials by Steven Woloshin and Lisa Schwartz indicates that a particularly key trial subsequent to the 2007 search is missing (Waters EA, 2006; Cuite CL, 2008; Woloshin S, 2011). There are likely to be many more. (Several are included in a post of mine on using NNTs, at PLOS Blogs.)

      The relatively small amount of data in this review on some comparisons is, I believe, becoming a problem as the conclusions of the authors are being too readily dismissed. If the update of this review is not likely to be soon, it may be useful to add a comment about the currency of the review, and highlight studies awaiting assessment to counteract the current impression.

      It would also be useful if the authors could clarify in their update the status of the "additional results" in Appendix 4. As these trials are also listed as excluded from the review, it is a little confusing. Indeed, at least some of those studies do seem to be eligible: the time-to-event measure, for example.

      Although I can understand the reasoning for including medical students as lay people rather than health professionals, I think that is potentially problematic, and they will require separate analysis as the quantity of data grows.

      I look forward to the update of this important review. Thanks!


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 11, Hilda Bastian commented:

      This review could not take into account lifestyle factors that often accompany healthier diets and lower risk of cancer, such as not smoking. Studies like those analyzed here probably aren't enough to establish that a nutrient can prevent disease: Moorthy D, 2013. In the case of fiber and colorectal cancer, a systematic review of randomized trials (Asano T, 2002) did not find a reduction in colorectal cancer from either fiber supplements or dietary intake (as in, for example, the large National Cancer Institute trial: Schatzkin A, 2000). This trial evidence is not discussed in this review by Aune and colleagues. Anyone interested in this subject would be better off starting with that systematic review of trials and the trials on fiber and resistant starch published since then: Ishikawa H, 2005, Burn J, 2011. Further discussion here.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Jun 15, Hilda Bastian commented:

      Assessment of the methodological quality of primary studies plays a central role in interpreting bodies of evidence. This evaluation of an extensively used method is of critical importance for systematic reviewing of clinical effectiveness research.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 29, Hilda Bastian commented:

      This systematic review identifies serious publication bias, along with small poor quality trials, as contributing to the over-estimation of benefit of viscosupplementation for osteoarthritis of the knee by some other groups (including Bellamy N, 2006). Rutjes and colleagues found that none of the multiple previous systematic reviews on the subject had included all the trial evidence available at the time.

      Relying primarily on larger, better quality studies, the authors conclude that these intra-articular injections have a clinically irrelevant effect on pain and no significant effect on function over time, but are associated with serious unexplained adverse events. It is not clear what effect long-term use of the intervention has on the risk of serious adverse events. Rutjes and colleagues discourage use of the intervention.

      An analysis of this systematic review in DARE goes into detail about the methods and data in this review. That assessment suggests that the pooling of baseline and end of treatment effects introduces minor uncertainty around the review's results.

      Some other systematic reviews had also failed to identify major clinical benefit from viscosupplementation of the knee (including Lo GH, 2003 and Samson DJ, 2007). Rutjes and colleagues conclude that an individual patient data meta-analysis would be required to clarify questions about serious adverse events.

      UPDATE: Bannuru RR, 2015 subsequently published an extensive network meta-analysis, which created a network for comparison that included intra-articular (IA) injection, IA placebo, and oral placebo. While their outcome for IA hyaluronic acid is similar to that in this analysis by Rutjes and colleagues, they identified a clinically relevant difference attributable to IA injection.

      (I discussed the implications of the 2015 review in a February 2015 blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Mar 23, Hilda Bastian commented:

      Readers of this Cochrane review (Gøtzsche PC, 2013) may also be interested in other key reviews that assess much the same body of evidence. One of these is the systematic review undertaken for the US Preventive Services Taskforce (USPSTF) (Nelson HD, 2009). Another is the review by the Independent UK Panel on Breast Cancer Screening (Independent UK Panel on Breast Cancer Screening., 2012; Marmot MG, 2013; full report). In addition, the Canadian Task Force on Preventive Care used the USPSTF review as the basis for its findings and recommendations (Canadian Task Force on Preventive Health Care., 2011).

      Update on 1 May 2014: Another review was published in JAMA in April 2014 (Pace LE, 2014). Its data on breast cancer death use the USPSTF review. The Swiss Medical Board published a review in April 2014 too: its findings and recommendations are based on interpreting the Cochrane, USPSTF and Independent UK Panel data (Takiura K, 1973). And I posted a guide to understanding mammography evidence on my blog at Scientific American.

      Update on 30 October 2014: The WHO published a review of systematic reviews of trials and observational studies, with a search date up to December 2012 (WHO, 2014). Their data interpretation is similar to that of the UK Independent Panel, and they recommend 2-yearly screening from 50 to 69 years of age, where there's a good screening program and informed decisions. Their estimates of harm are lower than those of some others, taking into account more recent practice.

      Differences in the estimates and conclusions about the effect of breast screening with mammography on breast cancer mortality between these reviews are not due to different trials being assessed. The differences principally arise from differing judgments on the strengths and limitations of individual trials, and a focus on local screening practices (which vary in terms of women's ages and whether screening is every one, two or three years).

      There were also some differences in methodologies for analyzing the data. The meta-analyses done by both the review for the USPSTF and the Independent UK Panel used random effects models, there being differences between the trials. The review for the USPSTF used a Bayesian analytic framework. The Cochrane review used a fixed effects model. A fixed effect model assumes that the effect would be consistent across trials.
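
      As a rough sketch of that distinction, in generic meta-analysis notation (my own symbols, not taken from any of these reviews), let y_i be the observed effect in trial i and v_i its within-trial variance:

      Fixed effect:     y_i = \theta + \varepsilon_i,          \varepsilon_i \sim N(0, v_i)
      Random effects:   y_i = \theta + u_i + \varepsilon_i,    u_i \sim N(0, \tau^2)

      Under the fixed effect model every trial is assumed to estimate the same true effect \theta; under the random effects model each trial's true effect varies around \theta with between-trial variance \tau^2, which generally gives smaller trials relatively more weight and widens the confidence interval around the pooled estimate.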

      The Independent UK Panel (Marmot et al) re-analyzed the trial data included in the 2011 version of the Cochrane review (with the same trials and estimates as this version). The Panel derived a comparison of the estimates of various authors, including the reviews discussed here (Independent UK Panel on Breast Cancer Screening., 2012). In order to prevent one woman's death from breast cancer, the number of women who would need to be invited for screening was estimated as:

      • Cochrane review: 2,000
      • USPSTF, for women aged 50 to 59: 1,339; for women aged 60 to 69: 377
      • Independent UK Panel, for women aged 55 to 79: 235

      In order to prevent one woman's death from breast cancer, the number of women who would need to be screened was estimated as:

      • Canadian Task Force, for women aged 50 to 69: 720
      • Independent UK Panel, for women aged 55 to 79: 180

      The Independent UK Panel estimated that about 20% of breast cancers detected by mammography screening may be over-diagnosis. They recommended screening only every 3 years to reduce the risk. The Cochrane review suggested this may be 30% or more.

      Longer term follow-up on one of the trials included in these reviews has subsequently been published (Canadian National Breast Screening Study, Miller AB, 2014). That trial is one of the trials judged by reviewers to be of high quality, and has consistently found no significant reduction in deaths attributed to breast cancer. It is the trial with results least favorable to mammography included in these meta-analyses.

      A further systematic review has looked at the question of non-breast cancer mortality in breast cancer screening trials (Erpeldinger S, 2013). Breast cancer screening trials were not designed to answer this question. These authors conclude that the trials show neither a decrease nor an increase in non-breast cancer mortality associated with screening.

      The review for the USPSTF identified two systematic reviews relevant to the question of psychological harm from breast screening with mammography (Brewer NT, 2007, Brett J, 2005). The reviewers concluded false-positives are associated with distress, but no consistent effect on anxiety and depression has been shown for screening with mammography. A more recent systematic review has also looked at the impact of false-positive mammogram results, coming to similar conclusions (Bond M, 2013).

      Marmot pointed out that the members of the UK Independent Panel were chosen both for their expertise and for not having previously published on the subject, to minimize the risk of a biased approach to analyzing and interpreting the evidence (Marmot MG, 2013). The USPSTF commissioned the independent Agency for Healthcare Research and Quality (AHRQ) to conduct the review used for its decision-making (Nelson HD, 2009).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Aug 10, Hilda Bastian commented:

      This is an excellent overview of the research on the impact of celebrity and public figure announcements around cancer. The conceptual model proposed for studying impact on behavioral and disease outcomes is an important contribution, but I think it would benefit by being extended in several ways.

      The issue of potentially deepening inequalities in cancer (Lorenc T, 2013) is so critical here that equity needs to be considered at the outcome end of the picture: incorporating demographics and SES as a mediator/moderator isn't enough. Nor is age the only critical socioeconomic factor about the celebrity/public figure that should be taken into account. The authors point to some notable omissions among the cases that have drawn researcher interest. One of the most striking omissions, though, is the lack of study of non-Caucasian celebrities and public figures (for example Robin Roberts and Donna Summer on the timeline in this article).

      Also striking is the extent to which existing stigma around some cancers is reinforced, both in which cancers are publicly discussed by celebrities, and which are studied by researchers. Are we doing, at the community level (and in the researcher community), what we do in private life as well - reinforcing stigma and poor knowledge of critical diseases in our lives (Qureshi N, 2009)? Take colorectal cancer for example. Whether it's the cases in the timeline (which included only Farrah Fawcett) or the included studies (which included only Ronald Reagan), the under-representation of such a stigmatized condition points to a critical issue for research in this field. Impact on stigma would be a valuable addition to the outcomes in the conceptual model, to emphasize the importance of this dimension of belief.

      In general, it would be good if the potential for adverse effects was more explicit in the model. Critically, impact on over-diagnosis and screening/testing-related harm needs to be included - a key issue the paper discusses, for example, after Kylie Minogue's cancer (Kelaher M, 2008, Twine C, 2006). Accuracy in personal risk assessment, similarly, is an important outcome to consider.

      Focusing on behavioral and disease outcomes in the model leaves out the impact on resources, and ways systems can best respond to these unpredictable events. That was a major issue after Angelina Jolie's announcement (Evans DG, 2014).

      It would be helpful to understand the impact of famous family members' announcements and pleas around cancer, as well: Katie Couric's public intervention (Cram P, 2003), for example, is relevant to this field.

      Finally, the model of considering these events only in terms of cancer prevention as the end interest risks missing potentially important impacts of these cultural events. They contribute to the complex ways we think about and deal with life-threatening illness, life, and death (Førde OH, 1998). The lack of studies that address these broader issues is striking, too.

      Note: Rick Nolan noted in a comment on my blog that Dan Fogelberg's death is wrongly attributed to pancreatic cancer in this article: he died of prostate cancer. I had discussed these issues, and the studies on Angelina Jolie that appeared after these reviewers completed their search, in this blog post.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Oct 23, Hilda Bastian commented:

      This paper by Jager and Leek (Jager LR, 2014) challenges Ioannidis' conclusion that "most published research papers are false" (Ioannidis JP, 2005). Ioannidis responds to this discussion, challenging the data and analytical approach here: (Ioannidis JP, 2014). The conclusions of this paper (Ioannidis JP, 2005) were also challenged by Goodman and Greenfield in 2007 (and responded to by Ioannidis JP, 2007). (I discuss this debate in a blog post.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Nov 18, Hilda Bastian commented:

      The conduct and reporting of this systematic review fall so far short of the standards and criteria covered by PRISMA for reporting (Moher D, 2009) and by quality appraisal tools such as AMSTAR that this review does not meet current expectations of a systematic review.

      While conclusions about effectiveness are made, result data from the primary studies are not provided, nor are methods of data extraction and analysis discussed. Despite the large number of included trials, no meta-analyses of suitable data were performed and no reason for this was given.

      What constituted exercise was not specified, and the reason for excluding studies prior to 2000 is not given. The reasons for inclusion and exclusion of studies are not entirely clear: for example, studies were excluded because of concomitant drug therapy, which, while a reasonable criterion, was not included in their list of criteria. A full list or explanation of exclusions is not provided.

      The search strategy as reported appears to be simplistic and does not include adequate search terms or key databases such as PEDro. The numbers of studies at the stages of PRISMA’s flow diagram (duplicates removed, records screened) are not provided. The quality of included studies is not assessed.

      If you are interested in reading a systematic review on this question, consider Umpierre D, 2011 - see the DARE critical appraisal.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Dec 04, Hilda Bastian commented:

      This is a non-systematic review with unclear inclusion criteria, a limited search strategy, and unclear methods for selection of studies. It includes reviews, primary studies, and some animal studies. Important randomized trials in the area of dietary prevention of colorectal cancer have not been included. The conclusions of this paper are more positive about the potential for dietary prevention of colorectal cancer than the conclusions of the National Cancer Institute.

      Reviews of randomized trials not considered in this paper include: a review showing that randomized trials have not demonstrated a benefit of dietary or supplemented fiber (Asano T, 2002), and a combined analysis of 3 large randomized trials showing no clear effect of folate supplementation (Figueiredo JC, 2011).

      The authors point out that their conclusions are largely based on non-randomized studies. Moorthy D, 2013 shows the extent to which results from epidemiological studies of nutrition can vary from randomized trial results. This blog post addresses aspects of the history of diet and the development of colorectal cancer.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2013 Dec 24, Hilda Bastian commented:

      An important initiative. There was animated discussion (and a fair amount of cringing) when this paper was presented at the Peer Review Congress earlier this year (see this blog post). The need to gather, adequately describe, and store the data we analyze in ways that others can use has major implications for the daily lives of many researchers.

      Having a spotlight shone on the adequacy of data stewardship is important, but there are some issues to keep in mind. This study covers a very specific area of research. Some other fields have particular regulations about the retention, privacy, and sharing of all, or some, data. See for example recent analyses of the availability of clinical trial data (Riveros C, 2013).

      The number of papers in this study dwindles in the earlier years: 26 in 1991 compared with 80 in 2011. The numbers within particular categories (such as data definitely lost in any one year) are correspondingly small.

      It was interesting that only 2.4% of studies had made their data available at the time of publication. (Those studies were excluded.)

      The authors practice what they preach: the full data are in Dryad and there's a manuscript in arXiv.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Jan 16, Hilda Bastian commented:

      When originally conceived, this trial was expected to show significant improvements in key functional abilities for independent living in older people (Jobe JB, 2001). It was planned to run for two years, with training in the experimental groups and one set of booster training (for about half of the people in the experimental groups). A number of outcomes, reported as composite endpoints, were to be tested at 4 time points (baseline, after training, then after 1 and 2 years).

      After the trial's completion at 2 years, there was no impact on the primary functional outcomes (Ball K, 2002). A post hoc hypothesis by the authors for this lack of effect was that the testing in the control group may itself have had some cognitive effect. However, the people in the experimental groups received the same tests, so any positive effect would presumably have been experienced across the trial.

      The more likely reason for the lack of effect is that cognitive training in a specific cognitive function affects only that function, without a major practical impact (Melby-Lervåg M, 2013, Reijnders J, 2013). In a comment on that 2-year report, Brenes (Brenes GA, 2003) pointed out, among other issues, that it was not clear that the skills taught in the training (such as mnemonics) were in fact practiced by the participants in their daily lives.

      In this publication of 10-year results, the authors write that no benefit in functional living had been expected before at least 5 years - a position adopted subsequent to the finding of no effect at 2 years. They added a further set of booster training and an extra 3 testing periods.

      At 5 years, one of the 3 experimental groups, and one of the 3 booster subgroups, each showed an effect on one of the functional outcomes; there were no effects on most of the functional outcomes for most of the groups and subgroups (Willis SL, 2006).

      This report of the 10-year results does not include data for the individual components of the composites. The people in the experimental groups fared modestly better on the self-reported outcome among the 3 composite primary endpoints, but not on the other 2. As the authors say in relation to those outcomes, "The current study showed weak to absent effects of cognitive training on performance-based measures of daily function."

      One of the groups showing a modest effect in that 1 outcome was the memory training group. However, it is not clear how memory training could be having an effect on function, when the effect on memory had dissipated years before. Given the large number of subjective tests, the modest impact on one of the functional outcomes may be a chance finding.

      (The conflict of interest declaration for this paper discloses that the memory intervention in this trial is being developed commercially.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Feb 12, Hilda Bastian commented:

      It would be wonderful if as many post-stroke therapies were as effective, and the evidence for them as strong, as this review concludes. Unfortunately, that's not the case.

      The abstract of this review talks about trials in over 25,000 patients - but it doesn't point out that the numbers for individual interventions are, with only some exceptions, small. The review has several major flaws, in particular having no protocol to guard against problems caused by multiple testing and subgroup analyses. Crossover trials are pooled with parallel trials, and the effect of this on the various analyses is not clear: methodological characteristics of the individual trials are not reported. A scoring method is used for the individual trials, for which only the summary score is available.

      In addition, it's important to note that the search for this review was done in June of 2011. As well as using more robust methods, other reviews are significantly more up-to-date, e.g. systematic reviews on treadmills (Mehrholz J, 2014) and physical fitness training (Saunders DH, 2013).

      Although this review's abstract and conclusions are strongly positive about 30 interventions they consider, the authors do point out in the discussion that: "well controlled, dose-matched trials with significant effects in favor of the experimental intervention have been rather scarce."

      For a good overview to consider alongside well-conducted recent systematic reviews, see Langhorne P, 2011.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Feb 23, Hilda Bastian commented:

      This paper tackles an important issue. We definitely need better ways to keep up with the evidence - and the rate of growth of that evidence makes it both more difficult and more urgent (Bastian H, 2010). It's particularly helpful that the paper addresses the risks of multiple testing in continuous updating models.
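
      To illustrate that multiple-testing point with a minimal simulation sketch (entirely hypothetical trial sizes, a true effect of zero, and naive fixed-effect pooling - not the paper's method): re-testing a continuously updated pooled estimate at every update inflates the chance of at least one spuriously "significant" result.

      ```python
      # Hypothetical simulation: 20 sequential trials of a treatment with no effect,
      # re-testing the cumulative pooled estimate after each new trial.
      import numpy as np

      rng = np.random.default_rng(0)
      n_sims, n_updates, n_per_arm = 2_000, 20, 100
      false_positive_runs = 0

      for _ in range(n_sims):
          diffs, ses = [], []
          significant_at_any_update = False
          for _ in range(n_updates):
              treat = rng.normal(0.0, 1.0, n_per_arm)    # no true effect
              control = rng.normal(0.0, 1.0, n_per_arm)
              diffs.append(treat.mean() - control.mean())
              ses.append(np.sqrt(treat.var(ddof=1) / n_per_arm +
                                 control.var(ddof=1) / n_per_arm))
              # naive fixed-effect pooling of all trials accumulated so far
              w = 1 / np.square(ses)
              pooled = np.sum(w * np.array(diffs)) / np.sum(w)
              if abs(pooled / np.sqrt(1 / np.sum(w))) > 1.96:
                  significant_at_any_update = True
          false_positive_runs += significant_at_any_update

      print(f"Runs with at least one 'significant' look: {false_positive_runs / n_sims:.2f}")
      # A single test at the end would be wrong about 5% of the time; repeated,
      # unadjusted looks push this well above 5%.
      ```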

      In calling for "a shift to continuous work process," though, it's important to remember that this shift occurred long ago for many organizations and groups. A 2010 survey of agencies that sponsor and conduct systematic reviews (sometimes with clinical practice guidelines as well) found 66 that were already doing this to at least some extent (Garritty C, 2010).

      In this latest proposal for living systematic reviews, several unquestionably important issues reach Table 1 as key challenges. But "validation and acceptance by the academic community" and "ensuring conventional academic incentives are maintained" did not prevent the development of continuous updating models.

      The restriction of access to key databases does contribute to keeping many groups trapped in duplicative updating hamster wheels, though. Poor access leads to critical research waste (Glasziou P, 2014). Making the preservation of conventional academic incentives foundational in Table 1, rather than, say, opening databases, runs the risk of focusing us on technical issues within restricted models, slowing down and limiting both innovation and the entry of new players.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Sep 06, Hilda Bastian commented:

      The authors of this paper state: “Our own findings as well as research by others show that the effect of children on women’s academic careers is so remarkable that it eclipses other factors in contributing to women’s underrepresentation in academic science”.

      This paper fails to support this contention in 5 ways:

      1. Addressing only a subset of the range of factors that potentially contribute to women’s underrepresentation.

      2. Relying on a selected set of literature that fails to discount alternative explanations, in particular that there is no one single factor that accounts for the phenomenon of women’s underrepresentation in science. Multiple factors, even small ones, can add up to cumulative advantage for men in science (National Academy of Sciences (US), National Academy of Engineering (US), and Institute of Medicine (US) Committee on Maximizing the Potential of Women in Academic Science and Engineering, 2007).

      3. Providing no method to quantify and comparatively weigh contributing factors that could underpin the hypothesis of a single remarkable factor.

      4. Not satisfactorily demonstrating that motherhood consistently results in high levels of underrepresentation across the disciplines of academic science, but not in other academic careers.

      5. Generalizing to all of academic science based exclusively on American data on family responsibilities and science careers.

      The authors rely heavily on their previous work: Ceci SJ, 2011. I have addressed that in a PubMed Commons comment (link to comment). That paper also does not contain adequate evidence to sustain the claim about the motherhood hypothesis presented here.

      The only data sets presented in support of this hypothesis are (in order of appearance):

      • A study including 586 graduate students in 1992 in the US, surveyed again in 2003 and 2004 (Lubinski D, 2006).

      • A figure of the number of ovarian follicles women have by age from birth to 51, overlaid with key scientists’ career stages.

      • A national faculty survey on career and family in 1998 (with 10,116 respondents across scientific and non-scientific disciplines) (Jacobs, 2004).

      • 2 examples of studies chosen from their previous review to illustrate their argument that there is a level playing field for women in the science workforce, along with a blanket claim that I do not believe the evidence in their review supports (Ceci SJ, 2011).

      • A study that included 2 major components (Goulden, 2009):

        (a) Modeling of data from the Survey of Doctorate Recipients (SDR), which had limited data on potential contributing factors to women’s careers (see for example Bentley 2004). Women with young children had 4-13% lower odds of achieving tenure than women without, which is not a considerably higher contribution to gender differences than has been found in other studies. (Note that age of children is one of the areas with relatively high missing data in the SDR (Hoffer 2002).)

        (b) A survey of 45 female doctoral and postdoctoral researchers at the University of California, including 16 “new mothers”.

      • A survey with 2,503 respondents from 2008/2009 which found that women were more likely than men to wish they had more children (Ecklund EH, 2011) (although it is not included in the article’s list of references, the study was readily identifiable). Williams and Ceci report “Often this regret is associated with leaving the academy”. However, Ecklund and Lincoln report that there was no gender difference in the desire to leave academic science among these respondents. Further, they conclude, “the effect on life satisfaction of having fewer children than desired is more pronounced for male than female faculty, with life satisfaction strongly related to career satisfaction”.

      • A study of people early in their careers, graduating with MBAs from a single US business school between 1990 and 2006. It had a low response rate (31%) and included 629 women (Bertrand, 2010).

      This evidence base is inadequate to support the paper’s conclusions and is highly selective. The article included a separate extended bibliography, but the basis for the identification and selection of the studies in the bibliography and in the article is not given. In relation to the major review on which they rely (Ceci SJ, 2011), an unsystematic approach and a lack of methods to minimize bias have resulted in a very misleading sample of data, and in biased reporting and interpretation of that data (see my comment in PubMed Commons).

      Finally, central to the argument presented here is the hypothesis that as societal and policy changes have reduced the impact of blatant and conscious discrimination, the salience of motherhood as a relative barrier to the progression of women’s scientific careers has assumed greater significance.

      However, those same societal changes have also been affecting how people manage and accommodate family responsibilities and careers. For example, later childbirth and fewer children are ongoing trends in the US (Matthews TJ, 2009, Matthews TJ, 2014), trends which partially result from, and contribute to, changing attitudes to motherhood and parenting over time. Similarly, increasing workforce participation by women has been changing, and continues to rapidly change, men’s roles in parenting (Cabrera NJ, 2000). The authors acknowledge that there has been some accommodation by academic institutions, but their analysis remains largely one-sided.

      For example, this statement is made with neither current nor longitudinal data cited in support: “Men more often have stay-at-home spouses or spouses in flexible careers who bear and raise children while the men are free to focus on academic work”. Indeed, a study they cite in another context found that both men and women scientists with children worked fewer hours than those without children, but similar hours to each other (Ecklund EH, 2011).

      I agree with the authors that much remains to be done to accommodate family responsibilities of all types, not just motherhood. But there will be no single magic bullet that counteracts the cumulative impact of biases and barriers affecting women - related to gender, race, and more, as well as family responsibilities. These authors have not made their case for the claim that, “It is when academic scientists choose to be mothers that their real problems start”.

      In addition to comments here on PubMed Commons on the previous review by these authors that supports this paper, I have discussed this paper on my blog.

      Disclosures: I work at the National Institutes of Health (NIH), but not in the granting or women in science policy spheres. The views I express are personal, and do not necessarily reflect those of the NIH. I am an academic editor at PLOS Medicine and on the human ethics advisory group for PLOS One. I am undertaking research in various aspects of publication ethics.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Apr 23, Hilda Bastian commented:

      This is a critical topic for a systematic review, given the potential for decision-making interventions to increase inequality in the community. The conclusions here seem to me over-optimistic. There's another way of putting this: the meta-analyses for most outcomes found no improvement. The weight of evidence for improvement was carried by less than a handful of studies - including some intensive interventions such as a community outreach strategy (Wray RJ, 2011).

      It's striking that with so much research in this field, such a small proportion could be found that addresses such a critical question. The results here certainly point to the importance of doing more work on this subject, because the cause clearly is not hopeless. Beyond these studies, though, lies another critical question: who is adopting these practices in the community, and is that contributing to a lessening or an increase in inequity? Generally, only concerted effort can prevent those who already have more from getting more - in this case, information and clinicians' time (Bastian H, 2003).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 May 11, Hilda Bastian commented:

      An important reminder that multiple publication bias in meta-analysis - which can lead to double-counting of patients - has not disappeared (PMC full text). Choi and colleagues point to a recent study suggesting the incidence of duplicate publication in the field of otolaryngology didn't change over 10 years (Cheung VW, 2014), although it may have declined in some other fields.

      Systematic reviewers weed out most duplicate reporting, but as this new study shows, some still slip through. In a meta-analysis, the double-counting of events can tip the balance of evidence. A study a decade ago showed that authorship was an unreliable criterion for detecting duplicate publication of trial data (von Elm E, 2004), and the publications don't cross-reference each other, either. Choi and colleagues don't raise the importance of clinical trial registration here: Antes and Dickersin pointed to this as a key strategy to address this problem (Antes G, 2004).

      There is also duplicate registration of trials in different registers, though (Zarin DA, 2007, Califf RM, 2012). ClinicalTrials.gov aims to identify and resolve duplicate registration of trials (Zarin DA, 2007), and most registered trials are included there. Consistent citation of trial registration numbers, especially the ClinicalTrials.gov identification (NCT number), in all systematic reviews of trials would be useful for readers and those trying to identify studies. It might help reduce reviewers' workload in weeding out duplicate reports, too.

      (I work on projects related to systematic reviews at NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), which is also responsible for ClinicalTrials.gov.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Jul 26, Hilda Bastian commented:

      A thorough and valuable breakdown of what could and should be automated in systematic reviewing. One additional important strategy lies in the hands of everyone doing (and publishing) clinical trials and systematic reviews: following the ICMJE recommendation to include the clinical trial registry identification number of every trial at the end of abstracts. This needs to be done using the specific, unaltered format for each registry in which a study is included, so that IDs are easily retrievable - and the IDs should accompany every cited study inside the systematic review, too. Using the WHO's Universal Trial Number (UTN) would also help with the critical, and time-consuming, task of study de-duplication.

      The issue raised in this article - that some databases of manually extracted trial data are not publicly available - is an important one. It's worth noting, though, that this is not for lack of a public option: systematic reviewers can use the open and collaborative public infrastructure of the SRDR (Systematic Review Data Repository) (Ip S, 2012).

      Another option to add to the list of ways of improving the snowballing technique for identifying studies: using the related articles function in PubMed. That's been found to be useful in empirical studies of techniques for updating systematic reviews (Shojania KG, 2007).

      (Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), which is also responsible for ClinicalTrials.gov.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Aug 02, Hilda Bastian commented:

      While safer driving by adolescents is a critically important issue - and further research in this area is definitely needed - the authors' conclusions about the effects of this intervention are overly positive.

      The parent in this trial was overwhelmingly the mother (over 80%), mostly college-educated - and non-white families appear to have been under-represented. The participants responded to hearing about the trial rather than being actively recruited, and so were particularly highly motivated - and the trial couldn't reach its recruitment goal. Further, 16% of the intervention group were lost to follow-up at the primary outcome measurement point (compared with 6% in the control group).

      Even with this highly motivated group in a trial setting, an intervention more intensive than a large-scale program could be (Ramirez M, 2013), and outcomes based solely on the adolescents' reports, the pre-specified primary outcomes (trial registration record) did not achieve statistical significance. While the authors fairly attribute this to low recruitment making the trial under-powered, it isn't very encouraging. As the authors point out, there's no strong effect apparent here.

      Presenting the adolescents' self-reported Risky Driving Score results as risk reduction percentages in the abstract risks giving people an exaggerated impression of effectiveness. The range of possible scores isn't very wide, so even a small difference can be a substantial percentage. It would have been good if more details about the score had been provided, given that it's a primary outcome measure and a trial-specific adaptation of an existing score.
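
      To illustrate that point with made-up numbers (the trial's actual score values are not reproduced here), a small absolute difference on a narrow scale can read as a large relative reduction:

      ```python
      # Hypothetical illustration only: assumed mean scores, not the trial's data.
      control_score = 1.50        # assumed mean Risky Driving Score, control group
      intervention_score = 1.20   # assumed mean score, intervention group

      absolute_difference = control_score - intervention_score
      relative_reduction = absolute_difference / control_score

      print(f"Absolute difference: {absolute_difference:.2f} score points")
      print(f"Relative reduction:  {relative_reduction:.0%}")
      # "20%" sounds substantial, but it reflects a 0.3-point shift on a narrow scale.
      ```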

      It's great to see this trial published, even though it didn't meet its goals. But I don't agree with the authors' conclusion that the bar for statistical significance should, in effect, be lowered because proven interventions are needed. The interventions that people would use need to make a real difference. As the authors point out, there is evidence that parents can make a difference to their adolescents' behaviors - to their list, I'd add influencing smoking (Thomas RE, 2007). But parents need to know where they could best make the effort, given other options like Parent-Teen Driving Agreements (Zakrajsek JS, 2013) - or discouraging getting a license early (Ian R, 2001).

      The authors indicate that future research will integrate more objective data, which presumably refers to the unreported data from 2010 for driving citations and crashes in this trial. That will be vital to put this self-reported data on surrogate outcomes in perspective. Access to the intervention materials may also be important for others in the field (Glasziou P, 2010).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Aug 10, Hilda Bastian commented:

      These authors (Cuzick J, 2015) come to a more positive conclusion about the state of the evidence on routine aspirin use and cancer prevention than do Sutcliffe P, 2013 (also reported at Sutcliffe P, 2013).

      Sutcliffe P, 2013 undertook a thorough and well-reported systematic review of the evidence, based on previous systematic reviews, the primary studies in them, and the relevant RCTs published post-2008, re-analyzing the primary study data. They took into account the same individual patient data and other meta-analyses on which Cuzick J, 2015's interpretation of benefit relies. (Sutcliffe P, 2013's systematic review is not discussed in Cuzick J, 2015.)

      The main data included in Cuzick J, 2015 but unavailable to Sutcliffe P, 2013 appear to be an analysis of harms (where insufficient detail on the sources or selection process has been published), and a long-term follow-up report from the Women's Health Study (Cook NR, 2013). However, as Cook NR, 2013 shows a broadly similar outcome to the <10 year results (no effect on total cancers, but an effect on colorectal cancer only), this does not appear to account for the difference in interpretation of the state of the evidence by these two groups.

      The main data relied on in Sutcliffe P, 2013 that differ from those in key analyses of Cuzick J, 2015 are the Physicians' Health Study (Steering Committee of the Physicians' Health Study Research Group, 1989) and the Women's Health Study (Ridker PM, 2005). These are of long-term aspirin use on alternate days, rather than daily. These two studies include around 62,000 people, and Sutcliffe P, 2013's analyses show they dominate several calculations.

      Sutcliffe P, 2013 point to a critical issue: all the primary studies and meta-analyses for benefit "assessed reduction in cancer incidence and mortality retrospectively through re-analysis of RCTs of aspirin for primary prevention of CVD." They conclude that the uncertainty around the cancer estimates remains high, and the "long term all-cause mortality data does not provide a compelling case for aspirin protection against CVD and cancer mortality."

      With further trials underway, the picture may become clearer in the next few years. While previous trials and analyses address the major harms associated with long-term daily aspirin use (hemorrhagic stroke and gastrointestinal bleeding), many people considering this intervention may also be concerned about additional outcomes - for example, the still-unresolved question of any potential impact on neovascular age-related macular degeneration (Klein BE, 2012, Liew G, 2013, Christen WG, 2014).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Sep 02, Hilda Bastian commented:

      An excellent overview of the need for studying humor in science communication, and the academic challenges in it. While there’s some more evidence than that gathered here, I think Hauke Riesch’s conclusions about the uncertainties of benefit and harm are spot on.

      I found the review on studies of humor in teaching he points to (Banas, 2011) helpful as well. From children through to continuing education and the communication of science among peers (Rockwood K, 2004), there’s a lot to learn here.

      In describing the varying results of studies, Riesch doesn’t explicitly address a key confounder in communication research: the quality of the intervention. It’s hard to make sense of bodies of evidence in this field without quality assessments and being able to see the interventions (Glasziou P, 2010). Skill in using humor may account for some of the heterogeneity. And learning about the skills necessary for effectiveness - and how to acquire them - is a key issue in this field, too.

      Riesch addresses well the potentially alienating and stereotyping effect of science humor, as well as the potential benefits of social group cohesion. In addition, though, satire in peer-to-peer communication and for policy-related issues is also a critical element of humor in science communication, as it is in other areas of community life (Zyglis, 2003).

      I welcome the author’s desire to “open a discussion” on humor in science communication. But this article being behind a paywall isn’t going to help that process. It would be great to know if the author is engaging with discussion in any other forum.

      I’ve blogged about the science of humor, and humor in science, in response to this article at Scientific American.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Aug 22, Hilda Bastian commented:

      It's great to see such a thorough and rigorous body of work on this subject. This group provides a good overview of the portion downsizing issue, and the limited evidence base on interventions, at Vermeer WM, 2014.

      A key part of the intervention in this trial (Poelman MP, 2015) is the interactive web-based PortionSize@warenessTool. Its development and trialing are described at Poelman MP, 2013, with these elements: background reading, an interactive flash game with photos of popular food products in the Netherlands, a flash game where you can upsize/downsize portions on screen, a self-test score, information on portions for children, and more.

      It would be helpful if details about the availability of this intervention could be provided (e.g. where it can be viewed, whether the code is open source, and whether the license allows translation). The TIDieR checklist (Hoffmann TC, 2014) - the template for intervention description and replication - is a good framework for this. More detail on the components of interventions is important for enabling better practice (Glasziou P, 2010).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Sep 13, Hilda Bastian commented:

      The authors make an important point: just because a systematic review has not assessed publication bias (PB), it does not mean that there is none.

      However, in this study, there were only 36 reviews that did not assess publication bias, and nearly half of those were in the minority subset of reviews that didn't have a comprehensive search strategy. For many, a comprehensive search strategy is a defining characteristic of a systematic review (e.g. in DARE, the Database of Reviews of Effects). Those reviews may not be able to provide an adequate overview of published studies, either.

      The authors point out that a limitation of their study is that there were many (planned) subgroup analyses - and it's based on a small number of reviews. Especially as the number of adequately systematic reviews was small, the exclusion of the Cochrane Database of Systematic Reviews - a journal that publishes systematic reviews and was eligible for their study - is disappointing. The reason given for the exclusion was that the results of the Moher D, 2007 study showed that assessment of publication bias "is regularly performed in articles published in this database." However, the authors of that study concluded that the assessment of publication bias was disappointing overall. For Cochrane reviews in that study, publication bias was assessed (or intended to be assessed) in only 32% of those reviews (and it was considered in another 39%).

      (Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine).)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Oct 27, Hilda Bastian commented:

      This report of a very small, short-term trial in healthy adults does not meet the CONSORT standards for trial reporting in several key respects. It does not provide sufficient data on the cognitive outcomes assessed, nor an adequate participant flow chart (despite considerable attrition). There is also very little detail provided in the record of this trial at ClinicalTrials.gov.

      The abstract does not make it clear that this is a dietary supplement and exercise trial (partially funded by a manufacturer). There were apparently two cognitive outcome measures on a ModBent task (an adapted test not elsewhere validated): immediate matching and delayed retention. Both relate to very specific functions, not an overall rating of cognitive abilities.

      No effect was found for the exercise component in the trial, and out of the two cognitive measures, some effect was found for one, but not the other. That this is a chance finding surely can't be ruled out.

      This report describes low vs high supplement groups. The study in ClinicalTrials.gov for the trial number they provide, however, was for a supplement and a placebo comparator.

      Despite the major limitations of this single trial to address the question, the "Newsroom" report for the trial claims that it shows that "dietary flavanols reverse age-related memory decline."

      It's good to see claims about dietary supplements tested. However, the results here rely on a chain of yet-to-be-validated assumptions that are still weakly supported at each point. In my opinion, the immodest title of this paper is not supported by its contents.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2014 Oct 27, David Colquhoun commented:

      For all the reasons given by Hilda Bastian (and a few more, like P = 0.04 provides lousy evidence) it astonishes me that this study should have been trumpeted as though it represented a great advance. That's the responsibility of Nature Neuroscience (and, ultimately, of the authors).

      I wonder whether what happens is as follows. Authors do big fMRI study. Glamour journal refuses to publish without functional information. Authors tag on a small human study. Paper gets published. Hyped up press releases issued that refer mostly to the add on. Journal and authors are happy. But science is not advanced.

      I certainly got this impression in another recent fMRI paper in Science. Brain stimulation was claimed to improve memory (P = 0.043).

      I guess these examples are quite encouraging for those who think that expensive glamour journals have had their day. Open access and open comments are the way forward.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Nov 14, Hilda Bastian commented:

      Very useful data on an important issue, given the high proportion of the population using contact lenses (Swanson MW, 2012). On the issue of the level of individual risk, readers might find a review of large-scale epidemiological studies helpful (Stapleton F, 2013).

      The authors stress the importance of good lens hygiene to reduce the risk of infection. That's a critical issue, and people may well over-estimate the adequacy of their own lens care (Bui TH, 2010). Given the increased risk with extended wear - rising from 2-4 per 10,000 for daily use to about 20 per 10,000 for extended wear (Stapleton F, 2013) - users being better informed about reduced wear as a way of lessening risks may also help (covered along with social and historical aspects in this blog post).
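
      As rough arithmetic on the rates quoted above (per 10,000 wearers per year, taking the midpoint of the daily-wear range as an assumption), the same figures look quite different as relative versus absolute risk:

      ```python
      # Rough sketch based on the rates quoted above; the daily-wear midpoint is assumed.
      daily_rate = 3 / 10_000       # midpoint of the 2-4 per 10,000 quoted for daily wear
      extended_rate = 20 / 10_000   # roughly 20 per 10,000 quoted for extended wear

      relative_risk = extended_rate / daily_rate
      absolute_increase = extended_rate - daily_rate

      print(f"Relative risk: about {relative_risk:.1f} times higher with extended wear")
      print(f"Absolute increase: {absolute_increase * 10_000:.0f} extra cases "
            f"per 10,000 wearers per year")
      ```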


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Nov 21, Hilda Bastian commented:

      The title and abstract of this article focus on the positive finding on tumor promotion, without emphasizing, in a way that is clearly accessible to non-specialist readers, that the findings on causation were negative. This is of particular importance, as a university press release issued for this study was headed with this misleading statement: "The dirty side of soap: Triclosan, a common antimicrobial in personal hygiene products, causes liver fibrosis and cancer in mice." This encouraged unwarranted alarm in the community (which I discuss further in this blog post).

      A 2010 inventory of animal and clinical studies of triclosan safety (Rodricks JV, 2010) found that oncogenicity studies to that point had not found cancer-related increases in any species, except for liver cancer in mice. Without pre-registration of studies on this question, we are unaware of what the outcomes have been for all oncogenicity studies on this substance, and thus whether there is publication bias.

      Further areas of uncertainty relate to the experiments here. The article does not report sufficient data and methodological information to enable adequate assessment of the level of uncertainty associated with the experiments (see the NIH's Proposed Principles and Guidelines for Reporting Preclinical Research). It would be helpful if the authors took the opportunity to include key data here, specifically:

      • how the sample size was determined;
      • the inclusion/exclusion criteria;
      • exact data on the experiments' results (including confidence intervals);
      • whether or not allocation of mice to the groups was random, and if so, details of the method of randomization (including whether or not there was blinding);
      • whether there was blinding in outcome assessment.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2014 Nov 29, Hilda Bastian commented:

      This paper uses the terms "depressive symptoms" and "depression" interchangeably. However, the relationship between the screening questions asked and the "clinical" condition of depression is unclear. A modified form of an unspecified version of the CESD screening tool was used. It included an unspecified 10 of the 16 CESD questions, applied only for the last week. In a further variation to the CESD, answers were scored with only a dichotomous outcome.

      The cut-off for determining "depression" was also not explained and the "clinical" relevance of the measure (and associated increase) is unclear. If there has been a validation of an association between the scores used here and depression, it was not referred to in the paper. More details on this would be helpful to people interested in interpreting the results of this study.

      The workplace situation was not equal between the men and women bracketed in the same job authority categories in this sample. The women worked fewer hours per week, earned less than the men of the same age, and were supervised more often. It's women's job authority with less pay and less freedom than men's job authority that is being compared. That would also be a function of the gender inequality the authors identify as a clear problem here. But it raises a question about the level of emphasis given to the psychological impact of having supervisory authority, and, therefore, about what to do in response.

      The range of workplace factors addressed by this study includes the traditional ones related to autonomy. Those questions don't address the kinds of gender-related issues the authors point to in the literature as constituting psychological workplace adversity for women in management, such as endemic social exclusion by peers and supervisors, frequent slights from all directions, being judged more frequently as socially disruptive, unequal opportunity and status attainment, and harassment. More sensitive tools (and relevant data from before the age of 54) would have been needed to unpack what made that generation of women unhappier than the men. The underlying point these authors show, though - that psychological aspects of the workplace experience have serious bearing on women's happiness - is a critical one.

      The full text of this article is available here.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 May 13, Hilda Bastian commented:

      Since writing this editorial, I have expanded on two of my central concerns here in blog posts. One of those is on women scientists making their opinions public. The other is a deep dive into the literature on anonymity and openness in publication review.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Feb 18, Hilda Bastian commented:

      This study states as its objective "to determine the efficacy and safety of varenicline" for quitting smoking via smoking reduction. The authors point out that one limitation of the study is its generalizability to a broad population, given its stringent and extensive exclusion criteria. However, it does not stress that both this and the size of the study very much preclude this single study from "determining" the safety of varenicline. The findings in relation to serious adverse events need to be considered in the light of the lower risk for serious adverse events in this study population.

      The paper does not refer readers to the safety warnings and concerns about varenicline issued by both the US FDA and European Medicines Agency (EMA), in relation both to psychiatric (FDA boxed warning) and cardiovascular events (FDA, 2012)(see also EMA). (Readers may also be interested in Singh S, 2011 on the issue of cardiovascular events.)

      UPDATE: On 9 March 2015, the FDA reviewed safety data on varenicline, retaining the boxed safety warning, and including a warning on interaction with alcohol. However, in March a large meta-analysis found that varenicline increased insomnia and bad dreams, but not depression, suicide, or suicidal ideation (Thomas KH, 2015).

      In terms of effectiveness, the authors rightly raise the issue of a lack of direct comparisons between varenicline and others options for smoking reduction. Readers might be interested in Asfar T, 2011, which finds that nicotine replacement therapy (NRT) achieved smoking reduction rates that were not dramatically dissimilar. Cahill K, 2010 found some (inconclusive) evidence that NRT and varenicline result in similar quit rates. Nor are pharmacological means the only successful options for reducing smoking without the risk of serious adverse events.

      Note also that this study was funded by Pfizer, manufacturer of the varenicline product marketed as Chantix in the US (Champix in Europe).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 May 25, Hilda Bastian commented:

      This is an interesting study. But it's a rather enthusiastic self-assessment of a method not validated by other researchers, and some perspective is useful in thinking about the conclusions.

      Kicinski M, 2015 is neither the first nor the largest study of publication bias (PB) in meta-analyses, and the presence of publication bias in them is well-known. These authors used a scraper they have made available on GitHub to extract meta-analyses from Cochrane reviews. They looked at reviews with placebo or "no treatment" control groups and 10 or more included studies. Whether or not these results are applicable to interventions with active or usual care control groups is unknown.

      For perspective here: Ioannidis JP, 2007 considered PB in 1,669 Cochrane reviews, ultimately analyzing 6,873 meta-analyses. Half of the meta-analyses had no statistically significant results in them, so the problem identified here could not have applied to them. Ioannidis JP, 2007 concluded that only 5% of the full set of Cochrane reviews would qualify for the use of asymmetry tests, and only 12% of those with a larger number of events and participants. They found very little concordance between different asymmetry tests - only around 3-4%. A more important problem according to Ioannidis JP, 2007 was the misapplication and misinterpretation of statistical tests, not their underuse. False positives are a problem with tests for PB when there is clinical heterogeneity. Ioannidis JP, 2007 concluded that the only viable solution to the problem of PB is full reporting of results.
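
      For readers unfamiliar with what these asymmetry tests do, here is a minimal sketch of an Egger-type regression on made-up study data - not the method or data of any review discussed here, and it assumes a recent SciPy for the intercept standard error:

      ```python
      # Minimal, hypothetical sketch of an Egger-type funnel-plot asymmetry test.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      se = rng.uniform(0.05, 0.5, size=15)       # hypothetical standard errors of 15 studies
      effects = rng.normal(0.0, se) + 0.8 * se   # smaller (high-SE) studies skewed positive

      # Regress standardized effect on precision; an intercept clearly different
      # from zero is read as funnel-plot asymmetry.
      fit = stats.linregress(1 / se, effects / se)
      t_int = fit.intercept / fit.intercept_stderr
      p_int = 2 * stats.t.sf(abs(t_int), df=len(se) - 2)
      print(f"Egger intercept: {fit.intercept:.2f} (p = {p_int:.3f})")
      ```

      Asymmetry here can reflect small-study effects or clinical heterogeneity, not only publication bias - which is part of the misinterpretation problem Ioannidis JP, 2007 describes.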

      Kicinski M, 2015 conclude that statistical tools for PB are under-utilized, but the extent to which PB is assessed was not part of their study. Although PB itself may be decreasing over time, assessment of PB is increasing, even if the methods for exploring it are still problematic:

      • Palma S, 2005 found that PB was assessed in 11% of reviews between 1990 and 2002, increasing from 3% in 1998 to 19% in 2002 (less frequently in Cochrane reviews than others).
      • Moher D, 2007 found that about 23% of systematic reviews in 2004 assessed PB (32% in Cochrane reviews, 18% in others).
      • Riley RD, 2011 found that only 9% of reviews from one Cochrane group assessed PB.
      • van Enst WA, 2014 found that most systematic reviews of diagnostic test accuracy in 2011/2012 mentioned the issue, with 41% measuring PB.

      In assessing only the meta-analyses themselves, and not the reviews that included them, it's not possible to know, as the authors point out, to what extent other studies were included, but without data that could be pooled. An issue not raised by Kicinski M, 2015 is trials reported only in conference abstracts, and thus with minimal data. Cochrane reviews often include studies reported in conference abstracts only, and those are apparently more likely to have non-statistically significant results (Scherer RW, 2007) - as well as relatively little data for the multiple meta-analyses in a review.

      It's important to consider the review, and not just the effect summaries within meta-analyses, because the conclusions of the systematic review should reflect the body of the evidence, not only the meta-analyses. Over-favorable results in a meta-analysis shouldn't be equated with over-favorable conclusions about effectiveness in a review (although unfortunately they often will be). We shouldn't jump to conclusions about effect sizes from meta-analyses alone. They can be skewed by clinical heterogeneity and small study size as well as (or instead of) publication bias, and the devil may be more in the interpretation than the calculations.

      Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2015 Dec 09, Hilda Bastian commented:

      This is an excellent trial on an important subject, but the authors go beyond what the data in this study can support here: "The overall conclusion is that supported computerised cognitive behaviour therapy confers modest or no benefit over usual GP care..."

      As others have pointed out in rapid responses at the BMJ, this study primarily shows that particularly low adherence to online CBT had little, if any, impact. The study was powered only to detect a difference at the effect sizes reported for supported computer-based/online CBT, while the type of support provided in this trial was minimal (and not clinician- or content-related). The participants were more severely depressed than the groups for whom online CBT was offered in other trials (and in recommendations for its use), other care in each arm often included antidepressants, and the extent of use of CBT (online or otherwise) in the GP group is not known. The results are very relevant to policy on offering online CBT. But I don't think there is enough certainty from this one trial to support a blanket statement about the efficacy of the intervention rather than about the potential impact of a policy of offering it.

      The size of this study, while large, is smaller than the other studies combined, and without a systematic review it is not clear that this study would shift the current weight of evidence. An important difference between this trial and studies in this field generally is that personal access to the internet was not required. I couldn't locate any data on this in the report. It would be helpful if the authors could provide information here on the level of personal, private access to the internet people had in each arm of the trial, so that it's possible to take this potential confounder into account in interpreting the results.

      Free online CBT is also an option for those who cannot (or will not) get in-person therapeutic care. Many people with mild or moderate depression do not get professional care for it, and it doesn't seem reasonable on the basis of this to discourage people from trying free online CBT out. Yet, the press release for this study was headlined, "Computer assisted cognitive behavioural therapy provides little or no benefits for depression" (PDF), setting off media reports with that message. That far exceeds what the data from this one trial can support.

      Disclosure: I have not been involved in the development of any online, or in-person, therapy for depression. I was co-author of a 2003 systematic review on the impact of the internet, which concluded that CBT-based websites for mental health issues at that time had mixed results (PDF), and I have since written favorably about online CBT.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Mar 12, Hilda Bastian commented:

      An interesting and very useful study of Google Scholar (GS). I am unclear, though, about the methods used to compare it with other databases. The abstract includes this step after the systematic review authors had a final list of included studies: "All three databases were then searched post hoc for included references not found in the original search results". That step is clearly described in the article for GS.

      However, for the other 2 databases (EMBASE and MEDLINE Ovid), the article describes the step this way: "We searched for all included references one-by-one in the original files in Endnote". "Overall coverage" is reported only for GS. Could you clarify whether all 3 databases were searched post hoc for the included references?

      I am also unclear about the MEDLINE Ovid search. It is stated that there was also a search of "a subset of PubMed to find recent articles". Were articles retrieved in this way classified as from the MEDLINE Ovid search? And if recent articles from PubMed were searched, does that mean that the MEDLINE Ovid search was restricted to MEDLINE content only, and not additional PubMed records (such as those via PMC)?

      There is little description of the 120 systematic reviews and citations are only provided for 5. One of those (Bramer WM, 2015) is arguably not a systematic review. What kind of primary literature was being sought is not reported, nor whether studies in languages other than English were included. And with only 5 topics given, it is not clear what role the subject matter played here. As Hoffmann T, 2012 showed, research scatter can vary greatly according to the subject. It would be helpful to provide the list of 120 systematic reviews.

      No data or description is provided about the studies missed with each strategy. Firstly, that makes it difficult to ascertain to what extent this reflects the quality of the retrieval rather than the contents of the databases. And secondly, with numbers alone and no information about the quality of the studies missed, the critical issue of the value of the missing studies is a blank space.

      Disclosure: I am the lead editor of PubMed Health, a clinical effectiveness resource and project that adds non-MEDLINE systematic reviews to PubMed.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Apr 11, Hilda Bastian commented:

      The strongly declarative title of this paper makes a claim that is not supported by its contents.

      The authors argue that only one study (Galesic M, 2009) has reported "a substantial benefit" of natural frequencies. That claim is not based on an up-to-date systematic review of the studies on this question. A systematic review is needed, because there are multiple studies now with varying methods, in various populations, and in relevant contexts, that need to be considered in detail.

      These authors cite some studies that support their position (all in the context of treatment decisions). However, this is only a part of the relevant evidence. Among the studies not cited in this paper, there is at least one looking at medical tests (Garcia-Retamero R, 2013), others at treatments (e.g. Cuite CL, 2008, Carling CL, 2009, Knapp P, 2009 and Sinayev A, 2015), and at least one in a different field (Hoffrage U, 2015). Some find in favor of natural frequencies, others for percentages, and others find no difference. I don't think it's possible to predict what a thorough systematic review would conclude on this question.

      This study by Pighin and colleagues was done among US residents recruited via Mechanical Turk, and includes some replication of Galesic M, 2009 (a study done in Germany). The authors conclude that Galesic and colleagues' conclusion is attributable to the outcome measure they used (the "scoring artifact" referred to in the abstract here). However, their study comes to the same conclusion - better understanding with natural frequencies - when using the same outcome measure. They then applied more stringent outcome measures for "correct" answers, but the number of people scoring correct answers was too small to allow for any meaningful conclusion. For their two studies, as well as for Galesic M, 2009, both methodological detail and data are thin. Neither the original study nor this replication and expansion provides "the answer" to the questions they address.
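
      For readers unfamiliar with the format at issue, here is a small worked illustration with made-up numbers (not from any of the cited studies) of the same information expressed as natural frequencies and as a percentage:

      ```python
      # Hypothetical screening example; prevalence, sensitivity, and false-positive
      # rate are assumptions for illustration only.
      population = 1_000
      prevalence = 0.01            # 1% have the condition (assumed)
      sensitivity = 0.80           # 80% of those with it test positive (assumed)
      false_positive_rate = 0.10   # 10% of those without it also test positive (assumed)

      with_condition = population * prevalence                               # 10 of 1,000
      true_positives = with_condition * sensitivity                          # 8 of those 10
      false_positives = (population - with_condition) * false_positive_rate  # 99 of 990

      ppv = true_positives / (true_positives + false_positives)
      print(f"Natural frequencies: {true_positives:.0f} of "
            f"{true_positives + false_positives:.0f} positive tests are true positives")
      print(f"As a percentage (positive predictive value): {ppv:.0%}")
      ```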


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Aug 30, Hilda Bastian commented:

      Although the authors draw conclusions here about cost and effectiveness of simply offering badges if certain criteria are met, the study does not support these claims. There are, for example, no data on costs for the journal, peer reviewers, or authors. Any conclusions about effectiveness are hampered by the study's design, and the lack of consideration and assessment of any potentially negative repercussions.

      It was not possible for the authors to study the effects of offering badges alone, as badges were part of a complex intervention: a package of 5 co-interventions, announced by the journal in November 2013 to begin taking effect from January 2014 (Eich E, 2014). All were designed to improve research transparency and/or reproducibility, and signaled a major change in editorial policy and practice. Any manuscript accepted for publication after 1 January, while being eligible for these badges, was also subject to additional editorial requirements for authors and reviewers. All authors submitting articles from 2014 faced additional reproducibility-related questions before submission, including data disclosure assurances. Other authors have shown that although these did not all lead to the changes sought, there was considerable impact on some measures (Giofrè D, 2017).

      Data on the impact on submissions, editorial rejections, and length of time until publication of accepted articles are not provided in this paper by Kidwell and colleagues. These would be necessary to gain perspective on the burdens and impact of the intervention package. I had a look at the impact on publications, though. The data as collected in this study, and a more extended timeframe based on analysis of dates of e-publication, suggest that the package of interventions led to a considerable drop in the number of articles published (see my blog post, Absolutely Maybe, 2017). The number of articles receiving badges is small. During the year in this study from the awarding of the first badge, it was about 4 articles a month. That number first dropped and has since risen, while the number of publications by Psychological Science has fallen to less than half its rate in the year before this package of interventions was introduced. The result is a substantial increase in the percentage of badged articles, while the absolute number of compliant articles remains small.
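
      Rough, hypothetical arithmetic makes the denominator effect clear (the assumed monthly counts are illustrative only; the observed counts are in the blog post):

      ```python
      # Illustrative only: the monthly output figures below are assumptions.
      badged_per_month = 4     # roughly the number of badged articles a month in the study period
      output_stable = 60       # assumed monthly output had publication volume not fallen
      output_halved = 28       # assumed monthly output after a drop to less than half

      print(f"Share if output had stayed stable: {badged_per_month / output_stable:.0%}")
      print(f"Share with halved output:          {badged_per_month / output_halved:.0%}")
      # The percentage roughly doubles while the absolute count of badged articles is unchanged.
      ```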

      Taken together, it appears likely that there was a process of "natural selection", on the side of both the journal and authors, leading to more rigorous reporting and sharing among the reduced number of articles reaching publication. The part that badges alone played in this is unknowable. Higher rates of compliance with such standards have been achieved without badges at other journals (see the blog post for examples). There are some data to suggest that disinclination to data disclosure is part of a range of practices adopted together more by some psychology researchers than others, in one of the studies that spurred Psychological Science to introduce these initiatives (PMID: 26173121). The data in Giofrè D, 2017 tend to support the hypothesis that there is a correlation between some of the data disclosure requirements in the co-interventions, and data-sharing (see my follow-up blog post).

      In addition to not considering a range of possible effects of the practices, and not being able to isolate the impact of any one of the set of co-interventions, the study used only one data extractor and coder for each article. This is a particularly critical potential source of bias, as assessors could not be blinded to the journals, and the badging intervention was developed and promoted from within the author group.

      It would be useful if the authors could report in more detail what was required for the early screening question of "availability statement, yes or no". Was an explicit data availability statement required here, whether or not there were indeed additional data beyond what was included in the paper and its supplementary materials?

      It would be helpful if the authors could confirm the percentage of articles eligible for badges, where the offer of a badge was rejected.

      At the heart of this badge approach for closed access journals is a definition of "open-ness" that enables potentially serious limitation of the methodological information and key explanatory data available outside paywalls. In de-coupling the part of the study included in the paper from the study's data, and allowing the paper to be inaccessible to many who could potentially use it or offer useful critique, the intervention promotes a limited form of open-ness. The trade-off assumed is that this results in more open-ness than there otherwise would be. However, it may have the reverse effect - for example, if it encourages authors to think full open access doesn't matter and can be foregone with pride and without concern, or if journals come to believe this "magic bullet" is an easy way out of more effective, intensive intervention.

      Disclosures: I have a long relationship with PLOS (which has taken a different approach to increasing openness), including blogging at its Blog Network, and am a user of the Open Science Framework (which is produced by the group promoting the badges). My day job is at NCBI, which maintains literature and data repositories.

      This comment was updated on 1 September with the two references and data on the question of the correlation between data disclosure and data sharing, after John Sakaluk tweeted the Giofrè paper to me.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Dec 07, Joanne Kamens commented:

      Thank you for this helpful set of definitions and clarifications. Common language will make the discussion more productive. I was prompted by an excellent blog by Hilda Bastian (http://blogs.plos.org/absolutely-maybe/2016/12/05/reproducibility-crisis-timeline-milestones-in-tackling-research-reliability/) and the subsequent Twitter conversation to mention that these definitions don't address, or even seem to mention, the potential influence or use of reagent/materials reproducibility. Experimental results and interpretation can be dramatically enhanced by the use of the correct standards, materials, and/or reagents to reproduce a study. Protocol and methods sections alone are not sufficient to account for this, as some reagents are not easily remade and are not always validated as being the same (unless subjected to quality control via repository storage or standard validation).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Jun 11, Hilda Bastian commented:

      This is a vitally important issue to address, but the data presented in this study do not support the authors' conclusion that disclosure of information is decreasing.

      The study is characterized by a steeply declining availability of data for comparison - in both the response rate and amount of data provided by those clinics responding. From the first to the last year studied:

      • The response rate dropped from 65% to 31%.
      • The percentage of consent forms referring to other information sheets as the vehicles for informing women/couples rose from 55% to 100%.
      • The percentage of those information sheets available for the study dropped from 82% to 18%.

      This suggests that a principal finding, based on a minority of clinics in 2014, is a shift towards providing supplementary information to consent forms, rather than consent forms as the sole formal vehicle of disclosure. If those information sheets were available - and 82% were not in 2014 - the conclusion of this study could be very different.

      It would be useful to know if the consent forms indicated that the person signing had been provided with the supplementary information, as part of the formal disclosure. Clarification from the authors would also be useful on whether data from information sheets was included in the results Table, or whether only consent forms themselves were the source.

      If both types of consent documents are included, then for 2014, complete consent documents were available for only 2 of 35 clinics (6%), compared with 9 of 17 clinics in 1991 (53%). And the increased coverage of items in the earlier years could be attributable to the enlarged scope of materials assessed.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Jun 12, Hilda Bastian commented:

      This is a helpful broad-brush update on randomized controlled trials (RCTs) of peer review interventions in biomedical journals (see the older review Jefferson T, 2007 and my comment on that review). However, while the authors list several limitations, including restricting the review to RCTs and to biomedical journals, there are other limitations that, in turn, highlight the impact of those restrictions.

      One of those is the range of outcomes addressed here. The focus is explicitly on the peer reviews themselves and the process, not on wider outcomes, such as potential benefits and harms to peer reviewers or the impact of policies such as open review on journals (e.g. the level of unwillingness to review).

      In particular, the issue of harms brings us back to the limitations of looking only at RCTs, of restricting to the biomedical literature, and of the limited scope of databases searched. The authors provide no rationale for limiting the review to biomedical publications. Given that there are so few eligible studies within the scope of this review, moving past this is essential. (In a blog post on anonymity, openness, and blinding of peer review in March 2015, in addition to the 11 RCTs identified in this systematic review, I identified a further 6 comparative studies, as well as other types of studies relevant to the questions around which known concerns exist.)

      Peer reviewers are not just a means to an end: biases of peer reviewers can have a major impact on the careers of others, and at a minimum, specifically addressing gender, seniority, and institutional/country/language impact is critical to further work on this topic. A more contextual approach is needed to grapple with the complex ecosystem involved here.

      A final point is less likely to have had an impact, but is worth consideration by others working on this issue. Limiting the search strategy to the single term "peer review" may have an impact on searches, as terms such as open review and post-publication review become more widely used. Terms such as manuscript review, editorial review, and peer reviewers could also be considered in constructing a search strategy in this area. (To identify the studies to which I refer above, Google Scholar and a wider range of search terms were necessary.)

      (Disclosure: Part of my job includes working on PubMed Commons, which does not allow anonymous commenting.)


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Jul 28, Isabelle Boutron commented:

      We would like to thank Hilda Bastian for her interest in our work. We fully agree that our systematic review has some limitations, and we acknowledged most of them in the paper. We also fully agree that the peer review system is a complex system, that we need different approaches to explore it, and that other study designs are also important to tackle this issue. We focused on randomised controlled trials as they provide a high level of evidence, and one important result of this systematic review is the appalling lack of randomised controlled trials in this field. Despite huge human and financial investments in the peer review process, and its essential role in biomedical research, only 7 RCTs have been published over the last 10 years. Yet the conduct of randomised controlled trials in this field does not raise any important ethical or methodological concerns. These results should be a call to action for editors to facilitate the conduct of research in this field and give access to their data.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2016 Sep 16, Hilda Bastian commented:

      There are many important issues raised in this paper on which I strongly agree with John Ioannidis. There is a lot of research waste in meta-analyses and systematic reviews, and a flood of very low quality, and he points out the contributing factors clearly. However, there are some issues to be aware of in considering the analyses in this paper on the growth of these papers, and their growth in comparison with randomized and other clinical trials.

      Although the author refers to PubMed's "tag" for systematic reviews, there is no tagging process for systematic reviews, as there is for meta-analyses and trials. Although "systematic review" is available as a choice under "article types", that option is a filtered search using Clinical Queries (PubMed Help), not a tagging of publication type. Comparing filtered results to tagged results is not comparing like with like in 2 critical ways.

      Firstly, the proportion of non-systematic reviews in the filter is far higher than the proportion of non-meta-analyses and non-trials in the tagged results. Secondly, full tagging of publication types for MEDLINE/PubMed takes considerable time, so for a recent year the gulf between filtered and tagged results widens further. For example, as of December 2015, when Ioannidis' searches were done, the tag identified 9,135 meta-analyses; today (15 September 2016), the same search identifies 11,263. For the publication type randomized controlled trial, the number tagged increased from 23,133 in December to 29,118 today.

      In the absence of tagging for systematic reviews, the more appropriate comparison uses filters for both systematic reviews and trials as the base for trends, especially for a year as recent as 2014. Using the Clinical Queries filters for both systematic reviews and therapy trials (broad), for example, shows 34,126 for systematic reviews and 250,195 for trials. Page and colleagues estimate there were perhaps 8,000 actual systematic reviews according to a fairly stringent definition (Page MJ, 2016), and the Centre for Reviews and Dissemination added just short of 9,000 systematic reviews to its database in 2014 (PubMed Health). So far, the Cochrane Collaboration has around 38,000 trials in its trials register for 2014 (searching on the word trial in CENTRAL externally).
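
      (For readers who want to check these moving targets themselves, here is a minimal sketch using NCBI's E-utilities esearch endpoint. The specific queries - meta-analysis[pt], randomized controlled trial[pt], the systematic[sb] filter subset, and the 2014[dp] date limit - are illustrative assumptions, not the exact searches discussed above.)

      ```python
      # Illustrative sketch only: compares counts from PubMed's tagged publication
      # types with the filter-based systematic reviews subset, via the E-utilities
      # esearch API. The queries are assumptions for demonstration purposes.
      import json
      import urllib.parse
      import urllib.request

      EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

      def pubmed_count(query: str) -> int:
          """Return the number of PubMed records matching a query string."""
          params = urllib.parse.urlencode({"db": "pubmed", "term": query, "retmode": "json"})
          with urllib.request.urlopen(f"{EUTILS}?{params}") as response:
              data = json.load(response)
          return int(data["esearchresult"]["count"])

      queries = {
          "meta-analyses (tagged publication type)": "meta-analysis[pt] AND 2014[dp]",
          "RCTs (tagged publication type)": "randomized controlled trial[pt] AND 2014[dp]",
          "systematic reviews (filter subset)": "systematic[sb] AND 2014[dp]",
      }

      for label, query in queries.items():
          print(f"{label}: {pubmed_count(query)}")
      ```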

      The number of systematic reviews/meta-analyses has increased greatly, but not as dramatically as this paper's comparisons suggest, and the data do not tend to support the conclusion in the abstract here that "Currently, probably more systematic reviews of trials than new randomized trials are published annually".

      Ioannidis suggests some bases for some reasonable duplication of systematic reviews - these are descriptive studies, with many subjective choices along the way. However, there is another critical reason that is not raised: the need for updates. This can be by the same group publishing a new version of a systematic review or by others. In areas with substantial questions and considerable ongoing research, multiple reviews are needed.

      I strongly agree with the concerns raised about conflicted systematic reviews. In addition to the issues of manufacturer conflicts, it is important not to underestimate the extent of other kinds of bias (see for example my comment here). Realistically, though, conflicted reviews will continue, building in a need for additional reviewers to tackle the same ground.

      Systematic reviews have found important homes in clinical practice guidelines, health technology assessment, and reimbursement decision-making for both public and private health insurance. But underuse of high quality systematic reviews remains a more significant problem than is addressed here. Even when a systematic review does not identify a strong basis in favor of one option or another, that can still be valuable for decision making - especially in the face of conflicted claims of superiority (and wishful thinking). However, systematic reviews are still not being used enough - especially in shaping subsequent research (see for example Habre C, 2014).

      I agree with Ioannidis that collaborations working prospectively to keep a body of evidence up-to-date are an important direction to go in - and it is encouraging that the living cumulative network meta-analysis has arrived (Créquit P, 2016). That direction was also highlighted in Page and Moher's accompanying editorial (Page MJ, 2016). However, I'm not so sure how much of a solution this is going to be. The experience of the Cochrane Collaboration suggests this is even harder than it seems. And consider how excited people were back in 1995 at the groundbreaking publication of the protocol for prospective, collaborative meta-analysis of statin trials (Anonymous, 1995) - and the continuing controversy that swirls, tornado-like, around it today (Godlee, 2016).

      We need higher standards, and skills in critiquing the claims of systematic reviews and meta-analyses need to spread. Meta-analysis factories are a serious problem. But I still think the most critical issues we face are making systematic reviews quicker and more efficient to do, and to use good ones more effectively and thoroughly than we do now (Chalmers I, 2009, Tsafnat G, 2014).

      Disclosure: I work on projects related to systematic reviews at the NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine), including some aspects that relate to the inclusion of systematic reviews in PubMed. I co-authored a paper related to issues raised here several years ago (Bastian H, 2010), and was one of the founding members of the Cochrane Collaboration.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Feb 01, Hilda Bastian commented:

      The authors raise interesting and important points about the quandaries and complexities involved in updating a systematic review and reporting the update. However, their review of the field, and their conclusion that, of the 250 journals they looked at, only BMC Systematic Reviews has guidance on the process of updating, are deeply flawed.

      One of the 185 journals in the original sample they included (Page MJ, 2016) is the Cochrane Database of Systematic Reviews. Section 3.4 of the Cochrane Handbook is devoted to updating, and updating is addressed within several other sections as well. The authors here refer to discussion of updating in Cochrane's MECIR standards. Even though this does not completely cover Cochrane's guidance to authors, it contradicts the authors' conclusion that BMC Systematic Reviews is the only journal with guidance on updating.

      The authors cite a recent useful analysis of guidance on updating systematic reviews (Garner P, 2016). Readers who are interested in this topic could also consider the broader systematic review community and methodological guidance. Garritty C, 2010 found 35 organizations that have policy documents at least on updating, and many of these have extensive methodological guidance, for example AHRQ (Tsertsvadze A, 2008). Recently, guidelines for updating clinical guidelines have also been published (Vernooij RW, 2017).

      The authors reference some studies that address updating strategies; however, this literature is quite extensive. You can use this filter in PubMed along with other search terms to find studies and guidance: sysrev_methods [sb] (example). (An explanation of this filter is on the PubMed Health blog.)
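
      (As a small illustration of combining that filter with topic terms, a minimal sketch follows; the topic terms and the URL construction are placeholders of my choosing, not a recommended search strategy.)

      ```python
      # Illustrative only: builds an ordinary PubMed search URL that combines the
      # sysrev_methods[sb] filter mentioned above with placeholder topic terms.
      from urllib.parse import urlencode

      topic = '(updating OR update) AND ("systematic review" OR meta-analysis)'
      query = f"({topic}) AND sysrev_methods[sb]"

      print("https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query}))
      ```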

      Disclosure: I work on PubMed Health, the PubMed resource on systematic reviews and information based on them.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Apr 28, Hilda Bastian commented:

      This is an interesting methodological approach to a thorny issue. But the abstract and coverage (such as in Nature) gloss over the fact that the results measure the study method's biases more than they measure scientists on Twitter. I think the method is identifying a subset of people working in a limited range of science-based professions.

      The list of professions sought is severely biased. It includes 161 professional categories and their plural forms, in English only. It was based on a U.S. list of occupations (SOC) and an ad hoc Wikipedia list. A brief assessment of the 161 titles in comparison with an authoritative international list (the United Nations Educational, Scientific and Cultural Organization (UNESCO)'s nomenclature for fields of science and technology, SKOS) shows a strong skew towards social scientists and practitioners of some science-based occupations, and away from medical science, engineering, and more.

      Of the 161 titles, 17% are varieties of psychologist, for example, but psychiatry isn't there. Genealogists and linguists are there, but geometers, biometricians, and surgeons are not. The U.S. English-language bias is a major problem for a global assessment of a platform where people are communicating with the general public.

      Influence is measured in 3 ways, but I couldn't find a detailed explanation of the calculations, or a reference to one, in the paper. It would be great if the authors could point to that here. More detail on the "Who is who" service used, in terms of how up-to-date it is, would be useful as well.

      I have written more about this paper at PLOS Blogs, and point to key numbers that aren't reported on who was excluded at different stages. The paper says that data sharing is limited by Twitter's terms of service, but it doesn't specify what that covers. Providing a full list of proportions for the 161 titles, and descriptions of more than 15 of the communities they found (none of which appear to be medical science circles), seems unlikely to be affected by that restriction. More data would be helpful to anyone trying to make sense of these results, or to extend the work in ways that minimize the biases in this first study.

      There is no research cited that establishes the representativeness of data from a method that can only classify less than 2% of people who are on multiple lists. The original application of the method (Sharma, 2011) was aimed at a very different purpose, so representativeness was not such a big issue there. There was no reference in this article to data on list-creating behavior. There could be a reason historians came out on top in this group: list-curating is probably not a randomly-distributed proclivity.

      It might be possible with this method to better identify Twitter users who work in STEM fields. Aiming for "scientists", though, remains, it seems to me, unfeasible at scale. Methods described by the authors as product-centric (e.g. who is sharing links to scientific articles and/or discussing them, or discussing blogs where those articles are cited), and key nodes such as science journals and organizations seem essential.

      I would also be interested to know the authors' rationale for trying to exclude pseudonyms - as well as the data on how many were excluded. I can see why methods gathering citations for Twitter users exclude pseudonyms, but am not sure why else they should be excluded. A key reason for undertaking this kind of analysis is to understand to what extent Twitter expands the impact of scientific knowledge and research. That inherently means looking to wider groups, and the audiences for their conversations. Thank you to the authors, though, for a very interesting contribution to this complex issue.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 May 06, Hilda Bastian commented:

      The conclusion that implicit bias in physicians "does not appear to impact their clinical decision making" would be good news, but this systematic review does not support it. Coming to any conclusion at all on this question requires a strong body of high quality evidence, with representative samples across a wide range of representative populations, using real-life data not hypothetical situations. None of these conditions pertain here. I think the appropriate conclusion here is that we still do not know what role implicit racial bias, as measured by this test, has on people's health care.

      The abstract reports that "The majority of studies used clinical vignettes to examine clinical decision making". In this instance, "majority" means "all but one" (8 out of 9). And the single exception has a serious limitation in that regard, according to Table 1: "pharmacy refills are only a proxy for decision to intensify treatment". The authors' conclusions are thus related, not to clinical decision making, but to hypothetical decision making.

      Of the 9 studies, Table 1 reports that 4 had a low response rate (37% to 53%), and in 2 studies the response rate was unknown. As this is a critical point, and an adequate response rate was not defined in the report of this review, I looked at the 3 studies (albeit briefly). I could find no response rate in any of the 3. In 1 of these (Haider AH, 2014), 248 members of an organization responded. That organization currently reports having over 2,000 members (EAST, accessed 6 May 2017). (The authors report that only 2 of the studies had a sample size calculation.)

      It would be helpful if the authors could provide the full scoring: given the limitations reported, it's hard to see how some of these studies scored so highly. This accepted manuscript version reports that the criteria themselves are available in a supplement, but that supplement was not included.

      It would have been helpful if additional important methodological details of the included studies were reported. For example, 1 of the studies I looked at (Oliver MN, 2014) included an element of random allocation of race to patient photos in the vignettes: design elements such as this were not included in the data extraction reported here. Along with the use of a non-validated quality assessment method (using 9 of the 27 components of the instrument that was modified), these issues leave too many questions about the quality rating of included studies. Other elements missing from this systematic review (Shea BJ, 2007) are a listing of the excluded studies and an assessment of the risk of publication bias.

      The search strategy appears to be incompletely reported: it ends with an empty bullet point, and none of the previous bullet points refer to implicit bias or the implicit association test.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 May 26, Hilda Bastian commented:

      These are interesting results, showing the critical importance of transparency about trials of pharmaceuticals. However, the study does not identify the trials it found, or the phases of those trials. It would be helpful if the authors were to release these data, for those interested in the results of this study, anyone interested in doing similar work, and those looking for trials of these particular drugs.

      The abstract reports the number of participants in the unpublished trials. It would be good to also provide the number of participants in the published trials.

      Note: I wrote a blog post about this study and its context.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Jun 30, Hilda Bastian commented:

      The authors state that this advisory "reviews and discusses the scientific evidence, including the most recent studies", and that its primary recommendation is made, "taking into consideration the totality of the scientific evidence, satisfying rigorous criteria for causality". They do not report what evidence was sought and how, or the basis upon which it was selected. There is little in this report to suggest that "the totality of scientific evidence" was considered.

      For example, four reviews of trials are referred to:

      However, the more recent systematic review and meta-analysis within Ramsden CE, 2016 (date of last search March 2015) was not mentioned. Nor are, for example, these systematic reviews: Skeaff CM, 2009; Stroster, 2013; National Clinical Guideline Centre (UK), 2014; Schwingshackl L, 2014; Pimpin L, 2016.

      The AHA advisory includes sections reviewing two specific sources of saturated fat, dairy and coconut oil. Dairy products are a major source of dietary saturated fats. However, no basis for singling out coconut oil is offered, or for not addressing evidence about other, and larger, sources of saturated fats in Americans' diets. The section concludes: "we advise against the use of coconut oil".

      There are three conclusions/statements leading to that recommendation:

      • Eyres L, 2016 "noted that the 7 trials did not find a difference in raising LDL cholesterol between coconut oil and other oils high in saturated fat such as butter, beef fat, or palm oil."
      • "Clinical trials that compared direct effects on CVD of coconut oil and other dietary oils have not been reported."
      • Coconut oil increases LDL cholesterol "and has no known offsetting favorable effects".

      The only studies of coconut oil cited by the advisory to support these conclusions are one review (Eyres L, 2016) - reasonably described as a narrative, not systematic, review by its authors - and 7 of the 8 studies included in that review. The date of the last search in that review was the end of 2013 (with an apparently abbreviated update search, not fully reported, in 2015). Not only is that too long ago to be reasonably certain there are no recent studies, but the review's inclusion and exclusion criteria are also too narrow to support broad conclusions about coconut oil and CVD or other health effects.

      The AHA's first statement - that Eyres et al noted no difference between 7 trials comparing coconut oil with other saturated fats - is not correct. Only 5 small trials included such comparisons, and their results were inconsistent (with 2 of the 3 randomized trials finding a difference). There was no meta-analysis, so there was no single summative finding. The trials in question are very small, none lasting longer than eight weeks, and have a range of methodological quality issues. The authors of the Eyres review caution about interpreting conclusions based on the methodologically limited evidence in their paper. In accepting these trials as a reliable basis for a strong recommendation, the AHA has not applied as rigorous a standard of proof as they did for the trials they designated as "non-core" and rejected for their meta-analysis on replacing dietary saturated fat with polyunsaturated fat.

      Further, even a rapid, unsystematic search shows that there are more participants in relevant randomized trials not included in the Eyres review than there are randomized participants within it. For example: McKenney JM, 1995; Ganji V, 1996; Assunção ML, 2009; Cardoso DA, 2015; de Paula Franco E, 2015; and Enns, 2015 (as well as another published since the AHA's panel finished its work, Shedden, 2017).

      The conclusions of the coconut oil section of the AHA advisory are not supported by the evidence it cites. A high quality systematic review that minimizes bias is required to draw any conclusion about the health effects of coconut oil.

      Disclosure: I have no financial, livelihood, or intellectual conflicts of interest in relation to coconut or dietary fats. I discuss my personal, social, and professional biases in a blog post that discusses the AHA advisory on coconut oil in detail (Bastian, June 2017).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Jun 18, Hilda Bastian commented:

      An assessment of a critical problem, with important conclusions. It would be helpful, though, if the scope of the 4 guidelines were shown. The inclusion criteria are not very specific on this matter, and the citations of the versions of the 4 included guidelines are not provided.

      In addition to the scope, the dates of the guidelines' last searches for evidence (if available), relative to the dates of the systematic reviews, would be valuable. Gauging to what extent systematic reviews were not included because of being out of scope, out of date, or not yet published is important to interpreting these findings. Given how quickly systematic reviews can go out of date (Shojania KG, 2007), the non-inclusion of older systematic reviews may have been deliberate.

      The publisher of the article does not appear to have uploaded Appendix A, which includes the references to the systematic reviews. Further, confusion has been created by linking the citations of the first 44 systematic reviews to the references of the article's text. The end result is that neither the 4 guidelines nor the 71 systematic reviews are identifiable. It would be helpful if the authors would post these 75 citations here.

      Disclosure: I work on PubMed Health, the PubMed resource on systematic reviews and information based on them.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Sep 03, Hilda Bastian commented:

      This is a useful addition on an important topic, and a good resource for other similar search strategies. Given that it is such a long search strategy, it would be useful if the authors could provide a cut-and-paste version. Small point: the KiMS search strategy cited as reference number 27 in the article is actually at reference number 28 (Wessels M, 2016).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    1. On 2017 Oct 28, Hilda Bastian commented:

      It would be useful if the authors could provide detail on two key issues not described in the paper. The first is the method for excluding identified references that were published subsequent to the date of the original searches.

      The second is how eligibility for study inclusion was assessed for the ESM group, and by whom. This is a key outcome measure that is also highly susceptible to bias. A method for reducing this bias, for example, would be assessment by more than one assessor, independent of those conducting the searches and blinded to the search strategy by which the study had been identified.
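
      (To make that suggestion concrete, here is a minimal, hypothetical sketch of blinded, duplicate eligibility screening; every name and field in it is invented for illustration and nothing is taken from the paper.)

      ```python
      # Hypothetical sketch of the bias-reduction idea described above: pool the records
      # retrieved by the different search strategies, deduplicate and shuffle them, and
      # hide the retrieval strategy from two independent assessors; provenance is only
      # re-attached after eligibility decisions are reconciled.
      import random
      from dataclasses import dataclass

      @dataclass
      class Record:
          pmid: str
          title: str
          strategy: str  # e.g. "ESM" or "conventional"; hidden during screening

      def blinded_screening_set(records: list[Record], seed: int = 1) -> list[dict]:
          """Deduplicate by PMID, drop the strategy label, and shuffle the order."""
          unique = {r.pmid: r for r in records}.values()
          blinded = [{"pmid": r.pmid, "title": r.title} for r in unique]
          random.Random(seed).shuffle(blinded)
          return blinded

      def reconcile(decisions_a: dict[str, str], decisions_b: dict[str, str]) -> dict[str, str]:
          """Keep agreements between two independent assessors; flag disagreements."""
          return {
              pmid: (vote if vote == decisions_b.get(pmid) else "refer to third assessor")
              for pmid, vote in decisions_a.items()
          }

      if __name__ == "__main__":
          pool = [
              Record("111", "Trial A", "ESM"),
              Record("222", "Trial B", "conventional"),
              Record("111", "Trial A", "conventional"),  # same study found by both strategies
          ]
          print(blinded_screening_set(pool))
      ```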


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
