341 Matching Annotations
  1. Oct 2017
    1. The WWARN experience suggests that truly useful data sharing platforms must be thought of as long-term, infrastructural investments; they cannot be thrown up as rapid, project-based responses to funder or journal demands.
    2. Targeted calls and prizes could provide further motivation [36,37], but supporters of research may need to earmark a percentage of all funding for capacity building if they wish to redress structural imbalances in biomedical knowledge generation and use.
    3. INDEPTH network,
    4. Alpha Network
    5. the WWARN network published several pooled analyses which established the value of the resource as a source of additional learning. This has encouraged drug developers and global health organisations to begin to request analyses using the database, and has prompted changes to access policies to facilitate use of the resource by all legitimate analysts.
    6. the true potential of data sharing. If that potential is to be achieved, the publication of papers in peer reviewed journals must lose their pre-eminence as a measure of scientific productivity in academia. We believe that depositing data in well-curated, quality-assured databases should be rewarded professionally just as publication of papers in high impact journals now is. The use of data in a pooled analysis that demonstrably changes policy should be rewarded at least as well as a citation in a journal.
    7. By adopting the study group model, which appealed to data contributors, WWARN was able simultaneously to build up the database and to begin to conduct important pooled analyses that have contributed directly to improvements in global policy.
    8. archaic academic norms, coupled with a dysfunctional global health architecture

      well said

    9. Governance structures designed for platforms supporting pooled analysis of post-publication data by academics will not serve the needs of platforms aiming to provide real-time surveillance data.

      diversity of needs & goals

    10. terms
    11. Researchers should plan for sharing, thus reducing costs
    12. Investment in data curation and governance is essential and often substantial

      important for reuse and reusability, among other things

    13. Structural inequities in science must be reduced
    14. People who collect data must be incentivised to share it
    15. Box 1 summarises the characteristics of a data sharing platform that has the potential to increase policy-relevant knowledge.

      useful overview

    16. “The fact that [WWARN] has standardised data so that everyone can learn and analyse with the same tools: that is going to be the future, however much people are conservative and reject it and are afraid of it. This will happen, so let's make it useful.”

      encouraging optimism

    17. An

      this should be "A"

    18. the team has already begun to adapt the data infrastructure, informatics tools, policies and procedures for other diseases; these efforts demonstrate the time and cost savings achieved by building on the WWARN experience.

      efficiency

    19. Infectious Diseases Data Observatory

      They have a website as well: https://www.iddo.org/

    20. The study group model depends on an academic publication incentive which has little value for researchers working in government or medical charities, and which is inimical to the needs of surveillance. However, most interviewees were confident that the structures and tools developed by WWARN for malaria could and should be used as building blocks for shared clinical data platforms for other diseases of poverty.

      incentives

    21. other regulators are expected to follow suit

      Lots of hits for CDISC on the site of the Pharmaceuticals and Medical Devices Agency in Japan, e.g. https://www.pmda.go.jp/english/review-services/reviews/advanced-efforts/0007.html

    22. From 2017, CDISC metadata standards must be used for all data submitted to United States Food and Drug Administration by organisations seeking to register drugs and medical products
    23. Clinical Data Interchange Standards Consortium (CDISC)

      Some reference would have been nice here as well. Website is https://www.cdisc.org/

    24. importance of data discoverability: an informatics issue that was not considered by WWARN's designers in the initial phase, when it was assumed that access for external users would be limited to the summary data shown on the WWARN Explorer. Discoverability - the ability of potential outside users to find the data set and easily understand what it contains - is critical if the data are to be reused by any investigator with a legitimate research question that may be addressed by data held in the resource. WWARN took care to develop data management tools that included a full audit trail for the variables that they standardised.

      tools and standards again

    25. “We have to change the conversation to: data sharing is a given. If you don't want to share you have to take action yourself to opt out, instead of putting the onus onto people to make sharing happen. If we did it the other way around, people would quickly find data sharing is not as bad as they thought and we'd make progress much faster.”

      Important points

    26. In mid-2016, the WWARN board began to discuss changes to the terms of submission that would allow researchers to grant access to their data in perpetuity if they chose, rather than be recontacted for every use. Access requests would be considered by an independent Data Access Committee. This committee was constituted under the auspices of the WHO's Special Programme for Research and Training in Tropical Diseases (TDR) in April 2017

      facilitating reuse

    27. non-

      seems wrong here

    28. What we have seen is that when we have data, some of the data we don't even think have value, [it] may have very great value to other institutions.

      reuse

    29. When the deliverable is data and information generated in a defined timeframe, you make your bet where you're likely to get the most yield. And that's very likely to be a known investigator and a known institution.

      prestige

    30. The tools also helped researchers to collect data in standard formats that would contribute to quality, and that could be easily ingested into the database and analysed.

      yes, the benefits of tools and standards are much more apparent when they go together

    31. parasite clearance estimator

      A reference would have been nice here. I found http://www.wwarn.org/tools-resources/toolkit/analyse/parasite-clearance-estimator-pce as a starting point.

    32. Papers considered core WWARN analyses are tagged “wwarn_core”.

      Where? Not in the WWARN_Papers.ris file.
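
      A quick way to check for such a tag might look like the sketch below. It assumes the tag would be stored as a standard RIS keyword line ("KW  - wwarn_core"), which the paper does not confirm; the function name count_ris_keyword is made up for illustration.

```shell
# Count the lines in a RIS file that carry a given keyword (KW) tag.
# Assumption: the tag, if present, is stored as a "KW  - <keyword>" line.
count_ris_keyword() {
  ris_file=$1
  keyword=$2
  # Strip carriage returns first (RIS exports often use CRLF line endings),
  # then count whole-line (-x), fixed-string (-F) matches.
  # grep exits non-zero when nothing matches, hence the "|| true".
  tr -d '\r' < "$ris_file" | grep -c -x -F "KW  - $keyword" || true
}

# Usage (file name as given in the data supplement):
#   count_ris_keyword WWARN_Papers.ris wwarn_core
```

      If the tag lives in a different RIS field, the pattern would need adjusting accordingly.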

    33. At least two of these papers contributed directly to a change in WHO treatment recommendations [25].

      research affecting policy

    34. “Suddenly, eighty percent of people wanted to be part of something they couldn't do on their own, because they saw the real value there.”

      incentives

    35. the network changed tack, focusing on academic research rather than real-time tracking of resistance

      change of course by readjusting goals

    36. DP: dihydroartemisinin-piperaquine; AS-AQ: artesunate –amodiaquine; AL: artemether-lumefantrine; WHO/TDR

      Not sure the abbreviations were necessary here, given that they are rarely used in the text, and there is enough space in the table, which might be more easily parsable without them.

    37. 25,000

      Any info on the extent to which these patient populations overlap across those studies? I had the same question for the "135,000 individual patients" mentioned in the abstract, but understood that such detail would not fit into an abstract, so did not comment there.

    38. WWARN in most cases chose options that would maximise the likelihood that researchers would contribute data

      that's natural for many databases at the beginning of their existence

    39. necessary

      but not for ever; some temporal component would be valuable here

    40. real-time surveillance

      one of the original goals

    41. free sequencing had been offered to data contributors

      incentives

    42. Data scientists and informatics staff favoured a clear description of the end-use of the database at the design stage; malaria scientists, wishing to maximise the possible uses of this as yet untested resource, preferred to avoid any definition that would foreclose possible future uses. They advocated maximum flexibility, and resisted a tight, purpose-driven design.

      diversity of perspectives again

    43. “It is true that WHO also influenced NMCPs [national malaria control programmes] not to share the data.”

      sad

    44. Virtually all interviewees

      An actual number may have been more useful here. In other words, did only 1 not say it? Or some only said it indirectly?

    45. Researchers were also concerned that the curation process might reveal weaknesses in the data, potentially calling into question published analyses. Interviewees in industry were most worried that re-analysis might yield results that differed slightly from those used in product registration, while some policy-makers had concerns over data ownership.

      well-captured diversity of perspectives

    46. not data contributor

      important to get those voices heard here as well

    47. By far the most common reason given, especially among respondents from malaria-endemic countries, was the fear that WWARN researchers based at Oxford or other well-resourced universities would analyse their data and publish results before they themselves had time to get out of the clinic and write up their findings.

      That's a slowed-down version of the speed argument from above.

    48. The earliest Terms of Submission

      Now not available any more. Public versioning would have been helpful.

    49. This was in part because of the considerable time that it took WWARN secretariat staff to persuade Oxford University lawyers that seven pages of often arcane legal language could be streamlined into a three-page document in plain English, understandable to malaria researchers worldwide.

      And even those three pages use non-standard terms, which requires more lawyers to assess the compatibility of these terms with those of any other database one might wish to combine with WWARN data.

    50. Demographic and Health Survey
    51. one for personal reasons and two because of difficulties getting visas to the UK

      Kudos for being explicit about this. Visa issues affect research in multiple ways, and in cases like these, the information about even the existence of the problem is largely hidden.

    52. Some 46%

      i.e. 22

    53. Table 1. Characteristics of people interviewed for this study.

      The thumbnail preview of the table is irritating in that it suggests there are only three content rows in the table, when there are several times as many.

    54. 18 of these were publications analysing data contained in the WWARN database, which we term "core" WWARN publications.

      I could not infer from WWARN_Papers.ris or from the rest of the paper which these 18 "core" publications were; knowing this would help in understanding the research presented here.

    55. 77 papers

      Further down, the number 78 is given, and it's not entirely clear to me whether the number should be the same in both places.

    56. approximately 13% of all documents into NVivo for detailed coding

      proprietary_documents_coded.tab lists 69 files

    57. More details related to methods are provided in COREQ file at doi: 10.7910/DVN/V1TKIO [20], which follows the COREQ guidelines for reporting qualitative research [23].

      That COREQ file is very useful for human consumption, but it would be useful if it were machine actionable as well, which would make it easier to discover, for instance, when searching for studies involving, say, native Thai speakers, or NVivo 11.3.2. Just saying - I understand this is beyond the scope of the current paper, but I'm involved in efforts to make data management plans machine actionable, which means dealing with these same issues.

      As an aside, the COREQ file states (at the very bottom):

      For reasons of length, the paper reports only the major themes.

      I don't know of restrictions in Wellcome Open Research in terms of article length, and in the paper's introduction, the authors had stated

      here we focus on findings we believe to be of greatest interest to researchers who share data, or are contemplating doing so.

      which seems a more valid reason to leave out some minor themes.

    58. OxTREC Reference: 593-16

      Kudos for listing the identifier. I googled that, which brought me to http://researchsupport.admin.ox.ac.uk/governance/ethics/committees/oxtrec , and under "Approved studies 2016", I found the entry

      593-16 Elizabeth Pisani What makes data sharing work? WWARN case study Minimal Risk N/A 16/5/16

      which is way more transparent than most ethics statements in published papers.

      Also kudos to the Central University Research Ethics Committee (CUREC) for the decision highlighted under the "Approved studies 2016" headline:

      It was agreed by CUREC in early 2016 that, in the interest of transparency, OxTREC should make publicly available a list of studies that it has approved. The list below sets out those studies that have been approved by OxTREC since January 2016.

  2. Sep 2017
    1. 10.7910/DVN/V1TKIO

      It might well have been better to get each of these documents their own unique identifier rather than bundling them all together under this DOI.

    2. NVivo software Version 11.3.2. (QSR International)

      It would be great if software were cited according to the Software Citation Principles: https://doi.org/10.7717/peerj-cs.86 .

    3. All participants signed forms consenting to the recording of the seminar, and the use of the data for the purposes of this study

      Again, the forms are included in Study_protocol.docx , which is probably worth mentioning.

    4. English, French or Indonesian

      kudos for multilinguality

    5. protocol available at doi: 10.7910/DVN/V1TKIO [20]

      kudos for citing (rather than just mentioning) the dataset and especially for including the consent forms (they are in Study_protocol.docx)

    6. database

      Is that available somewhere? That curation effort - if shared - could well form the basis of reuse. For an example, see https://github.com/Daniel-Mietchen/ideas/issues/491 .

    7. Documents considered to be highly relevant by either investigator were processed as below.

      Somewhere near here, it would be good to mention that the list of historic records included in the analysis is available as proprietary_documents_coded.tab from the data supplement.

    8. Public Health Research Data Forum
    9. patient and parasite-level data

      what about the vectors?

    10. was

      and still is, I presume

    11. WorldWide Antimalarial Resistance Network (WWARN)

      Since they do have a website, might as well mention it: http://www.wwarn.org/

    12. rests on the belief that data shared will become data reused

      That's not a strong foundation to rest on - sharing data does not mean that reuse will occur. For some data, no reuse will occur within a given time frame, while for other data, reuse may well occur even when the way it was shared did not make reuse easy.

      The point here should rather be that sharing data with reuse in mind (e.g. following the FAIR principles) can facilitate reuse, and such reuse can contribute to "speeding up discoveries". When speed is a concern, the timeliness of sharing should also be considered.

    13. WAARN

      This should be "WWARN".

  3. Aug 2017
  4. Nov 2016
    1. This will give you html results tables in the project folder.

      This took about 15 min (leading to HTML for 609 papers).

    2. norma --project <your folder of papers> -i fulltext.xml -o scholarly.html --transform nlm2html

      I used

      norma --project zika-20161116 -i fulltext.xml -o scholarly.html --transform nlm2html

      This took about 5 min (for 666 papers).
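
      To see how many papers actually received an HTML rendering, the output files can be counted. This is a minimal sketch, assuming the project folder holds one subfolder per paper with fulltext.xml as input and scholarly.html as norma's output; the helper name count_files is hypothetical.

```shell
# Count files with a given name anywhere below a project folder.
# Assumption: one subfolder per paper, each holding fulltext.xml and,
# after the norma run, scholarly.html.
count_files() {
  project_dir=$1
  file_name=$2
  # tr strips the padding that some wc implementations add around the count.
  find "$project_dir" -type f -name "$file_name" | wc -l | tr -d ' '
}

# Usage with the folder from the run above:
#   count_files zika-20161116 fulltext.xml
#   count_files zika-20161116 scholarly.html
```

      If the two counts differ, some papers were not transformed.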

    3. form

      Typo

    4. example

      Another example:

      getpapers -q zika -o zika-20161116 -x
      info: Searching using eupmc API
      warn: We had to retry the last request 2 times.
      info: Found 683 open access results
      Retrieving results [==============================] 100% (eta 0.0s)
      info: Done collecting results
      info: Saving result metadata
      info: Full EUPMC result metadata written to eupmc_results.json
      info: Individual EUPMC result metadata records written
      info: Extracting fulltext HTML URL list (may not be available for all articles)
      info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
      warn: Article with pmcid "PMC4931661" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC5038420" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC5038412" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4998506" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4800905" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4800906" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4998507" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4773937" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC5058489" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4344295" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC5018718" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4773938" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC5054515" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4928527" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC4991975" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC3854913" was not Open Access (therefore no XML)
      warn: Article with pmcid "PMC3019486" was not Open Access (therefore no XML)
      info: Got XML URLs for 666 out of 683 results
      info: Downloading fulltext XML files
      Downloading files [==============================] 100% (666/666) [45.6s elapsed, eta 0.0]
      info: All downloads succeeded!

    5. getpapers -q <your query> -o <a folder to save them in> -x

      When I ran this, I got

      -bash: getpapers: command not found

      even though I had just installed getpapers successfully as per the getpapers section above.

      Re-running

      npm install --global getpapers

      fixed this.
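
      Before reinstalling, it may help to check whether the shell can see the command at all. The sketch below uses only the POSIX `command -v` builtin; the function name check_on_path is made up for illustration.

```shell
# Report whether a command is reachable through the current PATH.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

# Usage:
#   check_on_path getpapers
# If it is reported as missing right after a global npm install, the
# directory that npm links global commands into may not be on PATH;
# on Unix-like systems that is usually "$(npm prefix -g)/bin".
```

      I did not investigate why the first install was not picked up, so this only narrows the problem down.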

    6. from github (the .zip) here

      Should be consistent with the norma section above.

    7. Add the bin directory that you unzipped to your path

      i.e. something like

      export PATH=$PATH:~/contentmine/bin
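
      When that line goes into a shell startup file, it appends the directory again on every new shell. A slightly more defensive variant could look like this sketch; path_append is a hypothetical helper, and ~/contentmine/bin is just the example location from above.

```shell
# Append a directory to PATH only if it is not already listed.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$PATH:$1" ;;     # otherwise add it at the end
  esac
  export PATH
}

# Usage:
#   path_append "$HOME/contentmine/bin"
```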

    8. .zip

      This should be linked. I found two .zip files: https://github.com/ContentMine/norma/releases/download/v0.2.26/norma-0.1-SNAPSHOT-bin.zip and https://github.com/ContentMine/norma/archive/v0.2.26.zip . Since this is under the norma heading and referred to as binary release, I assume I should go for the former of these two.

    9. ami

      link to explanation

    10. norma

      again, some link to an explanation would be useful

    11. G

      Consistent spelling makes it easier to find one's way around.

    12. curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.31.0/install.sh | bash
        nvm install node

      This worked for me, though it required restarting the terminal.
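
      The restart can probably be avoided by loading nvm into the running shell. A minimal sketch, assuming the install script placed nvm in its default location (~/.nvm) unless NVM_DIR says otherwise; load_nvm is a hypothetical wrapper name.

```shell
# Load nvm into the current shell session instead of restarting the terminal.
# Assumption: nvm lives in $NVM_DIR, defaulting to ~/.nvm.
load_nvm() {
  NVM_DIR="${NVM_DIR:-$HOME/.nvm}"
  export NVM_DIR
  if [ -s "$NVM_DIR/nvm.sh" ]; then
    # Sourcing nvm.sh defines the nvm shell function in this session.
    . "$NVM_DIR/nvm.sh"
  else
    echo "nvm.sh not found in $NVM_DIR" >&2
    return 1
  fi
}

# Usage:
#   load_nvm && nvm install node
```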

    13. Node

      Would be useful to have some description of what Node is and does and what role it plays in getpapers.

    14. getpapers

      Would be useful if this linked to some description of getpapers

    1. POLICY

      The policy does not mention metadata, but it should, and metadata should basically be shared on an ongoing basis rather than just at the end of the project.

    2. Studies funded through all other research Funding Announcements

      Why not require these to go into a repository as above?

    3. Maintenance of the Full Data Package for Data Sharing for a period of at least seven (7) years;

      Below comes the requirement for the Full Data Package to be deposited in a repository with long-term archiving, so perhaps clarify here that, for some projects, the package has to be additionally maintained locally for at least seven years.

    4. The Full Data Package must be maintained in a selected repository for a period of at least seven (7) years following acceptance by PCORI of the final research report.

      Perhaps worth thinking about scenarios where that repository might disappear before those seven years (or even after) and to put these thoughts into some sort of guideline that could be referenced here.

    5. PCORI will make the Full Protocol publicly available

      Why not the final progress report as well?

    6. upon completion of the study

      at the latest

    7. The plan should include at a minimum each of the components specified below

      The policy could encourage these components to be identified in a machine-actionable manner, as outlined in http://www.slideshare.net/StephanieSimms/making-dmps-actionable-and-public

    8. patient consent

      and ethical review/ approval

    9. a documented data management plan

      Probably worth explicitly mentioning that the plan should be updated as the project moves forward, and be maintained with a complete version history.

    10. As outlined below, such plan will be required if an award is made.

      Here, the timing should be indicated in some way. For instance, the data sharing plan should be produced "in Month 1" or "by Month 3" of the project at the latest.

    11. will not be required at the application stage.

      Good move.

    12. Analyzable

      It's not really analyzable if not shared with the protocols and code that produced it. I see why you'd want to distinguish it from the Full Data Package, so I suggest using a different qualifier word for the data set.

    1. ZIKV homologs of drug targets that have been well-validated in research against dengue and hepatitis C viruses, such as NS5 and Glycoprotein E

      Entity names (e.g. for species, proteins, drugs) could all be communicated more effectively if they were linked to a controlled vocabulary with persistent identifiers.

    2. Table of protein structures, PDB, and models to be used as docking targets.

      The actual core of the project. As is all too common, it is buried here in the supplement, when it should be in the center of attention.

    3. DOCX

      Not a great format for sharing information.

    4. Information on OpenZika.

      A caption with a bit more detail would be more helpful to the reader to decide whether to download the file.

    5. Supporting Information

      As mentioned in comments above, I think this article would have benefited from the supporting information being integrated into the main body.

    6. Concern around intellectual property ownership and whether companies will develop drugs coming from effort

      Would have liked to see this discussed a bit.

    7. Over 715,000 volunteers

      Some of them can be found in the forums, e.g. at http://www.worldcommunitygrid.org/forums/wcg/listthreads?forum=720 .

      One volunteer describes his motivation for contributing in this video: https://www.youtube.com/watch?v=rqEvcFm7pP8

    8. We invite any interested researcher to join us (send us your models or volunteer to assay the candidates we identify through this effort against any of the flaviviruses),

      Collaboration. Reminds me of

      What if everyone in the world were in your lab – a ‘hive mind’ of sorts, but composed of countless creative intellects rather than mindless worker ants, and one in which resources, reagents and effort could be shared, along with ideas, in a manner not dictated by institutional and geographical constraints?

      from https://doi.org/10.1242/dmm.003285

    9. machine learning models (S1 Text, S1 References)

      Those machine learning models are not explained in the supplementary files, but some examples of their use in drug discovery are available here in PLOS NTD, e.g. via http://dx.doi.org/10.1371/journal.pntd.0003878 .

    10. as well as the FDA-approved drugs and the NIH clinical collection, using AutoDock Vina and the homology models and crystal structures (S1 Table, S1 Text, S1 References)

      Citing these three supplementary files all in a row here does not help the reader in finding what information they are referenced for.

    11. docking against the crystal structure of a related protein from a different pathogen can sometimes discover novel hits against the pathogen of interest [8].

      Another argument for data sharing.

    12. The computational and experimental data produced will be published as quickly as possible.

      This declaration of intent fits well with a similar statement made by numerous funders and publishers in February: http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-emergencies/index.htm .

      Would be good to see the data management plan for OpenZika.

    13. OpenZika results are also available upon request

      Why only upon request and not in a more general way?

    14. will be made open to the public on our website (http://openzika.ufg.br/experiments/#tab-id-7

      This has actually started to happen - watch out for "Results" tabs on that page.

    15. wasted

      I don't think that's the right word here; if the computer is dormant, it consumes less energy, and if it is dormant a lot, it may be available for a longer period of time for its primary purpose, which may not require permanent on-time.

    16. We have already prepared the docking input files for ~6 million compounds from ZINC (i.e., the libraries that ALP previously used in the GO Fight Against Malaria project on World Community Grid), which are currently being used in the initial set of virtual screens on OpenZika.

      Nice example of reuse.

    17. S2 Table

      This table too would have been more useful here in the main text than in the supplement.

    18. Download

      Figures could be more reusable if they were shared as SVG rather than in bitmap formats.

    19. Competing interests

      My bias: I have been running the OpenZika software on my machine since May.

    20. two cryo-EM structures and 16 crystal structures of five target classes (S1 Table). These structures, alongside the homology models, represent potential starting points for docking-based virtual screening campaigns to help find molecules that are predicted to have high affinity with ZIKV proteins.

      Here, it would have been nice to have a synthesis of the insights gained from those 18 structures in terms of (accelerating) Zika virus drug discovery.

    21. S1 Table)

      Again, relegating this to the supplement is not helpful.

    22. using freely available software [6] (S1 Table). These were made available online on March 3, 2016. We also predicted the site of glycosylation of glycoprotein E as Asn154, which was recently experimentally verified [7].

      Different ways of sharing along the research cycle. We need more of that.

    23. Abstract

      I could not highlight the CrossMark symbol but when I clicked it, I got "Document is current".

    24. Subject Areas

      These could do with links to a controlled vocabulary with persistent identifiers.

    25. Ekins S, Perryman AL, Horta Andrade C

      The ancient practice of abbreviating author names could well do with an update.

    26. Creative Commons Attribution License,

      Mentioning the version (4.0, as per the link) would be useful here.

    27. OpenZika: An IBM World Community Grid Project to Accelerate Zika Virus Drug Discovery

      This is my first attempt at using Hypothes.is for a journal club.

      I am reading the article with a focus on aspects of data sharing and scholarly communication around Zika virus.

    28. (S1 References

      Is relegating these references into that additional file really useful?

    29. I

      Shouldn't this be "we"?

    30. From Innovation to Application

      The article type "From Innovation to Application" is described at http://journals.plos.org/plosntds/s/other-article-types#loc-from-innovation-to-application :

      These short articles (1,000 words, 10 references) discuss new technologies, such as drugs, vaccines, and diagnostics, relevant to NTDs. Authors are asked to take an objective and critical view, and they should include a box that lists up to 3 advantages and 3 disadvantages of the new technology. We will ask for a second box or table depending on what kind of tool is described (for example, if the tool is a new diagnostic tool, we will ask for a table that gives the sensitivity and specificity of the new tool compared with the existing gold standard). Authors with competing interests related to the technology (e.g., financial ties) will not be allowed to write for this section. We encourage all authors to include a display item (a figure, photo, or illustration).

  5. Aug 2016
    1. 4

      This number 4 seems to appear out of nowhere. Either drop it or make it more obvious what its role is supposed to be. If properly engineered, persistent identifiers for these pages could be a lightweight tool to introduce the concept of open PIDs, as per https://hypothes.is/a/1Ta8dmkGEeaVvnsT43LJqA .

    2. Can't wait? Contact us!

      Somewhere on the page, there should be a license statement for it.

    3. Martin Fisch - The audience is shaking

      Linking and attribution is good, but the license should be mentioned as well.

  6. Apr 2016
    1. 39 Comments

      It's good to have so many comments (and please keep them coming), but they are not necessarily efficient to handle that way.

      Many comments address multiple passages or elements of the draft, but where a mapping between comments and text passages is possible, I propose that we use annotations like https://hyp.is/AVQW11OTH9ZO4OKSlvBo/wiki.surfnet.nl/display/OSCFA/Amsterdam+Call+for+Action+on+Open+Science to do that mapping.

    2. Open science is about the way researchers work, collaborate, interact, share resources and disseminate results.
    1. new solutions for societal challenges

      If TDM is limited to non-commercial and/or academic use, then addressing societal challenges basically excludes contributions from outside the non-commercial and/or academic realm, i.e. from large parts of society.

    2. preferably

      Delete "preferably". Limiting the scope of text mining to exclude societal and commercial purposes limits the usefulness to enterprises (especially SMEs that cannot mine on their own) as well as to society. These limitations have ramifications in terms of limiting the research questions that researchers can and will pursue.

    3. Encourage researchers not to transfer the copyright on their research outputs before publication.

      This statement is more generally applicable than just to TDM. Besides, "Encourage" is too weak a word here, and from a societal perspective, it would be far better if researchers were to retain their copyright (where it applies), but make their copyrightable works available under open licenses that allow publishers to publish the works, and others to use and reuse them.

    4. preferably

      Delete, as per the comment above on the same word.

  7. Oct 2015