18 Matching Annotations
  1. Jun 2021
    1. Web Application for Discovering Similar Preprints and Journals

      I'm surprised to see no citation given to JANE: https://jane.biosemantics.org/

      https://academic.oup.com/bioinformatics/article/24/5/727/202224

      reviewed in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6300233/

      The 'find journals' functionality of JANE appears somewhat similar to the similar-journal discovery functionality presented here.

    2. text changes

      Text changes relative to the journal published version? You might want to make that more explicit. 'Text changes' alone is not adequately specific, in my opinion.

    3. Document embeddings derived from bioRxiv reveal fields and subfields

      I would cut this entire subsection from the manuscript to make it shorter (or relegate it to a supplementary section).

      Don't we already know that full texts can be used to determine a paper's subfield? It's not that interesting, in my opinion, and not relevant to the main hypotheses of the paper -- comparing preprints with their journal published versions.

    4. Most preprints are eventually published

      Adjusting for recency? i.e. not sampling 2019 preprints? In figure C the line indicates (if I'm interpreting correctly) that overall only 46.55% are published, but that's because it includes very recent preprints that haven't had time to be journal published yet. Just be explicit that you are adjusting for recency (i.e. excluding 2019 and newer preprints) when you say that most preprints are eventually published.
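
      For instance, a minimal pandas sketch of that adjustment (the file name and the columns `posted_date` and `published_doi` are hypothetical stand-ins, not the authors' actual schema):

      ```python
      import pandas as pd

      # Hypothetical recency adjustment: drop preprints posted in 2019 or later,
      # which may simply not have had time to reach journal publication yet.
      preprints = pd.read_csv("biorxiv_preprints.csv", parse_dates=["posted_date"])

      older = preprints[preprints["posted_date"].dt.year < 2019]
      publication_rate = older["published_doi"].notna().mean()
      print(f"Publication rate, pre-2019 preprints only: {publication_rate:.2%}")
      ```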

    5. Furthermore, we found that specific changes appeared to be related to journal styles: “figure” was more common in bioRxiv while “fig” was relatively more common in PMCOA

      This is the kind of journal-faff difference that I hypothesise would not be visible, or would be less visible, if one did an analysis of preprints vs author manuscripts.

      There is change, but is changing 'figure' to 'fig.' to suit journal style actually helpful or valuable? In my opinion it is not!

    6. a limited number of cases in which authors appeared to post preprints after the publication date

      I don't suppose you could possibly be precise about this rather than just 'a limited number'? Is it 5, 50, or 500?

    7. this change will prevent bioRxiv from automatically establishing a link

      Hmmm... not a problem of this manuscript, but that's really not good enough from bioRxiv, is it? Change one word of a long and complex title and suddenly 'oh, we can't do it'. A comment suggesting that bioRxiv could do better would be fun, no? i.e. look at author names AND title, and if both are similar enough, do the auto-linking.
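
      Something like this hypothetical heuristic (the function name, thresholds, and example data are all made up for illustration):

      ```python
      from difflib import SequenceMatcher

      def likely_same_paper(preprint_title, journal_title,
                            preprint_authors, journal_authors,
                            title_threshold=0.9, author_threshold=0.5):
          """Hypothetical auto-linking heuristic: link a preprint to a journal
          article when the titles are similar AND enough authors overlap."""
          title_sim = SequenceMatcher(None, preprint_title.lower(),
                                      journal_title.lower()).ratio()
          shared = set(preprint_authors) & set(journal_authors)
          author_overlap = len(shared) / max(len(journal_authors), 1)
          return title_sim >= title_threshold and author_overlap >= author_threshold

      # One changed word in a long title no longer breaks the link:
      print(likely_same_paper(
          "Examining linguistic shifts in preprints",
          "Examining the linguistic shifts in preprints",
          ["Smith", "Jones", "Lee"],
          ["Smith", "Jones", "Lee"]))  # True
      ```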

    8. 100 most frequently occurring tokens

      I'm sure you've got this in the GitHub repo, but just to make the manuscript more readily understandable without digging around in GitHub, do you think you could provide, as a supplementary file, a list of those 100 most frequently occurring tokens, so that people can get a better feel for what the data is here? Generating it should be trivial -- see the sketch below.
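
      A minimal sketch (`all_tokens` is a stand-in for the paper's actual corpus-wide token stream):

      ```python
      from collections import Counter

      # all_tokens stands in for the corpus-wide token stream used in the paper
      all_tokens = ["figure", "fig", "cell", "figure", "protein", "cell", "figure"]

      # The 100 most frequently occurring tokens and their counts
      for token, count in Counter(all_tokens).most_common(100):
          print(f"{token}\t{count}")
      ```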

    9. 326 spaCy-provided stopwords

      To be clear, 326 stopwords is the default setting?

      Interestingly, 'ca' is one of those 326 stopwords. I would have thought that one might actually be significant in a life sciences context, e.g. calcium channels ("Ca2+").
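
      For anyone who wants to check, the default list ships with spaCy and can be inspected directly (a minimal sketch; the count of 326 matches spaCy v2.x-era releases and may differ in other versions):

      ```python
      from spacy.lang.en.stop_words import STOP_WORDS

      print(len(STOP_WORDS))     # 326 in spaCy v2.x-era releases
      print("ca" in STOP_WORDS)  # True: 'ca' appears because spaCy
                                 # tokenises "can't" as "ca" + "n't"
      ```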

    10. 23,271 preprint-published pairs, only 17,952 pairs

      From the perspective of a person (me!) interested in open access to ('final') peer-reviewed research outputs, this is a super interesting result in itself, which should perhaps be remarked upon more in this manuscript.

      It implies that for over 77% (17,952/23,271) of biomedical preprints that are detectably linked to a journal published paper, the subsequent journal published version became open access in the PMCOA corpus (regardless of the specific means/route). That's great news. The subset of works from biomedical researchers who do preprinting has a much higher level of open access (to the eventual journal published version) than biomedical research overall (including works that don't have a preprint version).

      See figure 3a from 'Open access levels: a quantitative exploration using Web of Science and oaDOI data' by Bosman and Kramer (https://peerj.com/preprints/3520.pdf) for a comparator looking at OA levels in biomedical and life science papers: even in the 'best' OA-performing subfield (Cell Biology) it doesn't reach 70%; 30% to 50% is more typical, albeit looking at 2016 publications.

    11. Access to the New York Times Annotated Corpus (NYTAC) is available upon request with the Linguistic Data Consortium at https://catalog.ldc.upenn.edu/LDC2008T19.

      I think this is insufficient information.

      It should be more clearly highlighted that the NYTAC is proprietary data and that access may require a fee of $150-300 for non-members of the Linguistic Data Consortium. To say merely "is available upon request" and nothing else is not quite true to my eyes - please warn that access may require payment, depending on one's institutional affiliation (or lack thereof).

    12. tagged

      minor typo: tagging surely

    13. Since these manuscripts have already been peer-reviewed, we excluded them from our analysis as the scope of our work is focused on examining the beginning and end of a preprint’s life cycle

      Hmmm... this is a pity. Accepted Author Manuscripts (AAMs) in one sense do represent the 'end' of a preprint's life cycle -- they are the final version after peer review but before minor copyediting and journal re-styling. A comparison between preprints and AAMs would more strictly measure the contribution of the peer-review process, independent of copy-editing and minor re-styling. But hey, too late to change that decision. One for future analyses perhaps?

    14. or not participate at all

      Presuming a journal allows individual articles to be published with a CC BY licence under a so-called 'hybrid OA' option, can the journal really NOT participate for those CC BY licensed articles? If an article is biomedically relevant and CC BY licensed, surely PMC takes that content at the article level, and thus it's debatable whether journals really have the power to 100% not participate.

    15. PMC articles can be closed access ones from research funded by the NIH appearing after an embargo period or be published under Gold Open Access [40]

      Actually, there need not necessarily be an embargo period. Many publishers now offer a zero-day embargo so that the author accepted manuscript can be deposited either at acceptance (before journal publication, even!) or on the day of journal publication. And even if the journal normally tries to embargo the work, some full-text author manuscripts become immediately available well before the journal would normally permit them 'out', thanks to the Plan S Rights Retention Strategy -- e.g. this one: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7610590/

      So what you should really say here is that full-text works appear in PMC either as accepted author manuscripts (green open access) or via open access publishing at the journal (gold open access).

      BTW, I resent calling an article 'closed access' if the accepted manuscript is fully, freely available -- that would seem to give undue primacy to the journal published version. It's an article with different versions: one freely accessible at a repository (e.g. PMC) without publisher branding, and another behind a paywall at the publisher website with publisher branding.

    16. As there were very few withdrawn preprints, we did not treat these as a special case

      But to clarify, you did remove these from the analysis, right? It would just be good to state that explicitly. They are easy to identify and should simply be removed; I can't see how they would add anything but noise to this analysis. What is the total number of preprints after withdrawn preprints are removed from the sample?

    17. an analysis of preprints posted at the beginning of 2020 revealed that most underwent minor changes as they were published [25]

      I think this needs to be made more specific, as [25] analysed a few different things.

      Your statement here is true with respect to their analysis of abstract text: "Over 50% of abstracts had changes that minorly altered, strengthened, or softened the main conclusions"

      BUT

      it is not true with respect to the panels and tables analysis in [25]:

      "over 70% of 162 published preprints were classified with “no change” or superficial rearrangements to panels and 163 tables, confirming the previous conclusion"

      thus perhaps you should consider writing something like:

      an analysis of preprints posted at the beginning of 2020 revealed that over 50% underwent minor changes in the abstract text as they were published, but over 70% had 'no change' or only superficial rearrangements to panels and tables [25].

  2. Aug 2015
    1. Open access and altmetrics: Distinct but complementary

      My paper is actually open access at the original journal site, here: https://asis.org/Bulletin/Apr-13/AprMay13_Mounce.html

      Please don't pay to access it from here!

      I had no idea Wiley had any arrangement with ASIST to sell access to my article. It's pretty unethical to sell access to an author's work without their consent, if you ask me!