  1. Aug 2016
    1. Some of the libraries contributing to the foundation of Slicer were designed in close collaboration and often share the same developer community. These libraries, including CMake, ITK, VTK and CTK, are distributed as part of the National Alliance for Medical Image Computing (NA-MIC) Kit [153], which is actively supported by the NA-MIC research community28. Many popular packages, e.g., ANTs, MindBoggle, ITK-SNAP, DTIPrep, and MITK, are also based on the NA-MIC Kit. NIPY29 and NeuroDebian30 are two other major research communities for neuroimaging research and platform development. To promote open science, neuroimaging tools and resources are routinely shared with other community members, usually through the INCF31 and NITRC32 forums.
    1. This data reduction—taking a rich, multivariate dataset and summarizing it for publication using measures of central tendency, confidence intervals, p-values, and effect sizes—removes the opportunity for future scientists to apply new algorithms, methods, and transdisciplinary ideas that could yield unforeseen insights and discoveries

      I think this is one of the most powerful arguments. Data science is just taking off; to limit future possibilities of re-analysis seems enormously wasteful. I'm also very swayed by the work of Adam Ferguson and Jessica Nielson on the "syndromic space", that is, combining across all studies fills in a picture of a complex phenomenon better than any single study can. But only if the data - all of it- are available.

    2. I outline the overlooked benefits of data sharing: novel remixing and combining as well as bias minimization and meta-analysis.

      It's not that they have been overlooked; this is always one of the main arguments advanced. It's that we've not had great evidence to back such claims up. And we can't get it if no one shares.

    3. Such developments are addressing concerns regarding credit and help motivate data curation and contextualization.

      Plus larger efforts to develop and implement a system of data citation.

    4. Rather than arguing for a centralized, large-scale data repository, I am advocating for a more organic development wherein we, institutionally, encourage the growth of a data ecosystem.

      Which is exactly what is developing.

    1. Where datasets are hosted in public repositories that provide datasets with Digital Object Identifiers (DOIs), we encourage these datasets to be formally cited in reference lists. Citations of datasets, when they appear in the reference list, should include the minimum information recommended by DataCite and follow journal style. For example: Hao, Z., AghaKouchak, A., Nakhjiri, N., Farahmand, A. Global Integrated Drought Monitoring and Prediction System (GIDMaPS) Data sets. figshare. http://dx.doi.org/10.6084/m9.figshare.853801 (2014)
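      The minimal DataCite-style citation shown above can be assembled mechanically from structured metadata. A sketch of that assembly (the function name and field list are illustrative, not the official DataCite schema):

```python
def format_data_citation(creators, title, publisher, doi_url, year):
    """Assemble a minimal dataset citation in the style shown above.

    The argument names are illustrative placeholders, not DataCite's
    official metadata property names.
    """
    authors = ", ".join(creators)
    return f"{authors} {title}. {publisher}. {doi_url} ({year})"

citation = format_data_citation(
    ["Hao, Z.", "AghaKouchak, A.", "Nakhjiri, N.", "Farahmand, A."],
    "Global Integrated Drought Monitoring and Prediction System (GIDMaPS) Data sets",
    "figshare",
    "http://dx.doi.org/10.6084/m9.figshare.853801",
    2014,
)
print(citation)
```

      Run against the example metadata, this reproduces the citation quoted in the excerpt.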
    1. Q1:  Why should repositories care about data citation?  We already ask authors to cite an article that describes our resource.  Isn’t that good enough?
    2. Q4:  What is a globally unique web-resolvable identifier and why is it important?

      Recommendations on data set identifiers.

    3. Q7:  Should I put a version number (i.e. semantic information) in the identifier?
    4. Q6: How should I handle landing pages for datasets with multiple versions?
    5. Q9: How do I handle a data set made of multiple parts? Do all the parts need separate identifiers?

      Recommendations on handling aggregate data sets. Also covered by publisher FAQs.

    1. The discovery of this new lesion might call some of their conclusions about the functions of the medial temporal lobes into question and require a re-examination of all that old data.

      Reminds me of the arguments about patient NA.

    2. Yeah, but it’s not peer-reviewed, for one thing. That’s important. The stuff that’s published is good stuff. Peer-reviewed. You can believe it.

      Again, I feel that this piece paints Sue Corkin in such a bad light that I would want independent corroboration that she embodies the worst aspects of science. She is not alive to defend herself.

    3. Nobody’s gonna look at them. Me: Really? I can’t imagine shredding the files of the most important research subject in history. Why would you do that? Corkin: Well, you can’t just take one test on one day and draw conclusions about it. That’s a very dangerous thing to do.

      The usual arguments against data sharing. But this is a travesty.

    4. I had known Corkin since I was a kid. Remember her friend, the surgeon’s daughter, the one she tin-can-telephoned with as a little girl? That’s my mom. The surgeon who performed the experimental operation on Henry was my grandfather. When I was growing up, Corkin was a staple at my mom’s dinner parties. We had met many times before.

      Interesting to introduce this now. This is not an objective piece. I didn't realize that this was excerpted from a book and was not a journalistic piece.

    5. because she became such a dominant authority figure in his life.

      Interesting, because it wasn't a clinical relationship. It was a scientist-research subject relationship. I find that disturbing now, although as a student of neuropsychology way back when (I did my undergraduate thesis with one of Sue Corkin's postdoctoral fellows), I didn't question it.

    6. Even as a nonscientist, I couldn’t help noticing that some of the unpublished data I came across while reporting my book went against the grain of the established narrative of Patient H.M. For example, unpublished parts of a three-page psychological assessment of Henry provided evidence that even before the operation that transformed Henry Molaison into the amnesiac Patient H.M., his memory was already severely impaired.

      Wow.

    7. I don’t think scientists would say that.

      Yes, we would.

    8. But you know, even what’s published — as you know, if you look at the papers, in some sense each paper is just the tip of the iceberg of the work that was done, and the work that was done — all that data floating around underneath — it seems to me that so much of that would be valuable to preserve. That people really may want to go back and review —

      Unbelievable, but apparently the arguments going around about open science haven't penetrated into the halls of MIT.

    9. This would make it easier for scientists to continue their analysis, mining Henry’s brain for any last revelations it contained.

      No, the data should be on-line for everyone. A few scientists have exploited HM for long enough.

    10. I had tracked down and spoken with Henry’s closest living relatives, and some were surprised and disturbed to learn about the things Corkin and her colleagues did with their cousin while he was alive and about the fight over his brain that took place after his death.

      At last, someone talking about the humanity of HM.

    11. I asked Corkin whether she was aware that when Mooney became Henry’s conservator, one of Henry’s first cousins, Frank Molaison, was living nearby — his actual next of kin — and had not been consulted. I mentioned that his name should have made him particularly easy to find.

      Oh my!

    12. allow U.C.-Davis certain rights to use and distribute the tissue owned by M.I.T. and M.G.H.” As far as Corkin was concerned, she and her colleagues owned Henry’s brain, period, and Annese had no say in the matter whatsoever.

      Who agreed to donate HM's brain to MIT? Given the relationship of Corkin to HM, could he make an informed decision? Where was his family?

      See later section for the answer to this.

    13. The paperwork — the document she passed around the table at the meeting in New York — was only one page, and the crucial part took up just two sentences: “I, Thomas F. Mooney, am the court-appointed guardian of the person of Henry G. Molaison. I also presently am Henry G. Molaison’s closest living next of kin, and as such I am entitled by law to control Henry G. Molaison’s remains upon his death.” The lines were followed by a signature and a date: Dec. 19, 1992. Corkin had arranged for Mooney to apply to become Henry’s conservator earlier that year. A probate-court judge, taking Mooney to be Henry’s closest relative, approved the conservatorship. One of Mooney’s first acts as conservator was to donate Henry’s brain to Corkin and her colleagues. He also consented, with Henry’s assent, to the continuation of the experiments Corkin wished to conduct on Henry while he remained alive. The problem was, Mooney wasn’t actually Henry’s next of kin.

      Section on who donated the brain to MIT.

    14. His ego had given him a sense of entitlement.

      How does the author know this?

    15. The frontal lesion would stay in the paper, but it wouldn’t be featured as prominently as it was in earlier drafts.

      The lesion is either there or it isn't. It was either recent or it wasn't. The evidence should be presented (as well as all the data).

    16. (we ascertained it was present even in the 1992–93 M.R.I. scans)

      OK. So scans were done.

    17. As one of the paper’s anonymous peer reviewers pointed out, “much of the neuropsychological literature on H.M. has made the case that so-called frontal function was intact.”

      Why wasn't his brain scanned regularly?

    18. Because it’s a decision by the people who own the tissue.

      This sentence makes my skin crawl.

    19. Annese was wary of handing over data before they had an agreement for how his work would be credited.

      Of course the work should be credited, but I cannot believe that any foundation that would fund this would not have done so on the condition that the data belonged to neuroscience. If HM willed his brain to science, it was not to advance the careers of scientists but to help scientists understand the brain, and perhaps help people like him.

    20. Back in San Diego, Annese kicked off what he dubbed Project H.M. with what was possibly the most successful publicity stunt in the history of neuroscience.

      Sure to raise the ire of scientists everywhere, reasonably or not.

    21. but my reporting also eventually raised serious questions about Henry’s treatment after he left the operating room, during the decades he spent as a human research subject, as well as in the eight years that have passed since his death.

      Yes. Such issues never came up in classes, but in the wake of the Henrietta Lacks story (interesting that both names are variants of Henry), I hope that if I were taught about HM today, I would question his treatment. I certainly object to any institution exerting exclusive rights over access to a human being.

    22. but I abandoned that effort after Corkin presented me with a confidentiality agreement stating that M.I.T. would allow me access to the “research project entitled ‘The Amnesic Patient H.M.’ ” only if the university had editorial control over anything I intended to publish.

      Why would MIT have control over access to a human being? This story is disturbing on many levels.

    23. Later, reflecting on that moment, Corkin could think of only one word to describe her feelings. She was, she wrote, “ecstatic.”

      See previous comment. I cannot believe this, particularly in the age of high resolution brain imaging. Why would one be ecstatic about an autopsy?

    24. Now she was having one last encounter that only she would remember. The men carefully pulled out Henry’s brain, and Corkin gazed at it through the glass, marveling at this object she had spent her career considering at one step removed.

      I would like to hear another side of this. I can't believe she was that cold-hearted or dispassionate.

    25. That paper documented Henry’s gradual improvement over a three-day period on a difficult hand-eye coordination task.

      Skill learning or what came to be known as procedural memory.

    26. In that pantheon of illuminatingly broken men and women, Henry stands apart. It is difficult to exaggerate the impact he has had on our understanding of ourselves.

      Learning about patient HM as well as Phineas Gage was one of the main reasons I became a neuroscientist.

    27. Corkin built much of her subsequent career on the back of her privileged access to Henry and became both his gatekeeper — fielding requests from other scientists who wished to meet him — and his chief inquisitor.

      What ethics board or family member determines this?

    28. She grew up across the street from the neurosurgeon who operated on him and was close friends with the surgeon’s daughter.

      I don't think I ever read that before or heard Sue Corkin mention that.

    1. (e.g., misleading or inaccurate analyses and analyses aimed at unfairly discrediting or undermining the original publication)

      I don't see what stops this from happening without data sharing.

    2. Furthermore, a mechanism will be needed to fund the data-preparation activities necessary for data sharing in such a way as to protect confidentiality and ensure data integrity
    3. minimum of 2 years after publication

      This figure has been cited elsewhere as the average time over which the data are interesting to the producers. I think during that time, the producers could receive co-authorship, but I don't think the data should be embargoed for that long.

    4. The ICMJE proposal may also lead some investigators to delay publishing their primary trial results to allow time to prepare several secondary manuscripts. Delay or failure to publish the primary results of trials is already a substantial problem.3

      That would be a consequence of our currently skewed reward system.

    5. Once the investigators who have conducted the trial no longer have exclusive access to the data, they will effectively be competing with people who have not contributed to the substantial efforts and often years of work required to conduct the trial.

      Again, this suggests that the only worthwhile thing is the analysis of data, when in fact, gathering the data set and PUBLISHING it, through which it can be duly credited, should be rewarded.

    6. A key motivation for investigators to conduct RCTs is the ability to publish not only the primary trial report, but also major secondary articles based on the trial data. The original investigators almost always intend to undertake additional analyses of the data and explore new hypotheses.

      Ummm, what about the patients and the public who may need these results for other reasons, oh, I don't know, a life saving treatment?

    7. Adequate incentives for researchers to invest the substantial time and effort required to conduct RCTs and to publish the results in a timely fashion are important.

      Aren't they paid to do this? I do think that researchers should be compensated based on the difficulty of performing the trial; not all trials take the same amount of effort. But the compensation should be for doing the trial and releasing the data.

    8. occasional new discoveries

      Well, that is dripping with contempt.

    1. The costs associated with preparing data for sharing can and should be built into the grants, cooperative agreements, and contracts that researchers negotiate with trial sponsors; in other words, expenses associated with administering data-sharing protocols must be treated as a standard, necessary aspect of the costs of carrying out a clinical trial.

      Rational response to the "who is going to pay for it" argument.

    1. new open layer

      A definition of what we are trying to achieve in Patrick Johnston's blog

    1. 40 scholarly

      It's up to 70 now! We've also established a working group at FORCE11

    2. the result is a standard way to enable comments and conversations in context, in a way that is not bound to a specific platform, publisher, or encoding, and in a way that makes it simple and intuitive for people to talk to people and learn from each other, and build vibrant communities that extend human knowledge.

      Definition of open, web based annotation. Should use this for the AAK website.

    3. While 'free' in the commercial sense, they silo our interactions in exact opposition to the principles on which the Web was founded, namely the democratization of knowledge.

      Important to recognize that free isn't always open.

  2. repscience2016.research-infrastructures.eu
    1. This Workshop aims at becoming a forum to discuss ideas and advancements towards the revision of current scientific communication practices in order to support Open Science, introduce novel evaluation schemes, and enable reproducibility

      Description

    2. First International Workshop on Reproducible Open Science, Hannover, Germany, 9th of September 2016
    1. obsolete brain of rodent

      We should have a policy about obsolete classes. How do we want to handle them? On the one hand, I think we want to expose where concepts have been discredited and therefore retired, along with provenance information as to why each was retired. On the other hand, I'm not sure we want all of these showing up in the main hierarchy. At the very least, I don't think they should show up as "obsolete brain of rodent"; rather, I suspect it should be "brain of rodent (deprecated)"
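      The rendering suggested here can be sketched as a small display-label helper. This is a hypothetical illustration (the "obsolete " prefix convention and the function name are assumptions, not an existing ontology API):

```python
def display_label(label, obsolete=False):
    """Render an ontology class label, suffixing deprecated terms.

    Turns an internally stored "obsolete brain of rodent" style label
    into the "brain of rodent (deprecated)" form suggested above.
    Both the prefix convention and this helper are hypothetical.
    """
    prefix = "obsolete "
    if label.startswith(prefix):
        label, obsolete = label[len(prefix):], True
    return f"{label} (deprecated)" if obsolete else label

print(display_label("obsolete brain of rodent"))  # brain of rodent (deprecated)
print(display_label("brain of rodent"))           # brain of rodent
```

      The point of the design is that deprecation stays visible without the retired term sorting away from its siblings in the hierarchy.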

    1. more constructively for creative acts rather than consumptive ones

      Interesting, because surfing the internet has sucked away all the time I used to spend on creating things like paintings and crafts. I've had to deliberately go back to setting time aside for this type of creativity.

  3. Jul 2016
    1. "It could then develop a global consensus eventually and hopefully could serve as a standard for research in the future."

      How about now? Why does this require either consensus or time?

    2. Preventing bio-piracy raises challenges of its own. For instance Brazil has been writing new rules for taking samples outside the country. But there are reports that's leaving some researchers in limbo as they wait for the rules to be finalized.

      But this is an obvious response to the behavior of the pirates. And we all lose.

    3. Earlier this year some researchers from Brazil — the epicenter of the outbreak — did the public-spirited thing by uploading Zika virus genome sequences they had produced to an online public database. Soon after, scientists from Slovenia used that data in a paper they published in the New England Journal of Medicine without sharing credit.

      Was it deliberate or just an omission? Good case study for whether such things happen. Also good argument for formal system of data citation.

    4. "If you want really to have a rapid public health response you have to make sure that that data is available as soon as it's known," says Heymann. "And that means in the country."

      Yes. An argument for the methods promoted by the Principles of the Commons.

    5. But in fact, there were plenty of other researchers going into West Africa to do genetic sequencing. A lot of those scientists were simply waiting on prestigious journals to publish their findings.

      I cannot believe that a scientist would do this in the face of a global health crisis. Can anyone save my faith in science?

    6. Sometimes it's because they want to publish their results – and medical journals prefer exclusives

      So our current reward system for science leads to deaths and a lot of bad feelings.

    7. They don't coordinate with people fighting the epidemic on the ground — don't even share their discoveries for months, if ever.

      That is a crime, if you ask me.

    8. Critics call them "parachute researchers": Scientists from wealthy nations who swoop in when a puzzling disease breaks out in a developing country.

      New term, but very relevant to the Scholarly Commons.

    1. Post the resulting data online

      Actually, post in a repository so that we are sure it will be available for a reasonable time (and not just until the postdoc leaves!)

    2. By honest I don't mean that you only tell what's true. But you make clear the entire situation. You make clear all the information that is required for somebody else who is intelligent to make up their mind.Richard Feynman

      Feynman does it again! Argument for data publishing.

    1. Notably, too, many did their landmark work in places that some might regard as off the beaten path of science (Alicante, Spain; France’s Ministry of Defense; Danisco’s corporate labs; and Vilnius, Lithuania). And, their seminal papers were often rejected by leading journals—appearing only after considerable delay and in less prominent venues. These observations may not be a coincidence: the settings may have afforded greater freedom to pursue less trendy topics but less support about how to overcome skepticism by journals and reviewers.

      This is a very important point for the Scholarly Commons.

    2. The history also illustrates the growing role in biology of “hypothesis-free” discovery based on big data. The discovery of the CRISPR loci, their biological function, and the tracrRNA all emerged not from wet-bench experiments but from open-ended bioinformatic exploration of large-scale, often public, genomic datasets.

      Nice sentiment and largely but not entirely true; from above: "His advisor had found that the salt concentration of the growth medium appeared to affect the way in which restriction enzymes cut the microbe’s genome, and Mojica set out to characterize the altered fragments. In the first DNA fragment he examined, Mojica found a curious structure—multiple copies of a near-perfect, roughly palindromic, repeated sequence of 30 bases, separated by spacers of roughly 36 bases—that did not resemble any family of repeats known in microbes (Mojica et al., 1993)."

    3. Vergnaud’s efforts to publish their findings met the same resistance as Mojica’s. The paper was rejected from the Proceedings of the National Academy of Sciences, Journal of Bacteriology, Nucleic Acids Research, and Genome Research, before being published in Microbiology on March 1, 2005.

      And another...

    4. Recognizing the importance of the discovery, Mojica sent the paper to Nature. In November 2003, the journal rejected the paper without seeking external review; inexplicably, the editor claimed the key idea was already known. In January 2004, the Proceedings of the National Academy of Sciences decided that the paper lacked sufficient “novelty and importance” to justify sending it out to review. Molecular Microbiology and Nucleic Acid Research rejected the paper in turn. By now desperate and afraid of being scooped, Mojica sent the paper to Journal of Molecular Evolution. After 12 more months of review and revision, the paper reporting CRISPR’s likely function finally appeared on February 1, 2005 (Mojica, F.J.M., Díez-Villaseñor, C., García-Martínez, J., and Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol. 2005; 60: 174–182).

      Well, well, well. Good anecdote to have handy when arguing that selective journals pick the winners.

    1. PIs were ranked by this aggregate score and split into quintiles.

      Not sure I completely understand this, but isn't it rather self-referential? The RCR is good at predicting who will continue to have a good or bad RCR. Perhaps also who will get grants and publish papers that their network cites. But does it translate into a cure for a disease, or, 50 years from now, will these be the foundational ideas that propelled science forward?

    2. Relative Citation Ratio (RCR): A new metric that uses citation rates to measure influence at the article level

      I've annotated the pdf of this article.

    3. scientific productivity

      Not sure this is the correct word; productivity does not equate with influence. Here is my problem with this: unpublished research, whether positive or negative, does no one any good. The goal should be to publish a report of all funded research (which includes publishing the data). Currently, with all of this jockeying for which journal gets what, we slow down and clog that process. And being concerned about whether someone will cite it will only decrease the ROI on research dollars. So all researchers should be expected to publish their results, without regard or worry about the quality of journal they are published in. Preprint servers are ideal for this. Journals can pick from these.

    4. and/or peer networks,

      I don't think you can overestimate this influence.

    5. either because they measure only the average performance of a group of papers (Vinkler, 2003), or because the article of interest is measured against a control group that includes widely varying areas of science

      Nice statement of justification for a new metric (not saying I agree with producing a new metric)

    6. However, high-Impact-Factor journals (JIF ≥ 28) only account for 11% of papers that have an RCR of 3 or above.
    7. , influential publications can be found in virtually all journals

      That's a good statistic to have when fighting the JIF

    8. Though each of the above mentioned methods of quantitation has strengths, accompanying weaknesses limit their utility.

      This approach is endemic to science: take a flawed approach, find the flaws in the previous implementation and propose a new one. But what is not questioned here is the entire premise that citation analysis measures anything.

    9. In recent years, decision-makers have increasingly turned to numerical approaches such as counting first or corresponding author publications, using the impact factor of the journals in which those publications appear, and computing Hirsch or H-index (Hirsch, 2005).

      I'm not sure that the answer to previous flawed metrics is to develop yet another one, which undoubtedly will be flawed and abused as well. Perhaps impact and quality truly defy the development of a single number that captures these.
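      For the discussion above, a rough sketch of what the RCR computes may be useful. This is an illustration of the idea only, not the published NIH iCite algorithm; in particular, iCite's regression-based calibration against NIH-funded benchmark articles is omitted here:

```python
def relative_citation_ratio(article_rate, cocited_journal_rates):
    """Sketch of the Relative Citation Ratio idea (not the iCite
    implementation): an article's citations per year divided by an
    expected rate, where the expectation is taken as the mean citation
    rate of the journals of the article's co-cited papers -- a
    data-driven proxy for its "field". iCite additionally rescales
    this so that NIH-funded benchmark articles average 1.0; that
    calibration step is omitted in this sketch.
    """
    expected = sum(cocited_journal_rates) / len(cocited_journal_rates)
    return article_rate / expected

# An article cited 12 times/year whose co-citation network's journals
# average 4 citations/year would score well above the field:
print(relative_citation_ratio(12.0, [3.0, 5.0, 4.0]))  # 3.0
```

      The self-referential worry in the comment above applies directly: the denominator is built from the citing behavior of the article's own network, so the metric rewards whatever that network already cites.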

    1. To perform their test, they downloaded data from old fMRI studies—specifically, information from 499 resting volunteers who were being scanned while not thinking about anything in particular (these scans were intended for use as controls in the original papers).

      Link to article: http://www.pnas.org/content/113/28/7900.full

      This article makes a strong case for data sharing, as almost none of the studies analyzed made their data available.

    2. Computer says: oops

      The computer may be saying "oops", but I suspect that researchers are using stronger language.

    3. Might be good to have "fMRI in the news" section as well.

    1. Finally, we point out the key role that data sharing played in this work and its impact in the future. Although our massive empirical study depended on shared data, it is disappointing that almost none of the published studies have shared their data, neither the original data nor even the 3D statistical maps.

      Strong statement in support of data sharing.

    1. PLoS ONE supporters have a ready answer: start by making any core text that passes peer review for scientific validity alone open to everyone; if scientists do miss the guidance of selective peer review, then they can use recommendation tools and filters (perhaps even commercial ones) to organize the literature — but at least the costs will not be baked into pre-publication charges.

      The model promoted by the Scholarly Commons

    2. And to Eisen, the idea that research is filtered into branded journals before it is published is not a feature but a bug: a wasteful hangover from the days of print.

      That's a better way to put it.

    3. By rejecting papers at the peer-review stage on grounds other than scientific validity, and so guiding the papers into the most appropriate journals, publishers filter the literature and provide signals of prestige to guide readers' attention. Such guidance is essential for researchers struggling to identify which of the millions of articles published each year are worth looking at, publishers argue — and the cost includes this service.

      Oh please! These are not the only filters that we have to judge quality.

    4. Brian Hole, founder and director of the researcher-led Ubiquity Press in London, says that average costs are £200 (US$300). And Binfield says that PeerJ's costs are in the “low hundreds of dollars” per article.

      This seems like a reasonable figure.

    5. Outsell estimates that the average per-article charge for open-access publishers in 2011 was $660.

      Price of open access article in 2011

    6. Analysts estimate profit margins at 20–30% for the industry, so the average cost to the publisher of producing an article is likely to be around $3,500–4,000.

      I was wondering where this figure was coming from. I've often heard it quoted.

    7. These charges and counter-charges have been volleyed back and forth since the open-access idea emerged in the 1990s, but because the industry's finances are largely mysterious, evidence to back up either side has been lacking. Although journal list prices have been rising faster than inflation, the prices that campus libraries actually pay to buy journals are generally hidden by the non-disclosure agreements that they sign. And the true costs that publishers incur to produce their journals are not widely known.

      I hope this is no longer true.

  4. Jun 2016
    1. Nature Methods' Points of Significance column on statistics explains many key statistical and experimental design concepts.

      Collection of Nature papers: Statistics for biologists

    1. Throughout this paper, we use the phrase ‘machine actionable’ to indicate a continuum of possible states wherein a digital object provides increasingly more detailed information to an autonomously-acting, computational data explorer.

      Definition of machine actionable

    2. that all research objects should be Findable, Accessible, Interoperable and Reusable (FAIR) both for machines and for people.

      Need to add "attributable" and "citable"

    1. Free publication for all Zika Virus research BMJ wants to support the fast development of research around the Zika virus by enabling researchers to share their findings as quickly as possible.

      See the Wellcome Trust statement about open data sharing during global health emergencies.

    1. Interrupting prolonged sitting with light-intensity walking breaks may be an effective fatigue countermeasure acutely.

      Getting up and stretching is a tried and true method for battling mental fatigue.

    1. It's often said that the bacteria and other microbes in our body outnumber our own cells by about ten to one.

      We also say we only use 10% of our brains. Another myth that is difficult to dislodge.

    1. SPM 2016 course

      The 19th annual course on functional MR imaging (SPM) takes place from 26 to 28 September 2016. Due to the significant demand for intensive training in the various methods in this field, this year's course will for the first time be held exclusively as a three-day intensive course, covering everything from first steps with SPM to developing one's own program code. The previous division into separate beginner and advanced parts is thus eliminated.

      Date: 26–28 September 2016. Location: Hamburg

    1. School of Data works to empower civil society organizations, journalists and citizens with the skills they need to use data effectively – evidence is power!

      Don't know if this is too beginner for ReproNIM audience, but it has some nice tutorials on how to clean data or extract data from web pages for novices.

    1. In other words, there is still plenty of good science out there

      A good point to keep in mind.

    2. often reject

      try "nearly always"

    3. Good science is in some sense a public good,

      In some sense?

    4. we are the first to illustrate the evolutionary logic of how, in the absence of change, the existing incentives will necessarily lead to the degradation of scientific practices.

      Innovative! Groundbreaking!

    5. publication quantity

      But bigger labs will produce more publications. Is this factored in?

    6. Replication allows those labs with poor methods to be penalized, but unless all published studies are replicated several times (an ideal but implausible scenario), some labs will avoid being caught.

      "caught" is an interesting term here.

    7. “money-back guarantee”

      A terrible idea.

    8. This last assumption may appear unrealistically harsh, but research indicates that retractions can lead to a substantial decrease in citations to researchers’ prior work

      But most findings that aren't replicated are not withdrawn from the literature, as indeed they should not be.

    9. When the focus is on the production of novel results and negative findings are difficult to publish, institutional incentives for publication quantity select for the continued degradation of scientific practices.

      Need to read through the methods again, but the premises are at least somewhat true, although there are some fields, e.g., anatomy, where at least the empirical findings do tend to hold up over time. Which is why I became an anatomist. On the other hand, you can't build a career on just descriptive science anymore.

    10. Research groups would never be able to get away with methods for which all hypotheses are supported

      That was a joke about psychology findings though. A colleague told me that 95% of the hypotheses in the psychological literature were confirmed leading someone to quip: Well then, they don't need anymore funding as apparently they already know everything.

    11. The dying lab is then removed from the population. Next, a lab is chosen to reproduce

      Vaguely disturbing images come to mind.

    12. while negative novel results never are.

      May not be entirely true. The process of registering a study before it is performed and then publishing the results, regardless of whether they are positive or negative, has been taken up by some journals.

    13. Scientists are human and will therefore respond (consciously or unconsciously) to incentives; when personal success (e.g., promotion) is associated with the quality and (critically) the quantity of publications produced, it makes more sense to use finite resources to generate as many publications as possible (p. 1037, emphasis in original). Second, researchers may be trying to do their best, but selection processes reward misunderstandings and poor methods.
    14. The first highly-cited exhortation to increase statistical power was published by Cohen in 1962, as a reaction to the alarmingly low power of most psychology studies at the time

      I didn't realize that calls to increase statistical power went back that long.

    15. The population benefits from high power more than individual scientists do. Science does not possess an “invisible hand” mechanism through which the naked self-interest of individuals necessarily brings about a collectively optimal result.

      And, if we believe that our job as scientists is, at least in significant part, to benefit the population (i.e., the taxpayers), then the current system cannot stand.

    16. Statistical power refers to the probability that a statistical test will correctly reject the null hypothesis when it is false, given information about sample size, effect size, and likely rates of false positives
    17. On the other hand, low-powered experiments are substantially easier to perform when studying human or other mammals, particularly in cases where the total subject pool is small, the experiment requires expensive equipment, or data must be collected longitudinally

      These are real limitations of doing science though.
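
The definition of statistical power quoted above maps directly onto a calculation. The sketch below estimates the power of a two-sided, two-sample test under a normal approximation using only the standard library; the function name, the "medium" effect size d = 0.5, and the group sizes are illustrative choices, not values from the paper.

```python
from statistics import NormalDist

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test
    (normal approximation) for standardized effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality parameter: d * sqrt(n/2) for equal group sizes
    ncp = effect_size * (n_per_group / 2) ** 0.5
    # Probability the test statistic falls beyond either critical value
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)

# A "medium" effect (d = 0.5) with 20 subjects per group:
print(round(two_sample_power(0.5, 20), 2))  # ~0.35, i.e., badly underpowered
```

Runs like this make the paper's point tangible: with typical sample sizes, most true effects of modest size go undetected.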

    18. a dead Atlantic Salmon exhibiting neural responses to emotional stimuli

      That is rather scary, but I remember that study.

    19. Moreover, even firmly discredited research is often cited by scholars unaware of the discreditation

      The argument for better communication channels on top of the literature, like annotation for example!

    20. For example, until recently, the Journal of Personality and Social Psychology refused to publish failed replications of novel studies it had previously published

      At an NIH workshop I attended a while back, it was recommended that journals be required to publish failed replications.

    21. Incentives to increase one’s h-index may also encourage researchers to engage in high-risk hypothesizing, particularly on “hot” research topics, because they can increase their citation count by being corrected

      Oh come on. Is there evidence of this? On the other hand, being provocative does generate discussion, and there's nothing inherently wrong with that.

    22. This is often summarized more pithily as “when a measure becomes a target, it ceases to be a good measure.”

      Good quote for the Commons.

    23. In the years between 1974 and 2014, the frequency of the words “innovative,” “groundbreaking,” and “novel” in PubMed abstracts increased by 2500% or more

      Ouch.

    24. Newly hired biologists now have almost twice as many publications as they did ten years ago (22 in 2013 vs. 12.5 in 2005).

      Interesting statistic pointing to the "hypercompetitiveness" of science.

    25. We show that the persistence of poor research practice can be explained as the result of the natural selection of bad science.

      Which is the strongest indictment of our current system possible.

    26. We assume that all agents have the utmost integrity. They never cheat. Instead, research methodology varies and evolves due to its consequences on hiring and retention, primarily through successful publication.

      Reasonable premises

    27. over 50 years of reviews of low statistical power

      Is any of this tied to funding levels as well?

    28. IF [impact factor]

      I'm trying to remember when this became so important. When did I even learn this term? It wasn't when I was in graduate school or a post-doc (in the 80's and 90's). We knew which journals were good, but it had nothing to do with any number.

    29. Improving the quality of research requires change at the institutionallevel.

      We came to the same conclusion over at FORCE11 in the Scholarly Commons project. It's time for those of us in science to acknowledge that it is wrong to support an incentive system that is inconsistent with the best science we can do.

    30. THE NATURAL SELECTION OF BAD SCIENCE

      Do we want to have a bibliography associated with ReproNIM?

  5. May 2016
    1. Let’s say all large publishers suddenly refused anyone any access to any of their copyrighted materials at 9am tomorrow morning — what would they be replaced with?

      Interesting thought experiment. Good way to provoke discussions on the Commons.

    1. Scientific fields often work well in situations where we can measure how well a project is doing. In the case of processors we know their function and we can know if our algorithms discover it.

      Is it possible that the microprocessor might have functions beyond what it was engineered for?

    2. As we reviewed in this paper, we may even want to be careful about the conclusions about the modules that neuroscience has drawn so far, after all, much of our insights come from small datasets, with analysis methods that make questionable assumptions.

      Or to assume that these modules are modules in the classic sense vs reconfigurable elements.

    3. However, we cannot write off the failure of these methods on the processor simply because processors are different from neural systems.

      I think this is the crux of the matter. If we didn't know what the microprocessor was or did and were given the task of figuring it out, we would have to develop methods to probe its structure and function. If the methods cannot discern this, then they are not valid methods. So are the methods that were applied here the ones that would -or perhaps more accurately-could be used if we didn't know anything at all about the thing?

    4. However, in the case of the processor we know its function and structure and our results stayed well short of what we would call a satisfying understanding

      It might be interesting to disguise the data in neural terms, and give them to computational neuroscientists to see what they come up with.

    5. This again highlights how hard it is to derive functional insights from activity data using standard measures.

      At this point, as a neuroanatomist, I have to ask whether the anatomy of the system provides any more reliable insights?

    6. Arguably this approach is more justified for the nervous system because brain areas are more strongly modular.

      I don't think we can argue that.

    7. This finding of course is grossly misleading. The transistors are not specific to any one behavior or game but rather implement simple functions, like full adders.

      Also an argument by the neuroethologists: if you don't understand a behavior well enough to understand the components that might underly it, how can you investigate neural circuits?

    8. We might thus conclude they are uniquely responsible for the game – perhaps there is a Donkey Kong transistor or a Space Invaders transistor.

      The lesion fallacy and the localization fallacy in the brain.

    9. We will finally discuss how neuroscience can work towards techniques that will make real progress at moving us closer to a satisfying understanding of computation, in the chip, and in our brains

      Not just a problem but a solution.

    10. In this paper, much as in systems neuroscience, we consider the quest to gain an understanding of how circuit elements give rise to computation.

      Operational definition of understanding.

    11. What constitutes an understanding of a system? Lazebnik’s original paper argued that understanding was achieved when one could “fix” a broken implementation

      I once fixed my car by jiggling something; I don't know why it worked or what I was jiggling at the time (later found out it was the throttle), but it worked. So I'm not sure that this criterion is entirely true. Also, consider the placebo effect!

    12. Importantly, the processor allows us to ask “do we really understand this system?”

      Yes, this is an important point.

    13. The human brain has hundreds of different types of neurons

      And we really don't know all the different pieces just yet

    14. label regions

      With multiple labels for the same structure applied by different anatomists?

    15. We argue that the analysis of this simple system implies that we should be far more humble at interpreting results from neural data analysis.

      Humility in the face of biological complexity is always a good idea.

    1. Publishers are also getting involved: around a dozen journals this year began asking their authors to use unique identifiers for their reagents as part of a push by the Resource Identification Initiative.

      Resource Identification Initiative gets mentioned in the science events that shaped 2015.

    1. insulating myelin sheaths

      Too narrow a definition; should be expanded to include non-myelinating Schwann cells as per the hierarchy.

    1. Broca’s Region: Linking Human Brain Functional Connectivity Data and Nonhuman Primate Tracing Anatomy Studies

      This is the title of the paper.

    1. When we start to see all the protocol, prespecified hypotheses, and raw data available for review, along with full disclosure of methods and analyses and what, if anything, changed along the course of experiments, be it at the bench or in clinical trials, we’ll have made substantive progress.

      There is no reason that we should not be publishing all of these things now.

    2. Use of blockchain technology has recently been shown to provide an immutable ledger of every step in a clinical research protocol, and this could easily be adapted to basic and experimental model science. All participants in the peer-to-peer research network have access to all of the time stamped, continuously updated data. It is essentially tamper proof since any change, such as to the prespecified data analysis, would have to be made in every computer (typically thousands) within the distributed network. All of the data would be in a findable, machine readable format. More importantly, with every step of a research path digitised and shared, we have a platform well suited for rapid, independent verification of methods and results.

      This is getting mentioned more and more in an open science context.
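
The tamper-evidence the quote describes comes from hash chaining: each entry embeds the hash of its predecessor, so altering any earlier record invalidates every later hash. Below is a toy, single-machine sketch of that idea (no distributed network or consensus, just linked hashes); the function names and record strings are hypothetical, not from the article.

```python
import hashlib
import json
import time

def add_entry(chain, record):
    """Append a protocol step to a toy hash-chained ledger."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {"record": record, "prev": prev_hash, "ts": time.time()}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(entry)
    return chain

def verify(chain):
    """Recompute every hash; any tampering breaks the chain."""
    for i, entry in enumerate(chain):
        body = {k: entry[k] for k in ("record", "prev", "ts")}
        payload = json.dumps(body, sort_keys=True).encode()
        if entry["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        if i > 0 and entry["prev"] != chain[i - 1]["hash"]:
            return False
    return True

ledger = []
add_entry(ledger, "prespecified analysis: two-sided t-test, alpha=0.05")
add_entry(ledger, "data collection closed: n=40")
print(verify(ledger))   # intact chain verifies
ledger[0]["record"] = "analysis changed after the fact"
print(verify(ledger))   # retroactive edit is detected
```

The distributed replication mentioned in the article adds the guarantee that no single party can silently rewrite the whole chain; the hashing alone only makes edits detectable.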

    3. What is missing is the deep commitment—across academia and the life science industry—for open science and open data.

      Yes! I see this, as do many others, as the only antidote to the reproducibility "crisis". That, and ensuring that our incentive and reward system are aligned with doing the best science. Currently, they are not.

    4. It’s not just money that buys replication; time and experience can come into play.

      I think the skill factor is under-appreciated in discussions on reproducibility. I always liken it to what would happen if a top Chef and I were given the same recipe. Would the results be the same? I think not.

    5. Thus, the proposal from a senior executive at Merck to set up a “clawback” model, whereby an academic institution would refund its financial payment if the basic science or preclinical results prove to be irreproducible, is ironic.

      Well, that would ensure that nothing was ever published.

    1. 6879.23333051054

      I'm sorry, but carrying this out to 11 decimal places seems rather extreme.

    1. nlx_anat_1008012

      I am wondering why this neuron has an nlx_anat identifier. Shouldn't it be part of the cell ontology?

    1. You can find the agenda for this year’s conference here and a summary of abstracts here

      Seems like a conference that emphasizes web-based annotation for collaborative knowledge creation should have an agenda and abstracts that are annotatable. Annotators, annotate thyself!

    1. 400 public annotations

      "please correct the material": I love this approach. It gives them a reason for annotating and pointing out a typo is a non-threatening activity.

    1. not humanities

      I have been gathering my thoughts about differences in feelings about annotations in the sciences and humanities. I do think they are viewed differently, with the sciences' focus on being factually correct versus the culture of interpretation in the humanities.

    1. They are proposing to have univocally identified and persistent landing pages that provide human-readable (namely HTML) and machine-readable (e.g., JSON) basic information on the dataset. Repositories are expected to start experimenting with these approaches.
    2. For example, repositories contribute to supply credit to data contributors, however they have not yet developed any micro attribution oriented facility aiming at highlighting who did what (Allen et al., 2014).

      And the concept of "author" often gets conflated with contributor. Those contributing to a data set may or may not, however, be authors on a paper.

    3. Repositories tend to support OAI-PMH (cf. Tab. 8) yet they should reinforce the range of protocols and facilities they offer for programmatically accessing their content, e.g., by exposing their content in formats other than HTML like Schema.org (Guha et al., 2016) and Linked Data (Bizer et al., 2009), by supporting standard protocols like OpenSearch and SRU (Denenberg, 2009). Services implemented by third-party providers largely rely on metadata, thus the quality of metadata severely impacts also the implementation of this facility e.g., Rousidis et al. (2014).

      Role of repositories in making data FAIR.
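
As a sketch of the machine-readable exposure the quote discusses (e.g., Schema.org markup on a landing page), the snippet below emits a minimal JSON-LD Dataset record; every field value here is hypothetical, not taken from any of the surveyed repositories.

```python
import json

# A minimal, hypothetical Schema.org "Dataset" record of the kind a
# repository landing page could expose alongside its HTML view.
landing_page = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example resting-state fMRI dataset",          # hypothetical
    "identifier": "https://doi.org/10.0000/example",       # hypothetical PID
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "creator": [{"@type": "Person", "name": "A. Researcher"}],
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "application/x-tar",
        "contentUrl": "https://repository.example.org/datasets/123.tar",
    },
}

print(json.dumps(landing_page, indent=2))
```

A record like this is what lets third-party aggregators and search services index a dataset without scraping HTML, which is exactly the metadata-quality dependency the quoted passage flags.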

    4. Thus trustworthiness is actually an induced property acquired by repositories during their lifetime.

      I agree. And I suspect that as data publishing becomes more formalized, understanding what constitutes trustworthiness will become more apparent.

    5. The License Agreement between 3TU.Datacentrum and data submitters explicitly states that the repository (i) “shall ensure, to the best of its ability and resources, that the deposited dataset will remain legible and accessible”; (ii) “shall, as far as possible, preserve the dataset unchanged in its original format, taking account of current technology and the costs of implementation”; and (iii) “has the right to modify the format of the dataset if this is necessary in order to facilitate the digital sustainability, distribution or re-use of the dataset”.

      Very explicit statement of how data in 3TU.Datacentrum is stewarded. Might be of interest to the data citation work at FORCE11.

    6. For preservation, beside storing the data in multiple copies the selected repositories tend to use format migration practices.

      Critical but difficult. Which is why open standards are also necessary, to the degree possible.

    7. Repositories might also support post-publication validation by providing mechanisms for end-users to provide both the repository and the data providers with concrete and documented feedback resulting from (re-)using or attempting to (re-)use the dataset.

      This is why we developed the Resource Identification Initiative to signify use of a resource rather than just mention of the resource. We think that use is a critical piece of information that is difficult to get with our current general citation system.

    8. Trying to assess the general ‘quality’ of a dataset is hopeless; consider instead whether the dataset is suited to a particular use.

      See above!

    9. it is almost impossible to envisage a certification scheme that guarantees that the datasets published by a repository are “scientifically sound”.

      In fact, the definition of "scientific soundness" may be hard to pin down. If there is an error in representing or acquiring the data, that would be unsound. But otherwise, "unsound for what"? or "Sound for what" is perhaps the more reasonable way to think about it.

    10. A similar approach is offered by Dryad that records the number of downloads and makes it possible to browse the “most popular” ones, in terms of downloads. 3TU.Datacentrum only publishes aggregated statistics on downloaded datasets as a sort of validation of the centrum itself.

      With formal data citation, should also be able to find "used by". Very important statistic.

    11. checksum

      A checksum or hash sum is a small-size datum from a block of digital data for the purpose of detecting errors which may have been introduced during its transmission or storage. (from Wikipedia)
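
As an illustration of that definition, a repository-style integrity check can be computed with Python's standard hashlib; the function name, default algorithm, and chunk size below are illustrative choices, not drawn from the paper.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=8192):
    """Compute a checksum of a file, reading it in chunks so
    arbitrarily large datasets fit in constant memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

A repository can store the digest alongside the dataset and recompute it after every transfer or migration; a mismatch signals corruption.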

    12. No control is made for attesting “scientific soundness” of the dataset, e.g., scientific validity, accuracy or completeness.

      That would indeed be a high bar to have to obtain, although domain-specific repositories can, in many cases, make some judgement in this regard.

    13. Actually, it has been observed that the use of the term “peer review” for datasets is causing some false expectations (Parsons and Fox, 2013; Candela et al., 2015).

      Interesting. Must look at this further.

    14. Rather than billing researchers, new payment methods can be envisaged including private- or public-sector grants and partnerships with journal publishers.

      No, they cannot be private or public sector grants; that model is failing miserably. Partnerships with journal publishers-yes, but everything won't be published as part of published research papers.

    15. the reduction of the publishing costs,

      Reduction? How about recognition that data publication does cost something and putting these charges into their grants, as is done for paper publication costs.

    16. Dryad is the only repository in our sample that always requires submitters to pay a charge independently of the files size. All the other ones offer at least a minimum storage space where users can publish free of charge.

      But how are these other repositories subsidized?

    17. For the sake of re-usability, it is very useful to be able to add further metadata values and documentation as the data exploitation progresses.

      That's why all data repositories need to embed Hypothes.is!

    18. Unfortunately, a data paper is of use only for humans and does not provide any support for automatic consumption.

      Yup

    19. none of them explicitly requires that the deposited datasets be associated with a data paper.

      Would be interesting to think through the consequences if this were to become required.

    20. However, this approach has to face the potential limitations that have been widely discussed in the past, e.g., McGath (2013) stressed how new formats develop and existing ones evolve officially and unofficially thus making the maintenance of the registry challenging.

      Any solution that does not recognize this will fail. Scientists have to be able to utilize the latest technology.

    21. This situation largely restricts the facilities that a repository can offer to support data publication.

      Which is why, if there is a specialized repository, it is generally better to submit the data there. Of course, that often involves more effort.

    22. Independently of file formats, repositories have some limitations on allowed file sizes. They tend to have an upper bound limit yet are open to negotiate extensions to this limit with additional costs (cf. Sec. 4.4). Dryad allows uploading of no more than 10GB of material for a single publication; 3TU.Datacentrum supports the upload of datasets up to 4 GB; Zenodo currently accepts files up to 2GB although it reports that the current infrastructure has been tested with 10GB files; Figshare enables users to store up to 1 GB data in their private space with files up to 250 MB each.

      Remains a challenge for some types of data. But, then again, we've always limited the size of publications to certain contexts-a journal article is of a certain size; if it is larger, it becomes a book and is handled differently. So perhaps the fact that a single repository cannot handle all data sizes is to be expected.

    23. For content format, selected repositories somehow neglect it and describe how the dataset is organised through dataset documentation (cf. Sec. 4.2).

      Thus leading to criticism of generalist repositories

    24. The repository exposing the largest variety of subjects is Dryad where a total of 19,829 distinct subjects have been used to characterise its 9,676 datasets.

      Interesting, because I thought Dryad focused more on earth and evolutionary sciences.

    25. The selected repositories have published a total of 336,647 datasets (Tab. 2). The large majority of such datasets – 85% circa – has been published in the last three years, namely 33% circa of datasets have been published in 2013, 29% circa in 2014, and more than 22% in 2015

      Does, perhaps, suggest that data sharing is picking up. I wonder, then, whether we should use the term "data publishing" instead of data sharing, as it is perhaps a more accurate and less loaded term.

    26. 320,415

      Dwarfs all the others. But I wonder at the average size of the data set here?

    27. This study embraces the definition of (research) data given by Borgman (2015), i.e., “entities used as evidence of phenomena for the purpose of research or scholarship”, and uses “dataset” to refer to the unit of data subject of the data publishing activity, no matter how many files it materialises (Renear et al., 2010). This “dataset” definition includes the term “data package” as adopted by Dryad to mean a set of data files associated with a publication, as well as “dataset” and “fileset” as used by Figshare to indicate data (the former) and a group of multiple files citable as a single object (the latter).

      Reasonable definitions of data and data set.

    28. Scientific data repositories are often proposed as instruments for supporting data publishing as they provide facilities for all the different players involved in this process.

      I think they do more than just "support data publishing", I think they "publish data", i.e., they are the publishers.

    29. They are called to implement systematic data stewardship practices thus to foster adequate scientific datasets collection, curation, preservation, long term availability, dissemination and access.

      Nice encapsulation of the role of a data repository.

    1. “Our ability to understand what to build is so far behind what we can build,” said Dr. Minshull,

      I tend to agree with this statement, particularly if one is going to synthesize "an original bacterial genome".