3,103 Matching Annotations
  1. Aug 2017
    1. consistently

      When results of several analyses point in the same direction, we say the results are consistent.

      For example, if we run three correlation analyses and find that enjoyment of hiking, self-assessed nature-lovingness, and number of times previously hiked all correlate positively with the probability that someone enjoys hiking holidays, we would say that the results are consistent.

      If we found that the number of times previously hiked was negatively correlated with the probability that someone enjoys hiking holidays, the results would be less consistent.

    2. repeated measurement designs

      A repeated measurement design assesses the same outcome variable at several points in time.

      For example, let’s say we want to find out whether jogging before class improves students’ ability to follow a class. We might ask 20 students to jog before class and 20 students not to jog before class, and then after class ask them how easy it was for them to follow the class.

      However, we might be unlucky and conduct our experiment on a day where a particularly difficult topic was covered in class. No-one, neither the joggers nor the non-joggers, could understand the lecture, so all our subjects report they absolutely couldn’t follow the class.

      This problem could be ameliorated if we used a repeated measurement design instead. We would ask our 20 joggers and 20 non-joggers to either jog or not jog before class on 5 days in a row, and then ask them about their ability to follow the class each time. Now we would have not just one point of measurement from each student, but 5 measurements of their ability to follow the class at several points in time.

    3. within-subjects manipulations

      Within-subjects manipulations refer to situations in experiments where the same person is assigned to multiple experimental conditions.

      For example, let’s say we want to find out which of two different learning techniques (A and B) is more effective in helping students prepare for a vocabulary test. If we conducted a within-subjects manipulation, each student would apply both learning techniques.

      Let’s say every student must first apply learning technique A, then take a vocabulary test, and then, a week later for the next test, apply learning technique B. We could then compare under which learning technique the students perform better.

      In contrast, if we conducted a between-subjects manipulation, each student would apply only one learning technique. We would split the group of students so that half of them use learning technique A and then take the vocabulary test, while the other students use learning technique B and then take the vocabulary test. Again, we could compare under which learning technique the students perform better.

    4. pre-analysis plans

      A pre-analysis plan is a document that specifies which analyses will be run on the data, before these analyses are performed.

      This plan can specify which variables and analyses will be used, how data will be prepared for analyses, and in which cases data will be excluded from analyses. This tool helps researchers specify and commit to the way they want to run the analyses in their study.

    5. confirmatory tests

      A confirmatory test is a statistical analysis of a certain relationship which had previously been hypothesized to hold. The test tries to find out if the hypothesis is supported by the data.

    6. publication bias

      Publication bias is a distortion that can occur when academic research is made public. It is present when findings showing that an effect of interest is statistically significant are more likely to be published than findings showing no evidence, or even evidence against, this effect.

      In this case, if you only read the published papers, you would find many papers showing support for an effect, while studies which do not show support for the same effect remain unpublished, giving you the impression that the effect is less disputed and more consistently found than it actually is.

    7. fixed-effect model

      A fixed effects model is a statistical model which accounts for unmeasured individual differences in the data by treating them as non-random, or “fixed”, at the individual level.

      As an example, let’s say we wanted to study whether drinking coffee makes people more likely to cross the street despite a red light. Our outcome variable of interest is how often each subject crosses a street despite a red light on a walk past 10 red traffic lights. The explanatory variable we manipulate for each participant is whether they had a cup of coffee before the experiment or a glass of water (our control condition), and we would use this variable to try to explain ignoring red lights.

      However, there are several other influences on ignoring red lights which we have not accounted for. Besides random and systematic error, we have also not accounted for individual characteristics of the person, such as their previous experience with ignoring red lights. For instance, has a participant previously received a fine for this offense? If so, they might be less likely to walk across a red light in our experiment.

      Using a fixed effects model makes it possible to account for these types of characteristics that rest within each individual participant. This, in turn, gives us a better estimate of the relationship between coffee drinking and crossing red lights, cleaned from other individual-level influences.
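
      The within transformation that underlies a simple fixed effects estimator can be sketched in a few lines of Python. The data below are invented for illustration: each subject has an unobserved baseline tendency to cross red lights, the true effect of coffee is +2 crossings, and the subject with the higher baseline also happens to drink more coffee, which biases the naive pooled estimate.

```python
# Fixed effects ("within") estimator sketched on invented data.
# Each observation: (subject, coffee, red_lights_crossed), where the outcome
# is subject_baseline + 2 * coffee -- so the true coffee effect is 2.
data = [
    ("A", 1, 12), ("A", 1, 12), ("A", 0, 10),  # A: baseline 10, often drinks coffee
    ("B", 0, 0),  ("B", 0, 0),  ("B", 1, 2),   # B: baseline 0, rarely drinks coffee
]

def slope(pairs):
    """Simple OLS slope of y on x."""
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mx) * (y - my) for x, y in pairs)
    den = sum((x - mx) ** 2 for x, _ in pairs)
    return num / den

# Naive pooled regression ignores the subject baselines and is biased upward.
naive = slope([(x, y) for _, x, y in data])

# Within transformation: subtract each subject's own means, which removes the
# fixed (subject-level) differences before estimating the slope.
demeaned = []
for s in {subj for subj, _, _ in data}:
    obs = [(x, y) for subj, x, y in data if subj == s]
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    demeaned += [(x - mx, y - my) for x, y in obs]
fixed_effects = slope(demeaned)

print(round(naive, 2))          # 5.33 -- inflated by the baseline differences
print(round(fixed_effects, 2))  # 2.0  -- recovers the true effect
```

      The demeaning step is what "fixes" each individual's unmeasured characteristics: anything constant within a subject cancels out of the comparison.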

    8. Spearman’s rank-order correlations

      Spearman’s rank-order correlation is a specific type of correlation analysis which assesses the strength and direction of the relationship between two variables based on the ranks of the values rather than the raw values themselves.
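
      As a sketch of how the coefficient works: both variables are converted to ranks, and (when there are no ties) the textbook formula rho = 1 − 6·Σd²/(n(n² − 1)) is applied to the rank differences d. The data below are invented.

```python
# Spearman's rank-order correlation via the no-ties shortcut formula.
def ranks(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

times_hiked = [1, 2, 3, 4, 5]
enjoyment   = [2, 5, 9, 16, 30]  # increases monotonically, but not linearly

print(spearman(times_hiked, enjoyment))        # 1.0: perfect rank agreement
print(spearman(times_hiked, enjoyment[::-1]))  # -1.0: perfect rank reversal
```

      Because only ranks enter the formula, Spearman's rho is 1.0 here even though the relationship is not linear.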

    9. multivariate interaction effects

      A multivariate interaction effect occurs when several variables work together, so that the effect of one variable on the outcome depends on the level of another variable.

      For example, we might be interested in finding out how water temperature (warm: 38°C; cold: 15°C) affects the body temperature of humans and sea lions. We might find that humans, on average, have a higher body temperature than sea lions, and that body temperature is higher when the body is immersed into warm compared to cold water.

      However, we might find that a human’s body temperature shows bigger differences between the warm and cold water conditions than the sea lion’s body temperature. Because sea lions have a substantial layer of protective fat, their body temperature does not change as much when the water temperature changes, compared to humans.

      Here, species and water temperature show an interaction effect on body temperature.

    10. sample size

      The sample size refers to the number of people from whom data is collected in a study.

    11. accumulated evidence

      Accumulated evidence refers to the results of several studies taken together.

    12. random or systematic error

      There are two sources of error which can occur in scientific studies and distort their results.

      Systematic errors are inaccuracies that can be reproduced. For example, imagine we wanted to measure a participant’s weight and we make our participant step on 5 different scales and measure her weight on each scale 10 times. Four scales report that she weighs 74kg each time our participant steps on them. The last scale shows that she weighs 23kg each time the participant steps on it. We would say there is a systematic error involved in our study of her weight, because the last scale consistently and erroneously reports her weight as too low.

      Random errors are inaccuracies that occur because there are unknown influences in the environment. For example, imagine we wanted to measure a participant’s weight and had her step on the same scale 3 times in a row, within one minute. The first time, the scale reports 74.43kg, the second time 74.34kg, the third time 74.38kg. We don’t think that the participant’s weight has actually changed in this one minute, yet our measurement shows different results, which we would attribute to random errors.

    13. Correlational tests

      A correlational test is a statistical method of analysis which asks if there is a relationship between two variables, and if this relationship is unlikely to be caused by chance.

      For example, if we wonder if intelligence influences students’ biology exam performance, we could use a correlational test to see if more intelligent students get higher scores in a biology test. If we find this pattern, the test would also tell us whether the correlation is strong enough that the probability of erroneously assuming a relationship where none exists is very low.
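
      A minimal sketch of such a test in pure Python, on invented scores. The t statistic for a correlation is r·sqrt((n − 2)/(1 − r²)), which is then compared against a t distribution with n − 2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented data: intelligence scores and biology exam scores of 5 students.
intelligence = [1, 2, 3, 4, 5]
exam_score   = [2, 1, 4, 3, 5]

r = pearson_r(intelligence, exam_score)
n = len(intelligence)
t = r * math.sqrt((n - 2) / (1 - r ** 2))  # test statistic for H0: no correlation

print(round(r, 2))  # 0.8
print(round(t, 2))  # 2.31 -- compare against a t distribution with n - 2 df
```

      In practice the sample would be far larger than 5; the tiny data set is only meant to make the arithmetic easy to follow.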

    1. induces a pro-social motivational state in rats, leading them to open the restrainer door and liberate the cagemate

      The main hypothesis tested was whether rats would voluntarily act to free a trapped cagemate because of feeling sympathy for the trapped rat.

      The authors ask whether the free rat, upon seeing and hearing its trapped cagemate, will act to release the cagemate.

  2. Jul 2017
    1. tau-immunoreactive neurofibrillary tangles (NFTs)

      NFTs occur when tau proteins are deformed, clump together, and abnormally accumulate. These accumulations are toxic to neurons. If these tangles form in neurons, they can be observed using a microscope.

      NFTs are always associated with brain disease and are seen in CTE, Alzheimer's disease, and other neurological conditions. NFTs cause neurodegeneration, neuron death, cognitive problems, and dementia, and are ultimately fatal.

    2. Neuropathological analysis of postmortem brains from military veterans with blast exposure and/or concussive injury revealed CTE-linked neuropathology

      When they looked at the brains of people who suffered from TBI, the authors saw abnormalities ("neuropathologies") characteristic of CTE.

    3. second impact syndrome

      If the brain does not have enough time to recover after TBI, it might respond to further TBI with sudden and often fatal swelling.

      Second impact syndrome most commonly affects teenagers and young adults. It is very rare and often results in death.

    4. amnesia

      Partial or total memory loss or impairment

    5. We performed comprehensive neuropathological analyses (table S1)

      Neuropathological analyses consisted of staining thin sections of postmortem brain tissue from autopsy donors and examining these specimens under a microscope for evidence of disease.

      The presence, distribution, and phosphorylation state of tau protein (whether or not the protein has a phosphoryl group attached to it) are used to confirm a CTE diagnosis after death.

    6. phosphorylated tauopathy

      Disorders caused by abnormal accumulation of phosphorylated tau protein in the brain.

      If a protein is phosphorylated, it means that it has a phosphoryl group (PO<sub>3</sub><sup>-</sup>) attached to it.

    1. high-impact

      The impact of a publication can be measured in several different ways. A common metric to assess the impact of a journal is the impact factor, a numerical indicator calculated based on the number of citations and published articles within a given year.

      For this paper, the authors considered an article to be "high-impact" if it was within the top 1% most cited publications in its cohort.

      For more information about assessing impact, see http://researchguides.uic.edu/if
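
      As a sketch of the arithmetic: the impact factor of a journal for year Y is the number of citations received in Y to items the journal published in the two preceding years, divided by the number of citable items it published in those two years. The numbers below are invented.

```python
# Impact factor for 2017 on invented numbers: citations in 2017 to articles
# published in 2015-2016, divided by citable articles published in 2015-2016.
citations_2017_to_2015_2016 = 250
citable_articles_2015_2016 = 100

impact_factor = citations_2017_to_2015_2016 / citable_articles_2015_2016
print(impact_factor)  # 2.5
```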

    2. nonparametrically

      Nonparametric statistical methods make fewer assumptions about the underlying distribution of the data and are often used for data that are ranked.

    3. covariates

      A covariate is an additional variable included in a regression analysis because it might be responsible for the outcome of a study, or might interfere with the relationship of interest.

      Here, all of the additional variables added in each model were covariates (writing ability, gender, ethnicity, etc.)

    4. including these variables does not substantively affect our findings.

      The authors concluded that writing skills do not affect the chance of receiving grant funding from NIH.

    5. possibility that peer reviewers may be rewarding an applicant’s grant proposal writing skills rather than the underlying quality of her work

      In this model, the authors try to control for the fact that an application could be selected because the applicant writes well, rather than based on the quality of the application.

    6. we employ a probabilistic algorithm developed by Kerr to determine applicant gender and ethnicity (Hispanic or Asian)

      The algorithm in question was developed by William Kerr to estimate the contribution of Chinese and Indian scientists to the U.S. Patent and Trademark Office.

    7. principal investigator (PI)

      A principal investigator is the holder of an independent grant administered by a university and the lead researcher for the grant project, usually in the sciences.

      The phrase is also often used as a synonym for "head of the laboratory" or "research group leader."

    8. Our regression results include separate controls for each type of publication: any authorship position, and first or last author publications.

      What the authors mean here is that they made statistical computations that allow them to remove the effect that the position of a name in the author list can have in a publication.

    9. U.S. Patent and Trademark Office (USPTO)

      The United States Patent and Trademark Office is an agency of the U.S. Department of Commerce which stores, classifies, and disseminates information on patents, grants patents for the protection of inventions, and registers trademarks.

    10. PubMed

      PubMed is a database of medical and biological publications, created by the National Center for Biotechnology Information. It is the free version of the database MEDLINE.

    11. our paper asks whether NIH selects the most promising projects to support.

      Previous work has shown that receiving a grant increases scientific productivity. However, the paper authors want to know if NIH is awarding grants to projects that will make the best use of the money.

    12. applicants from elite institutions

      All that separates individual investors is access to the best ideas and powerful research tools.

      See this MarketWatch article about how social media is present in our daily lives and how much it can create a connection between the ideas and investors:

      http://www.marketwatch.com/story/social-medias-next-disruption-the-investment-industry-2016-06-09?siteid=rss&rss=1

    13. research project (R01)

      The Research Project (R01) grant is a type of grant awarded by the National Institutes of Health that provides support for health-related research and development.

  3. Jun 2017
    1. serial sectioning

      Cutting a tissue sample into a series of successive thin slices, which can then be imaged under a microscope.

    2. amniotes

      The group of animals that lay fertilized eggs on land or retain the fertilized egg in the mother. They include reptiles, birds, and mammals.

    3. canonical

      The usual or natural state.

    4. parasagittal

      An imaginary plane parallel to the midline, dividing the body into unequal left and right portions.

    5. Hematoxylin and eosin (H&E)

      A widely used stain in histology which dyes nuclei blue and cytoplasm pink.

  4. May 2017
    1. Other investigators may develop alternative indicators to explore further the role of expertise and quality in reproducibility on this open data set.

      In a later approach to estimating how researchers assess the reproducibility of science, a large-scale survey was conducted with more than 1500 researchers answering questions such as "Have you failed to reproduce an experiment?"

      Read more in Nature: http://www.nature.com/polopoly_fs/1.19970!/menu/main/topColumns/topLeftColumn/pdf/533452a.pdf

    2. J. K. Hartshorne, A. Schachner, Tracking replicability as a method of post-publication open evaluation. Front. Comput. Neurosci. 6, 8 (2012). doi: 10.3389/fncom.2012.00008; pmid: 22403538

      Hartshorne and Schachner suggest that replication success should be traced in a database connecting replication attempts with original studies. Based on this information, a replication success score could be computed, which could be used as a criterion for a journal's quality alongside other indicators such as citation counts.

    3. T. M. Errington et al., An open investigation of the reproducibility of cancer biology research. eLife 3, e04333 (2014). doi: 10.7554/eLife.04333; pmid: 25490932

      Similarly to the reproducibility project in psychology, Errington and colleagues planned to conduct replication attempts on 50 important papers from the field of cancer biology. While the registered reports are already available online, the replication studies themselves are currently still being conducted.

      Read more on eLife:

      https://elifesciences.org/collections/reproducibility-project-cancer-biology

    4. J. P. Simmons, L. D. Nelson, U. Simonsohn, False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011). doi: 10.1177/0956797611417632; pmid: 22006061

      Simmons and colleagues conduct computer simulations and two experiments that show how significant results can easily be achieved for a research hypothesis that is false. They show that flexibility (or, as they call it, researcher degrees of freedom) in four areas makes it more likely to find significant effects for a false hypothesis:

      1. Flexibility in choosing the dependent variables reported: When researchers flexibly analyze two related dependent variables, this already almost doubles the probability of finding a positive result for a false hypothesis.

      2. Flexibility in choosing the sample size: When researchers stop data collection, find no significant result, and collect additional data before checking for the same effect, this increases the probability of finding a positive result for a false hypothesis by 50%.

      3. Flexibility in the use of additional variables included in the analyses: When researchers include additional variables in the analyses, false positive rates more than double.

      4. Flexibility in the number of experimental conditions reported: When researchers collect data in three experimental conditions and flexibly decide whether to report the result of comparisons between any two conditions or all three, this more than doubles the false positive rate.

      If researchers exploited all four of these flexibilities together, they would, overall, be far more likely to find positive results even though the underlying hypothesis was false.
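
      Point 2, the flexible sample size, is easy to demonstrate with a small simulation. In the sketch below (all details invented: both groups are drawn from the same distribution, so every rejection is a false positive, and a plain z-test with a 1.96 cutoff stands in for a proper t-test), we test once at n = 20 per group and, if the result is not significant, add 10 more observations per group and test again.

```python
import math
import random

random.seed(1)

def z_stat(a, b):
    """Two-sample z statistic (rough stand-in for a t-test)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

sims, fixed_hits, flexible_hits = 2000, 0, 0
for _ in range(sims):
    # Both groups come from the same distribution: there is no true effect.
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    significant = abs(z_stat(a, b)) > 1.96
    fixed_hits += significant
    if not significant:  # "not significant yet" -> quietly collect more data
        a += [random.gauss(0, 1) for _ in range(10)]
        b += [random.gauss(0, 1) for _ in range(10)]
        significant = abs(z_stat(a, b)) > 1.96
    flexible_hits += significant

print(fixed_hits / sims)     # close to the nominal 5% false positive rate
print(flexible_hits / sims)  # noticeably higher: optional stopping inflates it
```

      The second look can only add rejections, never remove them, so the flexible procedure's false positive rate is necessarily at least as high as the fixed-n rate.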

    5. R. Rosenthal, The file drawer problem and tolerance for null results. Psychol. Bull. 86, 638–641 (1979). doi: 10.1037/0033- 2909.86.3.638

      Rosenthal addresses the 'file drawer problem', a questionable research practice where only studies that show the desired result are published, while all other studies land in the 'file drawer' and thus remain unknown to the scientific community.

      In the extreme case, this could mean that, if a specific effect did not exist in reality, the 5% of studies that find this effect (due to the statistical error rate allowed) get published and discussed as if the effect were true, whereas the 95% of studies that rightly do not find the effect are tucked away in a file drawer. This problem hinders scientific progress, as new studies would build on old, but false, effects.

      Rosenthal introduces a way to assess the size of the file drawer problem, the tolerance to future null results: calculating the number of studies with null results that would have to be in a file drawer before the published studies on this effect would be called into question.

    6. B. A. Nosek, J. R. Spies, M. Motyl, Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci. 7, 615–631 (2012). doi: 10.1177/1745691612459058; pmid: 26168121

      Nosek and colleagues argue that scientists are often torn between "getting it right" and "getting it published." While finding out the truth is the ultimate goal of research, more immediately, researchers need to publish their work to be successful in their profession.

      A number of practices, such as the establishment of journals emphasizing reports of non-significant results, are argued to be ill suited for improving research practices. To reconcile the two seemingly-at-odds motives, Nosek and colleagues suggest measures such as lowering the bar for publications and emphasizing scientific rigor over novelty, as well as openness and transparency with regard to data and materials.

    7. L. K. John, G. Loewenstein, D. Prelec, Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23, 524–532 (2012). doi: 10.1177/ 0956797611430953; pmid: 22508865

      John, Loewenstein and Prelec conducted a survey with over 2,000 psychologists to identify to what extent they used questionable research practices (QRPs). The respondents were encouraged to report their behavior truthfully, as they could increase donations to a charity of their choice by giving more truthful answers.

      Results showed that a high number of psychologists admitted to engaging in QRPs. Almost 70% of all respondents admitted to not reporting results for all dependent measures, and around 50% admitted to reporting only studies that showed the desired results.

      Moreover, results showed that researchers suspected their peers also occasionally engaged in such QRPs, but that psychologists thought that there was generally no good justification for engaging in QRPs.

    8. G. S. Howard et al., Do research literatures give correct answers? Rev. Gen. Psychol. 13, 116–121 (2009). doi: 10.1037/a0015468

      Howard and colleagues examine how the file drawer problem (that is, the tendency of researchers to publish positive results but not negative or inconclusive results) affects a body of research literature. They compare "old," existing bodies of literature that could be suffering from the file drawer problem with a newly constructed body of literature guaranteed to be free of the file drawer problem, which they achieved by conducting new studies.

      This investigation suggests that some bodies of literature are supported as relatively file-drawer-free, while other bodies of literature raise concern and kindle further studies on the effects they include.

    9. A. G. Greenwald, Consequences of prejudice against the null hypothesis. Psychol. Bull. 82, 1–20 (1975). doi: 10.1037/h0076157

      Greenwald examines how research practices discriminate against accepting the null hypothesis (that an effect does not exist). Using a simulation, he suggests that too few publications accept the null hypothesis, and that a high proportion of publications falsely reject the null hypothesis when it would have held.

      Greenwald further debunks traditional arguments for why the null hypothesis should not be accepted, and suggests ways to improve research practices so that accepting the null hypothesis becomes more acceptable.

    10. D. Fanelli, “Positive” results increase down the hierarchy of the sciences. PLOS ONE 5, e10068 (2010). doi: 10.1371/journal. pone.0010068; pmid: 20383332

      Fanelli assessed more than 2000 papers from different scientific disciplines and found that the proportion of studies reporting support for their hypotheses is higher in disciplines such as psychology or economics as compared to disciplines such as space science. It is concluded that both the type of hypotheses tested and the rigor applied in these tests differ between fields.

    11. K. S. Button et al., Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013). doi: 10.1038/nrn3475; pmid: 23571845

      Button and colleagues study the average statistical power of studies in neuroscience, and conclude that it is low. They highlight that low power does not only mean that studies have a lower chance of detecting a true effect; with such low-powered studies, it also becomes less likely that a significant effect indeed reflects a true effect.

      It is argued that using studies with low power may seem efficient at first glance, because less money is spent on subjects. However, because future research could be building on an erroneous line of investigation, low-powered studies are inefficient in the long run.

    12. improve the quality and credibility of the scientific literature.

      Improving the quality and credibility of scientific literature can be accomplished by improving the daily practices involved in the research process. Improved reporting and registering hypotheses and sample sizes are some ideas for such changes.

      Read more in Nature Human Behavior: http://www.nature.com/articles/s41562-016-0021

    13. research community is taking action

      An important part of taking action to advance psychological research is establishing an open discussion and dialogue about the direction the field could take. In the course of this movement, several researchers' blogs have become an increasingly popular medium for such debate.

      Read more on the topic of reproducibility in Andrew Gelman's blog:

      http://andrewgelman.com/?s=reproducibility

      and in Uri Simonsohn's Blog Data Colada:

      http://datacolada.org/?s=reproducibility

    14. Transparency and Openness Promotion (TOP) Guidelines (http://cos.io/top) (37)

      Nosek and colleagues summarize eight standards for transparency and openness in research: citation standards, accessibility of data, accessibility of analysis code, availability of research materials such as participant instructions, transparency of design and analyses, pre-registration of studies, pre-registration of analysis plans, and replication.

      They argue that journals should require and enforce adherence to transparency guidelines, and that the submission of replication studies, in particular in the Registered Report format, should be an option.

    15. The present results suggest that there is room to improve reproducibility in psychology. Any temptation to interpret these results as a defeat for psychology, or science more generally, must contend with the fact that this project demonstrates science behaving as it should.

      The fifth and final conclusion of the paper addresses the big-picture takeaway of the results.

      On one hand, the authors recognize that research is a process where new ideas have to be explored and sometimes might turn out not to be true. Maximum replicability is therefore not desirable, because it would mean that no more innovations are being made.

      On the other hand, the authors also conclude that there is room for improvement: stronger original evidence and better incentives for replications would form a stronger foundation for psychological research.

    16. Scientific incentives

      Incentives for working in scientific research often differ greatly by country and institution. In the UK, for instance, the allocation of research funding and institutional positions depends on the number of published papers which are rated as highly original, significant, and rigorously conducted.

      Read more in The Guardian: https://www.theguardian.com/higher-education-network/2016/oct/17/why-is-so-much-research-dodgy-blame-the-research-excellence-framework

    17. Many Labs replication projects

      Many Labs replication projects are studies in which multiple labs attempt to replicate the same effect. In this example, 36 teams of researchers from different countries attempted to replicate the same 13 effects, with more than 6000 participants.

      The data revealed that 10 effects could consistently be replicated, while one effect showed only weak support for replication and two effects could not be replicated successfully.

    18. Nonetheless, collectively, these results offer a clear conclusion: A large portion of replications produced weaker evidence for the original findings

      Because there is some uncertainty about how exactly the replication success rate in psychological research should be determined, the authors go about the interpretation of the results of this study very conservatively.

      This very careful interpretation of the data is that the replication studies yielded largely weaker evidence for the effects studied than the original studies.

    19. No single indicator sufficiently describes replication success, and the five indicators examined here are not the only ways to evaluate reproducibility.

      It is difficult to say exactly how many original studies were successfully replicated. The precise conclusions drawn from this paper depend a lot on which of the 5 measures used to determine replication success you think is the most appropriate measure. The results of one measure indicate a replication success rate as low as 36%, while another measure suggests a success rate of 68%. Perhaps some researchers would even say that another measure not included in this study would have made it possible to draw more meaningful conclusions. The scientific community has so far not agreed on what measure should be used to evaluate replication success rates.

      Moreover, there are other limitations of this approach to studying reproducibility (see the paragraph "Implications and limitations") that make it difficult to generalize the findings of this study, not only to psychological research as a whole, but also to other disciplines. It is also difficult to evaluate from the findings of this study whether the evidence indicates a specific effect is true or does not exist.

      Therefore, the first conclusion of this paper is that all interpretations of the data are only an estimation of how reproducible psychological research is, not an exact answer.

    20. In addition to the quantitative assessments of replication and effect estimation, replication teams provided a subjective assessment of replication success of the study they conducted.

      Finally, the authors used the subjective rating as an indicator of replication success. Out of 100 replication teams, only 39 reported that they thought they had replicated the original effect.

    21. Comparing the magnitude of the original and replication effect sizes avoids special emphasis on P values. Overall, original study effect sizes (M = 0.403, SD = 0.188) were reliably larger than replication effect sizes (M = 0.197, SD = 0.257),

      With this third measure for replication success, the authors further compared the sizes of the original and replicated effects. They found that the original effect sizes were larger than the replication effect sizes in more than 80% of the cases.

    22. In addition to the quantitative assessments of replication and effect estimation, we collected subjective assessments of whether the replication provided evidence of replicating the original result.

      Finally, the authors included a last measure for replication success: a subjective rating. All researchers who conducted a replication were asked if they thought their results replicated the original effect successfully. Based on their yes or no answers, the authors calculated subjective replication success.

    23. “coverage,” or the proportion of study-pairs in which the effect of the original study was in the CI

      The authors compared the size of the original study effects with their replication study's effects to determine if the original study results fell within the confidence interval (CI) of the replication.

      If the original study's effect size was within the CI of the replication study, the effect size can be assumed to be similar.

    24. z, F, t, and χ2

      z, F, t, and χ2 test statistics are values calculated from a sample and compared with what is expected under the null hypothesis (that there is no effect in reality). They allow inferences on whether the data justify rejecting the null hypothesis and assuming that an effect is present.

    25. Cohen’s d.

      Cohen's d is a measure of effect size, used to report the standardized difference between two means. By convention, an effect is considered small at d ≈ 0.2, medium at d ≈ 0.5, and large at d ≈ 0.8.
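
      To make the definition concrete, here is a minimal sketch of how Cohen's d is computed for two independent groups. The numbers are hypothetical, not data from the study:

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    # Standardized difference between two group means,
    # divided by the pooled standard deviation.
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical exam scores under two teaching methods:
method_a = [78, 82, 85, 90, 88]
method_b = [70, 75, 72, 80, 74]
d = cohens_d(method_a, method_b)  # about 2.4, a very large effect
```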

    26. We transformed effect sizes into correlation coefficients whenever possible

      The third indicator of replication success was the effect sizes of original and replication studies. The authors calculated correlation coefficients to indicate effect sizes.

      In a single study, when the means of two groups are very different, the correlation coefficient will be close to 1, and when the means of two groups are similar, the correlation coefficient will be close to 0.

      The effect size of original studies was always coded as positive (values between 0 and 1). When the effect in the relevant replication study went in the same direction, the effect size was also coded as positive (values between 0 and 1), but when the effect in the replication went in the other direction, the effect size was coded as negative (values between -1 and 0).
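
      One common way to express an effect size as a correlation coefficient is to convert the reported test statistic. The sketch below shows the standard t-to-r conversion with hypothetical numbers; the paper converted several kinds of statistics, so treating this as the formula used for any particular study pair is an assumption:

```python
import math

def t_to_r(t, df):
    # Convert a t statistic into a correlation-type effect size r.
    # The sign of t carries the direction of the effect, so a
    # replication effect in the opposite direction yields a negative r.
    r = math.sqrt(t * t / (t * t + df))
    return math.copysign(r, t)

r_original = t_to_r(2.5, 48)      # positive: original effects coded 0 to 1
r_replication = t_to_r(-1.1, 48)  # negative: effect went the other direction
```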

    27. t test for dependent samples

      The t-test for dependent samples is a statistical procedure used on paired data (for example, two measurements from the same subjects) to compare the means of two conditions.
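
      As a minimal illustration with hypothetical numbers, the t statistic for dependent samples is based on the per-subject differences between the two measurements:

```python
from statistics import mean, stdev

def paired_t_statistic(before, after):
    # t statistic for dependent samples: the mean of the per-subject
    # differences divided by its standard error.
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)

# Hypothetical scores of five subjects, each measured twice:
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]
t = paired_t_statistic(before, after)  # about 6.5
```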

    28. central tendency

      The central tendency of a distribution is captured by its central, or typical values. Central tendency is usually assessed with means, medians ("middle" value in the data) and modes (most frequent value in the data).
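
      The three measures can be read directly off a small, hypothetical data set with Python's standard library:

```python
from statistics import mean, median, mode

data = [2, 3, 3, 4, 5, 5, 5, 9]
print(mean(data))    # 4.5  (arithmetic average)
print(median(data))  # 4.5  ("middle" value)
print(mode(data))    # 5    (most frequent value)
```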

    29. distribution of P values of original and replication studies using the Wilcoxon signed-rank test

      In addition to comparing the proportion of studies yielding significant results, the authors compared the p-values of these studies to find out how similar they were to each other.

    30. We tested the hypothesis that the proportions of statistically significant results in the original and replication studies are equal using the McNemar test for paired nominal data and calculated a CI of the reproducibility parameter.

      Next, the authors conducted another test to find out if the proportion of original studies that produced significant results was equal to or different from the proportion of replication studies that produced significant results.
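
      The McNemar test focuses on the discordant pairs, i.e. study pairs where only one of the two results (original or replication) was significant. A minimal sketch with hypothetical counts:

```python
def mcnemar_statistic(b, c):
    # McNemar chi-square statistic for paired nominal data.
    # b: pairs significant only in the original study,
    # c: pairs significant only in the replication.
    return (b - c) ** 2 / (b + c)

# Hypothetical counts: 35 pairs significant only in the original,
# 5 pairs significant only in the replication.
chi2 = mcnemar_statistic(35, 5)  # 22.5, far above the 3.84 cutoff for p < .05
```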

    31. we tested the hypothesis that these studies had “no evidential value” (the null hypothesis of zero-effect holds for all these studies)

      The first analysis that the authors ran on the data assessed all replication studies that yielded non-significant results.

      The authors used Fisher's method to determine whether these nonsignificant replications, taken together, nevertheless showed evidence of real effects, or whether the null hypothesis of zero effect holds for all of them.

    32. Fisher’s method

      Fisher's method is a statistical procedure for conducting meta-analyses, in which the results of all included studies are combined. The procedure combines the p-values of the individual studies, and allows inferences on whether the null hypothesis (that there are no effects) holds for all of them.
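
      A self-contained sketch of Fisher's method with hypothetical p-values: the combined statistic -2·Σ ln(p) follows a chi-square distribution with 2k degrees of freedom under the joint null hypothesis, and for an even number of degrees of freedom the tail probability has a closed form:

```python
import math

def fisher_combined_p(p_values):
    # Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df
    # under the null hypothesis that none of the k studies has an effect.
    x = -2.0 * sum(math.log(p) for p in p_values)
    k = len(p_values)
    # Chi-square survival function for even df = 2k (closed form).
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Three individually nonsignificant results can still combine
# into suggestive joint evidence:
combined = fisher_combined_p([0.08, 0.12, 0.30])  # about 0.07
```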

    33. However, original studies that interpreted nonsignificant P values as significant were coded as significant (four cases, all with P values < 0.06).

      Here, the authors explain how they deal with the problem that some of the original studies reported results as significant when they were, in fact, not significant.

      In each case, the threshold that is customarily set to determine statistical significance (p<0.05) was not met, but all reported p-values fell very close to this threshold (0.06>p>0.05). Since the original authors treated these effects as significant, the current analysis did so as well.

    34. two-tailed

      A two-tailed test looks for a hypothesized relationship in two directions, not just one. For example, if we compare the means of two groups, the null hypothesis would be that the means are not different from each other.

      The alternative hypothesis for a two-tailed test would be that the means are different, regardless if the one is bigger or smaller than the other.

      For a one-tailed test, one would formulate a more specific alternative hypothesis, for instance that the mean of the first group is bigger than the mean of the second group.

    35. meta-analyses

      Meta-analyses integrate the results of multiple studies to draw overall conclusions on the evidence.

    36. There is no single standard for evaluating replication success

      Because the large-scale comparison of original and replication studies is a new development in the field of psychology, the authors had to formulate a plan for their analysis that did not rely much on previous research.

      They decided to use 5 key indicators for evaluating the success of the replications. They compared the original and the replicated studies in terms of the number of significant outcomes, p-values, and effect sizes. They also assessed how many studies were subjectively judged to replicate the original effect. Finally, they ran a meta-analysis of the effect sizes.

    37. covaries

      Covariation indicates how two variables change together, and is the basis needed to calculate a correlation.

    38. P value

      A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the effect does not exist in reality. Small p-values are treated as compelling evidence for an effect, because such results would be unlikely to show up in the data if the effect did not exist.

    39. functional magnetic resonance imaging

      Functional magnetic resonance imaging is a procedure that detects the activity of areas in the brain by measuring blood flow.

      It can be used to see what parts of the brain are involved in different processes.

    40. eye tracking machines

      Eye tracking machines are devices that can record eye-movements and make it possible to show what information people look at without asking them explicitly.

    41. autism

      Autism is a developmental condition that makes it difficult to communicate and form relationships with others. People with autism can also have difficulty using language or thinking about abstract concepts.

    42. macaques

      Macaques are a genus of Old World monkeys.

    43. subjective assessments of replication outcomes

      One of the indicators for whether a study was replicated successfully or not was a subjective rating: each team of replicators was asked if their study replicated the original effect (yes or no).

    44. F test

      An F-test is a statistical procedure that assesses if the variances of two distributions are significantly different from each other.

    45. t test

      A t-test is a statistical procedure that assesses if the means of two distributions are significantly different from each other.

    46. citation impact

      Citation impact is determined by how frequently a paper is cited and built upon by subsequent literature.

    47. tractable

      Easy to deal with.

    48. a priori

      A priori means something was deduced or determined from theoretical considerations, before collecting data.

    49. cognitive psychology

      Cognitive psychology is a subdiscipline of psychology that studies mental processes like perception, problem solving, attention or memory.

    50. social psychology

      Social psychology is a subdiscipline of psychology that studies how people interact with their social environment, and how their thoughts and behaviors are affected by others.

    51. selection biases

      Selection bias here refers to systematic error in the way studies are included or excluded in the sample of studies which would be replicated. An unbiased selection would be truly random, such that the sample of studies used for replication would be representative of the population of studies available.

    52. sampling frame and selection process

      The authors wanted to make sure that the studies that were selected for replication would be representative of psychological research. Representativeness was important because it would mean that the conclusions that would be drawn from the replication outcomes could be cautiously extended to assumptions about the state of the field overall.

      At the same time, they had to make sure that the studies selected could also be conducted (that is, that one of the coauthors had the necessary skill or equipment to collect the data).

      To achieve this goal, a step-wise procedure was used: starting from the first issue of 2008 from three important psychology journals, 20 studies were selected and matched with a team of replicators who would conduct the replication attempt. If articles were left over because no-one could conduct the replication, but more replication teams were willing to conduct a study, another 10 articles were made available. In the end, out of 488 studies drawn from the population of studies, the authors attempted to replicate 100.

    53. constructed a protocol for selecting and conducting high-quality replications

      Before collecting data for the replication studies, the authors produced a detailed protocol that described how they were going to select the studies that were available for replication, how they would decide which effect they would attempt to replicate, and which principles would guide all replication attempts.

      Importantly, this protocol was made public, and all individual replication attempts had to adhere to it.

    54. transparency

      Transparency here means that the process by which a specific result was achieved is made as accessible to other researchers as possible, by explaining publicly, and in detail, everything that was done in a study to arrive at that result.

    55. There is plenty of concern (9–13) about the rate and predictors of reproducibility but limited evidence.

      Pashler and Wagenmakers (13) argue that doubts about the reproducibility of findings in psychology became increasingly critical after events such as the fraud case of Stapel in 2011, where fabricated and manipulated data resulted in numerous retractions of journal articles, or the debate around findings published by Bem in 2011, where claims that people had an ability to foresee the future were shown not to be replicable.

      The suspicion that researchers engaged in "questionable research practices"(QRPs) turned out to be more justified than the field had expected.

    56. Direct replication provides the opportunity to assess and improve reproducibility.

      Nosek and Lakens (7) argue in this editorial that registered reports are a partial solution to the problem of few incentives for researchers to conduct replications. A registered report is an article format, where a proposal for replication is peer-reviewed before data is collected, and the pre-registered report of the replication will be published no matter what the data shows.

    57. false negative

      A false negative is a result that erroneously indicates no effect exists: although the data do not suggest that an effect exists, in reality, this effect does exist.

    58. false positive

      A false positive is a result that erroneously indicates an effect exists: although the data suggests an effect exists, in reality, the effect does not exist.

    59. moderate

      In statistics, moderation refers to the dependence of the relationship between two variables on a third variable.

      For example, the positive relationship between socioeconomic status and health (the higher one's status, the better one's health) could be moderated by one's sense of control: people in low income groups with high sense of control might show health levels comparable with people from high-income groups, whereas people in low income groups with low sense of control have worse health (Lachman & Weaver, 1998).

    60. Direct replication

      Schmidt (8) argues that, although replication is critical for scientific progress, little systematic thought had been applied to how to go about replications.

      He suggests to differentiate direct replication (the repetition of an experimental procedure) and conceptual replication (the repeated test of a hypothesis or result using different methods).

      Moreover, he summarizes five main functions that replications serve: to control for sampling error, artifacts, or fraud; to extend results to larger or different populations; and to check the assumptions earlier experiments made.

      Schmidt concludes that, although a scientific necessity, replications can be practically difficult to conduct, in particular because this type of work is not always easy to publish or highly regarded. Instead, he recommends that studies which include novel research questions could also include elements of replication of previous findings.

    61. Reproducibility is a core principle of scientific progress

      A number of scientists argue for reproducibility from the perspective of philosophy of science: scientific theory and explanation require reproducibility to enable scientific progress.

    62. bias

      Bias refers to a systematic error or a process that interferes with accurate results.

    63. confidence interval

      A confidence interval (CI) is a range of values, computed from the data, that is constructed to capture the true value of the variable of interest. If the experiment were repeated again and again, a 95% confidence interval would contain the true value in 95% of all cases.
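
      As a rough sketch, a 95% confidence interval for a sample mean can be computed with the normal approximation (critical value 1.96; small samples would use the t distribution instead). The data here are hypothetical:

```python
from statistics import mean, stdev

def ci95_mean(sample):
    # Approximate 95% CI for the mean: mean +/- 1.96 standard errors.
    # The 1.96 normal critical value is fine for large samples.
    m = mean(sample)
    se = stdev(sample) / len(sample) ** 0.5
    return (m - 1.96 * se, m + 1.96 * se)

low, high = ci95_mean([5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3])
# An interval constructed this way would contain the true mean
# in about 95% of repeated experiments.
```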

    64. effects

      An effect is an observed phenomenon, where differences in one circumstance lead to observable differences in an outcome.

    1. recruiting

      Recruitment refers to the successful addition of new individuals to a population.

    2. succession

      Ecological succession is the predictable change in a community over time, usually referring to changes after a disturbance or initial colonization of a habitat.

    3. polychaetes

      Polychaetes are a diverse group of worms. Polychaetes are commonly found at the bottom of the ocean and are an important part of marine food webs.

    4. bottom topography

      The shape and features of the bottom of the ocean.

    5. Demersal plankton

      The demersal zone is the layer of water nearest to the bottom of the ocean. Plankton are tiny microscopic organisms that live in the water column.

    6. schooling

      Swimming together in a group. Schooling behavior provides benefits to individual fish, such as safety in numbers.

    7. Motile organisms

      Motile organisms are organisms that can move around, such as shrimp or fish. Sessile organisms, like corals, do not move for the majority of their lives.

    8. Among sessile organisms, there were marked differences in survivorship and repair after initial injury.

      The authors observed injured corals and sponges for several months after Hurricane Allen to see if they survived, noting the initial degree of injury.

    9. encrusting

      Encrusting animals form a thin layer (i.e. a "crust") over another hard surface.

    10. Differences in damage to different growth forms (7) were particularly striking for corals,

      The authors measured the degree of damage to certain species of corals, taking care to note the growth form and size. This allowed them to look at how shape and size influenced patterns of damage.

    11. Within any zone, the amount and type of damage inflicted upon sessile organisms was greatly influenced by their shapes, sizes, and mechanical properties. Damage to gorgonians, corals, and sponges ranged from partial to complete mortality (20) and was caused by abrasion, burial, and the tearing or fracture of tissue and skeleton. The fate of detached colonies and fragments, and thus the ultimate consequences to populations, of Hurricane Allen, varied widely between taxa.

      The way that physical forces affected corals depended heavily on their form, since form affects how organisms are influenced by mechanical forces. Different species have different shapes and sizes, so the impact of the storm depends on the species.

    12. Shallow fore-reef areas were generally more severely damaged than deep ones. We see this most directly by comparing the same species on the same reefs at different depths. For example, head corals were more frequently toppled in sand channels in 10 m of water than in 14 m (Table 1, compare rows 5 and 6; x2 for numbers toppled and not toppled after Hurricane Allen = 4.75, P < .05).

      The energy within waves is released most violently in shallow waters. Corals at greater depths were less likely to be damaged.

    13. Not all patchiness can be easily explained, but a number of patterns emerge.

      The authors observed a variety of sites with different profiles both before and after the hurricane. They describe the variation within and between sites here.

    14. patchy

      Uneven, with some spots being affected more than others.

    15. We consider first the effects of spatial factors and then describe the immediate impact on common organisms and their subsequent responses over the following 7 months.

      The authors noted damage to corals after the hurricane and tracked their subsequent recovery (or death).

    16. substratum

      The substratum is the underlying layer of rock or sediment. This refers to the surface that the corals attach to.

    17. breaker zone

      As ocean waves move towards shallower waters, they eventually become unstable and break. A breaker zone is a region where waves begin breaking.

    18. differences between reefs on Jamaica's north and south coasts were due to differences in hurricane frequency

      Hurricanes hit the southern coast of Jamaica more often than the northern coast, and scientists have previously thought that this may be the reason for the observed differences in coral communities.

      Differences include larger areas of dead coral and lower population densities of corals in certain reef zones on the southern coast.

    19. taxonomic

      Taxonomy is the science of describing, identifying and classifying species. Taxonomic differences, in this context, are differences in the species present.

    20. they collected data comparable to those taken previously on routine patterns and processes

      Because these reefs had been well-surveyed before Hurricane Allen, the authors conducted surveys using the same methods after the hurricane to examine the impacts of the storm.

    1. Y. Yan, S. Shin, B. S. Jha, Q. Liu, J. Sheng, F. Li, M. Zhan, J. Davis, K. Bharti, X. Zeng, M. Rao, N. Malik, M. C. Vemuri, Efficient and rapid derivation of primitive neural stem cells and generation of brain subtype neurons from human pluripotent stem cells. Stem Cells Transl. Med. 2, 862–870 (2013).

      This study presents an efficient method to produce neural stem cells from human pluripotent stem cells.

      The system presented in this study enables the creation of NSC banks, increasing cell therapy applications.

    2. Z. W. Naing, G. M. Scott, A. Shand, S. T. Hamilton, W. J. van Zuylen, J. Basha, B. Hall, M. E. Craig, W. D. Rawlinson, Congenital cytomegalovirus infection in pregnancy: A review of prevalence, clinical features, diagnosis and prevention. Aust. N. Z. J. Obstet. Gynaecol. 56, 9–18 (2016).

      This study examines the effects of congenital cytomegalovirus infection on the developing fetus.

    3. C. Grief, R. Galler, L. M. C. Côrtes, O. M. Barth, Intracellular localisation of dengue-2 RNA in mosquito cell culture using electron microscopic in situ hybridisation. Arch. Virol. 142, 2347–2357 (1997).

      In this study, Grief et al. used electron microscopy to localize dengue virus in infected mosquito cells.

      They concluded that the smooth membrane structures are an important site for the production of virus particles.

    4. H. Tang, C. Hammack, S. C. Ogden, Z. Wen, X. Qian, Y. Li, B. Yao, J. Shin, F. Zhang, E. M. Lee, K. M. Christian, R. A. Didier, P. Jin, H. Song, G. L. Ming, Zika virus infects human cortical neural progenitors and attenuates their growth. Cell Stem Cell 18, 1–4 (2016).

      Tang’s article highlights the impact of ZIKV infection on both cell death and dysregulation of the cell cycle.

    5. G. Calvet, R. S. Aguiar, A. S. Melo, S. A. Sampaio, I. de Filippis, A. Fabri, E. S. Araujo, P. C. de Sequeira, M. C. de Mendonça, L. de Oliveira, D. A. Tschoeke, C. G. Schrago, F. L. Thompson, P. Brasil, F. B. Dos Santos, R. M. Nogueira, A. Tanuri, A. M. de Filippis, Detection and sequencing of Zika virus from amniotic fluid of fetuses with microcephaly in Brazil: A case study. Lancet Infect. Dis. (2016).

      In this article, the authors were able to detect the Brazilian Zika virus in amniotic fluid and compare its genome to other Zika strains and flaviviruses. In doing so, they hoped to find out if there had been recombination events between them.

      The authors collected amniotic fluid samples from women whose fetuses were diagnosed with microcephaly, purified virus particles and extracted their genetic material, and analyzed the samples with qRT-PCR.

      They found that the different viruses share 97–100% of their genomes and that there had been no recombination events.

    6. C. G. Woods, J. Bond, W. Enard, Autosomal recessive primary microcephaly (MCPH): A review of clinical, molecular, and evolutionary findings. Am. J. Hum. Genet. 76, 717–728 (2005).

      Woods et al. discuss some clinical aspects of microcephaly but mainly focus on molecular and evolutionary factors.

      From an evolutionary point of view, changes in genes linked to a microcephalic phenotype might have been responsible for the evolution of the human brain size.

      Woods et al. also note that microcephaly is the consequence of a mitotic deficiency in neural precursors.

    7. E. C. Gilmore, C. A. Walsh, Genetic causes of microcephaly and lessons for neuronal development. WIREs Dev. Biol. 2, 461–478 (2013).

      Microcephaly is caused by abnormal cell growth in the brain leading to a reduced brain size.

      Mutations in genes involved in the cell cycle could be one factor causing this phenomenon.

      Here, the authors showed that variation in brain size is more closely related to the number of connections between neurons.

    8. Our results, together with recent reports showing brain calcification in microcephalic fetuses and newborns infected with ZIKV (10, 14) reinforce the growing body of evidence connecting congenital ZIKV outbreak to the increased number of reports of brain malformations in Brazil.

      The authors report that their results confirm previous evidence connecting the ZIKV outbreak in Brazil to an increase in cases of microcephaly.

    9. Other studies are necessary to further characterize the consequences of ZIKV infection during different stages of fetal development.

      The authors used models that allowed them to study early stages of brain development. They suggest there is more work to be done to determine the effects of ZIKV infection on later stages of fetal development.

    10. Our results demonstrate that ZIKV induces cell death in human iPS-derived neural stem cells, disrupts the formation of neurospheres and reduces the growth of organoids (fig. S2), indicating that ZIKV infection in models that mimics the first trimester of brain development may result in severe damage.

      Summarizing the results, the authors conclude that in their model of early brain development, ZIKV causes severe damage.

    11. cortical layering

      Development of the layers of the brain.

    12. brain organoids recapitulate the orchestrated cellular and molecular early events comparable to the first trimester fetal neocortex

      Neurospheres are useful for modeling very early (embryonic) development, while organoids are used to study later stages of development.

    13. These results suggest that the deleterious consequences of ZIKV infection in human NSCs, neurospheres and brain organoids are not a general feature of the flavivirus family.

      Because DENV2 did not reduce cell growth or affect morphology, the researchers concluded that those effects are unique to ZIKV and not characteristic of the flavivirus family (to which both DENV2 and ZIKV belong).

    14. ZIKV induced caspase 3/7 mediated cell death in NSCs

      Zika virus induces the expression of caspase 3/7, indicating a cell is preparing to die.

      Dengue virus 2, on the other hand, did not increase caspase 3/7.

    15. caspase 3/7

      Caspases are endoproteases (a type of enzyme that breaks down proteins) that play a critical role in both inflammation and cell death.

      The presence of caspase 3 and 7 can be used as a sign that cells are preparing to die.

    16. In addition to MOCK infection, we used dengue virus 2 (DENV2), a flavivirus with genetic similarities to ZIKV (11, 19), as an additional control group

      The authors also compared ZIKV infection to dengue virus 2 (DENV2) infection. DENV2 is similar to ZIKV.

    17. reduced by 40% when compared to brain organoids under mock conditions

      Brain organoids infected with Zika virus were, on average, 40% smaller.

    18. The growth rate of 12 individual organoids (6 per condition) was measured during this period

      The authors tracked the size of six ZIKV-infected and six mock-infected organoids during this period in order to compare their growth rates.

    19. ZIKV-infected cells in neurospheres presented smooth membrane structures (SMS) (Fig. 3, B and F), similarly to those previously described in other cell types infected with dengue virus (17)

      Using in situ hybridization (labeling nucleic acids with probes) on sections of dengue-2 infected mosquito cells, Grief showed that in dengue-2 infected mosquito cells, the smooth membrane structures contained both viral RNA and virus particles.

      This suggests that the smooth membrane structures are important sites for the concentration of viral RNA and possibly for formation of the viral envelope.

    20. pyknotic

      A nucleus whose chromatin has condensed in preparation for apoptosis (programmed cell death).

    21. Apoptotic nuclei

      A nucleus that has started to prepare for programmed cell death (apoptosis).

    22. glial

      Cells located in the central nervous system which protect and support neurons in their function.

      Glial cells differ from neurons since they do not participate in electrical signaling.

    23. ultrastructural

      Smaller than what can be seen with a light microscope.

    24. in vitro

      Outside a living organism, in a controlled experimental environment such as a cell culture.

    25. morphological abnormalities and cell detachment

      Neurospheres that contained cells infected with Zika virus were oddly shaped, and some cells broke away.

    26. mock-

      Mock NSCs were not infected with Zika.

    27. Student’s t test

      A statistical test that is used to determine if two sets of data are significantly different from each other.

    28. MOI

      The "multiplicity of infection," which is the average number of virus particles that infect a cell.

    29. to explore the consequences of ZIKV infection during neurogenesis and growth

      In order to obtain neural stem cells from human iPS, researchers cultured iPS in a special medium.

      To create neurospheres and organoids, neural stem cells were divided and again cultured in a special medium.

      Finally, ZIKV was diluted and added to the different types of culture for 2 hours.

    30. neural stem cells

      Undifferentiated cells in the nervous system that have the potential to develop into the various cell types of the nervous system, such as neurons and glial cells.

    31. induced pluripotent stem (iPS)

      These are differentiated cells which have been reprogrammed into pluripotent ones. This means that they have the ability to develop into any type of cell.

    32. there is direct evidence that ZIKV is able to infect and cause death of neural stem cells

      Tang et al. obtained human neural progenitor cells (hNPCs) from stem cells. They used a particular ZIKV strain that successfully infected hNPCs, and found that the infected cells released ZIKV particles.

      The growth of hNPCs was stunted, and an analysis of DNA content suggested that this attenuation might have been due to a disturbance in the cell cycle.

    33. ZIKV had also been detected within the brain of a microcephalic fetus

      Zika virus has also been detected in microcephalic fetuses.

      The Brazilian strain of the virus has been traced to an Asian strain.

    34. amniotic fluid

      The liquid that surrounds the fetus for its protection, keeping a constant temperature and environment.

    35. placenta

      An organ that develops only during pregnancy to provide oxygen and nutrients needed for the growth of the baby.

    36. ZIKV has been described

      In several case studies of pregnant women diagnosed with fetal microcephaly, the women suffered from symptoms of infection with Zika virus.

      After miscarrying, ZIKV RNA and antigens were detected in the placental tissues and the amniotic fluid of the microcephalic fetuses. Sequencing analysis of the virus revealed a genotype of Asian origin.

      Read more case studies that made headlines:

      http://www.dailymail.co.uk/health/article-3451984/Zika-cross-placenta-infect-unborn-babies-Traces-virus-amniotic-fluid-surrounding-two-fetuses-diagnosed-microcephaly.html

    37. flavivirus

      A genus of viruses usually spread through mosquito and tick bites. Flaviviruses include West Nile virus and dengue virus.

    38. Syphilis

      Syphilis is a sexually transmitted infection caused by the bacterium Treponema pallidum.

    39. Herpes virus

      Herpes virus infections occur around the mouth, lips, genitals, or rectum.

    40. Cytomegalovirus

      CMV infections spread through contact with body fluids, and often occur in those with weak immune systems.

    41. Rubella

      Rubella is an RNA virus that is normally spread through the air by coughing or breathing.

    42. Toxoplasmosis

      A disease caused by a parasite called Toxoplasma gondii.

      It is usually transmitted by eating uncooked food that contains cysts or by exposure to infected cat feces.

    43. TORCHS

      TORCHS is an acronym for a group of infections that can pass from mother to fetus and harm its development: Toxoplasmosis, Rubella, Cytomegalovirus, Herpes virus, and Syphilis.
    44. external insults

      Brain injuries.

    45. etiology

      The cause of a disease or disorder.

    46. Microcephaly is associated with decreased neuronal production as a consequence of proliferative defects and death of cortical progenitor cells

      The cerebral cortex (the outer layer of the brain) shows the most severe reduction in microcephaly. This might be explained by reduced division in the cells that neurons come from, resulting in fewer neurons. This, in turn, leads to a smaller cerebral cortex.

    47. heterogeneous

      Diverse.

    48. abrogates

      Prevents.

    49. electron microscopy

      A technique that uses a beam of electrons as a light source and has a magnification of up to 1,000,000x (a light microscope's magnification power is 1,500x).

    50. immunocytochemistry

      A microscopy technique for seeing cellular components by targeting them in tissue samples.

    51. organoids

      An organ bud (miniature organ) that is anatomically similar to the organ it models. Organoids are used to study organ development and function.

    52. neurospheres

      A three-dimensional culture system made up of free-floating clusters of neural stem cells. They are used to study neural precursor cells in vitro.

    53. microcephaly

      An abnormally small head due to failure of the brain to grow sufficiently. It is associated with mental disability.

      The growth of the brain can be impaired by many genetic and environmental factors, including infections by viruses and genetic syndromes.

    54. Zika virus (ZIKV)

      An RNA virus transmitted by mosquitoes and by sexual contact with a carrier.

      It was first isolated from the Zika Forest of Uganda in 1947. It was previously only known to occur in a narrow range in Africa and Asia. However, in 2015 there was a Zika outbreak in Brazil.

    1. P. Azoulay, J. Graff-Zivin, D. Li, B. Sampat, Public R&D investments and private sector patenting: Evidence from NIH funding rules, NBER working paper 20889 (2013); http://irps.ucsd.edu/assets/001/506033.pdf.

      This working paper links public grants to private-sector innovation and builds a model to quantify the variation in funding across fields.

      The results show that NIH funding increases private-sector patenting.

    2. R. K. Merton, Science 159, 56–63 (1968).

      In this article, the sociological expression "the rich get richer and the poor get poorer," also called the Matthew effect, is presented in the context of scientific publication.

      Scientists who have received grants in the past are more likely to get more grants and produce results.

    3. W. R. Kerr, The ethnic composition of US inventors, Working Paper 08-006, Harvard Business School (2008); http://www.people.hbs.edu/wkerr/Kerr%20WP08_EthMatch.pdf.

      This study applies an ethnic-name database to individual patent records granted by the United States Patent and Trademark Office to document these trends with greater detail than previously available.

    4. B. A. Jacob, L. Lefgren, J. Public Econ. 95, 1168–1177 (2011).

      The authors of this paper evaluated the impact of NIH grants on publications. They concluded that researchers who did not get an NIH grant but simultaneously applied for other grants saw one more publication (+7%).

    5. J. Berg, Productivity metrics and peer review scores: NIGMS feedback loop blog (2011); https://loop.nigms.nih.gov/2011/06/productivity-metrics-and-peer-review-scores.

      This blog post presents a reasonable hypothesis: the preliminary data that contribute to an outstanding peer review score likely lead to high-visibility publications shortly after the grant is funded.

    6. S. Cole, J. R. Cole, G. A. Simon, Science 214, 881–886 (1981).

      This article discusses one negative effect of peer review: an individual scientist may devote so much time and energy to securing financial support that it takes away from their science.

      In short, a major disadvantage of the peer review system is that scientists must spend too much time writing about what they intend to research rather than performing the research.

    7. D. F. Horrobin, JAMA 263, 1438–1441 (1990).

      The main goal of peer review in the biomedical sciences is to facilitate the introduction into medicine of improved ways of curing, relieving, and comforting patients. The achievement of this aim requires both quality control and the encouragement of innovation. If an appropriate balance between the two is lost, then peer review will fail to reach its purpose.

    8. B. Alberts, M. W. Kirschner, S. Tilghman, H. Varmus, Proc. Natl. Acad. Sci. U.S.A. 111, 5773–5777 (2014).

      This study describes the advances in scientific knowledge and human health that have accrued as a result of the long-standing public investment in biomedical research.

    9. Although our findings show that NIH grants are not awarded purely for previous work or elite affiliations and that reviewers contribute valuable insights about the quality of applications, mistakes and biases may still detract from the quality of funding decisions.

      Summing up the results: grants are not awarded purely on the basis of previous work or elite affiliations, so these factors do not "close the door" on new ideas in research; still, mistakes and biases may detract from funding decisions.

    10. we cannot directly assess whether the NIH systematically rejects high-potential applications

      Because the authors only looked at projects that received grant funding, their analysis does not take into account how many high-potential projects were rejected by peer review.

    11. our estimates are likely downward biased

      The authors acknowledge that there is sometimes a long delay between a grant award and patenting, so their analysis may not be a good indicator of how relevant research is to commercial applications.

    12. We control for the same variables as described in Model 6

      Patents, like publications and citations, are analyzed with the same controls: institutional quality and the gender and ethnicity of applicants.

    13. The large value-added for predicting tail outcomes suggests that peer reviewers are more likely to reward projects with the potential for a very high-impact publication and have considerable ability to discriminate among strong applications.

      The authors' findings suggest that peer reviewers are good at identifying innovative and ground-breaking projects.

    14. Our final analysis explores whether peer reviewers’ value-added comes from being able to identify transformative science, science with considerable applied potential, or from being able to screen out very low-quality research.

      Finally, the authors wanted to figure out if peer reviewers are good at choosing applicants for their innovation, their practicality, or if they are simply good at weeding out low-quality research.

    15. peer reviewers add value by identifying the strongest research proposals.

      The authors show that peer review scores are good predictors of scientific productivity when differences in field of research, year, and applicant qualifications are removed. This suggests that peer reviewers have the necessary expertise to choose good applicants.

    16. These residuals represent the portions of grants’ citations or publications that cannot be explained by applicants’ previous qualifications or by application year or subject area

      The authors removed the influence of the applicant's prior qualifications, the application year, and the subject area in order to isolate the effect of reviewers' expertise.
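
      The idea of a residual can be illustrated with a toy ordinary-least-squares sketch. All data here are invented stand-ins; this is not the authors' actual model or data:

      ```python
      import numpy as np

      # Toy illustration of "residualizing" (invented data): remove the part
      # of an outcome (publications) explained by an observed covariate
      # (applicant qualifications), keeping only the unexplained portion.
      rng = np.random.default_rng(0)
      n = 100
      qualifications = rng.normal(size=n)   # stand-in for prior qualifications
      publications = 2.0 * qualifications + rng.normal(size=n)

      # Ordinary least squares of the outcome on the covariate (plus intercept)
      X = np.column_stack([np.ones(n), qualifications])
      beta, *_ = np.linalg.lstsq(X, publications, rcond=None)
      residuals = publications - X @ beta

      # By construction, residuals are uncorrelated with the covariate:
      # they are the portion of the outcome the covariate cannot explain.
      corr = np.corrcoef(residuals, qualifications)[0, 1]
      ```

      If the residuals still line up with peer review scores, the scores must carry information beyond the applicant's observable qualifications, which is the logic of the authors' test.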

    17. the grant with a 1-SD worse score is predicted to have 7.3% fewer future publications and 14.8% fewer future citations

      The authors conclude that, regardless of gender, ethnicity, or institutional prestige, a grant whose peer review score is worse by one standard deviation is predicted to produce correspondingly fewer publications and citations.

    18. Matthew effect

      The Matthew Effect can be summarized as, "the rich get richer and the poor get poorer." It describes the idea that benefits are distributed unevenly, and that those who already have benefits will continue to accumulate them while those without will not have the chance.

      In scientific publication, the Matthew Effect refers to the phenomenon where researchers who are established publish more often simply because they are established (and regardless of the quality of their work).

    19. Controlling for publication history attenuates but does not eliminate the relationship

      Again, controlling for the variable of a PI's research background does not eliminate the relationship the authors originally found.

    20. Controlling for cohort and field effects does not attenuate our main finding.

      The authors' adjustments to control for various external effects did not change their original findings.

    21. adds controls describing a PI’s publication history

      The authors control for yet another potential variable, an applicant's research background (they use the PI's publication history to do this).

    22. We also include NIH institute-level fixed effects to control for differences in citation and publication rates by fields, as defined by a grant’s area of medical application.

      The authors try to remove the effect of an article's field on its impact. For example, a biochemistry article may appear to have a smaller impact because of the high rate of publication and citation in that field, whereas a physics article's impact may be inflated due to a lower publication and citation rate.
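
      The mechanics of such fixed effects can be sketched with dummy (indicator) variables, one per field, which absorb field-wide baselines before the coefficient of interest is estimated. This is a hypothetical sketch with invented data, not the authors' actual specification:

      ```python
      import numpy as np

      # Hypothetical fixed-effects sketch (invented data): one dummy column
      # per field absorbs field-specific citation baselines, so the score
      # coefficient is estimated net of differences across fields.
      rng = np.random.default_rng(1)
      n = 300
      field = rng.integers(0, 3, size=n)        # three fields
      # Fields with higher citation norms also happen to draw better scores,
      # which would bias a regression that ignores field membership.
      score = rng.normal(size=n) + 0.5 * field
      baseline = np.array([5.0, 20.0, 50.0])    # very different citation norms
      citations = baseline[field] + 2.0 * score + rng.normal(size=n)

      # Naive fit without field controls: slope inflated by field differences
      naive_slope = np.polyfit(score, citations, 1)[0]

      # Fixed effects: field dummies (absorbing the intercept) plus the score
      dummies = (field[:, None] == np.arange(3)).astype(float)
      X = np.column_stack([dummies, score])
      beta, *_ = np.linalg.lstsq(X, citations, rcond=None)
      score_effect = beta[-1]   # recovers roughly the true 2.0
      ```

      The naive slope is badly inflated because high-citation fields also have high scores, while the fixed-effects estimate uses only within-field variation, mirroring the confound the authors guard against.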

    23. potential concerns

      Several factors could cause the trends in Figure 1 to be misinterpreted, such as the age of a grant and its field of study. The authors address these concerns by adjusting their model to account for these effects.