  1. Sep 2020
    1. Reviewer #1:

      Marotel et al. study the mechanisms of NK cell exhaustion in patients with chronic hepatitis B infection (CHB). They first confirm several previous findings, such as reduced IFNg production by NK cells accompanied by a change in phenotype in CHB patients. They then show that mTOR activation is impaired in CD56bright NK cells upon IL-15 stimulation, whereas total NK cells show no differences in selected metabolic parameters. They also performed RNAseq analysis, which indicated transcriptional similarities between CHB NK cells and exhausted CD8+ T cells. In line with the RNAseq data, CHB NK cells showed increased expression of the transcription factor TOX and the inhibitory receptor LAG3. The authors suggest that this is due to NFAT signaling. To support this hypothesis, they show that NK cells stimulated overnight with ionomycin have a reduced ability to produce IFNg after subsequent incubation with target cells.

      In conclusion, while the presented observations are interesting and relevant, they are still preliminary and largely descriptive. In addition, the conclusions are not fully supported by the data.

      1) Figure 3. The authors focus on CD56bright NK cells when measuring mTOR activation, as CD56bright NK cells are more responsive to IL-15. They show that in HBV patients CD56bright NK cells have impaired mTOR activation. They correlate this finding with several metabolic parameters measured in total NK cells. Since CD56bright NK cells represent only a small fraction of NK cells, it is not clear why the metabolic parameters were not also analyzed in the CD56bright population or, vice versa, why total NK cells were not compared in both cases (mTOR activation and metabolic characteristics). In its current state, no conclusion can be reached by comparing these two sets of data. It is also not clear whether the cells with a reduced ability to activate mTOR upon IL-15 stimulation contribute to the other observations presented, e.g., whether this finding would explain the reduced ability of NK cells to produce IFNg, or the changes in NK cell phenotype or transcriptome.

      2) Several metabolic parameters are studied; however, it is not clear how they were selected, as many other metabolic processes involved in the NK cell response could be important and deregulated in CHB. In addition, only the basal metabolic state was analyzed, so it remains unclear whether CHB NK cells show the same metabolic characteristics upon activation.

      3) Figure 5: isotype controls are missing in all histograms. The authors state in the text that 'Increased TOX expression was seen mainly in the CD56dim subset in CHB patients', but they do not provide data for this statement. As mentioned previously, the effects of CHB on NK cell mTOR signaling are strongest in the CD56bright population, so it is not clear how these data relate to each other.

      4) To support the transcriptome data showing similarity between CHB NK cells and exhausted CD8+ T cells, the authors provide evidence that expression of the transcription factor TOX is increased and T-bet expression is reduced. However, they do not provide evidence of co-expression of these transcription factors, or of whether their altered expression directly correlates with reduced NK cell function, e.g., whether NK cells with high TOX and low T-bet produce less IFNg.

      5) To address their hypothesis of NFAT involvement in NK cell exhaustion and TOX expression, the authors stimulate NK cells in vitro with ionomycin and show that pretreatment with ionomycin renders NK cells hyporesponsive. They titrate ionomycin and identify a concentration that reduces the IFNg response without affecting degranulation. While this reduction of the IFNg response mirrors what is observed in chronic HBV infection, the model should be validated before any claims are made. For example, the phenotype, transcriptional profile, and transcription factor expression of the ionomycin-treated cells should be analyzed. A similar experiment has been published previously, so the novelty is minor without additional experiments addressing the issues mentioned above.

    2. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      This is a case report analysing the TCR repertoires of two individuals with suspected COVID-19 infection. The report shows that one set of TCR sequences expands between day 15 and day 30/37 while another set contracts. The magnitude of expansion/contraction is not clearly shown. Most of these sequences are found in the memory phenotype. A few (especially CD4) are found before immunisation. As the authors point out, the evidence that the TCRs recognise COVID-19 is purely circumstantial. Even if they do, I do not see that this study contributes significantly to understanding either the protective or the pathological immune response to COVID-19.

      Substantive concerns:

      1) The abstract includes unsubstantiated claims, for example "T cell response is a critical part of both individual and herd immunity to SARS-CoV-2 and the efficacy of developed vaccines." or "In both donors we identified SARS-CoV-2-responding CD4+ and CD8+ T cell clones. We describe characteristic motifs in TCR sequences of COVID-19-reactive clones, suggesting the existence of immunodominant epitopes." The authors do not identify COVID-19-responding clones, nor do they show any evidence that there are immunodominant epitopes.

      2) Fig 1: What does "normalized trajectory of TCR clones in each cluster" mean? It would be interesting to see the magnitude of the responses. Similarly, I don't really understand the y axis in panels d and e.

      3) Fig 3: I don't understand panels a and b. Is this the proportion of contracting TCR sequences that are of memory phenotype? If so, what are the rest? Or are they simply not captured? The figure legend is obscure.

    2. Reviewer #2:

      This manuscript describes a longitudinal study of TCR repertoires in two individuals with mild COVID-19. TCRalpha and beta repertoires at 4 time points post-infection are used to identify T cell clonotypes likely responding to COVID-19. These responding clones fall into two groups, a set of monotonically contracting clones and a set of clones whose frequencies peak (at day ~37) and then contract. Sequencing of memory populations at two time points and availability of TCR repertoire data from both individuals prior to infection allow the authors to map clonotypes to memory phenotypes and to identify a handful of responding clones that existed in the memory compartment prior to infection. Clusters of sequence-similar clonotypes are identified that suggest focused responses to immunodominant epitopes. This is a succinct and timely study and I have no major concerns, just a few minor questions/suggestions/typos detailed below.

      How unexpected is the TCR clustering evident in Fig 2d-g? For example if the same number of equally high Pgen sequences were selected at random? I wonder whether the authors could run ALICE on just the responding clones (not the full dataset) to assess which neighborhoods are very unlikely to occur by chance.
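      One way to frame this question is as a random-subsampling baseline: count near-identical CDR3 pairs among the responding clones and compare that count with the counts for random subsets of the same size drawn from the full repertoire. The Python sketch below is a simplified illustration, not ALICE and not the authors' pipeline. It assumes CDR3 amino acid sequences given as plain strings, defines neighbors (an assumption) as equal-length sequences differing at no more than one position, and does not match the random draws on Pgen, which a faithful version of the test suggested here would need to do. All function names are hypothetical.

```python
import random
from itertools import combinations


def is_neighbor(a: str, b: str) -> bool:
    """Hypothetical neighbor rule: same length and at most one mismatch."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def neighbor_pairs(seqs: list[str]) -> int:
    """Number of unordered sequence pairs that satisfy the neighbor rule."""
    return sum(is_neighbor(a, b) for a, b in combinations(seqs, 2))


def subsampling_p_value(selected: list[str], background: list[str],
                        n_draws: int = 1000, seed: int = 0) -> tuple[int, float]:
    """Observed neighbor-pair count and an empirical one-sided p-value.

    Random subsets of len(selected) sequences are drawn from `background`
    without Pgen matching; p is the add-one-smoothed fraction of draws
    with at least as many neighbor pairs as the selected set.
    """
    rng = random.Random(seed)
    observed = neighbor_pairs(selected)
    k = len(selected)
    hits = sum(
        neighbor_pairs(rng.sample(background, k)) >= observed
        for _ in range(n_draws)
    )
    return observed, (hits + 1) / (n_draws + 1)
```

      With real data, `background` would hold all TCRbeta CDR3s of one donor and `selected` the responding clones; the neighbor rule and the number of draws are placeholders, and the pairwise count is quadratic, so large repertoires would need a faster neighbor search.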

      Could the "computational chain pairing" method of Minervina et al be applied to this data? If only to try to connect some of the sequence motifs between the alpha and beta chains?

    3. Reviewer #1:

      General assessment: This work investigates the T cell receptor (TCR) repertoires of 2 individuals diagnosed with mild COVID-19 infection. The authors use high-throughput sequencing of 2 biological replicate samples obtained at each of multiple pre-infection and post-infection timepoints to identify TCRalpha and TCRbeta clonotypes that contract or expand post-infection and to investigate potential reactivation of pre-existing memory cells. This is potentially interesting work that may provide novel insights into T cell responses to SARS-CoV-2. However, some specific details of the analyses reported are not clear, and I have several major concerns about the reported work.

      Substantive concerns:

      1) The primary concern is the TCR specificity of the clonotypes that were determined to be contracting or expanding post-SARS-CoV-2-infection and therefore identified as responding to or reactive to SARS-CoV-2. There is no verification that these expanding or contracting clonotypes have TCR specificity for SARS-CoV-2. One alternative possibility is that some, maybe even many, of these expanding or contracting clonotypes are bystander-activated T cells with TCRs that are not specific for SARS-CoV-2. Similarly, the clonotypes that were identified as contracting or expanding post-SARS-CoV-2 infection and also detected in the memory pool prior to SARS-CoV-2 infection may not be cross-reactive (i.e., specific for another infection and for SARS-CoV-2), as suggested by the authors, but rather non-SARS-CoV-2-specific bystander-activated memory T cells.

      While the dynamics of the T cell populations following SARS-CoV-2 infection may be informative regardless of the mode of activation of the T cells (i.e. TCR-mediated vs. bystander activated), the reported TCR clonotype motifs are only informative if these TCRs have SARS-CoV-2 specificity.

      2) Another concern is the substantial variation between the various approaches used to identify the contracting and expanding clonotypes post-infection that are associated with COVID-19 infection. The manuscript text states that the EdgeR and NoiseET approaches for identifying expanding and contracting clonotypes yielded similar results. Fig. S4a, d suggest that the two approaches yield similar trajectories for the identified expanding and contracting clonotype subsets (i.e. fraction of reactive clonotypes). However, the Venn diagrams in Fig. S4b, c, e, f show that the two approaches are, in some cases, identifying substantially different subsets of expanding or contracting clonotypes. For example, for Donor M in Fig S4f, of the 1044 expanded clonotypes identified by NoiseET, only 478 were also identified by EdgeR.

      The text also states that the contracting and expanding clonotypes identified using EdgeR largely overlap/correspond to the clusters 2 and 3 of clonal trajectories yielded using PCA (Fig. 1b-e) but no quantitative evidence is provided to support this. Venn diagrams, similar to those in Fig. S4, could be provided that compare the expanding and contracting clonotypes identified using the three different approaches (i.e. EdgeR, NoiseET, and PCA) as applied to TCRa as well as TCRb clonotypes.
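      The comparison requested here reduces to set overlaps. The sketch below assumes that each method's output is available as a set of clonotype identifiers (for example, CDR3 amino acid sequence plus V gene, which is an assumption). It counts clonotypes in each exclusive Venn region and reports, for every ordered pair of methods, the fraction of one method's calls that the other method also makes. The method labels and toy sets are hypothetical placeholders, not data from the manuscript.

```python
from itertools import permutations


def venn_regions(calls: dict[str, set[str]]) -> dict[frozenset[str], int]:
    """Count clonotypes in each exclusive Venn region.

    Each key is the set of methods that called a clonotype, so every
    clonotype is counted in exactly one region.
    """
    universe = set().union(*calls.values())
    regions: dict[frozenset[str], int] = {}
    for clone in universe:
        key = frozenset(m for m, s in calls.items() if clone in s)
        regions[key] = regions.get(key, 0) + 1
    return regions


def recovery_fraction(calls: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """For each ordered pair (a, b): share of a's calls also made by b."""
    return {
        (a, b): len(calls[a] & calls[b]) / len(calls[a])
        for a, b in permutations(calls, 2)
        if calls[a]  # skip methods with no calls to avoid division by zero
    }


# Hypothetical toy input: expanded clonotypes called by three approaches.
calls = {
    "EdgeR": {"c1", "c2", "c3", "c4"},
    "NoiseET": {"c2", "c3", "c5"},
    "PCA_cluster": {"c3", "c4", "c5", "c6"},
}
```

      By this measure, the Donor M example cited in the previous point (478 of 1044 NoiseET-expanded clonotypes also identified by EdgeR) corresponds to a recovery fraction of about 0.46.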

      While these differences between methods may not have significant consequences for some of the reported results (e.g., temporal clonal trajectories), they raise concerns about the results that depend on specific clonotype sequences (e.g., Fig 2d-g, Fig S8, and Fig S5d-g, which report amino acid motifs for contracting and expanding clonotypes).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Author Response

      Reviewer #1:

      This study was designed to determine whether there is a relationship among cranial suture closure patterns, the molecular causes for suture patency/closure, and phylogeny. The authors use correlative data to test causal hypotheses related to brain size, suture closure patterns, and diet and search for the genetic underpinnings of the relationships they identify using reference genomes. There are many ideas put forward and methods used that are not clearly explained in the body of the work or in the supplementary material. This made it difficult to provide a clear evaluation of the work. Even checking original sources on which they base their approach, I found some disconnect between original sources and ideas laid out here. I see some interesting ideas in the study but a lack of solid reasoning behind the hypotheses proposed, confusion about the data and/or ideas summarized from the literature (the confusion could be on my part, but it rests with the authors to explain this more fully), and lack of detail regarding methods used to support their conclusions.

      We take good note of this confusion and we will explain everything in more detail in a revised version of the manuscript.

      1) The entire study rests on the authors' scoring of sutures as patent or closed, but no information is given other than that a suture was considered closed if it was not visible ("obliterated") and open if it was visible. These are problematic definitions for distinguishing patent from closed sutures if we accept the authors' definition of sutures as sites of growth and stress diffusion. A suture can be visible but still be "closed", as evidenced by bony connections or bridges linking the bones that border the suture. In the case of bridging, the suture would be visible and so would be scored as "open" according to the authors' criterion, but functionally the suture is closed.

      Visual examination of sutures (e.g., from photos or in situ) is a common procedure in macroevolutionary studies of suture patency, where raw data are not always available for histological inspection (e.g., when invasive procedures or CT scanning are not permitted). In this regard, we follow previous literature. We note that only photographic material was available for most specimens during this project because of the current exceptional circumstances (museum lockdowns).

      Also, in some mammals (e.g., the laboratory mouse) most cranial sutures do not close in typically developing individuals.

      In this study we used specimens hosted in museum collections, which come from the wild or zoos. We did not use data from laboratory animals grown in controlled environments, which may indeed affect their suture patency (e.g., by feeding on pellets).

      2) Age estimates are not provided for the specimens used in analysis. In many mammalian species, suture closure occurs in a somewhat predictable fashion - this, coupled with tooth formation/eruption patterns is one of the ways that forensic scientists aged skeletal remains prior to the advent of modern technologies. The order of suture closure is not necessarily similar across vertebrates, or even across mammals. This means that, without known or estimated ages for each skull included in analysis, age becomes an unrecognized source of variation that will affect analytical outcome.

      Unfortunately, the exact age of museum specimens is often not available. For this reason, we focused on adult specimens, in which suture patency tends to remain constant. We also excluded individuals with signs of senescence. To accommodate age and other sources of intraspecific variation in adults, we collected information for as many individuals as possible, often more than 10 and sometimes up to 100. Thus, we coded suture patency for each species as a frequency rather than as open/closed.

      We dichotomized suture patency as open/closed only for the second part of the study. Here we used a conservative threshold to avoid ambiguity: species with a suture patency frequency between 25% and 75% were excluded. This also means that if only 4 individuals were examined (a small sample size was unavoidable for some rare species) and at least one showed a discrepancy, that species was excluded from the analysis. However, because suture patency is a highly conserved trait, only a few taxa had to be excluded in the end.
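      The thresholding described above can be sketched as follows. This is an illustrative Python sketch, not the code used in the study. It assumes strict inequalities (a suture is scored open only if its patency frequency is above 75% and closed only if it is below 25%), which is consistent with one discrepant specimen out of four (a frequency of 75% or 25%) leading to exclusion. The function name is hypothetical.

```python
from typing import Optional


def binarize_patency(n_open: int, n_total: int,
                     upper: float = 0.75, lower: float = 0.25) -> Optional[str]:
    """Score one suture in one species as 'open', 'closed', or None (excluded).

    n_open: number of adult specimens in which the suture is patent.
    n_total: number of adult specimens examined.
    """
    if n_total <= 0:
        raise ValueError("at least one specimen is required")
    freq = n_open / n_total  # patency frequency between 0 and 1
    if freq > upper:
        return "open"
    if freq < lower:
        return "closed"
    return None  # ambiguous (25% <= freq <= 75%): species excluded


# Four specimens with one discrepancy give a frequency of 0.75 or 0.25,
# so the species is excluded either way.
print(binarize_patency(3, 4), binarize_patency(1, 4))  # None None
print(binarize_patency(4, 4), binarize_patency(9, 10))  # open open
```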

      In any case, we will emphasize this point more in the revised version.

      3) The authors' impact statement, "brain growth and skull ossification sequence cause suture closure in mammals evolution without common genetic factors causing premature suture closure diseases in humans", is hard to digest, as the authors consider brain size rather than brain growth. From a developmental perspective, brain size or even some form of the encephalization quotient (EQ) is not what is commonly proposed to drive suture closure/patency (or degree of patency). Instead, it is the dynamics of brain growth that is proposed as a stimulus for the initiation of mineralization of cranial bones. As bones increase in size, new bone is added at the leading edge of the opposing bones that line the suture, while the stem cells in the center of the suture remain to add to the mesenchymal cell population of the suture, keeping the suture patent. In short, the dynamics of brain growth (including any signaling emanating from the brain, dura, bones, or even the suture itself) contribute to suture patency. Because sutures tend to close later in life (after childhood in humans), normal suture closure appears to be associated with the termination of brain growth. Making the jump in their study from estimates of EQ (in some way estimated here) to the dynamics of brain growth as a cause requires several steps and knowledge of the timing and rate of growth that the authors do not consider.

      We agree with the reviewer. A developmentally focused study on suture formation and closure dynamics must consider brain growth. However, this information is not available for most species selected for this study. Note that species selection depended on the availability of reference genomes and multiple sequence alignments (some of these species are rare and endangered). Because we were comparing macroevolutionary dynamics in adults, we decided to use brain size as a feasible proxy for brain influence (whether due to growth or signalling). We aim to fill this gap in future research projects. In the meantime, we will revise the wording of the article to make sure that there are no misleading statements about the influence of brain growth.

      4) The authors assume a suture closure pattern across the skull that starts anteriorly (rostrally) and moves posteriorly (caudally), and they build this into their model. This seems to be based on work by Koyabu et al. (2014), but that study concerns the appearance of ossification centers for bones (not suture formation or closure), and it actually groups the frontal and parietal together in its final analysis, so it is not clear why it supports an anterior-to-posterior direction of suture closure.

      Note that we did not “assume” any closure pattern; we interpreted the published evidence on how the skull ossifies in mammals to formulate a plausible hypothesis. We also tested 11 other plausible hypotheses. This hypothesis could have performed worse than the others, but we found that the best-supported hypothesis includes an anterior-posterior relationship in suture closure. We will try to explain the construction of our model and our hypothesis testing better in the revised version.

      5) The authors' conclusion (lines 289-292) does not follow from their analyses. Brain growth was not analyzed. I am uncertain what they mean by suture self-regulation, as I don't think their detection of genetic variants shared across a diverse set of species means that those variants control suture patency/closure.

      The proposed idea of suture self-regulation refers to the fact that one suture closure may affect another suture closure (as theoretical models previously suggested), and it is not necessarily related to the genetic variants identified here. As explained before, we will revise any reference to brain growth.

      Reviewer #2:

      -Authors tested 4 hypotheses (page 5, lines 78-84), but rejected or questioned them later on (which is a fair approach to be realistic and point out possible weaknesses or methodological limitations, nevertheless, I find there are more questions or suggestions rather than actual answers).

      We have tried to offer an open and clear set of hypotheses, to test them with the available data, and to discuss the results fairly. As is often the case in science, research may raise more questions than it answers; we do not see this as a weakness. Our answers are also contextualized within the limitations that we described in the methods. We believe this is the correct way of doing science: even if this forces us to reject all our hypotheses, negative results are also results. Since our object of study is not very well known, we hope this study can fuel more research.

      -Lots of repeating text

      -Frequent missing references for major statements, unclear formulations

      We will double-check our manuscript. However, the reviewer offers no details about what is repeated or missing.

      -Some contradictory or unclear information, for instance, "high conservation..enabled us to categorize phenotype as either open or closed" / "suture patency ranging from 0-1, only above 75% and below 25% was counted as open or closed" / the authors included species where >2 samples were available but excluded any ambiguous case (small number of samples per species?)

      As explained before, a 25%/75% threshold was used to classify each species as having a given suture open or closed. This binarization is used only for the convergent amino acid substitution analysis. We excluded ambiguous cases (i.e., a half-closed suture) prior to data collection. We will explain this better in the revised version to avoid confusion.

      -"Phylogenetic path analysis showed almost no effect of diet on the brain size; low to medium (what does that mean then?) effect of brain on suture closure and medium to high effect of 1 suture affecting the other sutures in AP direction" (in many species this is described-the timeline of suture closure)

      We are not sure what the reviewer means; we will revise these sentences to make them clearer to readers.

      -I am not able to evaluate whether the assessment of diet hardness as an equivalent to mechanical forces in the skull is correct and hope other reviewers will be able to do that, and also to evaluate the phylogenetic path analysis performed in this manuscript. The authors took information on the % of nectar/soft plants and invertebrates/hard food (seeds etc.) that a given species consumes and multiplied it by an index, rather than actually modeling or assessing the forces... To a layperson it looks like, for instance, a cow chewing relatively soft grass all day long, building very strong muscles, will in the end develop much more force/tension within the skull than an animal cracking one nut.

      As the reviewer correctly points out, chewing grass all day long is harder than cracking one nut (cracking nuts “all day long” would be another issue). In any case, we have weighted each food item relative to the others (e.g., grass is weighted as twice as hard as meat), and there is consensus that feeding on seeds and scavenging is one of the most biomechanically demanding feeding strategies. In addition, we note that we critically discussed the caveats of using diet hardness as a proxy for the effect of feeding biomechanics on sutures, and we did not blindly assume it to be a hard truth.

      -Lots of attention is given to the three identified genes with convergent amino acid substitution despite the fact that none of these genes have ever been related to any aspect of craniofacial biology, nor to the suture pathological conditions.

      We discussed the three genes that our analysis revealed. We cannot discuss genes for which we found no support. For these three genes, we offered plausible scenarios for how they could be associated with craniosynostosis; it is for future studies to explore these scenarios and to validate these genes experimentally or clinically. The fact that they are not currently known to be involved in pathological conditions does not mean that we should not discuss them in the manuscript. Every year, new genetic variants are found to be associated with craniosynostosis. The lack of correspondence between these genes and pathology is in fact one of the findings of this study: the few genes that show convergent mutations are not associated with pathology. We agree that absence of evidence is not evidence of absence. However, we also think that this result should be discussed in this manuscript for readers to ponder.

    2. Reviewer #2:

      The authors' goal was to reveal phenotypic and genetic causes of suture closure in evolution. The authors formulated and tested several hypotheses to find out whether brain size, diet hardness, etc. are causally linked to the presence of typically patent (open) or closed sutures in 48 mammalian species. Next, the authors attempted to identify genes (and convergent amino acid substitutions) associated with this species-specific suture status, and to relate them to the biological functions commonly associated with suture formation and/or mutation in pathological conditions such as craniosynostosis.

      While I think it is an interesting question or hypothesis to test (it seems to be inspired by Abelson 2016 and similar studies), several concerns arose during reading (and the authors themselves pointed out several of them a few times). Overall, I do not find convincing evidence for the authors' statements. Very briefly, here are just a few of my comments:

      -Authors tested 4 hypotheses (page 5, lines 78-84), but rejected or questioned them later on (which is a fair approach to be realistic and point out possible weaknesses or methodological limitations, nevertheless, I find there are more questions or suggestions rather than actual answers).

      -Lots of repeating text

      -Frequent missing references for major statements, unclear formulations

      -Some contradictory or unclear information, for instance, "high conservation..enabled us to categorize phenotype as either open or closed" / "suture patency ranging from 0-1, only above 75% and below 25% was counted as open or closed" / the authors included species where >2 samples were available but excluded any ambiguous case (small number of samples per species?)

      -"Phylogenetic path analysis showed almost no effect of diet on the brain size; low to medium (what does that mean then?) effect of brain on suture closure and medium to high effect of 1 suture affecting the other sutures in AP direction" (in many species this is described-the timeline of suture closure)

      -I am not able to evaluate whether the assessment of diet hardness as an equivalent to mechanical forces in the skull is correct and hope other reviewers will be able to do that, and also to evaluate the phylogenetic path analysis performed in this manuscript. The authors took information on the % of nectar/soft plants and invertebrates/hard food (seeds etc.) that a given species consumes and multiplied it by an index, rather than actually modeling or assessing the forces... To a layperson it looks like, for instance, a cow chewing relatively soft grass all day long, building very strong muscles, will in the end develop much more force/tension within the skull than an animal cracking one nut.

      -Lots of attention is given to the three identified genes with convergent amino acid substitution despite the fact that none of these genes have ever been related to any aspect of craniofacial biology, nor to the suture pathological conditions.

    3. Reviewer #1:

      This study was designed to determine whether there is a relationship among cranial suture closure patterns, the molecular causes for suture patency/closure, and phylogeny. The authors use correlative data to test causal hypotheses related to brain size, suture closure patterns, and diet and search for the genetic underpinnings of the relationships they identify using reference genomes. There are many ideas put forward and methods used that are not clearly explained in the body of the work or in the supplementary material. This made it difficult to provide a clear evaluation of the work. Even checking original sources on which they base their approach, I found some disconnect between original sources and ideas laid out here. I see some interesting ideas in the study but a lack of solid reasoning behind the hypotheses proposed, confusion about the data and/or ideas summarized from the literature (the confusion could be on my part, but it rests with the authors to explain this more fully), and lack of detail regarding methods used to support their conclusions.

      1) The entire study rests on the authors' scoring of sutures as patent or closed, but no information is given other than that a suture was considered closed if it was not visible ("obliterated") and open if it was visible. These are problematic definitions for distinguishing patent from closed sutures if we accept the authors' definition of sutures as sites of growth and stress diffusion. A suture can be visible but still be "closed", as evidenced by bony connections or bridges linking the bones that border the suture. In the case of bridging, the suture would be visible and so would be scored as "open" according to the authors' criterion, but functionally the suture is closed. Also, in some mammals (e.g., the laboratory mouse) most cranial sutures do not close in typically developing individuals.

      2) Age estimates are not provided for the specimens used in analysis. In many mammalian species, suture closure occurs in a somewhat predictable fashion - this, coupled with tooth formation/eruption patterns is one of the ways that forensic scientists aged skeletal remains prior to the advent of modern technologies. The order of suture closure is not necessarily similar across vertebrates, or even across mammals. This means that, without known or estimated ages for each skull included in analysis, age becomes an unrecognized source of variation that will affect analytical outcome.

      3) The authors' impact statement, "brain growth and skull ossification sequence cause suture closure in mammals evolution without common genetic factors causing premature suture closure diseases in humans", is hard to digest, as the authors consider brain size rather than brain growth. From a developmental perspective, brain size or even some form of the encephalization quotient (EQ) is not what is commonly proposed to drive suture closure/patency (or degree of patency). Instead, it is the dynamics of brain growth that is proposed as a stimulus for the initiation of mineralization of cranial bones. As bones increase in size, new bone is added at the leading edge of the opposing bones that line the suture, while the stem cells in the center of the suture remain to add to the mesenchymal cell population of the suture, keeping the suture patent. In short, the dynamics of brain growth (including any signaling emanating from the brain, dura, bones, or even the suture itself) contribute to suture patency. Because sutures tend to close later in life (after childhood in humans), normal suture closure appears to be associated with the termination of brain growth. Making the jump in their study from estimates of EQ (in some way estimated here) to the dynamics of brain growth as a cause requires several steps and knowledge of the timing and rate of growth that the authors do not consider.

      4) The authors assume a suture closure pattern across the skull that starts anteriorly (rostrally) and moves posteriorly (caudally), and they build this into their model. This seems to be based on work by Koyabu et al. (2014), but that study concerns the appearance of ossification centers for bones (not suture formation or closure), and it actually groups the frontal and parietal together in its final analysis, so it is not clear why it supports an anterior-to-posterior direction of suture closure.

      5) The authors' conclusion (lines 289-292) does not follow from their analyses. Brain growth was not analyzed. I am uncertain what they mean by suture self-regulation, as I don't think their detection of genetic variants common to a diverse set of species means that those variants control suture patency/closure.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      Nayler et al. report methods to generate cerebellar organoids from human induced pluripotent stem cells and their characterization by single-cell sequencing and bioinformatic analysis. They further test the effect of adding Matrigel to the system, which has previously been useful in other organoid systems. The topic is important for the study of human cerebellar development and the modeling of human disease. The paper suffers from a number of issues, especially the fact that the claims in the text are not supported by the data.

      Specific comments:

      The method is largely the same as that developed by Muguruma et al., a methodology that has not proved to be very effective or reproducible. That said, it is not clear, based on immunolabeling, that the cerebellar organoids generated in this report have differentiated as well as those in the original paper, though this may be due to the low-power images. The authors repeatedly point out that their method does not need co-culture with mouse granule cells; however, they show no maturation of Purkinje cells, which is what prior reports had used them for.

      1) While this method is not entirely novel, single-cell sequencing has not previously been performed using this method. Unfortunately, their analysis of the scRNA-seq data is qualitative and unconvincing.

      2) Canonical markers are not associated with the expected populations. For example, PCP4 and IGF1 are found in the P0-choroid plexus group and not with the P6 Purkinje cells (PCs), suggesting that the markers or the separation of populations used for classification are not sufficient. CXCL14 is used as an identifier for PCs; however, the gene appears to be downregulated in the P6-PC expression table, while it is instead upregulated in the P0 expression table. These discrepancies between the text and the data do not give confidence in the overall analysis.

      3) Fig. S4A: there is no legend describing what the dot plot shows (color scale, size scale).

      4) To substantiate cell classification, the authors compare their data with previously published mouse datasets. Cell type clusters are generously suggested to have a "high degree of overlap" with mouse data, with a "high degree of confidence". These claims are not statistically supported, nor, upon close inspection, do they appear to be accurate. While some cell types cluster with mouse cell types, others clearly do not. For example, of the two major cerebellar neurons, human granule cells are found in three clusters (granule cell precursors, granule cells (S-phase), and granule cells (G2M-phase)), of which only one clusters with mouse granule cells. Human and mouse Purkinje cells do not cluster together. The authors state that pseudotime trajectory reconstruction shows "a pattern reminiscent of the developmental cellular phylogeny of the cerebellum; progression from primitive CP/RP cell types to RL/VZ precursors and subsequently to committed neuronal progeny..."; however, the choroid plexus and roof plate do not give rise to rhombic lip or ventricular zone precursors (note that ventricular zone precursors are not depicted in the data).

      5) Embedding of cerebellar organoids in Matrigel is novel; however, a major finding of this report is that Matrigel increases organoid variability, which is already a significant issue in the organoid field. The role of Matrigel in promoting specification of the rhombic lip over the ventricular zone could be useful.

      6) Have they looked at gene expression earlier than DIV21? At what timepoint does each of the key cerebellar markers appear? This information is lacking for all markers assessed, and it is not clear why the timepoints shown were chosen. More characterization, and perhaps even scRNA-seq at multiple timepoints, would have given a clearer view of what they have induced.

      7) There is huge variability in gene expression even before the Matrigel addition step. It is therefore unclear how this is an advance in making cerebellar organoids compared to the original Muguruma paper in 2015 (itself a very qualitative paper).

      8) Low-power images of immunolabeling make it impossible to assess the localization of labeling and to distinguish real from background staining, e.g., Fig. S1 and Fig. 1A. This is critical in the stem cell field, where spatial organization cannot be relied upon.

      Their interpretation and their data don't always match with regard to their defined cell types and scRNA-seq data. For example, ATOH1 appears only in group 5, yet they state that additional groups are granule cell precursors. Also, they say that a major impact of MG encapsulation is the expansion of the GC lineage, yet earlier in the paper they say that expression levels of ATOH1, a marker of the GC lineage, were unchanged, making it very difficult to get a clear picture of what they have found.

      A major issue (in the same vein as their incorrect data interpretation), and one upon which the paper is framed, is the assumption that the human cell types are like their mouse counterparts. No experiments were carried out to show the validity of this assumption. Figure 3B overlays the human and mouse data. Why are the human cells so poorly represented? Is it because of low sequencing depth (a technical issue) or a vastly different molecular composition of these organoids compared with the mouse cerebellum?

      Overall the execution is poor, and the data are not analyzed in any depth. Critically, there is a complete mismatch between what is stated in the text and what is shown in the figures. The claim to have produced all major cerebellar cell types would have been the novel aspect of the paper, but the data are unconvincing.

    2. Reviewer #2:

      In this Tools and Resources manuscript, Nayler and colleagues demonstrate a robust and reproducible protocol for hiPSC-derived cerebellar organoids that do not require feeder populations. In general, the development of reliable pluripotent cell-derived cerebellar cell types and organoids has lagged behind that of other brain regions, and this paper represents a new resource. Given that the manuscript is presented as a resource, a more detailed explanation of the generation of the organoids should be provided and their reproducibility should be demonstrated in more detail. Further histological characterization of the organoids with additional markers is needed to really assess the reproducibility and robustness of the methodology.

      Major comments:

      1) Authors mention that the PCs have bipolar morphology (data not shown). I think this is one of the critical pieces of data that demonstrates the quality of the organoids and should be shown. In general, more IF analysis of the organoids with additional markers would have been helpful to understand the variabilities and the composition of the cerebellar organoids that were generated with their method.

      2) Did the authors observe a delay in the maturation of the Matrigel-embedded organoids? It is curious that there is an increase in the earlier progenitor cells (based on the increase in OLIG2 expression as opposed to PTF1A). Based on the data later in the paper, the authors suggest that Matrigel increases the expansion of GCPs. How does the non-significant enrichment of ATOH1 expression shown in Figure 1G relate to the data presented later in the manuscript? It looks like only one of the organoids showed upregulation of ATOH1, whereas the other two didn't show any change.

      3) The authors should report the relative proportions of VZ-derived vs. RL-derived cell types within each organoid.

      4) Were any astrocytes (other than Bg) and OPCs/oligodendrocytes observed in the organoids? Or would the organoids need to be cultured longer for those cells to appear?

      5) Why is there very low expression of PCP4 in the PCs, and why is the cluster with the most PCP4 expression classified as choroid plexus? Based on the in situ data in Figure S4, there is no PCP4 in the CP. Is this a species difference? In general, the characterization of the PCs is confusing to me based on the markers used. Please elaborate.

      6) Based on the clustering shown in Figure 3, is there a particular age in the mouse data that showed higher enrichment for overlapping human cerebellar organoid cells? The way the data are presented is hard to interpret and understand. Also, the range of ages in the mouse data that overlaps with the respective human data is a lot larger than I would have expected (page 9, first paragraph). I am not an expert on integrating such multi-age/multi-species data; however, I wonder whether additional pseudotime analysis (e.g., Monocle) performed on the combined data set represented in Figure S7 and Figure 3 would reveal finer temporal resolution of the human organoids with respect to the mouse developmental data.

      7) Were there differences in the pseudotime ordering of the cells from Matrigel-embedded organoids compared with those from the control organoids? (Related to point 2.)

    3. Reviewer #1:

      In this manuscript, Nayler et al. present a new protocol to generate cerebellar organoids that they differentiate from human iPSCs. Using this system and single-cell sequencing, the authors show that most major cerebellar cell types develop in these organoids. They also find that the micro-environment of the developing organoids changes growth dynamics and cellular differentiation, which motivated the authors to suggest that this organoid approach may be a good model for studying human cerebellar development and disease. The strength, and indeed the motivation, of this manuscript is the description of a novel model system with which to study multiple human cerebellar subtypes ex vivo. In general, this work is a timely addition to several other recent studies on the transcriptomics of mouse cerebellar development, the transcriptomics of human cerebellar development, and the use of hPSC-derived Purkinje cells grown in co-culture with mouse granule cells. The data in this manuscript are strong and likely of broad interest to the neuroscience community. However, below I outline several concerns that, if addressed, would help improve the clarity, readability, and impact of the manuscript:

      Comments:

      1) In the title, the authors state "...cerebellar organoids shows recapitulation of cerebellar development". Development in what? human? model systems? Some specificity will be needed in this title, especially since recent work from the Millen group has unveiled some specific differences between mouse and human cerebellar development.

      2) In the Abstract, the authors state "However, this was at the expense of reproducibility." What do you mean? Are there issues with reproducibility? If yes, the authors need to provide a thorough discussion of this, as it would be essential for researchers to know if they were to adopt this approach.

      3) Also in the Abstract, the authors state "...conditions, representing a more biologically relevant..." More biologically relevant than what? What about the counter argument that studying the cerebellum would be "more biologically relevant" in vivo in an animal model?

      4) In the Introduction, the authors state "Specifically, abnormal cerebellar development is an emerging theme contributing to many brain disorders (Sathyanesan et al., 2019)." Do you mean to many non-motor brain disorders?

      5) A couple of times in the Introduction the authors use the Manto et al. 2012 reference. This is in fact a very large online book consisting of several dozen chapters. Rather than using such a broad sweep approach, I would highly recommend using the primary original references for such key statements. It's also slightly misleading since Manto himself did not have any involvement in these developmental studies.

      6) The authors state that "Current models have mainly focussed on the differentiation of hPSC-derived Purkinje cells through co-culture with mouse cerebellar progenitors." Okay, but what is your argument against such methodology? Some context and motivation for this statement should be provided.

      7) In the Introduction, the authors frame their case by stating "As a proof of principle,..." But what is this method a proof of principle for?

      8) In the Introduction, the authors state "...we show perturbation analysis of the organoids..." Please state what the perturbation was, and what problem this perturbation was used to test.

      9) Based on the Introduction of the paper, it is very hard to see what motivates this work. Also, related, why focus on the basement membrane? What led to this? The authors need to provide a much stronger rationale for the study upfront, and in particular for the specific concepts that they tackle using their new approach.

      10) The authors state that induction of GBX2 was observed at the expense of the anterior marker OTX2. Apologies if I have missed it, but what was the experiment that shows directly in your organoids that OTX2 was initially high and then lowered due to GBX2?

      11) The authors state "...EBs to MG treatment, we encapsulated these at three different timepoints during differentiation..." What was the justification for picking these timepoints?

      12) The authors state "Overall, the relative effect of MG encapsulation resulted in distinct responses in the various cerebellar populations..." So, what does it mean that each cell type has a different response? Please expand on this.

      13) The authors state "using the murine cerebellum as a close developmental blueprint, most signatures indicate a mixture of mid-late embryonic temporal maturity, suggesting that the cerebellar organoids recapitulate developmental stages of the normally developing cerebellum. An exception to this was overlap of human GCs with murine GCs of postnatal maturity, suggesting that this cell type was more mature than its counterparts." AND "human PCs clustered more closely to murine progenitors and astroglia, suggesting that by day 90 organoid-derived PCs were still developmentally immature, compared with murine PCs. In further support of this, we did not detect appreciable levels of SHH."

      The sentences in this statement raise several questions. First, PCs normally develop before the GCs. Thus, the finding that PCs in the organoids are less mature than GCs is surprising and may even be concerning as it suggests that the organoids do not fully (or reliably) replicate the temporal order of normal cell development that is so characteristic for cerebellar development. Second, the relationship between PC SHH secretion and the responding GC is now well established and has been shown to be an important, if not essential, mechanism for GCs proliferation in vivo. It is therefore surprising that GCs form and proliferate in the organoid without proper SHH signaling. What may be the mechanism for this? The authors need to account for this issue and provide a discussion to address all of these points as well. Moreover, the authors should discuss how the maturation state of PCs in the organoids is different between this paper and the recently published Buchholz et al paper (2020 - DOI: 10.1073/pnas.2000102117).

      14) The authors argue about the cell structure and expression of cell markers in the organoids. However, based on what is shown, it is not clear how robust these features are in the organoids. The authors need to provide additional images of the organoids at much higher magnification in order to properly demonstrate cell structure and identity. In this regard, based on their argument, it would be important for authors to show the bipolar morphology of Calbindin-positive cells and excessive neural outgrowth at the periphery of the organoids (currently referred to in text as "data not shown"). Finally, it would be interesting to see whether different cell types are intermingling or spatially segregated in the organoids. That is, what does the cellular organization in the organoid actually look like?

      15) Along the same lines as above, it seems to me that the authors should present more details about the anatomical architecture of the organoids. One of the major arguments raised by the authors is that the organoids recapitulate many features of normal cerebellar development. Of course, the organoids likely don't show all the intricacies of in vivo cerebellar development, but given that the 3-dimensional assembly of the cerebellum is essential for all aspects of cellular and circuit formation, one needs to fully appreciate exactly what aspects of the cerebellum the organoid is able to reflect. Only then can one predict its full utility towards studying different aspects of development or disease.

      16) There are several cases in which the authors state "data not shown". In every one of these cases the data seem essential to me and should be presented in full.

      17) The authors use the fact that the cell types from the human organoids cluster with mouse cerebellar cell types as an argument that the human organoids have a good representation of the cerebellar cell types. But the authors also go on to state that the human organoids are advantageous over model organisms because they may better model the human genetic background. These two statements are contradictory, especially given the previous issue raised about the organoids not reproducing the temporal sequence of cellular development. Do the authors have additional data to support their statements about the biological relevance of their xeno-free conditions? For example, did they find any human-specific genes or developmental pathways? The statements presented by the authors create a circular argument that needs to be revised and/or supported by additional data. What would help is a much deeper comparison between the organoids, human cerebellar development, and mouse cerebellar development.

      18) What is the fold-change of RNA expression in figure 1 based on? What is the statistical test actually testing? What is the control that this fold-change is compared to?

      19) On the issue of statistics, the section describing the statistics in the Methods is rather brief. It would help tremendously if the authors expanded this section to describe which test was used for which experiment; some justification for the use of the different statistical tests would be very useful as well.

      20) The authors use a lot of abbreviations. Some of these abbreviations hinder the readability of the text, which would be especially problematic for an audience not closely acquainted with these terms. It may help to limit the use of abbreviations to cell types and gene names. For example, Matrigel and embryoid bodies do not have to be abbreviated.

      21) The size of the text in all figures is too small, including gene names, axis labels, and legends.

      22) In the Discussion, the authors state "...this includes proximally located territories in which adjacent signalling is required for cerebellar maturation and development." I am not sure whether enough direct evidence is presented to make this conclusion. As commented on before, additional anatomy should be presented, and based on those data, inter-cellular signaling could then be examined with more confidence. Otherwise, the authors would have to tone down and/or revise this conclusion.

      23) The authors conclude that "hiPSC-derived organoid models offer unprecedented opportunities to model brain development and disorders and for therapeutic development..." I agree, but as a general comment, I found it very hard to know what exactly the authors are comparing in this paper. It appears that the comparisons are mainly to mouse development, although it seems that a more thorough and direct side by side comparison should be made. I suppose some kind of detailed developmental timeline-based model is warranted.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Building on several recent molecular studies, the strength of the current manuscript is the establishment of an organoid approach that could potentially add to our knowledge of normal and abnormal cerebellar development by providing a flexible technique with which to resolve cellular mechanisms. However, there was overall agreement that while the approach has promise, the data presented are lacking in terms of a concrete comparison to known milestones in cerebellar development (in animal models or human). Moreover, given the technical nature of the manuscript, it was deemed necessary to provide a more complete characterization of the organoid "anatomy" in order to convince the reader of the claims. There was also a concern that the quantitative aspects and interpretation of the scRNA-seq experiments, particularly the characterization of the clusters obtained and the analysis performed to compare the human organoid data to the mouse developmental data, could have been carried out in greater depth.

    1. Reviewer #3:

      In this paper the authors show for the first time that optogenetic activation of the subthalamic nucleus (STN) is aversive and can drive avoidance behavior. This effect may be mediated by polysynaptic activation of the lateral habenula, which they show is activated following optogenetic activation of the STN. They propose that the STN may excite glutamatergic neurons in the ventral pallidum that in turn project to and excite the lateral habenula. The authors do mention that other pathways may mediate the aversive effects, but no other pathways are tested.

      Overall this paper presents a simple and clear demonstration that optogenetic activation of the subthalamic nucleus is aversive. This effect may involve activation of the ventral pallidum and the lateral habenula, but the evidence provided to support this possibility is weak and currently not compelling.

      Major issues:

      -While it has not, to my knowledge, been reported that activation of the STN can drive aversive responses, a number of lines of evidence suggest that this should be the case. None of these are mentioned in the paper, and they should be discussed. First, the STN is part of the indirect pathway of the basal ganglia. Previous work has shown through optogenetic and other methods that indirect pathway striatal neurons in the dorsal and ventral striatum can drive aversive responses and are involved in aversive learning (for a critical review that discusses this literature see Soares-Cunha et al., 2016). In line with this, recordings of the indirect pathway have also shown that this pathway is preferentially involved in processing aversive information; for example, STN neurons are activated by nociceptive information and are needed for appropriate behavioural responses to nociceptive stimuli (Pautrat et al., 2018), and STN neurons are also activated by aversive stimuli and by negative reward prediction error (Breysse et al., 2015). The paper needs to discuss its findings in the context of this and other previous work (these references are just examples and not an exhaustive list) that supports the role of the indirect pathway in processing aversive information.

      -Another topic that should be discussed is the heterogeneity of the STN. The authors themselves mention that the STN is composed of distinct spatio-molecular domains. This may well be relevant as rabies tracing from the EP neurons that project to the habenula and from the glutamatergic neurons in the ventral pallidum has revealed that they receive the majority of their input from the parasubthalamic nucleus and not from the core of the STN (Stephenson-Jones et al., 2016, Stephenson-Jones et al., 2020, Tooley et al., 2018). This raises the possibility that the aversive responses from the STN are primarily driven by neurons in the pSTN. The authors could test this point by restricting their ChR2 expression to one or the other region of the STN. At the moment all example images show that expression is in both the STN and pSTN. This possibility should be discussed.

      -The authors mention that they perform selective activation of the STN-VP pathway by stimulating the STN terminals in the VP. It is not clear that this will selectively activate this pathway. If the STN neurons that project to the VP also project to other areas, then these will likely also be activated due to back-propagating action potentials driven by the ChR2 stimulation. More work needs to be done to determine whether the VP is really the pathway that mediates the aversive effect. Additional work, including multi-colour retrograde tracing, selective inactivation of the VP projection while stimulating the STN, or stimulating the STN fibers in the VP while inactivating the STN cell bodies, would be needed to really determine whether the VP is important for mediating the aversive effect. This may be beyond the scope of what the authors want to do, but it would be needed to support a claim that their evidence "provide strong support for a STN-VP-LHb is a pathway for aversion".

      -The title should not include the word encoded as there were no experiments performed in this paper that looked at any aspect of coding in the STN.

    2. Reviewer #2:

      In this manuscript Serra et al. demonstrate that stimulation of subthalamic nucleus (STN) neurons can drive place avoidance and delayed (presumably bisynaptic) excitation of lateral habenula (LHb) neurons. They also show that STN inputs to the ventral pallidum (VP) can drive place avoidance and excitation of VP neurons. While the potential role of an STN-VP-LHb pathway in driving aversion and avoidance is intriguing, the manuscript leaves many open questions regarding the nature of the STN's role in mediating aversion, as well as the circuit mechanisms governing STN-induced avoidance.

      Major Comments:

      1) STN in aversion: The manuscript addresses the role of the STN in mediating "aversion" in a very limited manner, despite the framing of the title ("Aversion encoded in the subthalamic nucleus"). Based on the title I expected data showing that STN activity is correlated with the aversiveness of stimuli, or data showing that STN activity is required for aversion processing. Instead, the authors show that STN stimulation can drive avoidance, which does not necessarily mean that this activity drives "aversion" per se. Data showing that the STN represents the aversiveness of stimuli, or that activity here is necessary for avoidance or other responses to aversive stimuli, would strengthen the point. Currently the evidence for the statement made by the title is weak.

      2) Claims about the role of the STN->VP->LHb pathway in the abstract and elsewhere in the text: The authors demonstrate that activation of STN terminals in VP recapitulates their RTPP avoidance effects, but they do not directly demonstrate that these effects are mediated by downstream VP->LHb connectivity. They show that activation of STN terminals in VP results in excitation of VP units, but it remains unknown whether STN neurons specifically target/activate VP neurons that project to LHb, and/or whether they specifically target VP glutamate neurons (the primary cell type in the VP->LHb pathway that mediates aversion). The current data set demonstrates neither that a) STN-induced activity changes in LHb are predominantly mediated by VP (as opposed to EP or GP or other connections), nor that b) avoidance elicited by STN->VP activation is mediated by LHb activity. Therefore, statements throughout the manuscript about the STN-VP-LHb circuit are not supported.

      3) Statistical analysis: The authors provide comprehensive statistical information for their behavioral experiments, but not for the electrophysiology. It appears that individual neurons were treated as independent measurements even when they were recorded from the same subject, though in some cases it is not clear how many mice were recorded from (e.g. 1G, 5D, 5E). If multiple measurements were taken from the same subjects, then this should be taken into account in the statistical analysis (such as by including subject as a random effect in an ANOVA or linear mixed model).

    3. Reviewer #1:

      In this study, Serra et al. attempted to study the circuit responsible for aversive behavior in mice. They had previously observed that subthalamic nucleus (STN) excitation induced aversive jumping behavior. The authors proposed that the indirect projection to the lateral habenula (LHb) via the ventral pallidum (VP) could be involved. They used Pitx2-Cre mice for STN-specific gene expression and performed the real-time place preference paradigm (RT-PP) and the elevated plus maze (EPM) as a means to study aversive behavior. Overall, the findings in this study are potentially important, as they describe a previously unknown role of the STN, and its downstream targets, in aversive behavior. However, the authors have not convincingly demonstrated the pathway involved. The evidence so far is rather circumstantial, and the arguments made were based entirely on gain-of-function experiments using ChR2. As outlined below, there are a number of significant concerns that need to be addressed.

      Major:

      1) The authors should demonstrate the effectiveness and specificity of Pitx2-Cre in driving ChR2 expression. What was the cellular expression pattern within the STN? Did the authors observe ChR2 expression in 100% of STN neurons? Did it label any non-neuronal cells? Did the neighboring regions also express ChR2? According to Papathanou et al., that is likely to be the case. The authors should provide a more rigorous histological examination. Otherwise, a more in-depth discussion is needed to address how these concerns would confound the interpretation of results.

      2) It is interesting that Pitx2/ChR2-eYFP mice avoided STN photostimulation by spending less time in the light-paired compartment. The authors should discuss why the compartment in which the STN is stimulated is not completely avoided.

      3) It is unclear whether the mice jumped in this study, as the authors had previously observed. Were there any other movement-related behavioral changes?

      4) In Figure 1B, entries into Compartments A and B by Pitx2/ChR2-eYFP mice on Days 5 and 6 do not seem very different; however, the representative heatmap in Figure 1C shows a difference. Conversely, in Figure 1B, entries into Compartments A and B by Pitx2/ChR2-eYFP mice on Day 9 appear equal, whereas the representative heatmap in Figure 1C shows substantial entries. It would be helpful to have an explanation for these discrepancies.

      5) Assuming that the STN excitation duration is 10 seconds upon entry to the "Light" compartment, do the mice remain in the "Light" compartment? If the mice are only stimulated at the entry point of the "Light" compartment, do they just remain there and avoid exiting (as a means to avoid reentry)? As 10 seconds is a long time period for the mice to move around, is the stimulation continued if they then switch to the neutral compartment before the end of the 10-second stimulation period?

      6) The purpose of the first EPM experiment, with 10 minutes of stimulation of STN neurons or their terminals, is unclear. That is a very long stimulation period; such a paradigm is very likely to have induced plasticity, which would confound the study.

      7) The STN-VP slice experiment does not really address any of the circuit questions the authors set out to answer, since the STN-VP connection is already known. It would be more interesting if the authors showed a specific connection between the STN and the glutamatergic VP neurons, which they speculate are the downstream target of the STN. This is an important point because of the complex cellular composition of the VP.

      8) It would be important to show that direct optogenetic stimulation of glutamatergic neurons within the VP produces the same phenotype. At the very least, the authors should locally infuse glutamate receptor antagonists into the VP to examine whether the effects of STN stimulation can in fact be blocked.

      9) Both Figures 1 and 5 show a rather low density of STN fibers in the VP, restricted to about one-third of the VP. The evidence that the STN-VP circuit mediates the observed behavior is therefore less than convincing. Moreover, the authors did not investigate whether direct connections to other known targets are involved in the aversive response.

      10) All optogenetic interrogations were based on ChR2 stimulation. As antidromic spikes can propagate to collateral branches innervating other synaptic targets of STN neurons (i.e., the GP, EP, and/or SNr), orthogonal approaches are needed to show decisively that the STN-VP circuit is involved.

      11) What is the latency of STN-driven spiking in LHb? The latency in the peri-stimulus time histogram Figure 4 looks too short to be a polysynaptic event. It also does not match up with that stated in the text (i.e., 10 ms, line 212). This is not a trivial matter as synaptic delays can provide important clues for whether mono- vs polysynaptic events are involved.

      12) In Figure 5E, DNQX and APV did not completely block the evoked currents. A more rigorous examination is needed to determine whether multiple neurotransmitters are released.

      13) As anxiety and aversive behaviors are often dichotomous between males and females, the authors should comment on whether there were any sex differences observed.

      14) Some of the sample sizes are very small (only 3-5).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      While the demonstration that stimulation of subthalamic nucleus (STN) neurons produces avoidance is potentially interesting, the circuit basis of this effect was not well established. Specifically, the proposed functional connection of STN with lateral habenula through ventral pallidum was not clearly demonstrated and the STN stimulation findings on their own represent a more minor advance.

    1. Reviewer #2:

      The study provides evidence that an aphid effector, Mp64, and a Phytophthora capsici effector, CRN83_152, can both interact with the SIZ1 E3 SUMO-ligase. The authors further show that overexpression of Mp64 in Arabidopsis can enhance susceptibility to aphids and that a loss-of-function mutation in Arabidopsis SIZ1 or silencing of SIZ1 in N. benthamiana plants leads to increased resistance to aphids and P. capsici. On siz1 plants, aphids show altered feeding patterns on phloem, suggestive of increased phloem resistance. While the finding is potentially interesting, the experiments are preliminary and the main conclusions are not supported by the data.

      Specific comments:

      The suggestion that SIZ1 is a virulence target is an overstatement. Knockouts of the effector genes in the aphid or oomycete would be preferable, but even with transgenic overexpression approaches, there are no direct data showing that the biological function of the effectors requires SIZ1. For example, is SIZ1 required for the enhanced susceptibility to aphid infestation seen when Mp64 is overexpressed? Or does overexpression of SIZ1 enhance Mp64-mediated susceptibility?

      What do the effectors do to SIZ1? Do they alter SUMO-ligase activity? Or are perhaps the effectors SUMOylated by SIZ1, changing effector activity?

      While stable transgenic Mp64 overexpressing lines in Arabidopsis showed increased susceptibility to aphids, transient overexpression of Mp64 in N. benthamiana plants did not affect P. capsici susceptibility. The authors conclude that while the aphid and P. capsici effectors both target SIZ1, their activities are distinct. However, not only is it difficult to compare transient expression experiments in N. benthamiana with stable transgenic Arabidopsis plants, but without knowing whether Mp64 has the same effects on SIZ1 in both systems, to claim a difference in activities remains speculative.

      The authors emphasize that the increased resistance to aphids and P. capsici in siz1 mutants or SIZ1-silenced plants is independent of SA. This seems to contradict the evidence from the NahG experiments. In Fig. 5B, the effects of siz1 are suppressed by NahG, indicating that the resistance seen in siz1 plants is completely dependent on SA. In Fig. 5A, the effects of siz1 are not completely suppressed by NahG, but greatly attenuated. It has been shown before that SIZ1 acts only partly through SNC1, and the results from the double mutant analyses might simply indicate redundancy, also for the combinations with eds1 and pad4 mutants.

      How do NahG or Mp64 overexpression affect aphid phloem ingestion? Is it the opposite of the behavior on siz1 mutants?

    2. Reviewer #1:

      In this manuscript, the authors suggest that SIZ1, an E3 SUMO ligase, is the target of both an aphid effector (Mp64 from M. persicae) and an oomycete effector (CRN83_152 from Phytophthora capsici), based on interaction between SIZ1 and the two effectors in yeast, co-IP from plant cells, and colocalization in the nucleus of plant cells. To support their proposal, the authors investigate the effects of SIZ1 inactivation on resistance to aphids and oomycetes in Arabidopsis and N. benthamiana. Surprisingly, resistance is enhanced, which would suggest that the two effectors increase SIZ1 activity.

      Unfortunately, not only do we not learn how the effectors might alter SIZ1 activity, there is also no formal demonstration that the effects of the effectors are mediated by SIZ1, such as investigating the effects of Mp64 overexpression in a siz1 mutant. We note, however, that even this experiment might not be entirely conclusive, since SIZ1 is known to regulate many processes, including immunity. Specifically, siz1 mutants present an autoimmune phenotype, and general activation of immunity might be sufficient to attenuate the enhanced aphid susceptibility seen in Mp64 overexpressers.

      To demonstrate unambiguously that SIZ1 is a bona fide target of Mp64 and CRN83_152 would require assays that demonstrate either enhanced SIZ1 accumulation or altered SIZ1 activity in the presence of Mp64 and CRN83_152.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Detlef Weigel (Max Planck Institute for Developmental Biology) served as the Reviewing Editor.

      Summary:

      A major tenet of plant pathogen effector biology has been that effectors from very different pathogens converge on a small number of host targets with central roles in plant immunity. The current work reports that effectors from two very different pathogens, an insect and an oomycete, interact with the same plant protein, SIZ1, previously shown to have a role in plant immunity. Unfortunately, apart from some technical concerns regarding the strength of the data that the effectors and SIZ1 interact in plants, a major limitation of the work is that it is not demonstrated that the effectors alter SIZ1 activity in a meaningful way, nor that SIZ1 is specifically required for the action of the effectors.

    1. Reviewer #3:

      In their paper, Liutkute et al. use an elegant combination of force profile analysis (FPA) and photoinduced electron transfer (PET) experiments to probe the co-translational folding pathway of the N-terminal domain of the protein HemK. Over the past decades, it has become increasingly clear that co-translational folding pursues different routes than those found in solution. Although many proteins fold and unfold many times during their lifespan after release from the ribosome, the question of whether and how proteins fold during translation is not only fundamental but also extremely difficult to address experimentally. Here, Liutkute et al. present a synergistic combination of two largely different methods to answer this question. By stalling a nascent polypeptide chain at different sequence positions and measuring the amount of full-length relative to arrested protein in a gel assay, the authors identified a sequential folding path in which the order of helix formation of the 5-helix NTD of HemK follows the order from N- to C-terminus. The authors interpret these results using the foldon concept from the Englander lab. Although FPA is a rather qualitative experimental tool that measures the number of molecules that cross a certain force threshold, the analysis is striking. These experiments were complemented by PET-FCS experiments that were used to quantify the kinetic rates of conformational fluctuations of ribosome-stalled states of the protein. The authors conclude that conformational fluctuations slow down as the protein moves further away from the ribosome exit tunnel. In my opinion, the work is a substantial step towards understanding the process of co-translational folding. The experiments are beautiful and well described, and the results are of clear interest to a broad readership.

      1) I would like to emphasize that care has to be taken when deducing the order of events from single time-point experiments such as FPA. The speed of translation relative to the folding speed is an important factor that ultimately dictates the order in which structural elements form. I admit, however, that helix formation, at least in solution, is typically far faster than translation, which indicates that the identified intermediates will also form under conditions of continuous translation. Nevertheless, it would be interesting if the authors could provide data or cite relevant publications on the folding speed of the HemK NTD.

      2) The PET-FCS approach is indeed very appealing; however, I had some difficulty understanding the actual fitting procedure. On p. 25, it is mentioned that the diffusion and triplet components based on the empirical fit with eq. 1 were subtracted from the data. Equation 1 would rather indicate that separating the dynamic components requires dividing the data by the relevant diffusion and triplet terms.

      3) I would call eq. 1 'empirical' rather than 'analytical'.

      4) On p. 25, the authors explain that the dynamic components of the FCS curves were fitted using a sum of terms, one for each species. It would be clearer if the authors provided the actual equations used for fitting. I would have expected the authors to derive expressions for the correlation functions of the individual models, e.g., using the approach of Gopich & Szabo (see Eq. 1 in Gopich et al. (2009) J Chem Phys, 131, 095102), but the approach described in the methods sounds different.

      5) I was surprised that the two-step model can even provide negative, i.e., rising, amplitudes, which is very unusual for autocorrelation functions. This feature implies that the kinetic models have amplitudes that are decoupled from the actual kinetic rates. It would be great if the authors could clarify this point.

      6) I find the calculation of free energy barriers a bit overstretched given the complexity of the system. First, the pre-exponential factor of the Eyring equation (eq. 2) is only adequate for gas-phase reactions, particularly when assuming a transmission coefficient of 1. The appropriate counterpart is the Kramers equation. Clearly, the problem of defining the pre-exponential factor for folding reactions remains with the Kramers expression as well. However, a large body of work over the past 20 years has been dedicated to this problem, and a value of 1 μs-1 seems to be a good estimate (see e.g. Schuler & Eaton (2008) Curr Opin Struct Biol, 18, 16). Moreover, there is no way to decide whether conformational fluctuations slow down due to an increase in the free energy barrier or due to a change in the pre-exponential factor.
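      The size of the prefactor problem raised in point 6 can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' analysis: apparent barriers are obtained from k = k0 exp(-ΔG‡/RT), with k0 = k_B T/h (Eyring, transmission coefficient 1) or k0 = 1 μs-1 (the Kramers-type estimate cited above), and the rate constant of 10^4 s-1 is hypothetical rather than a value from the manuscript.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # molar gas constant, J/(mol*K)


def barrier_kj_per_mol(rate, prefactor, temperature=298.15):
    """Apparent barrier from k = k0 * exp(-dG/RT), i.e. dG = RT * ln(k0 / k)."""
    return R * temperature * math.log(prefactor / rate) / 1000.0


T = 298.15
eyring_prefactor = K_B * T / H  # about 6.2e12 s^-1 (transmission coefficient 1)
kramers_like_prefactor = 1e6    # 1 per microsecond, the estimate cited in point 6

rate = 1e4  # hypothetical rate constant, s^-1 (assumption, not from the manuscript)
dg_eyring = barrier_kj_per_mol(rate, eyring_prefactor, T)
dg_kramers = barrier_kj_per_mol(rate, kramers_like_prefactor, T)
offset = dg_eyring - dg_kramers  # constant shift, independent of the rate

# A two-fold slowdown at a fixed prefactor raises the apparent barrier by RT ln 2 ...
slower = barrier_kj_per_mol(rate / 2, kramers_like_prefactor, T)
# ... but the same slowdown is also explained by halving the prefactor at a fixed barrier
same_barrier_smaller_prefactor = barrier_kj_per_mol(rate / 2, kramers_like_prefactor / 2, T)

print(f"Eyring: {dg_eyring:.1f} kJ/mol, Kramers-type: {dg_kramers:.1f} kJ/mol")
```

      Under these assumptions, the two prefactors shift every apparent barrier by the same constant (about 38.8 kJ/mol at 298 K), and a two-fold slowdown is equally consistent with a barrier increase of RT ln 2 or with a halved prefactor, which is why the two cannot be separated from rates alone.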

    2. Reviewer #2:

      Liutkute and coworkers use a combination of arrest peptide assays and fluorescence correlation spectroscopy to investigate the folding of the HemK N-terminal domain. Previous work from the same group has shown that the domain rapidly forms compact structures co-translationally while still partially within the ribosome exit tunnel, limited by the rate of elongation. Data from the arrest peptide assay presented here suggest that, surprisingly, stably folded structures form as soon as the first of five helices in the domain has moved past the tunnel constriction. Several additional apparent folding events occur at longer chain lengths, suggesting discrete events of structure formation within the tunnel and near its vestibule. Experiments with a destabilized mutant (4xA) indicate that some of the folding events are dependent on formation of the hydrophobic core of the domain, suggesting that they depend on tertiary structure formation. PET-FCS experiments with HemK nascent chains reveal two interconverting states, compact (C) and dynamic (D). Both states are populated similarly regardless of chain length. However, the barrier between these states increases when the domain emerges from the ribosome. These experiments indicate a destabilizing effect of the ribosome on the nascent chain. Taken together, the experiments support earlier work that proposed a sequential co-translational folding mechanism for the HemK NTD, and provide rate constants for the dynamics at the earliest stages of nascent chain folding.

      The experiments appear very carefully designed and executed, and the data is of high quality. The PET-FCS measurements in particular provide valuable quantitative information about early nascent chain folding and should be of broad interest. While the results from arrest peptide experiments are intriguing, I have concerns about their interpretation, detailed below.

      Main point:

      The arrest peptide data is interpreted entirely in terms of a pulling force on the nascent chain, generated by folding. The conclusion that formation of just one (peak I) or two (peak II) alpha-helices inside the tunnel generates substantial mechanical forces is surprising, particularly given the presumed mechanism of force-mediated arrest release. How would a force be generated by a single alpha helix? It is easier to rationalize that forces acting on the arrest peptide are generated by stable tertiary structures. However, in that case, the 4xA mutant should show much lower arrest release in the region where full folding of the domain is expected (regions VII and VIII in Fig. 1), because the mutant is largely unfolded (see Holtkamp et al., Science 2015). This effect is not observed. Together, these considerations make me wonder whether alternative explanations for the observed release rates can be ruled out. For instance, could sequence-specific effects that are not related to folding of HemK, such as local interactions of the nascent peptide with the tunnel, cause the observed changes in arrest release rates? Alternatively, could local structure formation (of an alpha helix) in the tunnel cause arrest release that is not mediated by a pulling force?

      At a minimum, the authors should discuss how they envision single alpha helices to generate the forces necessary to accelerate arrest release (which have been estimated in the literature, e.g. in Goldman et al, Science 2015, and Kemp et al., PNAS 2020).

      In addition, two control experiments should be carried out: (1) An experiment demonstrating that a bona fide unstructured protein yields more or less constant arrest release rates over a range of nascent chain lengths. Perhaps a construct starting at residue 73 of HemK could serve as a control. (2) An experiment with previously characterized folded domains (e.g. some of the spectrin constructs from Kemp et al., PNAS 2020, or some of the constructs from Farías-Rico et al., PNAS 2018) to establish the fraction of full-length protein (f_FL) obtained with stably folded domains under the experimental conditions used in the present manuscript. How do the f_FL values for the HemK NTD compare to those of fully folded proteins under the conditions used here?
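      For the first control above, the quantity of interest is the flatness of the f_FL profile across chain lengths. In FPA, f_FL is conventionally computed from quantified gel bands as f_FL = I_FL / (I_FL + I_A), where I_FL and I_A are the intensities of the full-length and arrested products; the sketch below assumes the manuscript uses this same definition, and the band intensities in it are hypothetical.

```python
def fraction_full_length(i_full_length, i_arrested):
    """f_FL = I_FL / (I_FL + I_A), from quantified full-length and arrested bands."""
    total = i_full_length + i_arrested
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return i_full_length / total


# Hypothetical band intensities (arbitrary units), one (I_FL, I_A) pair per chain length
control_bands = [(12.0, 88.0), (11.0, 89.0), (13.0, 87.0)]
control_profile = [fraction_full_length(fl, a) for fl, a in control_bands]

# For an unstructured control, the profile is expected to stay roughly flat
spread = max(control_profile) - min(control_profile)
print([round(f, 2) for f in control_profile], round(spread, 2))
```

      Comparing such a baseline spread with the peak heights seen for the HemK NTD and for well-characterized folded domains is one way to put the f_FL values on a common scale.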

    3. Reviewer #1:

      This study by Liutkute et al. investigates the co-translational folding of a small alpha-helical domain from HemK. The study continues earlier work by Rodnina and colleagues, who showed using FRET and other measurements that HemK begins folding inside the ribosome exit tunnel and that folding proceeds sequentially as individual alpha-helical segments can be accommodated in the exit tunnel vestibule. Folding completes just outside the ribosome when the entire HemK domain is exposed. The current work extends these earlier studies using biochemical assays of "force" on the nascent chain and spectroscopic assays of intramolecular dynamics with an N-terminal fluorescent probe.

      The force assays illustrate that tension is seen as individual alpha helices move beyond the exit tunnel constriction, and at other previously documented steps of folding in the vestibule. These intra-ribosomal events are not impacted by a mutation that disrupts packing of the hydrophobic core. The fluorescence quenching dynamics show that the N-terminus is more dynamic inside the exit tunnel prior to folding and not dynamic after folding outside the tunnel. A detailed kinetic model of the fluorescence correlation data is provided to help explain the observations.

      Overall, the study provides a finer resolution view of the sequential co-translational folding of HemK. Although the broad concepts from the earlier studies are not changed by the current work, the study introduces analytical tools based on fluorescence quenching and FCS that may be useful to study the co-translational folding of other proteins.

      My primary suggestion is that the authors should be more explicit about what is being measured in the "force" sensor assay. SecM stalling relies on a specific secondary structure of the stalling sequence that causes an altered P site geometry that is unfavourable for peptide bond formation. Stalling will not occur if this altered geometry cannot be stabilized. Thus, what the authors refer to as 'force' is actually a constraint applied to the nascent chain to prevent SecM secondary structure formation. Thus, the folding is not generating force so much as constraining the nascent chain as a consequence of the ribosome exit tunnel geometry. It is a subtle, but I feel important, distinction to explain the assay. The reason is that such a constraint can actually be due to reasons other than folding. For example, an interaction between the nascent chain and the exit tunnel (or other proteins) could similarly constrain the nascent chain.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

    1. Reviewer #3:

      Summary:

      The authors report a between-subjects, double-blind psychopharmacological study of explore/exploit behavior in healthy human subjects. The authors used propranolol to block norepinephrine (NE) and amisulpride to block dopamine (DA), and compared both with a placebo group. Using a 3-armed bandit task, coupled with computational modelling and pharmacological manipulation, the authors show that "tabula rasa" (or random) exploration is reduced when NE is blocked. This interpretation was supported by behavioral effects: subjects taking propranolol were significantly more consistent than the other groups when facing identical choices, and chose the low-value option more often than the other groups. Blocking DA did not appear to affect any parameters. The computational modelling showed that the ε-greedy parameter, which captures the proportion of trials on which an agent makes a random selection, was most affected by the NE blockade. In addition, the modelling shows that some directed exploration (exploring lesser-known options) was also at play.

      General comments:

      The manuscript is well-written and the results are compelling. The findings are important to researchers particularly interested in the cognitive effects of catecholamines, and/or the explore/exploit dilemma. The results may not be that interesting to a broader readership.

      Criticisms:

      1) I do not really like the use of the term "tabula rasa" exploration over "random" exploration. The term random exploration is simpler and clearer. The particular problem for me is that "tabula rasa" has the connotation that both the current "tabula rasa" choice and all future choices will not take into account information obtained before that choice. Random exploration is a better term because it is easy and intuitive to see that random choices can be sprinkled in with choices based on previous information, whereas tabula rasa implies wiping previous information away from that point forward. As best I can tell, previous related work has not termed the random exploration associated with the ε-greedy parameter "tabula rasa". One consideration I am wrestling with is that there is apparently another parameter in one or more of the models that reflects random exploration (line 618, inverse temperature). This may be why the authors opted to call the ε-greedy parameter something else. At the very least, I would like a better explanation of the choice of term (tabula rasa), as well as a thorough explanation of the difference between tabula rasa and random exploration. I recommend changing the term, but am amenable to accepting an argument for keeping it.

      2) Line 162: "Reported findings were corrected for IQ (WASI)". How? It seems WASI was included as a covariate in the repeated-measures ANOVA, but it is not clear from the results reported in lines 170-185 exactly which factors went into the ANOVA. I recognize that higher-impact journals often consider a full description of the factors and levels of statistical tests a tedious waste of space, but I feel that holds only when the structure of the test is obvious. In my opinion, that is not the case here.

      3) Lines 209-210: "the probability of choosing bandits with a lower expected value (here the low-value bandit, Fig 1e) will be higher. We investigated whether such behavioural signatures were increased in the long horizon condition (i.e. when exploration is useful), and we found a significant main effect of horizon (F(1, 54)=4.069, p=.049, η2=.07; Figure 3c)." Isn't this just evidence of general exploration, not specifically tabula rasa exploration? How does this test rule out, for example, directed exploration?
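      The distinction in criticism 1 between the ε-greedy parameter and the softmax inverse temperature can be made concrete. The sketch below assumes one common way of combining the two (not necessarily the authors' exact model; the function name and parameter values are hypothetical): with probability ε a bandit is chosen uniformly, ignoring learned values, and otherwise the choice follows a softmax with inverse temperature β over the current value estimates.

```python
import numpy as np


def choice_probabilities(values, beta, epsilon):
    """Mix a value-agnostic uniform rule with a softmax over value estimates.

    epsilon: probability of a uniform choice (epsilon-greedy component)
    beta: inverse temperature of the softmax (value-sensitive randomness)
    """
    values = np.asarray(values, dtype=float)
    logits = beta * values
    logits -= logits.max()  # subtract the max for numerical stability
    softmax = np.exp(logits) / np.exp(logits).sum()
    uniform = np.full_like(softmax, 1.0 / len(values))
    return epsilon * uniform + (1.0 - epsilon) * softmax


# Hypothetical value estimates for a low-, medium-, and high-value bandit
values = [1.0, 5.0, 9.0]
print(choice_probabilities(values, beta=5.0, epsilon=0.0))  # low-value bandit almost never chosen
print(choice_probabilities(values, beta=5.0, epsilon=0.3))  # floor of epsilon/3 on every bandit
```

      In this sketch, softmax randomness still tracks the values, so a large β makes the low-value bandit almost never chosen, whereas ε places a value-independent floor of ε/3 on every option. The value estimates themselves are left untouched, so the ε component disregards prior information only on the current choice, which bears on whether "tabula rasa" is the right label.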

    2. Reviewer #2:

      In this study, Dubois and colleagues claim that noradrenaline promotes tabula-rasa exploration in decision making, using a novel paradigm with short- and long-horizon conditions to elicit exploitation and exploration, respectively. The work tests different computational models and examines in particular supposedly less costly forms of exploration, that is, 1) tabula-rasa exploration, in which prior information is ignored and the same probability is assigned to all available options, and 2) novelty exploration, in which information processing is biased toward choices that have not been encountered previously. They provide evidence that both of these processes coexist with more demanding exploration strategies. In addition, using a double-blind, placebo-controlled drug study, they provide support for a role of noradrenaline in tabula-rasa exploration.

      This work extends previous work from the same group that aimed at solving the important question related to decision making and the neuromodulatory influences on these processes. The overall approach and the results are clearly presented. The extensive model comparison is particularly interesting to better approach this difficult question. The results are interesting and bring novel insights about the processes at play during exploration and the influence of neurotransmitters on these processes.

      1) Noradrenaline influence on tabula-rasa exploration:

      The authors claim that "Phasic noradrenaline is thought to act as a reset button, rendering an agent agnostic to all previously accumulated information, a de facto signature of tabula-rasa exploration." It might be interesting to discuss the results in terms of a potential impact of noradrenaline on the subjective value of the choices. For instance, Rogers et al. (Psychopharmacology, 2004) suggest that propranolol affects the processing of possible losses in decision-making paradigms, and might also reduce the discrimination between different levels of possible gains (Rogers et al. 2004). In another study, Sokol-Hessner et al. (Psychol Sci., 2015) also report a reduction in loss aversion after propranolol administration. These effects might also change prior information and reset behavioral adaptation to look for new opportunities. In this latter study, the authors also report a lack of effect of propranolol on choice consistency, contrary to what the present study reports. I was also wondering how this new result about the effect of propranolol on decision making relates to previous findings from the same group (Hauser et al. 2019), where they described a noradrenaline influence on information gathering and the urgency to decide. Finally, in line with the network reset hypothesis, it has indeed been suggested that a change in the environment might enhance information gathering at the expense of prior expectations to produce an adaptive behavioral output. Perhaps the authors could avoid the term 'agnostic'; the effect might instead reflect a reduced influence of 'top-down' prior information, related to changes in the subjective value of the different choices.

      2) Model selection:

      One strength of the paper is that the authors compared several computational models. The model selection is presented in Figure 4, and in Figure 4 - Figure supplement 1 the authors provide additional information regarding the winning model, which best accounted for the largest number of subjects, in comparison with two other models, namely the UCB model (with novelty and greedy parameters) and the hybrid model (with novelty and greedy parameters). It would be useful for the reader to get a better sense of the number of subjects whose data favored each model (i.e. a more exhaustive picture). One could use the same format as Appendix Table 2, with the number of subjects for which each model achieved the best performance. In fact, as shown in Figure 4, the winning model does not look very different (at least visually) from other models such as UCB (with novelty and greedy parameters) or hybrid (with novelty parameter, or with novelty and greedy parameters). As such, I am wondering whether the conclusion about the 𝜖-greedy parameter would hold if other models with similar performance were tested, e.g. the UCB model (with novelty and greedy parameters) or the hybrid model (with novelty and greedy parameters).

      3) The authors used propranolol (40mg), a non-selective β-adrenoceptor antagonist, to reduce noradrenaline functioning. Previous studies have shown that it significantly decreases heart rate (e.g. Rogers et al., 2004). How might that relate to the reported results? In terms of NA influence and given the distributions of β receptors, could the authors be more explicit about the relation of their work to the potential mechanisms (e.g. Goldman-Rakic et al. J Neurosci. 1990 or Waterhouse et al., Journal of Pharmacology and Experimental, 1982)?

      4) Could the authors clarify whether the PANAS questionnaire was administered to the participants prior to or after the drug treatment to understand if this group difference was a mere difference in groups or whether this was a consequence of the drug administration. It would be indeed interesting to have a measure of the drug effect on these parameters.

      5) The authors claim that: "Although tabula-rasa exploration can comprise influences of attentional lapses or impulsive motor responses, the difference between horizon conditions cancels them out". I would suggest tempering this claim, as the effect might be more enduring in the long-horizon condition. The authors might also want to look at RT variability in addition to RT means, which did not differ between groups.

    3. Reviewer #1:

      Dubois and colleagues investigate how two modes of exploration, tabula-rasa and novelty-seeking, contribute to human choice behavior. They found that subjects used both tabula-rasa and novelty-seeking heuristics when the task conditions favored exploration. Specifically, participants could, and had to, make more responses in the long-horizon condition, which favored exploration, than in the short-horizon condition, which favored exploitation. The authors then provide evidence that blockade of norepinephrine beta receptors leads to decreased tabula-rasa exploration and increased choice consistency, whereas blockade of D2/D3 dopamine receptors had little effect. Novelty seeking was not affected by catecholaminergic drugs.

      The paper provides evidence on exploration-exploitation trade-offs from two different points of view. On the one hand, it addresses computational aspects of exploration by investigating how computationally intense forms of exploration might be supplemented by the usage of heuristic strategies. For doing so, the authors propose a novel task allowing them to disentangle these strategies and quantitatively assess their usage. On the other hand, the findings presented in the paper shed some novel light on neuropharmacological mechanisms underlying explorations. Some interpretations seem to go beyond the data and information is missing in the description of the results and the computational approaches used. In general though, the manuscript conveys the impression of a well-designed and carefully conducted study.

      Major points:

      General

      1) It is one thing to come up with computational terms and model-based quantities correlating with behavior but a different one to show their psychological meaning. Did the trials with tabula-rasa exploration or novelty exploration differ in terms of response times from the other types of responses? Did participants report that they indeed intended to explore in the tabula-rasa exploration trials?

      2) On a related note, how do the authors distinguish random (tabula-rasa) exploration from making a mistake? From how the task was designed, choosing the low value option appears to receive a more natural interpretation as a mistake rather than as exploration because this option was clearly dominated by the other options and remained so within and across trials.

      3) Previous research by the authors (Hauser et al., 2017, 2018, 2019) has associated beta receptor blockade with enhanced metacognition, decreased information gathering/increased commitment to an early decision (Hauser et al., 2018, JNS) and an arousal (i.e., reward)-induced boost in stimulus processing. Of course, it is possible that norepinephrine plays multiple roles, but it does not seem parsimonious to imbue it with a different role for each task tested. Are there commonalities across these effects that could be explained by some common function(s)?

      4) Throughout, the paper implies that a beta blocker provides information about the function of norepinephrine in general. However, blocking beta receptors leaves synaptic norepinephrine to act on alpha receptors; accordingly, beta-blockers can be viewed as partial alpha agonists. Given that the function of these receptor families differs, more care should be taken when describing the nature of the intervention, labeling the groups and interpreting the effects.

      Introduction:

      5) As mentioned above, the paper investigates not only computational aspects of exploration but also the underlying neuropharmacological correlates. However, the introduction focuses mostly on different computational algorithms (which is in itself very helpful for the understanding of the paper!) while the neuropharmacological basis of explorative behavior is only briefly introduced. In the same regard, while some insights were given in the Discussion, it would be interesting to have a rationale for using amisulpride and propranolol already in the introduction.

      6) Relatedly, the introduction focuses on tabula-rasa and novelty strategies based on the argument that these are more computationally efficient. The authors may also want to motivate this from the perspective of neural constraints/brain processes. Specifically, they argue that it may be computationally demanding to process the expected value (mean) and variance of choice options. However, computational efficiency has been put forward as an argument for why mean-variance-like signals are coded in the brain, particularly for multi-outcome options whose expected utilities are difficult to compute (D'Acremont and Bossaerts, 2008). Thus, the computational efficiency argument currently seems insufficiently motivated.

      Materials and Methods:

      7) Successful performance of the task depends on the ability to discriminate between different reward types and select the one with the higher value. From the description of the experimental design, subjects needed to distinguish between different apple sizes to do so. This raises a question: how large was the difference between two adjacent apple sizes? Was it large enough that, after visual inspection, the participant could easily tell that an apple of size 7 was less rewarding than one of size 8? Finally, since the task requires visual inspection of reward stimuli, was the subjects' vision tested, and did it differ between groups?

      8) The point of heuristics from a psychological perspective is that they dispense with the need to use full-blown algorithmic calculations. However, in the present models, the heuristics are only added on top of these calculations and the winning model includes Thompson exploration. Stand-alone heuristic models would do the term more justice and one wonders how well a model would fare that includes only tabula rasa exploration and novelty exploration.

      9) The simulations provide a nice intuition for understanding choice proportions from different models/strategies (Figures 1e and 1f). However, it would be helpful to provide simulated results for long and short horizons separately. Do the models make different predictions for the two horizons? Additionally, it would be helpful to show the results from the other models as well (e.g., the proportion of low-value bandit choices made by the novelty agent). These can be added in the supplement.

      10) One of the best-known effects of propranolol is to reduce heart rate. Did the authors measure heart rate and can they control for the possibility that peripheral effects of the drug explain the findings (and what was the reason for not collecting pupil diameter data, contrary to the previous research of the authors)?

      11) The long horizon condition appears to confound exploration with higher effort demands and longer delays to reward, at least in the early responses. If the authors cannot control for these they should mention them as limitations.

      12) Not only the choice rules but also the value functions seem to differ between Thompson and UCB (lines 583 and 593). This raises the question of how well pharmacological effects on choice rules can be distinguished from effects on valuation, and how confident we can be that the observed effects indeed arise from changes in choice rules.

      Discussion:

      13) Line 410: The statement that memory is not at play in the present task because all information is always visible on the screen seems too strong. At least some exploration-relevant information, such as the overall distribution of outcomes across all options, is not presented and may be remembered differently by the different groups.

      D'Acremont M, Bossaerts P. Neurobiological studies of risk assessment: a comparison of expected utility and mean-variance approaches. Cogn Affect Behav Neurosci. 2008;8(4):363-374. doi:10.3758/CABN.8.4.363

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

    1. Reviewer #3:

      This is a comprehensive meta-analysis of the empirical literature on sex differences in mammalian trait variability. The authors nicely articulate competing hypotheses: "estrus-mediated variability" (which predicts higher trait variability in females because they exhibit cyclic reproductive [estrous] hormone secretion over multi-day timescales) vs. the "male variability hypothesis" (which predicts higher trait variability in males because they are the heterogametic sex). Several prior meta-analyses on this question have not provided support for the estrus-mediated variability hypothesis. The analysis performed here differs significantly from prior work in that the subjects were 27,147 mice from the International Mouse Phenotyping Consortium, which generated over 2 × 10^6 data points. Unlike in other meta-analyses, the subjects of this analysis were therefore more systematically evaluated (9 WT strains across 11 labs). A total of 218 continuous traits were evaluated, grouped into 9 functional trait groups. Some traits were biased towards males and others towards females. There was no consistent pattern of greater variability in either sex. The results support a straightforward conclusion that neither hypothesis adequately explains patterns of trait variability. The discussion is a restrained defense of the practice of including females (please clarify that estrous cycles were not monitored in these studies, so the females are classified as "unstaged"); consequently, females can be included in research studies without a default assumption that they are any more likely than males to introduce variability. The authors also apply their data on widespread differences in trait-specific lnCVR values to the potential for phenotypic response to selection due to rapidly changing environmental events. The discussion is well written, and each of its sections is meaningful. The web-based tool is a very helpful contribution. The discussion of the statistical implications of the work (e.g., equalizing power and the Type I error consequences of unequal variance) is of significance to research on mammalian biology.

      1) The present work adds important new information to a growing literature (see for example Smarr BL, Rowland NE, Zucker I. Male and female mice show equal variability in food intake across 4-day spans that encompass estrous cycles. PLoS One. 2019 Jul 15;14(7):e0218935) indicating that incorporating unstaged female rodents in biomedical research does not increase variability compared to that generated by males; importantly, it also specifies several circumstances in which specific traits are more variable in one sex than the other.

      2) The statement on lines 41-42 is a strong overgeneralization and should be tempered and/or clarified: "However, scientists in (bio-)medical fields have not traditionally regarded sex as a biological factor of intrinsic interest (2-7)." The study of sex differences and sexual differentiation in mammals (the class of animals of most direct relevance to biomedical research) has a long history, complete with dedicated journals (e.g. Biology of Sex Differences), learned societies, etc. Such an enduring interest in sex among biologists only makes the present work more interesting and important. This critique may be addressed with a clearer definition of "(bio-)medical" here and throughout the manuscript.

      3) Colloquialisms such as "This is an important step, but we can go much further" (line 50) are vague and, as written, difficult for this reader to endorse as true; we recommend deleting them.

      4) In the Introduction, the authors delineate competing hypotheses: "estrus-mediated variability" vs. "male variability hypothesis". In their elaboration of the former hypothesis, the authors should clarify that the historical concern regarding decreased power and increased variability in females compared to males specifically regarded the inclusion of females that were not synchronized (or "staged") so as to be tested/treated on the same day/phase of the estrous cycle. Data from these so-called 'randomly cycling' females were predicted to be more variable than data from males. "Staged" females were presumed to be less variable, and the interventions and costs associated with the presumed need for staging are viewed as onerous. But a growing literature, including the important new results from the present study, argues that there is no empirical support for the contention that females generally are more variable than males across many traits.

      5) Methods: the data analysis pipeline is clear and rigorous. It should be stated that the data used come from unstaged females.

    2. Reviewer #2:

      Summary:

      There are significant methodological and interpretative concerns with this article. The analysis overstretches and does not consider its potential weaknesses. It needs to refocus on the primary question of whether there is a pattern in the impact of sex on the variance of these traits. The analysis then needs to go deeper and remove other sources of variance that could be confounding the findings.

      Major comments:

      Methodology

      1) The methodology is not clear.

      2) Meta-analysis is used when you don't have access to the raw data; why not use mixed-effects regression models?

      3) The variance summary metric is calculated for each institute and strain from data collected in multiple batches, with potential baseline shifts because the data were collected across many years. This is not a representative metric of variability for a sex, as multiple sources of variance affect it.

      4) Figure 3b and code: It is very rare for a fixed effect analysis to be justifiable. Why assume that there is no variation between the different traits when testing effect of sex? Normally you would explore sources of heterogeneity by meta regression rather than just assume it is sex differences.

      5) "A previous study found that the heterogametic sex was more variable in body size". If this holds, would not traits that are correlated with body weight also demonstrate the same finding?

      6) "minimum of 2 different institutes" is a very low N. Why would this give meaningful analysis? What was the minimum amount of data for a strain*centre for a trait to be included?

      7) Consider the recent discussions on phenotypic plasticity and the phenotypic interaction with the environment (https://www.nature.com/articles/s41583-020-0313-3 ). This suggests a fixed effect model is not appropriate. The results and approach need discussing in this context.

      Conclusion:

      1) It isn't made clear that this analysis is trying to assess the role of sex across strains and institutes.

      2) There are no discussions of the potential weakness of the analysis.

      3) Figure 3a

      • Why is there no discussion of measures of heterogeneity within the meta-analysis at the population level?

      • Should the differences in classification as male- or female-biased within functional groups not be assessed by a Fisher's exact test, with the p values adjusted for multiple testing, before you state that an area shows a difference?

      4) I am concerned by the statement "Notably most SD trait means also show the greater difference in trait variance"; this seems to be an eyeball assessment rather than a statistical analysis.

      5) I have concerns about relating these results to power.

      • These estimates are from an analysis across strains, batches and institutes looking at global behaviour in the traits. This absolute variance measure would be very different to that seen in a lab within a classic parallel group design study with one strain.

      • They advocate a factorial design but suggest the powering of the sexes independently. This feeds into the misconception that to study both sexes you have to double your sample size.

      6) The authors report that this analysis of mean differences was in accordance with previous studies. Not really. The differences will arise from the different approaches taken, and they highlight how this summary metric loses sensitivity. The authors relate many of these changes to differences in body size. However, the earlier published analysis adjusted for body weight.

      7) Why would the "difference in variability impact on the potential of each sex to respond to changes in specific environments"?

    3. Reviewer #1:

      This study looks at whether there are sex differences in the variability of traits in mice, via a meta-analysis of published datasets. The analyses show that females typically show greater variability in traits categorised as immunological, while males show greater variability in morphological traits. Traits related to the eye were also more variable in females. These findings are interpreted in light of evolutionary theory about greater between-individual variability in males, and greater within-individual variability in female mammals due to estrus. A handy online tool is provided to allow researchers to consider possible sex-specific variability in traits at the experimental design phase.

      I enjoyed the paper and thought the question and conclusions were interesting. The figures are great. I am not an expert in meta-analyses, so my comments mostly relate to the hypotheses and discussion of the results.

      1) The paper jumps about quite a bit between sex differences relevant to mammals only and those that might apply to animals more generally. For example, the Introduction begins with reference to biomedical research (mammals) and the estrus hypothesis (mammals) but then introduces the "male variability" hypothesis by stating that "males are often the heterogametic sex". Given that the subject of your study is the mouse, I think it would be more logical to restrict the Introduction to mammals (i.e. explain the two hypotheses with respect to mammals). You could then include a section in the Discussion on if/why we might expect the same trends in other animals (see below also).

      2) I feel that the rationale behind the two hypotheses (female estrus and male variability) could be explained better in the Introduction. i.e. WHY estrus might produce higher variability in females and WHY stronger sexual selection or male heterogamety might produce greater male variability. A few extra sentences on each would probably be enough. At the same time, I think it would be worth clarifying a priori the extent to which these hypotheses are expected to apply to different traits. Some predictions are given only in the Discussion (e.g. estrus expected to mostly affect immune response and physiology).

      3) The Discussion on eco-evolutionary implications (line 184) would be greatly strengthened if it included at least one specific example of how sex-specific differences in trait variability might affect the evolutionary trajectory of a population. At present, one very general hypothetical is given, but I did not find it easy to follow (disease/climate change kills more of one sex than the other --> sex ratio of the population is skewed (temporarily?) --> mating system is "influenced" --> "downstream effects on population dynamics"). It is also stated that "modelling sex difference in trait variability could lead to different conclusions compared to existing models (cf 44)". The cited study there is on Eurasian sparrowhawks. I'm not familiar with this sparrowhawk study, but perhaps it is a suitable one to highlight in more detail as a clear example? What sort of different conclusions would be expected? It's fantastic that your paper is aiming to speak to a broad range of biologists, but I think that greater clarity in this section is needed to make ecologists and evolutionary biologists really take notice.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers agreed that the topic of the study was an interesting one, and that the issue of sex differences in trait variability is relevant to good experimental design. As you'll see below, however, Reviewer #2 felt that the current analytical treatment of this mouse dataset is not appropriate to the question. Of particular concern is that sources of variability other than sex were not adequately considered.

    1. Reviewer #3:

      Saderi and colleagues study the effects of arousal and task engagement on sound responses in the (primary) auditory cortex and inferior colliculus of ferrets. Arousal is measured by pupillometry, task engagement by contrasting an auditory detection task with passive sound exposure, and effects are quantitatively dissociated using a general linear model and multiple regression. The authors find that the sound responses of about half of the recorded neurons are modulated by task engagement and/or by arousal, with IC neurons most frequently modulated by arousal and AC neurons modulated by both factors. Increased arousal was associated with enhanced sound responses. In AC, task engagement was associated both with enhanced and suppressed sound responses. In IC, task engagement was associated with suppressed sound responses.

      Major comments:

      1) Some of the main conclusions of the results from AC are not novel. Using a different experimental approach, the study of Knyazeva et al., 2020, Front Neurosci. 14: 306 already suggested that the discharges of many neurons in AC are affected by arousal, that task effects can disappear if effects of arousal have been accounted for, and that there is no systematic difference in response modulation between neurons tuned, or not tuned, to task-relevant sounds. Dissociations of the effects of different non-auditory factors on sound responses in AC have also been described by Zhou et al., 2014, Nat Neurosci. 17:841-850 and by Carcea et al., 2017, Nat Comm. 8:14412.

      2) The study is based on a relatively small number of neurons and behavioral sessions, potentially reducing the strength of statistical inferences such as the claim that IC was more strongly affected by arousal than AC. It appears that data from about 20 behavioral sessions entered the analyses. This estimate is based on the information that 1-3 behavioral blocks were tested during individual sessions (line 611) and that Figure 1F shows the results of about 36 active-passive comparisons in four animals. This indicates that, on average, about 10 neurons were recorded simultaneously in each session. These neurons were therefore statistically more dependent than neurons recorded in different sessions. This needs to be considered for potentially global effects such as arousal and task engagement. The authors should include this information, together with the number of trials in active and passive blocks and whether the responses to different TORCs were averaged.

      3) The authors did not distinguish single unit and multiunit data. This difference should be considered in detail because it could affect the interpretation of whether there are units that are affected both by arousal and task engagement.

      4) The authors should include a statement that the results on the effects of task engagement may not apply to all types of auditory tasks. This is highly important because the authors used an auditory detection task, which is a task that may not require AC at all.

    2. Reviewer #2:

      Main Review:

      Saderi and her colleagues have performed a cool study that attempts to determine whether and how two behavior-related variables - arousal and task engagement - differently influence activity in two stages of the auditory neuraxis, IC and A1. They define arousal as pupil diameter and task-engagement as a binary variable determined by the experimental block design. They find that although these two parameters often co-vary, they sometimes do not. They find that IC was more influenced by arousal and A1 was modulated by both arousal and engagement. One of their main findings is that previous reports of task-engagement effects may in fact be attributed to arousal state.

      This is a nice quantification of neural activity and behavior. My major concerns are all thematically linked and they stem from the use of a continuous readout of arousal (i.e. pupil diameter) but a binary readout of task-engagement (i.e. the block the animal is in at any moment). Relatedly, I am interested in knowing whether neural effects can be accounted for by the animals from which they were recorded (and from that particular animal's behavior). I expect that my enthusiasm for this paper will not be diminished in any way regardless of any changes that come out of the deeper analyses outlined below. Also, I do not intend that responses to these concerns will require any new experimentation.

      Major concerns:

      1) Can task engagement be explained more rigorously as a continuous rather than binary variable? In my experience training and testing animals on appetitive behaviors, task engagement can wax and wane within a single block, across an experimental recording session, or across days of behavioral testing. Such changes in engagement can be inferred, for example, as strings of (seemingly) easy trials in which the animal does not answer correctly. The authors should attempt to quantify through behavioral analysis (running lapse rate, lick latency, etc) whether and how task engagement may be changing within and across task blocks. Alternatively, the authors could clearly explain that their binary encoding of engagement has limitations and may not actually describe the animal's engagement at any given moment.

      2) Can a continuous readout of task engagement better explain neural activity? For many neurons, task-engagement does not provide unique predictive information, yet for others it does (e.g. Fig. 3C). If task engagement can be modeled as a continuous rather than binary variable, is it still true that "some apparent effects of task engagement should in fact be attributed to fluctuations in arousal" (Abstract)? In general, I worry that the current analysis is effectively a floor on task-related modulations since it assumes constant engagement throughout a task block.

      3) Can neural heterogeneity be attributed to animal-to-animal behavioral variability? Even if task engagement does not vary within a task block for any one animal, it may indeed vary across animals. In theory, the actual task engagement of some animals might more closely mirror the block design that the experimenters are imposing, and some animals may simply have a higher level of engagement than others. This could mean that some results currently attributed to population-level heterogeneity (e.g. some A1 neurons do this, while others do that) might actually reflect animal-to-animal heterogeneity rather than distinct neural populations. For example, the authors state that for a subset of neurons, persistent task-like activity after a block change can be accounted for by pupil, whereas for other neurons it cannot (Fig. 7, line 452). The authors should confirm that key findings are consistent across animals and not related to degrees of task engagement (see point #1). If the findings are not consistent across animals but can be explained by each animal's unique behavior, this would also be really cool.

    3. Reviewer #1:

      This study distinguishes effects of generalized arousal and specific task engagement on the activity of neurons in the inferior colliculus and auditory cortex of ferrets as they engaged in a tone detection task, while monitoring arousal via pupillometry. The authors found that arousal effects were more prominent in IC, while arousal and engagement effects were equally likely in A1. Task engagement was correlated with increased arousal. They propose that there is a hierarchy such that generalized arousal enhances activity in the midbrain, and task engagement effects are more prominent in cortex. I have no major concerns, but two points to consider:

      I would like to know how the model would perform if task engagement were modeled as a continuous regressor.

      The authors state that they separated single units and stable multi units from the electrode signal, but I do not see where these data are separately reported.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Saderi and her colleagues have attempted to determine whether and how two behavior-related variables - arousal and task engagement - differently influence activity in two stages of the auditory neuraxis, IC and A1. They define arousal as pupil diameter and task-engagement as a binary variable determined by the experimental block design. They find that although these two parameters often co-vary, they sometimes do not. They find that IC was more influenced by arousal and A1 was modulated by both arousal and engagement. One of their main findings is that previous reports of task-engagement effects may in fact be attributed to arousal state.

    1. Reviewer #3:

      The role of the histone chaperone Hira in the formation of the paternal pronucleus has been well documented in both mouse (Lin et al, 2014; Inoue and Zhang, 2014; Nashun et al, 2015) and Drosophila (Loppin et al, 2005). Hira is known to act in a protein complex with Ubn and Cabin 1 (Tang et al, 2012). The authors built on their previous findings (Lin et al, 2014) and assessed the effect of oocyte deletion of Ubn and Cabin 1 during fertilisation. Not surprisingly, the observed phenotypes more or less recapitulated the observations made with Hira deletion. In this sense, the findings are not novel. It has also been previously shown that deletion of Hira leads to the removal of the whole complex (Nashun et al, 2015).

      The authors add some potentially interesting observations using 1PN (aberrant) human zygotes. Although the observed lack of Hira complex components in these zygotes could be interesting, the causality is not established.

      Beyond the statements above, there are major issues that would need to be addressed:

      1) Validation and characterisation of the KO/KD models: for the Ubn1 knockdown using morpholinos (Fig S1C), a lot of protein remains in the nucleus; for the Zp3-Cre-driven oocyte-specific Hira knockout, how much Hira protein is left in the zygote?

      2) H3.3 staining to document the deletion of the complex: in Figs1E, it is not obvious what the authors are trying to show. How is the H3.3 signal quantified? Shouldn't only the paternal signal be affected by the KO? The same is true for Fig2D, where no signal is obvious even in the control.

      3) Presence of Cabin1 in the zygote - pre-extraction needs to be carried out (Fig 2C)

      4) Fig S2, overexpression of Hira: is there a significant difference in Hira signal between control (het) and KO zygotes? It does not appear so, which undermines the whole knockout study. The same is true for the quantification of H3.3. What is the quantification of the GFP signal meant to demonstrate?

      5) In the main text, the authors say that they developed a conditional KO for Hira, but they have not verified Hira deletion after Cre expression (by IF or PCR).

      6) "Data not shown" in the text. The authors say that their new hiraF/F, zp3 females are sterile but they don't show it.

      7) The authors never show anti-ubn1, cabin1 staining on HiraKO.

      8) Language: the text needs editing. There are a number of statements that are wrong: Hira (or any other component of the complex) does not incorporate into chromatin; the complex associates with chromatin to incorporate histones (there are several other examples of similar statements).

    2. Reviewer #2:

      A high proportion of in vitro fertilized eggs yield zygotes with 1 pronucleus (1PN) instead of the normal 2PN. The authors previously showed that maternal Hira is important for H3.3 deposition on the male pronucleus; and that the loss of Hira leads to a high proportion of 1PN zygotes.

      In this manuscript, the increase in 1PN zygotes after fertilization was confirmed following deletion of Hira in mouse oocytes. The effect could be rescued upon microinjection of Hira RNA. The authors also depleted the other Hira protein complex subunits, Cabin1 and ubinuclein-1. The 1PN phenotype was again seen. Human 1PN zygotes were finally shown to lack HIRA on the abnormal pronucleus.

      This is an interesting observation that is definitely worth investigating. The lack of HIRA components on the aberrant pronucleus in 1PN human zygotes is an important finding. However, because the authors had already shown that the loss of Hira correlates with a high proportion of 1PN zygotes in mice, the experiments (though respectable) provide limited novelty as they stand.

      Main concern:

      • Unless there are reasons to believe that there are Hira-independent Cabin1 and ubinuclein-1 functions in oocytes, their depletion only serves to confirm the role of Hira and its relation to the 1PN phenotype. The rescue experiment and human data is important, but again serves as confirmation on the role of HIRA without further mechanistic insights.

      Perhaps novelty could come through a deeper exploration of Hira levels in oocytes and of what differentiates 'poor quality' oocytes that lead to 1PN from normal ones. For example, do maternal Hira RNA and protein levels increase with maturation? Are HIRA levels lower in poor quality oocytes? Is there a step in the IVF procedure that affects Hira levels and/or causes changes in the paternal chromatin?

    3. Reviewer #1:

      This study establishes the role of additional members of the histone chaperone HIRA complex in male pronucleus formation in mouse. Genetic inactivation of maternal Ubn1 and Cabin1 affects histone deposition on the fertilizing sperm nucleus following protamine removal, in a way similar to maternal Hira mutants. However, the study does not provide new insights into the way these factors function or cooperate during paternal chromatin assembly. Analysis of aberrant human zygotes revealed a correlation between the lack of a male pronucleus and the absence of maternal HIRA. Although the data are generally convincing, the manuscript does not sufficiently acknowledge earlier work. Notably, the rescue experiment, which is presented as a "proof of principle" for future human therapy, is not entirely original.

      Substantive concerns:

      1) The authors present the (partial) rescue experiment of maternal Hira KO (oocyte injection of Hira mRNA) as an original experiment that serves as a proof of principle for future therapy. However, a very similar rescue experiment of Hira KD oocytes was successfully performed by Inoue & Zhang, NSMB, 2014, a work that is not cited in the manuscript.

      2) The authors used PLA to detect interactions between the Hira complex proteins in mouse zygotes. However, it is not clear from the images in Fig. 1C how the specific interactions are actually assessed: the foci seem to be everywhere and are not particularly enriched in the male pronucleus shown in the insets.

      3) The occurrence of 1PN human zygotes is intriguing, but the origin of this defect is unknown. It could reflect a more general problem than the sole lack of Hira expression. In this context, restoring male pronuclear formation by re-expression of Hira seems a hazardous therapeutic strategy.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All three reviewers agree that the study, although interesting, does not appear sufficiently novel given the already established role of the Hira complex in sperm chromatin remodeling in mouse and other animals. In addition, although the reviewers were intrigued by the observation that 1PN human zygotes lack HIRA, the origin and timing of this defective expression are not established. The reviewers share the feeling that these experiments do not really bring novel insights into the regulation of HIRA levels in mammalian oocytes.

    1. Reviewer #3:

      The manuscript by Vera et al. reports on cohesin-dockerin interaction studies of cellulosomal subunits using mainly single-molecule FRET, but also molecular dynamics simulations and NMR measurements. The authors study a range of cohesin-dockerin pairs and discover a varying distribution of two alternative binding modes that apparently follows a built-in cohesin-dockerin code. Finally, the authors show that prolyl isomerase activity can regulate the kinetics towards equilibrium/steady state as well as the distribution of the binding modes. The results are important for understanding the mechanistic basis of cellulosome function.

      In my opinion, this is an important paper, which provides new interesting insight into cellulosome function. The single-molecule FRET and molecular dynamics parts of the study are well designed, the corresponding experiments are thoroughly performed, and data are carefully analysed. The manuscript is also very well written. However, there are several issues that need to be addressed:

      1) The authors claim to have uncovered a built-in cohesin-dockerin code. However, the principles of the code remain elusive. For example, what is the relationship between the Pro66 cis/trans conformation and the binding mode? What needs to be known to predict the dockerin binding mode? This point should be elaborated in the manuscript.

      2) The conclusion that prolyl isomerase activity is able to change the distribution of binding modes requires more consideration and/or research. First, it seems from Figure 6A that the expected steady-state B1 fraction of c1C-CcCel5A and c1C-CcCel5A+prolyl isomerase could be the same within error ranges. Second, it is unlikely that the enzyme will change the equilibrium ratio of Pro66 cis/trans conformation that is controlled by thermodynamics. Therefore, the prolyl isomerase activity may only be relevant in case of slow re-equilibration kinetics.
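      The thermodynamic argument can be made explicit with a minimal two-state sketch, assuming simple first-order cis/trans interconversion of Pro66 in free dockerin; the rate constants and the acceleration factor α are generic symbols, not values from the manuscript. A catalyst that speeds up both directions by the same factor leaves the equilibrium ratio unchanged and only shortens the relaxation time:

```latex
\mathrm{trans} \;\underset{k_{ct}}{\overset{k_{tc}}{\rightleftharpoons}}\; \mathrm{cis},
\qquad K = \frac{[\mathrm{cis}]_{eq}}{[\mathrm{trans}]_{eq}} = \frac{k_{tc}}{k_{ct}}

K' = \frac{\alpha k_{tc}}{\alpha k_{ct}} = K,
\qquad k_{obs} = k_{tc} + k_{ct} \;\longrightarrow\; \alpha\,(k_{tc} + k_{ct})
```

      Under these assumptions, an isomerase-dependent difference in the B1 fraction would be expected only in measurements taken before re-equilibration is complete.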

      3) NMR measurements were performed in order to check whether the dockerin's Leu65-Pro66 peptide bond is in the cis conformation in the cohesin-dockerin complex. The authors found very similar dockerin chemical shifts in the absence or presence of 1.3 equivalents of cohesin, suggesting that binding does not significantly alter the conformation. However, this is an indirect measurement, whereas NMR also allows direct determination of Pro cis/trans conformation (based on 13C chemical shifts and NOE patterns, e.g. see https://doi.org/10.1107/S1744309110005890 ). The authors should check whether direct determination of the cis conformation is possible in their case. Also, peak doubling in the 15N-1H HSQC spectrum, which is an indication of Pro cis/trans equilibria, should be checked.

      4) Furthermore, a direct measurement of the Pro66 cis/trans ratio for two cohesin-dockerin pairs that show distinct B1/B2 preferences would be useful to clarify the role of Pro66.

    2. Reviewer #2:

      By analyzing the formation of a series of dockerin-cohesin complexes from the cellulosome of two species of the Clostridium bacteria using smFRET experiments and other techniques, the authors conclude that the overall equilibrium between the two binding modes of the complex can be allosterically regulated by the enzymatic isomerization of dockerin's proline 66, which is part of a structural clasp between the N and C terminus of the protein. They speculate that a mechanism of enzymatically or environmentally driven clasp de/stabilization may be present in other dockerin-cohesin complexes, as well, and may provide the cellulosome with the required plasticity to carry out its function.

      In large part, the work is clearly written and the claims seem to be supported by the data provided; however, there are a few issues that the authors should address:

      1) The computer simulations presented in the manuscript are not described very clearly. For example, on page 19, regarding the FoldX MC method, the authors identify two variables to describe the binding: an "axis" Z and a rotation angle phi. An axis, however, is defined by three coordinates, while the authors always associate a single number with Z. The reader has to guess that the axis is the axis of symmetry of the two binding modes and that Z is only an offset along this axis. Similarly, in eq. (4) the authors describe the sum over the conformations indexed by i as an average (first line, page 20), but in reality that sum and the others that appear in the argument of the logarithm of equation 4 are estimates of the partition function of the system.
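      The distinction between an average and a partition function can be stated in standard statistical-mechanics notation. These are generic definitions (β = 1/k_BT, sampled energies E_i, an observable A), not a reconstruction of the manuscript's eq. (4):

```latex
Z = \sum_i e^{-\beta E_i},
\qquad
\langle A \rangle = \frac{1}{Z}\sum_i A_i\, e^{-\beta E_i},
\qquad
\Delta G_{B1 \to B2} = -k_B T \ln \frac{Z_{B2}}{Z_{B1}}
```

      A bare sum of Boltzmann factors over sampled states is thus an estimate of Z rather than a thermal average; an average requires weighting by an observable and dividing by Z.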

      2) The computer simulations of the complex do not seem to add significant information to the overall message of the manuscript: the rigid-body coarse-grained approach cannot distinguish allosteric effects, as the authors already admit, while the FoldX approach yields very large errors. Given the availability of well-defined crystallographic structures for some of the complexes, simple free-energy estimation techniques (e.g. metadynamics, steered MD) based on classical atomistic molecular dynamics simulations (with limited homology modelling for the mutants) would most probably have provided more accurate results. The authors should explain why they did not consider this approach.

      3) The data on the time dependence of the FRET signal in C. cellulolyticum are a bit worrying. The authors should dissect them more carefully, possibly adding control experiments to exclude artifacts (whose possible presence the authors also admit in the caption of Fig. 6 figure supplement 3). Then, if the process is confirmed, they should try to identify the underlying process more precisely.

      4) Fig 5C and Fig 5F show two different curves for the same data. Similarly Figure 6 figure supplement 4 C shows two different histograms for the same complex. If this is the result of repeated experiments, the authors should make an effort and report histograms with error bars. Visual comparison of histograms which have a large intrinsic variability may be misleading.

      5) A picture showing a model of the molecular structure of the dyes attached to the molecular structure of the proteins would be very useful to understand the relative size of the objects.

    3. Reviewer #1:

      Vera et al. report the detection of binding and quantification of populations of two different orientations of assembly of dockerin and cohesin, which define the structural organization and plasticity of bacterial cellulosome multi-enzyme complexes. The authors apply smFRET spectroscopy in in vitro experiments carried out on isolated, modified domains. Vera et al. find uneven populations of the protein in the two binding modes. They investigate the molecular origins of the observed bias by studying homologous sequences obtained from various organisms, by mutagenesis and by domain-swap experiments. The authors complement the experimental studies with Monte Carlo and molecular dynamics simulations. They conclude that they have identified a cohesin-dockerin "code" of binding and a novel allosteric control mechanism involving cis/trans isomerization of a C-terminal proline residue in dockerin.

      Structural plasticity of the cellulosome, induced by variable assembly of the cohesin-dockerin adapter and facilitated by the rotational symmetry of the two-helical binding interface, is an interesting biological phenomenon. The dual binding mode has already been reported in the literature (refs. 23, 24, Wojciechowsky et al. Sci Rep 2018, 8:5051), somewhat limiting the novelty of the findings. However, the forces and mechanisms that drive the orientations are not yet understood. The authors successfully developed a smFRET assay that can distinguish the two binding modes of the cohesin-dockerin interaction and can measure the respective populations in vitro. Their homology, mutagenesis and domain-swap experiments show that specific interactions within the binding interface are not responsible for modulation of orientation. Instead, they show that interactions of a C-terminal proline can modulate binding. However, the relevance of the findings for the in vivo situation remains unclear.

      I have the following concerns:

      1) The authors' smFRET assay clearly distinguishes the two binding modes B1 and B2. A key element of their work, which goes beyond the state of the art, is the quantification of populations estimated from integrals of smFRET histograms and PDA. Their FRET analysis presumes that the photophysics or quantum yields of the donor/acceptor fluorophores are independent of the binding orientation. But the protein micro-environment at the label positions, close to the binding interface, may differ between the B1 and B2 orientations, which may modulate photophysics and thus FRET. This would, in turn, lead to errors in the estimated populations. The authors could test for such effects by measuring the fluorescence of donor-only and acceptor-only constructs in the B1 and B2 orientations.
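      The size of such an effect can be estimated from the standard Förster relations (textbook expressions, not taken from the manuscript), in which the Förster radius R_0 depends on the donor quantum yield Φ_D and the orientation factor κ², both of which could differ between binding modes:

```latex
E = \frac{1}{1 + (r/R_0)^6},
\qquad
R_0 \propto \left(\kappa^2\, \Phi_D\, n^{-4}\, J\right)^{1/6}

\Phi_D \to 0.8\,\Phi_D \;\Rightarrow\; R_0 \to 0.8^{1/6} R_0 \approx 0.96\,R_0,
\qquad
E(r = R_0) = 0.50 \;\to\; \frac{1}{1 + 1/0.8} \approx 0.44
```

      The 20% change in Φ_D is a hypothetical value chosen for illustration; it shows that a modest, orientation-dependent change in donor photophysics can shift the apparent FRET efficiency noticeably at an unchanged distance, and hence bias populations estimated from the histograms.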

      2) From their study of homologous sequences, mutagenesis experiments and the swap of helices 1 and 2 of dockerin, the authors provide a solid body of data showing that specific interactions within the binding interface are not responsible for the switch in binding mode. Instead, their results show that interactions of a C-terminal proline can modulate binding through an elusive mechanism. Proline mutagenesis experiments and enzymatic cis/trans isomerization show significant effects. But the relevance of a prolyl isomerase for modulating the dockerin-cohesin interaction in vivo remains speculative. This conclusion calls for additional experiments in which, e.g., changes in the catalytic activity of cellulosomes are measured upon application of a prolyl isomerase. Alternatively, the packing of enzyme subunits in the dense cellulosome may be responsible for the alternate binding. Such protein-protein interactions may also modulate a proline interaction.

      3) An allosteric mechanism of the proline interaction in modulating binding, as proposed in this work, is not sufficiently supported by the data presented. The flexibility of the C-terminal tail of dockerin, which hosts the proline, and its close proximity to the cohesin binding interface, evident in structures (please provide PDB IDs in Fig. 1), may allow a direct interaction of the proline with cohesin.

      4) The impact on binding of the intrachain proline/tyrosine interaction identified by the authors is, however, very interesting. This finding calls for further investigation of the mechanistic details. Here, high-resolution techniques such as NMR, which can provide atomic details of protein structure and dynamics, are desirable. Such experiments could help to identify potential allosteric effects on the conformation and thus on binding.

      5) Having said that, the authors state (in the abstract and introduction) that they performed NMR experiments in their study, but no NMR data are shown or discussed in this manuscript.

      6) If the C-terminal proline was a biologically relevant switch that modulates binding, this residue should be conserved. Have the authors checked conservation of the C-terminal proline in homologous sequences?

      7) The authors conclude that they have identified a cohesin-dockerin "code". The meaning of the word "code" in this context is unclear to me. What do the authors mean by "code"?

      8) The authors conducted and analysed a set of kinetic experiments, but these experiments are not described in sufficient detail in the results and methods sections.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers find your work of interest and acknowledge your development of an elegant smFRET assay that can detect and quantify populations of cohesin-dockerin binding orientations. They further acknowledge your interesting finding of a role of the molecular clasp, involving a terminal proline, in modulating binding orientation. The reviewers find, however, that your conclusion that an enzymatic and allosteric control mechanism is present in the cellulosome is not sufficiently supported by the data presented. The study lacks the molecular-level information required to identify allosteric effects, which could, for example, be obtained using NMR spectroscopy, an approach that falls short in the present work. The proposed Monte Carlo approach and coarse-grained computer simulations do not provide sufficient molecular detail and dynamic information to obtain mechanistic insight. There are further issues with the kinetic experiments: some reported differences lie within error, and controls are required to exclude artefacts.

    1. Reviewer #3:

      In this manuscript by Hansen et al., the authors describe three low-resolution (3.0 to 4.0 Å) crystal structures of the Ca2+-ATPase from Listeria, a gram-positive bacterium. Two are crystal structures of the wild-type protein with BeF3- and AlF4- in the absence of Ca2+, and thus likely represent the E2P ground state and the E2~P transition state. The third is a structure of a G4 mutant, in which 4 Gly residues are inserted into the A-domain-M1 linker, crystallised with BeF3- and Ca2+ present and designed to capture the E2P[Ca2+] state. The authors state, however, that the three structures are virtually the same and that the E2·BeF3- crystal structure represents a state just prior to ("primed for") dephosphorylation. They also propose that the proton countertransport "mechanism" is different from that of SERCA.

      As Listeria Ca2+-ATPase has been studied by single-molecule FRET, its crystal structures will certainly contribute to our understanding of ion pumping. Furthermore, unlike SERCA, Listeria Ca2+-ATPase transports only one Ca2+ per ATP hydrolysed. Therefore, how site I is managed is an interesting topic, although let's not forget that the same 1:1 stoichiometry is observed with the plasma membrane Ca2+-ATPase (PMCA), for which an EM structure appeared in 2018 (ref. 9). The authors indeed find that the Arg795 side chain extends into binding site I. This part is solid, and a more elaborate (and interesting) discussion could be made than what is currently provided.

      Another solid finding is that the two E2·BeF3- crystal structures are similar to the E2·AlF4- crystal structure, although how similar is unclear: a structural superimposition reporting an RMSD is not provided, and the presented figure makes it difficult to judge directly. The structures are viewed from almost a single direction, which makes it infeasible to discern differences in M1 and M2 and in the horizontal rotation of the A-domain. Two or three structures are superimposed, but shown as cylinders and again viewed from only one direction. As the authors designate the structures as H+-occluded states, it is important to clearly show that the extracellular gate is really closed to H+ (not only to Ca2+). For completeness, they should also examine the effect of crystal packing on the A-domain position.

      With regard to the point that the E2·BeF3- structure is "primed for dephosphorylation", only Fig. 2 is shown, in which the differences appear to be the path of the TGES loop and the orientation of the Glu167/183 side chain. Their atomic models show that there is plenty of space for the Glu167 side chain to adopt an orientation similar to that of Glu183 in SERCA. The authors should, however, provide an omit annealed Fo-Fc map for the Glu167 side chain and explain why that orientation is the preferred and only one. If a Glu side chain is free to move, it could adopt a different orientation in less than a nanosecond. If it does, then the difference in the orientation of the Glu side chain does not sufficiently explain "the rapid dephosphorylation observed in single-molecule studies". The authors place further emphasis on proton occlusion and countertransport. However, this part of the manuscript is more speculative and, as detailed below, should at least be moved entirely to the Discussion section.

      As mentioned, the authors place a larger emphasis on proton countertransport. Here a number of issues show up. First of all, I think they have frequently used the term "occlusion" improperly. From my understanding, occlusion of a site (or ion) means that the site (or ion) is inaccessible from either side of the membrane. This means more than closure of the gates, as the two gates have to stay closed for a substantial length of time (i.e. locked). It is experimentally well established with SERCA that Ca2+ ions are occluded in E1P species. It can be shown that the lumenal gate is closed for Ca2+ in the E2 state. However, that does not necessarily mean that the gate for H+ is also closed. As far as this reviewer knows, nobody has actually demonstrated that H+ is occluded, even in the E2 state of SERCA.

      Furthermore, the authors presume that protons enter the binding sites through a pathway different from that used for Ca2+ release, citing ref 26. However, if they do, can closure of the gate for Ca2+ really mean closure of the gate for H+? This seems a contradictory statement, as the authors designate the E2·BeF3- state in Listeria Ca2+-ATPase as a proton-occluded state (p. 12). Apparent closure of the gate for Ca2+ on the extracellular side in a crystal structure seems insufficient for such a statement. One must keep in mind that a crystal structure merely provides a possible conformation in that particular state; it may not represent the most populated conformation for that state. It is equally plausible that the E2·BeF3- complex adopts a closed conformation for only a small fraction of the time. At this resolution it is simply not possible to determine whether H+ occupies the binding site in the crystal structure. Furthermore, although it may be possible to show that the gate is closed for Ca2+, it would be very difficult to show that the gate is closed for H+. Thus, more experimental evidence is required to support the claim that the structure represents an H+-occluded state.

      The authors write in the Abstract "Structures with BeF3- mimicking a phosphoenzyme state reveal a closed state, which is intermediate of the outward-open E2P and the proton-occluded E2-P* conformations known for SERCA". In essence this statement is fine, although what "closed" means is still unclear to me. In Figure 1, the authors state that "LMCA1 structures adopt proton-occluded E2 states". This statement is a bit misleading, because in E2·BeF3- the lumenal (extracellular) gate can in fact open and close, at least in SERCA. As the authors recognize (p. 14), the BeF3- complex of SERCA can be crystallised in two conformations, one with the lumenal gate closed (with thapsigargin) and the other with the gate open; yet they write "In SERCA, the calcium-free BeF3- complex adopts an outward-open E2P state,..." (p. 8). This refers to lumenal (extracellular) Ca2+, not to H+. Further evidence is required to establish that the extracellular gate of LMCA1 is fixed in a closed position for H+ in E2·BeF3-. Again, more experimental evidence is required to support that E2·BeF3- is an H+-occluded state.

      The authors write that "SERCA has two proposed proton pathways: a luminal entry pathway [26] and a C-terminal cytosolic release pathway [27]" (p. 9). One has to be careful here, as the luminal entry pathway has not been experimentally confirmed in SERCA. The authors write that "The luminal proton pathway has been mapped to a narrow water channel …" [26]. But since the pathway is not confirmed in SERCA, I don't think it can be used to support the argument that the corresponding part of LMCA1 is mainly hydrophobic and that protons cannot enter through this pathway.

      The description on the exit pathway for H+ also needs clarification. They describe (p. 10; first line) "In SERCA it consists of a hydrated cavity...[27]. ... M7 in LMCA1 further blocks the pathway ... and LMCA1 therefore does not appear to have a C-terminal cytosolic pathway either" and rationalize that "This may explain why no distinct proton pathways are required in LMCA1". I think it should be made clearer that this is a proposal rather than an established fact.

      As H+ release takes place in the E2 to E1 transition, the authors state that the E2·BeF3- structure of LMCA1 is different from that of SERCA. However, I don't think they can confidently make such statements without E1 and E2 structures of LMCA1. Furthermore, these descriptions (discussion) should not be in the "Results" section. As they conclude that LMCA1 uses the Ca2+ release pathway for H+ entry, a pathway assumed to be the same as that in SERCA (even though no Ca2+ release pathway is visualised in their crystal structures), why does SERCA not use the same pathway? I think experimental evidence is required for the proposal that H+ binds to E309 from the cytoplasmic side.

    2. Reviewer #2:

      The manuscript by Hansen et al. presents three new structures of LMCA1, the Ca2+-ATPase 1 from Listeria monocytogenes. They determined structures with BeF and AlF, and of a Gly4-linker form of LMCA1 in complex with BeF. This latter structure is at 3 Å resolution and was very challenging. The other two structures are at a lower, 4 Å, resolution. These structures are a follow-up to an excellent single-molecule fluorescence study of the same enzyme. The structures support the main conclusion of that work, that LMCA1 progresses more rapidly through the dephosphorylation step of the reaction cycle. The manuscript is well written, the structures and findings are interesting and make a significant contribution, and the work seems ideally suited for this journal. There are no substantive concerns with the manuscript. Overall, the R factors are high for the structures, particularly for the 3 Å resolution structure, for which they should be lower. However, the authors offer a reasonable explanation for this in the supplemental information provided.

    3. Reviewer #1:

      Structural comparison is an important tool for understanding how proteins function at the molecular level. The mechanistic premise of obtaining LMCA1 structures from the gram-positive bacterium Listeria monocytogenes was to understand how Ca2+ pumps have different Ca2+ stoichiometries from the mammalian SERCA and how they are proton-coupled differently. Per molecule of ATP hydrolyzed, SERCA exports two Ca2+ ions in exchange for 2 or 3 protons, whereas LMCA1 exports a single Ca2+ and perhaps 1 proton in return.

      The paper describes two intermediate states of LMCA1 and, from my understanding, proposes a mechanism based on structural differences in ionisable groups at the Ca2+ binding site, in particular the positioning of Arginine 795, which in SERCA is a glutamate. Since a crystal structure of LMCA1 had previously been determined, the new mechanistic insights rely heavily on the details achieved by the improved resolution. While this is technically an important achievement, the assignment of side chains in the current structures alone is not sufficient to reach the mechanistic conclusions drawn, and, as such, the current paper is unfortunately too preliminary. Proton-coupling pathways are mechanistically difficult to disentangle and require extensive experimentation, such as ITC, mutagenesis and transport measurements, as well as computational approaches. Indeed, ion or proton coupling pathways that alter energetics rarely result from differences in just a few residues. For example, glucose (GLUT) transporters are passive sugar transporters, whilst their bacterial counterparts are proton coupled. The proton coupling in the bacterial proteins is due to a single aspartic acid residue in TM1. Whilst one can make the bacterial sugar transporters no longer proton coupled by mutating this TM1 residue to asparagine, one cannot make GLUT transporters proton coupled by mutating the corresponding asparagine residue to aspartic acid.

      One would have liked the authors to biochemically demonstrate how they could evolve LMCA1 to function similar to SERCA. This would have broader implications in our understanding of how biological systems can evolve substrate coupling and energetics.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      We all agreed that the LMCA1 complex structures are an important step forward in providing a structural framework for piecing together an ion pumping model to follow on from the previous smFRET studies. Nonetheless, two of the reviewers think that the mechanistic conclusions reached, based solely on crystal structures, require further validation. In particular, further experimental work (and likely computational work) is required i) to confirm the designated crystallographic "states" and ii) to begin to clarify how LMCA1 and SERCA have different Ca2+:H+ stoichiometries, as there are other plausible models.

    1. Reviewer #3:

      Bolze and colleagues describe a new database of mitochondrial variation that consists of a greater number of samples than existing databases. To overcome some of the limitations of existing databases, they use the same sequencing pipeline for all samples, do not select for any particular phenotypes, and report both heteroplasmic and homoplasmic calls. They demonstrate the utility of their database by defining intervals of invariable regions, which may indicate mutational constraint and could aid in interpreting candidate variants in disease patients. The authors also calculate the filtering allele frequency for LHON variants and suggest that the allele frequencies for many LHON variants in their database and UKB are too high for the variants to be considered pathogenic and that they should be reclassified. The main limitations of this database, as stated by the authors, are the lack of diverse haplogroups and the relatively low depth of coverage considering the variable heteroplasmy of the mitochondria. The technical aspects of the data aggregation and database are solid, and the scientific analyses are sound. I have only a few comments that would strengthen the paper.

      1) There is no discussion of how to distinguish heteroplasmy from sequencing errors. While some filtering was done akin to germline variant filtering (in particular, calls at positions with fewer than 10 reads were removed), this could still result in a variant supported by ~1 of 11 reads being called as heteroplasmic (at 9%). The spike around 90% ARF in Figure 3F (final panel) could suggest that something like this is happening (homoplasmic variants with sequencing errors reverting to another base). Was a minimum heteroplasmy level used for this analysis? Perhaps showing these plots filtered to a minimum of 2, 5, etc. reads supporting the same alternate allele would reveal a sensible cutoff that could then be used for the whole paper.
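      One way to make the suggested alternate-read cutoff concrete is a simple binomial sequencing-error model. The sketch below assumes independent errors at a per-base rate of 0.1% toward one specific alternate base (an illustrative value, not taken from the manuscript) and asks how likely it is that error reads alone reach a given alternate-read count at a given depth. The function name is hypothetical.

```python
from math import comb


def prob_at_least_k_error_reads(depth: int, k: int, err: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, err).

    Interpreted here as the chance that independent sequencing errors alone
    produce at least k reads supporting one specific alternate base.
    """
    # Sum the upper tail directly to avoid cancellation when the result is tiny.
    return sum(
        comb(depth, i) * err**i * (1.0 - err) ** (depth - i)
        for i in range(k, depth + 1)
    )


# Illustrative per-base error rate toward one specific alternate base (assumed).
ERR = 1e-3

if __name__ == "__main__":
    # One alternate read out of 11 total reads already looks like ~9% heteroplasmy.
    print(f"apparent heteroplasmy for 1/11 reads: {1 / 11:.3f}")
    for depth in (11, 50, 200):
        for k in (1, 2, 5):
            p = prob_at_least_k_error_reads(depth, k, ERR)
            print(f"depth={depth:4d}  min_alt_reads={k}  P(errors alone) = {p:.2e}")
```

      Summed over many positions and many samples, even small per-position probabilities can add up to a noticeable number of spurious low-level calls, which is why a minimum alternate-read count may matter as much as a depth filter; a realistic model would also need position- and context-specific error rates.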

      2) Line 484: This is the only mention of NUMTs in the paper, but the complications that can arise from them are not detailed by the authors. Considering the mitochondrial coverage, how confident are the authors that their low heteroplasmic calls are not false positives resulting from NUMTs?

      3) Along the same lines, the authors use HaplotypeCaller, which is a standard tool for germline variation but not optimized for mitochondrial calling. Was this run in haploid or diploid mode? It would be useful to state the limitations of using this tool to call mitochondrial variants as it is designed for diploids.

      4) The suggestion that "all protein-coding genes in the mitochondrial genome were highly intolerant to LoF variants" is certainly plausible, but not definitive from the current data. While 0 LoFs are observed, how many would be expected? If these genes are small (which they must be, since they are on a very small chromosome), the number of expected variants based on a mutational model (akin to [Samocha et al., 2014]) would likely be <1, and thus observing 0 would not necessarily be remarkable. Given that, you may not be sufficiently powered to do this at a per-gene level, but pooling all the genes may provide enough power to make a broader statement. The same goes for the analysis of the % of invariable bases (Figure 5): it would be good to make this more quantitative, perhaps by comparing these proportions to autosomes, or among the gene classes themselves (are the tRNA and rRNA proportions significantly different from the protein-coding one? Would it be possible to split protein-coding bases by synonymous, missense, LoF?).
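      The power argument can be made explicit with a Poisson model, assuming independent LoF counts X_g per gene with expected values λ_g from some mutational model; the λ values below are placeholders, not estimates from these data:

```latex
P(X_g = 0 \mid \lambda_g) = e^{-\lambda_g},
\qquad
P\Big(\textstyle\sum_g X_g = 0\Big) = \exp\Big(-\textstyle\sum_g \lambda_g\Big)

\lambda_g = 0.5:\; e^{-0.5} \approx 0.61,
\qquad
13 \text{ genes} \times 0.5 = 6.5:\; e^{-6.5} \approx 0.0015
```

      With these illustrative numbers, observing no LoF variants in a single gene is unremarkable, whereas observing none across all 13 mitochondrial protein-coding genes pooled would be strong evidence of constraint.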

      5) "Indeed, we found that no haplogroup markers -- even those from haplogroups not represented in our dataset -- were mapped to these highly constrained regions" - is this not circular? Markers that delineate haplogroups are found as homoplasmic calls that were used to determine the constrained regions, so it stands to reason that these would not be found in them, no? But perhaps I'm missing something.

    2. Reviewer #2:

      The authors present a resource of human mtDNA variants and heteroplasmies from 195,983 individuals, scoring 14,324 mutations. The resource is of value. The dataset could be criticized for being heavily weighted toward European ancestry and specific to the United States, but the authors fully acknowledge and disclose this in their manuscript and make the data available to others to continue the work. Other high-depth human studies exist (e.g., the Wei 2019 reference), but their data are often unavailable due to patient confidentiality issues, as in Wei 2019. Having this dataset available is of great intrinsic value.

      I have only a few comments, which would require either a few small checks of the data or changes to the writing of the manuscript.

      Comments:

      1) My biggest concern is that the authors use a read-alignment approach that retains all calls where there was at least 1 read mapping to mtDNA. The logic seems to be that they do not want to discard reads that may "mis-map" to NUMTs. However, this creates another, potentially larger problem: NUMTs may be included as heteroplasmic variants (see PMID: 23972387). For instance, the recent claim of paternal mtDNA transmission appears to be the result of a complex NUMT that was amplified by the strategies used in the original study (PMID: 32269217). More detail is needed on how the authors exclude the possibility of NUMT incorporation, especially given the 1+ alignment parameter used.

      2) Lines 340-357, regarding LHON: the problem with choosing LHON for this analysis is that its clinical manifestation is complicated, which may not support treating the 14484T>C allele in the manner presented. Several factors could all contribute to the allele being more highly represented in a random sample of the population. These include the 8:1 male-to-female ratio among those who become affected (with homoplasmic LHON), the fact that many people carrying the homoplasmic allele never become affected, and the fact that onset can occur late in life (after having children).

      The authors are correct that the allele on its own may not be pathogenic in specific haplogroup backgrounds (Howell 2003 reference), or may require co-occurrence with secondary "affector" mtDNA mutations (e.g., PMID: 25342614, with alleles including 3397A>G, 3497C>T, 3571C>T, 3745G>A, and other "helper" mutations in MitoMap). Given all of these issues, the paper needs more support for the 14484 conclusion. Testing for linkage (or lack thereof) to these helper alleles might strengthen this section sufficiently.

      3) Lines 206-207: How did the authors handle AGG/AGA codons? In 2010, a lab published evidence that AGA and AGG may not be true stop codons but are simply not used in the human mtDNA genome (PMID: 20075246). While this finding is not universally accepted, it does explain the lack of an AGA/AGG-binding translational termination factor in the mitochondria. The authors may be in a position to comment on the behaviour of AGA or AGG codons, which is relevant to their section on PCG-truncating mutations.

      4) The work, especially the discussion of the control region, overlaps more with Wei et al. 2019 than the manuscript acknowledges. This overlap and the similar findings should be addressed more openly in the Discussion.

    3. Reviewer #1:

      Bolze et al. report their effort to sequence the mitochondrial genomes of ~200,000 individuals. The authors generated a large, unified database that can be used to investigate mitochondrial mutations and predict pathogenic alleles. Importantly, it addresses key limitations of other currently available sources. It is not biased toward mitochondrial diseases, all analyses were done in the same lab using the same bioinformatics tools, and heteroplasmic alleles are reported. The authors then use their resource to draw conclusions on the nature of mitochondrial mutations and their distribution across the mt-genome, and to challenge previously annotated pathogenic mutations, specifically for LHON disease.

      For example, figure 3A, which carries one of the main take-home messages of the paper, shows hardly any "interesting" alleles: the vast majority of the >14,000 discovered variants cannot be seen on the plot. Unfortunately, many of the plots display the same data in similar, redundant formats, making the figures dense and confusing. Examples include figure 3F (mean and max ARF distribution) and figure 5A, B & C.

      Another, more concerning issue is the quality of the heteroplasmic variants. The authors mention only very briefly in the Methods section what was done to account for NUMTs (nuclear copies of mtDNA), which may be mutated and thus bias SNV calling. From their short description, NUMTs could be a source of errors. Furthermore, Figure 2E shows that the vast majority of individuals had ≤1 heteroplasmic variant. This observation cannot be reconciled with the premise underlying current methods that infer cellular lineages from heteroplasmy in a cellular population (PMID: 30827679).

      These issues are particularly critical when the data are used to draw conclusions on the pathogenicity of mutations, which is the focus of the last part of the manuscript. When considering the effect of the m.14484T>C mutation on LHON disease, the authors argue that this mutation should be reclassified as non-pathogenic because it satisfies the "Benign Strong 1" criterion. Given the above limitations, this conclusion is certainly too strong. Stronger evidence for this claim is required, especially since all subjects carrying this mutation are from the same haplogroup.

      Lastly, to assess the probability that m.14484T>C is indeed non-pathogenic, the authors use previously published estimates of the "maximum credible population allele frequency". Although many papers estimate these parameters, the authors provide only one number, with no error or range estimates, and show that the frequency of m.14484T>C is higher than expected. It is important to understand how certain this claim is, and ideally to reflect this as a range around the dashed lines in Figure 6.
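
      One way to derive such a range is to propagate the input ranges through the threshold formula. The sketch below assumes that the cited estimate follows the Whiffin et al. style formula for a dominantly acting variant: prevalence × maximum allelic contribution × maximum genetic contribution / penetrance. That formula choice is an assumption, and every input range below is hypothetical. Published LHON estimates would have to be substituted:

```python
from itertools import product


def max_credible_af(prevalence: float, allelic_contribution: float,
                    genetic_contribution: float, penetrance: float) -> float:
    """Assumed Whiffin-style maximum credible population allele frequency."""
    return prevalence * allelic_contribution * genetic_contribution / penetrance


# Hypothetical input ranges, NOT values from the manuscript or its sources.
prevalences = (1 / 50_000, 1 / 30_000)
allelic_contributions = (0.05, 0.15)
genetic_contributions = (1.0,)
penetrances = (0.1, 0.5)

# Evaluate every combination of inputs and keep the extremes as the range.
values = [
    max_credible_af(*combo)
    for combo in product(prevalences, allelic_contributions,
                         genetic_contributions, penetrances)
]
lo, hi = min(values), max(values)
print(f"max credible AF range: {lo:.1e} to {hi:.1e}")
```

The shaded band in Figure 6 would then span `lo` to `hi` instead of a single dashed line. A sampling-based approach could replace the grid if the inputs have published distributions.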

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      Bolze and colleagues describe a new database of mitochondrial variation that consists of a greater number of samples than existing databases. To overcome some of the limitations of existing databases, they use the same sequencing pipeline for all samples, do not select for any particular phenotypes, and report both heteroplasmic and homoplasmic calls. They demonstrate the utility of their database by defining intervals of invariable regions, which may indicate mutational constraint and could aid in interpreting candidate variants in disease patients. The authors also calculate the filtering allele frequency for LHON variants and suggest that the allele frequencies for many LHON variants in their database and UKB are too high for the variants to be considered pathogenic and that they should be reclassified. The main limitations of this database, as stated by the authors, are the lack of diverse haplogroups and the relatively low depth of coverage considering the variable heteroplasmy of the mitochondria.

      While the database is indeed unique and will likely be very valuable for the community, the computational analyses are superficial in several places and in some cases even flawed. Overall, they are not as well presented as they could be.

    1. Reviewer #3:

      General assessment:

      This manuscript examines publicly available genomes of a number of Enterobacteriaceae species, and makes statements regarding their evolution, geographical distribution and antimicrobial resistance. While repurposing existing data can add value, such analyses must be carefully done and inferences only made after assessment and consideration of the potential limitations and biases of such data. Currently, the rationale and methods for performing the analyses outlined in this manuscript are not sufficient to support the conclusions. Following critical evaluation of the metadata associated with the genomes, and more robust analyses, useful insights may be obtained.

      Numbered summary of substantive concerns:

      1) More justification for examining these particular bacterial species is required. For example, only 59 M. morganii genomes were included. Given these small numbers, how big is the clinical problem, and is a global analysis really possible?

      2) There is no description of inclusion/exclusion criteria for these genomes. Most genomes clearly derive from the United States. A full description of the selection process would give a better understanding of potential biases, which could affect the results and conclusions reached.

      3) A number of outbreaks are stated to have been observed, but there is no robust evidence presented to support such identification, other than presumably clustering in the phylogenetic trees. More generally, without proper evaluation of the metadata associated with the genomes, there is a large risk that any observations (regarding similarity or clustering, or higher prevalence of resistance determinants, etc) are merely due to the nature of the genome collection rather than true biological or epidemiological relatedness. A critical evaluation of the representativeness of the genome collection is required.

      4) Various qualitative statements on differences between species or clades are made, such as the relative richness of resistomes, but (in addition to the issue described in the previous point) such statements require the use of appropriate statistical tests. Definitions are required for terms such as "closely related", "comparable" resistome diversity etc.

      5) The analyses performed are currently not sufficient to underpin many of the statements made in this manuscript regarding the evolution and transmission of these bacteria. For example, the trees presented in the figures appear to be cladograms, so their branch lengths carry no information. Branch lengths are important in this context. Also, the phylogeography was evaluated by plotting genome origins on a map. There are more sophisticated approaches for this (e.g., phylogenetic diffusion models), although such analyses may still be heavily biased by the nature of the genome collection.

    2. Reviewer #2:

      This manuscript presents a species-by-species analysis of the presence and distribution of antimicrobial resistance (AMR) genes in the less frequently isolated Enterobacteriaceae species, using the genome data and metadata registered in the PATRIC database. It is valuable, but most analyses are descriptive rather than quantitative, and the sentences describing the results are not easy to read. A phylogenetic tree and a heatmap indicating the presence of AMR genes are presented for each species. However, it is hard to understand the main message of each figure and what characterizes each species compared to the others. The current manuscript will be useful to AMR researchers as a reference catalogue of which AMR genes are present in each species.

      -Each figure should have a legend so that readers can tell at a glance what each color indicates. Geographical region should be clearly indicated in the figure, in particular when it is mentioned in the main text. Also, what do the different colors of the strain names in the tree mean?

      -The Methods section is too brief and lacks sufficient explanation. For example, what criterion was used to judge the presence of an antimicrobial resistance gene?

      -The list of detected AMR genes at the top should be clearly categorized using different colors and headers (e.g., "ESBL", "AmpC" etc)

      -L126: What is the "outbreak"? I cannot identify it in the figure, nor tell how it was defined.

      -Examples of descriptive rather than quantitative statements are L135 "richer resistome" and L136 "common". Why do the authors not present the specific numbers and percentages?

      Throughout the text, the authors do not conduct any statistical test to assess the significance of the differences they mention.

    3. Reviewer #1:

      Sekyere and Reta present a comprehensive descriptive characterization of the epidemiology, phylogeographical distribution and antibiotic resistance profiles of six species of Enterobacteriaceae. Using a total of 2377 publicly available genomes, the authors show many multidrug-resistant clones that are distributed worldwide. This study potentially provides important insight into a group of clinically relevant bacteria that remain poorly characterized compared to their better-known relatives. Below are my comments.

      Major comments:

      1) The entire study is essentially a descriptive enumeration of the resistance characteristics of six different bacterial species based on genome sequences. It makes numerous references to "less" or "more" or synonyms of these words (a few examples are line 140 "richer resistome diversity", line 157 "lesser resistome abundance and diversity", line 163 "richly endowed", line 215 "fewer resistome diversity and abundance", line 217 "sparse", lines 218 and 221 "virtually absent", line 222 "substantial abundance", line 244 "richest abundance of resistomes"). The lack of statistical analyses comparing lineages/clusters within each species, and between species, to determine significant differences among them is problematic. Throughout the text, there is no reference to specific numerical values (e.g., p values) when making these comparisons.

      2) Similar to my comment above are the references to "short (or close) evolutionary distance" (for example, lines 131, 208, 228, 265, 432, 439). How was evolutionary distance measured: number of SNPs, phylogenetic distance, or average nucleotide identity? This "closeness" or "shortness" should be stated quantitatively, for example as a number of SNPs.
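
      The request for SNP counts can be illustrated with pairwise SNP distances computed from a core-genome alignment. The minimal sketch below assumes that such an alignment already exists. The isolate names and sequences are toy, hypothetical data. Skipping gaps and N positions is one common convention, and the authors would need to state whichever convention they use:

```python
from itertools import combinations

IGNORED = set("-N")  # gaps and ambiguous bases are not counted as SNPs


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ, skipping gaps/Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in IGNORED and b not in IGNORED
    )


# Toy core-genome alignment with hypothetical isolate names.
alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTTCGTAC",
    "isolate_C": "ACGAACG-AN",
}

# Distance for every unordered pair of isolates.
distances = {
    (p, q): snp_distance(alignment[p], alignment[q])
    for p, q in combinations(sorted(alignment), 2)
}
print(distances)
```

A statement such as "closely related" could then be tied to an explicit threshold on these counts.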

      3) The Methods section needs more details. I have listed my specific comments on methods below.

      3 a) Lines 504-511: How many genomes were initially downloaded? Were these genomes complete or in draft stages? How were they filtered to select the final 2377 genomes? What were the selection criteria: number of contigs, genome size, assembly quality, available metadata, etc.? Or did the authors use programs that check genome quality, such as CheckM? Line 510, "filtered to remove poor genome sequences": how is "poor" defined here?

      3 b) Line 517: How were the 1000 genes used for phylogenetic reconstruction selected?

      3 c) Lines 522-525: Simply drawing the distribution of subspecies and species on a map does not constitute a phylogeographical analysis. Many biases can influence the geographic distribution of microbes. The most notable are the sampling scheme used (for example, more samples from a single country or from a specific host/environment/setting), the composition of the databases used (NCBI and PATRIC in this study), and the collection of more strains of one species and fewer of others. The current study, like many others, has these biases, which were in fact mentioned in the Results section. How do the authors address these biases?

      3 d) Lines 526-531, resistome analyses: The current study is essentially a summary of the information in the NCBI Pathogen Detection database. The authors need to briefly describe how resistance genes were identified in the genomes from this database. Since the entire study and all figures focus on the ARGs, the authors need to show how reliably they were identified.

      4) Results, lines 187-188: Citation for "local and international outbreaks" needed. How did the authors come up with the inference that lines 183-186 represent outbreaks? Analyses of outbreaks require information on dates of sampling, which are lacking from this dataset. Hence, to make inferences that such topologies in the tree represent outbreaks is quite a stretch. I suggest that the authors either carry out temporal analyses of their data to be able to say that there were outbreaks or remove suggestions of the occurrence of outbreaks.

      5) Discussion, lines 447-457: I agree that both vertical and horizontal modes of evolution in resistant bacteria are important mechanisms in the spread of resistance in many pathogens, and numerous previous studies have reported this. However, the study did not carry out any specific analyses of HGT or vertical evolution, so saying that "both phenomena are being observed" (lines 455-456) is misleading.

      6) Discussion or Conclusion: The authors mentioned that a limitation in their study is that the genomes they downloaded were those available only up to January 2020. I think there are a few more important limitations and caveats that need to be discussed (for example, see comment 3.c above)

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on medRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that the topic is interesting in principle, i.e. tracking antibiotic resistance globally in less well-studied but nonetheless clinically important bacterial species. However, the reviewers also had several major concerns, with the main concerns being:

      1) Overall lack of rigor in the analysis. This is due in large part to a lack of precision in the methods. For example, differences in diversity are not statistically supported, evolutionary distances are not defined, and the definitions of a resistance gene and of an outbreak are unclear.

      2) The paper does not address biases in sample collection. Since the data were taken from a central repository, there are many different studies included, each with their own biases. It is important to address these biases when comparing datasets from different groups and from different geographical locations.

      3) There is insufficient evidence to make claims about horizontal gene transfer.

      The individual reviews provide more details on each of these points.

    1. Reviewer #3:

      In this paper, Barnett and colleagues used network-based, data-driven analyses to characterize how the default mode network (DMN) and the Medial Temporal Network (MTN) interact with the hippocampus. First, the authors confirmed previous findings that the MTN is a distinct network from the DMN. Second, the authors identified three subnetworks of the DMN that differ from each other based on their connectivity profiles. They further investigated cross-network and intra-network dynamics during rest and also the representational similarity of patterns within these networks during a memory retrieval task. Finally, they used meta-analytic analyses to develop hypotheses about the specific cognitive functions of the MTN and DMN subnetworks.

      Major comments:

      1) One noteworthy aspect of this paper is that the networks identified by the current investigation do not map perfectly onto a previous framework outlined by the senior author (the AT-PM framework; Ranganath and Ritchey, 2012). I think that readers of this work will be very curious to hear about this update, and the similarities and differences between the AT-PM framework and the current findings should be made crystal clear. For example, a schematic could be used to visually depict the similarities and differences.

      2) In addition to this suggested visualization, I think that memory scholars familiar with the AT-PM framework will be curious to know how these results update current thinking on how different brain networks organize memories and perform different types of cognitive functions. The meta-analysis partially does this, but one is left wondering how it updates the field's understanding of how specific types of memory (e.g., object versus scene memory, as in Maass et al., Brain, 2019) are supported.

      3) The authors state in the methods, "This sample size is comparable to the cohort sample sizes from the seminal Power et al., study investigating functional brain organization." I think a bit more can be said about the effect sizes reported in the previous literature (which might be inflated due to publication bias), and the power to detect such effect sizes (or smaller) here.

      4) I found the results reported in the section "Regions within the same community represent similar kinds of information during a memory task" difficult to follow. Moreover, I was not sure what this analysis provides beyond the resting state analyses. This paper would be strengthened if these analyses were linked to behavioral performance on the memory retrieval task.

      5) I was surprised to see that the Anterior Hippocampus was more highly correlated (numerically) to the DMN (Supplementary Table 1) and the MP and PM sub-networks (Figure 4) compared to the MTN network. Is this difference statistically significant, and, if so, do the authors think that this difference is meaningful?

      6) Tau spreading has been shown to follow patterns of functional connectivity (Franzmeier et al., Nature Comms, 2020). The authors may wish to comment on the relevance of these findings to different patterns of tau accumulation in different types of dementia.

    2. Reviewer #2:

      Overall, I thought that the topic addressed and approaches used were interesting and in particular I appreciated the motivation of relating data-driven analyses of resting state data to existing theoretical frameworks and task-based data. As described below, I believe the manuscript could be strengthened with additional comparison to past work as well as addressing a potential methodological issue.

      1) As noted by the authors, past work has used data-driven approaches on resting state data to subdivide the default mode network. The manuscript would be strengthened by highlighting the similarities and differences between the current work and such past work. In terms of revealing subnetworks, do the authors believe that some aspects of the data acquisition or delineation methods employed here are preferable? MTL signal dropout was mentioned in the discussion, but was this a major motivating factor? Is there any way to quantify or tabulate the differences between the subdivisions proposed here and those from other efforts? That would help bridge the current findings to past work and show how and why the current results might differ.

      2) The motivation to link data-driven network clustering approaches (e.g., the MTN and DMN subnetworks found here) with more hypothesis-driven approaches (e.g., the PM/AT framework) is a key strength of the study. However, the findings and conclusions drawn about this relationship were somewhat difficult to understand fully. For example, how functionally distinct are the MTN and the PM/AT DMN subnetworks, given that the PM/AT framework highlights the distinct contributions of subregions of the MTN (e.g., PHC/PRC)? Do the authors think that a PM/AT distinction spans the DMN and MTN but is not captured here? Or do the findings suggest that, for understanding hippocampal-based memory in the brain, the better distinction is between DMN subregions and the MTN? Relatedly, could the DMN subnetworks' connectivity with the hippocampus be mediated by MTL subregions? More generally, this comment asks whether the authors believe that the data-driven and hypothesis-driven approaches are reconcilable, or whether they are arguing that the data-driven approach is preferable.

      3) To what degree might the spatial proximity of the ROIs influence the results of the various analyses? In particular, I wonder if the analyses done using pattern similarity might be influenced by partial non-independence of adjacent ROIs. That is, adjacent ROIs might have correlated pattern similarity due to smoothing and other sources of voxelwise spatial non-independence, and so insofar as there are more nearby ROIs within networks than across networks, it might influence the observed results. Similar concerns might be applicable to the Participation analysis, but seem less obvious.

    3. Reviewer #1:

      This paper characterizes resting state functional connectivity across the brain and within memory networks, evaluates whether similar networks arise in a memory-guided decision-making task, and collects descriptions of the function of these networks in prior imaging studies. The authors find that the DMN and a Medial Temporal Network (MTN) can be differentiated, and that there are three subnetworks within the DMN that interact differently with different parts of the hippocampus and that have been ascribed different kinds of functions in prior imaging studies.

      The paper provides a systematic overview and re-examination of multiple approaches that have been used before to characterize networks across the brain and those focused on memory systems. My overall sense is the paper will be very useful to the cognitive neuroscience / memory communities but does not present a substantial theoretical advance. I am also concerned about the interpretation of the memory task connectivity data, as described below.

      Major comments:

      -It seems possible to me that the trial-by-trial RSA analyses run on the task data are picking up on basically the same signal as the functional connectivity resting state analyses. If the authors ran the RSA analyses TR by TR on the resting state data, would that pick up the same structure? Similarly, would the functional connectivity analyses on the task data explain the same variance as the RSA? Univariate signals can drive RSA effects, so careful analyses would need to be done to demonstrate that these methods are picking up on different aspects of the interactions between these regions. Relatedly, if the authors have access to a non-memory task dataset, perhaps it could be useful to show that the results are different in that case.

      -The results are displayed on surfaces, but I think (but am not sure) that all the analyses were done in the volume. Given the interest in the hippocampus and its connectivity, it would be very useful to see results displayed in the volume in addition to (or replacing) the surfaces.

      -By eye, the MP network as shown in Fig 2 looks much less coherent than the other two. It is difficult to see much cluster structure there at all. I am therefore unsure how confident to feel in the existence of this as a distinct network.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers felt that this work represents a useful contribution to the literature, relating different perspectives on the nature of interactions between brain areas and how these interactions may support memory, but that it does not offer a substantial theoretical advance beyond prior work. The reviewers also raise some methodological concerns that the authors may wish to consider.

    1. Reviewer #3:

      In this work, Yao and colleagues describe transcriptome profiling of human plasma from healthy individuals by TGIRT-seq. TGIRT is a thermostable group II intron reverse transcriptase with higher fidelity, processivity and strand-displacement activity than standard retroviral RTs, which allows it to read through highly structured regions. A similar analysis was performed previously (ref. 20). This study, however, incorporated several improvements in library preparation, including optimized template-switching conditions and modified adapters that reduce primer dimers and introduce UMIs. In their analysis, the authors detected a variety of structural RNA biotypes, as well as reads from protein-coding mRNAs, although the latter are of low abundance. Compared to SMART-Seq, TGIRT-seq also achieved more uniform read coverage across gene bodies. One novel aspect of this study is the peak analysis of TGIRT-seq reads, which revealed ~900 peaks over background. The authors found that these peaks frequently overlap with RBP binding sites, while other peaks tend to have stable predicted secondary structures. This explains why these regions are protected from degradation in plasma. Overall, this study provides a robust dataset and an expanded picture of the RNA biotypes that can be detected in human plasma. This is valuable because the findings may have implications for biomarker identification in disease contexts. On the other hand, the manuscript in its current form is relatively descriptive, and could be improved by a clearer message about the specific knowledge that can be extracted from the data.

      Specific points:

      1) Several aspects of bioinformatics analysis can be clarified in more detail. For example, it is unclear how sequencing errors in UMI affect their de-duplication procedure. This is important for their peak analysis, so it should be explained clearly. Also, it is not described how exon junction reads (when mapped to the genome) are handled in peak calling, although the authors did perform complementary analysis by mapping reads to the reference transcriptome.

      2) Overall, the authors provided convincing data that TGIRT-seq has advantages in detecting a wide range of RNA biotypes, especially structured RNAs, compared to other protocols, but these data are more confirmatory, rather than completely new findings (e.g., compared to ref. 20).

      3) The peak analysis is more novel. The authors observed that 50% of peaks in long RNAs overlap with eCLIP peaks. However, there is no statistical analysis to show whether this overlap is significant or simply due to the pervasive distribution of eCLIP peaks. In fact, it was reported by the original authors that eCLIP peaks cover 20% of the transcriptome. Similarly, the authors found that a high proportion of remaining peaks can fold into stable secondary structures, but this claim is not backed up by statistics either.
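
      The crudest version of the missing test is a one-sided binomial test that treats each peak as a random point with a 20% chance of falling within an eCLIP peak. The numbers below (100 peaks, 50 overlapping) are hypothetical. This null is too lenient because peaks of finite length are more likely to overlap eCLIP peaks, and it also ignores expression bias. A shuffle test that matches peak lengths and is restricted to expressed transcripts would provide a stronger null:

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: n peaks tested, k overlapping an eCLIP peak, and a
# background overlap probability equal to the reported ~20% eCLIP coverage.
n_peaks, n_overlap, background = 100, 50, 0.20
p_value = binom_upper_tail(n_overlap, n_peaks, background)
print(f"one-sided binomial P = {p_value:.2e}")
```

A permutation version would replace `background` with the overlap fraction observed across many length-matched shuffles of the peaks.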

      4) Ranking of RBPs depends on the total number of RBP binding sites detected by eCLIP, which is determined by CLIP library complexity and sequencing depth. This issue should be at least discussed.

      5) Enrichment of RBP binding sites and structured RNA in TGIRT-seq data is certainly consistent with one's expectation. However, the paper can be greatly improved if the authors can make a clearer case of what is new that can be learned, as compared to eCLIP data or other related techniques that purify and sequence RNA fragments crosslinked to proteins. What is the additional, independent evidence to show the predicted secondary structures are real?

      6) The authors should probably discuss how alignment errors can potentially affect detection of repetitive regions.

      7) Many figures are IGV screenshots, which can be difficult to follow. Some of them can probably be summarized to deliver the message better.

    2. Reviewer #2:

      Yao et al. used thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) to study apheresis plasma samples. The first interesting discovery is that they identified a number of mRNA reads with putative binding sites of RNA-binding proteins. A second interesting discovery from this work is the detection of full-length excised intron RNAs.

      I have the following comments:

      1) One doubt I have is how representative apheresis plasma is compared with plasma obtained through routine centrifugation of blood. The authors reported a comparison of apheresis plasma versus plasma from a single male donor in a previous publication. I think that a much larger number of samples would be necessary to address this important question.

      2) For the important conclusion that a proportion of apheresis plasma mRNA molecules contain binding sites of RNA-binding proteins, the authors need to explore whether there is any systematic difference in mapping quality (i.e., mapping quality scores in alignment results) between RBP binding sites and non-RBP binding sites, so that any artifactual peaks caused by alignment issues in the RNA-seq analysis could be revealed and resolved. Furthermore, it would be prudent to perform immunoprecipitation experiments to confirm this conclusion for at least a proportion of the mRNAs.

      3) In Fig. 2D, one can observe that there are clearly more RNA reads in TGIRT-seq located in the 1st exon of ACTB, compared with SMART-seq. Is there any explanation? Will this signal be called as a peak (a potential RBP binding site) in the peak calling analysis (MACS2)? Is ACTB supposed to be bound by a certain RBP?

      4) In Fig. 2A, it would be informative for comparing RNA yield and RNA size profiles among the different protocols if the authors also added the TGIRT-seq results.

      5) As shown in Figure 4C (the track of RBP binding sites), RBP binding sites seem quite pervasive in some gene regions. How many RBP binding sites from public eCLIP-seq results were used for overlapping with peaks present in TGIRT-seq of plasma RNA? What percentage of plasma RNA reads fall within RBP binding sites? Are the peaks present in TGIRT-seq significantly enriched in RBP binding regions?

      6) A considerable portion of TGIRT-seq reads relate to simple repeats; one likely reason is a high abundance of endogenous repeat-related RNA species in plasma. Nonetheless, have the authors studied whether the ligation steps in TGIRT-seq introduce any biases (e.g., GC content) when analyzing human reference RNAs and spike-ins (page 4, paragraph 2)?

      7) As described in the Figure 2 legend, 0.25 million deduplicated TGIRT-seq reads were assigned to protein-coding gene transcripts, far fewer than the 2.18 million reads for SMART-seq. The authors need to discuss whether the current TGIRT-seq protocol could cause dropouts in mRNA analysis compared with SMART-seq.

      8) While scientifically thought-provoking, the practical implications of the current work are still unclear. The authors have suggested that their work might have applications for biomarker development. Is it possible to provide one experimental example in the manuscript?

    3. Reviewer #1:

      The Lambowitz group has developed thermostable group II intron reverse transcriptases (TGIRTs) that strand-switch and also have trans-lesion activity, providing a much wider view of the RNA species analyzed by massively parallel RNA sequencing. In this manuscript they use several improvements to their methodology to identify RNA biotypes in human plasma pooled from several healthy individuals. Additionally, they implicate binding by RNA-binding proteins (RBPs) and nuclease-resistant structures to explain a fraction of the RNAs observed in plasma. Generally I find the study fascinating, and I would argue that the collection of plasma RNAs described is an important tool for those interested in extracellular RNAs. I think the possibility that RNPs protect RNA fragments in circulation is exciting and fits with elegant studies of insects and plants in which RNAs are protected by this mechanism and are transmitted between species.

      I have one major comment for the authors to consider. In my view, the use of pooled plasma samples forfeited the important opportunity to glimpse human variation in plasma RNA biotypes. This significantly limits the use of this information to begin addressing RNA biotypes as biomarkers. While I realize that data from multiple individuals represent a significant undertaking and may be beyond the scope of this manuscript, I urge the authors to do two things: (1) downplay the significance of the current study for the development of biomarkers (e.g., in the abstract and discussion, such as "The ability of TGIRT-seq to simultaneously profile a wide variety of RNA biotypes in human plasma, including structured RNAs that are intractable to retroviral RTs, may be advantageous for identifying optimal combinations of coding and non-coding RNA biomarkers for human diseases."); (2) carry out an analysis in multiple individuals, including racially diverse individuals. Very important information will come of this, similar to C. Burge's important study in Nature (~2008), which made clear that there is important individual variation in alternative splicing decisions, very likely genetically determined. This second suggestion could be addressed here or constitute a future manuscript.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Timothy Nilsen (Case Western Reserve University) served as the Reviewing Editor.

    1. Reviewer #3:

      In this manuscript, Brown et al. characterized fatty acid taste discrimination in Drosophila melanogaster. Fat taste is relatively poorly understood but has critical implications for feeding and obesity research; thus, studies that advance our understanding of the molecular and physiological underpinnings of this modality are important. The finding that Ir56d neurons enable flies to discriminate between short-, medium- and long-chain fatty acids, but not between different medium-chain fatty acids, is certainly novel and interesting. It is also surprising but fascinating that this receptor is required only for the detection of medium-chain fatty acids. The manuscript is well written and the figures are presented in a clear and thoughtful manner. These findings lay the groundwork for exciting future work investigating how sweet taste and fatty acid taste perception are selectively modulated by the brain, given that these gustatory neurons overlap, and whether such discrimination is altered depending on the state of hunger.

      Strengths:

      1) Despite the overlapping nature of the taste neurons in this case (Ir56d neurons co-express Gr64f, which broadly labels the sweet GRNs, and Ir56d neurons respond to both sucrose and fatty acids), mutation of Ir56d results in loss of taste for hexanoic acid but not sucrose. The authors use this taste discrimination to their advantage, in combination with a robust aversive taste memory assay, to address the question of differential fatty acid taste perception.

      2) The authors rule out the potential involvement of olfaction in modulating taste perception.

      3) The use of CRISPR-Cas9 to generate Ir56dGAL4 flies, which allows accurate and targeted genome editing, validates the results obtained when Ir56d-expressing neurons are silenced. Additionally, in vivo Ca2+ imaging of the fly gustatory system strengthens and corroborates the results at the physiological level, especially the rescue experiments.

      Overall (minor) comments and questions:

      1) Are there differences in taste discrimination between male and female flies?

      2) Individual data points should be shown whenever possible for all figures (except PER because that would make it impossible to interpret).

      3) Can the authors discuss how discriminating between different fatty acid types may be adaptive? Are they found in different food sources, some of which are "good" and some "bad"? Is there evidence from other organisms for this type of molecular discrimination in fatty acid taste?

    2. Reviewer #2:

      In the present paper, Brown et al. study the ability of Drosophila melanogaster to discriminate between fatty acids (FAs) of different lengths. Using a combination of behavioral experiments, molecular biology and in vivo calcium imaging, the authors show that a subset of Ir56d-expressing neurons is able to differentiate FAs. However, the Ir56d receptor is only necessary for the detection of medium-length FAs, not short- or long-length FAs. The paper explores in detail the role of the Ir56d receptor as an FA detector, a role the authors described previously (Tauber et al. 2017).

      Major concerns:

      I consider the experiments and the statistical analysis to be properly done; however, the gain in knowledge is very limited. So far, the authors show that flies can discriminate FAs of different lengths, with Ir56d being the receptor that detects medium-length FAs, a result that expands the knowledge gained in Tauber et al. 2017. In figure 3, the authors show that silencing Ir56d neurons by tetanus toxin expression dramatically reduces PER to medium-length fatty acids, but not to short or long ones, pointing to a different set of neurons involved in their detection. However, the in vivo calcium imaging experiments show that Ir56d neurons also respond to short- and long-chain FAs. In this regard, I disagree with the statement in the abstract: "Characterization of hexanoic acid-sensitive Ionotropic receptor 56d (Ir56d) neurons reveals broad responsive to short-, medium-, and long- chain fatty acids, suggesting selectivity is unlikely to occur through activation of distinct sensory neuron populations." In fact, I consider that selectivity would come from the activation of different subsets of gustatory neurons. It seems that Ir56d neurons could be a subset of the neurons that generally respond to FAs, providing the specificity for medium-length FAs. Other neurons, in addition to the Ir56d ones, might respond to short- and long-chain FAs in an Ir56d-independent manner.

      I think the authors should explore in depth how short- and long-chain FAs are actually detected, whether detection depends on other Ionotropic Receptors (Ir25a and Ir76b might be involved; Ahn et al. 2017), and which subset of gustatory neurons actually respond to these compounds, given that they require neither Ir56d nor Ir56d neurons.

    3. Reviewer #1:

      This paper investigates fatty acid taste in flies and asks the broad question of whether flies can discriminate different compounds within a single taste modality. The authors' main finding is that flies can discriminate between long-, medium-, and short-chain fatty acids using a previously established aversive memory taste paradigm. When they delve into the cellular and molecular basis of fatty acid detection, they find that IR56d neurons respond to all three classes of fatty acids but are required only for the behavioural responses to medium-chain molecules. Similarly, CRISPR/Cas9 deletion of the IR56d receptor reveals that it too is required only for medium-chain fatty acid responses. Thus, different fatty acid classes presumably activate distinct but partially overlapping subsets of appetitive taste neurons. In general I think the paper is potentially interesting (see comment 1 below) and the data mostly support the conclusions. However, some lack of attention to detail makes some of the data hard to interpret (see minor comments).

      Concerns:

      1) The ability of flies to discriminate between different fatty acid classes is presented as the interesting finding since, as the authors point out, discrimination between compounds within a taste modality is generally not thought to occur. On the surface I agree that this is interesting. However, in the authors' setup of the main question (line 101), they raise an important issue: "Is it possible that flies are capable of differentiating between tastants of the same modality, or is discrimination within a modality exclusively dependent on concentration?" This should be rephrased to replace "concentration" with "intensity", since not all tastants at the same concentration have the same intensity, and from a behavioural perspective it is intensity that matters. Given that, the authors do nothing to demonstrate that their discrimination task does not depend on intensity, aside from the fact that 1% solutions of all the FAs seem to give similar PER. They need to show more explicitly that this task truly reflects identity-based discrimination.

      2) The second broad concern I have is over the nature of short- and long-chain fatty acid detection. Interpreting the discrimination results would be greatly aided if we knew which other neurons mediate the PER to these molecules. Is it the non-IR56d population of Gr64f neurons?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers find fatty acid taste discrimination potentially interesting and agree that the experiments are performed to a high standard. One major concern is whether discrimination is based on intensity rather than quality. A second limitation is that the mechanism of FA detection is not greatly advanced beyond the authors' previous work: the cellular mechanisms for long and short chain FA detection remain unclear. The reviewers agreed that if the major concerns of Reviewer 1 were addressed, this manuscript would provide a broader understanding of fatty acid discrimination.

    1. Reviewer #2:

      In this paper the authors use a genomics approach to tackle the question of how the combined transcriptional response to two signals compares to the responses to each treatment individually. They treat MCF-7 cells with TGF-beta and retinoic acid, and find that the combined response at the level of gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) may encompass additivity and multiplicativity, but also a wide range of other intermediate or more extreme behaviours.

      The work is conceptually very interesting, and the manuscript text and figures were extremely clear and a pleasure to read. We suggest that the following major points be addressed to clarify the assumptions and limitations of the study.

      The authors treat the cells for 72h. This is a very long time, over which secondary effects may dominate the results. The choice of this time point should, at the very least, be justified and discussed. For example, previous studies that quantitatively characterized distinct temporal dynamics in SMAD signaling after TGF-beta treatment showed a transient, dose-dependent SMAD response in the first 4 h after TGF-beta treatment, with a strong early peak in the nuclear/cytoplasmic ratio of SMAD2/4 (Clarke & Liu, 2008; Schmierer et al, 2008; Zi et al, 2011; Zi et al, 2012; Strasen et al., 2018). In addition, TGF-b signaling has been suggested to depend on cell density and cell cycle stage (Zieba et al, 2012), which may also affect the results. It would also be helpful to have a quantitative measure of the corresponding nuclear TF levels at the selected 72h time point (e.g., for the main affected TFs, such as pSMAD2 and RARA).

      MCF7 cells were treated with three different doses of TGF-beta (1.25, 5, and 10 ng/mL) and RA (50, 200 and 400 nM). Because the selected doses appear higher than those used in previous studies, the authors should comment on their choice. The authors state that "We defined a master set of 1,398 upregulated genes by selecting the set of genes that were differentially expressed in any dose of the combination treatment (log FC ≥ 0.5 and padj ≤ 0.05) and that had increased expression in each dose of each individual signal." It is unclear how this gene set relates to the top-right Venn diagram in Fig 1B, in which only 303 genes are shown as upregulated in all three treatments, and the total according to the numbers in the diagram is >1398.

      Fig 1B shows that a large proportion of genes were differentially expressed in response to both signals, but not to either of the signals individually. Their responses are presumably more non-additive than the responses of genes upregulated in response to all three treatments. Restricting analysis to the latter group therefore introduces a bias for certain modes of combinatorial regulation. The justification for this choice should be discussed.

      The authors suggest a bimodal distribution for the observed c values, with peaks at 0 and 1. The authors write that "Our simulated c value distributions bear a moderate resemblance to our observed c value distributions". This conclusion is central to the paper's claim that "Gene regulation gravitates towards either addition or multiplication when combining the effects of two signals" (title) and that "the combined responses exhibited a range of behaviors, but clearly favored both additive and multiplicative combined transcriptional responses" (abstract). However, the additional peak at c=1 is not obvious from the data in Fig. 1E. Stronger evidence (i.e. statistical analysis of the observed distributions) would be needed to demonstrate overrepresentation of c values ~1. Alternatively, the title and abstract could be revised to better reflect the strength of the findings.

      The authors frame the work on the basis of simple models of gene regulation by pairs of transcription factors that predict either addition or multiplication. However, they are activating two signalling pathways that could interact also at the level of signal transduction (and need not be directly regulating the genes in question, as noted in point 1). How justifiable is it to make inferences about the nature of combinatorial transcriptional regulation from this kind of experimental set up? These issues should be made more clear from the beginning, and should be taken into account when interpreting the data.

      Related to the point above, the authors use chromatin accessibility as a proxy for TF binding. However, this does not need to be the case, especially if the accessibility data are considered quantitatively. For example, TFs may bind and recruit remodeling factors that affect accessibility differentially across the genome, obscuring the relationship between TF binding and accessibility. This is especially pertinent at longer time scales after perturbation. We suggest presenting the data on accessibility as just that, instead of presenting it as data that directly reports on TF binding. The relationship to TF binding can and should still be explored in the analyses, but with clarification for how accessibility data is limited in this case.

      The following are instances where accessibility data is described as directly reporting on TF binding that we recommend revising (the list is not exhaustive):

      -the title of section two

      -Fig.2E

      -the link between models of TF control and the relationship between peaks and expression, such as the reference to the thermodynamic model at the end of section 3

      -remove the implicit assumption of a link between cooperativity of TF binding and super-additive peaks in sections 3 and 4. This may help explain more naturally the lack of dual-motif findings in section 4

    2. Reviewer #1:

      Cells perform many types of computations to respond to external signals at the transcriptional regulatory level. Often, regulatory sequences read out the concentration of input transcription factors and combine that information to dictate the level of transcriptional output. Yet, for most genes, the quantitative rules for how regulatory regions integrate multiple inputs remain unclear.

      Sanford et al. studied how two signals are interpreted by downstream genes using quantitative tools such as RNA-seq and ATAC-seq. The authors propose two phenomenological models to understand combinatorial regulation: a model in which output gene expression in the presence of two different input signals is the sum of the gene activity in the presence of each signal alone (additive), and an alternative model in which the output of the two signals is the product of the output driven by each individual signal (multiplicative).

      The authors performed a genome-wide analysis of thousands of genes and found that most genes responding to either TGF-β or retinoic acid behave in either an additive or multiplicative fashion. The authors further asked whether these additive/multiplicative behaviors can be explained by the accessibility of DNA regulatory regions reported by ATAC-seq. The result reveals that DNA accessibility is mostly additive. However, they also find that multiplicative gene expression is correlated with super-additive accessibility.

      This work provides a platform to quantitatively assess combinatorial transcriptional regulation both at the level of DNA accessibility and transcriptional output. Indeed, one of the exciting aspects of the work is the attempt to use the quantitative values of DNA accessibility reported by ATAC-seq to constrain possible biophysical models of transcriptional regulation. We foresee that this work will set the stage for a better understanding of the molecular relation between transcription factor binding and the gene activity resulting from this binding, in general, and for dissecting the molecular mechanisms of combinatorial regulation, in particular.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      In this work the authors used a genomic approach to investigate the way cells interpret two combined signals versus two individual signals. The authors used RNA-seq to examine the gene expression outputs from thousands of genes in response to two signal inputs, TGF-b and retinoic acid, either individually or in combination. The authors found that when stimulated with both signals, most genes exhibited additive or multiplicative responses. The authors further used paired chromatin accessibility data from ATAC-seq to relate such responses to putative transcription factor binding patterns at these genes. Surprisingly, ATAC-seq revealed that most genes prefer addition to combine two signals, as chromatin accessibility is largely additive, although some super-additive accessibility may correspond to multiplicative gene expression.

      This work provides a platform to quantitatively assess combinatorial transcriptional regulation both at the level of DNA accessibility and transcriptional output. Although the concept of additive vs. multiplicative transcriptional response is phenomenological, it may be used to clarify and constrain certain biophysical models of transcriptional regulation and set the stage for a better understanding of the molecular relation between combinatorial transcription factor binding and corresponding gene activity.

      While the work is written in clear and concise language, there are places that require further clarification and better presentation.

    1. Reviewer #2:

      The authors investigated the joint influences of visual evidence strength and action (un)certainty on the formation of perceptual decisions, and used MEEG to track the associated cascade of visual-motor processing using a relatively complex set of analyses. This manuscript addresses a general question that has already attracted (but also continues to attract) considerable interest. One of the main advances of this specific work (in addition to the advanced MEEG analyses) is the explicit manipulation of action certainty in addition to evidence strength. My enthusiasm for this work, however, remains somewhat limited in light of the following aspects.

      1) The article is set up from the perspective of adjudicating between strictly "serial models" of perceptual decisions, in which decisions are reached about what is viewed before turning to the appropriate action, versus more "continuous models", in which potential action plans are formed while evidence accumulation is still taking place. Is there not already ample evidence for the latter scenario (e.g., the work of Tobias Donner, Floris de Lange, Ian Could, and others)? Moreover, the authors currently provide only a single reference for the serial model, which dates back to 1966. Thus, the temporal overlap between visual evidence accumulation and action planning is, in itself, neither very surprising nor new; and yet it appears to be a central component of the article's pitch.

      2) While the manipulation of action (un)certainty provides an interesting extension of the popular random-dot-motion task, the nature and rationale of this manipulation remain insufficiently clear. Do participants view multiple patches of equal coherent motion and arbitrarily decide which to respond to? If so, does this not confound action uncertainty with evidence (i.e., more patches with motion may give more evidence)? And should this not make participants faster, rather than slower? Are they slower simply because they are asked to make a "fresh" response? At a minimum the authors should more clearly explain this manipulation, starting in the Results section. In this, the authors should clarify exactly how visual signals and action certainty are independent in their design, or (as I suspect) acknowledge that the current manipulation confounds action certainty with the availability, collective strength, and/or spatial region of the visual evidence (which may each in turn affect neural signals throughout the brain).

      3) It would help to first show the (basic) effects of sensory and action certainty on time-frequency activity in several brain areas (at least visual and motor), for example by showing power modulations for each of the certainty levels, together with a contrast plot of high vs low certainty. This would help understand the data, before turning to the more complex analyses. Such a plot may reveal, for example, decreased alpha activity in posterior sites with higher action uncertainty, simply as a result of more visual stimulation. If so, this may be problematic for the more complex analyses of transfer entropy. It could also help justify the current focus on beta and gamma (but not, for example, alpha) and to help understand the distinction between modulations in beta and gamma.

      4) I am surprised the authors find a gamma decrease rather than an increase. Does gamma not usually increase with motor preparation (e.g., Donner et al. Current Biology 2009) and visual attention (e.g., Fries et al., Science, 2001; Siegel et al., Neuron, 2008)?

      5) Given that both certainty manipulations affected RT, are all neural correlates of these certainty manipulations not "confounded" with differences in RT?

      6) Do the two uncertainty factors (sensory and action certainty) interact? This information appears missing from the analysis of the behavioural data. Also, if these two factors interact, it would be sensible to also explore this in the modelling and MEEG analyses.

    2. Reviewer #1:

      This study uses combined EEG/MEG to characterise the neural dynamics of the visuomotor decision process by separately manipulating its perceptual- and action-related components. Subjects monitored 4 simultaneous random dot stimuli to detect changes from incoherent to coherent motion, and indicated detection with a finger press. Perceptual and action uncertainty were manipulated by varying the motion coherence of the stimuli, and number of motor response options (1 vs. 3), respectively.

      The authors identify activity in the beta and gamma bands correlating with decision-related trajectories predicted by an accumulation-to-bound model. They reveal distributed networks in both frequency bands that show a negative relationship with the predicted patterns (i.e., desynchronization after onset of coherent motion). Several interesting findings stand out: 1) beta activity follows a gradual progression from posterior to anterior regions, a finding further supported by a connectivity analysis assessing the direction of information flow; 2) the accumulating signals across the identified regions overlap in time, which is taken as evidence for a continuous flow of information along the visual-to-motor pathway; 3) regions where (beta) activity flow is modulated by perceptual (as opposed to action) uncertainty show earlier responses to perceptual evidence, and are more likely to drive the information flow to downstream areas.

      This is overall a well-written, clearly structured paper on an ever-relevant topic. The authors use elegant, rigorous statistical methodology, and their characterisation of beta activity provides some important insight into the global neural dynamics of decision making, in particular the temporal properties of decision-related signals across the perception-to-action processing pipeline. I do however have a couple of points of concern regarding parts of the results (in particular those involving gamma activity) and their interpretation:

      1) Gamma band activity is seen to exhibit a negative relationship with the predicted accumulating signal, with a gradual desynchronisation upon the onset of perceptual evidence (coherent motion). I found this surprising, as several previous studies looking at decision-related activity have shown increases in gamma activity with perceptual evidence (Polania et al. 2014 Neuron, Donner et al 2009 Curr. Biol., Wilming et al. 2020 biorxiv). Is it possible that with the broad gamma range investigated here (31-90Hz) and the spectral smoothing involved, the negative relationship might be at least partly driven by activity in the lower ranges, i.e., qualitatively closer to task/motor-related beta desynchronisation? It would be interesting to see if the significant negative correlation is maintained with a slightly narrower gamma range (e.g., >35Hz or >40Hz). Either way, I think it's important for these results to be discussed in relation to the literature mentioned above.

      2) Regarding the interpretation of the beta-gamma relationship, the authors seem to place the results in the context of feedforward/feedback information dynamics (or at least they make several references to this literature throughout the manuscript). I am not sure if I understand or agree with this interpretation. If anything, doesn't the temporal progression of decision-related information for gamma and beta observed here (e.g., Fig. 5b) go against the current understanding of their roles in feedforward and feedback information flow, respectively? Some clarification on this point would be very useful.

      3) While the timing of beta/gamma decision-related accumulation is summarised in Figs. 4/5, I think it would be informative to also include (either in the main figures or as supplement) the actual trial-averaged traces, highlighting the overall timing differences between activity in the two bands (from Fig. 4), as well as the progression across the anterior-posterior axis (shown in Fig. 5).

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      While we found the topic very relevant - especially the role of large-scale beta dynamics in visuomotor processing - and the approach used interesting, our overall enthusiasm was limited by concerns regarding novelty, design and interpretation. Critically, it remains unclear whether we are dealing with narrowband oscillations here, especially regarding the reported gamma band results, but also in terms of separating different oscillatory contributions in the alpha/beta frequency ranges. Since everything that follows hinges on this assertion, one would have to first establish a separation of these different spectral contributions in order to attribute particular dynamics to particular bands.

    1. Reviewer #3:

      This report examines the mechanisms by which the KSHV KaposinB (KapB) protein causes disassembly of processing bodies (PBs) in HUVECs. Convincing data are presented showing that mDia1 and ROCK, factors downstream of RhoA, are necessary for PB disassembly in HUVECs. Data suggesting that cofilin enhances KapB-induced PB disassembly are less convincing. Over-expression of actinin-1 or direct activation of actomyosin contraction favored PB disassembly, implicating mechano-responsive signaling components. Analysis of YAP, a mechano-responsive transcription factor, showed that its levels were elevated in cells expressing KapB and that its knockdown rescued PB formation in KapB-expressing cells. Expression of constitutively active YAP promoted PB disassembly, similar to KapB, although it did not reproduce the stabilization of ARE-containing mRNAs seen in KapB-expressing cells. Interestingly, subjecting cells to shear stress or increasing the stiffness of the matrix on which they grow, both thought to activate YAP, recapitulated the PB disassembly phenotype seen in KapB-expressing cells, and knockdown of YAP abolished this effect.

      These are interesting and exciting results that further illuminate the mechanisms by which a viral protein perturbs PB function. Perhaps even more exciting is the finding that mechano-sensitive signaling pathways can influence PB formation (and perhaps function). The data are of high quality and support the major conclusions of the study. However, a few items raised questions for me. First, it is unclear whether the effect shown applies to PBs as a whole or only to the HEDLS marker, which is used exclusively in the study. Showing that another PB marker (or two) behaves similarly would support this conclusion. This might be feasible for a few key conditions, such as shear stress and expression of constitutively active YAP. The authors conclude, based on a TEAD-Luc reporter assay, that YAP transcriptional activity is not induced, even though it appears to be significantly increased compared to controls (Fig S5A, left panel). Could they elaborate on how they arrived at this conclusion? The argument that levels of phospho-YAP are not increased in KapB-expressing cells is not supported by the data. While the ratio may not differ, the total amount of phospho-YAP is clearly elevated, as are total YAP levels. Finally, can the authors comment on any impact of the knockdowns used throughout the manuscript on cell viability or morphology?

    2. Reviewer #2:

      The authors demonstrate that disappearance of P-bodies from cells expressing a KSHV protein, KapB, requires factors regulating actin contractility, mechanosensation and YAP, but does not require the transcriptional regulatory activity of YAP. The function of P-bodies has long been contentious, and the endogenous mechanisms regulating P-body assembly vs. disassembly are still being elucidated. Many studies of P-body dynamics have relied on treatment with sodium arsenite, global translational inhibition, etc. This study therefore has the potential to add significantly to our understanding of P-body disassembly mechanisms and of the role of these ribonucleoprotein granules in cells. Several points of data presentation and interpretation would benefit from clarification.

      1) The introduction and discussion present P-bodies as sites of decay of ARE-containing mRNAs, a long-accepted model of P-body function. However, building on well-established observations from the Izaurralde lab that RNA decay is uncoupled from P-body formation, recent work by Parker, Singer, and Chao utilizing single-molecule imaging of 5' end decay provided clear support for cytosolic localization of RNA decay events, with no decay occurring inside P-bodies, strongly supporting a storage/translational repression role for P-bodies rather than a role in decay. The authors then attempt to provide a complex explanation of the observation that constitutively active YAP decouples P-body disassembly from ARE mRNA stability, rather than considering this result in the context of alternative P-body models.

      2) It is unclear why, in Fig. 1B (middle panel), there is a large, statistically significant increase in P-bodies per cell in vector-expressing cells (which do not express KapB) treated with shDia1-1 compared with shNT, but not in those treated with shDia1-2. Is this due to more efficient silencing of mDia1 expression by shDia1-1, and does mDia1 have a KapB-independent effect on P-bodies? Or does this suggest off-target shRNA effects?

      3) Throughout the manuscript, there appears to be far more dispersion in P-body numbers in experimental (shRNA- or inhibitor-treated) cells than in control cells. This may, however, be an artefact of the fold-change calculation, in which the authors normalize control cells to 1.0 and present no estimate of their variance. Meaningful analysis of variance across all measurements is important, especially for experiments in which p values are close to the significance cutoff. Presenting the raw data before normalization may also be helpful.
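      The following minimal sketch illustrates the point above. The counts are hypothetical, not data from the manuscript. When each experiment's control is set to 1.0 before pooling, the controls show no spread at all, whereas the treated fold changes keep theirs. Treated groups can therefore look more dispersed than controls even when the raw data are not.

```python
from statistics import stdev

def normalize_to_control(replicates):
    """Per-experiment fold change: each experiment's control is divided by
    itself (so it becomes exactly 1.0) and the treated value by that control."""
    controls = [c / c for c, _ in replicates]
    treated = [t / c for c, t in replicates]
    return controls, treated

# Hypothetical raw P-body counts per cell: (control, treated) for 3 experiments.
raw = [(10.0, 6.0), (14.0, 7.0), (8.0, 6.0)]
controls, treated = normalize_to_control(raw)

print(stdev(c for c, _ in raw))  # spread of the raw controls (> 0)
print(stdev(controls))           # 0.0: the control variance has disappeared
print(stdev(treated))            # treated fold changes keep their spread
```

      Reporting the raw counts, or normalizing every replicate to the mean of all control replicates rather than to its own control, would retain an estimate of control variance.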

      4) In Figure 4A, are the KapB-expressing cells larger than the vector-expressing cells, or was a higher magnification used? The nuclei appear nearly double in diameter. In the immunofluorescence experiment, no other control marker is imaged to support the assertion that YAP signal is selectively increased by KapB expression. In addition, no image quantitation is performed to support the assertion that "nuclear:cytoplasmic YAP was not markedly increased". Quantitation across multiple fields of view, with the number of cells analyzed stated, rather than presentation of a single image, would address these concerns. The authors' observation that the fraction of phosphorylated YAP decreases in KapB-expressing cells, as measured by Western blotting in Fig. 4B, appears incongruent with the stated lack of change in cytoplasmic:nuclear YAP in KapB- vs. vector-expressing cells (Fig. 4A).

      5) While I appreciate that the authors have utilized the luciferase assay in multiple studies, direct measurement of the luciferase reporter mRNA stabilities should be performed to differentiate between changes in stability of the ARE mRNA vs. selective translational repression of the ARE mRNA in this specific experimental context.

      6) "Comparison of the transcriptomic data from HUVECs subjected to shear stress from Vozzi et al (2018) (Accession: GEO, GSE45225) to entries in the ARE-mRNA database (Bakheet, Hitti, and Khabar 2017) showed a 20% enrichment in the proportion of genes that contained AREs in those transcripts that were upregulated by shear stress." This comparison (1) lacks any test of whether this enrichment is statistically significant, and (2) relies on a single steady-state microarray measurement. It therefore neither reports accurately on RNA decay rates nor permits conclusions about RNA dynamics.
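      One standard way to supply the significance measure requested in point (1) is a one-sided hypergeometric test. It asks how likely it is to see at least the observed number of ARE-containing genes in the upregulated set if that set had been drawn at random from all measured genes. This is a sketch under stated assumptions, not the authors' analysis. The function name and all counts in the usage line are hypothetical, and the test does not address point (2).

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_are_total, n_up, n_are_up):
    """One-sided P(X >= n_are_up), where X is the number of ARE-containing
    genes among n_up genes drawn without replacement from n_total genes,
    n_are_total of which contain AREs."""
    denom = comb(n_total, n_up)
    upper = min(n_up, n_are_total)
    tail = sum(
        comb(n_are_total, k) * comb(n_total - n_are_total, n_up - k)
        for k in range(n_are_up, upper + 1)
    )
    return tail / denom

# Hypothetical counts only; the real values would come from the GEO dataset
# and the ARE-mRNA database entries used in the manuscript.
p = hypergeom_enrichment_p(n_total=20000, n_are_total=4000, n_up=500, n_are_up=120)
```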

      7) It is impossible for the reviewer to assess "unpublished data" on autophagy cited in the discussion.

    3. Reviewer #1:

      In this manuscript the authors show that the oncogenic transcription factor YAP is an important factor in the signaling pathway leading from the Kaposi Sarcoma virus protein KapB via the host cell GTPase RhoA to the disassembly of processing bodies (PBs). This is in principle an interesting finding. However, the connections between KapB and PB disruption, between YAP and the Rho pathway, between KapB and the Rho pathway, and between Kaposi virus infection and YAP (and Rho) have all been described before. This connection alone therefore does not come as a surprise. Unfortunately, new mechanistic insight into exactly how YAP contributes to PB disruption is missing.

      1) It is somewhat contradictory that in 2015 the last author was first author of a paper in which a ROCK inhibitor did not produce a significant PB rescue. That paper concluded that contractility and PB disruption are independent events downstream of RhoA activity. In the current manuscript, the authors now revise this conclusion and argue that PB disruption involves contractility (which is also more in line with earlier work (Takahashi et al., 2011)).

      2) That contractility leads to YAP activation is known, but the authors now convincingly show that PB disruption does not occur in parallel to YAP activation but depends on it. The most interesting aspect is therefore that RNAi-mediated removal of YAP suppresses P-body disruption. This finding places YAP as an essential intermediate between contractility and PB disruption. This reviewer really likes this finding but requests that the authors follow this path a little further and add to the mechanism.

      i) Is it based on a protein-DNA interaction of YAP, i.e. does YAP need to act as a transcription factor to induce PB dissolution? If so, which transcripts would be induced and be required for PB disruption or dispersal? Could it be something like DICER RISC (Chaulk et al., 2014)? The authors argue that this first option is less likely, but provide no experimental evidence.

      ii) The effect of YAP on PBs might be based on a protein-RNA interaction or

      iii) Could it depend on a protein-protein interaction between YAP and an unidentified partner?

      iv) Finally, is PB dispersal connected to an induction of autophagy?

    1. Author Response

      Summary:

      A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of the model's assumptions on the outcomes and the moderate correlations between simulations and experimental data. The reviewers considered the following important: further explanation of the questions being investigated, of the validity and nature of the assumptions used to develop the simulations, and of how these assumptions were built into the modeling. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests and noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods and the impact of the insights gained.

      We would like to thank eLife for this Preprint Review service.

      In this manuscript, we present for the first time a model of DNA rereplication, which permits us to analyse how the process evolves at the single-cell level, across a complete genome, over time. This analysis revealed a pronounced heterogeneity at the single-cell level, resulting in increased copies of different genomic loci in different cells, and highlighted rereplication as a powerful mechanism for genome plasticity within an evolving population. We would like to thank the reviewers for their critical appraisal of our work and the editor for his summary of the reviews. The points raised were overall easy to address, and we have done so in a revised version of the manuscript, where we have also clarified points that were unclear to the reviewers. Importantly, we have clarified the following:

      i) There are currently no available methods for studying rereplication dynamics experimentally at the single-cell level across the genome, and it is exactly this analysis that our manuscript offers.

      ii) Model assumptions were either standard and previously validated experimentally for DNA replication, or subjected to sensitivity analysis, with key findings shown to be robust to model assumptions.

      iii) There was no arbitrary cut-off point in the rereplication process, which was analysed over time; this is an advantage of our approach. Data were depicted early (2C) and late (16C) in the process, but findings were robust throughout.

      iv) Fission yeast cells can be experimentally induced to rereplicate to different extents (from 2C to 16C or even 32C), and our model permits us to capture the process as it evolves at any ploidy.

      v) Correlations between experimental and simulated data were highly significant and robust to model assumptions.

      We would like to thank the reviewers for their comments, which we believe have helped us improve our manuscript and clarify points of possible misunderstanding. A point-by-point response follows.

      Reviewer #1:

      The authors develop and analyse a mathematical model of DNA rereplication in situations where re-firing of origins during replication is not suppressed. Using the experimentally measured positions and relative strengths of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. Increasing the copy number of an origin facilitates its preferential amplification. This constitutes a self-reinforcing feedback loop and might be the mechanism that leads to over-amplification of some genomic regions. In addition, different regions compete for a limiting factor and thereby repress each other's over-amplification. While the model generates some interesting hypotheses, it is unclear in the current version of the manuscript to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, do not discuss the model assumptions and their validity, and do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

      The manuscript has been modified to further clarify the underlying questions and model assumptions. We would like to point out that the model was presented in detail in the supplementary material of the original manuscript, which included all model assumptions. In addition, model parameters used for the base-case model were systematically varied, the outcome was presented in a separate paragraph (“Sensitivity Analysis” in Results), and findings were shown to be robust to model assumptions. These points are presented in detail below.

      1) It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

      With this work our goal is to elucidate the fundamental mechanisms and properties underlying DNA re-replication. Specifically, we aim to investigate how re-replication evolves over time along the genome, and how it may lead to different numbers of copies of different loci at the single-cell level and result in genetic heterogeneity within a population. Given the large number of origins along the genome and the stochasticity of origin firing (Demczuk et al., 2012; Kaykov and Nurse, 2015; Patel et al., 2006), it is unclear how re-replication would evolve along the genome in each individual cell in a re-replicating population. It is likewise unclear how local properties and genome-wide effects would shape its progression and the resulting increases in the number of copies of specific loci. As no experimental method exists that can analyze DNA re-replication at the single-cell level over time along the genome, we designed a mathematical model that tracks the firing and re-firing of origins and the evolution of the resulting forks along a complete genome over time. In this way, the model captures the complex stochastic hybrid dynamics of DNA re-replication. Existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007). We therefore believe that our in silico model, which is the first modeling framework of DNA re-replication, is an important contribution to the field.

      In the revised version of our manuscript, we have modified the introduction to explain these points in more detail.

      2) One of the main messages of the paper is that amplification profiles are highly variable across single cells, as found in the described simulations. However, this behavior likely depends on specific choices made in the simulations, e.g. that the waiting times of origin state transitions are exponentially distributed. These assumptions should at least be discussed or, better, experimentally validated.

      Modeling choices and assumptions are presented in detail in the Supplementary Material of the manuscript. They were made to accurately capture two features: the dynamics of origin firing, which many studies in fission yeast have established to be stochastic (Bechhoefer and Rhind, 2012; Patel et al., 2006; Rhind et al., 2010), and the continuous movement of forks along the DNA. Specifically, the choice of the exponential distribution for assigning a firing time to each origin has already been discussed and validated in our previous work on normal DNA replication (Lygeros et al., 2008). Indeed, as shown in Figure 2 of (Lygeros et al., 2008), our model was able to accurately reconstruct experimental data derived from single-molecule DNA combing experiments (Patel et al., 2006).

      The use of the exponential distribution for transition firing times is standard in stochastic processes in general. This includes Piecewise Deterministic Markov Processes (PDMP), the class to which the models considered in the paper belong. There are good mathematical reasons for this choice. One is the "memoryless" property, which makes the resulting stochastic process Markov, a basic requirement for the model to be well-posed [M. H. A. Davis, "Markov models and optimization", Monographs on Statistics and Applied Probability, vol. 49, Chapman & Hall, London, 1993]. In practice, the exponential assumption can be quite general, because the rate (the probability with which a transition "fires" per unit time) is allowed to depend on the state of the system. This applies both to the discrete state (in our case, the state of individual origins) and to the continuous state (in our case, the progress of individual replication forks). It can be shown that this dependence can be exploited to write seemingly more general processes (which at first sight do not have exponential firing times) as PDMP (with exponential firing times) by appropriately defining the state of the system [M. H. A. Davis, "Piecewise-Deterministic Markov Processes: A General Class of Non-Diffusion Stochastic Models", Journal of the Royal Statistical Society. Series B (Methodological), Vol. 46, No. 3 (1984), pp. 353-388]. In the manuscript, this feature is exploited in what we call the LF model. There, the rate of the exponential firing time of each origin (its probability of firing per unit time) depends on the state of the system (specifically, the number of PreR origins), as discussed in the section on Sensitivity Analysis. We have further clarified these points in the revised manuscript.
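      The following is a minimal, hypothetical sketch of what a state-dependent exponential firing rate means in practice. It is not the authors' LF model. It keeps only the discrete origin states, omits fork progression (the continuous part of a PDMP) and re-licensing, and assumes an illustrative limiting-factor rule `g = min(1, factor_pool / n_preR)`. Because rates are constant between events, the memoryless property allows the next waiting time to be drawn from an exponential with the summed rate, as in a Gillespie simulation. The function name, `factor_pool`, and the rule for `g` are all assumptions.

```python
import random

def simulate_firing(efficiencies, factor_pool, t_max, seed=0):
    """Hypothetical sketch: each PreR origin i fires at rate
    efficiencies[i] * g(n_preR), where g depends on how many origins are
    still licensed. Returns a list of (firing time, origin index)."""
    rng = random.Random(seed)
    pre_r = set(range(len(efficiencies)))  # every origin starts licensed (PreR)
    t = 0.0
    events = []
    while pre_r:
        candidates = sorted(pre_r)
        # Hypothetical limiting-factor term: a fixed pool of factor is shared
        # among the origins still in PreR, so each rate depends on the state.
        g = min(1.0, factor_pool / len(candidates))
        rates = [efficiencies[i] * g for i in candidates]
        total = sum(rates)
        if total <= 0.0:
            break  # no remaining origin can fire
        # Memoryless property: the waiting time to the next firing is
        # exponential with the summed rate and is redrawn after every event.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # The origin that fires is chosen with probability proportional to its rate.
        fired = rng.choices(candidates, weights=rates, k=1)[0]
        pre_r.remove(fired)
        events.append((t, fired))
    return events
```

      For example, `simulate_firing([0.2, 0.8, 0.5, 1.0], factor_pool=2.0, t_max=30.0, seed=1)` returns the firing times and origins for one stochastic run. Repeating it with different seeds gives run-to-run variability even though the parameters are identical.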

      3) The authors aim to test their prediction that rereplication is highly variable across cells. To this end, they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, DAPI quantification suggests that only a subset of cells actually undergo rereplication under the experimental conditions used (Fig. 4C). The analysis should therefore at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of the two loci is anti-correlated, as predicted by the model.

      Under the experimental conditions employed (ectopic expression of a mutant version of the licensing factor Cdc18, stably integrated in the genome under a regulatable promoter), the vast majority of cells undergo rereplication, but to relatively low levels, resulting in cells with a DNA content of 2C-8C. The DNA content of several cells does appear similar to that of normal G2 phase cells. Nevertheless, the vast majority (>90%) of cells undergo rereplication, as manifested by the appearance of DNA damage and, eventually, loss of viability. We chose this experimental set-up (medium levels of rereplication) because it allows induction of rereplication in practically all cells in the population. It also avoids the abnormal nuclear and cellular morphology that accompanies a pronounced increase in DNA content (i.e., 16C) and would make single-cell imaging more prone to artifacts. Fission yeast cells can be induced to undergo rereplication to various extents by regulated expression of different versions of Cdc18 at different levels and/or co-expression of Cdt1. We have now explained this more extensively in the revised manuscript and thank the reviewer for identifying a point which may not have been clear in the first version of the manuscript.

      Concerning the possibility of studying two loci at the same time, we did try to tag a second region with TetR/TetO. However, the signal-to-noise ratio, and thus reproducible detection of the TetR focus, was suboptimal under rereplication conditions. We therefore did not proceed further with this approach.

      4) What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

      The definition of signal ratios is given in Results: DNA re-replication at the population level: “Specifically, we computed in silico mean amplification profiles across the genome, referred to as signal ratios in (Kiang et al., 2010), by averaging the number of copies for each origin location and normalizing it to the genome mean in 100 simulations. In these profiles, peaks above 1 correspond to highly re-replicated regions, and valleys below 1 correspond to regions that are under-replicated with respect to the mean.”

      Indeed, as observed by the reviewer, simulated peaks appear overall sharper and higher than experimental peaks. This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 probes and 2 independent experiments. We have clarified this in the Results.

      Last, we chose to compare in silico and experimental profiles at a similar ploidy. Plotting in silico profiles of an earlier timepoint would indeed lead to visually more similar patterns in terms of peak intensity, but we believe this could be misleading for the readers.
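      The signal-ratio computation quoted in the response to point 4 can be sketched as follows: average the copy number at each origin location over simulations, then divide by the genome mean. This is a hedged illustration, not the authors' code. It assumes the genome mean is the unweighted mean over origin locations, and it applies no probe averaging or smoothing. The toy input has 2 simulations instead of the 100 used in the manuscript, and its numbers are hypothetical.

```python
from statistics import fmean

def signal_ratio(copies):
    """copies[s][j]: copy number at origin location j in simulation s."""
    n_loci = len(copies[0])
    # Average each origin location over all simulations ...
    per_locus = [fmean(sim[j] for sim in copies) for j in range(n_loci)]
    # ... then normalize by the genome mean (assumed: unweighted mean over locations).
    genome_mean = fmean(per_locus)
    return [m / genome_mean for m in per_locus]

# Two toy simulations over three origin locations (hypothetical numbers).
ratios = signal_ratio([[2, 2, 4], [2, 4, 4]])
```

      Here the per-location means are 2, 3 and 4, so the ratios are 2/3, 1 and 4/3. The last location would appear as a peak above 1, and the first as a valley below 1.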

      5) From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (line 275-277).

      We have now clarified this point. Experimental observations show that under high levels of rereplication, DNA content reaches 16C four to six hours after accumulation of Cdc18 (Nishitani et al., 2000). The times to reach 16C estimated with a fork speed of 0.5 kb/min and with the LF model are therefore closer to these experimental observations.

      6) I find the description of cis- and trans-effects rather confusing. The authors should instead explain what happens in the model: neighboring strong origins can amplify a weak origin, and origins compete for factors. In lines 475-476, for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

      In the manuscript, we initially present what we observe in the Results section and then proceed to provide possible explanations in Discussion. We quote from the Discussion: “Such in trans negative regulation of distant origins could be explained by competition for the same limiting factor: high-level amplification of a given locus recruits high levels of the limiting factor, indirectly inhibiting firing of other genomic regions.” and “[…] in cis elements contribute to amplified copy numbers not only directly by passive re-replication, but also implicitly through increasing the firing activity of their neighbors”. To our understanding, these sentences are in complete agreement with the reviewer’s suggestions. Nonetheless, and to make this even more clear, we have modified the Discussion in our revised manuscript.

      7) Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

      We have clarified this point in the revised manuscript.

      8) One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

      As the reviewer points out, origin efficiencies are one of the important input parameters of the model. However, since the model is stochastic, origin efficiencies do not directly determine the amplification levels at the single-cell level. For example, in Figure 3A and Supplementary Figure S4, we show the outcome of 4 random simulations with identical underlying parameters. These make clear that re-replication can lead to markedly different single-cell amplification levels. Indeed, genome-wide analysis across 100 simulations (Supplementary Figure S5) indicated that at the onset of re-replication, amplification levels are highly unpredictable (again, despite the input parameters being identical).

      In contrast, when amplification profiles are analyzed at the population level (averaging across sets of 100 simulations), the most highly amplified regions appear highly reproducible. We agree with the reviewer that these population-level profiles are strongly affected by the origin efficiencies, but they are not determined solely by them. For example, low-efficiency origins can be highly amplified, and highly efficient origins can be suppressed (see the discussion of in cis and in trans effects), depending on their neighborhood and on system-wide effects. The extent of these effects depends on the fork speed. Sensitivity analysis with respect to different model assumptions or model parameters (see Results, section Sensitivity Analysis, and Supplementary Figure S3) indicated that amplification profiles might appear sharper or flatter, but overall amplification hotspots were highly robust.

      To summarize, in our conclusions (Discussion, section Emerging properties of re-replication) we highlight these properties (stochasticity vs. robustness) and elaborate further on how they emerge during the course of re-replication (onset vs. high re-replication) or depending on the level of analysis (single-cell vs. population level).

      9) It would be helpful if, in Fig. 2 also the origins and their respective efficiencies could be shown to understand to what extent the signal ratio reflects these efficiencies.

      We thank the reviewer for the useful suggestion, which we have incorporated in the revised manuscript.

      10) The methods section should provide more detail.

      We would like to point out that the Supplementary Material, including a full mathematical description of the model, is available on bioRxiv and was already available at the time of the preprint review (https://www.biorxiv.org/content/10.1101/2020.03.30.016576v1.supplementary-material). It has also been uploaded as a separate document to our GitHub page: https://github.com/rapsoman/DNA_Rereplication

      Reviewer #2:

      Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

      1) It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies at 2C and 16C, respectively. However, Figure 4 (experiment) suggests that copy numbers should be only slightly higher under re-replicating conditions than under normal replicating conditions. Additionally, in Figure 2, the simulated data seem to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

      Fission yeast cells undergo robust rereplication and can reach a ploidy of up to 32C; see, for example, (Kiang et al., 2010; Mickle et al., 2007; Nishitani et al., 2000). 16C is therefore a common ploidy for rereplicating fission yeast cells, observed under many experimental conditions. In addition, by manipulating which licensing factors are over-expressed, different levels of ploidy can be achieved experimentally, ranging from 2C (the normal ploidy of a G2 cell, but with uneven replication) to 32C. In Figure 4, we have employed a truncated form of Cdc18 (d55P6-cdc18 (Baum et al., 1998)), which induces medium-level re-replication, as confirmed by FACS analysis in Supplementary Figure S6A. Under these conditions, the vast majority of cells (>90%) undergo re-replication, albeit at medium to low levels. We opted to use this strain to avoid artifacts due to disrupted nuclear morphology under high levels of re-replication. We have now clarified this point in the revised manuscript. We would like to point out that the in silico analysis is not carried out at 16C only but across different ploidies. It is actually a strength of our approach that we can follow the rereplication process as it evolves, at any ploidy, and we have shown that our conclusions are robust throughout. We show plots at the beginning of the process (2C) and towards the end (16C), at the single-cell and at the population level, to facilitate comparison.

      Last, as also discussed in our response to reviewer 1, simulated data appear sharper, with higher peak values than experimental data (Figure 2). This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 neighboring microarray probes and 2 independent experiments. We have clarified this in the revised manuscript.
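The flattening effect of probe averaging and replicate averaging can be illustrated with a toy calculation. This is a minimal sketch that assumes a simple centred 3-probe moving average and an unweighted mean of two replicate profiles; the actual microarray processing in Kiang et al. (2010) may differ, and the profiles below are invented numbers, not data from the manuscript.

```python
def moving_average(profile, window=3):
    """Centred moving average; at the ends only the available neighbours are averaged."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed


def average_replicates(*profiles):
    """Probe-by-probe mean of several equally long profiles."""
    return [sum(values) / len(values) for values in zip(*profiles)]


# Hypothetical copy-number profiles: one sharp amplified peak (10 copies) on a
# background of 1 copy, not perfectly aligned between the two replicates.
replicate_a = [1, 1, 1, 10, 1, 1, 1, 1, 1, 1]
replicate_b = [1, 1, 1, 1, 1, 1, 10, 1, 1, 1]

smoothed_a = moving_average(replicate_a)
combined = average_replicates(smoothed_a, moving_average(replicate_b))
print(max(replicate_a), max(smoothed_a), max(combined))  # 10 4.0 2.5
```

In this toy case the raw peak height of 10 drops to 4 after 3-probe averaging and to 2.5 after averaging two misaligned replicates, while the simulated profile would keep the full height.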

      2) This work currently is agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

      We agree with the reviewer that comparing our predictions with known gene duplication events in S. pombe would be of interest. Unfortunately, to our knowledge, no such dataset exists for fission yeast in the literature. The most comprehensive datasets are those from (Kiang et al., 2010; Mickle et al., 2007), which analyse rereplicating cells and which we have already exploited in our paper. We would like to point out that this manuscript aims to show how rereplication evolves genome-wide. Whether the additional copies generated can lead to gene duplication events is beyond the scope of the present manuscript.

      3) The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

      One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up with the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

      This can be explained by the following equation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

      It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6B, C).

      We thank the reviewer for the positive comments. Indeed, as we elaborate in our Discussion, we believe that the mechanism behind the observed in trans effects is the competition for a factor that exists in a rate-limiting quantity (see also reply to point 6, reviewer 1 above), which is essentially the constant in his/her equation. Though less pronounced, such in-trans effects are also possible in the UF model, and could be due to the total DNA increase being dominated by certain origins, as suggested by the reviewer. We do not suggest anywhere in the manuscript that this inhibition is direct, but rather clearly state that it is an indirect effect.
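Both the reviewer's X + Y = constant argument and the limiting-factor interpretation predict that origins become anti-correlated across cells without any direct inhibition between them. The toy simulation below is a sketch of that idea only, not the authors' LF or UF model. It assumes three origins that start with one copy each, that every existing copy is equally likely to be re-replicated next (a rich-get-richer rule standing in for cis amplification), and that each simulated cell stops once a fixed total number of copies (a hypothetical `budget`) is reached.

```python
import random


def simulate_cells(n_cells=1000, n_origins=3, budget=40, seed=0):
    """Final copy numbers per cell when all origins draw on one fixed budget."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        copies = [1] * n_origins
        while sum(copies) < budget:
            # Choose one existing copy uniformly at random and duplicate its origin.
            origin = rng.choices(range(n_origins), weights=copies)[0]
            copies[origin] += 1
        cells.append(copies)
    return cells


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


cells = simulate_cells()
origin_1 = [c[0] for c in cells]
origin_2 = [c[1] for c in cells]
# Negative: amplifying one origin leaves less of the budget for the others.
print(round(pearson(origin_1, origin_2), 2))
```

With `n_origins=2` every cell satisfies X + Y = budget, so the correlation is exactly -1, which is the reviewer's equation; with more origins the shared budget still forces a negative, but weaker, correlation.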

      Reviewer #3:

      This manuscript by Rapsomaniki et al uses mathematical modeling to study the properties of DNA re-replication. They develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

      The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

      We would like to point out that the overlap between experimental and simulated data is highly significant. Firstly, the Spearman correlation coefficient between simulated and experimental genome-wide profiles is highly statistically significant (p values ranging from 7.3 × 10^-12 to 3.6 × 10^-41 for the three fission yeast chromosomes). Furthermore, 100,000 repetitions of random peak assignment resulted in only a single case in which 10 of 22 peaks overlapped (median: 2 of 22 peaks overlapping), whereas comparing simulated and experimental data resulted in 14 of 22 peaks overlapping. Simulations appear sharper than experimental data; this is expected, as simulated data correspond to the actual number of copies generated, while experimental data are subject to background noise, have a signal-to-noise ratio that is limited by the experimental method employed, and represent averages of 3 probes and 2 independent experiments (see Kiang et al., 2010 and also above). We have modified the manuscript to clarify this point. The reviewer suggests that the model is of limited use because one could trivially generate new experimental data. We would like to point out that existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007). To date, no experimental method can generate single-cell, whole-genome, time-course measurements in re-replicating cells. Our model aims to fill this gap, and for this reason we believe it is useful.
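The random-peak-assignment control described above is a permutation test. This is a minimal sketch under stated assumptions: peaks are represented by genome coordinates, a simulated peak "overlaps" an experimental one when it lies within a fixed `tolerance` (a hypothetical parameter), and random peaks are placed uniformly along a single concatenated genome of about 12.5 Mb. The authors' exact randomization scheme is not specified here, and the peak coordinates are invented.

```python
import random


def count_overlaps(simulated, experimental, tolerance):
    """Number of experimental peaks with at least one simulated peak within tolerance (bp)."""
    return sum(
        any(abs(e - s) <= tolerance for s in simulated) for e in experimental
    )


def permutation_test(simulated, experimental, genome_length, tolerance,
                     n_perm=2000, seed=1):
    """Observed overlap and empirical p-value against uniformly placed random peaks."""
    rng = random.Random(seed)
    observed = count_overlaps(simulated, experimental, tolerance)
    at_least_as_many = 0
    for _ in range(n_perm):
        random_peaks = [rng.randrange(genome_length) for _ in simulated]
        if count_overlaps(random_peaks, experimental, tolerance) >= observed:
            at_least_as_many += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_many + 1) / (n_perm + 1)


# Hypothetical peak positions (bp).
experimental = [100_000 * i for i in range(1, 11)]
simulated = [105_000 + 100_000 * i for i in range(8)] + [5_000_000, 9_000_000]
observed, p = permutation_test(simulated, experimental, 12_500_000, tolerance=20_000)
print(observed, p)
```

The empirical p-value can be no smaller than 1 / (n_perm + 1), which is why many repetitions (100,000 in the response above) are needed to demonstrate very strong significance.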

      Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

      We would like to point out that it is the nature of replication in fission yeast which is stochastic, as experimentally shown (Patel et al., 2006), and defined at the level of single origins, and this is captured by the simulations. Heterogeneity amongst single rereplicating cells has not been previously shown or suggested in any organism, at least to the best of our knowledge. It is in our opinion a highly interesting observation, as it provides a powerful mechanism for generating a plethora of different genotypes within a population, from which phenotypic traits could be selected.

      Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

      Again, the reviewer seems unaware that no experimental method currently exists for analysing the dynamics of re-replication genome-wide at the single-cell level. We also feel obliged to point out that modeling and in silico analysis are, in our opinion, of great value for analysing complex biological processes, even when experimental methods are available. Though we are sure this is not what the reviewer meant, his/her comment appears dismissive of an entire field.

      Fork speed is assumed based on limited data and assumptions regarding re-replication fork speed without empirical data.

      As clearly stated in our manuscript (Results, section Modeling DNA re-replication across a complete genome), many studies have estimated fork speed in yeasts during normal DNA replication, with plausible values ranging from 0.5 kb/min to 3 kb/min (Duzdevich et al., 2015; Heichinger et al., 2006; Raghuraman et al., 2001; Sekedat et al., 2010; Yabuki et al., 2002). In our model, we set the base-case value to the lowest estimate (0.5 kb/min), but also explored the model's sensitivity to this parameter by simulating the model with higher values (1 and 3 kb/min). This analysis indicated that results obtained with 0.5 kb/min were closer to biological reality, an unsurprising finding given that fork speed is expected to be slower in re-replication than in normal replication.

      Overall, in our view, the comments of reviewer 3 are more dismissive than constructive and provide little specific criticism.
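One way to see why the fork-speed parameter matters is to compute how much DNA a single re-firing origin can copy in a given time. This sketch assumes two forks moving away from the origin at constant speed, with no stalling, termination or fork collisions, and uses an arbitrary 20-minute window; it is a back-of-the-envelope illustration, not part of the authors' model.

```python
def amplicon_width_kb(fork_speed_kb_per_min, minutes):
    """Width of DNA copied by one bidirectional origin firing (two forks)."""
    return 2 * fork_speed_kb_per_min * minutes


for speed in (0.5, 1, 3):  # fork speeds explored in the manuscript (kb/min)
    print(f"{speed} kb/min -> {amplicon_width_kb(speed, 20)} kb in 20 min")
```

Under these assumptions a six-fold faster fork copies a six-fold wider region in the same time, so higher speeds would be expected to broaden re-replicated regions and blur neighbouring peaks.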

      References

      Baum, B., Nishitani, H., Yanow, S., and Nurse, P. (1998). Cdc18 transcription and proteolysis couple S phase to passage through mitosis. The EMBO Journal 17, 5689–5698.

      Bechhoefer, J., and Rhind, N. (2012). Replication timing and its emergence from stochastic processes. Trends in Genetics 28, 374–381.

      Duzdevich, D., Warner, M.D., Ticau, S., Ivica, N.A., Bell, S.P., and Greene, E.C. (2015). The dynamics of eukaryotic replication initiation: origin specificity, licensing, and firing at the single-molecule level. Mol. Cell 58, 483–494.

      Heichinger, C., Penkett, C.J., Bähler, J., and Nurse, P. (2006). Genome-wide characterization of fission yeast DNA replication origins. The EMBO Journal 25, 5171–5179.

      Kiang, L., Heichinger, C., Watt, S., Bähler, J., and Nurse, P. (2010). Specific replication origins promote DNA amplification in fission yeast. Journal of Cell Science 123, 3047–3051.

      Lygeros, J., Koutroumpas, K., Dimopoulos, S., Legouras, I., Kouretas, P., Heichinger, C., Nurse, P., and Lygerou, Z. (2008). Stochastic hybrid modeling of DNA replication across a complete genome. Proceedings of the National Academy of Sciences 105, 12295–12300.

      Menzel, J., Tatman, P., and Black, J.C. (2020). Isolation and analysis of rereplicated DNA by Rerep-Seq. Nucleic Acids Res 48, e58–e58.

      Mickle, K.L., Oliva, A., Huberman, J.A., and Leatherwood, J. (2007). Checkpoint effects and telomere amplification during DNA re-replication in fission yeast. BMC Molecular Biology 8, 119.

      Nishitani, H., Lygerou, Z., Nishimoto, T., and Nurse, P. (2000). The Cdt1 protein is required to license DNA for replication in fission yeast. Nature 404, 625–628.

      Patel, P.K., Arcangioli, B., Baker, S.P., Bensimon, A., and Rhind, N. (2006). DNA Replication Origins Fire Stochastically in Fission Yeast. Mol. Biol. Cell 17, 308–316.

      Raghuraman, M.K., Winzeler, E.A., Collingwood, D., Hunt, S., Wodicka, L., Conway, A., Lockhart, D.J., Davis, R.W., Brewer, B.J., and Fangman, W.L. (2001). Replication Dynamics of the Yeast Genome. Science 294, 115–121.

      Rhind, N., Yang, S.C.-H., and Bechhoefer, J. (2010). Reconciling stochastic origin firing with defined replication timing. Chromosome Res 18, 35–43.

      Sekedat, M.D., Fenyö, D., Rogers, R.S., Tackett, A.J., Aitchison, J.D., and Chait, B.T. (2010). GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome. Molecular Systems Biology 6, 353.

      Yabuki, N., Terashima, H., and Kitada, K. (2002). Mapping of early firing origins on a replication profile of budding yeast. Genes to Cells 7, 781–789.

    2. Reviewer #3:

      This manuscript by Rapsomaniki et al uses mathematical modeling to study the properties of DNA re-replication. They develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

      The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

      Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

      Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

      Fork speed is assumed based on limited data and assumptions regarding re-replication fork speed without empirical data.

    3. Reviewer #2:

      Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

      1) It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies for 2C and 16C. However, Figure 4 (experiment) suggests that copy numbers should only be slightly more in re-replicating conditions, compared to normal replicating conditions. Additionally, in Figure 2, the simulated data seems to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

      2) This work currently is agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

      3) The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

      One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up with the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

      This can be explained by the following equation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

      It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6B, C).

    4. Reviewer #1:

      The authors develop and analyse a mathematical model of DNA rereplication in situations where re-firing of origins during replication is not suppressed. Using the experimentally measured position and relative strength of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the developed model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. The fact that increasing the copy number of an origin will facilitate its preferential amplification essentially constitutes a self-reinforcing feedback loop and might be the mechanism that leads to overamplification of some genomic regions. In addition, different regions compete for a limiting factor and thereby repress each other's over-amplification. While the model generates some interesting hypotheses, it is unclear in the current version of the manuscript to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, they do not discuss the model assumptions and their validity, and they do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

      1) It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

      2) One of the main messages of the paper is that the amplification profiles are highly variable across single cells, because that was found in the described simulations. This behavior does however likely depend on specific choices that were made in the simulations, e.g. that the probabilities of the origin state transitions are exponentially distributed. These assumptions should at least be discussed, or better experimentally validated.

      3) The authors aim to test their prediction that rereplication is highly variable across cells. To this end, they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, the DAPI quantification suggests that only a subset of cells actually undergo rereplication under the experimental conditions used (Fig. 4C). Therefore, the analysis should at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of two loci is anti-correlated, as predicted by the model.

      4) What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

      5) From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (line 275-277).

      6) I find the description of cis- and trans-effects rather confusing. The authors should rather explain what happens in the model. Neighboring strong origins can amplify a weak origin and origins compete for factors. In line 475-476 for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

      7) Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

      8) One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

      9) It would be helpful if, in Fig. 2 also the origins and their respective efficiencies could be shown to understand to what extent the signal ratio reflects these efficiencies.

      10) The methods section should provide more detail.
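Point 2 of this review argues that single-cell variability may depend on the choice of exponentially distributed waiting times. A toy illustration, not the authors' model: two origins with hypothetical firing rates (3 and 1 per unit time) and waiting times drawn from a gamma distribution with the same means. Shape 1 gives the exponential case, while larger shapes make waiting times more regular. The quantity tracked is how often the weaker origin fires first.

```python
import random


def fraction_weak_fires_first(rate_strong, rate_weak, shape=1.0,
                              n_cells=20_000, seed=2):
    """Fraction of simulated cells in which the weaker origin fires first.

    Waiting times are gamma-distributed with the given shape and mean
    1 / rate; shape=1 reduces to the exponential distribution.
    """
    rng = random.Random(seed)
    weak_first = 0
    for _ in range(n_cells):
        # gammavariate(alpha, beta) has mean alpha * beta.
        t_strong = rng.gammavariate(shape, 1 / (rate_strong * shape))
        t_weak = rng.gammavariate(shape, 1 / (rate_weak * shape))
        weak_first += t_weak < t_strong
    return weak_first / n_cells


print(fraction_weak_fires_first(3.0, 1.0, shape=1.0))
print(fraction_weak_fires_first(3.0, 1.0, shape=20.0))
```

For two exponential clocks the expected fraction is rate_weak / (rate_strong + rate_weak), here 0.25; with shape 20 the weaker origin almost never fires first, so in this toy setting the degree of cell-to-cell heterogeneity clearly depends on the assumed waiting-time distribution.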

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Tim Formosa (University of Utah School of Medicine) served as the Reviewing Editor.

      Summary:

      A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of assumptions made in developing the model on the outcomes and the moderate correlations between simulations and experimental data. Further explanation of the questions being investigated, the validity and nature of assumptions that were used to develop the simulations, and details explaining how these assumptions were built into the modeling were considered important. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests. Reviewers also noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods, and the impact of the insights gained.

  2. Aug 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to the References

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript Yan et al describe a method to perform imaging based pooled CRISPR screens based on photoactivation followed by selection and sorting of the cells with the desired phenotypes.

      They establish a system in mammalian RPE-1 cells where they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size by a targeted pooled CRISPR screen.

      **Major points:**

      1. This year, Hasle et al. described a very similar approach that they named Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich them by FACS. The Hasle et al. 2020 MSB paper is only mentioned, together with the other methods, in a single sentence of the introduction (ref #19 in this manuscript):

      " Recently, several in situ sequencing15,16 and cell isolation methods17-20 were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hasle et al. paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR-based application, but the applications of the method are obviously not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      We agree with the reviewer that we need to describe more about the Hasle et al. paper (now ref #20 in the revised manuscript) and expand our description of other applications that could be performed with the method. For this purpose, we have made the following changes:

      We have modified the relevant paragraph in the Introduction.

      p.3 the second paragraph

      Recently, an imaging-based method named “visual cell sorting” was described that uses the photo-convertible fluorescent protein Dendra2 to enrich phenotypes optically, enabling pooled genetic screens and transcription profiling (Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020). Here, we developed an analogous approach to execute an imaging-based pooled CRISPR screen using optical enrichment by automated photo-activation of the photo-activatable fluorescent protein, PA-mCherry.

      We have also added the following paragraph in the Discussion.

      p.14 line 1

      In our study, optical enrichment was utilized for pooled CRISPR screens on phenotypes identifiable through microscopy. However, optical enrichment can be used for other purposes, as demonstrated previously (Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020). In a recent study by Hasle et al. (Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020), the process of separating cells by FACS after optical enrichment was termed “visual cell sorting”. This method was used to evaluate hundreds of nuclear localization sequence variants in a pooled format and to identify transcriptional regulatory pathways associated with paclitaxel resistance using single-cell sequencing (Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020), demonstrating the broad applicability and power of this approach beyond CRISPR screening.

      2. While I understand that the authors mean conversion from a dark state to a fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader, since photo-conversion typically refers to a change in color. I would suggest sticking to the term photo-activation here.

      We thank the reviewer for pointing this out and to avoid future confusion, we restricted the usage of photo-conversion to specifically indicate conversion of fluorescence from one color into another: e.g. when talking about the published visual cell sorting paper in which Dendra2 is used as a photo-convertible fluorescent protein. We use photo-activation in reference to the activation of PA-mCherry in our work.

      3. For validation of the hits coming from the nuclear size screen: did the authors have any controls to make sure that the right targets were down-regulated? This might be obvious for some of the targets (e.g. CPC proteins, which are known to induce division errors, display the nuclear fragmentation that the authors also observe), but especially for those that are less known or not known to induce any nuclear size change, it will be important to demonstrate the specificity of the targets.

      To validate hits from the nuclear size screen, we verified successful transduction of the corresponding sgRNA constructs by FACS analysis, but have not confirmed the knockdown. Before final journal publication, we propose to perform RT-qPCR on our 15 gene hits with and without knockdown to measure the percentage of knockdown for each gene.

      In addition, it is not clear from the figure legends and the material and methods if these phenotypes are verified by 3-4 gRNAs they use in the validation. Are the histograms representative of a single experiment with one gRNA or a combination of gRNAs in different experiments? Methods of replication of the data presented in Fig4 is unclear.

      We apologize for the confusion. These phenotypes were verified with pools of 3-4 sgRNAs and the histograms are representative of a single replicate infected with a mixed 3-4 sgRNA pool. We have modified the legend to Figure 5 (original Fig. 4) and the method section to explain this point.
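The proposed RT-qPCR measurement is commonly converted into a percentage knockdown with the comparative Ct (2^-ΔΔCt) method. This is a minimal sketch assuming roughly 100% amplification efficiency for both the target gene and a reference (housekeeping) gene; the Ct values are invented, and the authors did not specify which quantification method they would use.

```python
def percent_knockdown(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Percentage knockdown from Ct values by the comparative Ct (2^-ddCt) method."""
    delta_kd = ct_target_kd - ct_ref_kd          # normalize knockdown sample to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl    # same for the control sample
    delta_delta = delta_kd - delta_ctrl
    relative_expression = 2 ** (-delta_delta)    # target level relative to control
    return (1 - relative_expression) * 100


# Hypothetical: the target amplifies 2 cycles later after knockdown.
print(percent_knockdown(26.0, 18.0, 24.0, 18.0))  # 75.0
```

A delay of one cycle corresponds to halving the transcript level, so a 2-cycle delay leaves 25% of the control level, i.e. 75% knockdown.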

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.

      The description of the experiment and information about the selected sgRNAs has been added in the Method section as follows:

      p.23

      Verification of hits from nuclear size screen

      For each hit in the nuclear size screen, the two sgRNAs with the highest phenotypic score in the screen and the two sgRNAs with the highest score predicted by the CRISPRi-v2 algorithm24 were selected and pooled to generate a mixed pool of 3-4 sgRNAs (detailed information in Supplementary file 8). Cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were transduced with pooled sgRNAs targeting each gene and selected with puromycin for 2 days to prepare for imaging. Cells were then seeded into 96-well glass-bottom imaging dishes. Images were collected the next day and nuclear size was measured using the Auto-PhotoConverter µManager plugin. To focus on successfully transduced cells, BFP was co-expressed from the sgRNA construct and only cells with BFP intensity above a threshold value were included in nuclear size measurements. This BFP threshold was established by comparing the average BFP intensity of cells with and without sgRNA transduction (Fig. S3a).
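The BFP gating step can be sketched as a simple filter applied before computing the median nuclear size. The threshold rule here (the midpoint between the mean BFP intensities of untransduced and transduced reference cells) is a hypothetical choice for illustration, not necessarily how the threshold in Fig. S3a was set, and the measurements are invented.

```python
from statistics import mean, median


def bfp_threshold(untransduced_bfp, transduced_bfp):
    """Hypothetical rule: midpoint between the two populations' mean BFP intensities."""
    return (mean(untransduced_bfp) + mean(transduced_bfp)) / 2


def median_nuclear_size(cells, threshold):
    """Median nuclear area of cells whose BFP intensity exceeds the threshold.

    cells: iterable of (bfp_intensity, nuclear_area) pairs.
    """
    sizes = [area for bfp, area in cells if bfp > threshold]
    if not sizes:
        raise ValueError("no cells pass the BFP threshold")
    return median(sizes)


threshold = bfp_threshold([30, 40, 50], [400, 450, 500])  # -> 245.0
cells = [(50, 100), (400, 180), (450, 200), (30, 90), (500, 220)]
print(median_nuclear_size(cells, threshold))  # 200
```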

      We agree with this important point and have changed the figure legend of Fig. 5c (original Fig. 4c) to just describe the plot:

      c, The ratios between the median levels of nuclear size measured by microscopy and of H2B-mGFP fluorescence or FSC signal measured by FACS after knockdown were plotted separately. TACC3, confirmed to be a control gene, was used for comparison (grey bar).

      The typo has been corrected.

      Reviewer #1 (Significance (Required)):

      I think the idea of performing pooled screens coupled to microscopy is exciting, and this approach definitely has more potential than the Craft-ID approach that the authors also discuss in their manuscript. In addition, the approach described in this manuscript is convincing: although the analysis part will require more work in the future (to adapt the software to recognise different types of phenotypic readouts) to make it accessible to the scientific community, the authors present sufficient evidence that the system can be robust. They also present some clever ideas, such as calculating enrichments with different photo-activation times (2 s vs 100 ms) followed by separation of these populations by FACS.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy, to mark cells of interest using the PA-mCherry photo-activatable fluorescent protein, with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting) and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a µManager protocol and the discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight-hour experiment is impressive, and the demonstrated use of low cell number input for next-generation sequencing appears promising. Overall, the manuscript is well written, the methods are clear, and the claims are supported by the data presented.

      **General comments**

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (eg https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96 well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:

      Seeding density

      Time elapsed (in hours) between cell plating and optical enrichment

      The number of fields of view examined

      The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted

      The total time taken (in hours) to perform imaging and photoconversion

      The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments). The gating protocol is described for the genetic screen but not for the control experiments.

      We agree with the reviewer and apologize for the confusion that arose from our description. We also thank the reviewer for suggesting established methods. However, MAUDE, an analysis method for sorting-based CRISPR screens with multiple expression bins, might not be suitable for our study because 1) the distribution of mCherry fluorescence intensity reflects photo-activation efficiency rather than sgRNA effect, and 2) only one sorting bin is collected for each experimental condition. Our analysis is adapted from an existing method from the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing).

      We agree with the reviewer regarding the other points and have rewritten the following part of the Methods section:

      p. 20

      mIFP proof-of-principle screen, Nuclear size screen, FSC screen and H2B-mGFP screen

      For the mIFP proof-of-principle screen, mIFP positive cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP mIFP-NLS) and mIFP negative cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were separately stably transduced with the “mIFP sgRNA library” (CRISPRa library with 860 elements, see Supplementary file 5) and the “control sgRNA library” (CRISPRa library with 6100 elements, see Supplementary file 6), respectively. For the nuclear size screen, FSC screen and H2B-mGFP screen, cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were stably transduced with the “nuclear size library” (CRISPRi library with 6190 elements, see Supplementary file 7). To ensure that each cell received no more than one sgRNA, BFP was expressed from the sgRNA construct and cells were analyzed by FACS the day after transduction. The experiment proceeded only when 10-15% of the cells were BFP positive. These cells were then enriched by puromycin selection for 3 days (a puromycin resistance gene was expressed from the sgRNA construct) in preparation for imaging. For the FSC and H2B-mGFP screens, cells were then sorted by FACS. Cells before FACS (the unsorted sample for the FSC and H2B-mGFP screens) and the top 10% of cells by either FSC signal (high FSC sample) or GFP fluorescence (high GFP sample) were collected separately and prepared for high throughput sequencing. For the mIFP proof-of-principle screen and the nuclear size screen, cells were then seeded into 96-well glass bottom imaging dishes (Matriplate, Brooks) and imaged starting on the morning of the next day (around 15 hr after plating). Cells were seeded at a series of densities from 5,000 to 25,000 cells/well in increments of 5,000 cells/well, and on the imaging day the dish with cells at around 70% confluency was selected for screening. One imaging plate was imaged per replicate for the mIFP proof-of-principle screen, and 4 imaging plates per replicate were imaged for the nuclear size screen.
      When multiple imaging runs were performed, 2 consecutive runs could be imaged on the same day (a day run and a night run). For each imaging well, 64 (8x8, day run) or 81 (9x9, night run) fields of view were selected, and each field of view was subjected to an individual round of imaging directly followed by photo-activation. Each field of view contained around 200-250 cells, and 60% to 80% of the surface area of each well was covered. Either mIFP positive cells or cells passing the nuclear size filter were identified and photo-activated automatically using the Auto-PhotoConverter µManager plugin. Imaging and photo-activation of a single 96-well imaging dish with around 1.5 million cells took around 8 hr in total; the night run generally took longer because it included more fields of view than the day run. Cells were then harvested by trypsinization and pooled into a single tube for isolation by FACS. Sorting gates were pre-defined using samples photo-activated for different times (e.g. 0 s, 200 ms, 2 s), and detailed gating strategies are described in Supplementary file 1. Sorted samples were used to prepare sequencing samples.
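      The throughput figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions. It takes the reported 64 or 81 fields per well, 200-250 cells per field and an 8 hr run at face value, and it assumes every one of the 96 wells is imaged. It also ignores the fact that the night run took longer than 8 hr. The constant and function names are illustrative only.

```python
# Back-of-envelope throughput check. Inputs are the approximate values quoted
# in the Methods text; imaging all 96 wells is an assumption.
WELLS_PER_PLATE = 96
FIELDS_PER_WELL = {"day": 8 * 8, "night": 9 * 9}  # 64 or 81 fields of view
CELLS_PER_FIELD = (200, 250)                       # reported range
RUN_HOURS = 8.0                                    # ~8 hr per 96-well dish


def cells_per_plate(fields_per_well: int, cells_per_field: int) -> int:
    """Total cells imaged on one plate for a given field grid and density."""
    return WELLS_PER_PLATE * fields_per_well * cells_per_field


def seconds_per_field(fields_per_well: int, hours: float = RUN_HOURS) -> float:
    """Average time per field (imaging, analysis, activation and stage moves)."""
    return hours * 3600 / (WELLS_PER_PLATE * fields_per_well)


if __name__ == "__main__":
    for run, fields in FIELDS_PER_WELL.items():
        low = cells_per_plate(fields, CELLS_PER_FIELD[0])
        high = cells_per_plate(fields, CELLS_PER_FIELD[1])
        print(f"{run} run: {low:,}-{high:,} cells/plate, "
              f"~{seconds_per_field(fields):.1f} s per field")
```

      For the 8x8 grid this gives roughly 1.2-1.5 million cells per plate and a budget of under 5 s per field. That is consistent with the ~1.5 million cells quoted.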

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      To address these, we have added the following sentences:

      p. 4 line 16

      A photo-activatable fluorescent protein was chosen over a photo-convertible fluorescent protein to increase the number of channels available for imaging. PA-mCherry was chosen to leave the better performing green channel open for labeling of other cellular features. Moreover, non-activated PA-mCherry has low background fluorescence in the mCherry channel (Fig. S1b), and it can be activated to different intensities when photo-activated for various amounts of time.

      p. 14 line 10

      Phenotypes of interest should be identifiable under the microscope and generally require fluorescent labeling. Commonly used fluorescence microscopes have four channels for fluorescence imaging with little spectral overlap: blue, green, red and far red. In our study, the red channel was occupied by cell labeling with PA-mCherry and the blue channel was used to estimate sgRNA transduction efficiency. Since sgRNA transduction efficiency can be measured by other approaches, the blue channel could be used together with the remaining two channels to label cellular structures. Deep learning applied to bright field images can be used to predict the localization of fluorescent labels (Ounkomol, C.; Seshamani, S.; Maleckar, M. M.; Collman, F.; Johnson, G. R. 2018), making it possible to use bright field imaging to further expand the phenotypes that can be studied with our technique.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      We thank the reviewer for this remark and revised all figures accordingly. Points and fonts were enlarged, and schematics were simplified or removed.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      We agree with the reviewer and the following adverbs have been removed: “highly” in abstract and page 15; “easily” on pages 4 and 11; “completely” on page 11 and three “only” on page 12.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      We apologize for missing this. The sequencing data have been deposited in GEO (GSE156623) and will be made available before final publication. The following part has been added to address this.

      p. 24

      DATA AND SOFTWARE AVAILABILITY

      The raw and processed data for the high throughput sequencing results have been deposited in the NCBI GEO database with the accession number GSE156623. The plugin Auto-PhotoConverter, developed for the open source microscope control software μManager (Edelstein, A. D.; Tsuchida, M. A.; Amodaj, N.; Pinkard, H.; Vale, R. D.; Stuurman, N. 2014), has been deposited on GitHub (https://github.com/nicost/mnfinder).

      **Specific comments**

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.

      We apologize for our poor terminology. Our original definition of “specificity” is the same as “precision” suggested by the reviewer. To avoid future confusion, we have changed all relevant occurrences of “specificity” into “precision”. The following sentence was modified to clarify the definition:

      p. 5 line 15

      To evaluate the precision (the fraction of called positives that are true positives) of this assay, all cells were collected and analyzed by FACS after image analysis and photo-activation (Fig. 2d and 2e). We calculated precision as the fraction of photo-activated cells (mCherry positive cells) that are true positives (mIFP-mCherry double positive cells) (Fig. 2f).
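      The precision definition above reduces to a ratio of FACS event counts, and so does the recall definition the reviewer proposed. The sketch below uses hypothetical counts, not data from this study, purely to show the calculation. Recall here is computed over all mIFP positive cells in the sample. As the following paragraph notes, that denominator also includes cells the microscope never visited.

```python
def precision(double_positive: int, mcherry_positive: int) -> float:
    """Fraction of photo-activated (mCherry+) cells that are true positives (mIFP+ mCherry+)."""
    if mcherry_positive <= 0:
        raise ValueError("no photo-activated cells were recorded")
    return double_positive / mcherry_positive


def recall(double_positive: int, mifp_positive: int) -> float:
    """Fraction of all true positives (mIFP+) that were photo-activated."""
    if mifp_positive <= 0:
        raise ValueError("no mIFP positive cells were recorded")
    return double_positive / mifp_positive


# Hypothetical FACS event counts, for illustration only.
counts = {"mcherry_pos": 1_000, "double_pos": 800, "mifp_pos": 5_000}
print(f"precision = {precision(counts['double_pos'], counts['mcherry_pos']):.2f}")
print(f"recall    = {recall(counts['double_pos'], counts['mifp_pos']):.2f}")
```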

      Measuring recall is complicated because the microscope is unable to visit all locations in the imaging plate, hence recall will depend on the fraction of cells actually “seen” by the microscope. For the screening strategy employed in the nuclear size screen, recall is not as important as precision, since lower recall rates are compensated for by screening larger cell numbers. We therefore did not attempt to measure recall directly.

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      We agree with the reviewer and have changed the following description:

      p. 5 line 20

      The precision varied with the initial percentage of mIFP positive cells and ranged from 80% to ~100% for initial percentages of mIFP positive cells between 2.3% and 43.7% (Fig. 2f). Precision is expected to fall below 80% when the initial percentage of mIFP positive cells is below 2.3%. Nevertheless, these results indicate that optical enrichment can be used to identify hits with high precision even at relatively low hit rates.
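      A simple model makes explicit why precision drops at low hit rates. It assumes, as a simplification, that each true positive cell is photo-activated with a fixed probability (the true positive rate, TPR). It likewise assumes each negative cell is photo-activated with a fixed probability (the false positive rate, FPR), and that neither rate depends on how common positives are. For an initial positive fraction p, precision is then TPR*p / (TPR*p + FPR*(1 - p)). The TPR and FPR values below are hypothetical, chosen for illustration rather than fitted to Fig. 2f.

```python
def expected_precision(p: float, tpr: float, fpr: float) -> float:
    """Expected precision when a fraction p of cells are true positives.

    Assumes constant per-cell activation probabilities: tpr for positive cells
    and fpr for negative cells (a simplifying, hypothetical model).
    """
    true_pos = tpr * p
    false_pos = fpr * (1.0 - p)
    return true_pos / (true_pos + false_pos)


TPR, FPR = 0.9, 0.005  # hypothetical values for illustration only
for p in (0.437, 0.1, 0.023, 0.01, 0.001):
    print(f"positives {p:6.1%}: expected precision {expected_precision(p, TPR, FPR):.2f}")
```

      False positives scale with the number of negative cells. Under this model, lowering p with TPR and FPR held fixed therefore always lowers precision.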

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      We apologize for our confusing description. To avoid the confusion, we rewrote the paragraph describing the experiment and added a schematic (Fig. 3a) to better describe this experiment. We also simplified the result by just presenting the lower right panel of original Fig. 2b (current Fig. 3b) and moved the other data into supplementary figures (Fig. S2).

      p. 6 line 4

      mIFP negative cells and mIFP positive cells were separately infected with two different CRISPRa sgRNA libraries (6100 sgRNAs for mIFP negative cells; 860 sgRNAs for mIFP positive cells) at a low multiplicity of infection (MOI) to guarantee a single sgRNA per cell. Note that in these experiments, the sgRNAs only function as barcodes to be read out by sequencing, but do not cause phenotypic changes as the cells do not express corresponding CRISPR reagents. These two populations were then mixed at a ratio of 9:1 mIFP negative cells: mIFP positive cells. We again used mIFP expression as our phenotype of interest (outlined in Fig. 3a). Two biological replicates were performed and at least 200-fold coverage of each sgRNA library was guaranteed throughout the screen, including library infection, puromycin selection, imaging/photo-activation and FACS.
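      Two pieces of arithmetic sit behind this design: the choice of a low MOI and the number of cells needed for 200-fold coverage. The sketch below rests on two assumptions for illustration. First, it assumes, as is standard but not stated in the text, that the number of integrations per cell is Poisson distributed. Second, it reads "200-fold coverage of each sgRNA library" as meaning that each sgRNA of both libraries is represented by at least 200 cells in the 9:1 mixture. The function names are hypothetical.

```python
import math
from fractions import Fraction


def moi_from_transduced_fraction(f: float) -> float:
    """Poisson MOI m such that a fraction f of cells receives >= 1 integration."""
    return -math.log(1.0 - f)


def multiply_transduced_share(m: float) -> float:
    """Among transduced cells, the fraction carrying >= 2 sgRNAs (Poisson model)."""
    p0 = math.exp(-m)
    p1 = m * p0
    return (1.0 - p0 - p1) / (1.0 - p0)


def cells_for_coverage(n_sgrnas: int, coverage: int,
                       share_of_pool: Fraction = Fraction(1)) -> int:
    """Minimum cell number so a sub-population at share_of_pool reaches the coverage."""
    # Fraction keeps the division exact, so ceil() is not thrown off by rounding.
    return math.ceil(n_sgrnas * coverage / share_of_pool)


for f in (0.10, 0.15):  # 10-15% BFP positive, as in the Methods
    m = moi_from_transduced_fraction(f)
    print(f"{f:.0%} transduced: MOI ~{m:.3f}, "
          f"{multiply_transduced_share(m):.1%} of transduced cells carry >1 sgRNA")

# 9:1 mixture of mIFP negative (6100 sgRNAs) and mIFP positive (860 sgRNAs) cells.
need_neg = cells_for_coverage(6100, 200, Fraction(9, 10))
need_pos = cells_for_coverage(860, 200, Fraction(1, 10))
print(f"cells needed at each step: {max(need_neg, need_pos):,}")
```

      Under these assumptions, the smaller mIFP positive sub-population sets the minimum cell number at each step (about 1.72 million cells), not the larger library. The Poisson model also implies that a low MOI keeps multiple integrations rare rather than excluding them entirely.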

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      We have clarified our description. There are no clear BFP+ and BFP- populations but instead one continuous population due to the background expression of BFP from the dCas9 construct: dCas9-KRAB-BFP (which is now clearly indicated in the manuscript). On top of the dCas9-KRAB-BFP, another BFP is encoded on the sgRNA construct, which leads to a higher BFP expression level.

      There was no gating in this experiment. The grey dots in the figure represent wild type cells without viral transduction, while the red dots (partially covered by the grey dots) are cells infected with the two negative control sgRNAs. We mistakenly stated in the legend of original Fig. S3 (current Fig. S3a) that these were FACS data; however, the data were acquired by imaging. We apologize for the confusion and thank the reviewer for detecting the issue. We completely rewrote the legend to Fig. S3a (original Fig. S3) to clarify.

      We now include the number of cells analyzed and the number of replicates for the other flow plots and histograms in the manuscript.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      To address the reviewer’s point, we have included the following sentence:

      p. 8 line 23

      We defined nuclear size as the 2D area in square microns measured by H2B-mGFP using an epifluorescence microscope, as determined by automated image analysis (Fig. 4a and Supplementary file 2).

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann Whitney U test with a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. Better would be a randomization test, where a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. Then the actual phenotypic score is compared to the randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and Supplementary File 1.
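      The randomization test the reviewer proposes can be sketched as follows. This is a minimal illustration of the reviewer's suggestion, not the analysis the authors performed. For a gene targeted by k sgRNAs, the null distribution of its mean score is built by repeatedly drawing k sgRNA-level scores at random from the whole library. The two-sided p-value is the add-one corrected fraction of draws whose absolute mean is at least as extreme as the observed one. The function name and the simulated scores are hypothetical.

```python
import random
import statistics


def randomization_p(gene_scores, library_scores, n_perm=10_000, seed=0):
    """Two-sided randomization p-value for the mean phenotypic score of one gene."""
    rng = random.Random(seed)
    k = len(gene_scores)
    observed = abs(statistics.fmean(gene_scores))
    # Count random k-sgRNA "genes" whose |mean| is at least as extreme.
    extreme = sum(
        abs(statistics.fmean(rng.sample(library_scores, k))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical sgRNA-level scores: mostly noise around zero.
noise = random.Random(1)
library = [noise.gauss(0.0, 1.0) for _ in range(6000)]
print(randomization_p([2.5, 3.0, 2.8], library))    # strong, consistent gene
print(randomization_p([0.1, -0.2, 0.05], library))  # null-like gene
```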

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.
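      The reviewer does not name a specific adjustment. One common choice is the Benjamini-Hochberg step-up procedure, which converts a list of per-gene p-values into adjusted p-values (q-values) that control the false discovery rate. The following is a self-contained sketch. In practice, statsmodels' `multipletests(pvals, method="fdr_bh")` provides the same adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values never increase
    # as the raw p-values decrease (and never exceed 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```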

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. It seems like appropriate false-discovery rate control would enable a more rigorous method for choosing a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen applies to the FSC and GFP screens.

      We clarified the description of the statistical analysis of the screen (see new/changed text below). Mann-Whitney p-values for the two replicates were calculated independently. The Mann-Whitney U test was not performed against a randomized set of phenotypic scores, but against the phenotypic scores of the 22 non-targeting control sgRNAs that were part of the library. Because there are only 22 control sgRNAs, the statistical significance of testing genes against these controls is not expected to be very high. (Adding more control sgRNAs would increase the size of the library and reduce the number of genes that can be screened in a given amount of time.) For the same reason, direct approaches such as correction for multiple hypothesis testing are not expected to yield hits. Instead, we calculated a score combining the severity (phenotypic score) and the trustworthiness (Mann-Whitney p-value) of the phenotype, a method previously developed in the Weissman lab at UCSF (https://github.com/mhorlbeck/ScreenProcessing). We thank the reviewer for suggesting false discovery rate control as a better method for choosing a cutoff. We modified our original analysis and now determine the threshold for our score based on a calculated empirical false discovery rate (eFDR). We used this approach to maximize the number of true hits, and relied on a repeat of the screen and follow-up testing of hits to narrow down true hits. We added the following part to the Methods section and added an analysis example to the supplementary files (Supplementary file 9).
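      The following is a minimal sketch of the kind of combined score described here. It assumes the score is the product of the mean phenotypic score and -log10 of a two-sided Mann-Whitney p-value against the non-targeting controls. That is our assumption, not a statement of the exact definition in Supplementary file 9. To keep the sketch self-contained, the p-value uses the normal approximation with a continuity correction and no tie correction. With only 22 controls, an exact test such as `scipy.stats.mannwhitneyu(x, y, method="exact")` would be preferable. All names and values are hypothetical.

```python
import math
import statistics


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U = number of (x, y) pairs in which x is larger; ties count one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u - mu) - 0.5, 0.0) / sigma  # continuity correction
    return math.erfc(z / math.sqrt(2))       # equals 2 * (1 - Phi(z))


def combined_score(gene_scores, control_scores):
    """Hypothetical eta: mean phenotype scaled by -log10 of its Mann-Whitney p-value."""
    p = max(mann_whitney_p(gene_scores, control_scores), 1e-300)  # avoid log10(0)
    return statistics.fmean(gene_scores) * -math.log10(p)


controls = [-0.2 + 0.02 * i for i in range(22)]  # 22 hypothetical non-targeting scores
print(combined_score([1.0, 1.2, 0.9, 1.1, 1.3], controls))       # large, consistent effect
print(combined_score([0.0, 0.02, -0.01, 0.01, 0.03], controls))  # control-like gene
```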

      p. 22

      Bioinformatic analysis of the screen

      Analysis was based on the ScreenProcessing pipeline developed in the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing) (Horlbeck, M. A.; Gilbert, L. A.; Villalta, J. E.; Adamson, B.; Pak, R. A.; Chen, Y.; Fields, A. P.; Park, C. Y.; Corn, J. E.; Kampmann, M.; Weissman, J. S. 2016). The phenotypic score (ε) of each sgRNA was quantified as previously defined (Kampmann, M.; Bassik, M. C.; Weissman, J. S. 2013) (Supplementary file 9). For the mIFP proof-of-principle screen, the phenotypic score of each group was the average score of the two sgRNAs assigned to the group, averaged between two replicates unless otherwise described. For the nuclear size screen, FSC screen and H2B-mGFP screen, genes were scored based on the average phenotypic scores of the sgRNAs targeting them. For the nuclear size screen, phenotypic scores were further averaged across the 4 runs of each replicate. For the nuclear size screen, FSC screen and H2B-mGFP screen, sgRNAs were first clustered by transcription start site (TSS) and scored by the Mann-Whitney U test against the 22 non-targeting control sgRNAs included in the library. Since only 22 control sgRNAs were included, the significance of hits was assessed by comparison with simulated negative controls, which were generated by random assignment of all sgRNAs in the library. These simulated negative controls were scored in the same way as genes. A score η combining the phenotypic score and its significance was calculated for each gene and each simulated negative control. The optimal cut-off for η was determined by calculating an empirical false discovery rate (eFDR) at multiple values of η. The eFDR is the number of simulated negative controls with η above the cut-off (false positives) divided by the number of genes and simulated negative controls with η above the cut-off (all positives).
      The cut-off for η resulting in an eFDR of 0.1% was used to call hits for further analysis (Supplementary file 9). An example analysis is described in detail in Supplementary file 9, and raw counts and phenotypic scores for all four screens are listed in Supplementary files 10 and 11.
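      The eFDR rule in this paragraph can be sketched as below. The scores are made up for illustration, and the function names are hypothetical. We read "optimal" as the lowest cut-off that meets the target eFDR and admits at least one gene; this is an interpretation. Counting uses binary search on sorted scores, so the scan stays fast for libraries with thousands of entries.

```python
from bisect import bisect_right


def count_above(sorted_scores, cutoff):
    """Number of scores strictly greater than cutoff (input must be sorted)."""
    return len(sorted_scores) - bisect_right(sorted_scores, cutoff)


def efdr(cutoff, genes_sorted, nulls_sorted):
    """Simulated negative controls above cutoff / all entries above cutoff."""
    false_pos = count_above(nulls_sorted, cutoff)
    all_pos = false_pos + count_above(genes_sorted, cutoff)
    return false_pos / all_pos if all_pos else 0.0


def choose_cutoff(gene_eta, null_eta, target=0.001):
    """Lowest observed eta cut-off whose eFDR is <= target, or None."""
    genes_sorted, nulls_sorted = sorted(gene_eta), sorted(null_eta)
    for cutoff in sorted(set(gene_eta) | set(null_eta)):
        if (count_above(genes_sorted, cutoff) > 0
                and efdr(cutoff, genes_sorted, nulls_sorted) <= target):
            return cutoff
    return None


genes = [5.0, 4.0, 3.0, 0.5, 0.2]   # hypothetical gene scores
nulls = [0.1, 0.4, 3.5, 0.3]        # hypothetical simulated negative controls
print(choose_cutoff(genes, nulls, target=0.25))  # -> 0.4
```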

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      We agree with the reviewer and have changed the sentence into:

      p. 10 line 17

      These data suggest that a direct measurement utilizing a microscope can provide different information and reveal hits that are inaccessible using other screening approaches.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      We agree with the reviewer and have modified the sentence with a citation:

      p. 12 line 20

      This is significantly faster than in situ methods, which process millions of cells over a period of a few days (Feldman, D.; Singh, A.; Schmid-Burgk, J. L.; Carlson, R. J.; Mezger, A.; Garrity, A. J.; Zhang, F.; Blainey, P. C. 2019).

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      We have added more description in the discussion section:

      p. 13 the second paragraph

      Optical enrichment screening is also possible for phenotypic screens with relatively low hit rates (defined as the fraction of all genes screened that are true hits). The ability of our method to detect hits at low hit rates depends on multiple factors, including: 1) the penetrance of the phenotype; 2) the effect of the phenotype on cellular fitness; 3) the accuracy of detection and photo-activation of the phenotype; and 4) limitations imposed by FACS recovery and by sequencing sample preparation from low cell numbers. The first three factors vary with the phenotype of interest. We optimized the genomic DNA preparation protocol (Methods) and are now able to process sequencing samples from a few thousand cells, enabling screens of phenotypes with low hit rates. In our nuclear size screen, more than 1.5 million cells were analyzed during each run, with 2000-4000 cells recovered after FACS sorting. The hit rate of this screen was 2.76%, similar to optical CRISPR screens performed in an arrayed format (de Groot, R.; Luthi, J.; Lindsay, H.; Holtackers, R.; Pelkmans, L. 2018). This demonstrates that our approach can be applied to phenotypes with low hit rates.
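      The FACS recovery numbers quoted here determine how many sorted cells can represent each hit. The following rough sampling sketch rests on two hypothetical assumptions. It assumes that sorted cells carrying hit sgRNAs are spread evenly across a given number of hit sgRNAs, and that sampling is Poisson. The number of hit sgRNAs and the fraction of sorted cells that are true hits are illustrative inputs, not values from the screen.

```python
import math


def recovered_fraction(imaged_cells: int, sorted_cells: int) -> float:
    """Fraction of imaged cells that end up in the sorted sample."""
    return sorted_cells / imaged_cells


def miss_probability(sorted_cells: int, hit_share: float, n_hit_sgrnas: int) -> float:
    """Poisson probability that a given hit sgRNA is absent from the sorted sample.

    hit_share: fraction of sorted cells that carry any hit sgRNA (hypothetical).
    n_hit_sgrnas: number of hit sgRNAs sharing those cells (hypothetical).
    """
    expected_cells = sorted_cells * hit_share / n_hit_sgrnas
    return math.exp(-expected_cells)


for sorted_cells in (2000, 4000):  # range reported for the nuclear size screen
    frac = recovered_fraction(1_500_000, sorted_cells)
    miss = miss_probability(sorted_cells, hit_share=0.5, n_hit_sgrnas=100)
    print(f"{sorted_cells} sorted: {frac:.2%} of imaged cells, "
          f"P(hit sgRNA absent) ~ {miss:.1e}")
```

      The expected count per hit sgRNA falls with fewer sorted cells, more hit sgRNAs or lower precision. As it falls, dropouts become likely. This is one way to see how factors 3) and 4) constrain low hit rate screens.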

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      The relevant description has been moved to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      To address this point, we have added the following sentences in legend of Fig. 5:

      The cell population is heterogeneous due to inefficient knockdown, incomplete puromycin selection, and incomplete penetrance of the phenotype. A BFP was expressed from the same sgRNA construct. Only cells with high BFP intensity, indicating successful sgRNA transduction, were included in the data analysis, as described in Methods.

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.

      We applied the Kolmogorov-Smirnov test and the corresponding sentence was changed into:

      p. 10 last line

      *14 out of 15 hits were confirmed to be real hits (Kolmogorov-Smirnov test two tailed p-value
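      The two-sample Kolmogorov-Smirnov comparison used for this validation can be sketched in pure Python. The statistic D is the largest vertical gap between the two empirical distribution functions. The p-value below uses the asymptotic series with the small-sample correction given in Numerical Recipes, so it is an approximation. In practice, `scipy.stats.ks_2samp` provides both exact and asymptotic variants. The nuclear areas below are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Maximum absolute difference between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in set(xs) | set(ys):  # the ECDF gap can only peak at observed values
        gap = abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        d = max(d, gap)
    return d


def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic two-sided p-value for the two-sample KS statistic d."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series is unreliable here and the p-value is ~1
        return 1.0
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))


control = [100 + i for i in range(50)]     # hypothetical nuclear areas (um^2)
knockdown = [130 + i for i in range(50)]
d = ks_statistic(control, knockdown)
print(d, ks_pvalue(d, len(control), len(knockdown)))
```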

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      The figures were simplified to just show the example images.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be an easier to spot color.

      We agree with the reviewer and changed the color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      We agree with the reviewer’s point; “most” has been changed to “some”.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      The bars represent the ratio of the median level of H2B-mGFP intensity (the axis is now labeled with "GFP" rather than "FITC", the colloquial name for the channel used on the FACS machine) measured by FACS and the median nuclear size of the same population of cells measured by microscopy. We plan to perform additional experiments to measure DNA content using a DNA dye in the same cell by microscopy so that we will be able to correlate these on a cell by cell basis. Data will be added before final publication.
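      As described in this response, the quantity plotted in Fig. 4c is a ratio of two population medians measured on different instruments. The minimal sketch below uses hypothetical values. It assumes that "normalized" means dividing each population's ratio by that of the control population; that normalization is an assumption, since it is not specified here.

```python
import statistics


def gfp_per_area(gfp_facs, nuclear_area_um2):
    """Median H2B-mGFP intensity (FACS) divided by median nuclear area (microscopy)."""
    return statistics.median(gfp_facs) / statistics.median(nuclear_area_um2)


def normalized_ratio(sample, control):
    """Ratio for a knockdown population relative to the control population."""
    return gfp_per_area(*sample) / gfp_per_area(*control)


# Hypothetical (GFP intensities, nuclear areas) for two populations.
control = ([100, 110, 90], [150, 160, 140])
knockdown = ([200, 220, 180], [300, 310, 290])
print(normalized_ratio(knockdown, control))  # ~1.0: ratio unchanged
```

      The two medians come from different cells. A normalized ratio near 1 therefore shows only that the population medians scale together, which is why a per-cell DNA stain would give a more direct comparison.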

      Reviewer #2 (Significance (Required)):

      I don't generally comment on significance in reviews. Since ReviewCommons is specifically asking, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study reports a rapid and high-throughput CRISPR-based phenotypic screening approach that consists of selecting cells with phenotypes of interest, labeling them by photo-conversion and isolating them by FACS. The idea of the method is interesting in principle (and has been around). The key advantage is that it is relatively simple and accessible to many groups, as it does not require robotics. However, the manuscript is so badly written and hard to follow that it is difficult to judge the technology, to really understand how the experiments were done and whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practice (GSP) has been followed, as the description of the experiments is sometimes totally lacking. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens and in all likelihood can miss hits that affect growth. It also cannot capture as many phenotypic classes as one would like from high-content screens, and its computational and experimental workflow is more complicated. It is puzzling that the authors don't even compare the results with arrayed screens, which are of course the current gold standard.

      We do not in any way claim that the presented method replaces arrayed screens. However, most current sgRNA libraries are pooled libraries, and the few available arrayed sgRNA libraries are expensive and difficult to maintain, hence our methods to screen pooled sgRNA libraries are timely and useful. Comparisons with arrayed screens are unwarranted as no claims are made with respect to arrayed screens.

      We have clarified the manuscript in many places, and hope it is now readable and better understandable by more readers with diverse backgrounds.

      **Specific points:**

      The specificity test (Fig 1) does not make sense as it is described. If the authors spike in a certain percentage of cells that can be photoconverted, there will be three classes when analysing the outcome: mIFP positive, mIFP/mCherry positive and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also, the formula used is questionable, or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. Much more challenging is to apply it to a multi-parametric phenotype. Again, this is now the gold standard.

      We used the term specificity inadvertently and should have used precision, as also pointed out by Referee 2. This has been corrected in the current manuscript. We picked the mIFP phenotype as this was a proof of principle screen to clarify the performance of our screening approach and needed a phenotype that can be measured both by microscopy and FACS. We demonstrate that multi-parametric read-outs are possible, but do not think that the first demonstration of new technology needs such an application.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8 hr. This means that the genome would require ~60 hr, which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positives, which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      We improved the description of this experiment. To clarify, we used mIFP in a proof-of-concept screen to validate whether sgRNAs delivered to mIFP positive cells can be distinguished from those delivered to mIFP negative cells. No phenotypic signature other than the mIFP signal is used (as described in the text). As is customary in pooled screens, a primary comparison was made between the positive (optically selected) cells and the complete population. To improve clarity, we now describe the concept of pooled sgRNA screens in more detail, since unfamiliarity with this concept may have made this section harder to follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      There are no phenotypes induced in the mIFP population (as now explicitly explained in the text). The mIFP population is isolated using optical enrichment, and we test our ability to discriminate the sgRNAs present in the enriched population. It is unsurprising that comparing against the negatively selected population (which is not possible in most other pooled screens) performs significantly better than comparing against the total population (as is customary in pooled screens).

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      There are no arrayed screens performed in our study.

      Reviewer #3 (Significance (Required)):

      Overall, there is insufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This study reports a rapid, high-throughput CRISPR-based phenotypic screening approach that consists of selecting cells with phenotypes of interest, labeling them by photo-conversion, and isolating them by FACS. The idea of the method is interesting in principle (although it has been around for some time). The key advantage is that it is relatively simple and accessible to many groups, as it does not require robotics. However, the manuscript is so badly written and hard to follow that it is difficult to judge the technology, to really understand how the experiments were done, and to assess whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practice (GSP) has been followed, as the description of the experiments is sometimes entirely lacking. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens, in all likelihood can miss hits that affect growth, cannot capture as many phenotypic classes as one would like from high-content screens, and its computational and experimental workflow is more complicated. It is puzzling that the authors do not even compare the results with arrayed screens, which are of course the current gold standard.

      Specific points:

      The specificity test (Fig 1) does not make sense as described. If the authors spike in a certain percentage of cells that can be photoconverted, there will be three classes when analyzing the outcome: mIFP positive, mIFP/mCherry positive, and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also, the formula used is questionable, or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. It is much more challenging to apply the method to a multi-parametric phenotype. Again, this is now the gold standard.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8hrs. This means for the genome it would require ~60h which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positive which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      Significance

      Overall, there is insufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy, to mark cells of interest using the photo-activatable fluorescent protein PA-mCherry, with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting) and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a µManager protocol and the discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight-hour experiment is impressive, and the demonstrated use of low cell number input for next-generation sequencing appears promising. Overall, the manuscript is well written, the methods clear and the claims supported by the data presented.

      General comments

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (e.g. https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96 well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:
      • Seeding density
      • Time elapsed (in hours) between cell plating and optical enrichment
      • The number of fields of view examined
      • The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted
      • The total time taken (in hours) to perform imaging and photoconversion
      • The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments). The gating protocol is described for the genetic screen but not for the control experiments.

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      Specific comments

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.
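      To make the two suggested metrics concrete, here is a minimal Python sketch. It assumes the called positives (e.g. photo-activated and sorted cells) and the ground-truth positives (e.g. spiked-in mIFP+ cells) are available as sets of identifiers; the function name and the toy numbers are hypothetical and are not taken from the manuscript.

```python
def precision_recall(called_positive, truly_positive):
    """Precision and recall of a set of called hits against a ground-truth set.

    called_positive: IDs the screen called positive (hypothetical input)
    truly_positive:  IDs known to be positive, e.g. spiked-in cells (hypothetical input)
    """
    called = set(called_positive)
    truth = set(truly_positive)
    true_pos = len(called & truth)
    # Precision: fraction of called positives that are true positives.
    precision = true_pos / len(called) if called else 0.0
    # Recall: fraction of all true positives that were called positive.
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical spike-in: 10 true positives; 12 cells called positive, 8 of them correct.
truth = {f"pos{i}" for i in range(10)}
called = {f"pos{i}" for i in range(8)} | {f"neg{i}" for i in range(4)}
p, r = precision_recall(called, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.80
```

      Unlike specificity, both quantities are computed only from the positive class, so they remain informative when true hits make up a small fraction of the population.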

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann-Whitney U test with a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. Better would be a randomization test, where a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. Then the actual phenotypic score is compared to the randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and Supplementary File 1.
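      A minimal Python sketch of the randomization test described above, for illustration only. It assumes a gene's score is the mean of its sgRNA-level phenotypic scores and that the null distribution is built by repeatedly drawing the same number of sgRNA scores from a chosen pool (all sgRNAs or only negative controls); the function name, the one-sided direction (larger scores), and the toy data are hypothetical, not the authors' pipeline.

```python
import random
import statistics


def randomization_pvalue(gene_scores, pool_scores, n_perm=10_000, seed=0):
    """One-sided randomization test for a gene-level phenotypic score.

    gene_scores: sgRNA-level scores for the sgRNAs targeting one gene
    pool_scores: sgRNA-level scores to resample from (which pool to use is a design choice)
    Returns the observed gene score (mean of its sgRNAs) and an empirical p-value
    for observing a mean at least this large by chance.
    """
    rng = random.Random(seed)
    k = len(gene_scores)
    observed = statistics.mean(gene_scores)
    exceed = 0
    for _ in range(n_perm):
        # Draw k sgRNA scores at random to mimic a "gene" with no real effect.
        null_mean = statistics.mean(rng.sample(pool_scores, k))
        if null_mean >= observed:
            exceed += 1
    # The +1 terms avoid reporting p = 0 from a finite number of permutations.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 500 negative-control sgRNA scores and one gene with 4 sgRNAs.
rng = random.Random(1)
controls = [rng.gauss(0.0, 1.0) for _ in range(500)]
strong_gene = [3.1, 2.8, 3.5, 2.9]
obs, p = randomization_pvalue(strong_gene, controls)
print(f"gene score={obs:.3f}, p={p:.2g}")
```

      Because the null is rebuilt for each gene with that gene's own number of sgRNAs, genes targeted by different numbers of guides are compared on an equal footing.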

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.
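      One standard correction for this is the Benjamini-Hochberg procedure, which controls the false-discovery rate. The following is a minimal Python sketch with a hypothetical function name and toy p-values; established implementations exist in common statistics packages, and this sketch is not the authors' analysis.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Indices sorted by ascending p-value; rank 1 is the smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical gene-level p-values
qvals = benjamini_hochberg(pvals)
hits = [i for i, qv in enumerate(qvals) if qv <= 0.05]  # 5% FDR threshold
print(qvals, hits)
```

      With an FDR threshold such as 5%, the hit cutoff follows from the data and a stated error rate rather than from a fixed percentile of the control groups.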

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. Appropriate false-discovery-rate control would enable a more rigorous method for choosing a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen apply to the FSC and GFP screens.

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.
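      For illustration, a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov test, using the asymptotic Kolmogorov distribution with a common small-sample correction. The function name and the toy nuclear areas are hypothetical; in practice `scipy.stats.ks_2samp` provides a tested implementation.

```python
import math
from bisect import bisect_right


def ks_2samp(x, y):
    """Two-sample KS statistic D and an asymptotic p-value (sketch)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # The largest gap between the two empirical CDFs occurs at an observed value.
    for v in xs + ys:
        gap = abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        d = max(d, gap)
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The tail probability is 1 to within ~1e-12 here; the series converges slowly.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical nuclear areas (arbitrary units) for control and knockdown cells.
control = [98, 102, 105, 99, 101, 97, 103, 100, 104, 96]
knockdown = [110, 125, 99, 131, 118, 104, 140, 122, 115, 128]
print(ks_2samp(control, knockdown))
```

      The KS statistic reports the largest difference between cumulative distributions, so it can detect a shift in only a subpopulation of cells, which suits validation experiments in which many cells remain control-like.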

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be an easier to spot color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      Significance

      I don't generally comment on significance in reviews. Since Review Commons is specifically asking, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Yan et al. describe a method to perform imaging-based pooled CRISPR screens based on photoactivation followed by selection and sorting of the cells with the desired phenotypes. They establish a system in mammalian RPE-1 cells in which they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells, and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size in a targeted pooled CRISPR screen.

      Major points:

      1. This year, Hasle et al. described a very similar approach that they call Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich those by FACS. The Hasle et al. 2020 MSB paper is mentioned only once, together with the other methods, in a single sentence in the introduction (ref #19 in this manuscript):

      " Recently, several in situ sequencing15,16 and cell isolation methods17-20 were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hasle et al. paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR-based application, but obviously the applications of the method are not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      2. While I understand that the authors mean conversion from the dark state to the fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader, since photo-conversion typically refers to a change in color. I would suggest sticking to the term photo-activation here.
      3. For validation of the hits from the nuclear size screen: did the authors include controls confirming that the intended targets were down-regulated? This might be obvious for some of the targets (e.g. CPC proteins, which are known to induce division errors, display the nuclear fragmentation that the authors also observe), but especially for targets that are less well known or not known to induce any nuclear size change, it will be important to demonstrate the specificity of the knockdown. In addition, it is not clear from the figure legends or the materials and methods whether these phenotypes were verified with the 3-4 gRNAs used in the validation. Are the histograms representative of a single experiment with one gRNA, or of a combination of gRNAs across different experiments? How the data presented in Fig. 4 were replicated is unclear.

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.
      2. The legend of Figure 4c does not describe what the plot shows. Instead, it gives the authors' interpretation of the data.
      3. There is a typo in Figure S1b.

      Significance

      I think the idea of performing pooled screens coupled to microscopy is exciting, and this approach definitely has more potential than the Craft-ID approach that the authors also discuss in their manuscript. In addition, the approach described in this manuscript is convincing: although the analysis will require more work in the future (adapting the software to recognize different types of phenotypic readouts) to make it accessible to the scientific community, the authors present sufficient evidence that the system can be robust. They also present some clever ideas, such as calculating enrichments with different photo-activation times (2 s vs 100 ms) followed by separation of these populations by FACS.

    1. Reviewer #3:

      This paper by Thacker et al. describes the use of lung-on-a-chip microfluidic devices to study early interactions during acute M. tuberculosis infection under conditions chosen to mimic the alveolar environment in vivo. The authors use time-lapse microscopy to study host-Mtb interactions in macrophages and alveolar epithelial cells, the role of the Mtb type VII secretion system, and the impact of surfactant on Mtb infection. This study suggests that organ-on-a-chip systems might be able to reproduce host-microbe physiology during infection, which is difficult to reproduce ex vivo using single cells, air-liquid interfaces, organoids, or organ explants. This is an exciting approach with the potential to expand the ability to study host-pathogen interactions, but there are some limitations that dampen my enthusiasm.

      Major concerns:

      While I recognize that it is challenging to use live cell imaging with colocalization markers, much of the data in the paper, such as the comparisons between AECs and macrophages, between mutant and WT Mtb strains, or the role of surfactant, rests on the ability to determine the precise localization of bacteria. However, neither AECs nor macrophages are identified with sufficient resolution to give confidence that the Mtb are associated with those cells specifically and, more importantly, that the bacteria are growing intracellularly rather than extracellularly. The authors show multiple bacterial microcolonies that grow in size over time, but whether these are inside or outside cells, and whether the cells are AECs or macrophages, isn't explicitly specified. Many of the images are of such low resolution that only tiny dots of bacteria are visible. To the authors' credit, the quantitative and statistical analysis is very rigorous; however, better evidence on the issues raised above would increase confidence in the results. This point is illustrated in detail by the following:

      Lines 60-63: "Inoculation of the LoC with between 200 and 800 Mtb bacilli led to infection of both macrophages (white boxes in Fig. 1M, P, zooms in Fig. 1O, R) and AECs (yellow boxes in Fig. 1M, P, zooms in Fig. 1N, Q) under both NS (Fig.1M-O) and DS (Fig. 1P-R) conditions." Macrophages can be identified in the images by their GFP expression (though the cells themselves aren't colocalized), but the same cannot be said of AECs. The yellow boxes could represent AECs or spaces on the chip with no cells at all. Furthermore, the 2D images shown in Figure 1 do not necessarily represent infected cells, and the possibility that Mtb outside the cells is being visualized should be considered. Thus, higher-resolution images, with clear colocalization and z-stacks, would increase confidence in the results.

      The data arguing for attenuation of the Esx-1 mutant Mtb in AECs and macrophages are not strong, and the authors do not actually make a direct statistical comparison between the appropriate groups (i.e. AEC NS WT vs Esx-1, or Mac NS WT vs Esx-1). For example, the mean/median growth rate of WT Mtb in macrophages appears to be ~0.25 h⁻¹, which is roughly the same as for Esx-1 mutant Mtb in the same cells. There may be a difference under DS conditions, but since the comparisons aren't made directly, it is impossible to know.

    2. Reviewer #2:

      The manuscript by Thacker et al., entitled "A lung-on-chip model reveals an essential role for alveolar epithelial cells in controlling bacterial growth during early tuberculosis", is an interesting study describing a new in vitro model to determine the early events of Mycobacterium tuberculosis infection. This model is important and novel; however, this study is descriptive, and some of the findings (e.g., attenuated growth of M. tuberculosis in macrophages and alveolar epithelial cells after exposure to surfactant, changes in the M. tuberculosis cell wall after exposure to surfactant, or that exposure to surfactant does not alter the extracellular viability of M. tuberculosis) have been reported by others using other in vitro models. The rationale for using the attenuated ESX-1 mutant is unclear in this study, as is the concept that exposure to surfactant may change the attenuation of this strain. The compositions of mouse and human surfactant are also quite different, so results should be extrapolated with caution.

      Major concerns:

      1) Results provided in Figures 1, 2 and Fig. 3 supplement 1 are confusing, and readers need to guess what they are looking at, especially in Figure 1 M-R. As this is an important model, it would be appropriate to provide detailed, better images showing well-defined cells, and to quantify the findings in tables (e.g. number of type I and type II alveolar epithelial cells, number of macrophages, number of endothelial cells, bacteria per cell, etc.). In Fig. 3 supplement 1, one needs to guess what is intracellular or extracellular within the studied system.

      2) The definition of normal surfactant (NS) vs. deficient surfactant (DS), as used, is confusing. Type II alveolar epithelial cells (AT-IIs) become type I cells (AT-Is) over time in in vitro cultures (in 5 to 7 days) and thus stop secreting surfactant. The authors found that after 6-11 passages AT-IIs stopped producing surfactant but also lost their cellular characteristics, as well as the expected characteristics of AT-Is. This needs to be studied in further detail to ensure that these cells are not an artifact of repeated passaging in vitro. The authors need to use several AT-II and AT-I markers to be certain that the DS cell monolayers are indeed still ATs. Surfactant protein C, although used as a marker for AT-IIs, is a soluble protein that has been shown to interact with many cells within a cellular system. A correlation between SFTPC and AQP5 expression over time is also necessary, as it would indicate the differentiation of AT-IIs into AT-Is, a key feature of the role of AT-IIs as progenitors of AT-Is.

      3) The authors did not consider that M. tuberculosis can form micro-colonies on the surface of alveolar epithelial cells, and thus the intracellular growth that they report could be extracellular growth. After infection, did the authors treat the system with an antibiotic to kill extracellular M. tuberculosis bacilli attached to the alveolar epithelial cell surface? In addition, the concept of M. tuberculosis micro-colonies growing inside cells needs to be better explained. Are these bacterial clumps? How do the authors distinguish bacteria that are not growing from those that are dead?

      4) If I understand the described method correctly, Curosurf (poractant alfa) itself is not stained. The authors added a commercial labeled phosphatidylcholine (PC) to the Curosurf. This labeled PC may or may not interact with Curosurf components, but it evidently forms micelles. What is quantified is the interaction of the labeled PC with M. tuberculosis. Moreover, the artificial addition of this phospholipid (at 10%) changes the original composition of Curosurf, and this may have physiological implications. The authors need to confirm whether the PC added was indeed DPPC. The authors also need to find a better way to demonstrate that Curosurf components opsonize M. tuberculosis bacilli. In addition, why did the authors use 1% Curosurf for their experiments? Is there a dose-dependent effect? Why did the authors not use Survanta, Infasurf, or mouse surfactant?

      5) The in vivo simulation of infection uses growth rates randomly drawn from the kernel density estimates for the respective populations. In this graph, it is very important to distinguish bacteria with high growth rates from those with intermediate and low growth rates (at the 99th, 75th, 50th, 25th, and 1st percentiles) and to assess how each of these is projected to behave in vivo. As presented, the simulation is not very informative about the impact of NS ATs vs. DS ATs on M. tuberculosis infectivity in this model system.
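      To illustrate the kind of percentile breakdown requested, here is a minimal Python sketch. It assumes a Gaussian kernel density estimate (sampling by picking an observed rate at random and adding normal noise with standard deviation equal to the bandwidth), nearest-rank percentiles, and simple exponential growth; the function names, the bandwidth, the 72 h horizon, and the growth rates are all hypothetical and are not the authors' simulation.

```python
import math
import random


def sample_from_kde(data, bandwidth, n, rng):
    """Draw n values from a Gaussian KDE of `data` (pick a point, add kernel noise)."""
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(n)]


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a list of values."""
    s = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(s)))
    return s[rank - 1]


def project_growth(n0, rate_per_h, hours):
    """Bacterial count after `hours` of exponential growth at `rate_per_h` (assumed model)."""
    return n0 * math.exp(rate_per_h * hours)


rng = random.Random(42)
measured_rates = [0.010, 0.018, 0.022, 0.025, 0.031, 0.040, 0.052]  # hypothetical, per hour
sampled = sample_from_kde(measured_rates, bandwidth=0.004, n=10_000, rng=rng)
for q in (1, 25, 50, 75, 99):
    r = percentile(sampled, q)
    print(f"{q:>2}th percentile: rate={r:.3f}/h, 10 bacilli -> {project_growth(10, r, 72):.0f} after 72 h")
```

      Reporting projections per percentile in this way would show whether differences between NS and DS conditions are driven by the fast-growing tail or by the bulk of the population.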

      6) Similar alterations of the M. tuberculosis cell wall, and release of cell wall components into the milieu, upon exposure to physiological concentrations of human lung surfactant have already been described. The same applies to the slower replication rate in ATs (and intracellular killing in macrophages) after M. tuberculosis exposure to human lung surfactant. Although the systems differ, the authors need to contrast their findings with these reported ones in their discussion. In addition, it is not clear how many times this experiment was performed. Statistics are mentioned in the figure legends, but no statistics are shown in the figure.

    3. Reviewer #1:

      1) What quality control is performed in each experiment to determine the ratio of type I to type II AECs in each chip? This is of particular importance because the authors do not show any images in which both type I and type II AECs are stained in the same chip. Do the authors have images stained for both cell types to illustrate the composition of each chip? After Figure 1, what staining is done to confirm that the DS cells show decreased proSPC expression in each experiment?

      2) The authors focus on the difference in surfactant gene expression between newly isolated AECs (NS) and in vitro passaged AECs (DS), but they also observe that aqp5 is downregulated. In fact, the data support that the cells are simply de-differentiating during passage in culture, which will have multiple effects on the cells, not just on surfactant production. This should be commented on and discussed. After loss of those markers, how do the authors confirm that they still have type I and type II AECs in their cultures? Is there microscopy data with other markers that are retained in the AECs? The add-back experiments with Curosurf support that surfactant can contribute to bacterial control, but this provides only partial complementation, and the evidence for de-differentiation implies that other pathways are at play.

      3) One of the biggest concerns is that the authors never stain for type I or type II AECs after infection, yet conclude that the bacteria are within type II cells based on the absence of macrophage staining. However, the bacteria may not even be in a cell, or the AECs could be dying during infection. On a related note, no data are presented showing that type I cells are not infected with Mtb in the lung-on-chip system.

      4) The authors state that their data with the Esx1 mutant "demonstrates that ESX-1 secretion is necessary for rapid intracellular growth in the absence of surfactant, consistent with the hypothesis that surfactant may attenuate Mtb growth by depleting ESX-1 components on the bacterial cell surface". This seems like quite a jump in interpretation, since the Esx1 mutant is likely attenuated for many reasons, and this attenuation is dominant over any effect that surfactant has. The authors also show that PDIM levels do not differ in the presence or absence of surfactant, and this is an Esx1-dependent lipid.

      5) What is the purpose of including the icl1/icl2 mutant? This experiment is not included in the data quantification.

    1. Reviewer #3:

      The study by Taverna et al. uses NGN2-induction in human, chimpanzee, and bonobo pluripotent stem cells to attempt to decouple the process of neuronal maturation from the cell cycle in order to study species-specific differences in neuronal maturation. Using single cell RNA sequencing, analysis of neuronal morphology, and electrophysiological recordings, the study argues that neuronal maturation is delayed in human compared to chimpanzee and bonobo among a heterogeneous class of sensory neurons and that this delay is cell-intrinsic. However, the current data are incompletely analyzed and do not provide strong support for this conclusion.

      Major comments:

      The dramatic differences in cell type composition of the induced neurons across species, revealed by single cell sequencing in Figure 2A, pose significant problems for the interpretation of the rest of the results. Specifically, if the chimpanzee cells are biased to making different sensory neuron cell types than the human cells, then differences in maturation rates between cell types rather than between species could drive the results. The authors must take into account the influence of cell type, individual, and species in order to support their claims of species differences.

      First, the number of individuals (only one chimpanzee individual) used for single-cell analysis is inadequate. There could be individual differences in timing and neuronal composition between lines that are independent of species and are not accounted for. At least 3-5 individuals per species should be used to enable statistical analysis of species differences. Ideally, the same lines should be used for single-cell analysis and morphological/physiological analyses. Staining for the cluster markers discovered from the current single cell analysis could also be applied to the remaining individuals to understand whether induced neurons have a similar composition across all the individuals from the three species.

      If the single chimpanzee individual shown in the single cell data is really representative of the three chimpanzee lines used elsewhere in the manuscript, the dramatic differences in neuronal types across species must be taken into account in subsequent analyses. For example, gene expression in Figure 3 could be analyzed on a cluster-by-cluster basis rather than by grouping all neuronal clusters together. As shown, the differences across species could simply be due to cell-type-specific differences (for example, cluster 4 appears to be made up entirely of chimpanzee neurons, while cluster 5 has more equal species representation). For physiology and morphology experiments, post hoc marker staining could ensure that neurons of the same type are compared across species; even if it cannot be registered to individual cells, it could still reveal similarities and differences in composition between plates.

      Does NGN2 induction produce a valid cell type? The authors should compare their expression data to previous work using NGN2 induction (Zhang et al., 2013) as well as to data from mouse and human tissue samples. It would be helpful to clarify whether the differences from previous work (i.e. induction of sensory neurons rather than cortical neurons) are due to incomplete characterization previously or to a different outcome here. Most importantly, it would be helpful to identify more clearly the endogenous cell types modeled in this data, perhaps by integration with primary sensory and cortical neuron single cell datasets.

      Do the BRN2- and CUX1-positive cells co-express other cortical markers, such as FOXG1 and EMX2, which would support the statement that some of these cells may be cortical? Or are these genes also expressed in some sensory neurons, or are these simply cells of mixed identity that lack in vivo counterparts?

      Please provide more detail about the NGN2 expression system as utilized across species.

      For each species, was the corresponding NGN2 gene used? If so, are there sequence differences between species that could influence differentiation?

      Is the time course of NGN2 expression the same across species?

      What are the dynamics of NGN2 induction in this system compared to normal differentiation - does persistent NGN2 expression after differentiation ultimately keep neurons in a more immature state?

      Does the NGN2 system entirely de-couple differentiation from cell cycle as the authors claim or do a few cell cycles still occur post-induction, and does this number differ between species? The focus in the introduction on cognition and the role of cortical differences between humans and non-human primates is puzzling in light of the claim that most of the neurons generated in this study are sensory neurons. If the authors' conclusions are valid, then it seems that this finding should be framed differently. Are there known species differences in sensory neurons? Do these results suggest that delayed maturation is a more general phenomenon and not restricted to brain regions involved in cognition?

      The following sentence in the discussion attempts to address this point: "Of note, sensory neurons are interesting from an evolutionary point of view, as the development and evolution of working memory in humans is linked to a higher integration of sensory functions in the human prefrontal cortex." However, this statement and the references cited instead support the view that species differences might be found in the prefrontal cortex rather than in sensory neurons.

    2. Reviewer #2:

      This is a well-written manuscript comparing the rate/tempo of maturation of chimpanzee, bonobo and human neurons. The work is well done and easy to follow. The core finding is that human neurons, generated in vitro via a well-established directed differentiation protocol, mature more slowly than the NHP neurons.

      Several groups have previously used both in vivo and in vitro models (similar to the one used here) to define cross-species maturation features. These earlier studies have shown that indeed human cells develop more slowly than other species (like mice or Chimpanzees). The authors recognize this work in their introduction. While the finding of slower human neuron maturation is not completely novel, the current work furthers these earlier studies by adding additional characterization of electrophysiological and molecular properties of the neurons made. It also highlights an underappreciated presence of sensory neurons in these cultures.

      Things to consider:

      1) Definitive characterization of the neurons produced by Ngn2 overexpression. Prior work defined the neurons mostly as pyramidal neurons of cortical origin. Here, the authors claim both mixed identity (very probable) and the presence of large numbers of sensory neurons. One is left wondering whether this reflects a slightly different differentiation protocol, a different interpretation of the data, or high variability. If the authors classified the single cell RNA data from prior studies that used this same protocol, would they still conclude that these are sensory neurons? If the authors could prove that the protocol produces bona fide sensory neurons, that would be an advance for the field. That may require direct comparison to endogenous sensory neurons (beyond a small number of markers) and classification based on electrophysiological properties (which the authors do have). Are these sensory neurons based on physiology?

      2) Could one use the system to point at mechanisms that may mediate the observed differences in maturation rates? This would move the field forward in a powerful way.

    3. Reviewer #1:

      The results are somewhat underdeveloped and there are several aspects of the study that can be improved by deeper analyses:

      1) The rigor of the experiments and statistical analysis is not clear. Although the use of several lines of iPSCs from each species is a strength, there are no details of how many batches of differentiation/induction were done or how many replicates were used for analysis. This is especially important for structural and functional analysis that can vary between lines and batches.

      2) The identity of the induced neurons as sensory neurons is interesting but is based solely on gene expression (scRNAseq). It would be more compelling if the authors would show other characteristics that identified this population of neurons. It is possible that some neurons express these sensory neuron genes, but do not express the proteins and/or do not differentiate into functional sensory neurons.

      3) The proportions of cells in each cluster of the scRNAseq data would be informative to 1) identify changes as the neurons mature and compare them between species, and 2) identify differences between species, as the authors state (page 9) that the same populations were found in different proportions.

      4) Given the valuable time-course scRNAseq data, the analysis of neuron maturation over time is somewhat limited. More sophisticated analysis of gene expression changes/coexpression would strengthen the overall impact of the data.

      5) Similarly, the discussion is superficial and focused on consequences but not causes of differences in neuron maturation time. The discussion does not build on the rich and extensive transcriptomic data to provide any mechanistic hypotheses of the causes of the differences.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      The manuscript by Schörnig presents an elegant comparison of structural and functional maturation of cortical neurons from different primate species that is of broad interest to researchers interested in evolutionary neuroscience and those who are interested in the unique qualities of the human cortex. The authors use an induced neuron approach to generate cortical-like neurons from iPSCs from different species and compare the structure, function and gene expression of the different neurons over time in culture. This strategy bypasses development and provides much more heterogeneous cultures for analysis. While the results are largely descriptive, they provide very interesting resource data providing insight into both primate neural development and human-specific attributes.

    1. Reviewer #3:

      This very interesting manuscript further describes the receptive field structure of ON-OFF retinal direction-selective ganglion cells. The authors demonstrate that spot light stimuli flashed at positions that do not correspond with dendritic processes of the recorded DSGC evoke strong excitatory responses that are most powerful on the preferred side of the (moving-bar-determined) receptive field. The authors go on to show that responses to small light stimuli flashed in the dendritically sampled area of visual space are also non-uniform, and maximal on the preferred side. The authors' data are in line with previous reports of a nondirectional zone at the periphery of the dendritic tree of DSGCs. The experimental approaches taken by the authors seem sound. I was concerned by the obviously different kinetics of the flash response recorded under control conditions and in GABAA/nAChR antagonists in Figure 1D. Is this a consistent finding, and what are the authors' thoughts on the unusual shape of the current in Figure 1D (lower, red trace)? As indicated in the discussion, the authors have not investigated the mechanisms underlying this asymmetry, other than dismissing structural determinants (dendritic tree asymmetry, investigation of an existing EM volume). This, to my mind, is a vital component missing from the manuscript. The authors do, however, go on to describe, using elegant light stimulus patterns and modelling, some of the potential emergent properties of this behaviour. In this reviewer's mind, I am left puzzled and wanting to understand the cellular basis of the behaviour the authors have identified.

    2. Reviewer #2:

      In this research, Ding and colleagues present evidence that the excitatory input to OO DS RGCs from bipolar cells is strongly asymmetric, with strong inputs occurring on the side opposite from the SAC inhibition. They performed careful studies to show that this was not due to spatial asymmetry in the DSGC morphology nor to ribbon synapse density. Using 'interrupted motion' stimuli, which are effectively local directional stimuli, they show that this asymmetry leads to a non-directional response on one side of the cell's RF. Last, they create a model to show that such firing patterns could be used to improve localization of edge position under the specific conditions of an edge emerging from behind an occlusion.

      The work showing the asymmetry appeared careful, thorough, and well-done. The second half of the paper dealing with the functional consequences of this asymmetry left me with a few questions:

      1) Throughout the paper, several experiments showed no changes when a mix of receptor antagonists was added to exclude SAC inhibition as the origin of these effects. However, I did not find a positive control showing that these antagonists had the desired effect. Later, in Figures 5C-D, the effect remaining after application of these antagonists was cited as evidence that the excitatory asymmetry was responsible; that interpretation is only valid if the drugs truly abolish all SAC input to the DSGC. What if the drugs were not 100% effective? Relatedly, in the experiments in Figures 5C-D, the measured responses all decrease with the antagonists, an effect that seems surprising and is not explained. Connecting the asymmetry in excitation to the responses to interrupted motion is central to this paper, so it should have strong support.

      2) The measured functional results appear quite similar to results in Kuhn & Gollisch 2019, which is not cited in that context. That paper found that DSGCs responded to local contrast, not just motion, much like the results here, and suggested that the signals of oppositely tuned cells could be subtracted to eliminate this contaminating contrast signal or added to isolate it. Here, the authors suggest a very similar use for these signals, albeit with a decoder of position and a focus on motion rather than contrast changes. (See line 528, where the authors suggest that this position-direction hypothesis is new. See also line 537: or it might not be salient, if there is any kind of downstream opponent subtraction, as in primate MT.)

      3) The interrupted motion stimuli are more complex than standard motion stimuli, but it is not clear how ethological or naturalistic they really are. In particular, the occluder had the same contrast as the rest of the background, which seems like a very specific kind of occluded motion, and it is not clear how this would generalize when the occluder has the same or opposite contrast as the moving edge. Moreover, the presence of directed motion in these stimuli leads the authors to emphasize the motion on the 'preferred side', rather than simple non-directional contrast changes, which seem as though they would also induce responses.

      4) The modeling/decoding aspect of this paper seems pretty speculative. It doesn't seem as though these cells are known to be involved in any kind of position encoding. The fact that they transmit information about contrast changes means they can enhance position-decoding, but many other RGCs could also (better?) serve this purpose. The optic-flow-field arrangement of these cells in the retina suggests just the opposite - that they appear likely to be used for optic flow detection, in which positional information is less relevant than the field structure.

      5) Last, I kept wondering how this offset excitatory input made the DSGCs look very similar to a classical Barlow-Levick model (though with DS inhibition). I believe a classical BL model would have many of the properties shown here, including the sensitivity to occluded ND motion on its 'preferred side'. Is there an advantage in the BL model formulation to having disjoint excitatory and inhibitory spatial inputs, rather than a broad excitatory field that overlaps with the delayed inhibition? If so, would such an advantage explain why this asymmetry might exist in these DSGCs, even with DS inhibition from the SACs? I guess I'm asking whether there is an advantage for general motion detection, rather than proposing a new role for these cells in localizing specific types of motion stimuli.

    3. Reviewer #1:

      This paper describes a new finding about stimulus encoding in On-Off directionally selective ganglion cells. It is well established that these cells have spatially displaced inhibitory input from starburst amacrine cells, and that the spatial offset of inhibitory input contributes to the cells' selectivity for direction of motion. The work in this paper shows that the cells also have spatially offset excitatory input, and that this input can give rise to a non-directional response. Several functional roles are suggested for the non-directional response. I felt that the evidence for the non-directional response was strong, but that the connection to visual function was too preliminary.

      Functional importance:

      The paper emphasizes the possible functional importance of the non-directional motion signal; this is a focus of the discussion, and is highlighted in both the abstract and introduction. I found this part of the paper less complete and convincing than the experimentally-driven results. Several issues contribute to this. One is that the contribution to identifying the position of a moving object is fairly modest. Another is that the impact of the non-directional component on other stimulus properties - e.g. the accuracy with which motion direction is encoded - is not explored. A third is that the position of a moving object is almost certainly encoded by multiple ganglion cell types, and hence the modest improvement in position encoding in the DS cell population may make even less contribution when the entire ganglion cell population is considered. A complete investigation of coding in the ganglion cell population is clearly too much, but a more balanced and complete consideration of the benefits and drawbacks of the mechanism described would strengthen the paper considerably.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers were in broad agreement that the findings were interesting and that the experiments were well executed and clear. The main concern is that the paper does not provide either a definitive mechanistic insight into why excitatory input is asymmetric, or a definitive functional argument about the importance of this asymmetry.

    1. Reviewer #3:

      Summary:

      Gene drives are alleles that bias their inheritance to spread through a population. Engineered gene drives could potentially be used to spread genes that prevent malaria transmission in mosquitoes. In this study, the authors develop a proof of principle of the effector components that would be part of a proposed integral gene drive. Such drives differ from standard gene drives by separating the Cas9 and effector components at different loci, with each one having biased inheritance, a useful strategy if the Cas9 has a substantial fitness cost (though it remains unclear whether this is the case). They can also more easily target conserved sites of important genes compared to a standard drive, though this is not unique to the integral gene drive strategy. The Cas9 and effector components would be expressed from natural promoters, with introns and translation skipping used so that the original gene works properly and so that gRNAs can be expressed within the intron. The authors showed that the effector component of such a drive performed as expected, and that both the effectors and the target gene were expressed. Overall, the manuscript is a mostly sound technical demonstration of the effector component of an integral gene drive.

      Review:

      1) It is unclear how exactly resistance alleles would be dealt with in the authors' strategy. While an integral gene drive could target an essential gene so that resistance alleles are nonviable, that does not seem to be the strategy here, since the authors needed to target a gene with a promoter that would be a good match for their effector. The need for both an essential gene and a suitable promoter in one package may thus limit the use of the integral gene drive strategy. Higher fitness costs associated with disruption of the gene may partially ameliorate this issue, but this was not confirmed in the current study (transgenic strains had lower fitness, but was this due to the drive, the effector, or the reduced expression of the host target gene?).

      2) The authors removed their marker genes by flanking them with LoxP sites and crossing their lines to Cre. This was justified because the authors believed that the presence of the marker would interfere with expression of the target gene, causing fitness issues. However, the authors found no sign of fitness reduction based on anecdotal (?) observations. Were these observations actually quantified? If so, they should be included as supplemental material. This could be particularly interesting given that, even without the marker, the transgenic strains suffered fitness effects. It would be nice if the decision to remove the marker were better justified in this section, based on the next section, where the marker was found to interfere with effector expression. Perhaps combining or even reversing the order of the sections would be appropriate (for example, first state that the marker interferes with expression, then explain that this was expected and that the marker could be removed, solving the problem).

      3) Based on figure 3D-E, it appears that the target host gene has reduced expression even after the marker is removed. This is quite important for future considerations, yet seems to be glossed over. For example, if a target is chosen that can effectively help remove resistance alleles due to fitness costs from disrupting the target gene, this means that the gene drive will also suffer fitness costs.

      4) The fitness analysis examining fecundity and hatch rates is not very informative. While similar fitness effects among the transgenic strains lends some weak evidence that inbreeding may account for the fitness reduction, variability between individuals certainly does not (after all, wild-type individuals were also highly variable). Also, if the Cre line has a different background than G3, wouldn't all the lines have received some of this background from prior crosses? Perhaps this could be the answer. It would nonetheless have been better for the authors to outcross the lines before inbreeding them, with similar inbreeding for the wild-type control, before doing this experiment. Because of the issues with this experiment, I'd suggest that it is conducted again with better controls or is moved to the supplement.

      5) It is hard to believe that no end-joining took place, as the last sentence of the Results indicates (no end-joining was detected). Did the authors sequence any progeny with the drive to look for end-joining products formed from maternally deposited Cas9? Other studies with vasa-Cas9 in Anopheles saw this phenomenon occur at a high rate. For end-joining products formed as an alternative to HDR, were 21 individuals sequenced (nine with Aper1 and twelve from the full AP2 sequencing)?

    2. Reviewer #2:

      Hoermann et al. present a new gene engineering concept for disease vector mosquitoes, whereby endogenous mosquito genes are hijacked to express a heterologous effector peptide intended to render mosquitoes resistant to human pathogens. In addition, a synthetic intron added within the effector-coding sequence will express gRNAs for the CRISPR-Cas9 system, recognizing the transgene's own wild-type insertion locus. In the presence of a source of Cas9, the effector gene is thus able to home into a wild-type chromosome, triggering a gene drive effect that can increase the frequency of the modification in the mosquito population. A fluorescent marker, also cloned within the intron, is used at early steps to track the transgene, but is subsequently removed by Cre/lox excision to restore host gene + effector expression and to result in minimal genetic modification.

      This is an extremely elegant procedure and a remarkable technical achievement, especially in such a difficult species as Anopheles gambiae. The choice of midgut-specific promoters to express anti-malaria effectors makes sense to target early stages of parasite development, before the parasites have a chance to amplify in the mosquito. Using endogenous regulatory sequences without the need for promoter cloning avoids the tedious work of individual promoter characterization. The molecular designs are well described, and the results are likely to have a large future impact on the development of vector control tools, notwithstanding some weaknesses in assessing the antiparasitic effect of Scorpine in the transgenic mosquitoes (see below). I agree that this type of transgene should facilitate semi-field or field testing of candidate anti-parasitic effectors before any true gene drive intervention is envisaged.

      Major Comments:

      P. falciparum transmission blocking assays - Fig. 5:

      I have several questions about figure 5.

      -Are mosquitoes with zero parasites taken into account in the calculation of the mean and median? This should be explained in the legend or in the Experimental Procedures.

      -Several replicates have been pooled to generate the figure, for each transgenic strain. Is this legitimate? i.e. were the mean oocyst number and prevalence, reflecting the quality of each ookinete culture, similar enough between replicates to allow pooling? If not, it would be more legitimate to show the result of a single representative replicate. Please provide a table with the raw parasite counts of the separate replicates in a supplemental file so that readers can better judge these results. I note that panel C is very useful.

      -I find the bar graph hard to interpret. The median M is represented either as a stroke inside some bars, or overlapping the x axis when M=0. The size of the bar doesn't represent the mean, m. Does it represent a confidence interval? This must be explained in the legend. Maybe a dot plot where each dot represents the parasite counts of one mosquito would better represent these results?

      -From my point of view, mosquito numbers in some of these infections may be too low to yield solid results. This is especially true in the ScoG-AP2 experiment: 37 mosquitoes in the G3 control with a prevalence of 51% means that only 19 mosquitoes across R=2 replicates contained parasites. This low number is associated with a risk of atypical outliers in the parasite counts, even if the statistical tests presented here show good significance. In the panel C analysis of these values, we see from the size of the squares that the replicate with the highest statistical significance also had the smallest number of mosquitoes. The replicate with a larger N has only one *. For the Aper1-Sco line, N is large and the statistical significance is high (although panel C shows that one of the 4 replicates showed no difference), but I am still somewhat unconvinced of the effect of scorpine in this line: the mean only drops from 10 to 6 parasites, and prevalence drops from 37 to 21%. This moderate effect, combined with the facts that (1) some replicates show no Scorpine effect and (2) the Sco-CP line, which has a comparably high level of scorpine expression according to Suppl. Fig. 3, shows the exact opposite, i.e. a pro-parasitic effect, makes me doubt the antiparasitic effect of scorpine.

      In the case of the ScoG-AP2 line, scorpine expression is only 1/10 to 1/8 of the expression in the other two lines, but seems to have a similar effect as in the highest (Aper1) expressing line: one possibility is that fusion to GFP stabilizes Scorpine so that lower expression results in higher activity, but a milder effect would have been logical if scorpine had a dose-dependent effect.

      One caveat of these experiments is that the genetic background of the control mosquitoes (G3) is not exactly the same as that of the transgenics (G3 x KIL). It is possible that the KIL background contributed some alleles conferring elevated Plasmodium resistance (or the opposite in the case of Sco-CP). I would find the results more trustworthy if a control of equivalent genetic background had been generated for each transgenic strain (in the process of homozygous line selection, the homozygous WT siblings could have been retained to serve as specific controls, though I know how demanding this work would have been...).

      Another caveat is that we don't know the precise kinetics (e.g. between 0-36h post blood meal) of the scorpine protein midgut concentration in each transgenic line, and we don't know at what time point after the blood meal parasites would be most susceptible to killing by scorpine (probably between 3 and 24h, time after which they transform into protected cysts). Taken together, the scorpine data is not highly conclusive and there remains much uncertainty about the efficacy of transgenically expressed Scorpine as an anti-plasmodium molecule. I'm not requesting additional experiments (though future long term assessments of these transgenic lines with new isogenic controls would be very interesting), but I invite the authors to downstate scorpine's potential effectiveness as an antimalarial effector in vivo. This does not decrease the importance of this work of which scorpine is only one aspect. A candidate molecule had to be chosen for these proof-of-principle experiments. Scorpine may not have been a very lucky choice, but its moderate (or opposite) effect should be seen as an interesting result in itself. The way is now open to test other possible candidates.

    3. Reviewer #1:

      This is a compelling demonstration of a number of important steps that take population replacement gene drive for malaria control closer to reality. I have no major concerns and think the manuscript shows the authors have made substantial progress in a) taking Integral Gene Drive (which is a recent idea from senior author Windbichler) into mosquitoes, b) successfully removing marker genes to make the whole system more effective, c) demonstrating that the approach works to express a molecule to reduce parasite infection rates in the lab while also making it possible to test these effector molecules in natural settings without risk of accidental drive release, and d) also showing that drive is successful. My comments are only minor and I think the study is high impact.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This paper demonstrates a number of important steps necessary for implementing the recently proposed "integral gene drive" strategy. In this approach, endogenous mosquito genes are hijacked to express a heterologous effector peptide intended to render mosquitoes resistant to human pathogens. Such drives differ from standard gene drives by separating the Cas9 and effector components at different loci, with each one having biased inheritance. This could be useful if the Cas9 has a substantial fitness cost and could also more easily target conserved sites of important genes compared to a standard drive. While it remains to be seen how effective this approach will be in practice, the paper provides valuable insights into how such gene drives could work in mosquitoes.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to express our utmost gratitude to the three anonymous reviewers for their constructive and insightful comments on our manuscript. We broadly agree with all comments made and have uploaded a preliminary revised version with changes highlighted in bold. We now deal with each of the reviewer comments in turn.

      Reviewer #1

      L50-52: Can you predict where the unmapped reads came from? Could viral infections be the source, as in land plants?

      Having done a crude examination of the unmapped reads, we found no compelling evidence that they are of viral origin. The unmapped fraction was in fact in the same range as that seen for other sRNA libraries in our lab, which we have found to arise for a number of reasons, such as sequencing errors, incomplete assembly, and differences between the sequenced lines and the reference line. All of these result in unmapped reads, as does our use of stringent mapping (0 mismatches).

      L67-68, which is the explanation?

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23nt peak, the evidence for the 23nt peak doesn't seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper, and we have thus decided to remove this sentence.

      Fig 1D the reference to the A,C,G,U 5' should be re-positioned within Figure 1D panel space.

      Thanks, this has been addressed.

      Figure 3: it could be a supplementary figure based on the relevance given in the manuscript to this point.

      We agree, and have moved Fig3 to Supplement.

      P5, line 107: while commenting on strand bias there seems to be a mistake in the strong bias definition; it should be x < 0.2 and x > 0.8, not "strong bias (0.2 < x < 0.8)", as in the text.

      Thank you for pointing this out; we have now corrected this error in the text.
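      For readers applying the corrected criterion to the locus map, a minimal sketch follows. It assumes (this is not stated in the comments above) that the strand-bias statistic x is the fraction of a locus's reads that map to the plus strand, so values near 0 or 1 mean the reads come almost entirely from one strand. The function names are hypothetical.

```python
def strand_bias(plus_reads: int, minus_reads: int) -> float:
    """Fraction of a locus's reads on the plus strand (assumed definition of x)."""
    total = plus_reads + minus_reads
    if total == 0:
        raise ValueError("locus has no reads")
    return plus_reads / total


def has_strong_bias(x: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Strong bias means x lies in either tail: x < low or x > high.

    The interval low < x < high is the roughly balanced middle, which is
    why "strong bias (0.2 < x < 0.8)" reads as a typo.
    """
    return x < low or x > high


# Toy loci given as (plus-strand reads, minus-strand reads).
for plus, minus in [(90, 10), (50, 50), (15, 85)]:
    x = strand_bias(plus, minus)
    print(f"{plus}+/{minus}-: x = {x:.2f}, strong bias = {has_strong_bias(x)}")
```

      Writing the test as "either tail" rather than as a single interval keeps the two thresholds symmetric around 0.5 without relying on any particular notation.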

      P5, line 110: the marked changes regarding locus size are not as striking in my opinion, in particular at log size 6 and above, which is not marked in the graph (the cut-off between 6 and 8). Maybe this curve should be split into two distribution graphs based on some important features (such as repetitiveness?) that might allow a better definition of cut-offs.

      Thank you for pointing this out. You are correct that the changes in the density distribution are not as striking for locus size. A great deal of deliberation on our part went into deciding what to do about this. In the end, we decided that for the size classes there was benefit in having several different classes, with the understanding that having additional, potentially redundant cut-offs would not adversely affect the analysis. In doing this, we were partially driven by the albeit subtle changes in the curve, but also by the desire to have size classes that were biologically relevant and informative. For example, a locus 3000nt captures the long tail. However, we neglected to fully explain these subtleties in our decision-making, something we have now rectified through some added explanation in the text. These choices were validated by the way size classes are differentially associated with different locus clusters in Figure 8.

      Fig 5: the legend has the C subfigure twice, the second should be D.

      Thank you for highlighting this. It has now been corrected.

      Table 1: I believe the data would be better presented in a plot, potentially something similar to the plot in Figure 1 A and B. The numbers are already presented in the supplementary spreadsheet.

      Thanks for pointing this out. We agree with this suggestion and have replaced Table 1 with a Figure (Fig 5) which is indeed a better way to present those results.

      Fig 6A: The boxplots regarding Stability of the clusters should be better described. What exactly does the y-axis in each "small plot" represent?

      Thank you for pointing this out; we understand that this isn't clear at the moment. Briefly, for this analysis we performed the clustering multiple times, each time with a random sample of the loci (with replacement) of the same size as the original dataset. We then calculated the proportion of loci that retained their original clustering. We have clarified this in the figure legend and also elaborated on the approach in the methods section to ensure that it is better described.
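      The resampling procedure described in this reply can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the reply does not name the clustering algorithm, so plain k-means on numeric coordinates (for example MCA dimensions) stands in for it. Because cluster labels are arbitrary between runs, each bootstrap clustering is relabelled to best match the original before agreement is counted. All function names are hypothetical.

```python
import itertools

import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means (assumed stand-in for the unspecified clustering step)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its points; empty clusters stay put.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def matched_agreement(ref_labels, new_labels, k):
    """Share of points keeping their reference cluster under the best
    relabelling of the new clusters (brute force, fine for small k)."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        relabelled = np.asarray(perm)[new_labels]
        best = max(best, float(np.mean(relabelled == ref_labels)))
    return best


def bootstrap_stability(X, k, n_boot=50, seed=0):
    """Re-cluster n_boot resamples (with replacement, same size as X) and
    return, per resample, the proportion of points keeping their original cluster."""
    rng = np.random.default_rng(seed)
    ref = kmeans(X, k, rng)
    scores = []
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=len(X), replace=True)
        boot_labels = kmeans(X[idx], k, rng)
        scores.append(matched_agreement(ref[idx], boot_labels, k))
    return np.array(scores)


# Toy data: two well-separated groups, so k = 2 should be perfectly stable.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 0.1, (50, 2)), data_rng.normal(10, 0.1, (50, 2))])
print(bootstrap_stability(X, k=2, n_boot=20).mean())
```

      Plotting the per-resample scores for each candidate k and number of dimensions would give boxplots of the kind the reviewer asks about, with the y-axis read as the proportion of loci retaining their original cluster.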

      P6, line 142: analyses of stability and variance show 7 as the optimal k, while gap statistics and NMI suggested 6 as the optimal. It is not clear why 6 was preferred. The MCA section in Methods is unclear regarding this point too.

      Thank you for querying this. The process of choosing the appropriate value of k is a complicated one, and we appreciate that the explanation could be clearer. After your comment, we revisited our decision-making process and were reassured that a k value of 6 rather than 7 was indeed appropriate. The stability plots in Fig. 6A start with k=2, and it can be clearly seen for k=6 that stability is comparatively high for dimensions 7-10. Indeed, k values of 2, 3 and 6 seem to be the only feasible values. k=7 is fairly unstable for all dimensions from 1-8. We have done some rewording of the methods to hopefully make this clearer.

      Fig S2-S5: please check legends, they are identical, although they should cover examples of loci in LC2 through LC5. These figures are not cited in the text, only S1 and S2.

      Thanks for pointing this out. This is now corrected and we have referenced all figures in the main text.

      Fig 9: I suggest using different colors in density plots to ease interpretation. LC tracks could share a color and Gene, TEs, DNA meth, and All loci should have a different color each.

      A good suggestion - this has been replotted with different colours.

      Supplementary Files S1: The full-annotated locus map should be provided as a spreadsheet file or as a text (.csv) file, not as a pdf file.

      Thanks for pointing this out. We originally submitted this file in GFF format and are not sure why it was converted. We will make sure it is provided in an appropriate format in the final version, having suffered from the pains of PDF tables ourselves in the past.

      I may be misunderstanding Fig. 6E, but it looks strange that the observed sum-of-squares is smooth, but the expected is not. Is it possible that the in-figure reference is inverted?

      Indeed, the colours were inverted. Thanks a lot for that spot, we have now swapped them around.

      Reviewer #2

      I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One way that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs within all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, generally touching exons, and have a very wide size distribution. In contrast, loci where the small RNAs are truly created by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.

      Thank you for this very insightful comment, which has helped us to reflect on the methodological approach. While it is likely that some RNA breakdown products are picked up in the sRNA sequencing, we do not think that the locus map as a whole is undermined by this. For example, 54% of loci have a predominance of 21-nt sRNAs and 18% of 20-nt sRNAs, so the majority of sRNA loci do have a predominance of a specific RNA size.

      However, your point does raise a very valid concern with implications for the interpretation of LC4. Although we posit some explanations for these loci (e.g. DCL-mediated sRNA production without an accessory protein to provide PAZ domain-like sRNA measurement), given the very strong strand bias and association with genic regions, we do agree that there is a risk that these loci predominantly represent degradation fragments. Therefore, we have now reworded how we discuss LC4 in the discussion to reflect this. This also reveals a key advantage of the clustering approach: should LC4 indeed represent degradation products, they have been successfully grouped together into a separate cluster such that they don't undermine the insights gained from the other locus clusters.
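      The reviewer's suggested metric, the proportion of "DCL-sized" reads among all reads at a locus, could be computed along the lines sketched below. The 20-22 nt window and the 0.5 flagging threshold are illustrative assumptions, not values taken from the manuscript, and the function names are hypothetical. As the reviewer notes, a fuller filter would also weigh strand bias and exon overlap.

```python
from collections import Counter


def dcl_sized_fraction(read_lengths, size_window=(20, 22)):
    """Proportion of a locus's reads whose length falls in the assumed DCL window."""
    if not read_lengths:
        return 0.0
    lo, hi = size_window
    return sum(lo <= n <= hi for n in read_lengths) / len(read_lengths)


def predominant_size(read_lengths):
    """Most common read length at a locus (ties go to the length seen first)."""
    return Counter(read_lengths).most_common(1)[0][0]


def flag_possible_degradation(loci, size_window=(20, 22), min_fraction=0.5):
    """Map locus name -> True when too few of its reads are DCL-sized."""
    return {
        name: dcl_sized_fraction(lengths, size_window) < min_fraction
        for name, lengths in loci.items()
    }


# Toy data: one locus dominated by 21-nt reads, one with a broad size spread.
toy_loci = {
    "locus_A": [21] * 90 + [25] * 10,
    "locus_B": list(range(15, 35)),
}
print(flag_possible_degradation(toy_loci))
```

      A narrow, DCL-sized distribution (locus_A) passes, while a broad spread of lengths (locus_B) is flagged as a candidate degradation locus for closer inspection rather than excluded outright.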

      One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      Thank you for pointing this out. In pursuit of sharing as many details as possible, we appreciate that the descriptions can be too verbose, as rightfully noted here. In order not to compromise detail too much, we have created a second, toned-down version as a CSV file, which includes essential details such as name, position and LC. In the GFF file we kept all details, since it loads quickly in a genome browser and also into other tools such as R, in which those features can be used as efficient filters.

      Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      We have included another table, which includes direct accession numbers for ENA as well as numerous other metadata, in Sup Table S6, i.e. "Supp_Table_S6_library_ENA_accession".

      Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      The snapshots have been improved to ensure better readability.

      Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23nt peak, the evidence for the 23nt peak doesn't seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper, and we have thus decided to remove this sentence.

      Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      We agree and have changed it accordingly.

      Figure 4 caption: The acronym "CRSL" should be defined.

      CRSL has now been defined in the manuscript.

      Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      We have used the appropriate BibTeX code to reference this Zenodo share (https://zenodo.org/record/3862405/export/hx). The current citation format somehow omits some of this information. We hope this will be fixed by the publisher, but we have also provided the full DOI address in the “additional information” section just in case. We will keep an eye on how it comes out.

      Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Thank you for this suggestion, we have adapted the title to make it more descriptive.

      Reviewer #3

      …the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since, at least some loci, do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Thank you for these insightful comments. As we followed a very similar methodological approach to that used to produce the Arabidopsis sRNA locus map published in Hardcastle et al. (2018), we wanted to take the opportunity to compare the results and build upon the ongoing discussion concerning the evolution of sRNA mechanisms in Chlamydomonas (e.g. Valli et al. 2016). Your point about the possibility of an ancestral progenitor with greater diversity that was then lost is very valid. You are also of course correct about the limitations to what can be concluded from this study and the limited comparisons that can be made. We see our approach as a useful tool for hypothesis generation which can be complemented by more in-depth exploration in the future. With this in mind, and taking on board your comments, we have elaborated on our discussion of the evolutionary implications of our study, which we hope now gives a more balanced account.

      I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      That information was indeed missing, thanks for bringing it up. We have now included this in the gff file (column LC) as well as in another cleaner table (Supp_Table_S7_loci_class_annotation).

      Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      Thanks for pointing this out; this was an error on our part, and the captions have now been corrected. Unfortunately, due to the ongoing pandemic-related restrictions, we were unable to get a genome browser session running in time to create figures for additional loci.

      If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      Thank you for this suggestion. In fact, not all annotation features were used predictively in the MCA and clustering process, and so these "supplementary" annotations as outlined in supplementary table S3 can provide some cross-validation. With that in mind, we have now included an additional heatmap as a supplementary figure which shows associations for some of these supplementary annotations as well as corresponding explanations in the text. Further validation is provided by the chromosome tracks in figure 9 showing the distinct genomic distributions of each locus cluster despite chromosomal location not being a factor in the clustering.

      Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      Thank you for pointing this out, we have now corrected this error.

      Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      Table 1 has been replaced by the new Fig 5, annotation/abbreviations should now be more obvious.

      Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Thank you for pointing this out. We have now moved the reference to the paragons to earlier in the section where we introduce the six clusters. We hope this is now clearer.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript presents a detailed map of sRNA (precursor) loci in the green alga Chlamydomonas reinhardtii based on large volumes of sequencing data (145 sRNA libraries). The locus map based on a false discovery rate of less than 0.05 had 6164 loci, covering 4.1% of the Chlamydomonas reference genome. Individual loci were annotated based on both intrinsic features, such as sRNA size, 5'-nucleotide, strand bias and phasing pattern, and extrinsic features, such as sRNA expression, genotype and overlap with genomic attributes (e.g., genes, transposons, methylation levels).

      By using the intrinsic and extrinsic features of each sRNA locus and Multiple Correspondence Analysis (MCA) approaches, the sRNA loci were clustered into six distinct classes, referred to as locus class (LC) 1-6. This strategy is partly validated by the grouping of well-characterized Chlamydomonas miRNAs into the same cluster, LC3.

      As the authors state, this data-driven approach is valuable for hypothesis generation since (with the possible exception of LC3) the biogenesis and function of most sRNA loci (and of the corresponding locus classes) remain uncharacterized in Chlamydomonas. The analysis provides a framework to facilitate future characterization of the diverse types of sRNAs in this model algal system.

      However, the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since, at least some loci, do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Some specific comments:

      1. I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      2. Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      3. If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      4. Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      5. Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      6. Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Significance

      Chlamydomonas reinhardtii is a model unicellular green alga, the lineage of which diverged from land plants approximately one billion years ago. Chlamydomonas encodes a great number of diverse small RNAs. However, the biogenesis and function of the majority of these sRNAs are not known. By grouping sRNA loci into specific classes (based on intrinsic and extrinsic features), this manuscript provides a framework that will facilitate the future characterization of sRNAs in Chlamydomonas and, very likely, in other algal species. This information may also contribute to our understanding of the evolution of sRNA loci within eukaryotes.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript describes the annotation of small RNA-producing loci from the green alga Chlamydomonas reinhardtii. A large number of small RNA-sequencing datasets were analyzed and used to create genome-wide annotations of small RNA-producing loci. These loci were annotated based on several features, and then classified into six major groups based on these features.

      Major comments:

      Are the key conclusions convincing? --> Yes.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? --> No

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation. --> Yes, additional analyses should be conducted, see itemized list below.

      Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments. --> Perhaps a few weeks to a month of analysis and revision time.

      Are the data and the methods presented in such a way that they can be reproduced? --> Yes.

      Are the experiments adequately replicated and statistical analysis adequate? --> Yes.

      SPECIFIC COMMENTS:

      1. I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One way that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs within all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, generally touching exons, and have a very wide size distribution. In contrast, loci where the small RNAs are truly created by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.

      MINOR COMMENTS:

      2. One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      3. Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      4. Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      5. Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      6. Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      7. Figure 4 caption: The acronym "CRSL" should be defined.

      8. Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      9. Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.:

      This study provides a genome-wide annotation of small RNA-producing loci from Chlamydomonas reinhardtii. This will serve as a useful data resource for researchers working with this model system. The results overall confirm what is known from previous studies of Chlamy small RNAs: they are rather distinct from angiosperm small RNAs and from animal small RNAs.

      Place the work in the context of the existing literature (provide references, where appropriate).:

      This may be the first study to provide a genome-wide annotation (as opposed to a focused effort) for Chlamy small RNA populations.

      State what audience might be interested in and influenced by the reported findings:

      Chlamy researchers, especially those interested in gene silencing and genome annotations, and small RNA specialists with interest in annotations and in wide phylogenetic comparisons.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. :

      Plant microRNAs, siRNAs, genetics, and genomics.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this study, Müller, Matthews, Valli, and Baulcombe have used data-driven machine learning approaches to annotate and classify sRNA loci of Chlamydomonas reinhardtii. I found the manuscript very interesting and a handy handbook for how to annotate sRNA loci appropriately in different organisms. I believe this is not only a great resource paper in its own right, but it also contains essential information for beginning to understand how Chlamydomonas silences TEs without an RdDM pathway. I have a few comments that may help to improve the manuscript.

      • L50-52: Can you predict where the unmapped reads came from? Could viral infections be the source, as in land plants?
      • L67-68: what is the explanation?
      • Fig 1D: the reference to the A, C, G, U 5' should be repositioned within the Figure 1D panel space.
      • Figure 3: it could be a supplementary figure, based on the relevance given to this point in the manuscript.
      • P5, line 107: while commenting on strand bias there seems to be a mistake in the strong bias definition; it should be x < 0.2 and x > 0.8, not "strong bias (0.2 < x < 0.8)", as in the text.
      • P5, line 110: the marked changes regarding locus size are not as striking in my opinion, in particular at log size 6 and above, which is not marked in the graph (the cut-off between 6 and 8). Maybe this curve should be split into two distribution graphs based on some important features (such as repetitiveness?) that might allow a better definition of cut-offs.
      • Fig 5: the legend has the C subfigure twice; the second should be D.
      • Table 1: I believe the data would be better presented in a plot, potentially something similar to the plot in Figure 1 A and B. The numbers are already presented in the supplementary spreadsheet.
      • Fig 6A: The boxplots regarding Stability of the clusters should be better described. What exactly does the y-axis in each "small plot" represent?
      • P6, line 142: analyses of stability and variance show 7 as the optimal k, while gap statistics and NMI suggested 6 as the optimal. It is not clear why 6 was preferred. The MCA section in Methods is unclear regarding this point too.
      • Fig S2-S5: please check legends; they are identical, although they should cover examples of loci in LC2 through LC5. These figures are not cited in the text, only S1 and S2.
      • Fig 9: I suggest using different colors in density plots to ease interpretation. LC tracks could share a color, and Gene, TEs, DNA meth, and All loci should each have a different color.
      • Supplementary Files S1: The fully annotated locus map should be provided as a spreadsheet file or as a text (.csv) file, not as a PDF file.
      • I may be misunderstanding Fig. 6E, but it looks strange that the observed sum-of-squares is smooth, but the expected is not. Is it possible that the in-figure reference is inverted?

      Significance

      This is a very interesting article. It may look a little technical, but it provides useful information for people studying Chlamydomonas. In addition, the way the authors approached the annotation of sRNAs is very meticulous and elegant. I would suggest that people exploring small RNAs in non-model organisms use this article as a handbook for annotating sRNAs. In this way, the article will be of interest beyond the Chlamydomonas, and even plant, research field.

    1. Reviewer #1:

      The manuscript by Wuertz-Kozak et al. explores the relationship between early life stress and bone parameters in mice and humans. In mice, micro-CT and qPCR analyses were performed, while humans with depression and a history of childhood neglect underwent bone turnover marker measurements and DXA scans. Increased CTX levels were noted both in mice exposed to early life stress and in certain groups of humans with depression. The investigators recommend that early life stress be further assessed as a risk factor for human bone disease.

      1) Although the authors acknowledge the limitations of controlling and even assessing accurately the kind of impacts (e.g., nutritional, activity-related, body weight changes, age when stress inflicted etc) that may operate during childhood stress and neglect, the human model is very problematic because of this heterogeneity. There do not appear to be good parallels between the mouse model and the human cohort.

      2) Bone cell proliferation and differentiation are proposed to be affected in the mouse model. Proliferation can be directly measured in many ways and should be formally tested. Similarly, the stage of osteoblast differentiation can be easily assessed by PCR with well-validated gene markers of early vs late differentiation. The hypothesis proposed in line 140 can be directly tested.

      3) What is the significance of the increased innervation reported in Figure 1 and the reduced neuronal receptor expression in the next figure? One would expect more nerve growth to lead to greater receptor expression. Is it not also unexpected that NGF2 levels are so low when there is increased innervation of the bone in MSUS mice?

      4) The authors propose a 'catabolic shift' in bone in the MSUS mice. A few unusual points have been reported in this regard. Most researchers would not consider osteoprotegerin a matrix gene (line 159). Furthermore, changes in osteocalcin, osteopontin and sclerostin mRNA would not be the most sensitive markers of the proposed catabolic shift. The proteins encoded by these genes are in the matrix, but they are products of osteoblasts and osteocytes, and the bone formation marker P1NP is, per the authors, unchanged in the mice. It is CTX that is elevated, and more sensitive gene markers for a catabolic shift would perhaps be RANK-ligand, mCSF and osteoclastic genes.

      5) The Descriptive Result for the Human Study (lines 172-184) is very difficult to follow. Many more key demographic, biochemical and clinical characteristics of the human study populations need to be provided. The paper includes patients across a wide age range (18-65 years). Therefore, some subjects will have gone through menopause, while others may not yet have reached peak BMD. This introduces a great deal of heterogeneity into the population being studied.

      6) What were the exposure to, and duration of use of, SSRIs in the population? These medications have been implicated in reduced BMD and increased fracture rates in some studies.

      7) DXA results: (a) Which site in the hip DXA is "H" or "collum femoris"? (b) One would have expected the total hip BMD and femoral neck BMD to align with the results for the greater trochanter BMD shown in Table 1. Yet the 3 sites in the hip do not align, which suggests a weak relationship. (c) In lines 179-181, it seems that only ~33 subjects were included in the DXA studies. Given the heterogeneity of the population being studied in key parameters (age, sex, etc.), this would be an extremely small number to break into 4 groups as in Table 1, run statistical tests on, and report BMD results from. This is a very under-powered study. BMD varies with age, sex, ethnicity and body size. Such characteristics need to be controlled to tease out an effect of childhood trauma and depression on bone.

      8) Micro CT data in the MSUS mice are driven by effects on body weight, and these data do not support a direct effect of postnatal stress on the bone itself.

      9) The human cohort needs to be better defined and described. It likely should not cover such a wide age range (18-65 years). Drug therapies for depression and their duration should be specified to allow comparison of the groups. A thorough medical assessment needs to be done on these subjects, including screening labs, a basic medical history and a physical examination. Otherwise, many disorders known to affect bone could be missed (e.g., menopause, liver or kidney disease). Alcohol consumption and smoking need to be explored and clearly reported, since both habits affect bone parameters.

    2. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      The authors examine the robustness of coupling of distinct oscillatory circuits of different frequencies across a range of temperatures. The two circuits have different means of generating oscillations and could therefore, potentially, be impacted to different degrees by temperature perturbations. Across all temperatures tested the two distinct rhythms increased their frequency but remained coordinated. The coordination was in the form of the previously-described integer-coupling where the cycle period of the slow rhythm was an integer multiple of that of the fast one. This is due to the fact that the slower rhythm was most likely to start at a given phase within the faster oscillation cycle. The temperature robustness of this coupling is an interesting and important result and the description and analysis are both well done.
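      The integer-coupling mechanism summarized above (the slow rhythm tends to start at a particular phase of the fast cycle, so its period becomes a whole multiple of the fast period) can be illustrated with a deliberately simplified toy model in Python. The function names, periods, and the "deferred start" rule are illustrative assumptions; this is not the authors' analysis and not a model of the actual stomatogastric circuitry.

```python
import math


def snapped_onsets(pyloric_period, gastric_free_period, phase, n_cycles):
    """Toy model (hypothetical): each gastric cycle would start
    `gastric_free_period` seconds after the previous one, but the start is
    deferred until the pyloric rhythm next reaches `phase` (0 <= phase < 1)."""
    onsets = [phase * pyloric_period]
    for _ in range(n_cycles):
        earliest = onsets[-1] + gastric_free_period
        # Index of the first pyloric cycle whose `phase` point is not earlier than `earliest`.
        k = math.ceil(earliest / pyloric_period - phase)
        onsets.append((k + phase) * pyloric_period)
    return onsets


def coupling_ratios(onsets, pyloric_period):
    """Gastric cycle period divided by pyloric cycle period, for each gastric cycle."""
    return [(b - a) / pyloric_period for a, b in zip(onsets, onsets[1:])]


# Warming shortens both periods; in this toy model the ratio remains a whole number.
for label, pyloric, gastric in [("cold", 1.0, 9.3), ("warm", 0.8, 7.0)]:
    ratios = coupling_ratios(snapped_onsets(pyloric, gastric, 0.25, 5), pyloric)
    print(label, [round(r, 6) for r in ratios])
```

      In this sketch the gastric period is always ceil(G/T) pyloric cycles, so the ratio changes in integer steps as temperature scales the two "free" periods differently, while staying locked to an integer.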

      Major comments:

      The main finding of the paper is that a previously-described integer-coupling between two rhythms remains more or less intact across temperature variations. It is a nice descriptive finding, but rather disappointing in that there is so much more that could have been done rather easily that would have given much more depth to this finding. Most obviously, because it is known that the source of the coupling is the inhibitory synapse from the pyloric pacemaker to the gastric mill half-center, it is quite important to know how the strength of this synapse affects the interaction at different temperatures. That is, to expand what Bartos et al 1999 did across a range of temperatures. Short of that, it would have been nice at least to perturb the cycle period of the pyloric rhythm and see whether the interaction would remain robust across temperature despite changes in cycle period.

      While the study convinces the reader that integer coupling between pyloric and evoked gastric rhythms is robust to temperature changes, it does not attempt to explore the origin of this robustness, e.g. by using different methods to activate the gastric rhythm or by testing whether integer coupling is present with spontaneous gastric rhythms.

    2. Reviewer #2:

      In the present paper, Powell and colleagues investigated how coupled oscillatory circuits maintain their coordination over a wide range of temperatures. To do so, they used the stomatogastric system of the crab Cancer borealis, which contains the fast (1 Hz) pyloric network and the slow (0.1 Hz) gastric mill network. The two generated rhythms are coordinated such that there is an integer number of pyloric cycles per gastric cycle. Both rhythms exhibit temperature-induced frequency changes, but their coordination is well maintained even at high temperature. Therefore, this study shows that the relative coordination between rhythmic circuits can be maintained as temperature changes, thus ensuring appropriate physiological functions even under global perturbations.

      This study, which uses a fantastic model for investigating neural networks in general, addresses an important physiological question. However, I have a few concerns that could probably be clarified with some additional explanations in the text:

      -While the intrinsic temperature sensitivity of the pyloric rhythm has been nicely investigated in some excellent previous publications (most by the authors), that of the gastric rhythm is less well known. Stadele has shown that increasing the temperature leads to a breakdown of the gastric rhythm that can be rescued by modulatory afferents. What do we know about the temperature sensitivity of the afferent neurons that are stimulated to trigger the gastric rhythm here? Is it possible that what is observed also includes an effect of the temperature changes on these neurons (MCN1 function, for example), or that the gastric temperature sensitivity described here in fact reflects that of the afferents?

      -All experiments were performed under conditions in which the gastric rhythm is triggered by stimulation of the two dorsal posterior esophageal nerves (dpons), which contain axons of modulatory afferent neurons. However, stimulating these nerves also modulates the pyloric network, which is also a target of those afferents (as stated in the text, lines 583-584). Doesn't this introduce a bias into the experiments and their interpretation? Also, because, as schematically represented in Fig 1, the pyloric pacemaker neuron AB has direct connections with the gastric neuron Int1, which is itself connected to the gastric neuron LG, the simplest interpretation of the experiments would be that this connection is preserved and remains effective even at high temperature. Is this, in the end, one of the conclusions of the paper?

      -In the same vein, the sensitivity of the gastric rhythm to temperature changes has been studied here, but with the pyloric network, itself intrinsically sensitive to temperature changes, still active (Fig 3 and related text). What do we know about the intrinsic temperature sensitivity of the gastric rhythm when elicited by dpon stimulation but isolated from the pyloric network (with the AB neuron killed, for example)?

      -The data presented here show that coordination between the PD and LG neurons is preserved after a temperature increase, but that this is not the case between PD and the DG neuron, which shows no phase-coupling at high temperature (Fig 6). The PD neurons are used here as an indicator of the pyloric rhythm, while the LG neurons indicate the gastric rhythm. What, then, would the authors conclude if the DG neuron had been used as the gastric rhythm indicator? How do you reconcile these observations?

    3. Reviewer #1:

      Powell and colleagues measured the robustness of coordination between pyloric and gastric rhythms in in vitro preparations of Cancer borealis exposed to temperature variations (7-23 °C). Using extracellular recordings, they first show that spontaneous rhythms are not stable, likely as a result of multiple physiological processes that are difficult to monitor. Therefore, they instead used bouts of activity reproducibly evoked by stimulation of a neuromodulatory pathway. As expected, cold temperatures slowed the rhythms and warm temperatures accelerated them in a similar manner. Despite this variation in rhythm frequency across temperatures, coordination between pyloric and gastric rhythms was stable. This suggested that the activity of rhythmogenic neurons is coordinated across temperatures. Powell and colleagues also found that the gastric Lateral Gastric motor neuron (LG) was phase-locked with the Pyloric Dilatator neuron (PD), suggesting they may be involved in coordination robustness.
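      Phase-locking of one neuron to another's cycle, as reported here for LG relative to PD, is often quantified by computing the phase of each event within the reference cycle and then the vector strength of those phases. The Python sketch below assumes burst onset times have already been extracted; the onset lists are hypothetical, and vector strength is a standard circular statistic swapped in for illustration that may differ from the phase analysis used in the manuscript.

```python
import bisect
import cmath
import math


def phases_in_reference(event_times, reference_onsets):
    """Phase (0 <= phase < 1) of each event within the reference cycle that contains it.
    Events before the first or after the last complete reference cycle are skipped."""
    phases = []
    for t in event_times:
        i = bisect.bisect_right(reference_onsets, t) - 1
        if i < 0 or i + 1 >= len(reference_onsets):
            continue
        start, end = reference_onsets[i], reference_onsets[i + 1]
        phases.append((t - start) / (end - start))
    return phases


def vector_strength(phases):
    """Length of the mean phase vector: near 1 for tight phase-locking, near 0 for evenly spread phases."""
    if not phases:
        raise ValueError("need at least one phase")
    mean = sum(cmath.exp(2j * math.pi * p) for p in phases) / len(phases)
    return abs(mean)


pd_onsets = [float(k) for k in range(30)]       # hypothetical PD burst onsets (s), sorted
lg_onsets = [k + 0.3 for k in range(0, 28, 7)]  # hypothetical LG onsets, all at PD phase 0.3
print(vector_strength(phases_in_reference(lg_onsets, pd_onsets)))
```

      Normalizing by each PD cycle's own duration keeps the phase measure meaningful when temperature changes the pyloric period, which is the situation studied here.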

      The originality of the study is that the authors focused on the coordination of pyloric (1 Hz) and gastric (0.1 Hz) networks. A large quantity of raw data is beautifully illustrated. Data analysis is sophisticated and convincingly supports the authors' interpretations. The text is exquisitely written in a clear style and pleasant to read. In my view, the study contains the first experiments of a potentially exceptionally interesting study, once more mechanistic insights are added. To further strengthen the relevance of the study, I would suggest pursuing one of the three options below to further uncover the mechanisms underlying the effects described.

      1) Could the authors design causality-based experiments to identify which neuron is responsible for the coordination of the rhythms at different temperatures? There are many interconnected neurons in Figure 1C. Even if LG is phase-locked to PD, is it possible that another neuron drives both PD and LG? If PD controls LG, would it be informative for the authors to reversibly switch off PD (e.g. with tonic hyperpolarisation) and examine the effect on gastric rhythm frequency at various temperatures?

      2) Could the authors use pharmacological tools to identify whether distinct neuromodulatory substances influence coordination robustness over specific ranges of temperature but not others? It seems that Stadele et al. 2015 PLoS Biol 13(9):e1002265 used a different way to evoke the rhythm, and their gastric rhythm crashed at lower temperatures (13 °C) than in the present study (27 °C). Do the authors think that the different stimulation approaches used in the two studies could involve different neuromodulatory substances, resulting in different robustness profiles?

      3) Do the same intrinsic properties or synaptic connections underlie coordination robustness across temperatures? Modeling suggests that different conductances are involved in a temperature-dependent manner (Alonso and Marder 2020 Elife 9:e55470.2020). Is it possible for the authors to experimentally deactivate specific conductances using dynamic clamp in LG or PD or with pharmacological tools and determine whether this would reversibly disrupt the coordination between pyloric and gastric networks in some specific temperature ranges but not in others?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This study addresses an important question about the physiology of coupled oscillatory neuronal networks operating under a wide range of temperatures. The stomatogastric system of the crab Cancer borealis contains the fast (~1 Hz) pyloric network and the slow (~0.1 Hz) gastric mill network. The two generated rhythms are coordinated so that there is a given number of pyloric cycles per gastric cycle. Powell and colleagues show that upon stimulation of a neuromodulatory pathway, these coupled oscillatory circuits exhibit reproducible bouts of activity and maintain their coordination, and that this coordination is maintained over a wide range of temperatures, thus ensuring appropriate physiological functions even under global perturbations. The authors show that the gastric Lateral Gastric motor neuron (LG) is phase-locked with the Pyloric Dilatator neuron (PD), suggesting these neurons may be involved in coordination robustness.

    1. Author Response

      Summary:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation and of the mode of action of, and resistance to, antibiotics have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported by X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work in which these limits are pushed even further, reporting a ribosome cryo-EM reconstruction with an overall resolution of 2 Å, and even better than that in the best areas of the map. The achieved resolution is impressive, and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these could be better developed. Instead, the use of map-to-model Fourier shell correlation (already known in the field) to estimate the resolution is stressed, but it is not clear what the advantage is here, as the values are the same as those estimated from half-map FSCs. Therefore, it is suggested that the discussion of the model-to-map FSC be toned down considerably (or even removed), while adding more information about the new findings in the map, along the lines of the comments below.

      We thank the reviewers for their interest in this work, and for their helpful comments on the first version of the manuscript. We provide responses to the individual points below.

      Reviewer #1:

      This paper describes a 2 Å cryo-EM reconstruction of the E. coli 70S ribosome. This structure represents the highest-resolution ribosome structure, by any method, available thus far and highlights interesting modifications that could not be seen in previous structures. I'll let the ribosome experts comment on the relevance of these and focus my review on the cryo-EM technical parts. The paper is clearly written and the figures are informative and beautiful.

      The first author is particularly gratified that the figures were well received.

      Major comments:

      1) The authors make a big deal out of resolution assessment by model-to-map FSCs. It is unclear to me why they do this. First of all, model-to-map FSC is not a new resolution measure: it is in widespread use already. Second, it is unclear why the authors are so forceful in stating that it is better than the half-map FSC. They say "While map-to-model FSC carries intrinsic bias from the model's dependence on the map, in a high resolution context it does provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." What additional information does it provide? I would say it only provides true additional information if the atomic model comes from another experiment! In the way it is used here: by refining the model inside the very same map, there is a danger of increasing model-to-map FSC values through overfitting of the model (see also below). This danger is not recognized enough in the text (it is only hinted at in the sentence above), and overfitting is not measured explicitly for this case. Yes, half-map FSC measures self-consistency, but in practical terms (when done right!), this doesn't matter for the resolution estimate. The same is true for model-to-map FSCs: when done right they convey the right information, but the danger of self-consistency (through overfitting) also exists here. As the paper is mainly about the high-resolution ribosome structure, and no proper evaluation of the relative merits of half-map FSC versus model-to-map FSC is performed, I would suggest that the authors remove (or at least tone down considerably) their statements about resolution assessment from the manuscript.

      All three reviewers commented on our emphasis on using the map-to-model FSC criterion. We thank the reviewers for pointing out our motivation to discuss FSC metrics was not clear. We agree with the reviewers that the map-to-model FSC metric has been available for some time. However, in the ribosome field, the half-map FSC is still very commonly used as the sole resolution-dependent metric, including in recent literature that we cited (Nürenberg-Goloub, 2020; Tesina, 2020; Stojković, 2020; Pichkur, 2020; Halfon, 2019), as well as in a newer publication (Loveland, 2020, Nature, https://doi.org/10.1038/s41586-020-2447-x ). We mention some of the shortcomings of half-map FSC, which the third reviewer alludes to in their comment on “intense debate” in the field. While it is acknowledged as best practice to examine both maps and models, many visitors to the PDB likely will download only the model. Therefore we find it prudent to communicate confidence in the model resolution and not just the half-maps, particularly in this resolution regime. Again, this is not common in recent ribosome literature, which we will clarify in the Discussion. We will make changes throughout the manuscript to streamline and clarify our discussion of the two metrics, including an additional comparison to a newly released ribosome structure, as detailed below.

      When we discuss “additional information provided by map-to-model FSC,” we recognize that there may be semantic issues with the word “information” as map-to-model FSC depends on the same information content of the maps. However, the map-to-model FSC provides new information about the model quality to the reader. While half-map FSC tells us something about the best model one might achieve, new practical information lies in the authors’ handling of the model, which will vary among individuals (as discussed further below). Furthermore, model refinement procedures leverage well-defined chemical properties (i.e. bond lengths, angles, dihedrals, and steric restraints) that the map “knows” nothing about, which has value for keeping the realism in check. This is also why we originally included the sentence, “Sub-Ångstrom differences in nominal resolution as reported by half-map FSCs have significant bearing on chemical interactions at face value but may lack usefulness if map correlation with the final structural model is not to a similar resolution.” We will rewrite portions of this section for clarity.

      Comparisons to other recent high-resolution cryo-EM ribosome structures show discrepancies in the reported half-map FSC and map-to-model FSC calculated by us (see beginning of section “High-resolution structural features of the 50S ribosomal subunit”), with the map-to-model FSC values being to lower resolution. These structures report half-map FSCs only, which we could not replicate because of unavailability of half-maps, but we describe our calculation of map-to-model FSC with their deposited maps. We did not explicitly highlight the comparisons with their reported half-map FSC resolutions in the original manuscript, and we will include further discussion to more clearly communicate our point. We will also include another comparison to the newly released structure by Pichkur et al. (Pichkur, 2020) which has become available during the review process and is the closest to our map resolution. The map-to-model FSC with their model and map yields 2.29 Å resolution, while a simple rigid-body fit of our model into their map without further adjustment yields 2.07 Å. This difference highlights the practical insufficiency of focusing only on half-map FSC and the value of our model as a reference for future work.

      2) To test for overfitting of their atomic models in the maps, the authors should shake up the atomic models and refine them against the first independently refined half-map. The FSC of that model versus that half-map (FSC_work) should be compared with the FSC of that very same model versus the second half-map (FSC_test). Deviations between the two would be an indication of overfitting. If that were observed, the weights on the stereochemical restraints should be tightened until the overfitting disappears. The same weighting scheme should then be used for the final model refinement against the sum of the half-maps.
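      The half-map cross-validation described above can be sketched in Python with NumPy. This is a minimal illustration, assuming the refined atomic model has already been converted to a density map on the same cubic grid as the half-maps (that conversion, and the shake-and-re-refine step itself, are not shown), and using a simple integer-radius shell binning rather than the exact conventions of any particular cryo-EM package. The synthetic "maps" are random arrays, not ribosome data.

```python
import numpy as np


def fsc(map_a, map_b):
    """Fourier shell correlation of two cubic 3D maps with identical shape.

    Returns one value per shell of integer radius 0 .. n//2 - 1 (in Fourier voxels).
    """
    if map_a.ndim != 3 or map_a.shape != map_b.shape or len(set(map_a.shape)) != 1:
        raise ValueError("maps must be cubic 3D arrays of the same shape")
    n = map_a.shape[0]
    n_shells = n // 2
    fa, fb = np.fft.fftn(map_a), np.fft.fftn(map_b)
    k = np.fft.fftfreq(n) * n                          # integer frequency index per axis
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()

    def per_shell(values):
        # Sum a per-voxel quantity over each shell, keeping shells below Nyquist.
        return np.bincount(shell, weights=values.ravel())[:n_shells]

    cross = per_shell(np.real(fa * np.conj(fb)))
    denom = np.sqrt(per_shell(np.abs(fa) ** 2) * per_shell(np.abs(fb) ** 2))
    return np.divide(cross, denom, out=np.zeros(n_shells), where=denom > 0)


def overfitting_gap(model_map, half_map_1, half_map_2):
    """FSC_work - FSC_test per shell, for a model refined against half_map_1 only.

    Values that stay clearly above zero suggest the model has fitted noise in half_map_1.
    """
    return fsc(model_map, half_map_1) - fsc(model_map, half_map_2)


# Synthetic demo: two half-maps share a signal but carry independent noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(16, 16, 16))
half1 = signal + rng.normal(size=signal.shape)
half2 = signal + rng.normal(size=signal.shape)

# Shell 0 (the single DC voxel, always +1 or -1) is left out of the averages.
print(overfitting_gap(signal, half1, half2)[1:].mean())  # "model" equal to the true signal
print(overfitting_gap(half1, half1, half2)[1:].mean())   # "model" that copied half1's noise
```

      In this setup the first average should scatter around zero, because the true signal correlates equally with both half-maps, while the second should be well above zero: a "model" that reproduces half-map 1 exactly has FSC_work of 1 in every shell but only the genuine signal correlation with half-map 2.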

      In lieu of what the reviewers have suggested, we think the additional map-to-model comparison of our model rigid-body docked into the 2.1 Å 50S map by Pichkur et al. provides reasonable evidence that our model suffers from minimal overfitting. Without any additional refinement of our model into their map, the map-to-model FSC resolution is 2.07 Å. We will include the new comparison in the revised manuscript.

      For model refinement, we used default parameters for phenix.real_space_refine, which internally optimizes weights for hundreds of different “chunks” during the refinement. This “black box” aspect does not give us facile control over the weighting scheme. However, we also note that the final model is not “fresh” out of Phenix; rather, the macromolecules have been meticulously reviewed and adjusted manually in Coot, with blurred maps to aid in accurate modeling for areas that are not as well connected/resolved. RSR in Coot was also required to “stitch” sections of the model together, since the models were refined in multiple focus-refined maps. Further, we think that for models that are ⅔ RNA, manually optimizing the Ramachandran restraints is unlikely to provide much new insight into RSR of this structure.

      3) Figure 1 -supplement 7: if radiation damage breaks the ribose rings, they should still be OK during early movie frames. This could be investigated by performing per-frame (or per-few-frames) reconstructions. The radiation damage argument would be a lot stronger if the density is present in early frames, yet disappears in the later ones. There will be a balance between dose-resolution and achievable spatial-resolution to see this of course. But it may be worth investigating.

      This is a great suggestion, and we have now carried out this analysis. We have performed the early-frame reconstruction and now have an alternative hypothesis that may make more sense. We will include the alternative hypothesis that we are likely seeing disorder due to conformational flexibility in the RNA backbone, rather than radiation damage, which seems unlikely given the features in the early-frame map. We will also update Figure 1–figure supplement 7 with new panels to aid this discussion.

      Reviewer #2:

      The manuscript by Watson et al. presents the structural analysis of a bacterial ribosome at high resolution. The achieved resolution is impressive, and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these are missing or not well developed. Instead, the use of map-to-model Fourier shell correlation (already known in the field) to estimate the resolution is stressed, but it is not clear what this actually brings here, as the values are the same as those estimated from half-map FSCs. The structure visualizes chemical modifications of ribosomal RNA and amino acids, as well as water molecules, which together are interesting and important. However, here one would expect a comparison with structures of previously analyzed bacterial ribosomes, e.g. E. coli and T. thermophilus, from the same group and from the work by Fischer et al., Nature 2015: how far are the sites conserved? How do the maps compare? Are the same features seen? It is surprising that the main chemical modifications are not discussed and shown (they are only summarized in the Suppl. Data). Pseudouridines are mentioned, but how were these identified? It should be mentioned here that, because of their isomeric nature, these can be assigned only from their typical hydrogen-bonding pattern. The paper discusses new sites with chemical modifications, but this could benefit from a more thorough discussion of existing biochemical data or from new biochemical characterization. The structural role of these modifications is not described in much detail. The side chain of IAS119 has no density, hence one should be careful in interpreting an isomerization of this residue; it is not clear whether the data support this conclusion. The same holds for the mSAsp89 residue, for which the density is uncertain, so it is not clear whether the conclusions stand on safe ground.

      We thank the reviewer for their interest in this work. We addressed our emphasis on the map-to-model FSC in response to reviewer #1.

      For the majority of rRNA modifications, we included the supplementary figure as a reference for comparison to the published 4YBB and 4Y4O maps and models. These modifications have been extensively described in the structural biology literature, including in the recent cryo-EM study of the 50S ribosomal subunit (Stojković, 2020) and warrant no detailed comment by us at this time. Instead, we focused on new features that were not previously observed, such as hypomodifications and new modifications. The new modifications are the isoAsp observed in uS11 and the thioamide modification in uL16.

      IAS119 modeling in uS11: We thoroughly analyzed Asn or isoAsp modeled at this residue, and will provide additional evidence that isoAsp is correctly modeled at residue 119. In the original maps, although the side chain density is weak, the backbone density is unequivocal. There is clear density for the extra methylene group (marked with an asterisk in Fig. 4A). We have now calculated a map of the 30S subunit using the first three frames in the image stacks, corresponding to a dose of ~3 electrons/Å². In this map, the side chain of isoAsp is more clearly visible (we will include a new figure panel with this density in the supplement). In addition to visual inspection, PHENIX provides a quantitative measure of the fit that also rules out Asn at this position. As we noted in the Methods, “Initial real-space refinement of the 30S subunit against the focused-refined map using PHENIX resulted in a single chiral volume inversion involving the backbone of N119 in ribosomal protein uS11, indicating that the L-amino acid was being forced into a D-amino acid chirality, as reported by phenix.real_space_refine.” Of the 10,564 chiral centers in the 30S subunit, only that for N119 stands out, having an energy residual nearly 2 orders of magnitude larger than the next highest deviation. This stereochemical problem was resolved by modeling isoAsp at this position. We will add these refinement details to the Methods.
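      The chiral volume inversion quoted above can be made concrete with the signed-volume definition commonly used in stereochemical restraint libraries: the triple product of the three substituent vectors measured from the chiral center. The Python sketch below is a generic illustration with hypothetical coordinates; it does not reproduce how phenix.real_space_refine computes, targets, or weights chiral restraints.

```python
import numpy as np


def chiral_volume(center, a, b, c):
    """Signed chiral volume (a - center) . [(b - center) x (c - center)].

    Units are cubic angstroms when coordinates are in angstroms. For a fixed
    ordering of the three substituents, the sign encodes handedness.
    """
    center = np.asarray(center, dtype=float)
    va, vb, vc = (np.asarray(p, dtype=float) - center for p in (a, b, c))
    return float(np.dot(va, np.cross(vb, vc)))


def is_inverted(model_volume, ideal_volume):
    """True when the model's chiral volume has the opposite sign to the ideal value."""
    return model_volume * ideal_volume < 0


# Hypothetical coordinates for a tetrahedral center and three of its substituents.
center = (0.0, 0.0, 0.0)
subs = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0)]
ideal = chiral_volume(center, *subs)

# Mirroring through the yz-plane (x -> -x) converts one enantiomer into the other.
mirrored = [(-x, y, z) for x, y, z in subs]
model = chiral_volume(center, *mirrored)
print(ideal, model, is_inverted(model, ideal))
```

      Because only the sign of this volume distinguishes L from D, a refinement that forces the wrong hand at one residue shows up as a single, very large chiral residual, consistent with the outlier described for N119.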

      Furthermore, as we noted in the manuscript, isoAsp has been identified in E. coli uS11 by biochemical means (see David, 1999). We examined the phylogenetic conservation of the neighboring sequences in uS11, finding that the N is nearly universal in bacteria and organelles, and D is nearly universal in archaea and eukaryotes (Figure 4 and original Figure 4–figure supplement 1). Finally, even in lower-resolution maps of the archaeal and eukaryotic ribosomes, we find that isoAsp better fits the density, visually with respect to the backbone, and quantitatively based on correlations between RSR models and the density (original Figure 4–figure supplement 2). We therefore think we have been careful in interpreting the isoAsp in uS11, structurally, phylogenetically, and in light of available biochemical evidence. We also provided an in-depth analysis of the neighboring 16S/18S rRNA residues that are in intimate contact with the isoAsp119 region of uS11. See Figure 4B and Supplementary Table 2 and accompanying description.

      mSAsp89: Density for mSAsp89 has been seen previously in the X-ray crystal structure of the 70S ribosome (Noeske, 2015). Here, we also see density for mSAsp89 at lower contour levels. See Figure 1–figure supplement 5. We should have noted in the legend of this panel that we used a lower contour level for mSAsp89 and m7G527, to reveal the modifications. This will be added. Notably, at higher contours that still enclose the standard nucleobase and amino acid side chains, we do not see clear density for the mSAsp89 and m7G527 modifications, in Figure 1–figure supplement 6. In the section of the manuscript covering hypomodifications in RNA, we will clarify this point.

      Pseudouridines: We will clarify how pseudouridines are inferred in the main text. These can be inferred if a solvent molecule or other polar atom is within hydrogen-bonding distance of the N3 in pseudouridine (would be C5 in uridine). We will update Figure 1–figure supplement 5 to better show solvent molecules within hydrogen bonding distance of pseudouridine N3 atoms.
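      The distance criterion described above can be sketched as a simple neighbour search in Python. This sketch assumes coordinates are already parsed into plain tuples (no particular structure-file library is implied) and assumes a 3.5 Å cutoff with N and O as candidate partners; both are common rules of thumb, not values given in the response, and the atom names and coordinates below are hypothetical.

```python
import math

POLAR_ELEMENTS = {"N", "O"}  # assumption: treat N and O as potential hydrogen-bond partners
HBOND_CUTOFF = 3.5           # assumption: donor-acceptor distance cutoff, in angstroms


def polar_partners(target, atoms, cutoff=HBOND_CUTOFF):
    """Names of polar atoms within `cutoff` of `target`, nearest first.

    `target` is an (x, y, z) tuple; `atoms` holds (name, element, (x, y, z)) tuples.
    Atoms at distance 0 (the target itself) are ignored.
    """
    hits = []
    for name, element, xyz in atoms:
        if element in POLAR_ELEMENTS:
            d = math.dist(target, xyz)
            if 0.0 < d <= cutoff:
                hits.append((d, name))
    return [name for _, name in sorted(hits)]


# Hypothetical coordinates: a water oxygen 2.9 A from the N3 atom of interest.
n3 = (0.0, 0.0, 0.0)
neighbours = [
    ("HOH O", "O", (2.9, 0.0, 0.0)),
    ("C4", "C", (1.4, 0.0, 0.0)),
    ("distant O", "O", (4.2, 0.0, 0.0)),
]
print(polar_partners(n3, neighbours))
```

      A distance check alone only flags candidates; the response's argument also relies on the atom in question being a nitrogen in pseudouridine but a carbon in uridine, so a polar partner at hydrogen-bonding distance is expected only for the former.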

      From a methodological point of view, it would be interesting to discuss in more detail how this high-resolution structure was obtained, what the specific aspects of high-resolution data collection were, and which parameters were important for refining the structure. Also, how were the thousands of water molecules validated? Regarding the discussion of electrostatic potentials, in contrast to what might be intuitive, the contribution of electron scattering is actually stronger at medium resolution, i.e. its effect does not need high resolution per se. The discussion of radiation damage is a hypothesis at this stage and should be handled more carefully, including processing of the data using a lower electron dose (see detailed points below). Taken together, this work describes some interesting findings, but some remain unclear in the discussion because no biochemical data are available for them yet. However, this analysis provides useful hints for designing future experiments. Also, in contrast to what is stated, no new tools are developed in this paper.

      We will add some additional information to the Discussion and Methods. In terms of the water molecules, we have not gone through these one by one at this point. We actually do not claim to have introduced new tools, but we note that our water modeling spurred the incorporation of phenix.douse into the latest PHENIX releases. This will be more clearly stated, and we will acknowledge Pavel Afonine for helping us as he developed this functionality. (He indicated we should cite Liebschner, 2019.) Solvent modeling is ripe for future development, as we note in the Discussion.

      Although scattering is stronger at medium resolution, it is not absent at < 2 Å. See the recent atomic-resolution structures of ferritin for examples. In fact, we have now examined the 2.1 Å map deposited by Pichkur et al. (Pichkur, 2020), in which the thioamide is barely visible. The thioamide in the 2.2 Å map deposited by Stojković (Stojković, 2020) is not obviously visible. We will add panels showing this in the revised manuscript.

      We have now used the early frames to address the question of ribose damage and the carboxylate of IAS119 in uS11, as noted above.

      Reviewer #3:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported in X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al. present landmark work in which these limits are pushed even further, reporting a ribosome cryo-EM reconstruction with features compatible with an overall resolution of about 2 Å, and better than that in the best areas of the map. With this level of detail, many fundamental aspects of translation and antibiotic interaction can be interpreted chemically, in physicochemical terms, greatly improving our understanding of this key component of bacterial cells. The manuscript is well presented, with clear evidence supporting the authors' claims and interpretations. Especially remarkable is the detailed and accurate handling of the reference list.

      We thank the reviewer for their interest in our work. In the revision, we will keep the references mostly as-is, but will add a few based on the revisions we need to make.

      Major concern:

      There is an intense debate within the cryo-EM community about the best way to estimate the resolution of a cryo-EM reconstruction. In this manuscript, the authors claim map-to-model FSC values could "in a high resolution context [...] provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." Regardless of this reviewer's opinion on this specific point, if a map-to-model FSC is to be used to claim "high resolution", a convincing test demonstrating the absence of overfitting in the refined model should be presented. Otherwise, map-to-model FSC values may be artificially high due to unrealistic deformation of the model. The authors should thus show that their refined model is not overfitted.

      This was a concern of all the reviewers, which we addressed above. We think the comparisons to other recent structures, especially the 2.1 Å 50S map by Pichkur et al., make the case for using the map-to-model FSC criterion.

    2. Reviewer #3:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported in X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al. present landmark work in which these limits are pushed even further, reporting a ribosome cryo-EM reconstruction with features compatible with an overall resolution of about 2 Å, and better than that in the best areas of the map. With this level of detail, many fundamental aspects of translation and antibiotic interaction can be interpreted chemically, in physicochemical terms, greatly improving our understanding of this key component of bacterial cells. The manuscript is well presented, with clear evidence supporting the authors' claims and interpretations. Especially remarkable is the detailed and accurate handling of the reference list.

      Major concern:

      There is an intense debate within the cryo-EM community about the best way to estimate the resolution of a cryo-EM reconstruction. In this manuscript, the authors claim map-to-model FSC values could "in a high resolution context [...] provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." Regardless of this reviewer's opinion on this specific point, if a map-to-model FSC is to be used to claim "high resolution", a convincing test demonstrating the absence of overfitting in the refined model should be presented. Otherwise, map-to-model FSC values may be artificially high due to unrealistic deformation of the model. The authors should thus show that their refined model is not overfitted.

    3. Reviewer #2:

      The manuscript by Watson et al. presents the structural analysis of a bacterial ribosome at high resolution. The achieved resolution is impressive, and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these are missing or not well developed. Instead, the use of map-to-model Fourier shell correlation (already known in the field) to estimate the resolution is stressed, but it is not clear what this actually brings here, as the values are the same as those estimated from half-map FSCs. The structure visualizes chemical modifications of ribosomal RNA and amino acids, as well as water molecules, which together are interesting and important. However, here one would expect a comparison with structures of previously analyzed bacterial ribosomes, e.g. E. coli and T. thermophilus, from the same group and from the work by Fischer et al., Nature 2015: to what extent are the sites conserved? How do the maps compare? Are the same features seen? It is surprising that the main chemical modifications are not discussed and shown (they are only summarized in the Suppl. Data). Pseudouridines are mentioned, but how were these identified? It should be mentioned here that, due to their isomeric nature, these can be inferred only from their typical hydrogen-bond pattern. The paper discusses new sites with chemical modifications, but this could benefit from a more thorough discussion of existing biochemical data or from new biochemical characterization. The structural role of these modifications is not described in much detail. The side chain of IAS119 has no density, so one should be careful in interpreting an isomerization of this residue; it is not clear whether the data support this conclusion. The same applies to the mSAsp89 residue, for which the density is uncertain, so it is not clear whether the conclusions rest on safe ground.

      From a methodological point of view, it would be interesting to discuss in more detail how this high-resolution structure was obtained, what the specific aspects of high-resolution data collection were, and which parameters were important for refining the structure. Also, how were the thousands of water molecules validated? Regarding the discussion on electrostatic potentials, contrary to what might be intuitive, the contribution of electron scattering is actually stronger at medium resolution, i.e. its effect does not require high resolution per se. The discussion of radiation damage is a hypothesis at this stage and should be handled more carefully, including processing the data with a lower electron dose (see detailed points below). Taken together, this work describes some interesting findings, but some remain unclear because no biochemical data are yet available for them. However, this analysis provides useful hints for designing future experiments. Also, contrary to what is stated, this paper does not develop new tools.

      Overall, this work appears to be promising, but it could benefit from clearer explanations, further comparisons with previous structures and clearer formulation of the conclusions drawn. There is indeed a significant level of novelty in this study.

    4. Reviewer #1:

      This paper describes a 2 Å cryo-EM reconstruction of the E. coli 70S ribosome. This structure represents the highest-resolution ribosome structure, by any method, available thus far and highlights interesting modifications that were not possible to see in previous structures. I'll let the ribosome experts comment on the relevance of these and focus my review on the cryo-EM technical parts. The paper is clearly written and the figures are informative and beautiful.

      Major comments:

      1) The authors make a big deal out of resolution assessment by model-to-map FSCs. It is unclear to me why they do this. First of all, model-to-map FSC is not a new resolution measure: it is in widespread use already. Second, it is unclear why the authors are so forceful in stating that it is better than the half-map FSC. They say "While map-to-model FSC carries intrinsic bias from the model's dependence on the map, in a high resolution context it does provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." What additional information does it provide? I would say it only provides true additional information if the atomic model comes from another experiment! In the way it is used here: by refining the model inside the very same map, there is a danger of increasing model-to-map FSC values through overfitting of the model (see also below). This danger is not recognized enough in the text (it is only hinted at in the sentence above), and overfitting is not measured explicitly for this case. Yes, half-map FSC measures self-consistency, but in practical terms (when done right!), this doesn't matter for the resolution estimate. The same is true for model-to-map FSCs: when done right they convey the right information, but the danger of self-consistency (through overfitting) also exists here. As the paper is mainly about the high-resolution ribosome structure, and no proper evaluation of the relative merits of half-map FSC versus model-to-map FSC is performed, I would suggest that the authors remove (or at least tone down considerably) their statements about resolution assessment from the manuscript.

      2) To test for the presence of overfitting of their atomic models in the maps, the authors should shake up the atomic models and refine them in the first independently refined half-map. The FSC of that model versus that half-map (FSC_work) should be compared with the FSC of that very same model versus the second half-map (FSC_test). Deviations between the two would be an indication of overfitting. If that were observed, the weights on the stereochemical restraints should be tightened until the overfitting disappears. The same weighting scheme should then be used for the final model refinement against the sum of the half-maps.

      3) Figure 1–figure supplement 7: if radiation damage breaks the ribose rings, they should still be intact during early movie frames. This could be investigated by performing per-frame (or per-few-frames) reconstructions. The radiation damage argument would be much stronger if the density were present in early frames yet disappeared in later ones. There will, of course, be a trade-off between dose resolution and achievable spatial resolution, but it may be worth investigating.
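      The FSC_work/FSC_test comparison proposed in point 2 can be sketched in NumPy. This is an illustration under simplifying assumptions: cubic maps, integer-radius Fourier shells, and synthetic arrays standing in for the model map (in practice computed from the refined atomic model) and the two half-maps. A model that has absorbed the noise of half-map 1 shows FSC_work well above FSC_test, whereas a model that captures only the shared signal shows no systematic gap. The function names are hypothetical.

```python
import numpy as np

def fsc(map1, map2):
    """Fourier shell correlation between two cubic 3D maps of equal shape.

    Returns one value per integer-radius shell, from 0 up to N/2 - 1.
    """
    if map1.shape != map2.shape or len(set(map1.shape)) != 1:
        raise ValueError("maps must be cubic and of equal shape")
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    # Integer spatial-frequency index of every Fourier voxel, rounded to a shell.
    k = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    curve = np.zeros(n // 2)
    for s in range(n // 2):
        mask = shell == s
        num = np.real(np.sum(f1[mask] * np.conj(f2[mask])))
        den = np.sqrt(np.sum(np.abs(f1[mask]) ** 2) * np.sum(np.abs(f2[mask]) ** 2))
        curve[s] = num / den if den > 0 else 0.0
    return curve

def overfitting_gap(model_map, half1, half2):
    """Mean of FSC_work - FSC_test, skipping the DC shell (s = 0).

    model_map is assumed to come from a model refined against half1 only.
    """
    fsc_work = fsc(model_map, half1)
    fsc_test = fsc(model_map, half2)
    return float(np.mean(fsc_work[1:] - fsc_test[1:]))

# Synthetic stand-ins: a shared "signal" plus independent noise in each half-map.
rng = np.random.default_rng(0)
n = 32
signal = rng.standard_normal((n, n, n))
half1 = signal + rng.standard_normal((n, n, n))
half2 = signal + rng.standard_normal((n, n, n))

honest_model = signal  # captures only what both halves share
overfit_model = half1  # extreme case: reproduces the noise of half-map 1

print(overfitting_gap(honest_model, half1, half2))   # near zero
print(overfitting_gap(overfit_model, half1, half2))  # clearly positive
```

      In a real test, the shaken-up model would be refined against half-map 1 only and converted to a map before computing both curves; a substantial gap would then call for tighter stereochemical restraint weights.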

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported in X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al. present landmark work in which these limits are pushed even further, reporting a ribosome cryo-EM reconstruction with an overall resolution of 2 Å, and even better than that in the best areas of the map. The achieved resolution is impressive, and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these could be better developed. Instead, the use of map-to-model Fourier shell correlation (already known in the field) to estimate the resolution is stressed, but its advantage here is unclear, as the values are the same as those estimated from half-map FSCs. Therefore, it is suggested that the discussion of the model-to-map FSC be toned down considerably (or even removed), and that more information be added about the new findings in the map, along the lines of the comments below.

    1. Reviewer #3:

      In this paper, the authors propose an automated method to parcellate subcortical structures of the brain given a set of manual delineations. One of its strongest points is the adoption of a Bayesian approach, combining priors derived from brain anatomy and from the MRI acquisition. These priors are then used to estimate per-voxel posterior probabilities, which, after a series of operations, yield the final subcortical parcellation. The paper appears technically sound and the proposed method potentially relevant, given the importance of having reliable tools for accurate subcortical delineation, especially in high-resolution datasets.
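      As a generic illustration of how per-voxel posteriors combine a spatial prior with an intensity likelihood, here is a minimal NumPy sketch assuming a single contrast, a Gaussian intensity model per structure, and a precomputed prior probability map. It is not the MASSP implementation, which uses multiple quantitative contrasts and further processing steps; all names and values are hypothetical.

```python
import numpy as np

def voxel_posteriors(intensities, spatial_prior, means, stds):
    """Posterior label probabilities per voxel via Bayes' rule.

    intensities:   (V,) observed intensity at each voxel
    spatial_prior: (V, K) prior probability of each of K labels at each voxel
    means, stds:   (K,) Gaussian intensity model for each label
    Returns a (V, K) array whose rows sum to 1.
    """
    x = intensities[:, None]
    likelihood = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    unnormalized = spatial_prior * likelihood
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Two voxels, two hypothetical structures with mean intensities 1.0 and 3.0.
intensities = np.array([1.0, 2.0])
prior = np.array([[0.5, 0.5], [0.2, 0.8]])
post = voxel_posteriors(intensities, prior, np.array([1.0, 3.0]), np.array([0.5, 0.5]))
labels = post.argmax(axis=1)  # hard labels by maximum posterior
```

      The second voxel lies midway between the two intensity means, so its likelihoods are equal and its posterior simply reproduces the spatial prior.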

      I have some possible concerns and suggestions that might increase the quality of the paper:

      -From Figure 4, it is clear that the estimated Dice coefficients decrease with age. As the authors note, this is likely because the priors were built from 10 subjects with an average age of 24.4 years, so the highest predicted performance is seen for subjects whose age (18–40) lies around this average prior age. I know the authors mention that they plan to model the effects of age in the priors in future work. However, I wonder whether they could already address this question in the current work. Since the data used to test this age bias have already been manually delineated, could the authors generate new priors from this set of delineations, including subjects of all ages, and test whether the predicted Dice coefficients still depend on age, as was done in Figure 4?

      -Automated methods are usually sensitive to the number of subjects used to build the parcellation, with results from a bigger training cohort being potentially more robust and generalizable. As said earlier, I think one of the strongest points of the method presented in this paper is the adoption of a Bayesian approach, which usually works well for small sample sizes and allows previous results to be updated as new data arrive. Still, I think it would be highly illustrative to show the performance of the current method as a function of training-set size. Using the same set of delineations of the 105 subjects used to test the age bias, could the authors show the predicted performance when the priors are generated from training sets of varying size?

      -What is the value for the scale parameter delta that appears in the priors? Is that a free parameter? If so, do results change when this parameter varies?

    2. Reviewer #2:

      In the present manuscript, Bazin and colleagues describe an automated computational approach to segment 17 subcortical nuclei from individual quantitative 7T MRI derivations. To this end, they trained a Bayesian "Multi-contrast Anatomical Subcortical Structure Parcellation (MASSP)" algorithm. They validate the approach in a leave-one-out fashion, training on 9 of 10 high-resolution scans. They assess age-related bias and report that the dilated Dice overlap, which allows 1 voxel of uncertainty, demonstrates very high segmentation accuracy compared with expert delineation.
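      For readers less familiar with these overlap measures, a minimal NumPy sketch of the Dice coefficient and one possible "dilated" variant that tolerates a 1-voxel boundary disagreement. The exact definition of dilated Dice used in the manuscript may differ; the version here, which counts voxels of each mask falling within a 6-connected, 1-voxel dilation of the other mask, is an assumption, and the function names are hypothetical.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) for two boolean masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return float(2 * np.sum(a & b) / (np.sum(a) + np.sum(b)))

def dilate(mask):
    """One step of face-connected binary dilation, without wrap-around at borders."""
    out = mask.copy()
    for axis in range(mask.ndim):
        for shift in (1, -1):
            shifted = np.roll(mask, shift, axis=axis)
            # Clear the slice that np.roll wrapped around from the opposite border.
            edge = [slice(None)] * mask.ndim
            edge[axis] = 0 if shift == 1 else -1
            shifted[tuple(edge)] = False
            out |= shifted
    return out

def dilated_dice(a, b):
    """Assumed definition: voxels of each mask lying within one voxel of the other."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    hits = np.sum(a & dilate(b)) + np.sum(b & dilate(a))
    return float(hits / (np.sum(a) + np.sum(b)))

# Two 2x2 squares offset by one voxel (works the same way for 3D masks).
a = np.zeros((5, 5), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((5, 5), dtype=bool); b[1:3, 2:4] = True
print(dice(a, b))          # 0.5
print(dilated_dice(a, b))  # 1.0
```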

      This is straightforward work. It would certainly benefit from an additional step of out-of-center / out-of-cohort validation, but I have no serious concern that performance would be unsatisfactory. The most important limitation, the bias from anatomical variation due to age or disease, is acknowledged. The algorithm is shown to be affected by age and will most certainly be affected by contrast and size changes in neurodegenerative disorders.

      The authors certainly know their field and are a driving force in open 7T research of the basal ganglia.

    3. Reviewer #1:

      The main criticisms of the work fall under categories largely centered on how the method is evaluated, rather than fundamental concerns with the method itself.

      Major concerns:

      1) Relative effectiveness. While a critical advancement of this method is the ability to segment many more regions than previous subcortical atlases, many of its regions still overlap with those covered by existing segmentation tools. Knowing how the reliability of this new approach compares to previous automatic segmentation methods is crucial for judging how far the method can be trusted. The authors should benchmark directly against previous methods for the regions they share.

      2) Aging analysis. The analysis of the aging effects on the segmentations seemed oddly out of place. It wasn't clear if this is being used to vet the effectiveness of the algorithm (i.e., its ability to pick up on patterns of age-related changes) or the limitations of the algorithm (i.e., the segmentation effectiveness decreases in populations with lower across-voxel contrast). What exactly is the goal with this analysis? Also, why is it limited to only a subset of the regions output from the algorithm?

      3) Clarity of the algorithm. Because of the difficulty of the parcellation problem, the algorithm being used is quite complex. The authors do a good job showing the output of each stage of the process (Figures 7 & 8), but it would substantially help general readers to have a schematic of the logic of the algorithm itself.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Timothy Verstynen (Carnegie Mellon University) served as the Reviewing Editor.

      Summary:

      In this study, Bazin and colleagues propose a novel segmentation algorithm for parcellating subcortical regions of the human brain, developed from multiple MRI measures derived from the MP2RAGEME sequence acquired on a 7T MRI system. The key advancement of this approach is a reliable segmentation of more subcortical areas (17 regions) in native space than is possible with currently available methods. The authors validate their algorithm by comparing against age-related measures.

      This manuscript was reviewed by three experts in the field, who found that this method has strong potential to be a new "workhorse" tool in human neuroimaging that could substantially advance our ability to measure brain structures that are largely overlooked due to problems with segmentation. The main criticisms of the work are largely centered on how the method is evaluated & implemented, rather than fundamental concerns with the validity of the method itself.

    1. Reviewer #3:

      This paper presents a neural network based approach to predict the retinotopic organization of the human visual cortex from structural MRI data. The authors are promoting the use of non-Euclidean/geometric deep learning methods for this problem. They apply their technique to the HCP data and show some interesting results, which they claim demonstrates that functional organization in the visual system can be predicted at the individual level. For me, the paper has several substantial and important flaws.

      First, one of the most important contributions of the paper is the promotion of geometric deep learning. To me, the value of this framework has not been demonstrated with the experiments. In order to assess the additional boost afforded by geometric techniques, one would need to establish a baseline with a Euclidean model. Without this comparison, it is impossible to evaluate the value of this innovation.

      Second, in general, I did not find the quality of the individual-level predictions and the presented quantitative results convincing or impressive. In Figure 3, for example, I'd like to see the underlying sulcal geometry (of each subject) to assess the value of the presented "individualized" predictions. Also, the quality of the predictions, as the authors acknowledge, is significantly reduced in large parts of the cortex, including higher order areas. Importantly, though, it is not clear how much of the individual variability is truly captured in these predictions. For example, the error maps in Figure 6 for the "shuffled" and "constant" cases look very similar to the actual error maps. And quantitatively, the overall error values are very close for these cases. This suggests that the predicted retinotopic maps are not much better than a simple group average retinotopic map. One way to counter this concern would be to conduct a fingerprinting/identifiability experiment and demonstrate that the predicted maps are much closer to the observed/measured/estimated (ground truth) maps for the same individual than other individuals. Without such an analysis, it is impossible to assess how much of individual variation is captured.
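      The fingerprinting/identifiability analysis suggested above could be sketched as follows, assuming each subject's predicted and observed maps are available as vectors over the same vertices (e.g. eccentricity, which, unlike polar angle, does not wrap around). The function name and the synthetic data are hypothetical. A predictor that only reproduces a group-average map scores at chance, whereas one that captures individual variation identifies each subject.

```python
import numpy as np

def identifiability(predicted, observed):
    """Subject-by-subject correlation matrix and identification accuracy.

    predicted, observed: arrays of shape (subjects, vertices).
    corr[i, j] is the Pearson correlation between subject i's prediction
    and subject j's observed map; accuracy is the fraction of subjects whose
    prediction correlates best with their own observed map.
    """
    def zscore(x):
        return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

    corr = zscore(predicted) @ zscore(observed).T / predicted.shape[1]
    accuracy = float(np.mean(np.argmax(corr, axis=1) == np.arange(corr.shape[0])))
    return corr, accuracy

# Hypothetical data: a shared group map plus individual deviations.
rng = np.random.default_rng(1)
subjects, vertices = 10, 500
group = rng.standard_normal(vertices)
observed = group + 0.5 * rng.standard_normal((subjects, vertices))
individual_pred = observed + 0.1 * rng.standard_normal((subjects, vertices))
group_pred = np.tile(group, (subjects, 1))  # the same map for every subject

print(identifiability(individual_pred, observed)[1])  # 1.0
print(identifiability(group_pred, observed)[1])       # 0.1, i.e. chance (1 / subjects)
```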

      The proposed smooth L1 loss was not properly justified and seems inappropriate. The threshold of 1 seems arbitrary. In fact, the cyclical nature of polar angle calls for a cyclical loss function. However, this is a minor concern.
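      To illustrate the point about the cyclical nature of polar angle: a smooth L1 penalty with threshold (beta) 1, applied to a naive difference, heavily penalizes a prediction of 359° for a true value of 1°, whereas wrapping the difference into [-180°, 180°) first gives a small loss. This is a minimal pure-Python sketch; the function names are hypothetical, and the smooth L1 form follows the common definition (quadratic below beta, linear above).

```python
def circular_difference(pred_deg, true_deg):
    """Signed smallest angular difference in degrees, wrapped into [-180, 180)."""
    return (pred_deg - true_deg + 180.0) % 360.0 - 180.0

def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty: quadratic for |x| < beta, linear (|x| - beta/2) above."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

# Prediction of 359 deg for a true polar angle of 1 deg:
print(smooth_l1(359.0 - 1.0))                      # 357.5 (naive difference)
print(smooth_l1(circular_difference(359.0, 1.0)))  # 1.5 (wrapped difference)
```

      With beta = 1 and errors measured in degrees, almost every error falls in the linear regime, so the loss behaves essentially like L1.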

      The need for dropout was also not demonstrated. Was there a concern of overfitting? Showing learning curves (for training and validation data) would help with that.

      Choosing the best model based on validation loss can be improved with a "deep ensemble" strategy.

      In the shuffling procedure, spatial correlation structure seems to have been destroyed. A better approach would be to randomly deform/rotate the structural image.

      Setting the structural data to zero at input and assessing test time performance makes no sense and provides no real value.

      I suggest that the authors make their code available during peer review too. Otherwise, it is impossible to assess the reproducibility of their work.

      Finally, I believe a test set of 10 subjects is too small. A widely accepted convention is to use at least 10% of the total dataset for testing. I would recommend using 20 or 30 subjects for testing.

    2. Reviewer #2:

      The authors use deep learning to map brain anatomy (cortical curvature and myelination) to retinotopic maps (eccentricity and polar angle) in individual subjects.

      My overall assessment of this work is that, although the idea is neat, the execution seems a bit rushed and lacks somewhat in depth of analysis.

      More specifically:

      1) This is my main concern: The evaluation of the method's ability to find fine-grained individual differences is somewhat anecdotal and not strongly backed by rigorous analyses.

      -The idiosyncratic differences shown in Fig. 4a are intriguing, but they could also simply be explained by gross differences in the gyral patterns of these subjects.

      -The differences between the predictions for different subjects are much smaller than the within-subject prediction errors.

      -The authors should make these evaluations more quantitative. For example, by delineating several visual areas in the empirical datasets and predicted maps (in a blinded manner) and checking to see if the sizes of the different visual areas are well predicted at an individual level. This could even be built up in the model as a classifier for different visual areas.

      -Using shuffled features as some sort of null is not appropriate in my opinion, as that breaks the statistics of the input. In fact, I am amazed that it has any predictive power at all, which it clearly does seeing that the prediction errors are similar to the empirical data (Fig 6). Why is that? Is it the case e.g. that the model learns the relation between where the edges of the visual areas mask is and the retinotopy map? What happens if you give the model a mask as input that is completely different (e.g. arbitrarily expanded or contracted). My guess is that the predictions will be vastly different and distorted.

      2) It is really unclear what the approach achieves beyond finding the borders between the primary regions V1, V2, and V3.

      -The authors should consider delineating more areas in the empirical data and showing that their predictions cover the full 0–360° and 0–12° ranges in both dimensions. This analysis would greatly inform the individual variations mentioned above.

      -One interesting suggestion by the authors is that dorsal areas in the IPS actually have bad empirical retinotopy data (indeed these areas might need specialised tasks, e.g. involving attentional components, i.e. attending to parts of the visual field [see Sereno et al.]). In fact the empirical data seem to predict that these regions cover a different hemifield in the shown test subjects, which is not what is expected. It would be interesting to see if the model proposed here does indeed predict, e.g. polar angle reversals in IPS1,2,3.. (I can see a hint of it in Fig3). To me, even without empirical data to compare to, this would be a strong suggestion that the authors may be capturing some genuine structure-function relations.

      3) Some discussion around the modelling/quantification is lacking:

      -Errors in the polar angles are really high (~30° even in V1).

      -Related to a sub-point in comment (1): why does shuffling work? Can the authors show the actual predictions of the shuffled data (as opposed to the errors) - do they look like retinotopy maps?

      -Do we need deep learning? Previous work has shown simple relations between V1, V2, V3 and the geometry of the brain. Does this model actually capture more fine-grained features?

      -I would have set it up as a regression against (x, y) coordinates in the visual field rather than polar coordinates (which have obvious wrap-around problems). This would avoid the need for tricks such as rotating the visual field before training, as the authors did.

      -The obvious deep-learning question: learning such a highly parameterised model from 180 × 2 hemispheres sounds hard. What evidence is there that this is not overfitting?

      -The authors mention in the Methods that the 3D coordinates were also used as features, but in Fig. 2 it looks like the features are only curvature and myelin: which is it? Are the 3D coordinates used as explicit features?
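      Regarding the suggestion above to regress (x, y) visual-field coordinates instead of polar coordinates: a minimal pure-Python sketch of the conversion in both directions, assuming polar angle in degrees measured counterclockwise from the right horizontal meridian (the manuscript's convention may differ). The function names are hypothetical. Angles of 359° and 1° at the same eccentricity map to nearby (x, y) points, so an ordinary regression loss needs no wrap-around handling.

```python
import math

def polar_to_xy(angle_deg, ecc):
    """Visual-field position from polar angle (degrees) and eccentricity."""
    theta = math.radians(angle_deg)
    return ecc * math.cos(theta), ecc * math.sin(theta)

def xy_to_polar(x, y):
    """Inverse mapping; polar angle returned in [0, 360)."""
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)

# 359 deg and 1 deg at 5 deg eccentricity are neighbours in (x, y) space.
p, q = polar_to_xy(359.0, 5.0), polar_to_xy(1.0, 5.0)
print(round(math.dist(p, q), 3))  # 0.175
```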

      4) Show the data:

      -It would be good to see the features going into these predictions and the relationship with the targets. Maybe even scatter plots of curvature/myelin vs polar angle?

      -Subjects shown (test set) have very noisy maps outside the early visual cortex. Where are they in the subject-distribution of variance explained? (Benson J Vis 2018).

    3. Reviewer #1:

      The manuscript by Ribeiro, Bollmann and Puckett uses machine learning to predict, across individuals, retinotopic maps from cortical myelin and curvature maps. The authors use a sophisticated method (a convolutional network on the graph, here called geometric deep learning) and show appealing predicted maps of individual retinotopy in V1. While the work is interesting, the quality of the results is disappointing and the positioning with respect to the literature imprecise.

      The authors claim that their model is "able to predict retinotopic organization far beyond early visual cortex, throughout the visual hierarchy". However the figures do not seem to support this claim: the qualitative figures do not show a clear structure in the higher-level regions.

      Figure 1 is appealing; however, it should be compared to a simple average of all retinotopic maps. Likewise, the quantitative results in Supplementary Table 2 do not come with a comparison to the mean predictor (as with an R² score), so it is not possible to judge whether these numbers represent good performance.
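      A minimal NumPy sketch of the two baselines mentioned above: R², which compares the prediction with the mean of the observed values, and a skill score against an arbitrary baseline such as a group-average retinotopic map. Function names and example values are hypothetical; for polar angle, the squared errors would need a circular difference.

```python
import numpy as np

def r2_score(observed, predicted):
    """1 - SS_res / SS_tot: the baseline is the mean of the observed values."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def skill_vs_baseline(observed, predicted, baseline):
    """1 - MSE(prediction) / MSE(baseline); positive only if the model beats the baseline."""
    mse_model = np.mean((observed - predicted) ** 2)
    mse_base = np.mean((observed - baseline) ** 2)
    return float(1.0 - mse_model / mse_base)

# Hypothetical eccentricity values at four vertices of one test subject.
observed = np.array([1.0, 2.0, 3.0, 4.0])
group_average = np.array([2.0, 2.0, 3.0, 3.0])  # e.g. mean map of the training subjects
predicted = np.array([1.0, 2.0, 3.0, 5.0])

print(r2_score(observed, predicted))                          # 0.8
print(skill_vs_baseline(observed, predicted, group_average))  # 0.5
```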

      Rather, Figure 6 shows that the models trained on shuffled and constant data perform well, both qualitatively and quantitatively. The proposed model does perform slightly better, but the statistical and practical significance of this improvement is unclear. The manuscript makes no clear attempt at judging the statistical significance, and the small number of participants in the test set (10) makes it unlikely that significance would be attained. It would be beneficial to perform a complementary analysis on a larger cohort, for instance using the 3T HCP data, at the cost of lower-quality data.
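      Because each model is evaluated on the same test subjects, a paired test is one way to assess significance. Below is a minimal pure-Python sketch of an exact two-sided sign-flip permutation test on per-subject mean errors; the choice of test, the function name and the error values are assumptions, not taken from the manuscript. With 10 subjects there are only 2^10 = 1024 sign patterns, so the smallest attainable two-sided p-value is 2/1024.

```python
import itertools

def paired_sign_flip_pvalue(errors_a, errors_b):
    """Exact two-sided sign-flip test on per-subject differences (a - b).

    Under the null hypothesis each difference is equally likely to have
    either sign, so every sign pattern is enumerated.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs)) / len(diffs)
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / len(diffs)
        extreme += stat >= observed - 1e-12  # tolerance for float round-off
        total += 1
    return extreme / total

# Hypothetical per-subject errors: the model beats the shuffled control by
# exactly 1 degree in all 10 test subjects.
print(paired_sign_flip_pvalue([10.0] * 10, [11.0] * 10))  # 0.001953125
```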

      There have been many prior works that have shown the ability to predict functional organization from other mapping information. In this respect, the positioning of the present manuscript with regards to the literature is very unclear. The manuscript does acknowledge some prior work, including work using template warping, but claims that they have not "been able to capture the detailed idiosyncrasies seen in the actual measured maps of those individuals". However, no precise argument is brought forward: no quantitative measure can be compared with the prior publication, no comparison is performed. Also, individual task functional topography has been inferred from other information such as anatomical connectivity [Saygin 2012], resting-state activity [Tavor 2016], or movie watching [Eickenberg 2017]. A discussion of the relative accuracy, or pros and cons would have been interesting here.

      With this in mind, the title feels much too general: "Predicting brain function from anatomy using geometric deep learning"

      As a minor comment: controlling for the twin structure could be done in a more powerful way by isolating siblings in each of the train, validation, and test set so that there is one pair separated across sets.

      [Saygin 2012] Saygin, Zeynep M., et al. "Anatomical connectivity patterns predict face selectivity in the fusiform gyrus." Nature Neuroscience (2012)

      [Tavor 2016] Tavor, I., et al. "Task-free MRI predicts individual differences in brain activity during task performance." Science (2016)

      [Eickenberg 2017] Eickenberg, Michael, et al. "Seeing it all: Convolutional network layers map the function of the human visual system." NeuroImage (2017)

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Gaël Varoquaux (INRIA) served as the Reviewing Editor.

      Summary:

      The reviewers all expressed interest in the research agenda as well as the methods. However, it was felt that the results did not demonstrate a clear and sufficient improvement with regard to prior art. On the methodological side, the benefit of the deep-learning formulation was not clearly demonstrated. On the neuroscience side, the evidence that the method captures fine inter-individual differences was felt to be insufficient.

    1. Author Response

      Summary:

      This is an interesting and creative paper implicating a differential mechanism of intracellular trafficking and subsequent signaling that is triggered by different dynorphins binding to the kappa opioid receptor. In principle, if the authors could explain the molecular basis for this phenomenon, the story would be of tremendous impact in the fields of opioid receptor signaling and trafficking. The reviewers noted a number of concerns that would require significant further work and clarification to support the authors' conclusions.

      We are very happy that you and the reviewers found that the study could be of tremendous impact and describe the paper as “interesting and creative”, “novel and intriguing”, “fascinating and novel”, and feel that the study was “nicely conducted”. We appreciate the comments of the reviewers, and we are confident that we can address them as described below.

      Reviewer #1:

      General assessment: In this manuscript the authors have assessed the different endocytic routes of KOR when activated by DynA or DynB. These are nicely conducted experiments that show interesting results; however, the authors completely overlook the connection with their own work that highlights the different degradation mechanisms of these two peptides. As it stands, it does not add to the field, and it lacks a mechanistic explanation that could be explored given the authors’ expertise in these systems.

      We thank the reviewer for the positive comments. We are happy that the reviewer felt that the experiments are nicely conducted, and that the results are interesting. However, we respectfully but strongly disagree with the comments that our study does not add to the field.

      First, considering the extended and severe opioid epidemic, understanding the many ways in which the opioid peptide/receptor system is modulated is of high priority. Endogenous opioid peptides are highly relevant neuromodulators about which we know even less than about opioid drugs. Why there are over 20 different endogenous opioid peptides but only three receptors has remained unanswered for decades. We show that two highly related endogenous opioids initially activate KOR to similar levels but subsequently diverge in trafficking and endosomal signaling. We feel that this is a clear advance in the field of opioids and GPCRs.

      Second, the idea that location-biased signaling can lead to different consequences for the same agonist is still a relatively new idea, and clearly a very important area of continuing research. Even for well-studied systems like the adrenergic receptor system, we know very little about the mechanisms or the relevance of differential signaling. Demonstrating that endogenous opioids take advantage of location bias to generate distinct signaling consequences is a clear indication that such differential trafficking and signaling is physiologically relevant. Considering that opioid receptor trafficking has been implicated in opioid signaling and tolerance (although again, the mechanisms are debated), showing that different endogenous opioids can regulate localization and trafficking of the same receptor is a key advance.

      Numbered summary of substantive concerns:

      1) The major conclusion of the study is that after endocytosis, DynA preferentially sorts KOR into the degradative pathway, while DynB sorts KOR into the recycling pathway and this has consequences in the duration of the active state of the receptor and its ability to signal. It is surprising that the authors do not investigate the connection between these results and previously published work that shows differences in the degradation of DynB vs DynA within endosomes. Indeed, the authors have previously shown that: i) ECE2 hydrolyzes DynB and not DynA (Mzhavia et al JBC 2003), ii) overexpression of ECE2 increases the rate of mu-opioid receptor recycling upon DynB stimulation (Gupta et al BJP 2015) and iii) inhibition of ECE2 decreases mu-opioid receptor recycling (Gupta et al BJP 2015). Considering this previous work, it is totally expected that the two ligands show distinct post-endocytic trafficking of KOR.

      The reviewer cites data showing that the surface recovery rates of a different GPCR (MOR) are regulated by ECE2, and that ECE2 differentially processes Dyn A and B, to argue that it is expected that the two ligands will direct KOR to different subcellular localizations. While our results could certainly be one logical outcome of previous data, we disagree that it is a foregone conclusion.

      Regarding the reviewer’s assessment of our previous work, we were never able to test DynA previously because traditional assays did not have the sensitivity to resolve DynA-mediated recycling or trafficking. This limitation precluded the key comparison between DynA and DynB that is necessary for addressing differences between these two physiologically relevant opioid peptides. Here we use advanced high-resolution imaging experiments to carefully address how DynA and DynB diverge in directing KOR trafficking and signaling.

      More generally, we have known for over a decade that the rates of GPCR recycling can be regulated by signaling pathways without changing sorting, endosomal localization, or fates (e.g., PMID: 16604070, PMID: 27226565, PMID: 25801029, PMID: 24003153). Further, many recent studies have highlighted that the details of how GPCRs are regulated, and how that affects their function, diverge considerably between different receptors, even though the gross signaling characteristics are nearly identical. Therefore, it is becoming increasingly clear that we cannot apply our understanding of one GPCR so broadly as to expect that all GPCRs are regulated in the same manner.

      We also appreciate the reviewer’s interest in the question of whether and how ECE2 regulates location-specific signaling, and we agree that it will be very exciting to study. This is particularly important since ECE2 is not ubiquitously expressed in every cell type in the brain and thus cells with no/low ECE2 expression should exhibit different profiles for recycling or location-based signaling by DynA and DynB compared to cells expressing moderate/high levels of ECE2.

      Nevertheless, we disagree with the reviewer’s assumption that there is an obvious correlation. ECE2 sensitivity for opioid peptides was estimated using purified peptides and enzymes, and there is no evidence that the selectivity persists in vivo. In fact, most of the previous studies simply measured the sensitivity to overexpressed ECE2. Even within these constraints, the correlation is not obvious or direct. For example, we have found that BAM22 and BAM18, two peptides that activate opioid receptors, show much lower recycling of KOR than DynB (Gupta, Gomes and Devi, INRC 2019, manuscript in preparation) even though all three are ECE2 substrates (PMID: 12560336). Therefore, it is unlikely that ECE substrate sensitivity is the only difference between these peptides.

      We will be happy to provide some insight on the question of ECE sensitivity and discuss possibilities, but we feel that a thorough characterization of how ECE regulates location-specific signaling, while interesting, is outside the scope of our study that demonstrates a physiological difference between two different endogenous opioids in neurons.

      Most importantly, we respectfully feel that following up and demonstrating a logical conclusion is a strength, and should not be viewed as a negative. Clearly differentiating and establishing predicted outcomes is a critical part of advancing biology. Acknowledging and supporting this is especially important in these times where there is a clear effort and an opportunity to make academic publishing open and fair.

      2) Similarly, the differences in ECE2 sensitivity can also explain the Nb39 results, with KOR activated by the ligand that is not hydrolysable (DynA) being able to remain in the active state (and signal) for longer than when activated with the hydrolyzable ligand (DynB).

      As described in the response to #1, we agree that it is possible that the trafficking and signaling differences we see could correlate with ECE2 substrate sensitivity. Again, we feel that the focus of the manuscript is on signaling differences between endogenous opioids, and not on how ECE inhibition regulates location-specific signaling.

      3) A simple experiment to address this obvious connection is to use an ECE2 inhibitor. One would expect that in the presence of this inhibitor DynB-activated KOR is retained intracellularly and remains active for longer.

      We agree that ECE inhibitors are important tools to manipulate recycling. As mentioned above, we can provide some insight towards the correlation of ECE sensitivity and trafficking and discuss possibilities, but an in-depth characterization of how ECE proteases regulate GPCR location-specific signaling is not the focus of our study.

      4) The authors state "this is the first example of different physiological agonists driving spatial localization and trafficking of a GPCR". In light of the above comment, previous work from Bunnett et al. has shown how peptides with different endocytic enzyme sensitivity can indeed localize GPCRs (e.g., somatostatin receptor) in different compartments and elicit distinct signals (Padilla et al J Cell Biol 2007; Roosterman et al PNAS 2007; Zhao et al JBC 2013 to name a few).

      We were quite taken aback by this comment. We take previously published work very seriously, and we try to be as fair as possible when we describe them. We will be happy to modify the sentence to match the current literature.

      We carefully searched through the papers the reviewer pointed out for an example where two physiological agonists drive different spatial localization and signaling of the same receptor. But we could not find one. Padilla et al., 2007, show that the recycling of CLR, activated by the ECE1-sensitive CGRP, is sensitive to ECE inhibition, but that the recycling of angiotensin receptor or bradykinin receptor, whose ligands are not sensitive to ECE, is not. Similarly, Roosterman et al., 2007, focus on how NK1 receptor recycling is sensitive to ECE1 inhibition. To the best of our knowledge, neither paper shows that spatial localization or location-biased signaling of a given GPCR is regulated differentially by two different endogenous agonists.

      The closest experiment we could find is in Fig 2, titled “Agonists induce endocytosis of SSTR2A in myenteric neurons” in Zhao et al JBC 2013. This figure shows that, when cells exposed to SST14 or the pro-peptide SST28 for 1 hour at 4˚C are then followed at 37˚C and fixed, SSTR labeling at the plasma membrane and in the cytoplasm is similar at 30 min but diverges after that. As far as we could determine, receptor recycling, the precise endosomal distribution, and signaling were not tested in this manuscript.

      Therefore, we respectfully submit that the manuscripts the reviewer points to, which describe how the recycling of a receptor that binds an ECE-sensitive peptide is sensitive to ECE inhibition, should not be conflated with our careful analysis of whether different endogenous opioids can drive different spatial localization and signaling fates of the same opioid receptor.

      We would, however, be happy to modify the sentence to state the impact of our work more precisely and to discuss the details on SSTR trafficking in the revised manuscript. If the reviewer would point us to specific examples that show that subcellular localization and spatially restricted signaling of a given GPCR is regulated differentially by two different endogenous agonists, we will be more than happy to include a discussion of that work.

      5) Support for endosomal signalling falls a bit short. For example, if indeed KOR signals from endosomes, the authors should use an inhibitor of receptor internalization and assess Nb39 recruitment and KOR signalling.

      We agree this experiment will support the conclusion, and we will be happy to provide this data.

      Reviewer #2:

      This manuscript demonstrates that two highly similar endogenous opioid agonists can give distinct opioid receptor trafficking and signaling fates. There are two key observations that are novel and intriguing: 1) two opioid peptides that are derived from the same precursor can distinctly modulate Kappa Opioid receptor (KOR) trafficking into two distinct pathways; Dynorphin A causes KOR trafficking to the late endosomes/lysosomes pathway whereas Dynorphin B promotes rapid recycling; 2) Dynorphin A activates Gi proteins on the late endosomes/lysosomes which leads to Gi-mediated cAMP inhibition from these compartments.

      The idea that GPCRs can activate G proteins at the late endosome/lysosomal compartments is fascinating and novel; however, the data presented here do not fully support their model that Dynorphin A activates Gi proteins on the late endosomes/lysosomes.

      We are very happy that the reviewer found our study fascinating and novel. We thank the reviewer for the comments, and we can address them as follows.

      Main questions:

      1) There is a mismatch between the timing of the receptor colocalization experiment (Fig 3B and C, 20 min Dynorphin A/B treatment) and the cAMP assay (Fig 3H, 5 min treatment). There needs to be direct evidence that KOR is localized on the late endosomes/lysosomes at 5 minutes post agonist stimulation, i.e. at the time that cAMP levels are measured. It is important to demonstrate that the sustained signaling inhibition by DynA comes from the late endosomes/lysosomes as opposed to early endosomes. A colocalization experiment with 5 min DynA stimulation followed by a 25 min washout would be necessary to support their model.

      We agree that this is a good point, and we will be happy to perform the experiment suggested. In addition, we can also provide live cell imaging data, where we simultaneously localize the nanobody that recognizes active KOR with a lysosomal marker and KOR, to show that they colocalize after DynA treatment.

      2) What percentage of KORs are proteolytically degraded in the late endosomes/lysosomes at 20 min DynA stimulation?

      At 20 min, although some of the receptors reach the lysosome, it is unlikely that there is significant degradation. This is supported by our blots that show similar levels of KOR expression at 30 minutes, and loss of receptor levels at 2 hours. This is also roughly consistent with previous studies on GPCR degradation. We will include these details in the revised manuscript.

      3) Given that KOR trafficking to the late endosomes and lysosomes is mediated by ubiquitination (as shown here PMID: 18212250), does mutation of these ubiquitination sites (3 lysine residues on KOR C-terminus) block its trafficking and the sustained signaling from the late endosomes/lysosomes?

      The reviewer raises an interesting topic that has been a subject of considerable debate in the GPCR trafficking field. The mutation of the three lysine residues on the KOR C-terminus causes more residual KOR levels after 4 hours of Dyn A, suggesting that degradation/downregulation of KOR is reduced in these mutants, even though internalization is comparable. For some opioid receptors, although ubiquitination might be required for involution and entry into the intralumenal vesicles, lysosomal localization is arguably independent of ubiquitination. Ubiquitination and/or lysine residues that interact with Ub-transferases could also affect downstream signaling, especially in the endosomes, by some GPCRs. Therefore, we feel that interpretation of results from the lysine mutant receptors will not be straightforward. Nevertheless, we appreciate that this is an interesting point, and we will address this in the revised manuscript.

      4) Is there any evidence for Gi protein localization on the late endosome/lysosomes?

      This is another interesting point raised by the reviewer, as the majority of endosomal signaling data rely on Gs-coupled or Gq-coupled receptors. However, Gi-coupled GPCRs, such as the cannabinoid receptor or the related mu opioid receptor can exist in the active conformation in endosomes (e.g., PMID: 18267983, PMID: 29754753), and internalization is required for sustained cAMP inhibition for the Class B S1P receptor (PMID: 24638168). These provide indirect evidence that Gi proteins might be present and active on endosomes.

      Unfortunately, directly testing whether Gi proteins are active on endosomes has been technically challenging, unlike with Gs proteins. The main limitation has been the lack of conformation sensors for Gi proteins. We will be happy to discuss these points in the revised manuscript.

      5) Additional functional readouts would also be helpful to support their model of Gi-mediated inhibition of cAMP response from late endosomes/lysosomes and not the plasma membrane or early endosomes. Perhaps mTOR activation (as authors have suggested in their discussion) could be used as a read out to show differences between DynA and B-mediated signaling?

      We will be happy to test endosome-based mTOR signaling downstream of KOR to see if there is a difference between DynA and B. Since our data already suggest that the main impact might be on cAMP signaling, we will also discuss the implications for cAMP signaling.

      Reviewer #3:

      This is an interesting idea and a creative paper implicating a differential mechanism of intracellular trafficking and subsequent signaling that is triggered by different dynorphins binding to the kappa opioid receptor. However, there are some questions for the authors:

      We thank the reviewer for the comments that the paper is interesting and creative, and for the critique. We are confident that we can fully address them as follows.

      1) My reading is that some dynorphins are extremely rapidly degraded in serum and with these experiments performed in 15% Horse/FCS there is concern that some of the differential results could be explained by differential degradation. One hypothesis could be a differential frequency of receptor activation over time of a fast recycling receptor population. Can the authors convince me that this difference in trafficking and subsequent signaling is an intrinsic property of the peptide and not an exhaustion of peptide (would be DynB) over the 30min assay?

      We agree this is an important point, and we apologize for not specifically addressing this point. For the trafficking experiments, we directly compared results from experiments done with and without protease inhibitors. We saw no difference between the two conditions, possibly because we were using short time points, high enough concentrations, and dialyzed serum. We agree that it will be important to include these data in the revised manuscript. The signaling experiments, which required longer incubations, were performed in the presence of protease inhibitors, consistent with previous studies.

      2) In Fig 2D, 2G and 2J, at what time after peptide addition were these data obtained?

      For measuring individual recycling events (2D and G), cells were treated with agonist for 5 minutes at 37°C. Receptor clustering was visualized using TIRF microscopy, and then a recycling movie was recorded at 10 Hz for 1 minute in TIRF. For 2J, we measured 2 time points, 30 min and 120 min after agonist addition. We apologize for not stating these details in the figure, and will be happy to do so.

      3) In Fig 2F the divergence of internalized receptor only occurs from 20-30 min, which was difficult for me to understand since DynA should result in a loss of surface receptors. What confuses me is that in Fig 2H the initial recycling induced by DynA17 is fast and then slows down, so I am wondering if a second hit is needed, which feeds into my concern about peptide degradation in the media. Since released peptide would be pulsatile, maybe in vivo DynA17 could act like DynB?

      We realize that a better explanation is needed for the recycling experiment performed in 2F. The cells were imaged for a period of 2 minutes to collect baseline SpH fluorescence, which corresponds to the steady-state amount of KOR on the cell surface. After this period, cells were imaged for 15 min after DynA or DynB was added. In this period, because internalization is the predominant factor affecting surface levels, we see a loss in fluorescence as the receptors are internalized and SpH is quenched in the relatively acidic compartments. Because KOR internalization rates are not dramatically different between DynA and B, we do not expect the fluorescence traces to be different. The agonist was then washed out (t = 17 min), and cells were imaged in media containing antagonist. Because there is very little agonist-induced internalization after this point, the fluorescence change depends predominantly on reappearance of receptors via recycling. Therefore, if the main difference between DynA and DynB is in KOR recycling, we expect to see a divergence only in the late points of the trace.

      We thank the reviewer for carefully viewing the traces in 2F and 2H. We understand the interpretation that there might be fast and slow components to DynA induced recycling. While it certainly is possible, we are not comfortable making a strong conclusion on that, based on the sensitivity of the assays used and the variability between cells.

      As mentioned in point #1, however, it is unlikely that this divergence in recycling is due to significant degradation of DynA. Nevertheless, it is an important point to discuss in light of the new data we provide, and we will be happy to explain this in detail.

      4) The assays seem to be done with a single concentration of peptide (1 µM). Do the authors have data showing that lower (or higher) concentrations than 1 µM result in the same trafficking patterns, albeit to a lesser or greater extent? Also, for the cAMP inhibition, what concentration gives maximal inhibition? For a binding affinity of 0.01 nM in the cells and with high expression, the 1 µM concentration seems high.

      We used the 1µM dose based on careful dose-response measurements for cAMP signaling. Part of the dose-response data has been published (PMID: 32393639). We will be happy to provide the extended data, and also provide a dose-response for trafficking. It is possible that the dose is what helps us mitigate potential degradation of the peptides.

      5) In Fig 2H, 100% of receptors appear to be recycled after DynB; however, 25% of kappa receptors colocalize with Rab7 in 3C, so do these Rab7-colocalized receptors recycle?

      It is certainly possible that some receptors from Rab7 endosomes can recycle. Current views are more aligned with overlapping populations of endosomes as labelled by biochemical markers, especially by trafficking components like Rabs. Therefore, our characterization likely describes a spread of receptor distributions across overlapping compartments. Moreover, the recycling of receptors in Fig 2H was quantitated using ELISA over 2 hours after agonist washout. The endosome colocalization in 3C was measured after 20 min of agonist treatment. As the reviewer would agree, it is difficult to directly compare data from these two experiments and draw definite conclusions.

      That said, we certainly did not mean to imply that all of DynB-activated KOR is recycled and that DynA-activated KOR is degraded. Current data on trafficking support a more dynamic and flexible model for receptor sorting, where a fraction of the receptors is recycled while a fraction is degraded from each endosome. Our results are consistent with this model. We feel that, because the receptor populations undergo many rounds of rapid iterative sorting as the endosome matures, a larger fraction is recycled back to the surface in the case of DynB at a steady state, while a larger fraction stays behind in the case of DynA. Importantly, this difference in steady state localization is enough to cause a difference in endosomal receptor activation and cAMP signaling, suggesting that small differences in steady state localization can cause relevant changes in signaling.

      We apologize for not making this important point clearer, and we will be happy to clarify this in the revised manuscript.

      6) Could some of the signaling differences be explained by continued activation of receptors as a consequence of peptide processing in the endocytosed vesicle as opposed to different vesicles? I guess the continued signaling could also direct subsequent trafficking and this could be tested with a membrane permeable antagonist.

      We thank the reviewer for raising this point. As we described in our response to Reviewer #1, peptide processing by ECE proteases could contribute to the differences, but the data suggest that this is not a direct correlation or the main explanation for the differences we observe. We will be happy to provide data to address this aspect.

      7) The impact statement "Co-released dynorphins, which signal similarly from the cell surface, can differentially localize GPCRs to specific subcellular compartments, and cause divergent receptor fates and distinct spatiotemporal patterns of signaling" could be misconstrued. If one of the pathways is dominant and blocks the other, then co-release may only have one signaling outcome. Have any dynorphin mix experiments been conducted? What might be anticipated?

      We agree that the question of whether one peptide is dominant is an interesting one in the context of the paper, and we thank the reviewer for pointing this out. Assay sensitivity has remained a long-standing problem when trying these mixed experiments in the endogenous opioid system. We will be happy to try a dynorphin mix experiment with our state-of-the-art imaging assays. We will also revise the sentence to reduce ambiguity.

      8) It looks like details of the ELISA measurements were missing from the methods section. Were the ELISA measurements done with untagged KOR or SpH-KOR? One might worry about the effects of the N-terminal SpH tag on KOR trafficking, and it would be nice if the fluorescence SpH-KOR data were supported by ELISA for untagged KOR. (At least some of the data is immunostaining of FLAG-KOR, which probably introduces only minimal perturbation)

      We apologize for not including the details of the ELISA experiments. The ELISA experiments were performed essentially as described previously (PMID: 24990314; PMID: 24847082). Briefly, CHO-KOR cells or SpH-KOR cells (2×10⁵) were seeded in complete growth media into each well of a 24-well poly-lysine-coated plate. The following day, cells were washed once in PBS, placed on ice and incubated with a 1:1000 dilution (in PBS containing 1% BSA) of either anti-Flag M1 mouse monoclonal antibody (for CHO-KOR cells) or anti-GFP rabbit polyclonal antibody (for SpH-KOR) for 1 h at 4˚C. Cells were then gently washed twice with PBS and treated without or with 1 mM peptides in either F-12 medium (for CHO-KOR cells) or F-12K (for SpH-KOR) containing protease inhibitor cocktail (Sigma) for 30 min at 37˚C to induce receptor internalization. Cells were then washed and incubated in media without peptides for different time periods (5-120 min). Cells were chilled to 4˚C and briefly fixed with paraformaldehyde for 3 min. Cells were then incubated with a 1:1000 dilution of either anti-mouse or anti-rabbit HRP-coupled secondary antibody. The substrate o-phenylenediamine (5 mg/10 ml in 0.15 M citrate buffer, pH 5, containing 20 µl of H2O2) was added to each well (100 µl), and the reaction was stopped after 10 min by addition of 50 µl of 1 N HCl. Absorbance at 490 nm was measured with a Bio-Rad ELISA reader. We will definitely correct this oversight and include these details in the revised manuscript.

      The reviewer’s concern about the tag is a valid one, and one that we are very careful about. We have used three different tags to label the receptor, all on the N-terminus to reduce potential interference. The ELISA measurements were done using FLAG-tagged and HA-tagged KOR. The trafficking experiments were done with FLAG-tagged and SpH-tagged KOR. The results are consistent between all these experiments, suggesting that the differences we observe are not due to tagging. We will clarify these details in the revised manuscript.

      9) Dynorphin A17 is a very sticky peptide and difficult to wash out. Since we don't have a dose response, it may require only very low doses to achieve full activation for cAMP inhibition. It would be nice to be able to discount this as a potential cause of prolonged activation after washout.

      The reviewer brings up a good point. DynA is less sticky in media or solutions containing 150mM NaCl, but we realize that this is a concern that should be addressed. In our case, we picked the doses we used based on dose-response curves that we have performed for cAMP signaling for these peptides. We realize that it is important to explain the choice of our concentrations better, and we will be happy to do so in the revised manuscript.

    2. Reviewer #3:

      This is an interesting idea and a creative paper implicating a differential mechanism of intracellular trafficking and subsequent signaling that is triggered by different dynorphins binding to the kappa opioid receptor. However, there are some questions for the authors:

      1) My reading is that some dynorphins are extremely rapidly degraded in serum and with these experiments performed in 15% Horse/FCS there is concern that some of the differential results could be explained by differential degradation. One hypothesis could be a differential frequency of receptor activation over time of a fast recycling receptor population. Can the authors convince me that this difference in trafficking and subsequent signaling is an intrinsic property of the peptide and not an exhaustion of peptide (would be DynB) over the 30min assay?

      2) In Fig 2D, 2G and 2J, at what time after peptide addition were these data obtained?

      3) In Fig 2F the divergence of internalized receptor only occurs from 20-30 min, which was difficult for me to understand since DynA should result in a loss of surface receptors. What confuses me is that in Fig 2H the initial recycling induced by DynA17 is fast and then slows down, so I am wondering if a second hit is needed, which feeds into my concern about peptide degradation in the media. Since released peptide would be pulsatile, maybe in vivo DynA17 could act like DynB?

      4) The assays seem to be done with a single concentration of peptide (1 µM). Do the authors have data showing that lower (or higher) concentrations than 1 µM result in the same trafficking patterns, albeit to a lesser or greater extent? Also, for the cAMP inhibition, what concentration gives maximal inhibition? For a binding affinity of 0.01 nM in the cells and with high expression, the 1 µM concentration seems high.

      5) In Fig 2H, 100% of receptors appear to recycle after DynB; however, 25% of kappa receptors colocalize with Rab7 in Fig 3C. Do these Rab7-colocalized receptors recycle?

      6) Could some of the signaling differences be explained by continued activation of receptors as a consequence of peptide processing in the endocytosed vesicle as opposed to different vesicles? I guess the continued signaling could also direct subsequent trafficking and this could be tested with a membrane permeable antagonist.

      7) The impact statement "Co-released dynorphins, which signal similarly from the cell surface, can differentially localize GPCRs to specific subcellular compartments, and cause divergent receptor fates and distinct spatiotemporal patterns of signaling" could be misconstrued. If one of the pathways is dominant and blocks the other, then co-release may only have one signaling outcome. Have any dynorphin mix experiments been conducted? What might be anticipated?

      8) Details of the ELISA measurements appear to be missing from the Methods section. Were the ELISA measurements done with untagged KOR or SpH-KOR? One might worry about the effects of the N-terminal SpH tag on KOR trafficking, and it would be nice if the SpH-KOR fluorescence data were supported by ELISA of untagged KOR. (At least some of the data come from immunostaining of FLAG-KOR, which probably introduces only minimal perturbation.)

      9) Dynorphin A17 is a very sticky peptide and difficult to wash out. Since we don't have a dose response, it may require only very low doses to achieve full activation for cAMP inhibition. It would be good to rule out residual peptide as a source of prolonged activation after washout.

    3. Reviewer #2:

      This manuscript demonstrates that two highly similar endogenous opioid agonists can give distinct opioid receptor trafficking and signaling fates. There are two key observations that are novel and intriguing: 1) two opioid peptides that are derived from the same precursor can distinctly modulate Kappa Opioid receptor (KOR) trafficking into two distinct pathways; Dynorphin A causes KOR trafficking to the late endosomes/lysosomes pathway whereas Dynorphin B promotes rapid recycling; 2) Dynorphin A activates Gi proteins on the late endosomes/lysosomes which leads to Gi-mediated cAMP inhibition from these compartments.

      The idea that GPCRs can activate G proteins at the late endosome/lysosomal compartments is fascinating and novel, however, the data presented here does not fully support their model that Dynorphin A activated Gi proteins on the late endosomes/lysosomes.

      Main questions:

      1) There is a mismatch with the timing of the receptor colocalization experiment (Fig 3B and C, 20 min Dynorphin A/B treatment) and the cAMP assay (Fig 3H, 5 min treatment). There needs to be direct evidence that KOR is localized on the late endosomes/lysosomes at 5 minutes post agonist stimulation, i.e. at the time that cAMP levels are measured. It is important to demonstrate that the sustained signaling inhibition by DynA comes from the late endosomes/lysosomes as opposed to early endosomes. A colocalization experiment with 5 min DynA stimulation followed by a 25 min washout would be necessary to support their model.

      2) What percentage of KORs are proteolytically degraded in the late endosomes/lysosomes at 20 min DynA stimulation?

      3) Given that KOR trafficking to the late endosomes and lysosomes is mediated by ubiquitination (as shown in PMID: 18212250), does mutation of these ubiquitination sites (3 lysine residues on the KOR C-terminus) block its trafficking and the sustained signaling from the late endosomes/lysosomes?

      4) Is there any evidence for Gi protein localization on the late endosome/lysosomes?

      5) Additional functional readouts would also be helpful to support their model of Gi-mediated inhibition of cAMP response from late endosomes/lysosomes and not the plasma membrane or early endosomes. Perhaps mTOR activation (as authors have suggested in their discussion) could be used as a read out to show differences between DynA and B-mediated signaling?

    4. Reviewer #1:

      General assessment:

      In this manuscript the authors have assessed the different endocytic routes of KOR when activated by DynA or DynB. These are nicely conducted experiments that show interesting results; however, the authors completely overlook the connection with their own work highlighting the different degradation mechanisms of these two peptides. As it stands, it does not add to the field, and it lacks a mechanistic explanation that could be explored given the authors’ expertise in these systems.

      Numbered summary of substantive concerns:

      1) The major conclusion of the study is that after endocytosis, DynA preferentially sorts KOR into the degradative pathway, while DynB sorts KOR into the recycling pathway and this has consequences in the duration of the active state of the receptor and its ability to signal. It is surprising that the authors do not investigate the connection between these results and previously published work that shows differences in the degradation of DynB vs DynA within endosomes. Indeed, the authors have previously shown that: i) ECE2 hydrolyzes DynB and not DynA (Mzhavia et al JBC 2003), ii) overexpression of ECE2 increases the rate of mu-opioid receptor recycling upon DynB stimulation (Gupta et al BJP 2015) and iii) inhibition of ECE2 decreases mu-opioid receptor recycling (Gupta et al BJP 2015). Considering this previous work, it is totally expected that the two ligands show distinct post-endocytic trafficking of KOR.

      2) Similarly, the differences in ECE2 sensitivity can also explain the Nb39 results, with KOR activated by the ligand that is not hydrolysable (DynA) being able to remain in the active state (and signal) for longer than when activated with the hydrolyzable ligand (DynB).

      3) A simple experiment to address this obvious connection is to use an ECE2 inhibitor. One would expect that in the presence of this inhibitor DynB-activated KOR is retained intracellularly and remains active for longer.

      4) The authors state that "this is the first example of different physiological agonists driving spatial localization and trafficking of a GPCR". In light of the above comment, however, previous work from Bunnett et al. has shown how peptides with different sensitivities to endocytic enzymes can indeed localize GPCRs (e.g. somatostatin receptor) to different compartments and elicit distinct signals (Padilla et al. J Cell Biol 2007; Roosterman et al. PNAS 2007; Zhao et al. JBC 2013, to name a few).

      5) Support for endosomal signalling falls a bit short. For example, if indeed KOR signals from endosomes, the authors should use an inhibitor of receptor internalization and assess Nb39 recruitment and KOR signalling.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This is an interesting and creative paper implicating a differential mechanism of intracellular trafficking and subsequent signaling that is triggered by different dynorphins binding to the kappa opioid receptor. In principle, if the authors could explain the molecular basis for this phenomenon, the story would be of tremendous impact in the fields of opioid receptor signaling and trafficking. The reviewers noted a number of concerns that would require significant further work and clarification to support the authors' conclusions.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer1

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript is clearly written and the figures are appropriate and informative. Some descriptions of data analyses are a little dense but reflect what appear to be long, hard efforts on the part of the authors to identify and control for possible sources of misinterpretation due to the sensitivity of parameters in their fitness model. The authors' efforts to retest interactions under non-competition conditions allay most of the concerns that I would have. One problem that I could not see explicitly addressed, though, is the potential effect of interactions between methotrexate and the other conditions, and how this is controlled for. Specifically, it could be argued that observing a particular PPI under a specific condition could have more to do with a synthetic effect of treating cells with a drug plus methotrexate. Is this controlled for, and how? I raise this because a chemical genetic screen for fitness showed that methotrexate is particularly promiscuous for drug-drug interactions (Hillenmeyer ME, et al. Science 2008). I tried to think of how this works but couldn't come up with anything immediately. I'd appreciate it if the authors would take a crack at resolving this issue. Otherwise I have no further concerns about the manuscript.

      We thank the reviewer for the kind comments. We agree with the reviewer’s point that methotrexate could be interacting with drugs or other perturbagens, similar to how the chosen nitrogen source, carbon source, or other growth conditions may interact with a drug. However, the methotrexate concentration is held constant across all conditions, as are the other media components such as the nitrogen and carbon sources (with the exception of the raffinose perturbation). Any interactions with methotrexate, or other media components, are undetectable without systematically varying all components for all stressors. Therefore, we use the typical experimental design of measuring molecular variation relative to a reference, holding invariant media components (such as methotrexate, glucose, or vitamins) fixed between conditions. This is a general practice, and we state that every condition contains methotrexate on page 3, line 10.

      The library was grown under mild methotrexate selection in 9 environments for 12-18 generations in serial batch culture, diluting 1:8 every ~3 generations, with a bottleneck population size greater than 2 x 10^9 cells (Table S1).
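      The dilution scheme and the generation counts in this passage are consistent with each other, assuming cultures regrow to roughly the same density before each transfer: recovering from a 1:8 dilution takes log2 8 = 3 doublings, so 12-18 generations correspond to about 4-6 serial transfers.

```latex
g_{\text{per transfer}} = \log_2 8 = 3,
\qquad
n_{\text{transfers}} \approx \frac{12}{3} \text{ to } \frac{18}{3} = 4 \text{ to } 6
```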

      We also list the full details of each environment in Table S1.

      Reviewer #1 (Significance (Required)):

      Liu et al. expand on previous work from the Levy group to explore a massive in vivo protein interactome in the yeast S. cerevisiae. They achieve this by performing screens across 9 growth conditions, which, with replication, results in a total of 44 million measurements. Interpreting their results based on a fitness model for pooled growth under methotrexate selection, they make the key observation that there is a vastly expanded pool of protein-protein interactions (PPIs) that are found under only one or two conditions compared to a more limited set of PPIs that are found under a broad set of conditions (mutable versus immutable interactors). The authors show that this dichotomy suggests some important features of proteins and their PPIs that raise important questions about the functionality and evolution of PPIs. Among these are that mutable PPIs are enriched for cross-compartmental interactions, high disorder, higher rates of evolution, and localization of proteins to chromatin, suggesting roles in gene regulation that are associated with cellular responses to new conditions. At the same time, these interactions are not enriched for changes in abundance. These results contrast with those for immutable PPIs, which seem to form a core background noise, more determined by changes in abundance than by what the authors interpret must be post-translational processes that may drive, for instance, changes in subcellular localization resulting in the appearance of PPIs under specific conditions. The authors are also able to address a couple of key issues about protein interactomes, including the controversial party/date hub hypothesis of Vidal, which they now support based on their results, notably the negative correlation of PPIs with protein abundance for mutable PPIs. Finally, they also address the problem of predicting the upper limit of PPIs in yeast, showing the remarkable results indicating that it may be no more than about 2 times the number of proteins expressed by yeast. Such an upper limit is profoundly important to modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.

      This manuscript is a very important contribution to understanding dynamics of molecular networks in living cells and should be published with high priority.

      Reviewer 2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Report on Liu et al. "A large accessory protein interactome is rewired across environments"

      Liu et al. use an mDHFR-based, pooled barcode sequencing / competitive growth / mild methotrexate selection method to investigate changes in PPI abundance for 1.6 million protein pairs across 9 different growth conditions. Because most PPI screens aim to identify novel PPIs in standard growth conditions, the currently known yeast PPI network may be incomplete. The key concept is to define "immutable" PPIs that are found in all conditions and "mutable" PPIs that are present in only some conditions.

      The assay identified 13764 PPIs across the 9 conditions, using optimized fitness cutoffs. Steady PPIs, i.e. those found across all environments, were identified in membrane compartments and cell division. Processes associated with the chromosome, transcription, protein translation, RNA processing and ribosome regulation were found to change between conditions. Topological analyses reveal that mutable PPIs form modules.

      Interestingly, a correlation between intrinsic disorder and PPI mutability was found; mutable PPIs are postulated to be more conformationally flexible, while at the same time being formed by less abundant proteins.

      I appreciate the trick of using homodimerization as an abundance proxy to predict heterodimer interactions (between proteins that homodimerize). This "mass-action kinetics model" explains the strength of 230 out of 1212 tested heterodimers.

      A validation experiment of the glucose transporter network was performed, and 90 "randomly chosen" PPIs that were present in the SD environment were tested in NaCl (osmotic stress) and raffinose (low glucose) conditions by recording optical density growth trajectories. Hxt5 PPIs stayed similar in the tested conditions, consistent with the current knowledge that Hxt5 is highly expressed in stationary phase and under salt stress. In raffinose, Hxt7, whose mRNA expression was previously reported to increase, lost most PPIs, indicating that other factors might influence Hxt7 PPIs.

      **Points for consideration:**

      *) A clear definition of mutable and immutable is missing, or could not be found (e.g. on page 4, second paragraph).

      We thank the reviewer for pointing this out. We have now added a clearer definition of mutable and immutable on page 4, line 19:

      We partitioned PPIs by the number of environments in which they were identified and defined PPIs at opposite ends of this spectrum as “mutable” PPIs (identified in only 1-3 environments) and “immutable” (identified in 8-9 environments).
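      To make this binning rule concrete, here is a minimal Python sketch that assigns a PPI to a mutability bin from the number of the nine environments in which it was detected. Only the two ends of the spectrum are defined above; the "intermediate" label for 4-7 environments, the placeholder environment names, and the `ppi_envs` input format are hypothetical choices made for this illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def mutability_bin(n_envs: int) -> str:
    """Bin a PPI by the number of the nine environments in which it was detected."""
    if not 1 <= n_envs <= 9:
        raise ValueError("n_envs must be between 1 and 9")
    if n_envs <= 3:
        return "mutable"        # detected in 1-3 environments
    if n_envs >= 8:
        return "immutable"      # detected in 8-9 environments
    return "intermediate"       # hypothetical label for 4-7 environments


def tally_bins(ppi_envs: Dict[Tuple[str, str], Set[str]]) -> Counter:
    """Count PPIs per bin from a mapping of protein pair -> environments where detected."""
    return Counter(mutability_bin(len(envs)) for envs in ppi_envs.values())


envs = [f"env{i}" for i in range(1, 10)]  # placeholder environment names
example = {
    ("A", "B"): {"env1"},           # one environment: mutable
    ("C", "D"): set(envs),          # all nine: immutable
    ("E", "F"): set(envs[:5]),      # five: intermediate
}
print(tally_bins(example))
```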

      *) Approximately half of the PPIs have been identified in one environment. Many of those mutable PPIs were detected in the 16 ℃ condition. Is there an explanation for the predominance of this specific environment? What are these PPIs about?

      The reviewer is correct that ~40% of the PPIs identified in only one environment were found in the 16 ℃ environment. One reason for this could be technical: the positive predictive value (PPV) is the lowest amongst the conditions (16 ℃: 31.6%, mean: 57%, Table SM6). It must be noted, however, that PPVs are calculated using reference data that has generally been collected in standard growth conditions. So, it might be expected that the most divergent environment from standard growth conditions (resulting in the most differences in PPIs) would result in a lower PPV in our study even if the true frequency of false positives was equivalent across environments. We have attempted to be transparent about the quality of the data in each environment by reporting PPVs and other metrics in Table SM6. However, we suspect that the large number of PPIs unique to 16 ℃ is due in part to the fact that it causes the largest changes in the protein interactome, and believe that it should be included, even at the risk of lowering the overall quality of the data. The main reason for this is that this data is likely to contain valuable information about how the cell copes with this stress. For example, we find, but do not highlight in the manuscript, that 16 ℃-specific PPIs contain two major hubs (DID4: 285 PPIs involved in endocytosis and vacuolar trafficking, and DED1: 102 PPIs involved in translation), both of which are reported to be associated with cold adaptation in yeast (Hilliker et al., 2011; Isasa et al., 2015).

      To assess whether the potentially higher false-positive rate in 16 ℃ could be impacting our conclusions related to PPI network organization and features of immutable and mutable PPIs, we repeated these analyses leaving out the 16 ℃ data and found that our main conclusions did not change. This new analysis is now presented in Figure S8 and described on page 5, line 10.

      Finally, we used a pair of more conservative PPI calling procedures that either identified PPIs with a low rate of false positives across all environments (FPR

      We have also added references to other panels in Figure S8 throughout the manuscript, where appropriate.

      *) A 50% overall retest validation rate is fair and comparable to other large-scale approaches. However, what is the actual variation, e.g. between mutable and immutable PPIs or between conditions (e.g. at 16 ℃)?

      We validated 502 PPIs present in the SD environment and an additional 36 PPIs in the NaCl environment. As the reviewer suggests, we do indeed observe differences in the validation rate across mutability bins. This data is reported in Figures 3B and S6B, and we use this information to provide a confidence score for each PPI on page 5, line 4.

      To better estimate how the number of PPIs changes with PPI mutability, we used these optical density assays to model the validation rate as a function of the mean PPiSeq fitness and the number of environments in which a PPI is detected. This accurate model (Spearman's r = 0.98 between predicted and observed, see Methods) provided confidence scores (predicted validation rates) for each PPI (Table S5) and allowed us to adjust the true positive PPI estimate in each mutability bin. Using this more conservative estimate, we still found a preponderance of mutable PPIs (Figure S6E).

      The validation rate in NaCl is similar to SD (39%, 14/36), suggesting that validation rates do not vary excessively across environments. Because validation experiments are time consuming (we performed 6 growth experiments per PPI), performing validations in all environments at the same scale as in SD would be resource intensive. Instead, we report a number of metrics (true positive rate, false positive rate, positive predictive value) in Table SM6 using large positive and random reference sets. We believe these metrics are sufficient for readers to compare the quality of data across environments.

      *) What is the R correlation cutoff for PPIs explained in the mass equilibrium model vs. not explained?

      We do not use an R correlation cutoff to assess if a PPI is explained by the mass-action equilibrium model. We instead rely on ordinary least-squares regression as detailed in the methods on page 68, line 13.

      ...we used ordinary least-squares linear regression in R to fit a model of the geometric mean of the homodimer signals multiplied by a free constant plus a free intercept. Significantly explained heterodimer PPIs were judged by a significant coefficient (FDR 0.05, single-test). These criteria were used to identify PPIs for which protein expression does or does not appear to play as significant a role as other post-translational mechanisms.

      The first criterion identifies a quantitative fit to the model of variation being related. The second criterion is used to filter out PPIs for which the relationship appears to be explained by more than just the homodimer signals. This approach is more stringent, but we believe this is the most appropriate statistical test to assess fit to this linear model.
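      For readers less familiar with this kind of test, the fit described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' R code, and it rests on several assumptions: one heterodimer signal and two homodimer signals per environment, positive homodimer signals (so the geometric mean is defined), a two-sided t-test on the fitted coefficient, and Benjamini-Hochberg as the FDR procedure applied across heterodimers. The second criterion mentioned in the response is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value for Student's t with integer df.

    Uses the closed forms in Abramowitz & Stegun 26.7.3 (odd df) and 26.7.4 (even df).
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    term = 1.0
    if df % 2 == 1:
        # Odd df: A = (2/pi) * (theta + sin*cos*(1 + (2/3)c^2 + (2*4)/(3*5)c^4 + ...)); no sum when df == 1
        total = 0.0 if df == 1 else 1.0
        for k in range(2, df - 1, 2):
            term *= k / (k + 1) * c * c
            total += term
        a = 2.0 / math.pi * (theta + s * c * total)
    else:
        # Even df: A = sin * (1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...)
        total = 1.0
        for k in range(1, df - 2, 2):
            term *= k / (k + 1) * c * c
            total += term
        a = s * total
    # A is P(|T| < |t|); clamp tiny negative values caused by rounding.
    return max(0.0, 1.0 - a)


def fit_mass_action(hetero: Sequence[float], homo_a: Sequence[float],
                    homo_b: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of hetero ~ beta * sqrt(homo_a * homo_b) + intercept across environments.

    Returns (beta, intercept, two-sided p-value for beta). An exact fit (zero
    residuals) would need special handling because the standard error is zero.
    """
    x = [math.sqrt(a * b) for a, b in zip(homo_a, homo_b)]  # geometric mean of homodimer signals
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(hetero) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, hetero))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, hetero))
    df = n - 2  # two fitted parameters: the free constant and the intercept
    se_beta = math.sqrt(rss / df / sxx)
    return beta, intercept, t_two_sided_p(beta / se_beta, df)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest to the smallest p-value
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

      Under these assumptions, a heterodimer would count as explained when its adjusted p-value falls below the 0.05 threshold quoted in the methods. The closed-form t distribution is used only to keep the sketch dependency-free; in R, `lm` and `p.adjust` would supply the same quantities.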

      *) 90 "randomly chosen" PPIs for validation. It needs to be demonstrated that these interactions are a random subset; otherwise they could also be cherry-picked interactions.

      We selected 90 of the 284 glucose transport-related PPIs for validation using the “sample” function in R (replace = FALSE). We have now included text that describes this on page 63, line 3 in the supplementary methods:

      Diploids (PPIs) on each plate were randomly picked using the “sample” function in R (replace = FALSE) from PPIs that meet specific requirements.
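      The same kind of draw can be sketched in Python for readers outside R. This is an analogue, not the authors' script: `random.Random(seed).sample` draws without replacement like R's `sample(x, size, replace = FALSE)`, but the two languages use different random number generators, so the same seed will not reproduce the authors' selection. The PPI identifiers and the seed below are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_without_replacement(pool: Sequence[str], k: int,
                               seed: Optional[int] = None) -> List[str]:
    """Draw k distinct items from pool (analogue of R's sample(pool, k, replace = FALSE))."""
    rng = random.Random(seed)  # dedicated generator so the draw is reproducible
    return rng.sample(list(pool), k)


# Hypothetical identifiers standing in for the 284 glucose transport-related PPIs.
pool = [f"PPI_{i:03d}" for i in range(284)]
chosen = sample_without_replacement(pool, 90, seed=2020)
print(len(chosen), len(set(chosen)))  # 90 90
```

      Recording the seed alongside the selected list is one simple way to make such a draw auditable, which speaks directly to the concern about cherry-picking.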

      *) Figure 4 provides interesting correlations with the goal of revealing properties of mutable and less mutable PPIs. PPIs detected in the PPiSeq screen can partially be correlated to co-expression (4A) as well as co-localization. Does it make sense to correlate co-expression with the number of conditions? Are the expression correlations condition-specific? In this graph, the expression correlation could stem from conditions 1 and 2 while the interaction takes place in conditions 4 and 5, still leading to the same conclusion... Is the picture of the co-expression correlation similar when you simply look at individual environments, as in S4A?

      We use co-expression mutual rank scores from the COXPRESdb v7.3 database (Obayashi et al., 2019). These mutual rank scores are derived from a broad set of 3593 environmental perturbations that are not limited to the environments we tested here. By using this data, we are asking if co-expression in general is correlated with mutability and report that it is in Figure 4A. We thank the reviewer for pointing out that this was not clear and have now added text to clarify that the co-expression analysis is derived from external data on page 6, line 7.

      We first asked whether co-expression is indeed a predictor of PPI mutability and found that it is: co-expression mutual rank (which is inversely proportional to co-expression across thousands of microarray experiments) declined with PPI mutability (Figures 4A and S11) (Obayashi and Kinoshita, 2009; Obayashi et al., 2019).

      The new figure S11 examines how the co-expression mutual rank changes with PPI mutability for PPIs identified in each environment, as the reviewer suggested. For each environment, we find the same general pattern as in Figure 4A (which considers PPIs from all environments).

      *) Figure 4C: Interesting. How interdependent are the various categories?

      It is well known that many of these categories are correlated (e.g. mRNA expression level and protein abundance, and deletion fitness effect and genetic interaction degree). However, we believe it is most valuable to report the correlation of each category with PPI mutability independently in Figures 4C and S12, since similar correlations with related categories provide more confidence in our conclusions.

      *) Figure 4F: When binned by the number of environments in which the PPI was found, the distribution peaks at 6 environments and decreases with higher and lower numbers of environments. The description/explanation in the text clearly says something else.

      We reported on page 7, line 15:

      We next used logistic regression to determine what features may underlie a good or poor fit to the model (Figure S14C) and found that PPI mutability was the best predictor, with more mutable PPIs being less frequently explained (Figure 4F). Unexpectedly, mean protein abundance was the second best predictor, with high abundance predicting a poor fit to the model, particularly for less mutable PPIs (Figure S14D and S14E).

      As the reviewer notes, Figure 4F shows that the percent of heterodimers explained by the model does appear to decrease for PPIs observed in the most environments. We suspect that the reviewer is correct that something more complicated is going on. One possibility is that extraordinarily stable PPIs (stable in all conditions) would have less quantitative variation in protein or PPI abundance across environments. If this is true, it would be statistically difficult to fit the mass action kinetics model for these PPIs (lower signal relative to noise), thereby resulting in the observed dip.

      A second possibility is that multiple correlated factors are associated with contributing positively or negatively to a good fit, and the simplicity of Figure 4F or a Pearson correlation does not capture this interplay. This second possibility is why we used multivariate logistic regression (Figure S14C) to dissect the major contributing factors. In the text quote above, we report that high abundance is anti-correlated with a good fit to the model (S14D, S14E). Figure 4C shows that immutable PPIs tend to be formed from highly abundant proteins. One possible explanation is that highly abundant proteins saturate the binding sites of their binding partners, breaking from the assumptions of mass action kinetics model. We have now changed the word “limit” to “saturate” on page 7, line 22 to make this concept more explicit.

      Taken together, these data suggest that mutable PPIs are subject to more post-translational regulation across environments and that high basal protein abundance may saturate the binding sites of their partners, limiting the ability of gene expression changes to regulate PPIs.

      A third possibility is that the dip is simply due to noise. Given the complexity of the possible explanations and our uncertainty about which is more likely, we chose to leave this description out of the main text and focus on the major finding: that PPIs detected in more environments are generally associated with a better fit to the mass action kinetics model.

      *) Figure 6: I apologize, but for my taste this is not a final figure 6 for this study. Investigation of different environments increases the PPI network in yeast, yes, yet it is very well known that saturation is reached after testing several conditions, different methods and even screening repetitions (sampling). It does not represent an important outcome. Move to the supplement or remove.

      We included Figure 6 to summarize and illustrate the path forward from this study. This is an explicit reference to impactful computational analyses done using earlier generations of data to assess the completeness of single-condition interaction networks (Hart et al., 2006; Sambourg and Thierry-Mieg, 2010). Here, we are extending PPI measurement of millions-scale networks across multiple environments, and are using this figure to extend these concepts to multi-condition screens. We agree that the property of saturation in sampling is well known, but it is surprising that we can quantitatively estimate convergence of this expanded condition-specific PPI set using only 9 conditions. Thus, we agree with Reviewer 1 that these are “remarkable results” and that the “upper limit is profoundly important to modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.” We think this is an important advance of the paper, and this figure is useful to stimulate discussion and guide future work.

      Reviewer #2 (Significance (Required)):

      Liu et al. expand the current PPI network in yeast and offer a substantial dataset of novel PPIs seen only in specific environments. This resource can be used to further investigate the biological meaning of the PPI changes. The dataset is compared to previous DHFR data, providing some quality benchmarking. Mutable interactions are characterized well. Clearly, a next step could be to start some "orthogonal" validation, i.e. beyond yeast growth under methotrexate treatment.

      The reviewer makes a great point that we also discuss on page 9, line 33:

      While we used reconstruction of C-terminal-attached mDHFR fragments as a reporter for PPI abundance, similar massively parallel assays could be constructed with different PCA reporters or tagging configurations to validate our observations and overcome false negatives that are specific to our reporter. Indeed, the recent development of “swap tag” libraries, where new markers can be inserted C- or N-terminal to most genes (Weill et al., 2018; Yofe et al., 2016), in combination with our iSeq double barcoder collection (Liu et al., 2019), makes extension of our approach eminently feasible.

      Reviewer 3

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The manuscript "A large accessory protein interactome is rewired across environments" by Liu et al. scales up a previously-described method (PPiSeq) to test a matrix of ~1.6 million protein pairs of direct protein-protein interactions in each of 9 different growth environments.

      While the study found a small fraction of immutable PPIs that are relatively stable across environments, the vast majority were 'mutable' across environments. Surprisingly, PPIs detected only in one environment made up more than 60% of the map. In addition to a false positive fraction that can yield apparently-mutable interactions, retest experiments demonstrate (not surprisingly) that environment-specificity can sometimes be attributed to false-negatives. The study authors predict that the whole subnetwork within the space tested will contain 11K true interactions.

      Much of the environment-specific rewiring seemed to take place in an 'accessory module', which surrounds the core module made up mostly of immutable PPIs. A number of interesting network clustering and functional enrichment analyses are performed to characterize the network overall and 'mutable' interactions in particular. The study reports other global properties, such as expression level, protein abundance and genetic interaction degree, that differ between mutable and immutable PPIs. One of the interesting findings was evidence that many environmentally mutable PPI changes are regulated post-translationally. Finally, the authors provide a case study of network rewiring related to glucose transport.

      **Major issues**

      -The results section should more prominently describe the dimensions of the matrix screen, both in terms of the set of protein pairs attempted and the set actually screened (I think this was 1741 x 1113 after filtering?). More importantly, the study should acknowledge in the introduction that this was NOT a random sample of protein pairs, but rather focused on pairs for which interaction had been previously observed in the baseline condition. This major bias potentially has a substantial impact on many of the downstream analyses. For example, any gene that was not expressed under the conditions of the original Tarassov et al. study, on which the screening space was based, will not have been tested here. Thus, the study has systematically excluded interactions involving proteins with environment-dependent expression, except where they happened to be expressed in the single Tarassov et al. environment. Heightened connectivity within the 'core module' may result from this bias, and if Tarassov et al. had screened in hydrogen peroxide (H2O2) instead of SD media, perhaps the network would have exhibited a core module in H2O2 decorated by less densely connected accessory modules observed in other environments. The paper should clearly indicate which downstream analyses have special caveats in light of this design bias.

      We have now added text describing the matrix dimensions of our study on page 3, line 3:

      To generate a large PPiSeq library, all strains from the protein interactome (mDHFR-PCA) collection that were found to contain a protein likely to participate in at least one PPI (1742 X 1130 protein pairs; Tarassov et al., 2008) were barcoded in duplicate using the double barcoder iSeq collection (Liu et al., 2019), and mated together in a single pool (Figure 1A). Double barcode sequencing revealed that the PPiSeq library contained 1.79 million protein pairs and 6.05 million double barcodes (92.3% and 78.1% of the theoretical maximum, respectively; 1741 X 1113 protein pairs), with each protein pair represented by an average of 3.4 unique double barcodes (Figure S1).

      We agree with the reviewer that selecting proteins from a previously identified set can introduce bias into our conclusions. Our research question focused on how PPIs change across environments, so we chose to maximize our power to detect PPI changes by selecting a set of protein pairs that is enriched for PPIs. We have now added the potential caveats of this choice to the Discussion on page 9, line 4:

      Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. We sampled nine diverse environments and found approximately 3-fold more interactions than in a single environment. However, the discovery of new PPIs began to saturate, indicating that most condition-specific PPIs can be captured in a limited number of conditions. Testing in many more conditions and with PPI assays orthogonal to PPiSeq will undoubtedly identify new PPIs; however, a more important outcome could be the identification of coordinated network changes across conditions. Using a test set of ~1.6 million (of ~18 million) protein pairs across nine environments, we find that specific parts of the protein interactome are relatively stable (core modules) while others frequently change across environments (accessory modules). However, two important caveats of our study must be recognized before extrapolating these results to the entire protein interactome across all environment space. First, we tested for interactions between a biased set of proteins that have previously been found to participate in at least one PPI as measured by mDHFR-PCA under standard growth conditions (Tarassov et al., 2008). Thus, proteins that are not expressed under standard growth conditions are excluded from our study, as are PPIs that are not detectable by mDHFR-PCA or PPiSeq. It is possible that a comprehensive screen using multiple orthogonal PPI assays would alter our observations related to the relative dynamics of different regions of the protein interactome and the features of mutable and immutable PPIs. Second, we tested a limited number of environmental perturbations under similar growth conditions (batch liquid growth). It is possible that more extreme environmental shifts (e.g. growth as a colony, anaerobic growth, pseudohyphal growth) would introduce new accessory modules or alter the mutability of the PPIs we detect. Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2007; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).

      -Related to the previous issue, a quick look at the proteins tested (if I understood them correctly) showed that they were enriched for genes encoding the elongator holoenzyme complex, DNA-directed RNA polymerase I complex, membrane docking and actin binding proteins, among other functional enrichments. Genes related to DNA damage (endonuclease activity and transposition) were depleted. It was unclear whether the functional enrichment analyses described in the paper reported enrichments relative to what would be expected given the bias inherent to the tested space.

      We performed two functional enrichment analyses in this study: network density within Gene Ontology terms (related to Figure 2) and Gene Ontology enrichment of network communities (related to Figure 3). For both analyses, we made comparisons against the proteins included in the PPiSeq library. This is described in the Supplementary Materials on page 63, line 35:

      To estimate GO term enrichment in our PPI network, we constructed 1000 random networks by replacing each bait or prey protein that was involved in a PPI with a randomly chosen protein from all proteins in our screen. This randomization preserves the degree distribution of the network.

      And on page 66, line 38:

      The set of proteins used for enrichment comparison are proteins that are involved in at least one PPI as determined by PPiSeq.
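      The randomization quoted above (page 63) can be illustrated with a short sketch. It assumes that "replacing each bait or prey protein with a randomly chosen protein" means a one-to-one relabeling of the proteins in the network using proteins drawn without replacement from the screened set. A one-to-one relabeling is what preserves every node's degree. The function names, the unordered-pair treatment of PPIs, and the one-sided empirical p-value are illustrative assumptions, not the authors' code.

```python
import random
from collections import Counter


def within_term_density(edges, term_members):
    """Fraction of possible pairs among a GO term's members that are observed PPIs."""
    members = set(term_members)
    n = len(members)
    if n < 2:
        return 0.0
    # Assumption: PPIs are treated as unordered pairs; self-interactions are ignored.
    pairs = {frozenset(e) for e in edges if e[0] != e[1]}
    inside = sum(1 for pair in pairs if pair <= members)
    return inside / (n * (n - 1) / 2)


def relabel_network(edges, screen_proteins, rng):
    """Map every protein in a PPI to a distinct random protein from the screen.

    The mapping is one-to-one, so each node keeps its degree in the random network.
    """
    involved = sorted({p for e in edges for p in e})
    replacements = rng.sample(sorted(screen_proteins), len(involved))
    mapping = dict(zip(involved, replacements))
    return [(mapping[a], mapping[b]) for a, b in edges]


def degree_sequence(edges):
    """Sorted list of node degrees (used to check that relabeling preserves them)."""
    return sorted(Counter(p for e in edges for p in e).values())


def go_density_pvalue(edges, term_members, screen_proteins, n_random=1000, seed=0):
    """Observed within-term density and a one-sided empirical p-value from random networks."""
    rng = random.Random(seed)
    observed = within_term_density(edges, term_members)
    exceed = 0
    for _ in range(n_random):
        null_edges = relabel_network(edges, screen_proteins, rng)
        if within_term_density(null_edges, term_members) >= observed:
            exceed += 1
    # Add-one correction so that the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_random + 1)
```

      Restricting the replacement pool to proteins in the screen, rather than the whole proteome, builds the tested-space bias raised by the reviewer into the null model.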

      -Re: data quality. To the study's great credit, they incorporated positive and random reference sets (PRS and RRS) into the screen. However, the results from this were concerning: Table SM6 shows that assay stringency was set such that between 1 and 3 out of 67 RRS pairs were detected. This specificity would be fine for an assay intended to retest or validate previous hits, where the prior probability of a true interaction is high. In large-scale screening, however, the prior probability of true interactions that are detectable by PCA is much lower, and a higher specificity is needed to avoid being overwhelmed by false positives. Consider this back-of-the-envelope calculation: let's say that the prior probability of true interaction is 1%, as the authors suggest (pg 49, section 6.5). If PCA can optimistically detect 30% of these pairs, then the number of true interactions we might expect to see in an RRS of size 67 is 1% * 30% * 67 = 0.2. This suggests that a stringency allowing 1 hit in the RRS will yield 80% [ (1 - 0.2) / 1 ] false positives, and a stringency allowing 3 hits in the RRS will yield 93% [ (3 - 0.2) / 3 ] false positives. How do the authors reconcile these back-of-the-envelope calculations from their PRS and RRS results with their estimates of precision?
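      The reviewer's back-of-the-envelope arithmetic can be reproduced with a short sketch. It uses only the inputs stated in the comment: a 1% prior, 30% detectability, 67 RRS pairs, and 1 or 3 RRS hits. The function name is hypothetical.

```python
def rrs_false_positive_fraction(rrs_size, prior, detectability, rrs_hits):
    """Reviewer's estimate of the false-positive fraction among random-reference-set hits.

    expected_true: RRS pairs expected to be true interactions that the assay can detect.
    Returns (expected_true, fraction of the observed RRS hits that would be false positives).
    """
    expected_true = rrs_size * prior * detectability
    false_fraction = max(rrs_hits - expected_true, 0.0) / rrs_hits
    return expected_true, false_fraction


for hits in (1, 3):
    expected, frac = rrs_false_positive_fraction(67, prior=0.01, detectability=0.30, rrs_hits=hits)
    print(f"{hits} RRS hit(s): ~{expected:.1f} expected true, ~{frac:.0%} false positives")
```

      With only 67 pairs, each additional RRS hit moves the estimate by a large step. This small-sample sensitivity is the concern the authors address in their response.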

      We thank the reviewer for raising this issue. We included positive and random reference sets (PRS: 70 protein pairs; RRS: 67 protein pairs) to benchmark our PPI calling (Yu et al., 2008). The PRS lists PPIs that have been validated by multiple independent studies and therefore likely represents true PPIs that are present in some subset of the environments we tested. For the PRS, we found a detection rate comparable to other studies (PPiSeq in SD: 28%; Y2H and yellow fluorescent protein-PCA: ~20%) (Yu et al., 2008). The RRS, developed ten years ago, consists of randomly chosen protein pairs for which there was no evidence of a PPI in the literature at the time (mostly in standard growth conditions). Given the relatively high rate of false negatives in PPI assays, this set may in fact contain some true PPIs that have yet to be discovered. Looking across all 9 environments, we detected PPIs for four RRS protein pairs. Three of these (Grs1_Pet10, Rck2_Csh1, and YDR492W_Rpd3) were detected in multiple environments (9, 7, and 3, respectively), suggesting that their detection was not a statistical or experimental artifact of our bar-seq assay (see the table below, derived from Table S4). The remaining RRS pair was detected only in SD (standard growth conditions) but with a relatively high fitness (0.35), again suggesting that its detection was not a statistical or experimental artifact. We acknowledge that these may indeed be false positives caused by erroneous interactions of chimeric DHFR-tagged versions of these proteins. However, the small size of the RRS, combined with the possibility that some of these protein pairs are true PPIs, did not give us confidence that this rate (4 of 67) is representative of our true false positive rate.

      To determine a false positive rate that is less subject to the biases that come from sampling small numbers, we instead generated 50 new, larger random reference sets, each made by sampling ~60,000 protein pairs without a reported PPI in BioGRID. Using these new reference sets, we found that the putative false positive rate of our assay is generally lower than 0.3% across conditions for each of the 50 reference sets. We therefore used this more statistically robust measure of the false positive rate to estimate positive predictive values (PPV = 62%, TPR = 41% in SD). We detail these statistical methods in Section 6 of the supplementary methods and report all statistical metrics in Table SM6.

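      | PPI | No. of environments | SD | H2O2 | Hydroxyurea | Doxorubicin | Forskolin | Raffinose | NaCl | 16℃ | FK506 |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Rck2_Csh1 | 7 | 0.35 | 0.35 | 0 | 0.20 | 0.54 | 0.74 | 0 | 0.17 | 0.59 |
      | Grs1_Pet10 | 9 | 0.44 | 0.39 | 0.34 | 0.25 | 0.65 | 1.19 | 0.2 | 0.16 | 0.95 |
      | YDR492W_Rpd3 | 3 | 0 | 0.18 | 0 | 0 | 0 | 0 | 0 | 0.17 | 0.61 |
      | Mrps35_Bub3 | 1 | 0.35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
      | Positive_control | 9 | 1 | 0.8 | 0.73 | 0.62 | 1.4 | 2.44 | 0.4 | 0.28 | 1.8 |

      Table. Mean fitness of each protein pair in each environment. "No. of environments" is the number of environments in which the PPI was detected.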
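      The disagreement between the reviewer's estimate and the authors' precision comes down to which false positive rate is used. Bayes' rule makes this dependence explicit. The sketch below is a generic calculation, not necessarily how Section 6 of the supplementary methods computes PPV. It reuses the 41% TPR and the 0.3% FPR upper bound from the response above, and it assumes the reviewer's 1% prior. The comparison FPR of 1/67 corresponds to a single RRS hit.

```python
def positive_predictive_value(tpr, fpr, prior):
    """PPV = P(true PPI | called) by Bayes' rule, when a fraction `prior` of tested pairs truly interact."""
    true_calls = tpr * prior
    false_calls = fpr * (1.0 - prior)
    total = true_calls + false_calls
    return true_calls / total if total > 0 else 0.0


# FPR upper bound from the large random reference sets vs. one hit among the 67 RRS pairs.
for label, fpr in (("large random sets", 0.003), ("1 of 67 RRS pairs", 1 / 67)):
    ppv = positive_predictive_value(tpr=0.41, fpr=fpr, prior=0.01)
    print(f"{label}: FPR={fpr:.4f}, PPV={ppv:.2f}")
```

      Under these assumptions, the 0.3% FPR gives a PPV of roughly 0.58, while an FPR of 1/67 gives roughly 0.22. At a low prior, small absolute differences in FPR therefore dominate the estimated precision.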

      -Methods for estimating precision and recall were not sufficiently well described to assess. Precision vs recall plots would be helpful to better understand this tradeoff as score thresholds were evaluated.

      We describe our approach to calling PPIs in detail in section 6.6 of the supplementary methods, including Table SM6 and Figures SM3, SM4, SM6, and now Figure SM5. We identified positive PPIs using a dynamic threshold that considers the mean fitness and p-value in each environment. For each dynamic threshold, we estimated precision and recall based on the reference sets (described in section 6.5 of the supplementary methods). We then chose the threshold with the maximal Matthews correlation coefficient (MCC) to obtain the best balance between precision and recall. We have now added a plot (Figure SM5) that shows the precision and recall for the chosen dynamic threshold in each environment.
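      The selection step described above (scoring candidate thresholds against reference sets and keeping the one with maximal MCC) can be sketched as a grid search. The exact form of the authors' "dynamic threshold" is given in section 6.6. Here it is modeled, as an assumption, as a pair of cut-offs: a minimum mean fitness and a maximum p-value. The function names and grid values are hypothetical.

```python
import math
from itertools import product


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def choose_threshold(reference_pairs, fitness_cuts, pvalue_cuts):
    """Grid-search (min fitness, max p-value) cut-offs on labelled reference pairs.

    reference_pairs: list of (mean_fitness, p_value, is_positive) tuples, where
    is_positive is True for positive-reference pairs and False for random-reference pairs.
    Returns (mcc, fitness_cut, pvalue_cut, precision, recall) for the best cut-off.
    """
    best = None
    for f_cut, p_cut in product(fitness_cuts, pvalue_cuts):
        tp = fp = fn = tn = 0
        for fitness, p_value, positive in reference_pairs:
            called = fitness >= f_cut and p_value <= p_cut
            if called and positive:
                tp += 1
            elif called:
                fp += 1
            elif positive:
                fn += 1
            else:
                tn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = mcc(tp, fp, fn, tn)
        # Strict ">" keeps the first cut-off encountered among ties.
        if best is None or score > best[0]:
            best = (score, f_cut, p_cut, precision, recall)
    return best
```

      Recording precision and recall for every grid point, rather than only for the winner, would give the precision-recall curve the reviewer asked for.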

      -Within the tested space, the Tarassov et al map and the current map could each be compared against a common 'bronze standard' (e.g. literature curated interactions), at least for the SD map, to have an idea about how the quality of the current map compares to that of the previous PCA map. Each could also be compared with the most recent large-scale Y2H study (Yu et al).

      We thank the reviewer for this suggestion. We have now added a figure panel (Figure S4) that compares PPiSeq in SD (2 replicates) to mDHFR PCA (Tarassov et al., 2008), Y2H (Yu et al., 2008), and our newly constructed ‘bronze standard’ high-confidence positive reference set (PRS, supplementary method section 6.4).

      • Experimental validation of the network was done by conventional PCA. However, it should be noted that this is a form of technical replication of the DHFR-based PCA assay, and not a truly independent validation. Other large-scale yeast interaction studies (e.g., Yu et al, Science 2008) have assessed a random subset of observed PPIs using an orthogonal approach, calibrated using PRS and RRS sets examined via the same orthogonal method, from which overall performance of the dataset could be determined.

      We appreciate the reviewer’s perspective, since orthogonal validation experiments have been a critical tool for establishing assay performance since early Y2H work. We know from careful previous work that modern orthogonal assays have a low cross-validation rate (Yu et al., 2008) and that they tend to be enriched for PPIs in different cellular compartments (Jensen and Bork, 2008). This indicates that high false negative rates are the likely explanation. High false negative rates have been confirmed here and elsewhere using positive reference sets (e.g. Y2H 80%, PCA 80%, PPiSeq 74% using the PRS in (Yu et al., 2008)). Therefore, we expect that PPiSeq, like other assays, will have a low rate of validation by an orthogonal assay, although we would not know whether this rate is 10%, 30% or somewhere in between without performing the work. However, the exact number (whether 10% or 30%) has no practical impact on the main conclusions of this study, which focus on network dynamics rather than network enumeration. Nor does that number speak to the confidence in our PPI calls, since a lower number may simply reflect less overlap between the sets of PPIs that are callable by PPiSeq and by another assay. Our method uses bar-seq to extend an established mDHFR-PCA assay (Tarassov et al., 2008). The validations we performed were aimed at confirming that our sequencing, barcode counting, fitness estimation, and PPI calling protocols did not introduce excessive noise relative to mDHFR-PCA, which would result in many PPI miscalls. Consistent with this, we find a high rate of validation by lower-throughput PCA (50-90%, Figure 3B). Finally, we include independent tests of the quality of our data by comparing it to positive and random reference sets from literature-curated data. We find that our assay performs extremely well (PPV > 61%, TPR > 41%) relative to other high-throughput assays.

      -The Venn diagram in Figure 1G was not very informative for assessing data quality. It looks like there is relatively little overlap between PPIs identified in standard conditions (SD media) in the current study and those of the previous study, which used a very similar method. Is there any way to know how much of this disagreement can be attributed to each screen being sub-saturating (e.g. by comparing replicate screens) and what fraction to systematic assay or environment differences?

      We have now added a figure panel (Figure S4) that compares PPiSeq in SD (2 replicates) to mDHFR-PCA (Tarassov et al., 2008), Y2H (Yu et al., 2008), and our newly constructed ‘bronze standard’ high-confidence positive reference set (PRS, supplementary methods section 6.4). We find that the SD replicates have an overlap coefficient of 79% with each other, ~45% with mDHFR-PCA, ~45% with the ‘bronze standard’ PRS, and ~13% with Y2H. Overlap coefficients between the SD replicates and mDHFR-PCA are much higher than those found between orthogonal methods (Yu et al., 2008), indicating that these two assays identify a similar set of PPIs. We note that PPiSeq and mDHFR-PCA screen for PPIs under different growth conditions (batch liquid growth vs. colonies on agar), so some fraction of the disagreement is due to environmental differences. PPIs that overlap between the two PPiSeq SD replicates are more likely to be found by mDHFR-PCA, in the PRS, and by Y2H, indicating that PPIs identified in a single SD replicate are more likely to be false positives. However, we do find overlaps (at a lower rate) between PPIs identified in only one SD replicate and other methods, suggesting that a single PPiSeq replicate does not find all discoverable PPIs.
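      The overlap coefficient quoted above is assumed to be the standard Szymkiewicz-Simpson coefficient, |A ∩ B| / min(|A|, |B|). Unlike the Jaccard index, it is not penalized when one PPI set is much larger than the other. This sketch also assumes that a PPI is identified by its unordered protein pair. The protein names in the test are placeholders.

```python
def as_ppi_set(pairs):
    """Treat each PPI as an unordered protein pair (an assumption for this sketch)."""
    return {frozenset(p) for p in pairs}


def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap coefficient: |A & B| / min(|A|, |B|)."""
    a, b = as_ppi_set(a), as_ppi_set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

      Because the denominator is the smaller set, a small, high-precision set (such as a curated PRS) can reach a high coefficient against a large screen. The choice of metric therefore matters when comparing the 79%, ~45% and ~13% figures.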

      -In Figure S5C, the environment-specificity rate of PPIs might be inflated due to the fact that authors only test for the absence of SD hits in other conditions, and the SD condition is the only condition that has been sampled twice during the screening. What would be the environment-specific verification rate if sample hits from each environment were tested in all environments? This seems important, as robustly detecting environment-specific PPIs is one of the key points of the study.

      We use PPIs found in the SD environment to determine environment-specificity because this provides the most conservative (highest) estimate of the number of PPIs found in other environments that were not detectable by our bar-seq assay. To identify PPIs in the SD environment, we pooled fitness estimates across the two replicates (~4 fitness estimates per replicate, ~8 total). The larger number of replicates reduces the rate of false positives (an erroneous fitness estimate has less impact on a PPI call), so we are more confident that PPIs identified in SD are true positives. False positives in one environment (but not in others) are likely to inflate the environment-specificity rate erroneously. Choosing the environment with the lowest rate of false positives (SD) should therefore give the lowest environment-specificity rate, that is, the highest estimate of PPIs found in other environments that were not detectable by our bar-seq assay.

      **Minor issues**

      -Re: "An interaction between the proteins reconstitutes mDHFR, providing resistance to the drug methotrexate and a growth advantage that is proportional to the PPI abundance" (pg 2). It may be more accurate to say "monotonically related" than "proportional" here. Fig 2 from the cited Freschi et al. reference does suggest linearity with colony size over a wide range of inferred complex abundances, but non-linearity at low complex abundance. Also note that Freschi measured colony area, which is linear with neither exponential growth rate nor cell count.

      We agree with the reviewer and have changed “proportional” to “monotonically related” on page 2, line 41.

      -Re: "Using putatively positive and negative reference sets, we empirically determined a statistical threshold for each environment with the best balance of precision and recall (positive predictive value (PPV) > 61% in SD media, Methods, section 6)." (pg 3). Should state the recall at this PPV.

      We agree with the reviewer and have added the recall (41%) to the main text (page 3, line 26):

      Using putatively positive and negative reference sets, we empirically determined a statistical threshold for each environment with the best balance of precision and recall (positive predictive value (PPV) > 61% and true positive rate > 41% in SD media, Methods, section 6).

      -Authors could discuss the extent to which related methods (e.g. PMID: 28650476, PMID: 27107012, PMID: 29165646, PMID: 30217970) would be potentially suitable for screening in different environments.

      We have now added to the introduction (page 2, line 2) a reference to a barcode-based Y2H study that examined interactions between yeast proteins:

      Yet, little is known about how PPI networks reorganize on a global scale or what drives these changes. One challenge is that commonly used high-throughput PPI screening technologies are geared toward PPI identification (Gavin et al., 2002; Ito et al., 2001; Tarassov et al., 2008; Uetz et al., 2000; Yu et al., 2008; Yachie et al., 2016), rather than the quantitative analysis of relative PPI abundance that is necessary to determine whether changes in the PPI network are occurring. The murine dihydrofolate reductase (mDHFR)-based protein-fragment complementation assay (PCA) provides a viable path to characterize PPI abundance changes because it is a sensitive test for PPIs in the native cellular context and at native protein expression levels (Freschi et al., 2013; Remy and Michnick, 1999; Tarassov et al., 2008).

      We have excluded the references to the other barcode-based Y2H studies that the reviewer mentions because they test heterologous proteins within yeast, and the effect of perturbations to yeast on these proteins would be difficult to interpret in the context of our questions. The yeast protein Y2H study, although a wonderful approach and paper, would also not be an appropriate method for examining how PPI networks change across environments. Its protein fusions are not expressed under their endogenous promoters and, to be detected, must in many cases be transported to a non-native compartment (the cell nucleus). Rather than explicitly discuss the caveats of this particular approach, we have instead chosen to discuss why we use PCA.

      • The term "mutable" is certainly appropriate according to the dictionary definition of changeable. The authors may wish to consider, though, that in a molecular biology context the term evokes changeability by mutation (a very interesting but distinct topic). Maybe another term (environment-dependent interactions, or ePPIs?) would be clearer. Of course, this is the authors' call.

      We thank the reviewer for this suggestion, and have admittedly struggled with the terminology. For clarity of presentation, we strived to have a single word that describes the property of a PPI that is at the core of this manuscript -- how frequently a PPI is found across environments. However, the most descriptive words come with preloaded meanings in PPI research (e.g. transient, stable, dynamic), as does “mutable” with another research field. We are, quite frankly, open to suggestions from the reviewers or editors for a more appropriate word that does not raise similar objections.

      -Some discussion is warranted about the phenomenon that a PPI that is unchanged in abundance could appear to change because of statistical significance thresholds that differ between screens. This would be a difficult question for any such study, and I don't think the authors need to solve it, but just to discuss.

      We agree with the reviewer that significance thresholds could be impacting our interpretations and discuss this idea at length on page 4, line 23 of the Results. This section has been modified to include an additional analysis (excluding 16 ℃ data) in response to another reviewer’s comment:

      Immutable PPIs were likely to have been previously reported by colony-based mDHFR-PCA or other methods, while the PPIs found in the fewest environments were not. One possible explanation for this observation is that previous PPI assays, which largely tested in standard laboratory growth conditions and variations thereof, are biased toward identifying the least mutable PPIs. That is, since immutable PPIs are found in nearly all environments, they are more readily observed in just one. However, another possible explanation is that, in our assay, mutable PPIs are more likely to be false positives in the environment(s) in which they are identified or false negatives in environments in which they are not identified. To investigate this second possibility, we first asked whether PPIs present in very few environments have lower fitnesses, as this might indicate that they are closer to our limit of detection. We found no such pattern: mean fitnesses were roughly consistent across PPIs found in 1 to 6 conditions, although they were elevated in PPIs found in 7-9 conditions (Figure S6A). To directly test the false-positive rate stemming from pooled growth and barcode sequencing, we validated randomly selected PPIs within each mutability bin by comparing their optical density growth trajectories against controls (Figure 3B). We found that mutable PPIs did indeed have lower validation rates in the environment in which they were identified, yet putative false positives were limited to ~50% and, within a bin, did not differ between PPIs that have been previously identified and those that have been newly discovered by our assay (Figure S6B). We also note that mutable PPIs might be more sensitive to environmental differences between our large pooled PPiSeq assays and clonal 96-well validation assays, suggesting that differences in validation rates might be overstated. To test the false-negative rate, we assayed PPIs identified only in SD by PPiSeq across all other environments by optical density growth and found that PPIs can be assigned to additional environments (Figure S6C). However, the number of additional environments in which a PPI was detected was generally low (2.5 on average), and the interaction signal in other environments was generally weaker than in SD (Figure S6D). To better estimate how the number of PPIs changes with PPI mutability, we used these optical density assays to model the validation rate as a function of the mean PPiSeq fitness and the number of environments in which a PPI is detected. This accurate model (Spearman's r = 0.98 between predicted and observed, see Methods) provided confidence scores (predicted validation rates) for each PPI (Table S5) and allowed us to adjust the true positive PPI estimate in each mutability bin. Using this more conservative estimate, we still found a preponderance of mutable PPIs (Figure S6E). Finally, we used a pair of more conservative PPI calling procedures that either identified PPIs with a low rate of false positives across all environments (FPR

      We later examine major conclusions of our study using more conservative calling procedures, and find that they are consistent. On page 6, line 14:

      Both the co-expression and co-localization patterns were also apparent in our higher confidence PPI sets (Figures S7B, S7C, S8B, and S8C), indicating that they are not caused by different false positive rates between the mutability bins.

      And on page 6, line 19:

      We binned proteins by their PPI degree, and, within each bin, determined the correlation between the mutability score and another gene feature (Figure 4C and S12A, Table S8) (Costanzo et al., 2016; Finn et al., 2014; Gavin et al., 2006; Holstege et al., 1998; Krogan et al., 2006; Levy and Siegal, 2008; Myers et al., 2006; Newman et al., 2006; Östlund et al., 2010; Rice et al., 2000; Stark et al., 2011; Wapinski et al., 2007; Ward et al., 2004; Yang, 2007; Yu et al., 2008). These correlations were also calculated using our higher confidence PPI sets, confirming results from the full data set (Figures S7D, S7E, S8D, and S8E). We found that mutable hubs (> 15 PPIs) have more genetic interactions, in agreement with predictions from co-expression data (Bertin et al., 2007; Han et al., 2004), and that their deletion tends to cause larger fitness defects.
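      The degree-stratified analysis quoted above can be sketched as follows. Binning by PPI degree first keeps the correlation between mutability and a gene feature from being driven simply by how many partners a protein has. The quoted text does not name the correlation statistic, so Spearman's rank correlation is an assumption here, as are the bin edges and the tuple layout. The cut-off at 16 mirrors the "> 15 PPIs" hub definition.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values assigned the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx and vy else float("nan")


def correlation_by_degree_bin(proteins, bin_edges, min_size=3):
    """Correlation between mutability and a gene feature within each PPI-degree bin.

    proteins: sequence of (ppi_degree, mutability_score, feature_value) tuples.
    bin_edges: ascending cut-offs; a bin covers bin_edges[i] <= degree < bin_edges[i + 1].
    Bins with fewer than min_size proteins are skipped.
    """
    results = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        members = [(m, f) for d, m, f in proteins if lo <= d < hi]
        if len(members) >= min_size:
            mut, feat = zip(*members)
            results[(lo, hi)] = spearman(mut, feat)
    return results
```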

      -More discussion would be helpful about the idea that immutability may to some extent favor interactions that PCA is better able to detect (possibly including membrane proteins?)

      We agree with the reviewer and have now added a discussion of this potential caveat to the Discussion on page 9, line 4:

      Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. We sampled nine diverse environments and found approximately 3-fold more interactions than in a single environment. However, the discovery of new PPIs began to saturate, indicating that most condition-specific PPIs can be captured in a limited number of conditions. Testing in many more conditions and with PPI assays orthogonal to PPiSeq will undoubtedly identify new PPIs; however, a more important outcome could be the identification of coordinated network changes across conditions. Using a test set of ~1.6 million (of ~18 million) protein pairs across nine environments, we find that specific parts of the protein interactome are relatively stable (core modules) while others frequently change across environments (accessory modules). However, two important caveats of our study must be recognized before extrapolating these results to the entire protein interactome across all environment space. First, we tested for interactions between a biased set of proteins that have previously been found to participate in at least one PPI as measured by mDHFR-PCA under standard growth conditions (Tarassov et al., 2008). Thus, proteins that are not expressed under standard growth conditions are excluded from our study, as are PPIs that are not detectable by mDHFR-PCA or PPiSeq. It is possible that a comprehensive screen using multiple orthogonal PPI assays would alter our observations related to the relative dynamics of different regions of the protein interactome and the features of mutable and immutable PPIs. Second, we tested a limited number of environmental perturbations under similar growth conditions (batch liquid growth). It is possible that more extreme environmental shifts (e.g. growth as a colony, anaerobic growth, pseudohyphal growth) would introduce new accessory modules or alter the mutability of the PPIs we detect. Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2007; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).

      -Re: "As might be expected, we also found that mutable hubs, but not non-hubs, are more likely to participate in multiple protein complexes than less mutable proteins." (pg 6) This is a cool result. To what extent was this result driven by members of one or two complexes? If it was, it would be worth noting them.

      We thank the reviewer for this question. We have now included Figure S13, which shows the number and size of the protein complexes that underlie the finding that mutable hubs are more likely to participate in multiple protein complexes. Proteins in our screen that participate in multiple complexes are distributed over a wide range of complexes, indicating that this observation is not driven by one or two complexes. On page 6, line 34:

      As might be expected, we also found that mutable hubs, but not non-hubs, are more likely to participate in multiple protein complexes than less mutable proteins (Figures S13A-C) (Costanzo et al., 2016).

      -Re: "Borrowing a species richness estimator from ecology (Jari Oksanen et al., 2019), we estimate that there are ~10,840 true interactions within our search space across all environments, ~3-fold more than are detected in SD (note difference to Figure 3, which counts observed PPIs)." (pg 8) Should note that this only allows estimation of the number of interactions that are detectable by PCA methods. Previous work (Braun et al, 2019) showed that every known protein interaction assay (including PCA approaches) can only detect a fraction of bona fide interactions.

      We agree with the reviewer and have modified the discussion to make this point explicit on page 9, line 4:

      Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. We sampled nine diverse environments and found approximately 3-fold more interactions than in a single environment. However, the discovery of new PPIs began to saturate, indicating that most condition-specific PPIs can be captured in a limited number of conditions. Testing in many more conditions and with PPI assays orthogonal to PPiSeq will undoubtedly identify new PPIs; however, a more important outcome could be the identification of coordinated network changes across conditions.

      We continue in this paragraph to discuss the implications:

      Using a test set of ~1.6 million (of ~18 million) protein pairs across nine environments, we find that specific parts of the protein interactome are relatively stable (core modules) while others frequently change across environments (accessory modules). However, two important caveats of our study must be recognized before extrapolating these results to the entire protein interactome across all environment space. First, we tested for interactions between a biased set of proteins that have previously been found to participate in at least one PPI as measured by mDHFR-PCA under standard growth conditions (Tarassov et al., 2008). Thus, proteins that are not expressed under standard growth conditions are excluded from our study, as are PPIs that are not detectable by mDHFR-PCA or PPiSeq. It is possible that a comprehensive screen using multiple orthogonal PPI assays would alter our observations related to the relative dynamics of different regions of the protein interactome and the features of mutable and immutable PPIs.
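      The species-richness extrapolation discussed in this exchange treats each environment as a "sample" and each PPI as a "species". The response cites the vegan package (Jari Oksanen et al., 2019) but does not say which estimator was used. As an assumption, the sketch below uses the classic incidence-based Chao estimator with an (N - 1)/N small-sample factor. It falls back to the bias-corrected form when no PPI is seen in exactly two environments. The result may differ from the authors' ~10,840 figure.

```python
from collections import Counter


def chao_incidence_richness(env_counts, n_envs):
    """Incidence-based Chao estimate of the total number of PPIs (observed + undetected).

    env_counts: for each PPI, the number of environments (out of n_envs) in which it was detected.
    Q1 / Q2 are the numbers of PPIs seen in exactly one / exactly two environments.
    """
    detected = [c for c in env_counts if c > 0]
    s_obs = len(detected)
    freq = Counter(detected)
    q1, q2 = freq[1], freq[2]
    scale = (n_envs - 1) / n_envs
    if q2 > 0:
        return s_obs + scale * q1 * q1 / (2 * q2)
    # Bias-corrected form, used when Q2 == 0 to avoid dividing by zero.
    return s_obs + scale * q1 * (q1 - 1) / 2
```

      The estimate is driven by PPIs seen in only one or two environments. It therefore inherits the reviewer's caveat: it counts only interactions that this assay can detect, within the tested protein pairs and the sampled environments.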

      -Re: "This analysis shows that the number of PPIs present across all environments is much larger than the number observed in a single condition, but that it is feasible to discover most of these new PPIs by sampling a limited number of conditions." (pg 8). The main point is surely correct, but it is worth noting that extrapolation to the number of true interactions depends on the nine chosen environments being representative of all environments. The situation could change under more extreme, e.g., anaerobic, conditions.

      We agree with the reviewer and make this point explicit, continuing from the paragraph quoted above on page 9, line 22:

      Second, we tested a limited number of environmental perturbations under similar growth conditions (batch liquid growth). It is possible that more extreme environmental shifts (e.g. growth as a colony, anaerobic growth, pseudohyphal growth) would introduce new accessory modules or alter the mutability of the PPIs we detect. Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2007; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).

      -It stands to reason that proteins expressed in all conditions will yield less mutable interactions, if 'mutability' is primarily due to expression change at the transcriptional level. They should at least discuss that measuring mRNA levels could resolve questions about this. They could use the Waern et al. G3 2013 data (H2O2, SD, HU, NaCl) to predict the dynamic interactome purely by node removal and see how the conclusions would change.

      We agree with the reviewer that mRNA abundance could potentially be used as a proxy for protein abundance and have added this point on page 10, line 28:

      Here we use homodimer abundance as a proxy for protein abundance. However, genome-wide mRNA abundance measures could be used as a proxy for protein abundance or protein abundance could be measured directly in the same pool (Levy et al., 2014) by, for example, attaching a full length mDHFR to each gene using “swap tag” libraries mentioned above (Weill et al., 2018; Yofe et al., 2016).

      However, using mRNA abundance as a proxy for protein abundance in this study has several important caveats that would make interpretation difficult. First, mRNA and protein abundance correlate, but not perfectly (R² = 0.45) (Lahtvee et al., 2017), and our findings suggest that post-translational regulation may be important in driving PPI changes. Second, mRNA abundance measures are for a single time point, while our PPI measures coarse grain over a growth cycle (lag, exponential growth, diauxic shift, saturation). Although we may be able to take multiple mRNA measures across the cycle, time delays between changes in mRNA and protein levels, combined with the fact that we do not know when a PPI is occurring or most prominent over the cycle, would pose a significant challenge to making any claims that PPI changes are driven by changes in protein abundance. We instead chose to focus on a subset of proteins (homodimers) where abundance measures can be coarse grained in the same way as PPI measures. In the above quote, we point to a potential method by which this can be done for all proteins. We also point to how a continuous culturing design could be used to better determine how protein (or mRNA proxy) abundance impacts PPI abundance on page 10, line 6:

      Finally, our assays were performed across cycles of batch growth meaning that changes in PPI abundance across a growth cycle (e.g. lag, exponential growth, saturation) are coarse grained into one measurement. While this method potentially increases our chance of discovering a diverse set of PPIs, it might have an unpredictable impact on the relationship between fitness and PPI abundance (Li et al., 2018). To overcome these issues, strains containing natural or synthetic PPIs with known abundances and intracellular localizations could be spiked into cell pools to calibrate the relationship between fitness and PPI abundance in each environment. In addition, continuous culturing systems may be useful for refining precision of growth-based assays such as ours.
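      The node-removal analysis suggested by the reviewer could be sketched as follows: a PPI is predicted to be present in an environment only if both partners are called expressed there. The protein names and expression calls below are hypothetical placeholders; real calls would come from an mRNA dataset such as the Waern et al. data, with an expression cutoff chosen by the analyst.

```python
def predict_by_node_removal(ppis, expressed):
    """Predict the PPIs present in each environment by node removal.

    ppis: iterable of (protein_a, protein_b) pairs detected under baseline conditions.
    expressed: dict mapping environment -> set of genes called "expressed"
               (hypothetical calls, e.g. from an mRNA abundance cutoff).
    A PPI is kept in an environment only if both partners are expressed there.
    """
    ppis = list(ppis)
    return {
        env: {(a, b) for a, b in ppis if a in genes and b in genes}
        for env, genes in expressed.items()
    }


def environments_per_ppi(predicted):
    """Count in how many environments each predicted PPI survives."""
    counts = {}
    for ppis in predicted.values():
        for ppi in ppis:
            counts[ppi] = counts.get(ppi, 0) + 1
    return counts


# Hypothetical proteins P1..P4 and expression calls
ppis = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
expressed = {
    "SD": {"P1", "P2", "P3", "P4"},
    "NaCl": {"P1", "P2", "P3"},
    "H2O2": {"P2", "P3"},
}
predicted = predict_by_node_removal(ppis, expressed)
print(environments_per_ppi(predicted))
```

      Comparing these predicted per-PPI environment counts with the observed ones would indicate how much of the measured mutability node removal alone can explain; PPIs that change even though both partners are called expressed would remain candidates for post-translational regulation.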

      -The analysis showing that many interactions are likely due to post-translational modifications is very interesting, but caveats should be discussed. Where heterodimers do not fit the expression-level dependence model, some cases of non-fitting may simply be due to measurement error or non-linearity in the relationship between abundance and fitness.

      We show the measurement error in Figures 1, S2, S3. While we agree with the reviewer that measurement error is a general caveat for all results reported, we do not feel that it is necessary to point to that fact in this particular case, which uses a logistic regression to report that PPI mutability was the best predictor of fit to the expression-level dependence model. We discuss the non-linearity caveat on page 9, line 41:

      Our assay detected subtle fitness differences across environments (Fig S5B and S5C), which we used as a rough estimate for changes in relative PPI abundance. While it would be tempting to use fitness as a direct readout of absolute PPI abundance within a cell, non-linearities between fitness and PPI abundance may be common and PPI dependent. For example, the relative contribution of a reconstructed mDHFR molecule to fitness might diminish at high PPI abundances (saturation effects) and fitness differences between PPIs may be caused, in part, by differences in how accessible a reconstructed mDHFR molecule is to substrate. In addition, environmental shifts might impact cell growth rate, initiate a stress response, or result in other unpredictable cell effects that impact the selective pressure of methotrexate and thereby fitness (Figure S2 and S3).
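      The saturation effect described above can be illustrated with a toy model. The sketch assumes a hypothetical hyperbolic relationship between PPI abundance and fitness, with made-up parameters `f_max` and `k_half`; it is not the fitness model used in the study. It shows that fitness can stay a monotonic (rank-preserving) function of abundance while the same absolute increase in abundance produces a progressively smaller fitness difference as abundance rises.

```python
def fitness(abundance: float, f_max: float = 1.0, k_half: float = 100.0) -> float:
    """Hypothetical saturating (hyperbolic) map from PPI abundance to fitness."""
    return f_max * abundance / (k_half + abundance)


def fitness_gain(abundance: float, increment: float) -> float:
    """Fitness difference caused by adding `increment` units of PPI abundance."""
    return fitness(abundance + increment) - fitness(abundance)


# The same +50-unit increase in abundance at low, medium and high baselines
for a in (10, 100, 1000):
    print(a, round(fitness_gain(a, 50), 3))
```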

      -Line numbers would have been helpful to note more specific minor comments.

      We are sorry for this inconvenience. We have added line numbers in our revised manuscript.

      -Sequence data should be shared via the Short-Read Archive.

      The raw sequencing data have been uploaded to the Short-Read Archive. We mentioned it in the Data and Software Availability section on page 68, line 41.

      Raw barcode sequencing data are available from the NIH Sequence Read Archive as accession PRJNA630095 (https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP259652).

      Reviewer #3 (Significance (Required)):

      Knowledge of protein-protein interactions (PPIs) provides a key window on biological mechanism, and unbiased screens have informed global principles underlying cellular organization. Several genome-scale screens for direct (binary) interactions between yeast proteins have been carried out, and while each has provided a wealth of new hypotheses, each has been sub-saturation. Therefore, even given multiple genome-scale screens our knowledge of yeast interactions remains incomplete. Different assays are better suited to find different interactions, and it is now clear that every assay evaluated thus far is only capable (even in a saturated screen) of detecting a minority of true interactions. More relevant to the current study, no binary interaction screen has been carried out at the scale of millions of protein pairs outside of a single 'baseline' condition.

      The study by Liu et al. is notable from a technology perspective in that it is one of several recombinant-barcode approaches that have been developed to multiplex pairwise combinations of two barcoded libraries. Although other methods have been demonstrated at the scale of 1M protein pairs, this is the first study using such a technology at the scale of >1M pairs across multiple environments.

      A limitation is that this study is not genome-scale, and the search space is biased towards proteins for which interactions were previously observed in a particular environment. This is perhaps understandable, as it made the study more tractable, but this does add caveats to many of the conclusions drawn. These would be acceptable if clearly described and discussed. There were also questions about data quality and assessment that would need to be addressed.

      Assuming issues can be addressed, this is a timely study on an important topic, and will be of broad interest given the importance of protein interactions and the status of S. cerevisiae as a key testbed for systems biology.

      Reviewers' expertise: Interaction assays, next-generation sequencing, computational genomics. Less able to assess evolutionary biology aspects.

      References

      Brauer, M.J., Huttenhower, C., Airoldi, E.M., Rosenstein, R., Matese, J.C., Gresham, D., Boer, V.M., Troyanskaya, O.G., and Botstein, D. (2007). Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolic Activity in Yeast. Mol. Biol. Cell 19, 352–367.

      Breker, M., Gymrek, M., and Schuldiner, M. (2013). A novel single-cell screening platform reveals proteome plasticity during yeast stress responses. J. Cell Biol. 200, 839–850.

      Chong, Y.T., Koh, J.L.Y., Friesen, H., Kaluarachchi Duffy, S., Cox, M.J., Moses, A., Moffat, J., Boone, C., and Andrews, B.J. (2015). Yeast Proteome Dynamics from Single Cell Imaging and Automated Analysis. Cell 161, 1413–1424.

      Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell 11, 4241–4257.

      Hart, G.T., Ramani, A.K., and Marcotte, E.M. (2006). How complete are current yeast and human protein-interaction networks? Genome Biol. 7, 120.

      Hilliker, A., Gao, Z., Jankowsky, E., and Parker, R. (2011). The DEAD-box protein Ded1 modulates translation by the formation and resolution of an eIF4F-mRNA complex. Mol. Cell 43, 962–972.

      Isasa, M., Suñer, C., Díaz, M., Puig-Sàrries, P., Zuin, A., Bichmann, A., Gygi, S.P., Rebollo, E., and Crosas, B. (2015). Cold Temperature Induces the Reprogramming of Proteolytic Pathways in Yeast. J. Biol. Chem. jbc.M115.698662.

      Jensen, L.J., and Bork, P. (2008). Not Comparable, But Complementary. Science 322, 56–57.

      Lahtvee, P.-J., Sánchez, B.J., Smialowska, A., Kasvandik, S., Elsemman, I.E., Gatto, F., and Nielsen, J. (2017). Absolute Quantification of Protein and mRNA Abundances Demonstrate Variability in Gene-Specific Translation Efficiency in Yeast. Cell Syst. 4, 495–504.e5.

      Obayashi, T., Kagaya, Y., Aoki, Y., Tadaka, S., and Kinoshita, K. (2019). COXPRESdb v7: a gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Res. 47, D55–D62.

      Sambourg, L., and Thierry-Mieg, N. (2010). New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size. BMC Bioinformatics 11, 605.

      Tarassov, K., Messier, V., Landry, C.R., Radinovic, S., Molina, M.M.S., Shames, I., Malitskaya, Y., Vogel, J., Bussey, H., and Michnick, S.W. (2008). An in Vivo Map of the Yeast Protein Interactome. Science 320, 1465–1470.

      Yu, H., Braun, P., Yıldırım, M.A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-Quality Binary Protein Interaction Map of the Yeast Interactome Network. Science 322, 104–110.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      The manuscript "A large accessory protein interactome is rewired across environments" by Liu et al. scales up a previously described method (PPiSeq) to test ~1.6 million protein pairs for direct protein-protein interactions in each of 9 different growth environments.

      While the study found a small fraction of immutable PPIs that are relatively stable across environments, the vast majority were 'mutable' across environments. Surprisingly, PPIs detected in only one environment made up more than 60% of the map. In addition to a false-positive fraction that can yield apparently mutable interactions, retest experiments demonstrate (not surprisingly) that environment specificity can sometimes be attributed to false negatives. The authors predict that the whole subnetwork within the space tested will contain 11K true interactions.

      Much of the environment-specific rewiring seemed to take place in an 'accessory module', which surrounds the core module made up mostly of immutable PPIs. A number of interesting network clustering and functional enrichment analyses are performed to characterize the network overall and 'mutable' interactions in particular. The study reports other global properties, such as expression level, protein abundance and genetic interaction degree, that differ between mutable and immutable PPIs. One of the interesting findings was evidence that many environmentally mutable PPI changes are regulated post-translationally. Finally, the authors provide a case study of network rewiring related to glucose transport.

      Major issues

      -The results section should more prominently describe the dimensions of the matrix screen, both in terms of the set of protein pairs attempted and the set actually screened (I think this was 1741 x 1113 after filtering?). More importantly, the study should acknowledge in the introduction that this was NOT a random sample of protein pairs, but rather focused on pairs for which interaction had been previously observed in the baseline condition. This major bias has a potentially substantial impact on many of the downstream analyses. For example, any gene which was not expressed under the conditions of the original Tarassov et al. study on which the screening space was based will not have been tested here. Thus, the study has systematically excluded interactions involving proteins with environment-dependent expression, except where they happened to be expressed in the single Tarassov et al. environment. Heightened connectivity within the 'core module' may result from this bias: if Tarassov et al. had screened in hydrogen peroxide (H2O2) instead of SD media, perhaps the network would have exhibited a core module in H2O2 decorated by less densely connected accessory modules observed in other environments. The paper should clearly indicate which downstream analyses have special caveats in light of this design bias.

      -Related to the previous issue, a quick look at the proteins tested (if I understood them correctly) showed that they were enriched for genes encoding the elongator holoenzyme complex, DNA-directed RNA polymerase I complex, membrane docking and actin binding proteins, among other functional enrichments. Genes related to DNA damage (endonuclease activity and transposition) were depleted. It was unclear whether the functional enrichment analyses described in the paper reported enrichments relative to what would be expected given the bias inherent to the tested space.

      -Re: data quality. To the study's great credit, they incorporated positive and random reference sets (PRS and RRS) into the screen. However, the results from this were concerning: Table SM6 shows that assay stringency was set such that between 1 and 3 out of 67 RRS pairs were detected. This specificity would be fine for an assay intended to retest or validate previous hits, where the prior probability of a true interaction is high, but in large-scale screening the prior probability of true interactions that are detectable by PCA is much lower, and a higher specificity is needed to avoid being overwhelmed by false positives. Consider this back-of-the-envelope calculation: let's say that the prior probability of true interaction is 1%, as the authors suggest (pg 49, section 6.5), and that PCA can optimistically detect 30% of these pairs; then the number of true interactions we might expect to see in an RRS of size 67 is 1% × 30% × 67 ≈ 0.2. This suggests that a stringency allowing 1 hit in RRS will yield 80% [(1 - 0.2) / 1] false positives, and a stringency allowing 3 hits in RRS will yield 93% [(3 - 0.2) / 3] false positives. How do the authors reconcile these back-of-the-envelope calculations from their PRS and RRS results with their estimates of precision?

      -Methods for estimating precision and recall were not sufficiently well described to assess. Precision vs recall plots would be helpful to better understand this tradeoff as score thresholds were evaluated.

      -Within the tested space, the Tarassov et al map and the current map could each be compared against a common 'bronze standard' (e.g. literature curated interactions), at least for the SD map, to have an idea about how the quality of the current map compares to that of the previous PCA map. Each could also be compared with the most recent large-scale Y2H study (Yu et al).

      -Experimental validation of the network was done by conventional PCA. However, it should be noted that this is a form of technical replication of the DHFR-based PCA assay, and not a truly independent validation. Other large-scale yeast interaction studies (e.g., Yu et al, Science 2008) have assessed a random subset of observed PPIs using an orthogonal approach, calibrated using PRS and RRS sets examined via the same orthogonal method, from which overall performance of the dataset could be determined.

      -The Venn diagram in Figure 1G was not very informative in terms of assessing the quality of data. It looks like there is relatively little overlap between PPIs identified in standard conditions (SD media) in the current study and those of the previous study using a very similar method. Is there any way to know how much of this disagreement can be attributed to each screen being sub-saturation (e.g. by comparing replica screens) and what fraction to systematic assay or environment differences?

      -In Figure S5C, the environment-specificity rate of PPIs might be inflated because the authors only test for the absence of SD hits in other conditions, and SD is the only condition that was sampled twice during the screening. What would be the environment-specific verification rate if sample hits from each environment were tested in all environments? This seems important, as robustly detecting environment-specific PPIs is one of the key points of the study.
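      The back-of-the-envelope calculation in the data-quality comment above can be reproduced in a few lines, together with a standard Bayes-rule estimate of precision from reference-set detection rates. The 1% prior, 30% sensitivity, and 67-pair RRS are the reviewer's illustrative values; using the 30% sensitivity as the PRS detection rate and approximating the false-positive rate by the RRS detection rate (1/67) are additional simplifying assumptions, and the function names are hypothetical. This is a sketch, not the study's precision-estimation method.

```python
def expected_true_hits(n_rrs: int, prior: float, sensitivity: float) -> float:
    """Expected number of genuine, assay-detectable interactions in a random reference set."""
    return n_rrs * prior * sensitivity


def false_positive_fraction(rrs_hits: int, true_hits: float) -> float:
    """Fraction of RRS hits not accounted for by genuine interactions."""
    return (rrs_hits - true_hits) / rrs_hits


def ppv_from_reference_sets(prior: float, prs_rate: float, rrs_rate: float) -> float:
    """Bayes-rule precision estimate from PRS and RRS detection rates.

    prs_rate: fraction of positive reference pairs detected (proxy for recall).
    rrs_rate: fraction of random reference pairs detected (proxy for false-positive rate).
    """
    true_pos = prior * prs_rate
    false_pos = (1 - prior) * rrs_rate
    return true_pos / (true_pos + false_pos)


# Reviewer's illustrative values: 1% prior, 30% sensitivity, RRS of 67 pairs
true_hits = expected_true_hits(n_rrs=67, prior=0.01, sensitivity=0.30)
for hits in (1, 3):
    print(hits, f"{false_positive_fraction(hits, true_hits):.0%}")
print(f"{ppv_from_reference_sets(prior=0.01, prs_rate=0.30, rrs_rate=1 / 67):.0%}")
```

      Under these assumptions the false-positive fractions come out at about 80% and 93%, matching the comment, and the implied precision is far below a PPV of 61%, which is the gap the reviewer asks the authors to reconcile.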

      Minor issues

      -Re: "An interaction between the proteins reconstitutes mDHFR, providing resistance to the drug methotrexate and a growth advantage that is proportional to the PPI abundance" (pg 2). It may be more accurate to say "monotonically related" than "proportional" here. Fig 2 from the cited Freschi et al. ref does suggest linearity with colony size over a wide range of inferred complex abundances, but non-linearity at low complex abundance. Also note that Freschi et al. measured colony area, which is linear with neither exponential growth rate nor cell count.

      -Re: "Using putatively positive and negative reference sets, we empirically determined a statistical threshold for each environment with the best balance of precision and recall (positive predictive value (PPV) > 61% in SD media, Methods, section 6)." (pg 3). Should state the recall at this PPV.

      -Authors could discuss the extent to which related methods (e.g. PMID: 28650476, PMID: 27107012, PMID: 29165646, PMID: 30217970) would be potentially suitable for screening in different environments.

      -The term "mutable" is certainly appropriate according to the dictionary definition of changeable. The authors may wish to consider, though, that in a molecular biology context the term evokes changeability by mutation (a very interesting but distinct topic). Maybe another term (environment-dependent interactions or ePPIs?) would be clearer. Of course this is the authors' call.

      -Some discussion is warranted about the phenomenon that a PPI that is unchanged in abundance could appear to change because of statistical significance thresholds that differ between screens. This would be a difficult question for any such study, and I don't think the authors need to solve it, but just to discuss.

      -More discussion would be helpful about the idea that immutability may to some extent favor interactions that PCA is better able to detect (possibly including membrane proteins?)

      -Re: "As might be expected, we also found that mutable hubs, but not non-hubs, are more likely to participate in multiple protein complexes than less mutable proteins." (pg 6) This is a cool result. To what extent was this result driven by members of one or two complexes? If so, it would be worth noting them.

      -Re: "Borrowing a species richness estimator from ecology (Jari Oksanen et al., 2019), we estimate that there are ~10,840 true interactions within our search space across all environments, ~3-fold more than are detected in SD (note difference to Figure 3, which counts observed PPIs)." (pg 8) Should note that this only allows estimation of the number of interactions that are detectable by PCA methods. Previous work (Braun et al, 2019) showed that every known protein interaction assay (including PCA approaches) can only detect a fraction of bona fide interactions.

      -Re: "This analysis shows that the number of PPIs present across all environments is much larger than the number observed in a single condition, but that it is feasible to discover most of these new PPIs by sampling a limited number of conditions." (pg 8). The main point is surely correct, but it is worth noting that extrapolation to the number of true interactions depends on the nine chosen environments being representative of all environments. The situation could change under more extreme, e.g., anaerobic, conditions.

      -It stands to reason that proteins expressed in all conditions will yield less mutable interactions, if 'mutability' is primarily due to expression change at the transcriptional level. They should at least discuss that measuring mRNA levels could resolve questions about this. They could use the Waern et al. G3 2013 data (H2O2, SD, HU, NaCl) to predict the dynamic interactome purely by node removal and see how the conclusions would change.

      -The analysis showing that many interactions are likely due to post-translational modifications is very interesting, but caveats should be discussed. Where heterodimers do not fit the expression-level dependence model, some cases of non-fitting may simply be due to measurement error or non-linearity in the relationship between abundance and fitness.

      -Line numbers would have been helpful to note more specific minor comments.

      -Sequence data should be shared via the Short-Read Archive.
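      The threshold effect raised in the comment on statistical significance thresholds above can be made concrete with a simple model: a PPI with the same true score in every environment is measured with independent Gaussian noise and called in an environment only if its measured score exceeds a fixed threshold. The noise model, score scale, threshold, and function names are hypothetical assumptions for illustration.

```python
from statistics import NormalDist


def p_called(true_score: float, threshold: float, noise_sd: float) -> float:
    """Probability that one noisy measurement of a PPI exceeds the calling threshold."""
    return 1.0 - NormalDist(mu=true_score, sigma=noise_sd).cdf(threshold)


def p_apparently_mutable(true_score: float, threshold: float,
                         noise_sd: float, n_env: int = 2) -> float:
    """Probability that a PPI with the same true score in every environment
    is called in some, but not all, of n_env independent screens."""
    p = p_called(true_score, threshold, noise_sd)
    return 1.0 - p ** n_env - (1.0 - p) ** n_env


# Hypothetical scores: one PPI right at the threshold, one well above it
for true_score in (1.0, 2.0):
    print(true_score, round(p_apparently_mutable(true_score, 1.0, 0.2, n_env=9), 4))
```

      In this toy model a PPI sitting at the threshold looks mutable across nine environments almost every time, while one far above it almost never does, so apparent mutability is expected to concentrate among PPIs with scores near the calling threshold.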

      Significance

      Knowledge of protein-protein interactions (PPIs) provides a key window on biological mechanism, and unbiased screens have informed global principles underlying cellular organization. Several genome-scale screens for direct (binary) interactions between yeast proteins have been carried out, and while each has provided a wealth of new hypotheses, each has been sub-saturation. Therefore, even given multiple genome-scale screens our knowledge of yeast interactions remains incomplete. Different assays are better suited to find different interactions, and it is now clear that every assay evaluated thus far is only capable (even in a saturated screen) of detecting a minority of true interactions. More relevant to the current study, no binary interaction screen has been carried out at the scale of millions of protein pairs outside of a single 'baseline' condition.

      The study by Liu et al. is notable from a technology perspective in that it is one of several recombinant-barcode approaches that have been developed to multiplex pairwise combinations of two barcoded libraries. Although other methods have been demonstrated at the scale of 1M protein pairs, this is the first study using such a technology at the scale of >1M pairs across multiple environments.

      A limitation is that this study is not genome-scale, and the search space is biased towards proteins for which interactions were previously observed in a particular environment. This is perhaps understandable, as it made the study more tractable, but this does add caveats to many of the conclusions drawn. These would be acceptable if clearly described and discussed. There were also questions about data quality and assessment that would need to be addressed.

      Assuming issues can be addressed, this is a timely study on an important topic, and will be of broad interest given the importance of protein interactions and the status of S. cerevisiae as a key testbed for systems biology.

      Reviewers' expertise: Interaction assays, next-generation sequencing, computational genomics. Less able to assess evolutionary biology aspects.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Report on Liu et al. "A large accessory protein interactome is rewired across environments". Liu et al. use an mDHFR-based, pooled barcode sequencing / competitive growth / mild methotrexate selection method to investigate changes of PPI abundance of 1.6 million protein pairs across 9 different growth conditions. Because most PPI screens aim to identify novel PPIs in standard growth conditions, the currently known yeast PPI network may be incomplete. The key concept is to define "immutable" PPIs that are found in all conditions and "mutable" PPIs that are present in only some conditions. The assay identified 13,764 PPIs across the 9 conditions, using optimized fitness cutoffs. Steady PPIs, i.e. those found across all environments, were identified in membrane compartments and cell division. Processes associated with the chromosome, transcription, protein translation, RNA processing and ribosome regulation were found to change between conditions. Topological analyses reveal that mutable PPIs form modules.

      Interestingly, a correlation between intrinsic disorder and PPI mutability was found; mutable PPIs are postulated to be conformationally more flexible, while at the same time they are formed by less abundant proteins.

      I appreciate the trick to use homodimerization as an abundance proxy to predict interaction between heterodimers (of proteins that homodimerize). This "mass-action kinetics model" explains the strength of 230 out of 1212 tested heterodimers.

      A validation experiment on the glucose transporter network was performed, and 90 "randomly chosen" PPIs that were present in the SD environment were tested in NaCl (osmotic stress) and raffinose (low glucose) conditions by recording optical density growth trajectories. Hxt5 PPIs stayed similar in the tested conditions, consistent with the current knowledge that Hxt5 is highly expressed in stationary phase and under salt stress. In raffinose, Hxt7, whose mRNA expression was previously reported to increase, lost most PPIs, indicating that other factors might influence Hxt7 PPIs.

      Points for consideration:

      *) A clear definition of mutable and immutable is missing, or could not be found (e.g. on page 4, second paragraph).

      *) Approximately half of the PPIs have been identified in only one environment. Many of those mutable PPIs were detected in the 16 °C condition. Is there an explanation for the predominance of this specific environment? What are these PPIs about?

      *) A 50% overall retest validation rate is fair and comparable to other large-scale approaches. However, what is the actual variation, e.g. between mutable and immutable PPIs or between conditions (e.g. at 16 °C)?

      *) What is the R correlation cutoff for PPIs explained in the mass equilibrium model vs. not explained?

      *) 90 "randomly chosen" PPIs for validation. It needs to be demonstrated that these interactions are a random subset; otherwise they could also be cherry-picked interactions...

      *) Figure 4 provides interesting correlations with the goal of revealing properties of mutable and less mutable PPIs. PPIs detected in the PPiSeq screen can partially be correlated to co-expression (4A) as well as co-localization. Does it make sense to correlate the co-expression across the number of conditions? Are the expression correlations condition-specific? In this graph it could be that the expression correlation stems from conditions 1 and 2 while the interaction takes place in 4 and 5, still leading to the same conclusion... Is the picture of the co-expression correlation similar when you simply look at individual environments, as in S4A?

      *) Figure 4C: Interesting; how dependent are the various categories?

      *) Figure 4F: When binned by the number of environments in which the PPI was found, the distribution peaks at 6 environments and decreases with higher and lower numbers of environments. The description/explanation in the text clearly says something else.

      *) Figure 6: I apologize, but for my taste this is not a suitable final figure for this study. Investigation of different environments increases the PPI network in yeast, yes, yet it is very well known that saturation is reached after testing several conditions, different methods and even repeated screening (sampling). It does not represent an important outcome. Move it to the supplement or remove it.
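      The question about how the ~50% retest validation rate varies between groups could be answered by reporting each group's rate with a binomial confidence interval. The sketch below uses the Wilson score interval; the group names and counts are hypothetical placeholders, not data from the study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical retest counts per group: (validated, tested)
groups = {"immutable": (40, 60), "mutable": (25, 60), "16 °C-specific": (5, 20)}
for name, (k, n) in groups.items():
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {k / n:.0%} validated (95% CI {lo:.0%} to {hi:.0%})")
```

      With group sizes of a few dozen pairs the intervals are wide, so overlapping intervals would caution against over-interpreting apparent differences between categories or conditions.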

      Significance

      Liu et al. expand the current PPI network in yeast and offer a substantial dataset of novel PPIs seen only in specific environments. This resource can be used to further investigate the biological meaning of the PPI changes. The dataset is compared to previous DHFR PCA data, providing some quality benchmarking. Mutable interactions are characterized well. Clearly a next step could be to start some "orthogonal" validation, i.e. beyond yeast growth under methotrexate treatment.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The manuscript is clearly written and the figures appropriate and informative. Some descriptions of data analyses are a little dense but reflect what would appear to be long, hard efforts on the part of the authors to identify and control for possible sources of misinterpretation due to sensitivities of parameters in their fitness model. The authors' efforts to retest interactions under non-competition conditions allay most of the concerns that I would have. One problem, though, that I could not see explicitly addressed is the potential effect of interactions between methotrexate and the other conditions, and how this is controlled for. Specifically, it could be argued that the observation of a particular PPI under a specific condition could have more to do with a synthetic effect of treating cells with a drug plus methotrexate. Is this controlled for, and how? I raise this because in a chemical genetic screen for fitness it was shown that methotrexate is particularly promiscuous for drug-drug interactions (Hillenmeyer ME, et al. Science 2008). I tried to think of how this works but couldn't come up with anything immediately. I'd appreciate it if the authors would take a crack at resolving this issue. Otherwise I have no further concerns about the manuscript.

      Significance

      Liu et al. expand on previous work from the Levy group to explore a massive in vivo protein interactome in the yeast S. cerevisiae. They achieve this by performing screens across 9 growth conditions, which, with replication, results in a total of 44 million measurements. Interpreting their results with a fitness model for pooled growth under methotrexate selection, they make the key observation that there is a vastly expanded pool of protein-protein interactions (PPIs) that are found under only one or two conditions, compared with a more limited set of PPIs that are found under a broad set of conditions (mutable versus immutable interactors). The authors show that this dichotomy suggests some important features of proteins and their PPIs that raise important questions about the functionality and evolution of PPIs. Among these are that mutable PPIs are enriched for cross-compartmental interactions, high disorder, higher rates of evolution, and subcellular localization of proteins to chromatin, suggesting roles in gene regulation that are associated with cellular responses to new conditions. At the same time, these interactions are not enriched for changes in abundance. These results contrast with those for immutable PPIs, which seem to form a core background noise that is more strongly determined by changes in abundance than by what the authors interpret must be post-translational processes that may drive, for instance, changes in subcellular localization resulting in the appearance of PPIs under specific conditions. The authors are also able to address a couple of key issues about protein interactomes, including the controversial party/date hub hypothesis of Vidal, for which they now affirm support based on their results, notably the negative correlation of PPIs with protein abundance for mutable PPIs.
Finally, they also addressed the problem of predicting the upper limit of PPIs in yeast, showing the remarkable result that it may be no more than about 2 times the number of proteins expressed by yeast. Such an upper limit is profoundly important for modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.

      This manuscript is a very important contribution to understanding dynamics of molecular networks in living cells and should be published with high priority.
      
    1. Reviewer #3:

      In the manuscript "Kinetics of CDK4/6 inhibition determine different temporal locations of the restriction point", Kim et al. investigate the regulation of Rb/E2F by CDK4/6 and CDK2 and how mitogen and stress signalling differentially regulate the kinetics of CDK4/6 inhibition before irreversible cell-cycle entry. Research into restriction point regulation has recently experienced a revival due to advanced single-cell approaches, and the presented study falls into this category as well. Utilizing CDK4, CDK2 and APC/C activity reporters, the authors investigate the position of the restriction point in response to external stimuli. Their main conclusions are that i) CDK4/6 activity alone initiates Rb hyperphosphorylation and E2F activation, ii) the CDK2-Rb feedback is the key signalling network controlling the restriction point, iii) the kinetics of CDK4/6 inhibition in response to mitogen removal and stress signalling explain previous observations in asynchronously cycling cells showing different locations of the restriction point, and iv) CDK2 activity alone, without other mechanisms in S phase, determines the temporal location of the restriction point with respect to CDK4/6 inhibition and S-phase entry.

      I have major concerns with the presented work regarding: the design of the study in relation to the question asked; the one-sided introduction and discussion and the imprecise wording of restriction point events; the tendency to overstate/overgeneralize the conclusions of their findings; the novelty of their results in relation to old restriction point studies using serum starvation and release regimes and the more recent studies from the Meyer, Spencer and Bakal labs focusing on asynchronously growing cells; and the fact that their results and interpretations are completely at odds with the recent Dowdy and Dyson studies, which are not mentioned at all in either the introduction or the discussion. Finally, in my opinion the authors have not yet provided the experimental proof for one of their major claims, namely that CDK4/6 activity alone initiates Rb hyperphosphorylation and E2F activation. My detailed criticisms are listed in the major and minor points below.

      Major points:

      1) The authors give the impression in the introduction that they will focus on probing the possibility of different temporal locations of the restriction point depending on external stimuli (p3, l60ff). However, they only use mitogen withdrawal and NCS-induced DNA damage as "stimuli" but then claim that "we demonstrate that different extracellular environments cause different kinetics of CDK4/6 inhibition (p10, l96ff)". Certainly, these two treatments (in addition to direct CDK4 and CDK2 inhibition) are not sufficient for such a general statement, and in the context of their writing, NCS-induced DNA damage is a cell-intrinsic rather than an external stimulus/condition as claimed. Similarly, the authors derive general and overarching statements about restriction point regulation in response to stress from their NCS experiments. In fact, CDK4/6 is a target of several integrated stress pathways, e.g. UPR/PERK, which regulate the levels of cyclin D at the translational level (e.g. Brewer et al., PNAS 1999) and are independent of p21. The authors also claim to investigate whether other mechanisms in S phase are required to initiate the restriction point. To me this is another example of unclear wording and unfulfilled expectations, as the only factor analysed is the APC/C, which is inactivated at the entry of S phase. From the introduction, the discussion and the cited literature, it is unclear to me why the authors expect that a mechanism in S phase, hence after commitment to proliferation, would feed back on the restriction point during G1 phase of the same cell.

      2) The introduction and discussion are one-sided and completely omit recent findings of the Spencer lab (Min et al., PLOS Biology 2019) in relation to stress and, most importantly, the Dowdy (Narasimha et al. eLife 2014) and Dyson studies (Sanidas et al., Mol Cell 2019), which are both at odds with a major claim of the presented work (see below).

      3) The authors claim throughout the paper that CDK4/6 is sufficient to hyperphosphorylate Rb, based on nuclei that can be stained by antibodies specific to 4 Rb phosphosites and on in situ extraction experiments that claim to dissociate hyperphosphorylated Rb from the DNA. This claim cannot be made, as their results are completely consistent with the alternative, namely that multiple Rb molecules within the same cell (nucleus) are monophosphorylated at the analysed sites, or at any of the 14 possible sites. This would be in agreement with the Dowdy and Dyson studies (Fig. 1 & Fig. 2). For the in situ extraction experiments investigating nuclear-bound Rb, no real data are shown. Fig. 1J basically shows the segmentation strategy the authors employ and indicates that some cells have less nuclear Rb staining. There are no controls (e.g. before and after extraction) and no proof that the assay works in their hands, e.g. by treating the cells with CDK4/6 inhibitors and CDK2 inhibitors before the assay. The authors show in Fig. 1 that E2F1 is already induced hours before mitosis, yet cells only progress into S phase much later. However, it is equally likely that monophosphorylation of Rb is sufficient to initiate E2F1 transcription; this could easily be tested using the published mutant cell lines expressing Rb variants with only one phosphosite.

      4) The authors claim that "However, previous studies showed that CDK2 inhibitors caused a loss of Rb phosphorylation and induced quiescence (Narasimha et al., 2014; Spencer et al., 2013)" (p5, l60). Reading these papers again, it appears to me that this is a wrong statement/interpretation. Narasimha et al. show in Figure 3 that only CDK4 inhibition, but not CDK2 inhibition, results in a complete loss of Rb phosphorylation. The latter treatment resulted in Rb monophosphorylation (Fig. 3i) and did not induce quiescence as the authors claim here. Instead, such cells remained in G1 phase and did not make the transition into G0. Also, the claim that the Spencer et al. results are due to off-target effects of CDK2 inhibitors appears flawed, because the authors only detect those effects after a prolonged time (more than 9 hours), whereas Spencer et al. monitored the effect of such inhibitors on cells immediately after application. Hence, in my opinion this part, the corresponding data (Fig. S3), and the interpretations should be removed.

      5) In asynchronously growing cells, CDK2 appears to be activated early after mitosis (Spencer et al., 2013), whereas in their experimental setup CDK2 and CDK4 activation are only assessed after mitogen starvation and release. I imagine from the timing that in asynchronously growing cells CDK2 activity will also be tightly coordinated with E2F transcription (Fig. 1D); hence, a main foundation of their study may depend on the experimental setup used, and this should clearly be discussed. I also wonder what their results on the requirement of CDK4 for Rb phosphorylation would look like without the synchronization step.

    2. Reviewer #2:

      In this manuscript, Kim et al. investigate the events required for irreversible commitment to division by immortalized mammalian cells in culture. They do so by tracking single, live cells by video-microscopy using an assortment of fluorescent biosensors (augmented by fixed-cell immunofluorescence), and perturbing cell-cycle progression with cyclin-dependent kinase (CDK) inhibitors, DNA-damaging agents, or mitogen withdrawal. This is a complicated problem, which has resisted a comprehensive solution since the initial attempts to define a commitment or "Restriction" point (R point) in mammalian cells over 40 years ago. This study yields some intriguing results, and generally adds significant molecular detail to previous work on this problem by the PI and former colleagues in the Meyer lab. There are serious flaws, however, both conceptual and technical. Some of them are inherent in the approach, for example, the overreliance on small-molecule inhibitors that are not as selective as one would hope, and on live-cell biosensors that are neither as sensitive nor as specific (for individual CDKs) as they would need to be to justify some of the stronger mechanistic conclusions. Then there is the central take-home message (I think), which is based on the observation that mitogen withdrawal or DNA damaging agents have different windows of sensitivity during G1, such that the former needs to be applied earlier than the latter in order to prevent cell cycle entry. This leads to re-interpretation of the R point as a moving target, occurring at different points in the cell cycle depending on which perturbations cells encounter as they take the necessary steps to commence DNA replication. This makes little biological sense to me. The R point concept seems to lose much or all of its usefulness if it is not understood as a cellular state in which the irreversible commitment to division has been made, irrespective of what might befall an individual cell that has passed it. 
I think a more reasonable interpretation of a (at least superficially) similar phenomenon was put forth by Skotheim and colleagues, who found that the threshold level of CDK1/2 activity that predicted subsequent R-point passage was higher when all mitogens were withdrawn than when a single mitogenic signaling pathway was ablated, e.g. with a MEK inhibitor (Schwarz et al., 2018, ref 22). In this view, the R point per se is not mutable, but the strength of an antimitogenic signal can determine how quickly cells can put on the brakes before reaching it. I would urge the authors to avoid this phrasing and to aim for a bit more clarity in describing an admittedly complicated set of data. Below I list my major, specific concerns:

      1) Probably the biggest problem for the current study emerged from a paper by Rubin and colleagues (Guiley et al., 2019, ref. 26), which showed, quite convincingly, that the "CDK4/6 inhibitors" Palbociclib, ribociclib and abemaciclib, used throughout the current study, almost certainly do not work in cells by direct inhibition of CDK4/6, but rather by binding CDK monomers and redistributing CDK inhibitor (CKI) proteins, notably p21, to CDK2. To be fair, this is a very recent paper, which, to their credit, the authors cite and try to address. But they address it only obliquely and, I'm afraid, inadequately; although they show that the effects of Palbociclib et al. are partially independent of p21 (Fig. 3B,D), this doesn't rule out contributions by other CKIs such as p27 or p57, any of which could potentially be redistributed to CDK2 complexes if CDK4 complex assembly is impaired (Guiley et al. did not test this possibility and only evaluated CDK2-CKI binding in wild-type cells). Nor do they address the strong implication of Guiley et al. that loss of CDK4/6 activity is not the mechanism by which these compounds act. This is a hugely important point; the entire study (and several previous ones from the Meyer lab) depends on the ability to inhibit CDK4/6 or CDK1/2 with different inhibitors and to distinguish the effects on various cellular phenotypes and biosensor signals, which is now in considerable doubt.

      2) More generally, the study relies on small-molecule inhibitors of different CDKs that are at best only modestly selective for their intended targets. The problem with using Palbociclib in this way has been discussed above and is a recent development, but it should be noted that major "off targets" of the "CDK4/6 inhibitors" include transcriptional CDKs such as CDK9, which are also potently inhibited by "CDK1/2" inhibitors such as roscovitine (and others). One could make the case that these drugs are hitting different targets, because they have different effects on different biosensors, but the specificity of those biosensors was established in part by using the inhibitors, so the case that their effects occur solely or primarily through their intended targets is in the end circular.

      3) The "CDK4/6 biosensor" has in fact been shown in a previous paper by the PI to detect CDK1/2 activity in addition to CDK4/6; there was residual signal after Palbociclib treatment in cells with high CDK2 activity. Setting aside the aforementioned problem of Palbociclib specificity, if I understand correctly, to "correct" for this lack of specificity, the authors subtract 35% to generate the signal they attribute to CDK4/6. This seems to assume that the relative contributions to this fluorescence by CDK4/6 and CDK1/2 will be in a fixed proportion, or am I missing something?

      4) In previous papers from the Meyer lab, Rb hyperphosphorylation was "inferred" from concurrently increased immunofluorescence signals, in fixed cells, from a panel of phosphoRb-specific antibodies (Chung et al., 2019, ref. 18). I have my problems even with inferring stoichiometry from these types of measurements, but in this manuscript the language is even stronger: IF signals are flatly described (and interpreted) as "markers" of Rb hyperphosphorylation. This too is a major issue; a prevailing model, supported by biochemical data that are by necessity ensemble measurements, holds that CDK4/6 is primarily responsible for Rb monophosphorylation, whereas hyperphosphorylation coincides with and is dependent on activation of CDK2 (Narasimha et al., 2014, ref. 28). Although for the moment the larger concern (that anything the authors have done to inactivate CDK4/6 is likely to be indirectly inhibiting CDK2) renders this more technical point somewhat moot, conclusions (or even inferences) about hyper- versus monophosphorylated forms of Rb should be based on actual measurements of stoichiometry.

    3. Reviewer #1:

      This manuscript reports a series of studies probing the relative roles of CDK4/6 and CDK2 in inactivation of the retinoblastoma (Rb) protein and in determining the restriction point, which marks the commitment of a cell to S phase and subsequent cell division. The work builds off the recent development of live-cell reporters for CDK activity, and it primarily uses relationships between those signals to conclude that while CDK4/6 activity is sufficient for Rb inactivation and E2F activation, CDK2 activation determines passage through the restriction point. Though well-studied over the last two decades, the questions addressed here related to the G1-S cell cycle transition are still not sufficiently answered, and they are important to understanding fundamental cell biology and cancer biology. The use of single-cell imaging and application of a CDK4/6 sensor is an exciting approach to study Rb inactivation and the restriction point, and many of the experiments here are well designed. In addition, aspects of the authors' approach, including the use of multiple cell lines, make the observations robust. However, there are several significant concerns. While most of the concerns could be addressed through more analysis of experiments already performed and rewriting, more experiments are likely necessary to address the first point.

      Significant concerns:

      1) The study relies on interpretation of the adjusted "CDK4/6 sensor" signal as a specific reporter of CDK4/6 activity. Because this assumption of specificity is so critical, the authors should briefly review the evidence supporting it and better explain the accounting of other activities that may result in sensor phosphorylation. It is problematic that one of the conclusions in the discussion is that "the CDK4/6 sensor may report other activities which can be targeted by CDK4/6 inhibitors," particularly as these inhibitors were used to validate specificity in ref 19 (Yang et al 2020). It is also important to note that mounting evidence here (for example Fig. 3A) and elsewhere shows that CDK4/6 inhibitors such as palbociclib may also affect CDK2 activity.

      The conclusion that CDK4/6 activity is sufficient for Rb phosphorylation is in large part based on the correlation of the CDK4/6 sensor response with measurements of Rb phosphorylation using phosphospecific antibodies (Fig. 1). However, the sensor was constructed using an Rb-based docking site, which is expected to give the sensor properties of Rb as a substrate. With the perspective that the sensor reports on Rb-like substrate phosphorylation, rather than CDK4/6 activity per se, the reported correlation is inevitable and cannot be used to support the conclusion. The sensor phosphorylation of course correlates with Rb phosphorylation, as it was designed precisely to behave that way. Some other independent measurement of CDK4/6 activity, for example activity toward a different substrate or measurement of the abundance of CDK4/6-CycD complexes is needed to avoid this circular reasoning.

      The plausible interpretation that the sensor merely reports on the threshold of any CDK activity sufficient to phosphorylate Rb would also make other conclusions less novel, for example, that sensor phosphorylation correlates with E2F activation. If one replaces "CDK4/6 activity sensor" with "Rb-phosphorylation sensor," few conclusions from the first two figures are compelling. For this reason, it is critical that the authors further detect and quantify CDK4/6 activity in some independent way. Otherwise, the data as presented are not sufficient to support several of the main conclusions of the paper as stated, and the conclusions that likely could be fairly drawn lack novelty.

      2) Experiments similar to those presented in Fig. S3 were published before in ref 19 (Yang et al 2020). In the previous paper, the effects of the drugs were used to validate the specificity of the CDK sensors. Here, the sensors are invoked to characterize the specificity and effects of the drugs. Again, this circular logic undercuts the validity of the conclusions. It is similarly plausible that either both the sensor and drugs have specificity or both lack specificity; the outcome of the set of experiments would be the same. These experiments are not as critical to the overall study, and the authors may consider removing this part of the manuscript, if further experiments are not possible.

      3) These conclusions following presentation of the data in Fig. 3 are not well substantiated: "the temporal location of the restriction point with respect to stress and CDK4/6 inhibition is closely coupled with engagement of feedback pathways" and "our data demonstrates that inhibition of CDK4/6 activity before threshold-based activation of CDK2-Rb feedback causes cell-cycle exit." The experiments only measure CDK activity and not engagement of CDK2-Rb feedback, so there must be some assumption about the correspondence of a threshold of CDK2 activity to activation of the feedback. How is it known that feedback is engaged? This question persists throughout the study. The authors should more carefully define what CDK2-Rb feedback is and how its initiation is detected experimentally. Is it Rb hyperphosphorylation, mRNA expression of an E2F target gene, or protein levels of CycE? One of these should perhaps be measured in Fig. 3 to state the conclusion in terms of CDK2-Rb feedback rather than a CDK2 activity threshold. Alternatively, if further experimentation is not possible, the conclusions should be carefully stated in terms of CDK2 activity rather than invoking the idea of "CDK2-Rb feedback."

      4) A number of recent studies have similarly used single-cell reporters and other analyses to probe the relative roles of CDK4/6, CDK2, and APC-Cdh1 in the restriction point (including Rb inactivation) and S phase entry (e.g. refs 2-4, 16-19, 22, 26, 28). The authors need to better explain how the observations here fit into the paradigms being developed and disputed through this body of work. Several of the conclusions stated here have been reached before. For example, the order in which CDK4/6, CDK2, and APC-Cdh1 activities change en route to S phase, that CDK4/6 is sufficient for Rb hyperphosphorylation, and that CDK2 activity is a threshold for the restriction point have all been described and supported in some of the referenced papers and contradicted in others. Yet similar conclusions are stated here as if they were novel. This study is still important in that the use of a CDK4/6 activity reporter may be a powerful approach to investigating these questions. But the subtleties of how this work is distinct from and/or confirms prior work need to be made clearer for the reader to understand its significance.

      A related concern is that the results and conclusions described in Fig. 5 are not particularly surprising or novel. There is extensive literature characterizing high CDK2 activity, including its upregulation through CycE expression, as a mechanism of acquired tumor cell resistance to CDK4/6 inhibitors (see for example references reviewed in PMID: 32289274). Other published studies have examined the effects of ectopic CycE expression on accelerating G1-S, including in the absence of CycD activity or even the absence of Rb (see for example PMID: 8108147, PMID: 7601350, PMID: 1388095, PMID: 14645251, and PMID: 9192874). The authors should place their results in the context of these previous results and emphasize what insights are novel here.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Although the reviewers all agreed that you are addressing an important problem, and that a single cell approach is likely to yield important insights, they had serious concerns over the specificity of the probes and reagents you are using, and the degree of advance that your study represents over the current literature. With regard to the latter, the referees strongly suggested that a more comprehensive literature review is needed to put your results in context.

    1. Reviewer #3:

      This work reports the results from a set of predominantly coarse-grained (CG) simulations of phospholipid interactions with the yeast flippase Drs2p:Cdc50p in the outward-facing state. Using the popular MARTINI force field, these simulations reveal multiple putative binding sites of lipid molecules and support the likelihood of the "credit-card" model of lipid transport. The authors have also analyzed the possible preference of different lipids for these sites. While these are interesting observations, they are severely limited by the CG nature of the model and lack strong corroborating support from either atomistic simulations or experiment.

      1) While this work includes a substantial set of atomistic simulations, they do not appear to provide much useful information or provide much support to any of the central conclusions of the work.

      2) Instead, virtually all key conclusions are based on MARTINI simulations. While this is indeed an outstanding CG model that has been successfully applied to an increasing number of problems (particularly self-assembly), it is highly questionable that MARTINI is appropriate for predicting binding sites. To the best of my knowledge, this model has not been demonstrated to be reliable for such purposes. It requires great caution and careful validation to establish and support the predicted binding sites.

      Is there any corroborating experimental evidence to support these sites? The authors made only minimal efforts to validate this critical prediction, largely by noting that EM densities suggest multiple binding sites. This needs to be investigated thoroughly, for example by direct comparison of these locations.

      Can one at least test if lipids can stably occupy those sites using atomistic simulations?

      3) Membrane thinning is observed only in CG and not in atomistic simulations; this is alarming, as membrane thinning should be captured in atomistic simulations within a few hundred ns. This has been demonstrated clearly in several published simulations of scramblases (e.g., Bethel and Grabe PNAS 2016, among others). This calls into question the quality of the MARTINI simulations for capturing detailed properties of this flippase complex.

      4) Free energy analysis was done with the MARTINI model, which greatly reduces its usefulness. As stated above, the MARTINI model is really not appropriate for such detailed free energy analysis of these putative binding sites.

    2. Reviewer #2:

      This manuscript, "Computational Studies of Substrate Transport and Specificity in a Phospholipid Flippase", presents multiscale simulations to understand the details of a yeast flippase in lipid binding, membrane deformation, and protein hydration. Overall, an examination of the Drs2p-Cdc50p complex was carried out with 500-ns-long all-atom and 100-µs-long coarse-grained simulations in different membrane models (pure PS, PE, PC and mixtures). Free-energy simulations were also employed to compare lipid binding free energies. A major finding is the identification of the anionic PS lipid binding to a water-filled substrate binding groove. However, I find the work lacks clarity, novelty, and biological insight.

      1) My primary concern is that three different phospholipids were selected in this work (PS, PE, and PC), but only the PS lipid is anionic. First of all, it is quite obvious that the PS lipid is preferred in this limited set, due to the formal charge difference. The higher affinity of anionic lipids for transmembrane proteins has been extensively studied (too many to list, but here are a few recent examples: PNAS 2020 117, 7803-7813; Structure, 2019, 27, 392-403.e3; Sci Rep. 2018, 8, 4456; Sci Rep. 2016, 6, 29502).

      Second, according to prior experiments (Appl Environ Microbiol. 2014, 80, 2966-2972), the major phospholipids in yeast are phosphatidylcholine (PC), phosphatidylethanolamine (PE), phosphatidylinositol (PI), phosphatidylserine (PS), and phosphatidic acid (PA), with minor amounts of cytidinediphosphate-diacylglycerol (CDP-DAG). There are also glycosphingolipids, ergosterol, and proteins. None of the membrane models simulated in this work approximates a realistic yeast cellular membrane. Because the lipid composition has important physiological impacts, I found no justification for why key anionic lipids (like PI and PA) and ergosterol were not included.

      2) In addition, it was claimed that "As our atomistic simulations were limited to 0.5-1.0 µs due to their high computational cost". I cannot agree with the authors, given the system size of ~340,000 atoms. Microsecond or multi-microsecond all-atom simulations (of this size or larger) are not rare in current studies of membrane proteins. Further, longer simulations might be more likely to sample lipid exchange and competition within the groove, as well as relevant protein conformational changes (which cannot be captured in CG simulations).

      3) Moreover, while I found the results presented in Fig. 5 quite interesting, the related paragraphs seem to lack the in-depth analysis and clarity needed to support "a 'credit-card'-like model". First, it is not clear to me how the lipid in Fig. 5 was selected. What did this lipid look like in the outer leaflet vs. in the deep state of the groove? Second, there is no analysis of the event at ~21-23 µs when the lipid starts to transition. What triggered the event? Were there any specific interactions? Last but not least, as the authors said "X-ray diffraction and Cryo-EM experiments on ATP8A1 and ATP11C show density for PL head groups", it is possible to compare the simulation results (lipid density) with the experimental density. Including such an analysis would greatly strengthen this paper.

      4) The "water-filled cavities" results overall may need more clarification and possibly even experimental support. First, how were the AA simulations compared with the CG simulations in terms of the cavities? Given the ENM constraints, there were few conformational changes of the cavities (of the protein) in response to PS moving along the groove. There might be some induced-fit effect, and the cavities may adopt different shapes when such an effect is fully considered in the AA modeling. Second, is there any experimental evidence to support this observation from the MD simulations? For example, mutation of the key residue Ile508, which the authors suggest separates the two cavities.

    3. Reviewer #1:

      This is an outstanding paper. MD simulations at two resolutions are employed to provide convincing predictions regarding lipid binding to flippases in terms of the mechanism of binding and specificity. The topic is of fundamental biological interest, and the results provide deeper insights than are possible with experimental structural biology methods alone.

      The simulations are certainly state-of-the-art in terms of methodology and are well ahead of the field in terms of simulation length.

      The paper is written and presented clearly. The results are explained in detail and have the necessary statistical treatment to provide confidence in them. The discussion is based on the results and contextualised appropriately; there is no claim that cannot be supported by the results.

      A number of important observations are reported including those concerning lipid tail orientations, water-filled cavities, and lipid binding affinity.

      Overall the authors should be commended on a thorough computational study.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      In general, all reviewers agreed that the problem is of importance and the simulations have been well conceived and thoroughly conducted at the coarse-grained level. However, there is the concern that while MARTINI is able to capture many collective properties of lipid membranes, it is not sufficiently reliable for dissecting molecular recognition processes governed by subtle free energy differences, especially when electrostatics (difference in charge state) and protein conformational rearrangements are expected to play major roles. In the absence of direct supporting experimental verification, this concern undermines the central conclusions of the study.

    1. Reviewer #2:

      Here the authors expand on their prior modeling of origin activity (Platel 2015) in Xenopus extracts. Their prior work, while successful in some estimates, failed to reproduce the tight distribution of interorigin ("eye to eye") distances. Here the authors generate a series of nested models (MM1-MM4) of increasing complexity to describe the distribution and frequency of observed initiation events in an unperturbed S-phase. Not surprisingly, the fit improves with the increasing complexity of each model. The authors then built an even more complex model based on prior published work to generate in silico data on which they tested their MM4 model. I admit to being a little lost at this point as to why the authors used simulated data to assess their model and identify key parameters. Finally, the authors compare previously published experimental data from an unperturbed S-phase and from one with an abrogated intra-S-phase checkpoint (Chk1 inhibition), and three parameters stood out: J (rate-limiting factor), 𝜃 (fraction of the genome with high origin initiation activity), and Pout (probability of the remaining origins firing). This suggests that Chk1 limits the probability of origin activation outside the regions of the genome with high origin activation efficiency and modulates the activity of the rate-limiting factor (J). These conclusions are consistent with prior observations in other systems. In summary, the authors apply elegant modeling approaches to describe Xenopus in vitro replication dynamics and the effects of Chk1 inhibition, but the work fails to reveal new principles of eukaryotic origin regulation and replication dynamics. The most powerful modeling approaches are those that reveal a new or unexpected mode of regulation (or parameter) that can then be tested experimentally.

      Additional points:

      This was a very specialized manuscript and would be difficult to read for general biologists. The terms/parameters were only defined in a table and many of the figures would not be parsable by a broad audience.

      Figure 1 sets out the challenge at hand: the previous model could not account for the distribution of "eye to eye" distances. However, this is never assessed in a similar format with the newer model. I assume this is captured in the Appendix 1 figures, but it was unclear whether these show eye length or gap length.

    2. Reviewer #1:

      The current work by Goldar and colleagues uses numerical simulations to model the spatiotemporal DNA replication program in an in vitro Xenopus DNA replication system. By comparing modeled data and experimental DNA combing data generated during unperturbed S-phase replication and upon intra-S checkpoint inhibition (which the authors published previously), the authors find that DNA replication in Xenopus extracts can be modeled by segmenting the genome in regions of high and low probability of origin activation, with the intra-S-phase checkpoint regulating origins with low but not high firing probability. Recapitulating the kinetics of global and local S-phase replication under different conditions through mathematical simulations represents an important contribution to the field. However, one concern I have pertains to the generality of the model, as the authors did not explore whether the model can accurately simulate replication under other conditions (e.g., checkpoint activation).

      Major comments:

      1) In figure 1a and 1c, the authors show data that were previously published by the authors. Yet, the displayed values in 1a and 1c differ from those displayed in Figure 10 of Platel et al, 2015. This discrepancy should be explained.

      2) The authors test whether their model can simulate replication when S-phase is perturbed by Chk1 inhibition, but not under opposite conditions of Chk1 activation. This important analysis should be included.

      3) Although the MM4 model developed by the authors is in agreement with previously published experimental DNA combing data measured in the Xenopus system, it is unclear whether it can also accurately predict the replication program in other systems. Comparing simulated data with experimental data from another metazoan system would serve as an important additional validation of the authors' model.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This paper uses numerical simulations to model DNA replication dynamics in an in vitro Xenopus DNA replication system, both in unperturbed conditions and upon intra-S-checkpoint inhibition. The current work extends previous studies by the authors that recapitulated some but not all features of the replication program. The new model is superior as it can model both the frequency and the distribution of observed initiation events. Although the reviewers found the work in principle interesting and well executed, they have identified limitations of the study, both with respect to model validation and the extent to which the findings represent new biological insights into origin regulation and replication dynamics.

    1. Reviewer #3:

      The manuscript is a nice confirmation study that further validates the PET index (Stender et al., 2016) as well as the EEG classification (Sitt et al., 2014; Engemann et al., 2018). The introduction is clear, the method is clear as well, the results are well described, and the discussion is concise and precise. The clinical impact of the study would greatly benefit from making the PET index code available on a platform such as GitHub, allowing all centers with a PET scanner to use this index and provide a better diagnosis for DoC patients.

      Major comments:

      1) Regarding the behavioural assessment (i.e., the number of CRS-R assessments), was a minimum number of CRS-R assessments required? This should be stated in the methods. Based on the table, some patients received only 2 CRS-R assessments, whereas the rate of misdiagnosis with 2 CRS-R assessments is as high as 26% for UWS patients (Wannez et al 2017). This is an important limitation. The number of CRS-R assessments should be included in the supplementary material, in a table providing all individual data (see comment 5). Some UWS patients with a high index (>3.07) may have received only 2 CRS-R assessments, which would have an important impact on the validity of the results.

      2) Were the EEG and PET acquisitions done on the same day? Which CRS-R was taken? The best or the one done on the day of the PET-scan? As the study compared the validity of the PET index and EEG classification, the fact that the two exams may not have been performed on the same day, and knowing that DoC patients fluctuate a lot, is a clear limitation and should be clearly acknowledged and discussed in the limitation section.

      3) For the PET voxel-based analysis, the significance threshold was set at p<0.005 uncorrected. Why did the authors use this threshold? It seems a bit arbitrary or convenient for the authors. It would be interesting and more transparent to present the corrected results too (e.g. in Supplementary Material).

      4) It is crucial to add a limitations section. The study has many limitations (fewer than 5 CRS-R assessments, heterogeneity of the population, PET, EEG, and behavioural assessments not performed on the same day despite comparing their respective accuracy, and the limited availability of PET, which restricts the clinical impact of the present study, etc.).

      5) Individual data (initial diagnosis, gender, age, etiology, best CRS-R, number of CRS-R assessments, index, EEG classification, outcome, etc.) should be added to the supplementary material. The Excel file provided is terrible to read. Could the authors at least tabulate the columns and provide a legend? In any case, I strongly suggest adding a table with the individual data to the supplementary material.

      6) The references should be carefully checked. Some of them are in the text but not in the list, and some of them are in the list but not referenced in the text. The reference "Wannez et al 2018" does not seem to be the appropriate one.

    2. Reviewer #2:

      The study is a prospective cohort study evaluating both PET and EEG for diagnosis and prognosis in VS/MCS patients. Thus, it represents a logical advancement from Stender et al 2016 and Bekinschtein et al 2009 towards clinical evaluation of the retrospectively established methods. To my knowledge there is no other prospective data set examining these methods. The authors plausibly show that the methods are capable of improving the diagnosis. The sample of 57 subjects is sufficient given the high effort necessary for this multimodal assessment. The prognostic results for the combined methods, though significant, certainly need a targeted study with a fixed design before use in clinical practice.

      In the following I would propose some minor improvements:

      1) I would move the first two sentences of paragraph 2 (31 ff) to the discussion. They introduce a new concept that is not necessary for understanding your major points in the introduction. I would stick to your story: a) DoCs are clinically important because we don't know who is aware of what (a potential nightmare for the patient); b) PET seems to be really robust at telling, but has not actually been evaluated prospectively; c) EEG might also help, but in the past was not very robust in prospective studies; d) maybe a combination of both helps too. The second problem is what to tell relatives about the prognosis. Currently we know only little, mainly as a side finding of Stender 2014 and 2016. In my opinion the latter points are told nicely.

      2) I would remove the regional differences as a discriminator. I have two concerns about them. The first is technical in nature: you applied an anatomical atlas to brains that are potentially deformed after injury. The paper does not convince me that this worked sufficiently well, because it is not described in detail, and in my experience it is very difficult to segment brains of this type. The second concern is that the result does not really support your main findings and is thus dispensable. I would recommend focusing on the main points: PET is really robust in your sample (even the cut-off from Stender et al 2016 is largely reproduced) and EEG is also fairly robust (although sensitivity drops from 94% in-sample to 58% out-of-sample). Also, the combination works well. I think these are the main findings that have the potential to make it into clinical routine.

      3) I would also focus the discussion on two points. First, the clinical impact of your findings: I think that if you delivered a fully automated tool reproducing your data pipeline, people worldwide would be willing to use PET for their VS patients. Second, you should also discuss the concept of the cortically mediated state and how your work relates to it.

      In conclusion, I think the study presented is technically and conceptually strong and provides a valuable step towards clinical routine application of the demonstrated methods. The language is also enjoyable to read.

    3. Reviewer #1:

      The authors intended to test whether the FDG-PET pseudo-quantitative metabolic index of the best preserved hemisphere (MIBH), the EEG-based classification (the auditory local-global paradigm), and the combination of the two methods were accurate complementary markers to discriminate VS from MCS. Their results showed that MIBH was an accurate and robust procedure across sites to diagnose MCS, which can be improved further in combination with EEG-based classification, allowing the detection of covert cognition and 6-month responsiveness recovery in unresponsive patients. Additionally, their results indicated that the behavioral diagnosis of MCS does not correspond to an elusive and generic conscious state, but rather to a CMS that reveals the preservation of metabolic activity in specialized cortical networks. These results provide valuable information for the future clinical use of MIBH and the local-global paradigm. There are several issues that should be addressed:

      1) As the authors put the "methods and materials" before the results, they should describe the patients’ information in a clear way in the "methods and materials", not in the results.

      2) The authors may need to provide more information about the EEG design, for example, the exact experimental design, ITI, number of stimuli, and so on. More importantly, the authors need to provide the exact number of epochs remaining after bad-epoch rejection for each patient.

      3) The authors indicated that the auditory local-global paradigm could be used to detect consciousness. Furthermore, they also mentioned patients with cognitive-motor dissociation (CMD). It would be very helpful if they could discuss the distinction between the local-global paradigm and the motor imagery tasks (or other tasks) that have been used to detect CMD.

      4) The results about the accuracy of MIBH in discriminating between MCS and VS are not strongly related to the results showing that MCS does not correspond to an elusive and generic conscious state. The latter is more interesting. I would suggest that the authors put them into two independent papers.

      5) Please provide more information about the "MCS items are associated with metabolic specific of subscales" analysis, such as how many patients were included in the analysis for each subscale.

      6) Please clarify why there are two sets of results for the Motor CRS-R subscale: one in Fig. 5 and one in Supplementary Figure e-1.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers in general agree that your study is solid with clear-cut results. In particular, the multimodal assessments of both PET and EEG, regarding the diagnosis and prognosis in VS/MCS patients, were carefully executed. As such, the results provide valuable information for future prognosis research guiding clinical use, e.g., a targeted study with a fixed design.

    1. Reviewer #3:

      1) As I state below the paper is carefully done (with a few minor issues) using a difficult and sophisticated biophysical technique, FCS to assess the changes in beta catenin diffusion within the cell following Wnt signaling. So it passes the test on being an original piece of work executed well. However what has been learned is quite limited. A few interactions, such as the slow diffusion in the cytoplasm can be interpreted several ways. It is very helpful to have concentrations in the nucleus and cytoplasm for beta catenin for future modeling. They could have tried to use single cross correlation with labeled APC or axin or the proteasome to derive more important information about the path through the destruction sequence. But that may be too hard to ask for at this stage. They could have combined their measurements with appropriate mutants or knockouts. I come down close to the line, high on the importance of the problem and the methods and execution; lower on the current take home lesson.

      2) The support for the somewhat limited conclusions is strong as it is.

      3) There are some technical issues. There is some concern with the FCS data itself; Figures 5F and 5G are of particular concern. The curve doesn't decay to 1 at long correlation times (>100 ms), and there are big fluctuations at short correlation times (<0.1 ms). This could be due to the very long acquisition (120 s) used in the experiment. Have the authors tried measuring the same spot multiple times at short intervals (e.g., 10 s), or analyzing 10 s sub-traces of the original long trace, to see whether the conclusions hold? This type of error could influence the calculation of the diffusion coefficient of CTNNB1 complexes. It would also affect the quantification of concentration. In lines 352-353 the authors mention that the nuclear concentration of CTNNB1 increases 2.1-fold based on FCS measurements, which is smaller than the change in fluorescence intensity. Could this be the result of errors such as these?

      For confocal imaging analysis, the description was not clear as to whether there is background subtraction during the intensity quantification. If there is, the authors should mention it in the method explicitly. If not, the background could decrease the fold change estimation.

      In the model description (line 877), equation (6), k7x6 should be k7x5.

      Line 901, equation (15): no unit is given for the binding affinity.

      Normally, a fraction of the fluorescent protein is non-fluorescent (dark). The authors may not have a tool to measure the dark component, but they should discuss how it may affect the quantification.
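      To illustrate the sub-trace check suggested in point 3, here is a minimal Python sketch, assuming an evenly sampled intensity trace held as a list of photon counts; the function names and the 10 s segment length are illustrative and not taken from the manuscript. It uses a plain linear-lag estimator of G(τ) = ⟨I(t)I(t+τ)⟩/⟨I⟩², which approaches 1 at long lag times, rather than the multi-tau correlator usually used in FCS acquisition software.

```python
from statistics import fmean

def autocorrelation(trace, max_lag):
    """Normalized autocorrelation G(tau) = <I(t) I(t+tau)> / <I>^2 for
    integer lags 1..max_lag (in samples). With this normalization G
    approaches 1 at lag times long enough that fluctuations are uncorrelated."""
    mean_i = fmean(trace)
    n = len(trace)
    curve = []
    for lag in range(1, max_lag + 1):
        products = [trace[t] * trace[t + lag] for t in range(n - lag)]
        curve.append(fmean(products) / mean_i ** 2)
    return curve

def segment_autocorrelations(trace, sample_rate_hz, segment_s, max_lag):
    """Hypothetical helper: split a long trace into non-overlapping segments
    of segment_s seconds and return one G(tau) curve per segment, so that
    drift or rare bright events can be spotted segment by segment."""
    seg_len = int(round(segment_s * sample_rate_hz))
    if seg_len <= max_lag:
        raise ValueError("segment must be longer than max_lag samples")
    return [
        autocorrelation(trace[start:start + seg_len], max_lag)
        for start in range(0, len(trace) - seg_len + 1, seg_len)
    ]

# Example: a 120 s trace sampled at 1 kHz would give twelve 10 s curves:
# curves = segment_autocorrelations(trace, 1000.0, 10.0, max_lag=100)
```

      If the segment curves scatter widely, or only some of them fail to return to 1 at long lag times, drift or rare bright aggregates in part of the 120 s recording would be one possible explanation.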

    2. Reviewer #2:

      The manuscript by S.M.A. de Man et al. presents a study of the cellular response to Wnt activation and of the intracellular kinetics of beta catenin (CTNNB1). The authors have developed cell lines expressing GFP reporters of CTNNB1 using CRISPR/Cas9. They present several convincing controls of the specificity of the reporter and decided to analyze the temporal behavior of the best-responding clone. They then investigate the temporal evolution of fluorescent signals in the cell cytoplasm and nucleus upon Wnt signaling activation. They quantify the kinetics of the relocalization of CTNNB1 from the cytoplasm to the nucleus under different strengths of Wnt signaling activation and under GSK3 inhibition. Using FCS, they find that a dual diffusion model fits the experimental data better than a classical single diffusion model, suggesting the presence of complexes of different sizes. They measure the diffusion parameters and concentrations of the complexes in the nucleus and in the cytoplasm. Using a dynamical model, the authors show that, to recapitulate the experimental observations, the regulation of CTNNB1 upon Wnt signaling has to be controlled at three levels: the destruction complex, nuclear transport, and the binding affinity to chromatin.

      Overall, the study is solid, presenting novel information on the kinetics of CTNNB1 during Wnt signaling. The results are consistent with the classical view of the regulation of beta catenin during Wnt signaling. I have a few comments, mainly on the methodology.

      Specific comments:

      -The authors have designed a new cell line that allows the kinetics of beta catenin to be traced over time following Wnt signaling activation. They follow the relative changes in concentration in the nucleus and cytoplasm upon activation of Wnt signaling. Normalized changes make it difficult to evaluate whether the difference between the increases in the cytoplasm and the nucleus is due to a larger increase in the nucleus or simply to the absence of beta catenin in the nucleus at the onset of the process, which inflates the normalized change. A non-normalized plot showing the increase in grey levels in the nucleus and cytoplasm should be added to complement the quantification and identify the differences between nuclear and cytoplasmic beta catenin. It would also help the reader to compare with the concentrations extracted from the FCS.

      -The responses in Figure 4 to Wnt signaling activation and to GSK3 inhibition differ (there is no plateau in the case of GSK3 inhibition). The explanation of this difference is unclear as it stands. I would suggest that the authors detail their thoughts on the reason for the difference a bit more. Could it simply be that Wnt activation clusters just a subset of GSK3 at the membrane, whereas inhibition can reach a higher level of depletion of GSK3 in the cytoplasm?

      -How does GSK3 inhibition affect the FCS measurements, particularly the concentrations and the compositions of the different complexes? The differences from Wnt3a activation could provide additional information on the nature of the identified complexes.

      -The dynamical model presented in the paper shows a non-monotonic change in the concentration of beta catenin in the cytoplasm after activation. This seems to be due to the kinetics of nuclear transport and does not seem to be present in the experimental observations. Can the authors comment on this point? Is there a way to suppress this discrepancy by modulating the parameters associated with transport?

      -Finally, the model is consistent with the experimental observations, but the authors did not check how the model compares with experiments under any type of perturbation. For instance, how does the model compare with experiments in the case of GSK3 inhibition, or when nuclear transport is affected? Adding a perturbation case would significantly strengthen the connection between model and experiment and the message of the manuscript.

      -The labels of Figure 4 and the respective movies are inverted.

      -Figure 1 only presents the classical model and no new concept or data. In my view, Figures 1 and 2 should be merged.

      -The Wnt (ON/OFF) labels in Table 1 are inverted.

    3. Reviewer #1:

      CTNNB1 is a core component of canonical Wnt signalling that is frequently mutated in cancers. A constitutively active destruction complex (degradosome) binds and phosphorylates CTNNB1, earmarking it for proteasomal degradation; this complex is inactivated upon Wnt3a treatment or GSK3β inhibition, leading to CTNNB1 stabilisation and nuclear translocation. The authors have successfully employed CRISPR-mediated endogenous tagging of CTNNB1 and determined its cellular concentration and diffusion dynamics in HAP1 cells, in both the cytoplasm and nucleus, by live-cell imaging and analysis. They provide the relative subcellular CTNNB1 concentrations for the nucleus and cytoplasm, as previous studies did in other cell lines (Tan et al., 2012) and in Xenopus (Lee et al., 2003). In addition, their results suggest that CTNNB1 resides in slow-moving complexes that persist upon Wnt but become slightly more mobile; these results are intriguing but raise several unanswered questions, such as whether these complexes represent the destruction complex (cytoplasm) or the enhanceosome (nucleus). The work has been completed to a high standard, but I have several concerns, listed below.

      1) The authors acknowledge significant cell-to-cell heterogeneity. This is particularly noticeable in Fig. 4A upon Wnt3a and CHIR99021 treatment. Fig. 4B suggests that all cells are analysed regardless of heterogeneity, and the only exclusion criterion mentioned in the methodology is a cytoplasm of fewer than 10 pixels. Fig. 4C/D does not seem to reflect the variation observed in Fig. 4A. What is the spread before normalisation, before and after treatment? How is the relative increase in nuclear/cytoplasmic intensity affected by cell size, and by nuclear and cytoplasmic area? These may affect the relative fold increase, and the cytoplasmic area seems highly variable at the confluence of the cells shown.

      2) Using point FCS, the authors determined two diffusion coefficients corresponding to monomeric and complexed CTNNB1 in both the nucleus and cytoplasm. A modest increase in the cytoplasmic diffusion coefficient of complexed CTNNB1 was observed after Wnt3a (0.461 µm²/s), but it remains far from that of the monomer (14.9 µm²/s), suggesting that CTNNB1 remains complexed upon Wnt3a. In addition, the fraction of complexed CTNNB1 (~40%) remains largely unaltered. Is the same true under CHIR99021 treatment? Point FCS samples a very small volume of the cell cytoplasm/nucleus and therefore gives a limited representation of the subcellular pool (which is likely heterogeneous). Only a single point appears to have been analysed per cell, and clear outliers can be observed among the 21 cells analysed (Fig. 6A/B); this has not been adequately discussed. What is the variation in diffusion measured at different points within a single cell? The authors discuss whether these complexes reflect the destruction complex/proteasome or the enhanceosome, but this really needs to be tested before any conclusions can be drawn from these observations, especially as cytoplasmic complexes are maintained under Wnt conditions, which would challenge the notion that CTNNB1 dissociates from the destruction complex upon Wnt. Ideally, other destruction complex components would be endogenously tagged with a different fluorophore to address this; if these complexes do represent the destruction complex and remain bound after Wnt, this would have significant implications for our understanding of complex inactivation and would greatly enhance the manuscript.

      3) The N&B analysis averages the intensity of monomeric and complexed CTNNB1 across an image stack within a single ROI in each cell. The authors interpret Fig. 6C to mean that SGFP2-CTNNB1 is present as a monomer whether in a complex or not. This is based on the fact that the relative brightness averages 1.0, similar to a monomeric GFP control. However, the spread of relative brightness is large, and values are often <1, so can an average relative brightness of 1 really be taken to indicate monomeric SGFP2-CTNNB1? Does cellular concentration affect relative brightness? If so, transiently expressed monomeric and dimeric GFP may not be the best controls. Aggregation is spatially homogeneous and limited by the diffusion rate of proteins/complexes, which your FCS measurements suggest is consistent with a large complex. Thus a single average may not represent the diversity of protein complexes; eN&B could be used (Cutrale et al., 2019). As mentioned in point 2, like FCS, you are only sampling a small region of the cell, which may or may not contain a destruction complex, for example. Super-resolution imaging techniques such as STORM or LLSM may help with visualisation of complex heterogeneity and give a different impression of complex occupancy. I don't think the N&B data are sufficient to conclude that no complexes exist that contain more than one SGFP2-CTNNB1 molecule.

      4) The computational model relies on a number of assumptions derived from other studies that may not reflect the HAP1 cells used in this study. Lee et al. (2003) was performed in Xenopus, and Tan et al. (2012) found a number of differences in their mammalian cell studies. Important information regarding the concentrations of destruction complex components has also been omitted; this information is important for future comparisons of cell-type-specific behaviours.
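      The per-pixel spread raised in point 3 can be made concrete with a minimal Python sketch of the standard N&B quantities. It assumes photon-counting detection with no detector offset or analog gain, so that molecular brightness is ε = σ²/⟨k⟩ − 1 for each pixel; the function names and data layout (one list of per-frame counts per pixel) are hypothetical and are not the authors' code. Returning per-pixel values, rather than only an ROI summary, is what allows the width of the distribution to be inspected.

```python
from statistics import fmean, median, pvariance

def molecular_brightness(pixel_counts):
    """Molecular brightness of one pixel from its counts over the frames:
    apparent brightness B = variance / mean, and for an ideal photon-counting
    detector (assumed here: no offset, no analog gain) epsilon = B - 1."""
    return pvariance(pixel_counts) / fmean(pixel_counts) - 1.0

def relative_brightness_per_pixel(sample_pixels, monomer_pixels):
    """Brightness of every sample pixel relative to the median brightness of
    a monomeric GFP control: ~1 is monomer-like, ~2 dimer-like."""
    eps_monomer = median(molecular_brightness(p) for p in monomer_pixels)
    return [molecular_brightness(p) / eps_monomer for p in sample_pixels]

def relative_brightness(sample_pixels, monomer_pixels):
    """Single ROI summary (median of the per-pixel values); this one number
    hides how broad the per-pixel distribution is."""
    return median(relative_brightness_per_pixel(sample_pixels, monomer_pixels))
```

      In this toy layout, a mixture of monomer-like and dimer-like pixels yields a summary value between 1 and 2, which is why the per-pixel distribution, and not only its average, bears on whether multi-copy complexes are present.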

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The authors investigate how cells respond to WNT signaling by altering beta catenin (CTNNB1) dynamics. They generated a number of cell lines in which they use different light microscopy techniques, such as FCS and number and brightness (N&B) measurements, to quantitatively investigate the diffusion behavior and complex formation of intracellular CTNNB1. The results are in general well explained, reasoned and technically well controlled (except for some, which raised concerns that were pointed out by the reviewers). The main finding of the paper is that CTNNB1 seems to reside in slow-moving complexes (which exist both in the presence and absence of WNT) that become slightly more mobile after WNT addition. As pointed out by the reviewers, these results can be interpreted in different ways, and it is not clear whether these complexes represent the destruction complex (cytoplasm) or the enhanceosome (nucleus). In summary, although the work shows technical proficiency that could help address some critical issues in Wnt signaling, the authors would need to identify the issues that the technique can resolve and then design experiments to resolve them in the future.

    1. Reviewer #4:

      This paper presents CytofRUV, a new tool to remove technical batch effects in CYTOF data, inspired by tools used in the transcriptomics field. There is still a strong need for such tools and I expect this tool to be a valuable addition to the cytometry field. I especially appreciate the authors' effort in providing multiple evaluation measures and informative figures to estimate the properties of the batch effects before and after normalization. There is currently no one-size-fits-all solution for batch normalization, and having sufficient quality control along the way is absolutely invaluable.

      I recommend no major changes to the manuscript, but mainly some additional guidance in the reader's interpretation of some results, and some smaller suggestions to improve figures. Some of the more unexpected results are not commented on in the text and it would be helpful if some interpretation could be given in those cases.

      -Many methods increase the batch silhouette score compared with the raw data. Does this mean that in those cases the methods increase the batch effects?

      -The Hellinger distances also sometimes become larger than in the raw data. Would there be any way to check whether this distance would be small given an adapted manual gating? Or could there be a reason that some cell types genuinely differ in proportion between batches, so that you would not expect the batch correction to "restore" this (as no cells are added or removed by the correction)? As both CytoNorm and CytofRUV apply the normalization on a cluster-by-cluster basis, I am also not sure why the cluster proportions would become more similar afterwards. Can you give any further intuition about this?

      -While there is a section regarding "keeping biological differences" this is only explored on the population level in the individual samples. I would also find it of interest to read something about biological differences between samples which are preserved (e.g. maybe quantifying the differences between the healthy controls?)
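      The proportion comparison raised in the comments above can be made concrete with a short Python sketch of the Hellinger distance between the cluster (cell-type) proportions of two batches. It uses the standard definition H(P, Q) = sqrt(½ Σ (√p_i − √q_i)²), which ranges from 0 (identical proportions) to 1 (no shared clusters); whether CytofRUV's evaluation uses exactly this normalization is an assumption, and the helper names and example labels are hypothetical.

```python
from math import sqrt

def cluster_proportions(labels):
    """Fraction of cells assigned to each cluster label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as
    dicts mapping cluster -> proportion (missing clusters count as 0)."""
    clusters = set(p) | set(q)
    s = sum((sqrt(p.get(c, 0.0)) - sqrt(q.get(c, 0.0))) ** 2 for c in clusters)
    return sqrt(s / 2.0)

# Hypothetical example: cell-type labels for two batches of the same sample.
batch1 = cluster_proportions(["T", "T", "B", "NK"])
batch2 = cluster_proportions(["T", "B", "B", "NK"])
print(round(hellinger(batch1, batch2), 3))  # prints 0.207
```

      Because a batch correction neither adds nor removes cells, this distance can change only if cluster labels are reassigned after normalization (for example by re-clustering the corrected data); computed from labels fixed before correction, it would return the same value.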

    2. Reviewer #3:

      The manuscript in review discusses a new method to address technical variances in CYTOF data called CytofRUV and based on Remove Unwanted Variation methodology. CYTOF datasets are prone to significant batch-to-batch variation due to the technical nature of signal registration and this method adds to the group of previously published algorithms aimed to solve the same task.

      The manuscript is well written and the narrative flows well. The authors present compelling examples of batch effects in CYTOF data (e.g., Fig. 2) that not only call for robust algorithmic normalization but honestly make me somewhat question the claimed reliability of CYTOF technology for precise measurement of protein expression unless robust replicates are built into every CYTOF experimental design; this publication should surely raise awareness of these issues. The authors also line up a series of metrics to quantify the efficiency of their method and of alternative methods for data normalization, and they propose a strong battery of visual cues built into their Shiny app to evaluate the algorithm's results.

      1) The algorithm's performance deserves more discussion; at present this is outsourced to a reference to the original RUV paper (Molania et al.). How computationally demanding is it? What computational resources were used? How does it scale to large datasets? How does the parametrization (the choice of k) affect the results specifically for CYTOF data? (This is touched upon briefly in Molania et al., but the data context there is very different.)

      2) Are any of the metrics mentioned in the paper built into the R package/Shiny app? From the paper, it looks like the only outputs that the interface presents are the four visual plots but no evaluation metrics of how the normalization affected/improved the data.

      3) Besides silhouette scores, were there any other attempts to verify data integrity after processing? For instance, how reproducible are the clustering results if the normalized data are clustered from scratch and compared with the clustering performed before normalization?

      4) Based on existing datasets and metric outputs, would the authors suggest a way to estimate the minimal number of replicates (as discussed in lines 488-492) required for the specific panel/sample/instrument type to provide necessary power to preserve the resolution of the data post normalization?

    3. Reviewer #2:

      The authors have presented a novel approach based on RUV-III for normalizing CyTOF data by leveraging replicate samples across batches. The article is clear, well laid out and thoughtful, and presents well-substantiated conclusions. The RUV class of methods has been applied across high-throughput technologies including RNASeq, single-cell RNASeq, NanoString and others, and it is a natural extension to single-cell cytometry. I have few issues with the paper. My one minor concern is the conflation of the terms "cell subpopulation" and "cluster". I don't think this detracts from the conclusions of the paper, but the former is typically reserved for cells of a consistent and verified phenotype. FlowSOM and just about all other clustering methods do not necessarily produce clusters that correspond to consistent cell subpopulations (the phenotype of the cells included in a cluster can and does vary). To make statements about subpopulations, I think the authors would have to look at manual phenotype assignments as well. I am not suggesting that this is necessary, and I find the evaluation of the method with respect to clusters much more compelling and natural. However, I would ask the authors to make the distinction between clusters and cell subpopulations clear in this context.

      After looking at the software implementation I think some discussion of the computational complexity and limitations of the method and implementation is warranted, particularly time and memory considerations. Could the method scale to large data sets (100s or 1000s of samples with several 100k cells each), which are typical in clinical studies? Do all data need to be loaded into working memory for the current implementation, or in general?

    4. Reviewer #1:

      The article describes CytofRUV, an algorithm for normalization of mass cytometry datasets. The article is well written, the data is publicly available, and the source code is usable and well-documented. My comments are provided below:

      Major comments:

      1) I believe the focus of this article can be improved. The abstract is somewhat confusing: if the article is focused on the algorithm, the abstract should not focus on leukemia, since the method can be used in many settings. Similarly, much of the article (including 4 of the main figures) is dedicated to establishing that this one dataset does indeed have a batch effect issue. Other datasets are not introduced until the very end of the manuscript. However, for an article focused on the development of a new bioinformatics method, I believe the focus should be on evaluating the algorithm on a broad range of datasets (which the authors have already done, but which should be presented more prominently).

      2) Comparison with prior algorithms is presented only qualitatively. Quantifying these comparisons, followed by appropriate statistical tests, would strengthen this article. I don't believe a new algorithm needs to outperform existing algorithms in every test (that would run against the no-free-lunch theorem), but quantification should be provided regardless.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The authors present a new CyTOF normalization approach based on RUV-III, a method that has proven useful for other technologies including RNASeq, single-cell RNAseq and NanoString. The reviewers all agreed that this was a strong manuscript that makes an important contribution to an under-served area of the field.

    1. Reviewer #3:

      The relative contributions of asymptomatic infections and super-spreading events to the ongoing SARS-COV-2 pandemic are critical, controversial questions. As far as I know, this may be the first paper to apply an approach combining phylogenetic inference from genomic data with time series case data to estimate these parameters for the ongoing SARS-COV-2 pandemic. However, with so many papers coming out so quickly, it's possible I have missed one.

      Here, the authors combine viral phylogenetics with time series case data to estimate parameters (including temporally structured estimates of the reproductive number) of the SARS-COV-2 pandemic in 12 locations globally. They find that the proportion of undetected infections varies substantially by location, from 13% to 92%, and that the precision of their estimates improves substantially with the number of viral genomes included from each location, as visualized in Figure 2.

      However, in its current form the manuscript suffers from some shortcomings.

      SARS-COV-2 evolves slowly relative to other viruses, which can lead to high levels of phylogenetic uncertainty in the recovered trees; this uncertainty can strongly influence parameter estimates. According to the methods and the supplemental material, the authors inferred a single phylogenetic tree for each location. The authors should be encouraged to infer a distribution of trees for each location and carry this additional uncertainty through their analyses. If this has already been done, the manuscript needs to be augmented to make this clear.

      Abstract:

      This section requires a thorough edit to improve clarity. In its current form it is rather discombobulated and needs to link the aims to the results and conclusions better.

      Introduction:

      The first 2 paragraphs of the introduction should be switched. The introduction should start with the big questions (in this case, why it is important in the big picture of epidemiology to estimate parameters such as the total number of infections) and then introduce the study system used to address them, in this case SARS-COV-2.

      The third paragraph addresses other ways to directly estimate the number of infected individuals, namely serological surveys. Missing from this paragraph is an acknowledgement of the assumption that markers of immunity last long enough for such surveys to detect previously infected individuals.

      The final paragraph of the introduction outlines the aims but lacks scientific detail: what are the hypotheses? What are the alternatives? What are the predictions, and how are the hypotheses tested? What specific hypotheses are the authors testing by applying their method? This requires clarification.

      Methods:

      Generally, the methods lack sufficient detail to replicate what the authors have done.

      In the Viral genomes section of the methods, it is stated that several locations were excluded due to "multiple circulating lineages"; however, nearly all of the included locations (e.g., Guangdong, Hubei, Shanghai, UK) also have multiple circulating lineages. What was done here needs substantial clarification.

      Phylogenetic inference as performed in IQ-TREE is fine; however, as previously mentioned, the authors should at minimum infer a distribution of trees for each region and carry it through their subsequent analyses.

      In the section on sub-sampling the sequences to the dominant lineages, how was lineage assignment done? Using Pangolin? Or another classification system? More detail is needed.

      A bit more detail on how the authors determined convergence was achieved would be valuable. For example, how was visual confirmation of convergence done? Via visual inspection of parameter traces? A generalist reader may need more detail than has been provided.

      Results:

      More detail is needed in the legend of Figure 1. For example, unless I misunderstand, the legend states that the red lines are HPD intervals on those days, but the figure actually shows a shaded area with a measure of central tendency as a red line.

      Discussion:

      Overall, the discussion puts the results in appropriate context. However, the caveats associated with these analyses do not seem to be adequately acknowledged. More thought should be given to acknowledging factors that may affect the authors' estimates and their interpretation of the findings.

      On balance, I do think that the approach used in this manuscript makes a potentially useful contribution to addressing the current pandemic, and to my knowledge this approach has not yet been applied to SARS-COV-2. I would like to see additional analyses (incorporation of phylogenetic uncertainty) and a thorough edit and revision for clarity.

    2. Reviewer #2:

      The authors presented a Bayesian inference framework to fit a branching process model that incorporates both viral genomes and time series of case data to estimate undetected COVID-19 infections. While the method seems to be valid, its application to the data is subject to some uncertainties, especially for locations in Asia such as Japan, Shanghai and Hong Kong. Please see my comments and suggestions below:

      Major comments:

      1) My biggest concern is that in many of the Asian locations in Table 1/Figure 1, no sustained local outbreak has been detected. So far, the majority of cases in Hong Kong were imported (https://www.chp.gov.hk/files/pdf/local_situation_covid19_en.pdf). By the end of Feb 2020, more than 50% of cases in Guangdong, China were imported from Hubei. How would the sequence analysis and model fit change if imported cases were excluded?

      2) As mentioned above, the proportion of imported cases would likely affect the estimation of the Rt and undetected infections. What if the method is applied to imported cases and local separately for some of the locations such as Hong Kong (in which the imported/local case status is clear for every case)?

    3. Reviewer #1:

      In this work the authors use previously developed methods linking viral sequence data and reported case counts to estimate the percentage of undetected infections and the effective reproduction number Rt through time in a number of locations. This is an extremely important topic. Despite the urgency, there has still not been consistent population-based viral testing, and the fraction of COVID-19 cases that are reported remains largely unknown. If genomics can help, that is very valuable.

      However, there are some concerns about the methods for this specific application. Validation on simulated data, and exploration of robustness to some of the assumptions and limitations, could help.

      Dates of confirmation may differ from dates of symptom onset by many days. This is discussed briefly but the impact of a shift is not explored. The bias may additionally depend on the population size, with more bias towards the beginning when there are few cases and few sequences. It could also impact the sequencing; this is discussed briefly but could be explored to some extent by shifting the dates and re-estimating.

      The authors subsampled the sequences to the dominant lineages. More information about how this was done would be helpful. In addition, without information linking viral genomes to reported case counts, the same adjustment cannot be made to the reported cases; could this affect the results? It is not quite clear how multiple lineages, introductions, and geographical mixing in the phylogeny are treated. For example, consider a case in which the California sequences have some Minnesota ones embedded among them, scattered in a clade. If the Minnesota sequences are treated in their entirety as one phylogeny (without any of the CA tips), there would be very long branches between these and other Minnesota sequences, and the likelihood would reflect no branching events on these branches. In reality there were plenty of events, but they occurred in CA. Meanwhile, those branching events do not occur in the CA tree either, because their descendants have been pruned out of the CA analysis. In any case, it is not clear precisely what is meant by not including locations with co-circulating lineages, nor how geographical mixing is treated.

      The probability of sequencing, and its variation over time, may affect the model's inferences, because in times of more dense sequencing the intervals in the tree will be shorter (and conversely). The model may not be able to distinguish this from changes in prevalence and reporting fraction. Should there be a rho_t that applies to the sequencing data?

      I wonder whether the authors are able to model tips that occur in the reported data, handling these dates differently. It seems that the only link is through the conditional independence of the y_i and z_i information (conditional on the x_i information). I also wonder about the impact of phylogenetic uncertainty.

      There seems to be a possible identifiability issue with rho_t and x_t, because surely a higher x and lower rho could give the same likelihood, particularly since we can't sequence cases that we can't detect.

      How do the estimates of the reporting fraction compare to those obtained for example with the model by Russell et al ( https://cmmid.github.io/topics/covid19/global_cfr_estimates.html ) or with other estimates of under-reporting? (Some of these are given in the results but CIs are wide).

      I would have liked to see more information for how this was done: "we computed the smallest number of individuals that could contribute to 80% of infections during each week (Figure 4)". Similarly, detailed methods are not given for the 'time to detect an outbreak' results.

      It would be interesting to see a comparison between the estimated reporting fractions and the testing data available at (for example) https://covidtracking.com, which allows downloads of data on testing through time by state. This is mentioned in the discussion; information about testing is available for many places (US states and elsewhere).

      I am also concerned about the large population assumption that is inherent in the mathematics behind the core equation for lambda_t (which the authors should either derive or give the citation for). This equation requires that the mean of the number of offspring in the data is equal to the mean of the offspring distribution, which only happens in the limit when the present and past populations are both large. The same assumption is required for the variance. Particularly in the early stages the large population assumption is unlikely to be met.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      This paper uses a combination of sequence and case data to estimate the ascertainment rate of COVID19 in different settings. The methods are known but this is the first application to SARS-CoV-2 data, and the topic is of very high importance. The reviewers had some substantial concerns about the methodology and the clarity of description.

    1. Reviewer #3:

      In this manuscript the authors used high-throughput light microscopy and image analysis to study the effects of essential gene knockdown via an arrayed CRISPRi library in M. smegmatis.

      There are many technical advances in this paper, and the experiments are well executed. The data and their analysis add value to the mycobacterial field. I particularly appreciated the thoughtful Discussion, which honestly laid out the limitations of the authors' work.

      However, in some areas the lengthy manuscript came across as somewhat unfocused. For example, in addition to describing the methods of their technique, the authors validate it or give examples of what their data contain (identification of a cryptic putative RM system, histidine auxotroph phenotypes, effects of disrupting mycolic acid biosynthesis). They then discuss the potential to use CRISPRi to confirm compound MOA. This is a lot of information (10 figures with many subpanels), but none of these threads is really taken to completion. I appreciate the amount of work that doing so would take, so I'm not suggesting that as a revision. However, reshuffling or restructuring some of these sections may help to guide the reader towards the utility of these data.

      Lastly, and I think importantly, after reading this manuscript I was left with a lingering question: for any essential gene that I'm interested in, would these data help me form hypotheses about its function? I'm not sure. The data as presented in Figure 6 do not help this case. While some functionally related genes cluster together, many do not, especially genes that fall into cluster 2.

      With some textual changes to streamline the manuscript, I think the manuscript could be improved.

    2. Reviewer #2:

      de Wet et al. screen a CRISPRi library of M. smeg. essential genes for morphological phenotypes. Using a sensitive analytical approach, they find that most essential knockdown strains have morphological phenotypes. They further show that functionally related genes cluster by morphology in multidimensional space. Finally, they associate morphological changes with antibiotics to probe antibiotic MOA. This manuscript will be of interest to researchers studying essential genes in Mycobacteria.

      General Comments:

      1) "Moreover,to verify the reproducibility of the imaging workflow, replicate imaging was performed on separate days for 134 strains." Does this mean that the authors don't have replicate data for 29 strains? If so, imaging of these strains must be repeated to verify reproducibility. Have the authors validated any phenotypes with a second guide RNA to rule out off target effects?

      2) MSMEG_3213 isn't an example of defining the function of an uncharacterized gene--instead it simply validates existing database predictions. Further, the data presented here do not demonstrate that MSMEG_3213 is the methylase of an R-M pair.

      3) The his gene depletion phenotypes are likely due to translation defects resulting from uncharged tRNAs. This is consistent with the tRNA synthetase/ribosomal protein knockdown phenotypes presented in this manuscript, as well as with the observation that translation inhibition by knockdown or serine hydroxamate produced elongated cells in Bacillus subtilis (PMID: 27238023).

    3. Reviewer #1:

      General assessment:

      This manuscript addresses the lag in identifying functions of genes annotated in bacterial genomes. It is an epic presentation of a line of investigation from inception through assay development and validation to identifying previously unknown functional associations. Beyond these initial novel insights, the developed phenoprinting approach and the resulting UMAP space provide a solid foundation for future conditional gene function and initial drug mechanism of action studies.

      Substantive concerns:

      None

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript combines a CRISPRi library in Mycobacterium smegmatis with high throughput light microscopy and image analysis to investigate the effects of essential gene knockdown on bacterial morphology. The reviewers all agree that there are many technical advances presented in this paper, the experiments are well executed, and the data and its analysis is significant for the field. However, there are some questions regarding the reproducibility of the data and the utility of these data as a predictive tool. The reviewers believe that these questions should be straightforward to address, as described more below.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for their close reading and constructive comments on our manuscript. We believe that their insight has substantially strengthened our manuscript. Please find our response/revision plan for each comment below (in blue). Note that, because of the substantial changes to the figures and the additional experiments we are undertaking, we have not initially revised the text. The proposed textual revisions will be included in the full revision.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The Katz lab has contributed greatly to the field of epigenetic reprogramming over the years, and this is another excellent paper on the subject. I enjoyed reviewing this manuscript and don't have any major comments/suggestions for improving it. The findings presented are novel and important, the results are clear cut, and the writing is clear.

      It's important to stress the novelty of the findings, which build upon previous studies from the same lab (upon a shallow look one might think that some of the conclusions were described before, but this is not the case).

      Despite the fact that this system has been studied in depth before, it remained unclear why and how germline genes are bookmarked by H3K36 in the embryo, and it wasn't known why germline genes are not expressed in the soma.

      To study these questions, Carpenter et al. examine multiple phenotypes (developmental aberrations, sterility), which they combine with analysis of multiple genetic backgrounds, RNA-seq, ChIP-seq, single-molecule FISH, and fluorescent transgenes.

      Previous observations from the Katz lab suggested that progeny derived from spr-5; met-2 double mutants can develop abnormally. They show here that the progeny of these double mutants (unlike spr-5 and met-2 single mutants) develop severe and highly penetrant developmental delays, a Pvl phenotype, and sterility. They also show that spr-5; met-2 maternal reprogramming prevents developmental delay by restricting ectopic MES-4 bookmarking, and that the developmental delay of spr-5; met-2 progeny is the result of ectopic expression of MES-4 germline genes. The bottom line is that they shed light on how SPR-5, MET-2 and MES-4 balance inter-generational inheritance of H3K4, H3K9, and H3K36 methylation to allow correct specification of germline and somatic cells. This is all very important and also relevant to other organisms.

      **(very) Minor comments:**

      -Since the word "heritable" is used in different contexts, it could be helpful to elaborate, perhaps in the introduction, on the distinction between cellular memory and transgenerational inheritance.

      We are happy to elaborate on this in the revised manuscript.

      -It might be interesting to expand further in the Discussion on the links between heritable chromatin marks and heritable small RNAs. The authors do hint at this, and the results regarding the silencing of the somatic transgene are especially intriguing.

      We are happy to expand this in the revised manuscript.

      Reviewer #1 (Significance (Required)):

      This is an exciting paper that builds upon years of important work in the Katz lab. The novelty of the paper lies in pinpointing the mechanisms that bookmark germline genes by H3K36 in the embryo, and in explaining why and how germline genes are prevented from being expressed in the soma.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed a dramatic enhancement of sterility and developmental defects in spr-5; met-2; this paper extends that finding by showing that these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis. The authors hypothesize that SPR-5 and MET-2 modify the chromatin of germline genes (MES-4 targets) in somatic cells, and that this is required to silence germline genes in the soma. A few issues need to be resolved to test these ideas and rule out others.

      **Main comments:**

      The authors' hypothesis is that SPR-5 and MET-2 act directly to modify the chromatin of germline genes (MES-4 targets), but an alternative hypothesis is that the key regulated genes are i) MES-4 itself and/or ii) known regulators of germline gene expression, e.g. the piwi pathway. Misregulation of these factors in the soma could be responsible for the phenotypes. Therefore, the authors should analyze expression (smFISH and, where possible, protein stains) of MES-4 and PIWI components in the embryos and larvae of wild-type, double and triple mutant strains. These experiments are essential and not difficult to perform.

      In our RNA-seq analysis we see a small elevation of MES-4 itself (average 1.18 log2 fold change across 5 replicates). This does not seem likely to be solely driving such a dramatic phenotype. Nevertheless, it is possible that the small increase in expression of MES-4 itself could be contributing. To determine whether MES-4 is being ectopically expressed in spr-5; met-2 double mutants, we have obtained a tagged version of MES-4 from Dr. Susan Strome and will use it to examine the localization of MES-4 protein in spr-5; met-2 double mutants. We are definitely interested in the potential interaction between PIWI components and the histone-modifying enzymes that we have explored in this study. However, since RNAi of MES-4 is sufficient to rescue the developmental delay of spr-5; met-2 mutants, we have chosen to focus on that interaction in this paper. In the future, we hope to examine the role of PIWI components in this system.

      A second aspect of the hypothesis is that spr-5 and met-2 act before mes-4 and that, while these genes are maternally expressed, they act in the embryo. There really aren't data to support these ideas; the timing and location of the factors' activities have not been pinned down. One way to begin to address this question would be to perform smFISH on the target genes and on mes-4 in embryos and determine when and where changes first appear. smFISH in embryos is critical; relying on L1 data is too late. If timing data cannot be obtained, then I suggest that the authors back off from the timing ideas or at least explain the caveats. Certainly, Figure 8 should be simplified and timing removed. (Note: typical maternal effect tests probably won't work, because if the genes' RNAs are germline deposited, a maternal effect test will reflect when the RNA is expressed but not when the protein is active. A TS allele would be needed, and that may not be available.)

      To determine the timing of the ectopic expression of MES-4 targets, we have performed smFISH on two MES-4 targets in embryos. Thus far, these experiments show that MES-4 targets are ectopically expressed in the embryo, but only after the maternal to zygotic transition. This is consistent with our proposed model. A figure containing this data will be added to the revised manuscript. In addition, our model is predicated on the known embryonic protein localization of SPR-5 and MES-4. Maternal SPR-5 protein is present in the early embryo up to around the 8-cell stage, but absent in later embryos (Katz et al., 2009). In addition, in mice, the SPR-5 ortholog LSD1 is required maternally prior to the 2-cell stage (Wasson et al., 2016 and Ancelin et al., 2016). In contrast, MES-4 continues to be expressed in the embryo until later embryonic stages where it is concentrated into the germline precursors Z2 and Z3 (Fong et al., 2002). This is consistent with SPR-5 establishing a chromatin state that continues to be antagonized by MES-4. There is evidence that MET-2 is expressed both in early embryos and later embryos. However, since the phenotype of MET-2 so closely resembles the phenotype of SPR-5 (Kerr et al., 2014), we have included it in our model as working with SPR-5. Further experimentation will be required to substantiate the model, but we believe the model is consistent with all of the current data.

      Writing/clarity:

      -It would be helpful to include a table that lists the specific genes studied in the paper and how they behaved in the different assays (e.g., RNAseq 1, RNAseq 2, MES-4 target, ChIP). That way, readers will understand each of the genes better.

      We are happy to include a table in the revised manuscript.

      -At the end of each experiment, it would be helpful to explain the conclusion and not wait until the Discussion. For readers not in the field, the logic of the Results section is hard to follow.

      This seems like a stylistic choice. Traditionally, papers did not include any conclusions in the results section, and it is our preference to keep our paper organized this way. However, if the reviewer would still like us to change this, we are happy to do so.

      -The model is explained over three pages in the Discussion. It would be great to begin with a single paragraph that summarizes the model/point of the paper simply and clearly.

      The Discussion in the revised manuscript will be altered to include this.

      **Specific comments:**

      -Figure 1 has been published previously and should be moved to the supplement.

      In our original paper (Kerr et al., 2014), we reported in the text that spr-5; met-2 mutants have a developmental delay. However, we did not characterize this developmental delay, nor did we include any images of the double mutants except for one image of the adult germline phenotype. As a result, we believe that including the developmental delay in the main body of this manuscript is warranted.

      -Cite their prior paper for the vulval defects e.g. page 6 or show in supplement.

      We are happy to include a citation of our previous paper for the vulval defects in the revised manuscript.

      -The second RNAseq data should be shown in the Results since it is much stronger. The first RNAseq, which is less robust, should be moved to supplement.

      The revised manuscript will include this alteration.

      -Figure 3 is very nice. Please explain why the RNAs were picked (+ the table, see comment above), and please add here or in a new figure mes-4 and piwi pathway expression data in wildtype vs double/triple mutants.

      We performed RT-PCR on 9 MES-4 targets. These 9 targets were picked because they had the highest ectopic expression in spr-5; met-2 mutants and the largest change in H3K36me3 in spr-5; met-2 mutants versus wild type. Among these 9 genes, we performed smFISH on htp-1 and cpb-1 because they are relatively well characterized as germline genes.

      The revised manuscript will include added panels to supplemental figure 2 showing the expression of PIWI pathway components.

      -Figure 3 here or later, please show if mes-4 RNAi removes somatic expression of target genes.

      We are currently carrying out this experiment. Once it is completed, we hope to add the data to the paper.

      -Is embryogenesis delayed?

      Embryogenesis seems to be sped up in spr-5; met-2 mutants. A supplemental figure will be added to the revised manuscript showing this. It is unclear why embryogenesis is sped up. However, this confirms that the developmental delay is unique to the L1/L2 stages.

      -Figure 4 since htp-1 smFISH is so dramatic, it would be helpful to include htp-1 in the lower panels.

      htp-1 will be added to the lower panels in the revised manuscript.

      -Figure 4, please add an extra 2 upper panels showing all the genes in N2 vs spr-5;met-2, for comparison to the mes-4 cohort.

      As a control, we will add panels showing a comparison to all germline genes, excluding MES-4 targets. This new data shows that germline genes that are not MES-4 targets do not have ectopic H3K36me3. This data, which further suggests that the phenomenon is confined to MES-4 targets, is consistent with our results showing that MES-4 RNAi is sufficient to suppress the developmental delay.

      -Figure 6. Please show a control that met-1 RNAi is working.

      We performed RT-PCR to try to confirm that met-1 RNAi was working. Although our controls reproduced the MES-4 suppression and verified that RNAi was working, we were unable to demonstrate that met-1 was knocked down. As a result, we will remove this result from the paper. Importantly, this does not affect the conclusion of the paper.

      -To quantify histone marks more clearly, it would be wonderful to have a graph of the mean log across the gene. Showing the mean numbers would help clarify the degree of the effect. We had an image as an example but it does not paste into the reviewer box. Instead, see figure 2 or figure 4 here: https://www.nature.com/articles/ng.322

      We will attempt to include this analysis in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed the dramatic enhancement in sterility and development for spr-5; met-2; this paper extends that finding by showing these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis.

      This work will be of interest to people following transgenerational inheritance, generally in the C. elegans field. People using other organisms may read it also, although some of the worm genetics may be complicated. Some of the writing suggestions could make a difference.

      I study C. elegans embryogenesis, chromatin and inheritance.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In the paper entitled "C. elegans establishes germline versus soma by balancing inherited histone methylation" Carpenter BS et al examined a double mutant worm strain they had previously produced of the H3K4me1/2 demethylase spr-5 and the predicted H3K9me1/me2 methylase met-2. These mutant worms have a developmental delay that arises by the L2 larval stage. They performed an analysis of what genes get misexpressed in these double mutants by performing RNAseq and compare this to datasets generated from other labs on an H3K36me2/me3 methylase MES-4 where they see a high degree of overlap. They validate the misexpression of some germline specific genes in the soma by in situ and validate that there is a dysregulation of H3K36me3 in their double mutant worms. They further find that knocking down mes-4 reverts the developmental delay.

      I think that the authors need to make more of an effort to be a bit more scholarly in terms of placing their work in the context of the field as a whole and also need to add a few additional experiments as well as reorganize a bit before this is ready for publication. Remember that the average reader is not necessarily an expert in C. elegans or this particular field and you really want to try and make the manuscript as accessible to everyone as possible.

      **Major Points**

      1)It would be good to see western blots or quantitative mass spec examining H3K36me3 in the WT and spr-5;met-2 double mutant worms. I believe this was also previously reported by Greer EL et al Cell Rep 2014 in the single spr-5 mutant worm so that work should be cited here in addition to the identification of JMJD-2 as an enzyme involved in the inheritance of H3K4me2 phenotype.

      The ectopic H3K36me3 is confined to a small set of MES-4 targets. We don’t even see ectopic H3K36me3 at non-MES-4 germline genes (see above). Therefore, we don’t expect to see any global differences in bulk H3K36me3. Greer et al reported that there are elevated H3K36me3 levels in spr-5 mutants. This discrepancy may be due to different stages (embryos, germline) present in their bulk preparation. Alternatively, the met-2 mutant may counteract the effect of the spr-5 mutation on H3K36me3. Regardless, we believe that the genome-wide ChIP-seq is more informative than bulk H3K36me3 levels.

      We will add a citation for the Greer paper in the revised manuscript.

      2)Missing from Fig.5 is mes-4 KD by itself. This is needed to determine whether these effects are specific to the spr-5;met-2 double mutants or more general effects that KD of mes-4 would decrease the expression of all these genes to a similar extent. Then statistics should be done to see if the decrease in the WT context is the same or greater than the decrease in the double mutants.

      The MES-4 targets are generally expressed only in the germline and defined by having mes-4 dependent H3K36me3. Knocking down mes-4 would be expected to prevent the expression of these genes in the germline, but this is difficult to test because mes-4 mutants basically don’t make a germline. Regardless, knocking down mes-4 by itself would only assess the role of MES-4 in germline transcription, not the ectopic expression that is being assayed in spr-5; met-2 mutants in Fig 5. Importantly, it remains possible that spr-5; met-2 mutants also have increased expression of MES-4 targets in the germline. However, the experiments performed in this manuscript were conducted on L1 larvae, which do not have any germline expression, to eliminate this potential confounding contribution.

      **Minor Points**

      1)A greater attempt needs to be made to be more scholarly for citing previously published literature. This includes work on the inheritance of H3K27 and H3K36 methylation in C. elegans and other species as well. A few papers which seem germane to this story which should be cited in the intro are (Nottke AC et al PNAS 2011, Gaydos LJ et al Science 2014, Ost A et al Cell 2014, Greer EL et al Cell Rep 2014, Siklenka K et al Science 2015, Tabuchi TM et al Nat Comm 2018, Kaneshiro KR et al Nat Comm 2019). This problem is not restricted to the intro.

      Although many of these excellent papers are broadly relevant to this current work, they are not necessarily directly relevant to this paper. For this reason, they were not originally cited. Nevertheless, we will attempt to cite these papers in the revised version when possible.

      2)I think that the authors need to be a little less definitive with your language. Theories should be introduced as possibilities rather than conclusions. Should remove "comprehensive" from intro as there are many other methods which could be done to test this.

      Throughout the manuscript, we have tried to be clear about what the data suggest versus what is a model based on the data. Nevertheless, to further clarify this, we are happy to remove “comprehensive” from the intro.

      3)The authors should describe what PIE-1 is. Is this a transcription factor?

      PIE-1 is a transcriptional inhibitor that is thought to block RNA Pol II elongation by mimicking the CTD of RNA Pol II and competing for phosphorylation. We are happy to add a reference to this function in the revised manuscript.

      4)The language needs clarification about MES-4 germline genes and bookmark genes. Are these bound by MES-4 or marked with K36me2/3?

      The revised manuscript will be modified to make this definition more clear.

      5)I think Fig S1 E+F should be in the main figure 1 so readers can see the extent of the phenotype.

      The original single image of the spr-5; met-2 adult germline phenotype (including the protruding vulva) was included in our previous publication. In this manuscript, we have now quantified this phenotype, which is why it is included in the supplement here. However, because the original picture was included in our original publication, we prefer to leave it as supplemental.

      6)For Fig S2 it would be good to do the same statistics that are done in Fig 2 and mention them in the text so the readers can see that the overlap is statistically significant.

      We are happy to include these statistics in the revised manuscript.

      7)Fig S2.2 should be yellow blue rather than red green for the colorblind out there.

      Thanks for pointing this out. We are happy to change the colors in the revised manuscript.

      8)When saying "Many of these genes involved in these processes..." the authors need to include numbers and statistics.

      We will amend the revised text to make the definition of the MES-4 genes more clear.

      9)Should use WT instead of N2 and specify what wildtype is in methods.

      We will use WT instead of N2 in the revised manuscript.

      10)Fig. 2A + B could be displayed in a single figure. And Fig 2D seems superfluous and could be combined with 2C or alternatively it could be put in supplementary.

      Figure 2A and 2B were purposely separated to make it clear how many of the overlapping changes are up versus down. In the revised manuscript, Figure 2D will be moved to the supplement.

      11)Non-C. elegans experts won't understand what balancers are. An effort should be made to make this accessible to all. Explaining when genes are heterozygous or homozygous mutants seems relevant here.

      The text of the revised manuscript will be amended to make it more accessible for non-C. elegans readers.

      12)The GO categories (Fig. S2) should be in the main figure and need to be made to look more scientific rather than copied and pasted from a program.

      The GO categories were included to be comprehensive and do not contribute substantially to the main conclusion of the paper. This is why they are supplemental. In the revised manuscript, we will edit the GO results so that they look more scientific.

      13)Fig. 7 seems a bit out of place. If the authors were to KD mes-4 and similarly show that the phenotype reverts that would help justify its inclusion in this paper. Without it, it seems like a bit of an add on that belongs elsewhere.

      We believe that the somatic expression of a transgene in spr-5; met-2 mutants adds to our potential understanding of how this double mutant may lead to developmental delay. This is true regardless of whether the somatic transgene expression is mes-4 dependent or not.

      Reviewer #3 (Significance (Required)):

      I think this is an interesting and timely piece of work. A little more effort needs to be put in to make sure it is accessible to the average reader and has sufficient inclusion of more of the large body of work on inheritance of histone modifications. I think C. elegans researchers as well as people interested in inheritance and the setup of the germline will be interested in this work.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2's comments on experiments to include or exclude alternative models. I also agree about their statement about rewriting to make it more accessible to others who aren't experts in this specialized portion of C. elegans research. All in all it seems like the experiments which are required by reviewer #2 and myself as well as the rewriting should be quite feasible.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In the paper entitled "C. elegans establishes germline versus soma by balancing inherited histone methylation" Carpenter BS et al examined a double mutant worm strain they had previously produced of the H3K4me1/2 demethylase spr-5 and the predicted H3K9me1/me2 methylase met-2. These mutant worms have a developmental delay that arises by the L2 larval stage. They performed an analysis of what genes get misexpressed in these double mutants by performing RNAseq and compare this to datasets generated from other labs on an H3K36me2/me3 methylase MES-4 where they see a high degree of overlap. They validate the misexpression of some germline specific genes in the soma by in situ and validate that there is a dysregulation of H3K36me3 in their double mutant worms. They further find that knocking down mes-4 reverts the developmental delay.

      I think that the authors need to make more of an effort to be a bit more scholarly in terms of placing their work in the context of the field as a whole and also need to add a few additional experiments as well as reorganize a bit before this is ready for publication. Remember that the average reader is not necessarily an expert in C. elegans or this particular field and you really want to try and make the manuscript as accessible to everyone as possible.

      Major Points

      1)It would be good to see western blots or quantitative mass spec examining H3K36me3 in the WT and spr-5;met-2 double mutant worms. I believe this was also previously reported by Greer EL et al Cell Rep 2014 in the single spr-5 mutant worm so that work should be cited here in addition to the identification of JMJD-2 as an enzyme involved in the inheritance of H3K4me2 phenotype.

      2)Missing from Fig.5 is mes-4 KD by itself. This is needed to determine whether these effects are specific to the spr-5;met-2 double mutants or more general effects that KD of mes-4 would decrease the expression of all these genes to a similar extent. Then statistics should be done to see if the decrease in the WT context is the same or greater than the decrease in the double mutants.

      Minor Points

      1)A greater attempt needs to be made to be more scholarly for citing previously published literature. This includes work on the inheritance of H3K27 and H3K36 methylation in C. elegans and other species as well. A few papers which seem germane to this story which should be cited in the intro are (Nottke AC et al PNAS 2011, Gaydos LJ et al Science 2014, Ost A et al Cell 2014, Greer EL et al Cell Rep 2014, Siklenka K et al Science 2015, Tabuchi TM et al Nat Comm 2018, Kaneshiro KR et al Nat Comm 2019). This problem is not restricted to the intro.

      2)I think that the authors need to be a little less definitive with your language. Theories should be introduced as possibilities rather than conclusions. Should remove "comprehensive" from intro as there are many other methods which could be done to test this.

      3)The authors should describe what PIE-1 is. Is this a transcription factor?

      4)The language needs clarification about MES-4 germline genes and bookmark genes. Are these bound by MES-4 or marked with K36me2/3?

      5)I think Fig S1 E+F should be in the main figure 1 so readers can see the extent of the phenotype.

      6)For Fig S2 it would be good to do the same statistics that are done in Fig 2 and mention them in the text so the readers can see that the overlap is statistically significant.

      7)Fig S2.2 should be yellow blue rather than red green for the colorblind out there.

      8)When saying "Many of these genes involved in these processes..." the authors need to include numbers and statistics.

      9)Should use WT instead of N2 and specify what wildtype is in methods.

      10)Fig. 2A + B could be displayed in a single figure. And Fig 2D seems superfluous and could be combined with 2C or alternatively it could be put in supplementary.

      11)Non-C. elegans experts won't understand what balancers are. An effort should be made to make this accessible to all. Explaining when genes are heterozygous or homozygous mutants seems relevant here.

      12)The GO categories (Fig. S2) should be in the main figure and need to be made to look more scientific rather than copied and pasted from a program.

      13)Fig. 7 seems a bit out of place. If the authors were to KD mes-4 and similarly show that the phenotype reverts that would help justify its inclusion in this paper. Without it, it seems like a bit of an add on that belongs elsewhere.

      Significance

      I think this is an interesting and timely piece of work. A little more effort needs to be put in to make sure it is accessible to the average reader and has sufficient inclusion of more of the large body of work on inheritance of histone modifications. I think C. elegans researchers as well as people interested in inheritance and the setup of the germline will be interested in this work.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2's comments on experiments to include or exclude alternative models. I also agree about their statement about rewriting to make it more accessible to others who aren't experts in this specialized portion of C. elegans research. All in all it seems like the experiments which are required by reviewer #2 and myself as well as the rewriting should be quite feasible.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed the dramatic enhancement in sterility and development for spr-5; met-2; this paper extends that finding by showing these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis. The authors hypothesize that SPR-5 and MET-2 modify chromatin of germline genes (MES-4 targets) in somatic cells, and this is required to silence germline genes in the soma. A few issues need to be resolved to test these ideas and rule out others.

      Main comments:

      The authors' hypothesis is that SPR-5 and MET-2 act directly to modify chromatin of germline genes (MES-4 targets), but an alternative hypothesis is that the key regulated genes are i) MES-4 itself and/or ii) known regulators of germline gene expression, e.g. the piwi pathway. Misregulation of these factors in the soma could be responsible for the phenotypes. Therefore, the authors should analyze expression (smFISH and where possible protein stains) for MES-4 and PIWI components in the embryo and larvae of wildtype, double and triple mutant strains. These experiments are essential and not difficult to perform.

      A second aspect of the hypothesis is that spr-5 and met-2 act before mes-4 and that while these genes are maternally expressed, they act in the embryo. There really aren't data to support these ideas - the timing and location of the factors' activities have not been pinned down. One way to begin to address this question would be to perform smFISH on the target genes and on mes-4 in embryos and determine when and where changes first appear. smFISH in embryos is critical - relying on L1 data is too late. If timing data cannot be obtained, then I suggest that the authors back off of the timing ideas or at least explain the caveats. Certainly, figure 8 should be simplified and timing removed. (note: Typical maternal effect tests probably won't work because if the genes' RNAs are germline deposited, then a maternal effect test will reflect when the RNA is expressed but not when the protein is active. A TS allele would be needed, and that may not be available.)

      Writing/clarity:

      -It would be helpful to include a table that lists the specific genes studied in the paper and how they behaved in the different assays e.g. RNAseq 1, RNAseq 2, MES-4 target, ChIP. That way, readers will understand each of the genes better.

      -At the end of each experiment, it would be helpful to explain the conclusion and not wait until the Discussion. For readers not in the field, the logic of the Results section is hard to follow.

      -The model is explained over three pages in the Discussion. It would be great to begin with a single paragraph that summarizes the model/point of the paper simply and clearly.

      Specific comments:

      -Figure 1 has been published previously and should be moved to the supplement.

      -Cite their prior paper for the vulval defects e.g. page 6 or show in supplement.

      -The second RNAseq data should be shown in the Results since it is much stronger. The first RNAseq, which is less robust, should be moved to supplement.

      -Figure 3 is very nice. Please explain why the RNAs were picked (+ the table, see comment above), and please add here or in a new figure mes-4 and piwi pathway expression data in wildtype vs double/triple mutants.

      -Figure 3 here or later, please show if mes-4 RNAi removes somatic expression of target genes.

      -Is embryogenesis delayed?

      -Figure 4 since htp-1 smFISH is so dramatic, it would be helpful to include htp-1 in the lower panels.

      -Figure 4, please add an extra 2 upper panels showing all the genes in N2 vs spr-5;met-2, for comparison to the mes-4 cohort.

      -Figure 6. Please show a control that met-1 RNAi is working.

      -To quantify histone marks more clearly, it would be wonderful to have a graph of the mean log across the gene. Showing the mean numbers would help clarify the degree of the effect. We had an image as an example but it does not paste into the reviewer box. Instead, see figure 2 or figure 4 here: https://www.nature.com/articles/ng.322

      Significance

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed the dramatic enhancement in sterility and development for spr-5; met-2; this paper extends that finding by showing these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis.

      This work will be of interest to people following transgenerational inheritance, generally in the C. elegans field. People using other organisms may read it also, although some of the worm genetics may be complicated. Some of the writing suggestions could make a difference.

      I study C. elegans embryogenesis, chromatin and inheritance.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The Katz lab has contributed greatly to the field of epigenetic reprogramming over the years, and this is another excellent paper on the subject. I enjoyed reviewing this manuscript and don't have any major comments/suggestions for improving it. The findings presented are novel and important, the results are clear cut, and the writing is clear.

      It's important to stress the novelty of the findings, which build upon previous studies from the same lab (upon a shallow look one might think that some of the conclusions were described before, but this is not the case). Despite the fact that this system has been studied in depth before, it remained unclear why and how germline genes are bookmarked by H3K36 in the embryo, and it wasn't known why germline genes are not expressed in the soma.

      To study these questions Carpenter et al. examine multiple phenotypes (developmental aberrations, sterility), that they combine with analysis of multiple genetic backgrounds, RNA-seq, CHIP-seq, single molecule FISH, and fluorescent transgenes.

      Previous observations from the Katz lab suggested that progeny derived from spr-5;met-2 double mutants can develop abnormally. They show here that the progeny of these double mutants (unlike spr-5 and met-2 single mutants) develop severe and highly penetrant developmental delays, a Pvl phenotype, and sterility. They show also that spr-5; met-2 maternal reprogramming prevents developmental delay by restricting ectopic MES-4 bookmarking, and that developmental delay of spr-5;met-2 progeny is the result of ectopic expression of MES-4 germline genes. The bottom line is that they shed light on how SPR-5, MET-2 and MES-4 balance inter-generational inheritance of H3K4, H3K9, and H3K36 methylation, to allow correct specification of germline and somatic cells. This is all very important and relevant also to other organisms.

      (very) Minor comments:

      -Since the word "heritable" is used in different contexts, it could be helpful to elaborate, perhaps in the introduction, on the distinction between cellular memory and transgenerational inheritance.

      -It might be interesting in the Discussion to expand further on the links between heritable chromatin marks and heritable small RNAs. They do hint that the results regarding the silencing of the somatic transgene are especially intriguing.

      Significance

      This is an exciting paper which builds upon years of important work in the Katz lab. The novelty of the paper is in pinpointing the mechanisms that bookmark germline genes by H3K36 in the embryo, and explaining why and how germline genes are prevented from being expressed in the soma.

    1. Author Response

      Reviewer #1:

      Summary:

      In this paper, the authors utilize CRISPR-Cas9 to generate two different DMD cell lines. The first is a DMD human myoblast cell line that lacks exon 52 within the dystrophin gene. The second is a DMD patient cell line that is missing miRNA binding sites within the regulatory regions of the utrophin gene, resulting in increased utrophin expression. Then, the authors proceeded to test antisense oligonucleotides and utrophin up-regulators in these cell lines.

      Overall opinion (expanded in more detail below).

      The paper suffers from the following weaknesses:

      1) The protocol used to generate the myoblast cell lines is rather inefficient and is not new.

      2) Many of the data figures are of low quality and are missing proper controls (detailed in points 5,7,10, 12, 13,14)

      Detailed critiques:

      1) The title needs to be changed. The method used by the authors is inefficient. The title should instead focus on the two cell lines generated.

      We appreciate the reviewer’s comments: thanks to them, we have realized that the focus of the manuscript should be on the new models we described and less on the methodology used to create them.

      Originally, we wanted to share the problems we faced when applying new CRISPR/Cas9 editing techniques to myoblasts: our conversations with other researchers in the field confirmed that many were having similar problems. However, the reviewer is right that there are many ways around this problem. We do describe ours, and we are working on a new version of the manuscript with additional data to characterize our new models further, in which the method used to create them, although included, is not the main focus. In this new version, we will change the title accordingly.

      2) Line 104: The authors declare that the efficiency of CRISPR/Cas9 is currently too low to provide therapeutic benefit for DMD in vivo. There are lots of papers that show efficient recovery of dystrophin in small and large animals following CRISPR/Cas9 therapy. The authors should cite them properly.

      Thank you for this observation. We have reviewed the literature again to include new evidence of efficient dystrophin recovery as well as other studies with lower efficiency.

      3) Figures 1, 2,3, and 4 can be merged into one figure.

      4) Figure 2A and 2B can be moved to supplementary.

      5) Figure 2C and 2D are not clear. Are the duplicates the same? Please invert the black and white colors of the blots.

      Thank you for your comments. We have inverted the colors of the blots and changed the marks used in figure 2C and 2D to clarify that duplicates are indeed the same sample, assayed in duplicate. We have also merged figures 1 and 4 and moved figures 2 and 3 to supplementary in this new version.

      6) Figure 3: In order to optimize the efficiency of myoblast transfection, the plasmids containing the Cas9 and the sgRNA should have different fluorophores (GFP and mCherry). This approach would increase the percentage of positive edited clones among the clones sorted.

      We think the reviewer may have misunderstood our methodology: we are not using one plasmid with Cas9 and another with the sgRNA; we are using two plasmids, both containing Cas9 and each a different sgRNA. We did try using two different plasmids, one expressing GFP and one expressing puromycin resistance, but we found that single GFP-positive cell selection plus puromycin selection was too inefficient. We could have tried two different fluorophores, but we tested the tools we had on hand first and were successful at obtaining enough clones to continue with their characterization, so we proceeded with those clones rather than further optimizing our editing protocol.

      7) Figure 4A: In the text, the authors state that only 1 clone had the correct genomic edit, but the PCR genotyping in this figure shows at least 2 positive clones (numbers 4 and 7).

      Thank you for this observation. As you say, we obtained two positive clones (as we also indicate in Figure 3B), but we completed the full characterization of only one of them (clone 7, DMD-UTRN-Model). We explain this further in the new version of the manuscript.

      8) Figure 4C: The authors should address whether one or both copies of the UTRN gene were edited in their clones.

      Thank you for your comment. Both copies of the UTRN gene were edited in our clones. We have included this information both in the text and in the figure 4 legend.

      9) Figure 4 B and D: The authors should report the sequence below the electropherograms.

      Thank you for this correction; we have included the sequences under the electropherograms.

      10) Figure 5B: This western blot is of poor quality. Also, the authors should specify that the samples are differentiated myoblasts. Lastly, a standard protein should be included as a loading control.

      Thank you for your comment. The poor quality of dystrophin and utrophin western blots was the main reason we validated a new method in our laboratory to measure these proteins directly in cell culture (1), as an alternative to western blotting. Since then, we have used this myoblot method routinely, both in-house and in collaboration with other groups and companies. We included the western blot because readers used to this technique may find it easier to assess a blot showing no dystrophin expression. As you pointed out, our samples were all differentiated myotubes, not myoblasts, and we have modified this accordingly. Thank you very much for pointing out this mistake.

      Regarding loading controls, as described in the methods, Revert™ 700 Total Protein Stain (Li-Cor) and alpha-actinin were included as standards in the dystrophin and utrophin western blots, respectively.

      11) Figure 5E: We would like to see triplicates for the level of utrophin expression.

      We thank the reviewer for this recommendation, but we do not consider western blotting a good quantitative technique; we included western blots only to show qualitatively the presence or absence of the protein. We have included many more replicates than needed to show the level of utrophin by myoblot. We acknowledge that western blotting is the preferred method for some reviewers, so in the new version of our manuscript we clearly indicate the value we give to each technique, with myoblots being our method of choice for quantification.

      12) Figure 6: A dystrophin western blot should be included to demonstrate protein recovery following antisense oligonucleotide treatment. Also, the RT-PCR data could be biased as you can have preferential amplification of shorter fragments.

      Thank you for your recommendation but, as explained above, myoblots have been validated in our laboratory to replace western blotting for accurate dystrophin quantification in cell culture.

      13) Figure 6A: Invert the black and white colors. The authors should also report the control sequences and sequences of the clones under the electropherograms.

      Thank you for your suggestion; we have inverted the colors and added the sequences under the electropherograms.

      14) Figure 6B: Control myoblasts should be included in figure 5C.

      Thank you for this correction; we will include control myoblasts in the new version of the manuscript.

      15) Figure S2A: Invert the black and white colors.

      Thank you for your suggestion; we have inverted the colors.

      Reviewer #2:

      The work from Soblechero-Martín et al reports the generation of a human DMD line deleted for exon 52 using CRISPR technology. In addition, the authors introduced a second mutation that leads to upregulation of utrophin, a protein similar to dystrophin, which has been considered as a therapeutic surrogate. The authors provide a careful description of the methodology used to generate the new cell line and have conducted meticulous evaluations to test the validity of the reagents.

      However, if the main purpose of this cell line is to perform drug or small molecule compound screenings, a single line might not be sufficient to draw robust conclusions. The generation of additional DMD lines in different genetic backgrounds using the reagents developed in this study will strengthen the work and will be of interest to the DMD field.

      Thank you for this observation. We think that a well-characterized immortalized culture, like the one we describe, is sufficient for compound screening, as in other recently published studies (2), (3). Regarding the other suggestion, we have indeed used our method to generate other cultures for collaborators, but these will be reported in their own publications, as the collaborators are using them as tools in their own research projects.

      Further, the future use of the edited DMD line with upregulated utrophin is unclear. The utrophin upregulation adds a complexity to this line that might complicate the assessment of screened compounds. In contrast, this line could be used to test if overexpression of utrophin generates myotubes that produce increased force compared to the control DMD line.

      We may not have explained our screening platform clearly enough. We propose offering our newly generated culture ALONGSIDE the original unedited culture: the original culture is treated with potential drug candidates, while the new one may or may not be treated, depending on whether these candidates are thought to act by activating the edited region (see an example in the figure below). The new culture then serves as a reliable positive control for the effects the drug candidates may produce in the unedited cultures. We will make this clear in the new version of the manuscript.


      In summary, while there is support and enthusiasm for the techniques and methodological approach of the study, the future use of this single line might be dubious and could be strengthened if additional lines are generated.

      We share the reviewer’s enthusiasm for this approach, and in the new version of the manuscript we have included further characterization of this new cell culture, which we think better demonstrates its usefulness.

    2. Reviewer #2:

      The work from Soblechero-Martín et al reports the generation of a human DMD line deleted for exon 52 using CRISPR technology. In addition, the authors introduced a second mutation that leads to upregulation of utrophin, a protein similar to dystrophin, which has been considered as a therapeutic surrogate. The authors provide a careful description of the methodology used to generate the new cell line and have conducted meticulous evaluations to test the validity of the reagents.

      However, if the main purpose of this cell line is to perform drug or small molecule compound screenings, a single line might not be sufficient to draw robust conclusions. The generation of additional DMD lines in different genetic backgrounds using the reagents developed in this study will strengthen the work and will be of interest to the DMD field.

      Further, the future use of the edited DMD line with upregulated utrophin is unclear. The utrophin upregulation adds a complexity to this line that might complicate the assessment of screened compounds. In contrast, this line could be used to test if overexpression of utrophin generates myotubes that produce increased force compared to the control DMD line.

      In summary, while there is support and enthusiasm for the techniques and methodological approach of the study, the future use of this single line might be dubious and could be strengthened if additional lines are generated.

    3. Reviewer #1:

      Summary:

      In this paper, the authors utilize CRISPR-Cas9 to generate two different DMD cell lines. The first is a DMD human myoblast cell line that lacks exon 52 within the dystrophin gene. The second is a DMD patient cell line that is missing miRNA binding sites within the regulatory regions of the utrophin gene, resulting in increased utrophin expression. Then, the authors proceeded to test antisense oligonucleotides and utrophin up-regulators in these cell lines.

      Overall opinion (expanded in more detail below).

      The paper suffers from the following weaknesses:

      1) The protocol used to generate the myoblast cell lines is rather inefficient and is not new.

      2) Many of the data figures are of low quality and are missing proper controls (detailed in points 5, 7, 10, 12, 13, and 14).

      Detailed critiques:

      1) The title needs to be changed. The method used by the authors is inefficient. The title should instead focus on the two cell lines generated.

      2) Line 104: The authors declare that the efficiency of CRISPR/Cas9 is currently too low to provide therapeutic benefit for DMD in vivo. There are lots of papers that show efficient recovery of dystrophin in small and large animals following CRISPR/Cas9 therapy. The authors should cite them properly.

      3) Figures 1, 2, 3, and 4 can be merged into one figure.

      4) Figure 2A and 2B can be moved to supplementary.

      5) Figure 2C and 2D are not clear. Are the duplicates the same? Please invert the black and white colors of the blots.

      6) Figure 3: In order to optimize the efficiency of myoblast transfection, the plasmids containing the Cas9 and the sgRNA should have different fluorophores (GFP and mCherry). This approach would increase the percentage of positive edited clones among the clones sorted.

      7) Figure 4A: In the text, the authors state that only 1 clone had the correct genomic edit, but the PCR genotyping in this figure shows at least 2 positive clones (numbers 4 and 7).

      8) Figure 4C: The authors should address whether one or both copies of the UTRN gene were edited in their clones.

      9) Figure 4 B and D: The authors should report the sequence below the electropherograms.

      10) Figure 5B: This western blot is of poor quality. Also, the authors should specify that the samples are differentiated myoblasts. Lastly, a standard protein should be included as a loading control.

      11) Figure 5E: We would like to see triplicates for the level of utrophin expression.

      12) Figure 6: A dystrophin western blot should be included to demonstrate protein recovery following antisense oligonucleotide treatment. Also, the RT-PCR data could be biased as you can have preferential amplification of shorter fragments.

      13) Figure 6A: Invert the black and white colors. The authors should also report the control sequences and sequences of the clones under the electropherograms.

      14) Figure 6B: Control myoblasts should be included in figure 5C.

      15) Figure S2A: Invert the black and white colors.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Lee Rubin (Harvard University) served as the Reviewing Editor.

      Summary:

      While the paper by Soblechero-Martín et al. may present an ultimately useful method for modifying genes in skeletal muscle, the reviewers felt that, at the current time, the robustness of the methods and the amount of data presented were insufficient. The reviews below point towards additional experiments that could be done to improve this paper.

    1. Author Response

      Reviewer #1:

      This study is an in silico analysis of data from The Cancer Genome Atlas (TCGA) on hepatitis B virus (HBV)-positive liver tumours and human papillomavirus (HPV)-positive cervical and head and neck tumours, and their association with viral load, genotype(s) and expression. The rationale for including two unrelated DNA tumour viruses in the study is unclear to me, especially as the number of HBV-positive samples is much smaller than for HPV. Overall, the manuscript seems to be a validation of a bioinformatic tool rather than a report of significant research findings.

      We strongly believe that a global summary of key oncovirus-associated tumors makes sense in this context precisely because of the fundamental importance viral genotype is already known to have. While HBV and HPV are of course quite different viruses, there is extensive clinical evidence that linking outcomes to specific viral genotypes and phenotypes is of great value, which we expand upon in our work via a working demonstration of ViralMine. For this reason, we think it is crucial to present both virally related cohorts together: they support each other, demonstrate the robustness of our methods across completely different systems while allaying concerns about fine-tuning, and create a cohesive picture of the effect of viral genotype across the molecular landscape of two key oncoviruses. As the reviewer notes, this does implicitly demonstrate the utility of ViralMine, but we emphasize that it also uncovers significant research findings.

      Regarding the HBV/HPV sample sizes, the number and percentage of infected HCC samples are in fact substantially higher than those of the cervical or head and neck HPV samples, as discussed in detail on page 4 of our manuscript.

      Use of the TCGA has allowed analysis of a reasonably large number of RNASeq data sets. However, once the authors drill down to individual genotypes, numbers become quite small, which may compromise some of the observations. For example, the large discrepancy between the numbers of HPV16 (173) and HPV18 (39)-positive cases makes it difficult to draw firm conclusions about the significance of differentially expressed cellular genes for each set of cancers. Similarly, in Figures 4 and 6 they compare HPV18 (23 cases) with HPV45 (39 cases) and HPV18/45 coinfections (number not stated but likely far fewer).

      While there is an imbalance in group size between HPV genotypes in the cervical cancer cohort, the test statistic used by the DESeq2 pipeline to identify differentially expressed genes does account for class imbalance, and even in the most extreme case we analyzed, the dispersion parameter estimates are easily verified as accurate. Accurately inferring group-wise dispersion parameters given unequal group sizes is in fact a well-known problem, and in any case it only becomes acute when one group is so small (~1 sample) that its common dispersion parameter is difficult to estimate. That situation clearly does not arise here. Additionally, it should be noted that in Figure 4b we are comparing ALL HPV co-infected cervical tumor samples (92 cases) against single-infection samples (193 cases), a comparison in which the reviewer may have more confidence and which is statistically reasonable. Furthermore, while the comparison of cervical cancer HPV18 (n=10), HPV45 (n=9), and HPV18/45 co-infected (n=39) cases in Figure 6b does involve relatively small patient groups, the significant difference in neoantigen population TCR binding affinity is confirmed by a one-sided, non-parametric KS test and shown to be robust to subsampling, which formally demonstrates that the signal is not artefactual. Therefore, from a statistical point of view, the concerns raised about class imbalance and power are not fundamental and were addressed in the original manuscript draft. Thus, we believe we can fully address the reviewer’s concerns by:

      Indicating the group sizes (n=X) compared in the barcode plots in Figures 3a, 4a and 4b to improve the transparency of the contrasts, and adding group numbers to Figures 6a and 6b. Further, we will include a new supplementary figure demonstrating that bootstrap resampling of the HPV group neoantigens to balance group sizes confirms that the difference in TCR binding affinity distributions is robust.

      Much of the information that they derive from their analyses is not novel. For example, they report no preferential sites of HPV integration. Despite what they claim, quite a bit is known about HPV co-infection in cervical cancers and it is not uncommon but varies according to geographical regions, which was not a variable they used.

      We acknowledge that other oncoviral survey papers have provided evidence of preferential integration (as we originally cited, as well as referenced in Dall et al. (2008), Zhang et al. (2016)). However, these and other previous characterizations of recurrent HPV integration do not attempt to organize these sites by either genotype or co-infection status, which was our explicit and stated aim, principally because they could not efficiently and accurately determine these parameters from in-situ tumor RNA. As we found no preference in integration along these axes of variation (which we acknowledged openly in the manuscript as being expected when using RNA rather than DNA), we deliberately chose not to present these results as a main finding and included them in supplemental results for the sake of completeness.

      We also agree that HPV co-infection in cervical lesions is not per se a novel finding, although, to be clear, most of the literature focuses on side-by-side infections of HPV with another virus (HHV, EBV, HIV, etc.), or uses the term to describe groupings of sub-variants or isolates under the same viral genotype header (Mirabello et al. (2016)). Additionally, most of the literature focuses on HPV co-infection in cervical neoplasia or high-grade lesions and cervical cancer risk (Chaturvedi et al. (2011); Senapati et al. (2017)) rather than assessing HPV co-infection in the tumoral tissue itself, post oncogenesis. As such, we believe that our approach of examining in situ cervical tumor infections, and the relatively high rate of HPV co-infection we observe, merit particular notice compared with previous studies. Furthermore, to our knowledge, analyses linking this cross-genotype co-infection phenotype with tumor gene expression, survival adjusted for major known clinical covariates, and tumor immunogenicity measures have not been reported elsewhere.

      For HPV, viral exon-level RNASeq analysis is irrelevant because HPV gene expression is polycistronic and is subject to changes by random viral integration events in individual cases. Therefore, it is unlikely that general overall viral gene expression signatures will be diagnostic. Besides, multiple studies indicate that what matters in cervical cancer is the level of expression of the E6/E6 isoforms/E7 oncogenes.

      We agree that the post-transcriptional polycistronic nature of HPV expression makes it difficult to elucidate the effect of differing HPV gene-level expression on ultimate HPV gene translation and protein expression. However, our related yet distinct question here is on the effect HPV genotype and cancer type has on HPV gene transcriptional differences (as seen in Figure 7), so we believe we are within the limits of reasonable interpretation. Additionally, while E6 and E7 expression are well known to drive oncogenesis, it seems crucial to quantify the expression of these viral oncogenes across viral genotype and tissue type, which has not been done previously to our knowledge. Finally, even if we somehow accept that the average tumoral viral gene exon expression itself is best described as a random variable, which we do not, it remains to be explained why we observe and report persistent genotype-specific expression patterns across completely different cell-types.

      The references chosen for the HPV part of the study are either rather out of date or not representative of the extensive literature.

      We acknowledge that we have cited only a portion of the vast HPV-related cancer literature, so we have made an effort to include more recent surveys and studies as references.

      Reviewer #2:

      1) The authors comment that averaged infection phenotypes such as viral load or predominant genotype may be replaced by more granular measures, such as exon-level viral expression or the ratio of expressed viral genotypes. In reality, viral expression, and the ratio of expressed viral genotypes, are still 'tumor averages' in the way that the authors have analysed them. HPV-associated tumors are heterogeneous, and without in situ analysis, it is hard to discern which transcripts are involved in driving the cancer phenotype, and which are found in associated precancerous tissue.

      We concede that the viral genotypes quantified by our method represent a computed average measure across the tumor, as would any measurement of any quantity in a bulk sequencing assay. However, the admixture of genotypes and exon-level viral expression does provide an additional level of granularity over previous bulk measures, and allows analyses not explored prior to our work. By comparison, this criticism applies identically to cell-type decomposition algorithms like Cibersort, which, despite their problems and inherent limitations, do provide insightful information. We agree with the reviewer that more targeted in situ analyses would allow a truly specific association of particular viral transcripts with tumor phenotype, and would serve as a useful validation of some of our results, but this does not invalidate the tumor-aggregated genotype and co-infection presence associations we present here. We also agree that multiple biopsies would allow intra-tumoral heterogeneity to be taken into account in our study; however, no major public resource (e.g. TCGA) includes such data, and we believe that such an undertaking lies beyond any reasonable scope of this work.

      2) The authors use the term co-infection quite widely. For HPV, previous studies have shown that coinfection within cells in an individual cancer or neoplasia is rare, although independent infections by different HPV types can occur side-by-side. I expect something similar with HBV, although the study would need a higher level of analysis to establish this. The use of terminology, and the way in which data is interpreted, needs to be much more rigorous.

      We agree with the reviewer that the use of ‘co-infection’ in this context is unclear, as co-infection on a cellular level with two different HPV/HBV genotypes is impossible to determine by bulk RNA sequencing analysis. We will clarify ‘co-infection’ as strictly a mixture of independent HPV infections contained in the same tumor tissue.

      We will clearly define our meaning of ‘co-infection’ in the introduction as the aggregated mixture of HPV genotypes expressed in the tumor tissue (‘side-by-side’ infections), to remove ambiguity as to our cohort characterization.

      3) Viral load is generally used in the field as a measure of viral genome or genome-fragment abundance. This is already a misuse of the terminology, as the term implies virus numbers, or even infectious virus numbers. Here the term is used to refer to viral transcript abundance. The authors need to say precisely what they're measuring, and need to be aware that they are measuring the average across a heterogeneous tumour, which may have areas of high grade neoplasia, cancer, and even low-grade neoplasia. My feeling is that the level of analysis is too great, given the uncertainties regarding the heterogeneous nature of tissue that is being analysed, and the different cells with different levels of viral gene expression that are most likely present.

      We agree that, as the reviewer frames it, our use of ‘viral load’ should be clarified as ‘viral transcript abundance’, as determined from the tumor RNASeq data in variance-stabilized units of log2 counts per million reads mapped across the viral contig. We do note, however, that levels of viral transcripts have previously been reported to correlate well with virus numbers in infected tissue. Regarding the reviewer’s last comment, we wish to point out that our analysis goes no further, either in analytic complexity or in drawing inference from expression data, than any other published study based on tumor bulk RNA-sequencing data. All samples will contain a mixture of cells, and we emphasize that we are only measuring average signals, viral or host tumor-specific, across this mixture.

      To address these comments, we will change all references to viral load to normalized viral transcript abundance, to remove ambiguity. We will once again emphasize that our conclusions hold only in a strictly averaged sense.

      4) Several of the figures don't obviously support the conclusions. For instance, it is not clear how the data shown in figure S2 supports the title of the S2 figure legend. Surely some statistical analysis is needed to support the conclusion stated in the legend. Given previous studies, I'm not at all convinced that the distribution of causative HPV genotypes is the same between SCC and Adenocarcinoma. An additional limitation of these large cancer association studies comes from limitations in pathology diagnosis, which cannot always accurately distinguish borderline SCC/adenocarcinoma cases. With the large-scale transcriptional analysis, maybe the authors can use molecular information available in their samples to look at this.

      As the reviewer points out, the statistical evidence backing our claim of no association between cervical histology and HPV infection genotype or co-infection should be added. This calculation was actually carried out but only reported in the text; we will amend the figure to include the results and apologize for this key omission. We also note in passing that we are not making any claims about ‘causative’ HPV genotypes for the respective subtypes, but rather much more conservative statements about association. Regarding the quality of the phenotypic data reported in TCGA, we wholeheartedly agree but are unable to do much more. Concerning the reviewer’s final and interesting suggestion of using molecular information in our samples to distinguish SCC/adenocarcinoma subtypes, we did not find reliable gene expression signatures that could be used to validate or correct the phenotypic results.

      We will add the Spearman correlation coefficient (rho) and significance test results for the correlation between cervical cancer histological type and both viral phenotypes represented in Figure S2.

      5) The APOBEC analysis is quite rudimentary in the text, and does not discuss the different members of the APOBEC family. Similarly, the different effects of single and multiple HPV infections on the IRF3-responsive genes are poorly developed at the biological level, which most probably reflects the general way in which the utility of the approach.

      We agree with the reviewer that our APOBEC expression analysis in the HPV+ cervical cohort could be more comprehensive, and that our interpretation of the results may therefore be too far-reaching. We believed the initial result to be of sufficient interest in the context of a very similar result from Zapatka et al. (2020), but concede it may make more sense as a supplemental result alone, without additional evaluation or discussion of the wider APOBEC family. Additionally, the pathway analysis involving the genes differentially expressed between co-infected and non-co-infected cervical tumors should most likely be moved to a supplemental result as well, without further analyses to support the enrichment trends, following how we reported the HBV-associated liver cancer co-infection DEG results (Figure S5).

      We will move Figure 3d to a supplemental figure, limit our comments in the results to an observation in reference to Zapatka et al., and delete any associated interpretation. We will also move Figure 3c to a new supplemental figure and remove the suggestion of expanded antiviral activation in co-infected tumors.

    2. Reviewer #2:

      The title of the manuscript suggests a detailed analysis of cancers using in situ gene expression approaches, which aims to provide new insight into tumour heterogeneity and co-infection. The manuscript is in fact an analysis of viral transcription and the presence of cellular mutations in a collection of tumours associated with HPV and HBV infection. Much of the starting data for the analysis has been drawn from the TCGA database. It is a little unclear whether the authors are pitching this paper as a methodological development manuscript, but I think that this is what it is at its heart. The ability to deconvolute RNA sequencing data from virus-associated tumours is interesting, and could be widely used as a research tool. However, much of the manuscript is concerned with interpreting the data, and I think the interpretation goes well beyond what can feasibly be achieved from the analysis of transcripts in extracts of total tumour tissue. The authors' term 'co-infection' most likely refers to heterogeneous mixtures of virally infected cells which are competing with each other in the tumour. In my view, the biological interpretations are not particularly useful at the level at which they are presented, but could serve as the starting point for future research. This manuscript could be repackaged as a description of a new analytical tool, or the most exciting aspects drawn out with the addition of biological studies to explain what the transcriptional analysis may mean. This would be a complex process, and would be facilitated by a focus on either HPV or HBV, as trying to extend conclusions to the two disparate virus families in one manuscript is probably unrealistic. Without any analysis of tumour tissue using in situ analysis or single cell sequence analysis, or a combination of the two, there is little new information that can be drawn regarding the biology of disease development.

      My suggestion would be to repackage this as an analytical methodology publication, rather than a biology discovery manuscript.

      1) The authors comment that averaged infection phenotypes such as viral load or predominant genotype may be replaced by more granular measures, such as exon-level viral expression or the ratio of expressed viral genotypes. In reality, viral expression, and the ratio of expressed viral genotypes, are still 'tumor averages' in the way that the authors have analysed them. HPV-associated tumors are heterogeneous, and without in situ analysis, it is hard to discern which transcripts are involved in driving the cancer phenotype, and which are found in associated precancerous tissue.

      2) The authors use the term co-infection quite widely. For HPV, previous studies have shown that coinfection within cells in an individual cancer or neoplasia is rare, although independent infections by different HPV types can occur side-by-side. I expect something similar with HBV, although the study would need a higher level of analysis to establish this. The use of terminology, and the way in which data is interpreted, needs to be much more rigorous.

      3) Viral load is generally used in the field as a measure of viral genome or genome-fragment abundance. This is already a misuse of the terminology, as the term implies virus numbers, or even infectious virus numbers. Here the term is used to refer to viral transcript abundance. The authors need to say precisely what they're measuring, and need to be aware that they are measuring the average across a heterogeneous tumour, which may have areas of high grade neoplasia, cancer, and even low-grade neoplasia. My feeling is that the level of analysis is too great, given the uncertainties regarding the heterogeneous nature of tissue that is being analysed, and the different cells with different levels of viral gene expression that are most likely present.

      4) Several of the figures do not obviously support the conclusions. For instance, it is not clear how the data shown in figure S2 support the title of the S2 figure legend; some statistical analysis is surely needed to support the conclusion stated there. Given previous studies, I'm not at all convinced that the distribution of causative HPV genotypes is the same between SCC and adenocarcinoma. An additional limitation of these large cancer association studies comes from pathology diagnosis, which cannot always accurately distinguish borderline SCC/adenocarcinoma cases. Given the large-scale transcriptional analysis, perhaps the authors could use the molecular information available in their samples to look at this.

      5) The APOBEC analysis is quite rudimentary in the text, and does not discuss the different members of the APOBEC family. Similarly, the different effects of single and multiple HPV infections on IRF3-responsive genes are poorly developed at the biological level, which most probably reflects the general nature of the approach.

    3. Reviewer #1:

      This study is an in silico analysis of data from The Cancer Genome Atlas (TCGA) on hepatitis B virus (HBV)-positive liver tumours and human papillomavirus (HPV)-positive cervical and head and neck tumours, and their association with viral load, genotype(s) and expression. The rationale for including two unrelated DNA tumour viruses in the study is unclear to me, especially as the number of HBV-positive samples is much smaller than for HPV. Overall, the manuscript seems to be a validation of a bioinformatic tool rather than a report of significant research findings.

      Use of TCGA has allowed analysis of a reasonably large number of RNA-Seq data sets. However, once the authors drill down to individual genotypes, numbers become quite small, which may compromise some of the observations. For example, the large discrepancy between the numbers of HPV16-positive (173) and HPV18-positive (39) cases makes it difficult to draw firm conclusions about the significance of differentially expressed cellular genes for each set of cancers. Similarly, in Figures 4 and 6 they compare HPV18 (23 cases) with HPV45 (39 cases) and HPV18/45 coinfections (number not stated but likely far fewer).

      Much of the information that they derive from their analyses is not novel. For example, they report no preferential sites of HPV integration. Contrary to their claim, quite a bit is known about HPV co-infection in cervical cancers: it is not uncommon, but its prevalence varies by geographical region, which was not a variable they used.

      For HPV, viral exon-level RNA-Seq analysis is irrelevant because HPV gene expression is polycistronic and is altered by random viral integration events in individual cases. It is therefore unlikely that general viral gene expression signatures will be diagnostic. Besides, multiple studies have shown that what matters in cervical cancer is the level of expression of the E6/E6* isoforms/E7 oncogenes.

      However, such an in silico approach to quantifying various aspects of virus-associated tumours could be a useful prognostic clinical tool in the future.

      The references chosen for the HPV part of the study are either rather out of date or not representative of the extensive literature.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Margaret Stanley (University of Cambridge) served as the Reviewing Editor.

      Summary:

      The reviewers agree that the study is technically impressive but the biological data generated is not particularly novel and there are criticisms of the interpretation of the data. The study may have value as a methodological and bioinformatics tool.

    1. Reviewer #3:

      In this paper, Arcaro and colleagues investigate the relationship between bumps along the macaque STS and functional selectivity for faces. Through a series of analyses, they convincingly demonstrate a strong structural-functional relationship between face patches and these small sulcal bumps. They show that this correspondence outperforms functional probabilistic atlases, and does not result from functional specialization per se, as visually-deprived monkeys show similar anatomical folding patterns. As someone familiar with the field of vision and cognitive neuroscience, I can say that this paper is thorough, employs careful single-subject analyses, and I honestly do not have much to add to improve what is already a great paper.

      Points:

      For clarification, were the borders of each bump drawn by hand on the cortical surface (e.g., what is shown in Figure 2B)? Saying so in the text will help future researchers replicate the identification process.

      Monkey M3 looks odd; what do you think is going on there? I know there is individual variability, but the AL patch in that monkey seems atypical in its position in both hemispheres. For most monkeys it almost looks like you could mirror their STS from one hemisphere and predict relatively well their other hemisphere, but in M3 the AL patch doesn't look symmetric.

      The previous point got me thinking: were data across hemispheres within a monkey collapsed before statistical testing? Are there hemispheric differences in bump volume or spacing? Apologies if I missed that in the text.

      In the visually-deprived monkeys, were there any anatomical differences at all within the bumps? Volume differences? Thickness of the cortex that comprises a bump? If this is the topic of another paper and the authors excluded it purposely, I understand, but it might speak to how functional emergence interacts with existing structure (in this case the bumps).

      Did I miss something, or is there really a reference to Julius Caesar's Gaul in the discussion? Is that what "Gallia" is referring to? I appreciate a deep historical reference (if that's what this is) but I'm worried that this will go over most readers' heads. Happy to leave it for poetic purposes, but just noting that it will likely be confusing.

    2. Reviewer #2:

      Summary:

      Neuroimaging and electrophysiological experiments demonstrate a series of face-selective regions in the macaque superior temporal sulcus (STS). In normalised space, these regions partially align across individuals. The present report demonstrates that for some of these regions, local surface properties ("bumps") provide additional reliable information about the likely location of face-selective patches (in fMRI) and cells (in intracranial recordings). That precursors of these bumps are identified both prenatally and in macaques reared with abnormal visual experience of faces indicates that these bumps do not arise from the normal development of face-selective cortical activity. Similar bumps are found in some other primates, although much of the imaging and electrophysiology data that would help to assess homologies is not yet available.

      General assessment:

      This is a well-presented study that addresses a topic of ongoing interest with highly rigorous methods. On a narrow reading that holds close to the data, the paper offers an interesting observation that would seem to have mainly practical implications (e.g. in informing localisation for future electrophysiological work). In contrast, the effort to draw wider theoretical implications for understanding the visual organisation of STS seems to rely on unpicking the main observation that prompted the report in the first place, and on inferences and speculations that extend too far beyond the data that are reported.

      Substantive points:

      If "bumps" are the relevant physiological markers (and demonstrating this is the thrust of most of the paper), then it seems important to understand what a "bump" is. That is, what underlying properties or developmental processes are implied by the presence of a cortical bump, in contrast to regions with less prominent local curvature? The authors only very briefly review some possible mechanisms in the Discussion, and I felt a more complete exploration of this issue would have been useful.

      However, having established a structure-function correlation empirically, the paper at the same time provides many indirect lines of evidence to suggest that this relationship may be tangential at best. As the authors note, "STS bumps are not sufficient to produce face selectivity in the absence of face experience". Nor are bumps necessary to produce face selectivity, given the apparent absence of bumps related to MF and AF. Further, the overlap between bumps and face patches is variable across individuals, and incomplete: the bumps are large, and not entirely composed of face-selective populations. The authors also note studies that reveal a broadly similar tripartite STS organisation of retinotopic responses, and of body- and colour-selective patches. For example, images of bodies tend (in macaque fMRI) to activate regions that are adjacent to face patches, suggesting that there would be a similar anatomy/function relationship for this visual category too. Finally, the authors note that the kinds of physiological processes that are likely to produce bumps are too generic to produce a face-specific mechanism. The authors' speculation, in light of such considerations, is that anatomical bumps in STS are in fact indirect signals of three distinct, coherent, and complex visual areas that may contribute to a range of visual processes. The main difficulty with the manuscript, as I see it, is that while these wider possibilities are what give the paper the potential to engage a broad neuroscience audience, they are simply too far removed from the actual observations reported here. Substantial additional evidence would be needed to support the (admittedly interesting) picture of arealisation in STS that the authors paint. Without such evidence, what remains is mainly a structure-function observation that is interesting, and perhaps practically useful for further studies, but with uncertain theoretical implications.

    3. Reviewer #1:

      This paper reports a correspondence between structural markers, convexities ("bumps") along the superior temporal sulcus (STS), and face-selective patches in the macaque inferior temporal cortex. They localized three face patches with fMRI and each of these face patches overlapped with one of three bumps. These bumps were also present in monkeys that lacked face patches because of being reared without exposure to faces. These data provide some evidence for a correspondence between structure and function in inferior temporal cortex in macaques, in line with recent evidence for a link between structure and function in the temporal lobe of humans. This is interesting work showing novel data on a potential correspondence between structure and function in macaque temporal cortex. They examined, for monkey studies, a relatively large number of subjects and employed two functional measurements, fMRI and multi-unit recordings. However, I have some concerns regarding the correspondence between the face patches and the anatomical structure that need to be addressed.

      Main comments:

      1) The authors employed an automatic procedure to compute the convexity of the pial/white matter surfaces, which is excellent because it is objective. However, I found it difficult to differentiate neighboring bumps in some of the animals (Figure 2 S1). One reason for this is the way Figure 2 S1 was made: the bumps are shown in different colors that partly occlude the underlying convexity map. The authors should show the convexity map for each monkey and then, in a separate panel, the identified bumps, so that one can judge the correspondence between the convexity map and the bumps. Also, the group average data shown in Figure 2C do not look very convincing to me: I find it difficult to differentiate the posterior from the middle bump, which look like one long continuous convexity rather than two with a clear border in between. This could be due to averaging across monkeys. That is why Figure 2 S1, which shows the data of the individual monkeys, is important, but that figure needs to be improved by showing the convexity maps alone (see above).

      2) The overlap between the bump surfaces and the patches depends on how the two are defined. As noted above, I found it difficult to identify the individual bumps. The surface area of a face patch depends on the statistical threshold (and the number of runs, etc.) used to define it, and is thus arbitrary to some extent. These two factors make it difficult to evaluate the degree of overlap between patches and bumps and to interpret the Dice overlap analysis. The authors should address this by defining the face patch surface at several thresholds and examining how this affects the Dice outcome and the centroid-based analyses.

      3) Because the face patches appear (in some cases) to occupy a small part of a bump, and their location can vary within the bump, how predictive is the bump of the location of the face patch? The correspondence between structure and function appears to be rather coarse: from the comparison of the centroids of the bumps and face patches (Figure 4), I have the impression that there is a reasonable correspondence between ML and the middle bump, but that it is weaker for PL and AL. Furthermore, it is highly variable among animals. For instance, in M3, face patch AL appears to lie between the middle and anterior bumps. This suggests that the bumps might not determine the presence of a face patch, and that the presence of a bump and of a face patch may be mechanistically unrelated.

      4) The authors' work ignores the most anterior face patch, AM, which is located outside the STS (as PL typically is too, including in the present study). It has been suggested that AM is important for face identification, as it shows high tolerance to identity-preserving transformations such as viewpoint changes (see the work by Freiwald and Tsao), so it is difficult to ignore. How does AM fit into the proposed correspondence between STS bumps and face patches?
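      As a concrete illustration of the threshold sensitivity analysis requested in point 2, a minimal Python sketch follows. It assumes that the face-selectivity statistic and the bump labels are available as per-vertex arrays on a common cortical surface; the function names and the toy arrays are hypothetical and are not taken from the manuscript. The Dice coefficient is computed as 2|A ∩ B| / (|A| + |B|).

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) between two boolean vertex masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # both masks empty: overlap is undefined
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def dice_across_thresholds(stat_map, bump_mask, thresholds):
    """Dice between a face-selectivity map thresholded at each value and a fixed bump mask."""
    stat_map = np.asarray(stat_map, dtype=float)
    return {t: dice(stat_map > t, bump_mask) for t in thresholds}


# Toy example: six hypothetical surface vertices, not data from the manuscript.
stat = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
bump = np.array([False, False, True, True, True, False])
print(dice_across_thresholds(stat, bump, [1.5, 3.5]))
```

      Reporting this curve for each monkey and each patch/bump pair, alongside the centroid distances, would show how much of the reported overlap depends on the chosen threshold.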

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that the paper reports an interesting finding: a potential correspondence between structure and function in macaque temporal cortex. However, they also noted that this correspondence was only partial and variable across individuals. Furthermore, the reviewers were unsure of the broader theoretical implications of this finding.

    1. Reviewer #2:

      In this work the authors seek to disentangle the reason for a well-documented effect regarding reduced model-based tendencies among highly compulsive individuals. The authors collected behavioral and EEG data from ~200 participants performing a two-stage decision task. The main findings show that a latent compulsivity factor is associated with weaker transition-type effects at the task's 2nd stage. Specifically, highly compulsive individuals show smaller reaction-time and parietal-occipital alpha-band power differences between uncommon and common transitions. These findings are interpreted as evidence in favor of a less accurate model as the reason for reduced deployment of model-based strategies in compulsive individuals. The authors further note reduced theta power in compulsive individuals during the 1st stage choice.

      I am generally very impressed with this manuscript. I think the authors are addressing an important question that has a lot of promise in pushing the field forward. I also believe that given the number of participants, this is a relatively well powered EEG and behavioral study. Yet, I have one major concern as detailed below:

      The authors rely on 2nd stage effects to estimate the extent to which individuals are aware of the transition structure (i.e., the extent to which individuals are surprised by an uncommon transition, or unsurprised by a common one). However, unlike Konovalov & Krajbich, 2020, who used a mouse-tracking procedure to capture participants' 2nd stage expectations, both the RT and alpha-band scores might be confounded by 1st stage choice strategy. Individuals with stronger deployment of model-based strategies in the 1st stage more often reach the best 2nd stage option by means of a common transition. In contrast, the choices made by MF individuals at the 1st stage will not lead them more often to the best 2nd stage option via a common transition. This means that for an MF individual, the overall value difference between the two options offered at the 2nd stage will be similar after common and rare transitions, while for an MB individual the value difference will be larger after common than after rare transitions. This holds even when both MB and MF agents have perfect knowledge of the task transition structure and are equally surprised by an uncommon transition. Since the 2nd stage decision is on average easier after common than after rare transitions for MB agents, they should also show stronger transition effects than MF agents on 2nd stage measures. One such effect might be greater alpha-band power on rare transitions, reflecting greater mental effort (as the authors note). Also, when the decision is easier due to a larger value difference, shorter RTs are to be expected (e.g., Pedersen et al., 2017 in Psychonomic Bulletin & Review; Shahar et al., 2019 in PLOS Computational Biology). Thus, transition effects on both alpha-band power and RTs are expected from the use of MB strategies in the 1st stage, even if knowledge of the transition probabilities is perfect.

      Indeed, the authors report lower MB deployment at the 1st stage for compulsive individuals, which is in line with their weaker transition-related effects at the 2nd stage.

    2. Reviewer #1:

      In this report, the authors test a hypothesis about the nature of high-level ("model-based") vs. low-level ("model-free") learning across the spectrum of behavioral compulsivity. Prior literature has suggested that high-compulsive individuals have a deficit in either forming a model of the world, or implementing that model due to competition from learned low-level action-outcome tendencies. This report tested a large number of participants with concurrent EEG (N=192) across a range of compulsivity on the well-known two-step reinforcement learning task.

      The authors note that they "replicated prior work in findings that individual differences in compulsivity and intrusive thought ... were associated with reduced model-based planning" with analyses of accuracy (pg. 8). Analysis of RT revealed a novel effect of compulsivity on model-based planning, which was replicated using archival data of the same task. E-phys findings indicated that the candidate biomarkers of control in P300 and frontal midline theta were unrelated or not specifically related to model-based planning deficits in compulsivity, respectively (more on this below). However, the novel biomarker of posterior alpha power during the transition period was indeed linked with model-based planning deficits in compulsivity. This is novel.

      This report is extremely well motivated by prior literature, it is very well written, and very well executed. Supplemental controls for age and IQ, tests of the specificity of EEG effects with compulsivity, and tests of the specificity of this compulsivity dimension on dependent measures in relation to associated personality variables (e.g. anxious depression & social withdrawal, also raw item measures) all work together to bolster the conclusions. This is a very carefully presented report.

      Despite these virtues and advantages, the take-home message that I leave with is that EEG is not ideally suited for revealing the nature of compulsivity on model-based planning. P300 was irrelevant, frontal theta was possibly indirectly related (see below), and only posterior alpha was indicative of the compulsivity-related findings revealed in the behavioral analysis. This is unfortunately the least useful assessment of cognition used here, as it reflects the lowest level of control or decision making amongst these EEG measures. This perceptual effect is likely more of a consequence of the behavior than a candidate mechanism underlying it. This conclusion unfortunately diminishes the utility of these findings.

      Regarding theta: theta power and compulsivity were related to RT change, and they were related to each other, even though theta was not related to model-based planning (presumably tested via accuracy/choice). Although these patterns are carefully interpreted, it is not perfectly clear how they were tested, and I suspect there may be more that could be tested or inferred here. First, theta may still be related to the latent feature of "model-based choice" even if it is not significantly related to the manifest measure based on choice patterns. This requires some careful unpacking of semantics and of which latent constructs can be inferred from which manifest variables, but it is always a good idea to question what a single measure can reveal about complex cognitive states. Second, taking this theoretical issue together with a methodological point, the single-trial theta-RT relationship may still be altered by compulsivity even if theta power is not. Power and power-RT correlations have been presented as different measures of control that can be differently affected by a host of variables. This could presumably be tested via a theta × RT × compulsivity interaction, and could be visualized as a correlation between each individual's theta-RT beta weight (Y-axis) and compulsivity (X-axis).
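      The two-stage version of the test suggested above can be sketched in Python: estimate each participant's single-trial slope of RT on theta power, then correlate those slopes with compulsivity scores. This is a minimal sketch that assumes per-trial theta power and RT have already been extracted for each participant; the function names are hypothetical. A mixed-effects model with a theta × compulsivity interaction and random slopes would be the more complete single-step alternative.

```python
import numpy as np


def theta_rt_slopes(theta_by_subject, rt_by_subject):
    """Per-participant OLS slope of single-trial RT on z-scored theta power."""
    slopes = []
    for theta, rt in zip(theta_by_subject, rt_by_subject):
        theta = np.asarray(theta, dtype=float)
        rt = np.asarray(rt, dtype=float)
        z = (theta - theta.mean()) / theta.std()  # within-participant z-score
        slope, _intercept = np.polyfit(z, rt, 1)  # degree-1 fit returns [slope, intercept]
        slopes.append(slope)
    return np.array(slopes)


def slope_compulsivity_correlation(slopes, compulsivity):
    """Pearson correlation between individual theta-RT slopes and compulsivity scores."""
    return float(np.corrcoef(slopes, compulsivity)[0, 1])
```

      Plotting the returned slopes against compulsivity gives the scatter described in the review; z-scoring theta within each participant keeps the slopes comparable despite between-participant differences in overall power.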

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The authors aimed to disentangle the processes underlying compulsive individuals' difficulties forming models of the world, or implementing such models due to competition from lower-level action-outcome tendencies. To this end, they obtained behavioral and EEG data from ~200 participants performing a well-established two-step reinforcement learning task. The authors note that they "replicated prior work in findings that individual differences in compulsivity and intrusive thought ... were associated with reduced model-based planning" with analyses of accuracy. RT analyses revealed a novel effect of compulsivity on model-based planning, which was replicated using archival data of the same task. EEG findings indicated that the P300 and frontal midline theta, both well-established measures of cognitive control, were unrelated or not specifically related to model-based planning deficits in compulsivity, respectively. Posterior alpha power during the transition period, a novel marker, was linked with model-based planning deficits in compulsivity.

    1. Reviewer #4:

      The author investigated the relationship between spontaneous (SR), peak (PR), and steady-state (SS) firing rates in sensory neurons (across modalities) by extracting data from approximately ten studies published between 1928 and 2017. The relationship between SR, PR, and SS is surprisingly simple: SS = √(PR × SR). The author concludes that this is a universal law of sensory adaptation.
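      Written out, the proposed relation is the geometric mean of the peak and spontaneous rates. Taking logarithms makes explicit what it predicts: for a neuron whose spontaneous rate is fixed, log SS should rise with log PR with a slope of exactly 1/2, which is the quantity a formal test of the law would target.

```latex
\mathrm{SS} = \sqrt{\mathrm{PR}\times\mathrm{SR}}
\quad\Longleftrightarrow\quad
\log \mathrm{SS} = \tfrac{1}{2}\log \mathrm{PR} + \tfrac{1}{2}\log \mathrm{SR}
```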

      General Assessment: The claims of universality are not supported by the analysis.

      Major comments: The primary claim of a universal law of adaptation is based on a meta-analysis of fewer than 20 hand-picked papers. This is contrary to good scientific practice for meta-analyses. To truly assess the universality of the rule, the author should define a time period, a set of journals, and possibly other inclusion/exclusion criteria, and then study the relationship across all publications that meet these criteria. Without such a clearly defined approach, the reader cannot know whether the examples were cherry-picked.

      The comparisons of (extracted) experimental data and the model are entirely "by eye"; statistical analysis is lacking.

      Even within the chosen sample, universality is clearly a step too far, and the author's explanations of why the universal law fails are not particularly convincing ("the visual system is complex"). There may be something to the claim that the law only applies "in the absence of interaction from others cells in the neural circuitry", but this should be part of the study design (i.e., investigate only those papers that studied neurons that were isolated in this sense), not an ad hoc explanation of discrepancies.

      The manuscript only states the "universal law" but leaves an explanation to future work. This is unsatisfactory. Detailed neuronal models exist that explain adaptation (e.g. in terms of the opening of potassium channels). These alternative biophysical explanations need to be considered.

    2. Reviewer #3:

      This manuscript by Wong proposes that steady-state responses to a constant sensory stimulus (the responses observed after adaptation) are well predicted by a simple relationship between the spontaneous firing rate and the peak firing rate, namely their geometric mean. The author provides evidence extracted from measurements made in previously published studies, across species and modalities.

      The paper presents a simple and somewhat interesting observation. However, it is difficult to accept the claim and support publication for several reasons:

      1) The comparisons between the predicted and measured responses are entirely qualitative, and there is no alternative model considered. The predictions in Table 1 are pretty good but in many cases the arithmetic mean works reasonably as well (unless peak rates are very high). The steady state will lie somewhere between peak and spontaneous. Where is the quantitative evidence that the geometric mean is better than an alternative? What other relationships might better map the quantities on to each other?

      2) There is little context for the observation: if true, why should we care that Eqs 1 and 2 hold? The discussion hints that the observation is consistent with theoretical principles. If this were laid out in a compelling way, it would greatly increase the impact and relevance of the observation. As it stands, the observation has little context, and its implications are unclear.

      3) It is not clear how the studies considered here (i.e., where the data came from) were chosen. Surely there are many studies of sensory responses to constant stimuli. How did the author choose this small subset (~10 studies)? For a 'universal' law, one would want to see many studies considered. In addition, in the studies considered here, the values were extracted in an ad hoc manner.

      4) The discussion points out many cases in which the rule does not apply (whenever neurons are embedded in a circuit as opposed to being primary sensory neurons). This limits the appeal of the proposal, unless one can provide theory/explanation for why such a relationship should hold in the periphery but not in more central structures.

      5) Previous work has dispelled the notion of a steady-state response, arguing that responses continue to decrease with adaptation duration, following a power-law dependence (Drew and Abbott, 2006, J Neurophysiol 96: 826). If so, the rule proposed here is unlikely to hold across adaptation durations, again suggesting that it is not broadly applicable.
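      Point 1 can be made concrete with a small Python sketch that scores the geometric mean against the arithmetic mean as predictors of the steady-state rate, using the absolute log ratio between predicted and observed rates as the error. The function names and the choice of error metric are illustrative assumptions, and the sketch assumes strictly positive rates (a spontaneous rate of zero makes the geometric prediction zero and the log error undefined).

```python
import math


def geometric_prediction(peak, spont):
    """Steady-state rate predicted by the proposed law, SS = sqrt(PR * SR)."""
    return math.sqrt(peak * spont)


def arithmetic_prediction(peak, spont):
    """Alternative predictor: the arithmetic mean (PR + SR) / 2."""
    return (peak + spont) / 2.0


def log_ratio_error(predicted, observed):
    """Absolute error on a log scale, so over- and under-prediction are penalized equally."""
    return abs(math.log(predicted / observed))


def compare_predictors(records):
    """Mean log-ratio error of each predictor over (PR, SR, observed SS) triples.

    All rates must be strictly positive.
    """
    geo = [log_ratio_error(geometric_prediction(pr, sr), ss) for pr, sr, ss in records]
    ari = [log_ratio_error(arithmetic_prediction(pr, sr), ss) for pr, sr, ss in records]
    return sum(geo) / len(geo), sum(ari) / len(ari)
```

      For example, with PR = 100 and SR = 1 the two predictors give 10.0 and 50.5, whereas with PR = 20 and SR = 5 they give 10.0 and 12.5. The predictors separate mainly when the peak rate is large relative to the spontaneous rate, which is where a comparison on the published values would be most informative.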

    3. Reviewer #2:

      This paper proposes a universal law of adaptation that occurs during sustained sensory stimulation. The law states that the sustained response of sensory afferents equals the square root of the product of the spontaneous and transient, peak response. The author shows several examples of previously published results to support the claim, some dating back to the seminal studies by Adrian. The author states that the law can be derived from a theory of sensory processing but does not provide further information on this (he refers to a publication in preparation).

      This is interesting work and the paper is well-written. However, I am not convinced by this claim of a universal law of adaptation. First, it does not appear to be universal, and, second, the empirical data that are provided to support its universality are not convincing yet.

      1) The law is not universal: in his Discussion, the author lists exceptions to the rule in the visual system, the auditory system and even among somatosensory afferents. Explanations are given for why the law does not hold in some of these cases, but the exceptions show that the law is not universal. Even if it is not universal, the theory should be able to predict in which cases it holds and in which it does not.

      2) I am not convinced by the evidence presented in Figure 2. In several instances (e.g., panels b and c), the slope of the relationship between log peak and log sustained (steady-state) activity does not seem to equal the predicted 1/2. The author should have computed the slope and tested whether it was 1/2.
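      The test suggested in point 2 could look like the following minimal Python sketch, which fits log SS against log PR by ordinary least squares and computes a t statistic for the null hypothesis that the slope equals 1/2. It assumes that all points come from a single neuron or panel, so that SR is effectively constant; pooling neurons with different spontaneous rates would need a separate intercept per neuron. The function name is hypothetical.

```python
import numpy as np


def loglog_slope_test(peak, steady, null_slope=0.5):
    """OLS fit of log(SS) on log(PR).

    Returns (slope, standard error of the slope, t statistic for H0: slope == null_slope).
    The t statistic has n - 2 degrees of freedom.
    """
    x = np.log(np.asarray(peak, dtype=float))
    y = np.log(np.asarray(steady, dtype=float))
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 points to estimate a slope and its standard error")
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    residual_var = residuals @ residuals / (n - 2)  # unbiased residual variance
    se = np.sqrt(residual_var / np.sum((x - x.mean()) ** 2))
    t_stat = (slope - null_slope) / se
    return float(slope), float(se), float(t_stat)
```

      With SciPy available, a two-sided p-value is 2 * scipy.stats.t.sf(abs(t_stat), n - 2). Because the values in the manuscript were extracted visually from published figures, that extraction error would also need to be considered before interpreting such a test.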

    4. Reviewer #1:

      This study reports an interesting observation, namely that the firing rate after sensory adaptation appears to be equal to the geometric mean of the peak firing rate and the spontaneous firing rate. However, there are concerns about the theoretical motivation and general empirical evidence supporting this observation.

      1) Theoretical motivation: still unclear even after discussion, although we are told it exists. "The derivation of Eqs 1-2 of will be the subject of a later publication."

      2) Is this relationship supposed to hold for each stimulation or on average? The author seems to be only working with averages.

      It is not clear why exactly these (quite old) studies were selected. What were the criteria for including these studies in the meta-analysis? However, a number of exceptions are discussed later.

      The alternative, an in-depth analysis of existing datasets requested from other authors, was explicitly not pursued. Such an analysis could also address trial-wise validity.

      "Not only does adaptation show time-varying changes in firing rate, but the variability makes it difficult to know exactly which value to choose. Averaging the data is not feasible without extracting a large number of data points, and this was not possible from noisy images. As such, with the exception of two studies, ... a visual estimation of the average activity in the final portion of the adaptation curve was used."

      The error introduced by visual estimation remains unknown.

      3) Counter-example in one of the few easily accessible papers, in the ferret, reference 16 (https://pubmed.ncbi.nlm.nih.gov/22694786/#&gid=article-figures&pid=fig-6-uid-5 ). Another counterexample appears in Fig 8 of this randomly chosen paper, although maybe the mechanoreceptor of the cricket doesn't count due to some exclusion criterion, since it is an interneuron. (https://journals.physiology.org/doi/full/10.1152/jn.1997.77.1.207?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed)

      4) The law does not hold in a number of exceptional cases, but this is not stated until the Discussion (it is missing from the abstract).

      5) The Discussion ignores the existing literature on the information content and possible function of the sustained response, and of adaptation in general (e.g. gain control).

      6) Introduction: recent reviews on this topic are not cited. For example, the passage "has been repeated many times.... More modern methodologies..." cites nothing after the 1970s.

      7) At which time point is the relationship supposed to hold? What happens when the stimulation time becomes very long? Does the firing rate reach a steady state in all of these studies?

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    1. Reviewer #3:

      This study provides very informative trends regarding long-term (~4-month) recording with Neuropixels probes in chronically implanted, freely moving rats. This is accomplished by recording across many animals (n = 18) and many recording locations and analyzing the number of single (and multi) units that can be automatically isolated as a function of time since implant, recording location, and other features (e.g. shank orientation). The authors perform these experiments with a modular system that allows the implanting of multiple probes simultaneously in a single rat (here they mostly implanted 1 probe, sometimes 2, once 3) and that allows the removal of probes for re-use in another animal, both of which are also valuable contributions. The analysis of neuron yield is framed in terms of a model consisting of the sum of two decaying exponentials (an initial fast decay of one subpopulation of neurons followed by a slower decay of the remainder), which the authors fit to find the primary features determining neuron yield. The major trends they report include substantially better yield over time in regions anterior to bregma and ventral to the most dorsal 2 mm of the brain surface. They also show that re-used probes perform essentially as well as new probes in terms of noise and unit quality (e.g. average unit amplitude), and also neuron yield (at least for medial frontal cortex, but see below).
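      One common way to write such a model is shown below. This is only a sketch: the symbols, and the use of time constants rather than decay rates, are assumptions and may not match the authors' parameterization.

      $$ N(t) = N_0 \left[ \alpha \, e^{-t/\tau_{\mathrm{fast}}} + (1 - \alpha) \, e^{-t/\tau_{\mathrm{slow}}} \right] $$

      Here N(t) is the number of units isolated t days after implant and N_0 is the initial yield. α is the fraction of units in the fast-decaying subpopulation, and τ_fast < τ_slow are the two decay time constants. In this form, a location-independent α corresponds to the "fixed fraction of fast-decaying units" discussed in point 4.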

      Major comments:

      The results averaged over many animals and the model are good for extracting the major trends, but there are hints of significant and important variability across animals or probes. Points 1-3 concern this variability, its potential sources, and how to display it whether or not its causes can be identified.

      1) The results shown in Figure 2, especially the averages in Figure 2H,K, indicate severe losses in unit yield over time for probes implanted posterior to bregma and for electrode sites in the dorsal 2 mm of the rat brain. However, Figure 2B shows at least one animal (open circles) for which high neuron yield was obtained in motor cortex and dorsal striatum for at least 4 months. First, is this from 1 or 2 probes? Either way, the stability of the recording over time from day 1 is much greater than in the other animals, and much better than what is expected from Figure 2H. Is the preservation of units over time in this animal due to the stability of units in dorsal striatum (presumably mostly >2 mm below the surface), or also in motor cortex?

      2) There are additional potential causes that could also account for yield differences, one of which is the age of the animal at the time of implant. The authors should list age at implant in their table in Figure 4-supplement 1. The authors should display yield over time as a function of age at implant, and also try adding age as one of the regressors in their model of neuron yield.

      3) Another potential cause is whether the probe is new or re-used. The authors showed that probe re-use did not result in statistically different yield for the medial prefrontal cortex. But is this also true for the other brain regions? Does the data in Figure 2 include implants of both new and re-used probes, or only new probes? The authors should try adding probe status (new or re-used) as a regressor in their neuron yield model.

      Regarding points 1-3, whether or not it is possible to add age or probe newness as regressors in the model, the authors should create a supplementary figure that shows the single unit yield curves, as in Figure 2A-C, for all probes in all animals: one panel per major brain region (e.g. splitting motor cortex from dorsal striatum from ventral striatum), with one curve per probe. Each panel should have a legend that gives the (AP,ML,DV) coordinates of the approximate midpoint of the probe's location within that brain region. The legend should also indicate for each probe/curve: the animal, age at time of implant, probe newness, probe tip depth, estimated number of electrodes recorded from in that region, and shank orientation. This will repeat some information that is in the tables; however, seeing all of it together would be very valuable for readers, especially experimenters who may want to record from some of the more posterior and dorsal areas. Readers could then gauge the variance in yield over time across implant attempts, and see whether, say, 1 of 3 attempts to implant in a given area may give very good long-term yield.

      4) It is stated starting on Line 172 that "The relative number of units corresponding to the fast- and slowly-decaying subpopulations did not significantly vary across brain regions along either anatomical axis, nor did the rate of decay of the fast population (Figure 2--supplement 3). This suggests that the rapid decline in yield observed in the days after surgery may be due to a process that is relatively uniform across brain regions."

      The support for this statement can be seen in the indicated Figure 2-supplement 3. On the other hand, the point is made (and shown in Figure 2-supplement 4) that there is no loss of units in mPFC over time. This is apparently at odds with the Line 172 statement and model assumption of a fixed fraction of fast-decaying units. Was a model tried in which alpha varies with location? If the Line 172 statement is ultimately kept, there should at least be a comment made there that the most anterior, ventral regions appear to differ from the model's assumption/interpretation.

    2. Reviewer #2:

      In this paper, the authors report a device that can be used to implant and later explant Neuropixels probes in freely moving rats. The device consists of an adaptor, an internal holder and an external chassis. The chassis protects the probe and is attached to the animal's head with adhesive cement and acrylic. The internal part can be explanted at the end of the experiment, allowing the NP probe to be re-used.

      The work builds on existing technology in important ways: the authors examined the long-term yield across different brain regions, they more extensively assessed the feasibility of probe reuse compared to previous work, and they evaluated probe performance over a long period of time and also after explantation (measuring the input referred noise of explanted probes in saline). It was also impressive that they used a cohort of 18 rats to evaluate the performance of both the animals and the probe, and that they were able to implant up to 3 NP probes at a time. Because of the importance of using freely moving animals in neuroscience research, and the differences between rats and mice that necessitate modifications of existing technology, this paper is timely and likely to be very useful to a sizeable group of researchers. My suggestions are aimed at furthering the usefulness of this "Tools and Resources" paper for investigators who wish to use this important technology.

      At the moment, the majority of the paper seems aimed at evaluating the performance of the device as a function of time, depth and location. This performance evaluation was useful, very carefully done, and makes important points that aid in the interpretation of other papers (such as the unusual stability of recordings in mPFC reported in previous papers). Nonetheless, readers are likely interested in the paper because they wish to make and implant the device in order to benefit from the scholarly analysis done here. The manuscript does contain very helpful technical details, but these are hard to find and are not front-and-center in the main text. For instance, the material from the "Neuropixels implant procedure" is really helpful and would be critical for anyone who wants to use this technique. But at the moment, that information is in a google doc linked from the associated GitHub, a long way from the main manuscript. This information should be in the main manuscript, either in Results or Methods. Consistent nomenclature across documents would also help a lot. I believe the part referred to as the "chassis" in the main text is referred to as the "external" in the google doc with the instructions. Similarly, the part referred to as the "internal" in the google doc is called an "internal holder" in the manuscript.

      A reader hoping to use the device might also benefit from more information on the grounding procedure. The text in the "Implantation" section of the Methods was helpful, but more information would be useful, such as where on the probe the ground wire should be connected and how one should fix the grounding wire (taping the wire and covering it with Metabond?). It would also be helpful to know how one should protect the grounding wire from being touched by the animal. Figure 6 in the google doc protocol is really helpful and should definitely be in the main manuscript. An additional figure showing how to connect the wire to the ground during the surgery would be quite useful. Finally, are the craniotomy and durotomy necessary for grounding? Could one simply connect the grounding wire to a couple of screws on the skull?

    3. Reviewer #1:

      This manuscript presents new techniques for obtaining chronic recordings using multiple Neuropixels probes in rats. The resources, I imagine, will be of high value to the neuroscience community at large. The authors also address short- and long-term unit stability, probe recovery and the impact of the probe on behavior. I have only a few minor comments.

      I understand the authors' rationale for avoiding manual curation, but there have been reports of inconsistencies in the identification of units across different sorters. Did the authors consider comparing their Kilosort unit identification with manual curation or another sorting software?

      The authors speculate in the Discussion about the possible reasons for the slow loss of units. It was not quite clear to me, however, what types of changes might reduce this loss.

      Figure 2 presents perhaps the most informative findings, but I wonder how applicable they will be to future probe iterations. Do the authors have a hypothesis for which features of the probe might contribute (or not contribute) to the long-term loss of units?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Lisa Giocomo (Stanford University School of Medicine) served as the Reviewing Editor.

      Summary:

      This manuscript presents new techniques for obtaining chronic recordings using multiple Neuropixels probes in rats. The study provides very informative trends regarding long-term (~4-month) recording with Neuropixels probes in chronically implanted, freely moving rats. This is accomplished by recording across many animals (n = 18) and many recording locations and analyzing the number of single (and multi) units that can be automatically isolated as a function of time since implant, recording location, and other features (e.g. shank orientation). The authors perform these experiments with a modular system that allows the implanting of multiple probes simultaneously in a single rat (here they mostly implanted 1 probe, sometimes 2, once 3) and that allows the removal of probes for re-use in another animal, both of which are also valuable contributions. The work builds on existing technology in important ways: the authors examined the long-term yield across different brain regions, they more extensively assessed the feasibility of probe reuse compared to previous work, and they evaluated probe performance over a long period of time and also after explantation (measuring the input referred noise of explanted probes in saline). Because of the importance of using freely moving animals in neuroscience research, and the differences between rats and mice that necessitate modifications of existing technology, this paper is timely and likely to be very useful to a sizable group of researchers.

    1. Author Response

      Reviewer #1

      This paper investigates the role of Lhx6 and other transcription factors in the development of GABAergic neurons in the hypothalamus. The authors report that a small fraction of hypothalamic GABAergic neurons express Lhx6 and depend on this expression for their survival. Dlx1/2, Nkx2-1 and Nkx2-2 define 5 subpopulations, and at least three of these populations depend on these TFs to maintain Lhx6 expression. A strength of the paper is the multimodal analysis and the fact that descriptive assays like RNAseq and ATACseq are followed up with specific knockouts of candidate transcription factors. However, the relationships between the developmental populations identified and adult subtypes of hypothalamic neurons remain unclear. Although the results will surely interest those already interested in hypothalamic development, it is not clear that broader developmental or functional principles have been identified. The authors make much of the fact that the identified populations do not resemble forebrain interneurons defined by Lhx6 expression, but it is not clear why this should have been expected. Many developmental transcription factors are utilized both across diverse brain regions and across tissues outside of the brain. Perhaps the emphasis on this point could be tempered.

      We thank the Reviewer for his/her comments, although we respectfully but strongly disagree with the statement that “it is not clear that broader developmental or functional principles have been identified”. This manuscript aims to provide a broad, though by no means exhaustive, overview of the molecular mechanisms controlling the development of hypothalamic neurons that express Lhx6. Although these neurons comprise only approximately 2% of all hypothalamic GABAergic neurons, they are highly heterogeneous at the molecular level. Using traditional methods such as histology and more recent methods such as scRNA-Seq, we have not found a selective marker of hypothalamic Lhx6+ neurons other than Lhx6 itself. However, we have found multiple spatially distinct domains in hypothalamic Lhx6+ neurons that express specific sets of transcription factors such as Dlx1/2, Nkx2-1, and Nkx2-2, as we and others have previously observed in developing hypothalamic nuclei.

      In addition, a subpopulation of these neurons later gives rise to a subset of Lhx6+ neurons of the zona incerta, which we have previously shown to promote sleep. Unlike most previously described sleep-promoting neurons, Lhx6+ zona incerta neurons are one of only a few neuronal subtypes that can regulate both REM and NREM sleep, which likely reflects molecular and functional heterogeneity among these neurons.

      Our manuscript thus speaks to both broader developmental principles, by demonstrating the molecular heterogeneity of hypothalamic Lhx6+ cells that arises through the action of diverse transcriptional networks, and broader functional principles, by identifying developmental networks that potentially control the specification, differentiation, and survival of sleep-promoting neurons.

      We believe that there are several compelling reasons for including a direct comparison of hypothalamic and cortical Lhx6 neurons, which arise from different regions of the forebrain (or secondary prosencephalon, if using the prosomere model). First, the role of Lhx6 in the development of telencephalic interneurons has been extensively studied, with 72 publications (PubMed: Lhx6 AND development AND (cortex OR telencephalon OR interneuron), accessed 7/27/20), and virtually all our understanding of how Lhx6 controls neuronal development has been acquired from this work. It is thus critically important that we directly connect our findings to this prior understanding of the mechanism of action of Lhx6.

      Second, current work in the field of developmental neuroscience in general is heavily focused on telencephalic development. It is very much an open question, however, whether telencephalic structures are themselves particularly good models for studying the development of physiologically vital brain regions, such as the hypothalamus. By identifying many key differences in the function of this extensively studied gene between Lhx6+ MGE-derived neural precursors and hypothalamic Lhx6+ neurons, we establish some important caveats in generalizing studies of telencephalic development even to nearby forebrain structures.

      Nonetheless, we certainly agree with the Reviewer that the organization and clarity of the manuscript can be substantially improved. To this end, we have revised the manuscript carefully to improve clarity, focusing on its key findings.

      The presentation of the manuscript could be improved by clarifying the relationships between embryonic and more mature structures within the hypothalamus. For example, it is extremely hard to follow the evidence split across figures 5, S6 and S7 for parsing the cell groups by TF expression.

      We have revised the manuscript carefully to improve clarity. We have moved the scRNA-Seq analysis of postnatal Lhx6-expressing neurons to Fig. 3, and that of embryonic Lhx6-expressing neurons to Fig. 4, to improve the overall flow of the manuscript.

      The ATAC seems to be used only to bolster the impression that the populations identified by gene expression are different. The description of footprinting seems to imply an effort to analyze binding sites for specific factors (e.g. to identify targets of the TFs studied), but the statistical approach employed and even the conclusions reached are not fully spelled out. As such, this part of the study is underdeveloped or not well enough described.

      Specific details of the ATAC-Seq analysis are extensively described in the Methods section, with each bioinformatics package (and package version) listed and, when non-default parameters were used, the parameters clearly stated. However, we have added details of the statistical approaches used for data analysis to the revised manuscript.

      There is little use in conducting ATAC-Seq analysis without a matched RNA-Seq dataset, as changes in peaks (open chromatin regions) do not necessarily correlate with changes in gene expression levels. By integrating ATAC-Seq data with differential gene expression obtained using RNA-Seq, we have been able to identify changes in motif accessibility and candidate transcription factor footprinting, and thereby identify changes in the gene regulatory networks that control Lhx6 expression in both hypothalamus and cortex. We have revised the manuscript to make this clearer, and to better explain the findings of this part of the study.

      Reviewer #2:

      Kim and colleagues used a combination of state-of-the-art sequencing and mouse genetic tools to study the mechanisms that control the development of a subset of GABAergic neurons in the developing hypothalamus.

      While neurodevelopment of GABAergic neurons has been extensively studied in the developing telencephalon, little is known about their counterparts in the developing hypothalamus. The authors focused their work on a specific subset of GABAergic neurons that express the LIM homeodomain factor Lhx6. Lhx6 is a master regulator of GABAergic neuron differentiation, specification, and migration in cortical interneurons. In contrast, Lhx6-expressing neurons make up only 2-3% of GABAergic neurons in the hypothalamus. The authors' previous work demonstrated that these neurons play a critical role in sleep homeostasis. Therefore, understanding how these neurons are formed and maintained is of great importance.

      The authors show that hypothalamic Lhx6 is necessary for neuronal differentiation and survival. Furthermore, by profiling and comparing multiple RNA-seq, scRNA-seq, and ATAC-seq datasets, they were able to identify three transcription factors, Nkx2.1, Nkx2.2, and Dlx1/2, that each delineate non-overlapping subdomains of Lhx6 neurons and are necessary for Lhx6 expression in the hypothalamus. Finally, the authors demonstrate that mature Lhx6 neurons manifest extensive molecular heterogeneity that is distinct from their counterparts in the telencephalon.

      We thank the Reviewer for his/her comments, and for appreciating the key findings of the manuscript.

      The work presented is of high quality and is a technological tour de force. The scope and depth of the study are unparalleled among similar studies of hypothalamic neurodevelopment. That said, I have only a couple of minor suggestions.

      1) In Figure S2, the number of tomato+ cells appears to be reduced, but not eliminated. Do the authors think that Lhx6 is necessary for the survival of all Lhx6 neurons, or just a subset? The use of the floxed Bax allele is clever, but is there evidence directly supporting increased cell death? Can the authors completely rule out the possibility of mismigration of cell bodies after the postnatal deletion of Lhx6?

      We thank the Reviewer for his/her comments. We conclude that Lhx6 is necessary for the survival of all Lhx6 neurons due to the lack of read-through transcription in Lhx6-CreER/CreER mice (Fig 2), and the rescue of Lhx6-deficient mice that is seen using conditional Bax mutants (Fig. 2). The fact that the numbers of cells labeled with Lhx6-CreER are rescued by the deletion of this key positive regulator of apoptosis strongly implies that Lhx6-deficient neurons simply die. Finally, we observe very few Lhx6-expressing hypothalamic neurons that undergo even short-range tangential migration (Fig. 1), and observe no evidence for an increase in these cells in the analysis described in Fig. 2.

      The fact that postnatal loss of function of Lhx6 leads to a more modest cell loss than the constitutive mutant may simply reflect a reduced overall requirement for Lhx6 in regulating neuronal survival in the postnatal hypothalamus or may indicate that the survival of a specific subset of Lhx6+ neurons is no longer Lhx6-dependent at this age. We cannot currently distinguish between these alternatives, and state this fact in the text.

      2) In Figure 4, the authors acknowledged that the ectopic gene expression in Lhx6CreER/lox; Baxlox/lox mice could be due to the loss of function of Bax. If so, would Lhx6CreER/+; Baxlox/lox mice be a better control in this experiment?

      We initially considered using Lhx6-CreER/+;Baxlox/lox as a control, since our phenotype could be due to loss of Bax itself rather than to changes in cell survival. However, we observed the same rescue phenotype in initial experiments using Lhx6-CreER/Bak-null (#006329), which strengthened our initial hypothesis. We now discuss potential limitations that may result from the fact that RNA-Seq data from Lhx6CreER/+;Baxlox/lox mice are not included in this study.

      Reviewer #3:

      Kim et al. aimed to characterize the similarities and differences between the development and molecular identity of telencephalic versus hypothalamic (HT) Lhx6+ GABAergic neurons. By analyzing a diverse repertoire of transgenic mice at different developmental stages and through the use of fate mapping, bulk and single cell sequencing approaches, ISH and immunostaining, the authors descriptively compare transcriptional networks and upstream regulators of LHX6. They found essential differences between LHX6-dependent networks and those in telencephalic neurons, and suggest a role for LHX6 in regulating the survival, rather than the migration, of HT neurons. Moreover, spatially distinct LHX6+ HT cell clusters were identified and transcriptionally profiled.

      1) Only 1-2% of the GABAergic neurons express LHX6, and the cells expressing LHX6 in the HT were found to be very diverse. Apart from a putative role for LHX6 in promoting the survival of HT neurons, which in my opinion is not analyzed convincingly, nothing functional was revealed. For this reason, I do not judge the potential significance and influence of the findings to be broad or fundamental.

      We respectfully but strongly disagree with this conclusion, for reasons that have already been described at length in our response to Reviewer #1. In brief, hypothalamic Lhx6+ neurons are key regulators of sleep initiation and maintenance, and nothing is known about their development. In much the same way that studies of the development of Lhx6+ cortical interneurons potentially help inform our understanding of neurodevelopmental disorders such as autism, so too may an understanding of the development of hypothalamic Lhx6+ neurons improve our understanding of sleep disorders and their treatment. In this study, we characterize the fate of hypothalamic Lhx6+ neurons, identify transcriptional regulatory networks that control their patterning and survival, and characterize their molecular heterogeneity in the postnatal period. We identify the homeodomain factor Nkx2.2 both as a key regulator of the regional patterning of hypothalamic Lhx6 neurons and as a marker of a substantial subset of Lhx6+ ZI neurons that are activated by sleep pressure. This represents the groundwork needed for a basic understanding of the development of this physiologically important cell type, and forms the basis of more detailed future studies.

      Unless the Reviewer simply believes that studies of hypothalamic development are inherently uninteresting and of little significance, these comments simply do not seem to reflect a careful reading of the manuscript, and come across as vague and unconstructive. In future reviews, we urge the Reviewer to be more specific, and to offer concrete and constructive comments, to support sweeping statements of this sort.

      2) The manuscript could be better focused, and more coherent. The authors jump between different aspects of the story. First, the authors address a potential role of LHX6 in survival regulation in HT interneurons, and try to identify potential LHX6 target genes mediating this effect. The latter was neither analyzed convincingly nor validated. Then the authors switch to the comparative analysis of transcriptional networks in cortical versus hypothalamic LHX6+ interneurons, and the identification of different clusters of LHX6+ HT cells. Next, potential upstream regulators of LHX6 in HT neurons are addressed by fate mapping studies. Then the authors again switch focus, analyze distinct anatomical regions covered by Lhx6+ neurons by single cell RNA seq, and investigate an instructive role of Nkx2-1, Nkx2-2 and Dlx1/2 in the establishment of these hypothalamic regions.

      Subheadings in the result section might be very useful. However, the focus of this study requires clarification and also respective consideration in the introduction.

      As stated in our response to Reviewer #1, we have sought to conduct a broad characterization of the development and diversity of hypothalamic Lhx6+ neurons, a subset of which are important regulators of sleep. While we cover multiple aspects of this question, we strongly disagree that the manuscript “lacks focus”. However, we do agree that organization and clarity could be improved. To this end, we have incorporated subheadings into the Results section, and clearly outlined the experiments conducted and the reasons why each was conducted.

      3) The authors use a variety of different reporter and loss of function mouse models and jump between developmental stages for analysis. Apart from being confusing, the experimental/analytical pipeline is not sufficiently rigorous with respect to age and genetic background. For example, to analyze target genes of LHX6 through which the effect on cell survival could be mediated, the authors compared expression profiles from P10 Lhx6CreER/+;Ai9 neurons with hypothalamic and cortical Lhx6-GFP positive and negative cells from P8 mice. Hypothalamic enriched genes were then compared to single-cell RNA-Sequencing (scRNA-Seq) datasets of E15.5 and P8 hypothalamic Lhx6-expressing neurons. Transcriptional profiles change tremendously as development progresses, and different mouse lines were used, which were not all time-matched. This might have caused Lhx6-independent variation, which likely masks relevant genes. This could explain why so few LHX6 target genes were identified through which LHX6 putatively acts on neuronal survival.

      This is another instance where the Reviewer seems to have failed to appreciate the rationale for the work presented here. We have modified the text to make this clearer. In summary, while it is certainly true that gene expression patterns are dynamic during development, cells of common origin and/or function also typically show core patterns of gene expression that are maintained across multiple stages of development. Our findings suggest that the constitutive loss of function seen in Lhx6CreER/Lhx6CreER mice leads to a complete loss of hypothalamic Lhx6+ cells (Fig. 2), while postnatal loss of function leads to a partial loss of Lhx6+ cells (Fig. 2). This suggests that Lhx6 may control the expression of similar target genes in both embryonic and postnatal hypothalamus to promote neuronal survival. In addition, since Lhx6 clearly is not required for the survival of telencephalic neurons, we predict that Lhx6 will regulate the expression of specific sets of genes that promote neuronal survival in both embryonic and postnatal hypothalamus, but not in telencephalon.

      In Figure 4, we therefore identify candidates for these prosurvival genes both by comparing gene expression profiles between embryonic (E15) and postnatal (P8) hypothalamic and cortical Lhx6+ cells and by directly comparing the gene expression profiles of P10 control Lhx6-CreER;Ai9 and Lhx6-deficient but viable Lhx6CreER/Lhx6lox;Baxlox/lox;Ai9 mice. These were analyzed at P10 rather than P8 because of the need to ensure efficient disruption of the conditional alleles of Lhx6 and Bax, and induction of sufficient levels of tdTom to allow for efficient cell isolation, following daily 4-OHT administration between P1 and P5. While this might cause us to miss the small number of Lhx6-regulated genes that are differentially expressed between P8 and P10, we believe that this approach will identify the great majority of Lhx6-dependent genes that promote neuronal survival. Any readers who wish to delve further into this dataset, and identify additional genes we may have missed in this initial screen, can do so using the data in Table S1.

      We are frankly puzzled by the Reviewer’s statement that we “identified so few Lhx6 target genes”, when we clearly state in Figure S2 that over 2,000 differentially expressed genes were observed between control and Lhx6/Bax-deficient hypothalamic neurons. A major reason why data were incorporated from the E15 and P8 datasets was to better select strong candidate regulators of neuronal survival from this very long list of genes.

      4) The proposed survival regulatory function of LHX6 in HT interneurons represents the main functional finding of this study, which however was not analyzed in great detail. Likewise, the analysis of LHX6 target genes that mediate the survival regulating function was not very successful, identifying only the ERBB4 receptor and other genes related to the neurotrophic neuregulin pathway. Of note, the authors proposed a clear difference of LHX6-associated transcriptional networks and LHX6 function in telencephalic versus HT neurons (migration versus survival). However, THE identified target gene of LHX6 suggested to regulate survival in HT neurons was Erbb4. Erbb4 is likewise expressed in telencephalic neurons, here being involved in migration regulation. Studies that confirm Erbb4 function in survival regulation in HT neurons are lacking. By applying a more coherent analysis, comparing transcriptional profiles of Lhx6 KO and WT cells of the same age, better candidates might be identified. For this, the time window of the LHX6-dependent survival regulation needs to be identified.

      This is exactly the point we were trying to make here. Lhx6 is strongly expressed in a large subset of progenitors and precursors of GABAergic neurons in the telencephalon, and in a much smaller subset of GABAergic neuronal precursors in the hypothalamus. [...] has different functions in telencephalic and hypothalamic populations, yet is strongly expressed in both populations.

      Quoting Reviewer #1: “Many developmental transcription factors are utilized both across diverse brain regions and across tissues outside of the brain”. Erbb4 has been shown to regulate tangential migration of cortical interneurons, but also to promote neuronal survival in other cell types. Since hypothalamic Lhx6+ neurons do not undergo long-range tangential migration, we conclude that the function of Erbb4 in hypothalamic Lhx6+ neurons is likely related to promoting survival rather than controlling migration. It is certainly possible, however, that Erbb4 also contributes to the regulation of short-range tangential migration of Lhx6-expressing neuronal precursors, such as the likely migration of Nkx2.2-expressing cells from the hinge to the ZI. We have revised the text to make this point clearer. We certainly believe that further functional studies of these genes are worthwhile and compelling, but they are beyond the scope of this study.

      5) With respect to the survival analysis, the analysis of Lhx6CreER/lox;Baxlox/lox;Ai9 mice although elegant, should be supplemented with other data, eg caspase and/or TUNEL labeling to support this main conclusion.

      TUNEL and Caspase-3 staining are each detectable for only a relatively brief period during apoptosis, and neither is a highly sensitive tool for detecting neuronal death. We were unable to observe changes in staining with either marker between P5 and P10 following postnatal loss of function of Lhx6 (Fig. 2). This is now mentioned in the text. We used Bax mutants, in which apoptosis is blocked altogether, in this analysis with the aim of maximizing our ability to detect Lhx6-dependent regulation of neuronal survival.

    2. Reviewer #3:

      Kim et al. aimed to characterize the similarities and differences between the development and molecular identity of telencephalic versus hypothalamic (HT) Lhx6+ GABAergic neurons. By analyzing a diverse repertoire of transgenic mice at different developmental stages and through the use of fate mapping, bulk and single-cell sequencing approaches, ISH and immunostaining, the authors descriptively compare transcriptional networks and upstream regulators of LHX6. They found essential differences between LHX6-dependent networks in HT neurons and those in telencephalic neurons, and suggest a role for LHX6 in regulating survival rather than migration in HT neurons. Moreover, spatially distinct LHX6+ HT cell clusters were identified and transcriptionally profiled.

      1) Only 1-2% of the GABAergic neurons express LHX6, and the cells expressing LHX6 in the HT were identified to be very diverse. Apart from a putative role for LHX6 in promoting the survival of HT neurons, which in my opinion is not analyzed convincingly, nothing functional was revealed. For this reason, I do not judge the potential significance and influence of the findings to be broad or fundamental.

      2) The manuscript could be better focused, and more coherent. The authors jump between different aspects of the story. First, the authors address a potential role of LHX6 in survival regulation in HT interneurons, and try to identify potential LHX6 target genes mediating this effect. The latter was neither analyzed convincingly nor validated. Then the authors switch to the comparative analysis of transcriptional networks in cortical versus hypothalamic LHX6+ interneurons, and the identification of different clusters of LHX6+ HT cells. Next, potential upstream regulators of LHX6 in HT neurons were addressed by fate mapping studies. Then, the authors again switch focus, and analyzed distinct anatomical regions covered by Lhx6+ neurons by single cell RNA seq and investigated an instructive role of Nkx2-1, Nkx2-2 and Dlx1/2 in the establishment of these hypothalamic regions.

      Subheadings in the results section might be very useful. However, the focus of this study requires clarification and also respective consideration in the introduction.

      3) The authors use a variety of different reporter and loss of function mouse models and jump between developmental stages for analysis. Apart from being confusing, the experimental/analytical pipeline is not sufficiently rigorous with respect to age and genetic background. E.g. to analyze target genes of LHX6 through which the effect on cell survival could be mediated, the authors compared expression profiles from P10 Lhx6CreER/+;Ai9 neurons with hypothalamic and cortical Lhx6-GFP positive and negative cells from P8 mice. Hypothalamic enriched genes were then compared to single-cell RNA-Sequencing (scRNA-Seq) datasets of E15.5 and P8 hypothalamic Lhx6-expressing neurons. Transcriptional profiles tremendously change with progressing development, and different mouse lines were used, which were not all time-matched. This might have caused Lhx6-independent variation, which likely masks relevant genes. This could be an explanation why so few LHX6 target genes were identified through which LHX6 putatively acts on neuronal survival.

      4) The proposed survival regulatory function of LHX6 in HT interneurons represents the main functional finding of this study, which however was not analyzed in great detail. Likewise, the analysis of LHX6 target genes that mediate the survival regulating function was not very successful, identifying only the ERBB4 receptor and other genes related to the neurotrophic neuregulin pathway. Of note, the authors proposed a clear difference of LHX6-associated transcriptional networks and LHX6 function in telencephalic versus HT neurons (migration versus survival). However, THE identified target gene of LHX6 suggested to regulate survival in HT neurons was Erbb4. Erbb4 is likewise expressed in telencephalic neurons, here being involved in migration regulation. Studies that confirm Erbb4 function in survival regulation in HT neurons are lacking. By applying a more coherent analysis, comparing transcriptional profiles of Lhx6 KO and WT cells of the same age, better candidates might be identified. For this, the time window of the LHX6-dependent survival regulation needs to be identified.

      5) With respect to the survival analysis, the analysis of Lhx6CreER/lox;Baxlox/lox;Ai9 mice although elegant, should be supplemented with other data, eg caspase and/or TUNEL labeling to support this main conclusion.

    3. Reviewer #2:

      Kim and colleagues used a combination of state-of-art sequencing and mouse genetic tools to study the mechanisms that control the development of a subset of GABAergic neurons in the developing hypothalamus.

      While neurodevelopment of GABAergic neurons has been extensively studied in the developing telencephalon, little is known about their counterparts in the developing hypothalamus. The authors focused their work on a specific subset of GABAergic neurons that express the LIM homeodomain factor Lhx6. Lhx6 is a master regulator of GABAergic neuron differentiation, specification, and migration in cortical interneurons. In contrast, Lhx6-expressing neurons make up only 2-3% of GABAergic neurons in the hypothalamus. The authors' previous work demonstrated that these neurons play a critical role in sleep homeostasis. Therefore, understanding how these neurons are formed and maintained is of great importance.

      The authors show that hypothalamic Lhx6 is necessary for neuronal differentiation and survival. Furthermore, by profiling and comparing multiple RNA-seq, scRNA-seq, and ATAC-seq datasets, they were able to identify three transcription factors, Nkx2.1, Nkx2.2, and Dlx1/2, that each delineate non-overlapping subdomains of Lhx6 neurons and are necessary for Lhx6 expression in the hypothalamus. Finally, the authors demonstrate that mature Lhx6 neurons manifest extensive molecular heterogeneity that is distinct from their counterparts in the telencephalon.

      The work presented is of high quality and is a technological tour de force. The scope and depth of the study are unparalleled among similar studies of hypothalamic neurodevelopment. That said, I have only a couple of minor suggestions.

      1) In Figure S2, the number of tomato+ cells appears to be reduced, but not eliminated. Do the authors think that Lhx6 is necessary for the survival of all Lhx6 neurons, or just a subset? The use of the floxed Bax allele is clever, but is there evidence directly supporting increased cell death? Can the authors completely rule out the possibility of the mismigration of cell bodies after the postnatal deletion of Lhx6?

      2) In Figure 4, the authors acknowledged that the ectopic gene expression in Lhx6CreER/lox; Baxlox/lox mice could be due to the loss of function of Bax. If so, would Lhx6CreER/+; Baxlox/lox mice be a better control in this experiment?

    4. Reviewer #1:

      This paper investigates the role of Lhx6 and other transcription factors in the development of GABAergic neurons in the hypothalamus. The authors report that a small fraction of hypothalamic GABAergic neurons express Lhx6 and further depend on this expression for their survival. Dlx1/2, Nkx2-1 and Nkx2-2 define 5 subpopulations, and at least three of these populations depend on these TFs to maintain Lhx6 expression. A strength of the paper is the multimodal analysis and the fact that descriptive assays like RNAseq and ATACseq are followed up with specific knockouts of candidate transcription factors. However, the relationships between the developmental populations identified and adult subtypes of hypothalamic neurons remain unclear. Although the results will surely interest those already interested in hypothalamic development, it is not clear that broader developmental or functional principles have been identified. The authors make much of the fact that the identified populations do not resemble forebrain interneurons defined by Lhx6 expression, but it is not clear why this should have been expected. Many developmental transcription factors are utilized both across diverse brain regions and across tissues outside of the brain. Perhaps the emphasis of this point could be tempered.

      The presentation of the manuscript could be improved by clarifying the relationships between embryonic and more mature structures within the hypothalamus. For example, it is extremely hard to follow the evidence split across figures 5, S6 and S7 for parsing the cell groups by TF expression.

      The ATAC-seq data seem to be used only to bolster the impression that the populations identified by gene expression are different. The description of footprinting seems to imply an effort to analyze binding sites for specific factors (e.g. to identify targets of the TFs studied), but the statistical approach employed and even the conclusions reached are not fully spelled out. As such, this part of the study is underdeveloped or not well enough described.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the three reviewers for providing insightful critiques on our manuscript.

      Changes to the document and our comments are marked, e.g., “Reply 1.1” (referring to Reviewer #1, item #1), as described below.

      Reviewer #1

      I found this study to be very convincing. Prior studies are referenced appropriately, the text is well written and clear, and the figures are clear also. In my opinion the paper does not need further experiments.

      [1.1] The conclusions are well supported by the data. However, the concatenation model seems very speculative at this point. Also, it does not take into account the dynamics of these molecules.

      Reply 1.1: The concatenation model combines the structural data from our manuscript with prior biochemical insights into tetraspanin homodimerization and with scanning-EM data on immunogold-labeled CD81 and CD9 on cells. It is not completely clear to us what reviewer #1 refers to with “the dynamics of these molecules”. The cryo-EM data revealed that CD9 - EWI-F is a dynamic complex with straight and bent conformations, which could account for both circular and linear arrangements of tetraspanin-microdomains in cell membranes through the higher-order oligomerization of stable CD9 - EWI-F tetramers. Moreover, transient CD9 - CD9 interactions likely yield a variable number of complexes present in these concatenated and flexible strings of complexes. Such a concatenation model indeed requires further validation. However, it is consistent with experimental data and, importantly, provides a long-awaited molecular basis for TEM assembly. Although it was not within the scope of the current study, it will be of great interest to further investigate the concatenation model through detailed cell-biology based approaches.

      **Minor comment:**

      [1.2] There seems to be a mix-up between the two structures in the following sentence p4: "In CD9EC2 - 4C8, the D loop adopts a partially helical conformation and central residue F176 is sandwiched by 4E8 residues W59 of CDR2 and W102 and R105 of CDR3 (Fig. 1D). In the 4C8-bound CD9EC2 structure the tip of the D loop points more outward and the Cα atom of F176"

      Reply 1.2: The first sentence indeed mixed up the two structures and incorrectly referred to CD9EC2 - 4C8 instead of CD9EC2 - 4E8. This has now been corrected: “In CD9EC2 - 4E8, the D loop adopts …”

      Reviewer #2

      The paper is well written and the conclusions made are supported by the data presented.

      [2.1] The ternary structure is in agreement with that of CD9 in complex with the related EWI-2 published earlier this year by Umeda et al (ref #25). The present work thus adds little structural insights but may be useful in showing that the interaction pattern seen extends to another EWI protein family member.

      Reply 2.1: We agree with reviewer #2 that the CD9 - EWI-F structure presented in our work is similar to the CD9 - EWI-2 structure published recently by Umeda et al. (ref #25). However, as also pointed out by reviewer #1, we believe that the CD9 - EWI-F structure adds important new information for understanding the molecular mechanism underlying the assembly of tetraspanin-enriched microdomains. Notably, the different conformations of the CD9 - EWI-F complex observed in the cryo-EM data provide structural evidence for the dynamic nature of the interaction between a tetraspanin and a partner protein, which is consistent with a wealth of prior biochemical data. Guided by the distinct shape of the CD9EC2 - 4C8 densities, we were able to distinguish a range of straight to bent conformations of the complex. CD9 regions that represent known tetraspanin homo-dimerization sites orient away from EWI-F and are available for interactions. Thus, combining our structural data with previous biochemical interaction data allowed for the generation of a long-awaited model for the assembly of tetraspanin-microdomains at the molecular level. We believe that these implications for TEM assembly will stimulate new, innovative research into the molecular principles that govern the function of tetraspanins.

      [2.2] As such it may be acceptable for publication. In this case, the authors should improve the quality of Figs. 3D and 4D.

      Reply 2.2: Figures 3D and 4D depict raw cryo-electron microscopy images (micrographs). The protein complexes imaged in this study contain only light atoms (H, N, C, O, S). Therefore, the collected micrographs reveal only low-contrast images of protein particles, and in a typical cryo-EM experiment, particles from thousands of micrographs must be averaged to obtain a three-dimensional reconstruction. We would like to keep the raw micrographs in Figures 3 and 4, as they will aid cryo-EM scientists in judging the quality of the data.

      Reviewer #3

      The work is technically well performed and clearly presented including methodological details. I just have a few minor comments:

      [3.1] Page 4 and Figure S1: it is hard to see how a reliable affinity for 4E8 can be obtained from the cell binding data in S1A, as there is no indication of saturation. It would be good to at least acknowledge that this is at best a rough estimate. Fortunately, the data for this nanobody in the purified situation seem solid.

      Reply 3.1: The obtained affinities are indeed approximate estimates based on non-linear regression curve fitting of the measured data, performed in triplicate. The text has been updated and now reads “4C8 and 4E8 bind to purified, full-length CD9 as well as to endogenous CD9 expressed on HeLa cells with apparent binding affinities in the nanomolar range (Fig. S1A, B, C)”. In addition, a table listing the calculated KDs has been included as Fig. S1C.

      [3.2] Page 6: Does the absence of micellar density for the EWI-F complex indicate flexibility of the extracellular domain relative to the TM? Does this happen because the classification focuses on the highly elongated Ig region?

      Reply 3.2: These are indeed plausible assumptions. We observed highly heterogeneous, elongated particles in the micrograph shown in Fig. 3D, indicating inter-domain flexibility. If the alignment software focuses on certain Ig-like domains, other regions of the protein complex will be averaged out. An additional complication with these elongated particles was selecting an appropriate box size for particle picking and particle extraction, because the particles differ greatly in apparent size depending on their orientation (fully elongated side views vs. much smaller top views). Taken together, the complex of CD9 with full-length EWI-F was unsuitable for high-resolution structure determination; the subsequent strategy using EWI-FΔIg1-5 resulted in globular particles with less flexibility (Fig. 4D), which allowed for a more detailed structural characterization of the complex.

      [3.3] Page 8: "Recently, a cryo-EM density map has been reported..." - please reference here.

      Reply 3.3: We added the appropriate reference to the sentence: “Recently, a cryo-EM density map has been reported of CD9 in complex with an EWI-F homolog, EWI-2 (25).”

      [3.4] Relatively little is known about how tetraspanins help to organize partner receptors into defined membrane domains, evidence for which has emerged from super-resolution light microscopy. Based on their structural analysis of the CD9-EWI-F complex, including the heterogeneity apparent in the cryo-EM structure, they propose a feasible concatenation model for higher order oligomerization of these complexes in the membrane. Obviously the model will need to be tested rigorously by mutational analysis, particularly the EWI Ig6 interface, but as it stands the paper is a significant contribution to the field of tetraspanins.

      Reply 3.4: From the 8.6 Å cryo-EM data, the amino-acid residues that form the EWI-F Ig6 dimer interface indeed cannot be distinguished. However, our data on CD9 in complex with full-length EWI-F (Fig. 3E) and previous cross-linking data (André et al. In situ chemical cross-linking on living cells reveals CD9P-1 cis-oligomer at cell surface - PMID: 19703604) support the notion that EWI-F forms dimeric assemblies. Regarding the concatenation model, we therefore think that it will be of great interest to establish, in the context of native membranes, the putative CD9 - CD9 interactions (identified through biochemical approaches) that would link CD9 - EWI-F tetramers into higher-order assemblies. However, investigating these transient interactions would require various non-trivial experiments and was therefore not within the scope of the current study.