  1. Sep 2020
    1. Reviewer #2:

      In this paper, the authors mainly analyzed peripheral blood mononuclear cell (PBMC) samples from pediatric cancer patients and healthy children by CyTOF, and characterized the phenotypes of NK cells, T cells and monocytes. Some of these phenotypes have been reported previously. The study lacks mechanistic investigation, and many of the conclusions are not yet supported by the data presented.

      Specific concerns:

      1) The authors collected samples from children with several cancer types, including hepatoblastoma, neuroblastoma, Wilms tumor and lymphoma. These tumor types are quite different. Is it appropriate to analyze them together? Unlike the other tumor types, lymphoma is a malignancy of the blood system, so systemic immunity in these patients must be altered.

      2) No statistical analysis was performed for Figures 2D and 2E, so the conclusion that "Classical monocytes are enriched in pediatric cancer patients" is not supported.

      3) Figure 3A differs from the conventional gating diagram. It was surprising to see that it shows CD56dim CD16- and CD56- CD16+ NK cells.

      4) Figure 4 lacks statistical analysis.

      5) Figure 7 lacks correlation analysis, so the conclusion that "Pediatric cancer associated immune perturbations vary by age" is not supported. In addition, the correlation diagram presented is insufficient to support this conclusion and the corresponding title.

    2. Reviewer #1:

      The immune status of pediatric cancer patients may differ from that of adult cancer patients and healthy children. Unraveling the distinct immunological features of pediatric cancers may provide novel therapeutic strategies. Dr. Murali Krishna and colleagues analyzed the composition and phenotype of peripheral immune cells in both pediatric cancer patients and age-matched healthy individuals, and they found some interesting alterations in NK cells, monocytes, and T cell subsets. In general, this descriptive study could be of interest to clinicians, immunologists and cancer researchers. However, several major points remain to be addressed.

      1) The incidence of hematologic tumors is relatively high in children. Supplemental Table 2 shows that pediatric patients bearing solid tumors and hematologic malignancies were all included in this study. If solid tumors and lymphoma were each analyzed separately against healthy individuals, would the major conclusions remain the same?

      2) The type, stage, and therapeutic regimens of cancer may affect the landscape of peripheral immune cells. It is not clear whether any of these factors influence the major conclusion. What were the standards to include healthy pediatric individuals as controls in this study?

      3) The authors focused on immune cell-related differences between healthy and tumor-bearing children. To reveal typical immunological features of pediatric cancer patients, it is recommended to perform similar analyses with samples from adult cancer patients, particularly those bearing the same cancer types.

      4) The authors claimed that the frequency and cytotoxicity of peripheral NK cells were reduced in young pediatric cancer patients compared with healthy controls, but that these parameters returned to normal in older pediatric cancer patients (>8 years). Can they compare young and old patients separately with age-matched controls?

      5) The authors believe that the diminished killing of tumor cells by NK cells from pediatric cancer patients was due to decreased cytotoxic capacity rather than inefficient recognition or degranulation. More experimental evidence is needed to substantiate this conclusion. These NK cells were significantly shifted toward an immunosuppressive/tolerant phenotype (high PD-1 and NKG2A, but low perforin and granzyme B), while long-term (14-day) stimulation with IL-2 improved their cytotoxicity. Can short-term IL-2 treatment achieve similar effects (e.g. increased cytotoxicity and elevated expression of lytic molecules and CD57)? Since the frequency and cytotoxicity of NK cells in older pediatric cancer patients (>8 years) were similar to those in healthy children, are serum IL-2 levels increased in older cancer patients?

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      Dr Taylor and colleagues aimed to highlight NK cell-related defects in pediatric cancer patients compared with healthy children. This study was potentially interesting, although it was based on descriptive analyses and lacked mechanistic exploration. In addition, it included a mixed cohort of pediatric patients bearing tumors of different types and stages, perhaps treated with distinct therapeutic regimens. Some conclusions were not strongly supported by the current experimental evidence. It remains unknown whether similar differences can be found between adult cancer patients and age-matched healthy individuals. Addressing these points will require a large amount of further work.

    1. Reviewer #2:

      Kroll et al. presented a strategy to achieve biallelic knockout effects in the founder (F0) generation of zebrafish by targeting three different loci within the same target gene through injection of Cas9 RNP mixtures. They showed that, in addition to targeting single genes, this method could be used successfully to create double knockouts of the slc24a5 and tbx5a gene pair, or the tyr and ta gene pair, in F0 embryos. Strikingly, they also demonstrated direct generation of triple gene knockouts of mitfa, mpv17 and slc45a2 in F0 larvae, which fully recapitulated the pigmentation defects of the crystal mutant. Furthermore, they provide evidence of the feasibility of their method for dissecting complex, multi-parameter behavioural traits in biallelic F0 knockouts of the trpa1b, csnk1db and scn1lab genes. Interestingly, they established a rapid, sequencing-free method to evaluate the activity of Cas9 RNPs using headloop PCR, facilitating the selection of target sites. Finally, the authors proposed a three-step protocol for F0 knockout screens in zebrafish. The strategy described here is quite impressive and represents a clear improvement over the method published by Wu et al. (Developmental Cell, 2018), which was based on the administration of four Cas9/gRNA RNPs. Nevertheless, the manuscript could be further clarified and improved in the following aspects.

      1) What are the essential methodological differences between this method and that reported by Wu et al. in 2018 (Developmental Cell)? Why and how could the number of target sites be reduced from four to three?

      2) Several genes, such as slc24a5, tyr, tbx16 and tbx5a, were tested in both studies. Did you use, or compare against, the same target sites in these genes as those reported by Wu et al.?

      3) Is the dosage/amount of Cas9 or RNP used in this study different from or comparable to that used by Wu et al.? Does it account for the improvement described in this study?

      4) The authors propose designing the three target sites in distinct exons of each gene. Is this really important and/or necessary to achieve highly efficient biallelic knockouts? Is there any evidence?

      5) According to the Materials and Methods section, the synthetic gRNA consisted of two components, i.e., crRNA and tracrRNA. Synthesizing gRNA as a single molecule by in vitro transcription is usually more popular and more economical. Is it really necessary to use crRNA and tracrRNA to achieve highly efficient biallelic knockouts? Is there any evidence?

      6) Could headloop PCR be used for the quantification of mutagenesis efficiency (indel-producing mutation rate) of Cas9/gRNA? How sensitive is this method? Could small indels (such as 1-bp insertion or deletion) be detected by the headloop PCR?

      7) In addition to indels, deletions between two double-strand breaks induced by two gRNAs are also important for generating biallelic knockouts of the target gene. The authors showed the analysis of mutations at each site (such as in Fig. 2A); is it possible to quantify the distribution and contribution of all the different deletions?

      8) Fig. 1C and 1D: The authors compared the effects of the injection of 1, 2, 3, and 4 loci. How were the 1, 2, and 3 loci selected from the four target sites? Will each of the four loci give the same or different phenotypic ratio if tested individually? Will different combinations of 2 loci or 3 loci give the same or different phenotypic ratio? Or which combination of 2 loci or 3 loci will give the highest mutagenic effect? For example, in Fig. 1C, the 3-loci showed comparable effect with 4-loci, while the 2-loci is less effective; is it possible to find other 2-loci combinations which could show higher mutagenic efficiency than the current 2-loci, such that the effect of the new 2-loci combination is as good as the 3-loci or 4-loci combination? Conversely, in Fig. 1D, the 2-loci already showed the highest mutagenic effect, is it because of this particular 2-loci combination, or any 2-loci combination will show the same efficiency?

      9) Figure 6: The phenotypes of scn1lab F0 knockouts are more severe than those of scn1lab-/- mutant. Any explanation?

    2. Reviewer #1:

      Kroll and colleagues describe an efficient strategy to reliably generate F0 zebrafish embryos with (multiple) genes knocked out using CRISPR/Cas9 RNPs. In their most dramatic and broadly applicable proof-of-principle experiment, the authors demonstrate successful recapitulation of the triple mutant crystal phenotype in 9/10 F0 embryos. As the authors point out, their methodology is extremely likely to be adopted for studying candidate genes for traits that display a range of phenotypes among wild type embryos or larvae.

      The manuscript points out a rather obvious but somehow underreported feature of NHEJ-based mutagenesis: assuming random indel sizes, even when 100% of DNA is mutated, fewer than 50% (0.67 × 0.67 ≈ 0.45) of cells in an embryo will contain frameshift mutations in both alleles. Thus, successful recapitulation of a mutant phenotype in an F0 embryo relies on mutagenesis of an essential part of the protein (not always as straightforward as it seems), on other repair pathways such as MMEJ (not always reliable), or on fortuitous help from largely unknown factors that skew the distribution of indel sizes (multiple guide RNAs would need to be tested without guarantee of success). Simultaneously designing several guide RNAs against the gene and co-injecting them, as the authors propose, seems to be an excellent and straightforward strategy.
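      The arithmetic in this paragraph can be made explicit with a simple probability model. The sketch below is a hypothetical illustration, not the authors' model. It assumes that every targeted site is mutated, that indel lengths are uniformly distributed modulo 3 (so 2/3 of indels shift the reading frame), that sites and alleles mutate independently, and that an allele counts as knocked out if at least one site carries a frameshift. Large deletions between cut sites and in-frame disruption of essential residues are ignored.

```python
"""Toy model: probability of a biallelic frameshift with n co-injected guide RNAs.

Hypothetical illustration only; all parameters are assumptions, not measured values.
"""

# Assumption: indel length is uniform modulo 3, so 2 of 3 indels shift the frame.
P_FRAMESHIFT = 2 / 3


def p_allele_frameshifted(n_sites: int, p_fs: float = P_FRAMESHIFT) -> float:
    """Probability that one allele carries at least one frameshift across n mutated sites."""
    # The allele escapes only if every one of the n sites carries an in-frame indel.
    return 1 - (1 - p_fs) ** n_sites


def p_biallelic(n_sites: int, p_fs: float = P_FRAMESHIFT) -> float:
    """Probability that both alleles in a cell carry a frameshift (alleles independent)."""
    return p_allele_frameshifted(n_sites, p_fs) ** 2


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} site(s): {p_biallelic(n):.3f}")
```

      Under these assumptions a single site gives 4/9 (about 0.44), consistent with the "fewer than 50%" figure above, while three sites give about 0.93, which illustrates why co-injecting several guide RNAs per gene is attractive.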

      My most significant criticism is that, although new to zebrafish, the described strategies (using multiple guide RNAs and headloop PCR) have been successfully deployed in other systems. Adapting these strategies to the zebrafish model system offers tremendous value, but the distinction between developing new methods and adopting existing ones must be considered.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      Summary:

      The authors describe a new, efficient strategy to reliably generate F0 zebrafish embryos with (multiple) genes knocked out using CRISPR/Cas9 RNPs. They showed that, in addition to targeting single genes, this method could be used successfully to create double knockouts of the slc24a5 and tbx5a gene pair, or the tyr and ta gene pair, in F0 embryos. Strikingly, they also demonstrated direct generation of triple gene knockouts of mitfa, mpv17 and slc45a2 in F0 larvae, which fully recapitulated the pigmentation defects of the crystal mutant. Their methodology is extremely likely to be adopted for studying candidate genes for traits that display a range of phenotypes among wild type embryos or larvae.

      This is a new tool for the zebrafish community. Despite the data presented on several loci, it is not clear whether and how this method improves on a series of prior related F0 approaches. This question is the crux of this methods manuscript.

    1. Reviewer #3:

      In this manuscript, Chakravarti and colleagues analyzed the functions of several p53 isoforms in the Drosophila germline. They created novel isoform-specific alleles by CRISPR/Cas9 to untangle the functions of the p53A and p53B isoforms, and they used a Phid-GFP reporter line to follow p53 transcriptional activity. The role of p53 in the Drosophila germline has been published several times before, with a focus on the silencing of retrotransposons (TEs) and the response to meiotic DNA breaks (Lu, 2010; Wylie, 2014; Wylie, 2016). Despite this published literature, the authors created novel and very valuable tools, which allowed them to make several novel and interesting observations. My main criticism is that most of these observations remain unexplained and the manuscript feels descriptive as it stands. However, this manuscript has great potential if the authors follow up on some of these novel observations. Some examples are the following:

      1) In Figure 5C, the authors made the interesting observation that hid-GFP was stronger in region 1 of p53A-B+ than in the wild type p53A+B+. This activity of p53 cannot be explained by meiotic DSBs as previously published, since meiotic DSBs only occur later in region 2. This observation remains unexplained and is not explored further.

      One possibility is that it could relate to transposable elements (TEs) activity in this region. TEs can create DSBs (thus non-meiotic) and p53 has been published to silence TEs in Drosophila (Wylie, 2014; Wylie, 2016). It is also particularly interesting that the silencing of TEs is known to be weakened in this specific region of the germarium even in wild type condition (Dufourt J, NAR, 2013; Theron E, NAR, 2018). Could p53A play a role in silencing TEs in this region when Piwi is downregulated? This would bring novel insights on when and where TEs are silenced in germ cells.

      A transcriptomic analysis of p53A-B+ germ cells could show whether TEs are upregulated in these hid-GFP++ cells, although this is probably beyond the scope of this manuscript. Another possibility would be to perform FISH for TEs known to be expressed in p53 mutants, such as TAHRE (Wylie, 2016). In addition, do the authors detect DSBs in region 1 in p53A-B+?

      2) In Figures 7 and 8, the authors analyzed the role of p53 in "persistent" meiotic DSBs. I am not convinced that these DSBs are only persistent meiotic DSBs. As the authors themselves discuss (page 13), these DSBs could originate from TE mobilization. I think this is a very important caveat to their conclusions. Another non-exclusive possibility for DSBs appearing in endoreplicating nurse cells is incomplete replication and associated DNA deletions during repair, as shown by Yarosh and Spradling (GD, 2014).

      To distinguish between these possibilities and strengthen their conclusions, the authors should perform the same experiments in the absence of meiotic DSBs, such as in a meiW68 mutant background (meiW68, p53AB double mutant). meiW68, okra, p53 mutants may be hard to generate, but shRNAs against meiW68 are publicly available and effective, and shRNAs may also exist for okra or other spindle genes, which could make this combination easier to generate.

      3) The authors showed that p53A and p53B levels are developmentally regulated (Figure 6G): does overexpression of one or both of the isoforms have any phenotype?

      4) I agree with the authors that karyosome defects are part of an array of phenotypes induced by the activation of DNA damage checkpoints. However, I would not equate this with the activation of a pachytene checkpoint, nor conclude that p53 is part of that checkpoint.

      5) In Figure 7D, there seems to be extensive DNA damage in the follicle cells of p53A+B-. Is this reproducible?

    2. Reviewer #2:

      The Drosophila genome encodes multiple p53 isoforms. p53 is an important factor in maintaining genome integrity, and the presence of multiple isoforms in flies raises an interesting evolutionary question because humans have a family of p53 genes. In this paper, the expression and function of the isoforms are compared in the germline. There are two significant findings from investigating these two isoforms. First, the apoptotic response depends on the A isoform; second, both isoforms have roles in the response to meiotic DSBs. These results represent significant and important extensions of previous work from another group showing that p53 suppresses transposon activity.

      With one important exception, the data are solid and support the conclusions. The data regarding the apoptotic response are based on TUNEL and a hid-GFP reporter. These data show that irradiation induces a response in the mitotic region but not in later regions. Conversely, there is a milder induction in the meiotic region (region 2a). Both could be in response to DSBs. But it is striking that there is no HID induction following IR in these meiotic regions. Thus, there is a satisfying correlation between the apoptosis and HID responses to IR, and both are diminished in the meiotic region.

      The most significant concern with this paper is the conclusion that the p53 isoforms respond to meiotic DNA breaks. Indeed, this is the title of the section starting at the end of pg 7, but no experiments lead to this conclusion. Similarly, the sentence "To determine whether p53A or p53B isoforms responds to meiotic DNA breaks" (pg 8) is followed by an experiment that does not do that (it compares HID expression in different p53 genotypes). The data in the paper are correlations between p53 expression and where DSBs occur in the germarium. Two experiments are needed. First, and most important, hid-GFP expression needs to be analyzed in a mei-W68 mutant. In addition, the germarium should be stained for both HID and γH2AV, the latter detected with the antibody the authors use in later figures. It would also be satisfying to see the genotypes in Figure 7 analyzed in a mei-W68 mutant background, to determine whether the persistent DNA damage in the p53 mutants depends on meiotic breaks.

    3. Reviewer #1:

      In this manuscript Chakravarti et al build on the previous work from the Calvi lab characterizing specific roles for the p53A isoform. In their 2015 paper Zhang et al showed, using isoform specific loss of function mutants, that p53A is primarily responsible for mediating the apoptotic response to ionizing radiation in the soma and that p53B is very lowly expressed in the cell types studied. They speculated that p53B might function in germline specific roles, such as meiotic checkpoints and DNA repair, identified in mammalian p53 studies.

      Here Chakravarti et al, have further characterized the functions of the p53A and B isoforms in Drosophila. In the ovary, p53A mediates the apoptotic response to IR and is also required for meiotic checkpoint activation. p53B is both necessary and sufficient for repair of meiotic breaks in nurse cells but not oocytes. p53B is required for expression of a hid-GFP reporter in region 2a-2b cells which may be related to a loss of p53B detection in p53A/B nuclear bodies at that stage.

      There are no substantive concerns with this manuscript.

      Minor concerns: CRISPR/Cas9 was used to create isoform-specific mutants for both p53A and p53B. RT-PCR was used to show that the mutant alleles are isoform specific and that neither disrupts expression of the other's endogenous protein. The RT-PCR assay can only assess the expression of the isoforms; contrary to what the authors state, it cannot assess their function.

      The authors noted that, even in the absence of IR, there was low-level hid-GFP expression in late region 1/early region 2, the point when meiotic DSBs are induced by Mei-W68. Quantitation of hid-GFP expression in the various p53(A+/-,B+/-) mutant backgrounds showed that hid-GFP expression in the absence of IR requires p53 activity and that both isoforms are capable of activating hid-GFP expression. The authors suggest that the increased and earlier expression of hid-GFP seen in the p53A-/p53B+ mutant is due to precocious hyperactivation by p53B, unrelated to meiotic breaks, which have yet to occur. The authors then seem to contradict themselves by stating that p53 reporter construct expression depends on Mei-W68 and that both isoforms respond to DSBs. Since p53B is capable of precociously activating at least one p53 target in the absence of p53A expression, it is not clear that meiotic breaks themselves directly regulate p53B activity. From the data presented, it seems plausible that p53A, responding to DSBs, might attenuate p53B activity. Quantitation of p53A and p53B levels across oogenesis shows a transient reduction in p53B levels in regions 2a-2b, which coincides with the timing of meiotic breaks. Again, it is unclear whether this is a direct response of p53B to meiotic breaks. The authors suggest that this change in p53B concentration in the p53A/B body might be due to transient relocalization from the p53A/B body to the nucleoplasm and back, but that variation in fluorescence intensity makes it impossible to accurately assess levels in the nucleoplasm to confirm this. While p53B is undetectable in region 2a-2b cells, its presence there is required for hid-GFP expression, so translocation from the p53A/B body to fulfill this function is plausible.

      The figures are well done and appropriate to the message. However, in the fluorescent images, the high background in the mCh channel makes it difficult to see the true signal, which is often completely lost in the merged images. Greyscale panels for individual channels might be more informative.

      In 2019, Park et al., using Gal4/UAS transgenes in a p53 null background, concluded that both p53A and p53B mediate the apoptotic response to IR in the Drosophila ovary. I feel the authors adequately addressed this issue by stating that their current results, using loss-of-function, isoform-specific alleles at the endogenous locus, better reflect the true physiological response. Thus, I feel their conclusions on the role of p53 in the ovary have more merit.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      The authors have generated new and useful p53 reagents, which they have employed in four functional assays: apoptosis (TUNEL after 40 Gy irradiation; Figures 2-3), transcriptional induction (monitored by hid-GFP; Figures 4-5), double-stranded DNA breaks (DSBs) (monitored by gammaH2AV; Figures 7-8) and activation of the pachytene checkpoint (monitored by the synaptonemal complex protein C(3)G; Figure 8F-K).

      The main findings are: 1) the apoptotic response to ionizing radiation (IR) depends on p53A; 2) expression of hid-GFP in region 2a-2b germ cells requires p53B; 3) DSBs occur at higher rates in both the p53A and the p53B mutants; and 4) p53B can repair meiotic breaks in nurse cells but not in oocytes.

      Despite the generation of high-quality, new reagents, this paper is currently fairly descriptive. Of 8 figures, two show the expression pattern of the tagged p53 isoforms in various parts of the germarium (Figs 1 and 6). Some of the observations based on functional assays remain unexplained and need further experiments, including points 1 and 2 below.

      1) The authors conclude that the p53 isoforms respond to meiotic DNA breaks, but no experiments lead to this conclusion. To support this conclusion, they need (a) to analyze hid-GFP expression in a mei-W68 mutant and (b) to stain the germarium for both HID and gammaH2AV. The authors should also examine meiotic breaks in p53A+B+, p53A-B-, p53A-B+ and p53A+B- in a background that is also mei-W68 mutant.

      2) The authors are missing a more detailed analysis of the interesting observation that hid-GFP is stronger in region 1 of p53A-B+ than in the wild type p53A+B+. This observation cannot be explained by meiotic DSBs (which occur in region 2), but the authors do not provide a mechanism. Is this due to transposable elements? The authors need to supply new data to provide a mechanistic understanding of this observation.

      3) The authors are encouraged to provide better data to support the conclusion that the DNA damage phenotypes of p53 and okra mutants are comparable. The images in Figs. 7, 8B and B' are not sufficient to assess this. The authors could quantify the number of gammaH2AV foci or intensity (rather than measure the number of positive cells). Related to this, it is surprising that p53 mutants lack the DV defects seen in okra mutants, particularly since defects in DSB repair should cause nondisjunction. Okra mutants are sterile. The authors should comment upon the fertility of p53 mutants.

      4) Some experiments have only 2 biological replicates (Figs 4 and 8K). Figs 7 and 8 have "2-3 replicates". The authors need to state specifically for each experiment how many replicates were scored. Ideally, they should have at least 3 replicates for each experiment or explain why that is not necessary.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): **Summary:** This interesting study by Putker et al. showed that circadian rhythmicity persists in several typical circadian assay systems lacking Cry, including Cry knockout mouse behavior and gene expression in Cry knockout fibroblasts. They further demonstrated weak but significant circadian rhythmicity in Cry- and Per-knockout cells. Cry- (and potentially Per-)independent oscillations are temperature compensated, and CKId/e still has a role in the period regulation of Cry-independent oscillations. **Major comments:** 1) The authors propose that the essential role of mammalian Cryptochrome is to confer robustness on the oscillation. As the authors analyze in many parts of the manuscript, the robustness of oscillation can be assessed by the (relative) amplitude and the phase/period variation, both of which should be significantly affected by the method used for cell synchronization. Unfortunately, the synchronization method is not adequately described in this version of the supplementary information. This reviewer has no objection to the "iterative refinement of the synchronization protocol", but at least the method used in each experiment needs to be clearly stated. The detailed method may be found in the thesis of Dr. Wong, but the methods used in this manuscript need to be detailed within this manuscript.

      We thank the reviewer for recognising the importance of different synchronisation protocols. In experiments where bioluminescent CKO rhythms were observed, different synchronisation protocols resulted in similar results when comparing WT with CKO cells. The different synchronisation methods used in each experiment are now specified in the supplementary methods.

      2) The authors revealed that CKO mice have apparent behavioral rhythmicity under the LL>DD condition. This is an intriguing finding. However, it should be carefully evaluated whether this rhythmicity (16 hr cycle) is a direct consequence of the circadian rhythmicity observed in CKO and CPKO cells (24 hr cycle), because the period lengths differ substantially. Is it possible to induce 16 hr periodicity in CKO mouse behavior with a 16 hr light:dark cycle (8 hr light:8 hr dark)? Alternatively, could the 16 hr rhythmicity be the mouse version of internal desynchronization, or another type of methamphetamine-induced or food-entrainable oscillation?

      The reviewer makes an excellent suggestion. As described in the manuscript text (page 13), CKO mice have already been shown to entrain to restricted feeding cycles (Iijima et al., 2005) and we therefore assessed whether CKO rhythms would entrain to a 16h day as suggested. Whilst CKO (but not WT) mice showed 16h behavioural rhythms during entrainment, they were arrhythmic under constant darkness thereafter (Revised Figure S2A). CKO cellular rhythms show reduced robustness under constant conditions ex vivo, and our other work has revealed that CRY-deficiency renders cells much more susceptible to stress (Wong et al, 2020, BioRxiv). The parsimonious explanation, therefore, is that whilst the cellular timing mechanism remains functional when CRY is absent, the amplitude of cellular clock outputs is severely attenuated (as we showed previously in Hoyle et al., Sci Trans Med, 2017) in a fashion that impairs the fidelity of intercellular synchronisation under most conditions in vivo, as well as the molecular mechanisms of entrainment to light-dark cycles.

      With respect to the apparent discrepancy between the mean periods of CKO cultured cells (~21h), SCN (~19h) and mice (~17h): the same trend is also observed in WT cells (~26h), SCN (~25h) and mice (~24h), simply with a smaller effect size and a longer intrinsic period.

      We believe this difference in effect size can adequately be explained by differences in oscillator coupling, combined with the reduced robustness of CKO timekeeping. In Figure 1F we show that the range of rhythmic periods expressed by cultured CKO fibroblasts (14-30h) is much greater than that of their WT counterparts (22-26h), or than the period observed when cellular oscillators are coupled in CKO SCN (19h). Thus the period of CKO oscillations is demonstrably more plastic (less robust) than that of WT, with a cell-intrinsic tendency towards a shorter period that is revealed more clearly when oscillators are coupled.

      In vivo there is more oscillator coupling in the intact SCN than in an isolated slice, from which communication with the caudal and rostral hypothalamus has been removed. Thus it seems plausible that increased coupling in vivo, combined with positive feedback via behavioural cycles of feeding and locomotor activity, causes cellular oscillators to resonate at a common frequency that is higher (i.e. a period that is shorter) than in isolated tissue.

      Critically, for both WT and CKO mice/SCN, the circadian period lies within the range of periods observed in isolated fibroblasts. To communicate this rather nuanced point we have inserted the following text into the supplementary discussion:

      “Circadian timekeeping is a cellular phenomenon. Co-ordinated ~24h rhythms in behaviour and physiology are observed in multi-cellular mammals under non-stressed conditions when individual cellular rhythms are synchronised and amplified by appropriate extrinsic and intrinsic timing cues. In light of the short-period (~16.5h) locomotor rhythms observed in CKO mice after transition from constant light to constant dark, but their failure to entrain to 12h:12h light:dark cycles, it seemed plausible that CKO mice might either entrain to a short 8h:8h light:dark (16h day) or else have a general deficiency in entrainment by light:dark cycles. The data in Figure S2 support the latter possibility, in that neither WT nor CKO mice stably entrained to 16h cycles, whereas WT but not CKO mice entrained to 24h days. The bioluminescence oscillations observed in CKO cells conform to the long-established definition of a circadian rhythm (temperature-compensated ~24h period of oscillation with appropriate phase-response to relevant environmental stimuli). The locomotor rhythms observed in CKO mice under quite specific environmental conditions, meanwhile, correlate with both the cellular and SCN data to suggest that the capacity to maintain behavioural rhythms close to the circadian range persists, but is masked under most circumstances. We suggest that in vivo the (pathophysiological) stress of CRY-deficiency is epistatic to the expression of daily rhythms in locomotor activity following standard entrainment by light:dark cycles and thus, whilst not arrhythmic, also cannot be described as circadian in the strictest sense.”

      3) The authors proposed that CK1d/e is, at least in part, a component of the cytoscillator (Fig. 5D), and that turnover control of PER (likely to be controlled by CK1d/e) may be an interaction point between the cytoscillator and the canonical circadian TTFL (Fig. 4). Strictly speaking, this model is not directly supported by the experimental setting of the current manuscript. The contribution of CK1d/e is evaluated in the presence of PER by monitoring the canonical TTFL output (i.e. PER2::LUC); thus it is not clear whether the kinase determines the period of the cytoscillator. It would be valuable to ask whether PF and CHIR have a period-lengthening effect on Nr1d1:LUC in CPKO cells.

      Another excellent suggestion, thanks. The experiment, showing similar results in CKO and CPKO cells, was performed and is now reported in Revised Figure S5D. The text was amended as follows: “We found that inhibition of CK1d/e and GSK3-α/β had the same effect on circadian period in CKO cells, CPKO cells, and WT controls (Figure 5A, B, S5A, B, D).”

      Moreover, our data are supported by findings in RBCs, where CK1 inhibition affects circadian period in a similar manner to that in WT and CKO cells (Beale et al, JBR 2019).

      **Minor comments:**

      4) The authors argue that the CKO cells' rhythmicity is entrained by the temperature cycle (Fig. 2C). Because the CKO cell data show only one peak after release into the constant-temperature phase, it is difficult to conclude whether the cells are entrained or simply respond to the final temperature shift.

      We agree with the reviewer and have replaced the original figure with another recording that includes an extra circadian cycle in free-running conditions (Revised Figure 2C).

      5) It would be useful to provide readers with information on the known phenotype of TIMELESS knockout flies. TIM is widely accepted as an essential component of the circadian clock in flies; are there any studies showing the presence of circadian rhythmicity in Tim-knockout flies (even if the oscillation is seen only under limited conditions, such as the neonatal SCN rhythm in mammalian Cry knockouts)?

      The reviewer is correct that TIM is widely accepted as an essential component of the circadian clock in flies. Using more sensitive modern techniques, however, ~50% of classic Tim01 mutant flies were found to exhibit significant behavioural rhythms in the circadian range under constant darkness, as reported:

      https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/year/2015/docId/11914

      For this reason, we employed a full knockout of the Timeless gene (Lamaze et al., Sci Rep, 2017), in which the majority of flies are behaviourally arrhythmic under constant conditions following standard entrainment by light cycles; this line therefore represents a more appropriate model for comparison with CRY-deficient cells.

      We have revised the legend of Figure S2 to include the following:

      “N.B. The generation of Timout flies is reported in Lamaze et al, Sci Rep, 2017. Similar to CRY-deficient mice, whole gene Timeless knockout flies are characterised as being behaviourally arrhythmic under constant darkness following entrainment by light:dark cycles: https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/year/2015/docId/11914”

      6) Figure 3C shows that the amount of PER2::LUC mRNA changes ~2-fold between time = 0 hr and 24 hr in CKO cells. This amplitude is similar to that observed in WT cells, although the peak phase is different. Does the PER2::LUC mRNA level oscillate in CKO cells?

      No; we believe we have shown convincingly that this is not the case. We argue that the data in Figure 3C show that: (a) there is no circadian variation in PER2::LUC mRNA expression (mRNA levels increase, but no trough is observed); and (b) the temporal relationship between protein and mRNA observed in WT cells is broken, i.e. the CRY-independent circadian variation in protein levels cannot be “driven by” changes in transcript levels. Similar results were obtained using the transcriptional reporters Per2:LUC and Cry1:LUC (Figure S3E and F). Our findings are also in line with previous reports, such as Nangle et al. (eLife, 2014) and Ode et al. (Mol Cell, 2017).

      7) Figure 3D: the authors discuss the amplitude and variation (i.e. whether the signal is noisier or not) of reporter luciferase expression between different cell lines. However, a huge difference in the luciferase signal can be observed even in the detrended bioluminescence plot. This reviewer is concerned that some of the phenotypes of CKO and CPKO MEFs reflect lower transfection efficiency of the reporter gene rather than the nature of the circadian oscillators in these cell lines.

      As reported in the Methods, these are stable cell lines rather than transiently transfected cells. The detrended luciferase data presented here do not reflect raw levels of luciferase protein expression, but rather the deviation from the 24-hour average. To make it easier to compare expression levels of Per2:LUC and Nr1d1:LUC between the different cell lines, we have added Figure S3H, presenting the average raw bioluminescence levels over 24 hours (after 24 hours of recovery from the media change, i.e. from 24 to 48 hours). These data show that expression levels of the Per2 reporter are never lower in CRY KO cells than in WT cells. We hope these data resolve the reviewer’s concern that differences in reporter expression levels account for the differences observed.
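      To illustrate why detrended traces cannot be used to compare expression levels, the following is a minimal Python sketch, assuming evenly sampled bioluminescence data and a centred 24-hour moving average as the trend; the exact detrending routine used for Figure 3D is not specified here, and the function names are hypothetical. Subtracting a moving average removes any constant offset in expression, so two reporters that differ only in baseline level give identical detrended traces, whereas their raw 24-48 h means differ.

```python
import numpy as np


def detrend_moving_average(signal, sample_interval_h, window_h=24.0):
    """Subtract a centred moving average spanning window_h hours (hypothetical helper).

    Points near either end, where the averaging window is incomplete,
    are returned as NaN instead of biased values.
    """
    signal = np.asarray(signal, dtype=float)
    window = int(round(window_h / sample_interval_h))
    kernel = np.ones(window) / window
    trend = np.convolve(signal, kernel, mode="same")
    detrended = signal - trend
    half = window // 2
    detrended[:half] = np.nan
    detrended[len(signal) - half:] = np.nan
    return detrended


def mean_raw_level(signal, time_h, start_h=24.0, end_h=48.0):
    """Mean raw bioluminescence between start_h (inclusive) and end_h (exclusive)."""
    signal = np.asarray(signal, dtype=float)
    time_h = np.asarray(time_h, dtype=float)
    in_window = (time_h >= start_h) & (time_h < end_h)
    return float(signal[in_window].mean())


# Hypothetical reporter: a 24 h rhythm sampled every 30 min for 3 days.
t = np.arange(0.0, 72.0, 0.5)
low_expresser = 100.0 + 20.0 * np.sin(2 * np.pi * t / 24.0)
high_expresser = low_expresser + 5000.0  # same rhythm, much higher baseline

print(np.allclose(detrend_moving_average(low_expresser, 0.5),
                  detrend_moving_average(high_expresser, 0.5), equal_nan=True))
print(mean_raw_level(low_expresser, t), mean_raw_level(high_expresser, t))
```

      The detrended traces of the two hypothetical reporters coincide, while only the raw 24-48 h means reveal the 5000-unit difference in expression.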

      Reviewer #1 (Significance (Required)): Although Cryptochrome (Cry) has been considered a central component of the mammalian circadian clock, several studies have shown that circadian rhythms are maintained in the absence of Cry, including in the neonatal SCN and red blood cells. Thus, although the necessity of Cry for circadian oscillation has been debated, its essential role remains established, at least in the cell-autonomous clock driven by the TTFL. This study provides additional evidence that circadian rhythmicity can persist in the absence of Cry. In a more general context, the existence of a non-TTFL circadian oscillator has been one of the major topics in the circadian clock field outside the cyanobacteria. In mammals, the authors’ group and others led the discovery of circadian oscillation in the absence of the canonical TTFL by showing the redox cycle in red blood cells (O’Neill, Nature 2011). Circadian oscillation in the absence of Bmal1 has also been reported recently (Ray, Science 2020). Bmal1(-CLOCK), CRY, and PER compose the core mechanism of the canonical circadian TTFL; thus, this manuscript adds another layer of evidence for non-TTFL circadian oscillation in mammals. Overall, the manuscript reports several surprising results that will receive considerable attention from the circadian community. This reviewer has expertise in the field of mammalian circadian clocks, including genomics, biochemistry, and mouse behavioral analysis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In the canonical model of the mammalian circadian system, the transcription factors BMAL1/CLOCK drive transcription of Cry and Per genes, and CRY and PER proteins repress BMAL1/CLOCK activity to close the feedback loop in a circadian cycle. The dominant opinion was that CRY1 and CRY2 are essential repressors of the mammalian circadian system. However, this was challenged by persistent bioluminescence rhythms observed in SCN slices derived from Cry-null mice (Maywood et al., 2011 PNAS) and then by persistent behavioral rhythms shown by Cry1 and Cry2 double knockout mice if they are synchronized under constant light prior to free running in the dark (Ono et al., 2013 PLOS One). In the manuscript, the authors first confirmed behavioral and molecular rhythms in Cry1/Cry2-deficient mice and then provided evidence to suggest that the rhythms of Per2:LUC and Nr1d1:LUC in CKOs are generated by the cytoplasmic oscillator instead of the well-studied transcription and translation feedback loop: constant Per2 transcription driven by BMAL1/CLOCK plus rhythmic degradation of PER protein results in a rhythmic PER2 level in the absence of both Cry1 and Cry2, which suggests a connection between the classic transcription- and translation-based negative feedback loops and non-canonical oscillators.

      **Major points:** Lines 38-39, "Challenging this interpretation, however, we find evidence for persistent circadian rhythms in mouse behavior and cellular PER2 levels when CRY is absent." The rhythmic behavioral phenotype of cry1 and cry2 double knockout mice was first documented by Ono et al., 2013 PLOS ONE, in which eight cry1 and cry2 double knockout mice displayed circadian periods of different lengths and qualities after synchronization in the light. The paper reported two period lengths from the Cry mutant mice: "An eye-fitted regression line revealed that the mean shorter period was 22.86+/-0.4 h (n= 8) and the mean longer period was 24.66+/-0.2 h (n =9). The difference of two periods was statistically significant (p < 0.01).", either of which is quite different from the ~16.5 hr period in Figure 1B of the manuscript. A brief discussion of the period difference between studies would help readers. Period information for individual mice should be calculated and shown, since large period variations exist among CKO mice (Ono et al., 2013 PLOS One).

      Thanks for this suggestion. The mice used by Ono et al were raised from birth in constant light, whereas we used mice that were weaned and raised in normal LD cycles before being subject to constant light then constant dark as adults. Instead of the somewhat subjective fitting of regression lines by eye performed by Ono et al, our analysis was performed using the periodogram analysis routine of ClockLab 6.0 with a significance threshold for rhythmicity of p=0.0001. We have now repeated this experiment with 10 adult CKO mice (male and female), and found no evidence for two period lengths in that the second most significant period was consistently double that of the first. As the reviewer suggests, there is a much broader distribution of CKO mouse periods compared with WT, as we also found in cultured cells and SCN. These new data are now reported in revised Figure S1B & C. We have also included a statement about how our study differs from Ono et al in the supplementary discussion.
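      For readers unfamiliar with periodogram analysis, the chi-square (Sokolove-Bushell) periodogram, one of the standard methods for activity records, can be sketched in Python as follows. This is an illustration only, assuming binned activity counts and trial periods measured in whole bins; the function name is hypothetical, and this response does not state which of ClockLab's periodogram options was used. Each trial period folds the record into a cycles-by-phase table, and the statistic Qp grows as the phase-bin means diverge from the overall mean. Under the null hypothesis Qp is approximately chi-square distributed with P - 1 degrees of freedom, so a threshold such as p = 0.0001 corresponds to a chi-square critical value (obtainable, for example, from scipy.stats.chi2.ppf). Because a record that folds cleanly at period P also folds cleanly at 2P, a peak at double the dominant period is expected, as described above.

```python
import numpy as np


def chi_square_periodogram(counts, min_bins, max_bins):
    """Return {trial period in bins: Qp} for the Sokolove-Bushell periodogram.

    Qp = K * N * sum_h (M_h - M)^2 / sum_i (X_i - M)^2, where K is the number of
    complete cycles, N = K * P the number of points used, M_h the mean of
    phase bin h and M the overall mean.
    """
    x = np.asarray(counts, dtype=float)
    qp_by_period = {}
    for p in range(min_bins, max_bins + 1):
        k = len(x) // p                      # complete cycles at this trial period
        if k < 2:
            continue                         # too few cycles to fold meaningfully
        n = k * p
        folded = x[:n].reshape(k, p)         # rows = cycles, columns = phase bins
        overall_mean = folded.mean()
        bin_means = folded.mean(axis=0)
        total_ss = np.sum((folded - overall_mean) ** 2)
        if total_ss == 0:
            qp_by_period[p] = 0.0            # flat record: no rhythm to detect
            continue
        qp_by_period[p] = float(k * n * np.sum((bin_means - overall_mean) ** 2) / total_ss)
    return qp_by_period


# Hypothetical record: 30-min bins, activity in roughly the first half of a 16.5 h (33-bin) cycle.
bin_h = 0.5
counts = np.array([10.0 if (i % 33) < 16 else 0.0 for i in range(660)])
qp = chi_square_periodogram(counts, 20, 70)
candidates = {p: q for p, q in qp.items() if p <= 40}
best = max(candidates, key=candidates.get)
print(best * bin_h, qp[33], qp[66])
```

      In this noise-free hypothetical record, the 33-bin (16.5 h) period and its 66-bin double give identical, maximal Qp values, mirroring the harmonic relationship between the two most significant periods.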

      The behavioral phenotype of Cry-null mice and luminescence from their SCNs are robustly rhythmic, while fibroblasts derived from these mice produce only rhythms with very low amplitudes compared with those in WT, which may reflect the difference between the SCN rhythm and peripheral clocks. The behavioral phenotype is thought to be controlled mainly by the SCN. However, most molecular analyses in this work were done with MEFs and lung fibroblasts, which may not best represent the behavioral phenotype of CKO mice.

      Behavioural rhythms of CKO mice are significantly less robust than those of WT mice, with mean amplitude less than 50% of WT controls (Figures 1A & B, revised S1B). Furthermore, as reported, 40% of CKO SCN slices exhibited PER2::LUC rhythms, compared with 100% of WT SCN slices (as also observed by Maywood et al., PNAS, 2013), and CKO SCN rhythms are therefore also less robust by the definition used in this manuscript.

      As now discussed in the revised supplementary discussion:

      “Circadian timekeeping is a cellular phenomenon. Co-ordinated ~24h rhythms in behaviour and physiology are observed in multi-cellular mammals under non-stressed conditions when individual cellular rhythms are synchronised and amplified by appropriate extrinsic and intrinsic timing cues.”

      The objective of this study was to understand the fundamental determinants that allow mammalian cells to generate a circadian rhythm, which we find do not include an essential role for CRY genes/proteins. Thus the cell is the appropriate level of biological abstraction at which to investigate the phenomenon, whereas the SCN and behavioural recordings simply serve to illustrate the competence of CRY-independent timing mechanisms to co-ordinate biological rhythms at higher levels of biological scale, which are manifest under some conditions. To reiterate, the behavioural data support the cellular observations, not the converse.

      Stronger evidence is needed to fully exclude the possibility that, in CKO cells, the rhythm is generated by PERs compensating for the loss of CRYs to repress BMAL1 and CLOCK. Since the rhythms of Per2:LUC and Nr1d1:LUC (Figures 3D and S3E) are much weaker than those in WT, molecular analyses might not be sensitive enough to detect changes across a circadian cycle in CKOs if the TTFL still operates. CLOCKΔ19 mutant mice have a ~4 hr longer period than WT (Antoch et al., 1997 Cell; King et al., 1997 Cell). CLOCKΔ19; CKO cells or mice would be very helpful for addressing this question: periods of Per2:LUC and Nr1d1:LUC from CLOCKΔ19; CKO should be similar to those in CKO alone if transcriptional feedback does not contribute to their oscillations.

      We agree this would be an interesting experiment. However, the data in this manuscript and Wong et al. (BioRxiv, 2020), whilst not disputing the existence of the TTFL, strongly suggest that it fulfils a different function to that which is currently accepted and is not the mechanism that ultimately confers circadian periodicity upon mammalian cells. CLOCKΔ19 is an antimorphic gain-of-function mutation with many pleiotropic effects. Therefore, if the TTFL is not the basis of circadian timekeeping in mammalian cells, it follows that the CLOCKΔ19 mutation may not exert its effects on circadian rhythms by delaying the timing of transcriptional activation, as was proposed. As such, whether or not CLOCKΔ19 alters the circadian period of CKO cells/mice would not allow the two models to be distinguished in the way that the reviewer envisions.

      Secondly, we cannot detect any interaction between PER2 and BMAL1 in the absence of CRY using an extremely sensitive assay.

      Thirdly, very strong biochemical evidence suggests that PER has no repressive function in the absence of CRY (Chiou et al., 2016; Kume et al., 1999; Ode et al., 2017; Sato et al., 2006).

      Finally, in several figures, particularly 3C and 4A, we show that PER2 peaks at the same time in CKO and WT cells, but in CKO cells this is not accompanied by a coincident peak in the mRNA. Thus, even if PER were able to repress BMAL1/CLOCK without CRY, rhythms in PER2 protein level could not be explained by some residual PER/BMAL1-dependent TTFL mechanism.

      To address the reviewer’s concern, however, we have employed mouse red blood cells (RBCs), which offer unambiguous insight into the causal determinants of circadian timing, as we can be absolutely confident that there is no transcriptional contribution to their cellular timekeeping. Briefly, we took fibroblasts and RBCs from WT, short-period Tau/Tau and long-period Afh/Afh mutant mice. These mutations are well established to exert their circadian phenotypes through the post-translational regulation of PER and CRY proteins, respectively, and result in short- and long-period PER2::LUC rhythms compared with WT fibroblasts. RBCs do not express PER or CRY proteins and, consistent with this, no genotype-dependent differences in RBC circadian period were observed (Beale et al, 2020, in submission). In contrast, RBC circadian rhythms are sensitive to pharmacological inhibition of casein kinase 1 (Beale et al., JBR, 2019).

      Lines 51-52, "PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping" and lines 310-311, "We found that transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in animal and cellular models." The authors have not excluded the possibility that the rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN. In the fibroblasts, only two genes, Per2 and Nr1d1, have been studied in this work, and the findings cannot simply be extrapolated to the thousands of circadian-controlled genes. Also, amplitudes of PER2:LUC and NR1D1:LUC in the CKOs are much lower than those in WT, and no evidence has been provided to show that their weak rhythms are biologically relevant.

      The definition of a circadian rhythm (Pittendrigh, 1960) does not mention biological relevance or stipulate any lower threshold for amplitude. As now stated in the revised text (page 6):

      PER2::LUC rhythms in CKO cells were temperature compensated (Figure 2A, B) and entrained to 12h:12h 32°C:37°C temperature cycles in the same phase as WT controls (Figure 2C), and thus conform to the classic definition of a circadian rhythm (Pittendrigh, 1960), which does not stipulate any lower threshold for amplitude or robustness.
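      Temperature compensation is conventionally quantified by the temperature coefficient Q10 of the oscillation rate (1/period): values close to 1 indicate compensation, whereas a typical uncompensated biochemical rate has a Q10 of roughly 2 to 3. The Python sketch below shows this standard calculation; the function name is hypothetical, and the example periods are illustrative values, not data from Figure 2A, B.

```python
def q10_from_periods(period_low_h, period_high_h, temp_low_c, temp_high_c):
    """Q10 of the oscillation rate (1/period) between two temperatures.

    Q10 = (rate_high / rate_low) ** (10 / (temp_high - temp_low)).
    A value near 1 means the period barely changes with temperature.
    """
    if temp_high_c == temp_low_c:
        raise ValueError("temperatures must differ")
    rate_low = 1.0 / period_low_h
    rate_high = 1.0 / period_high_h
    return (rate_high / rate_low) ** (10.0 / (temp_high_c - temp_low_c))


# Illustrative (hypothetical) periods of 25.0 h at 32 °C and 24.5 h at 37 °C.
print(round(q10_from_periods(25.0, 24.5, 32.0, 37.0), 2))
```

      With these hypothetical periods the example prints 1.04, i.e. a half-hour period change over 5 °C is well within the compensated range.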

      We make no claims about biological relevance or amplitude in this manuscript, which are addressed in our related manuscript (Wong et al., BioRxiv, 2020). In this related manuscript, we explicitly address whether CRY is necessary for mammalian cells to maintain a circadian rhythm in the abundance of clock-controlled proteins and find that it is not. Indeed, twice as many rhythmically abundant proteins are observed in CKO cells than WT controls, which suggests that, if anything, CRY functions to suppress rhythms in protein abundance rather than to generate them.

      We observe circadian rhythms in the activity of two different bioluminescent reporters, which have already been extensively characterised. The mouse and SCN data in Figure 1 are correlative, and simply show that previously published observations are reproducible. PER2::LUC oscillations are not accompanied by Per2 mRNA oscillations. This, together with the absence of a BMAL1-PER2::LUC complex, strongly argues against a model where PER2 oscillations are driven by residual (PER2-driven) transcriptional oscillations.

      We therefore concede the reviewer’s point that we “have not excluded the possibility that the rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN”. The reviewer will agree, however, that very strong biochemical evidence suggests that PER has no repressive function in the absence of CRY (Chiou et al., 2016; Kume et al., 1999; Ode et al., 2017; Sato et al., 2006); that there is no experimental evidence to suggest that PERs can fulfil this function in the absence of CRY in any mammalian cellular context; and, finally, that our observations are not consistent with the canonical model for the generation of circadian rhythms in mammals.

      We have therefore amended the text to focus on CRY specifically, as follows:

      PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping

      Page 12. “We found that CRY-mediated transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in cellular models. Whilst we cannot exclude the possibility that in the SCN, but not fibroblasts, PER alone may be competent to effect transcriptional feedback repression in the absence of CRY, we are not aware of any evidence that would render this possibility biochemically feasible.”

      **Minor points:**

      Lines 66-67, "...(Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)." to "... (reviewed in Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)."

      Thanks, changed as requested.

      Line 70, "...((Liu et al., 2008..." to "...(Liu et al., 2008..."

      Thanks, changed as requested.

      Lines 174-175, "Considering recent reports that transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ...". The Larrondo et al. (2015) paper states: "however, in such ∆fwd-1 cells, the amount of FRQ still oscillated, the result of cyclic transcription of frq and reinitiation of FRQ synthesis." The point of the paper is "we unveiled an unexpected uncoupling between negative element half-life and circadian period determination," rather than "...transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ."

      This is a good point, which, following discussion with Profs Dunlap and Larrondo, we have revised to “no obligate relationship between clock protein turnover and circadian regulation of its activity”, a more accurate summary of their findings.

      Lines 249-252, "CKO cells exhibit no rhythm in Per2 mRNA (Figure 3C, D), nor do they show a rhythm in global translational rate (Figure S4A, B), nor did we observe any interaction between BMAL1 and S6K/eIF4 as occurs in WT cells (Lipton et al, 2015) (Figure S4C)." In Figures 3D and S3E, the unfitted Per2:LUC data in CKO and CPKO cells look better than those of Nr1d1:LUC, but the Nr1d1:LUC rhythm becomes clear after fitting the raw data. So, to better visualize the low-amplitude rhythm (if any) of Per2:LUC and compare it with Nr1d1:LUC, fitted Per2:LUC data for CKOs and CPKOs in Figures 3D and S3E should be shown, as has been done for Nr1d1:LUC.

      Thanks, these data can be found in Figure S3F. The detrended Per2:Luc CKO and CPKO bioluminescence traces were better fit by the null hypothesis (straight line) than a damped sine wave (p>0.05) and so were not significantly rhythmic by the criteria used in this manuscript.
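      The comparison of a damped sine wave against a straight line described above is commonly performed as an extra sum-of-squares F test between nested fits (for example, the model-comparison option in GraphPad Prism); we assume, but the text does not state, that this is the test used. The Python sketch below shows the model and the F statistic, assuming each model has already been fitted by least squares (e.g. with a nonlinear fitting routine such as scipy.optimize.curve_fit) and its residual sum of squares is known; function names and the numbers in the example are hypothetical. With a linear baseline, the damped sine has six parameters and reduces to the two-parameter straight line when its amplitude is zero, which is what makes the two fits nested.

```python
import math


def damped_sine(t, baseline, slope, amplitude, damping, period, phase):
    """Damped sine wave on a linear baseline (6 parameters).

    With amplitude = 0 this reduces to the straight line baseline + slope * t,
    so the straight-line (null) model is nested within it.
    """
    oscillation = amplitude * math.exp(-damping * t) * math.sin(2 * math.pi * (t - phase) / period)
    return baseline + slope * t + oscillation


def extra_sum_of_squares_f(rss_null, n_params_null, rss_alt, n_params_alt, n_points):
    """F statistic and (numerator, denominator) degrees of freedom for nested fits."""
    df_null = n_points - n_params_null
    df_alt = n_points - n_params_alt
    df_num = df_null - df_alt
    f_stat = ((rss_null - rss_alt) / df_num) / (rss_alt / df_alt)
    return f_stat, (df_num, df_alt)


# Hypothetical detrended trace: 48 points, RSS 120 for the line and 100 for the damped sine.
f_stat, dfs = extra_sum_of_squares_f(120.0, 2, 100.0, 6, 48)
print(round(f_stat, 3), dfs)
```

      Here F = 2.1 on (4, 42) degrees of freedom, which lies below the 5% critical value (roughly 2.6); the straight line would therefore be retained (p > 0.05) and the hypothetical trace scored as not significantly rhythmic.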

      Lines 258-259, "much less than the half-life of luciferase expressed in fibroblasts under a constitutive promoter". In Figure S4D, the y-axis range for PER2::LUC is ~800, whereas that for SV40::LUC is ~600,000. The LUC over-expressed from the SV40 promoter might saturate the degradation system in the cell, so the comparison is not fair. A weaker promoter with an expression level similar to Per2 should be used for the comparison.

      Thank you for this suggestion. In our experience, the SV40 promoter is actually rather weak compared with CMV, and faithfully facilitates the constitutive (non-rhythmic) expression of heterologous proteins such as luciferase (Feeney et al., JBR, 2016). It has been shown previously that constitutive over-expression of heterologous proteins such as GFP or even CRY1 does not affect circadian rhythms in fibroblasts (e.g. Chen et al., Mol Cell, 2009). To address the reviewer’s reasonable concern, however, multiple stable SV40:Luc fibroblast lines were generated by puromycin selection, grown to confluence in 96-well plates, then treated with 25 μg/mL CHX at the beginning of the recording. Random genomic integration of SV40:Luc leads to a broad range of luciferase expression levels, evident from the broad range of initial luciferase activities. For each line, the decline in luciferase activity was fit with a simple one-phase exponential decay curve (R² ≥ 0.98) to derive the half-life of luciferase in that cell line. There was no significant relationship between the level of luciferase expression and luciferase stability (straight line vs. horizontal line fit, p = 0.82). Therefore, constitutive expression of SV40:Luc in fibroblasts does not affect the cellular protein degradation machinery within the range of expression used for our half-life measurements. These new data are reported in Revised Figure S3H.
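      The half-life calculation can be sketched in Python as follows. As a simplification, we assume a one-phase exponential decay to a plateau of zero and estimate the rate constant by linear regression on log-transformed luciferase activity, rather than by a nonlinear one-phase decay fit (which may include a non-zero plateau); the function name and the example values are hypothetical. The half-life is ln 2 / k, and first-order decay gives the same half-life regardless of the starting level, so a dependence of half-life on expression level would suggest saturation of the degradation machinery.

```python
import numpy as np


def half_life_log_linear(time_h, activity):
    """Half-life (h) and R^2 assuming activity = a0 * exp(-k * t) with zero plateau.

    k is estimated by least-squares regression of log(activity) on time;
    R^2 is computed for the back-transformed curve on the original scale.
    """
    t = np.asarray(time_h, dtype=float)
    y = np.asarray(activity, dtype=float)
    if np.any(y <= 0):
        raise ValueError("activity must be positive for a log-linear fit")
    slope, intercept = np.polyfit(t, np.log(y), 1)
    k = -slope
    if k <= 0:
        raise ValueError("activity is not decaying")
    fitted = np.exp(intercept + slope * t)
    r_squared = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return float(np.log(2.0) / k), float(r_squared)


# Hypothetical lines with a 100-fold difference in initial activity but the same 2 h half-life.
t = np.linspace(0.0, 8.0, 33)
for initial_activity in (6.0e3, 6.0e5):
    activity = initial_activity * np.exp(-np.log(2.0) / 2.0 * t)
    print(half_life_log_linear(t, activity))
```

      Both hypothetical lines return the same half-life, which is the pattern expected when degradation is not saturated across the range of expression levels.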

      Line 430, "sigma" to "Sigma".

      Changed

      In figure S2, the classification of rhythms in Drosophila is not clear since even the "Robustly rhythmic" ones have high background noise. Detrending or fitting the data might be able to improve the quality of the rhythms prior to classification.

      These are noisy data, as they come from freely behaving flies. The mean data were shown in Figure S3A and individual examples in S3B; they look very similar to previous bioluminescence recordings of XLG-LUC flies in papers from the Stanewsky lab, which has published extensively using this model. The classifications arose from double-blinded analysis of the bioluminescence traces by several individuals, but we agree that this was not clearly communicated in our original submission. In Revised Figure S2 we now present the mean bioluminescence traces, with and without damped sine wave vs. straight line fitting, as suggested, which is more consistent with the mammalian cellular data presented elsewhere.

      In figure S3B, the original blots for Per2 including Input and IP should be shown.

      The original blots for BMAL1 are shown in Figure S3I. PER2::LUC levels were assessed by measuring the bioluminescence present on the anti-BMAL1 beads, as described in the Figure 3B legend.

      Supplemental information Line 44, "...(reviewed in (Lakin-Thomas,..." to "...(reviewed in Lakin-Thomas,..."

      Changed

      Line 188, "Period CDS": the full name of CDS should be provided the first time it appears.

      Changed to “coding sequence”.

      Reviewer #2 (Significance (Required)): The work suggests a link between the TTFL and non-canonical oscillators, which should be interesting to the circadian field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): **Summary:** The paper "CRYPTOCHROMES confer robustness, not rhythmicity, to circadian timekeeping" by Putker et al. addresses the question of whether or not the rhythmic abundance of clock proteins is a prerequisite for circadian timekeeping. They addressed this by monitoring PER2::LUC rhythms in WT and CRY KO (CKO) cells. CRY forms a complex with PER, which in turn represses the ability of CLOCK/BMAL1 to drive the expression of clock-controlled genes, including PER and CRY. Consistent with previous observations, the authors found residual PER2::LUC rhythms in CKO SCN slices and fibroblasts, and in a KO of the functional analogue of CRY in Drosophila, even in the absence of rhythmic Per2 transcription due to the loss of CRY as a negative regulator of the oscillation. They have shown that these rhythms, in the absence of CRY, follow the formal definition of circadian rhythms. They attributed these residual PER2::LUC rhythms to the maintenance of oscillation in PER2::LUC stability independent of CRY, by testing the decay kinetics of luciferase activity when translation is inhibited. Moreover, they implicated the kinases CK1d/e and GSK3 in regulating PER2::LUC post-translational rhythms through kinase inhibitor studies. They concluded that CRY is not necessary for maintaining PER2::LUC rhythms, but plays an important role in reinforcing high-amplitude rhythms when coupled to a proposed "cytoscillator" likely composed of CK1d/e and GSK3.

      **Major comments:** The authors have shown sufficient data that under different testing conditions (mouse locomotor activity, SCN preparations or fibroblasts), behavioral rhythms and PER2::LUC rhythms are still observed in CRY KO (CKO) cells, contrary to a previous study (Liu et al., 2007). They also indicated limitations of some of the experimental work. However, some parts of the paper need clarification to support their conclusions.

      1. In Fig. 1A, the x-axes of the actograms for WT and CKO are different. While they mentioned this in the figure legend, and described the axis transformation in Fig. S1A, they need a statement in the Results justifying why they did this.

      Thanks, we have included the following sentence in the results section as requested:

      Figure 1 representative actograms are plotted as a function of endogenous period (τ) to allow the periodic organisation of rest-activity cycles to be readily discerned; 24h-plotted actograms are shown in Figures S1A and S2A.

      2. In an attempt to show conservation of their proposed role for CRY, they tested the model system Drosophila melanogaster, where TIMELESS serves as the functional analogue of CRY. While they showed in the figures and described in the text that rhythms still persisted with lower relative amplitude in the TIMELESS-deficient flies, they did not describe any period differences between WT and mutant flies. Showing the period quantification in Supp. Fig. S2 using the robustly rhythmic datasets, and describing these data in the text, would strengthen their claim.

      These analyses are now reported in revised Figure S2, as requested. As described in our response to Reviewer 2, the “robustly rhythmic” flies were scored as such through double-blinded analysis by several individuals. We hope the reviewer will appreciate our concern that excluding the majority of TIMELESS-deficient flies that were not robustly rhythmic might skew their apparent period through unconscious bias in favour of traces that most clearly resemble robustly rhythmic WT controls. To avoid any potential bias, we therefore included all flies of both genotypes in the analysis of circadian period for the revised figure, as suggested by our other reviewers.

      In Fig. S2B, there is no clear distinction between the representative datasets shown for poorly rhythmic and arrhythmic flies; they all appear arrhythmic, and no statistical test is indicated. The authors could present more representative data to better reflect the categories.

      As described above, we now show the grouped mean with and without fitting for all flies of both genotypes. The statistical test for rhythmicity and analysis of circadian period is now the same as was performed for the cellular data presented elsewhere.

      3. In Fig. 2A, the authors note the lack of rhythmicity in the CKO fibroblasts in the first three days at 37°C. How do the conditions here differ from those for the fibroblasts in Fig. 1E, where rhythms are seen during the first three days in CKO fibroblasts?

      As discussed in the manuscript, PER2::LUC rhythms in CKO cells and SCN are observed stochastically between recordings, i.e. if one dish in a recording showed rhythms, all dishes showed rhythms, and vice versa. In this case, the media change that occurred after 3 days in Fig. 2A was sufficient to initiate clear rhythms of PER2::LUC in all experimental replicates. In other experiments, media change did not have this effect. Herculean efforts by multiple lab members over many years, including the PI, have been unable to delineate the basis of this variability, which is discussed at length in the thesis of Dr. David Wong https://www.repository.cam.ac.uk/handle/1810/300610. As such, we clearly state in the discussion:

      “We were unable to identify all of the variables that contribute to the apparent stochasticity of CKO PER2::LUC oscillations, and so cannot distinguish whether this variability arises from reduced fidelity of PER2::LUC as a circadian reporter or impaired timing function in CKO cells. In consequence, we restricted our study to those recordings in which clear bioluminescence rhythms were observed, enabling the interrogation of TTFL-independent cellular timekeeping.”

      4. The authors claimed in the Results section that "in contrast and as expected, Per2 mRNA in WT cells varied in phase with co-recorded PER2::LUC oscillations", but Fig. 3C does not show the expected lag between mRNA and protein levels. This needs to be explained.

      No lag is expected in vitro. A lag between PER protein levels and Per mRNA does occur in vivo and is very likely to be attributable to daily rhythms in feeding (Crosby et al, Cell, 2019), where increased insulin signalling elicits an increase in PER protein production 4-6h after the E-box- and GRE-stimulated increase in Per transcription.

      When luciferin is saturating intracellularly, PER2::LUC activity correlates most closely with the amount of PER2::LUC protein that was translated during the preceding 1-2h, rather than the total amount of PER2, due to the enzymatic inactivation of the luciferase protein (Feeney et al, JBR, 2016). Consistent with many previous observations, under constant conditions, the rate of nascent PER protein synthesis is largely determined by the level of Per2 mRNA, and thus more similar phases are observed between protein and mRNA in vitro than in vivo.

      We have inserted an additional citation of Feeney et al at this point in the text to make this clear.
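      The reporter-phase argument above can be made concrete with a minimal first-order model. The sketch below assumes, purely for illustration, that PER2::LUC activity L is produced at a rate s(t) that oscillates sinusoidally in step with Per2 mRNA and is lost by first-order inactivation with an effective half-life `half_life_h`; the model, the helper name `reporter_lag_hours`, and the parameter values are assumptions, not taken from Feeney et al. Solving dL/dt = s(t) - kL for its steady-state oscillation gives a lag of arctan(ω/k) radians behind synthesis, so a short effective half-life keeps the reporter close in phase with the mRNA.

```python
import math


def reporter_lag_hours(period_h: float, half_life_h: float) -> float:
    """Steady-state phase lag (h) of a reporter behind its sinusoidal synthesis rate.

    Illustrative model only: dL/dt = s(t) - k*L, with s(t) = s0 + A*cos(w*t)
    and first-order loss of activity at rate k = ln(2) / half_life_h.
    The oscillating part of L then lags s(t) by atan(w / k) radians.
    """
    if period_h <= 0 or half_life_h <= 0:
        raise ValueError("period and half-life must be positive")
    k = math.log(2) / half_life_h  # first-order inactivation constant (1/h)
    w = 2 * math.pi / period_h     # angular frequency of the rhythm (rad/h)
    return math.atan(w / k) / w    # convert the radian lag into hours


if __name__ == "__main__":
    # Hypothetical effective half-lives; the lag can never exceed period / 4.
    for t_half in (1.0, 1.5, 2.0, 6.0):
        print(f"half-life {t_half:>4} h -> lag {reporter_lag_hours(24.0, t_half):.2f} h")
```

      Under these assumptions, a ~24 h rhythm with an effective half-life of 1 h or 2 h gives lags of roughly 1.4 h or 2.5 h, respectively, which is much shorter than the in vivo lag attributed above to feeding.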

      5. In Figs. 5A-B, the PER2::LUC periods in the CKO untreated cells seem to vary significantly between A, B, and C. While this could be due to the high variability in the rhythms that were previously described by the authors, the average periods here seem to be longer than the one reported in Fig. 1F. Are there specific condition differences?

      There are no specific condition differences. As reported in Figure S1B, D & E, the range of CKO cellular periods is simply much broader than that of WT cells. Over several dozen experiments, the average period was significantly shorter, but the period variance is an equally striking feature of rhythms in these cells, which we take as evidence of their lack of robustness.

      *Would additional experiments be essential to support the claims of the paper?*

      1. There is sufficient experimental data to support the major claims; however, some suggested experiments are listed below.

        a. If CKO exhibits residual rhythms in PER2::LUC, it would be interesting to know how CRY overexpression influences PER2::LUC rhythms, or point to previous reference papers which may have already shown such effects. The prediction would be that PER2::LUC levels will still be rhythmic when CRY is overexpressed. What would be the extent of "robustness" conferred by CRY on PER2::LUC rhythms based on CRY KO and overexpression studies?

      These experiments have largely already been performed (see Chen et al., Mol Cell; Nangle et al., eLife, 2014; Fan et al., Curr Biol, 2007; Edwards et al., PNAS, 2016) and are cited in this manuscript. As the reviewer predicts, PER2 rhythms remain intact under CRY1 overexpression, although they are clearly perturbed; however, their robustness was not investigated in any detail. We hope to address this important question in our subsequent work.

      b. The authors found that CK1d/e and GSK3 contribute to CRY-independent PER2 oscillations by showing that kinase inhibitors affect PER2::LUC period length in WT and CKO cells in the same manner. It would be interesting to know whether a) PER2::LUC stability and b) PER2 phosphorylation status are affected in WT and CKO cells in the presence of the inhibitors, or point to previous reference papers which may already have shown such effects.

      As the reviewer points out, PER2 stability is already reported to be regulated via phosphorylation by GSK3 and CK1. We have made explicit reference to this in the revised manuscript as follows:

      “In contemporary models of the mammalian cellular clockwork, CRY proteins are essential for rhythmic PER protein production; however, the stability and activity of PER proteins are also regulated post-translationally (Lee et al., 2009; Philpott et al., 2020; Iitaka et al., 2005).”

      *Are the data and the methods presented in such a way that they can be reproduced?*

      1. The protocol for the inhibitor treatments is not described in the main or supplemental methods.

      In the main-text Methods, in the section on luciferase recordings, we state: “For pharmacological perturbation experiments (unless stated otherwise in the text) cells were changed into drug-containing air medium from the start of the recording. Mock-treatments were carried out with DMSO or ethanol as appropriate.”

      *Are the experiments adequately replicated and statistical analysis adequate?*

      1. All experiments had a sufficient number of technical and biological replicates to make valid statistical analyses. For Fig. S2, the authors used RAIN to assess rhythmicity in WT and mutant flies, but it is not clear whether the different categories (rhythmic, poorly rhythmic, and arrhythmic) were based on amplitude differences alone or on a combination of amplitude and p-values as determined by RAIN.

      As reported above, we have revised the analysis of the fly data to be consistent with the cellular data reported elsewhere in the manuscript.

      **Minor comments:** *1. Are prior studies referenced appropriately?* Authors may wish to include Fan et al., 2007, Current Biology which demonstrated that cycling of CRY1, CRY2, and BMAL1 is not necessary for circadian-clock function in fibroblasts.

      We apologise for omitting this excellent paper; it is now cited in the Introduction.

      *2. Are the text and figures clear and accurate?* Figures were clear and illustrated well. See minor comments on text below:

      1. Other minor comments

      Main Text: p3, line 62; p12, line l32: It doesn't seem necessary or appropriate to cite the dictionary for the definition of robust.

      Thank you for this suggestion. During preparation of the manuscript, we found some disagreement among the authors as to the meaning of robustness in a circadian context. We therefore feel it is necessary to define clearly what we mean by this word, to avoid any potential ambiguity.

      p4, line l87: "~20 h" rhythms instead of "~20h-hour"

      p3, line 70; p5, line 121; p14, line 380; p16, line 416 and p18, line 458: Close parentheses have been doubled in parenthetical references.

      p14, line 363: "crassa" instead of "Crassa"

      p17, line 430: "Sigma" instead of "sigma"

      p18, lines 464 and 483; p20, line 521: put a space between numerical values and units, to be consistent with other entries

      p19, line 488: "luciferase" instead of Luciferase

      p20, line 512: "Cell Signaling" instead of "cell signalling"

      p20, line 526: "single" instead of "Single"

      We thank the reviewer for his/her thoroughness; all of the above have been changed.

      Main figures: Fig. 2 p37, line 921: close parenthesis was doubled on "red"

      This was actually correct.

      Fig. 4 p41, line 989: "0.1 mM" instead of "0.1 mM" for consistency throughout text

      Supplementary text:

      line 171: "30 mM HEPES" instead of "30mM HEPES"

      line 184: "Cell Signaling" instead of "cell signalling"

      Supplementary figures:

      Fig. S2A "Drosophila melanogaster" instead of "Drosophila Melanogaster"

      All of the above have been changed.

      Reviewer #3 (Significance (Required)): This paper revisits the previously proposed idea that rhythmic expression of central TTFL components is not essential for circadian timekeeping to persist. However, this paper does not add a significant advance in understanding the reasons behind sustained clock protein rhythmicity, such as that of PER in the absence of CRY, since such mechanisms have been shown for functional analogs in other systems, like Neurospora (Larrondo et al., 2015). Nevertheless, this paper does clarify some issues in the field, such as discrepancies between behavioral and cellular rhythms observed in CKO mice, leading future researchers to examine closely the conditions of their CKO rhythmicity assays before drawing conclusions about rhythmicity. The identification of the kinases as components of the proposed cytosolic oscillator (cytoscillator) needs further validation, but this is perhaps beyond the scope of the paper. The data provide incremental evidence for the existence of a cytoscillator, but open up opportunities to identify other players, like phosphatases, to establish the connection between the central TTFL and the proposed cytoscillator.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The paper "CRYPTOCHROMES confer robustness, not rhythmicity, to circadian timekeeping" by Putker et al. addresses the question of whether the rhythmic abundance of clock proteins is a prerequisite for circadian timekeeping. They addressed this by monitoring PER2::LUC rhythms in WT and CRY KO (CKO) cells. CRY forms a complex with PER, which in turn represses the ability of CLOCK/BMAL1 to drive the expression of clock-controlled genes, including PER and CRY. Consistent with previous observations, the authors found residual PER2::LUC rhythms in CKO SCN slices and fibroblasts, and residual rhythms in a Drosophila knockout of a functional analogue of CRY, even in the absence of rhythmic Per2 transcription due to the loss of CRY as a negative regulator of the oscillation. They have shown that these rhythms, in the absence of CRY, meet the formal definition of circadian rhythms. They attributed these residual PER2::LUC rhythms to CRY-independent maintenance of oscillation in PER2::LUC stability, by testing the decay kinetics of luciferase activity when translation is inhibited. Moreover, through kinase inhibitor studies, they implicated the kinases CK1d/e and GSK3 in regulating PER2::LUC post-translational rhythms. They concluded that CRY is not necessary for maintaining PER2::LUC rhythms, but plays an important role in reinforcing high-amplitude rhythms when coupled to a proposed "cytoscillator" likely composed of CK1d/e and GSK3.

      Major comments:

      The authors have shown sufficient data that under different testing conditions (mouse locomotor activity, SCN preps or fibroblasts), behavioral rhythms and PER2::LUC rhythms are still observed in the CRY KO (CKO) cells, contrary to a previous study (Liu et al., 2007). They also indicated limitations to some of the experimental work. However, some parts of the paper need clarification to support their conclusions.

      1. In Fig. 1A, the x-axes of the actograms for WT and CKO are different. While this is mentioned in the figure legend, and the axis transformation is described in Fig. S1A, a statement justifying this choice is needed in the Results.

      2. In an attempt to show conservation of their proposed role for CRY, they tested the model system Drosophila melanogaster, where TIMELESS serves as the functional analogue of CRY. While they showed in the figures and described in the text that rhythms still persisted with lower relative amplitude in the TIMELESS-deficient flies, they did not describe any period differences between WT and mutant. Showing the period quantification in Supp. Fig. S2 using the robustly rhythmic datasets, and describing these data in the text, would strengthen their claim.

      In Fig. S2B, there is no clear distinction between the representative datasets shown for poorly rhythmic and arrhythmic flies, i.e. they all appear arrhythmic, and no statistical test is indicated. The authors could present more representative data that better reflect the categories.

      3. In Fig. 2A, the authors note the lack of rhythmicity in the CKO fibroblasts during the first three days at 37 °C. How do the conditions here differ from those in Fig. 1E, where rhythms are seen during the first three days in CKO fibroblasts?

      4. The authors claimed in the Results section: "in contrast and as expected, Per2 mRNA in WT cells varied in phase with co-recorded PER2::LUC oscillations." However, Fig. 3C does not show this expected lag between mRNA and protein levels. This needs to be explained.

      5. In Figs. 5A-B, the PER2::LUC periods in the CKO untreated cells seem to vary significantly between A, B, and C. While this could be due to the high variability in the rhythms that were previously described by the authors, the average periods here seem to be longer than the one reported in Fig. 1F. Are there specific condition differences?

      Would additional experiments be essential to support the claims of the paper?

      1. There is sufficient experimental data to support the major claims; however, some suggested experiments are listed below.

      a. If CKO exhibits residual rhythms in PER2::LUC, it would be interesting to know how CRY overexpression influences PER2::LUC rhythms, or point to previous reference papers which may have already shown such effects. The prediction would be that PER2::LUC levels will still be rhythmic when CRY is overexpressed. What would be the extent of "robustness" conferred by CRY on PER2::LUC rhythms based on CRY KO and overexpression studies?

      b. The authors found that CK1and GSK3 contribute to CRY-independent PER2 oscillations by showing that addition of kinase inhibitors affect the PER2::LUC period lengths in WT and CKO in the same manner. It would be interesting to know if a) PER2::LUC stability and b) PER2 phosphorylation status, is affected in WT and CKO in the presence of the inhibitors, or point to previous reference papers which may already have shown such effects.

      Are the data and the methods presented in such a way that they can be reproduced?

      1. The protocol for the inhibitor treatments is not described in the main or supplemental methods.

      Are the experiments adequately replicated and statistical analysis adequate?

      1. All experiments had a sufficient number of technical and biological replicates to make valid statistical analyses. For Fig. S2, the authors used RAIN to assess rhythmicity in WT and mutant flies, but it is not clear whether the different categories (rhythmic, poorly rhythmic, and arrhythmic) were based on amplitude differences alone or on a combination of amplitude and p-values as determined by RAIN.

      Minor comments:

      1. Other minor comments

      Main Text:

      p3, line 62; p12, line l32: It doesn't seem necessary or appropriate to cite the dictionary for the definition of robust.

      p4, line l87: "~20 h" rhythms instead of "~20h-hour"

      p3, line 70; p5, line 121; p14, line 380; p16, line 416 and p18, line 458: Close parentheses have been doubled in parenthetical references.

      p14, line 363: "crassa" instead of "Crassa"

      p17, line 430: "Sigma" instead of "sigma"

      p18, lines 464 and 483; p20, line 521: put a space between numerical values and units, to be consistent with other entries

      p19, line 488: "luciferase" instead of Luciferase

      p20, line 512: "Cell Signaling" instead of "cell signalling"

      p20, line 526: "single" instead of "Single"

      Main figures:

      Fig. 2 p37, line 921: close parenthesis was doubled on "red"

      Fig. 4 p41, line 989: "0.1 mM" instead of "0.1 mM" for consistency throughout text

      Supplementary text:

      line 171: "30 mM HEPES" instead of "30mM HEPES"

      line 184: "Cell Signaling" instead of "cell signalling"

      Supplementary figures:

      Fig. S2A "Drosophila melanogaster" instead of "Drosophila Melanogaster"

      Significance

      This paper revisits the previously proposed idea that rhythmic expression of central TTFL components is not essential for circadian timekeeping to persist. However, this paper does not add a significant advance in understanding the reasons behind sustained clock protein rhythmicity, such as that of PER in the absence of CRY, since such mechanisms have been shown for functional analogs in other systems, like Neurospora (Larrondo et al., 2015). Nevertheless, this paper does clarify some issues in the field, such as discrepancies between behavioral and cellular rhythms observed in CKO mice, leading future researchers to examine closely the conditions of their CKO rhythmicity assays before drawing conclusions about rhythmicity. The identification of the kinases as components of the proposed cytosolic oscillator (cytoscillator) needs further validation, but this is perhaps beyond the scope of the paper. The data provide incremental evidence for the existence of a cytoscillator, but open up opportunities to identify other players, like phosphatases, to establish the connection between the central TTFL and the proposed cytoscillator.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In the canonical model of the mammalian circadian system, the transcription factors BMAL1/CLOCK drive transcription of the Cry and Per genes, and CRY and PER proteins repress BMAL1/CLOCK activity to close the feedback loop in a circadian cycle. The dominant view was that CRY1 and CRY2 are essential repressors of the mammalian circadian system. However, this was challenged by persistent bioluminescence rhythms observed in SCN slices derived from Cry-null mice (Maywood et al., 2011 PNAS) and then by persistent behavioral rhythms shown by Cry1 and Cry2 double knockout mice when they are synchronized under constant light prior to free running in the dark (Ono et al., 2013 PLOS One). In the manuscript, the authors first confirmed behavioral and molecular rhythms in Cry1/Cry2-deficient mice and then provided evidence suggesting that the rhythms of Per2:LUC and Nr1d1:LUC in CKOs are generated by the cytoplasmic oscillator rather than the well-studied transcription-translation feedback loop: constant Per2 transcription driven by BMAL1/CLOCK plus rhythmic degradation of the PER protein results in a rhythmic PER2 level in the absence of both Cry1 and Cry2, which suggests a connection between the classic transcription- and translation-based negative feedback loops and non-canonical oscillators.

      Major points:

      Lines 38-39, "Challenging this interpretation, however, we find evidence for persistent circadian rhythms in mouse behavior and cellular PER2 levels when CRY is absent." The rhythmic behavioral phenotype of Cry1 and Cry2 double knockout mice was first documented by Ono et al., 2013 PLOS ONE, in which eight Cry1 and Cry2 double knockout mice, after synchronization in light, displayed circadian periods of different lengths and qualities. That paper reported two period lengths in the Cry mutant mice: "An eye-fitted regression line revealed that the mean shorter period was 22.86+/-0.4 h (n= 8) and the mean longer period was 24.66+/-0.2 h (n =9). The difference of two periods was statistically significant (p < 0.01).", either of which is quite different from the ~16.5 hr period in Figure 1B of the manuscript. A brief discussion of the period difference between studies would help readers. Period information for individual mice should be calculated and shown, since large period variations exist among CKO mice (Ono et al., 2013 PLOS One).

      The behavioral phenotype of Cry-null mice and luminescence from their SCNs are robustly rhythmic, whereas fibroblasts derived from these mice produce only rhythms of very low amplitude compared with WT, which may reflect the difference between the SCN rhythm and peripheral clocks. The behavioral phenotype is thought to be controlled mainly by the SCN. However, most molecular analyses in this work were done with MEFs and lung fibroblasts. These cells may not best represent the behavioral phenotype of the CKO mice.

      Stronger evidence is needed to fully exclude the possibility that, in CKO cells, the rhythm is generated by PERs compensating for the loss of CRYs in repressing BMAL1 and CLOCK. Since the rhythms of Per:LUC or Nr1d1:LUC (Figures 3D and S3E) are much weaker than those in WT, molecular analyses might not be sensitive enough to detect changes across a circadian cycle in the CKOs if the TTFL still operates. CLOCKΔ19 mutant mice have a ~4 hr longer period than WT (Antoch et al., 1997 Cell; King et al., 1997 Cell). CLOCKΔ19; CKO cells or mice would be very helpful in addressing this question. Periods of Per:LUC and Nr1d1:LUC in CLOCKΔ19; CKO should be similar to those in the CKO alone if transcriptional feedback does not contribute to their oscillations.

      Lines 51-52, "PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping" and lines 310-311, "We found that transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in animal and cellular models." The authors have not excluded the possibility that the rhythmic behaviors of the CKO mice derive from PERs compensating for the role of CRYs in the feedback loop of the circadian clock in the SCN. In fibroblasts, only two genes, Per2 and Nr1d1, were studied in this work, and these findings cannot simply be extrapolated to the thousands of clock-controlled genes. Also, the amplitudes of PER2:LUC and NR1D1:LUC in the CKOs are much lower than those in WT, and no evidence has been provided that these weak rhythms are biologically relevant.

      Minor points:

      Lines 66-67, "...(Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)." to "... (reviewed in Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)."

      Line 70, "...((Liu et al., 2008..." to "...(Liu et al., 2008..."

      Lines 174-175, "Considering recent reports that transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ...". The Larrondo et al., 2015 paper states: "however, in such ∆fwd-1 cells, the amount of FRQ still oscillated, the result of cyclic transcription of frq and reinitiation of FRQ synthesis." The point of that paper is "we unveiled an unexpected uncoupling between negative element half-life and circadian period determination," not "...transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ."

      Lines 249-252, "CKO cells exhibit no rhythm in Per2 mRNA (Figure 3C, D), nor do they show a rhythm in global translational rate (Figure S4A, B), nor did we observe any interaction between BMAL1 and S6K/eIF4 as occurs in WT cells (Lipton et al, 2015) (Figure S4C)." In Figures 3D and S3E, the unfitted Per2:LUC data for CKO and CPKO cells look better than those for Nr1d1:LUC, but the Nr1d1:LUC rhythm became clear after fitting the raw data. So, to better visualize any low-amplitude Per2:LUC rhythm and to compare it with Nr1d1:LUC, fitted Per2:LUC data for CKOs and CPKOs in Figures 3D and S3E should be shown, as was done for Nr1d1:LUC.

      Lines 258-259, "much less than the half-life of luciferase expressed in fibroblasts under a constitutive promoter". In Figure S4D, the y-axis for PER2::LUC spans ~800, whereas that for SV40::LUC spans ~600000. LUC overexpressed from the SV40 promoter might saturate the cell's degradation machinery, so the comparison is not fair. A weaker promoter driving expression at a level similar to Per2 should be used for the comparison.
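      The fairness concern above is biological (possible saturation of degradation at high expression), not arithmetic: a first-order half-life fitted to background-subtracted decay curves does not depend on signal scale. A minimal sketch, assuming the signal after translation inhibition decays as S0·exp(-kt) and that background has already been subtracted, fits ln S against time by ordinary least squares; `half_life_from_decay` is a hypothetical helper, not the authors' analysis. Applied to two synthetic traces that differ only ~750-fold in scale (roughly the ratio of the axis ranges above), it returns the same half-life, which is why only a matched-expression control can test the saturation hypothesis.

```python
import math


def half_life_from_decay(times_h, signal):
    """Estimate a first-order half-life (h) from bioluminescence after translation inhibition.

    Assumes background-subtracted, strictly positive signal decaying as
    S(t) = S0 * exp(-k * t); fits ln(S) against t by ordinary least squares.
    """
    if len(times_h) != len(signal) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(s) for s in signal]  # math.log raises ValueError for s <= 0
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a pure first-order decay
    if slope >= 0:
        raise ValueError("signal is not decaying")
    return math.log(2) / -slope


if __name__ == "__main__":
    # Synthetic, noise-free example: a 2 h half-life at two very different scales.
    times = [0.5 * i for i in range(13)]
    low = [800 * math.exp(-math.log(2) * t / 2.0) for t in times]
    high = [750 * s for s in low]
    print(half_life_from_decay(times, low), half_life_from_decay(times, high))
```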

      Line 430, "sigma" to "Sigma".

      In figure S2, the classification of rhythms in Drosophila is not clear since even the "Robustly rhythmic" ones have high background noise. Detrending or fitting the data might be able to improve the quality of the rhythms prior to classification.
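      One simple way to act on the detrending suggestion, sketched here as an assumption rather than the authors' or the reviewer's pipeline, is to subtract a centred moving average spanning roughly one period, which removes slow baseline drift while leaving the oscillation. The hypothetical helper below takes evenly sampled data and an odd window length in samples (for example, about 24 h divided by the sampling interval, rounded to an odd number); the ends of the trace, which lack a full window, are dropped.

```python
import math


def detrend_moving_average(values, window):
    """Subtract a centred moving average of `window` samples from each point.

    `window` must be odd so each average is centred on a sample. The first and
    last window // 2 points have no full window and are omitted, so the result
    is len(values) - window + 1 points long.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if len(values) < window:
        raise ValueError("series is shorter than the window")
    half = window // 2
    running = sum(values[:window])  # sum of the window centred on index `half`
    detrended = []
    for centre in range(half, len(values) - half):
        detrended.append(values[centre] - running / window)
        incoming = centre + half + 1
        if incoming < len(values):
            # Slide the window one sample to the right.
            running += values[incoming] - values[centre - half]
    return detrended


if __name__ == "__main__":
    # Synthetic trace: linear drift plus a rhythm whose period equals the window.
    trace = [0.5 * i + 10 * math.cos(2 * math.pi * i / 25) for i in range(100)]
    residual = detrend_moving_average(trace, 25)
    print(len(residual), round(residual[0], 6))
```

      When the window matches the period exactly, a linear drift is removed completely and the sinusoid passes through unchanged; with real, noisy traces a subsequent fit (for example a damped cosine) would still be needed before classification.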

      In figure S3B, the original blots for Per2 including Input and IP should be shown.

      Supplemental information

      Line 44, "...(reviewed in (Lakin-Thomas,..." to "...(reviewed in Lakin-Thomas,..."

      Line 188, "Period CDS": the full name of CDS should be provided the first time it appears.

      Significance

      The work suggests a link between the TTFL and non-canonical oscillators, which should be interesting to the circadian field.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      This interesting study by Putker et al. showed that circadian rhythmicity persists in several typical circadian assay systems lacking Cry, including Cry knockout mouse behavior and gene expression in Cry knockout fibroblasts. They further demonstrated weak but significant circadian rhythmicity in Cry- and Per- knockout cells. Cry- (and potentially Per-)-independent oscillations are temperature compensated, and CKId/e still has a role in the period regulation of Cry-independent oscillations.

      Major comments:

      1) The authors propose that the essential role of mammalian Cryptochrome is to confer robustness on the oscillation. As the authors analyze in many places, the robustness of an oscillation can be assessed by its (relative) amplitude and its phase/period variation, both of which should be affected significantly by the method of cell synchronization. Unfortunately, the synchronization method is not adequately described in this version of the supplementary information. This reviewer has no objection to the "iterative refinement of the synchronization protocol", but at least the correspondence between which methods were used in which experiments needs to be clearly explained. The detailed method may be found in the thesis of Dr. Wong, but the methods used in this manuscript need to be detailed within this manuscript.

      2) The authors revealed that CKO mice have apparent behavioral rhythmicity under LL>DD conditions. This is an intriguing finding. However, it should be carefully evaluated whether this rhythmicity (16 hr cycle) is a direct consequence of the circadian rhythmicity observed in CKO and CPKO cells (24 hr cycle), because the period lengths differ greatly. Is it possible to induce 16 hr periodicity in CKO mouse behavior with a 16 hr-L:16 hr-D cycle? Could the 16 hr rhythmicity instead be the mouse version of internal desynchronization, or another type of methamphetamine-induced or food-entrainable oscillation?

      3) The authors proposed that CKId/e is, at least in part, a component of the cytoscillator (Fig. 5D), and that turnover control of PER (likely controlled by CKId/e) may be an interaction point between the cytoscillator and the canonical circadian TTFL (Fig. 4). Strictly speaking, this model is not directly supported by the experimental setting of the current manuscript. The contribution of CKId/e is evaluated in the presence of PER by monitoring a canonical TTFL output (i.e. PER2::LUC); thus it is not clear whether the kinase determines the period of the cytoscillator. It would be valuable to ask whether PF and CHIR have a period-lengthening effect on Nr1d1:LUC in the CPKO cells.

      Minor comments:

      4) The authors argue that the CKO cells' rhythmicity is entrained by the temperature cycle (Fig. 2C). Because the CKO cell data show only one peak after release into the constant-temperature phase, it is difficult to conclude whether the cells are entrained or simply respond to the final temperature shift.

      5) It would be useful to provide readers with information on the known phenotype of TIMELESS knockout flies. TIM is widely accepted as an essential component of the circadian clock in flies; are there any studies showing circadian rhythmicity in Tim-knockout flies (even if the oscillation is seen only under limited conditions, such as the neonatal SCN rhythm in mammalian Cry knockouts)?

      5) Figure 3C shows that the amount of PER2::LUC mRNA changes ~2-fold between time = 0 hr and 24 hr in the CKO cells. This amplitude is similar to that observed in WT cells, although the peak phase is different. Does the PER2::LUC mRNA level oscillate in CKO cells?

      6) Figure 3D: the authors discuss the amplitude and variability (whether the signal is noisier or not) of reporter luciferase expression between different cell lines. However, a huge difference in the luciferase signal can be observed even in the detrended bioluminescence plot. This reviewer is concerned that some of the phenotypes of CKO and CPKO MEFs reflect lower transfection efficiency of the reporter gene rather than the nature of the circadian oscillators in these cell lines.

      Significance

      Although Cryptochrome (Cry) has been considered a central component of the mammalian circadian clock, several studies have shown that circadian rhythms are maintained in the absence of Cry, including in the neonatal SCN and red blood cells. Thus, although the need for Cry as a circadian oscillator has been debated, its essential role as a circadian oscillator remains established, at least in the cell-autonomous clock driven by the TTFL. This study provides additional evidence that circadian rhythmicity can persist in the absence of Cry.

      In a more general context, the presence of a non-TTFL circadian oscillator has been one of the major topics in the field of circadian clocks outside the cyanobacteria. In mammals, the authors' and other groups led the discovery of circadian oscillation in the absence of the canonical TTFL by showing the redox cycle in red blood cells (O'Neill, Nature 2011). Circadian oscillation in the absence of Bmal1 has also been reported recently (Ray, Science 2020). Bmal1(-CLOCK), CRY, and PER compose the core mechanism of the canonical circadian TTFL; thus, this manuscript adds another layer of evidence for non-TTFL circadian oscillation in mammals.

      Overall, the manuscript reports several surprising results that will receive considerable attention from the circadian community.

      This reviewer has expertise in the field of mammalian circadian clocks, including genomics, biochemistry, and mouse behavior analysis.

    1. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to Version 1 of the preprint: https://www.biorxiv.org/content/10.1101/856153v1

      Summary

      This is an original, focussed study that offers a new model to explain the neuronal “computation” that underlies insect navigation. The authors identify shortcomings in existing models – specifically, that they do not explain the entire range and flexibility of insect navigational capabilities – and integrate and build upon prior models to successfully fill these gaps. The integrated model is particularly valuable because it relates specific computational functions to specific anatomical structures, most notably the central complex and the mushroom body. It is an important addition to both the literature on the insect central complex, as well as to theoretical work on insect navigation. Many testable predictions can be made based on the presented models. The figures are well made and the writing is compact. Nevertheless, several points need to be addressed.

    1. Reviewer #3:

      Unlike other ionotropic glutamate receptors, GluD2 is not gated by glutamate. No specific or high-affinity chemical modulators that induce channel activity exist for this receptor; as such, its role as a functional channel has been questioned. To address this challenge, the authors have used a previously characterized photoswitchable tethered ligand (PTL) called MAGu to target a very non-specific blocker (pentamidine) to a new ion channel target (the GluD2 receptor). This approach (using this exact PTL) has been used to target knock-in cysteine mutants of the GABAA receptor in mouse brain slices and in vivo in an awake, behaving mouse. Based on this precedent, it is not unreasonable to believe that this tool could similarly be used for the GluD2 receptor (which would be a significant advance in understanding the physiological role of this protein in disease), although within this manuscript the authors characterized the MAGu response of GluD2 only in heterologous cell culture. Because the GluD2 receptor is not ligand-activated in the traditional sense, the authors have exploited a previously characterized constitutively open point mutant (L654T) as a background to test different photoactivatable GluD2 cysteine mutants, and have nicely demonstrated a reversible current block response in the presence of purple (380 nm = "cis-" = channel "on") and green (535 nm = "trans-" = channel "off") light. The authors have numerous publications and experience in the photopharmacology of ion channels, and the characterization data here look solid.

      That said, there are a few questions that should potentially be addressed:

      1) How does MAGu work on the cysteine-engineered receptor that would presumably be used for future in vivo studies? Because the GluD2-I677C point mutant (lacking the L654T background) does not show current, the authors use the known effect of mGlu1 receptor agonism as a readout of GluD2-I677C activity in response to light and observe only a 23% decrease in the mGlu1-evoked current. Is this very small effect physiologically significant, or expected? MAGu might be a useful tool to modulate GluD2 in Lurcher mice (which harbor the L654T mutation), but it is hard to judge the probe's efficacy and usefulness for evaluating the physiology of the WT GluD2 receptor in the absence of a way to measure a direct functional effect on the channel. How else might this be addressed?

      2) PTLs have been shown to generate a high local concentration of ligand to accelerate the pharmacological response (and in this case, to provide some level of specificity for a very non-specific, greasy cation), but it is hard to rationalize the "absolute" pharmacological specificity claimed by the authors (lines 35 and 211). At the mid-micromolar concentrations required to elicit a response, it seems unlikely that MAGu will not react with any other extracellular cysteines present in cells. Further, the guanidinium group by itself will certainly not direct the maleimide reactivity towards GluD2 over any other cation channel or electronegative protein surface. The language of this claim should be modified in the absence of other types of specificity assays.

      Minor Comments:

      1) Provide a step-by-step description of the protocol for Fig. 2C (or label the "washout" of pentamidine).

      2) Why does normalized current plateau at 80% for 535 nm (Fig. 4B)?

      3) There is a recent bioRxiv preprint reporting the GluD2 structure: https://www.biorxiv.org/content/10.1101/2020.01.10.902072v2.full.pdf If this is published during the course of this review, it would be interesting for the authors to comment on how it compares to their homology model and whether it is consistent with their mutagenesis experiments.

    2. Reviewer #2:

      The present manuscript investigates the development of a photoactivatable pore blocker for the ion channel of the glutamate receptor delta (GluD), as a potential tool to study this receptor in vitro and in vivo. GluD shares structural homology with other members of the family and plays key roles in synapse formation and signaling. However, in contrast to other members of the family, it does not have a clear ionotropic function, which complicates defining how it contributes to synaptic function in vivo. Many labs have studied GluD and have provided key insights into its function and role. Still, a new tool to study and clarify its function has high potential.

      With some minor quibbles (see below), the manuscript lays out the issues quite well. Proper controls are carried out to define the specificity of action of the photoswitchable MAGu, how it alters membrane currents, and how it might work. The potential for a photoswitchable pore blocker to study the role of the GluD ion channel is extremely high. I do have some concerns about signal-to-noise, since the pore block by trans-MAGu is only a fraction of the total presumed current through GluD. In addition, introducing a specific cysteine in vivo will not be trivial. Still, overall this is an important manuscript that introduces an interesting strategy to study and further clarify the role of GluD in the brain.

      1) Abstract/Introduction. It would be helpful to define early and explicitly what the photoswitchable functional strategy is: that it works via a pore block mechanism. In the Abstract, for example, instead of calling it '...a photoswitchable ligand', how about simply '...a photoswitchable pore blocker'? Once I realized the general functional strategy (at the beginning of the description of results, where it was explicitly stated), everything became clearer. The functional strategy, that you are generating a photoswitchable pore blocker, should also be stated explicitly in the Introduction, where right now it is touched on but not spelled out.

      2) Figure 2C. The extent of block by photoswitching is quantified relative to the block by pentamidine, which is reasonable. However, what concentration of pentamidine was used for these experiments, and where does it sit on the pentamidine concentration-block curve? Presumably, if the block were complete, the leak current should go to zero, and hence the relative efficacy of the photoswitchable blocker would be lower (e.g., Figure 4B). Please clarify.
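      To make the normalization question concrete, here is a minimal sketch assuming a simple Hill concentration-block relationship for pentamidine. The IC50 and Hill coefficient are hypothetical placeholders (the manuscript's values are not given in this review), and both helper functions are illustrative, not the authors' analysis.

```python
def fraction_blocked(conc_uM: float, ic50_uM: float, hill_n: float = 1.0) -> float:
    """Fraction of leak current blocked at a given blocker concentration (Hill equation)."""
    if conc_uM <= 0:
        return 0.0
    return 1.0 / (1.0 + (ic50_uM / conc_uM) ** hill_n)


def photoswitch_as_fraction_of_leak(photo_over_penta: float, penta_conc_uM: float,
                                    ic50_uM: float, hill_n: float = 1.0) -> float:
    """Rescale '% photoswitching' (I-Blockphoto / I-Blockpenta) to a fraction of the
    total leak, assuming the pentamidine reference block is itself incomplete."""
    return photo_over_penta * fraction_blocked(penta_conc_uM, ic50_uM, hill_n)


# Hypothetical numbers: pentamidine applied at its IC50 blocks only half the leak,
# so 80% photoswitching relative to pentamidine is 40% of the total leak.
print(photoswitch_as_fraction_of_leak(0.8, penta_conc_uM=10.0, ic50_uM=10.0))  # prints 0.4
```

      The closer the pentamidine concentration is to saturating, the closer the two normalizations agree; at sub-saturating concentrations, "% photoswitching" overstates the fraction of total leak that trans-MAGu removes.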

      3) Figure 4A. It would be nice to see difference currents, perhaps contrasted with what is shown in Figure 2A. This would clarify the 'voltage independence' of the action for readers unfamiliar with it.

      4) Figure 4D. It is not clear how the 'ion channel' or red/green pore was generated. Is this from the structure or from modeling? Please add details. This is an interesting figure, but it is also somewhat speculative, I think, and it needs more detail for readers to understand its basis. One question is what drives the positioning of trans-MAGu: is it fixed? And what drives the change in coloration: presumed pore block by trans-MAGu?

      Minor Comments:

      1) Figure 1. Minor point. Technically, there is no transmembrane segment 2 (TM2) in iGluRs. M2 is a pore loop, like the P loop in K+ channels, and enters and exits on the same side of the membrane - and does not span the membrane (and hence not a transmembrane segment). Simple solution would be to just rename TM2 to M2 leaving TM1, TM3, and TM4 as is and just noting somewhere in Figure legend that M2 is a non-membrane spanning pore loop.

      2) Figure 2D. Minor point. Although I understand the intent of the figure, it is very hard to discern what is being shown. It might help to remove the 'red' subunit.

    3. Reviewer #1:

      The study by Lemoine and colleagues demonstrates a novel chemogenetic tool to probe ion channel function of GluD2 in HEK cells. By introducing cysteine mutations and engineering a photoswitchable ligand, ionic current carried by constitutively-open GluD2 mutant channels was reversibly decreased by light. Further, GluD2 current produced by activation of mGluRs was partially reduced by light. This tool has the potential to be very powerful to advance the understanding of GluD2 channel function in neurons.

      Major:

      1) The introduction and abstract are rather general and antiquated, to the disservice of the readers. It may be time to move away from the notion that the ion channel function of GluD is debated. The authors have published many elegant studies demonstrating ion channel function, and by all appearances in the literature, the interpretation of these studies is not contested. In addition to pharmacology, ion channel function of GluD has been demonstrated using selective genetic strategies (e.g. Ady et al., 2013; Benamer et al., 2018; Gantz et al., 2020). To this end, lines 28-29, 51, 55, & 73-75 should be changed. It does not seem fitting to state "direct evidence for ionotropic activity of GluD in neuronal setting [sic] is lacking" given the studies referenced above. Broadly, the readers would benefit from restructuring the introduction and abstract to state the specific issue addressed by the present study (i.e. the lack of specific antagonists/pore blockers to study GluD without affecting other iGluRs) and to highlight the potential application of the ligand.

      2) This photoswitchable ligand MAGu has great potential to probe GluD channel function in neurons, although the present study stops short of demonstrating its utility in neurons. Lines 211-212 state that the WT receptor is insensitive to MAGu, but it is not clear where those data are presented. It would be beneficial to show the magnitude of the DHPG-induced current in WT GluD2-expressing cells before and after addition of MAGu to address the possibility that MAGu affects the current irrespective of trans- or cis- conformation. It is also not clear how MAGu will be selective for site-specific conjugation when introduced in a neuronal setting. Is it expected MAGu will react with any available cysteine? It would be helpful to discuss possible limitations going forward towards use in neurons.

      3) The data show convincingly that 380 nm light relieves the GluD2 block produced by MAGu in darkness or under 535 nm light. But it is not clear how much trans-MAGu reduced the leak current through GluD2 Lurcher mutant channels. In Figure 2C (I677C), there is still substantial leak under 535 nm light. The quantification in Figure 2C (% photoswitching) shows I-Blockphoto as a percentage of I-Blockpenta, but judging from the arrows in the right-hand trace, I-Blockphoto appears to be the current unblocked. It would be helpful to quantify the amount of leak current blocked by trans-MAGu. Additional discussion of the structural basis for the incomplete block may also be helpful.

      Minor:

      1) Recommendation to include the model system in the title (e.g., "in expression systems" or "in HEK cells").

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript was assessed by three reviewers. After the completion of their reviews, the editor and reviewers discussed the paper and arrived at the following consensus review. For transparency, the individual reviews are also presented.

      Summary:

      Unlike other ionotropic glutamate receptors, GluD2 is not gated by glutamate. No specific or high-affinity chemical modulators that induce channel activity exist for this receptor. To address this challenge, the authors used a previously characterized photoswitchable tethered ligand (PTL) called MAGu to target a very non-specific blocker (pentamidine) to a new ion channel target (the GluD2 receptor). This approach (using this exact PTL) has been used to target knock-in cysteine mutants of the GABAA receptor in mouse brain slices and in vivo in an awake, behaving mouse. Based on this precedent, it is not unreasonable to believe that this tool could similarly be used for the GluD2 receptor, which would be a significant advance in the field for understanding the physiological role of this protein in disease.

      The original reviews, below, reflect the reviewers’ initial enthusiasm for the potential of the approach to study GluD2 channels. In the discussion, all reviewers agreed that the issue of signal-to-noise is critical and that additional experiments are essential to demonstrate that the MAGu response will be sufficient for physiological studies in vivo.

    1. Reviewer #2:

      I very much like the general idea of this paper, but my opinion is that this is not an idea that can or should be applied to these data. As elaborated below, the ABIDE data come from numerous sites with different scanners, imaging acquisition sequences and parameters, sample ascertainment, etc. The methods used in the current paper rely on there not being such heterogeneity; its presence can either render true ASD-related deviance invisible, or create an illusion of ASD-related deviance where there is none. Such heterogeneity is, of course, problematic for more conventional approaches, but it is far more problematic for the methods proposed here.

      Major Issues and Questions:

      1) The authors are critical of case-control models but do not present an alternative to dealing with the heterogeneity in the data. Indeed, linear models are inadequate to deal with the heterogeneity in the ABIDE data given the lack of overlap in the data for different sites. But the normative approach presented here seems to not deal with the problem at all, potentially transforming what would be taken out by a nuisance variable into an alteration in ASD-related deviance.

      2) The sparsity of the data beyond childhood is extremely problematic for this approach. Taking data in one-year bins requires large amounts of data within each bin to make the means and standard deviations reliable. By the teenage years, this is clearly not the case. The authors require only that each age bin have at least 3 control points; this is wildly insufficient, and would be even if there were no issues with site heterogeneity. Conventional linear models are to be preferred to normative models under these conditions.
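      For readers unfamiliar with the binned approach, here is a minimal sketch of how a per-region, age-binned deviation score could be computed. It assumes, as a simplification, that the w-score is the subject's cortical thickness minus the control mean, divided by the control standard deviation, within one-year bins, with bins holding fewer than 3 controls left undefined; the function name is hypothetical and this is not the authors' code.

```python
import numpy as np


def binned_w_scores(age, ct, is_control, bin_width=1.0, min_controls=3):
    """Per-subject w-score for one region: (x - control mean) / control SD,
    computed within fixed-width age bins. Subjects in bins with fewer than
    `min_controls` controls (or zero control variance) are left as NaN."""
    age = np.asarray(age, dtype=float)
    ct = np.asarray(ct, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    bins = np.floor(age / bin_width).astype(int)
    w = np.full(ct.shape, np.nan)
    for b in np.unique(bins):
        in_bin = bins == b
        controls = ct[in_bin & is_control]
        if controls.size < min_controls:
            continue  # too few controls to estimate a mean and SD
        sd = controls.std(ddof=1)
        if sd == 0:
            continue
        w[in_bin] = (ct[in_bin] - controls.mean()) / sd
    return w
```

      With only three controls, the sample SD has two degrees of freedom, so a single unusual control can inflate or shrink every w-score in that bin; this is the instability the reviewer describes, and site effects within a bin add to it.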

      3) The comparison of results from a case-control model versus a normative model seems misleading. A case-control model approach requires a specification of the age at which the comparison is made. This is not provided, leading one to suspect that the age data were not centered, but were absolute, and thus the differences were essentially projecting backwards to birth. (This is, I believe, a common mistake.) The model specification is also completely lacking. Moreover, a case-control approach does not preclude the possibility of centering the data at different ages (as in e.g. Khundrakpam et al. (2017)). Between this and the problems with heterogeneity for the normative models, it is unclear how to interpret these results.
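      The centering point can be illustrated with a minimal sketch: an ordinary least-squares model with a group term, an age term centered at a chosen age c, and their interaction, where the group coefficient is the ASD-minus-control difference at age c. The function name and the noise-free toy data are hypothetical; a real analysis would also include site and other nuisance covariates as extra design columns.

```python
import numpy as np


def group_effect_at_age(ct, age, is_asd, center_age):
    """OLS fit of ct ~ 1 + group + (age - c) + group:(age - c).
    The `group` coefficient is the ASD-minus-control difference at age c."""
    ct = np.asarray(ct, dtype=float)
    a = np.asarray(age, dtype=float) - center_age
    g = np.asarray(is_asd, dtype=float)
    design = np.column_stack([np.ones_like(a), g, a, g * a])
    coef, *_ = np.linalg.lstsq(design, ct, rcond=None)
    return coef[1]


# Noise-free toy data in which the two groups have different age slopes.
ages = np.array([6, 8, 10, 12, 14, 16] * 2, dtype=float)
is_asd = np.array([0] * 6 + [1] * 6)
ct = np.where(is_asd == 1, 3.1 - 0.03 * ages, 3.0 - 0.02 * ages)
for c in (0.0, 10.0, 20.0):
    print(c, group_effect_at_age(ct, ages, is_asd, c))
```

      In this toy example the group effect is 0.1 when age is left uncentered (c = 0, an extrapolation to birth), 0 at age 10, and -0.1 at age 20, which is why the comparison age must be reported.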

      4) The idea that individuals who are more than 2 standard deviations away from the mean of the controls are outliers and should be eliminated from the analysis seems mistaken. If all individuals with ASD are substantially far from the mean of the controls, they clearly should not be treated as outliers.

      5) The impact statement claims that "normative modelling has the potential to isolate specific highly deviant subsets of individuals with ASD, which will have implications for understanding the underlying mechanisms and bring clinical impact closer"; there is no indication that that is the case. The normative model has identified primarily children, and has identified nothing in particular about those children. Case-control models have done the same.

      6) It appears to this reviewer that this paper outlines an approach which could be worthwhile in a data set without massive heterogeneity, but within the context of the ABIDE data actually seems harmful.

    2. Reviewer #1:

      This paper describes the impact of outliers in normative cortical thickness (CT) measurements when examining individuals with autism spectrum disorder (ASD). The authors used the ABIDE sample, binned subjects by age, and assessed outliers as a function of a "w-score" which they estimated for CT parcellations across the entire cortex. They then demonstrate that cortical thickness differences that can be ascribed to ASD can essentially be attributed to a small number of outliers within the sample. They also demonstrate that this w-score may be sensitive to clinical variables as well.

      Overall, it is unclear to me what the exact goal of the work is: To describe the anatomy of ASD better? To subtype? Or is there another "take-home" message of this paper? I would imagine that the case-control differences in most neurodevelopmental disorders with high heterogeneity and high variability would demonstrate a similar kind of trend. And thus, at the end of the day, I am not sure how much this technique advanced our understanding of ASD.

      Issues and Questions:

      1) It is unclear from the methods how the authors deal with motion and image quality. Recent work by Pardoe and Bedford demonstrates the importance of dealing with this issue, particularly in the context of the ABIDE sample. This would likely have a significant impact on any of the results. It is unclear whether using the Euler index at the extremes of the distribution of the dataset is sufficient. How did the authors arrive at their Euler number cut-off?

      2) The W-score could use a much better explanation. It is not clear to me as to what it is and how this should be interpreted. The lack of information regarding the number of age-bins used also makes interpreting these findings confusing in my mind.

      3) The authors report that, "The median number of brain regions per subject with a significant p-value was 1 (out of 308), indicating that the w-score provides a robust measure of atypicality." This could be true, but given the variation in normative ageing and development, I suspect it would also be true of a large number of TD children. That being the case, would it be worth doing a permutation test to determine how many "atypical" areas one could expect by chance?

      4) The authors note "Unfortunately, despite a significant female subgroup, the age-wise binning greatly reduced the number of bins with enough data-points in the female group." I understand that this could indeed be a problem. However, I think it would be good for the authors to provide more details, potentially a histogram to demonstrate the issue. Given sex differences in ASD, the more information that can be provided the better. Overall, it is unclear to me how useful a sex-specific analysis may be in this particular context given the sample sizes available in ABIDE.

      5) Results, page 8: "Because we also had computed w-scores from our normative age-modelling approach, we identified specific 'statistical outlier' patients for each individual region with w-scores > 2 standard deviations from typical norms and excluded them from the case-control analysis."

      I'm not sure I agree with the premise of this statement. First, it is hard to know without seeing all of the data, but based on Fig 1, it seems that there are ASD individuals that fall on both sides of this distribution. So if there are effect sizes that can be gleaned, this would be in spite of the variability. Second, it would be paramount to determine how many people are outliers-by-region. This, in and of itself, would be useful information. If a significant proportion of individuals can be identified as outliers, this suggests that variability is the norm rather than an exception. I'm skeptical as to whether you get interesting information from removing these individuals from analyses.

      6) Results, page 9: "While the normative modelling approach can be sensitive to different pathology." I don't think you're capturing anything interesting about pathology with this method, especially as it pertains to CT values.

      7) Results, pages 9-10: I'm still confused by this notion of atypicality. Presumably this suggests that 5-10% of all individuals with ASD are more than 2 SDs from a normative distribution. But is this at both tails of the distribution? There are significant interpretational issues with this. Thus, it is incumbent on the authors to do a better job of describing these distributions.

      8) Part of the rationale of this paper is that using the w-score is far more robust than using simple CT values. I'm sure that residualized CT values could have been used for any of these analyses. If that were to be done how would this change the results?

      Minor comments and suggestions on presentation:

      1) While this paper has some merits, I found it hard to read. There is not a clear delineation between the methods and the results, and some methodological considerations are written into the results section and vice-versa.

      2) In the introduction, the authors use the word "deviance" to describe what appears to me more like age-related variation and heterogeneity in ASD. Deviance may be too strong a term and is easily misinterpreted. I would suggest replacing it with something closer to "variation". Also, work at the main author's institution (for example by Baron-Cohen and colleagues) champions the use of terms like "neurotypical" rather than "normally developing". In general, the authors may want to take their cues from this type of language.

      3) This passage in the Introduction is in need of references. The work by Hong (in Boris Bernhardt's group), Bedford (in Mallar Chakravarty's group), Schuetze (in Signe Bray's group), and Meng-Chuan Lai all come to mind.

      "Even within mesoscopic levels of analysis such as examining brain endophenotypes, heterogeneity is the rule rather than the exception (Ecker, 2017). At the level of structural brain variation, neuroimaging studies have identified various neuroanatomical features that might help identify individuals with autism or reveal elements of a common underlying biology (Ecker, 2017). However, the vast neuroimaging literature is also considerably inconsistent, with reports of hypo- or hyper-connectivity, cortical thinning versus increased grey or white matter, brain overgrowth, arrested growth, etc., leaving stunted progress towards understanding mechanisms driving cortical pathophysiology in ASD."

      4) I found the Discussion missed the mark. It was mostly written as a rehash of the results, with no real biological interpretation. There is not a sufficient examination of the relationship of these findings to other important papers (Khundrakpam, Bedford, Hong, Ecker, Hyde, Lange, etc.).

      5) Figure 3 - The colour bars should be labelled.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to Version 4 of the preprint: https://www.biorxiv.org/content/10.1101/252593v4

      Summary:

      This paper uses data from the Autism Brain Imaging Data Exchange (ABIDE) to model the relationship between cortical thickness in different brain regions and patients with autism spectrum disorders (ASD) compared to neurotypical controls. The reviewers appreciated the goals and approach of this paper, but, as described below, had questions about the suitability of the data for this analysis, the ways in which the data were processed, the way in which the results were interpreted, and the significance of these findings for understanding autism spectrum disorders.

    1. Reviewer #3:

      The methods used by the authors seem like potentially very useful tools for research on neural activity related to sequences of stimuli. We were excited to see that a new toolbox might be available for these sorts of problems, which are widespread. The authors touch on a number of interesting scenarios and raise relevant issues related to cross-validation and inference of statistical significance. However, given (1) the paucity of code that they've posted and its specificity to particular datasets, and (2) the large literature on latent variable models combined with surrogate data for significance testing, I would hesitate to call TDLM a "framework". Moreover, in trying to present it in this generic way, the authors have made it more difficult to understand exactly what they are doing.

      Overall: This paper presents a novel approach for detecting sequential patterns in neural data; however, it needs more context. What is the overall contribution? How and why is this analysis technique better than, say, Bayesian template matching? Why is it so difficult to understand the details of the method?

      Major Concerns:

      The first and most important problem with this paper is that it is intended (it appears) to be a more detailed and enhanced retelling of the authors' 2019 Cell paper. If this is the case, then it is important that it also be clearer and easier to read and understand than that one was. The authors should follow the normal tradition in computational papers:

      Present a clear and thorough explanation of one use of the method (i.e., MEG observations with discrete stimuli), then present the next approach (i.e., sequences?) with all the details necessary to understand it.

      The authors should start each section with a mathematical explanation of the X's - the equation(s) that describes how they are derived from specific data. Much of the discussion of cross validation actually refers to this mapping.

      Equation 5 also needs a clearer explanation - it would be better to write it as a sum of matrices (because that is clearer) than with the strange "vec" notation. And TAUTO, TF and TR should be described properly - TAUTO is "the identity matrix", TF and TR are "shift matrices, with ones on the first upper and lower off diagonals".
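      As an illustration of the sum-of-matrices form suggested above, here is a minimal sketch that regresses one lag's empirical transition matrix onto the template matrices T_AUTO, T_F and T_R. The constant template T_CONST and the function name are assumptions on our part, not taken from the manuscript, and the sketch assumes at least three states (with two states the templates are linearly dependent).

```python
import numpy as np


def tdlm_second_level(beta):
    """Express an n x n empirical transition matrix `beta` (one time lag) as a
    weighted sum of template matrices and return the fitted weights:
        beta ~ z_auto*T_AUTO + z_f*T_F + z_r*T_R + z_const*T_CONST
    Requires n >= 3 so that the templates are linearly independent."""
    n = beta.shape[0]
    templates = {
        "auto": np.eye(n),           # T_AUTO: identity matrix (self-transitions)
        "forward": np.eye(n, k=1),   # T_F: ones on the first upper off-diagonal
        "reverse": np.eye(n, k=-1),  # T_R: ones on the first lower off-diagonal
        "const": np.ones((n, n)),    # T_CONST: assumed constant term
    }
    # Flattening each matrix plays the role of the "vec" operator.
    design = np.column_stack([t.ravel() for t in templates.values()])
    z, *_ = np.linalg.lstsq(design, beta.ravel(), rcond=None)
    return dict(zip(templates, z))
```

      Written this way, each weight has a direct reading: z_f is the evidence for forward transitions and z_r for reverse transitions after the self-transition and constant templates have absorbed their share.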

      The cross validation schemes need a clear description. Preferably using something like a LaTeX "algorithm" box so that they are precisely explained.

      Recognizing the need to balance readability for a general reader and interest, perhaps the details could be given for the first few problems, and then for subsequent results, the detail could go into a Methods section. Alternatively, the methods section could be done away with (though some things, such as the MEG data acquisition methods are reasonably in the methods).

      Usually, we think about latent variable model problems from a generative perspective. The approach taken in this paper seems to be similar to a Kalman filter with a multinomial observation (which would be equivalent to the logistic regression?), but it's unclear. Making the connection to the extensive literature on dynamical latent variable models would be helpful.

      Minor concerns:

      1) Many of the figures, and some of the text are from the 2019 Cell paper. The methods text is copied verbatim without citation.

      2) The TDLM model is presented without context or comparison to other computational approaches employed to identify sequences. Is it also used in the 2016 Kurth-Nelson paper? How does it compare, e.g., to Bayesian template matching (in the case of hippocampal data)?

      3) Cite literature from recent systems neuroscience using hidden Markov models and related discrete state space approaches on neural activity.

      4) How does this method deal with a long sequence in which the intervals (delta t) between successive states vary? Or with data where the observations have some temporal lag relative to each other?

      5) In the "sequences of sequences" section, the authors talk about combining states into meta-states. But in the example they give, they appear to use just their vanilla approach. This whole section belongs somewhere other than a "supplemental note". The data need proper attribution, an IACUC/ethics statement, etc.

      6) While code can be useful, it is not archival in the same way equations are. Supplementary Note 1 should be in the Methods, and needs to be rewritten in such a way that it explains the steps (i.e., in an algorithm box) rather than just using code. Moreover, when the data generated via this code is used in the text, this section in the methods can be mentioned/linked.

    2. Reviewer #2:

      This paper addresses the important overall issue of how to detect and quantify sequential structure in neural activity. Such sequences have been studied in the rodent hippocampus for decades, but it has recently become possible to detect them in human MEG (and perhaps even fMRI) data, generating much current excitement and promise in bringing together these fields.

      In this paper, the authors examine and develop in more detail the method previously published in their groundbreaking MEG paper (Liu et al. 2019). The authors demonstrate that by aiming their method at the level of decoded neural data (rather than the sensor-level data) it can be applied to a wide range of data types and settings, such as rodent ephys data, stimulating cross-fertilization. This generality is a strength and distinguishes this work from the typically ad hoc (study-specific) methods that are the norm; this paper could be a first step towards a more domain-general sequence detection method. A further strength is that the general linear modeling framework lends itself well to regressing out potential confounds such as autocorrelations, as the authors show.

      However, our enthusiasm for the paper is limited by several overall issues:

      1) It seems a major claim is that the current method is somehow superior to other methods (e.g. from the abstract: "designed to take care of confounds" implying that other methods do not do this, and "maximize sequence detection ability" implying that other methods are less effective at detection). But there is very little actual comparison with other methods made to substantiate this claim, particularly for sequences of more than two states which have been extensively used in the rodent replay literature (see Tingley and Peyrache, Proc Royal Soc B 2020 for a recent review of the rodent methods; different shuffling procedures are applied to identify sequenceness, see e.g. Farooq et al. Neuron 2019 and Foster, Ann Rev Neurosci 2017). The authors should compare their method to some others in order to support these claims, or at a minimum discuss how their method relates to/improves upon the state of the art.

      2) The scope or generality of the proposed method should be made more explicit in a number of ways. First, it seems the major example is from MEG data with a small number of discrete states; how does the method handle continuous variables and larger state spaces? (The rodent ephys example could potentially address this but not enough detail was provided to understand what was done; see specific comments below.) Second, it appears this method describes sequenceness for a large chunk of data, but cannot tell whether an individual event (such as a hippocampal sharp wave-ripple and associated spiking) forms a sequence or not. Third, there is some inconsistency in the terminology regarding scope: are the authors aiming to detect any kind of temporal structure in neural activity (first sentence of "Overview of TDLM" section), which would include oscillations, or only sequences? These are not fatal issues but should be clearly delineated.

      3) The inference part of the work is potentially very valuable because this is an area that has been well studied in GLM/multiple regression type problems. However, the authors limit themselves to asking "first-order" sequence questions (i.e. whether observed sequenceness is different from random) when key questions -- including whether or not there is evidence of replay -- are actually "second-order" questions because they require a comparison of sequenceness across two conditions (e.g. pre-task and post-task; I'm borrowing this terminology from van der Meer et al. Proc Royal Soc B 2020). The authors should address how to make this kind of comparison using their method.

      Minor Comments:

      1) Some discussion of grounding the question of what is considered a sequence should be included. What may look like a confound to a modeler may or may not be impacting downstream readout neurons; without access to a neural readout it is not a priori clear what our statistical methods "should" be detecting.

      2) The abstract emphasizes hippocampal replay, but no actual analysis of this is done. I don't think performing such analysis is necessary (although it would be a good way to compare this method to others) but the two should be more aligned.

      3) In the "Statistical Inference" section, the authors state "Permuting time destroys the temporal smoothness of neural data, creating an artificially narrow null distribution...". Did the authors try shift shuffles, which shift the time dimension of each row rather than randomly permuting it, hence breaking the relationship between variables but keeping their autocorrelation?
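      A minimal sketch of the distinction drawn here, on a toy smooth signal: independent time permutation versus an independent circular shift of each state's time series. The function names are illustrative and not part of the TDLM code; the input is assumed to be a (time x states) array.

```python
import numpy as np


def permutation_surrogate(x, rng):
    """Independently permute the time points of each state's series.
    Destroys autocorrelation, which narrows the null distribution."""
    return np.column_stack([rng.permutation(x[:, j]) for j in range(x.shape[1])])


def circular_shift_surrogate(x, rng):
    """Independently circularly shift each state's series by a random offset.
    Breaks the timing between states but keeps each series' autocorrelation
    (apart from a single discontinuity at the wrap-around point)."""
    n_time, n_states = x.shape
    out = np.empty_like(x)
    for j in range(n_states):
        out[:, j] = np.roll(x[:, j], rng.integers(1, n_time))
    return out


def lag1_autocorr(v):
    return np.corrcoef(v[:-1], v[1:])[0, 1]


rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 1000)
x = np.column_stack([np.sin(t), np.cos(t)])
print(lag1_autocorr(circular_shift_surrogate(x, rng)[:, 0]))  # stays close to 1
print(lag1_autocorr(permutation_surrogate(x, rng)[:, 0]))     # near 0
```

      Because shifted surrogates keep the smoothness of the real data, sequenceness computed on them gives a null distribution that reflects autocorrelation rather than ignoring it.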

      4) In the "Regularization" section, it is hard to tell how L1 outperforms L2 in terms of detecting sequenceness without benchmarking them with ground truth. Are the authors doing this by quantifying decoding performance on withheld task data? Van der Meer et al. Hippocampus 2017 examine this issue for hippocampal place cell data.

      5) As a rodent ephys person, I was excited about the application to hippocampal place cell data, but I couldn't understand Figure 5d and the associated supplementary description. For me to evaluate this component of the manuscript, substantially more explanation is needed on how the data are preprocessed and arranged, and what the analysis pipeline looks like. For instance, is the left plot in Fig. 5d an average of all pairwise sequences (each decoded location with its neighbors)? And is the right plot the timescale at which this sequence repeats? If so, the repeat frequency should be at rat theta frequency or slightly faster (because of phase precession), so I would expect 9 or 10 Hz at most; I was surprised to see what looks like 12 Hz. In the Supplementary Note, I found the discussion about running direction distracting; wouldn't it be simpler and easier to understand to analyze only one direction to start? Also, please clarify whether the sequence algorithm was run on the raw decoded probabilities or on the maximum a posteriori (MAP) locations. What happens if there are no spikes in a given time bin (likely with a small 10 ms window)? Were putative interneurons excluded (they should be)? Finally, the authors should note that theta sequences can arise from independent spiking of phase-precessing neurons (Chadwick et al. eLife 2015), which seems exactly the kind of issue that the multiple regression framework of TDLM should be able to elucidate; what covariates could be added to the model to test Chadwick et al.'s claim?
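      The distinction raised in minor comment 3 can be illustrated with a short sketch, assuming the data are arranged as a variables × time array (the helper names are hypothetical and not taken from the manuscript). A circular shift moves each row by its own random offset, so every row keeps its autocorrelation while the alignment between rows is broken; a random permutation of time, by contrast, destroys the autocorrelation.

```python
import numpy as np

def circular_shift_shuffle(x, rng):
    """Shift each row of x (variables x time) circularly by its own random offset."""
    x = np.asarray(x)
    n_vars, n_time = x.shape
    out = np.empty_like(x)
    for i in range(n_vars):
        # Offsets in [1, n_time - 1]; an offset of 0 would leave the row unchanged.
        shift = rng.integers(1, n_time)
        out[i] = np.roll(x[i], shift)
    return out

def lag1_autocorr(row):
    """Circular lag-1 autocorrelation of a 1-D series."""
    z = (row - row.mean()) / row.std()
    return float(np.mean(z * np.roll(z, 1)))

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200, endpoint=False)
smooth = np.vstack([np.sin(t), np.cos(t)])  # two smooth, autocorrelated signals

shifted = circular_shift_shuffle(smooth, rng)
permuted = np.vstack([rng.permutation(row) for row in smooth])
print(lag1_autocorr(smooth[0]), lag1_autocorr(shifted[0]), lag1_autocorr(permuted[0]))
```

      The shifted rows keep the lag-1 autocorrelation of the original signal, while the permuted rows lose it, which is why a time-permutation null can be artificially narrow for smooth data.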

    3. Reviewer #1:

      This paper describes temporal delayed linear modelling (TDLM), a method for detecting sequential replay during awake rest periods in human neuroimaging data. The method involves first training a classifier to decode states from labeled data, then building linear models that quantify the extent to which one state predicts the next expected state at particular lags, and finally assessing reliability by running the analysis with permuted labels.
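      The two regression stages summarized above can be illustrated with a simplified sketch. It assumes X holds decoded state probabilities (time × states) and T is a hypothesised transition matrix. The function name, the second-level design (forward, backward, self-transition and constant regressors) and the synthetic data are assumptions made for illustration; the authors' released code may differ, for example in how other lags or confounds are controlled. The permutation step would re-run the same analysis with the state labels of T shuffled.

```python
import numpy as np

def tdlm_sequenceness(X, T, max_lag):
    """Simplified two-level TDLM-style sequenceness (illustrative sketch).

    X: (n_time, n_states) decoded state probabilities.
    T: (n_states, n_states) hypothesised transitions; T[i, j] = 1 if i -> j.
    Returns forward-minus-backward sequenceness for lags 1..max_lag (samples).
    """
    n_time, n_states = X.shape
    # Second-level regressors, flattened row-major to match beta.ravel().
    design = np.column_stack([
        T.ravel(),                 # forward transitions
        T.T.ravel(),               # backward transitions
        np.eye(n_states).ravel(),  # self-transitions
        np.ones(n_states * n_states),
    ])
    seq = np.zeros(max_lag)
    for lag in range(1, max_lag + 1):
        past, future = X[:-lag], X[lag:]
        # First level: regress every state at t + lag on all states at t
        # (plus an intercept); beta[i, j] = effect of state i on state j.
        A = np.column_stack([past, np.ones(len(past))])
        beta = np.linalg.lstsq(A, future, rcond=None)[0][:n_states]
        # Second level: how well the empirical transitions match T.
        coefs = np.linalg.lstsq(design, beta.ravel(), rcond=None)[0]
        seq[lag - 1] = coefs[0] - coefs[1]
    return seq

# Synthetic check: state k is reactivated 3 * k samples after each event onset.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 0.1, size=(3000, 4))
for start in range(0, 3000 - 12, 15):
    for k in range(4):
        X[start + 3 * k, k] += 1.0
T = np.zeros((4, 4))
T[0, 1] = T[1, 2] = T[2, 3] = 1.0
seq = tdlm_sequenceness(X, T, max_lag=6)
```

      In this toy example the forward-minus-backward sequenceness should peak at the built-in lag of 3 samples; real data would additionally require the permutation-based significance threshold described above.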

      This method has already been fruitfully used in prior empirical papers by the authors, and this paper serves to present the details of the method and code such that others may make use of it. Based on existing findings, the method seems extremely promising, with potential for widespread interest and adoption in the human neuroimaging community. The paper would benefit, however, from more discussion of the scope of the applicability of the method and its relationship to methods already available in the rodent and (to a lesser extent) human literature.

      1) TDLM is presented as a general tool for detecting replay, with special utility for noninvasive human neuroimaging modalities. The method is tested mainly on MEG data, with one additional demonstration in rodent electrophysiology. Should researchers expect to be able to apply the method directly to EEG or fMRI data? If not, what considerations or modifications would be involved?

      2) How does the method relate to the state of the art methods for detecting replay in electrophysiology data? What precludes using those methods in MEG data or other noninvasive modalities? And conversely, do the authors believe animal replay researchers would benefit from adopting the proposed method?

      3) It would be useful for the authors to comment on the applicability of the method to sleep data, especially as rodent replay decoding methods are routinely used during both awake rest and sleep.

      4) How does the method relate to the Wittkuhn & Schuck fMRI replay detection method? What might be the advantages and disadvantages of each?

      5) The authors make the point that spatial correlation as well as anti-correlation between state patterns reduces the ability to detect sequences. The x axis for Fig 3c begins at zero, demonstrating that lower positive correlation is better than higher positive correlation. Given the common practice of building one classifier to decode multiple states (as opposed to a separate classifier for each state), it would be very useful to provide a demonstration that the relationship in Fig 3c flips (more correlation is better for sequenceness) when spatial correlations are in the negative range.

      6) In the Results, the authors specify using a single time point for spatial patterns, which would seem to be a potentially very noisy estimate. In the Methods, they explain that the data were downsampled from 600 to 100 Hz to improve SNR. It seems likely that downsampling or some other method of increasing SNR will be important for the use of single time point estimates. It would be useful for the authors to comment on this and provide recommendations in the Results section.

      7) While the demonstration that the method works for detecting theta sequences in navigating rodents is very useful, the paper is missing the more basic demonstration that it works for simple replay during awake rest in rodents. This would be important to include to the extent that the authors believe the method will be of use in comparing replay between species.

      8) The authors explain that they "had one condition where we measured resting activity before the subjects saw any stimuli. Therefore, by definition these stimuli could not replay, but we can use the classifiers from these stimuli (measured later) to test the false positive performance of statistical tests on replay." My understanding of the rodent preplay literature is that you might indeed expect meaningful "replay" prior to stimulus exposure, as existing sequential dynamics may be co-opted to represent subsequent stimulus sequences. It may therefore be tricky to assume no sequenceness prior to stimulus exposure.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      The reviewers all felt that the work is extremely valuable: a domain-general replay detection method would be of wide interest and utility. However, as it stands, the paper is lacking context and comparisons to existing methods. Most importantly, the paper would have a larger impact if comparisons with standard replay methods were included. The paper would also benefit from additional detail in the description of the methods and data.

    1. Reviewer #3:

      This study by Kiss and colleagues reports the findings of proximity biotinylation experiments for the discovery of novel RAB18 effectors. The authors perform careful proteomic analysis that appears well-controlled and successful in recapitulating known interactions. That small GTPase interactions can be identified with this approach has been previously demonstrated, though the application of this approach to RAB18 is novel and of interest to the field. A number of intriguing findings with potentially important implications are reported. However, this manuscript has several weaknesses.

      Major concerns and questions:

      1) As the authors report, proximity biotinylation may not reflect direct protein-protein interactions but simply colocalization of bait and prey proteins. A true protein-protein interaction would ideally be further supported by ancillary experiments such as in vitro binding or co-immunoprecipitation, including an assessment of whether the interaction is affected by the GTP- or GDP-bound state. While co-IP in WT and GEF-deficient cells was performed for one candidate interactor (TMCO4, Figure 6C), protein-protein interactions were not tested for the other two, which instead rely on either repeat BioID (SPG20, Figure 3A) or reciprocal BioID (SEC22A, Figure 5B).

      2) Putative RAB18 interactions may be affected by the BioID fusion itself or by heterologous expression. While it is reassuring that known interactors were detected with this approach, the conclusions would be better supported by testing the localization of the fusion protein in comparison to endogenous RAB18, and/or by rescue of a phenotype associated with RAB18-deficiency.

      3) Conclusions about the dependence of RAB18 interactions on its GTP- or GDP-bound state rely on differences observed in cells deficient in RAB18 GEFs. It is certainly possible, however, that RAB3GAP may serve as a GEF for other GTPases, or have other functions, that cause the observed differences in labeling. The conclusions would be strengthened by additional experiments showing a direct effect, e.g. reproducing the disrupted labeling of candidate effectors with a GDP-locked RAB18 point mutant, or showing that RAB3GAP deficiency reduces binding of a candidate effector to RAB18.

      4) The putative role of SEC22A in regulating lipid droplet morphology relies on siRNA perturbations that are prone to off-target effects. This is especially concerning given the high degree of sequence similarity between SEC22A and SEC22B, the latter of which has a known role in regulating LD morphology. Rescue of this phenotype with a siRNA-resistant SEC22A cDNA would rule out this possibility.

      5) The finding that SPG20 protein abundance is affected by RAB18 deficiency relies on immunofluorescence with an antibody exhibiting cross-reactivity. While the authors do attempt to adjust for this non-specific background fluorescence, this conclusion would be strengthened by immunoblotting for a change in abundance of the specific band corresponding to SPG20. If confirmed, measurement of SPG20 transcript levels would also help clarify the level at which the altered protein abundance is regulated.

      6) The influence of stable expression of a RAB18 GTP-locked point mutant on cholesterol metabolism is intriguing but the experimentation appears perfunctory. For 14C-CE cellular levels in 14C-oleate-loaded cells (Figure 7A), the most striking difference is the greatly enhanced synthesis level of CE at t=0. Is the subsequent drop due to an effect of RAB18 on efflux, or simply a consequence of the higher starting level at t=0? For efflux assays on 3H-cholesterol-loaded cells (Figure 7B), the data is only presented as a ratio of 3H activity in media relative to lysates after a 5 hr incubation with HDL. Interpretation of these results would be aided by a more detailed analysis. How does 3H-cholesterol uptake compare after 24 hr incubation but prior to addition of HDL (t=0)? After the 5 hr HDL chase, are the differences in the ratio driven by an increase in extracellular activity, a decrease in intracellular activity, or both? Ultimately these conclusions would be better supported by a more detailed analysis. Does disruption of the candidate effectors phenocopy the effect of RAB18 disruption? Are any known mediators of cholesterol efflux affected by RAB18 disruption? While a comprehensive mechanism may be reasonably considered beyond the scope of this paper, some additional descriptive analysis would be useful in interpreting these findings.

    2. Reviewer #2:

      This study used WT and mutant RAB18 to look for interacting proteins in normal and GEF-deficient cells. A catalog of interactions that are regulated by nucleotide binding and/or GEF activity were uncovered. Among identified proteins, there are known/established ones and there are some new ones. Initial validation was carried out for some newly identified effectors such as TMCO4 and Sec22A.

      Major concerns and questions:

      1) While the addition of new RAB18 effectors is useful to researchers who are interested in RAB18, the overall conclusion that RAB18 may regulate membrane contacts and lipid metabolism is not new.

      2) Figure 7: the effect of RAB18 on cholesterol esterification and efflux may arise from multiple causes. This set of experiments does not provide any real insight into RAB18's role in cholesterol metabolism.

      3) Given RAB18's interaction with ORP2, TMEM24 and OCRL, perhaps the authors may examine plasma membrane PIP2. The results would be more specific and novel.

    3. Reviewer #1:

      This manuscript used proximity biotinylation to discriminate functional RAB18 interactions. The authors provide some evidence for several of the interactors and some functional data supporting a role for RAB18 in modulating cholesterol mobilization.

      Major concerns and questions:

      1) Based on the spectral counts, the authors calculated mutant:WT ratios as a readout to identify nucleotide-binding-dependent effectors. However, it is important to show that the WT and mutant proteins have similar expression levels to begin with. The intracellular localization of the mutant and WT proteins should also be determined: do they show similar intracellular localization?

      2) The mutant:WT ratio is useful for removing some background, but it may omit some very highly interacting proteins simply because their fold change is low. The converse is true for rare proteins. It would be better to have a list of candidate effectors based on the absolute counts.

      3) Sec22A knockdown will change the morphology of lipid droplets. A knockdown efficiency test and some representative fluorescence images here would make this data more compelling.

      4) Same comment for the cholesterol mobilization experiment: the expression level of the protein is needed. Figure 7A is rather confusing, as the Gln67Leu mutant already has higher CE before loading HDL. Why is this? Better uptake or reduced efflux? What is the de novo cholesterol synthesis activity in this cell line?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to Version 1 of the preprint: https://www.biorxiv.org/content/10.1101/871517v1

      Summary:

      As a possible path to better understand and develop treatments for Warburg Micro Syndrome (WMS), the authors have investigated the networks of protein-protein interactions involving genes mutated in this rare genetic disease. The goals of the work are to identify new proteins involved in the pathophysiology of the disease and to better understand the molecular and cellular effects of disease-causing mutations. The data will likely be of interest to researchers studying WMS and RAB18, the protein focused on here, but reviewers expressed some concerns about the validation and interpretation of the presented protein interaction data.

    1. Reviewer #3:

      This manuscript describes measurements of neuronal activity in mice performing a discrimination task, and a new model that links these data to psychophysical performance. The key element of the new model is that sensory neurons are subject to gain modulations that evolve during each trial. They show that the model can produce pure sensory integration, Weber-Fechner performance, or intermediate states that nicely replicate the behavioral observations. This is an interesting and valuable contribution.

      My only significant comment relates to the discussion, which should do more to make sure the reader understands how very different the sensory representation is in this study compared with the great majority of earlier related work in the primate:

      First, choice related signals are not systematically related to stimulus preferences (no Choice Probability). This is mentioned, but only very briefly.

      Second, there appears to be no relationship between stimulus preference (visual field in this case) and noise correlation. Unfortunately, this emerges from the model fits, not from an analysis of the data. But it is an important difference with profound implications for how the coding of information is organized, and it really needs a discussion. It should also be supported by an analysis of correlations in the data. I know some people argue that 2-photon measurements make this difficult, but if that is true, then surely they cannot be used to support a model in which correlations are a key component.

    2. Reviewer #2:

      In this manuscript, the authors present an in-depth analysis of the properties of sensory responses in several visual areas during performance of an evidence-accumulation task for head-fixed running mice (developed and studied by the authors previously), and of how these properties can illuminate aspects of the performance of mice and rats during pulsatile evidence accumulation, with a focus on the effect of "overall stimulus strength" on discriminability (Weber-Fechner scaling).

      The manuscript is very dense and presents many findings, but the most salient ones are a description of how the variability in the large Ca++ transients evoked by the behaviourally-relevant visual stimuli (towers) are related to several low-level behavioural variables (speed, view) and also variables relevant for the task (future choice, running count of accumulated evidence), and a framework based on multiplicative-top down feedback that seeks to explain some aspects of this variability and ultimately the psychophysical performance in the accumulating-towers task. The first topic is framed in the context of the literature on choice-probability, and the second in the context of "Weber-Fechner" scaling, which in the current task would imply constant performance for given ratios of Left/Right counts as their total number is varied.

      Overall, the demonstration of how trial to trial variability is informative about various relevant variables is important and convincing, and the model with multiplicative feedback is elegant, novel, naturally motivated by the neural data, and an interesting addition to a topic with a long-history.

      Main Comments

      1) Non-integrable variability. In addition to 'sensory noise' (independent variability in the magnitude of each pulse), it is critical in the model to include a source of variability whose impact does not decay through temporal averaging (to recover Weber-Fechner asymptotically for large N). This is achieved in the model by positing trial-to-trial variability (but not within-trial) in the dot product of the feedforward (w) and feedback (u) directions. But the way this is done seems to me problematic:

      The authors model variability in wu as LogNormal (pp42 middle). First, the justification for this choice is incorrect as far as I can tell. The authors write: "We model m_R with a lognormal distribution, which is the limiting case of a product of many positive random variables". But neither is the dot product of w and u a product (it's a sum of many products), nor are the elements of this sum positive variables (the vector u has near zero mean and both positive and negative elements allowing different neurons to have opposite preferences on choice - see e.g., fifth line from the end in pp15 where it is stated that u_i<0 for some cells), nor would it have a LogNormal distribution even if the elements of the sum were indeed positive. Without further assumptions, the dot product wu will have a normal distribution with mean and variance dependent on the (chosen) statistics of u and w.

      Two conditions seem to be necessary for uw: it should have a positive mean close to zero (if it is too large, a(t) will explode), and it should have enough variability for non-integrable noise to have a practical impact. For a normal distribution, this would imply that in approximately half of the trials wu would be negative, meaning a decaying accumulator and effectively no feedback. This does not seem like a sensible strategy for the brain to use.

      The authors should clarify how this LogNormality is justified and whether it is a critical modelling choice (as an aside, although LogNormality in u*w allows non-negativity, low mean and large variability, the fact that it has very long tails sometimes leads to instability in the values of a(t)).

      2) Related to this point, it would be helpful to have more clarity on exactly what is being assumed about the feedback vector u. The neural data suggest that u has close to zero mean (across neurons). At the same time, it is posited that u varies across trials (3rd paragraph on pp18: "accumulator feedback is noisy") and that this variability is significant and important (previous comment). However, neurons appear to keep their choice preference across trials, meaning that the trial-to-trial variability in each element of u has to be smaller than its mean. The authors only describe variability in uw (LogNormal), but, in addition to the issues just mentioned about this choice, what implications does this have for the variability in u? The approach would be considerably more compelling if the authors made assumptions about the statistics of u consistent with the neural data and then derived the statistics of uw.

      3) Overall, it seems like there is an intrinsically hard problem to be solved here, which is not acknowledged: how to obtain large variability in the effective gain of a feedback loop while at the same time keeping the gain "sufficiently restricted", i.e., neither too large and positive (runaway excitation) nor negative (counts are forgotten). While the authors avoid worrying about model parameters by fitting their values from data (with the caveats discussed above), their case would become much stronger if they studied the phenomenology of the model itself, exposing clearly the computational challenges faced and whether robust solutions to these problems exist.
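      The argument in comment 1 can be checked numerically. The sketch below uses entirely hypothetical parameters, not the authors' fitted values: a fixed positive feedforward vector w and, for simplicity, a feedback vector u redrawn independently on every trial, whose elements have a small positive mean but can take either sign. Because w·u is a weighted sum of many such terms, it is approximately normal rather than lognormal, and when its mean is small relative to its spread a large fraction of trials have negative gain, which a lognormal model cannot produce.

```python
import numpy as np

def simulate_wu(n_neurons=200, n_trials=10000, u_mean=0.001, u_halfwidth=0.1, seed=0):
    """Return one feedback gain w·u per simulated trial (hypothetical parameters).

    w: fixed positive feedforward weights.
    u: redrawn every trial; each element is uniform on
       [u_mean - u_halfwidth, u_mean + u_halfwidth], so it has a small
       positive mean and can be either sign.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, size=n_neurons) / n_neurons
    u = rng.uniform(u_mean - u_halfwidth, u_mean + u_halfwidth,
                    size=(n_trials, n_neurons))
    return u @ w

def sample_skewness(x):
    """Sample skewness; close to 0 for a normal, clearly positive for a lognormal."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

wu = simulate_wu()
print(f"mean={wu.mean():.5f}  fraction negative={np.mean(wu < 0):.2f}  "
      f"skewness={sample_skewness(wu):.3f}")
```

      The printed mean is positive, yet a substantial fraction of trials has negative w·u and the skewness is near zero, consistent with a normal rather than a lognormal gain.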

    3. Reviewer #1:

      This study investigates the responses of neurons in the parietal cortex of mice (recorded via two-photon Ca imaging) performing a virtual navigation task, and then relates their activity to the animal's psychophysical performance. It is essentially two studies rolled into one. The analysis of neurophysiological activity in the first part shows that visually driven responses in the recorded "cue cells" are strongly modulated by the eventual choice and/or by the integrated quantity that defines that choice (the difference in left vs right stimulus counts), as well as by other task variables, such as running speed. The model comparison study of the second part shows that, in the context of a sensory-motor circuit for performing the task, this type of feedback may account for subtle but robust psychophysical effects observed in the mice from this study and in rats from previous studies from the lab. Notably, the feedback explains intriguing deviations in choice accuracy from the Weber-Fechner law.

      Both parts are interesting and carefully executed, although both are pretty dense; there are a ton of important technical details at each step. I wonder if this isn't too much for a single study. Had I not been reading it as a reviewer, I probably would have stopped after Fig. 4 or just skimmed the rest. After that, the motivation, methods, and analyses shift markedly. I'm not pushing hard on this issue, but I think the authors should ponder it.

      Other comments:

      1) It wasn't clear to me how the time of a particular cue onset was defined. In a real environment the cues would appear small (from afar) and get progressively bigger as the animal advances (at least if they are 3D objects, as depicted in Fig 1). What would be the cue onset in that case, and does the virtual environment work in the same way? This is probably not a serious issue, but it comes across as a bit at odds with the supposed "pulsatile" nature of the sensory stream, and would seem somewhat different from the auditory case with clicks.

      A related question concerns multiple references to cue timing made in the Intro, as if such timing were very precise. This seems strange given that all time points depend on the running speed of the mice, which is probably variable. So, how exactly is cue position converted to cue time, and why is there an assumption of very low variability? Some of this detail may be in previous reports, but it would be important to make at least a brief, explicit clarification early on.

      2) "positively and negatively choice-modulated cells exhibited gradually increasing effect sizes vs. place/time in the trial (Fig. 4e)" I found Fig. 4e confusing. Some curves are monotonic and some are not, and I'm not sure what the point is of showing the shaded regions (which cover everything). If the key point is to contrast the SSA and feedback models/effects, then it would be better to plot their corresponding effects directly on the same graph, or to show predictions versus actual data for each case in two graphs.

      3) Fig 6 and the accompanying section of the manuscript investigate a variety of models with different architectures (feedback vs purely feedforward) and noise sources. Here, if I understood correctly, the actual cue-driven responses are substituted with variables that are affected by different types of noise. It is this part that I found a bit disconnected from the rest, and somewhat confusing.

      Here, there's a jump from the actual cells to model responses. I think this needs an earlier and more explicit introduction. It is clear what the objective of the modeling effort is; what's unclear are the elements that initially go into it. This is partly because the section jumps off with a discussion about accumulator noise, but the modeling involves many more assumptions (i.e., simplifications about the inputs to the accumulators).

      What I wondered here was, what happened to all the variance that was carefully peeled away from the cue driven responses in the earlier part of the manuscript? Were the dependencies on running speed, viewing angle, contra versus ipsi sensitivity, etc still in play, or were the modeled cue-driven responses considering just the sensory noise from the impulse responses? I apologize if I missed this. I guess the broader question is how exactly the noise sources in the model relate to all the dependencies of the cue cells exposed in the earlier analyses.

      Overall, my general impression is that this section requires more unpacking (perhaps it should become an independent report?).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript carefully studies the properties of sensory responses in several visual areas during performance of a task in which head-fixed mice run along a virtual corridor and must turn toward the side that has more visual cues (small towers) along the wall. The results provide insight into the mechanisms whereby sensory evidence is accumulated and weighted to generate a choice, and into the sources of variability that limit the observed behavioral performance. All reviewers thought the work was generally interesting, carefully done, and novel.

      However, the reviewers' impression was that the manuscript as it stands is very dense. In fact, it is largely two studies with different methods and approaches rolled into one. The first one (physiology) is still dense but less speculative and with interesting, solid results, and the revisions suggested by the reviewers should be relatively straightforward to address. In contrast, the modeling effort is no doubt connected to the physiology, but it really addresses a separate issue. The general feeling was that this material is probably better suited for a separate, subsequent article, for two reasons. First, because it will require substantial further work (see details below), and second, because it adds a fairly complex chapter to an already intricate analysis of the neurophysiological data.

      We suggest that the authors revise the neurophys analyses along the lines suggested below (largely addressing clarity and completeness), leaving out the modeling study for a later report.

    1. Reviewer #2:

      Arg5,6 is a polyprotein that is cleaved to produce two proteins, Arg5 and Arg6. The authors report that production of these two proteins is mediated by a mitochondrial protease that is known for its function in N-terminal cleavage.

      The in vitro analysis is interesting, but the possibility of a contaminating activity cannot be ruled out. This needs to be tested by additional experiments, preferably by more data in intact cells.

    2. Reviewer #1:

      This study investigates the biogenesis of Arg5,6 in the yeast S. cerevisiae. Arg5,6 is a polyprotein that was previously established to be proteolytically processed into two proteins (Arg5 and Arg6) that are part of a complex with Arg2. The primary advance reported in the current study is to assign this processing to MPP, a mitochondrial protease known primarily for removing the N-terminal signal peptides from mitochondrial precursors. Additional work showed that the cleavage occurs at an internal sequence that resembles a mitochondrial targeting sequence (MTS), which presumably explains why it is recognised by MPP. This MTS-like internal processing signal is ineffective for directing translocation on its own. Some species contain this polyprotein organisation of Arg5,6, whereas other species encode the two proteins as separate open reading frames. S. cerevisiae Arg5,6 can be replaced effectively by two separately encoded products.

      Specific points:

      1) The authors use purified MPP to show that in vitro synthesized Arg5,6 precursor can be processed to the correct sized products. At that point, the authors "conclude that Arg5,6 is imported into the mitochondrial matrix and processed twice by MPP". This is plausible, but is premature based on the data, which show that MPP is able to process Arg5,6. However, the conclusion that MPP actually does process Arg5,6 in vivo is not documented, and the alternative that something else does this job is not formally excluded. This caveat should be acknowledged unless the authors are able to show necessity of MPP, not just sufficiency.

      2) The experiment showing cleavage with purified MPP (Fig. 1E and S1A) would be strengthened by control experiments using a catalytically inactive mutant of MPP and an Arg5,6 substrate with a mutated cleavage site. The first control would rigorously exclude any contaminants, and the second would help verify the site of cleavage.

      3) The conclusion that MPP processes Arg5,6 at the correct site in their in vitro experiments is based on size by SDS-PAGE. The resolution is not sufficient to draw this conclusion, which should be adjusted to say that processing occurs at approximately the correct site (unless the authors perform additional analysis to document the precise cleavage site). Mutagenesis of the putative site (point 2 above) would also be helpful in establishing the site more precisely.

      4) The smaller products seen in Fig. 1E would seem to suggest that MPP exhibits a degree of promiscuity in vitro that is not seen in vivo. This should be noted in the text.

      5) The authors observe that Arg6(1-343) cannot replace Arg6(1-502). They conclude that residues 344-502 are needed for enzyme activity, but this could be for many reasons. For example, Arg6(1-343) might not associate with Arg5. It is premature to imply that catalytic activity is impaired without making such measurements. The conclusion should be adjusted.

      6) It is worth testing whether Arg5(344-862) produced by in vitro translation can be processed by purified MPP. This would help distinguish between some intrinsic problem with access versus a more nuanced issue relating to how import is mediated by the iMTS-L versus a bona fide MTS (e.g., with only the latter recruiting MPP as speculated by the authors).

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      There were some technical concerns regarding the confidence with which the authors conclude that MPP is indeed the responsible protease. It is likely that the authors will be able to address these concerns with relatively straightforward additional experiments.

      We feel that the notion of a polyprotein being processed into multiple functional products by cellular proteases is very well established. As emerged during discussion among the referees and editors, providing an additional example that is relevant for some species but not others is, in our opinion, a modest advance. This problem is further compounded by a very similar concept for another mitochondrial protein recently reported by a subset of these authors.

    1. Reviewer #3:

      This is an interesting paper that looks for neural markers of "team flow" experiences compared to individual flow or social interaction using EEG measured during a musical social app game. The approach and analyses are sophisticated, with the main findings being that in a combined beta-low gamma frequency range there was higher power in regions of left temporal cortex for team flow than the other conditions; that other brain regions responded to individual flow or social interaction; that directed analyses found greater information from these other brain regions to the left temporal cortex; and that the left temporal cortices of players engaged in team flow synchronized.

      However, these findings are difficult to interpret as they depend on the behavioural manipulation of the experiment that is purported to separate team flow, individual flow and social interaction, and I don't think these are clearly separated behaviourally. There were 3 conditions. In SyncA, players each tapped on a screen to control one stream of the music. In ScrA the music was scrambled, and in Occl the game was as in SyncA, but the players were separated by a barrier. SyncA is supposed to measure team flow, ScrA individual flow but not team flow, and Occl is supposed to reduce social interaction. However, when one examines the ratings that players gave for team flow, individual flow and social interaction, they do not line up exactly with this theoretical manipulation. Specifically (Fig 1), individual flow ratings are higher in SyncA and Occl than in ScrA, so SyncA and Occl don't differ in individual flow. Social interaction ratings are higher in SyncA than ScrA, and ScrA is higher than Occl, so Occl disrupts social interaction, but so does ScrA. And team flow is disrupted by both ScrA and Occl. In other words, there is no clean mapping of the 3 experimental manipulations to the three rating scales. Also very problematic is that for the rating questions, the three scales of individual flow, team flow and social interaction were not independent (Figure S2). Individual flow was taken as the average of questions 1-6, the social interaction as questions 7-9 and the team flow as questions 1-9! This makes it hard to interpret the findings because team flow is conceptually taken here as the combination of individual flow and social interaction, making the arguments appear circular.
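To make the overlap explicit, assume each index is the unweighted mean of its items (the review gives only the item ranges, so equal weighting is an assumption here) and write $q_1,\dots,q_9$ for one player's item scores:

```latex
\begin{aligned}
\mathrm{IF} &= \frac{1}{6}\sum_{i=1}^{6} q_i, \qquad
\mathrm{SI} = \frac{1}{3}\sum_{i=7}^{9} q_i, \\
\mathrm{TF} &= \frac{1}{9}\sum_{i=1}^{9} q_i
  = \frac{6}{9}\,\mathrm{IF} + \frac{3}{9}\,\mathrm{SI}
  = \frac{2}{3}\,\mathrm{IF} + \frac{1}{3}\,\mathrm{SI}.
\end{aligned}
```

Under this assumption the team flow index is, by construction, a fixed weighted average of the other two indices, so any condition effect on individual flow or social interaction carries over into team flow.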

      The "depth of flow state" is a potentially interesting measure, consisting of the mean auditory evoked response to unexpected task-irrelevant beeps (although I note that it is not clear how it was calculated: whether it is the average of P1, N1, P2 and N2 or the power in theta). Essentially it measures how distractible the person is from the task. So theoretically, it is not clear exactly how this relates to the complex concept of team flow. People were found to be most distractible for ScrA, not surprisingly, as the scrambled game is probably less fun and engaging, but across subjects, only SyncA was correlated with the individual flow index. Why? I also assume there was no correlation with team flow. Why not? So this is an interesting measure, but conceptually I'm not sure what it tells us about team flow.

      For the analysis of beta-gamma, power at the electrode level at left temporal regions was higher for SyncA - but it was also higher for ScrA than for Occl (Fig 3), so what does that mean? From Fig 1e, team flow ratings were actually lower for ScrA than Occl (although maybe not significantly, but this is in the opposite direction). Also, this difference became exaggerated with high gamma, so why was this not analyzed? And how is this interpreted within the team flow concept?

      For the cluster analysis, some clusters were found with higher beta-gamma power for SyncA, other clusters for ScrA and yet other clusters where power was lower for Occl. However, given that, as I describe above, it is not clear exactly how these conditions relate to the concepts of individual flow, team flow and social interaction, I don't think the authors can say, as they do, that clusters where power is highest for SyncA represent team flow. Clusters where power is lowest for Occl were said to represent social interaction, but this cannot be said because Occl also had high ratings for individual flow (Fig 1), so these clusters could reflect high individual flow, low social interaction, or both. Clusters where power is highest for ScrA are interpreted as "flow suppression", but it is not clear why, nor whether this refers to individual flow or team flow, as both are suppressed behaviourally (Fig 1).

      The directed connectivity analyses are interesting, but again difficult to interpret in terms of the individual flow, group flow, social interaction model. The regions need to be named more descriptively than GP1, etc. At the very least a table in the main text saying what these regions are would be helpful.

      For the analyses of inter-brain effects, why did the authors turn to a new measure, information, rather than using a directed measure as in the previous analysis?

      I am also concerned about the very large number of statistical tests done here - probably experiment-wise error rate control is necessary. The more significant tests will survive this in any case.
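Experiment-wise error control can be as simple as a step-down Holm correction applied to the whole family of tests. The sketch below is a minimal illustration of that procedure; the function name and the p-values are hypothetical and nothing is taken from the manuscript.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure: True where the null hypothesis is rejected.

    Controls the family-wise (experiment-wise) error rate at alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # the (rank+1)-th smallest p-value is compared with alpha / (m - rank)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail too
    return reject
```

For example, `holm_bonferroni([0.001, 0.04, 0.03, 0.2])` returns `[True, False, False, False]`: three of the uncorrected p-values fall below 0.05, but only the smallest survives, in line with the point that the more significant tests will remain.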

      I also question the very detailed brain regions used in the source analysis. I think it would be difficult for EEG to separate signals coming from nearby regions this precisely.

      It also seems problematic that many participants were eliminated because they did not prefer to play the game in an interpersonal way over a solo or occlusion setup. Thus it seems that a very selected type of participant was used and I'm not sure if this can generalize. Also, some of the participants were friends, and this may have also influenced how they responded. At least some discussion of these issues is necessary.

    2. Reviewer #2:

      In the present manuscript, the authors introduce a novel task to measure 'team flow'. They test if alignment of brain activity is indicative of a shared experience, similar to mutual understanding (see e.g. work by Stolk et al. TiCS). They utilize a hyperscanning procedure where EEG recordings were obtained for two participants, while they were engaged in a task that requires cooperation.

      While the approach is interesting and the topic timely, all the results rest on a methodological assumption that has not been accounted for.

      Both participants are presented with the same visual and auditory stimuli, which, when presented simultaneously, elicit the very same evoked response. When applying spectral analysis techniques to these simultaneously evoked responses, one can easily observe 'synchronization', which, however, is driven entirely by the simultaneous presentation of the external stimuli. This problem is aggravated when rhythmic visual stimuli are presented.

      In addition, several statistical comparisons do not explicitly test the interactions, which are implied by the authors (this problem has been discussed in detail here: https://www.nature.com/articles/nn.2886)
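One way to test a difference directly, rather than contrasting one significant and one non-significant result, is to bootstrap the difference itself. For correlations computed across the same participants in two conditions (as in Fig. 2d, point 3 below), a minimal sketch is a percentile bootstrap over participants that keeps each participant's data paired. The function names are hypothetical and the code is written from the general idea of comparing dependent correlations, not taken from the linked sources.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr_difference(x1, y1, x2, y2, n_boot=2000, alpha=0.05, seed=0):
    """Observed r(x1, y1) - r(x2, y2) and its percentile-bootstrap CI.

    All four lists are indexed by the same participants (e.g. two conditions),
    so each bootstrap sample resamples participants and keeps their data paired.
    """
    n = len(x1)
    if not (len(y1) == len(x2) == len(y2) == n):
        raise ValueError("all inputs must have one value per participant")
    r1_obs, r2_obs = pearson(x1, y1), pearson(x2, y2)
    if r1_obs is None or r2_obs is None:
        raise ValueError("correlation undefined for constant input")
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants with replacement
        r1 = pearson([x1[i] for i in idx], [y1[i] for i in idx])
        r2 = pearson([x2[i] for i in idx], [y2[i] for i in idx])
        if r1 is not None and r2 is not None:  # skip degenerate resamples
            diffs.append(r1 - r2)
    diffs.sort()
    lo = diffs[int(n_boot * alpha / 2)]
    hi = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return r1_obs - r2_obs, (lo, hi)
```

A confidence interval for the difference that excludes zero would support a condition difference in the brain-behaviour relationship; a significant r in one condition and a non-significant r in another does not.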

      In addition, several queries apply:

      1) The Flow index needs to be defined earlier in the manuscript (at least prior to Figure 1)

      2) a. Per Fig. 2c: The authors state 'As expected, the mean AEP response was significantly higher in the Inter-ScrA condition more than the other two conditions.' - Why was this expected? This statement is not trivial, why should the violation introduce a stronger response?

      b. Furthermore, it is difficult to reconcile it with the next statement 'Thus, this weaker AEP for the task-irrelevant stimulus in the Inter-SyncA and Occl-SyncA conditions provides neural evidence that the brain has reached a distinct selective-attentional state marking the flow experience.'

      • This is a far stretch from the ERP data

      3) Fig. 2d - The authors need to test for differences in interactions and they cannot claim differences when one test is significant and the other is not. See e.g. https://garstats.wordpress.com/2017/03/01/comp2dcorr/

      This again pertains to Figure 4c

      4) Testing different frequency bands independently is again not valid, since power values across bands are strongly correlated, see e.g. work by Donoghue and Voytek (2020) biorxiv or Haller et al. (2018) biorxiv. Fig 3c makes it even more likely that some of the effects are broadband and not band-limited 'oscillations'.

      5) All the differences localize to auditory areas, which makes one very suspicious that we are looking at evoked and therefore synchronized activity, and not alignment of endogenous oscillations (see e.g. a recent commentary: https://doi.org/10.1080/23273798.2020.1758335). The current paradigm would show synchrony (mistaken for team flow) whenever spurious simultaneous 'entrainment' (simultaneous evoked activity) is present in both participants; this confound needs to be accounted for, since it confounds subsequent metrics of phase synchrony.

      6) Statistics in Fig. 4b: these tests and ROIs are not independent; a data-driven cluster approach could be used instead (see Maris and Oostenveld 2007).

      7) Bar plots are deprecated, see Weissgerber et al PLOS Biol 2015.

      8) Analysis for Figure 5a needs a depiction of what is actually analyzed. The hierarchical clustering approach should be introduced with a clear rationale and explanation.
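Regarding point 6, the logic of a cluster-based permutation test (Maris and Oostenveld 2007) can be sketched in plain Python. This is a simplified, hypothetical illustration, not the authors' pipeline. It assumes paired per-subject differences along a single adjacency dimension (e.g. time points, or a chain of neighbouring channels), uses the summed |t| of the largest cluster as the test statistic, and builds the null distribution by randomly flipping the sign of each subject's difference. The cluster-forming threshold of 2.0 is an arbitrary placeholder. Established implementations with proper spatial adjacency exist, for example in FieldTrip and in MNE-Python (`mne.stats.permutation_cluster_1samp_test`).

```python
import math
import random


def t_values(diffs):
    """One-sample t statistic at each position.

    diffs[s][j] is subject s's paired difference (condition A minus B) at position j.
    """
    n = len(diffs)
    ts = []
    for j in range(len(diffs[0])):
        col = [d[j] for d in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        ts.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return ts


def max_cluster_mass(ts, threshold):
    """Largest sum of |t| over a run of adjacent positions beyond threshold with the same sign."""
    best = current = 0.0
    prev_sign = 0
    for t in ts:
        sign = (t > threshold) - (t < -threshold)  # +1, -1 or 0
        if sign == 0:
            current = 0.0
        elif sign == prev_sign:
            current += abs(t)  # extend the current cluster
        else:
            current = abs(t)  # start a new cluster
        prev_sign = sign
        best = max(best, current)
    return best


def cluster_permutation_test(diffs, threshold=2.0, n_perm=1000, seed=0):
    """Return (observed max cluster mass, permutation p-value) using per-subject sign flips."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for d in diffs:
            s = rng.choice((-1, 1))  # under H0 the two condition labels are exchangeable
            flipped.append([s * v for v in d])
        if max_cluster_mass(t_values(flipped), threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because only the largest cluster of each permutation enters the null distribution, a single test controls the family-wise error rate across all positions, which addresses the non-independence of separate ROI tests.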

      Overall, this is an interesting approach. It is a methodological challenge to record EEG data from two interacting participants, but given that this is a relatively young field, some methodological prerequisites need to be established first. Critically, the authors need to present convincing evidence that we are not just facing the results of simultaneously evoked auditory and visual evoked responses.

    3. Reviewer #1:

      In this EEG study, the authors aimed to identify neural correlates of the subjective feeling of "team flow", i.e., a particular feeling of ease, task-related attention and control while doing a task together with someone else. This is a clearly interesting question and with a recent surge of hyperscanning research a timely study. The authors seem to have carefully selected pairs of participants who have similarly good performance in the game and similar music taste to be able to induce feelings of flow in their participants. Unfortunately, there seem to be quite serious problems in their statistical analyses which should be corrected first before the work can be assessed.

      1) Participants:

      a. The methods state that there are 15 participants, of which five were paired twice (p.13). In the Statistical analysis section, the authors state that "the unit of analysis" was participation, i.e., n = 20 (p. 17). This apparently means that five participants took part twice but were treated as independent measures in the statistical analyses. However, these are obviously dependent (repeated) measures. The authors should either include 20 independent participants in their analyses or take into account that five of the 20 recorded participations come from participants who were recorded twice.

      b. The supplementary material explains in detail the selection of participants. Based on the selection criteria, 38 participants were identified (suppl mat p. 3), but it is not explained what happened to the 23 participants which are not part of the current manuscript. (Also, only the supplementary materials state that preferably friends were selected as pairs and that only those were selected (and called "prosocial") who considered doing the task together more pleasurable than doing the task alone. This should be mentioned in the main text and it seems to bias the subjective evaluation of the conditions presented in Fig 1?)

      2) Statistical analyses:

      Several of the analyses compare the neural data in the three different conditions with one-way ANOVAs. As these are dependent measures from the same participants, this should be analyzed with repeated measures ANOVA. Also, I didn't quite understand the statistics presented on p.8 (on information flow, with two-way ANOVAs with the impressive df of 26 and 494) and on p.9 (F(26,10133) = ... ?), but again the different measures within one subject seem to be considered as independent measures?

      3) At several points, the analyses seemed biased. For instance, for the AEP analyses (which I generally considered a nice way to establish an "objective" measure of flow), only those channels were considered which robustly showed an AEP in each resting trial (p.14/15). Does that mean that different channels were considered for each trial and condition? I would suggest selecting the same set of central electrodes and using these for all AEP analyses. Another case is the clustering analysis, in which the number of clusters was selected such that condition differences were significant. Maybe I misunderstood this point, but I think the clustering should be done first and the condition differences assessed in a second, independent step.
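Regarding point 2, the difference between a one-way and a repeated-measures ANOVA is that the latter removes each participant's overall level from the error term. A minimal sketch in plain Python, assuming a complete, balanced participants-by-conditions table and ignoring sphericity corrections; the function names and data are hypothetical.

```python
def _sums_of_squares(data):
    """data[s][c] = score of subject s in condition c (complete, balanced table)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    return n, k, ss_cond, ss_subj, ss_total


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA: returns (F, df_conditions, df_error)."""
    n, k, ss_cond, ss_subj, ss_total = _sums_of_squares(data)
    ss_error = ss_total - ss_cond - ss_subj  # between-subject variance removed
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error


def oneway_anova_independent(data):
    """Same scores wrongly treated as independent groups (for comparison only)."""
    n, k, ss_cond, _, ss_total = _sums_of_squares(data)
    ss_within = ss_total - ss_cond  # between-subject variance stays in the error term
    df_cond, df_within = k - 1, k * (n - 1)
    return (ss_cond / df_cond) / (ss_within / df_within), df_cond, df_within
```

With the made-up scores `[[10, 11, 12], [20, 21, 22], [30, 31, 33], [40, 41, 41]]` (four participants, three conditions), the repeated-measures F is 18 on (2, 6) df, whereas treating the same scores as independent groups gives F ≈ 0.024 on (2, 9) df, because the large between-participant spread is left in the error term. A p-value could then be obtained from the F distribution, e.g. with `scipy.stats.f.sf(F, df1, df2)`.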

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Your manuscript reports on a sophisticated experimental study in human participants. The study looks for neural markers of "team flow" experiences compared to individual flow or social interaction using EEG measured during a musical social app game. While the approach and analyses are sophisticated, all reviewers individually raised a series of substantial concerns with respect to EEG and statistical analysis. The editors and reviewers hence are unable to share the conclusions the authors would like to draw.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the Reviewers for the positive assessment of our work and their insightful remarks. Please find below a point-by-point response to each comment.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Scheckel et al. report a large dataset on cell type-specific translational profiling of PrD-associated molecular alterations in a mouse model through RiboTRAP and ribosome profiling approaches. They report a more severe alteration in the translatome specifically in astrocytes and microglia as compared to neuronal populations. This highlights that changes in these two cell classes might have a predominant role in the pathology of PrD.

      Data and methods are presented such that they can be reproduced. The data analysis section of the manuscript could be further elaborated. In particular, it could be clarified which comparisons with existing datasets were performed, and how. A description of the statistical analysis is sometimes missing (e.g. in fig 6e it is not clear what the stars on top of the bars stand for, which test was performed, or what significance level was used). Moreover, the methods section on the western blots presented in figure 6 appears to be missing.

      Fig 6e shows the output (log2 fold change) of DESeq2. Genes with a Benjamini-Hochberg adjusted p value

      **Major concern:**

      The most important improvement the authors should consider for their paper is to attempt more specifically to isolate effects on the translational efficiency of mRNAs. As it stands, the authors largely use RiboTrap data as a reference to compare their footprinting data, but arguably this misses mRNAs that are present in the transcriptome but not efficiently recruited onto ribosomes. It seems somewhat of a lost opportunity not to test in the dataset (possibly by comparison to RNA-Seq from FACS-isolated cells as a reference) whether there is a systematic change in translational efficiency (possibly in mRNAs with specific features). In the current form, the RiboTrap and footprinting approaches largely serve to isolate mRNAs from cre-defined cell types, but given the lack of a "total transcriptome" reference from the respective cells, it cannot easily be determined whether certain transcripts are heavily regulated at the level of translation. Thus, despite using much more advanced methodologies than the Sorce study, the fundamental conclusions emerging from this work are rather similar to those of this previously published work.

      Translational changes can be assessed in a cell-type specific manner without artefacts related to dissociation/isolation procedures and are arguably more relevant than transcriptional changes (Haimon et al., Nat. Immunol. 2018). Both the assessment of translation and the investigation of specific cell types differentiate this study from transcriptional profiling studies, including Sorce et al. Accordingly, our approach identified > 1000 cell-type specific translational changes that were missed in the Sorce study (Fig. 5a-d).

      We agree, however, with the reviewer that a comparison of our data with RiboTrap data does not take non-translated RNAs into account. We have refrained from such a comparison for several reasons:

      • We agree with the reviewer that a systematic comparison of transcriptomes and translatomes in the assessed cell types at every time point would have allowed us to identify genes regulated at the post-transcriptional level. The goal of this study, however, was to identify biologically relevant prion-induced molecular changes in a cell-type specific manner rather than to identify post-transcriptional regulation. To assess the validity of our approach, we chose closely related datasets (RiboTrap datasets) for comparison.

      • The inclusion of RNA-seq datasets from FACS-isolated cells would require an additional 2 years of work, since all samples and datasets would need to be newly generated (breeding mice, inoculating mice with prions and waiting up to 8 months for mice to reach the terminal time point, establishing procedures, and generating and analyzing datasets).

      • RNA-Seq from FACS-isolated neurons is problematic because neuronal processes are often lost during the dissociation/isolation procedures.

      • Additionally, dissociation/isolation procedures typically introduce stress-related artefacts. These procedure-induced changes complicate comparisons with techniques that have been optimized to avoid such artefacts (including the method applied in this manuscript). Differences between transcriptional and translational datasets could thus be due either to post-transcriptional regulation or to differences in artefacts, and are likely difficult to interpret.

      **Additional suggestions:**

      1) In Figure 1d the authors point out occasional neuronal cells exhibiting Rpl10a-GFP expression with arrows. It appears that these arrows may have moved during figure preparation - please check/fix if necessary.

      Thank you for pointing this out. We have fixed the arrows.

      2) In Supplementary Figure 1b and c it appears that the PV labeling is missing in the panel for Rpl10a:GFP controls. If this is intentional please indicate this in the figure legend.

      A co-localization of GFP-positive cells and PV was assessed only in Cre-positive (GFP expressing) mice but not in Cre-negative mice that don’t express GFP. We have clarified this point in the corresponding figure legend.

      3) It appears that the authors sequenced a significant number of libraries generated for multiple time points post-inoculation. From the figures and legends it was not entirely clear to me how many replicates were analyzed, given that in some analyses samples from different time points were combined in a single plot.

      All analyzed samples are listed in Supplementary File 1. We have emphasized this point in the results section.

      4) It was unclear to me how long after inoculation the group of "terminally ill" mice were sacrificed. Somewhere in the text it states that there are 2 months between 24 wpi and terminally ill - but it appears that this was not a preset timepoint but varied from animal to animal based on symptoms. Please clarify.

      We sacrifice mice at the last humane time point possible, at which they show terminal disease symptoms, including piloerection, hind limb clasping, kyphosis and ataxia. Intraperitoneally inoculated mice reach that time point at 31-32 weeks post inoculation (± a few days). Control mice (inoculated with non-infectious brain homogenate) were sacrificed at the same time. We have clarified this point in the methods section.

      5) From the Western blot data in Figure 6f the authors conclude that GFAP expression is upregulated in PrD mice whereas astrocyte number is unchanged. Given that the translatome is assessed based on a Rpl10-GFP dependent on recombination mediated by cre driven from GFAP promoter it is possible that the astrocytic alterations in ribosome footprints are in part a secondary consequence of increased Rpl10-GFP recombination/ expression in PrD mice (due to activation of the GFAP promoter). To estimate the impact of such an effect the authors should compare GFP levels in terminally ill control and PrD mice by western blotting.

      We agree with the reviewer that this information would be important to add. We have therefore assessed GFP levels in Rpl10a:GFP mice bred with GFAPCre and Cx3cr1CreER mice. The corresponding western blots are included in Supplementary Figure 11. GFP levels remained constant in terminally ill GFAPCre mice. This is not surprising, since even low GFAP promoter activity is likely to allow sufficient Cre recombinase expression to remove a STOP cassette, allowing GFP expression (controlled by the Rosa26 promoter) in GFAPCre mice. In contrast, we observed an increase in GFP expression in terminally ill Cx3cr1CreER mice, which is most likely linked to the increase in microglia numbers. As pointed out in the manuscript, the translational changes we identified cannot reflect differences in cell numbers due to the nature of our assay. This suggests that a difference in GFP expression does not impact our analyses.

      We have added this data to the manuscript.

      6) The western blot analysis of fig 6f-g has been performed using normalization to calnexin, yet no calnexin signals are shown to support this statement.

      We have included blots of the normalization control calnexin as Supplementary Figure 11a.

      7) Clarify the percentage of non-parenchymal macrophages labeled in the Cx3cr1-creER mouse line, since the authors consider this to be only a minor contamination.

      The labeling of non-parenchymal macrophages using Cx3cr1CreER mice has previously been estimated to be ~1% (Haimon et al., Nat. Immunol. 2018). We have added this information to the manuscript.

      8) Regarding the presentation of the data, Fig 5a would be clearer if the order of PrD and Ctrl samples on the y axis were maintained for each cell type.

      Fig 5a displays hierarchical clustering based on Euclidean distances. As samples are ordered according to their distance from each other, we cannot change the order as suggested by the reviewer.

      Reviewer #1 (Significance (Required)):

      Overall, this is an important and interesting study. Besides its insights into the biology, the transcriptomic data will provide a valuable resource for researchers in the field.

      Previous studies employed bulk RNAseq or microdissection for mapping transcriptomic changes (Majer et al. 2019; Sorce et al. 2020 and others). The Sorce et al study concluded that astrocytic alterations in the transcriptome are more dominant than neuronal gene expression changes. While the conclusion of the present study remains the same, it is the first to use ribosome profiling to dissect actively translated transcripts over the progression of the pathology in the mouse model. Thus, the data presented here would allow for identifying cell type-specific alterations as well as alterations specifically in mRNA translation, which would be missed by bulk RNA-Seq and RNA-Seq on FACS-isolated cells. However, the authors do not fully capitalize on this strength, given that no detailed comparisons to a real transcriptome reference are performed (see above).

      This work is of broad interest to scientists in neurodegeneration as well as glial biology.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Using a series of Cre-driven mouse strains, a GFP-tagged version of RPL10a (a ribosomal protein) was targeted to different cell types, allowing Dr Scheckel and colleagues to investigate translational changes as prion disease progresses in mice. Their data suggest massive changes in microglia and astrocytes but not neurons. The approach was particularly powerful as ribosome IP was combined with ribosome profiling. The manuscript is very well written. What might help, however, is to make the figures more accessible (perhaps change some of the labelling?)

      I have only minor comments regarding some of the figures:

      Fig 1a: This scheme could be improved, adding wpi and better aligning the cell-types in relation to the time when the cell-types were analysed.

      We have replaced weeks with wpi and changed the alignment of cell types to clarify that all cell types were analyzed at every time point.

      Fig 1b-e: The resolution could be improved to better discern the different cell-types.

      We submitted low-quality figures due to an upload limit but will submit final figures of higher quality. Additionally, we have added higher magnification pictures to better discern the different cell types as Supplementary Fig. 1d-e.

      Fig 4: Astrocytes are categorised into A1 and A2 and microglia based on DAM and homeostatic signature (How does this relate to the M1 and M2 classification?).

      The categorization of microglia into homeostatic and disease-associated (as well as other) microglia has largely replaced the initial categorization into pro-inflammatory M1 and anti-inflammatory M2 microglia (Dubbelaar et al., Front Immunol. 2018). We have therefore opted for the more current categorization. This explanation has also been added to the manuscript.

      Reviewer #2 (Significance (Required)):

      Highly significant. I have published on de novo protein synthesis in neurodegenerative disease

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors sampled actively translated proteins by cell type in the brains of RiboTag expressing mice under the control of cell specific cre recombination to determine changes in the translational profiles. They injected prions IP to induce prion disease. Their model shows little to no neuron loss at the terminal stage due to animal welfare regulations, but neuronal loss is a key hallmark of prion disease, along with gliosis. However, since other groups under different animal welfare regulations have shown that prion injection is sufficient to fully model the disease given enough time, there is sufficient evidence that this model captures early disease pathogenesis. The methodology used here has some clear advantages over previous cell-type isolation methods that require more lengthy sorting procedures. However, proteins with a long half-life or tightly regulated levels (such as TDP-43) are likely underrepresented by this method. The method also depends strongly on the specificity of the cre driver used; CamkIIa (excitatory N), parvalbumin (inhibitory N), GFAP (A), Cx3cr1 (microglia). While there is some off-target expression of the GFAP and Cx3cr1, the overall expression profiles generally match cell-specific transcriptomes obtained by other groups using other methods. They find major changes in astrocytes and microglia at terminal stages, after the onset of neurological symptoms, and comparatively fewer in neurons. Oligodendrocytes are not examined. The authors are commended on a thorough and well-designed study, especially in the comparison of multiple neuronal and glial types simultaneously.

      **Major comments:**

      Key conclusion 1: "Our results suggest that aberrant translation within glia may suffice to cause severe neurological symptoms and may even be the primary driver of prion disease." This conclusion is well-supported, serving as a hypothesis for future work. The data shows that the most abundant PTG changes are indeed in microglia at 24 wpi, before the onset of symptoms. In addition, although some genes are also differentially translated in the neuronal populations, examination of the Supplemental Tables shows that these are mostly highly expressed glial genes and could represent contamination of the sample during gliosis. The authors may wish to discuss this more prominently to avoid confusion. This data indeed suggests that glial changes alone could be sufficient to produce the neurological symptoms in these mice. However, the authors should also discuss the fact that the two genes changed at 24 weeks in PV neurons (Oprm1, Cyp2s1) do appear to be neuronal and may be relevant to pathogenesis as well. These mRNAs were also decreased in their previous paper conducting bulk sequencing in the hippocampus, according to the authors' online Prion RNAseq Database. Knockout experiments in mouse models have shown that dysregulation of one or a few critical genes in neurons can be sufficient to induce dysfunction and neurological symptoms, and the current evidence does not seem sufficient to rule this out. Fig 3d also suggests that PTGs in PV neurons may be particularly important, even accounting for the additional regions present in the RP analysis.

      We agree with the reviewer that few critical neuronal genes might be sufficient to induce neurological dysfunction and symptoms and have added this point to the results and discussion. Additionally, we have highlighted that many neuronal genes are glia-enriched and might reflect glia contamination.

      Key Conclusion 2: "Cell-type specific changes become only evident at late PrD stages." This conclusion is well supported. However, as the authors noted, due to legal constraints their model represents early to mid disease onset rather than a true terminal environment matching that of patients. Therefore, it would be advantageous to choose a more appropriate name for the "terminal" group, perhaps based on one of the key humane endpoint criteria that would help readers in the field to place these important results in context of the overall disease process.

      We have added additional information to clarify our definition of terminal stage to the methods.

      Key Conclusion 3: "This suggests that the prion-induced molecular phenotypes reflect major glia alterations, whereas the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes such as altered neuronal connectivity." The authors should modify the second half of this claim. As discussed above, changes to even a few neuronal genes can be sufficient to induce neurodegeneration. The claim that "the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes" fails to acknowledge the changes in PV neurons observed in this study, however few they may be. The authors also do not take into account the possible role of transcribed RNAs that are not immediately translated (for example, those that accumulate at synapses for fast translation on demand) or of the overall proteome, neither of which is included in their analysis. Although their method cannot detect these components, the authors should address in the discussion the possibility that such changes are still present. The authors should also discuss the functions of the few specific PV PTGs and explore their potential relationship with neurodegeneration. This is especially important since the authors acknowledge that a key reason for including PV neurons in the analysis is ample evidence in the literature that they play a role in disease pathogenesis. Finally, the authors note that a top GO term in microglial cells was synaptic transmission. The authors should expand on this finding in the discussion, as the interplay of glia and neurons in the pathogenesis of disease is likely highly relevant.

      We have removed the claim that “behavioral phenotypes may be ascribed to biochemically undetectable changes” and added the point that a few neuronal changes might be sufficient to induce neuronal dysfunction and symptoms. As stated in the manuscript, we believe that the enrichment of the GO term synaptic transmission in microglia is an artefact. We have therefore refrained from discussing this finding further and have highlighted in the results that it is an artefact.

      • *Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.* - *Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      As discussed above, the inclusion of RNAseq datasets from FACS isolated cells would require an additional 2 years of work since all samples and datasets would need to be newly generated (breeding mice, inoculating mice with prions and waiting for up to 8 months for mice to reach the terminal time point, establishing procedures, generating and analyzing datasets).

      Key Conclusion 1: No additional experiments needed. Key Conclusion 2: No additional experiments needed. Key Conclusion 3: No additional experiments needed for a modified statement.

      The data and methods are largely reproducible. Additional information should be provided about the methods for Gene Ontology analysis, how it was controlled, and what was used as a significance measure.

      We have added additional information about the GO analysis to the methods section. The complete list of GO terms is now included as Supplementary File 10.

      Some groups contain only two animals. At least three should be included per group for a minimally robust analysis.

      We have tried to include 3 replicates per group as suggested by the reviewer. In a few exceptions, we lost an individual sample, and one sample had to be excluded due to low quality. In these instances (GFAP_2wpi Ctrl; CamKIIa_CX_term_Ctrl, CamKIIa_CX_term_PrD, Cx3cr1_term_Ctrl and Cx3cr1_term_PrD), we ensured that both replicates showed a high correlation and could still yield reliable results (see below). Consistently, the DESeq2 algorithm (which can also handle just 2 replicates per group) identified differentially translated genes in the terminal samples.

      **Minor comments:**

      Fig. 1 c-e all panels should have a scale bar. E, closer insets or larger images are needed to see the colocalization in these very small cells.

      We have added scale bars to all panels. Colocalization is indeed not visible in the low-quality figures that were uploaded due to the size limit. We believe that colocalization is visible in the high-quality final figures, but we are also happy to provide closer insets upon editorial request.

      Fig. 5f: To allow interpretation of the Gene Ontology analysis, authors should include the number of genes involved in the pathway and the number of those genes found in their sample input list.

      We have added details regarding the GO analysis to the methods section, and are now providing the requested information in Supplementary File 10.
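      To illustrate why the reviewer asks for the term size and the overlap: in a common over-representation test (assumed here for illustration; the GO tool and correction actually used are described only in the manuscript's methods), the p-value for a GO term is the upper tail of a hypergeometric distribution defined by four counts: background size `N`, genes annotated to the term `K`, input-list size `n`, and overlap `k`. The function name and the example counts below are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def go_term_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. all detected genes)
    K: background genes annotated to the GO term
    n: genes in the input list (e.g. differentially translated genes)
    k: input-list genes annotated to the term (the overlap)
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("counts must satisfy 0 <= K, n <= N and k >= 0")
    # Sum the probabilities of observing i overlapping genes for every i >= k;
    # the overlap can be at most min(K, n).
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20,000 background genes, a 200-gene term,
# a 300-gene input list and 15 overlapping genes.
print(go_term_enrichment_p(20_000, 200, 300, 15))
```

      Without `K` and `k`, a reader cannot tell whether a small p-value reflects a large, generic term or a strong overlap with a small, specific one.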

      Fig. S6: It is not clear from viewing the figure or the legend what the percentages on the axes refer to.

      The principal components 1 and 2 are plotted on the x and y axes, respectively. The % of variance explained by these principal components is indicated. We have added this information to the figure legend.
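      For reference, the axis percentages in a PCA plot are usually each component's share of the total variance. The minimal NumPy sketch below assumes a standard PCA on a centered samples-by-genes matrix (the exact preprocessing behind Supplementary Fig. 6 is not stated here) and derives the percentages from the singular values; the function name and the random data are hypothetical.

```python
import numpy as np


def explained_variance_percent(X: np.ndarray) -> np.ndarray:
    """Percent of total variance captured by each principal component.

    X: samples x features matrix (e.g. samples x genes).
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to the PC variances
    return 100 * var / var.sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))  # hypothetical: 6 samples, 50 genes
pct = explained_variance_percent(X)
print(f"PC1 {pct[0]:.1f}%, PC2 {pct[1]:.1f}%")
```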

      Fig. S7: the gene numbers are confusing because they do not match the data in Fig. 4a. It would be helpful to use the same LFC cutoff as in Fig. 4a to avoid misunderstandings by the reader, or explain why no cutoff is used and what information the authors wish to convey by presenting the data that way.

      Typically, all significant changes (p adj

      Fig S9: The legend indicates that genes changed in all 5 datasets are colored in green, however this is not easily visible on the graphs (appears more gray).

      Genes changing in all datasets are colored in green in Fig. 5. Genes changing in all datasets are colored in grey in Supplementary Fig. 9. We have adjusted the corresponding legends. The quality of the figures is very low due to the upload limit. The final figures will be of higher quality.

      Fig. S10: on page 12 Supplementary Fig. 10c is referenced, but likely refers to 10b. Throughout manuscript: It should be RNase, not RNAse.

      Both points have been addressed.

      Reviewer #3 (Significance (Required)):

      This work provides an important conceptual advance in prion disease research: glia may be primary drivers of disease, equal to or surpassing certain neuronal populations. Although the authors have shown previously that glial changes are dominant in bulk sequencing of the hippocampus, cell type-specific analysis adds an important level of detail to convince the field that few transcriptional changes occur in neurons even though neurological defects are already present. Historically, neuronal defects have been assumed to play the main role, with glia being largely ignored. This echoes similar recent shifts in other areas of the neurodegenerative disease field, where we are recognizing the important roles of glia in pathogenesis and how they may be modulated to treat disease.

      Their findings in PV neurons also may reflect early key changes in this important neuronal population that contribute to neurological symptom onset. They will allow further study of the genes and pathways involved and may lead to additional effective treatments for disease. Finally, the thorough comparison of multiple neuronal and glial populations will allow future investigation of the interplay of neurons and microglia in pathogenesis and shows the importance of studying them synergistically rather than individually.

      *Audience:*

      The neurodegenerative disease field in general will be interested in the findings. Immunologists, other neuroscientists, and pharmaceutical and other drug development organizations will also be influenced by the work.

      *Own expertise:*

      Neurodegenerative disease, transgenic mouse models, neuropathology, translational neuroscience

      REFEREE'S CROSS-COMMENTING:

      I agree with Reviewer 1 that a comparison of the total transcriptome with ribosomally active transcripts would aid the interpretation of this work. It would also uncover or refute the presence of cell-type differences in translation efficiency that directly impact the authors' major conclusion that glia are more affected than neurons. I support the request of this additional experiment.

      As discussed above, we have refrained from such a comparison since 1) the scope of this study was to identify biologically relevant prion-induced molecular changes, not to study post-transcriptional regulation, 2) generating such a dataset would take ~2 years, and 3) differences between transcriptional and translational changes likely reflect a combination of post-transcriptional regulation and artefact-induced changes that would probably be difficult to interpret.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors sampled actively translated proteins by cell type in the brains of RiboTag expressing mice under the control of cell specific cre recombination to determine changes in the translational profiles. They injected prions IP to induce prion disease. Their model shows little to no neuron loss at the terminal stage due to animal welfare regulations, but neuronal loss is a key hallmark of prion disease, along with gliosis. However, since other groups under different animal welfare regulations have shown that prion injection is sufficient to fully model the disease given enough time, there is sufficient evidence that this model captures early disease pathogenesis. The methodology used here has some clear advantages over previous cell-type isolation methods that require more lengthy sorting procedures. However, proteins with a long half-life or tightly regulated levels (such as TDP-43) are likely underrepresented by this method. The method also depends strongly on the specificity of the cre driver used: CaMKIIa (excitatory neurons), parvalbumin (inhibitory neurons), GFAP (astrocytes), Cx3cr1 (microglia). While there is some off-target expression of the GFAP and Cx3cr1 drivers, the overall expression profiles generally match cell-specific transcriptomes obtained by other groups using other methods. They find major changes in astrocytes and microglia at terminal stages, after the onset of neurological symptoms, and comparatively fewer in neurons. Oligodendrocytes are not examined. The authors are commended on a thorough and well-designed study, especially in the comparison of multiple neuronal and glial types simultaneously.

      Major comments:

      Key conclusion 1: "Our results suggest that aberrant translation within glia may suffice to cause severe neurological symptoms and may even be the primary driver of prion disease." This conclusion is well-supported, serving as a hypothesis for future work. The data shows that the most abundant PTG changes are indeed in microglia at 24 wpi, before the onset of symptoms. In addition, although some genes are also differentially translated in the neuronal populations, examination of the Supplemental Tables shows that these are mostly highly expressed glial genes and could represent contamination of the sample during gliosis. The authors may wish to discuss this more prominently to avoid confusion. This data indeed suggests that glial changes alone could be sufficient to produce the neurological symptoms in these mice. However, the authors should include discussion that the two genes changed at 24 weeks in PV neurons (Oprm1, Cyp2s1) do appear to be neuronal and may be relevant to pathogenesis as well. These mRNAs were also decreased in their previous paper conducting bulk sequencing in the hippocampus, according to the authors' online Prion RNAseq Database. Knockout experiments in mouse models have shown that dysregulation of one or a few critical genes in neurons can be sufficient to induce dysfunction and neurological symptoms, and the current evidence does not seem sufficient to rule it out. Fig 3d also suggests that PTGs in PV neurons may be particularly important, even accounting for the additional regions present in the RP analysis.

      Key Conclusion 2: "Cell-type specific changes become only evident at late PrD stages." This conclusion is well supported. However, as the authors noted, due to legal constraints their model represents early to mid disease onset rather than a true terminal environment matching that of patients. Therefore, it would be advantageous to choose a more appropriate name for the "terminal" group, perhaps based on one of the key humane endpoint criteria that would help readers in the field to place these important results in context of the overall disease process.

      Key Conclusion 3: "This suggests that the prion-induced molecular phenotypes reflect major glia alterations, whereas the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes such as altered neuronal connectivity." The authors should modify the second half of this claim. As discussed above, changes to even a few neuronal genes can be sufficient to induce neurodegeneration. The claim that "the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes" fails to acknowledge the changes in PV neurons observed in this study, however few they may be. The authors also do not take into account the possible role of transcribed RNAs that are not immediately translated (for example, those that accumulate at synapses for fast translation on demand) or of the overall proteome, neither of which is included in their analysis. Although their method cannot detect these components, the authors should address in the discussion the possibility that such changes are still present. The authors should also discuss the functions of the few specific PV PTGs and explore their potential relationship with neurodegeneration. This is especially important since the authors acknowledge that a key reason for including PV neurons in the analysis is ample evidence in the literature that they play a role in disease pathogenesis. Finally, the authors note that a top GO term in microglial cells was synaptic transmission. The authors should expand on this finding in the discussion, as the interplay of glia and neurons in the pathogenesis of disease is likely highly relevant.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Key Conclusion 1: No additional experiments needed. Key Conclusion 2: No additional experiments needed. Key Conclusion 3: No additional experiments needed for a modified statement.

      The data and methods are largely reproducible. Additional information should be provided about the methods for Gene Ontology analysis, how it was controlled, and what was used as a significance measure. Some groups contain only two animals. At least three should be included per group for a minimally robust analysis.

      Minor comments:

      Fig. 1 c-e all panels should have a scale bar. E, closer insets or larger images are needed to see the colocalization in these very small cells. Fig. 5f: To allow interpretation of the Gene Ontology analysis, authors should include the number of genes involved in the pathway and the number of those genes found in their sample input list. Fig. S6: It is not clear from viewing the figure or the legend what the percentages on the axes refer to. Fig. S7: the gene numbers are confusing because they do not match the data in Fig. 4a. It would be helpful to use the same LFC cutoff as in Fig. 4a to avoid misunderstandings by the reader, or explain why no cutoff is used and what information the authors wish to convey by presenting the data that way. Fig S9: The legend indicates that genes changed in all 5 datasets are colored in green, however this is not easily visible on the graphs (appears more gray). Fig. S10: on page 12 Supplementary Fig. 10c is referenced, but likely refers to 10b. Throughout manuscript: It should be RNase, not RNAse.

      Significance

      This work provides an important conceptual advance in prion disease research: glia may be primary drivers of disease, equal to or surpassing certain neuronal populations. Although the authors have shown previously that glial changes are dominant in bulk sequencing of the hippocampus, cell type-specific analysis adds an important level of detail to convince the field that few transcriptional changes occur in neurons even though neurological defects are already present. Historically, neuronal defects have been assumed to play the main role, with glia being largely ignored. This echoes similar recent shifts in other areas of the neurodegenerative disease field, where we are recognizing the important roles of glia in pathogenesis and how they may be modulated to treat disease.

      Their findings in PV neurons also may reflect early key changes in this important neuronal population that contribute to neurological symptom onset. They will allow further study of the genes and pathways involved and may lead to additional effective treatments for disease. Finally, the thorough comparison of multiple neuronal and glial populations will allow future investigation of the interplay of neurons and microglia in pathogenesis and shows the importance of studying them synergistically rather than individually.

      Audience:

      The neurodegenerative disease field in general will be interested in the findings. Immunologists, other neuroscientists, and pharmaceutical and other drug development organizations will also be influenced by the work.

      Own expertise:

      Neurodegenerative disease, transgenic mouse models, neuropathology, translational neuroscience

      REFEREE'S CROSS-COMMENTING:

      I agree with Reviewer 1 that a comparison of the total transcriptome with ribosomally active transcripts would aid the interpretation of this work. It would also uncover or refute the presence of cell-type differences in translation efficiency that directly impact the authors' major conclusion that glia are more affected than neurons. I support the request of this additional experiment.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Using a series of Cre-driven mouse strains, a GFP-tagged version of RPL10a (a ribosomal protein) was targeted to different cell types, allowing Dr Scheckel and colleagues to investigate translational changes as prion disease progresses in mice. Their data suggest massive changes in microglia and astrocytes but not neurons. The approach was particularly powerful as ribosome IP was combined with ribosome profiling. The manuscript is very well written. What might help, however, is to make the figures more accessible (perhaps change some of the labelling?)

      I have only minor comments regarding some of the figures:

      Fig 1a: This scheme could be improved by adding wpi and by better aligning the cell types with the time points at which they were analysed. Fig 1b-e: The resolution could be improved to better discern the different cell types. Fig 4: Astrocytes are categorised into A1 and A2, and microglia based on DAM and homeostatic signatures (how does this relate to the M1 and M2 classification?).

      Significance

      Highly significant. I have published on de novo protein synthesis in neurodegenerative disease

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Scheckel et al. report a large dataset on cell type-specific translational profiling of PrD-associated molecular alterations in a mouse model through RiboTRAP and ribosome profiling approaches. They report a more severe alteration in the translatome specifically in astrocytes and microglia as compared to neuronal populations. This highlights that changes in these two cell classes might have a predominant role in the pathology of PrD.

      Data and methods are presented such that they can be reproduced. The data analysis section of the manuscript could be further elaborated. In particular, it could be clarified which comparisons with existing datasets were performed and how. Descriptions of the statistical analysis are sometimes missing (e.g. in Fig. 6e it is not clear what the stars on top of the bars stand for, which test was performed, or what significance level was used). Moreover, the methods section on the western blots presented in Figure 6 appears to be missing.

      Major concern:

      The most important improvement the authors should consider for their paper is to attempt more specifically to isolate effects on the translational efficiency of mRNAs. As it stands, the authors largely use RiboTrap data as a reference to compare their footprinting data, but arguably this misses mRNAs that are present in the transcriptome and not efficiently recruited onto ribosomes. It appears to be somewhat of a lost opportunity not to test in the dataset (possibly by comparison to RNA-Seq from FACS-isolated cells as a reference) whether there is a systematic change in translational efficiency (possibly in mRNAs with specific features?). In the current form, the RiboTrap and footprinting approaches largely serve to isolate mRNAs from cre-defined cell types, but given the lack of a "total transcriptome" reference from the respective cells, it cannot easily be determined whether certain transcripts are heavily regulated at the level of translation. Thus, despite using much more advanced methodologies than the Sorce study, the fundamental conclusions emerging from this work are rather similar to those of this previously published work.

      Additional suggestions:

      1) In Figure 1d the authors point out occasional neuronal cells exhibiting Rpl10a-GFP expression with arrows. It appears that these arrows may have moved during figure preparation - please check/fix if necessary.

      2) In Supplementary Figure 1b and c it appears that the PV labeling is missing in the panel for Rpl10a:GFP controls. If this is intentional please indicate this in the figure legend.

      3) It appears that the authors sequenced a significant number of libraries generated for multiple time points post-inoculation. From the figures and legends it was not entirely clear to me, how many replicates were analyzed given that in some analyses samples from different time points were combined in a single plot.

      4) It was unclear to me how long after inoculation the group of "terminally ill" mice were sacrificed. Somewhere in the text it states that there are 2 months between 24 wpi and terminally ill - but it appears that this was not a preset timepoint but varied from animal to animal based on symptoms. Please clarify.

      5) From the Western blot data in Figure 6f the authors conclude that GFAP expression is upregulated in PrD mice whereas astrocyte number is unchanged. Given that the astrocytic translatome is assessed via Rpl10a-GFP, whose expression depends on recombination mediated by Cre driven from the GFAP promoter, it is possible that the astrocytic alterations in ribosome footprints are in part a secondary consequence of increased Rpl10a-GFP recombination/expression in PrD mice (due to activation of the GFAP promoter). To estimate the impact of such an effect, the authors should compare GFP levels in terminally ill control and PrD mice by western blotting.

      6) The western blot analysis of Fig. 6f-g was performed using normalization to calnexin, yet no calnexin signal is shown to support this statement.

      7) Clarify what percentage of the cells targeted by the Cx3cr1-creER mouse line are non-parenchymal macrophages, since the authors consider this to be only a minor contamination.

      8) Regarding the presentation of the data, Fig 5a would be clearer if in the y axes, for each cell type the order of PrD and Ctrl samples was maintained.

      Significance

      Overall, this is an important and interesting study. Besides its insights into the biology, the transcriptomic data will provide a valuable resource for researchers in the field.

      Previous studies employed bulk RNAseq or microdissection for mapping transcriptomic changes (Majer et al. 2019; Sorce et al. 2020 and others). The Sorce et al. study concluded that astrocytic alterations in the transcriptome are more dominant than neuronal gene expression changes. While the conclusion of the present study remains the same, it is the first to use ribosome profiling to dissect actively translated transcripts over the progression of the pathology in the mouse model. Thus, the data presented here would allow for identifying cell type-specific alterations as well as alterations specifically in mRNA translation, which would be missed by bulk RNA-Seq and RNA-Seq on FACS-isolated cells. However, the authors do not fully capitalize on this strength, given that no detailed comparisons to a true transcriptome reference are performed (see above).

      This work is of broad interest to scientists in neurodegeneration as well as glial biology.

    1. Reviewer #3

      Jaron and collaborators provide a large-scale comparative work on the genomic impact of asexuality in animals. By analysing 26 published genomes with a unique bioinformatic pipeline, they conclude that none of the expected features due to the transition to asexuality is replicated across a majority of the species. Their findings call into question the generality of the theoretical expectations, and suggest that the genomic impacts of asexuality may be more complicated than previously thought.

      The major strengths of this work are (i) the comparison among various modes and origins of asexuality across 18 independent transitions; and (ii) the development of a bioinformatic pipeline directly based on raw reads, which limits the biases associated with genome assembly. Moreover, I would like to acknowledge the effort made by the authors to provide detailed methods on public servers, which allow the analyses to be reproduced. That being said, I also have a series of concerns, listed below:

      1) Theoretical expectations.

      As far as I understand, the aim of this work is to test whether 4 classical predictions associated with the transition to asexuality and 5 additional features observed in individual asexual lineages hold at a large phylogenetic scale. However, I think that these predictions are poorly presented, and so they may be hard for non-expert readers to understand. Some of them are briefly mentioned in a descriptive way in the Introduction (L56 - 61), and in a little more detail in Boxes 1 and 2. However, the evolutionary reasons why one should expect these features to occur (and under which assumptions) are not clearly stated anywhere in the Introduction (but only briefly in the Results & Discussion). I think it is important that the authors provide clear-cut quantitative expectations for each genomic feature analysed and under each asexuality origin and mode (Box 1 and 2). Also highlighting the assumptions behind these expectations will help for a better interpretation of the observed patterns.

      2) Mutation accumulation & positive selection.

      A subtlety which, to my mind, is not sufficiently emphasized is that the different modes of asexuality encompass reproduction with or without recombination (Box 2), which can lead to very different genetic outcomes. For example, it has been shown that Muller's ratchet (the accumulation of deleterious mutations in asexual populations) can be stopped by small amounts of recombination in large populations (Charlesworth et al. 1993; 10.1017/S0016672300031086). Similarly, a new recessive beneficial mutation can only segregate in a heterozygous state in a clonal lineage (unless a second mutation hits the same locus), whereas in the presence of recombination, these mutations will rapidly fix in the population through the formation of homozygous mutants (Haldane's sieve, Haldane 1927; 10.1017/S0305004100015644). Therefore, depending on whether recombination occurs during asexual reproduction, the expectations may be quite different, and so they could deviate from the "classical predictions". In this regard, I would like to see the authors adjust their conclusions. Moreover, it is also not very clear whether the species analysed here are 100% asexual or whether they sometimes go through transitory sexual phases, which could reset some of the genomic effects of asexuality.

      3) Transposable elements.

      I found the predictions regarding the amount of TEs expected under asexuality quite ambiguous. On the one hand, TEs are expected not to spread because they cannot colonize new genomes (Hickey 1982); on the other hand, TEs can be viewed as any deleterious mutation that will accumulate in asexual genomes due to Muller's ratchet. The argument provided by the authors to justify the expectation of low TE load in asexual lineages is that "Only asexual lineages without active TEs, or with efficient TE suppression mechanisms, would be able to persist over evolutionary timescales". But this argument should then equally apply to any other type of deleterious mutation, and so we would not be able to see Muller's ratchet in the first place. Therefore, not observing the expected pattern for TEs in the genomic data is not so surprising, as the expectation itself does not seem to be very robust. I would like the authors to better acknowledge this issue, which actually supports their general idea that the genomic consequences of asexuality are not so simple.

      4) Heterozygosity.

      Due to the absence of recombination, asexual populations are expected to maintain a high level of diversity at each single locus (heterozygosity), but a low number of different haplotypes. However, as presented by the authors in Box 2, there are different modes of parthenogenesis with different outcomes regarding heterozygosity: (1) preservation at all loci; (2) reduction or loss at all loci; (3) reduction depending on the chromosomal position relative to the centromere (distal or proximal). Therefore, the authors could use their genome-based dataset to explore in more detail the distribution of heterozygosity along the chromosomes, and further test whether it fits the above predictions. If the differing quality of the genome assemblies is an issue, the authors could at least provide the variance of heterozygosity across the genome. Mode #3 (i.e. central fusions and terminal fusions) would be particularly interesting, as one would then be able to compare, within the same genome, regions with a large excess vs. deficit of heterozygosity and assess their evolutionary impacts.

      Moreover, the authors should put more emphasis on the fact that using a single genome per species is a limitation to test the subtle effects of asexuality on heterozygosity (and also on "mutation accumulation & positive selection"). These effects are better detected using population-based methods (i.e. with many individuals, but not necessarily many loci). For example, the FIS value of a given locus is negative when its heterozygosity is higher than expected under random mating, and positive when the reverse is true (Wright 1951; 10.1111/j.1469-1809.1949.tb02451.x).
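      As a worked illustration of the FIS argument, the sketch below uses the simple single-population definition F_IS = 1 - H_obs / H_exp for one biallelic locus, with H_exp taken from Hardy-Weinberg proportions. This is an assumption for illustration: it omits the sample-size corrections of estimators such as Weir and Cockerham's, and the function name and genotype counts are hypothetical.

```python
def f_is(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """F_IS = 1 - H_obs / H_exp for one biallelic locus in one population."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    h_exp = 2 * p * (1 - p)          # Hardy-Weinberg expected heterozygosity
    if h_exp == 0:
        raise ValueError("monomorphic locus: F_IS is undefined")
    h_obs = n_Aa / n                 # observed heterozygosity
    return 1 - h_obs / h_exp


print(f_is(0, 20, 0))    # every individual heterozygous: -1.0
print(f_is(25, 50, 25))  # Hardy-Weinberg proportions: 0.0
print(f_is(40, 20, 40))  # heterozygote deficit: 0.6
```

      A clonal lineage with preserved heterozygosity (mode 1 in Box 2) would push F_IS toward negative values across loci, whereas modes that erode heterozygosity would push it toward positive values; estimating this requires genotypes from many individuals rather than a single genome.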

      5) Absence of sexual lineages.

      A second limitation of this work is the absence of sexual lineages to use as references to control for lineage-specific effects. I do not agree with the authors when they say that "the theoretical predictions pertaining to mutation accumulation, positive selection, gene family expansions, and gene loss are always relative to sexual species [...] and cannot be independently quantified in asexuals." I think that this is true for all the genomic features analysed, because the transition to asexuality will affect the genomes of asexual lineages relative to their sexual ancestors. This is actually acknowledged by the authors at the end of the Conclusion.

      To give an example, the authors say that "Species with an intraspecific origin of asexuality show low heterozygosity levels (0.03% - 0.83%), while all of the asexual species with a known hybrid origin display high heterozygosity levels (1.73% - 8.5%)". Interpreting these low vs. high heterozygosity values is difficult without having sexual references, because the level of genetic diversity is also heavily influenced by the long term life history strategies of each species (e.g. Romiguier et al. 2014; 10.1038/nature13685).

      I understand that the genomes of related sexual species are not available, which precludes direct comparisons with the asexual species. However, I think that the results could be strengthened if the authors provided, for each genomic feature tested, some estimates from related sexual species. They actually do so partially in the Results & Discussion section for palindromes, transposable elements and horizontal gene transfers. I think that these expectations for sexual species (and others) could be added to Table 1 to facilitate the comparisons.

      6) Regarding statistics, I acknowledge that the number of species analysed is relatively low (n=26), which may preclude getting any significant results if the effects are weak. However, the authors should then clearly state in the text (and not only in the reporting form) that their analyses are descriptive. Also, their position regarding this issue is not entirely clear as they still performed a statistical test for the effect of asexuality mode / origin on TE load (Figure 2 - supplement 1). Therefore, I would like to see the same statistical test performed on heterozygosity (Figure 2).

      7) As you used 31 individuals from 26 asexual species, I was wondering whether you took advantage of the species with multiple samples. For example, were the k-mer-based analyses congruent between individuals of the same species?

    2. Reviewer #2

      This paper is interesting because it is studying, through a comparative genomic approach, how asexuality affects genome evolution in animal lineages while focusing on the same features. Such an extensive comparison can, in principle, distinguish the common consequences of asexuality, in contrast to previous studies that focused on few asexual species (or only one). It is interesting that the authors did not find a universal genomic feature of "asexual" species. This is a potentially important contribution to the field of the evolution of reproductive systems.

      However, I am concerned about limitations and potential biases in many of the specific genomic features analysed, and the resulting difficulties in drawing any general conclusions from these analyses. For example, the heterozygosity analyses need to be more clearly explained and the potential limits of the methods used discussed further. The use of kmer spectra analyses as opposed to genome assemblies is understandable, but there are biases here that were not discussed. I am also concerned about the impact of low read quality and low coverage genomic data, and whether issues with genome assembly affect the conclusions. There are also issues with the conclusions related to species of hybrid origin, as there are numerous "unknown" cases and cytological data are lacking for many of the studied animal groups (therefore the authors should be cautious about the evidence for the reproduction mode).

      Ideally, all the genomes of the asexual animal clades studied should have been sequenced and assembled using the same method, which would make this comparative study much stronger. We realize this may not yet be practical, but the absence of such data must temper the conclusions. It is nevertheless the first article to include and compare many distinct parthenogenetic animal clades, and the main result, that there is no common universal genomic feature of parthenogenesis, is, with caveats, interesting.

      Major Issues and Questions:

      1) The authors choose to refer to asexuality when describing thelytokous parthenogenesis. Asexuality is a very general term that can be confusing: fission and vegetative reproduction could also be considered asexuality. I suggest using parthenogenesis throughout the manuscript for the different animal clades studied here. Moreover, in thelytokous parthenogenesis meiosis can still occur to form the gametes; it is therefore not correct to write that "gamete production via meiosis... no longer take place" (lines 57-58). Fertilization by sperm indeed does not seem to take place (except during hybridogenesis, a special form of parthenogenesis).

      2) The cellular mechanisms of asexuality in many asexual lineages are known only through a few old cytological studies and could be inaccurate or incomplete (for example, Triantaphyllou's 1981 paper on Meloidogyne nematodes, or Hsu 1956 for bdelloid rotifers). The authors should therefore mention in the introduction the lack of detailed and accurate cellular and genetic studies describing the mode of reproduction, because this may change the final conclusion.

      For example, for bdelloid rotifers the literature is scarce. However, in Supp Table 1 the authors cite two articles that do not contain any cytological data on oogenesis in bdelloid rotifers to indicate that A. vaga and A. ricciae reproduce by apomixis. Welch and Meselson studied the karyotypes of bdelloid rotifers, including A. vaga, and did not conclude anything about the absence or presence of chromosome homology; therefore nothing can be said about their reproduction mode. In the article of Welch and Meselson, the nuclear DNA content of bdelloid species is measured, but without any link to the reproduction mode. The only paper referring to apomixis in bdelloids is Hsu (1956), but it is old, and new cytological data should be obtained with modern technology.

      3) In the section on Heterozygosity, the authors compute heterozygosity from kmer spectra analysis from reads to "avoid biases from variable genome assembly qualities" (page 16). But such kmer analysis can be biased by the quality and coverage of sequencing reads. While such analyses are a legitimate tool for heterozygosity measurements, this argument (the bias of genome quality) is not convincing and the authors should describe the potential limits of using kmer spectra analyses.

      4) The authors state that heterozygosity levels “should decay over time for most forms of meiotic asexuality". This is incorrect, as this is not expected with "central fusion" or with "central fusion automixis equivalent" where there is no cytokinesis at meiosis I.

      5) I do not fully agree with the authors’ statement that: "In spite of the prediction that the cellular mechanism of asexuality should affect heterozygosity, it appears to have no detectable effect on heterozygosity levels once we control for the effect of hybrid origins (Figure 2)." (page 17)

      The scaling in Figure 2 emphasizes high values, while low values are not clearly separated. By zooming in on the smaller heterozygosity values, we may observe a bigger difference between the "asexuality mechanisms". I do not see how the asexuality mechanism was controlled for, and if you look closely at within-group heterozygosity, variability is sometimes high.

      It is expected that a hybrid origin leads to higher heterozygosity levels, but saying that the asexuality mechanism is not important is surprising: in Figure 2 the orange (central fusion) is always higher than the yellow (gamete duplication). Also, the variability found within rotifers could be an argument against a strong importance of asexuality origin on heterozygosity levels: the four bdelloid species likely share the same origin, but their allelic heterozygosity levels appear to range from almost 0 to almost 6% (Fig 2 and 3; however, the heterozygosity data on Rotaria should be confirmed, see below).

      The authors’ main idea (i.e. asexuality origin is key) seems mostly true when using homoeolog heterozygosity and/or composite heterozygosity which is not what most readers will usually think as "heterozygosity". This should be made clear by the authors mostly because this kind of heterozygosity does not necessarily undergo the same mechanism as the one described in Box 2 for allelic heterozygosity. If homoeolog heterozygosity is sometimes not distinguishable from allelic heterozygosity, then it would be nice to have another box showing the mechanisms and evolution pattern for such cases (like a true tetraploid, in which all copies exist).

      The heterozygosity between homoeologs is always high in this study while it appears low between alleles, but since the heterozygosity between homeologs can only be measured when there is a hybrid origin, the only heterozygosity that can be compared between ALL the asexual groups is the one between alleles.

      In both the results and the conclusion, the authors should not overinterpret the results on heterozygosity. The variation in allelic heterozygosity could be small (although not in all asexuals studied) partly because of the age of the asexual lineages. This is not mentioned in the Results/Discussion section.

      6) Regarding the section on Heterozygosity structure in polyploids.

      There is inconsistency in many of the numbers. For example, A. vaga heterozygosity is estimated at 1.42% in Figure 1, but then appears to be around 2% in Figure 2, and then becomes 2.4% on page 20. It is unclear whether this is an error or the result of different methods.

      It is also unclear how homologs were distinguished from homeologs. How are 21 bp k-mers considered homologous? In the Methods section, the authors describe extracting unique k-mer pairs differing by one SNP; does this mean that no more than one SNP was allowed to define heterozygous homologous regions? Would homologues (and certainly homoeologs) differing by more than 5% then not be retrieved by this method? If so, it is not surprising that A. vaga is classified as a diploid.
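As a back-of-the-envelope check of how strongly a one-SNP criterion filters divergent copies, the Python sketch below computes, for two copies diverged at a per-site rate d, the probability that a k-mer window contains zero, exactly one, or two or more differences. It assumes differences are independent and uniformly spread, which real genomes violate, and it ignores the additional uniqueness filtering of k-mer pairs; the function name is hypothetical.

```python
from math import comb


def kmer_mismatch_probs(k: int, d: float) -> tuple[float, float, float]:
    """Return P(0), P(exactly 1), P(>= 2) differences in a k-mer window.

    Assumes a binomial model: each of the k sites differs between the two
    copies independently with probability d (a simplifying assumption).
    Only windows with exactly one difference can form a one-SNP k-mer pair.
    """
    if k <= 0 or not 0.0 <= d <= 1.0:
        raise ValueError("k must be positive and d must lie in [0, 1]")
    p0 = (1 - d) ** k
    p1 = comb(k, 1) * d * (1 - d) ** (k - 1)
    return p0, p1, 1 - p0 - p1
```

For k = 21, d = 0.05 gives roughly 0.34, 0.38 and 0.28, while d = 0.10 gives roughly 0.11, 0.26 and 0.64: under this model, the share of windows carrying two or more differences, and therefore unusable as one-SNP pairs, grows quickly with divergence.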

      The result for A. ricciae is surprising and I am still not convinced by the octoploid hypothesis. In Fig S2, there is a first peak at 71x coverage that could still be mostly contaminants. It would be helpful to check the GC distribution of k-mers in the first haploid peak of A. ricciae to check whether there are contaminants. The karyotype of 12 chromosomes indeed does not fit the octoploid hypothesis. I am also surprised by the 5.5% divergence calculated for A. ricciae; this value should be checked after eliminating potential contaminants (if any). In general, these kinds of ambiguities will not be resolved without long-read sequencing technology to improve the genome assemblies of asexual lineages.

      7) Regarding the section on palindromes and gene conversion.

      The authors screened all the published genomes for palindromes, including small blocks, to provide a more robust and unbiased view. However, the result will only be unbiased and robust if all the genomes compared were assembled from comparable sequencing data (quality, coverage) with the same assembly program. While palindromes appear not to play a major role in the genome evolution of parthenogenetic animals, since only a few palindromes were detected among all lineages, mitotic (and meiotic) gene conversion is likely to take place in parthenogens and should indeed be studied among all the clades.

      8) Regarding the section on transposable elements.

      The authors are aware that the approach used may underestimate the TEs present in low copy numbers, therefore the comparison might underestimate the TE numbers in certain asexual groups.

      9) Regarding the section on horizontal gene transfer.

      For the HGTc analysis, annotated genes were compared to the UniRef90 database to identify non-metazoan genes and HGT candidates were confirmed if they were on a scaffold containing at least one gene of metazoan origin. While this method is indeed interesting, it is also biased by the annotation quality and the length of the scaffolds which vary strongly between studies.

      10) Regarding the use of GenomeScope2.0.

      When homologues are very divergent (as observed in bdelloid rotifers) GenomeScope probably considers these distinct haplotypes as errors, making it difficult to model the haploid genome size and giving a high peak of errors in the GenomeScope profile. Moreover, due to the very divergent copies in A. vaga, GenomeScope indeed provides a diploid genome (instead of tetraploid).

      For A. vaga, the heterozygosity estimated by GenomeScope2.0 on our new sequencing dataset is 2% (as shown in this paper). This percentage corresponds to the heterozygosity between k-mers but does not provide any information on the heterogeneity of heterozygosity along the genome. A limitation of GenomeScope2.0 (which the authors should mention here) is that it assumes that the entire genome follows the same theoretical k-mer distribution.

    3. Reviewer #1

      This paper addresses the very interesting topic of genome evolution in asexual animals. While the topic and questions are of interest, and I applaud the general goal of a large-scale comparative approach to the questions, there are limitations in the data analyzed. Most importantly, as the authors raise numerous times in the paper, questions about genome evolution following transitions to asexuality inherently require lineage-specific controls, i.e. paired sexual species to compare with the asexual lineages. Yet such data are currently lacking for most of the taxa examined, leaving a major gap in the ability to draw important conclusions here. I also do not think the main positive results, such as the role of hybridization and ploidy on the retention and amount of heterozygosity, are novel or surprising.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to Version 2 of the preprint: https://www.biorxiv.org/content/10.1101/497495v2

      Summary

      This paper addresses the question of whether there are distinct genomic features in animals that reproduce asexually. The authors examine a range of features in the genomes of 26 species representing 18 independent evolutionary origins of asexuality. The reviewers were unanimous that this is an interesting question, and find that exploring it in a broad evolutionary context is the right approach. However, they raised questions about biases in specific analyses that complicated their interpretation, and the extent to which the central claims can be supported without comparison to closely related sexual species.

    1. Reviewer #2:

      General assessment:

      The paper studies how facial expressions of proposers in a repeated ultimatum game affect decisions by responders. The paper makes three main contributions. First, responders' decisions are affected by the facial expressions of proposers. Second, the paper statistically compares the fit of several decision functions (utility functions). In the preferred model, the degree of inequity aversion of the responder depends on the facial expression of the proposer. Third, facial expressions of proposers correlate with pupil dilation of responders. The second contribution is the main contribution of the paper, as the first point has been shown before in many different economic games. I think that the second point, the modeling exercise, is interesting, but should be improved. Moreover, I think the experimental design has some important issues, which seem hard to address without collecting new data.

      Substantive concerns:

      1) One of the main selling points of the paper is that it studies iterative/repeated games instead of one-shot interactions. The authors seem to ignore (rule out) repeated game strategies however. This is understandable, given that analyzing the repeated game (with signaling) is complex, and beyond the point of the paper. More importantly, the statistical analysis ignores the dynamic nature of the game. From what I understand, in the analysis all data are pooled, both across participants and trials. Given this, I think the authors overinterpret the model, as the interpretation in the text is often dynamic (for example, on page 10, lines 254-255, but also in several other instances), whereas the statistical analysis is not.

      2) Given that facial expressions affect decision-making, it is no surprise that including facial expressions in the decision values improves the fit. The most interesting part (to me) of the modeling exercise is to determine how facial expressions are best incorporated in the model. The authors organized a kind of 'horse race' between several models to address this. But why select these models? The choice seems ad-hoc and could be better motivated. For example, the best performing model treats positive and negative deviations from neutral faces in the same way, whereas the emotion recognition task and the pupil dilation analysis suggest that participants treat positive and negative emotions differently. An arguably simpler model would be one where more positive emotions lead to a higher weight on the other's payoffs. In sum, it would be good to better motivate which models are included (or not), and perhaps include several other competing models.
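To illustrate the kind of simpler alternative meant here, the Python sketch below implements a hypothetical responder utility U = own payoff + w(e) * other's payoff, where the weight w(e) = w0 + w1 * e rises linearly with the proposer's facial valence e, combined with a logistic acceptance rule. Everything in it (the linear weight, the valence scale in [-1, 1], the parameter names, and the assumption that rejection yields utility 0) is an illustrative assumption, not the model used in the manuscript.

```python
import math


def emotion_weighted_utility(offer: float, total: float, valence: float,
                             w0: float, w1: float) -> float:
    """Hypothetical utility of accepting: own payoff plus a valence-dependent
    weight on the proposer's payoff.

    offer: amount offered to the responder; total: size of the pie;
    valence: proposer's facial valence, assumed scaled to [-1, 1].
    With w1 > 0, more positive faces increase the weight on the other's payoff;
    a negative weight acts like aversion to the proposer's advantage.
    """
    other = total - offer
    weight = w0 + w1 * valence
    return offer + weight * other


def p_accept(utility: float, temperature: float) -> float:
    """Logistic (softmax) choice between accepting and rejecting.

    Rejection is assumed to leave both players with 0, so U_reject = 0.
    Written in a numerically stable form to avoid overflow in exp().
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = utility / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Fitting w0, w1 and the temperature by maximum likelihood would let such a model enter the same comparison as the models already considered.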

      3) Another interesting feature of the modeling exercise is that it can help to quantify the relative importance of facial expressions. The best performing model predicts 86% of the decisions correctly. To judge whether this is a lot or a little, it would be good to report the accuracy of competing models (e.g. self-interest or 'standard' inequity aversion without facial expressions). It would also be helpful to report the log-likelihood and BIC for each model. Reporting all this (for all models) would help to understand the added value of facial expressions.
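The fit statistics requested here can all be derived from each model's per-trial predicted acceptance probabilities. The Python sketch below computes the Bernoulli log-likelihood, BIC = k ln(n) - 2 ln(L), and classification accuracy at a 0.5 threshold; the function name, the threshold and the input format are assumptions for illustration. Like the analysis in the manuscript, it pools all trials.

```python
import math
from collections.abc import Sequence


def choice_fit(p_accept: Sequence[float], accepted: Sequence[bool],
               n_params: int) -> dict[str, float]:
    """Summarise how well a choice model fits binary accept/reject data.

    p_accept: model-predicted probability of accepting on each trial
    accepted: observed decision on each trial (True = accepted)
    n_params: number of free parameters fitted for this model
    """
    if not p_accept or len(p_accept) != len(accepted):
        raise ValueError("need one prediction per observed decision")
    eps = 1e-12  # guard against log(0) for extreme predictions
    log_lik = sum(
        math.log(max(p, eps)) if a else math.log(max(1 - p, eps))
        for p, a in zip(p_accept, accepted)
    )
    n = len(accepted)
    bic = n_params * math.log(n) - 2 * log_lik  # lower BIC is better
    hits = sum((p >= 0.5) == a for p, a in zip(p_accept, accepted))
    return {"log_lik": log_lik, "bic": bic, "accuracy": hits / n}
```

Running this for every candidate model, including a self-interest baseline and inequity aversion without facial expressions, would put the 86% accuracy figure in context.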

      4) In the experiment, participants are given explicit instructions on how to make decisions (page 23, lines 644-654). I think this is a poor design choice if you study how people make decisions.

      5) The sample size is rather small (n=44). Moreover, almost half (21 out of 44) of the participants were told they were playing against a computerized strategy, although the authors note that this did not affect decisions. I do not understand why it was not possible to match people with a confederate (page 22). Given that the study uses deception, it seems easy enough to always tell people that they are playing with a real person, but perhaps I am missing something. Additionally, it is unclear what 'playing against a computerized strategy' means here. Are participants told that their decisions affect someone else's earnings? This seems crucial for social preferences to have a bite.

      6) In the experiment, the proposers' expressions and offers are a function of the history of the game (responders do not know this). This makes it hard to identify if responders really respond to the expressions on the pictures, or if they respond to other factors in the history of the game, such as previous earnings or previous offers. For example, Figure 4 shows that responders' decisions are affected by the offer in the preceding trial (n-1). However, as the offer in trial (n) is a function of the offer in trial (n-1), this could simply pick up the effect of the current offer (n).

    2. Reviewer #1:

      The authors use an iterative ultimatum game to show that the proposer's facial expression, as well as the offer amount, influence human choice behavior. In particular, it is suggested that a proposer's facial responses to a participant's decisions specifically modulate the negative influence of perceived inequality on decision values. The combination of a game theoretic behavioral choice paradigm with computational cognitive modeling and a physiological arousal measure is appealing. I do, however, have some major concerns with novelty and interpretability, listed below in order of importance.

      1) It is not particularly surprising that participants are more willing to accept an advantageous inequality if the proposer signals, with a smile, that it pleases them (or, conversely, less willing to accept if the proposer signals discontent), particularly in light of previous work having already shown that both advantageous and disadvantageous inequalities are more frequently accepted if the proposer is smiling than if the proposer looks angry (e.g., Mussel et al., 2013). The addition of pupillary data could have added a fundamentally different dimension to such findings; however, since pupil size could not be significantly related directly to model-based decision values (please make this null effect more salient to the reader, unless I have misunderstood it), the choice data and physiological measure seem disconnected, which weakens the impact of each.

      2) The authors argue that the ecological validity of previous work assessing the influence of facial expressions on UG decisions (e.g., Mussel et al., 2013) was limited by the use of non-contingent affective stimuli in independent, one-shot, games. It could be argued, however, that the response-contingent affective and monetary feedback used in the current study threatens construct validity, by conflating game theoretic strategizing with basic reward learning. This is particularly problematic since the computational models lack a representation of learning, or any incorporation of feedback over trials, in spite of such information being shown to profoundly influence acceptance decisions in model-free analyses. Given the overall emphasis on changes in participants' behavior across trials, it is important to formally characterize those learning curves, using reinforcement learning or some other relevant computational framework.

      3) It appears that a parabolic modulation was considered for the inequality term, but not for the self-reward term. Given the dramatic improvement in model-fits across exponential and parabolic modulations of the inequality term, it would be interesting to see the performance of a model that includes parabolic modulation of both self-reward and inequality.

      4) Given the apparent difference in affective modulation of advantageous vs. disadvantageous inequality, the exclusive focus on advantageous inequality in the discussion of model-based analyses makes it difficult to map modeling results to potential underlying psychological constructs (also, it is unclear how results from separately modeled advantageous and disadvantageous inequalities were integrated during model selection).

      5) Another difficulty with data interpretation is the absence of a comparison across different total amounts (e.g., 200 out of 1000 vs. 200 out of 300). It seems to me that the constant total (of 1000) may have unduly focused participants on the inequality rather than on self-reward.

      6) "This indicated that participants' affective biases were more prominent for negative emotions, causing them to under-estimate the severity of negative affective displays". It is unclear from the methods whether asymmetries in the rated valence of facial expressions reflect a bias on the part of participants, or a limit on the confederates' abilities to simulate a range of negative expressions.

      7) "After excluding six extreme outliers [...]" Please account for the methods and effects of outlier exclusions.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      There was consensus among the reviewers that this paper addresses an interesting and important question of how social, affective and economic variables are formally integrated in strategic decision-making. However, the absence of a model-based account of how repeated game strategies and learning processes were shaped by the transition probabilities was a major concern, as was the lack of coherence between decision-making and pupillary effects.

    1. Reviewer #3:

      The manuscript reexamines AMPK deficiency in the T cell compartment using mixed bone marrow chimeras to show that T cell expansion (and effector function) both in vitro and in vivo is compromised by AMPK deficiency, that this occurs even though the deficiency has no effect on early events during TCR signalling, and that ROS scavenging ameliorates these defects to some extent. While the data are interesting, they remain incremental at this point, since a role for AMPK in the functioning of the T cell lineage has been shown previously (including by the authors), as the authors cite. The potentially novel, nuanced observations the authors report in the present manuscript are not yet accompanied by novel mechanistic insights.

      The competitive bone marrow chimeras show the relative reduction of the AMPK-deficient genotype in the effector-memory T cell compartment, as would be predicted by previous literature. The more robust lack of AMPK-deficient T cells in the CXCR3-expressing subset and in gut lymphoid tissues is interesting, but no further mechanistic insights are offered into how AMPK specifically affects commitment to and/or survival in this compartment.

      Similarly, the authors show that, interestingly, AMPK-deficient T cells show much poorer homeostatic proliferation in a number of models of such proliferation. The authors connect this deficit to increased mitochondrial turnover and to the generation of ROS in the absence of AMPK. Once again, these are potentially interesting data. However, the causal connection claimed between the mitochondrial phenotype and the homeostatic proliferation defect is not well supported by the data, which consist only of a partial pharmacological rescue by a ROS scavenger in vitro. Further, there are no data offering any explanation for this apparent distinction between initial cognate activation-induced proliferation and homeostatic proliferation.

      Therefore, while this is a sound incremental manuscript of utility to the field, it does not as yet provide sufficient breadth of interest for a cross-disciplinary readership.

    2. Reviewer #2:

      The manuscript by Anouk Lepez and colleagues examines the importance of AMPK in long-term T cell fitness and proliferation and concludes that although AMPK is dispensable for early TCR signaling and short-term proliferation, it is required for sustained long-term T cell proliferation and effector/memory T cell survival. The authors demonstrate that AMPK aggravates the severity of graft vs host disease and, mechanistically, propose that AMPK enhances the mitochondrial membrane potential of T cells to limit ROS production and associated toxicity. As the authors acknowledge, previous work on AMPK has shown that its absence does not affect T cell proliferation; however, earlier work has also established that absence of AMPK affects GVHD (Beezhold, K et al, Blood (2016) 128 (22): 806) and that AMPK maintains homeostasis through regulation of mitochondrial ROS (Rabinovitch et al, Cell Rep 2017 Oct 3;21(1):1-9). The current work does not add any additional mechanistic insight into the already known functions of AMPK. The authors, however, have an interesting finding in the reduced population of the gut lamina propria and intra-epithelial compartment, but did not examine the outcomes of such defects.

      Major Concerns:

      1) AMPK was previously found to be dispensable for the generation of effector T cells (cited papers 15,16). Please expand on the reasons for the differing results of this paper. Similarly, in vivo experiments have found AMPK-/- T cells to be largely immunocompetent (cited paper 17). The authors' focus seems to be on homeostatic expansion, but it is not clear what the importance of the requirement of AMPK for homeostatic proliferation is. Additionally, if the lamina propria and IEL compartments are the most affected when AMPK is absent in T cells, what is the outcome for gut immunity? The authors do not examine this.

      2) Much of the data presented in many of the figures is derived data presented as proportions or ratios of AMPK-KO to WT T cells.

      3) The GVHD data presented in figure 3 makes the point that absence of AMPK reduces the severity of GVHD. Is this due to defective cytokine production/defective division/defective survival of transferred cells? Moreover these findings were already published in Blood in 2016.

      4) The in vitro data do not substantially add to the authors' point that homeostatic proliferation is defective in the absence of AMPK.

      5) With regards to mitochondrial fitness, this was demonstrated in fibroblasts in the paper published in Cell Reports in 2017. Although it is interesting that AMPK has conserved properties in fibroblasts and T cells, this is not a conceptual leap.

      6) The final figure in the paper has major caveats (Figure 7H,I: rescue of T cell proliferation in the presence of a ROS scavenger). This experiment should be extended to show whether the ROS scavenger rescues other defects, such as priming in the IL7+DC condition, IFNg production, Cxcr3 expression and GVHD pathogenicity.

    3. Reviewer #1:

      The study by Lepez et al, investigates the requirement for the metabolic sensor AMPK in the T-cell lineage. The analysis builds on genetic ablation that results in functional deficit of AMPK in the lineage to assess cellular response in homeostatic conditions, in response to antigen and in an in vitro cell culture system. The experiments are well executed and generally carefully controlled. The cell culture system allows the interrogation of mechanistic underpinnings at the cellular level in vitro and can be coupled with the validation of predictions in vivo.

      AMPK regulation of cellular ROS homeostasis is one of the main outcomes reported in this work. However, the data supporting the latter are somewhat preliminary. Overall, in my view this work offers some advance on current knowledge but sufficient mechanistic insight is lacking at this juncture.

      Concerns:

      The experiments connecting AMPK signaling and ROS homeostasis are interesting but the evidence that ROS toxicity is inhibited by AMPK is largely correlative.

      Nutrient sensing modalities are undoubtedly affected in AMPK deficient cells and the implications of these for ROS homeostasis are not evident in the analysis or discussion. For instance, AMPK control of redox regulation by the maintenance of cellular NADPH (Chandel's group) has been described and is a potential target that could be assessed in T-cells.

      In Figure 7D, the WT cells show 70% mortality and the KO ~90% with differences maintained in the dose response analysis (S11). An important control would be the demonstration that (WT) cells are protected following treatment with an anti-oxidant/ scavenger. Further, does modulation of AMPK in WT cells - activation or inhibition - replicate the results seen with WT and KO cells?

      The inclusion of another ROS perturbation, such as mitochondria-targeted MitoParaquat, would strengthen the assessment of differential susceptibilities in the survival/ROS toxicity assays.

      Given the rich literature on ROS regulation of T-cell function, identifying and characterising the ROS component(s) regulated by AMPK is necessary. This is relevant because there are several sources of cellular ROS, and their requirements are also thought to be distinct in T-cell subsets.

      Finally, the data presented do not account for the differential requirement of AMPK in T-cell subsets, which appears to be a major objective of the study. The conclusions of the study would be strengthened by an effort to establish the identity of the ROS component and how it interacts with, or is regulated by, AMPK.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      The manuscript examines the importance of AMPK in long-term T cell fitness and proliferation. It concludes that, although AMPK is dispensable for early TCR signaling and short-term proliferation, it is required for sustained long-term T cell proliferation and effector/memory T cell survival. The authors demonstrate that AMPK aggravates the severity of graft-versus-host disease (GVHD). They also propose that AMPK enhances mitochondrial membrane potential in T cells to limit ROS production and associated toxicity, and that ROS scavenging ameliorates these defects to an extent. However, the claimed causal link between the mitochondrial phenotype and the homeostatic proliferation defect is not established. The competitive bone marrow chimeras show the relative reduction of the AMPK-deficient genotype in the effector-memory T cell compartment, as predicted by previous literature. The more robust lack of AMPK-deficient T cells in the CXCR3-expressing subset and in gut lymphoid tissues is interesting. However, no further mechanistic insight is offered into how AMPK specifically affects commitment to and/or survival in this compartment. Previous work has shown that the absence of AMPK does not affect T cell proliferation and that it does affect GVHD (Beezhold K et al., Blood 2016;128(22):806). Previous work has also shown that AMPK maintains homeostasis through regulation of mitochondrial ROS (Rabinovitch et al., Cell Rep 2017 Oct 3;21(1):1-9). AMPK regulation of mitochondrial fitness has previously been demonstrated in fibroblasts (Cell Reports, 2017), and the manuscript does not provide sufficient insight into how this constrains T-cell function. The experiments are well executed and carefully controlled, with several potentially interesting new observations. Nevertheless, the study does not provide a sufficient advance over current knowledge or offer novel mechanistic insights into AMPK signalling in the mature T-cell compartment.

    1. Reviewer #3:

      General Assessment:

      The paper investigates the recovery of neurocognitive function after general anaesthesia, a topic of clinical and scientific interest that has not been well investigated to date. It is concisely written, and its conceptual structure is easy to follow.

      The study is well controlled, and uses a wide range of neurocognitive tests to assess different aspects of cognition.

      The main findings suggest neurocognitive resilience to general anaesthesia with isoflurane in healthy individuals. Specifically, executive function recovers before other, potentially more basic, aspects of cognition. This is supported by a similarly early return of frontal cortical dynamics and by essentially unperturbed sleep-wake cycles.

      These findings are novel. Although they cannot be generalised beyond the anaesthetic agent isoflurane, they will be of interest to clinical anaesthesiologists, to healthy individuals undergoing isoflurane-based general anaesthesia, and to researchers investigating the relationship between consciousness and cognition.

      Major Comments:

      A more in-depth and critical description of the cognitive functions investigated, and of the motivation for the hypothesis, is needed in the introduction.

      -First, the rationale for the hypothesised recovery sequence of cognitive functions is unclear. E.g., it is unclear why attention and executive functions should sit at opposite ends of this proposed hierarchy, or, if so, what type/aspect of 'attention' is investigated. Similarly, 'scanning and tracking' does not refer to a distinct, cognitively or psychologically motivated function (top of page 5).

      -The link between the cognitive functions and the tests used to assess them is unclear in this section; e.g., 5 functions are linked to 6 behavioural tests.

      -The descriptions in the methods section do not help to clarify these relationships. For example, the Motor Praxis (MP) task is linked to complex scanning and visual tracking in the introduction, but the methods describe it as measuring sensorimotor speed. Similar concerns apply to the other tasks.

      Details of the analyses and results are hard to follow and need to be made more transparent and comprehensive.

      -The results described in the two paragraphs of page 9 do not match those summarised in Table 1, as the text suggests. Is the wrong table being referenced, or does this table capture other results? If the latter, the results on page 9 need to be summarised in table form.

      -The results in the 2nd paragraph of page 7 are described only scantily, and a summary table with full disclosure of test statistic values is needed.

      -Figures 2 and 3 lack any indication of statistical significance, a missed opportunity given the rich information provided. E.g., it is impossible to see when performance in each task is reconstituted, or when it matches control level.

      -While AM is showcased, it would be useful to learn the relative timing of recovery to baseline of the other tasks (and related cognitive functions) with respect to one another, so that the reconstitution of the proposed cognitive hierarchy can be fully evaluated.

      -Similarly, more transparent reporting of the Bayesian analysis results would be helpful. As it stands, the figures neither convey the type of analyses performed well nor give sufficient statistical detail.

      -The lack of these details makes it hard for another team to attempt to replicate these tests and results as they are described in the paper.

      -Additional info can be placed in SI.

      More context and critical analysis are needed in interpreting the main finding: the reconstitution of executive function after loss of consciousness, as measured by performance on the abstract matching (AM) task.

      -On page 5, the authors state that isoflurane is used because of its slower offset relative to other anaesthetics, which would allow differential recovery of function to be observed. This implies slower recovery than with other agents more commonly used in anaesthesia studies, e.g. propofol. However, on page 13 the authors make a different argument. They suggest that residual isoflurane levels are predicted to be 1-4 times lower than those of hypnotic agents such as propofol, which was used in other studies where early recovery of executive function was not observed. They then use this to account for the robust return of cortical dynamics in the current study. These statements appear to contradict each other.

      -It is worth considering whether task differences (e.g., stronger salience, or being more engaging) act as confounds that drive the early recovery of performance in the AM task.

    2. Reviewer #2:

      In this work, cognitive assessment after isoflurane anaesthesia shows that several cognitive domains are impaired in response speed and accuracy, but the dynamics of recovery are not the same for all domains. Specifically, tests related to executive functions recovered faster than the others, contrary to the authors' expectations.

      These results are important because they help us understand the dynamics of recovery of the cognitive systems after a pharmacological challenge. The dynamics of a complex system (the brain) returning to full function are assessed both cognitively and neurally.

      I think this paper requires some clarifications, some further analyses and further discussion. One important result is the assessment of the dynamics of cognitive recovery after unconsciousness and its parallels with local and global complexity measures. As I was reading the paper, I expected a combined analysis relating the behavioural outcomes to complexity measured before anaesthesia, during unconsciousness and at ROC. How does the level of complexity before sedation, or the complexity reached during unconsciousness, influence the degree or speed of recovery? Please let me know if this sounds too post hoc; to me it feels like an important and meaningful question to pose to the data.

      Am I correct in interpreting that you have calculated LZC over the global topography? It would be important to clarify this point and to differentiate it from the other variant. This should also be reflected in the theoretical interpretation, to avoid misunderstanding and unnecessary criticism. Two different variants of LZ complexity have been described. One quantifies local, channel-wise complexity (LZS/LZSUM), and the other quantifies the complexity of the global topography of the scalp over time (LZC). These two variants appear to occasionally track different aspects of consciousness (Comsa 2018, thesis; Schartner et al., 2017). Specifically, from Comsa's thesis: "To compute the Lempel-Ziv complexity of EEG data, the concatenation of a signal consisting of channel values over time can be performed either channel-by-channel or observation-by-observation, where an observation consists of the values of all channels at a single point in time. The interpretation of the two complexity flavours is slightly different: the former case reflects the local, temporal signal diversity in individual channel values over time, whereas the latter captures the spatial diversity of the global landscape of neural activity. In some of the above studies, a different flavour appears to have worked best in different contexts: for example, the spatial variant in anaesthesia (Schartner et al., 2015), and the temporal variant in psychedelic states (Schartner et al., 2017). These different interpretations have not been thoroughly explored so far and it is not clear which variant best fits with the original theoretical framework that indicates neural information diversity as a key element for the emergence of consciousness".
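      To make the two concatenation orders in the quoted passage concrete, here is a minimal sketch. It binarises a channels-by-samples array and counts Lempel-Ziv (1976) phrases with the Kaspar-Schuster algorithm for both orders. It is an illustration under stated assumptions, not the authors' pipeline. Each channel is thresholded at its own mean; published pipelines typically first take the signal envelope, e.g. via a Hilbert transform, and that step is omitted here. No normalisation is applied. All function names and the toy data are hypothetical.

```python
from statistics import mean


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n  # empty string -> 0 phrases, single symbol -> 1 phrase
    i, k, l = 0, 1, 1  # i: candidate copy start, k: match length, l: start of current phrase
    c, k_max = 1, 1    # c: phrase count (the first symbol is always a phrase)
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                 # current phrase can still be copied from earlier text
            if l + k > n:          # reached the end of the string inside a copy
                c += 1
                break
        else:
            k_max = max(k, k_max)  # longest match found for this phrase so far
            i += 1
            if i == l:             # no earlier start reproduces the phrase: close it
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def binarize(eeg):
    """Threshold each channel (a list of samples) at its own mean, giving a '1'/'0' string."""
    bits = []
    for channel in eeg:
        threshold = mean(channel)
        bits.append("".join("1" if v > threshold else "0" for v in channel))
    return bits


def lz_channel_by_channel(bits):
    """Temporal flavour: concatenate each channel's full time series in turn."""
    return lz76_complexity("".join(bits))


def lz_observation_by_observation(bits):
    """Spatial flavour: all channels at t=0, then all channels at t=1, and so on."""
    n_samples = len(bits[0])
    return lz76_complexity("".join(row[t] for t in range(n_samples) for row in bits))


if __name__ == "__main__":
    # Toy 3-channel "EEG" with made-up numbers, for illustration only.
    eeg = [
        [0.1, 0.9, 0.2, 0.8, 0.1, 0.9],
        [0.5, 0.4, 0.6, 0.3, 0.7, 0.2],
        [1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    ]
    bits = binarize(eeg)
    print("channel-by-channel:", lz_channel_by_channel(bits))
    print("observation-by-observation:", lz_observation_by_observation(bits))
```

      Both functions read the same binary matrix and differ only in the order in which it is flattened. That ordering is why the two measures can track different aspects of the signal.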

      It would be a good idea to test explicitly for the absence of differences between cognitive scores before isoflurane and after several hours (three hours?), and to compare with the control group in a statistically robust manner. If the aim is to claim a full return to normal, then a test that supports the absence of a difference would provide the answer. Please consider a statistical model that can appropriately test the "return to normal" of cognitive capacities. One option is a Bayesian framework like the NLMM used, but including some measure of confidence in the absence of differences. If the authors consider the CI values sufficient, could they express the results in terms of the strength of this evidence?

      I think rerunning the statistics to report the effect size, Bayes factor or any other parameter that conveys the strength of the effect would go a long way towards interpreting the results. Currently the text seems to rely on p values, which do not reflect the strength of an effect.

      Further to this, supplementary material showing single-subject dynamics of recovery would give a truer picture of the variance and variability of the results. In the last few years we have gained great insight into the differential impact of sedatives on transitions of consciousness. Here are a few examples:

      https://www.pnas.org/content/110/12/E1142 https://www.sciencedirect.com/science/article/pii/S1053811920301142#bib68 and even one of our own https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004669

      In particular, you might want to look at a recent reanalysis of our mild sedation data by Bola and collaborators (https://www.biorxiv.org/content/10.1101/444281v2.full). They analyse the EEG using measures of diversity and complexity that are particularly relevant to the interpretation of your results.

      The discussion needs a section providing the theoretical justification for the use of PE and LZC. It should address how these measures are better than, or complementary to, power, connectivity and other EEG measures used to study consciousness and sedation. Readers would then get a more contextualised picture of why these measures may yield better results, and why they may be better suited to interpreting the loss, maintenance and recovery of consciousness.

    3. Reviewer #1:

      This study examines the impact of general anaesthesia on cognitive function and, in parallel, on a set of EEG indices. In particular, the authors seek to establish the order in which different cognitive abilities are recovered as consciousness is restored. One group of volunteers was placed under general anaesthesia (isoflurane) for 3 hours, while a control group engaged in active walking during the corresponding period. Both groups then undertook cognitive testing at 30-minute intervals for 3 hours. Contrary to the authors' hypotheses, the results suggest that executive functions were the first to recover, and this was accompanied by a restoration of frontal EEG dynamics.

      Overall, I think this is a potentially valuable study of interest to the field. However, my current enthusiasm is dampened by a number of apparently major issues.

      First and foremost, the authors do not clearly define or operationalise the term 'recovery'. A Bayesian regression approach is described in the Methods section, but the information provided does not explain to me how recovery is defined or established. As the authors themselves note, the potential for practice effects to confound any recovery estimates is a critical concern here, and I remain to be convinced that it has been addressed.

      Relatedly, there is the concern that these cognitive tests may differ quite markedly in their difficulty for potentially trivial reasons. I do not see any analyses that would address the possibility that some tasks are simply more sensitive to cognitive perturbations than others, e.g. if performance is close to or at ceiling in the control group.

      The EEG analyses are potentially interesting too, but the authors do not provide any rationale for focusing on these particular metrics. In addition, the EEG trends are never linked to the cognitive ones, which limits the conclusions that can be drawn here.

      On the more minor side, the authors do not provide any rationale for their starting hypotheses. Their prediction that vigilance would be the first function to recover is not at all intuitive for me. Can the authors cite previous literature to back up this prediction?

      In addition, if the authors prefer to position the Results before the Methods, they should ensure that the Results contain enough detail for the reader to understand the experiment. For example, readers should not have to turn to the Methods to learn that there were two separate groups and that the control group exercised prior to cognitive testing.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Redmond G O'Connell (Trinity College Dublin) served as the Reviewing Editor.

      Summary:

      The three reviewers agreed that the paper reports important results, as they appear to offer novel insights into the dynamics of cognitive recovery following loss of consciousness, an area that has been relatively under-investigated to date. However, all three reviewers also highlighted some significant concerns regarding aspects of the study rationale and methodology.

    1. Reviewer #3:

      This is potentially interesting work that addresses a key question in the temporal cognition field: how perceived duration is represented in the human brain. I found the manuscript well written and the methodology sound. Analysis-wise, the authors make a big effort to model the fMRI data in several ways. They even use an artificial network model to show that accumulating salient events can mimic human duration perception.

      Despite this big effort, however, I found the results and a few aspects of the analysis not entirely convincing.

      Below I list my comments:

      1) The authors talk about salient events and their accumulation. But what are these events? Are they moving objects, or changes in edges or luminance? I feel that a better characterization of the visual properties of the stimuli is missing here. This information is also important for understanding the events underlying the BOLD change. According to the authors, perceived time is a function of the BOLD changes associated with these events, so it is crucial to state what these events actually are. Can eye movements be considered salient events?

      2) The authors record eye movements but, as far as I can tell from the manuscript, do not incorporate this information into any of the analyses. Do eye movements correlate with the predicted bias and/or with the human bias?

      I think the results would greatly benefit from a better specification of the type of events leading to brain changes and consequently to duration perception.

      3) I found it puzzling that BOLD changes in auditory and somatosensory cortices predict physical time. How is this possible? Is there a brain area where physical duration cannot be predicted?

      4) Somewhat disappointing is the lack of differences between visual layers in how well they predict perceived time. The result suggests that any accumulated change in visual cortex activity leads to perceptual bias. I think it is very unlikely that different parts of the visual stream contribute in the same way to duration perception.

      5) The model prediction works for the two algorithms used to quantify BOLD changes. If I understand correctly, we cannot tell whether it is a difference in change or it is the change itself that leads to duration bias. I found this aspect of the results also not very informative.

      6) In how many subjects was it possible to actually predict perceived duration from BOLD activity? A clearer picture on how the model works in individual subjects would be more convincing.

    2. Reviewer #2:

      Sherman et al seek to understand the basis of human time perception using a combination of psychophysics, computational modeling, and fMRI. This work builds on previously published work by the same group (Roseboom, Nature Communications 2019) showing that integrated changes in the state of (a) deep image classification network(s) during the presentation of movies predicted aspects of human timing reports. In that study, similar to what is shown in the current manuscript, timing biases were found in human behavior for different movie scene types, for example, city, natural scenes, or offices. Interestingly, similar biases were found in the timing estimates produced by their integrated deep network state change procedure. They interpret these findings as evidence that estimates of duration are derived from changes in the state of perceptual networks, in this case presumably those involved in visual perception. I find this previous work to be an important contribution toward understanding how the brain constructs information about a fundamental dimension of the environment for which there are no obvious sensors.

      In the current study, the authors repeat many of the steps contained in the previous publication, but in the context of humans estimating the duration of silent movies while positioned in an MRI scanner. They compute BOLD signals during movie viewing using a set of techniques I am not intimately familiar with because I do not use MR to assess brain activity in my own research, but which seem standard from what I can tell. They then treat the voxel by voxel BOLD measures similarly to the manner they did nodes in the deep network, and show that estimates derived from visual cortices may correlate with human biases and effects of scene type, but not those estimates derived from voxels in auditory or somatosensory cortices. While I have some technical questions, I find the work to be overall well reasoned and clearly presented. My major issue with the paper has to do with the fact that given their previous publication already showed that human behavior exhibits timing biases that correlate with the rate of change in visual scenes, and what we know about the localization of modality specific sensory function in cortex, it would be worrying if they could not derive time estimates from a measure of neural activity in visual cortex. It seems that the core hypothesis they are testing has to do with whether one can extract a measure of change in visual scenes from BOLD signals recorded in the visual cortex. Finding that one can indeed do so doesn't seem particularly surprising and thus represents a relatively incremental advance relative to what was known before. In terms of novelty, what we are left with then is the observation that the use of different metrics on BOLD changes per voxel to estimate elapsed time differ with respect to their ability to reproduce timing biases by scene type. However, clarification is needed regarding how they compute these metrics to fully assess the importance of these differences.

      The authors state that they compute the Euclidean distance between voxel activations from TR to TR. However, it looks as though they are computing the L1 norm of the differences, i.e. the Manhattan/city-block distance. Which is it?

      Why should the sum of signed differences provide a different result? Is it that in the distance measurement, noise is accumulated in the measure over voxels whereas in the signed difference this noise is canceled out by averaging? Some amount of intuition would be helpful.
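      To pin down the distinction raised in the two questions above, the candidate TR-to-TR quantities can be written out explicitly. The notation is an assumption for illustration, not the authors': v_t is the vector of activations of N voxels at TR t, and a trial's duration estimate accumulates one of these quantities over its TRs.

```latex
\begin{aligned}
d_2(t) &= \lVert \mathbf{v}_t - \mathbf{v}_{t-1} \rVert_2
        = \Big( \sum_{i=1}^{N} (v_{i,t} - v_{i,t-1})^2 \Big)^{1/2}
        && \text{(Euclidean)} \\
d_1(t) &= \lVert \mathbf{v}_t - \mathbf{v}_{t-1} \rVert_1
        = \sum_{i=1}^{N} \lvert v_{i,t} - v_{i,t-1} \rvert
        && \text{(L1 / city block)} \\
s(t)   &= \sum_{i=1}^{N} (v_{i,t} - v_{i,t-1})
        && \text{(signed sum)}
\end{aligned}
```

      One way to formalise the noise intuition is to write each difference as a true change delta plus zero-mean noise epsilon. The noise terms then have zero expected sum, so they tend to cancel in s(t). In d_1(t) and d_2(t), by contrast, every absolute or squared term is non-negative, and by Jensen's inequality E|delta + epsilon| >= |delta|. Noise therefore inflates these distances on average, and the inflation grows with the number of voxels.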

      Writing level comments:

      4) Regarding the framing and discussion of the experiments, I am not sure why the authors see their results as incompatible with, rather than complementary to, some of the existing proposals for time encoding in the brain. For example, the impact of sensory change on responses in perceptual networks might well influence the dynamics of downstream neural populations, potentially through neuromodulators, so I don't see the obvious incompatibility. This is not to say that the authors are not addressing an important problem, namely, why sensory change biases timing reports.

      For example, I think this statement is a bit inaccurate and unnecessary:

      "...This end-to-end account of time perception represents a significant advance over homuncular accounts that depend on "clocks" in the brain. "

      5) I wouldn't say their work represents an "end to end" account of time perception, and certainly not an end-to-end account of the behavior they are studying. What happens in more naturalistic situations, where people are moving and taking in other sensory modalities? How does this time perception information get transformed into the behavioral report of individuals, for example? The authors don't need to over-reach for the work to be interesting. The authors also seem to imply that the previously cited studies assume a specialized clock somewhere. In fact, Tsao et al. and Soares et al., at least, explicitly say the opposite. From my perspective, the field views the idea of explicit "clocks" as a bit antiquated. It holds instead that timing is an emergent property of the functions that neural circuits are optimized to perform, an idea that seems compatible with the authors' work.

    3. Reviewer #1:

      In this manuscript, Sherman and colleagues present videos of natural scenes and measure the fMRI responses of visual cortex. The addition of fMRI data aims to link both perceived duration and neural network activity differences to a common neural substrate, the sensory cortex. The authors propose that this therefore shows "the processes underlying subjective time have their neural substrates in perceptual and memory systems, not systems specialized for time itself". I generally appreciate the aim of providing an integrated account linking duration perception to specific neural substrates, and moving away from non-specific clock models. I also appreciate the pre-registration and open science principles throughout the manuscript. However, the fMRI results described here are unsurprising and can be seen as replicating other recent findings (outside the field of timing).

      Furthermore, the links between (previously described) deep network results and the fMRI results are unconvincing. Finally, a lot is made of the role of predictive coding, but no role is convincingly demonstrated as there is no attempt to distinguish this from differences in low-level features between stimuli.

      1) The hypothesis that office and city videos produce different response amplitudes in early visual cortex is consistent with the difference in their perceived duration, but these videos seem likely to differ in many low-level properties. Most obviously, they are likely to differ in temporal frequency and in the duration of the events they contain. The manuscript proposes that the difference in their responses reflects surprise or prediction error, but this proposal is not tested. Recent studies using entirely predictable stimuli that differ in event frequency and duration (Stigliani, Jeska, & Grill-Spector, 2017, PNAS) show that these low-level features strongly affect the response of early visual areas.

      2) Similarly, a difference between network states on consecutive frames also seems likely to reflect the frequency of changes, regardless of whether these are regular and predictable or irregular and unpredictable. Again, no effort is made to distinguish between event frequency and predictability.

      3) In the conclusion, the main conceptual contribution of the manuscript is described as follows: "we have taken a model-based approach to describe how sensory information arriving in primary sensory areas is transformed into subjective time." The abstract contains a similar statement: "providing a computational basis for an end-to-end account of time perception". I appreciate the attempt to introduce a quantitative model-based approach, but the proposed network model does not even attempt to be biologically plausible. As such, it cannot "describe how sensory information arriving in primary sensory areas is transformed into subjective time". Specifically, a Euclidean distance between states of a feedforward network that analyses each frame independently is clearly not biologically plausible, since neural systems don't make such calculations. Instead, it represents a mathematical abstraction of more complex recurrent processes that are not included in the model. As a result, this conclusion (and similar statements elsewhere) seems to overstate the conceptual advance. To me, the results instead confirm that subjective time, sensory cortex activity and deep network activity are all affected by sensory stimulus content.

      4) The framework linking the fMRI response of early visual cortex to neural network simulations rests primarily on a larger response of both to busy city scenes than to office scenes. In both data sets, this difference is unsurprising. It has been shown in previous comparisons of various quickly and slowly changing stimuli (for fMRI) and of these exact scene types (for neural networks). But because the fMRI response amplitude difference is based on a binary comparison, any number of explanations could be given for why the two responses change in the same direction. An unexpected, quantitative shared effect would convincingly link the two effects, but an expected, qualitative change in the same direction does not.

      5) The analysis that looks for correlated differences in fMRI responses and subjective duration perception within a scene type (from line 300) provides more convincing evidence that sensory cortex responses are linked to subjective duration. However, this analysis does not link fMRI responses and deep network responses. Moreover, changes in both fMRI responses and subjective duration are already known to reflect low-level features like visual motion and event frequency. So it is unclear whether differences in video properties (within the same class) underlie the correlated differences between fMRI responses and subjective duration, and whether the deep network models predict such effects.

      6) The word 'time' is used throughout the manuscript in a very general way. Time is a broad concept, with many different aspects and scales, from sub-second to circadian to seasonal. This study's scope does not include most of these aspects and scales, so the use of this general term 'time' overstates the broadness of the findings. Here it is used to mean 'duration in the tens of seconds'. Please specify more precisely what you mean.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      Summary:

      The reviewers appreciated the approach of your study, both in terms of the theoretical framework and in terms of the methodology. However, the reviewers were not convinced that the presented results reveal convincing evidence for neural substrates of perceived event duration. They noted that there are several alternative explanations for the effects observed, reflecting uncontrolled differences between events that are known to drive visual cortex activity (e.g., in low-level features, rate of change, or eye movements).

    1. Reviewer #3:

      This is the largest study of DNA methylation differences in the blood of controls and patients with psychosis, performed in a sample of 4,483 participants. As expected, the authors found significant differences in measures of blood cell proportions and smoking exposure between patients with psychosis and controls. They found similar differences between patients with schizophrenia treated with clozapine and other patients. They also detected differentially methylated positions (DMPs) in these comparisons. The authors have employed an appropriate methodology to search for schizophrenia- and psychosis-associated methylation changes, and the manuscript is interesting and well written. However, I think a more extensive analysis could increase our insight into DNA methylation differences in schizophrenia, and is therefore necessary.

      1) An important question is whether the methylation differences predate the disorder or are a consequence (an epiphenomenon) of it. The authors detect a higher number of DMPs when they exclude individuals with first-episode psychosis (FEP) from their analysis. This could suggest that the methylation differences are not present before the onset of the disorder. However, the authors have the resources and the ability to answer this question better. For example:

      1a) I think they should report in a separate section the results in the two samples of FEP individuals compared with age-matched controls. Can they identify any FEP-specific DMP?

      1b) Also, I think they could try to integrate their data with other blood methylation datasets, to see whether the DMPs associated with psychosis/schizophrenia have also been linked to environmental risk factors for schizophrenia. For example, the authors could check the overlap of the DMPs with blood methylation changes associated with gestational age (PMID: 32114984; this work contains references to other studies that may be useful too). Data on methylation and cannabis use or other environmental factors, if available, may also be useful.

      1c) The authors could also explore, in patients and controls, the relationship between age and methylation of the DMPs. An increase of the differences between patients and controls in older ages would suggest that the methylation differences are related to factors that are secondary to the disorders, while the presence of methylation differences at younger ages could suggest the opposite. Analyzing the interaction between methylation and age on case-control status could be an alternative way to answer this question.

      2) Sex is an important biological variable that the authors could analyze more extensively, considering that being male is a risk factor for schizophrenia and is associated with different epigenetic regulation. The authors already have the statistics to analyze whether the psychosis/schizophrenia-associated DMPs are also associated with sex. Moreover, they could analyze the interaction between methylation and sex on case-control status and/or perform analyses stratified by sex.

      3) The authors did not find an association of schizophrenia with age acceleration. However, a recent study performed a comprehensive analysis of 14 epigenetic clocks, categorized according to what they were trained to predict: chronological age, mortality, mitotic divisions, or telomere length. I think it is relevant that the authors try to validate and perhaps extend the findings of Higgins-Chen and colleagues ("Schizophrenia and Epigenetic Aging Biomarkers: Increased Mortality, Reduced Cancer Risk, and Unique Clozapine Effects", PMID: 32199607).

      4) Adjustment: I have not found any clear information about ethnicity/race. I assume the samples were composed mainly of white Caucasians. Did the authors perform any adjustment for ethnicity/race or population stratification? Also, were principal components of negative control probes included as covariates?

      5) Replication: was there any replication at the level of DMP in the data from Montano et al.? Also, if many DMPs are under genetic control, we should expect an overlap between DMPs in blood and brain of patients with schizophrenia. Have the authors analyzed such overlap?

      6) I think the authors should be more cautious in interpreting the clozapine data. They write: "Studies have also shown that higher neutrophil counts in schizophrenia patients correlate with a greater burden of positive symptoms (Núñez et al., 2019) suggesting that variations in the number of neutrophils is a potential marker of disease severity(Steiner et al., 2019). Our sub-analysis of treatment-resistant schizophrenia, which is associated with a higher number of positive symptoms (Bachmann et al., 2017), found that the increase in granulocytes was primary driven by those with the more severe phenotype, supporting this hypothesis." Actually, the fact that TRS cases are characterized by a significantly higher proportion of granulocytes could be related to a "recruitment bias": because clozapine administration is associated with a risk of agranulocytosis, clozapine is usually not prescribed to patients with a low number of granulocytes. I think this possibility needs to be mentioned, unless the authors can exclude it.

    2. Reviewer #2:

      This is an important piece of work conducted to the highest standards of methodological rigour. By drawing together most case-control DNAm studies of schizophrenia in a single meta-analysis, this work will provide the most up-to-date information for some time, and is likely to generate a lot of interest.

      I think there are no critical methodological problems with the manuscript. Points for consideration include:

      1) The abstract details the (unsurprising) smoking results but lacks other findings, such as the GO analysis and the localisation of findings to previously associated GWAS loci.

      2) The authors could consider providing a DNAm-based predictor of SCZ/SCZ-resistance based on their dataset, to be tested in a series of leave-one-out analyses. In my opinion, this would add interest to the results, provide evidence of replication that is somewhat lacking from the current version, and could be used by others to test for SCZ/TRS prediction in their cohorts or for the purpose of PheWAS.

      3) There are a large number of findings reported with only a p-value given, and no effect size. In many cases, I think there's no reason that additional info couldn't be added.

      4) It's not sufficiently clear in the text how the effects of SCZ were disambiguated from TRS - when the latter group is nested within the first.

      5) Whether DNAm is a cause or consequence of liability to SCZ could be further examined in the paper - and I'm not sure why the authors have stopped short of further MR-based tests of this question.

      6) The correction for smoking is somewhat heterogeneous across studies ('smoking status'). If participants were current non-smokers, had they stopped recently? Further examination of whether the reported findings attenuate after inclusion of AHRR CpGs would provide greater confidence that some are not due to residual confounding. Alcohol and BMI are also likely to give rise to similar issues.

    3. Reviewer #1:

      This is a large study of multiple cohorts of individuals with schizophrenia and controls and comparing DNA methylation in blood samples. The main findings are replications of smaller studies. The purported goal is identification of a biomarker but the impact of medication effects on blood cell composition cannot be ruled out and therefore confounds any conclusions about future utility. The confirmation of heavier smoking in individuals with schizophrenia also seems of limited use.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This is the largest study of DNA methylation differences in the blood of controls and patients with psychosis, performed in a sample of 4,483 participants. This is an important piece of work conducted to the highest standards of methodological rigour. By drawing together most case-control DNAm studies of schizophrenia in a single meta-analysis, this work will provide the most up-to-date information for some time, and is likely to generate a lot of interest.

      As might be predicted, the authors found significant differences in measures of blood cell proportions and smoking exposure in patients with psychosis compared with controls, and in patients with schizophrenia treated with clozapine compared with other patients. They also detected differentially methylated positions in these comparisons. The authors have employed an appropriate methodology to search for schizophrenia- and psychosis-associated methylation changes, and the manuscript is interesting and well-written.

    1. Reviewer #3:

      The author implemented a recurrent network with excitatory plasticity (from Clopath10) and inhibitory plasticity (from Vogels11) at all connections - both feedforward and recurrent. They showed that a model with inhibitory plasticity exhibits more diverse receptive fields (covering the different orientation preferences more uniformly) compared to a model without any inhibition but with plastic excitatory synapses. They showed that synaptic connectivity reflects tuning similarity. They then showed that inhibition helps decorrelation. In their model, inhibition sharpens tuning curves and helps to exhibit contrast invariance as well as promotes sparseness. Finally, they showed that their plastic model has a lower reconstruction error compared to a model without inhibition at all but similar to a model where inhibition is blocked after learning.

      Below is a list of questions/comments:

      1) The finding regarding receptive field diversity is probably the most novel part of the paper. It would be nice to dig into it a bit more. Is it inhibitory plasticity or inhibition per se that promotes receptive field diversity? What is the intuition behind this?

      2) It would be good to discuss the various histograms of orientation preference reported in different experimental data and compare that to the model.

      3) The introductory paragraph of the results section does not contain enough information to understand the results. Without reading the Methods first, it is very confusing. In particular:

      -The 2:1 and 3:1 model variants are poorly explained. They arise from different values of \rho, but as written, they seem to arise from a difference in connectivity or in the ratio between the numbers of E and I cells.

      -noInh model: it should be noted that excitation is plastic.

      4) The authors report the correlation drop with and without inhibition (l120-130). Would it be possible to compare quantitatively to some experimental data where inhibition is blocked (e.g. optogenetically). And so, how much does this drop depend on the model parameters?

      5) Inhibitory plasticity reduces the reconstruction error. It would be nice to elaborate on this further. Surprisingly, in Fig 9a blockInh performs very well. Why? I am not sure the statements in the text (regarding the role of inhibitory plasticity in the reconstruction error and encoding quality) are supported by the simulation results.

      6) I encourage the author to be more precise in the text: what comes from inhibition, which effects can be obtained with fixed inhibition (tuned or broad), what comes from inhibitory plasticity, what has been shown before, etc. For example, I compiled a short list below that helped me put things together:

      -Fig 3. synaptic connectivity reflects tuning similarity - Shown in Clopath10

      -Fig 4: Inhibitory strength influences response decorrelation - Shown in Vogels11

      -Fig 5: Inhibition sharpens tuning curves - that's the classical iceberg effect. It works with fixed blanket of inhibition - e.g. Ben-Yishai 95.

      -Fig 6-7. Inhibition leads to contrast invariance. Same here: inhibition does not need to be plastic; it works with blanket inhibition - e.g. Ben-Yishai 95.

      -Fig 8. Inhibition increases sparseness - Vogels11: inhibitory plasticity leads to E/I balance with increased sparseness.

      7) The code should be made public.

    2. Reviewer #2:

      The authors introduce a computational model of the interplay between excitatory and inhibitory plasticity during development in V1. The analysis of the work is interesting; however, several assumptions have to be checked and a multitude of additional analyses is required to validate the conclusions.

      Major Comments:

      1) The model describes the dynamics during the development of V1. However, during development there are several phases, each having its specific properties and dynamics. For instance, van Versendaal and Levelt 2016 discuss that especially inhibition could have a critical and phase-specific role. Please discuss in more detail the relation of the model to the developmental periods or rather which period you model.

      2) In the model, the LGN has about twice the number of neurons compared to V1. However, experiments estimate that V1 has 40 times more neurons than LGN yielding a different type of projection. Please test the dynamics for a significantly larger V1. Furthermore, please test the dynamics resulting from a sparse connectivity between areas, as all-to-all connectivity is a very strong assumption.

      3) The authors neglect recurrent excitatory-excitatory connections. Please show at least the influence of non-adaptive recurrent excitatory connections on the results.

      4) In the model, the role of inhibition is mainly to constrain the neuronal activities, which can also be done by other homeostatic plasticity mechanisms. Would intrinsic plasticity also be sufficient? Also the role of homeostatic synaptic plasticity for V1 development has already been shown in other computational studies (e.g., Stevens et al., 2013; J. Neurosci.). Please discuss.

      5) In general, EI2/1 seems to be more efficient than EI3/1. What is the lower limit? Is an EI1/1 system even better? In addition, the reduction of redundancy could imply that the system becomes less robust against noise. Please test for different noise levels/sources and whether noise implies a lower bound.

      6) The authors discuss on Page 18 that the learning rates of the involved plasticity processes are important. However, they do not show any data. Overall, the parameter-dependency of the model remains unclear. Especially given that the parameters of inhibitory plasticity are not based on experimental data, these have to be investigated in more detail.

      7) The authors say that the receptive fields in the model are stable. Please show any data supporting this claim. Under which condition are the receptive fields stable?

      8) Does the model lead to any experimentally verifiable predictions?

    3. Reviewer #1:

      This manuscript details a modeling study used to understand how inhibitory plasticity shapes the emergence and structure of receptive fields in visual cortical networks. The work seems well-carried-out and the writing is clear.

      Major concerns:

      1) It needs to be made clearer in the manuscript how these results extend what has been shown previously on the emergence of V1-like RFs in cortical networks. The new insight here is not apparent in the framing of the introduction. A somewhat more detailed answer to the question "How surprised should one be by these results?", particularly about the emergent gain adaptation, would be useful.

      2) It would be very good to see more comparisons between fixed inhibition and inhibitory plasticity in this work, especially since this is advertised in the title and abstract as the main thrust of the work. In the current draft, this is addressed only in Figure 9 but should play a more major role throughout the draft, to strengthen and emphasize the novelty of the work.

      3) Some amount of theoretical work to complement the simulations would strengthen the paper greatly.

      4) Comparisons to other plasticity models, to show exactly what is necessary to replicate the effects here, seem very important but are under-explored.

      5) When speaking about metabolic costs of computation, it seems important to also discuss the size of the network and the maintenance of synapses, not just the average firing rate per cell. Some discussion of this should be included, or some of the claims in the intro/abstract should be softened.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This manuscript details a modeling study used to understand how inhibitory plasticity shapes the emergence and structure of receptive fields in visual cortical networks. The work seems well-carried-out and the writing is clear. The authors implemented a recurrent network with excitatory plasticity and inhibitory plasticity at all connections - both feedforward and recurrent. The results reveal that a model with inhibitory plasticity exhibits more diverse receptive fields (covering the different orientation preferences more uniformly) compared to a model without any inhibition but with plastic excitatory synapses. Synaptic connectivity reflects tuning similarity, and inhibition aids in decorrelation. In this model, inhibition sharpens tuning curves, helps to develop contrast invariance, and promotes sparseness. Finally, the manuscript shows that the plastic model has a lower reconstruction error compared to a model without inhibition at all.

      The reviewers found the results presented to be clear. The reviewers also thought that some new analyses should be done to shore up the results, and that writing revisions could be implemented to improve the flow of ideas for the reader.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer 1

      Review 1 Summary:

      In this manuscript, Borah et al. showed that Heh2, a component of the inner nuclear membrane (INM), can be co-purified with a specific subset of nucleoporins. They also found that disrupting interactions between Heh2 and NPCs causes NPC clustering. Lastly, they showed that the knockout of Nup133, which does not physically interact with Heh2, causes the dissociation of Heh2 from NPCs. These findings led the authors to propose that Heh2 acts as a sensor of NPC assembly state.

      Reviewer 1 major comment 1: The authors claimed that Heh2 acts as a sensor of NPC assembly state, as evidenced by their finding that Heh2 fails to bind NPCs in nup133Δ cells (Fig 2, Fig 5). However, there is a possibility that the association between Heh2 and NPCs is merely affected by the clustering of the NPCs (as the authors discussed) and is not related to the structural integrity of the NPC.

      Our Response: We agree that this is a possibility, however, we ask the reviewer to also consider that we artificially cluster NPCs using the anchor away system (Figure 3C) and this does not affect Heh2’s association with NPCs. Thus, clustering per se is insufficient to disrupt Heh2 binding to NPCs. We will also make changes in the text to make this point.

      Reviewer 1 major comment 2: In addition, their data showing that the Heh2-NPCs association is not easily disrupted by knocking out the individual components of the IRC (Fig. 5A and 5D), also disfavor the idea that Heh2 could sense NPC assembly state.

      Our Response: There are three considerations here. The first is that, as this is the first evidence of any kind of “NPC assembly state” sensor, it is difficult to make any assumptions as to what specifically such a sensor would be monitoring; e.g., perhaps sensing only the ORC is what is functionally important. Second, for obvious reasons, we only tested non-essential IRC nups, so by definition there is inherent functional redundancy that maintains NPC function, and thus there may be no need to “sense” anything in the absence of these IRC nups. Finally, the IRC is essential for NPC assembly. Thus, without an IRC there is no NPC assembly state to sense.

      Reviewer 1 major comment 3: Since some nup knockout strains other than nup133Δ are also known to show NPC clustering (e.g., nup159 (Gorsch JCB 1995) and nup120 (Aitchison JCB 1995; Heath JCB 1995)), it will be worth trying to monitor the localization of Heh2 and its interaction with nucleoporins (by Heh2-TAP) using these strains. While Nup159 is a member of the cytoplasmic complex, Nup120 is an ORC nucleoporin. Thus, biochemical and phenotypic analysis using these mutant cells will be useful to clarify whether the striking phenotypes the authors found are specific to the nup133 knockout strain (or ORC nup knockouts) or are commonly observed in strains that show NPC clustering. Another interesting point is that Nup159 shows strong interaction with Heh2, even in nup133Δ cells. As the authors mentioned, the Nup159-Heh2 interaction may not be sufficient for Heh2-NPC association, but it could be important for NPC clustering.

      Our Response: These are excellent points and we agree that there is a need to more thoroughly explore how NPC clustering driven by abrogating the function of other nups impacts Heh2’s association with NPCs. Thus, in a revised manuscript, we would examine Heh2’s association with NPCs in several additional genetic backgrounds where NPCs cluster.

      Reviewer 1 major comment 4: Figure 4C: Is it known that rapamycin treatment in this strain did not affect the protein levels of nucleoporins? Otherwise, the authors should confirm this by western blotting (at least some of them).

      Our Response: This is a good point and we will directly address this with Western blotting of some nups.

      Reviewer 1 major comment 5: Figure 5: The authors mentioned (line 256-257) that "in all cases the punctate, NPC-like distribution of Heh2-GFP was retained (Fig 5D)". However, nup107 KO strain seems to show more diminished punctate staining as compared with other strains. To clarify this, the authors should express mCherry tagged Nup as in Fig. 2 or Fig. 3.

      Our Response: Yes, we agree and in fact this observation is consistent with the fact that there is an ER-pool of Heh2 observed in this strain and we observe loss of nup interactions in the affinity purification. We will include a more thorough quantification of this in a revised manuscript and more directly address this in the text.

      Minor comments:

      Reviewer 1 minor comment 1: Figure 4A and 4B: The authors should show scatter plots as in Fig. 2 and Fig. 3.

      We will include this in a revised manuscript.

      Reviewer 1 minor comment 2: Figure 5C: An explanation of the arrowheads is missing from the figure legend.

      Thank you for pointing this out, it will be fixed in a revised manuscript.

      Reviewer 1 minor comment 3: Figure 6: Is there any information as to where Heh2 (316-663) is localized in the cell?

      As this truncation lacks INM targeting sequences, it is found throughout the cortical ER. The determinants of Heh2 targeting (including truncations) have been extensively evaluated in King et al. 2006, Meinema et al., 2011 and Rempel et al. 2020. We will make this clearer in the revised manuscript.

      Reviewer 1 minor comment 4: Figure 6B: Nucleoporins should be marked with color circles as in Fig. 1 and Fig. 5.

      This will be done.

      Reviewer 2

      Borah et al. present a biochemical and cell biological examination of the inner nuclear membrane (INM) protein Heh2 and its putative interactions with the nuclear pore complex (NPC). The potential conceptual advance of this study is that Heh2 interacts with the NPC, while mutations believed to trigger NPC mis-assembly are shown to abolish interaction with Heh2, leading to the hypothesis that Heh2 is a sensor for NPC assembly states within the INM. The conclusions would undoubtedly be of broad interest to the nucleocytoplasmic transport field, but the evidence provided thus far is insufficient to build confidence, and consequently this manuscript is premature for publication.

      Our Response: We thank the reviewer for recognizing the potential for a significant conceptual advance for the field but object to the notion that the work is “premature for publication”. This is a highly subjective statement that does not seem to meet the mission or purpose of the Review Commons platform. While it is possible that some of the conclusions drawn in our manuscript might not be fully supported by the data in its current form, there is a substantial body of work here that is certainly publishable.

      Reviewer 2 major comment 1: The TAP-tag Heh1/Heh2 pulldowns are the most significant experiment presented, and on face value provide compelling evidence that Heh2 interacts with the NPC. It is stated that mass spectrometry (MS) was used to confirm the identities of the labeled bands, yet there is no methods section, nor any MS data reported in the manuscript. Given the large number of unspecified proteins observed in these gels, and the single-step pulldown methodology used, knowledge of the contaminants present may aid in elucidating how Heh2 pulls down NPC components. Consequently, within the supplementary materials, the authors must indicate which regions of the gel were excised for MS analysis and provide a table listing all of the proteins that were detected for each sample, including the number of unique/expected peptides observed.

      Our Response: This was a major oversight on our part, and a revised manuscript will contain all relevant details with regard to the MS analysis, including a more detailed description of the excised bands and the quantification of spectra derived from these bands.

      Reviewer 2 major comment 2a: The representative micrographs provided across Figures 2, 3, 4, 5 and 6 are very noisy. Particularly in the case of the mCherry labeled nucleoporins, this is both unusual and unfortunate given this is used to infer colocalization of Heh2 with the NPC.

      Our Response: These micrographs are not unusual and are in fact of respectable quality. We agree that the apparent “noise” is unfortunate, but this is simply a reality of the yeast system. We remind the reviewer that there are only ~100 to ~200 NPCs per budding yeast nucleus, which is an order of magnitude fewer than in a typical mammalian cell nucleus. Moreover, the copy number of yeast nups per NPC is half that of the mammalian NPC. In addition, budding yeast are spherical, with a cell wall that is extremely effective at scattering light; they are also highly autofluorescent (particularly in the red channel). Lastly, unlike in mammalian cells, budding yeast NPCs are mobile on the nuclear envelope. Thus, co-localization is challenging (particularly with the long exposures required to obtain good images). This is why clustering of NPCs in nup133∆ cells has provided one of the key assays in the field to assess whether a given protein associates with NPCs at the level of light microscopy.

      Reviewer 2 major comment 2b: As a result it is unclear whether this experiment can be used to differentiate between NPC colocalization vs. nuclear envelope colocalization.

      Our Response: The reviewer is correct. Co-localization between Heh2-GFP and any Nup-mCherry is insufficient to assess NPC association in WT cells. In fact, as we point out in Figure 3B, at best one can expect a correlation of r = 0.48 for two well established nups. Thus, to further support the conclusion that Heh2 associates with NPCs, we established the Nsp1-FRB NPC clustering assay (Figure 3).

      Reviewer 2 major comment 2c: The authors should include negative controls for an alternative NE membrane protein that doesn't bind the NPC, which would be expected to exhibit a reduced level of colocalization with NPC proteins when compared to Heh2. For example, Heh1 would be suitable, given the clear-cut negative pulldown data and its prior usage as a negative control in Figure 4.

      Our Response: This is included in Figure 3D.

      Reviewer 2 major comment 3a. Figure 2. The rim staining for the Nup82-mCherry in the WT background is unusually punctate, bringing into question the viability of the cells imaged.

      Our Response: As the middle cell in the panel is undergoing cell division, these cells are clearly viable. All our imaging is performed on mid-log phase cultures.

      Reviewer 2 major comment 3b. Why has ScNup82, a cytoplasmic filament component, been selected for colocalization experiments when Heh2 is proposed to interact with the inner ring complex?

      Our Response: The resolution of a conventional light microscope is, at best, 200 nm in x, y. As NPCs are 100 nm in diameter, even two NPCs side-by-side cannot be resolved. The IRC is only tens of nm away from the cytoplasmic filaments; thus any nup is relevant for a co-localization analysis with a light microscope.

      Reviewer 2 major comment 3c: Additionally, the experiments shown in panels A and C are not directly comparable: ScNup82 is an asymmetric cytoplasmic nucleoporin, while SpNup107 is located in the Y-shaped Nup84 nucleoporin complex and is present on both faces of the NPC. This experiment should be repeated with ScNup84 to match panel C; additionally, a viability dot spot assay and western blot analysis of the labeled proteins should be conducted.

      Our response: These are in fact directly comparable within the limits of resolution of light microscopy as described above. Viability assays are not required here as both nups are essential and perturbation to their function would lead to inviability.

      Reviewer 2 major comment 4: Figure 3, the authors use yeast strains where proteins are tagged with FRB and FKBP12 domains, which dimerize upon the addition of rapamycin inducing NPC clusters. The authors then observe the effect this has on Heh2 NPC colocalization. However, Rapamycin may also have an effect independent from the induced dimerization event. Negative controls should be performed in strains lacking the FRB and FKBP12 tagged proteins to demonstrate that Rapamycin doesn't modify Heh2 localization independently of NPC clustering.

      Our response: This is a good point and an important control that we performed in prior studies, see Colombi et al., JCB, 2013. We will be more explicit in describing that this control has been done.

      Reviewer 2 major comment 5: Figure 4. The authors provide a qualitative description of the colocalization presented, while in all other instances they calculate a Pearson correlation coefficient. This is significant because Heh2 appears to be evenly distributed within the NE of the DMSO control (panel B). Given the presented hypothesis, isn't colocalization with Nup192 expected? At a minimum, a Pearson correlation coefficient analysis should be conducted and added to Figure 4.

      Our response: This will be included in a revised manuscript.

      Reviewer 2 major comment 6: Figure 4. Pom152-mCherry localizes both at the NE and strongly within the cytoplasm, which is unexpected given the typical rim staining phenotypes observed previously for both Pom152-YFP and Pom152-GFP strains (Katta, ..., Jaspersen et al., Genetics (2015) & Upla, ..., Fernandez-Martinez et al., Structure (2017), respectively). Given the unusually weak rim staining observed throughout, viability assays of the strains listed in Table S1 and protein expression analysis of the tagged nucleoporins via western blot are necessary.

      Our response: This is not localization in the cytoplasm but is in fact autofluorescence from the yeast vacuole. We regret we were not more explicit in describing this, and we will make the manuscript more accessible for the non-yeast expert. Performing the Western blot analysis for all strains requested by the reviewer would require a battery of antibodies to the endogenous proteins to directly assess how tagging influences nup levels, which we do not have (nor does anyone else that we are aware of). This is also not standard practice in the field, as it is an onerous and unnecessary burden.

      Reviewer 2 major comment 7: Figure 5A. The TAP-tagged pulldowns from ∆Pom152 and ∆Nup133 strains appear to be from a different round of experiments than the previous deletion strains presented. Interestingly, there appears to be an additional band at approximately 250 kDa in both cases that is not present in any other experiments. This band could be a contaminant observed due to different experimental conditions, or a protein that exclusively binds to Heh2 in the ∆Pom152 and ∆Nup133 background. Either way the authors should identify this protein with MS to address this ambiguity.

      *

      Our response: We will include negative controls for these specific experiments to show that this is a non-specific band.

      Reviewer 2 major comment 8: Figure 6B. Please label the nucleoporin bands in the TAP-tagged pulldowns.

      Our response: This will be done.

      Reviewer 2 major comment 9: Figure 6D. Please specify Heh2-GFP clustering in the y-axis.

      Our response: As this represents both Heh2-GFP and heh2-1-570-GFP, we will keep it as is to avoid confusion.

      Reviewer 2 major comment 10: Under the results section titled 'Heh2 binds to specific nups in evolutionarily distant yeasts', the authors state that spHeh2 co-purifies with "several specific species". The meaning is unclear; this sentence should be rephrased and the specific species clearly described.

      Our response: Ok.

      Reviewer 2 major comment 11: Under the results section titled 'Heh2 fails to interact with NPCs lacking Nup133', the authors refer to a Pearson correlation coefficient of -0.03 as a clear anticorrelation. They should instead state that there was no correlation.

      Our response: Ok.

      Reviewer 2 major comment 12: In the discussion, the authors state that "clustering itself may sterically preclude an interaction with Heh2". The text should be expanded to explain this in more detail; it is not clear from the presented data why this would occur.

      Our response: Ok.

      Reviewer 2 comment on significance: the manuscript is premature for publication.

      Our Response: Such a statement has no relevance to this form of review as a decision as to whether a study is premature for publication should be made by journal editors, not reviewers. We would argue quite strongly that we have definitively shown that Heh2 binds to NPCs, that it does so in multiple evolutionarily distant yeasts and that this binding is functionally relevant. For example, we can specifically disrupt the association of Heh2 with NPCs with a specific domain deletion and observe a loss of function phenotype (e.g. NPC clustering). What all three reviewers agree on is that the concept of a “NPC assembly state sensor” needs additional data to be fully supported, although we note that this reviewer did not provide any suggestions for how we might achieve this goal. We further note that we added the qualifier “may” into the title of the work. We will therefore perform additional experiments as outlined in comments to Reviewer 1 to support this conclusion in order to introduce this as a new concept in the field.

      Reviewer Comment from Cross Commenting: It seems to me that all reviewers agree that the manuscript is premature for publication. The data thus far do not support the conclusion that Heh2 may be an NPC assembly sensor, nor do they provide any mechanistic insight. Reading the comments of the other two reviewers makes me more negative, as it is clear that the paper also lacks scientific rigor. The manuscript is a great starting point for a rigorous dissection but I do not see this paper to be a candidate for a broad impact journal.

      Our Response: The statement that this manuscript is premature for publication is an opinion and does not seem to reflect the sentiment of the other reviewers. It is also confounding that this reviewer suggests that this work lacks rigor. With the exception of the omission of the MS analysis (our fault), the data are of high quality and rigorously quantified. Our assertion of rigor and data quality is based on our collective team’s many decades-long history of publishing and reviewing papers at the highest levels in this field. Questions as to the quality of the data as stated by this reviewer (and only this reviewer) in fact address limitations of light microscopy and the yeast system more generally in this one respect.


      Reviewer 3

      Reviewer 3 Summary part a: This is quite an interesting manuscript that explores the relationship between an INM protein, Heh2, and NPCs. It represents an extension of earlier work performed by this group in which it was shown that the HEH2 gene shares genetic interactions with the genes encoding various nucleoporins. Heh2 belongs to an intriguing family of conserved proteins that includes its orthologue, Heh1, as well as human MAN1 (LEMD3) and LEMD2, among others. Each of these proteins contains two transmembrane domains with the N- and C-terminal regions extending into the nucleoplasm. The two TM domains are separated by a short lumenal loop.

      In this study, the authors show that a population of Heh2 is associated with Nups of the NPC inner ring complex. This was demonstrated initially in pulldown experiments. The authors go on to show that when NPCs are caused to aggregate, by physical tethering employing an FKBP/FRP system in combination with Rapamycin, Heh2, but not Heh1, colocalizes with the NPC clusters.

      Our Response: Thank you to the reviewer for recognizing the value of this work.

      Reviewer 3 Summary part b: Although not stated explicitly in the manuscript, this would imply that there is a population of Heh2 that resides in the NPC membrane domain, with the remainder in the INM. As an idle question, is there any evidence for a similar localization of MAN1 or LEMD2 in mammals? I am guessing probably not.

      Our Response: We regret this was not made clearer, but the idea that there is a pool of Heh2 at the POM and a pool at the INM is an important conclusion of the work and was stated in the results; we'll re-emphasize it in the revised discussion. As to whether MAN1 or LEMD2 has a similar NPC association, we hypothesize that MAN1, but not LEMD2, will indeed interact with NPCs in mammalian cells. This is based on our finding that both the budding and fission yeast orthologues of MAN1 share this association, so unless it was lost in evolution, this is a likely outcome of future studies.

      Reviewer 3 Significance statement a: The complications arise when the authors show that an alternative method of NPC aggregation (although they did this first), involving Nup133 deletion, results in failure of Heh2 to co-aggregate. In other words, Nup133 is required for the association of Heh2 with NPCs. The issue here is that there is no evidence for an interaction between Heh2 and Nup133, and furthermore that loss of Nup133 (a Y complex component of the outer ring complex) leaves the inner ring complex intact.

      Our Response: We tested the nup133Δ background first as this is the standard approach for assessing NPC-association of a given protein so we felt this would be logical for a reader in the field. Further, while the disruption of Heh2’s binding by loss of Nup133 may be a complication, we prefer to see it as an opportunity for discovery. As described in our manuscript, we have chosen to interpret this result in the context of a new biological function/concept with Heh2 being a novel “NPC assembly state” sensor. While one could argue that we have not fully met this bar yet, we will perform additional experiments as outlined in our response to reviewer 1 to help support this compelling conclusion.

      Reviewer 3 Significance statement b: What is clear, however, is that Heh2 seems to be required to inhibit NPC aggregation, since Heh2-deficient cells exhibit NPC clusters. The association between Heh2 and IRC Nups resides in the C-terminal nucleoplasmic winged helix domain. The N-terminal domain, in contrast, confers INM localization.

      Our Response: We agree.


      Reviewer 3 Significance statement c: I must admit, I am in two minds about this manuscript. The data clearly show that Heh2 is associated with IRC components and I agree with the authors that this protein may well have a role in NPC assembly quality control perhaps in the guise of a chaperone. However, I find it hard to come up with a convincing model for the effects of Nup133. On the one hand, one could make an argument that the data presented here is too preliminary and fails to provide a complete story. On the other hand, it does provide an intriguing foundation for future studies and I do feel positively disposed towards it. In short, I have no fundamental complaints about the science, I am just uncertain as to whether the study is ready for publication.

      Our Response: This statement nicely articulates the challenge with this manuscript as there are some solid findings (that Heh2 binds specifically to NPCs etc.) but also a provocative finding (that loss of Nup133 breaks Heh2’s interaction with NPCs despite not physically interacting). Thus, there is a decision to be made about whether there is value in introducing a novel concept to the field once additional data are provided in a revised manuscript.

      Reviewer 3 Cross commenting: I have no fundamental disagreements with either of the other two reviewers. The comment from Reviewer#2 summarises this quite neatly. While I have fewer concerns about the quality of the data as presented, I think we all agree that at best the study is preliminary. What the authors need to do is to construct a coherent model that will account for the observations described here and then to design experiments that will test this model. I'm not suggesting that they must have a complete story, but they do need to go beyond what is in the current manuscript.

      Our Response: We appreciate that the reviewer does not have any questions about the quality of our data, but we argue that we have in fact presented the most coherent interpretation of the data as it currently stands. As described above, we intend to attempt to solidify this model by performing experiments suggested by reviewer 1.



      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting. Reply to the Reviewers: I thank the Referees for their... Referee #1

      1. The authors should provide more information when... Responses

      The typical domed appearance of a hydrocephalus-harboring skull is apparent as early as P4, as shown in a new side-by-side comparison of pups at that age (Fig. 1A). Though this is not stated in the MS

      1. Figure 6: Why has only... Response: We expanded the comparison... Minor comments:

      2. The text contains several... Response: We added... Referee #2

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This is quite an interesting manuscript that explores the relationship between an INM protein, Heh2, and NPCs. It represents an extension of earlier work performed by this group in which it was shown that the HEH2 gene shares genetic interactions with the genes encoding various nucleoporins. Heh2 belongs to an intriguing family of conserved proteins that includes its orthologue, Heh1, as well as human MAN1 (LEMD3) and LEMD2, among others. Each of these proteins contains two transmembrane domains with the N- and C-terminal regions extending into the nucleoplasm. The two TM domains are separated by a short lumenal loop.

      In this study, the authors show that a population of Heh2 is associated with Nups of the NPC inner ring complex. This was demonstrated initially in pulldown experiments. The authors go on to show that when NPCs are caused to aggregate, by physical tethering employing an FKBP/FRP system in combination with Rapamycin, Heh2, but not Heh1, colocalizes with the NPC clusters. Although not stated explicitly in the manuscript, this would imply that there is a population of Heh2 that resides in the NPC membrane domain, with the remainder in the INM. As an idle question, is there any evidence for a similar localization of MAN1 or LEMD2 in mammals? I am guessing probably not.

      Significance

      The complications arise when the authors show that an alternative method of NPC aggregation (although they did this first), involving Nup133 deletion, results in failure of Heh2 to co-aggregate. In other words, Nup133 is required for the association of Heh2 with NPCs. The issue here is that there is no evidence for an interaction between Heh2 and Nup133, and furthermore that loss of Nup133 (a Y complex component of the outer ring complex) leaves the inner ring complex intact. What is clear, however, is that Heh2 seems to be required to inhibit NPC aggregation, since Heh2-deficient cells exhibit NPC clusters. The association between Heh2 and IRC Nups resides in the C-terminal nucleoplasmic winged helix domain. The N-terminal domain, in contrast, confers INM localization.

      I must admit, I am in two minds about this manuscript. The data clearly show that Heh2 is associated with IRC components and I agree with the authors that this protein may well have a role in NPC assembly quality control perhaps in the guise of a chaperone. However, I find it hard to come up with a convincing model for the effects of Nup133. On the one hand, one could make an argument that the data presented here is too preliminary and fails to provide a complete story. On the other hand, it does provide an intriguing foundation for future studies and I do feel positively disposed towards it. In short, I have no fundamental complaints about the science, I am just uncertain as to whether the study is ready for publication.

      REFEREES CROSS COMMENTING

      I have no fundamental disagreements with either of the other two reviewers. The comment from Reviewer#2 summarises this quite neatly. While I have fewer concerns about the quality of the data as presented, I think we all agree that at best the study is preliminary. What the authors need to do is to construct a coherent model that will account for the observations described here and then to design experiments that will test this model. I'm not suggesting that they must have a complete story, but they do need to go beyond what is in the current manuscript.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Borah et al. present a biochemical and cell biological examination of the inner nuclear membrane (INM) protein Heh2 and its putative interactions with the nuclear pore complex (NPC). The potential conceptual advance of this study is that Heh2 interacts with the NPC, while mutations believed to trigger NPC mis-assembly are shown to abolish interaction with Heh2, leading to the hypothesis that Heh2 is a sensor for NPC assembly states within the INM. The conclusions would undoubtedly be of broad interest to the nucleocytoplasmic transport field, but the evidence provided thus far is insufficient to build confidence and consequently this manuscript is premature for publication.

      Specific comments:

      (1) The TAP-tag Heh1/Heh2 pulldowns are the most significant experiment presented, and on face value provide compelling evidence that Heh2 interacts with the NPC. It is stated that mass spectrometry (MS) was used to confirm the identities of the labeled bands, yet there is no methods section, nor any MS data reported in the manuscript. Given the large number of unspecified proteins observed in these gels, and the single-step pulldown methodology used, knowledge of the contaminants present may aid in elucidating how Heh2 pulls down NPC components. Consequently, within the supplementary materials, the authors must indicate which regions of the gel were excised for MS analysis and provide a table listing all of the proteins that were detected for each sample, including the number of unique/expected peptides observed.

      (2) The representative micrographs provided across Figures 2, 3, 4, 5 and 6 are very noisy. Particularly in the case of the mCherry labeled nucleoporins, this is both unusual and unfortunate given this is used to infer colocalization of Heh2 with the NPC. As a result it is unclear whether this experiment can be used to differentiate between NPC colocalization vs. nuclear envelope colocalization. The authors should include negative controls for an alternative NE membrane protein that doesn't bind the NPC, which would be expected to exhibit a reduced level of colocalization with NPC proteins when compared to Heh2. For example, Heh1 would be a suitable control, given the clear-cut negative pulldown data and its prior usage as a negative control in Figure 4.

      (3) Figure 2. The rim staining for the Nup82-mCherry in the WT background is unusually punctate, bringing into question the viability of the cells imaged. Why has ScNup82, a cytoplasmic filament component, been selected for colocalization experiments when Heh2 is proposed to interact with the inner ring complex? Additionally, the experiments shown in panels A and C are not directly comparable: ScNup82 is an asymmetric cytoplasmic nucleoporin, while SpNup107 is located in the Y-shaped Nup84 nucleoporin complex and present on both faces of the NPC. This experiment should be repeated with ScNup84 to match panel C; additionally, a viability dot spot assay and western blot analysis of the labeled proteins should be conducted.

      (4) Figure 3. The authors use yeast strains where proteins are tagged with FRB and FKBP12 domains, which dimerize upon the addition of rapamycin, inducing NPC clusters. The authors then observe the effect this has on Heh2 NPC colocalization. However, rapamycin may also have an effect independent of the induced dimerization event. Negative controls should be performed in strains lacking the FRB and FKBP12 tagged proteins to demonstrate that rapamycin doesn't modify Heh2 localization independently of NPC clustering.

      (5) Figure 4. The authors provide a qualitative description of the colocalization presented, while in all other instances they calculate a Pearson correlation coefficient. This is significant because Heh2 appears to be evenly distributed within the NE of the DMSO control (panel B). Given the presented hypothesis, isn't colocalization with Nup192 expected? At a minimum, a Pearson correlation coefficient analysis should be conducted and added to Figure 4.

      (6) Figure 4. Pom152-mCherry localizes at both the NE and strongly within the cytoplasm, which is unexpected given typical rim staining phenotypes observed previously for both Pom152-YFP and Pom152-GFP strains (Katta, ..., Jaspersen et al., Genetics (2015) & Upla, ..., Fernandez-Martinez et al., Structure (2017), respectively). Given the unusually weak rim staining observed throughout, viability assays of the strains listed in Table S1 and protein expression analysis of the tagged nucleoporins via western blot are necessary.

      (7) Figure 5A. The TAP-tagged pulldowns from ∆Pom152 and ∆Nup133 strains appear to be from a different round of experiments than the previous deletion strains presented. Interestingly, there appears to be an additional band at approximately 250 kDa in both cases that is not present in any other experiments. This band could be a contaminant observed due to different experimental conditions, or a protein that exclusively binds to Heh2 in the ∆Pom152 and ∆Nup133 background. Either way the authors should identify this protein with MS to address this ambiguity.

      (8) Figure 6B. Please label the nucleoporin bands in the TAP-tagged pulldowns.

      (9) Figure 6D. Please specify Heh2-GFP clustering in the y-axis.

      (10) Under the results section titled 'Heh2 binds to specific nups in evolutionarily distant yeasts', the authors state that spHeh2 co-purifies with "several specific species". The meaning is unclear; this sentence should be rephrased and the specific species clearly described.

      (11) Under the results section titled 'Heh2 fails to interact with NPCs lacking Nup133', the authors refer to a Pearson correlation coefficient of -0.03 as a clear anticorrelation. They should instead state that there was no correlation.

      (12) In the discussion, the authors state that "clustering itself may sterically preclude an interaction with Heh2". The text should be expanded to explain this in more detail; it is not clear from the presented data why this would occur.

      Significance

      The manuscript is premature for publication.

      REFEREES CROSS COMMENTING

      It seems to me that all reviewers agree that the manuscript is premature for publication. The data thus far do not support the conclusion that Heh2 may be an NPC assembly sensor, nor do they provide any mechanistic insight. Reading the comments of the other two reviewers makes me more negative, as it is clear that the paper also lacks scientific rigor. The manuscript is a great starting point for a rigorous dissection but I do not see this paper to be a candidate for a broad impact journal.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Borah et al. showed that Heh2, a component of the INM, can be co-purified with a specific subset of nucleoporins. They also found that disrupting interactions between Heh2 and NPCs causes NPC clustering. Lastly, they showed that the knockout of Nup133, which does not physically interact with Heh2, causes the dissociation of Heh2 from NPCs. These findings led the authors to propose that Heh2 acts as a sensor of NPC assembly state.

      Major comments:

      The authors claimed that Heh2 acts as a sensor of NPC assembly state, as evidenced by their finding that Heh2 fails to bind to NPCs in nup133Δ cells (Figs. 2 and 5). However, there is a possibility that the association between Heh2 and NPCs is merely affected by the clustering of the NPCs (as the authors discussed) but not related to the structural integrity of the NPC. In addition, their data showing that the Heh2-NPC association is not easily disrupted by knocking out the individual components of the IRC (Fig. 5A and 5D) also disfavor the idea that Heh2 could sense NPC assembly state. Since some nup knockout strains other than nup133Δ are also known to show NPC clustering (e.g., nup159 (Gorsch JCB 1995) and nup120 (Aitchison JCB 1995; Heath JCB 1995)), it will be worth trying to monitor the localization of Heh2 and its interaction with nucleoporins (by Heh2-TAP) using these strains. While Nup159 is a member of the cytoplasmic complex, Nup120 is an ORC nucleoporin. Thus, biochemical and phenotypic analyses using these mutant cells will be useful to clarify whether the striking phenotypes the authors found are specific to the nup133 knockout strain (or ORC Nup knockouts) or are commonly observed in strains that show NPC clustering. Another interesting point is that Nup159 shows strong interaction with Heh2, even in nup133Δ cells. As the authors mentioned, the Nup159-Heh2 interaction may not be sufficient for Heh2-NPC association, but it could be important for NPC clustering.

      Figure 4C: Is it known that rapamycin treatment in this strain did not affect the protein levels of nucleoporins? Otherwise, the authors should confirm this by western blotting (at least some of them).

      Figure 5: The authors mentioned (lines 256-257) that "in all cases the punctate, NPC-like distribution of Heh2-GFP was retained (Fig 5D)". However, the nup107 KO strain seems to show more diminished punctate staining as compared with other strains. To clarify this, the authors should express mCherry-tagged Nups as in Fig. 2 or Fig. 3.

      Minor comments:

      Figure 4A and 4B: The authors should show scatter plots as in Fig. 2 and Fig. 3.

      Figure 5C: An explanation of the arrowheads is missing from the figure legend.

      Figure 6: Is there any information as to where Heh2 (316-663) is localized in the cell?

      Figure 6B: Nucleoporins should be marked with colored circles as in Fig. 1 and Fig. 5.

      Significance

      Heh2 has been implicated in the quality control of NPC assembly; however, the molecular mechanism of how Heh2 interacts with and affects NPC assembly/function has remained largely unknown. The relationship between Heh2 and specific nucleoporins shown in this study is novel and interesting. While the data are overall of good quality and convincing, the current manuscript still lacks molecular mechanistic insight. In particular, it is not clear whether the observed phenotypes are due to structural defects of the NPC or to NPC clustering.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): The manuscript by Huh et al. reports that oxidative stress causes fragmentation of a specific tyrosine pre-tRNA, leading to two parallel outcomes. First, the fragmentation depletes the mature tRNA, causing translational repression of genes that are disproportionally rich in tyrosine codons. These genes are enriched for those involved in the electron transport chain, cell cycle and growth. Second, the fragmentation generates tRNA fragments (tRFs) that bind to two known RNA binding proteins. Finally, the authors identify a nuclease that is needed for efficient formation of tyrosine tRFs. Comment 1: The authors should include a short diagram indicating the various known steps of pre-tRNA fragmentation (perhaps as a supplement) for general readers.

      Response: We thank the reviewer for their suggestion. Pre-tRNA fragmentation is still a poorly understood field, but it is best introduced through pre-tRNA processing, in which there is a cleavage event for pre-tRNAs with an intron. This is a complex subject, but a recent review from Hopper and Nostramo has done an excellent job of describing the current field in yeast and vertebrate species (Hopper and Nostramo, Front. Genet., 2019). We have added this citation and new text in the manuscript about pre-tRNA processing for general readers to follow up on. We feel that a supplementary figure might be a bit too brief in describing the knowns and unknowns of pre-tRNA processing and fragmentation.

      Comment 2: I find the enrichment for mitochondrial electron transport chain (ETC) curious. The ETC includes several oxidoreductases, which may be rich in tyrosine as it is a common amino acid used in electron transfer. The depletion of the tyrosine tRNA from among many tRNAs under oxidative stress may not be incidental but related to an attempt by the cell to decrease oxygen consumption to avoid further oxidative damage. The authors could further mine their data to corroborate this hypothesis. For example, are the ETC genes among the targets of the RNA binding proteins targeted by tyrosine tRFs? This could potentially connect the effects of mature tRNA depletion and tRFs.

      Response: We thank the reviewer for this very interesting comment and insight, which had not occurred to us. The relationship between this response and oxidoreductase regulation could be a factor in both the tRNA and tRF modulations seen in our cells. Interestingly, we find by CLIP that many oxidoreductase genes (such as the NDUF family) are bound by hnRNPA1. In new data, we have done stability experiments with the tRF (new Fig 7E-F) to show that the regulon of hnRNPA1 is modulated by overexpression of, and LNA against, the tRF, revealing that this tRNA fragmentation response modulates expression of certain oxidoreductase genes. However, we do not see clear and significant differences for ETC genes in particular. As hnRNPA1 is known to act as both a promoter and destabilizer of genes depending on context, it is likely that further and more detailed work will be needed to parse this hypothesis out in future studies.

      Comment 3: In figure 4A, the authors should provide the tyrosine codon content of the overlap genes and show how much it differs from a randomly selected sample.

      Response: We have identified an error in our manuscript where the overlap actually identifies 109 proteins rather than the 102 reported in the original manuscript. We apologize for this oversight. As for the overlap proteins, we plotted the downstream proteins detected in the proteome by mass spectrometry based on Tyr-codon content. As explained in the text, the targets we tested were chosen for having higher-than-median Tyr-codon content, as seen in the histogram, and for showing some of the greatest reductions after Tyr tRNA-GUA depletion (Fig S4A). The other proteins found in the overlap fall in a similar pattern along the histogram.

      Comment 4: Fig.6F, lower panel: the model should show pre-tRNA, as opposed to mature tRNA, because it is the former that is fragmented.

      Response: We apologize for the confusion. The model in Fig 7F was supposed to denote the pre-tRNA with the trailer and leader sequences intact initially, then lost with processing to mature tRNA. To make it clearer, we have now labeled the first species as “Pre-tRNA.”

      Reviewer #1 (Significance (Required)): This study is comprehensive and novel, and includes several orthogonal and complementary approaches to provide convincing evidence for the conclusions. The main discovery is significant because it presents an important advance in post-transcriptional control of gene expression. The process of tRF formation was previously thought not to affect the levels of mature tRNA. This study changes that understanding by describing for the first time the depletion of a specific mature tRNA as its precursor form is fragmented to generate tRFs. Finally, the authors identify DIS3L2 as a nuclease involved in fragmentation. This is also an important finding as the only other suspected nuclease, albeit with contradictory evidence, is angiogenin. Collectively, the findings of this study would be of interest to a broad group of scientists. I only have a few minor comments and suggestions (see above).

      Response: We thank the reviewer for their very positive and insightful comments and feedback.

      REFEREES CROSS-COMMENTING I have the following comments on other reviewers' critiques. Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly? Reviewer 3 raises the issue of cross hybridization in Northern blots. The authors indicate that they "could not detect the other tyrosyl tRNA (tRNA Tyr AUA) in MCF10A cells by northern blot..." (page 6). Also, they gel extracted tRFs and sequenced them (figure S6B), directly identifying the fragments. I think these findings mitigate the concern of cross hybridization and clearly identify the nature of tRFs. Finally, I think that the codon-dependent reporter experiment (figure 5D) addresses many issues surrounding codon dependent vs indirect effects. In that experiment, the authors mutate 5 tyrosine codons of a reporter gene and demonstrate that the encoded protein is less susceptible to repression in response to oxidative stress.

      Response: We thank the reviewer for their tremendous insights. We are in agreement regarding the three points in the cross-comments.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): This very interesting study from Sohail Tavazoie's lab describes the consequences of oxidative stress on the tRNA pool in human epithelial cell lines. As previously described, the authors observed that tRNA fragments were generated upon exposure of cells to ROS. In addition, the authors made the novel observation that specific mature tRNAs were also depleted under these conditions. In particular, the authors focused on tyrosyl tRNA-GUA, which was decreased ~50% after 24 hours of ROS exposure, an effect attributable to a decrease in the pre-tRNA pool. Depletion of tyrosyl tRNA resulted in reduced translation of specific mRNAs that are enriched in tyr codons and likely contributed to the anti-proliferative effects of ROS exposure. In addition, the authors demonstrated that the tRFs produced from tyr tRNA-GUA can interact with specific RNA binding proteins (SSB and hnRNPA1). The major contribution of this paper is the novel finding that stress-induced tRNA fragmentation can result in a measurable reduction of specific mature tRNAs, leading to a selective reduction in translation of mRNAs that are enriched for the corresponding codons. Previously, studies of tRNA fragmentation largely focused on the functions of the tRFs themselves and it was generally believed that the mature tRNA pool was not impacted sufficiently to reduce translation. The findings reported here therefore add a new dimension to our understanding of the cellular consequences of stress-induced tRNA cleavage. Overall, the data are of high quality, the experiments are convincing, and the conclusions are well supported. I have the following suggestions that would further strengthen the study and bolster the conclusions.

      Comment 1: The authors have not formally demonstrated that the reduction in pre-tRNA in H2O2-treated cells is a consequence of pre-tRNA cleavage. It is possible that reduced transcription contributes to this effect. Pulse-chase experiments with nucleotides such as EU would provide a tractable approach to demonstrate that a labelled pool of pre-tRNA is rapidly depleted upon H2O2 treatment, which would further support their model. Since the response occurs rapidly (within 1 hour), it would be feasible to monitor the rate of pre-tRNA depletion during this time period in control vs. H2O2-treated cells.
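The pulse-chase suggestion hinges on comparing depletion rates between conditions. As a minimal sketch, assuming the labeled pre-tRNA signal decays with simple first-order kinetics, N(t) = N0 * exp(-k * t), the rate constant k can be estimated from a time course by a least-squares fit of ln(N(t)/N0) against time and converted to a half-life as ln 2 / k. The function names and the toy time course below are hypothetical illustrations, not data or methods from the manuscript; real pulse-chase data would also need replicate handling and background subtraction.

```python
import math

def fit_first_order_decay(times_h, signal):
    """Estimate a first-order decay constant k (per hour) from a chase time course.

    Assumes signal(t) = signal(t0) * exp(-k * (t - t0)), so ln(signal / signal0)
    is a straight line through the origin with slope -k. Hypothetical helper.
    """
    if len(times_h) != len(signal) or len(times_h) < 2:
        raise ValueError("need matching time and signal lists with >= 2 points")
    t0, s0 = times_h[0], signal[0]
    xs = [t - t0 for t in times_h]
    ys = [math.log(s / s0) for s in signal]
    # Least-squares slope of a line constrained through (0, 0): sum(x*y) / sum(x*x)
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return -slope

def half_life(k):
    """Half-life of a first-order process with rate constant k."""
    return math.log(2) / k

# Toy, noise-free time course generated with k = 0.7 per hour (illustrative only).
times = [0.0, 0.25, 0.5, 1.0]
toy_signal = [math.exp(-0.7 * t) for t in times]
k_est = fit_first_order_decay(times, toy_signal)
print(round(k_est, 3), round(half_life(k_est), 3))
```

In practice one would fit control and H2O2-treated samples separately and compare k; a faster decay in the treated arm would be consistent with, though not proof of, active pre-tRNA turnover.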

      Response: We thank the reviewer for their suggestion and agree that testing for a transcriptional effect using a pulse-chase experiment would further support these findings. We are grateful to both reviewer 1 and reviewer 2 for recognizing in the cross-comments that the tRNA repression response we observe is too rapid to be a transcriptional response, and that the concomitant occurrence of tRNA depletion and tRF generation supports our model of a pre-tRNA fragmentation response. It would be of interest for future studies to also examine the impact of cellular stress on tRNA transcription.

      Comment 2: To what extent is the growth arrest that results from H2O2 treatment attributable to tyr tRNA-GUA depletion (Fig. 3A)? Since the reduction in tRNA levels is only partial (~50%), it should be feasible to restore tRNA levels by overexpression (strategy used in Fig. 3E, S3B) and determine whether this measurably rescues growth in H2O2-treated cells.

      Response: We thank the reviewer for their suggestion. We had also considered this experiment and attempted to test this hypothesis. However, we encountered technical challenges that prevented us from drawing any conclusions. We were unable to develop a cell line that stably overexpressed Tyr tRNA-GUA and had to settle for a transient overexpression that lasted only a couple of days (Fig S3B). For transient transfection, we used Lipofectamine 3000 (Invitrogen), which has associated cell toxicity and requires a control RNA transfection with Lipofectamine. In addition, H2O2 is itself a stress. The combination of these two stresses led to a mixture of cell death and cell growth in both the control and experimental groups. Given the high variability, we were unable to draw any conclusions about cell growth under these conditions. We hope to identify a way to stably overexpress Tyr tRNA-GUA in the future to address this hypothesis.

      Comment 3: Knockdown of YARS/tyr tRNA-GUA resulted in reduced expression of EPCAM, SCD, and USP3 at both the protein and mRNA levels (Fig. 4C-D, S4C). In contrast, H2O2-exposure reduced the abundance of these proteins without affecting mRNA levels (Fig. 5A-B, S5A). The authors should comment on this apparent discrepancy. Perhaps translational stalling induces No-Go decay, but it is unclear why this response would not also be triggered by ROS.

      Response: We would like to clarify that, of the three genes in Fig. S5A, only EPCAM mRNA levels were significantly reduced upon H2O2 exposure, while no changes were observed in the mRNA levels of USP3 or SCD. The reason for the EPCAM mRNA reduction is difficult to ascertain, but one hypothesis involves timing and steady-state levels. mRNA levels measured after knockdown of YARS or the tRNA represent steady-state levels, at which mRNA decay and transcriptional changes are readily apparent. Following H2O2 exposure, the data are collected at 24 hours, which may be before effects on mRNA levels can be fully appreciated. We have edited the text to clarify the uncertainty involved. We agree with the reviewer’s insightful comment, find these differences interesting, and will consider them in future studies to better understand the interplay between translation and mRNA levels in the context of tRNA depletion.

      Comment 4: In addition to the analyses of ribosome profiling in Fig. 5E-F, it might also be helpful to show a metagene analysis of ribosome occupancy centered upon UAC/UAU codons (for an example, see Figure 2 of Schuller et al., Mol Cell, 2017). This has previously been used as an effective way to visualize ribosome stalling at specific codons. Additionally, do the authors see a global correlation between tyrosine codon density and reduced translational efficiency in tRNA knockdown cells?
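A metagene profile of the kind the reviewer describes averages normalized footprint density at fixed codon offsets around every occurrence of the codons of interest. The sketch below is one simple way to compute it, loosely in the spirit of the cited analysis rather than a reproduction of it. It assumes footprints have already been assigned to A-site codons upstream (offset calibration is not shown) and uses the DNA-alphabet codons TAC/TAT for UAC/UAU; all names and toy inputs are hypothetical.

```python
from collections import defaultdict

def metagene_around_codons(cds_seqs, codon_counts, targets=("TAC", "TAT"), window=10):
    """Average normalized ribosome occupancy around target codons.

    cds_seqs: dict transcript_id -> CDS DNA sequence (in frame from position 0)
    codon_counts: dict transcript_id -> list of footprint counts per codon,
        assumed to be already assigned to the A-site codon
    Returns dict offset -> mean normalized occupancy (offset 0 = target codon).
    """
    sums = defaultdict(float)
    ns = defaultdict(int)
    for tid, seq in cds_seqs.items():
        counts = codon_counts[tid]
        codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
        mean = sum(counts) / len(counts)
        if mean == 0:
            continue  # skip transcripts without coverage
        for pos, codon in enumerate(codons):
            if codon not in targets:
                continue
            for off in range(-window, window + 1):
                j = pos + off
                if 0 <= j < len(counts):
                    sums[off] += counts[j] / mean  # normalize by transcript mean
                    ns[off] += 1
    return {off: sums[off] / ns[off] for off in sorted(ns)}

# Toy transcript: ATG TAC GCA TAA with a footprint pile-up on the TAC codon.
profile = metagene_around_codons({"tx1": "ATGTACGCATAA"}, {"tx1": [1, 5, 1, 1]}, window=1)
```

Running the same function on knockdown and control libraries and comparing the height at offset 0 would be one way to visualize codon-specific pausing; normalizing by the transcript mean keeps highly expressed genes from dominating the average.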

      Response: We thank the reviewer for their important suggestion. We have expanded the analysis to include codon usage scatterplots across all codons for shTyr and shControl replicates (Fig S5D). The five most changed codons are labeled, with UAC, a tyrosine codon, being the most affected (red arrow). Consistent with our model, a tyrosine codon, when in the ribosomal A-site, is the most affected upon depletion of the corresponding tRNA. We have also edited the text to reflect this new analysis, which provides further evidence that ribosomal stalling could occur upon depletion of this tRNA. The gray outline around the regression line represents the 95% confidence interval.

      Fig S5D
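One possible way to pick out the "most changed" codons in a control-versus-knockdown scatterplot like the one described is to fit an ordinary least-squares line and rank codons by their residual from it. This is a hedged sketch: the response does not state how the labeled codons were chosen, the input dictionaries (per-codon mean A-site occupancy in each condition) are assumed, and the toy values are invented for illustration.

```python
def rank_codons_by_residual(control, knockdown, top_n=5):
    """Fit knockdown ~ control by least squares and rank codons by |residual|.

    control, knockdown: dict codon -> mean A-site occupancy (hypothetical inputs).
    Returns (slope, intercept, [(codon, residual), ...] for the top_n codons).
    """
    codons = sorted(set(control) & set(knockdown))
    xs = [control[c] for c in codons]
    ys = [knockdown[c] for c in codons]
    n = len(codons)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Positive residual: occupancy higher in knockdown than the fitted trend predicts.
    residuals = {c: knockdown[c] - (intercept + slope * control[c]) for c in codons}
    ranked = sorted(codons, key=lambda c: abs(residuals[c]), reverse=True)
    return slope, intercept, [(c, residuals[c]) for c in ranked[:top_n]]

# Toy values: every codon lies on y = x except TAC, which is elevated in knockdown.
toy_control = {"AAA": 1.0, "CCC": 2.0, "GGG": 3.0, "TTT": 4.0, "TAC": 2.5}
toy_knockdown = {"AAA": 1.0, "CCC": 2.0, "GGG": 3.0, "TTT": 4.0, "TAC": 5.0}
slope, intercept, top = rank_codons_by_residual(toy_control, toy_knockdown, top_n=1)
```

Because the outlier itself pulls the fitted line, a robust fit or a leave-one-out residual could be preferable with real data; the plain least-squares version is shown only because it is the simplest to read.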

      As seen in Fig 5F, a significant overlap was noted between genes with the lowest translational efficiency and genes with tyrosine codon enrichment. We performed further analysis to test whether a direct, linear relationship exists between tyrosine codon density and reduced translational efficiency on a global scale (i.e., whether more stalling occurs with more tyrosine codons). We again see that reduced translational efficiency is significantly correlated with tyrosine codon enrichment (above-median) in the tRNA knockdown ribosome profiling data. However, our analysis of a direct relationship between codon density and translational efficiency is inconclusive. This analysis is limited by the sequencing depth and number of experimental replicates available, and we lack the statistical power to draw strong conclusions. To avoid overstating our claims, we have omitted any conclusions regarding this second analysis.

      Comment 5: MINOR: On pg. 4, the authors state that tRF-tyrGUA is the most highly induced tRF, but Fig. S1B appears to show stronger induction of tRF-LeuTAA.

      Response: The reviewer is correct that the data in Fig S1B show higher induction of Leu-tRFs. Our text was meant to indicate that we focused on tRF-TyrGUA because of its higher band intensity on northern blot validation. We have edited the text in the manuscript to clarify this.

      Reviewer #2 (Significance (Required)): The major advance provided by this work is the demonstration that stress-induced tRNA cleavage can reduce the abundance of the mature tRNA pool sufficiently to impact translation. Moreover, the effect on mature tRNAs is selective, resulting in the reduced translation of a specific set of mRNAs under these conditions. These findings reveal previously unknown consequences of oxidative stress on gene expression and will be of interest to scientists working on cellular stress responses and post-transcriptional regulation.

      Response: We thank the reviewer for the kind comments and feedback.

      REFEREES CROSS-COMMENTING Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly? Here is what I was thinking: The generation of tRFs does not generally result in reduction in levels of the mature tRNAs. So you can imagine a scenario where oxidative stress causes tRF generation from the mature tyr tRNA (which does not impact its steady-state levels), as is the case for other tRNAs. At the same time, decreased transcription would reduce the pre-tRNA pool, leading to a delayed reduction in mature tRNA, as observed. However, looking back at the data, I see that after only 5 min of H2O2 treatment, the authors observed reduced pre-tRNA and increased tRFs (Fig. 2A). This seems very fast for a transcriptional response, which would presumably require some kind of signal transduction. In addition, when you consider the amount of tRFs produced in Fig. S2C, it is hard to imagine that this would not impact the mature tRNA pool if they were derived from there. So I agree that the transcriptional scenario seems unlikely. Nevertheless, I think that looking at pre-tRNA degradation directly with the pulse-chase strategy would strengthen their story, so I would like to give the authors this suggestion. However, I am fine with listing this as an optional experiment which would enhance the paper but should not be essential for publication.

      Response: We thank the reviewer for these insightful comments. As mentioned above, five minutes is likely too rapid for a transcriptional response to be the main effect of H2O2 on Tyr-tRNA GUA. Moreover, the concomitant appearance of the tRF at this time point makes tRNA fragmentation the most parsimonious and likely explanation, rather than transcriptional repression, which would not cause a tRNA fragment to appear concurrently. In addition, extraction and sequencing of the tRF show that it likely derives from the pre-tRNA, as a 5’ leader sequence is present. We appreciate the reviewer’s suggestion and scholarly willingness to reassess their own hypothesis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): The major findings in this manuscript are: 1.) Oxidative stress in human cells causes a decrease in tyrosine tRNA levels and accumulation of tyrosine tRNA fragments; 2.) The depletion of tyrosyl-tRNA synthetase or tyrosine tRNAs in human cells results in altered translation of certain genes and reduced cell growth and 3.) hnRNPA1 and SSB/La can bind tyrosine tRNA fragments. There is also preliminary evidence that the DIS3L2 endonuclease contributes to the appearance of tyrosine tRNA fragments upon oxidative stress. Based upon these results, the Authors conclude that tyrosine tRNA depletion is part of a conserved stress-response pathway to regulate translation in a codon-based manner. **Major comments:** Comment 1: There is a considerable amount of data in this paper and the experiments are performed in a generally rigorous manner. Sufficient details are provided for reproducing the findings and all results have been provided to appropriate databases (RNA-Seq and ribosome profiling).

      Response: We thank the reviewer for the positive comments and feedback.

      Comment 2: The manuscript uses a probe against the 5' half of Tyrosine tRNA for Northern blotting. However, tRNA probes can be prone to cross-hybridization, especially with some tRNA isoacceptors being similar in sequence. Thus, the blots in Figure 2 and Supplemental Figures should be probed with an oligonucleotide against the 3' half of tRNA-Tyr. This will confirm the pre- and mature tRNA-Tyr bands detected with the 5' probe. Moreover, this will determine whether 3' tRNA-Tyr fragments accumulate.

      Response: We agree with the reviewer that 3’ tRNA-Tyr fragments might also accumulate. However, we disagree that any accumulation of the 3’ tRF would be relevant to our particular model, for multiple reasons. As supported by reviewer 1’s cross-comments, cross-hybridization between isoacceptors (GUA vs AUA) is unlikely, as Tyr-AUA could not even be detected by the initial 5’ tRF probe. Additionally, the Tyr-GUA sequences differ from Tyr-AUA, with no nucleotide alignment between them. Furthermore, extraction and sequencing of the 5’ tRF (Fig S6B) confirms the 5’ leader sequence unique to the pre-tRNA (also noted by reviewer 1). While the 3’ halves of many Tyr-GUA tRNAs are similar, we find selective binding of our RNA binding proteins only to the 5’ tRF. The 3’ tRF may play a role in binding to other proteins in cell regulatory pathways, but such experiments would be outside the scope of this study.

      Comment 3: The analysis of the proteomic and ribosome profiling experiments seem rather limited, or based upon what was presented in this manuscript. If additional analyses were performed, then they should be included as well, even if they yielded negative results. For example, the manuscript identifies 102 proteins that decrease after tRNA-Tyr depletion and YARS-depletion with a certain threshold of Tyr codon content. We realize the Authors were trying to find potential genes that are modulated under all three conditions. However, this does not provide information whether there is a relationship between a certain codon such as Tyr and protein abundance if only binning into two categories representing below and above a certain codon content. The Authors should plot the abundance change of each detected protein versus each codon and determine the correlation coefficient. This analysis is important for substantiating the conclusion of a codon-based system of specifically modulating transcripts enriched for certain codons. Otherwise, how could changes in tRNA-Tyr levels modulate codon-dependent gene expression if two different transcripts with the same Tyr codon content exhibit differences in translation? Moreover, this analysis should be performed with all the other codons as well.
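The per-codon correlation analysis requested here can be sketched as follows, assuming one in-frame CDS per gene and a per-gene log2 fold change in protein abundance. Pearson's r is used for simplicity; a rank-based coefficient (Spearman) and a multiple-testing correction across the 64 codons would be reasonable alternatives. Function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 64 DNA-alphabet codons (sense and stop).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def codon_frequencies(cds):
    """Fraction of in-frame codons in a CDS that are each of the 64 codons."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    return {c: counts[c] / len(codons) for c in CODONS}

def pearson(xs, ys):
    """Pearson correlation; returns NaN when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def per_codon_correlation(cds_by_gene, log2fc_by_gene):
    """Correlate each codon's frequency with protein log2 fold change across genes."""
    genes = sorted(set(cds_by_gene) & set(log2fc_by_gene))
    freqs = {g: codon_frequencies(cds_by_gene[g]) for g in genes}
    fc = [log2fc_by_gene[g] for g in genes]
    return {c: pearson([freqs[g][c] for g in genes], fc) for c in CODONS}

# Toy example: the more TAC codons a gene has, the more its protein level drops.
r = per_codon_correlation(
    {"g1": "TACTACGCA", "g2": "TACGCAGCA", "g3": "GCAGCAGCA"},
    {"g1": -2.0, "g2": -1.0, "g3": 0.0},
)
```

Codon frequencies within a gene are not independent of one another (they sum to 1), so a strong coefficient for one codon does not by itself isolate a causal codon.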

      Response: We have identified an error in our manuscript: the overlap identified 109 proteins, not 102 as previously reported. We apologize for this oversight. While the reviewer is correct that identifying codon-dependent changes for all 3500+ detected proteins would offer greater insight, our study was specifically focused on tyrosine because we observed this tRNA to become depleted and our experimental system modulated this specific tRNA. As for the second point, on the effects of Tyr tRNA levels on translation, we felt that the most rigorous course would be to assess causality, rather than association, for this tRNA and its codon in regulating a target gene. The only way to do this is to perform mutagenesis and reporter studies. Our codon-dependent reporter clearly shows a direct effect on translation in a tyrosine codon-dependent manner. As for differential translational regulation of two transcripts with the same Tyr codon content, the molecular mechanisms that could dictate these differences are unclear. The reviewer raises possibilities in the next comment, such as Tyr codons at the 5’ or 3’ ends or consecutive Tyr codons. These are all interesting hypotheses, and others in the field have devoted entire publications to understanding how and why codon interactions and positions impact translation (see Gamble et al., Cell 2016; Kunec and Osterrieder, Cell Reports 2016; Gobet et al., PNAS 2020). While these further analyses would be interesting, our current experimental data would be insufficient to properly address these questions. We have focused on a specific tRNA and its fragment, and demonstrated direct effects of the tRNA on the codon-dependent translation of a specific growth-regulating target gene, and of the tRNA fragment on the activity of the RNA binding protein it binds, with respect to that protein's regulon.

      We believe that these findings individually reveal causal roles for this tRNA and tRF in downstream gene regulation and collectively reveal a previously unappreciated post-transcriptional response. We hope the reviewer agrees with us that our studies are already extensive and that further analyses beyond this tRNA are outside the scope and focus of the current study.

      Comment 4: The Authors should provide the specific parameters used to calculate the median abundance of Tyr codons in a protein and the list of proteins containing higher than median abundance of Tyr codon content. Moreover, the complete list of 102 candidate genes should also be provided. This will allow one to determine what percentage of these Tyr-enriched proteins exhibited a decrease in levels. Moreover, is there anything special about these Tyr codon-enriched transcripts where they are affected at the level of translation but not the other Tyr-codon enriched transcripts? For example, are these transcripts enriched at the 5' or 3' ends for Tyr codons? Do these transcripts exhibit multiple consecutive Tyr codons? This deeper analysis would enrich the findings in this manuscript.

      Response: For the proteins identified by mass spectrometry and in the overlap shown in Fig 4A, Tyr codon abundance was calculated by dividing the number of Tyr residues by the total number of amino acids in each protein. For genes with multiple isoforms, the principal isoform (determined using ENSEMBL) was used for calculations. We are also happy to provide the entire list of proteins. Additionally, please see our response to comment 3 above. We wish to emphasize that the goal of identifying these proteins was to find downstream targets of this response for functional studies, which we have done. We have identified downstream genes that are modulated by this response and that regulate cell growth, consistent with the phenotype of the tRNA. We then demonstrated a direct, causal, tRNA-dependent codon-based response for a specific target gene using mutagenesis.

      While we agree that the additional analysis the reviewer requests, to determine what confers heightened translational sensitivity to this response, is interesting, we believe this is a challenging question for future studies. It is possible that enrichment of tyrosine codons at the 5’ or 3’ end, or a higher local concentration of tyrosine codons, could increase sensitivity. Ideally, one would have information on a larger set of proteins so that such challenging questions could be addressed with greater statistical support. Ultimately, the requested analyses go beyond our current work and would require further analyses and experiments before firm conclusions could be drawn. As the other reviewers state and this reviewer agrees, we have made the initial discovery of this tRNA fragmentation response and provided its mechanistic characterization. Future studies, which are beyond the scope of the current work, will undoubtedly further characterize features of this response.
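The Tyr abundance metric described in this response (number of Tyr residues divided by the total number of residues in each protein) and the above/below-median split can be sketched as below. The handling of a trailing stop symbol and the choice to place proteins exactly at the median in the lower group are assumptions, not details given in the manuscript; the toy sequences are hypothetical.

```python
from statistics import median

def tyr_fraction(protein_seq):
    """Fraction of residues that are tyrosine (Y) in a one-letter protein sequence."""
    seq = protein_seq.rstrip("*")  # drop a trailing stop symbol, if present
    if not seq:
        raise ValueError("empty protein sequence")
    return seq.count("Y") / len(seq)

def split_by_median(proteins):
    """Split proteins into above-median and at-or-below-median Tyr content."""
    fractions = {name: tyr_fraction(seq) for name, seq in proteins.items()}
    cutoff = median(fractions.values())
    high = sorted(n for n, f in fractions.items() if f > cutoff)
    low = sorted(n for n, f in fractions.items() if f <= cutoff)
    return cutoff, high, low

cutoff, high, low = split_by_median({"A": "MYYK", "B": "MYKK*", "C": "MKKK"})
```

With real data, the principal-isoform protein sequence for each gene would be supplied in place of the toy strings.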

      Comment 5: The ribosome profiling results are condensed into two panels of Figure 5E and 5F. We recommend the ribosome profiling experiment be expanded into its own figure with more extensive analysis and comparison beyond just looking at tRNA-Tyr. This could reveal insight into other codons that are impacted coordinately with Tyr codons and perhaps strengthen their conclusion. As an example of a more thorough analysis of ribosome profiling and proteomics, we point the Authors to this recent paper: Lyu et al. 2020 PLoS Genetics, https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008836

      Response: We thank the reviewer for their suggestion. We have expanded the analysis to include codon usage scatterplots across all codons for shTyr and shControl replicates (Fig S5D). The five most changed codons are labeled, with UAC, a tyrosine codon, being the most affected (red arrow). Consistent with our model, a tyrosine codon, when in the ribosomal A-site, is the most affected upon depletion of the corresponding tRNA. We have also edited the text to reflect this new analysis, which provides further evidence that ribosomal stalling might occur upon depletion of a given tRNA. The gray outline around the regression line represents the 95% confidence interval.

      Fig S5D

      Comment 6: Moreover, one would expect that the mRNAs encoding USP3, EPCAM and SCD would exhibit increased ribosome occupancy. Thus, the authors should at least provide relative ribosome occupancy information on these transcripts to provide evidence that the decrease in protein levels is indeed linked to ribosome pausing or stalling.

      Response: We would like to emphasize that resolving ribosome profiling data at the codon level for specific genes requires a high number of reads and replicates to draw accurate conclusions. There is an inherent level of stochasticity when mapping ribosome-protected fragments (RPFs) to specific genes; as a result, our analysis focused on Tyr-enriched vs. Tyr-low populations, as this analysis was appropriate for our sequencing depth and number of replicates. To conclusively make claims regarding ribosome pausing or stalling on specific genes, we would likely need more experimentation than is currently feasible. However, we are currently conducting the requested bioinformatic analysis and have promising preliminary transcript-level data supporting our model.

      Comment 7: The results with hnRNPA1 and SSB/La are extremely preliminary and simply show binding of tRNA fragments but no biological relevance. We realize that the Authors attempted to see if Tyr-tRNA fragments impacted RNA Pol III RNA but found no effect. A potential experiment would be to perform HITS-CLIP on H2O2-treated cells to see if stress-induced tRNA fragments bind to SSB/La or hnRNPA1. In this case, at least the Authors would link the oxidative stress results found in Figure 1 and 2 with La/SSB and hnRNPA1.

      Response: We agree with the reviewer that a tRF function was not established in the manuscript. We have therefore completed experiments examining mRNA stability of the hnRNPA1 regulon upon overexpression of the tRF, as well as upon inhibition of this Tyr-tRF using LNA (Fig 7E-F). Our data show that the hnRNPA1 regulon can be functionally regulated by the Tyr-tRF in an hnRNPA1-dependent manner. With tRF overexpression and RNAi-mediated depletion of hnRNPA1, a right shift in transcript stability is seen. Importantly, when we perform the converse experiment, inhibiting the tRF under the same RNAi-mediated reduction of hnRNPA1, we see a left shift. These complementary experiments indicate that the Tyr-tRF has a functional role when bound to hnRNPA1, modulating the hnRNPA1 regulon; they expand the scope of this manuscript and extend the pathway defined downstream of this tRNA fragmentation event.

      Fig 7E-F
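The "right shift" and "left shift" in transcript stability describe how the distribution of stability changes for the hnRNPA1 regulon compares with background transcripts. A common way to quantify such a shift is to compare empirical cumulative distributions, for example with a two-sample Kolmogorov-Smirnov statistic together with the sign of the median difference. The sketch below assumes per-transcript stability changes (e.g., log2 ratios of half-lives) have already been computed; the exact statistical procedure used for Fig 7E-F is not described here, and the names and toy values are hypothetical. For a p-value, scipy.stats.ks_2samp could be used in place of this stdlib-only statistic.

```python
from bisect import bisect_right
from statistics import median

def ecdf(values):
    """Return the empirical cumulative distribution function of `values`."""
    xs = sorted(values)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max vertical gap between ECDFs."""
    fa, fb = ecdf(a), ecdf(b)
    points = sorted(set(a) | set(b))
    return max(abs(fa(t) - fb(t)) for t in points)

def stability_shift(regulon_changes, background_changes):
    """KS distance plus direction of the shift (by difference in medians)."""
    d = ks_statistic(regulon_changes, background_changes)
    shift = median(regulon_changes) - median(background_changes)
    direction = "right" if shift > 0 else "left" if shift < 0 else "none"
    return d, direction

# Toy values: regulon transcripts become more stable than background.
d, direction = stability_shift([1.0, 2.0, 3.0], [-3.0, -2.0, -1.0])
```

A right shift here means regulon transcripts tend toward larger stability changes than background, matching the plain-language reading of a cumulative-distribution plot.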

      Comment 8: The manuscript concludes that "Tyrosyl tRNA-GUA fragments are generated in a DIS3L2-dependent manner" based upon data in Supplemental Figure S7. However, there is still a substantial amount of tyrosine tRNA fragments in both worms and human cells depleted of DIS3L2. Thus, DIS3L2 could play a role in the formation of Tyrosine tRNA fragments but it is too strong a claim to say that tRNA fragments are "dependent" upon DIS3L2. We suggest that the Authors soften their conclusions.

      Response: While there are certainly tRFs still apparent with DIS3L2 depletion (Fig S7F-I), we note significant impairment of tRF induction with DIS3L2 knockdown/knockout with multiple different methods in C. elegans and human cells. This data supports our conclusion that tRF generation is dependent on DIS3L2 as this ribonuclease is necessary to elicit the full Tyr-tRF response. We do not make claims that Tyr-tRFs are solely or completely dependent on DIS3L2. There must be other RNases involved given the data highlighted by the reviewer. To this point, we have added clarifying text that DIS3L2 depletion does not completely eliminate the tRF induction.

      Comment 9: Moreover, what is the level of DIS3L2 depletion in the worm and human cell lines? The Authors should provide the immunoblot of DIS3L2 that was described in the Materials and Methods.

      Response: An immunoblot of DIS3L2 depletion in human cells has now been added as a supplementary figure (Fig S7I). Depletion in C. elegans was confirmed by sequencing the mutation, as is standard in the field. The wild-type PCR product is 1 nt longer (859 bp) than the mutant product (858 bp), with a CTC-to-TAG nonsynonymous substitution immediately preceding a single-nucleotide deletion.

      Wild-type disl-2: GTTGAAGCCGCAGGGC[CTC]ACTCAGACAGCTACAGG

      disl-2 (syb1033): GTTGAAGCCGCAGGGC[TAG]-CTCAGACAGCTACAGG

      Fig S7I
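As a small worked check of the allele description above, the two excerpts can be written as an aligned pair and compared position by position. This is a hypothetical helper, not part of the authors' methods. It covers only the short excerpts shown (the 859/858 bp sizes refer to the full PCR products), and whether the TAG triplet is in frame depends on the disl-2 reading frame, which the excerpt does not establish.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def compare_aligned(ref, alt):
    """Classify differences between two equal-length aligned sequences ('-' = gap).

    Returns 0-based alignment positions of substitutions, deletions (gap in alt)
    and insertions (gap in ref).
    """
    if len(ref) != len(alt):
        raise ValueError("aligned sequences must have equal length")
    subs, dels, ins = [], [], []
    for i, (a, b) in enumerate(zip(ref, alt)):
        if a == b:
            continue
        if b == "-":
            dels.append(i)
        elif a == "-":
            ins.append(i)
        else:
            subs.append(i)
    return subs, dels, ins

# Excerpts from the response with brackets removed; '-' marks the deleted base.
wt = "GTTGAAGCCGCAGGGC" + "CTC" + "ACTCAGACAGCTACAGG"
mut = "GTTGAAGCCGCAGGGC" + "TAG" + "-CTCAGACAGCTACAGG"

subs, dels, ins = compare_aligned(wt, mut)
length_difference = len(wt) - len(mut.replace("-", ""))
creates_stop_triplet = mut[16:19] in STOP_CODONS  # reading frame not checked here
```

The comparison recovers a three-base substitution directly followed by a one-base deletion, which accounts for the 1 nt size difference between the products.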

      Comment 10: The key conclusions of "a tRNA-regulated growth suppressive oxidative stress response pathway" and an "underlying adaptive codon-based gene regulatory logic inherent to the genetic code" are overstated. This is because of the major caveat that knockdown of tyrosine-tRNA or tyrosyl-tRNA synthetase are likely to trigger numerous indirect effects. While the authors validate that three proteins are expressed at lower levels under all three conditions (H2O2, tRNA-Tyr and YARS), they might overlap in some manner but not necessarily define a coordinated response. Thus, a glaring gap in this paper is a clear, mechanistic link between H2O2-induced changes in translation versus the changes in expression when either tRNA-Tyr or YARS is depleted. Thus, it is too preliminary to conclude that tRNA depletion is part of a "pathway" and "regulatory logic" when it could all be pleiotropic effects. At the very least, the authors should discuss the possibility of indirect effects to provide a more nuanced discussion of the results obtained using two different cell systems and oxidative stress.

      Response: We thank the reviewer for the feedback. While we agree that indirect effects may exist, we do not claim that our pathway is the only one required for these translational effects. The text for Fig 4A already acknowledges the pleiotropic effects of tRNA depletion. Our data show that H2O2 stress leads to depletion of Tyr tRNA-GUA and that depletion of this tRNA through multiple complementary methods has a codon-dependent effect on protein expression. We hope the reviewer agrees that the reduction of a specific target gene in a tyrosine codon-dependent manner (demonstrated by mutagenesis), together with the direct binding of the tRF to an RBP and the modulation of this RBP's regulon by the tRF (demonstrated by gain- and loss-of-function studies), demonstrates a direct role of this response on specific downstream target genes rather than pleiotropy. This is in keeping with the cross-comments of reviewer 1, noting that Fig 5D shows a direct Tyr codon link between H2O2 and downstream effects. As a result, we feel that our conclusion of a pathway (not the only pathway) is valid. However, the phrase “regulatory logic” might not be interpreted in the same way by all readers, and we have therefore changed the text to reflect a more nuanced position.

      **Minor comments:** Comment 11: Tyrosyl-tRNAs refers to the aminoacylated form of tRNA. We recommend that all instances of tyrosyl-tRNA be changed to tyrosine tRNA or tRNA-Tyr which is more generic and provides no indication as to the aminoacylation status of a tRNA.

      Response: We thank the reviewer for their correction. We have changed all instances of “tyrosyl” to “tyrosine” in the text.

      Comment 12: In Figure 5C, the promoter is drawn as T7, which is a bacteriophage promoter. While the plasmid used in this manuscript (psiCHECK2) does contain a T7 promoter, mammalian gene expression is driven from the SV40 promoter. Thus, the relevant label in Figure 5C should be "SV40 promoter". Moreover, additional details should be provided on how the construct was made (such as sequence information etc.).

      Response: We thank the reviewer for their correction. We have changed the promoter label in the figure. In the methods describing the construct, we have specified which USP3 sequence was used and would be happy to include further information if requested.

      Comment 13: Please provide original blots for each of the replicates in: Figure 4C, n=4; Figure 4A, n=9; Figure 4D, n=3; Figure 5D, n=3.

      Response: There appears to be an unintentional mislabeling of the requested blots by the reviewer. The original blots for Fig 4C, Fig 5A, Fig 5D, and Fig 6D have been made available in a separate file for reviewers.

      Reviewer #3 (Significance (Required)): This manuscript provides evidence that specific tRNAs are depleted upon oxidative stress as part of a conserved stress-response pathway in humans (and worms) to regulate translation in a codon-based manner. Unfortunately, the manuscript attempts to tie together results from different conditions and systems without providing any definitive links that suggest a "pathway" involved in the oxidative stress response. The findings in this paper provide a useful starting point but fall short of being a major advance due to the lack of a clear mechanism. However, there are intriguing results in this manuscript based upon the cell lines depleted of tRNA-Tyr or tyrosine synthetase that could interest researchers in the field of tRNA biology.

      Response: We thank the reviewer for the positive comments regarding our demonstration of a conserved stress response, and for acknowledging that our intriguing findings will be a starting point for future studies and will be of interest to researchers in the field of tRNA biology. We hope that the very positive comments of reviewers 1 and 2; the cross-comments of reviewer 1 in response to reviewer 3’s comments regarding the specificity of this response; our inclusion, for reviewer 3, of additional data on the function of the tRF in regulating the activity of the hnRNPA1 RNA binding protein, which defines a post-transcriptional pathway; and the additional requested codon-level computational analyses provide compelling support that our findings indeed represent a major advance for the field.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      The major findings in this manuscript are: 1.) Oxidative stress in human cells causes a decrease in tyrosine tRNA levels and accumulation of tyrosine tRNA fragments; 2.) The depletion of tyrosyl-tRNA synthetase or tyrosine tRNAs in human cells results in altered translation of certain genes and reduced cell growth and 3.) hnRNPA1 and SSB/La can bind tyrosine tRNA fragments. There is also preliminary evidence that the DIS3L2 endonuclease contributes to the appearance of tyrosine tRNA fragments upon oxidative stress. Based upon these results, the Authors conclude that tyrosine tRNA depletion is part of a conserved stress-response pathway to regulate translation in a codon-based manner.

      Major comments:

      •There is a considerable amount of data in this paper and the experiments are performed in a generally rigorous manner. Sufficient details are provided for reproducing the findings and all results have been provided to appropriate databases (RNA-Seq and ribosome profiling).

      •The manuscript uses a probe against the 5' half of Tyrosine tRNA for Northern blotting. However, tRNA probes can be prone to cross-hybridization, especially with some tRNA isoacceptors being similar in sequence. Thus, the blots in Figure 2 and Supplemental Figures should be probed with an oligonucleotide against the 3' half of tRNA-Tyr. This will confirm the pre- and mature tRNA-Tyr bands detected with the 5' probe. Moreover, this will determine whether 3' tRNA-Tyr fragments accumulate.

      •The analysis of the proteomic and ribosome profiling experiments seem rather limited, or based upon what was presented in this manuscript. If additional analyses were performed, then they should be included as well, even if they yielded negative results. For example, the manuscript identifies 102 proteins that decrease after tRNA-Tyr depletion and YARS-depletion with a certain threshold of Tyr codon content. We realize the Authors were trying to find potential genes that are modulated under all three conditions. However, this does not provide information whether there is a relationship between a certain codon such as Tyr and protein abundance if only binning into two categories representing below and above a certain codon content. The Authors should plot the abundance change of each detected protein versus each codon and determine the correlation coefficient. This analysis is important for substantiating the conclusion of a codon-based system of specifically modulating transcripts enriched for certain codons. Otherwise, how could changes in tRNA-Tyr levels modulate codon-dependent gene expression if two different transcripts with the same Tyr codon content exhibit differences in translation? Moreover, this analysis should be performed with all the other codons as well.

      •The Authors should provide the specific parameters used to calculate the median abundance of Tyr codons in a protein, as well as the list of proteins with higher than median Tyr codon content. Moreover, the complete list of 102 candidate genes should also be provided. This would allow one to determine what percentage of these Tyr-enriched proteins exhibited a decrease in levels. Furthermore, is there anything that distinguishes the Tyr codon-enriched transcripts affected at the level of translation from the other Tyr codon-enriched transcripts that are not? For example, are Tyr codons enriched toward the 5' or 3' ends of these transcripts? Do these transcripts contain multiple consecutive Tyr codons? This deeper analysis would enrich the findings in this manuscript.

      •The ribosome profiling results are condensed into two panels of Figure 5E and 5F. We recommend the ribosome profiling experiment be expanded into its own figure with more extensive analysis and comparison beyond just looking at tRNA-Tyr. This could reveal insight into other codons that are impacted coordinately with Tyr codons and perhaps strengthen their conclusion. As an example of a more thorough analysis of ribosome profiling and proteomics, we point the Authors to this recent paper: Lyu et al. 2020 PLoS Genetics, https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008836

      •Moreover, one would expect that the mRNAs encoding USP3, EPCAM and SCD would exhibit increased ribosome occupancy. Thus, the authors should at least provide relative ribosome occupancy information on these transcripts to provide evidence that the decrease in protein levels is indeed linked to ribosome pausing or stalling.

      •The results with hnRNPA1 and SSB/La are extremely preliminary: they show binding of tRNA fragments but do not establish any biological relevance. We realize that the Authors tested whether Tyr-tRNA fragments affect RNA Pol III transcripts and found no effect. One potential experiment would be to perform HITS-CLIP on H2O2-treated cells to determine whether stress-induced tRNA fragments bind to SSB/La or hnRNPA1. This would at least link the oxidative stress results in Figures 1 and 2 with La/SSB and hnRNPA1.

      •The manuscript concludes that "Tyrosyl tRNA-GUA fragments are generated in a DIS3L2-dependent manner" based upon data in Supplemental Figure S7. However, a substantial amount of tyrosine tRNA fragments remains in both worms and human cells depleted of DIS3L2. Thus, DIS3L2 could play a role in the formation of tyrosine tRNA fragments, but it is too strong a claim to say that tRNA fragments are "dependent" upon DIS3L2. We suggest that the Authors soften their conclusions.

      •Moreover, what is the level of DIS3L2 depletion in the worm and human cell lines? The Authors should provide the immunoblot of DIS3L2 that was described in the Materials and Methods.

      •The key conclusions of "a tRNA-regulated growth suppressive oxidative stress response pathway" and an "underlying adaptive codon-based gene regulatory logic inherent to the genetic code" are overstated. The major caveat is that knockdown of tyrosine tRNA or tyrosyl-tRNA synthetase is likely to trigger numerous indirect effects. While the authors validate that three proteins are expressed at lower levels under all three conditions (H2O2, tRNA-Tyr and YARS depletion), these conditions might overlap in some manner without necessarily defining a coordinated response. A glaring gap in this paper is therefore a clear, mechanistic link between H2O2-induced changes in translation and the changes in expression when either tRNA-Tyr or YARS is depleted. Consequently, it is premature to conclude that tRNA depletion is part of a "pathway" and "regulatory logic" when the observations could all reflect pleiotropic effects. At the very least, the authors should discuss the possibility of indirect effects to provide a more nuanced discussion of the results obtained using two different cell systems and oxidative stress.

      Minor comments:

      •Tyrosyl-tRNAs refers to the aminoacylated form of tRNA. We recommend that all instances of tyrosyl-tRNA be changed to tyrosine tRNA or tRNA-Tyr which is more generic and provides no indication as to the aminoacylation status of a tRNA.

      •In Figure 5C, the promoter is drawn as T7, which is a bacteriophage promoter. While the plasmid used in this manuscript (psiCHECK2) does contain a T7 promoter, mammalian gene expression is driven from the SV40 promoter. Thus, the relevant label in Figure 5C should be "SV40 promoter". Moreover, additional details should be provided on how the construct was made (such as sequence information etc.).

      •Please provide original blots for each of the replicates in:

      Figure 4C, n=4

      Figure 4A, n=9

      Figure 4D, n=3

      Figure 5D, n=3

      Significance

      This manuscript provides evidence that specific tRNAs are depleted upon oxidative stress as part of a conserved stress-response pathway in humans (and worms) that regulates translation in a codon-based manner. Unfortunately, the manuscript attempts to tie together results from different conditions and systems without providing any definitive links that suggest a "pathway" involved in the oxidative stress response. The findings in this paper provide a useful starting point but fall short of being a major advance due to the lack of a clear mechanism. However, there are intriguing results in this manuscript based upon the cell lines depleted of tRNA-Tyr or tyrosyl-tRNA synthetase that could interest researchers in the field of tRNA biology.

      This review is written from the perspective of a researcher with expertise in RNA processing, RNA biology and translation regulation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This very interesting study from Sohail Tavazoie's lab describes the consequences of oxidative stress on the tRNA pool in human epithelial cell lines. As previously described, the authors observed that tRNA fragments were generated upon exposure of cells to ROS. In addition, the authors made the novel observation that specific mature tRNAs were also depleted under these conditions. In particular, the authors focused on tyrosyl tRNA-GUA, which was decreased ~50% after 24 hours of ROS exposure, an effect attributable to a decrease in the pre-tRNA pool. Depletion of tyrosyl tRNA resulted in reduced translation of specific mRNAs that are enriched in tyr codons and likely contributed to the anti-proliferative effects of ROS exposure. In addition, the authors demonstrated that the tRFs produced from tyr tRNA-GUA can interact with specific RNA binding proteins (SSB and hnRNPA1).

      The major contribution of this paper is the novel finding that stress-induced tRNA fragmentation can result in a measurable reduction of specific mature tRNAs, leading to a selective reduction in translation of mRNAs that are enriched for the corresponding codons. Previously, studies of tRNA fragmentation largely focused on the functions of the tRFs themselves and it was generally believed that the mature tRNA pool was not impacted sufficiently to reduce translation. The findings reported here therefore add a new dimension to our understanding of the cellular consequences of stress-induced tRNA cleavage.

      Overall, the data are of high quality, the experiments are convincing, and the conclusions are well supported. I have the following suggestions that would further strengthen the study and bolster the conclusions.

      1. The authors have not formally demonstrated that the reduction in pre-tRNA in H2O2-treated cells is a consequence of pre-tRNA cleavage. It is possible that reduced transcription contributes to this effect. Pulse-chase experiments with labelled nucleoside analogues such as 5-ethynyl uridine (EU) would provide a tractable approach to demonstrate that a labelled pool of pre-tRNA is rapidly depleted upon H2O2 treatment, which would further support their model. Since the response occurs rapidly (within 1 hour), it would be feasible to monitor the rate of pre-tRNA depletion during this time period in control vs. H2O2-treated cells.

      2. To what extent is the growth arrest that results from H2O2 treatment attributable to tyr tRNA-GUA depletion (Fig. 3A)? Since the reduction in tRNA levels is only partial (~50%), it should be feasible to restore tRNA levels by overexpression (strategy used in Fig. 3E, S3B) and determine whether this measurably rescues growth in H2O2-treated cells.

      3. Knockdown of YARS/tyr tRNA-GUA resulted in reduced expression of EPCAM, SCD, and USP3 at both the protein and mRNA levels (Fig. 4C-D, S4C). In contrast, H2O2 exposure reduced the abundance of these proteins without affecting mRNA levels (Fig. 5A-B, S5A). The authors should comment on this apparent discrepancy. Perhaps translational stalling induces No-Go decay, but it is unclear why this response would not also be triggered by ROS.

      4. In addition to the analyses of ribosome profiling in Fig. 5E-F, it might also be helpful to show a metagene analysis of ribosome occupancy centered upon UAC/UAU codons (for an example, see Figure 2 of Schuller et al., Mol Cell, 2017). This has previously been used as an effective way to visualize ribosome stalling at specific codons. Additionally, do the authors see a global correlation between tyrosine codon density and reduced translational efficiency in tRNA knockdown cells?

      5. MINOR: On pg. 4, the authors state that tRF-tyrGUA is the most highly induced tRF, but Fig. S1B appears to show stronger induction of tRF-LeuTAA.

      Significance

      The major advance provided by this work is the demonstration that stress-induced tRNA cleavage can reduce the abundance of the mature tRNA pool sufficiently to impact translation. Moreover, the effect on mature tRNAs is selective, resulting in the reduced translation of a specific set of mRNAs under these conditions. These findings reveal previously unknown consequences of oxidative stress on gene expression and will be of interest to scientists working on cellular stress responses and post-transcriptional regulation.

      REFEREES CROSS-COMMENTING

      Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly?

      Here is what I was thinking: The generation of tRFs does not generally result in reduction in levels of the mature tRNAs. So you can imagine a scenario where oxidative stress causes tRF generation from the mature tyr tRNA (which does not impact its steady-state levels), as is the case for other tRNAs. At the same time, decreased transcription would reduce the pre-tRNA pool, leading to a delayed reduction in mature tRNA, as observed.

      However, looking back at the data, I see that after only 5 min of H2O2 treatment, the authors observed reduced pre-tRNA and increased tRFs (Fig. 2A). This seems very fast for a transcriptional response, which would presumably require some kind of signal transduction. In addition, when you consider the amount of tRFs produced in Fig. S2C, it is hard to imagine that this would not impact the mature tRNA pool if they were derived from there. So I agree that the transcriptional scenario seems unlikely.

      Nevertheless, I think that looking at pre-tRNA degradation directly with the pulse-chase strategy would strengthen their story, so I would like to give the authors this suggestion. However, I am fine with listing this as an optional experiment which would enhance the paper but should not be essential for publication.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Huh et al. reports that oxidative stress causes fragmentation of a specific tyrosine pre-tRNA, leading to two parallel outcomes. First, the fragmentation depletes the mature tRNA, causing translational repression of genes that are disproportionately rich in tyrosine codons. These genes are enriched for those involved in the electron transport chain, cell cycle and growth. Second, the fragmentation generates tRNA fragments (tRFs) that bind to two known RNA binding proteins. Finally, the authors identify a nuclease that is needed for efficient formation of tyrosine tRFs.

      The authors should include a simple diagram indicating the various known steps of pre-tRNA fragmentation (perhaps as a supplement) for general readers.

      I find the enrichment for mitochondrial electron transport chain (ETC) curious. The ETC includes several oxidoreductases, which may be rich in tyrosine as it is a common amino acid used in electron transfer. The depletion of the tyrosine tRNA from among many tRNAs under oxidative stress may not be incidental but related to an attempt by the cell to decrease oxygen consumption to avoid further oxidative damage. The authors could further mine their data to corroborate this hypothesis. For example, are the ETC genes among the targets of the RNA binding proteins targeted by tyrosine tRFs? This could potentially connect the effects of mature tRNA depletion and tRFs.

      In figure 4A, the authors should provide the tyrosine codon content of the overlap genes and show how much it differs from a randomly selected sample.
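
      One way to carry out the comparison suggested above is an empirical test: compute the mean tyrosine codon fraction (TAT/TAC in the DNA coding sequence) of the overlap genes and compare it with that of many equally sized random samples drawn from a background set. The minimal Python sketch below uses hypothetical function names and assumes in-frame coding sequences keyed by gene name; treating every gene in the input as the background (ideally all detected proteins) is an assumption, and the returned p-value is one-sided with an add-one correction.

```python
import random
from statistics import mean

TYR_CODONS = {"TAT", "TAC"}  # DNA equivalents of UAU/UAC


def tyr_fraction(cds: str) -> float:
    """Fraction of codons in an in-frame CDS that encode tyrosine."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return sum(c in TYR_CODONS for c in codons) / len(codons) if codons else 0.0


def random_sample_test(cds_by_gene: dict, overlap_genes: list,
                       n_draws: int = 10_000, seed: int = 0):
    """Compare the mean Tyr codon fraction of overlap genes with random samples.

    Every gene in cds_by_gene is treated as the background (an assumption).
    Returns (observed mean fraction, one-sided empirical p-value).
    """
    rng = random.Random(seed)
    background = list(cds_by_gene)
    fractions = {g: tyr_fraction(s) for g, s in cds_by_gene.items()}
    observed = mean(fractions[g] for g in overlap_genes)
    k = len(overlap_genes)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(background, k)
        if mean(fractions[g] for g in sample) >= observed:
            hits += 1
    # add-one correction keeps the empirical p-value above zero
    return observed, (hits + 1) / (n_draws + 1)
```

      Reporting both the observed mean and the distribution of sampled means would show how much the overlap genes differ from a randomly selected set.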

      Fig.6F, lower panel: the model should show pre-tRNA, as opposed to mature tRNA, because it is the former that is fragmented.

      Significance

      This study is comprehensive and novel, and includes several orthogonal and complementary approaches to provide convincing evidence for the conclusions. The main discovery is significant because it presents an important advance in post-transcriptional control of gene expression. The process of tRF formation was previously thought not to affect the levels of mature tRNA. This study changes that understanding by describing for the first time the depletion of a specific mature tRNA as its precursor form is fragmented to generate tRFs. Finally, the authors identify DIS3L2 as a nuclease involved in fragmentation. This is also an important finding as the only other suspected nuclease, albeit with contradictory evidence, is angiogenin. Collectively, the findings of this study would be of interest to a broad group of scientists. I only have a few minor comments and suggestions (see above).

      REFEREES CROSS-COMMENTING

      I have the following comments on other reviewers' critiques.

      Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly?

      Reviewer 3 raises the issue of cross hybridization in Northern blots. The authors indicate that they "could not detect the other tyrosyl tRNA (tRNA Tyr AUA) in MCF10A cells by northern blot..." (page 6). Also, they gel extracted tRFs and sequenced them (figure S6B), directly identifying the fragments. I think these findings mitigate the concern of cross hybridization and clearly identify the nature of tRFs.

      Finally, I think that the codon-dependent reporter experiment (figure 5D) addresses many issues surrounding codon-dependent vs. indirect effects. In that experiment, the authors mutate 5 tyrosine codons of a reporter gene and demonstrate that the encoded protein is less susceptible to repression in response to oxidative stress.

    1. Reviewer #3:

      General comment: Marotel et al. present a detailed characterization of the phenotype and function of peripheral NK cells in patients with chronic hepatitis B. The cohorts are well designed and used in an appropriate way that makes the conclusions interesting. The manuscript is well written and the figures are easy to navigate. The supplementary information is relevant. Interesting parallels with T cell exhaustion mechanisms are made. A weakness may be the relative lack of selective, precise analysis of subsets (CD56bright vs. CD56dim, and maturation stratification), for example in the RNAseq, calcium, phosflow and mitochondrial experiments.

      Major comment 1: Figure 2 - The results appear to display total NK cells, which sometimes makes differences difficult to interpret. If possible, please provide in the supplement at least the phenotype of CD56bright, CD56dim NKG2A+ and CD56dim NKG2A- NK cells.

      Major comment 2: Figure 3 - Phosflow and mitochondrial analyses are always difficult to perform owing to technical issues such as efficient epitope detection, atypical fluorescence spillover, or the analysis of small shifts. For both techniques, to highlight the quality of the datasets, please provide representative histograms, positive and negative controls, and the gating strategy to further convince readers.

      Major comment 3: Figure 6 - Regarding calcium-related mechanisms - Mechanistic investigations could be extended to support current statements such as that highlighted in the abstract, "when stimulating Ca2+-dependent pathway in isolation, we recapitulated the dysfunctional phenotype" (based on n=3, total NK cells from healthy individuals). Cells from patients could also be investigated. In addition to the ionomycin treatment performed, calcium flux experiments in cells sorted according to the phenotypes described would have been elegant.

      Major comment 4: A large part of the manuscript relates to TOX and its involvement in exhaustion. However, a recent article (Sekine et al, Science immunology 2020) demonstrated that TOX is expressed by most circulating effector memory CD8+ T cell subsets and not exclusively linked to exhaustion.

      This is an important piece of work into which such data should be integrated, which may invite reinterpretation of the results and conclusions.

    2. Reviewer #2:

      In this manuscript, the Marcais laboratory defines the molecular basis of NK cell dysfunction in patients with hepatitis B. They use NK cells derived from the peripheral blood of Hep-B patients and healthy cohorts. The key finding is that NK cells derived from Hep-B patients were able to mediate cytotoxicity but were significantly impaired in producing inflammatory cytokines, including IFN-g. Employing phenotypic, functional, and transcriptomic analyses, the authors conclude that NFAT-mediated, Ca2+-dependent cellular exhaustion is the potential mechanism underlying the dysfunction of peripheral NK cells. This study provides new insights into the molecular mechanisms associated with NK cell dysfunction. However, addressing the following concerns could substantially improve the contribution of this work.

      1) Given significant differences between the published characteristics of T cell exhaustion and the authors' findings in this work, it is not fair to call them similar. This applies to both phenotypic and functional changes. For example, in multiple viral infection models, IFN-g production decreases in a step-wise manner as T cell exhaustion progresses. In the current work, the authors show a significant and complete reduction of IFN-g production in all the patients analyzed. Importantly, in T cell exhaustion the number of T cells that produce multiple cytokines such as IFN-g and TNF-a is reduced. However, these two cytokines do not appear to be concurrently reduced in Hep-B patients. Another difference is that NK cells from Hep-B patients are able to mediate normal cytotoxicity against K562 cells, whereas exhausted T cells are impaired in this effector function. While NK cells in Hep-B patients may indeed be undergoing exhaustion, it may not be fair to equate this phenomenon with T cell exhaustion.

      2) The link that the authors propose between mTOR-S6 signaling and NK cell exhaustion is not clear. The reduction in AKT phosphorylation is significant but moderate. Is this physiologically relevant? Is the alternative pathway mediated by PIM kinases the one primarily affected in NK cells from Hep-B patients?

      3) Apart from NFAT, T-bet, BATF, EOMES, FOXO1, BLIMP1, and IRF4 have been implicated in playing a significant role in causing T cell exhaustion. What are the reasons that the gene signatures representing these transcription factors did not come through from the RNA sequencing analyses?

      4) It is not clear how treatment with a high concentration of ionomycin can mimic NK cell exhaustion that develops over months or years. Theoretically, a transient calcium overload cannot be what initiates TOX expression and leads to NK cell exhaustion. NFAT/calcineurin could play a role in the development of NK cell exhaustion. However, over-activation of NK cells from healthy controls does not prove that this mechanism causes the pathological outcome.

    3. Reviewer #1:

      Marotel et al. study the mechanisms of NK cell exhaustion in patients with chronic hepatitis B infection (CHB). They first confirm several previous findings, such as reduced IFNg production by NK cells accompanied by a change in phenotype in CHB patients. Furthermore, they show that mTOR activation is impaired in CD56bright NK cells upon IL-15 stimulation, whereas total NK cells do not show differences in selected metabolic parameters. They also performed RNAseq analysis, which indicated transcriptional similarities between CHB NK cells and exhausted CD8+ T cells. In line with the RNAseq data, CHB NK cells showed increased expression of the transcription factor TOX and the inhibitory receptor LAG3. The authors suggest that this is due to NFAT signaling and, to support their hypothesis of NFAT involvement, show that NK cells previously stimulated overnight with ionomycin have a reduced ability to produce IFNg following incubation with target cells.

      In conclusion, while presented observations are interesting and relevant, they are still preliminary and largely descriptive. In addition, conclusions are not fully supported by the data.

      1) Figure 3. The authors focus on CD56bright NK cells when measuring mTOR activation, as CD56bright NK cells are more responsive to IL-15. They show that in HBV patients CD56bright NK cells have impaired mTOR activation in response to IL-15. They correlate this finding with several metabolic parameters in total NK cells. Since CD56bright NK cells represent only a small fraction of NK cells, it is not clear why the metabolic parameters were not also analyzed in the CD56bright population or, vice versa, why total NK cells were not compared in both cases (mTOR activation and metabolic characteristics). In its current state, no conclusion can be reached by comparing these two sets of data. Also, it is not clear whether the reduced ability of cells to activate mTOR upon IL-15 stimulation contributes to the other observations presented, e.g. whether this finding would explain the reduced ability of NK cells to produce IFNg, or the changes in NK cell phenotype or transcriptome.

      2) Several metabolic parameters are studied, however, it is not clear how they were selected as there are many other metabolic processes involved in NK cell response which could be important and deregulated in CHB. In addition, only basal metabolic state was analyzed, but it remains unclear if CHB NK cells show the same metabolic characteristics upon activation.

      3) Figure 5 - Isotype controls are missing in all histograms. The authors state in the text 'Increased TOX expression was seen mainly in the CD56dim subset in CHB patients.'; however, they do not provide data for this statement. As mentioned previously, the effects of CHB on NK cell mTOR signaling are strongest in the CD56bright population, so it is not clear how these data relate to each other.

      4) The authors provide evidence that expression of the transcription factor TOX is increased and T-bet expression is reduced, to support the transcriptome data on the similarity of CHB NK cells and exhausted CD8+ T cells. However, they do not provide evidence of co-expression of these transcription factors, or of whether their changed expression directly correlates with reduced functional properties of NK cells, e.g. whether NK cells with high TOX and low T-bet produce less IFNg.

      5) To address their hypothesis on NFAT involvement in NK cell exhaustion and TOX expression, the authors stimulate NK cells in vitro with ionomycin and show that pretreatment with ionomycin renders NK cells hyporesponsive. They titrate the effect of ionomycin and identify a concentration that reduces the IFNg response without affecting degranulation. While this reduction of the IFNg response resembles that observed in chronic HBV infection, the model should be validated before making any claims. For example, the phenotype and transcriptional profile of the ionomycin-treated cells should be analyzed, as well as the expression of transcription factors. A similar experiment has been published previously, so the novelty is minor without additional experiments addressing the issues mentioned above.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      This is a case report analysing TCR repertoires in two individuals with suspected COVID-19 infection. The report shows that one set of TCR sequences expands between day 15 and day 30/37 and another set contracts. The amount of expansion/contraction is not clearly shown. Most of these sequences are found in the memory phenotype. A few (especially CD4) are found before immunisation. As the authors point out, the evidence that the TCRs recognise COVID-19 is purely circumstantial. Even if they do, I do not see that this study contributes significantly to understanding either the protective or the pathological immune response to COVID-19.

      Substantive concerns:

      1) The abstract includes unsubstantiated claims. For example "T cell response is a critical part of both individual and herd immunity to SARS-CoV-2 and the efficacy of developed vaccines. " Or "In both donors we identified SARS-CoV-2-responding CD4+ and CD8+ T cell clones. We describe characteristic motifs in TCR sequences of COVID- 19-reactive clones, suggesting the existence of immunodominant epitopes." The authors do not identify COVID-19 responding clones; nor do they show any evidence that there are immunodominant epitopes.

      2) Fig 1 What does "normalized trajectory of TCR clones in each cluster" mean? It would be interesting to see the magnitude of the responses. Similarly, I don't really understand the y axis in panels d and e.

      3) Fig 3. I don't understand panels a and b. Is this the proportion of contracting TCR sequences that are of memory phenotype? If so, what are the rest? Or are they simply not captured? The figure legend is obscure.

    2. Reviewer #2:

      This manuscript describes a longitudinal study of TCR repertoires in two individuals with mild COVID-19. TCRalpha and beta repertoires at 4 time points post-infection are used to identify T cell clonotypes likely responding to COVID-19. These responding clones fall into two groups, a set of monotonically contracting clones and a set of clones whose frequencies peak (at day ~37) and then contract. Sequencing of memory populations at two time points and availability of TCR repertoire data from both individuals prior to infection allow the authors to map clonotypes to memory phenotypes and to identify a handful of responding clones that existed in the memory compartment prior to infection. Clusters of sequence-similar clonotypes are identified that suggest focused responses to immunodominant epitopes. This is a succinct and timely study and I have no major concerns, just a few minor questions/suggestions/typos detailed below.

      How unexpected is the TCR clustering evident in Fig 2d-g? For example, how much clustering would be seen if the same number of sequences with equally high Pgen were selected at random? I wonder whether the authors could run ALICE on just the responding clones (not the full dataset) to assess which neighborhoods are very unlikely to occur by chance.
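
      One simple way to frame this question is an empirical null: count the pairs of responding CDR3 sequences that differ at a single amino acid, then compare that count with the same statistic in many random draws of equal size from the full repertoire. The minimal Python sketch below assumes CDR3 amino acid sequences as strings and uses hypothetical function names. Unlike the Pgen-matched comparison suggested above (or ALICE itself), it draws background sequences uniformly and does not match on generation probability; in practice that matching would need to be added, for example by sampling within Pgen bins.

```python
import random


def is_one_mismatch(a: str, b: str) -> bool:
    """True if two CDR3s have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighbor_pairs(seqs: list) -> int:
    """Number of unordered pairs of sequences that are one-mismatch neighbors."""
    return sum(
        is_one_mismatch(seqs[i], seqs[j])
        for i in range(len(seqs))
        for j in range(i + 1, len(seqs))
    )


def clustering_null(responding: list, repertoire: list,
                    n_draws: int = 1000, seed: int = 0):
    """Observed neighbor-pair count and a one-sided empirical p-value.

    Background draws are uniform over the repertoire (NOT Pgen-matched).
    """
    rng = random.Random(seed)
    observed = neighbor_pairs(responding)
    k = len(responding)
    null_counts = [neighbor_pairs(rng.sample(repertoire, k)) for _ in range(n_draws)]
    hits = sum(count >= observed for count in null_counts)
    # add-one correction keeps the empirical p-value above zero
    return observed, (hits + 1) / (n_draws + 1)
```

      The pairwise loop is quadratic in the number of responding clones, which should remain tractable for the clone numbers reported here.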

      Could the "computational chain pairing" method of Minervina et al be applied to this data? If only to try to connect some of the sequence motifs between the alpha and beta chains?

    3. Reviewer #1:

      General assessment: This work investigates the T cell receptor (TCR) repertoires of 2 individuals diagnosed with mild COVID-19 infection. The authors use high-throughput sequencing of 2 biological replicate samples obtained at each of multiple pre-infection and post-infection timepoints to identify TCRalpha and TCRbeta clonotypes that contract or expand post-infection and to investigate potential reactivation of pre-existing memory cells. This is a potentially interesting work that may provide novel insights into T cell responses to SARS-CoV-2. However, some of the specific details of the various analyses reported are not clear and I have several major concerns about the reported work.

      Substantive concerns:

      1) The primary concern is the TCR specificity of the clonotypes that were determined to be contracting or expanding post-SARS-CoV-2 infection and therefore identified as responding to or reactive to SARS-CoV-2. There is no verification that these expanding or contracting clonotypes have TCR specificity for SARS-CoV-2. One alternative possibility is that some, maybe even many, of these expanding or contracting clonotypes are bystander-activated T cells with TCRs that are not specific for SARS-CoV-2. Similarly, the clonotypes that were identified as contracting or expanding post-SARS-CoV-2 infection and also detected in the memory pool prior to SARS-CoV-2 infection may not be cross-reactive (i.e. specific for another infection + SARS-CoV-2), as suggested by the authors, but rather non-SARS-CoV-2-specific bystander-activated memory T cells.

      While the dynamics of the T cell populations following SARS-CoV-2 infection may be informative regardless of the mode of activation of the T cells (i.e. TCR-mediated vs. bystander activated), the reported TCR clonotype motifs are only informative if these TCRs have SARS-CoV-2 specificity.

      2) Another concern is the substantial variation between the various approaches used to identify the contracting and expanding clonotypes post-infection that are associated with COVID-19 infection. The manuscript text states that the EdgeR and NoiseET approaches for identifying expanding and contracting clonotypes yielded similar results. Fig. S4a, d suggest that the two approaches yield similar trajectories for the identified expanding and contracting clonotype subsets (i.e. fraction of reactive clonotypes). However, the Venn diagrams in Fig. S4b, c, e, f show that the two approaches are, in some cases, identifying substantially different subsets of expanding or contracting clonotypes. For example, for Donor M in Fig S4f, of the 1044 expanded clonotypes identified by NoiseET, only 478 were also identified by EdgeR.

      The text also states that the contracting and expanding clonotypes identified using EdgeR largely correspond to clusters 2 and 3 of the clonal trajectories obtained using PCA (Fig. 1b-e), but no quantitative evidence is provided to support this. Venn diagrams, similar to those in Fig. S4, could be provided that compare the expanding and contracting clonotypes identified using the three different approaches (i.e. EdgeR, NoiseET, and PCA) as applied to TCRa as well as TCRb clonotypes.

      While these differences between methods may not have significant consequences for some of the reported results (e.g., temporal clonal trajectories), they raise concerns about the results that depend on specific clonotype sequences (e.g., Fig. 2d-g, Fig. S8, and Fig. S5d-g, which report amino acid motifs for contracting and expanding clonotypes).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Author Response

      Reviewer #1:

      This study was designed to determine whether there is a relationship among cranial suture closure patterns, the molecular causes of suture patency/closure, and phylogeny. The authors use correlative data to test causal hypotheses related to brain size, suture closure patterns, and diet, and they search for the genetic underpinnings of the relationships they identify using reference genomes. Many of the ideas put forward and methods used are not clearly explained in the body of the work or in the supplementary material. This made it difficult to provide a clear evaluation of the work. Even after checking the original sources on which they base their approach, I found some disconnect between those sources and the ideas laid out here. I see some interesting ideas in the study, but also a lack of solid reasoning behind the hypotheses proposed, confusion about the data and/or ideas summarized from the literature (the confusion could be on my part, but it rests with the authors to explain this more fully), and a lack of detail regarding the methods used to support their conclusions.

      We take note of this confusion and will explain everything in more detail in a revised version of the manuscript.

      1) The entire study rests on the authors' scoring of sutures as patent or closed, but no information is given other than that a suture was considered closed if it was not visible ("obliterated") and open if it was visible. These are problematic definitions for distinguishing patent from closed sutures if we accept the authors' definition of sutures as sites of growth and stress diffusion. A suture can be visible but still be "closed", as evidenced by bony connections or bridges linking the bones that border the suture. In the case of bridging, the suture would be visible and so would be scored as "open" according to the authors' criterion, but functionally the suture is closed.

      Visual examination of sutures (e.g., from photos or in situ) is a common procedure in macroevolutionary studies of suture patency, where specimens are not always available for histological inspection (e.g., when invasive procedures or CT scanning are not permitted). In this regard, we follow previous literature. We would also like to note that only photographic material was available for most specimens during this project because of the current exceptional circumstances (museum lockdowns).

      Also, in some mammals (e.g., the laboratory mouse) most cranial sutures do not close in typically developing individuals.

      In this study, we used specimens held in museum collections, which come from the wild or from zoos. We did not use data from laboratory animals raised in controlled environments, which may indeed affect their suture patency (e.g., by feeding on pellets).

      2) Age estimates are not provided for the specimens used in the analysis. In many mammalian species, suture closure occurs in a somewhat predictable fashion; this, coupled with tooth formation/eruption patterns, is one of the ways that forensic scientists aged skeletal remains before the advent of modern technologies. The order of suture closure is not necessarily the same across vertebrates, or even across mammals. This means that, without known or estimated ages for each skull included in the analysis, age becomes an unrecognized source of variation that will affect the analytical outcome.

      Unfortunately, the exact age of museum specimens is often not available. For this reason, we focused on adult specimens, in which suture patency tends to remain constant. We also excluded individuals with signs of senescence. To accommodate age and other sources of intraspecific variation in adults, we collected information for as many individuals as possible, often more than 10 and sometimes up to 100. Thus, we coded suture patency for each species as a frequency rather than as open/closed.

      We only dichotomized suture patency as open/closed for the second part of the study. Here we used a conservative threshold to avoid ambiguity: species with a suture patency frequency between 25% and 75% were excluded. This also means that if only 4 individuals were examined (a small sample size was unavoidable for some rare species) and at least one showed a discrepancy, that species was excluded from the analysis, since one discordant individual out of four already yields a frequency of 25% or 75%. However, because suture patency is a very conserved trait, only a few taxa had to be excluded in the end.

      In any case, we will emphasize this point more in the revised version.

      3) The authors' impact statement, "brain growth and skull ossification sequence cause suture closure in mammals evolution without common genetic factors causing premature suture closure diseases in humans", is hard to digest, as the authors consider brain size rather than brain growth. From a developmental perspective, brain size, or even some form of the encephalization quotient (EQ), is not what is commonly proposed to drive suture closure/patency (or degree of patency). Instead, it is the dynamics of brain growth that is proposed as a stimulus for the initiation of mineralization of cranial bones. As bones increase in size, new bone is added at the leading edges of the opposing bones that line the suture, while the stem cells in the center of the suture remain to add to the mesenchymal cell population of the suture, keeping the suture patent. In short, the dynamics of brain growth (including any signaling emanating from the brain, dura, bones, or even the suture itself) contribute to suture patency. Because sutures tend to close later in life (after childhood in humans), normal suture closure appears to be associated with the termination of brain growth. Making the jump from estimates of EQ (however they were estimated here) to the dynamics of brain growth as a cause requires several steps and knowledge of the timing and rate of growth that the authors do not consider.

      We agree with the reviewer. A developmentally focused study on suture formation and closure dynamics must consider brain growth. However, this information is not available for most of the species selected for this study. Note that species selection depended on the availability of reference genomes and multiple sequence alignments (some of these species are rare or endangered). Because we were comparing macroevolutionary dynamics in adults, we decided to use brain size as a feasible proxy for the brain's influence (whether due to growth or signalling). We aim to fill this gap in future research projects. In the meantime, we will revise the wording of the article to make sure that there are no misleading statements about the influence of brain growth.

      4) The authors assume a suture closure pattern across the skull that starts anteriorly (rostrally) and moves posteriorly (caudally), and they build this into their model. This seems to be based on work by Koyabu et al. (2014), but that study is about the appearance of ossification centers for bones (not suture formation or closure), and it actually clumps the frontal and parietal into the same group in its final analysis, so why it supports an anterior-to-posterior direction of suture closure is not clear.

      Note that we did not “assume” any closure pattern; we interpreted the published evidence on how the skull ossifies in mammals to formulate a plausible hypothesis. We also tested 11 other plausible hypotheses. This hypothesis could have turned out to be worse than the others, but we found that the best-supported hypothesis includes an anterior-posterior relation of suture closure. We will try to explain the construction of our model and our hypothesis testing better in the revised version.

      5) The authors' conclusion (lines 289-292) does not follow from their analyses. Brain growth was not analyzed. I am uncertain what they mean by suture self-regulation, as I don't think their detection of genetic variants shared across a diverse set of species means that those variants control suture patency/closure.

      The proposed idea of suture self-regulation refers to the fact that the closure of one suture may affect the closure of another (as previous theoretical models have suggested), and it is not necessarily related to the genetic variants identified here. As explained above, we will revise any reference to brain growth.

      Reviewer #2:

      -The authors tested 4 hypotheses (page 5, lines 78-84) but rejected or questioned them later on. This is a fair approach that realistically points out possible weaknesses or methodological limitations; nevertheless, I find more questions or suggestions than actual answers.

      We have tried to offer an open and clear set of hypotheses, tested them with the available data, and discussed the results fairly. As is often the case in science, research may bring more questions than answers; we do not see this as a weakness. Our answers are also contextualized within the limitations that we described in the methods. We believe this is the correct way of doing science: even if it forces us to reject all our hypotheses, negative results are still results. Since our object of study is not very well known, we hope this study can fuel more research.

      -Lots of repeated text

      -Frequent missing references for major statements, unclear formulations

      We will double-check our manuscript. However, the reviewer offers no details about what is repeated or missing.

      -Some contradictory or unclear information, for instance, "high conservation..enabled us to categorize phenotype as either open or closed" / "suture patency ranging from 0-1, only above 75% and below 25% was counted as open or closed" / the authors included species where >2 samples were available but excluded any ambiguous case (small number of samples per species?)

      As explained above, thresholding at 25%/75% was used to binarize species as having a suture open or closed. This binarization is only used for the convergent amino acid substitution analysis. We excluded ambiguous cases (i.e., a half-closed suture) prior to data collection. We will explain this better in the revised version to avoid confusion.

      -"Phylogenetic path analysis showed almost no effect of diet on the brain size; low to medium (what does that mean then?) effect of brain on suture closure and medium to high effect of 1 suture affecting the other sutures in AP direction" (the timeline of suture closure has been described in many species)

      We are not sure what the reviewer means; we will revise these sentences to make them clearer to readers.

      -I am not able to evaluate whether the use of diet hardness as an equivalent of mechanical forces in the skull is correct, and I hope other reviewers will be able to do so, and in fact also to evaluate the phylogenetic path analysis performed in this manuscript. The authors took information on the % of nectar/soft plants and invertebrates/hard food (seeds, etc.) that a given species consumes and multiplied it by an index, rather than actually modeling or assessing the forces. To a layperson it looks like, for instance, a cow chewing relatively soft grass all day long, and building very strong muscles, will in the end develop much more force/tension within the skull than an animal cracking one nut.

      As the reviewer correctly points out, chewing grass all day long is harder than cracking one nut (cracking nuts “all day long” would be another issue). In any case, we have weighted each food item relative to the others (e.g., grass is weighted as twice as hard as meat), and there is consensus that feeding on seeds and scavenging is one of the most biomechanically demanding feeding strategies. In addition, we would like to note that we critically discussed the caveats of diet hardness as a proxy for the effect of feeding biomechanics on sutures, and we did not blindly assume it to be true.

      -Lots of attention is given to the three identified genes with convergent amino acid substitutions, despite the fact that none of these genes has ever been related to any aspect of craniofacial biology or to suture pathologies.

      We discussed the three genes that our analysis revealed. We cannot discuss genes for which we found no support. For these three genes, we offered plausible scenarios for how they could be associated with craniosynostosis; it is for future studies to explore these scenarios and to validate these genes experimentally or clinically. The fact that they are not currently known to be involved in pathological conditions does not mean that we should not discuss them in the manuscript. Every year, new genetic variants are found to be associated with craniosynostosis. The lack of correspondence between these genes and pathology is in fact one of the findings of this study: the few genes that show convergent mutations are not associated with pathology. We agree that absence of evidence is not evidence of absence. However, we also think that this result deserves discussion in this manuscript and for readers to ponder.

    2. Reviewer #2:

      The authors' goal was to reveal the phenotypic and genetic causes of suture closure in evolution. The authors formulated and tested several hypotheses to find out whether brain size, diet hardness, etc. are causally linked to the presence of typically patent (open) or closed sutures in 48 mammalian species. Next, the authors attempted to identify genes (and convergent amino acid substitutions) associated with these species-specific suture states, and to relate them to the biological functions commonly associated with suture formation and/or mutated in pathological conditions such as craniosynostosis.

      While I think it is an interesting question or hypothesis to test (it seems to be inspired by Abelson 2016 and similar studies), several concerns arose during my reading (the authors themselves pointed out several of them more than once). Overall, I do not find convincing evidence for the authors' statements. Very briefly, here are just a few of my comments:

      -The authors tested 4 hypotheses (page 5, lines 78-84) but rejected or questioned them later on. This is a fair approach that realistically points out possible weaknesses or methodological limitations; nevertheless, I find more questions or suggestions than actual answers.

      -Lots of repeated text

      -Frequent missing references for major statements, unclear formulations

      -Some contradictory or unclear information, for instance, "high conservation..enabled us to categorize phenotype as either open or closed" / "suture patency ranging from 0-1, only above 75% and below 25% was counted as open or closed" / the authors included species where >2 samples were available but excluded any ambiguous case (small number of samples per species?)

      -"Phylogenetic path analysis showed almost no effect of diet on the brain size; low to medium (what does that mean then?) effect of brain on suture closure and medium to high effect of 1 suture affecting the other sutures in AP direction" (the timeline of suture closure has been described in many species)

      -I am not able to evaluate whether the use of diet hardness as an equivalent of mechanical forces in the skull is correct, and I hope other reviewers will be able to do so, and in fact also to evaluate the phylogenetic path analysis performed in this manuscript. The authors took information on the % of nectar/soft plants and invertebrates/hard food (seeds, etc.) that a given species consumes and multiplied it by an index, rather than actually modeling or assessing the forces. To a layperson it looks like, for instance, a cow chewing relatively soft grass all day long, and building very strong muscles, will in the end develop much more force/tension within the skull than an animal cracking one nut.

      -Lots of attention is given to the three identified genes with convergent amino acid substitutions, despite the fact that none of these genes has ever been related to any aspect of craniofacial biology or to suture pathologies.

    3. Reviewer #1:

      This study was designed to determine whether there is a relationship among cranial suture closure patterns, the molecular causes of suture patency/closure, and phylogeny. The authors use correlative data to test causal hypotheses related to brain size, suture closure patterns, and diet, and they search for the genetic underpinnings of the relationships they identify using reference genomes. Many of the ideas put forward and methods used are not clearly explained in the body of the work or in the supplementary material. This made it difficult to provide a clear evaluation of the work. Even after checking the original sources on which they base their approach, I found some disconnect between those sources and the ideas laid out here. I see some interesting ideas in the study, but also a lack of solid reasoning behind the hypotheses proposed, confusion about the data and/or ideas summarized from the literature (the confusion could be on my part, but it rests with the authors to explain this more fully), and a lack of detail regarding the methods used to support their conclusions.

      1) The entire study rests on the authors' scoring of sutures as patent or closed, but no information is given other than that a suture was considered closed if it was not visible ("obliterated") and open if it was visible. These are problematic definitions for distinguishing patent from closed sutures if we accept the authors' definition of sutures as sites of growth and stress diffusion. A suture can be visible but still be "closed", as evidenced by bony connections or bridges linking the bones that border the suture. In the case of bridging, the suture would be visible and so would be scored as "open" according to the authors' criterion, but functionally the suture is closed. Also, in some mammals (e.g., the laboratory mouse) most cranial sutures do not close in typically developing individuals.

      2) Age estimates are not provided for the specimens used in the analysis. In many mammalian species, suture closure occurs in a somewhat predictable fashion; this, coupled with tooth formation/eruption patterns, is one of the ways that forensic scientists aged skeletal remains before the advent of modern technologies. The order of suture closure is not necessarily the same across vertebrates, or even across mammals. This means that, without known or estimated ages for each skull included in the analysis, age becomes an unrecognized source of variation that will affect the analytical outcome.

      3) The authors' impact statement, "brain growth and skull ossification sequence cause suture closure in mammals evolution without common genetic factors causing premature suture closure diseases in humans", is hard to digest, as the authors consider brain size rather than brain growth. From a developmental perspective, brain size, or even some form of the encephalization quotient (EQ), is not what is commonly proposed to drive suture closure/patency (or degree of patency). Instead, it is the dynamics of brain growth that is proposed as a stimulus for the initiation of mineralization of cranial bones. As bones increase in size, new bone is added at the leading edges of the opposing bones that line the suture, while the stem cells in the center of the suture remain to add to the mesenchymal cell population of the suture, keeping the suture patent. In short, the dynamics of brain growth (including any signaling emanating from the brain, dura, bones, or even the suture itself) contribute to suture patency. Because sutures tend to close later in life (after childhood in humans), normal suture closure appears to be associated with the termination of brain growth. Making the jump from estimates of EQ (however they were estimated here) to the dynamics of brain growth as a cause requires several steps and knowledge of the timing and rate of growth that the authors do not consider.

      4) The authors assume a suture closure pattern across the skull that starts anteriorly (rostrally) and moves posteriorly (caudally), and they build this into their model. This seems to be based on work by Koyabu et al. (2014), but that study is about the appearance of ossification centers for bones (not suture formation or closure), and it actually clumps the frontal and parietal into the same group in its final analysis, so why it supports an anterior-to-posterior direction of suture closure is not clear.

      5) The authors' conclusion (lines 289-292) does not follow from their analyses. Brain growth was not analyzed. I am uncertain what they mean by suture self-regulation, as I don't think their detection of genetic variants shared across a diverse set of species means that those variants control suture patency/closure.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      Nayler et al. report methods to generate cerebellar organoids from human induced pluripotent stem cells and characterize them by single-cell sequencing and bioinformatic analysis. They further test the effect of adding Matrigel to the system, which has previously been useful in other organoid systems. The topic is important for the study of human cerebellar development and the modeling of human disease. The paper suffers from a number of issues, especially the fact that the claims in the text are not supported by the data.

      Specific comments:

      The method is largely the same as the one developed by Muguruma et al., a methodology that has not proved to be very effective or reproducible. Moreover, based on the immunolabeling, it is not clear that the cerebellar organoids generated in this report have differentiated as well as those in the original paper, though this may be due to the low-power images. The authors repeatedly point out that their method does not need co-culture with mouse granule cells; however, they show no maturation of Purkinje cells, which is what prior reports had used the co-culture for.

      1) While this method is not entirely novel, single-cell sequencing has not previously been performed on organoids generated with it. Unfortunately, their analysis of the scRNA-seq data is qualitative and unconvincing.

      2) Canonical markers are not associated with the expected populations. For example, PCP4 and IGF1 are found in the P0-choroid plexus group and not with P6 Purkinje cells (PCs), suggesting that the markers or the separation of populations used for classification are not sufficient. CXCL14 is used as an identifier for PCs; however, the gene appears to be downregulated in the P6-PC expression table, while it is instead upregulated in the P0 expression table. These discrepancies between the text and the data do not give confidence in the overall analysis.

      3) In Fig. S4A, there is no legend describing what the dot plot shows (color scale, size scale).

      4) To substantiate their cell classification, the authors compare their data with previously published mouse datasets. Cell type clusters are generously suggested to have a "high degree of overlap" with the mouse data, with a "high degree of confidence". These claims are not statistically supported, nor, upon close inspection, do they appear to be accurate. While some cell types cluster with mouse cell types, others clearly do not. For example, of the two major cerebellar neurons, human granule cells are found in three clusters (granule cell precursors, granule cells (S-phase), and granule cells (G2M-phase)), of which only one clusters with mouse granule cells. Human and mouse Purkinje cells do not cluster together. The authors state that pseudotime trajectory reconstruction shows "a pattern reminiscent of the developmental cellular phylogeny of the cerebellum; progression from primitive CP/RP cell types to RL/VZ precursors and subsequently to committed neuronal progeny...", however the choroid plexus and roof plate do not give rise to rhombic lip or ventricular zone precursors (note that ventricular zone precursors are not depicted in the data).

      5) Embedding of cerebellar organoids in Matrigel is novel, however a major finding of this report is that Matrigel increases organoid variability, which itself is already a significant issue in the organoid field. The role of Matrigel in promoting specification of rhombic lip over ventricular zone could be useful.

      6) Have they looked at gene expression any earlier than DIV21? At which timepoint does each of the key cerebellar markers appear? This information is lacking for all markers assessed, and it is not clear why the timepoints shown were chosen. More characterization, and perhaps even scRNA-seq at multiple timepoints, would have given a clearer view of what they have induced.

      7) There is huge variability in gene expression even before the Matrigel addition step. It is therefore unclear how this is an advance in making cerebellar organoids compared to the original Muguruma paper in 2015 (which was itself a very qualitative paper).

      8) Low-power images of immunolabeling make it impossible to assess the localization of labelling and to distinguish between real and background staining (e.g., Fig. S1 and Fig. 1A). This is critical in the stem cell field, where spatial organization cannot be relied upon.

      Their interpretation and their data don't always match with regard to their defined cell types and scRNA-seq data. For example, ATOH1 only appears in group 5, yet they state that more groups are granule cell precursors. Also, they say that a major impact of MG encapsulation is the expansion of the GC lineage, yet earlier in the paper they say that ATOH1 expression levels, a marker of the GC lineage, were unchanged, making it very difficult to get a clear picture of what they have found.

      A major issue (in the same vein as their incorrect data interpretation) upon which the paper is framed is the assumption that the human cell types are like their mouse counterparts. No experiments were carried out to show the validity of this assumption. Figure 3B overlays the human and mouse data. Why are the human cells so poorly represented? Is it because of low sequencing depth (a technical issue) or a vastly different molecular composition of these organoids compared with the mouse cerebellum?

      Overall the execution is poor, and the data are not analyzed in any depth. Critically, there is a complete mismatch between what is stated in the text and what is shown in the figures. The claim to have produced all major cerebellar cell types would have been the novel aspect of the paper, but the data are unconvincing.

    2. Reviewer #2:

      In this Tools and Resources manuscript, Nayler and colleagues demonstrate a robust and reproducible protocol for hiPSC-derived cerebellar organoids that do not require feeder populations. In general, the development of reliable pluripotent cell-derived cerebellar cell types and organoids has lagged behind that of other brain regions, and this paper represents a new resource. Given that the manuscript is presented as a resource, a more detailed explanation of the generation of the organoids should be provided, and their reproducibility should be demonstrated in more detail. Further histological characterization of the organoids with additional markers is needed to really see the reproducibility and robustness of the methodology.

      Major comments:

      1) The authors mention that the PCs have bipolar morphology (data not shown). I think this is one of the critical pieces of data demonstrating the quality of the organoids and should be shown. In general, more IF analysis of the organoids with additional markers would have been helpful to understand the variability and the composition of the cerebellar organoids generated with their method.

      2) Did the authors observe a delay in the maturation of the Matrigel-embedded organoids? It is curious that there is an increase in the earlier progenitor cells (based on the increase in OLIG2 expression as opposed to PTF1A). Based on the data later in the paper, the authors suggest that Matrigel increases the expansion of GCPs. How does the non-significant enrichment of ATOH1 expression shown in Figure 1G relate to the data presented later in the manuscript? It looks like only one of the organoids upregulated ATOH1, whereas the other two didn't show any change?

      3) The authors should report the relative proportions of VZ-derived vs. RL-derived cell types within each organoid.

      4) Were any astrocytes (other than Bg) and OPCs/oligodendrocytes observed in the organoids? Or do the organoids need to be cultured longer for those cells to appear?

      5) Why is there very low expression of PCP4 in the PCs, and why is the cluster with the most PCP4 expression classified as choroid plexus? Based on the in situ hybridization in figure S4, there is no PCP4 in the CP. Is this a species difference? In general, the characterization of the PCs is confusing to me based on the markers used. Please elaborate.

      6) Based on the clustering shown in figure 3, is there a particular age in the mouse data that showed higher enrichment for overlapping human cerebellar organoid cells? The way the data are presented is hard to interpret. Also, the range of mouse ages that overlaps with the respective human data is much larger than I would have expected (page 9, first paragraph). I am not an expert on integrating such multi-age/multi-species data; however, I wonder whether an additional pseudotime analysis (e.g., with Monocle) on the combined dataset represented in Figure S7 and Figure 3 would reveal a finer temporal resolution of the human organoids with respect to the mouse developmental data.

      7) Were there differences in the pseudotime ordering of the cells from Matrigel-embedded organoids compared with those from the control organoids (related to point 2)?

    3. Reviewer #1:

      In this manuscript, Nayler et al. present a new protocol to generate cerebellar organoids that they differentiate from human iPSCs. Using this system and single-cell sequencing, the authors show that most major cerebellar cell types develop in these organoids. They also find that the micro-environment of the developing organoids changes growth dynamics and cellular differentiation, which led the authors to suggest that this organoid approach may be a good model for studying human cerebellar development and disease. The strength, and indeed the motivation, of this manuscript is the description of a novel model system with which to study multiple human cerebellar cell subtypes ex vivo. In general, this work is a timely addition to several other recent studies on the transcriptomics of mouse cerebellar development, the transcriptomics of human cerebellar development, and the use of hPSC-derived Purkinje cells grown in co-culture with mouse granule cells. The data in this manuscript are strong and likely of broad interest to the neuroscience community. However, below I outline several concerns that, if addressed, would help improve the clarity, readability, and impact of the manuscript:

      Comments:

      1) In the title, the authors state "...cerebellar organoids shows recapitulation of cerebellar development". Development in what? Humans? Model systems? Some specificity is needed in this title, especially since recent work from the Millen group has unveiled some specific differences between mouse and human cerebellar development.

      2) In the Abstract, the authors state "However, this was at the expense of reproducibility." What do the authors mean? Are there issues with reproducibility? If so, the authors need to provide a thorough discussion of this, as it would be essential information for researchers considering adopting this approach.

      3) Also in the Abstract, the authors state "...conditions, representing a more biologically relevant..." More biologically relevant than what? What about the counter argument that studying the cerebellum would be "more biologically relevant" in vivo in an animal model?

      4) In the Introduction, the authors state "Specifically, abnormal cerebellar development is an emerging theme contributing to many brain disorders (Sathyanesan et al., 2019)." Do you mean to many non-motor brain disorders?

      5) A couple of times in the Introduction the authors use the Manto et al. 2012 reference. This is in fact a very large online book consisting of several dozen chapters. Rather than using such a broad sweep approach, I would highly recommend using the primary original references for such key statements. It's also slightly misleading since Manto himself did not have any involvement in these developmental studies.

      6) The authors state that "Current models have mainly focussed on the differentiation of hPSC-derived Purkinje cells through co-culture with mouse cerebellar progenitors." Okay, but what is your argument against such methodology? Some context and motivation for this statement should be provided.

      7) In the Introduction, the authors frame their case by stating "As a proof of principle,..." But what is this method a proof of principle for?

      8) In the Introduction, the authors state "...we show perturbation analysis of the organoids..." Please state what the perturbation was and what question it was used to address.

      9) Based on the Introduction of the paper, it is very hard to see what motivates this work. Also, related, why focus on the basement membrane? What led to this? The authors need to provide a much stronger rationale for the study upfront, and in particular for the specific concepts that they tackle using their new approach.

      10) The authors state that induction of GBX2 was observed at the expense of the anterior marker OTX2. Apologies if I have missed it, but which experiment shows directly, in your organoids, that OTX2 was initially high and then lowered due to GBX2?

      11) The authors state "...EBs to MG treatment, we encapsulated these at three different timepoints during differentiation..." What was the justification for picking these timepoints?

      12) The authors state "Overall, the relative effect of MG encapsulation resulted in distinct responses in the various cerebellar populations..." So, what does it mean that each cell type has a different response? Please expand on this.

      13) The authors state "using the murine cerebellum as a close developmental blueprint, most signatures indicate a mixture of mid-late embryonic temporal maturity, suggesting that the cerebellar organoids recapitulate developmental stages of the normally developing cerebellum. An exception to this was overlap of human GCs with murine GCs of postnatal maturity, suggesting that this cell type was more mature than its counterparts." AND "human PCs clustered more closely to murine progenitors and astroglia, suggesting that by day 90 organoid-derived PCs were still developmentally immature, compared with murine PCs. In further support of this, we did not detect appreciable levels of SHH."

      The sentences in this statement raise several questions. First, PCs normally develop before the GCs. Thus, the finding that PCs in the organoids are less mature than GCs is surprising and may even be concerning, as it suggests that the organoids do not fully (or reliably) replicate the temporal order of normal cell development that is so characteristic of cerebellar development. Second, the relationship between PC SHH secretion and the responding GCs is now well established and has been shown to be an important, if not essential, mechanism for GC proliferation in vivo. It is therefore surprising that GCs form and proliferate in the organoid without proper SHH signaling. What may be the mechanism for this? The authors need to account for this issue and provide a discussion that addresses all of these points. Moreover, the authors should discuss how the maturation state of PCs in the organoids differs between this paper and the recently published Buchholz et al paper (2020 - DOI: 10.1073/pnas.2000102117).

      14) The authors argue about the cell structure and expression of cell markers in the organoids. However, based on what is shown, it is not clear how robust these features are in the organoids. The authors need to provide additional images of the organoids at much higher magnification in order to properly demonstrate cell structure and identity. In this regard, based on their argument, it would be important for the authors to show the bipolar morphology of Calbindin-positive cells and the excessive neural outgrowth at the periphery of the organoids (currently referred to in the text as "data not shown"). Finally, it would be interesting to see whether different cell types are intermingled or spatially segregated in the organoids. That is, what does the cellular organization in the organoid actually look like?

      15) Along the same lines as above, it seems to me that the authors should present more details about the anatomical architecture of the organoids. One of the major arguments raised by the authors is that the organoids recapitulate many features of normal cerebellar development. Of course, the organoids likely don't show all the intricacies of in vivo cerebellar development, but given that the 3-dimensional assembly of the cerebellum is essential for all aspects of cellular and circuit formation, one needs to fully appreciate exactly what aspects of the cerebellum the organoid is able to reflect. Only then can one predict its full utility towards studying different aspects of development or disease.

      16) There are several cases in which the authors state "data not shown". In every one of these cases the data seem essential to me and should be presented in full.

      17) The authors use the fact that the cell types from the human organoids cluster with mouse cerebellar cell types as an argument that the human organoids have a good representation of the cerebellar cell types. But the authors also go on to state that the human organoids are advantageous over model organisms because they may better model human genetic background. These two statements are contradictory, especially given the previous issue raised about the organoids not reproducing the temporal sequence of cellular development. Do the authors have additional data to support their statements about the biological relevance of their xeno-free conditions? For example, did they find any human-specific genes or developmental pathways? The statements presented by the authors create a circular argument that needs to be revised and/or supported by additional data. What would help is a much deeper comparison between the organoids, human cerebellar development, and mouse cerebellar development.

      18) What is the fold-change of RNA expression in figure 1 based on? What is the statistical test actually testing? What is the control that this fold-change is compared to?

      19) On the issue of statistics, the section describing the statistics in the methods is rather brief. It would help tremendously if the authors expanded this section to describe which test was used for which experiment; some justification for the use of the different statistical tests would also be very useful.

      20) The authors use a lot of abbreviations. Some of these abbreviations hinder the readability of the text, which would be especially problematic for an audience not as closely acquainted with these terms. It may help to limit the use of abbreviations to cell types and gene names. For example, Matrigel and embryoid bodies do not have to be abbreviated.

      21) The size of the text in all figures is too small, including gene names, axis labels, and legends.

      22) In the Discussion, the authors state "...this includes proximally located territories in which adjacent signalling is required for cerebellar maturation and development." I am not sure whether enough direct evidence is presented to make this conclusion. As commented on before, additional anatomy should be presented, and based on those data, inter-cellular signaling could then be examined with more confidence. Otherwise, the authors would have to tone down and/or revise this conclusion.

      23) The authors conclude that "hiPSC-derived organoid models offer unprecedented opportunities to model brain development and disorders and for therapeutic development..." I agree, but as a general comment, I found it very hard to know what exactly the authors are comparing in this paper. The comparisons appear to be mainly to mouse development, although a more thorough and direct side-by-side comparison should be made. I suppose some kind of detailed developmental timeline-based model is warranted.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Building on several recent molecular studies, the strength of the current manuscript is the establishment of an organoid approach that could potentially add to our knowledge of normal and abnormal cerebellar development by providing a flexible technique with which to resolve cellular mechanisms. However, there was overall agreement that while the approach has promise, the data presented lack a concrete comparison to known milestones in cerebellar development (in animal models or humans). Moreover, given the technical nature of the manuscript, it was deemed that a more complete characterization of the organoid "anatomy" would be required to convince the reader of the claims. There was also a concern that the quantitative aspects and interpretation of the scRNA-seq experiments, particularly the characterization of the clusters obtained and the analysis performed to compare the human organoid data to the mouse developmental data, could have been carried out in greater depth.

    1. Reviewer #3:

      In this paper the authors show for the first time that optogenetic activation of the subthalamic nucleus (STN) is aversive and can drive avoidance behavior. This effect may be mediated by polysynaptic activation of the Lateral habenula, which they show is activated following optogenetic activation of the STN. They propose that the STN may excite glutamatergic neurons in the ventral pallidum that in turn project to and excite the lateral habenula. The authors do mention that other pathways may mediate the aversive effects but no other pathways are tested.

      Overall, this paper presents a simple and clear demonstration that optogenetic activation of the subthalamic nucleus is aversive. This effect may involve activation of the ventral pallidum and the lateral habenula, but the evidence provided to support this possibility is weak and currently not compelling.

      Major issues:

      -While it has not, to my knowledge, been reported that activation of the STN can drive aversive responses, there are a number of lines of evidence suggesting that this should be the case. None of these are mentioned in the paper, and they should be discussed. First, the STN is part of the indirect pathway in the basal ganglia. Previous work has shown through optogenetic and other methods that the indirect pathway striatal neurons in the dorsal and ventral striatum can drive aversive responses and are involved in aversive learning (for a critical review that discusses this literature, see Soares-Cunha et al., 2016). In line with this, recordings of the indirect pathway have also shown that this pathway is preferentially involved in processing aversive information; for example, STN neurons are activated by nociceptive information and are needed for appropriate behavioural responses to nociceptive stimuli (Pautrat et al., 2018), and STN neurons are also activated by aversive stimuli and by negative reward prediction error (Breysse et al., 2015). The paper needs to discuss its findings in the context of this and other previous work (these references are just examples and not an exhaustive list) that supports the role of the indirect pathway in processing aversive information.

      -Another topic that should be discussed is the heterogeneity of the STN. The authors themselves mention that the STN is composed of distinct spatio-molecular domains. This may well be relevant as rabies tracing from the EP neurons that project to the habenula and from the glutamatergic neurons in the ventral pallidum has revealed that they receive the majority of their input from the parasubthalamic nucleus and not from the core of the STN (Stephenson-Jones et al., 2016, Stephenson-Jones et al., 2020, Tooley et al., 2018). This raises the possibility that the aversive responses from the STN are primarily driven by neurons in the pSTN. The authors could test this point by restricting their ChR2 expression to one or the other region of the STN. At the moment all example images show that expression is in both the STN and pSTN. This possibility should be discussed.

      -The authors mention that they selectively activate the STN-VP pathway by stimulating the STN terminals in the VP. It is not clear that this will selectively activate this pathway. If the STN neurons that project to the VP also project to other areas, then these will likely also be activated due to back-propagating action potentials driven by the ChR2 stimulation. More work needs to be done to determine if the VP is really the pathway that mediates the aversive effect. Additional work, including multi-colour retrograde tracing, selective inactivation of the VP projection while stimulating the STN, or stimulating the STN fibers in the VP while inactivating the STN cell bodies, would be needed to really determine if the VP is important for mediating the aversive effect. This may be beyond the scope of what the authors want to do but would be needed to support a claim that their evidence "provide strong support for a STN-VP-LHb is a pathway for aversion".

      -The title should not include the word encoded as there were no experiments performed in this paper that looked at any aspect of coding in the STN.

    2. Reviewer #2:

      In this manuscript, Serra et al. demonstrate that stimulation of subthalamic nucleus (STN) neurons can drive place avoidance and delayed (presumably bisynaptic) excitation of lateral habenula (LHb) neurons. They also show that STN inputs to the ventral pallidum (VP) can drive place avoidance and excitation of VP neurons. While the potential role of an STN-VP-LHb pathway in driving aversion and avoidance is intriguing, the manuscript leaves many open questions regarding the nature of the STN's role in mediating aversion, as well as the circuit mechanisms governing STN-induced avoidance.

      Major Comments:

      1) STN in aversion: The manuscript addresses the role of the STN in mediating "aversion" in a very limited manner, despite the framing of the title ("Aversion encoded in the subthalamic nucleus"). Based on the title, I expected data showing that STN activity is correlated with the aversiveness of stimuli, or data showing that STN activity is required for aversion processing. Instead, the authors show that STN stimulation can drive avoidance, which does not necessarily mean that this activity drives "aversion" per se. Data showing that the STN represents the aversiveness of stimuli, or that activity here is necessary for avoidance or other responses to aversive stimuli, would strengthen the point. Currently the evidence for the statement made by the title is weak.

      2) Claims about the role of the STN->VP->LHb pathway in the abstract and elsewhere in the text: The authors demonstrate that activation of STN terminals in VP recapitulates their RTPP avoidance effects, but they do not directly demonstrate that these effects are mediated by downstream VP->LHb connectivity. They show that activation of STN terminals in VP results in excitation of VP units, but it remains unknown whether STN neurons specifically target/activate VP neurons that project to LHb, and/or whether they target VP glutamate neurons specifically (the primary cell type in the VP->LHb pathway that mediates aversion). The current data set demonstrates neither that a) STN-induced activity changes in LHb are predominantly mediated by VP (as opposed to EP or GP or other connections), nor that b) avoidance elicited by STN->VP activation is mediated by LHb activity. Therefore, statements throughout the manuscript about the STN-VP-LHb circuit are not supported.

      3) Statistical analysis: The authors provide comprehensive statistical information for their behavioral experiments, but not for the electrophysiology. It appears that individual neurons were treated as independent measurements even when they were recorded from the same subject, though in some cases it is not clear how many mice were recorded from (e.g. 1G, 5D, 5E). If multiple measurements were taken from the same subjects, then this should be taken into account in the statistical analysis (such as by including subject as a random effect in an ANOVA or linear mixed model).
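      As a minimal illustration of the point about non-independent measurements, the sketch below uses hypothetical per-neuron values keyed by a hypothetical mouse ID (none of these numbers come from the manuscript). Collapsing to one mean per animal before testing is a cruder alternative to the mixed model suggested above; it is shown only because it makes the weighting problem visible: when one mouse contributes most of the neurons, the pooled neuron mean is pulled towards that animal.

```python
from collections import defaultdict
from statistics import mean


def per_subject_means(records):
    """Collapse (subject_id, value) pairs into one mean value per subject."""
    groups = defaultdict(list)
    for subject, value in records:
        groups[subject].append(value)
    return {subject: mean(values) for subject, values in groups.items()}


# Hypothetical per-neuron responses; mouse "m1" contributes 4 of the 7 neurons.
neurons = [
    ("m1", 4.0), ("m1", 5.0), ("m1", 6.0), ("m1", 5.0),
    ("m2", 1.0), ("m2", 2.0),
    ("m3", 3.0),
]

subject_means = per_subject_means(neurons)
pooled_mean = mean(value for _, value in neurons)  # every neuron weighted equally
subject_level_mean = mean(subject_means.values())  # every mouse weighted equally

print(f"{len(neurons)} neurons, but only {len(subject_means)} independent subjects")
print(f"pooled neuron mean: {pooled_mean:.2f}")
print(f"mean of subject means: {subject_level_mean:.2f}")
```

      With these made-up values the pooled neuron mean is about 3.71, while the mean of the three subject means is about 3.17, and the sample size for a subject-level test is 3, not 7. A mixed model with subject as a random effect keeps the neuron-level data while accounting for this nesting.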

    3. Reviewer #1:

      In this study, Serra et al. attempted to study the circuit responsible for aversive behavior in mice. They had previously observed that subthalamic nucleus (STN) excitation induced aversive jumping behavior. The authors proposed that the indirect projection to the lateral habenula (LHb) via the ventral pallidum (VP) could be involved. They used Pitx2-Cre mice for STN-specific gene expression and used the real-time place preference (RT-PP) paradigm and the elevated plus maze (EPM) to study aversive behavior. Overall, the findings in this study are potentially important as they describe a previously unknown role of the STN, and its downstream targets, in aversive behavior. However, the authors have not convincingly demonstrated the pathway involved. The evidence so far is rather circumstantial, and the arguments made were based entirely on gain-of-function experiments using ChR2. As outlined below, there are a number of significant concerns that need to be addressed.

      Major:

      1) The authors should demonstrate the effectiveness and specificity of Pitx2-Cre in driving ChR2 expression. What was the cellular expression pattern within the STN? Did the authors observe ChR2 expression in 100% of STN neurons? Did it label any non-neuronal cells? Did the neighboring regions also express ChR2? According to Papathanou et al., that is likely to be the case. The authors should provide a more rigorous histological examination. Otherwise, a more in-depth discussion is needed to address how these concerns would confound the interpretation of results.

      2) It is interesting that Pitx2/ChR2-eYFP mice avoided STN photostimulation by spending less time in the light-paired compartment. The authors should discuss why the compartment in which the STN is stimulated is not completely avoided.

      3) It is unclear whether the mice jumped in this study, as the authors had previously observed. Were there any other movement-related behavioral changes?

      4) In Figure 1B, the entries of Pitx2/ChR2-eYFP mice into Compartments A and B on Days 5 and 6 do not seem very different, yet in Figure 1C the representative heatmap shows a difference. Likewise, in Figure 1B the entries into Compartments A and B on Day 9 seem equal, whereas in Figure 1C the representative heatmap shows substantial entries. It would be helpful to have an explanation for these discrepancies.

      5) Assuming that the STN excitation duration is 10 seconds upon entry to the "Light" compartment, do the mice remain in the "Light" compartment? If the mice are only stimulated at the entry point of the "Light" compartment, do they just remain there and avoid exiting (as a means to avoid reentry)? As 10 seconds is a long time period for the mice to move around, is the stimulation continued if they then switch to the neutral compartment before the end of the 10-second stimulation period?

      6) The point of the first EPM experiment, with 10 minutes of stimulation of STN neurons or their terminals, is not clear. That is a very long time period; it is very likely that plasticity was induced by such a paradigm, which would confound the study.

      7) The STN-VP slice experiment does not really address any of the circuit questions the authors proposed to answer. The STN-VP connection is already known. It would be more interesting if the authors showed the specific connection between the STN and the glutamatergic VP neurons, which they speculate are the downstream target of the STN. This is an important point because of the complex cellular composition of the VP.

      8) It would be important to show that direct optogenetic stimulation of glutamatergic neurons within the VP produces the same phenotype. At the very least, the authors should locally infuse glutamate receptor antagonists into the VP to examine whether the effects of STN stimulation can in fact be blocked.

      9) Both Figures 1 and 5 show a rather low density of STN fibers in the VP, and these fibers are restricted to about one-third of the VP. The involvement of the STN-VP circuit in mediating the observed behavior is therefore less than convincing. On the other hand, there are no investigations of whether direct connections to other known targets are involved in the aversive response.

      10) All optogenetic interrogations were based on ChR2 stimulation. As antidromic spikes can propagate to collateral branches in other synaptic targets of STN neurons (i.e., the GP, EP, and/or SNr), orthogonal approaches are needed to decisively show that the STN-VP circuit is involved.

      11) What is the latency of STN-driven spiking in LHb? The latency in the peri-stimulus time histogram in Figure 4 looks too short to be a polysynaptic event. It also does not match the latency stated in the text (i.e., 10 ms, line 212). This is not a trivial matter, as synaptic delays can provide important clues as to whether mono- or polysynaptic events are involved.

      12) In Figure 5E, DNQX and APV did not completely block the evoked currents. A more rigorous examination is needed to determine whether multiple neurotransmitters were released.

      13) As anxiety and aversive behaviors often differ between males and females, the authors should comment on whether any sex differences were observed.

      14) Some of the sample sizes are very small (only 3-5).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      While the demonstration that stimulation of subthalamic nucleus (STN) neurons produces avoidance is potentially interesting, the circuit basis of this effect was not well established. Specifically, the proposed functional connection of STN with lateral habenula through ventral pallidum was not clearly demonstrated and the STN stimulation findings on their own represent a more minor advance.

    1. Reviewer #2:

      The study provides evidence that an aphid effector, Mp64, and a Phytophthora capsici effector, CRN83_152, can both interact with the SIZ1 E3 SUMO-ligase. The authors further show that overexpression of Mp64 in Arabidopsis can enhance susceptibility to aphids and that a loss-of-function mutation in Arabidopsis SIZ1 or silencing of SIZ1 in N. benthamiana plants leads to increased resistance to aphids and P. capsici. On siz1 plants, the aphids show altered feeding patterns on phloem, suggestive of increased phloem resistance. While the finding is potentially interesting, the experiments are preliminary and the main conclusions are not supported by the data.

      Specific comments:

      The suggestion that SIZ1 is a virulence target is an overstatement. Preferable would be knockouts of effector genes in the aphid or oomycete, but even with transgenic overexpression approaches, there are no direct data that the biological function of the effectors requires SIZ1. For example, is SIZ1 required for the enhanced susceptibility to aphid infestation seen when Mp64 is overexpressed? Or does overexpression of SIZ1 enhance Mp64-mediated susceptibility?

      What do the effectors do to SIZ1? Do they alter SUMO-ligase activity? Or are perhaps the effectors SUMOylated by SIZ1, changing effector activity?

      While stable transgenic Mp64 overexpressing lines in Arabidopsis showed increased susceptibility to aphids, transient overexpression of Mp64 in N. benthamiana plants did not affect P. capsici susceptibility. The authors conclude that while the aphid and P. capsici effectors both target SIZ1, their activities are distinct. However, not only is it difficult to compare transient expression experiments in N. benthamiana with stable transgenic Arabidopsis plants, but without knowing whether Mp64 has the same effects on SIZ1 in both systems, to claim a difference in activities remains speculative.

      The authors emphasize that the increased resistance to aphids and P. capsici in siz1 mutants or SIZ1-silenced plants is independent of SA. This seems to contradict the evidence from the NahG experiments. In Fig. 5B, the effects of siz1 are suppressed by NahG, indicating that the resistance seen in siz1 plants is completely dependent on SA. In Fig 5A, the effects of siz1 are not completely suppressed by NahG, but greatly attenuated. It has been shown before that SIZ1 acts only partly through SNC1, and the results from the double mutant analyses might simply indicate redundancy, also for the combinations with eds1 and pad4 mutants.

      How do NahG or Mp64 overexpression affect aphid phloem ingestion? Is it the opposite of the behavior on siz1 mutants?

    2. Reviewer #1:

      In this manuscript, the authors suggest that SIZ1, an E3 SUMO ligase, is the target of both an aphid effector (Mp64 from M. persicae) and an oomycete effector (CRN83_152 from Phytophthora capsici), based on the interaction between SIZ1 and the two effectors in yeast, co-IP from plant cells, and colocalization in the nucleus of plant cells. To support their proposal, the authors investigate the effects of SIZ1 inactivation on resistance to aphids and oomycetes in Arabidopsis and N. benthamiana. Surprisingly, resistance is enhanced, which would suggest that the two effectors increase SIZ1 activity.

      Unfortunately, not only do we not learn how the effectors might alter SIZ1 activity, there is also no formal demonstration that the effects of the effectors are mediated by SIZ1, for example by investigating the effects of Mp64 overexpression in a siz1 mutant. We note, however, that even this experiment might not be entirely conclusive, since SIZ1 is known to regulate many processes, including immunity. Specifically, siz1 mutants present an autoimmune phenotype, and general activation of immunity might be sufficient to attenuate the enhanced aphid susceptibility seen in Mp64 overexpressers.

      To demonstrate unambiguously that SIZ1 is a bona fide target of Mp64 and CRN83_152 would require assays that demonstrate either enhanced SIZ1 accumulation or altered SIZ1 activity in the presence of Mp64 and CRN83_152.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Detlef Weigel (Max Planck Institute for Developmental Biology) served as the Reviewing Editor.

      Summary:

      A major tenet of plant pathogen effector biology has been that effectors from very different pathogens converge on a small number of host targets with central roles in plant immunity. The current work reports that effectors from two very different pathogens, an insect and an oomycete, interact with the same plant protein, SIZ1, previously shown to have a role in plant immunity. Unfortunately, apart from some technical concerns regarding the strength of the data that the effectors and SIZ1 interact in plants, a major limitation of the work is that it is not demonstrated that the effectors alter SIZ1 activity in a meaningful way, nor that SIZ1 is specifically required for the action of the effectors.

    1. Reviewer #3:

      In their paper, Liutkute et al. use an elegant combination of force profile analysis (FPA) and photoinduced electron transfer (PET) experiments to probe the co-translational folding pathway of the N-terminal domain of the protein HemK. Over the past decades, it has become increasingly clear that co-translational folding pursues different routes than those found in solution. Although many proteins fold and unfold many times during their lifespan after being released from the ribosome, the question of whether and how proteins fold during translation is not only fundamental but also extremely difficult to address experimentally. Here, Liutkute et al. present a synergistic combination of two very different methods to answer this question. By stalling a nascent polypeptide chain at different sequence positions and measuring the amount of full-length relative to arrested protein in a gel assay, the authors identified a sequential folding path in which the order of helix formation in the 5-helix NTD of HemK follows the order from N- to C-terminus. The authors interpret these results using the foldon concept from the Englander lab. Although FPA is a rather qualitative experimental tool that measures the fraction of molecules that crossed a certain force threshold, the analysis is striking. These experiments were complemented by PET-FCS experiments that were used to quantify the kinetic rates of conformational fluctuations of ribosome-stalled states of the protein. The authors conclude that conformational fluctuations slow down the further a protein is away from the ribosome exit tunnel. In my opinion, the work is a substantial step towards understanding the process of co-translational folding. The experiments are beautiful and well described, and the results are of clear interest to a broad readership.

      1) I would like to emphasize that care has to be taken when deducing the order of events from single time-point experiments such as FPA. The speed of translation relative to the folding speed is an important factor that ultimately dictates the order in which certain structural elements will form. I admit, however, that helix formation, at least in solution, is typically far faster than translation, suggesting that the identified intermediates will also form under conditions of continuous translation. Nevertheless, it would be interesting if the authors could provide data or relevant publications on the folding speed of the HemK-NTD.

      2) The PET-FCS approach is very appealing; however, I had some difficulty understanding the fitting procedure that was actually used. On p. 25, it is stated that the diffusion and triplet components, based on the empirical fit with eq. 1, were subtracted from the data. The multiplicative form of Equation 1, however, implies that separating the dynamic components requires dividing the data by the relevant diffusion and triplet terms.

      3) I would call eq. 1 'empirical' rather than 'analytical'.

      4) On p. 25, the authors explain that the dynamic components of the FCS curves were fitted using a sum of terms, one for each species. It would be clearer if the authors provided the actual equations used for fitting. I would have expected the authors to derive expressions for the correlation functions of the individual kinetic models, e.g., using the approach of Gopich & Szabo (see Eq. 1 in Gopich et al. (2009) J Chem Phys, 131, 095102), but the approach described in the Methods sounds different.

      5) I was surprised that the two-step model can yield negative, i.e., rising, amplitudes, which is very unusual for autocorrelation functions. This feature implies that the amplitudes in the kinetic models are decoupled from the actual kinetic rates. It would be great if the authors could clarify this point.

      6) I find the calculation of free energy barriers somewhat overstretched given the complexity of the system. First, the pre-exponential factor of the Eyring equation (eq. 2) is only adequate for gas-phase reactions, particularly when a transmission coefficient of 1 is assumed. The appropriate counterpart for reactions in solution is Kramers' equation. Admittedly, the problem of defining the pre-exponential factor for folding reactions remains with the Kramers expression as well. However, a large body of work over the past 20 years has been dedicated to this problem, and a value of about 1 μs⁻¹ appears to be a good estimate (see e.g. Schuler & Eaton (2008) Curr Opin Struct Biol, 18, 16). Second, there is no way to decide whether conformational fluctuations slow down because of an increase in the free energy barrier or because of a change in the pre-exponential factor.
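      The multiplicative structure behind points 2), 5) and 6) can be illustrated with a minimal Python sketch. It assumes a common textbook form of the PET-FCS model (3D Gaussian diffusion term times a triplet factor times a kinetic factor) rather than the authors' exact eq. 1, and every parameter value is hypothetical. For a reversible two-state model, the kinetic amplitude p1*p2*(q1 - q2)^2 / (p1*q1 + p2*q2)^2 is fixed by the rates and brightnesses and cannot be negative; dividing out the diffusion and triplet terms recovers the kinetic factor exactly, whereas subtracting them does not; and the barrier inferred from a given rate depends directly on the assumed pre-exponential factor.

```python
import numpy as np

# Physical constants (SI units)
K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R = 8.314462618        # gas constant, J/(mol K)


def g_diffusion(tau, n=1.0, tau_d=1e-4, kappa=5.0):
    """3D diffusion term for a Gaussian focal volume (hypothetical parameters)."""
    return (1.0 / n) / (1.0 + tau / tau_d) / np.sqrt(1.0 + tau / (kappa**2 * tau_d))


def g_triplet(tau, a_t=0.1, tau_t=2e-6):
    """Multiplicative triplet factor (hypothetical amplitude and lifetime)."""
    return 1.0 + a_t * np.exp(-tau / tau_t)


def two_state_kinetics(tau, k12, k21, q1=1.0, q2=0.0):
    """Kinetic factor for a reversible two-state model 1 <-> 2.

    k12 is the rate 1 -> 2, k21 the rate 2 -> 1, and q1, q2 are the
    brightnesses (q2 = 0 for a fully quenched state). The amplitude is a
    product of positive populations and a square, so it cannot be negative.
    """
    p1 = k21 / (k12 + k21)
    p2 = k12 / (k12 + k21)
    amp = p1 * p2 * (q1 - q2) ** 2 / (p1 * q1 + p2 * q2) ** 2
    return 1.0 + amp * np.exp(-(k12 + k21) * tau), amp


def barrier_kj_per_mol(k, prefactor, temperature=298.15):
    """Invert k = prefactor * exp(-dG / RT) for dG, in kJ/mol."""
    return R * temperature * np.log(prefactor / k) / 1000.0


tau = np.logspace(-8, -1, 200)  # lag times, s
k12, k21 = 2e5, 8e5             # hypothetical quenching/unquenching rates, 1/s
g_kin, amp = two_state_kinetics(tau, k12, k21)
background = g_diffusion(tau) * g_triplet(tau)
g_total = background * g_kin

# Dividing by the diffusion and triplet terms recovers the kinetic factor ...
recovered_by_division = g_total / background
# ... whereas subtraction leaves a residual that is still scaled by diffusion.
recovered_by_subtraction = g_total - background

temperature = 298.15
eyring_prefactor = K_B * temperature / H   # about 6e12 1/s, transmission coefficient 1
kramers_like_prefactor = 1e6               # about 1 per microsecond

for label, k in (("k12", k12), ("k21", k21)):
    print(f"{label}: dG = {barrier_kj_per_mol(k, eyring_prefactor, temperature):.1f} kJ/mol (Eyring), "
          f"{barrier_kj_per_mol(k, kramers_like_prefactor, temperature):.1f} kJ/mol (1 per us)")
```

      For the same rate, the two prefactor choices differ by RT ln(6.2 × 10⁶), roughly 39 kJ/mol at 298 K. Only differences between barriers computed with the same prefactor are insensitive to that choice, which is why a change in rate alone cannot be attributed to the barrier rather than to the prefactor.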

    2. Reviewer #2:

      Liutkute and coworkers use a combination of arrest peptide assays and fluorescence correlation spectroscopy to investigate the folding of the HemK N-terminal domain. Previous work from the same group has shown that the domain rapidly forms compact structures co-translationally while still partially within the ribosome exit tunnel, limited by the rate of elongation. Data from the arrest peptide assay presented here suggest that, surprisingly, stably folded structures form as soon as the first of five helices in the domain has moved past the tunnel constriction. Several additional apparent folding events occur at longer chain lengths, suggesting discrete events of structure formation within the tunnel and near its vestibule. Experiments with a destabilized mutant (4xA) indicate that some of the folding events are dependent on formation of the hydrophobic core of the domain, suggesting that they depend on tertiary structure formation. PET-FCS experiments with HemK nascent chains reveal two interconverting states, compact (C) and dynamic (D). Both states are populated similarly regardless of chain length. However, the barrier between these states increases when the domain emerges from the ribosome. These experiments indicate a destabilizing effect of the ribosome on the nascent chain. Taken together, the experiments support earlier work that proposed a sequential co-translational folding mechanism for the HemK NTD, and provide rate constants for the dynamics at the earliest stages of nascent chain folding.

      The experiments appear very carefully designed and executed, and the data is of high quality. The PET-FCS measurements in particular provide valuable quantitative information about early nascent chain folding and should be of broad interest. While the results from arrest peptide experiments are intriguing, I have concerns about their interpretation, detailed below.

      Main point:

      The arrest peptide data are interpreted entirely in terms of a pulling force on the nascent chain generated by folding. The conclusion that formation of just one (peak I) or two (peak II) alpha-helices inside the tunnel generates substantial mechanical forces is surprising, particularly given the presumed mechanism of force-mediated arrest release. How would a force be generated by a single alpha-helix? It is easier to rationalize that the forces acting on the arrest peptide are generated by stable tertiary structures. However, in that case, the 4xA mutant should show much lower arrest release in the region where full folding of the domain is expected (regions VII and VIII in Fig. 1), because the mutant is largely unfolded (see Holtkamp et al., Science 2015). This effect is not observed. Together, these considerations make me wonder whether alternative explanations for the observed release rates can be ruled out. For instance, could sequence-specific effects that are not related to folding of HemK, such as local interactions of the nascent peptide with the tunnel, cause the observed changes in arrest release rates? Alternatively, could local structure formation (of an alpha-helix) in the tunnel cause arrest release that is not mediated by a pulling force?

      At a minimum, the authors should discuss how they envision single alpha-helices generating the forces necessary to accelerate arrest release (which have been estimated in the literature, e.g. in Goldman et al., Science 2015, and Kemp et al., PNAS 2020).

      In addition, two control experiments should be carried out: (1) An experiment demonstrating that a bona fide unstructured protein yields more or less constant arrest release rates over a range of nascent chain lengths. Perhaps a construct starting at residue 73 of HemK could serve as a control. (2) An experiment with previously characterized folded domains (e.g. some of the spectrin constructs from Kemp et al., PNAS 2020, or some of the constructs from Farías-Rico et al., PNAS 2018) to establish the fraction of full-length protein (f_FL) obtained with stably folded domains under the experimental conditions used in the present manuscript. How do the f_FL values for the HemK NTD compare to those of fully folded proteins under these conditions?
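      For reference, f_FL is usually computed from background-corrected gel band intensities as I_FL / (I_FL + I_A), where I_A is the arrested product. A minimal sketch of the per-length comparison proposed in the controls above might look as follows; the helper names, chain lengths and intensities are hypothetical placeholders, not data from the manuscript.

```python
def fraction_full_length(i_fl, i_arrested):
    """f_FL = I_FL / (I_FL + I_A) from background-corrected band intensities."""
    total = i_fl + i_arrested
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return i_fl / total


def delta_f_fl(test_bands, control_bands):
    """Per-length f_FL difference between a test construct and a control.

    Both arguments map nascent-chain length (residues) to (I_FL, I_A) pairs;
    only lengths present in both series are compared.
    """
    shared = sorted(set(test_bands) & set(control_bands))
    return {
        n: fraction_full_length(*test_bands[n]) - fraction_full_length(*control_bands[n])
        for n in shared
    }


# Hypothetical band intensities (arbitrary units), for illustration only.
test_construct = {30: (20.0, 80.0), 35: (60.0, 40.0), 40: (25.0, 75.0)}
unstructured_control = {30: (22.0, 78.0), 35: (20.0, 80.0), 40: (24.0, 76.0)}

for length, delta in delta_f_fl(test_construct, unstructured_control).items():
    print(f"L = {length}: delta f_FL = {delta:+.2f}")
```

      The same function applied to a well-characterized folded domain would give the reference f_FL level that control (2) asks for.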

    3. Reviewer #1:

      This study by Liutkute et al. investigates the co-translational folding of a small alpha-helical domain from HemK. It builds on earlier studies by Rodnina and colleagues, which used FRET and other measurements to show that HemK begins folding inside the ribosome exit tunnel and that folding proceeds sequentially as individual alpha-helical segments can be accommodated in the exit tunnel vestibule. Folding is completed just outside the ribosome, once the entire HemK domain is exposed. The current work extends these earlier studies using biochemical assays of "force" on the nascent chain and spectroscopic assays of intramolecular dynamics with an N-terminal fluorescent probe.

      The force assays illustrate that tension is seen as individual alpha helices move beyond the exit tunnel constriction, and at other previously documented steps of folding in the vestibule. These intra-ribosomal events are not impacted by a mutation that disrupts packing of the hydrophobic core. The fluorescence quenching dynamics show that the N-terminus is more dynamic inside the exit tunnel prior to folding and not dynamic after folding outside the tunnel. A detailed kinetic model of the fluorescence correlation data is provided to help explain the observations.

      Overall, the study provides a finer resolution view of the sequential co-translational folding of HemK. Although the broad concepts from the earlier studies are not changed by the current work, the study introduces analytical tools based on fluorescence quenching and FCS that may be useful to study the co-translational folding of other proteins.

      My primary suggestion is that the authors should be more explicit about what is being measured in the "force" sensor assay. SecM stalling relies on a specific secondary structure of the stalling sequence that causes an altered P-site geometry that is unfavourable for peptide bond formation. Stalling will not occur if this altered geometry cannot be stabilized. Thus, what the authors refer to as 'force' is actually a constraint applied to the nascent chain that prevents formation of the SecM secondary structure. In other words, folding does not so much generate force as constrain the nascent chain as a consequence of the ribosome exit tunnel geometry. This is a subtle but, I feel, important distinction for explaining the assay, because such a constraint can arise for reasons other than folding. For example, an interaction between the nascent chain and the exit tunnel (or other proteins) could similarly constrain the nascent chain.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

    1. Reviewer #3:

      Summary:

      The authors report a between-subjects, double-blind psychopharmacological study of explore/exploit behavior in healthy human subjects. The authors used propranolol to block norepinephrine (NE) and amisulpride to block dopamine (DA), and compared these groups to a group taking placebo. Using a 3-armed bandit task coupled with computational modelling and pharmacological manipulation, the authors show that "tabula rasa" (or random) exploration is reduced when NE is blocked. This interpretation was supported by behavioral effects whereby subjects taking propranolol were significantly more consistent than other groups when facing identical choices, and chose the low-value option more often than the other groups. Blocking DA did not appear to affect any parameters. The computational model showed that the ε-greedy parameter, which captures the proportion of trials on which an agent makes a random choice, was most affected by the NE blockade. In addition, the modelling shows that some directed exploration (exploring lesser-known options) was also at play.

      General comments:

      The manuscript is well written and the results are compelling. The findings are important to researchers particularly interested in the cognitive effects of catecholamines and/or the explore/exploit dilemma, but may be of less interest to a broader readership.

      Criticisms:

      1) I do not really like the use of the term "tabula rasa" exploration over "random" exploration. The term random exploration is simpler and clearer. The particular problem for me is that "tabula rasa" has the connotation that both the current "tabula rasa" choice and all future choices will not take into account information obtained before that choice. Random exploration is a better term because it is easy and intuitive to see that random choices can be sprinkled in with choices based on previous information, whereas tabula rasa implies wiping previous information away from that point forward. As far as I can tell, previous related work has not termed the random exploration associated with the ε-greedy parameter "tabula rasa". One consideration I am wrestling with is that there is apparently another parameter in one or more of the models that reflects random exploration (line 618, inverse temperature). This may be why the authors opted to call the ε-greedy parameter something else. At the very least, I would like a better explanation of the choice of term (tabula rasa), as well as a thorough explanation of the difference between tabula rasa and random exploration. I also recommend changing the term, but am amenable to an argument for keeping it.

      2) Line 162: "Reported findings were corrected for IQ (WASI)". How? It seems that WASI was included as a covariate in the repeated-measures ANOVA, but it is not clear from the results reported in lines 170-185 exactly which factors went into the ANOVA. I recognize that in higher-impact journals a full description of the factors and levels of statistical tests is often considered a tedious waste of space, but I feel that this holds only when the structure of the test is obvious. In my opinion, that is not the case here.

      3) Lines 209-210: "the probability of choosing bandits with a lower expected value (here the low-value bandit, Fig 1e) will be higher. We investigated whether such behavioural signatures were increased in the long horizon condition (i.e. when exploration is useful), and we found a significant main effect of horizon (F(1, 54)=4.069, p=.049, η2=.07; Figure 3c)." Isn't this just evidence of general exploration, not specifically tabula rasa exploration? How does this test rule out, for example, directed exploration?
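      The distinction drawn in points 1) and 3) can be made concrete with a minimal sketch of three choice rules for a 3-armed bandit. It assumes, for illustration only, a softmax with inverse temperature beta, an ε-greedy mixture that picks uniformly at random with probability ε, and a UCB-style bonus c/sqrt(n) added to each mean; the values, sample counts and parameters are hypothetical and are not the authors' fitted models. Both ε and a directed-exploration bonus raise the probability of choosing the low-value bandit, but only the bonus depends on how often that bandit has been sampled.

```python
import numpy as np


def softmax(values, beta):
    """Value-dependent choice probabilities; beta is the inverse temperature."""
    z = beta * (values - np.max(values))  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def epsilon_softmax(values, beta, epsilon):
    """With probability epsilon, choose uniformly at random regardless of values."""
    return (1.0 - epsilon) * softmax(values, beta) + epsilon / len(values)


def ucb_softmax(means, counts, beta, c):
    """Directed exploration: bonus c / sqrt(n) favours less-sampled options."""
    return softmax(means + c / np.sqrt(counts), beta)


means = np.array([2.0, 5.0, 6.0])   # hypothetical expected values (low, mid, high)
counts = np.array([1, 6, 6])        # hypothetical sample counts; low-value arm less sampled
beta = 2.0

p_soft = softmax(means, beta)
p_eps = epsilon_softmax(means, beta, epsilon=0.3)
p_ucb = ucb_softmax(means, counts, beta, c=4.0)
p_ucb_equal = ucb_softmax(means, np.array([6, 6, 6]), beta, c=4.0)

print("P(low-value bandit):",
      f"softmax {p_soft[0]:.4f}, eps-greedy {p_eps[0]:.4f}, UCB {p_ucb[0]:.4f}")
```

      When the low-value bandit has been sampled as often as the others, the UCB bonus is identical for every option and the UCB probabilities reduce to the plain softmax. A low-value-choice measure therefore separates random from directed exploration only if the design equates uncertainty across options.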

    2. Reviewer #2:

      In this study, Dubois and colleagues claim that noradrenaline promotes tabula-rasa exploration during decision making, using a novel paradigm with short- and long-horizon conditions to elicit exploitation and exploration, respectively. The work tests different computational models and examines in particular supposedly less costly forms of exploration, namely 1) tabula-rasa exploration, in which prior information is ignored and the same probability is assigned to all available options, and 2) novelty exploration, in which information processing is biased toward choices that have not been encountered previously. They provide evidence that both of these processes coexist with more demanding exploration strategies. In addition, using a double-blind, placebo-controlled drug study, they provide support for a role of noradrenaline in tabula-rasa exploration.

      This work extends previous work from the same group addressing the important question of how neuromodulators influence decision-making processes. The overall approach and the results are clearly presented. The extensive model comparison is particularly valuable for approaching this difficult question. The results are interesting and bring novel insights into the processes at play during exploration and the influence of neurotransmitters on these processes.

      1) Noradrenaline influence on tabula-rasa exploration:

      The authors claim that "Phasic noradrenaline is thought to act as a reset button, rendering an agent agnostic to all previously accumulated information, a de facto signature of tabula-rasa exploration." It might be interesting to discuss the results in terms of a potential impact of noradrenaline on the subjective value of the choices. For instance, Rogers et al. (Psychopharmacology, 2004) suggest that propranolol affects the processing of possible losses in decision-making paradigms and might also reduce discrimination between different levels of possible gains. In another study, Sokol-Hessner et al. (Psychol Sci., 2015) also report reduced loss aversion after propranolol administration. These effects might also alter the influence of prior information and reset behavioral adaptation to look for new opportunities. In this latter study, the authors also report a lack of effect of propranolol on choice consistency, contrary to what the present study reports. I was also wondering how this new result about the effect of propranolol on decision making relates to previous findings from the same group (Hauser et al. 2019), which described a noradrenergic influence on information gathering and the urgency to decide. Finally, in line with the network reset hypothesis, it has indeed been suggested that a change in the environment might enhance information gathering at the expense of prior expectations to produce an adaptive behavioral output. The authors might consider avoiding the term 'agnostic'; the effect might instead reflect a reduced influence of 'top-down' prior information, related to changes in the subjective value of the different choices.

      2) Model selection:

      One strength of the paper is that the authors compared several computational models. The model selection is presented in Figure 4, and in Figure 4 - Figure supplement 1 the authors provide additional information on the winning model, which accounted best for the largest number of subjects, in comparison with two other models, namely the UCB model (with novelty and greedy parameters) and the hybrid model (with novelty and greedy parameters). It would be useful for the reader to get a better sense of the number of subjects whose data favored each model (i.e. a more exhaustive picture). One could use a table like Appendix Table 2, listing the number of subjects for which each model achieved the best performance. In fact, as shown in Figure 4, the winning model does not look very different (at least visually) from other models such as the UCB model (with novelty and greedy parameters) or the hybrid models (with the novelty parameter, or with novelty and greedy parameters). As such, I am wondering whether the conclusion about the 𝜖-greedy parameter would hold if other models with similar performance were used, e.g. the UCB model (with novelty and greedy parameters) or the hybrid model (with novelty and greedy parameters).

      3) The authors used propranolol (40 mg), a non-selective β-adrenoceptor antagonist, to reduce noradrenaline functioning. Previous studies have shown that it significantly decreases heart rate (e.g. Rogers et al., 2004). How might that relate to the reported results? In terms of NA influence, and given the distribution of β receptors, could the authors be more explicit about how their work relates to potential mechanisms (e.g. Goldman-Rakic et al. J Neurosci. 1990 or Waterhouse et al., Journal of Pharmacology and Experimental, 1982)?

      4) Could the authors clarify whether the PANAS questionnaire was administered before or after the drug treatment? This would establish whether the group difference was pre-existing or a consequence of drug administration. It would indeed be interesting to have a measure of the drug effect on these parameters.

      5) The authors claim that: "Although tabula-rasa exploration can comprise influences of attentional lapses or impulsive motor responses, the difference between horizon conditions cancels them out". I would suggest tempering this claim, as such effects might be more enduring in the long-horizon condition. The authors might also want to look at RT variability, in addition to mean RTs, which did not differ between groups.
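      The per-subject summary requested in point 2) amounts to counting, for each model, the subjects for whom it attains the best score, and reporting how decisive each win is. A minimal sketch, assuming a subjects × models matrix of scores where lower is better (e.g. BIC or cross-validated negative log-likelihood); the model labels and numbers are hypothetical:

```python
import numpy as np

# Hypothetical scores: rows are subjects, columns are models; lower is better.
model_names = ["Thompson+novelty+eps", "UCB+novelty+eps", "hybrid+novelty+eps"]
scores = np.array([
    [100.0, 102.0, 101.0],
    [98.0, 97.5, 99.0],
    [110.0, 115.0, 112.0],
    [95.0, 96.0, 94.0],
    [120.0, 121.0, 125.0],
])

best = np.argmin(scores, axis=1)                       # winning model per subject
wins = np.bincount(best, minlength=len(model_names))   # subjects won by each model
sorted_scores = np.sort(scores, axis=1)
margin = sorted_scores[:, 1] - sorted_scores[:, 0]     # gap to the runner-up model

for name, n in zip(model_names, wins):
    print(f"{name}: best for {n} of {len(scores)} subjects")
print("median margin over runner-up:", np.median(margin))
```

      Small margins would indicate that per-subject winners are not decisive, which bears directly on whether the ε-greedy conclusion depends on the particular winning model.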

    3. Reviewer #1:

      Dubois and colleagues investigate how two modes of exploration, tabula-rasa and novelty-seeking, contribute to human choice behavior. They found that subjects used both tabula-rasa and novelty-seeking heuristics when the task conditions favored exploration. Specifically, participants could, and had to, make more responses in the long-horizon condition, which favored exploration, than in the short-horizon condition, which favored exploitation. The authors then provide evidence that blockade of norepinephrine beta receptors leads to decreased tabula-rasa exploration and increased choice consistency, whereas blockade of D2/D3 dopamine receptors had little effect. Novelty seeking was not affected by catecholaminergic drugs.

      The paper provides evidence on exploration-exploitation trade-offs from two different points of view. On the one hand, it addresses computational aspects of exploration by investigating how computationally intense forms of exploration might be supplemented by the use of heuristic strategies. To do so, the authors propose a novel task allowing them to disentangle these strategies and quantitatively assess their use. On the other hand, the findings presented in the paper shed some novel light on the neuropharmacological mechanisms underlying exploration. Some interpretations seem to go beyond the data, and information is missing from the description of the results and the computational approaches used. In general, though, the manuscript conveys the impression of a well-designed and carefully conducted study.

      Major points:

      General

      1) It is one thing to come up with computational terms and model-based quantities correlating with behavior but a different one to show their psychological meaning. Did the trials with tabula-rasa exploration or novelty exploration differ in terms of response times from the other types of responses? Did participants report that they indeed intended to explore in the tabula-rasa exploration trials?

      2) On a related note, how do the authors distinguish random (tabula-rasa) exploration from making a mistake? From how the task was designed, choosing the low value option appears to receive a more natural interpretation as a mistake rather than as exploration because this option was clearly dominated by the other options and remained so within and across trials.

      3) Previous research of the authors (Hauser et al., 2017, 2018, 2019) has associated beta receptor blockade with enhanced metacognition, decreased information gathering/increased commitment to an early decision (Hauser et al., 2018, JNS) and an arousal (i.e., reward)-induced boost of processing stimuli. Of course, it is possible that norepinephrine plays multiple roles, but it appears not exactly parsimonious to imbue it with a different role for each task tested. Are there some commonalities across these effects that could be explained with some common function(s)?

      4) Throughout, the paper implies that a beta blocker provides information about the function of norepinephrine in general. However, blocking beta receptors leaves synaptic norepinephrine to act on alpha receptors; accordingly, beta-blockers can be viewed as partial alpha agonists. Given that the function of these receptor families differs, more care should be taken when describing the nature of the intervention, labeling the groups and interpreting the effects.

      Introduction:

      5) As mentioned above, the paper investigates not only computational aspects of exploration but also the underlying neuropharmacological correlates. However, the introduction focuses mostly on different computational algorithms (which is in itself very helpful for the understanding of the paper!) while the neuropharmacological basis of explorative behavior is only briefly introduced. In the same regard, while some insights were given in the Discussion, it would be interesting to have a rationale for using amisulpride and propranolol already in the introduction.

      6) Relatedly, the introduction focuses on tabula-rasa and novelty strategies based on the argument that these are more computationally efficient. The authors may also want to motivate this from the perspective of neural constraints and brain processes. Specifically, they argue that it may be computationally demanding to process the expected value (mean) and variance of choice options. However, computational efficiency has been put forward as an argument for why mean-variance-like signals are coded in the brain, particularly with multi-outcome options where expected utilities are difficult to compute (D'Acremont and Bossaerts, 2008). Thus, the computational efficiency argument currently seems insufficiently motivated.

      Materials and Methods:

      7) Successful performance of the task is based on the ability to discriminate between different reward types and select the one with the higher value. From the experimental design description, one can see that to do so, the subjects needed to distinguish between different apple sizes. In this regard, a question arises: how large was the difference between two adjacent apple sizes? Was it large enough that, on visual inspection, participants could easily tell that apple size 7 was less rewarding than apple size 8? Finally, since the task requires visual inspection of reward stimuli, was the subjects' vision tested, and did it differ between groups?

      8) The point of heuristics from a psychological perspective is that they dispense with the need to use full-blown algorithmic calculations. However, in the present models, the heuristics are only added on top of these calculations and the winning model includes Thompson exploration. Stand-alone heuristic models would do the term more justice and one wonders how well a model would fare that includes only tabula rasa exploration and novelty exploration.

      9) The simulations provide a nice intuition for understanding choice proportions from different models/strategies (Figures 1e and 1f). However, it would be helpful to provide simulated results for long and short horizons separately. Do the models make different predictions for the two horizons? Additionally, it would be helpful to also show the results from other models (i.e. the proportion of low value bandit chosen by novelty agent). These can be added in the supplement.

      10) One of the best-known effects of propranolol is to reduce heart rate. Did the authors measure heart rate, and can they control for the possibility that peripheral effects of the drug explain the findings (and what was the reason for not collecting pupil diameter data, unlike in the authors' previous research)?

      11) The long horizon condition appears to confound exploration with higher effort demands and longer delays to reward, at least in the early responses. If the authors cannot control for these they should mention them as limitations.

      12) Not only choice rules but also value functions seem to differ between Thompson and UCB (lines 583 and 593). This raises the question of how well pharmacological effects on choice rules can be distinguished from effects on valuation, and how confident we can be that the observed effects indeed arise from changes in choice rules.

      Discussion:

      13) Line 410: The statement that memory is not at play in the present task because all information is always visible on the screen seems too strong. At least some exploration-relevant information, such as the overall distribution of outcomes across all options, is not presented and may be remembered differently by the different groups.

      D'Acremont M, Bossaerts P. Neurobiological studies of risk assessment: a comparison of expected utility and mean-variance approaches. Cogn Affect Behav Neurosci. 2008;8(4):363-374. doi:10.3758/CABN.8.4.363
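      Regarding point 12), Thompson sampling and UCB can use uncertainty quite differently even before any choice-rule parameters are added. A minimal sketch, assuming independent Gaussian beliefs about each option's value (the exact value functions in the manuscript's lines 583 and 593 are not reproduced here): Thompson sampling chooses the option with the largest posterior draw, whereas a UCB-style rule adds an explicit uncertainty bonus. With two options of equal mean but unequal uncertainty, Thompson sampling chooses each about half the time, while UCB systematically prefers the more uncertain one. The function names and numbers are hypothetical.

```python
import numpy as np


def thompson_choice_probs(means, sds, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(choose i) under Thompson sampling.

    Each option's value is drawn independently from N(mean_i, sd_i^2) and the
    option with the largest draw is chosen.
    """
    rng = np.random.default_rng(seed)
    draws = rng.normal(means, sds, size=(n_samples, len(means)))
    choices = np.argmax(draws, axis=1)
    return np.bincount(choices, minlength=len(means)) / n_samples


def ucb_values(means, sds, c):
    """UCB-style values: expected value plus an uncertainty bonus c * sd."""
    return np.asarray(means) + c * np.asarray(sds)


means = np.array([5.0, 5.0])   # hypothetical: equal expected values
sds = np.array([1.0, 3.0])     # hypothetical: option 1 is more uncertain

print("Thompson:", thompson_choice_probs(means, sds))
print("UCB values (c = 1):", ucb_values(means, sds, c=1.0))
```

      Because the two rules map the same means and uncertainties onto choices differently, an effect attributed to a choice-rule parameter in one model could partly reflect a difference in valuation in the other.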

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

    1. Reviewer #3:

      This is a comprehensive meta-analysis of the empirical literature on sex differences in mammalian trait variability. The authors nicely articulate competing hypotheses: "estrus-mediated variability" (which predicts higher trait variability in females because they exhibit cyclic reproductive [estrous] hormone secretion that occurs over multi-day timescales) vs. the "male variability hypothesis" (which predicts higher trait variability in males because they are the heterogametic sex). Several related prior meta-analyses have not provided support for the estrus-mediated variability hypothesis. The analysis performed here differs significantly from prior work in that the subjects were 27,147 mice from the International Mouse Phenotyping Consortium, which generated over 2 × 10^6 data points. Unlike in other meta-analyses, the subjects of this analysis were therefore evaluated more systematically (9 WT strains across 11 labs). A total of 218 continuous traits were evaluated, grouped into 9 functional trait groups. Some traits were biased towards males and others towards females. There was no consistent pattern of greater variability in either sex. The results support a straightforward conclusion that neither hypothesis adequately explains patterns of trait variability. The discussion is a restrained defense of the practice of including females (please clarify that estrous cycles were not monitored in these studies, so the females are classified as "unstaged"); consequently, females can be included in research studies without a default assumption that they are any more likely than males to introduce variability. The authors also apply their data on widespread differences in trait-specific lnCVR values to the potential for phenotypic responses to selection due to rapidly changing environmental events. The discussion is well written, and each section is meaningful. The web-based tool is a very helpful contribution. The discussion of the statistical implications of the work (e.g., equalizing power and the Type I error consequences of unequal variance) is of significance to research on mammalian biology.
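      For readers less familiar with the effect size used here, a minimal sketch of a log coefficient-of-variation ratio is given below. It assumes a male/female orientation (positive values meaning males are relatively more variable), positive trait means, and the commonly used small-sample correction of 1/(2(n - 1)) for each log standard deviation; the function name and numbers are hypothetical, and the authors' exact implementation may differ.

```python
import math


def ln_cvr(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Log ratio of coefficients of variation (males over females).

    Positive values mean males are relatively more variable. Each log SD is
    bias-corrected by 1 / (2 * (n - 1)); means must be positive.
    """
    if min(mean_m, mean_f) <= 0:
        raise ValueError("lnCVR requires positive means (ratio-scale traits)")
    cv_m = sd_m / mean_m
    cv_f = sd_f / mean_f
    return math.log(cv_m / cv_f) + 1 / (2 * (n_m - 1)) - 1 / (2 * (n_f - 1))


# Hypothetical summary statistics: same CV in both sexes, equal sample sizes.
print(ln_cvr(mean_m=30.0, sd_m=3.0, n_m=50, mean_f=25.0, sd_f=2.5, n_f=50))
```

      Because it scales each SD by its mean, lnCVR compares relative rather than absolute variability, so a trait can differ in mean between the sexes while showing no difference in lnCVR.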

      1) The present work adds important new information to a growing literature (see for example Smarr BL, Rowland NE, Zucker I. Male and female mice show equal variability in food intake across 4-day spans that encompass estrous cycles. PLoS One. 2019 Jul 15;14(7):e0218935) indicating that incorporation of unstaged female rodents in biomedical research does not increase variability compared to that generated by males; importantly, it also specifies several circumstances in which specific traits are more variable in one sex than the other.

      2) The statement in lines 41-42 is a strong overgeneralization and should be tempered and/or clarified: "However, scientists in (bio-)medical fields have not traditionally regarded sex as a biological factor of intrinsic interest (2-7)." This is an overstatement. The study of sex differences and sexual differentiation in mammals (a class of animals of most direct relevance to biomedical research) has a long history, complete with dedicated journals (e.g. Biology of Sex Differences), learned societies, etc. Such an enduring interest in sex among biologists only makes the present work more interesting and important. This critique may be addressed with a clearer definition of "(bio-)medical" here and throughout the manuscript.

      3) Colloquialisms such as "This is an important step, but we can go much further" (line 50) are vague and, as written, difficult for this reader to endorse as true; we recommend deleting them.

      4) In the Introduction, the authors delineate competing hypotheses: "estrus-mediated variability" vs. "male variability hypothesis". In their elaboration of the former hypothesis, the authors should clarify that the historical concern regarding decreased power and increased variability in females compared to males specifically regarded the inclusion of females that were not synchronized (or "staged") so as to be tested/treated on the same day/phase of the estrous cycle. Data from these so-called 'randomly cycling' females were predicted to be more variable than data from males. "Staged" females were presumed to be less variable, and the interventions and costs associated with the presumed need for staging are viewed as onerous. But a growing literature, including the important new results from the present study, argues that there is no empirical support for the contention that females generally are more variable than males across many traits.

      5) Methods: the data analysis pipeline is clear and rigorous. It should be stated that the data used come from unstaged females.

    2. Reviewer #2:

      Summary:

      There are significant methodological and interpretative concerns with this article. The analysis overstretches and does not consider its potential weaknesses. It needs to refocus on the primary question of whether there is a pattern in the impact of sex on the variance of these traits. The analysis then needs to go deeper and remove other sources of variance that could be confounding the findings.

      Major comments:

      Methodology

      1) The methodology is not clear.

      2) Meta-analysis is generally used when the raw data are not accessible. Why not use mixed-effects regression models?

      3) The variance summary metric is calculated for each institute and strain from data collected in multiple batches, with potential baseline shifts, as the data were collected across many years. This is not a representative metric of variability for a sex, as there are multiple sources of variance affecting it.

      4) Figure 3b and code: It is very rare for a fixed-effect analysis to be justifiable. Why assume that there is no variation between the different traits when testing the effect of sex? Normally you would explore sources of heterogeneity by meta-regression rather than simply attributing it to sex differences.

      5) "A previous study found that the heterogametic sex was more variable in body size". If this holds, should traits that are correlated with body weight not show the same pattern?

      6) "minimum of 2 different institutes" is a very low N. Why would this give a meaningful analysis? What was the minimum amount of data for a strain*centre combination for a trait to be included?

      7) Consider the recent discussions on phenotypic plasticity and the phenotypic interaction with the environment (https://www.nature.com/articles/s41583-020-0313-3 ). This suggests a fixed effect model is not appropriate. The results and approach need discussing in this context.
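      To make the fixed- versus random-effects distinction raised in points 4 and 7 concrete, here is a minimal Python sketch, assuming per-trait effect sizes (for example, a log variance ratio between the sexes) and their sampling variances are already available. It computes the inverse-variance fixed-effect estimate, Cochran's Q and I^2 as heterogeneity measures, and the DerSimonian-Laird random-effects estimate. The function names and numbers are hypothetical, this is not the authors' pipeline, and DerSimonian-Laird is only one of several estimators of the between-study variance tau^2.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    return pooled, math.sqrt(1.0 / total_w)


def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom and I^2 (% of variation beyond chance)."""
    pooled, _ = fixed_effect(effects, variances)
    q = sum((y - pooled) ** 2 / v for y, v in zip(effects, variances))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects estimate, its standard error and tau^2."""
    weights = [1.0 / v for v in variances]
    q, df, _ = heterogeneity(effects, variances)
    # Scaling constant for the method-of-moments estimate of tau^2.
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Each study's weight now includes the between-study variance tau^2.
    re_weights = [1.0 / (v + tau2) for v in variances]
    total_w = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / total_w
    return pooled, math.sqrt(1.0 / total_w), tau2


# Hypothetical per-trait effect sizes and their sampling variances.
effects = [-0.5, 0.0, 0.8]
variances = [0.01, 0.01, 0.01]
print(fixed_effect(effects, variances))
print(heterogeneity(effects, variances))
print(random_effects_dl(effects, variances))
```

      When I^2 is large, the random-effects standard error is wider than the fixed-effect one, which is the practical cost of assuming no heterogeneity. A mixed-effects model fitted to the raw data, as suggested in point 2, would instead let strain, institute and batch enter as explicit random effects.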

      Conclusion

      1) It isn't made clear that this analysis is trying to assess the role of sex across strains and institutes.

      2) There is no discussion of the potential weaknesses of the analysis.

      3) Figure 3a

      • Why is there no discussion of measures of heterogeneity within the meta-analysis at the population level?

      • Should the differences in classification as male- or female-biased within functional groups not be assessed by Fisher's exact test, with the p-values adjusted for multiple testing, before stating that an area shows a difference?

      4) I am concerned by the statement "Notably most SD trait means also show the greater difference in trait variance", which seems to be based on visual inspection rather than on a statistical analysis.

      5) I have concerns about relating these results to power:

      • These estimates are from an analysis across strains, batches and institutes looking at global behaviour in the traits. This absolute variance measure would be very different to that seen in a lab within a classic parallel group design study with one strain.

      • The authors advocate a factorial design but suggest powering each sex independently. This feeds into the misconception that studying both sexes requires doubling the sample size.

      6) The authors report that this analysis of mean differences was in accordance with previous studies. This is not really the case. The differences arise from the different approaches taken, and highlight how this summary metric loses sensitivity. The authors relate many of these changes to differences in body size. However, the earlier published analysis adjusted for body weight.

      7) Why would the "difference in variability impact on the potential of each sex to respond to changes in specific environments"?
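      Point 3 above asks for a formal test of male- versus female-biased classification within functional groups. A minimal, standard-library-only Python sketch of that workflow is a two-sided Fisher's exact test per group on a hypothetical 2x2 table (male- vs female-biased traits, inside vs outside the group), followed by Benjamini-Hochberg adjustment across groups. The table layout, group names and the choice of Benjamini-Hochberg (rather than, say, Bonferroni) are assumptions for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (false discovery rate) adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (male-biased in group, female-biased in group,
#                       male-biased elsewhere, female-biased elsewhere)
tables = {"group_A": (12, 3, 40, 45), "group_B": (5, 6, 47, 42)}
raw = {name: fisher_exact_two_sided(*t) for name, t in tables.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
print(adjusted)
```

      In practice `scipy.stats.fisher_exact` and `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` provide the same procedures; the hand-rolled version only keeps the example dependency-free.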

    3. Reviewer #1:

      This study looks at whether there are sex differences in the variability of traits in mice, via a meta-analysis of published datasets. The analyses show that females typically show greater variability in traits categorised as immunological, while males show greater variability in morphological traits. Traits related to the eye were also more variable in females. These findings are interpreted in light of evolutionary theory about greater between-individual variability in males, and greater within-individual variability in female mammals due to estrus. A handy online tool is provided to allow researchers to consider possible sex-specific variability in traits at the experimental design phase.

      I enjoyed the paper and thought the question and conclusions were interesting. The figures are great. I am not an expert in meta-analyses, so my comments mostly relate to the hypotheses and discussion of the results.

      1) The paper jumps about quite a bit between talking about sex differences relevant to mammals only and those that might apply to animals more generally. For example, the Introduction begins with reference to biomedical research (mammals) and the estrus hypothesis (mammals) but then introduces the "male variability" hypothesis by stating that "males are often the heterogametic sex". Given that the subject of your study is the mouse, I think it would be more logical to restrict the Introduction to mammals (i.e. explain the two hypotheses with respect to mammals). You could then include a section in the Discussion on if/why we might expect the same trends in other animals (see below also).

      2) I feel that the rationale behind the two hypotheses (female estrus and male variability) could be explained better in the Introduction. i.e. WHY estrus might produce higher variability in females and WHY stronger sexual selection or male heterogamety might produce greater male variability. A few extra sentences on each would probably be enough. At the same time, I think it would be worth clarifying a priori the extent to which these hypotheses are expected to apply to different traits. Some predictions are given only in the Discussion (e.g. estrus expected to mostly affect immune response and physiology).

      3) The Discussion on eco-evolutionary implications (line 184) would be greatly strengthened if it included at least one specific example of how sex-specific differences in trait variability might affect the evolutionary trajectory of a population. At present, one very general hypothetical is given, but I did not find it easy to follow (disease/climate change kills more of one sex than the other --> sex ratio of the population is skewed (temporarily?) --> mating system is "influenced" --> "downstream effects on population dynamics"). It is also stated that "modelling sex difference in trait variability could lead to different conclusions compared to existing models (cf 44)". The cited study there is on Eurasian sparrowhawks. I'm not familiar with this sparrowhawk study, but perhaps it is a suitable one to highlight in more detail as a clear example? What sort of different conclusions would be expected? It's fantastic that your paper is aiming to speak to a broad range of biologists, but I think that greater clarity in this section is needed to make ecologists and evolutionary biologists really take notice.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers agreed that the topic of the study was an interesting one, and that the issue of sex differences in trait variability is relevant to good experimental design. As you'll see below, however, Reviewer #2 felt that the current analytical treatment of this mouse dataset is not appropriate to the question. Of particular concern is that sources of variability other than sex were not adequately considered.

    1. Reviewer #3:

      Saderi and colleagues study the effects of arousal and task engagement on sound responses in the (primary) auditory cortex and inferior colliculus of ferrets. Arousal is measured by pupillometry, task engagement by contrasting an auditory detection task with passive sound exposure, and effects are quantitatively dissociated using a general linear model and multiple regression. The authors find that the sound responses of about half of the recorded neurons are modulated by task engagement and/or by arousal, with IC neurons most frequently modulated by arousal and AC neurons modulated by both factors. Increased arousal was associated with enhanced sound responses. In AC, task engagement was associated both with enhanced and suppressed sound responses. In IC, task engagement was associated with suppressed sound responses.

      Major comments:

      1) Some of the main conclusions of the results from AC are not novel. Using a different experimental approach, the study of Knyazeva et al., 2020, Front Neurosci. 14: 306 already suggested that the discharges of many neurons in AC are affected by arousal, that task effects can disappear if effects of arousal have been accounted for, and that there is no systematic difference in response modulation between neurons tuned, or not tuned, to task-relevant sounds. Dissociations of the effects of different non-auditory factors on sound responses in AC have also been described by Zhou et al., 2014, Nat Neurosci. 17:841-850 and by Carcea et al., 2017, Nat Comm. 8:14412.

      2) The study is based on a relatively small number of neurons and behavioral sessions, potentially reducing the strength of the statistical inference, e.g., that IC was more strongly affected by arousal than AC. It appears that data from about 20 behavioral sessions entered the analyses. This estimate is based on the information that 1-3 behavioral blocks were tested during individual sessions (line 611) and that Figure 1F shows the results of about 36 active-passive comparisons in four animals. This indicates that, on average, about 10 neurons were recorded simultaneously in each session. These neurons were therefore statistically more dependent than neurons recorded in different sessions, which needs to be considered for potentially global effects such as arousal and task engagement. The authors should include this information, together with the number of trials in active and passive blocks and whether the responses to different TORCs were averaged.

      3) The authors did not distinguish single unit and multiunit data. This difference should be considered in detail because it could affect the interpretation of whether there are units that are affected both by arousal and task engagement.

      4) The authors should include a statement that the results on the effects of task engagement may not apply to all types of auditory tasks. This is highly important because the authors used an auditory detection task, which is a task that may not require AC at all.

    2. Reviewer #2:

      Main Review:

      Saderi and her colleagues have performed a cool study that attempts to determine whether and how two behavior-related variables - arousal and task engagement - differently influence activity in two stages of the auditory neuraxis, IC and A1. They define arousal as pupil diameter and task-engagement as a binary variable determined by the experimental block design. They find that although these two parameters often co-vary, they sometimes do not. They find that IC was more influenced by arousal and A1 was modulated by both arousal and engagement. One of their main findings is that previous reports of task-engagement effects may in fact be attributed to arousal state.

      This is a nice quantification of neural activity and behavior. My major concerns are all thematically linked and they stem from the use of a continuous readout of arousal (i.e. pupil diameter) but a binary readout of task-engagement (i.e. the block the animal is in at any moment). Relatedly, I am interested in knowing whether neural effects can be accounted for by the animals from which they were recorded (and from that particular animal's behavior). I expect that my enthusiasm for this paper will not be diminished in any way regardless of any changes that come out of the deeper analyses outlined below. Also, I do not intend that responses to these concerns will require any new experimentation.

      Major concerns:

      1) Can task engagement be explained more rigorously as a continuous rather than binary variable? In my experience training and testing animals on appetitive behaviors, task engagement can wax and wane within a single block, across an experimental recording session, or across days of behavioral testing. Such changes in engagement can be inferred, for example, as strings of (seemingly) easy trials in which the animal does not answer correctly. The authors should attempt to quantify through behavioral analysis (running lapse rate, lick latency, etc) whether and how task engagement may be changing within and across task blocks. Alternatively, the authors could clearly explain that their binary encoding of engagement has limitations and may not actually describe the animal's engagement at any given moment.

      2) Can a continuous readout of task engagement better explain neural activity? For many neurons, task-engagement does not provide unique predictive information, yet for others it does (e.g. Fig. 3C). If task engagement can be modeled as a continuous rather than binary variable, is it still true that "some apparent effects of task engagement should in fact be attributed to fluctuations in arousal" (Abstract)? In general, I worry that the current analysis is effectively a floor on task-related modulations since it assumes constant engagement throughout a task block.

      3) Can neural heterogeneity be attributed to animal-to-animal behavioral variability? Even if task engagement does not vary within a task block for any one animal, it may indeed vary across animals. In theory, the actual task engagement of some animals might more closely mirror the block design that the experimenters are imposing, and some animals may simply have a higher level of engagement than others. This could mean that some results currently attributed to population-level heterogeneity (e.g. some A1 neurons do this, while others do that) might actually reflect animal-to-animal heterogeneity rather than distinct neural populations. For example, the authors state that for a subset of neurons, persistent task-like activity after a block change can be accounted for by pupil, whereas for other neurons it cannot (Fig. 7, line 452). The authors should confirm that key findings are consistent across animals and not related to degrees of task engagement (see point #1). If the findings are not consistent across animals but can be explained by each animal's unique behavior, this would also be really cool.
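      One way to follow the suggestion in point 1 is to turn trial outcomes into a continuous regressor. The Python sketch below, assuming a per-trial record of whether the animal missed an easy target trial, computes a trailing-window lapse rate and a hypothetical engagement index (one minus that rate) that could stand in for the binary block label in the regression. The window length, the definition of a lapse and the function names are assumptions, not quantities taken from the manuscript.

```python
from collections import deque


def running_lapse_rate(missed, window=10):
    """Trailing-window lapse rate.

    missed: sequence of booleans, one per easy target trial, in trial order
            (True = the animal failed to respond).
    Returns, for each trial, the fraction of misses among the last `window`
    trials including the current one (fewer at the start of the session).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    recent = deque(maxlen=window)
    rates = []
    for miss in missed:
        recent.append(1 if miss else 0)
        rates.append(sum(recent) / len(recent))
    return rates


def engagement_index(missed, window=10):
    """Hypothetical continuous engagement regressor: 1 minus the running lapse rate."""
    return [1.0 - r for r in running_lapse_rate(missed, window)]


# Hypothetical session: the animal stops responding for a few trials.
trials = [False, False, True, True, False, False]
print(engagement_index(trials, window=3))
```

      A trailing window uses only past and current trials, so the regressor never looks ahead of the neural data it is meant to explain; a centred window would be a reasonable alternative for offline analysis.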

    3. Reviewer #1:

      This study distinguishes effects of generalized arousal and specific task engagement on the activity of neurons in the inferior colliculus and auditory cortex of ferrets as they engaged in a tone detection task, while monitoring arousal via pupillometry. The authors found that arousal effects were more prominent in IC, while arousal and engagement effects were equally likely in A1. Task engagement was correlated with increased arousal. They propose that there is a hierarchy such that generalized arousal enhances activity in the midbrain, and task engagement effects are more prominent in cortex. I have no major concerns, but two points to consider:

      I would like to know how the model would perform if task engagement were modeled as a continuous regressor.

      The authors state that they separated single units and stable multi units from the electrode signal, but I do not see where these data are separately reported.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Saderi and her colleagues have attempted to determine whether and how two behavior-related variables - arousal and task engagement - differently influence activity in two stages of the auditory neuraxis, IC and A1. They define arousal as pupil diameter and task-engagement as a binary variable determined by the experimental block design. They find that although these two parameters often co-vary, they sometimes do not. They find that IC was more influenced by arousal and A1 was modulated by both arousal and engagement. One of their main findings is that previous reports of task-engagement effects may in fact be attributed to arousal state.

    1. Reviewer #3:

      The role of histone chaperone Hira during the formation of paternal pronucleus has been well documented in both mouse (Lin et al, 2014; Inoue and Zhang, 2014; Nashun et al, 2015), and in Drosophila (Loppin et al, 2005). The histone chaperone Hira is known to act in a protein complex with Ubn and Cabin 1 (Tang et al, 2012). The authors built on their previous findings (Lin et al, 2014) and assessed the effect of the Ubn and Cabin 1 oocyte deletion during the fertilisation. Not surprisingly, the observed phenotypes more or less recapitulated the observation made using Hira deletion. In this sense, the findings are not novel. It has also been previously shown that deletion of Hira leads to the removal of the whole complex (Nashun et al, 2015).

      The authors add some potentially interesting observations using 1PN (aberrant) human zygotes. Although the observed lack of Hira complex components in these zygotes could be interesting, the causality is not established.

      Beyond the statements above, there are major issues that would need to be addressed:

      1) Validation and characterisation of the KO/KD models. Ubn1 knockdown using morpholinos (Fig. S1C): a large amount of protein remains in the nucleus. Zp3-Cre-driven oocyte-specific Hira knockout: how much Hira protein is left in the zygote?

      2) H3.3 staining to document the deletion of the complex (Fig. 1E): it is not clear what the authors are trying to show here. How is the H3.3 signal quantified? Only the paternal signal should be affected by the KO; is this the case? The same applies to Fig. 2D, where no signal is evident even in the control.

      3) Presence of Cabin1 in the zygote - pre-extraction needs to be carried out (Fig 2C)

      4) Fig. S2, overexpression of Hira: is there a significant difference between the Hira signal in control (het) and KO zygotes? It does not appear so, which undermines the whole knockout study. The same is true for the quantification of H3.3. What is the quantification of the GFP signal meant to demonstrate?

      5) In the main text the authors say that they developed a conditional KO for Hira, but they have not verified Hira deletion after Cre expression (by IF or PCR).

      6) "Data not shown" appears in the text. The authors state that their new HiraF/F; Zp3-Cre females are sterile but do not show the data.

      7) The authors never show anti-Ubn1 or anti-Cabin1 staining in the Hira KO.

      8) Language: the text needs editing. There are a number of incorrect statements: Hira (or any other component of the complex) is not incorporated into chromatin; rather, the complex associates with chromatin to incorporate histones (there are several other examples of similar statements).

    2. Reviewer #2:

      A high proportion of in vitro fertilized eggs yield zygotes with 1 pronucleus (1PN) instead of the normal 2PN. The authors previously showed that maternal Hira is important for H3.3 deposition on the male pronucleus; and that the loss of Hira leads to a high proportion of 1PN zygotes.

      In this manuscript, the increase in 1PN zygotes after fertilization was confirmed following deletion of Hira in mouse oocytes. The effect could be rescued upon microinjection of Hira RNA. The authors also depleted the other Hira protein complex subunits, Cabin1 and ubinuclein-1. The 1PN phenotype was again seen. Human 1PN zygotes were finally shown to lack HIRA on the abnormal pronucleus.

      This is an interesting observation that is definitely worth investigating. The lack of HIRA components on the aberrant pronucleus in 1PN human zygotes is an important finding. However, because the authors had already shown that the loss of Hira correlates with a high proportion of 1PN zygotes in mice, the experiments (though respectable) provide limited novelty as they stand.

      Main concern:

      • Unless there are reasons to believe that Cabin1 and ubinuclein-1 have Hira-independent functions in oocytes, their depletion only serves to confirm the role of Hira and its relation to the 1PN phenotype. The rescue experiment and human data are important, but again serve only to confirm the role of HIRA without further mechanistic insight.

      Perhaps novelty could come from a deeper exploration of Hira levels in oocytes and of what differentiates 'poor quality' oocytes that lead to 1PN zygotes from normal ones. For example, do maternal Hira RNA and protein levels increase with maturation? Are HIRA levels lower in poor quality oocytes? Is there a step in the IVF procedure that affects Hira levels and/or induces changes in the paternal chromatin?

    3. Reviewer #1:

      This study establishes the role of additional members of the Histone chaperone HIRA complex in male pronucleus formation in mouse. Genetic inactivation of maternal Ubn1 and Cabin1 affects histone deposition following protamine removal on the fertilizing sperm nucleus in a way similar to maternal Hira mutants. However, the study does not provide new insights about the way these factors function or cooperate during paternal chromatin assembly. Analysis of aberrant human zygotes revealed a correlation between the lack of male pronucleus and the absence of maternal HIRA. Although the data are generally convincing, the manuscript does not sufficiently acknowledge earlier work. Notably, the rescue experiment which is presented as a "proof of principle" for future human therapy is not entirely original.

      Substantive concerns:

      1) The authors present the (partial) rescue experiment of maternal Hira KO (oocyte injection of Hira mRNA) as an original experiment that serves as a proof of principle for future therapy. However, a very similar rescue experiment of Hira KD oocytes was successfully performed by Inoue & Zhang, NSMB, 2014, a work that is not cited in the manuscript.

      2) The authors used PLA to detect interactions between the Hira complex proteins in mouse zygotes. However, it is not clear from the images in Fig. 1C how the specific interactions are actually appreciated. The foci seem to be everywhere and not particularly in the male pronucleus shown in the insets.

      3) The occurrence of 1PN human zygotes is intriguing, but the origin of this defect is unknown. It could reflect a more general problem than the lack of Hira expression alone. In this context, restoring male pronucleus formation by re-expressing Hira seems to be a hazardous therapeutic strategy.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All three reviewers agree on the fact that the study, although interesting, does not appear sufficiently novel regarding the already established role of Hira complex in sperm chromatin remodeling in mouse and other animals. In addition, although the reviewers were intrigued by the observation that 1PN human zygotes lack HIRA, the origin and timing of this defective expression are not established. The reviewers share the feeling that these experiments do not really bring novel insights about the regulation of HIRA levels in mammalian oocytes.

    1. Reviewer #3:

      The manuscript by Vera et al. reports on cohesin-dockerin interaction studies of cellulosomal subunits using mainly single-molecule FRET, but also molecular dynamics simulations and NMR measurements. The authors study a range of cohesin-dockerin pairs and discover a varying distribution of two alternative binding modes that apparently follows a built-in cohesin-dockerin code. Finally, the authors show that prolyl isomerase activity can regulate the kinetics towards equilibrium/steady state as well as the distribution of the binding modes. The results are important for understanding the mechanistic basis of cellulosome function.

      In my opinion, this is an important paper, which provides new interesting insight into cellulosome function. The single-molecule FRET and molecular dynamics parts of the study are well designed, the corresponding experiments are thoroughly performed, and data are carefully analysed. The manuscript is also very well written. However, there are several issues that need to be addressed:

      1) The authors claim to have uncovered a built-in cohesin-dockerin code. However, the principles of the code remain elusive. For example, what is the relationship between the Pro66 cis/trans conformation and the binding mode? What needs to be known to predict the dockerin binding mode? This point should be elaborated in the manuscript.

      2) The conclusion that prolyl isomerase activity is able to change the distribution of binding modes requires more consideration and/or research. First, it seems from Figure 6A that the expected steady-state B1 fraction of c1C-CcCel5A and c1C-CcCel5A+prolyl isomerase could be the same within error ranges. Second, it is unlikely that the enzyme will change the equilibrium ratio of Pro66 cis/trans conformation that is controlled by thermodynamics. Therefore, the prolyl isomerase activity may only be relevant in case of slow re-equilibration kinetics.

      3) NMR measurements were performed to check whether the dockerin's Leu65-Pro66 peptide bond is in the cis conformation in the cohesin-dockerin complex. The authors found very similar dockerin chemical shifts in the absence and presence of 1.3 equivalents of cohesin, suggesting that binding does not significantly alter the conformation. However, this is an indirect measurement, whereas NMR also allows direct determination of Pro cis/trans conformation (based on 13C chemical shifts and NOE patterns, e.g. see https://doi.org/10.1107/S1744309110005890 ). The authors should check whether direct determination of the cis conformation is possible in their case. Peak doubling in the 15N-1H HSQC spectrum, which is an indication of Pro cis/trans equilibria, should also be checked.

      4) Furthermore, a direct measurement of the Pro66 cis/trans ratio for two cohesin-dockerin pairs that show distinct B1/B2 preferences would be useful to clarify the role of Pro66.

    2. Reviewer #2:

      By analyzing the formation of a series of dockerin-cohesin complexes from the cellulosomes of two Clostridium species using smFRET experiments and other techniques, the authors conclude that the overall equilibrium between the two binding modes of the complex can be allosterically regulated by the enzymatic isomerization of dockerin's proline 66, which is part of a structural clasp between the N and C terminus of the protein. They speculate that a mechanism of enzymatically or environmentally driven clasp de/stabilization may be present in other dockerin-cohesin complexes as well, and may provide the cellulosome with the plasticity required to carry out its function.

      For the most part, the work is clearly written and the claims seem to be supported by the data provided; however, there are a few issues that the authors should address:

      1) The computer simulations presented in the manuscript are not described very clearly. For example, on page 19, regarding the FoldX MC method, the authors identify two variables to describe the binding: an "axis" Z and a rotation angle phi. An axis, however, is defined by three coordinates, while the authors always associate a single number with Z. The reader has to guess that the axis is the axis of symmetry of the two binding modes and that Z is only an offset along that axis. Similarly, in eq. (4) the authors describe the sum over the conformations indexed by i as an average (first line of page 20), but in reality that sum and the others that appear in the argument of the logarithm in equation 4 are estimates of the partition function of the system.

      2) The computer simulations of the complex do not seem to add significant information to the overall message of the manuscript: the rigid-body coarse-grained approach cannot distinguish allosteric effects, as the authors already admit, while the FoldX approach gives very large errors. Given the availability of well-defined crystallographic structures for some of the complexes, simple free-energy estimation techniques (e.g. metadynamics, steered MD) based on classical atomistic molecular dynamics simulations (with limited homology modelling for the mutants) would most probably have provided more accurate results. The authors should explain why they did not consider this approach.

      3) The data about the time dependency of the FRET signal in C. cellulolyticum are a bit worrying. The authors should dissect them more carefully, possibly adding additional control experiments to exclude artifacts (whose possible presence is also admitted by the authors in the caption of Fig.6 figure supplement 3). Then, if the process is confirmed, they should really try and identify the underlying process in a more precise way.

      4) Fig 5C and Fig 5F show two different curves for the same data. Similarly Figure 6 figure supplement 4 C shows two different histograms for the same complex. If this is the result of repeated experiments, the authors should make an effort and report histograms with error bars. Visual comparison of histograms which have a large intrinsic variability may be misleading.

      5) A picture showing a model of the dye structures attached to the protein structures would be very useful for understanding the relative sizes of the objects.
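      Point 1 notes that the sums in eq. (4) estimate partition functions rather than averages. As a generic illustration, and not a reconstruction of eq. (4), the Python sketch below computes ln Z = ln sum_i exp(-E_i/kT) with the log-sum-exp trick and the resulting free-energy difference between two binding modes, Delta F = -kT ln(Z_B1/Z_B2). The energies, units and function names are hypothetical.

```python
import math


def log_partition_function(energies, kT):
    """ln Z for Z = sum_i exp(-E_i / kT), using log-sum-exp for numerical stability."""
    scaled = [-e / kT for e in energies]
    m = max(scaled)
    return m + math.log(sum(math.exp(s - m) for s in scaled))


def mode_free_energy_difference(energies_b1, energies_b2, kT):
    """Delta F = F_B1 - F_B2 = -kT ln(Z_B1 / Z_B2); a negative value favours B1."""
    return -kT * (log_partition_function(energies_b1, kT)
                  - log_partition_function(energies_b2, kT))


# Hypothetical sampled conformation energies (arbitrary units, kT = 1).
print(mode_free_energy_difference([-1.2, -0.8, -0.5], [-1.0, -0.9], kT=1.0))
```

      Two modes with the same mean energy can still differ in free energy if one of them samples more low-energy conformations, which is why treating the sum as an average changes the interpretation.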

    3. Reviewer #1:

      Vera et al. report the detection and quantification of the populations of two different assembly orientations of dockerin and cohesin, which define the structural organization and plasticity of bacterial cellulosome multi-enzyme complexes. The authors apply smFRET spectroscopy in in vitro experiments carried out on isolated, modified domains. Vera et al. find uneven distributions of the protein populations between the two binding modes. Vera et al. investigate the molecular origins of the observed bias by studying homologous sequences obtained from various organisms, by mutagenesis and by domain-swap experiments. The authors complement the experimental studies with Monte Carlo and molecular dynamics simulations. The authors conclude that they have identified a cohesin-dockerin binding "code" and a novel allosteric control mechanism involving cis/trans isomerization of a C-terminal proline residue in dockerin.

      Structural plasticity of the cellulosome induced by variable assembly of the cohesin-dockerin adapter, facilitated by the rotational symmetry of the two-helical binding interface, is an interesting biological phenomenon. The dual binding mode has already been reported in the literature (refs. 23, 24, Wojciechowsky et al. Sci Rep 2018, 8:5051), which somewhat limits the novelty of the findings. However, the forces and mechanisms that drive the orientations are not yet understood. The authors successfully developed a smFRET assay that can distinguish the two binding modes of the cohesin-dockerin interaction and measure their respective populations in vitro. Their homology, mutagenesis and domain-swap experiments show that specific interactions within the binding interface are not responsible for modulating the orientation. Instead, they show that interactions of a C-terminal proline can modulate binding. However, the relevance of the findings for the in vivo situation is unclear.

      I have the following concerns:

      1) The authors' smFRET assay clearly distinguishes the two binding modes B1 and B2. A key element of their work, which goes beyond the state of the art, is the quantification of populations estimated from integrals of smFRET histograms and PDA. Their FRET analysis presumes that the photophysics and quantum yields of the donor/acceptor fluorophores are independent of the binding orientation. But the protein micro-environment at the label positions, close to the binding interface, may differ between the B1 and B2 orientations, which may modulate photophysics and thus FRET. This would, in turn, lead to errors in the estimated populations. The authors could test for such effects by measuring the fluorescence of donor-only and acceptor-only constructs in the B1 and B2 orientations.

      2) From their study of homologous sequences, mutagenesis experiments and the swap of helices 1 and 2 of dockerin, the authors provide a solid body of data showing that specific interactions within the binding interface are not responsible for the switch in binding mode. Instead, their results show that interactions of a C-terminal proline can modulate binding through an elusive mechanism. Proline mutagenesis experiments and enzymatic cis/trans isomerization show significant effects. But the relevance of a prolyl isomerase for modulating the dockerin-cohesin interaction in vivo remains speculative. This conclusion calls for additional experiments in which, e.g., changes in the catalytic activity of cellulosomes are measured upon application of a prolyl isomerase. Alternatively, the packing of enzyme subunits in the dense cellulosome may be responsible for alternate binding. Such protein-protein interactions may also modulate a proline interaction.

      3) An allosteric mechanism of the proline interaction in modulating binding, as proposed in this work, is not sufficiently supported by the data presented. The flexibility of the C-terminal tail of dockerin, which hosts the proline, and its close proximity to the cohesin binding interface, evident in structures (please provide PDB IDs in Fig. 1), may allow a direct interaction of the proline with cohesin.

      4) The impact on binding of the intrachain proline/tyrosine interaction identified by the authors is, however, very interesting. This finding calls for further investigation of the mechanistic details. Here, high-resolution techniques such as NMR, which can provide atomic details of protein structure and dynamics, are desirable. Such experiments could help identify potential allosteric effects on the conformation and thus on binding.

      5) That said, the authors state (in the abstract and introduction) that they performed NMR experiments in this study. But no NMR data are shown or discussed in the manuscript.

      6) If the C-terminal proline was a biologically relevant switch that modulates binding, this residue should be conserved. Have the authors checked conservation of the C-terminal proline in homologous sequences?

      7) The authors conclude that they have identified a cohesin-dockerin "code". The meaning of the word "code" in this context is unclear to me. What do the authors mean by "code"?

      8) The authors conducted and analysed a set of kinetic experiments, but these experiments are not described in sufficient detail in the Results and Methods sections.
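      To illustrate why concern 1 matters for the population estimates, here is a minimal sketch under a deliberately simplified, hypothetical model: molecules in the B1 and B2 orientations are assumed to be detected with different relative efficiencies (for example, because of orientation-dependent quantum yields), while everything else about the analysis is ignored. The function names and efficiency values are illustrative assumptions, not quantities from the manuscript.

```python
def apparent_b1_fraction(true_b1: float, eff_b1: float, eff_b2: float) -> float:
    """Fraction of detected molecules assigned to B1 when B1 and B2 complexes
    are detected with (hypothetical) relative efficiencies eff_b1 and eff_b2."""
    detected_b1 = true_b1 * eff_b1
    detected_b2 = (1.0 - true_b1) * eff_b2
    return detected_b1 / (detected_b1 + detected_b2)


def corrected_b1_fraction(apparent_b1: float, eff_b1: float, eff_b2: float) -> float:
    """Undo the bias above, given independent estimates of eff_b1 and eff_b2
    (e.g. from donor-only / acceptor-only controls, if those can supply them)."""
    weight_b1 = apparent_b1 / eff_b1
    weight_b2 = (1.0 - apparent_b1) / eff_b2
    return weight_b1 / (weight_b1 + weight_b2)


if __name__ == "__main__":
    # Hypothetical numbers: a true 50:50 population, with B2 molecules detected
    # at 80% of the efficiency of B1 molecules.
    observed = apparent_b1_fraction(0.5, eff_b1=1.0, eff_b2=0.8)
    print(f"apparent B1 fraction: {observed:.3f}")  # 0.556
    print(f"corrected B1 fraction: {corrected_b1_fraction(observed, 1.0, 0.8):.3f}")  # 0.500
```

      Even a modest efficiency difference shifts a true 50:50 split to roughly 56:44, which is why orientation-specific photophysics controls would help establish that the reported population differences are not artefacts.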

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers find your work of interest and acknowledge your development of an elegant smFRET assay that can detect and quantify populations of cohesin-dockerin binding orientations. They further acknowledge your interesting finding that the molecular clasp, involving a terminal proline, plays a role in modulating binding orientation. The reviewers find, however, that your conclusion of an enzymatic and allosteric control mechanism in the cellulosome is not sufficiently supported by the data presented. The study lacks the molecular-level information required to identify allosteric effects, which could, for example, be obtained using NMR spectroscopy, which is missing from the present work. The proposed Monte Carlo approach and coarse-grained computer simulations do not provide sufficient molecular detail and dynamic information to obtain mechanistic insight. There are further issues with the kinetic experiments: some reported differences are within experimental error, and controls are required to exclude artefacts.

    1. Reviewer #3:

      In this manuscript by Hansen et al., the authors describe three low-resolution (3.0 to 4.0 Å) crystal structures of the Ca2+-ATPase from Listeria, a gram-positive bacterium. Two are crystal structures of the wild-type protein with BeF3- and AlF4- in the absence of Ca2+, and thus likely represent the E2P ground state and the E2~P transition state. The third is a structure of a G4 mutant, in which four Gly residues are inserted into the A-domain-M1 linker, crystallised with BeF3- and Ca2+ present and designed to capture the E2P[Ca2+] state. The authors state, however, that the three structures are virtually the same and that the E2·BeF3- crystal structure represents a state just prior to ("primed for") dephosphorylation. They also propose that the proton countertransport "mechanism" differs from that of SERCA.

      As Listeria Ca2+-ATPase has been studied by single-molecule FRET, its crystal structures will certainly contribute to our understanding of ion pumping. Furthermore, unlike SERCA, Listeria Ca2+-ATPase transports only one Ca2+ per ATP hydrolysed. Therefore, how site I is managed is an interesting topic, although let's not forget that the same 1:1 stoichiometry is observed with the plasma membrane Ca2+-ATPase (PMCA), for which an EM structure appeared in 2018 (ref. 9). The authors indeed find that the Arg795 side chain extends into binding site I. This part is solid, and a more elaborate (and interesting) discussion could be provided than what is currently described.

      Another solid finding is that the two E2·BeF3- crystal structures are similar to the E2·AlF4- crystal structure, although how similar is unclear: a structural superimposition reporting an RMSD is not provided, and the presented figure makes it difficult to judge directly. The structures are viewed from essentially one direction, which makes it impossible to discern differences in M1 and M2 and in the horizontal rotation of the A-domain. Two or three structures are superimposed, but shown as cylinders and again viewed from only one direction. As the authors designate the structures as H+-occluded states, it is important to show clearly that the extracellular gate is really closed to H+ (and not only to Ca2+). For completeness, they should also examine the effect of crystal packing on the A-domain position.

      With regard to the point that the E2·BeF3- structure is "primed for dephosphorylation", only Fig. 2 is shown, in which the differences appear to be the path of the TGES loop and the orientation of the Glu167/183 side chain. Their atomic models show that there is plenty of space for the Glu167 side chain to adopt an orientation similar to that of Glu183 in SERCA. The authors should, however, provide an omit annealed Fo-Fc map for the Glu167 side chain and explain why that is the preferred and only orientation. If a Glu side chain is free to move, it could adopt a different orientation in less than a nanosecond. If it does, then the difference in the orientation of the Glu side chain does not sufficiently explain "the rapid dephosphorylation observed in single-molecule studies". The authors place further emphasis on proton occlusion and countertransport. However, this part of the manuscript is more speculative and, as detailed below, should at the very least be moved entirely to the Discussion section.

      As mentioned, the authors place a larger emphasis on proton countertransport. Here a number of issues show up. First of all, I think they have frequently used the term "occlusion" improperly. From my understanding, occlusion of a site (or ion) means that the site (or ion) is inaccessible from either side of the membrane. This means more than closure of the gates, as the two gates have to stay closed for a substantial length of time (i.e. locked). It is experimentally well established with SERCA that Ca2+ ions are occluded in E1P species. It can be shown that the lumenal gate is closed for Ca2+ in the E2 state. However, that does not necessarily mean that the gate for H+ is also closed. As far as this reviewer knows, nobody has actually demonstrated that H+ is occluded, even in the E2 state of SERCA.

      Furthermore, the authors presume that protons enter the binding sites through a different pathway from that used for Ca2+ release, citing ref 26. However, if so, can closure of the gate for Ca2+ really mean closure of the gate for H+? This seems contradictory, as the authors designate the E2·BeF3- state of Listeria Ca2+-ATPase as a proton-occluded state (p.12). Apparent closure of the gate for Ca2+ on the extracellular side in a crystal structure seems insufficient for such a statement. One must keep in mind that a crystal structure merely provides a possible conformation in that particular state. It may not, however, represent the most populated conformation for that state. It is equally plausible that the E2·BeF3- complex adopts a closed conformation for only a small fraction of the time. At this resolution it is simply not possible to determine whether H+ occupies the binding site in the crystal structure. Furthermore, although it may be possible to show that the gate is closed for Ca2+, it would be very difficult to show that the gate is closed for H+. Thus, more experimental evidence is required to support the claim that the structure represents an H+-occluded state.

      The authors write in the Abstract "Structures with BeF3- mimicking a phosphoenzyme state reveal a closed state, which is intermediate of the outward-open E2P and the proton-occluded E2-P* conformations known for SERCA". In essence this statement is fine, although what "closed" means is still unclear to me. In Figure 1, the authors state that "LMCA1 structures adopt proton-occluded E2 states". This statement is a bit misleading because, in E2·BeF3-, the lumenal (extracellular) gate can in fact be opened and closed, at least in SERCA. As the authors recognize (p.14), the BeF3- complex of SERCA can be crystallised in two conformations, one with the lumenal gate closed (with thapsigargin) and the other with the gate open; yet they write "In SERCA, the calcium-free BeF3- complex adopts an outward-open E2P state,..." (p.8). This applies to lumenal (extracellular) Ca2+, not to H+. Further evidence is required to establish that the extracellular gate of LMCA1 is fixed in a closed position for H+ in E2·BeF3-. Again, more experimental evidence is required to support the claim that E2·BeF3- is an H+-occluded state.

      The authors write that "SERCA has two proposed proton pathways: a luminal entry pathway [26] and a C-terminal cytosolic release pathway [27]" (p. 9). One has to be careful here, as the luminal entry pathway has not been experimentally confirmed in SERCA. The authors write that "The luminal proton pathway has been mapped to a narrow water channel …" [26]. But since the pathway is not confirmed in SERCA, I don't think it can be used to argue that protons cannot enter LMCA1 through this pathway because the corresponding region is mainly hydrophobic.

      The description on the exit pathway for H+ also needs clarification. They describe (p. 10; first line) "In SERCA it consists of a hydrated cavity...[27]. ... M7 in LMCA1 further blocks the pathway ... and LMCA1 therefore does not appear to have a C-terminal cytosolic pathway either" and rationalize that "This may explain why no distinct proton pathways are required in LMCA1". I think it should be made clearer that this is a proposal rather than an established fact.

      As H+ release takes place in the E2-to-E1 transition, the authors state that the E2·BeF3- structure of LMCA1 is different from that of SERCA. However, I don't think they can confidently make such statements without E1 and E2 structures of LMCA1. Furthermore, this discussion should not be in the "Results" section. As they conclude that LMCA1 uses the Ca2+ release pathway, which is assumed to be the same as that in SERCA (even though no Ca2+ release pathway is visualised in their crystal structures), for H+ entry, why does SERCA not use the same pathway? I think experimental evidence is required for the proposal that H+ binds to E309 from the cytoplasmic side.

    2. Reviewer #2:

      The manuscript by Hansen et al. presents three new structures of LMCA1, Ca2+-ATPase 1 from Listeria monocytogenes. They determined structures with BeF and AlF, and of a Gly4-linker form of LMCA1 in complex with BeF. This latter structure is at 3 Å resolution and was very challenging to obtain. The other two structures are at a low resolution of 4 Å. These structures are a follow-up to an excellent single-molecule fluorescence study of the same enzyme. The structures support the main conclusion of that work, that LMCA1 progresses more rapidly through the dephosphorylation step of the reaction cycle. The manuscript is well written, the structures and findings are interesting and make a significant contribution, and the work seems ideally suited for this journal. There are no substantive concerns with the manuscript. Overall, the R factors are high for the structures, particularly for the 3 Å resolution structure, for which they should be lower. However, the authors offer a reasonable explanation for this in the supplemental information provided.

    3. Reviewer #1:

      Structural comparison is an important tool for understanding how proteins function at the molecular level. The mechanistic premise of obtaining LMCA1 structures from the gram-positive bacterium Listeria monocytogenes was to understand how Ca2+ pumps can have Ca2+ stoichiometries different from that of mammalian SERCA and how they are proton coupled differently. Per molecule of ATP hydrolyzed, SERCA exports two Ca2+ ions in exchange for 2 or 3 protons, whereas LMCA1 exports a single Ca2+ and perhaps 1 proton in return.

      The paper describes two intermediate states of LMCA1 and, from my understanding, proposes a mechanism based on structural differences in ionisable groups at the Ca2+ binding site, in particular the positioning of Arg795, which in SERCA is a glutamate. Since a crystal structure of LMCA1 had previously been determined, the new mechanistic insights rely heavily on the details achieved by the improved resolution. While this is technically an important achievement, the assignment of side chains in the current structures alone is not sufficient to reach the mechanistic conclusions drawn and, as such, the current paper is unfortunately too preliminary. Proton-coupling pathways are mechanistically difficult to disentangle and require extensive experimentation, such as ITC, mutagenesis and transport measurements, as well as computational approaches. Indeed, ion or proton coupling pathways that alter energetics rarely result from differences in just a few residues. For example, glucose (GLUT) transporters are passive sugar transporters, whilst their bacterial counterparts are proton coupled. The proton coupling in the bacterial proteins is due to a single aspartic acid residue in TM1. Whilst one can convert the bacterial sugar transporters so that they are no longer proton coupled by mutating this TM1 residue to asparagine, one cannot make GLUT transporters proton coupled by mutating the corresponding asparagine residue to aspartic acid.

      One would have liked the authors to biochemically demonstrate how they could evolve LMCA1 to function similar to SERCA. This would have broader implications in our understanding of how biological systems can evolve substrate coupling and energetics.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      We all agreed that the LMCA1 complex structures are an important step forward in providing a structural framework for piecing together an ion pumping model that follows on from the previous smFRET studies. Nonetheless, two of the reviewers think that the mechanistic conclusions reached, based solely on crystal structures, require further validation. In particular, further experimental (and likely computational) work is required to i) confirm the crystallographic "states" designated so far and ii) begin to clarify how LMCA1 and SERCA achieve different Ca2+:H+ stoichiometries, as there are other plausible models.

    1. Reviewer #3:

      Bolze and colleagues describe a new database of mitochondrial variation that consists of a greater number of samples than existing databases. To overcome some of the limitations of existing databases, they use the same sequencing pipeline for all samples, do not select for any particular phenotypes, and report both heteroplasmic and homoplasmic calls. They demonstrate the utility of their database by defining intervals of invariable regions, which may indicate mutational constraint and could aid in interpreting candidate variants in disease patients. The authors also calculate the filtering allele frequency for LHON variants and suggest that the allele frequencies for many LHON variants in their database and UKB are too high for the variants to be considered pathogenic and that they should be reclassified. The main limitations of this database, as stated by the authors, are the lack of diverse haplogroups and the relatively low depth of coverage considering the variable heteroplasmy of the mitochondria. The technical aspects of the data aggregation and database are solid, and the scientific analyses are sound. I have only a few comments that would strengthen the paper.

      1) There is no discussion of how to distinguish heteroplasmy from sequencing errors. While some filtering was done akin to germline variant filtering (in particular, calls at positions with fewer than 10 reads were removed), this could still result in a variant supported by 1 of 11 reads being called heteroplasmic (at 9%). The spike around 90% ARF in Figure 3F (final panel) could suggest that something like this is happening (homoplasmic variants with sequencing errors reverting to another base). Was a minimum heteroplasmy level used for this analysis? Perhaps showing these plots filtered to a minimum of 2, 5, etc. reads supporting the same alternate allele would reveal a sensible cutoff that could then be used throughout the paper.

      2) Line 484: This is the only mention of NUMTs in the paper, but the complications that can arise from them are not detailed by the authors. Considering the mitochondrial coverage, how confident are the authors that their low heteroplasmic calls are not false positives resulting from NUMTs?

      3) Along the same lines, the authors use HaplotypeCaller, which is a standard tool for germline variation but not optimized for mitochondrial calling. Was this run in haploid or diploid mode? It would be useful to state the limitations of using this tool to call mitochondrial variants as it is designed for diploids.

      4) The suggestion that "all protein-coding genes in the mitochondrial genome were highly intolerant to LoF variants" is certainly plausible, but not definitive from the current data. While 0 LoFs are observed, how many would be expected? If these genes are small (which they must be, since they are on a very small chromosome), the number of expected variants based on a mutational model (akin to [Samocha et al., 2014]) would likely be <1, and thus 0 would not necessarily be remarkable. Given that, you may not be sufficiently powered to do this at a per-gene level, but pooling all the genes may provide enough power to make a broader statement. The same goes for the analysis of the percentage of invariable bases (Figure 5): it would be good to make this more quantitative, perhaps by comparing these proportions to autosomes, or among the gene classes (are the tRNA and rRNA genes significantly different from the protein-coding genes? Would it be possible to split protein-coding variants into synonymous, missense and LoF?).

      5) "Indeed, we found that no haplogroup markers -- even those from haplogroups not represented in our dataset -- were mapped to these highly constrained regions" - is this not circular? Markers that delineate haplogroups are found as homoplasmic calls that were used to determine the constrained regions, so it stands to reason that these would not be found in them, no? But perhaps I'm missing something.
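      To make the read-support cutoff suggested in point 1 concrete, here is a minimal sketch of such a filter. It is a hypothetical illustration, not the authors' pipeline: it assumes a biallelic site, keeps the depth threshold of 10 reads mentioned above, and additionally requires both alleles to be supported by at least a chosen number of reads, so that a single error read near 9% or near 90% ARF is discarded. The class, function names and example calls are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteCall:
    position: int   # mtDNA position (illustrative)
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the alternate allele (biallelic site assumed)

    @property
    def arf(self) -> float:
        """Alternate read fraction."""
        return self.alt_reads / self.depth


def is_heteroplasmic(call: SiteCall, min_depth: int = 10, min_minor_reads: int = 2) -> bool:
    """Hypothetical filter: require adequate depth AND at least min_minor_reads
    reads on the less-supported allele, which rejects one error read at low ARF
    (~9%) as well as one error read reverting a homoplasmic variant (~90%)."""
    if call.depth < min_depth:
        return False
    minor_reads = min(call.alt_reads, call.depth - call.alt_reads)
    return minor_reads >= min_minor_reads


def surviving_calls(calls, cutoffs=(1, 2, 5)):
    """Number of calls retained at each minimum-minor-read cutoff."""
    return {k: sum(is_heteroplasmic(c, min_minor_reads=k) for c in calls) for k in cutoffs}


if __name__ == "__main__":
    example = [
        SiteCall(100, 11, 1),   # 1/11 reads, ARF ~9%
        SiteCall(200, 11, 10),  # 10/11 reads, ARF ~91%
        SiteCall(300, 100, 30), # well-supported heteroplasmy
        SiteCall(400, 9, 4),    # below the depth threshold
    ]
    print(surviving_calls(example))  # {1: 3, 2: 1, 5: 1}
```

      Tabulating how many calls survive at each cutoff, as in the last function, is one way to produce the filtered plots suggested above and to pick a cutoff to apply throughout.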
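      The power argument in point 4 can be sketched with a Poisson model. This assumes that LoF counts per gene are Poisson-distributed with a mean given by a mutational model; the per-gene expectation of 0.5 used below is a hypothetical placeholder, not a value from the manuscript or from [Samocha et al., 2014]. Because independent Poisson counts sum to a Poisson count whose mean is the sum of the means, pooling the genes turns an unremarkable zero into an informative one.

```python
import math


def prob_zero(expected: float) -> float:
    """P(observing 0 variants) when the count is Poisson(expected)."""
    return math.exp(-expected)


def pooled_prob_zero(expected_per_gene) -> float:
    """P(0 variants in every gene), assuming independent Poisson counts:
    their sum is Poisson with mean equal to the sum of per-gene means."""
    return math.exp(-sum(expected_per_gene))


if __name__ == "__main__":
    # Hypothetical expectation of 0.5 LoF variants for each of the
    # 13 mtDNA protein-coding genes (illustrative, not an estimate).
    per_gene = [0.5] * 13
    print(f"single gene: P(0) = {prob_zero(0.5):.3f}")  # 0.607
    print(f"pooled:      P(0) = {pooled_prob_zero(per_gene):.4f}")  # 0.0015
```

      Under these placeholder numbers, observing no LoF variant in one gene would be expected more often than not, whereas observing none across all genes would be quite unlikely if the genes were tolerant.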

    2. Reviewer #2:

      The authors present a resource of human mtDNA variants and heteroplasmies from 195,983 individuals, scoring 14,324 mutations. The resource is of value. One could criticize the European-ancestry-heavy data set and its American specificity, but the authors fully acknowledge and disclose this in their manuscript, and make the data available to others to continue the work. Other high-depth human studies exist (the Wei 2019 reference and others), but their data are often not available due to patient confidentiality issues, as in Wei 2019. Having this dataset available is of great intrinsic value.

      I only have a few comments that would require looking into the data for a few small things, or changing the writing of the manuscript.

      Comments:

      1) My biggest concern is that the authors use a read-aligning method in which they take in all calls where there was at least 1 read mapping to mtDNA. The logic seems to be that they do not want to discard reads that may "mis-map" to the NuMTS, but this leads to another, potentially larger problem: potentially including NuMTS as heteroplasmic variants (see PMID: 23972387). For instance, the recent claim of paternal mtDNA transmission appears to be the result of a complex NuMT that was able to amplify in the strategies used in the original study (PMID: 32269217). More details on how the authors exclude the possibility of NuMTS incorporation are needed, especially in light of the 1+ alignment parameters used.

      2) Lines 340-357, regarding LHON: The problem with choosing LHON for this analysis is that it has a complicated clinical manifestation, which may not support the handling of the 14484T>C allele in the manner presented. First, the 8:1 male-to-female ratio of affected individuals (with homoplasmic LHON), the fact that many people with the homoplasmic allele will never become affected, and the fact that onset can occur late in life (after having children) could all contribute to the allele being more represented in a random sampling of the population.

      While the authors are correct that the allele on its own may not be pathogenic in specific haplogroup backgrounds (Howell 2003 reference), or may require co-expression with secondary "affector" mtDNA mutations (e.g., PMID: 25342614; alleles including 3397A>G, 3497C>T, 3571C>T, 3745G>A, and other "helper" mutations in MitoMap), the paper needs a bit more support for the 14484 conclusion given all of these issues. Perhaps finding linkage (or lack thereof) to these helper alleles would strengthen this section sufficiently.

      3) Lines 206 - 207. How did the authors handle AGG / AGA codons? In 2010 a lab published evidence that AGA and AGG may not be true stop codons, but are simply not coded in the human mtDNA genome (PMID: 20075246). While this finding remains not universally accepted, it does explain the lack of an AGA/AGG-binding translational termination factor in the mitochondria. It is possible that the authors are in a position to comment on the behaviour of AGA or AGG codons, relevant to their section on PCG-truncating mutations.

      4) The work, especially the discussion of the control region, overlaps a bit more with Wei et al. 2019 than the manuscript lets on. A more direct acknowledgement of this overlap and of the similar findings should be included in the Discussion.

    3. Reviewer #1:

      Bolze et al. report their effort to sequence the mitochondrial genomes of ~200,000 individuals. The authors generated a large, unified database that can be used for the investigation of mitochondrial mutations and the prediction of pathogenic alleles. Importantly, it addresses key limitations of other currently available sources, mainly it is not biased for mitochondrial diseases, all analyses were done in the same lab and using the same bioinformatics tools, and heteroplasmic alleles are reported. The authors then use their source to draw conclusions on the nature of mitochondrial mutations, their distribution across the mt-genome, and to challenge previously annotated pathogenic mutations, specifically for LHON disease.

      For example, Figure 3A, which carries one of the main take-home messages of the paper, shows hardly any "interesting" alleles. The vast majority of the >14,000 discovered variants cannot be seen on the plot. Unfortunately, many of the plots display the same data in similar and unnecessary formats, making the figures dense and confusing. Examples include Figure 3F (mean and max ARF distribution) and Figure 5A, B and C.

      Another, more concerning, issue is the quality of the heteroplasmic variants. The authors mention very briefly in the Methods section what was done to account for NUMTS (nuclear copies of mtDNA), which may be mutated and thus bias SNV calling. From their short description, it seems that NUMTS could be a source of errors. Furthermore, Figure 2E shows that the vast majority of individuals had ≤1 heteroplasmic variant. This observation cannot be reconciled with the premise underlying current methods that infer cellular lineages from heteroplasmy in a cellular population (PMID: 30827679).

      These issues are particularly critical when using the data to draw conclusions on the pathogenicity of mutations, which is the focus of the last part of the manuscript. When considering the effect of the m.14484T>C mutation on LHON disease, the authors argue that this mutation should be reclassified as non-pathogenic as it satisfies the "Benign Strong 1" criterion. Given the above limitations, this conclusion is certainly too strong. Stronger evidence for this claim is required, especially since all subjects carrying this mutation are from the same haplogroup.

      Lastly, to assess the probability that m.14484T>C is indeed non-pathogenic, the authors use previously published estimates of the "maximum credible population allele frequency". Despite the abundance of papers that estimate these parameters, the authors provide only one number, with no error or range estimates, and show that the frequency of m.14484T>C is higher than expected. It is important to understand how certain this claim is, and ideally to reflect this uncertainty as a range around the dashed lines in Figure 6.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      Bolze and colleagues describe a new database of mitochondrial variation that consists of a greater number of samples than existing databases. To overcome some of the limitations of existing databases, they use the same sequencing pipeline for all samples, do not select for any particular phenotypes, and report both heteroplasmic and homoplasmic calls. They demonstrate the utility of their database by defining intervals of invariable regions, which may indicate mutational constraint and could aid in interpreting candidate variants in disease patients. The authors also calculate the filtering allele frequency for LHON variants and suggest that the allele frequencies for many LHON variants in their database and UKB are too high for the variants to be considered pathogenic and that they should be reclassified. The main limitations of this database, as stated by the authors, are the lack of diverse haplogroups and the relatively low depth of coverage considering the variable heteroplasmy of the mitochondria.

      While the database is indeed unique and will likely be very valuable for the community, on the whole, the computational analyses are in several places superficial, in some cases even flawed and overall not as well presented as they could be.

    1. Reviewer #3:

      General assessment:

      This manuscript examines publicly available genomes of a number of Enterobacteriaceae species, and makes statements regarding their evolution, geographical distribution and antimicrobial resistance. While repurposing existing data can add value, such analyses must be carefully done and inferences only made after assessment and consideration of the potential limitations and biases of such data. Currently, the rationale and methods for performing the analyses outlined in this manuscript are not sufficient to support the conclusions. Following critical evaluation of the metadata associated with the genomes, and more robust analyses, useful insights may be obtained.

      Numbered summary of substantive concerns:

      1) More justification for examining these particular bacterial species is required. For example, only 59 M. morganii genomes were included; given these small numbers, how big is the clinical problem, and is a global analysis really possible?

      2) There is no description of inclusion / exclusion criteria for these genomes. It is clear that most genomes derived from the United States; a full description of the selection process will provide a greater understanding of potential bias, which could affect the results and conclusions reached.

      3) A number of outbreaks are stated to have been observed, but there is no robust evidence presented to support such identification, other than presumably clustering in the phylogenetic trees. More generally, without proper evaluation of the metadata associated with the genomes, there is a large risk that any observations (regarding similarity or clustering, or higher prevalence of resistance determinants, etc) are merely due to the nature of the genome collection rather than true biological or epidemiological relatedness. A critical evaluation of the representativeness of the genome collection is required.

      4) Various qualitative statements on differences between species or clades are made, such as the relative richness of resistomes, but (in addition to the issue described in the previous point) such statements require the use of appropriate statistical tests. Definitions are required for terms such as "closely related", "comparable" resistome diversity etc.

      5) The analyses performed are currently not sufficient to underpin many of the statements made in this manuscript regarding the evolution and transmission of these bacteria. For example, the trees presented in the figures appear to be cladograms, therefore the branch lengths are meaningless. Branch lengths are important in this context. Also, the phylogeography was evaluated by mapping genome origins physically onto a map, but there are more sophisticated approaches for this (eg phylogenetic diffusion models), though such analyses may regardless be heavily biased by the nature of the genome collection.

    2. Reviewer #2:

      This manuscript presents a species-by-species analysis of the presence and distribution of antimicrobial resistance (AMR) genes in less frequently isolated Enterobacteriaceae species, using the genome sequences and metadata deposited in the PATRIC database. It is valuable, but most analyses are descriptive rather than quantitative, and the sentences describing the results are difficult to read. A phylogenetic tree and a heatmap indicating the presence of AMR genes are presented for each species, but it is hard to understand the main message of each figure and what characterizes each species compared with the others. In its current form, the manuscript will be useful to AMR researchers as a reference indicating the presence of specific AMR genes in each species.

      -Each figure should have a legend so that readers can see at a glance what each color indicates. Geographical region should be clearly indicated in the figures, particularly when it is mentioned in the main text. Also, what do the different colors of the strain names in the trees mean?

      -The Methods section is too brief and lacks sufficient explanation. For example, what criterion was used to judge the presence of an antimicrobial resistance gene?

      -The list of detected AMR genes at the top should be clearly categorized using different colors and headers (e.g., "ESBL", "AmpC", etc.).

      -L126: what is meant by "outbreak"? I cannot identify it in the figure, and it is not stated how it was defined.

      -Examples of descriptive rather than quantitative statements are L135 "richer resistome" and L136 "common". Why do the authors not report the specific numbers and percentages?

      Throughout the text, the authors do not perform any statistical test to assess the significance of the differences they describe.

    3. Reviewer #1:

      Sekyere and Reta present a comprehensive descriptive characterization of the epidemiology, phylogeographical distribution and antibiotic resistance profiles of six species of Enterobacteriaceae. Using a total of 2377 publicly available genomes, the authors identify many multidrug-resistant clones that are distributed worldwide. This study potentially provides important insight into a group of clinically relevant bacteria that remain poorly characterized compared with their better-known relatives. Below are my comments.

      Major comments:

      1) The entire study is essentially a descriptive enumeration of the resistance characteristics of six different bacterial species based on genome sequences, with numerous references to "less" or "more" or synonyms of these words (a few examples are line 140 "richer resistome diversity", line 157 "lesser resistome abundance and diversity", line 163 "richly endowed", line 215 "fewer resistome diversity and abundance", line 217 "sparse", lines 218 and 221 "virtually absent", line 222 "substantial abundance", line 244 "richest abundance of resistomes"). The lack of statistical analyses comparing lineages/clusters within the same species and between species, and determining significant differences among them, is problematic. Throughout the text, there is no reference to specific numerical values (e.g., p values) when making these comparisons.

      2) Related to my comment above are the references to "short (or close) evolutionary distance" (for example, lines 131, 208, 228, 265, 432, 439). How was evolutionary distance measured: number of SNPs, phylogenetic distance, average nucleotide identity? This "closeness" or "shortness" should be stated explicitly in numerical terms, for example as the number of SNPs.
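      As one illustration of stating closeness as a number, pairwise SNP distances between aligned core-genome sequences can be counted as in the sketch below. This is a minimal example under stated assumptions, not the authors' pipeline: it assumes the sequences are already aligned to equal length and treats gaps ("-") and ambiguous bases ("N") as uninformative. The function names `snp_distance` and `pairwise_snp_matrix` and the toy isolates are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str, ignore: str = "-N") -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a character listed in `ignore`
    (by default gaps and ambiguous N calls) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set(ignore.upper())
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def pairwise_snp_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of named sequences."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Toy alignment for illustration only (hypothetical isolates, not study data).
    toy = {
        "isolate_1": "ATGCGTACGT",
        "isolate_2": "ATGCGTACGA",
        "isolate_3": "ATGAGTNCGA",
    }
    for (x, y), d in pairwise_snp_matrix(toy).items():
        print(f"{x} vs {y}: {d} SNPs")
```

      Reporting such counts, together with any cutoff used to call isolates "closely related", would make these statements verifiable; an appropriate cutoff would likely differ between species and would need its own justification.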

      3) The Methods section needs more details. I have listed my specific comments on methods below.

      3 a) Lines 504-511: How many genomes were initially downloaded? Were these genomes complete or in draft stages? How were they filtered and the final 2377 genomes selected? What were the selection criteria (number of contigs, genome size, assembly quality, available metadata, etc.), or did the authors use programs that check genome quality, such as CheckM? Line 510, "filtered to remove poor genome sequences": how is "poor" defined here?

      3 b) Line 517: How were the 1000 genes used for phylogenetic reconstruction selected?

      3 c) Lines 522-525: Simply plotting the distribution of subspecies and species on a map does not constitute a phylogeographical analysis. Many biases can influence the geographic distribution of microbes, most notably the sampling scheme used (for example, more samples from a single country or from a specific host/environment/setting), the composition of the databases used (NCBI and PATRIC in this study), and the collection of more strains of one species and fewer of others. The current study, like many others, has these biases, which are in fact mentioned in the Results section. How do the authors address these biases?

      3 d) Lines 526-531, Resistome analyses: The current study is essentially a summary of information from the NCBI Pathogen Detection database. The authors need to briefly describe how resistance genes were identified in the genomes from this database. Since the entire study and all figures focus on the ARGs, the authors need to demonstrate how reliably these genes were identified.

      4) Results, lines 187-188: A citation is needed for "local and international outbreaks". How did the authors infer that the tree topologies described in lines 183-186 represent outbreaks? Analyses of outbreaks require information on sampling dates, which is lacking from this dataset. Hence, inferring that such topologies in the tree represent outbreaks is quite a stretch. I suggest that the authors either carry out temporal analyses of their data to support the claim of outbreaks or remove suggestions that outbreaks occurred.

      5) Discussion, lines 447-457: I agree that both vertical and horizontal modes of evolution of resistant bacteria are important mechanisms in the spread of resistance in many pathogens, and numerous previous studies have reported this. However, the study did not carry out any specific analyses of HGT or vertical evolution; hence, saying that "both phenomena are being observed" (lines 455-456) is misleading.

      6) Discussion or Conclusion: The authors mention that a limitation of their study is that it included only genomes available up to January 2020. I think there are several other important limitations and caveats that need to be discussed (for example, see comment 3 c above).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on medRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that the topic is interesting in principle, i.e. tracking antibiotic resistance globally in less well-studied but nonetheless clinically important bacterial species. However, the reviewers also had several major concerns, the main ones being:

      1) Overall lack of rigor in the analysis. This is due in large part to a lack of precision in the methods: e.g., differences in diversity are not statistically supported, evolutionary distances are not quantified, and it is unclear how a resistance gene and an outbreak are defined.

      2) The paper does not address biases in sample collection. Since the data were taken from a central repository, they include many different studies, each with its own biases. It is important to address these biases when comparing datasets from different groups and geographical locations.

      3) There is insufficient evidence to make claims about horizontal gene transfer.

      The individual reviews provide more details on each of these points.

    1. Reviewer #3:

      In this paper, Barnett and colleagues used network-based, data-driven analyses to characterize how the default mode network (DMN) and the Medial Temporal Network (MTN) interact with the hippocampus. First, the authors confirmed previous findings that the MTN is a distinct network from the DMN. Second, the authors identified three subnetworks of the DMN that differ from each other based on their connectivity profiles. They further investigated cross-network and intra-network dynamics during rest and also the representational similarity of patterns within these networks during a memory retrieval task. Finally, they used meta-analytic analyses to develop hypotheses about the specific cognitive functions of the MTN and DMN subnetworks.

      Major comments:

      1) One noteworthy aspect of this paper is that the networks identified by the current investigation do not map perfectly onto a previous framework outlined by the senior author (the AT-PM framework; Ranganath and Ritchey, 2012). I think that readers of this work will be very curious to hear about this update, and that the similarities and differences between the AT-PM framework and the current findings should be made crystal clear. For example, a schematic could be used to depict the similarities and differences visually.

      2) In addition to this suggested visualization, I think that memory scholars familiar with the AT-PM framework will be curious to know how these results update current thinking on how different brain networks organize memories and perform different types of cognitive functions. The meta-analysis partially does this, but one is left wondering how this updates the field's understanding of how specific types of memory (e.g. object versus scene memory, as in Maass et al., Brain, 2019) are supported.

      3) The authors state in the methods, "This sample size is comparable to the cohort sample sizes from the seminal Power et al., study investigating functional brain organization." I think a bit more can be said about the effect sizes reported in the previous literature (which might be inflated due to publication bias), and the power to detect such effect sizes (or smaller) here.

      4) I found the results reported in the section "Regions within the same community represent similar kinds of information during a memory task" difficult to follow. Moreover, I was not sure what this analysis provides beyond the resting state analyses. This paper would be strengthened if these analyses were linked to behavioral performance on the memory retrieval task.

      5) I was surprised to see that the anterior hippocampus was (numerically) more highly correlated with the DMN (Supplementary Table 1) and with the MP and PM sub-networks (Figure 4) than with the MTN. Is this difference statistically significant, and, if so, do the authors think that it is meaningful?

      6) Tau spreading has been shown to follow patterns of functional connectivity (Franzmeier et al., Nature Comms, 2020). The authors may wish to comment on the relevance of their findings to the different patterns of tau accumulation in different types of dementia.

    2. Reviewer #2:

      Overall, I thought that the topic addressed and the approaches used were interesting, and I particularly appreciated the motivation of relating data-driven analyses of resting state data to existing theoretical frameworks and task-based data. As described below, I believe the manuscript could be strengthened by additional comparison with past work and by addressing a potential methodological issue.

      1) As noted by the authors, past work has used data-driven approaches on resting state data to subdivide the default mode network. The manuscript would be strengthened by highlighting the similarities and differences between the current work and such past work. In terms of revealing subnetworks, is it believed that some aspects of the data acquisition/delineation methods employed here are preferable? MTL signal dropout was mentioned in the discussion, but was this a major motivating factor? Might there be any way of quantifying or tabulating the differences between the subdivisions proposed here and those of other efforts, in order to help bridge the current findings to past work and to assess how and why the current results might differ?

      2) The motivation to link data-driven network clustering approaches (e.g. the MTN and DMN subnetworks found here) with more hypothesis-driven approaches (e.g. the PM/AT framework) is a key strength of the study, although the findings and conclusions drawn about this relationship were somewhat difficult to understand fully. For example, how functionally distinct are the MTN and the PM/AT DMN subnetworks, given that the PM/AT framework highlights the distinct contributions of subregions of the MTN (e.g. PHC/PRC)? Is it thought that there is a distinction between PM/AT pathways that spans the DMN and MTN but is not captured here, or do the findings suggest that, for understanding hippocampal-based memory in the brain, a better distinction is between DMN subregions and the MTN? Relatedly, might the DMN subnetworks' connectivity with the hippocampus be mediated by MTL subregions? More generally, this comment is intended to ask whether the authors believe that the data-driven and hypothesis-driven approaches are reconcilable or whether they are arguing that the data-driven approach is preferable.

      3) To what degree might the spatial proximity of the ROIs influence the results of the various analyses? In particular, I wonder whether the pattern similarity analyses might be influenced by partial non-independence of adjacent ROIs. That is, adjacent ROIs might have correlated pattern similarity due to smoothing and other sources of voxelwise spatial non-independence, so insofar as there are more nearby ROIs within networks than across networks, this might influence the observed results. Similar concerns might apply to the Participation analysis, though they seem less obvious there.

    3. Reviewer #1:

      This paper characterizes resting state functional connectivity across the brain and within memory networks, evaluates whether similar networks arise in a memory-guided decision-making task, and collects descriptions of the function of these networks in prior imaging studies. The authors find that the DMN and a Medial Temporal Network (MTN) can be differentiated, and that there are three subnetworks within the DMN that interact differently with different parts of the hippocampus and that have been ascribed different kinds of functions in prior imaging studies.

      The paper provides a systematic overview and re-examination of multiple approaches that have been used before to characterize networks across the brain and those focused on memory systems. My overall sense is that the paper will be very useful to the cognitive neuroscience / memory communities but does not present a substantial theoretical advance. I am also concerned about the interpretation of the memory task connectivity data, as described below.

      Major comments:

      -It seems possible to me that the trial-by-trial RSA analyses run on the task data are picking up on basically the same signal as the functional connectivity resting state analyses. If the authors ran the RSA analyses TR by TR on the resting state data, would that pick up the same structure? Similarly, would the functional connectivity analyses on the task data explain the same variance as the RSA? Univariate signals can drive RSA effects, so careful analyses would need to be done to demonstrate that these methods are picking up on different aspects of the interactions between these regions. Relatedly, if the authors have access to a non-memory task dataset, perhaps it could be useful to show that the results are different in that case.

      -The results are displayed on surfaces, but I believe (though I am not sure) that all the analyses were done in the volume. Given the interest in the hippocampus and its connectivity, it would be very useful to see results displayed in the volume in addition to (or instead of) the surfaces.

      -By eye, the MP network as shown in Fig 2 looks much less coherent than the other two. It is difficult to see much cluster structure there at all. I am therefore unsure how confident to feel in the existence of this as a distinct network.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers felt that this work represents a useful contribution to the literature, relating different perspectives on the nature of interactions between brain areas and how these interactions may support memory, but that it does not offer a substantial theoretical advance beyond prior work. The reviewers also raise some methodological concerns that the authors may wish to consider.

    1. Reviewer #3:

      In this work, Yao and colleagues describe transcriptome profiling of human plasma from healthy individuals by TGIRT-seq. TGIRT is a thermostable group II intron reverse transcriptase that offers improved fidelity, processivity and strand-displacement activity compared with standard retroviral RTs, so that it can read through highly structured regions. A similar analysis was performed previously (ref. 20), but this study incorporated several improvements in library preparation, including optimization of template-switching conditions and modified adapters that reduce primer dimers and introduce UMIs. In their analysis, the authors detected a variety of structural RNA biotypes, as well as reads from protein-coding mRNAs, although the latter are of low abundance. Compared with SMART-Seq, TGIRT-seq also achieved more uniform read coverage across gene bodies. One novel aspect of this study is the peak analysis of TGIRT-seq reads, which revealed ~900 peaks over background. The authors found that these peaks frequently overlap with RBP binding sites, while others tend to have stable predicted secondary structures, which explains why these regions are protected from degradation in plasma. Overall, this study provides a robust dataset and an expanded picture of the RNA biotypes one can detect in human plasma. This is valuable because the findings may have implications for biomarker identification in disease contexts. On the other hand, the manuscript in its current form is relatively descriptive and could be improved with a clearer message about the specific knowledge that can be extracted from the data.

      Specific points:

      1) Several aspects of the bioinformatics analysis could be clarified in more detail. For example, it is unclear how sequencing errors in UMIs affect the deduplication procedure. This is important for the peak analysis, so it should be explained clearly. Also, it is not described how exon junction reads (when mapped to the genome) are handled in peak calling, although the authors did perform a complementary analysis by mapping reads to the reference transcriptome.

      2) Overall, the authors provided convincing data that TGIRT-seq has advantages in detecting a wide range of RNA biotypes, especially structured RNAs, compared to other protocols, but these data are more confirmatory, rather than completely new findings (e.g., compared to ref. 20).

      3) The peak analysis is more novel. The authors observed that 50% of peaks in long RNAs overlap with eCLIP peaks. However, there is no statistical analysis to show whether this overlap is significant or simply due to the pervasive distribution of eCLIP peaks. In fact, the original authors reported that eCLIP peaks cover 20% of the transcriptome. Similarly, the authors found that a high proportion of the remaining peaks can fold into stable secondary structures, but this claim is not backed up by statistics either.
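      To make the point concrete: if eCLIP peaks cover about 20% of the transcriptome, a naive null model in which each TGIRT-seq peak independently falls in eCLIP-covered sequence with probability 0.2 predicts roughly 20% overlap by chance, and an observed 50% could be tested against that baseline with a one-sided binomial test. The sketch below assumes that simple null; it ignores peak width, expression level and the non-uniform distribution of both peak sets, so a shuffling-based test restricted to expressed transcripts would likely be more appropriate. The function name is hypothetical and the numbers in the example are illustrative, not taken from the study.

```python
from math import comb


def binomial_upper_tail(n_peaks: int, n_overlap: int, coverage: float) -> float:
    """Return P(X >= n_overlap) for X ~ Binomial(n_peaks, coverage).

    Assumed null model: each peak independently lands in eCLIP-covered
    sequence with probability equal to the covered fraction of the
    transcriptome. Suitable for peak counts up to roughly a thousand;
    beyond that the float conversion of comb() can overflow.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return sum(
        comb(n_peaks, k) * coverage**k * (1.0 - coverage) ** (n_peaks - k)
        for k in range(n_overlap, n_peaks + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers for illustration only: 100 peaks, 50 overlapping,
    # 20% of the transcriptome covered by eCLIP peaks.
    n, observed, cov = 100, 50, 0.20
    print(f"expected overlaps under the null: {n * cov:.0f}")
    print(f"one-sided p = {binomial_upper_tail(n, observed, cov):.2e}")
```

      For larger peak counts, `scipy.stats.binom.sf(n_overlap - 1, n_peaks, coverage)` gives the same tail probability without the explicit sum.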

      4) The ranking of RBPs depends on the total number of RBP binding sites detected by eCLIP, which is determined by CLIP library complexity and sequencing depth. This issue should at least be discussed.

      5) Enrichment of RBP binding sites and structured RNAs in TGIRT-seq data is certainly consistent with expectations. However, the paper could be greatly improved if the authors made a clearer case for what new information can be learned, compared with eCLIP data or other related techniques that purify and sequence RNA fragments crosslinked to proteins. What additional, independent evidence shows that the predicted secondary structures are real?

      6) The authors should probably discuss how alignment errors can potentially affect detection of repetitive regions.

      7) Many figures are IGV screenshots, which can be difficult to follow. Some of them can probably be summarized to deliver the message better.

    2. Reviewer #2:

      Yao et al. used thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) to study apheresis plasma samples. The first interesting discovery is that they identified a number of mRNA reads containing putative binding sites of RNA-binding proteins. A second interesting discovery from this work is the detection of full-length excised intron RNAs.

      I have the following comments:

      1) One doubt I have is how representative apheresis plasma is of plasma obtained through routine centrifugation of blood. The authors reported a comparison of apheresis plasma with plasma from a single male in a previous publication. I think that addressing this important question would require a much larger number of samples.

      2) For the important conclusion that a proportion of apheresis plasma mRNA molecules contain binding sites of RNA-binding proteins, the authors need to explore whether there is any systematic difference in mapping quality (i.e. mapping quality scores in the alignment results) between RBP binding sites and non-RBP binding sites, so that any artifactual peaks caused by alignment issues in RNA-seq analysis can be identified and resolved. Furthermore, it would be prudent to perform immunoprecipitation experiments to confirm this conclusion for at least a proportion of the mRNAs.

      3) In Fig. 2D, one can observe clearly more TGIRT-seq reads than SMART-seq reads in the first exon of ACTB. Is there any explanation? Would this signal be called as a peak (a potential RBP binding site) in the peak calling analysis (MACS2)? Is ACTB known to be bound by a particular RBP?

      4) For Fig 2A, it would aid the comparison of RNA yield and RNA size profiles among protocols if the authors also included the TGIRT-seq results.

      5) As shown in Figure 4C (the track of RBP binding sites), RBP binding sites appear quite pervasive in some gene regions. How many RBP binding sites from public eCLIP-seq results were used for overlapping with the peaks in TGIRT-seq of plasma RNA? What percentage of plasma RNA reads fall within RBP binding sites? Are the peaks present in TGIRT-seq significantly enriched in RBP binding regions?

      6) Since a considerable portion of TGIRT-seq reads are related to simple repeats, one likely reason is a high abundance of endogenous repeat-related RNA species in plasma. Nonetheless, have the authors examined whether the ligation steps in TGIRT-seq introduce any biases (e.g. GC content) when analyzing human reference RNAs and spike-ins (page 4, paragraph 2)?

      7) As described in the Figure 2 legend, 0.25 million deduplicated TGIRT-seq reads were assigned to protein-coding gene transcripts, far fewer than the 2.18 million reads for SMART-seq. The authors need to discuss whether the current TGIRT-seq protocol could cause dropouts in mRNA analysis compared with SMART-seq.

      8) While scientifically thought-provoking, the practical implications of the current work are still unclear. The authors suggest that their work might have applications for biomarker development. Is it possible to provide one experimental example in the manuscript?

    3. Reviewer #1:

      The Lambowitz group has developed thermostable group II intron reverse transcriptases (TGIRTs) that strand switch and also have trans-lesion activity, providing a much wider view of the RNA species analyzed by massively parallel RNA sequencing. In this manuscript, they apply several methodological improvements to identify RNA biotypes in human plasma pooled from several healthy individuals. Additionally, they implicate binding by RNA-binding proteins (RBPs) and nuclease-resistant structures in explaining a fraction of the RNAs observed in plasma. Generally, I find the study fascinating and believe that the collection of plasma RNAs described is an important tool for those interested in extracellular RNAs. I think the possibility that RNPs are protecting RNA fragments in circulation is exciting and fits with elegant studies of insects and plants in which RNAs are protected by this mechanism and are transmitted between species.

      I have one major comment for the authors to consider. In my view, the use of pooled plasma samples precluded the important opportunity to provide a glimpse of human variation in plasma RNA biotypes. This significantly limits the use of this information to begin addressing RNA biotypes as biomarkers. While I realize that collecting data from multiple individuals represents a significant undertaking and may be beyond the scope of this manuscript, I urge the authors to do two things: (1) downplay the significance of the current study for the development of biomarkers (e.g., in the abstract and discussion, such as "The ability of TGIRT-seq to simultaneously profile a wide variety of RNA biotypes in human plasma, including structured RNAs that are intractable to retroviral RTs, may be advantageous for identifying optimal combinations of coding and non-coding RNA biomarkers for human diseases."); (2) carry out an analysis of multiple individuals, including racially diverse individuals. Very important information will come of this, similar to C. Burge's important study in Nature (~2008), which made clear that there is important individual variation in alternative splicing decisions, very likely genetically determined. This second suggestion could be added here or constitute a future manuscript.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Timothy Nilsen (Case Western Reserve University) served as the Reviewing Editor.

    1. Reviewer #3:

      In this manuscript, Brown et al. characterize fatty acid taste discrimination in Drosophila melanogaster. Fat taste is relatively poorly understood but has critical implications for feeding and obesity research; thus, studies that advance our understanding of the molecular and physiological underpinnings of this modality are important. The finding that Ir56d neurons enable organisms to discriminate between short, medium and long chain fatty acids, but not between types of medium chain fatty acids, is certainly novel and interesting. It is also surprising but fascinating that this receptor is required only for the detection of medium chain fatty acids. The manuscript is well written and the figures are presented in a clear and thoughtful manner. These findings lay the groundwork for exciting future work investigating how sweet taste and fatty acid taste perception are selectively modulated by the brain, given that these gustatory neurons overlap, and whether such discrimination is altered by hunger state.

      Strengths:

      1) Despite the overlapping nature of the taste neurons in this case (Ir56d neurons co-express Gr64f, which broadly labels the sweet GRNs, and Ir56d neurons respond to both sucrose and fatty acids), mutation of Ir56d results in loss of taste for hexanoic acid but not sucrose. The authors use this taste discrimination to their advantage, in combination with a robust aversive taste memory assay, to address the question of differential fatty acid taste perception.

      2) The authors rule out a potential involvement of olfaction in modulating taste perception.

      3) The use of CRISPR-Cas9 to generate Ir56dGAL4 flies, which implies accurate and targeted genome editing, validates the results obtained when Ir56d-expressing neurons are silenced. Additionally, the use of the fly gustatory system for in vivo Ca2+ imaging strengthens and corroborates the results at the physiological level, especially the rescue experiments.

      Overall (minor) comments and questions:

      1) Are there differences in taste discrimination between male and female flies?

      2) Individual data points should be shown whenever possible in all figures (except for PER, where this would make the data impossible to interpret).

      3) Can the authors discuss how discriminating between different fatty acid types may be adaptive? Are they found in different food sources, some of which are "good" and some "bad"? Is there evidence from other organisms of this type of molecular discrimination in fatty acid taste?

    2. Reviewer #2:

      In the present paper, Brown et al. study the ability of Drosophila melanogaster to discriminate between fatty acids (FAs) of different lengths. Using a combination of behavioral experiments, molecular biology and in vivo calcium imaging, the authors show that a subset of Ir56d-expressing neurons is able to differentiate FAs. However, the Ir56d receptor is necessary only for the detection of medium-length FAs, not short- or long-chain ones. The paper explores in detail the role of the Ir56d receptor as an FA detector, a role the authors described previously (Tauber et al., 2017).

      Major concerns:

      I consider the experiments and the statistical analysis to be properly done; however, the gain in knowledge is very limited. So far, the authors show that flies can discriminate FAs of different lengths, with Ir56d being the receptor that detects medium-length FAs, a result that expands the knowledge gained in Tauber et al. 2017. In figure 3, the authors show that silencing Ir56d neurons by tetanus toxin expression dramatically reduces PER to medium-length fatty acids, but not to short or long ones, pointing to a different set of neurons involved in their detection. However, the in vivo calcium imaging experiments show that Ir56d neurons also respond to short- and long-chain FAs. In this regard, I disagree with the statement in the abstract: "Characterization of hexanoic acid-sensitive Ionotropic receptor 56d (Ir56d) neurons reveals broad responsive to short-, medium-, and long- chain fatty acids, suggesting selectivity is unlikely to occur through activation of distinct sensory neuron populations." In fact, I consider it likely that selectivity comes from the activation of different subsets of gustatory neurons. Ir56d neurons could be a subset of the neurons that generally respond to FAs, providing the specificity for medium-length FAs. Other neurons, in addition to the Ir56d ones, might respond to short- and long-chain FAs in an Ir56d-independent manner.

      I think the authors should explore in depth how short- and long-chain FAs are actually detected, whether their detection depends on other Ionotropic Receptors (Ir25a and Ir76b are likely candidates; Ahn et al. 2017), and which subset of gustatory neurons actually responds to these compounds, considering that their detection requires neither Ir56d nor Ir56d neurons.

    3. Reviewer #1:

      This paper investigates fatty acid taste in flies and asks the broad question of whether flies can discriminate different compounds within a single taste modality. The authors' main finding is that flies can discriminate between long, medium, and short chain fatty acids using a previously established aversive memory taste paradigm. When they delve into the cellular and molecular basis of fatty acid detection, they find that IR56d neurons respond to all three classes of fatty acids but are required only for the behavioural responses to medium chain molecules. Similarly, CRISPR/Cas9 deletion of the IR56d receptor reveals that it too is required only for medium-chain fatty acid responses. Thus, different fatty acid classes presumably activate distinct but partially overlapping subsets of appetitive taste neurons. In general, I think the paper is potentially interesting (see comment 1 below) and the data mostly support the conclusions. However, some lack of attention to detail makes parts of the data hard to interpret (see minor comments).

      Concerns:

      1) The ability of flies to discriminate between different fatty acid classes is presented as the interesting finding, since, as the authors point out, discrimination between compounds within a taste modality is generally not thought to occur. On the surface I agree that this is interesting. However, in the authors' setup of the main question (line 101), they raise an important issue: "Is it possible that flies are capable of differentiating between tastants of the same modality, or is discrimination within a modality exclusively dependent on concentration?" This should be rephrased to replace "concentration" with "intensity", since not all tastants at the same concentration have the same intensity, and from a behavioural perspective it is intensity that matters. Given this, the authors do little to demonstrate that their discrimination task does not depend on intensity, aside from the fact that 1% solutions of all the FAs seem to give similar PER. They need to show more explicitly that this task truly reflects identity-based discrimination.

      2) The second broad concern I have is over the nature of short and long chain fatty acid detection. Interpreting the discrimination results would be greatly aided if we knew what other neurons mediate the PER to these molecules. Is it the non-IR56d population of Gr64f neurons?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers find fatty acid taste discrimination potentially interesting and agree that the experiments are performed to a high standard. One major concern is whether discrimination is based on intensity rather than quality. A second limitation is that the mechanism of FA detection is not greatly advanced beyond the authors' previous work: the cellular mechanisms for long and short chain FA detection remain unclear. The reviewers agreed that if the major concerns of Reviewer 1 were addressed, this manuscript would provide a broader understanding of fatty acid discrimination.

    1. Reviewer #2:

      In this paper the authors use a genomics approach to tackle the question of how the combined transcriptional response to two signals compares to the responses to the two treatments individually. They treat MCF-7 cells with TGF-beta and retinoic acid, and find that the combined response at the level of gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) may encompass additivity, multiplicativity but also a wide range of other intermediate or more extreme behaviours.

      The work is conceptually very interesting, and the manuscript text and figures were extremely clear and a pleasure to read. We suggest that the following major points be addressed to clarify the assumptions and limitations of the study.

      The authors treat the cells for 72h. This is a very long time, over which secondary effects may dominate the results. The choice of this time point should, at the very least, be justified and discussed. For example, previous studies that quantitatively characterized distinct temporal dynamics in SMAD signaling after TGF-beta treatment showed a transient, dose-dependent SMAD response in the first 4 h after TGF-beta treatment, with a strong early peak in the nuclear/cytoplasmic ratio of SMAD2/4 (Clarke & Liu, 2008; Schmierer et al, 2008; Zi et al, 2011; Zi et al, 2012; Strasen et al., 2018). In addition, TGF-beta signaling has been suggested to depend on cell density and cell cycle stage (Zieba et al, 2012), which may also affect the results. It would also be helpful to have a quantitative measure of the corresponding nuclear TF levels at the selected 72h time point (e.g., pSMAD2 and RARA levels for the main affected TFs).

      MCF-7 cells were treated with three different doses of TGF-beta (1.25, 5, and 10 ng/mL) and RA (50, 200 and 400 nM). As the selected doses seem higher than those used in previous studies, the authors should comment on their choice. The authors state that "We defined a master set of 1,398 upregulated genes by selecting the set of genes that were differentially expressed in any dose of the combination treatment (log FC ≥ 0.5 and padj ≤ 0.05) and that had increased expression in each dose of each individual signal." It is unclear how this gene set relates to the top-right Venn diagram in Fig 1B, in which only 303 genes are shown as upregulated in all three treatments, while the numbers in the diagram sum to more than 1,398.

      Fig 1B shows that a large proportion of genes were differentially expressed in response to both signals, but not to either of the signals individually. Their responses are presumably more non-additive than the responses of genes upregulated in response to all three treatments. Restricting analysis to the latter group therefore introduces a bias for certain modes of combinatorial regulation. The justification for this choice should be discussed.

      The authors suggest a bimodal distribution for the observed c values, with peaks at 0 and 1. The authors write that "Our simulated c value distributions bear a moderate resemblance to our observed c value distributions". This conclusion is central to the paper's claim that "Gene regulation gravitates towards either addition or multiplication when combining the effects of two signals" (title) and that "the combined responses exhibited a range of behaviors, but clearly favored both additive and multiplicative combined transcriptional responses" (abstract). However, the additional peak at c=1 is not obvious from the data in Fig. 1E. Stronger evidence (i.e. statistical analysis of the observed distributions) would be needed to demonstrate overrepresentation of c values ~1. Alternatively, the title and abstract could be revised to better reflect the strength of the findings.

      The authors frame the work on the basis of simple models of gene regulation by pairs of transcription factors that predict either addition or multiplication. However, they are activating two signalling pathways that could also interact at the level of signal transduction (and need not directly regulate the genes in question, as noted in point 1). How justifiable is it to make inferences about the nature of combinatorial transcriptional regulation from this kind of experimental setup? These issues should be made clearer from the beginning and taken into account when interpreting the data.

      Related to the point above, the authors use chromatin accessibility as a proxy for TF binding. However, this does not need to be the case, especially if the accessibility data are considered quantitatively. For example, TFs may bind and recruit remodeling factors that affect accessibility differentially across the genome, obscuring the relationship between TF binding and accessibility. This is especially pertinent at longer time scales after perturbation. We suggest presenting the data on accessibility as just that, instead of presenting it as data that directly reports on TF binding. The relationship to TF binding can and should still be explored in the analyses, but with clarification for how accessibility data is limited in this case.

      The following are instances where accessibility data is described as directly reporting on TF binding that we recommend revising (the list is not exhaustive):

      -the title of section two

      -Fig.2E

      -the link between models of TF control and the relationship between peaks and expression, such as the reference to the thermodynamic model at the end of section 3

      -remove the implicit assumption linking cooperativity of TF binding to super-additive peaks in sections 3 and 4. This may help explain more naturally the lack of dual-motif findings in section 4

    2. Reviewer #1:

      Cells perform many types of computations to respond to external signals at the transcriptional regulatory level. Often, regulatory sequences read out the concentration of input transcription factors and combine that information to dictate the level of transcriptional output. Yet, for most genes, the quantitative rules for how regulatory regions integrate multiple inputs remain unclear.

      Sanford et al. studied how two signals are interpreted by downstream genes using quantitative tools such as RNA-seq and ATAC-seq. The authors propose two phenomenological models to understand combinatorial regulation: a model in which output gene expression in the presence of two different input signals is the sum of the gene activity in the presence of each signal alone (additive), and an alternative model in which the output of the two signals is the product of the output driven by each individual signal (multiplicative).
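      To make the two phenomenological models concrete, here is a minimal Python sketch. It assumes (an assumption, not stated in the review) that each signal's effect is expressed as a fold change over an untreated baseline of 1.0, so that "sum" means adding the two induced increments while counting the baseline once, and that only upregulated genes are considered. The function names and the log2-distance criterion are hypothetical illustrations; this is not the c-value procedure used by Sanford et al.

```python
import math


def additive_prediction(fc_a: float, fc_b: float) -> float:
    """Combined fold change if the two signals' effects add.

    Assumes fold changes are relative to an untreated baseline of 1.0,
    so the baseline is counted once rather than twice.
    """
    return 1.0 + (fc_a - 1.0) + (fc_b - 1.0)


def multiplicative_prediction(fc_a: float, fc_b: float) -> float:
    """Combined fold change if the two signals' effects multiply."""
    return fc_a * fc_b


def closer_model(fc_a: float, fc_b: float, fc_combined: float) -> str:
    """Label an observed combined fold change by whichever model predicts it
    more closely on a log2 scale (a hypothetical criterion for illustration)."""
    if min(fc_a, fc_b) < 1.0 or fc_combined <= 0.0:
        raise ValueError("sketch assumes upregulated genes and positive fold changes")
    target = math.log2(fc_combined)
    d_add = abs(target - math.log2(additive_prediction(fc_a, fc_b)))
    d_mult = abs(target - math.log2(multiplicative_prediction(fc_a, fc_b)))
    return "additive" if d_add <= d_mult else "multiplicative"


# A gene induced 2-fold by one signal and 3-fold by the other:
print(additive_prediction(2.0, 3.0))        # 4.0
print(multiplicative_prediction(2.0, 3.0))  # 6.0
print(closer_model(2.0, 3.0, 4.2))          # additive
print(closer_model(2.0, 3.0, 5.8))          # multiplicative
```

      Comparing on a log2 scale treats a twofold overshoot and a twofold undershoot symmetrically, which is one reasonable choice among several.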

      The authors performed a genome-wide analysis of thousands of genes and found that most genes responding to either TGF-β or retinoic acid behave in either an additive or multiplicative fashion. The authors further asked whether these additive/multiplicative behaviors can be explained by the accessibility of DNA regulatory regions reported by ATAC-seq. The result reveals that DNA accessibility is mostly additive. However, they also find that multiplicative gene expression is correlated with super-additive accessibility.

      This work provides a platform to quantitatively assess combinatorial transcriptional regulation both at the level of DNA accessibility and transcriptional output. Indeed, one of the exciting aspects of the work is the attempt to use the quantitative values of DNA accessibility reported by ATAC-seq to constrain possible biophysical models of transcriptional regulation. We foresee that this work will set the stage for a better understanding of the molecular relation between transcription factor binding and the gene activity resulting from this binding, in general, and for dissecting the molecular mechanisms of combinatorial regulation, in particular.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      In this work the authors used a genomic approach to investigate the way cells interpret two combined signals versus two individual signals. The authors used RNA-seq to examine the gene expression outputs from thousands of genes in response to two signal inputs, TGF-b and retinoic acid, either individually or in combination. The authors found that when cells were stimulated with both signals, most genes exhibited additive or multiplicative responses. The authors further used paired chromatin accessibility data from ATAC-seq to relate such responses to putative transcription factor binding patterns in these genes. Surprisingly, ATAC-seq revealed that chromatin accessibility is largely additive, suggesting that most regulatory regions combine the two signals by addition, although some super-additive accessibility may correspond to multiplicative gene expression.

      This work provides a platform to quantitatively assess combinatorial transcriptional regulation both at the level of DNA accessibility and transcriptional output. Although the concept of additive vs. multiplicative transcriptional response is phenomenological, it may be used to clarify and constrain certain biophysical models of transcriptional regulation and set the stage for a better understanding of the molecular relation between combinatorial transcription factor binding and corresponding gene activity.

      While the work is written in clear and concise language, there are places that require further clarification and better presentation.

    1. Reviewer #2:

      The authors investigated the joint influences of visual evidence strength and action (un)certainty on the formation of perceptual decisions, and used MEEG to track the associated cascade of visual-motor processing using a relatively complex set of analyses. This manuscript addresses a general question that has already attracted (but also continues to attract) considerable interest. One of the main advances of this specific work (in addition to the advanced MEEG analyses) is the explicit manipulation of action certainty in addition to evidence strength. My enthusiasm for this work, however, remains somewhat limited in light of the following aspects.

      1) The article is set up from the perspective of adjudicating between strictly "serial models" of perceptual decisions, in which decisions are reached about what is viewed before turning to the appropriate action, and more "continuous models", in which potential action plans are formed while evidence accumulation is still taking place. Is there not already ample evidence for the latter scenario (e.g., the work of Tobias Donner, Floris de Lange, Ian Could, and others)? Moreover, the authors currently provide only a single reference for the serial model, which dates back to 1966. Thus, the temporal overlap between visual evidence accumulation and action planning is, in itself, neither very surprising nor new; and yet it appears to be a central component of the article's pitch.

      2) While the manipulation of action (un)certainty provides an interesting extension of the popular random-dot-motion task, the nature and rationale of this manipulation remain unclear. Do participants view multiple patches of equally coherent motion and arbitrarily decide which to respond to? If so, does this not confound action uncertainty with evidence (i.e., more patches with motion may give more evidence)? And should this not make participants faster, rather than slower? Are they slower simply because they are asked to make a "fresh" response? At a minimum, the authors should explain this manipulation more clearly, starting in the Results section. In doing so, the authors should clarify exactly how visual signals and action certainty are independent in their design, or (as I suspect) acknowledge that the current manipulation confounds action certainty with the availability, collective strength, and/or spatial region of the visual evidence (which may each in turn affect neural signals throughout the brain).

      3) It would help to first show the (basic) effects of sensory and action certainty on time-frequency activity in several brain areas (at least visual and motor), for example by showing power modulations for each of the certainty levels, together with a contrast plot of high vs low certainty. This would help understand the data, before turning to the more complex analyses. Such a plot may reveal, for example, decreased alpha activity in posterior sites with higher action uncertainty, simply as a result of more visual stimulation. If so, this may be problematic for the more complex analyses of transfer entropy. It could also help justify the current focus on beta and gamma (but not, for example, alpha) and to help understand the distinction between modulations in beta and gamma.

      4) I am surprised the authors find a gamma decrease rather than an increase. Does gamma not usually increase with motor preparation (e.g., Donner et al. Current Biology 2009) and visual attention (e.g., Fries et al., Science, 2001; Siegel et al., Neuron, 2008)?

      5) Given that both certainty manipulations affected RT, are all neural correlates of these certainty manipulations not "confounded" with differences in RT?

      6) Do the two uncertainty factors (sensory and action certainty) interact? This information appears missing from the analysis of the behavioural data. Also, if these two factors interact, it would be sensible to also explore this in the modelling and MEEG analyses.

    2. Reviewer #1:

      This study uses combined EEG/MEG to characterise the neural dynamics of the visuomotor decision process by separately manipulating its perceptual- and action-related components. Subjects monitored 4 simultaneous random dot stimuli to detect changes from incoherent to coherent motion, and indicated detection with a finger press. Perceptual and action uncertainty were manipulated by varying the motion coherence of the stimuli, and number of motor response options (1 vs. 3), respectively.

      The authors identify activity in the beta and gamma bands correlating with decision-related trajectories predicted by an accumulation-to-bound model. They reveal distributed networks in both frequency bands that show a negative relationship with the predicted patterns (i.e., desynchronization after onset of coherent motion). Several interesting findings stand out: 1) beta activity follows a gradual progression from posterior to anterior regions, a finding further supported by a connectivity analysis assessing the direction of information flow. 2) The accumulating signals across the identified regions overlap in time, which is taken as evidence for a continuous flow of information along the visual-to-motor pathway. 3) Regions where (beta) activity flow is modulated by perceptual (as opposed to action) uncertainty show earlier responses to perceptual evidence and are more likely to drive the information flow to downstream areas.
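      For readers less familiar with accumulation-to-bound models, the core idea can be sketched in a few lines of Python. This sketch assumes a discrete-time random walk in which stronger motion coherence maps onto a larger drift; the parameter values and function names are hypothetical and are not the model fitted in this study. It illustrates only that a higher drift reaches the bound sooner, which is what makes the predicted trajectories evidence-dependent.

```python
import random
from typing import Optional


def time_to_bound(drift: float, bound: float = 1.0, noise_sd: float = 0.0,
                  max_steps: int = 10_000,
                  rng: Optional[random.Random] = None) -> Optional[int]:
    """Accumulate noisy evidence until it reaches +bound or -bound.

    Returns the step at which a bound is first reached, or None if no bound
    is reached within max_steps.
    """
    rng = rng if rng is not None else random.Random(0)
    x = 0.0
    for step in range(1, max_steps + 1):
        x += drift + rng.gauss(0.0, noise_sd)  # one momentary evidence sample
        if abs(x) >= bound:
            return step
    return None


def mean_decision_time(drift: float, n_trials: int = 500, seed: int = 1,
                       **kwargs) -> float:
    """Average time-to-bound over simulated trials (seeded for reproducibility)."""
    rng = random.Random(seed)
    times = [time_to_bound(drift, rng=rng, **kwargs) for _ in range(n_trials)]
    finished = [t for t in times if t is not None]
    return sum(finished) / len(finished)


# Noise-free case: evidence grows linearly, so a larger drift hits the bound sooner.
print(time_to_bound(0.25))  # 4
print(time_to_bound(0.5))   # 2

# With noise, the same ordering holds on average across trials.
print(mean_decision_time(0.2, noise_sd=0.1) < mean_decision_time(0.05, noise_sd=0.1))
```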

      This is overall a well-written, clearly structured paper on an ever-relevant topic. The authors use elegant, rigorous statistical methodology, and their characterisation of beta activity provides some important insight into the global neural dynamics of decision making, in particular the temporal properties of decision-related signals across the perception-to-action processing pipeline. I do, however, have a couple of concerns regarding parts of the results (in particular those involving gamma activity) and their interpretation:

      1) Gamma band activity is seen to exhibit a negative relationship with the predicted accumulating signal, with a gradual desynchronisation upon the onset of perceptual evidence (coherent motion). I found this surprising, as several previous studies looking at decision-related activity have shown increases in gamma activity with perceptual evidence (Polania et al. 2014 Neuron, Donner et al 2009 Curr. Biol., Wilming et al. 2020 biorxiv). Is it possible that with the broad gamma range investigated here (31-90Hz) and the spectral smoothing involved, the negative relationship might be at least partly driven by activity in the lower ranges, i.e., qualitatively closer to task/motor-related beta desynchronisation? It would be interesting to see if the significant negative correlation is maintained with a slightly narrower gamma range (e.g., >35Hz or >40Hz). Either way, I think it's important for these results to be discussed in relation to the literature mentioned above.

      2) Regarding the interpretation of the beta-gamma relationship, the authors seem to place the results in the context of feedforward/feedback information dynamics (or at least they make several references to this literature throughout the manuscript). I am not sure I understand or agree with this interpretation: if anything, doesn't the temporal progression of decision-related information for gamma and beta observed here (e.g., Fig. 5b) go against the current understanding of their roles in feedforward and feedback information flow, respectively? Some clarification on this point would be very useful.

      3) While the timing of beta/gamma decision-related accumulation is summarised in Figs. 4/5, I think it would be informative to also include (either in the main figures or as supplement) the actual trial-averaged traces, highlighting the overall timing differences between activity in the two bands (from Fig. 4), as well as the progression across the anterior-posterior axis (shown in Fig. 5).

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      While we found the topic very relevant - especially the role of large-scale beta dynamics in visuomotor processing - and the approach used interesting, our overall enthusiasm was limited by concerns regarding novelty, design and interpretation. Critically, it remains unclear whether we are dealing with narrowband oscillations here, especially regarding the reported gamma band results, but also in terms of separating different oscillatory contributions in the alpha/beta frequency ranges. Since everything that follows hinges on this assertion, one would have to first establish a separation of these different spectral contributions in order to attribute particular dynamics to particular bands.

    1. Reviewer #3:

      This report examines the mechanisms by which the KSHV KaposinB (KapB) protein causes disassembly of processing bodies (PBs) in HUVECs. Convincing data is presented showing that mDia1 and ROCK, factors downstream of RhoA, are necessary for PB disassembly in HUVEC cells. Data suggesting cofilin enhances KapB PB disassembly is less convincing. Over-expression of actinin-1 or directly activating actomyosin contraction favored PB disassembly, implicating mechano-responsive signaling components. Analysis of YAP, a mechano-responsive transcription factor showed that levels were elevated in cells expressing KapB and its knockdown rescued PB formation in KapB expressing cells. Expression of constitutively active YAP promoted PB disassembly, similar to KapB, although it did not reproduce the stabilization of ARE-containing mRNAs seen in KapB-expressing cells. Interestingly, subjecting cells to shear stress or increasing the stiffness of the matrix on which they grow, both thought to activate YAP, recapitulated the PB disassembly phenotype seen in cells expressing KapB and knockdown of YAP abolished this.

      These are interesting and exciting results that further illuminate the mechanisms by which a viral protein perturbs PB function. Perhaps even more exciting is the finding that mechano-sensitive signaling pathways can influence PB formation (and perhaps function). The data are of high quality and support the major conclusions of the study. However, a couple of items raised questions for me and could be addressed. First, it is unclear whether the impact shown is a general effect on PBs as a whole or just on the HEDLS marker that is used exclusively in the study. Showing that another PB marker (or two) behaves similarly would support this conclusion; perhaps this could be done for a few key conditions, such as shear stress and expression of constitutively active YAP. The authors conclude, based on a TEAD-Luc reporter assay, that YAP transcriptional activity is not induced, even though it appears to be significantly increased compared to controls (Fig S5A, left panel). Could they elaborate on how they arrived at this conclusion? The argument that levels of phospho-YAP are not increased in KapB-expressing cells is not supported by the data: while the ratio may not be different, the total amount of phospho-YAP is clearly elevated, as are total YAP levels. Throughout the manuscript, can the authors comment on the impact of knockdowns on cell viability and morphology, if any?

    2. Reviewer #2:

      The authors demonstrate that disappearance of P-bodies from cells expressing a KSHV protein, KapB, requires factors regulating actin contractility, mechanosensation and YAP - but does not require the transcriptional regulatory activity of YAP. The function of P-bodies has long been contentious, and the endogenous mechanisms regulating P-body assembly vs. disassembly are still being elucidated. Many studies of P-body dynamics have relied on treatment with sodium arsenite, global translational inhibition, etc. This study therefore has the potential to add significantly to our understanding of P-body disassembly mechanisms and improve our understanding of the role of these ribonucleoprotein granules in cells. Several points of data presentation and interpretation may benefit from clarification.

      1) The introduction and discussion present P-bodies as sites of decay of ARE-containing mRNAs, a long-accepted model of P-body function. However, building on well-established observations from the Izaurralde lab that RNA decay is uncoupled from P-body formation, recent work by Parker, Singer, and Chao utilizing single-molecule imaging of 5' end decay provided clear support for cytosolic localization of RNA decay events, with no decay occurring inside P-bodies, strongly supporting a storage/translational repression role for P-bodies rather than a role in decay. The authors then attempt to provide a complex explanation of the observation that constitutively active YAP decouples P-body disassembly from ARE mRNA stability, rather than considering this result in the context of alternative P-body models.

      2) It is unclear why, in Fig. 1B (middle panel), there is a large, statistically significant increase in P-bodies per cell in vector-expressing cells - which do not express KapB - treated with shDia1-1 over shNT - but not with shDia1-2. Is this due to the more efficient silencing of mDia1 expression by shDia1-1, and does mDia1 have a KapB-independent effect on P-bodies? Or does this suggest off-target shRNA effects?

      3) It appears throughout the manuscript that there is always far more dispersion in P-body numbers in experimental (either shRNA or inhibitor-treated) cells than in control cells, though this may be an artefact of the fold-change calculation in which the authors normalize control cells to 1.0 and present no estimate of variance. Especially for experiments in which p values are close to the cutoff for significance, meaningful analysis of variance in all measurements is important and presentation of the raw data pre-normalization may be helpful.

      4) In Figure 4A, are the KapB expressing cells larger than the vector-expressing cells, or is a higher magnification used? The nuclei appear nearly double in diameter. In the immunofluorescence experiment, no other control marker is imaged to support the assertion that YAP signal is selectively increased by KapB expression. No image quantitation is performed to support the assertion that "nuclear:cytoplasmic YAP was not markedly increased". Quantitation across multiple fields of view (and discussion of how many cells were utilized in the image analysis) rather than presentation of a single image would address these concerns. The authors' observation that the fraction of phosphorylated YAP, as measured by Western blotting in Fig. 4B, decreases in KapB expressing cells appears incongruent with the stated lack of change in cytoplasmic:nuclear YAP in KapB vs. vector expressing cells (Fig. 4A).

      5) While I appreciate that the authors have utilized the luciferase assay in multiple studies, direct measurement of the luciferase reporter mRNA stabilities should be performed to differentiate between changes in stability of the ARE mRNA vs. selective translational repression of the ARE mRNA in this specific experimental context.

      6) "Comparison of the transcriptomic data from HUVECs subjected to shear stress from Vozzi et al (2018) (Accession: GEO, GSE45225) to entries in the ARE-mRNA database (Bakheet, Hitti, and Khabar 2017) showed a 20% enrichment in the proportion of genes that contained AREs in those transcripts that were upregulated by shear stress." This comparison (1) lacks any measure that this enrichment is significant, and (2) relies on a single steady-state microarray measurement, and therefore does not accurately report on RNA decay rates/permit conclusions about RNA dynamics.

      7) It is impossible for the reviewer to assess "unpublished data" on autophagy cited in the discussion.
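      Point 6 asks for a measure of whether the reported ARE enrichment is significant. One standard option (our suggestion; neither the review nor the authors specify a test) is a one-sided hypergeometric test, sketched below in Python using only the standard library. The inputs are counts the authors would take from their own gene lists; the toy counts in the example are placeholders, not values from GSE45225 or the ARE database. Such a test addresses only the first part of point 6: it cannot turn a steady-state measurement into a decay-rate measurement.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed ARE fraction among upregulated genes over the background ARE fraction.

    k: ARE-containing genes among the n upregulated genes
    K: ARE-containing genes among all N genes tested
    """
    return (k / n) / (K / N)


def hypergeom_p_enrichment(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n): the chance of drawing
    at least k ARE-containing genes when n genes are picked at random."""
    if not (0 <= k <= n <= N and 0 <= k <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer arithmetic until the final division


# Toy counts for illustration only (not real data):
print(fold_enrichment(2, 2, 2, 4))         # 2.0
print(hypergeom_p_enrichment(2, 2, 2, 4))  # 1/6, about 0.167
```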

    3. Reviewer #1:

      In this manuscript the authors show that the oncogenic transcription factor YAP is an important factor in the signaling pathway from the Kaposi Sarcoma virus protein KapB via the host cell GTPase RhoA down to the disassembly of processing bodies (PBs). This is in principle an interesting finding. However, the connection between KapB and PB-disruption, between YAP and the Rho pathway, Kaposi KapB and the Rho pathway, as well as the connection between Kaposi virus infection and YAP (and Rho) have been described before. Therefore, this connection alone does not come as a surprise. New mechanistic insight into how exactly YAP contributes in PB disruption is unfortunately missing.

      1) Somewhat contradictory is that in 2015 the last author was first author of a paper in which they did not observe a significant PB rescue with a ROCK inhibitor, leading them to conclude that contractility and PB disruption are independent events downstream of RhoA activity. In the current manuscript they now revise this and convince the reader that PB disruption involves contractility (which is also more in line with earlier work; Takahashi et al., 2011).

      2) The fact that contractility leads to YAP activation is known, but the authors now convincingly show that this does not happen in parallel, but that PB disruption depends on YAP activation. Therefore, the most interesting aspect is that RNAi-mediated removal of YAP leads to suppression of P-body disruption. This finding places YAP as an essential intermediate between contractility and PB-disruption. This reviewer really likes this finding but requests that the authors follow this path a little further and add to the mechanism.

      i) Is it based on a protein-DNA interaction of YAP, i.e. does YAP need to act as transcription factor to induce PB dissolution? And what transcripts would then be induced and be required for PB disruption or dispersal? Could it be something like DICER RISC (Chaulk et al., 2014)? The authors delineate that this first option is less likely to them but no experimental proof is provided.

      ii) The effect of YAP on PBs might be based on a protein-RNA interaction or

      iii) It might depend on a protein-protein interaction between YAP and an unidentified partner?

      iv) Finally, one could ask if PB dispersal is connected to an induction of autophagy?

    1. Author Response

      Summary:

      A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of assumptions made in developing the model on the outcomes and the moderate correlations between simulations and experimental data. Further explanation of the questions being investigated, the validity and nature of assumptions that were used to develop the simulations, and details explaining how these assumptions were built into the modeling were considered important. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests. Reviewers also noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods, and the impact of the insights gained.

      We would like to thank eLife for this Preprint Review service.

      In this manuscript, we present for the first time a model of DNA rereplication, which permits us to analyse how the process evolves at the single-cell level, across a complete genome, over time. This analysis revealed a pronounced heterogeneity at the single-cell level, resulting in increased copies of different genomic loci in different cells, and highlighted rereplication as a powerful mechanism for genome plasticity within an evolving population. We would like to thank the reviewers for their critical appraisal of our work and the editor for his summary of the reviews. The points raised were overall easy to address, and we have done so in a revised version of the manuscript, where we have also clarified points which were unclear to the reviewers. Importantly, we have clarified that: there are currently no available methods for studying rereplication dynamics experimentally at the single-cell level across the genome, and it is exactly this analysis that our manuscript offers; model assumptions were either standard and previously validated experimentally for DNA replication, or subjected to sensitivity analysis, with key findings shown to be robust to model assumptions; there was no arbitrary cut-off point in the rereplication process, which was analysed over time (an advantage of our approach), and although data were depicted early (2C) and late (16C) in the process, findings were robust throughout; fission yeast cells can be experimentally induced to rereplicate to different extents (from 2C to 16C or even 32C), and our model permits us to capture the process as it evolves at any ploidy; and correlations between experimental and simulated data were highly significant and robust to model assumptions.

      We would like to thank the reviewers for their comments, which we believe have helped us improve our manuscript and clarify points of possible misunderstanding. A point-by-point response follows.

      Reviewer #1:

      The authors develop and analyse a mathematical model of DNA rereplication in situations where re-firing of origins during replication is not suppressed. Using the experimentally measured position and relative strength of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the developed model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. The fact that increasing the copy number of an origin facilitates its preferential amplification essentially constitutes a self-reinforcing feedback loop and might be the mechanism that leads to overamplification of some genomic regions. In addition, different regions compete for a limiting factor, and thereby repress each other's over-amplification. While the model generates some interesting hypotheses, it is unclear in the current version of the manuscript to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, they do not discuss the model assumptions and their validity, and they do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

      The manuscript has been modified to further clarify the underlying questions and model assumptions. We would like to point out that the model was presented in detail in the supplementary material of the original manuscript, which included all model assumptions. In addition, model parameters used for the base-case model were systematically varied, the outcome was presented in a separate paragraph (“Sensitivity Analysis” in Results), and findings were shown to be robust to model assumptions. These points are presented in detail below.

      1) It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

      With this work our goal is to elucidate the fundamental mechanisms and properties underlying DNA re-replication. Specifically, we aim to investigate how re-replication evolves over time along the genome, and how it may lead to different numbers of copies of different loci at the single-cell level and result in genetic heterogeneity within a population. Given the large number of origins along the genome and the stochasticity of origin firing (Demczuk et al., 2012; Kaykov and Nurse, 2015; Patel et al., 2006), it is unclear how re-replication would evolve along the genome in each individual cell in a re-replicating population, and how local properties and genome-wide effects would shape its progression and the resulting increases in the number of copies of specific loci. As no experimental method exists that can analyze DNA re-replication at the single-cell level over time along the genome, we designed a mathematical model that is able to track the firing and refiring of origins and the evolution of the resulting forks along a complete genome over time, and in this way capture the complex stochastic hybrid dynamics of DNA re-replication. Since existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007), we believe that our in silico model, which is the first modeling framework of DNA re-replication, is an important contribution to the field.

      In the revised version of our manuscript, we have modified the introduction to explain these points in more detail.

      2) One of the main messages of the paper is that the amplification profiles are highly variable across single cells, because that was found in the described simulations. This behavior does however likely depend on specific choices that were made in the simulations, e.g. that the probabilities of the origin state transitions are exponentially distributed. These assumptions should at least be discussed, or better experimentally validated.

      Modeling choices and assumptions are presented in detail in the Supplementary material of the manuscript. They were made to accurately capture both the dynamics of origin firing, which many studies in fission yeast have established to be stochastic (Bechhoefer and Rhind, 2012; Patel et al., 2006; Rhind et al., 2010), and the continuous movement of forks along the DNA. Specifically, the choice of the exponential distribution for assigning a firing time to each origin has already been discussed and validated in our previous work on normal DNA replication (Lygeros et al., 2008). Indeed, as shown in Figure 2 of (Lygeros et al., 2008), our model was able to accurately reconstruct experimental data derived from single-molecule DNA combing experiments (Patel et al., 2006).

      The use of the exponential distribution for transition firing times is standard in stochastic processes in general, including what are known as Piecewise Deterministic Markov Processes (PDMP), the class to which the models considered in the paper belong. There are good mathematical reasons for this, for example the "memoryless" property that makes the resulting stochastic process Markov, a basic requirement for the model to be well-posed [M. H. A. Davis, "Markov models and optimization", Monographs on Statistics and Applied Probability, vol. 49, Chapman & Hall, London, 1993]. In practice, assuming an exponential distribution can be quite general, because the rate (the probability with which a transition "fires" per unit time) is allowed to depend on the state of the system, both the discrete state (in our case, the state of individual origins) and the continuous state (in our case, the progress of individual replication forks). It can be shown that one can exploit this dependence to write seemingly more general processes (that at first sight do not have exponential firing times) as PDMP (with exponential firing times) by appropriately defining a state for the system [M. H. A. Davis, "Piecewise-Deterministic Markov Processes: A General Class of Non-Diffusion Stochastic Models", Journal of the Royal Statistical Society. Series B (Methodological), Vol. 46, No. 3 (1984), pp. 353-388]. In the manuscript this feature is exploited in what we call the LF model, where the rate of the exponential firing time of each origin (probability of firing per unit time) depends on the state of the system (specifically, the number of PreR origins), as discussed in the section on Sensitivity Analysis. We have further clarified these points in the revised manuscript.
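      To make the role of state-dependent exponential rates concrete, the following Python sketch simulates a single round of origin firing with Gillespie's direct method. It is a minimal illustration, not the authors' model: it ignores forks, passive replication and refiring, and the rate functions passed in (including the limiting-factor form in the usage lines) are hypothetical choices. Because waiting times are exponential, the time to the next firing is exponential with the summed rate, and the origin that fires is chosen in proportion to its current rate; recomputing the rates after every event is what allows them to depend on the system state.

```python
import random


def simulate_firing(base_rates, rate_fn, seed=0):
    """Simulate one round of origin firing with state-dependent exponential rates.

    base_rates: per-origin base firing rates (hypothetical units: 1/min).
    rate_fn: hypothetical function (base_rate, n_licensed) -> current rate, so the
             rate may depend on how many origins are still licensed (PreR).
    Returns a list of (time, origin_index) tuples in firing order.
    """
    rng = random.Random(seed)
    licensed = set(range(len(base_rates)))  # origins still in the licensed state
    t = 0.0
    events = []
    while licensed:
        candidates = sorted(licensed)
        n_licensed = len(candidates)
        rates = [rate_fn(base_rates[i], n_licensed) for i in candidates]
        # Memorylessness: the next firing time is exponential with the total rate,
        # so rates can be recomputed from the current state after each event.
        t += rng.expovariate(sum(rates))
        # The origin that fires is chosen with probability proportional to its rate.
        fired = rng.choices(candidates, weights=rates, k=1)[0]
        licensed.remove(fired)
        events.append((t, fired))
    return events


# State-independent rates: each origin keeps its base rate.
fixed_events = simulate_firing([0.1, 0.5, 1.0], lambda r, n: r, seed=1)
# Hypothetical limiting-factor form: the per-origin rate shrinks when more than
# two licensed origins compete for the factor.
shared_events = simulate_firing([0.1] * 5, lambda r, n: r * min(1.0, 2.0 / n), seed=2)
```

A rate function that ignores the number of licensed origins gives state-independent firing; one that scales the rate with that number mimics, in spirit only, a firing rate that depends on the number of PreR origins.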

      3) The authors aim to test their prediction that rereplication is highly variable across cells. To this end they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, the DAPI quantification suggests that only a subset of cells actually undergoes rereplication under the experimental conditions used (Fig. 4C). Therefore the analysis should at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of two loci is anti-correlated, as predicted by the model.

      Under the experimental conditions employed (ectopic expression of a mutant version of the licensing factor Cdc18, stably integrated in the genome under a regulatable promoter), the vast majority of cells undergo rereplication, but to relatively low levels, resulting in cells with a DNA content of 2C-8C. Though the DNA content of several cells indeed appears similar to that of normal G2 phase cells, the vast majority (>90%) of cells undergo rereplication, as manifested by the appearance of DNA damage and, eventually, loss of viability. We have chosen this experimental set-up (medium levels of rereplication) as it allows induction of rereplication in practically all cells in the population, without the abnormal nuclear and cellular morphology that accompanies a pronounced increase in DNA content (i.e., 16C) and would make single-cell imaging more prone to artifacts. Fission yeast cells can be induced to undergo rereplication to various extents, by regulated expression of different versions of Cdc18 to different levels and/or co-expression of Cdt1. We have now explained this more extensively in the revised manuscript and thank the reviewer for identifying a point which may not have been clear in the first version of the manuscript.

      Concerning the possibility of studying two loci at the same time, we have indeed tried to tag a second region with TetR/TetO, however the signal-to-noise ratio and thus reproducible detection of the TetR focus was suboptimal under rereplication conditions. We therefore did not proceed further with this approach.

      4) What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

      The definition of signal ratios is given in Results: DNA re-replication at the population level: “Specifically, we computed in silico mean amplification profiles across the genome, referred to as signal ratios in (Kiang et al., 2010), by averaging the number of copies for each origin location and normalizing it to the genome mean in 100 simulations. In these profiles, peaks above 1 correspond to highly re-replicated regions, and valleys below 1 correspond to regions that are under-replicated with respect to the mean.”
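      The signal-ratio definition quoted above can be written as a short Python function. This is a sketch that assumes copy numbers are stored per simulation as one value per origin location, and that the genome mean is taken over those same origin locations; the authors' implementation may weight loci differently.

```python
def signal_ratio(copies_per_sim):
    """Mean amplification profile normalized to the genome mean.

    copies_per_sim: list of simulations, each a list of copy numbers with one
                    value per origin location (same order in every simulation).
    Values above 1 mark over-replicated loci; values below 1 mark
    under-replicated loci relative to the genome mean.
    """
    n_sims = len(copies_per_sim)
    n_loci = len(copies_per_sim[0])
    # Average the copy number of each origin location across simulations.
    mean_copies = [sum(sim[j] for sim in copies_per_sim) / n_sims for j in range(n_loci)]
    # Normalize to the genome-wide mean of the averaged profile.
    genome_mean = sum(mean_copies) / n_loci
    return [m / genome_mean for m in mean_copies]


print(signal_ratio([[2, 4, 6], [2, 4, 6]]))  # [0.5, 1.0, 1.5]
```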

      Indeed, as observed by the reviewer, simulated peaks appear overall sharper and higher than experimental peaks. This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 probes and 2 independent experiments. We have clarified this in the Results.

      Last, we chose to compare in silico and experimental profiles at a similar ploidy. Plotting in silico profiles of an earlier timepoint would indeed lead to visually more similar patterns in terms of peak intensity, but we believe this could be misleading for the readers.

      5) From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (lines 275-277).

      We have now clarified this point. Experimental observations show that under high levels of rereplication, DNA content reaches 16C four to six hours following accumulation of Cdc18 (Nishitani et al., 2000). Estimates for 0.5 kb/min and the LF model are therefore closer to experimental observations.

      6) I find the description of cis- and trans-effects rather confusing. The authors should instead explain what happens in the model: neighboring strong origins can amplify a weak origin, and origins compete for factors. In lines 475-476, for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

      In the manuscript, we initially present what we observe in the Results section and then proceed to provide possible explanations in Discussion. We quote from the Discussion: “Such in trans negative regulation of distant origins could be explained by competition for the same limiting factor: high-level amplification of a given locus recruits high levels of the limiting factor, indirectly inhibiting firing of other genomic regions.” and “[…] in cis elements contribute to amplified copy numbers not only directly by passive re-replication, but also implicitly through increasing the firing activity of their neighbors”. To our understanding, these sentences are in complete agreement with the reviewer’s suggestions. Nonetheless, and to make this even more clear, we have modified the Discussion in our revised manuscript.

      7) Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

      We have clarified this point in the revised manuscript.

      8) One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

      As the reviewer points out, origin efficiencies are one of the important input parameters of the model. Since the model is stochastic, however, origin efficiencies do not directly determine amplification levels at the single-cell level. For example, in Figure 3A and Supplementary Figure S4, we show the outcome of 4 random simulations with identical underlying parameters, where it is clear that re-replication can lead to markedly different single-cell amplification levels. Indeed, genome-wide analysis across 100 simulations (Supplementary Figure S5) indicated that at the onset of re-replication, amplification levels are highly unpredictable (again, despite the fact that the input parameters are identical).

      In contrast, when analyzing amplification profiles at the population level (averaging across sets of 100 simulations), the most highly amplified regions appear highly reproducible. We agree with the reviewer that these population-level profiles are strongly affected by the origin efficiencies, but they are not determined solely by them. For example, low-efficiency origins can be highly amplified, or highly efficient origins can be suppressed (see discussion on in cis and in trans effects), depending on their neighborhood and system-wide effects, and the extent of these effects depends on the fork speed. Sensitivity analysis with respect to different model assumptions or model parameters (see Results, section Sensitivity Analysis, and Supplementary Figure S3) indicated that amplification profiles might appear sharper or flatter, but overall amplification hotspots were highly robust.
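      One hedged way to quantify hotspot reproducibility is to compare the top-ranked loci of population profiles computed from two independent sets of simulations. The sketch below uses a Jaccard index over the k highest-ranked loci; both the choice of k and the Jaccard metric are illustrative assumptions, not the criterion used in the manuscript, and the two example profiles are invented for illustration.

```python
def top_hotspots(profile, k):
    """Indices of the k loci with the highest signal ratio."""
    ranked = sorted(range(len(profile)), key=lambda j: profile[j], reverse=True)
    return set(ranked[:k])


def hotspot_overlap(profile_a, profile_b, k):
    """Jaccard index of the top-k hotspots of two population-level profiles."""
    a, b = top_hotspots(profile_a, k), top_hotspots(profile_b, k)
    return len(a & b) / len(a | b)


# Hypothetical profiles from two independent sets of simulations: peak heights
# differ (one profile is flatter), but loci 1 and 3 are the most amplified in both.
run_a = [0.8, 1.6, 0.9, 2.4, 0.7]
run_b = [0.9, 1.3, 1.0, 1.9, 0.8]
print(hotspot_overlap(run_a, run_b, k=2))  # 1.0
```

A rank-based comparison like this is insensitive to whether a profile is sharper or flatter, which matches the distinction drawn above between peak height and hotspot location.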

      To summarize, in our conclusions (Discussion, section Emerging properties of re-replication) we highlight these properties (stochasticity vs. robustness) and elaborate further on how they emerge during the course of re-replication (onset vs. high re-replication) or depending on the level of analysis (single-cell vs. population level).

      9) It would be helpful if, in Fig. 2 also the origins and their respective efficiencies could be shown to understand to what extent the signal ratio reflects these efficiencies.

      We thank the reviewer for the useful suggestion, which we have incorporated in the revised manuscript.

      10) The methods section should provide more detail.

      We would like to point out that the Supplementary Material, including a full mathematical description of the model, is available on bioRxiv, where it was also available at the time of the preprint review (https://www.biorxiv.org/content/10.1101/2020.03.30.016576v1.supplementary-material), and has also been uploaded as a separate document to our GitHub page: https://github.com/rapsoman/DNA_Rereplication

      Reviewer #2:

      Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

      1) It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies for 2C and 16C. However, Figure 4 (experiment) suggests that copy numbers should only be slightly more in re-replicating conditions, compared to normal replicating conditions. Additionally, in Figure 2, the simulated data seems to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

      Fission yeast cells undergo robust rereplication and reach a ploidy of up to 32C; see for example (Kiang et al., 2010; Mickle et al., 2007; Nishitani et al., 2000). 16C is therefore a common ploidy for rereplicating fission yeast cells, observed under many experimental conditions. In addition, by manipulating the licensing factors that are over-expressed, different levels of ploidy can be achieved experimentally, ranging from 2C (the normal ploidy of a G2 cell, but with uneven replication) to 32C. In Figure 4, we have employed a truncated form of Cdc18 (d55P6-cdc18 (Baum et al., 1998)), which induces medium-level re-replication, as confirmed by FACS analysis in Supplementary Figure S6A. Under these conditions, the vast majority of the cells (>90%) undergo re-replication, albeit at medium to low levels. We opted to use this strain to avoid artifacts due to disrupted nuclear morphology under high levels of re-replication. We have now clarified this point in the revised manuscript. We would like to point out that the in silico analysis is not carried out at 16C only but across different ploidies; it is actually a strength of our approach that we can follow the rereplication process as it evolves, at any ploidy, and we have shown that our conclusions are robust throughout. We show plots at the beginning of the process (2C) and towards the end (16C), at the single-cell and at the population level, to facilitate comparison.

      Last, as also discussed in our response to reviewer 1, simulated data appear sharper, with higher peak values than experimental data (Figure 2). This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 neighboring microarray probes and 2 independent experiments. We have clarified this in the revised manuscript.

      2) This work currently is agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

      We agree with the reviewer that comparing our predictions with known gene duplication events in S. pombe would be of interest. Unfortunately, to our knowledge, no such dataset for fission yeast exists in the literature. The most comprehensive datasets are the ones from (Kiang et al., 2010; Mickle et al., 2007), which analyse rereplicating cells, and which we have already exploited in our paper. We would like to point out that this manuscript aims to show how rereplication evolves genome-wide. Whether the additional copies generated can lead to gene duplication events is beyond the scope of the present manuscript.

      3) The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

      One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up with the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

      This can be explained by the following equation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

      It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6B, C).

      We thank the reviewer for the positive comments. Indeed, as we elaborate in our Discussion, we believe that the mechanism behind the observed in trans effects is the competition for a factor that exists in a rate-limiting quantity (see also reply to point 6, reviewer 1 above), which is essentially the constant in his/her equation. Though less pronounced, such in-trans effects are also possible in the UF model, and could be due to the total DNA increase being dominated by certain origins, as suggested by the reviewer. We do not suggest anywhere in the manuscript that this inhibition is direct, but rather clearly state that it is an indirect effect.
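      The competition argument can be illustrated with a toy Pólya-urn simulation in Python. This is not the authors' LF model: the fixed pool of factor units per cell and the rule that each unit is recruited in proportion to a locus's current copy number are both hypothetical assumptions, chosen because they combine a limiting total with the self-reinforcing amplification described by reviewer 1.

```python
import random


def compete_for_factor(pool, seed):
    """Toy Polya-urn model of two loci competing for a limiting factor.

    Each of `pool` factor units is recruited by locus 1 or locus 2 with
    probability proportional to that locus's current copy number, and each
    recruited unit adds one copy to it (self-reinforcing amplification).
    """
    rng = random.Random(seed)
    x, y = 1, 1  # one copy of each locus before re-replication starts
    for _ in range(pool):
        if rng.random() < x / (x + y):
            x += 1
        else:
            y += 1
    return x, y


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


# 200 simulated cells, each with the same (hypothetical) pool of 50 factor units.
cells = [compete_for_factor(pool=50, seed=s) for s in range(200)]
xs = [x for x, _ in cells]
ys = [y for _, y in cells]
print(min(xs), max(xs), round(pearson(xs, ys), 6))
```

The fixed total (here 52 copies per cell) is exactly the reviewer's X + Y = constant and forces a correlation of -1 across cells, while copy-number-proportional recruitment makes which locus dominates vary widely from cell to cell.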

      Reviewer #3:

      This manuscript by Rapsomaniki et al uses mathematical modeling to study the properties of DNA re-replication. They develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

      The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

      We would like to point out that the overlap between experimental and simulated data is highly significant. Firstly, the Spearman correlation coefficient between simulated and experimental genome-wide profiles is highly statistically significant (p values ranging from 7.3 × 10^-12 to 3.6 × 10^-41 for the three fission yeast chromosomes). Furthermore, 100,000 repetitions of random peak assignment resulted in only one case where 10 out of 22 peaks overlapped (median 2 out of 22 peaks overlapping), while comparing simulated and experimental data resulted in 14 out of 22 peaks overlapping. Simulations appear sharper than experimental data; this is, however, expected, as simulated data correspond to the actual number of copies generated, while experimental data are subject to background noise, have a signal-to-noise ratio that is limited by the experimental method employed, and represent averages of 3 probes and 2 independent experiments (see Kiang et al., 2010 and also above). We have modified the manuscript to clarify this point. The reviewer suggests that the model is of limited use because one could trivially generate new experimental data. We would like to point out that existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007). To date, no experimental method can generate single-cell, whole-genome, time-course measurements in re-replicating cells. Our model aims to fill this gap, and for this reason we believe in its usefulness.
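      The random peak-assignment test described above can be sketched as a simple permutation test in Python. The genome is represented as hypothetical equally sized bins, peaks are bin indices, and a peak counts as overlapping if it lies within `window` bins of a reference peak; these representational choices, and the add-one correction on the empirical p-value, are assumptions, since the response does not specify how overlap was defined.

```python
import random


def count_overlaps(peaks, reference, window):
    """Number of peaks lying within `window` bins of at least one reference peak."""
    return sum(any(abs(p - q) <= window for q in reference) for p in peaks)


def peak_overlap_test(simulated_peaks, experimental_peaks, n_bins,
                      window=0, n_perm=100_000, seed=0):
    """Permutation test: how often do randomly placed peaks overlap at least as much?"""
    rng = random.Random(seed)
    k = len(simulated_peaks)
    observed = count_overlaps(simulated_peaks, experimental_peaks, window)
    as_extreme = 0
    for _ in range(n_perm):
        # Place k peaks uniformly at random, without replacement, among the bins.
        random_peaks = rng.sample(range(n_bins), k)
        if count_overlaps(random_peaks, experimental_peaks, window) >= observed:
            as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical example: 5 simulated peaks on a 1,000-bin genome that match the reference exactly.
reference = [100, 250, 400, 620, 880]
observed, p = peak_overlap_test(reference, reference, n_bins=1000, n_perm=2000)
print(observed, p)
```

The default of 100,000 permutations mirrors the number quoted above; in pure Python that many repetitions is slow, so the usage line uses fewer.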

      Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

      We would like to point out that it is the nature of replication in fission yeast which is stochastic, as experimentally shown (Patel et al., 2006), and defined at the level of single origins, and this is captured by the simulations. Heterogeneity amongst single rereplicating cells has not been previously shown or suggested in any organism, at least to the best of our knowledge. It is in our opinion a highly interesting observation, as it provides a powerful mechanism for generating a plethora of different genotypes within a population, from which phenotypic traits could be selected.

      Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

      Again, the reviewer seems unaware that no experimental method currently exists for analysing the dynamics of re-replication at the single-cell level genome-wide. We also feel obliged to point out that modeling and in silico analysis are, in our opinion, of great value for analysing complex biological processes, even when experimental methods are available. Though we are sure this is not what the reviewer really meant, his/her comment appears derogatory towards an entire field.

      Fork speed is assumed based on limited data and assumptions regarding re-replication fork speed without empirical data.

      As clearly stated in our manuscript (Results, section Modeling DNA re-replication across a complete genome), many studies have estimated fork speed in yeasts during normal DNA replication, with plausible values ranging from 0.5 kb/min to 3 kb/min (Duzdevich et al., 2015; Heichinger et al., 2006; Raghuraman et al., 2001; Sekedat et al., 2010; Yabuki et al., 2002). In our model, we set the base-case value to the lowest estimate (0.5 kb/min), but also explored the model's sensitivity to this parameter by simulating the model with higher values (1 and 3 kb/min). This analysis indicated that estimates for 0.5 kb/min were closer to biological reality, an unsurprising finding given that fork speed is expected to be slower in re-replication than in normal replication.
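      A back-of-the-envelope calculation shows why the explored fork-speed range matters. Assuming two origins flanking a gap fire at the same moment and their forks move toward each other at a constant speed v, a gap of length d is replicated in d / (2v). The 30 kb gap used below is a hypothetical value chosen only for illustration.

```python
def gap_fill_time(gap_kb, fork_speed_kb_per_min):
    """Minutes for two converging forks to replicate the gap between two origins.

    Assumes both flanking origins fire at the same moment and forks move at a
    constant speed, so each fork covers half of the gap.
    """
    return gap_kb / (2 * fork_speed_kb_per_min)


# Hypothetical 30 kb inter-origin gap at the fork speeds explored in the manuscript.
for speed in (0.5, 1.0, 3.0):
    print(f"{speed} kb/min: {gap_fill_time(30, speed):.1f} min")
# 0.5 kb/min: 30.0 min
# 1.0 kb/min: 15.0 min
# 3.0 kb/min: 5.0 min
```

Under these assumptions, the six-fold span between 0.5 and 3 kb/min translates directly into a six-fold difference in how quickly a given stretch of DNA is re-copied.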

      Overall, the comments of reviewer 3 appear to us more derogatory than constructive and provide little specific criticism.

      References

      Baum, B., Nishitani, H., Yanow, S., and Nurse, P. (1998). Cdc18 transcription and proteolysis couple S phase to passage through mitosis. The EMBO Journal 17, 5689–5698.

      Bechhoefer, J., and Rhind, N. (2012). Replication timing and its emergence from stochastic processes. Trends in Genetics 28, 374–381.

      Duzdevich, D., Warner, M.D., Ticau, S., Ivica, N.A., Bell, S.P., and Greene, E.C. (2015). The dynamics of eukaryotic replication initiation: origin specificity, licensing, and firing at the single-molecule level. Mol. Cell 58, 483–494.

      Heichinger, C., Penkett, C.J., Bähler, J., and Nurse, P. (2006). Genome-wide characterization of fission yeast DNA replication origins. The EMBO Journal 25, 5171–5179.

      Kiang, L., Heichinger, C., Watt, S., Bähler, J., and Nurse, P. (2010). Specific replication origins promote DNA amplification in fission yeast. Journal of Cell Science 123, 3047–3051.

      Lygeros, J., Koutroumpas, K., Dimopoulos, S., Legouras, I., Kouretas, P., Heichinger, C., Nurse, P., and Lygerou, Z. (2008). Stochastic hybrid modeling of DNA replication across a complete genome. Proceedings of the National Academy of Sciences 105, 12295–12300.

      Menzel, J., Tatman, P., and Black, J.C. (2020). Isolation and analysis of rereplicated DNA by Rerep-Seq. Nucleic Acids Res 48, e58–e58.

      Mickle, K.L., Oliva, A., Huberman, J.A., and Leatherwood, J. (2007). Checkpoint effects and telomere amplification during DNA re-replication in fission yeast. BMC Molecular Biology 8, 119.

      Nishitani, H., Lygerou, Z., Nishimoto, T., and Nurse, P. (2000). The Cdt1 protein is required to license DNA for replication in fission yeast. Nature 404, 625–628.

      Patel, P.K., Arcangioli, B., Baker, S.P., Bensimon, A., and Rhind, N. (2006). DNA Replication Origins Fire Stochastically in Fission Yeast. Mol. Biol. Cell 17, 308–316.

      Raghuraman, M.K., Winzeler, E.A., Collingwood, D., Hunt, S., Wodicka, L., Conway, A., Lockhart, D.J., Davis, R.W., Brewer, B.J., and Fangman, W.L. (2001). Replication Dynamics of the Yeast Genome. Science 294, 115–121.

      Rhind, N., Yang, S.C.-H., and Bechhoefer, J. (2010). Reconciling stochastic origin firing with defined replication timing. Chromosome Res 18, 35–43.

      Sekedat, M.D., Fenyö, D., Rogers, R.S., Tackett, A.J., Aitchison, J.D., and Chait, B.T. (2010). GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome. Molecular Systems Biology 6, 353.

      Yabuki, N., Terashima, H., and Kitada, K. (2002). Mapping of early firing origins on a replication profile of budding yeast. Genes to Cells 7, 781–789.

    2. Reviewer #3:

      This manuscript by Rapsomaniki et al uses mathematical modeling to study the properties of DNA re-replication. They develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

      The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

      Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

      Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

      The fork speed used in the model rests on limited data, and the re-replication fork speed in particular is assumed without empirical support.

    3. Reviewer #2:

      Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

      1) It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies for 2C and 16C. However, Figure 4 (experiment) suggests that copy numbers should only be slightly more in re-replicating conditions, compared to normal replicating conditions. Additionally, in Figure 2, the simulated data seems to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

      2) This work currently is agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S. pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

      3) The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

      One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up with the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

      This can be explained by the following relation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

      It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6B,C).
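      A toy simulation makes the reviewer's point concrete. It is a hypothetical sketch, not the authors' model: it assumes a Pólya-urn rule in which each firing duplicates one randomly chosen existing copy of origin 1, origin 2 or a lumped "rest of genome", so each element is amplified in proportion to its current copy number, and every simulated cell stops after the same number of firings, standing in for the fixed 16C cutoff. The starting weights and run counts are arbitrary choices for illustration.

```python
import random
import statistics


def polya_race(n_firings, start=(1, 1, 2), rng=None):
    """Toy self-reinforcing amplification (Polya urn).

    counts[i] is the copy number of element i: origin 1, origin 2, and a
    lumped "rest of genome". Each firing picks one existing copy uniformly at
    random (so an element is chosen in proportion to its copy number) and
    duplicates it. There is no interaction term between the two origins.
    """
    rng = rng or random.Random()
    counts = list(start)
    for _ in range(n_firings):
        i = rng.choices(range(len(counts)), weights=counts)[0]
        counts[i] += 1
    return tuple(counts)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Every simulated "cell" stops after the same number of firings, mimicking a
# fixed ploidy cutoff: the total copy number is identical in every run.
rng = random.Random(0)
runs = [polya_race(200, rng=rng) for _ in range(2000)]
origin1 = [r[0] for r in runs]
origin2 = [r[1] for r in runs]
print(f"corr(origin 1, origin 2) across cells: {pearson(origin1, origin2):.2f}")
```

      For these starting weights, standard Pólya-urn (Dirichlet-multinomial) theory predicts a correlation of about -1/3 between the two origins, even though nothing in the rule lets one origin inhibit the other. Enlarging the "rest of genome" weight r weakens it (the predicted value is -1/(1 + r)), which is the sense in which the anti-correlation comes from sharing a fixed total.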

    4. Reviewer #1:

      The authors develop and analyse a mathematical model of DNA rereplication in situations where re-firing of origins during replication is not suppressed. Using the experimentally measured position and relative strength of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the developed model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. The fact that increasing copy number of an origin will facilitate its preferential amplification essentially constitutes a self-reinforcing feedback loop and might be the mechanism that leads to overamplification of some genomic regions. In addition, different regions compete for a limiting factor, and thereby repress each other's over-amplification. While the model generates some interesting hypotheses, it is unclear in the current version of the manuscript to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, they do not discuss the model assumptions and their validity, and they do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

      1) It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

      2) One of the main messages of the paper is that the amplification profiles are highly variable across single cells, because that was found in the described simulations. This behavior does however likely depend on specific choices that were made in the simulations, e.g. that the probabilities of the origin state transitions are exponentially distributed. These assumptions should at least be discussed, or better experimentally validated.

      3) The authors aim at testing their prediction that rereplication is highly variable across cells. To this end they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, the DAPI quantification suggests that only a subset of cells actually undergo rereplication under the experimental conditions used (Fig. 4C). Therefore the analysis should at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of two loci is anti-correlated as predicted by the model.

      4) What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

      5) From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (lines 275-277).

      6) I find the description of cis- and trans-effects rather confusing. The authors should rather explain what happens in the model: neighboring strong origins can amplify a weak origin, and origins compete for factors. In lines 475-476, for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

      7) Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

      8) One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

      9) It would be helpful if Fig. 2 also showed the origins and their respective efficiencies, to make clear to what extent the signal ratio reflects these efficiencies.

      10) The methods section should provide more detail.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Tim Formosa (University of Utah School of Medicine) served as the Reviewing Editor.

      Summary:

      A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of assumptions made in developing the model on the outcomes and the moderate correlations between simulations and experimental data. Further explanation of the questions being investigated, the validity and nature of assumptions that were used to develop the simulations, and details explaining how these assumptions were built into the modeling were considered important. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests. Reviewers also noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods, and the impact of the insights gained.

  2. Aug 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Response to the References

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Yan et al. describe a method to perform imaging-based pooled CRISPR screens based on photo-activation, followed by selection and sorting of the cells with the desired phenotypes.

      They establish a system in mammalian RPE-1 cells where they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size by a targeted pooled CRISPR screen.

      **Major points:**

      1. This year, Hasle et al. described a very similar approach that they named Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich those by FACS. The Hasle et al. 2020 MSB paper is only mentioned together with the other methods in the introduction, in one sentence (ref #19 in this manuscript):

      "Recently, several in situ sequencing[15,16] and cell isolation methods[17-20] were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hasle et al. paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR-based application, but obviously the applications of the method are not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      We agree with the reviewer that we need to describe the Hasle et al. paper (now ref #20 in the revised manuscript) in more detail and to expand our description of other applications of the method. To this end, we have made the following changes:

      We have modified the relevant paragraph in the Introduction.

      p.3 the second paragraph

      Recently, an imaging-based method named “visual cell sorting” was described that uses the photo-convertible fluorescent protein Dendra2 to enrich phenotypes optically, enabling pooled genetic screens and transcription profiling (Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020). Here, we developed an analogous approach to execute an imaging-based pooled CRISPR screen using optical enrichment by automated photo-activation of the photo-activatable fluorescent protein, PA-mCherry.

      We have also added the following paragraph in the Discussion.

      p.14 line 1

      In our study, optical enrichment was used for pooled CRISPR screens of phenotypes identifiable by microscopy. However, optical enrichment can be used for other purposes, as demonstrated previously (Hasle et al., 2020). In that study, Hasle et al. termed the process of separating cells by FACS after optical enrichment “visual cell sorting” and used it to evaluate hundreds of nuclear localization sequence variants in a pooled format and, together with single cell sequencing, to identify transcriptional regulatory pathways associated with paclitaxel resistance (Hasle et al., 2020), demonstrating the broad applicability and power of this approach beyond CRISPR screening.

      2. While I understand that the authors mean conversion from the dark state to a fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader, since photo-conversion typically refers to a change in color. I would suggest sticking to the term photo-activation here.

      We thank the reviewer for pointing this out. To avoid confusion, we now restrict the term photo-conversion to the conversion of fluorescence from one color into another, e.g. when discussing the published visual cell sorting paper, in which Dendra2 is used as a photo-convertible fluorescent protein. We use photo-activation to refer to the activation of PA-mCherry in our work.

      3. For validation of the hits from the nuclear size screen: did the authors include any controls to make sure that the intended targets were down-regulated? This might be obvious for some of the targets (e.g. CPC proteins, which are known to induce division errors, display the nuclear fragmentation that the authors also observe), but especially for targets that are less well characterized or not known to induce any nuclear size change, it will be important to demonstrate the specificity of the knockdown.

      For validating hits from the nuclear size screen, we have verified successful transduction of the corresponding sgRNA constructs by FACS analysis, but have not confirmed the knockdown. Before final journal publication, we propose to perform RT-qPCR on our 15 gene hits before and after knockdown to measure the knockdown efficiency for each gene.

      In addition, it is not clear from the figure legends and the Materials and Methods whether these phenotypes were verified with the 3-4 gRNAs used in the validation. Are the histograms representative of a single experiment with one gRNA, or of a combination of gRNAs in different experiments? How the data presented in Fig. 4 were replicated is unclear.

      We apologize for the confusion. These phenotypes were verified with pools of 3-4 sgRNAs and the histograms are representative of a single replicate infected with a mixed 3-4 sgRNA pool. We have modified the legend to Figure 5 (original Fig. 4) and the method section to explain this point.

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.

      The description of the experiment and information about the selected sgRNAs has been added in the Method section as follows:

      p.23

      Verification of hits from nuclear size screen

      For each hit in the nuclear size screen, the two sgRNAs with the highest phenotypic score in the screen and the two sgRNAs with the highest score predicted by the CRISPRi-v2 algorithm[24] were selected and pooled to generate a mixed pool of 3-4 sgRNAs (detailed information in Supplementary file 8). Cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were transduced with the pooled sgRNAs targeting each gene and selected with puromycin for 2 days in preparation for imaging. Cells were then seeded into 96-well glass bottom imaging dishes. Images were collected the next day, and nuclear size was measured using the Auto-PhotoConverter µManager plugin. To restrict the analysis to successfully transduced cells, BFP was co-expressed from the sgRNA construct and only cells with BFP intensity above a threshold value were included in nuclear size measurements. This BFP threshold was established by comparing the average BFP intensity of cells with and without sgRNA transduction (Fig. S3a).
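      The BFP gating step can be sketched as below. This is a hypothetical illustration: the per-cell fields, the midpoint threshold rule and all numbers are assumptions, since the text says only that the threshold was set by comparing the average BFP intensity of cells with and without sgRNA transduction.

```python
from dataclasses import dataclass
from statistics import fmean, median


@dataclass
class Cell:
    bfp: float           # BFP intensity, reporting sgRNA transduction
    nuclear_area: float  # nuclear size from image segmentation


def bfp_threshold(untransduced_bfp, transduced_bfp):
    """Hypothetical rule: midpoint between the two population mean intensities."""
    return (fmean(untransduced_bfp) + fmean(transduced_bfp)) / 2


def median_nuclear_size(cells, threshold):
    """Median nuclear size over cells whose BFP exceeds the threshold."""
    kept = [c.nuclear_area for c in cells if c.bfp > threshold]
    if not kept:
        raise ValueError("no cells pass the BFP threshold")
    return median(kept)


# Made-up example values, for illustration only.
thr = bfp_threshold([10, 12, 11], [90, 110, 100])
cells = [Cell(5, 300), Cell(95, 420), Cell(120, 400), Cell(80, 380)]
print(thr, median_nuclear_size(cells, thr))
```

      Using the median rather than the mean keeps the per-gene summary less sensitive to a few mis-segmented or fragmented nuclei.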

      We agree with this important point and have changed the figure legend of Fig. 5c (original Fig. 4c) to just describe the plot:

      c, The ratios between median level of nuclear size measured from microscopy and H2B-mGFP fluorescence or FSC signal measured from FACS after knockdown were plotted separately. TACC3, confirmed to be a control gene, was used for comparison (grey bar).

      The typo has been corrected.

      Reviewer #1 (Significance (Required)):

      I think the idea of performing pooled screens coupled to microscopy is exciting, and this approach definitely has more potential than the CRaft-ID approach that the authors also discuss in their manuscript. In addition, the approach described in this manuscript is convincing: although the analysis part will require more work in the future (adapting the software to recognise different types of phenotypic readouts) to make it accessible to the scientific community, the authors present sufficient evidence that the system can be robust. They also present some clever ideas, such as calculating enrichments with different photo-activation times (2 s vs 100 ms) followed by separation of these populations by FACS.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy to mark cells of interest using the PA-mCherry photo-activatable fluorescent protein with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting), and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a µManager protocol and the discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight-hour experiment is impressive, and the demonstrated use of low cell number input for next generation sequencing appears promising. Overall, the manuscript is well written, the methods clear and the claims supported by the data presented.

      **General comments**

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (e.g. https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96 well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:

      Seeding density

      Time elapsed (in hours) between cell plating and optical enrichment

      The number of fields of view examined

      The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted

      The total time taken (in hours) to perform imaging and photoconversion

      The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments). The gating protocol is described for the genetic screen but not for the control experiments.

      We agree with the reviewer and apologize for the confusion caused by our description. We also thank the reviewer for suggesting established methods. However, MAUDE, an analysis method for sorting-based CRISPR screens with multiple expression bins, may not be suitable for our study because (1) the distribution of mCherry fluorescence intensity reflects photo-activation efficiency rather than sgRNA effect, and (2) only one sorting bin is collected for each experimental condition. Our analysis is adapted from an existing method from the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing).
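      For readers unfamiliar with count-based scoring, a generic per-sgRNA phenotype score for a single-bin sorting screen is sketched below. It is a simplified assumption, not a reproduction of ScreenProcessing: counts are normalized to sequencing depth with a pseudocount, the raw score is the log2 ratio of a guide's frequency in the sorted sample to its frequency in a reference sample, and scores are centered on the median of non-targeting controls. The function, parameter and guide names are hypothetical.

```python
import math
from statistics import median


def phenotype_scores(sorted_counts, reference_counts, controls, pseudocount=1.0):
    """Control-centered log2 enrichment of each sgRNA, sorted vs. reference.

    sorted_counts / reference_counts: dict mapping sgRNA name -> read count.
    controls: names of non-targeting sgRNAs used to center the scores at 0.
    """
    guides = sorted_counts.keys() & reference_counts.keys()
    total_s = sum(sorted_counts[g] for g in guides)
    total_r = sum(reference_counts[g] for g in guides)
    # log2 of (frequency in sorted sample) / (frequency in reference sample)
    raw = {
        g: math.log2((sorted_counts[g] + pseudocount) / total_s)
        - math.log2((reference_counts[g] + pseudocount) / total_r)
        for g in guides
    }
    offset = median(raw[g] for g in controls if g in raw)
    return {g: s - offset for g, s in raw.items()}


# Made-up counts: sgA is over-represented among photo-activated, sorted cells.
sorted_sample = {"sgA": 400, "ntc1": 100, "ntc2": 100}
reference = {"sgA": 100, "ntc1": 100, "ntc2": 100}
scores = phenotype_scores(sorted_sample, reference, controls=["ntc1", "ntc2"])
print(scores)
```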

      We also agree that the other points needed clarification and have rewritten the following part of the Methods section:

      p. 20

      mIFP proof-of-principle screen, Nuclear size screen, FSC screen and H2B-mGFP screen

      For the mIFP proof-of-principle screen, mIFP positive cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP mIFP-NLS) and mIFP negative cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were stably transduced with the “mIFP sgRNA library” (CRISPRa library with 860 elements, see Supplementary file 5) and the “control sgRNA library” (CRISPRa library with 6100 elements, see Supplementary file 6), respectively. For the nuclear size screen, FSC screen and H2B-mGFP screen, cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were stably transduced with the “nuclear size library” (CRISPRi library with 6190 elements, see Supplementary file 7). To ensure that cells received no more than one sgRNA each, BFP was expressed from the same sgRNA construct and cells were analyzed by FACS the day after transduction; the experiment proceeded only when 10-15% of the cells were BFP positive. These cells were further enriched by puromycin selection (the sgRNA construct also expresses a puromycin resistance gene) for 3 days in preparation for imaging. For the FSC and H2B-mGFP screens, cells were then sorted by FACS: cells before FACS (unsorted sample) and the top 10% of cells by either FSC signal (high FSC sample) or GFP fluorescence (high GFP sample) were collected separately and prepared for high throughput sequencing. For the mIFP proof-of-principle screen and the nuclear size screen, cells were seeded into 96-well glass bottom imaging dishes (Matriplate, Brooks) and imaged starting the next morning (around 15 hr after plating). A series of densities from 5,000 to 25,000 cells/well, in steps of 5,000 cells/well, was seeded, and on the imaging day the dish with cells at around 70% confluency was selected for screening. One imaging plate was imaged per replicate for the mIFP proof-of-principle screen, and 4 imaging plates per replicate for the nuclear size screen. When multiple imaging runs were needed, 2 consecutive runs could be performed on the same day (a day run and a night run). For each imaging well, 64 (8x8, day run) or 81 (9x9, night run) fields of view were selected, and each field of view was subjected to its own round of imaging, directly followed by photo-activation. Each field of view contained around 200-250 cells, and 60-80% of the surface area of each well was covered. Either mIFP positive cells or cells passing the nuclear size filter were identified and photo-activated automatically using the Auto-PhotoConverter µManager plugin. Imaging and photo-activation of a single 96-well imaging dish with around 1.5 million cells took around 8 hr; the night run generally took longer because it included more fields of view. Cells were then harvested by trypsinization and pooled into a single tube for isolation by FACS. Sorting gates were pre-defined using samples with different photo-activation times (e.g. 0 s, 200 ms, 2 s), and detailed gating strategies are described in Supplementary file 1. Sorted samples were used to prepare sequencing samples.
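      The 10-15% BFP-positive criterion corresponds to a low multiplicity of infection (MOI). Assuming that the number of sgRNA integrations per cell follows a Poisson distribution (a standard approximation, not something stated in the text, and ignoring any effect of puromycin selection), the implied MOI and the share of transduced cells carrying more than one sgRNA can be worked out as follows.

```python
import math


def moi_from_fraction_infected(f):
    """Poisson MOI m such that P(at least one integration) = 1 - exp(-m) = f."""
    return -math.log(1.0 - f)


def multi_among_infected(f):
    """Fraction of infected (BFP+) cells that carry two or more integrations."""
    m = moi_from_fraction_infected(f)
    p_single = m * math.exp(-m)  # Poisson P(exactly one integration)
    return (f - p_single) / f


for f in (0.10, 0.15):
    print(f"{f:.0%} BFP+: MOI = {moi_from_fraction_infected(f):.3f}, "
          f"multiply infected among BFP+ = {multi_among_infected(f):.1%}")
```

      Under this approximation, roughly 5% (at 10% BFP+) to 8% (at 15% BFP+) of transduced cells would carry more than one sgRNA; lowering the infected fraction reduces this share further.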

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      To address these, we have added the following sentences:

      p. 4 line 16

      A photo-activatable fluorescent protein was chosen over a photo-convertible fluorescent protein to increase the number of channels available for imaging. PA-mCherry was chosen to leave the better performing green channel open for labeling of other cellular features. Moreover, non-activated PA-mCherry has low background fluorescence in the mCherry channel (Fig. S1b), and it can be activated to different intensities when photo-activated for various amounts of time.

      p. 14 line 10

      Phenotypes of interest should be identifiable under the microscope and generally require fluorescent labeling. Commonly used fluorescence microscopes have four imaging channels with little spectral overlap: blue, green, red and far red. In our study, the red channel was occupied by cell labeling with PA-mCherry, and the blue channel was used to estimate sgRNA transduction efficiency. Since sgRNA transduction efficiency can be measured by other approaches, the blue channel could be used together with the remaining two channels to label cellular structures. Deep learning applied to bright field images can reconstruct the localization of fluorescent labels (Ounkomol, C.; Seshamani, S.; Maleckar, M. M.; Collman, F.; Johnson, G. R. 2018), making it possible to use bright field imaging to further expand the phenotypes that can be studied with our technique.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      We thank the reviewer for this remark and revised all figures accordingly. Points and fonts were enlarged, and schematics were simplified or removed.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      We agree with the reviewer and the following adverbs have been removed: “highly” in abstract and page 15; “easily” on pages 4 and 11; “completely” on page 11 and three “only” on page 12.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      We apologize for this omission. The sequencing data have been deposited in GEO (GSE156623) and will be made available before final publication. The following section has been added to address this.

      p. 24

      DATA AND SOFTWARE AVAILABILITY

      The raw and processed data for the high throughput sequencing results have been deposited in the NCBI GEO database under accession number GSE156623. The plugin Auto-PhotoConverter, developed for the open source microscope control software μManager (Edelstein, A. D.; Tsuchida, M. A.; Amodaj, N.; Pinkard, H.; Vale, R. D.; Stuurman, N. 2014), has been deposited on GitHub (https://github.com/nicost/mnfinder).

      **Specific comments**

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.

      We apologize for our poor terminology. Our original definition of “specificity” is the same as “precision” suggested by the reviewer. To avoid future confusion, we have changed all relevant occurrences of “specificity” into “precision”. The following sentence was modified to clarify the definition:

      p. 5 line 15

      To evaluate the precision (the fraction of called positives that are true positives) of this assay, all cells were collected and analyzed by FACS after image analysis and photo-activation (Fig. 2d and 2e). We calculated precision as the fraction of photo-activated cells (mCherry positive cells) that are true positives (mIFP-mCherry double positive cells) (Fig. 2f).
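      The precision calculation described above reduces to a ratio of FACS gate counts. A minimal sketch with made-up event counts follows; recall is included only to show how its denominator differs, since, as the response explains, not every mIFP positive cell in a well is seen by the microscope. Function names are illustrative.

```python
def precision(double_positive, mcherry_positive):
    """Fraction of photo-activated (mCherry+) cells that are also mIFP+."""
    return double_positive / mcherry_positive


def recall(double_positive, mifp_positive_total):
    """Fraction of all mIFP+ cells that ended up photo-activated."""
    return double_positive / mifp_positive_total


# Hypothetical FACS event counts.
mcherry_pos = 1000  # events in the mCherry+ gate
double_pos = 850    # of those, also in the mIFP+ gate
mifp_pos = 4000     # all mIFP+ events, including cells never imaged
print(precision(double_pos, mcherry_pos), recall(double_pos, mifp_pos))
```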

      Measuring recall is complicated because the microscope is unable to visit all locations in the imaging plate, hence recall will depend on the fraction of cells actually “seen” by the microscope. For the screening strategy employed in the nuclear size screen, recall is not as important as precision, since lower recall rates are compensated for by screening larger cell numbers. We therefore did not attempt to measure recall directly.

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      We agree with the reviewer and have changed the following description:

      p. 5 line 20

      The precision varied with the initial percentage of mIFP positive cells, ranging from 80% to ~100% for initial percentages of mIFP positive cells between 2.3% and 43.7% (Fig. 2f). Precision is expected to fall below 80% when the initial percentage of mIFP positive cells is below 2.3%. However, these results indicate that optical enrichment can be used to identify hits with high precision even at relatively low hit rates.
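      Why precision falls at low hit rates can be seen from a simple model, which is an assumption and not part of the authors' analysis: if each true hit is photo-activated with probability TPR and each non-hit is activated by mistake with probability FPR, independent of the hit fraction π, then precision = TPR·π / (TPR·π + FPR·(1 − π)). The rates below are made up purely to illustrate the trend.

```python
def expected_precision(pi, tpr, fpr):
    """Expected precision when a fraction pi of imaged cells are true hits."""
    tp = tpr * pi          # expected true positives per imaged cell
    fp = fpr * (1.0 - pi)  # expected false positives per imaged cell
    return tp / (tp + fp)


# Hypothetical per-cell rates, chosen only to show the shape of the curve.
TPR, FPR = 0.9, 0.005
for pi in (0.437, 0.023, 0.005, 0.001):
    print(f"hit fraction {pi:.1%}: precision {expected_precision(pi, TPR, FPR):.2f}")
```

      Once the hit fraction approaches the false-activation rate, false positives from the much larger non-hit population dominate, which is why precision is expected to drop for rarer hits.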

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      We apologize for our confusing description. To avoid the confusion, we rewrote the paragraph describing the experiment and added a schematic (Fig. 3a) to better describe this experiment. We also simplified the result by just presenting the lower right panel of original Fig. 2b (current Fig. 3b) and moved the other data into supplementary figures (Fig. S2).

      p. 6 line 4

      mIFP negative cells and mIFP positive cells were separately infected with two different CRISPRa sgRNA libraries (6100 sgRNAs for mIFP negative cells; 860 sgRNAs for mIFP positive cells) at a low multiplicity of infection (MOI) so that most transduced cells carry a single sgRNA. Note that in these experiments, the sgRNAs function only as barcodes to be read out by sequencing and do not cause phenotypic changes, as the cells do not express the corresponding CRISPR reagents. The two populations were then mixed at a 9:1 ratio of mIFP negative to mIFP positive cells. We again used mIFP expression as our phenotype of interest (outlined in Fig. 3a). Two biological replicates were performed, and at least 200-fold coverage of each sgRNA library was maintained throughout the screen, including library infection, puromycin selection, imaging/photo-activation and FACS.
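      The coverage requirement translates into cell numbers with simple arithmetic, and a Poisson model of viral integration shows why a low MOI keeps most transduced cells at a single sgRNA. This sketch assumes Poisson-distributed integrations and uses a hypothetical MOI of 0.3, which the text does not state; the library sizes and 200-fold coverage are taken from the paragraph above.

```python
import math


def cells_for_coverage(library_size: int, fold_coverage: int) -> int:
    """Transduced cells needed so each sgRNA is represented fold_coverage times on average."""
    return library_size * fold_coverage


def fraction_transduced(moi: float) -> float:
    """Poisson model: fraction of cells receiving at least one integration."""
    return 1.0 - math.exp(-moi)


def fraction_single_among_transduced(moi: float) -> float:
    """Poisson model: fraction of transduced cells that received exactly one sgRNA."""
    return moi * math.exp(-moi) / fraction_transduced(moi)


def cells_to_infect(library_size: int, fold_coverage: int, moi: float) -> int:
    """Cells to expose to virus so that the transduced subset reaches the target coverage."""
    return math.ceil(cells_for_coverage(library_size, fold_coverage) / fraction_transduced(moi))


# Library sizes and coverage from the text; MOI = 0.3 is hypothetical.
for name, size in (("mIFP negative library", 6100), ("mIFP positive library", 860)):
    print(name, cells_for_coverage(size, 200), cells_to_infect(size, 200, moi=0.3))
print(f"single-sgRNA fraction among transduced cells: {fraction_single_among_transduced(0.3):.2f}")
```

      At MOI 0.3, roughly 86% of transduced cells carry exactly one sgRNA under this model, which is why a low MOI makes single-sgRNA cells likely but does not guarantee them.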

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      We have clarified our description. There are no clear BFP+ and BFP- populations but instead one continuous population due to the background expression of BFP from the dCas9 construct: dCas9-KRAB-BFP (which is now clearly indicated in the manuscript). On top of the dCas9-KRAB-BFP, another BFP is encoded on the sgRNA construct, which leads to a higher BFP expression level.

      There was no gating in the experiment: the grey dots in the figure represent wild-type cells without viral transduction, while the red dots (partially covered by the grey dots) are cells infected with the two negative control sgRNAs. We mistakenly stated in the legend of original Fig. S3 (current Fig. S3a) that these were FACS data; however, the data were acquired by imaging. We apologize for the confusion and thank the reviewer for detecting the issue. We completely rewrote the legend of Fig. S3a (original Fig. S3) to clarify.

      We now include the number of cells analyzed and the number of replicates for the other flow plots and histograms in the manuscript.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      To address the reviewer’s point, we have included the following sentence:

      p. 8 line 23

      We defined nuclear size as the 2D area in square microns measured by H2B-mGFP using an epifluorescence microscope, as determined by automated image analysis (Fig. 4a and Supplementary file 2).
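      To illustrate how a 2D area in square microns follows from a segmented image, the sketch below thresholds a small H2B-like intensity image, labels 4-connected objects by flood fill, and converts pixel counts to µm² using the pixel size. The simple threshold stands in for the authors' automated image analysis, which is not specified here; the function name, threshold and pixel size are hypothetical (pixel size depends on the camera and objective).

```python
from collections import deque


def nuclear_areas_um2(image, threshold, pixel_size_um, min_pixels=1):
    """Areas (µm²) of 4-connected objects with intensity >= threshold, in raster-scan order."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Flood-fill one object and count its pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            n_pixels = 0
            while queue:
                y, x = queue.popleft()
                n_pixels += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx] and image[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if n_pixels >= min_pixels:  # drop debris smaller than min_pixels
                areas.append(n_pixels * pixel_size_um ** 2)
    return areas


# Toy 4x5 image with three bright objects; pixel size 0.5 µm is hypothetical.
toy = [[0, 9, 9, 0, 0],
       [0, 9, 9, 0, 5],
       [0, 0, 0, 0, 5],
       [7, 0, 0, 0, 0]]
print(nuclear_areas_um2(toy, threshold=5, pixel_size_um=0.5))
```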

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann Whitney U test with a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. Better would be a randomization test, where a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. Then the actual phenotypic score is compared to the randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and Supplementary File 1.

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. It seems like appropriate false-discovery rate control would enable a more rigorous method for choosing a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen applies to the FSC and GFP screens.

      We clarified the description of the statistical analysis of the screen (see new/changed text below). Mann-Whitney p-values for the two replicates were calculated independently. The Mann-Whitney U test was not performed against a randomized set of phenotypic scores, but against the phenotypic scores of the 22 non-targeting control sgRNAs that were part of the library. Because there are only 22 control sgRNAs (adding more control sgRNAs would increase the size of the library and reduce the number of genes that can be screened within a given amount of time), the statistical significance of testing genes against these controls is not expected to be very high, and direct approaches such as correction for multiple hypothesis testing are not expected to yield hits. Instead, we calculated a score combining the severity (phenotypic score) and the trustworthiness (Mann-Whitney p-value) of the phenotype (a method previously developed in the Weissman lab at UCSF: https://github.com/mhorlbeck/ScreenProcessing). We thank the reviewer for suggesting false discovery rate control as a better method for choosing a cutoff. We modified our original analysis and now determine the threshold of our score based on a calculated empirical false discovery rate (eFDR). We used this approach to maximize the number of true hits and relied on a repeat of the screen and follow-up testing of hits to narrow down true hits. We added the following part to the Methods section and added an analysis example to the supplementary files (Supplementary file 9).

      p. 22

      Bioinformatic analysis of the screen

      Analysis was based on the ScreenProcessing pipeline developed in the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing) (Horlbeck, M. A.; Gilbert, L. A.; Villalta, J. E.; Adamson, B.; Pak, R. A.; Chen, Y.; Fields, A. P.; Park, C. Y.; Corn, J. E.; Kampmann, M.; Weissman, J. S. 2016). The phenotypic score (ε) of each sgRNA was quantified as previously defined (Kampmann, M.; Bassik, M. C.; Weissman, J. S. 2013) (Supplementary file 9). For the mIFP proof-of-principle screen, the phenotypic score of each group was the average score of the two sgRNAs assigned to the group, averaged between two replicates unless otherwise described. For the nuclear size screen, FSC screen and H2B-mGFP screen, genes were scored based on the average phenotypic scores of the sgRNAs targeting them; for the nuclear size screen, phenotypic scores were further averaged over 4 runs for each replicate. For these three screens, sgRNAs were first clustered by transcription start site (TSS) and scored by the Mann-Whitney U test against the 22 non-targeting control sgRNAs included in the library. Because only 22 control sgRNAs were included, the significance of hits was assessed by comparison with simulated negative controls, which were generated by random assignment of all sgRNAs in the library and scored in the same way as genes. A score η that combines the phenotypic score and its significance was calculated for each gene and each simulated negative control. The optimal cut-off for η was determined by calculating an empirical false discovery rate (eFDR) at multiple values of η, defined as the number of simulated negative controls with η above the cut-off (false positives) divided by the total number of genes and simulated negative controls with η above the cut-off (all positives). The cut-off η resulting in an eFDR of 0.1% was used to call hits for further analysis (Supplementary file 9). An example analysis is described in detail in Supplementary file 9, and raw counts and phenotypic scores for all four screens are listed in Supplementary files 10 and 11.
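      The scoring and cut-off steps described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ScreenProcessing code: the form of η used here (absolute phenotype z-score relative to the controls, multiplied by −log10 of the Mann-Whitney p-value) is an assumption, as the exact formula is given in Supplementary file 9; the U test uses a tie-corrected normal approximation; and all function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def _average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation with tie correction)."""
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    ranks = _average_ranks(values)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for v in values:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def eta_score(gene_eps, control_eps):
    """ASSUMED form of eta: |z-score vs controls| x -log10(Mann-Whitney p)."""
    z = (mean(gene_eps) - mean(control_eps)) / stdev(control_eps)
    p = max(mann_whitney_p(gene_eps, control_eps), 1e-300)  # guard against log10(0)
    return abs(z) * -math.log10(p)


def simulated_controls(all_sgrna_eps, n_controls, group_size, seed=0):
    """Pseudo-genes made by randomly assigning library sgRNA scores into groups."""
    rng = random.Random(seed)
    return [rng.sample(all_sgrna_eps, group_size) for _ in range(n_controls)]


def efdr(gene_etas, control_etas, cutoff):
    """Simulated controls above cutoff / (genes + simulated controls above cutoff)."""
    false_pos = sum(e > cutoff for e in control_etas)
    all_pos = false_pos + sum(e > cutoff for e in gene_etas)
    return false_pos / all_pos if all_pos else 0.0


def choose_cutoff(gene_etas, control_etas, target=0.001):
    """Lowest observed eta whose eFDR is at or below target (maximizes called hits)."""
    candidates = sorted(set(gene_etas) | set(control_etas))
    if not candidates:
        raise ValueError("no scores supplied")
    for cutoff in candidates:
        if efdr(gene_etas, control_etas, cutoff) <= target:
            return cutoff
    return candidates[-1]
```

      Genes with η strictly above the returned cut-off would be called hits. Scanning cut-offs from low to high and stopping at the first one that meets the 0.1% target favors sensitivity, in line with the stated aim of maximizing true hits and relying on repeat screens and follow-up validation to remove false positives.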

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      We agree with the reviewer and have changed the sentence into:

      p. 10 line 17

      These data suggest that a direct measurement utilizing a microscope can provide different information and reveal hits that are inaccessible using other screening approaches.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      We agree with the reviewer and have modified the sentence with a citation:

      p. 12 line 20

      This is significantly faster than in situ methods, which process millions of cells over a period of a few days (Feldman, D.; Singh, A.; Schmid-Burgk, J. L.; Carlson, R. J.; Mezger, A.; Garrity, A. J.; Zhang, F.; Blainey, P. C. 2019).

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      We have added more description in the discussion section:

      p. 13 the second paragraph

      Optical enrichment screening is also possible for phenotypic screens with relatively low hit rates (defined as the fraction of all genes screened that are true hits). The ability of our method to detect hits at low hit rates depends on multiple factors, including: 1) the penetrance of the phenotype; 2) the cellular fitness effect of the phenotype; 3) the accuracy of phenotype detection and photo-activation; 4) limitations imposed by FACS recovery and sequencing sample preparation from low cell numbers. The first three factors vary with the phenotype of interest. We optimized the genomic DNA preparation protocol (Methods) and are now able to process sequencing samples from a few thousand cells, enabling screens of low-hit-rate phenotypes. In our nuclear size screen, more than 1.5 million cells were analyzed during each run, with 2000-4000 cells recovered after FACS sorting. The hit rate of this screen was 2.76%, similar to optical CRISPR screens performed in an arrayed format (de Groot, R.; Luthi, J.; Lindsay, H.; Holtackers, R.; Pelkmans, L. 2018), demonstrating that our approach can be applied to phenotypes with low hit rates.

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      The relevant description has been moved to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      To address this point, we have added the following sentences to the legend of Fig. 5:

      The cell population is heterogeneous due to inefficient knockdown, incomplete puromycin selection, and incomplete penetrance of the phenotype. A BFP was expressed from the same sgRNA construct. Only cells with high BFP intensity, indicating successful sgRNA transduction, were included for data analysis as described in Methods.

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test, etc). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.

      We applied the Kolmogorov-Smirnov test and the corresponding sentence was changed into:

      p. 10 last line

      14 out of 15 hits were confirmed to be real hits (Kolmogorov-Smirnov test two tailed p-value
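      The two-sample Kolmogorov-Smirnov comparison of nuclear-size distributions (control versus knockdown) can be sketched as follows. The statistic is the largest gap between the two empirical cumulative distributions; the p-value uses a common asymptotic approximation to the Kolmogorov distribution. This is an illustration, not the authors' code; in practice `scipy.stats.ks_2samp` computes the same statistic with exact or asymptotic p-values. Function names and example data are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        # Step both ECDFs past every value <= x (this handles ties).
        while i < n1 and a[i] <= x:
            i += 1
        while j < n2 and b[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d


def ks_pvalue(d, n1, n2, terms=100):
    """Two-sided asymptotic p-value from the Kolmogorov distribution."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here and its value is ~1 anyway
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))


# Hypothetical nuclear areas (µm²): knockdown shifted upward relative to control.
control = [float(v) for v in range(100)]
knockdown = [v + 30.0 for v in control]
d = ks_statistic(control, knockdown)
print(d, ks_pvalue(d, len(control), len(knockdown)))
```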

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      The figures were simplified to just show the example images.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be an easier to spot color.

      We agree with the reviewer and changed the color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      We agree with the reviewer’s point; “most” has been changed to “some”.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      The bars represent the ratio of the median H2B-mGFP intensity measured by FACS (the axis is now labeled "GFP" rather than "FITC", the colloquial name for the channel used on the FACS machine) to the median nuclear size of the same population of cells measured by microscopy. We plan to perform additional experiments to measure DNA content using a DNA dye in the same cells by microscopy, so that we will be able to correlate these on a cell-by-cell basis. Data will be added before final publication.
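      A minimal sketch of how such a bar value could be computed from the two population-level measurements. Normalization to a non-targeting control population is an assumption (the reference population is not stated), and the function names and numbers are hypothetical. Because the two medians come from different instruments and different cells, the ratio does not describe the per-cell relationship between GFP level and nuclear size.

```python
from statistics import median


def gfp_per_area(gfp_intensities, nuclear_areas_um2):
    """Median H2B-mGFP intensity (FACS) divided by median nuclear area (microscopy)."""
    return median(gfp_intensities) / median(nuclear_areas_um2)


def normalized_gfp_per_area(kd_gfp, kd_area, ctrl_gfp, ctrl_area):
    """ASSUMED normalization: knockdown ratio divided by the control ratio."""
    return gfp_per_area(kd_gfp, kd_area) / gfp_per_area(ctrl_gfp, ctrl_area)


# Hypothetical values: GFP and nuclear area both doubled, so the normalized ratio stays at 1.
print(normalized_gfp_per_area([300, 400, 500], [150, 200, 250], [100, 200, 300], [50, 100, 150]))
```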

      Reviewer #2 (Significance (Required)):

      I don't generally comment on significance in reviews. Since ReviewCommons is specifically asking, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study reports a rapid and high-throughput CRISPR-based phenotypic screen approach consisting of selecting cells with phenotypes of interest, labeling them by photo-conversion and isolating them by FACS. The idea of the method is interesting in principle (and has been around). The key advantage is that it is relatively simple and accessible to many groups, as it does not require robotics. However, the manuscript is so badly written and hard to follow that it is difficult to judge the technology, to really understand how the experiments were done and whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practices (GSP) have been followed, as the description of the experiments is sometimes lacking entirely. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens, in all likelihood can miss hits that affect growth, cannot capture as many phenotypic classes as one would like from high-content screens, and its computational and experimental workflow is more complicated. It is puzzling that the authors don't even compare the results with arrayed screens, which are of course the current gold standard.

      We do not in any way claim that the presented method replaces arrayed screens. However, most current sgRNA libraries are pooled libraries, and the few available arrayed sgRNA libraries are expensive and difficult to maintain, hence our methods to screen pooled sgRNA libraries are timely and useful. Comparisons with arrayed screens are unwarranted as no claims are made with respect to arrayed screens.

      We have clarified the manuscript in many places, and hope it is now readable and better understandable by more readers with diverse backgrounds.

      **Specific points:**

      The specificity test (Fig 1) does not make sense as described. If the authors spike in a certain percentage of cells that can be photoconverted, when analysing the outcome, there will be three classes: mIFP positive, mIFP/mCherry positive and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also, is the formula used questionable, or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. Much more challenging is to apply it on a multi-parametric phenotype. Again, this is now the gold standard.

      We used the term specificity inadvertently and should have used precision, as also pointed out by Referee 2. This has been corrected in the current manuscript. We picked the mIFP phenotype as this was a proof of principle screen to clarify the performance of our screening approach and needed a phenotype that can be measured both by microscopy and FACS. We demonstrate that multi-parametric read-outs are possible, but do not think that the first demonstration of new technology needs such an application.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8hrs. This means for the genome it would require ~60h, which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positives, which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      We improved the description of this experiment. To clarify, we used mIFP in a proof-of-concept screen to validate whether sgRNAs infecting mIFP positive cells can be distinguished from those infecting mIFP negative cells. No phenotypic signature other than the mIFP signal is used (as described in the text). As is customary in pooled screens, a primary comparison was made between the positive (optically selected) cells and the complete population. To improve the clarity of this section, we now also describe the concept of pooled sgRNA screens in more detail; unfamiliarity with this concept may have made the section harder to follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      There are no phenotypes induced in the mIFP population (as now explicitly explained in the text). The mIFP population is isolated using optical enrichment, and we test our ability to discriminate the sgRNAs present in the enriched population. It is unsurprising that comparing to the negatively selected population (which is not possible in most other pooled screens) is significantly better than comparing against the total population (as customary in pooled screens).

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      There are no arrayed screens performed in our study.

      Reviewer #3 (Significance (Required)):

      Overall, there is insufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This study reports a rapid and high-throughput CRISPR-based phenotypic screen approach consisting of selecting cells with phenotypes of interest, labeling them by photo-conversion and isolating them by FACS. The idea of the method is interesting in principle (and has been around). The key advantage is that it is relatively simple and accessible to many groups, as it does not require robotics. However, the manuscript is so badly written and hard to follow that it is difficult to judge the technology, to really understand how the experiments were done and whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practices (GSP) have been followed, as the description of the experiments is sometimes lacking entirely. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens, in all likelihood can miss hits that affect growth, cannot capture as many phenotypic classes as one would like from high-content screens, and its computational and experimental workflow is more complicated. It is puzzling that the authors don't even compare the results with arrayed screens, which are of course the current gold standard.

      Specific points:

      The specificity test (Fig 1) does not make sense as described. If the authors spike in a certain percentage of cells that can be photoconverted, when analysing the outcome, there will be three classes: mIFP positive, mIFP/mCherry positive and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also, is the formula used questionable, or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. Much more challenging is to apply it on a multi-parametric phenotype. Again, this is now the gold standard.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8hrs. This means for the genome it would require ~60h, which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positives, which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      Significance

      Overall, there is insufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy to mark cells of interest using the PA-mCherry photo-activatable fluorescent protein with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting), and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a μManager protocol and discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight-hour experiment is impressive, and the demonstrated use of low cell number input for next generation sequencing appears promising. Overall, the manuscript is well written, the methods clear and the claims supported by the data presented.

      General comments

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (eg https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96-well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:

      • Seeding density
      • Time elapsed (in hours) between cell plating and optical enrichment
      • The number of fields of view examined
      • The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted
      • The total time taken (in hours) to perform imaging and photoconversion
      • The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments). The gating protocol is described for the genetic screen but not for the control experiments.

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      Specific comments

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann-Whitney U test against a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. A better approach would be a randomization test, in which a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. The actual phenotypic score is then compared to this randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and in Supplementary File 1.
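
      The following is a minimal sketch of the randomization test suggested above. It assumes that sgRNA-level phenotypic scores are available as plain lists and that the gene-level score is the mean of the gene's sgRNA scores. Both are assumptions, since the manuscript's exact scoring is what needs clarifying. The null distribution is built by repeatedly drawing the same number of negative-control sgRNA scores.

```python
import random
import statistics


def randomization_pvalue(gene_scores, control_scores, n_perm=10_000, seed=0):
    """Two-sided randomization p-value for one gene's mean phenotypic score.

    gene_scores: sgRNA-level scores for the gene (replicates pooled).
    control_scores: sgRNA-level scores of negative-control sgRNAs.
    """
    rng = random.Random(seed)
    k = len(gene_scores)
    center = statistics.fmean(control_scores)
    observed = abs(statistics.fmean(gene_scores) - center)
    as_extreme = 0
    for _ in range(n_perm):
        # null: mean of k control scores drawn without replacement
        null_mean = statistics.fmean(rng.sample(control_scores, k))
        if abs(null_mean - center) >= observed:
            as_extreme += 1
    # add-one correction: p can never be exactly zero with finite permutations
    return (as_extreme + 1) / (n_perm + 1)


# Hypothetical data: 500 control sgRNA scores and one gene with 4 shifted sgRNAs.
ctrl_rng = random.Random(1)
controls = [ctrl_rng.gauss(0.0, 1.0) for _ in range(500)]
print(randomization_pvalue([2.5, 2.8, 3.1, 2.2], controls))
```

      This sketch draws the null only from negative-control sgRNAs. Drawing from all library sgRNAs is a possible alternative. The resulting per-gene p-values would still need correction for multiple testing.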

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.
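
      For reference, here is a sketch of the Benjamini-Hochberg procedure, one standard way to adjust per-gene p-values for multiple testing. The input values in the example are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Genes whose adjusted value falls below the chosen false-discovery rate (for example 0.05) would then be reported as hits.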

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. Appropriate false-discovery rate control would provide a more rigorous way to choose a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen apply to the FSC and GFP screens.
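
      One way to replace the fixed top-0.1-percentile rule is to choose the cutoff from an empirical false-discovery rate estimated with the control groups themselves. Here is a minimal sketch. It assumes that higher scores mean stronger phenotypes and that control-group scores approximate the null distribution of gene-level scores; both are assumptions. The example data are invented.

```python
def empirical_fdr_cutoff(gene_scores, control_scores, alpha=0.05):
    """Smallest observed gene score whose estimated FDR is <= alpha.

    Estimated FDR at cutoff t = expected false calls / observed calls, where
    expected false calls = (fraction of control groups scoring >= t) * number of genes.
    Returns None if no cutoff reaches the target.
    """
    n_genes = len(gene_scores)
    n_ctrl = len(control_scores)
    for t in sorted(set(gene_scores)):
        called = sum(s >= t for s in gene_scores)
        null_fraction = sum(s >= t for s in control_scores) / n_ctrl
        if null_fraction * n_genes / called <= alpha:
            return t
    return None


# Hypothetical scores: 100 control groups, 5 null-like genes and 3 strong hits.
control_groups = [i / 100 for i in range(100)]
genes = [0.1, 0.2, 0.3, 0.4, 0.5, 2.0, 2.5, 3.0]
print(empirical_fdr_cutoff(genes, control_groups))
```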

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.
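
      Here is a self-contained sketch of the two-sample Kolmogorov-Smirnov comparison suggested above, applied to per-cell nuclear sizes from a control population and a knockdown population. The function name and example data are hypothetical. The p-value uses the large-sample asymptotic formula, so it is only approximate for small samples. In practice, a library implementation such as SciPy's `ks_2samp` would normally be used.

```python
import bisect
import math


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    max_gap = 0
    for x in a + b:
        # counts <= x in each sample; compare the two ECDFs exactly with integers
        gap = abs(bisect.bisect_right(a, x) * m - bisect.bisect_right(b, x) * n)
        max_gap = max(max_gap, gap)
    d = max_gap / (n * m)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0  # Kolmogorov tail is ~1 here and the series converges slowly
    # Kolmogorov distribution tail: 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical nuclear areas (arbitrary units): knockdown shifted upward.
control = list(range(100))
knockdown = [x + 50 for x in range(100)]
print(ks_two_sample(control, knockdown))
```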

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be shown in an easier-to-spot color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      Significance

      I don't generally comment on significance in reviews. Since Review Commons specifically asks, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript Yan et al describe a method to perform imaging based pooled CRISPR screens based on photoactivation followed by selection and sorting of the cells with the desired phenotypes. They establish a system in mammalian RPE-1 cells where they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size by a targeted pooled CRISPR screen.

      Major points:

      1. This year, Hasle et al. described a very similar approach that they call Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich those by FACS. The Hasle et al. 2020 MSB paper is only mentioned together with the other methods in the introduction in one sentence (ref #19 in this manuscript):

      " Recently, several in situ sequencing15,16 and cell isolation methods17-20 were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hasle et al. paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR-based application, but obviously the applications of the method are not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      2. While I understand that the authors mean conversion from the dark state to the fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader, since photo-conversion typically refers to a change in color. I would suggest sticking to the term photo-activation here.
      3. For validation of the hits from the nuclear size screen: did the authors include any controls to make sure that the intended targets were down-regulated? This might be obvious for some of the targets (e.g. knockdown of CPC proteins is known to cause division errors and produces the nuclear fragmentation that the authors also observe). However, it will be important to demonstrate the specificity of the knockdown, especially for targets that are less well characterized or not known to change nuclear size. In addition, it is not clear from the figure legends or the Materials and Methods whether these phenotypes were verified with the 3-4 gRNAs used in the validation. Are the histograms representative of a single experiment with one gRNA, or of a combination of gRNAs across different experiments? The replication scheme for the data presented in Fig. 4 is unclear.

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.
      2. The legend of Figure 4c is not describing what the plot is showing. Instead it tells the readers the authors' interpretation of the data.
      3. There is a typo in Figure S1b.

      Significance

      I think the idea of performing pooled screens coupled to microscopy is exciting, and this approach definitely has more potential than the Craft-ID approach that the authors also discuss in their manuscript. In addition, the approach described in this manuscript is convincing. The analysis software will require more work in the future (to adapt it to recognise different types of phenotypic readouts) before it is accessible to the scientific community, but the authors present sufficient evidence that the system can be robust. They also present some clever ideas, such as calculating enrichments with different photo-activation times (2 s vs 100 ms) and then separating these populations by FACS.

    1. Reviewer #3:

      This paper by Thacker et al. describes the use of lung-on-a-chip microfluidic devices to study early interactions during acute M. tuberculosis infection, under conditions chosen to mimic the alveolar environment in vivo. The authors use time-lapse microscopy to study host-Mtb interactions in macrophages and alveolar epithelial cells, the role of the Mtb Type VII secretion system, and the impact of surfactant on Mtb infection. This study suggests that organ-on-a-chip systems might be able to reproduce host-microbe physiology during infection, which is difficult to reproduce ex vivo using single cells, air-liquid interface cultures, organoids or organ explants. This is an exciting approach that has the potential to expand our ability to study host-pathogen interactions, but there are some limitations that dampen my enthusiasm.

      Major concerns:

      While I recognize that it is challenging to use live-cell imaging with colocalization markers, much of the data in the paper rests on the ability to determine the precise localization of bacteria. This includes the comparisons between AECs and macrophages, between mutant and WT Mtb strains, and of the role of surfactant. However, neither AECs nor macrophages are identified at high enough resolution to give confidence that the Mtb are associated with those cells specifically and, more importantly, that the bacteria are growing intracellularly rather than extracellularly. The authors show multiple bacterial microcolonies that grow in size over time, but they do not explicitly specify whether these are inside or outside cells, or whether the cells are AECs or macrophages. Many of the images are of such low resolution that only tiny dots of bacteria are observed. To the authors' credit, the quantitative and statistical analysis is very rigorous; however, better evidence on the issues raised above would increase confidence in the results. The following example illustrates this point in detail:

      Lines 60-63: "Inoculation of the LoC with between 200 and 800 Mtb bacilli led to infection of both macrophages (white boxes in Fig. 1M, P, zooms in Fig. 1O, R) and AECs (yellow boxes in Fig. 1M, P, zooms in Fig. 1N, Q) under both NS (Fig.1M-O) and DS (Fig. 1P-R) conditions." Macrophages can be identified on the images by their GFP expression (although colocalization with the cells themselves is not shown), but the same cannot be said of AECs. The yellow boxes could represent AECs or spaces on the chip with no cells at all. Furthermore, the 2D images shown in Figure 1 do not necessarily represent infected cells, and the possibility that the visualized Mtb are outside the cells should be considered. Higher-resolution images, with clear colocalization and z-stacks, would therefore increase confidence in the results.

      The data arguing for attenuation of the Esx-1 mutant Mtb in AECs and macrophages are not strong, and the authors do not make a direct statistical comparison between the appropriate groups (i.e. AEC NS WT vs Esx-1, or Mac NS WT vs Esx-1). For example, the mean/median growth rate of WT Mtb in macrophages appears to be ~0.25 h⁻¹, roughly the same as that of the Esx-1 mutant in the same cells. There may be a difference under DS conditions, but since the comparisons are not made directly, it is impossible to know.
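
      A direct comparison of this kind (for example, WT vs Esx-1 growth rates in macrophages under NS conditions) could use a rank-based test. Below is a minimal sketch of a two-sided Mann-Whitney U test with the normal approximation. It uses midranks for ties but applies no tie correction to the variance. The growth rates are invented for illustration only. With replicate chips, a test that accounts for chip-to-chip variation may be more appropriate.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    values = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2  # midrank: average of ranks i+1..j
        i = j
    n1, n2 = len(x), len(y)
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Hypothetical per-microcolony growth rates (per hour).
wt = [0.30, 0.32, 0.35, 0.28, 0.31, 0.33, 0.29, 0.34]
esx1 = [0.20, 0.22, 0.18, 0.21, 0.19, 0.23, 0.17, 0.24]
print(mann_whitney_u(wt, esx1))
```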

    2. Reviewer #2:

      The manuscript by Thacker et al., entitled "A lung-on-chip model reveals an essential role for alveolar epithelial cells in controlling bacterial growth during early tuberculosis", is an interesting study describing a new in vitro model of the early events of Mycobacterium tuberculosis infection. This model is important and novel; however, the study is descriptive, and others have already reported some of these findings using other in vitro models. These include attenuated growth of M. tuberculosis in macrophages and alveolar epithelial cells after exposure to surfactant, changes in the M. tuberculosis cell wall after exposure to surfactant, and the lack of effect of surfactant on the extracellular viability of M. tuberculosis. The rationale for using the attenuated ESX-1 mutant in this study is not clear, nor is the concept that exposure to surfactant may change the attenuation of this strain. The compositions of mouse and human surfactant are also quite different, so results should be extrapolated with caution.

      Major concerns:

      1) The results provided in Figures 1, 2 and Fig. 3 supplement 1 are confusing, and readers need to guess what they are looking at, especially in Figure 1 M-R. As this is an important model, it would be appropriate to provide detailed, better images showing well-defined cells. The findings should also be quantified in tables (e.g. numbers of type I and type II alveolar epithelial cells, macrophages and endothelial cells, bacteria per cell, etc.). In Fig. 3 supplement 1, one needs to guess what is intracellular and what is extracellular within the studied system.

      2) The definition of normal surfactant (NS) vs. deficient surfactant (DS) is confusing as used. Type II alveolar epithelial cells (AT-IIs) become type I cells (AT-Is) over time in in vitro culture (within 5 to 7 days) and thus stop secreting surfactant. The authors found that after 6-11 passages, AT-IIs stopped producing surfactant but also lost their cellular characteristics without acquiring the expected characteristics of AT-Is. This needs to be studied in further detail to ensure that this cell is not an artifact of repeated passaging in vitro. The authors need to use several AT-II and AT-I markers to be certain that the DS cell monolayers are indeed still ATs. Surfactant protein C, although used as a marker for AT-IIs, is a soluble protein that has been shown to interact with many cells within a cellular system. A correlation between SFTPC and AQP5 expression over time is also necessary, as it would indicate the differentiation of AT-IIs into AT-Is, a key feature of the role of AT-IIs as progenitors of AT-Is.

      3) The authors did not consider that M. tuberculosis can form micro-colonies on the surface of alveolar epithelial cells, so the intracellular growth they report could be extracellular growth. After infection, did the authors treat the system with an antibiotic to kill extracellular M. tuberculosis bacilli attached to the alveolar epithelial cell surface? In addition, the concept of M. tuberculosis micro-colonies growing inside cells needs to be better explained. Are these bacterial clumps? How do the authors distinguish bacteria that are not growing from those that are dead?

      4) If I understand the described method correctly, Curosurf (poractant alfa) itself is not stained. Instead, the authors added a commercially labeled phosphatidylcholine (PC) to the Curosurf. This labeled PC may or may not interact with Curosurf components, but it clearly forms micelles. What is quantified is therefore the interaction of the labeled PC with M. tuberculosis. Moreover, the artificial addition of this phospholipid (at 10%) changes the original composition of Curosurf, which may have physiological implications. The authors need to confirm whether the added PC was indeed DPPC. The authors also need to find a better way to demonstrate that Curosurf components opsonize M. tuberculosis bacilli. In addition, why did the authors use 1% Curosurf for their experiments? Is there a dose-dependent effect? Why did the authors not use Survanta, Infasurf or mouse surfactant?

      5) The in vivo simulation of infection uses growth rates randomly drawn from the kernel density estimates for the respective populations. In this graph, it is very important to distinguish bacteria with high growth rates from those with intermediate and low growth rates (at the 99th, 75th, 50th, 25th and 1st percentiles) and to assess how each group is projected to behave in vivo. As presented, the simulation is not very informative about the impact of NS ATs vs. DS ATs on M. tuberculosis infectivity in this model system.
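
      The following sketch shows the stratified projection requested above. Growth rates are drawn from a Gaussian kernel density estimate of the observed single-microcolony rates, and the projected bacterial load is reported separately at the 1st, 25th, 50th, 75th and 99th percentiles. The sketch assumes simple exponential growth, N(t) = N0·exp(r·t), a Silverman rule-of-thumb bandwidth, and invented example rates. The authors' actual simulation may differ.

```python
import math
import random
import statistics


def sample_kde(data, n, seed=0):
    """Draw n samples from a Gaussian KDE of `data` (Silverman bandwidth)."""
    rng = random.Random(seed)
    h = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)
    # pick an observed value at random, then add kernel noise
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(n)]


def percentile(values, q):
    """Linearly interpolated q-th percentile (q in 0-100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def project_by_percentile(rates, hours, n0=1, qs=(1, 25, 50, 75, 99)):
    """Bacterial load after `hours` of exponential growth at each percentile rate."""
    return {q: n0 * math.exp(percentile(rates, q) * hours) for q in qs}


# Hypothetical per-hour growth rates of individual microcolonies.
observed = [0.02, 0.03, 0.025, 0.04, 0.035, 0.015, 0.05, 0.03]
rates = sample_kde(observed, n=5000)
for q, load in project_by_percentile(rates, hours=72).items():
    print(f"percentile {q}: {load:.1f} bacteria per founder")
```

      Because kernel noise is added, a few sampled rates can be negative, which this sketch treats as net decline rather than clipping to zero.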

      6) Similar alterations of the M. tuberculosis cell wall, and release of cell wall components into the milieu, upon exposure to physiological concentrations of human lung surfactant have already been described. The same applies to the slower replication rate in ATs (and intracellular killing in macrophages) after M. tuberculosis exposure to human lung surfactant. Although these are two different systems, the authors need to contrast their findings with these reported ones in their discussion. In addition, it is not clear how many times this experiment was performed. Statistics are mentioned in the figure legends, but no statistics are shown in the figures.

    3. Reviewer #1:

      1) What quality control is done for each experiment to determine the ratio of type I and type II AECs in each chip set up for each experiment? This is of particular importance because the authors do not show any images where they stain for both type I and type II AECs in the same chip. Do the authors have images stained for both type of cells to illustrate the composition of each chip? After figure 1, what staining is done to confirm the DS cells decrease proSPC expression for each experiment?

      2) The authors focus on the difference in surfactant gene expression between newly isolated AECs (NS) and in vitro passaged AECs (DS), but they also observe that aqp5 is downregulated. In fact, the data suggest that the cells are simply de-differentiating during passage in culture, which will have multiple effects on the cells, not just on surfactant production. This should be commented on and discussed. After loss of those markers, how do the authors confirm that they still have type I and type II AECs in their cultures? Is there microscopy data with other markers that are retained in the AECs? The add-back experiments with Curosurf support the idea that surfactant can contribute to bacterial control, but they provide only partial complementation, and the evidence for de-differentiation implies that other pathways are at play.

      3) One of the biggest concerns is that the authors never stain for type I or type II AECs after infection and make the conclusion that the bacteria are within type II cells based on the absence of macrophage staining. However, the bacteria may not even be in a cell, or the AECs could be dying during infection. On a related note, there is no data presented that shows that type I cells are not infected in the lung on chip system with Mtb.

      4) The authors state that their data with the Esx1 mutant "demonstrates that ESX-1 secretion is necessary for rapid intracellular growth in the absence of surfactant, consistent with the hypothesis that surfactant may attenuate Mtb growth by depleting ESX-1 components on the bacterial cell surface". This seems like quite a jump in interpretation of the data since the Esx1 mutant is likely attenuated for many reasons, and this attenuation is dominant to any effect that surfactant is having. The authors also show that PDIM levels are not different in the presence or absence of surfactant, and this is an Esx1 dependent lipid.

      5) What is the purpose for including the icl1/icl2 mutant? This experiment is not included in the data quantification.

    1. Reviewer #3:

      The study by Taverna et al. uses NGN2-induction in human, chimpanzee, and bonobo pluripotent stem cells to attempt to decouple the process of neuronal maturation from the cell cycle in order to study species-specific differences in neuronal maturation. Using single cell RNA sequencing, analysis of neuronal morphology, and electrophysiological recordings, the study argues that neuronal maturation is delayed in human compared to chimpanzee and bonobo among a heterogeneous class of sensory neurons and that this delay is cell-intrinsic. However, the current data are incompletely analyzed and do not provide strong support for this conclusion.

      Major comments:

      The dramatic differences in cell type composition of the induced neurons across species, revealed by single cell sequencing in Figure 2A, pose significant problems for the interpretation of the rest of the results. Specifically, if the chimpanzee cells are biased to making different sensory neuron cell types than the human cells, then differences in maturation rates between cell types rather than between species could drive the results. The authors must take into account the influence of cell type, individual, and species in order to support their claims of species differences.

      First, the number of individuals (only one chimpanzee individual) used for single-cell analysis is inadequate. There could be individual differences in timing and neuronal composition between lines that are independent of species and are not accounted for. At least 3-5 individuals per species should be used to enable statistical analysis of species differences. Ideally, the same lines should be used for single-cell analysis and morphological/physiological analyses. Staining for the cluster markers discovered from the current single cell analysis could also be applied to the remaining individuals to understand whether induced neurons have a similar composition across all the individuals from the three species.

      If the single chimpanzee individual shown in the single cell data is really representative of the three chimpanzee lines used elsewhere in the manuscript, the dramatic differences in neuronal types across species must be taken into account in subsequent analyses. For example, gene expression in Figure 3 could be analyzed on a cluster by cluster basis rather than grouping all neuronal clusters together. As shown, the differences across species could just be due to cell-type specific differences (for example, cluster 4 appears to be made up of entirely chimpanzee neurons while cluster 5 has more equal species representation). For physiology and morphology experiments, post hoc marker staining could ensure that neurons of the same type are compared across species, or if not registered to individual cells, it could still reveal the similarities and differences in composition between plates.

      Does NGN2 induction make a valid cell type? The authors should compare their expression data to previous work utilizing NGN2 induction (Zhang et al 2013) as well as to data from mouse and human tissue samples. It would be helpful to clarify whether the differences with previous work (i.e. induction of sensory neurons compared to cortical neurons) are due to incomplete characterization previously or to a different outcome here. And most importantly, it would be helpful to more clearly identify the endogenous cell types modeled in this data, perhaps by integration with primary sensory and cortical neurons single cell datasets.

      Do the BRN2- and CUX1-positive cells co-express other cortical markers, such as FOXG1 and EMX2, which would support the statement that some of these cells may be cortical? Alternatively, are these genes also expressed in some sensory neurons, or are these simply cells of mixed identity that lack in vivo counterparts?

      Please provide more detail about the NGN2 expression system as utilized across species.

      For each species, was the corresponding NGN2 gene used? If so, are there sequence differences between species that could influence differentiation?

      Is the time course of NGN2 expression the same across species?

      What are the dynamics of NGN2 induction in this system compared to normal differentiation - does persistent NGN2 expression after differentiation ultimately keep neurons in a more immature state?

      Does the NGN2 system entirely decouple differentiation from the cell cycle, as the authors claim, or do a few cell cycles still occur post-induction? If so, does this number differ between species?

      The focus in the introduction on cognition and the role of cortical differences between humans and non-human primates is puzzling in light of the claim that most of the neurons generated in this study are sensory neurons. If the authors' conclusions are valid, then it seems that this finding should be framed differently. Are there known species differences in sensory neurons? Do these results suggest that delayed maturation is a more general phenomenon and not restricted to brain regions involved in cognition?

      The following sentence in the discussion attempts to address this point: "Of note, sensory neurons are interesting from an evolutionary point of view, as the development and evolution of working memory in humans is linked to a higher integration of sensory functions in the human prefrontal cortex." However, this statement and the references cited instead support the view that species differences might be found in the prefrontal cortex rather than in sensory neurons.

    2. Reviewer #2:

      This is a well-written manuscript comparing the rate/tempo of maturation of chimpanzee, bonobo and human neurons. The work is well done and easy to follow. The core finding is that human neurons, generated in vitro via a well-established directed differentiation protocol, mature more slowly than NHP neurons.

      Several groups have previously used both in vivo and in vitro models (similar to the one used here) to define cross-species maturation features. These earlier studies have shown that human cells indeed develop more slowly than those of other species (such as mice or chimpanzees). The authors acknowledge this work in their introduction. While the finding of slower human neuron maturation is not completely novel, the current work extends these earlier studies with additional characterization of the electrophysiological and molecular properties of the neurons. It also highlights an underappreciated presence of sensory neurons in these cultures.

      Things to consider:

      1) Definitive characterization of the neurons produced by Ngn2 overexpression. Prior work defined the neurons mostly as pyramidal and of cortical origin. Here, the authors claim both a mixed identity (very probable) and the presence of large numbers of sensory neurons. One is left wondering whether this reflects a slightly different differentiation protocol, a different interpretation of the data, or high variability. If the authors classified single-cell RNA data from prior studies that used this same protocol, would they still conclude that these are sensory neurons? If the authors could prove that the protocol produces bona fide sensory neurons, that would be an advance for the field. That may require direct comparison to endogenous sensory neurons (beyond a small number of markers) and classification based on electrophysiological properties (which the authors do have). Are these sensory neurons based on physiology?

      2) Could one use the system to point at mechanisms that may mediate the observed differences in maturation rates? This would move the field forward in a powerful way.

    3. Reviewer #1:

      The results are somewhat underdeveloped and there are several aspects of the study that can be improved by deeper analyses:

      1) The rigor of the experiments and statistical analysis is not clear. Although the use of several lines of iPSCs from each species is a strength, there are no details of how many batches of differentiation/induction were done or how many replicates were used for analysis. This is especially important for structural and functional analysis that can vary between lines and batches.

      2) The identity of the induced neurons as sensory neurons is interesting but is based solely on gene expression (scRNAseq). It would be more compelling if the authors would show other characteristics that identified this population of neurons. It is possible that some neurons express these sensory neuron genes, but do not express the proteins and/or do not differentiate into functional sensory neurons.

      3) The proportions of cells in each cluster of the scRNAseq would be informative to 1) identify changes as the neurons mature and compare between species, and 2) identify differences between species, as the authors state (page 9) that same populations were found in different proportions.
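
      The requested proportions are simple to tabulate once each cell carries a species, time point and cluster label. Here is a minimal sketch that assumes a hypothetical list of per-cell (species, day, cluster) labels exported from the scRNA-seq analysis. Adding an individual (cell line) label to each tuple would allow the same tabulation per line.

```python
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Fraction of cells in each cluster, per (species, day) sample.

    cells: iterable of (species, day, cluster) tuples, one per cell.
    """
    cells = list(cells)  # allow generators, since the labels are iterated twice
    totals = Counter((species, day) for species, day, _ in cells)
    counts = Counter(cells)
    proportions = defaultdict(dict)
    for (species, day, cluster), n in counts.items():
        proportions[(species, day)][cluster] = n / totals[(species, day)]
    return dict(proportions)


# Hypothetical labels for a handful of cells.
cells = ([("human", 7, "c1")] * 3 + [("human", 7, "c2")]
         + [("chimpanzee", 7, "c1")] * 2 + [("chimpanzee", 7, "c2")] * 2)
print(cluster_proportions(cells))
```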

      4) Given the valuable time course scRNA seq data, the analysis of neuron maturation over time is somewhat limited. More sophisticated analysis of gene expression changes/coexpression would strengthen the overall impact of the data.

      5) Similarly, the discussion is superficial and focused on consequences but not causes of differences in neuron maturation time. The discussion does not build on the rich and extensive transcriptomic data to provide any mechanistic hypotheses of the causes of the differences.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      The manuscript by Schörnig presents an elegant comparison of structural and functional maturation of cortical neurons from different primate species that is of broad interest to researchers working in evolutionary neuroscience and to those interested in the unique qualities of the human cortex. The authors use an induced neuron approach to generate cortical-like neurons from iPSCs from different species and compare the structure, function and gene expression of the different neurons over time in culture. This strategy bypasses development and provides much more homogeneous cultures for analysis. While the results are largely descriptive, they constitute a very interesting resource that provides insight into both primate neural development and human-specific attributes.

    1. Reviewer #3:

      This very interesting manuscript further describes the receptive field structure of ON-OFF retinal direction-selective ganglion cells (DSGCs). The authors demonstrate that spot stimuli flashed at positions that do not correspond with dendritic processes of the recorded DSGC evoke strong excitatory responses that are most powerful on the preferred side of the receptive field (as determined with moving bars). The authors go on to show that responses to small light stimuli flashed in the dendritically sampled area of visual space are also non-uniform, and maximal on the preferred side. The authors' data are in line with previous reports of a nondirectional zone at the periphery of the dendritic tree of DSGCs. The experimental approaches taken by the authors seem sound. I was concerned by the obviously different kinetics of the flash response recorded under control conditions and with GABAA/nAChR antagonists in Figure 1D. Is this a consistent finding? What are the authors' thoughts on the unusual shape of the current in Figure 1D (lower, red trace)? As indicated in the discussion, the authors have not investigated the mechanisms underlying this asymmetry, other than dismissing structural determinants (dendritic tree asymmetry, investigation of an existing EM volume). To my mind, this is a vital component missing from the manuscript. The authors do, however, use elegant light stimulus patterns and modelling to describe some of the potential emergent properties of this behaviour. Nonetheless, I am left puzzled and wanting to understand the cellular basis of the behaviour the authors have identified.

    2. Reviewer #2:

      In this research, Ding and colleagues present evidence that the excitatory input to ON-OFF DS RGCs from bipolar cells is strongly asymmetric, with strong inputs occurring on the side opposite from the SAC inhibition. They performed careful studies to show that this was not due to spatial asymmetry in the DSGC morphology or to ribbon synapse density. Using 'interrupted motion' stimuli, which are effectively local directional stimuli, they show that this asymmetry leads to a non-directional response on one side of the cell's RF. Last, they create a model to show that such firing patterns could be used to improve localization of edge position under the specific conditions of an edge emerging from behind an occluder.

      The work showing the asymmetry appeared careful, thorough, and well-done. The second half of the paper dealing with the functional consequences of this asymmetry left me with a few questions:

      1) Throughout the paper, several experiments showed no changes when a mix of receptor antagonists was added to exclude SAC inhibition as the origin of these effects. However, I did not find a positive control showing that these antagonists had the desired effect. Later, in Figures 5C-D, the remaining effect after application of these antagonists was cited as evidence that the excitatory asymmetry was responsible for the effect; that interpretation is only valid if the drugs truly abolish all SAC input to the DSGC. What if the drugs were not 100% effective? Relatedly, in the experiments in Figures 5C-D, the measured responses all decrease with the antagonists, an effect that seems surprising and is not explained. Connecting the asymmetry in excitation to the response to interrupted motion is central to this paper, so it should have strong support.

      2) The measured functional results appear quite similar to results in Kuhn & Gollisch 2019, which is not cited in that context. That paper found that DSGCs responded to local contrast, not just motion, much like the results here, and suggested that oppositely tuned cells could be subtracted to eliminate this contaminating contrast signal or added to isolate the contrast signal. Here, the authors suggest a very similar use for these signals, albeit with a decoder of position and a focus on motion rather than contrast changes. (See line 528, where the authors suggest that this position-direction hypothesis is new. See also line 537: or could not be salient, if there's any kind of downstream opponent subtraction, as in primate MT.)

      3) The interrupted motion stimuli are more complex than standard motion stimuli, but it is not clear how ethological or naturalistic they really are. In particular, the occluder had the same contrast as the rest of the background, which seems like a very specific kind of occluded motion, and it is not clear how this would generalize when the occluder has the same or opposite contrast as the moving edge. Moreover, the existence of directed motion in these stimuli leads the authors to emphasize the motion on the 'preferred side', rather than just non-directional contrast changes, which seem as though they would also induce responses.

      4) The modeling/decoding aspect of this paper seems pretty speculative. It doesn't seem as though these cells are known to be involved in any kind of position encoding. The fact that they transmit information about contrast changes means they can enhance position-decoding, but many other RGCs could also (better?) serve this purpose. The optic-flow-field arrangement of these cells in the retina suggests just the opposite - that they appear likely to be used for optic flow detection, in which positional information is less relevant than the field structure.

      5) Last, I kept wondering how this offset excitatory input made the DSGCs look very similar to a classical Barlow-Levick model (though with DS inhibition). I believe a classical BL model would have many of the properties shown here, including the sensitivity to occluded ND motion on its 'preferred side'. Is there an advantage in the BL model formulation to having disjoint excitatory and inhibitory spatial inputs, rather than a broad excitatory field that overlaps with the delayed inhibition? If so, would such an advantage explain why this asymmetry might exist in these DSGCs, even with DS inhibition from the SACs? I guess I'm asking whether there is an advantage for general motion detection, rather than proposing a new role for these cells in localizing specific types of motion stimuli.

    3. Reviewer #1:

      This paper describes a new finding about stimulus encoding in On-Off directionally selective ganglion cells. It is well established that these cells have spatially displaced inhibitory input from starburst amacrine cells, and that the spatial offset of inhibitory input contributes to the cells' selectivity for direction of motion. The work in this paper shows that the cells also have spatially offset excitatory input, and that this input can give rise to a non-directional response. Several functional roles are suggested for the non-directional response. I felt that the evidence for the non-directional response was strong, but that the connection to visual function was too preliminary.

      Functional importance:

      The paper emphasizes the possible functional importance of the non-directional motion signal; this is a focus of the discussion, and is highlighted in both the abstract and introduction. I found this part of the paper less complete and convincing than the experimentally-driven results. Several issues contribute to this. One is that the contribution to identifying the position of a moving object is fairly modest. Another is that the impact of the non-directional component on other stimulus properties - e.g. the accuracy with which motion direction is encoded - is not explored. A third is that the position of a moving object is almost certainly encoded by multiple ganglion cell types, and hence the modest improvement in position encoding in the DS cell population may make even less contribution when the entire ganglion cell population is considered. A complete investigation of coding in the ganglion cell population is clearly too much, but a more balanced and complete consideration of the benefits and drawbacks of the mechanism described would strengthen the paper considerably.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers were in broad agreement that the findings were interesting and that the experiments were well executed and clear. The main concern is that the paper does not provide either a definitive mechanistic insight into why excitatory input is asymmetric, or a definitive functional argument about the importance of this asymmetry.

    1. Reviewer #3:

      Summary:

      Gene drives are alleles that bias their inheritance to spread through a population. Engineered gene drives could potentially be used to spread genes that prevent malaria transmission in mosquitoes. In this study, the authors develop a proof-of-principle of the effector component that would be part of a proposed integral gene drive. Such drives differ from standard gene drives by separating the Cas9 and effector components at different loci, with each one having biased inheritance, a useful strategy if the Cas9 has a substantial fitness cost (though it remains unclear if this is the case). They can also more easily target conserved sites of important genes compared to a standard drive, though this is not unique to the integral gene drive strategy. The Cas9 and effector components would be expressed from natural promoters, with introns and translation skipping utilized so that the original gene works properly and so that gRNAs can be expressed within the intron. The authors showed that the effector component of such a drive performed as expected, and that both the effectors and the target gene were expressed. Overall, the manuscript is a mostly sound technical demonstration of the effector component of an integral gene drive.

      Review:

      1) It's unclear how exactly resistance alleles would be dealt with in the authors' strategy. While an integral gene drive could target an essential gene so that resistance alleles are nonviable, that doesn't seem to be the strategy here, since the authors needed to target a gene with a promoter that would be a good match for their effector. The need for both an essential gene and a suitable promoter in one package may thus limit the use of the integral gene drive strategy. Higher fitness costs associated with disruption of the gene may partially ameliorate this issue, but this was not confirmed in the current study (transgenic strains had lower fitness, but was this due to the drive, the effector, or the reduced expression of the host target gene?).

      2) The authors removed their marker genes by surrounding them with LoxP sites and crossing their lines to Cre. This was justified since the authors believed that the presence of the marker would interfere with expression of the target gene, causing fitness issues. However, the authors found no sign of fitness reduction based on anecdotal (?) observations. Were these observations actually quantified, in which case they should be supplemental material? It could be particularly interesting in light of the fact that even without the marker, the transgenic strains suffered fitness effects. It would be nice if the decision to remove the marker was better justified in this section, based on the next section where it was found that the marker interfered with effector expression. Perhaps even combining or reversing the order of the sections would be appropriate (for example, consider first saying that the marker interferes with expression, then mention how this was expected and the marker could be removed, solving the problem).

      3) Based on figure 3D-E, it appears that the target host gene has reduced expression even after the marker is removed. This is quite important for future considerations, yet seems to be glossed over. For example, if a target is chosen that can effectively help remove resistance alleles due to fitness costs from disrupting the target gene, this means that the gene drive will also suffer fitness costs.

      4) The fitness analysis examining fecundity and hatch rates is not very informative. While similar fitness effects among the transgenic strains lends some weak evidence that inbreeding may account for the fitness reduction, variability between individuals certainly does not (after all, wild-type individuals were also highly variable). Also, if the Cre line has a different background than G3, wouldn't all the lines have received some of this background from prior crosses? Perhaps this could be the answer. It would nonetheless have been better for the authors to outcross the lines before inbreeding them, with similar inbreeding for the wild-type control, before doing this experiment. Because of the issues with this experiment, I'd suggest that it is conducted again with better controls or is moved to the supplement.

      5) It's hard to believe that no end-joining took place, even though the last sentence of the results indicates that no end-joining was detected. Did the authors not sequence any progeny with the drive to look for end-joining products formed from maternally deposited Cas9? Other studies with vasa-Cas9 in Anopheles saw this phenomenon occur at a high rate. For end-joining products formed as an alternative to HDR, were 21 individuals sequenced (nine with Aper1 and twelve from the full AP2 sequencing)?

    2. Reviewer #2:

      Hoermann et al. present a new gene engineering concept for disease vector mosquitoes, whereby endogenous mosquito genes are hijacked to express a heterologous effector peptide intended to render mosquitoes resistant to human pathogens. In addition, a synthetic intron added within the effector-coding sequence will express gRNAs for the CRISPR-Cas9 system, recognizing the transgene's own wild-type insertion locus. In the presence of a source of Cas9, the effector gene is thus able to home into a wild-type chromosome, triggering a gene drive effect that can increase the frequency of the modification in the mosquito population. A fluorescent marker, also cloned within the intron, is used at early steps to track the transgene, but is subsequently removed by Cre/lox excision to restore host gene + effector expression and to result in minimal genetic modification.

      This is an extremely elegant procedure and a remarkable technical achievement, especially in such a difficult species as Anopheles gambiae. The choice of midgut-specific promoters to express anti-malaria effectors makes sense to target early stages of parasite development, before the parasites have a chance to amplify in the mosquito. Using endogenous regulatory sequences without a need for promoter cloning alleviates the tedious work of individual promoter characterization. The molecular designs are well described, and the results are likely to have a large future impact on the development of vector control tools, notwithstanding some weakness in assessing the antiparasitic effect of Scorpine in the transgenic mosquitoes (see below). I agree that this type of transgene should facilitate semi-field or field testing of candidate anti-parasitic effectors before any true gene drive intervention is envisaged.

      Major Comments:

      P. falciparum transmission blocking assays - Fig. 5:

      I have several questions about figure 5.

      -Are mosquitoes with zero parasites included in the calculation of the mean and median? This should be explained in the legend or in the Experimental Procedures.

      -Several replicates have been pooled to generate the figure, for each transgenic strain. Is this legitimate? i.e. were the mean oocyst number and prevalence, reflecting the quality of each ookinete culture, similar enough between replicates to allow pooling? If not, it would be more legitimate to show the result of a single representative replicate. Please provide a table with the raw parasite counts of the separate replicates in a supplemental file so that readers can better judge these results. I note that panel C is very useful.

      -I find the bar graph hard to interpret. The median M is represented either as a stroke inside some bars, or overlapping the x axis when M=0. The size of the bar doesn't represent the mean, m. Does it represent a confidence interval? This must be explained in the legend. Maybe a dot plot where each dot represents the parasite counts of one mosquito would better represent these results?

      -From my point of view, mosquito numbers in some of these infections may be too low to yield solid results. This is especially true of the ScoG-AP2 experiment: 37 mosquitoes in the G3 control with a prevalence of 51% means that only 19 mosquitoes across R=2 replicates contained parasites. This low number carries a risk of atypical outliers in the parasite counts, even if the statistical tests presented here show good significance. In the panel C analysis of these values, we see from the size of the squares that the replicate with the highest statistical significance also had the smallest number of mosquitoes. The replicate with a larger N has only one *. For the Aper1-Sco line, N is large and the statistical significance is high (although panel C shows that one of the 4 replicates showed no difference), but I'm still somewhat unconvinced of the effect of scorpine in this line: the mean only drops from 10 to 6 parasites, and prevalence drops from 37 to 21%. This moderate effect, combined with the facts that (1) some replicates show no Scorpine effect and (2) the Sco-CP line, which has a comparably high level of scorpine expression according to Suppl. Fig. 3, shows the exact opposite, i.e. a pro-parasitic effect, makes me doubt the antiparasitic effect of scorpine.

      In the case of the ScoG-AP2 line, scorpine expression is only 1/10 to 1/8 of the expression in the other two lines, but seems to have a similar effect as in the highest (Aper1) expressing line: one possibility is that fusion to GFP stabilizes Scorpine so that lower expression results in higher activity, but a milder effect would have been logical if scorpine had a dose-dependent effect.

      One caveat of these experiments is that the genetic background of the control mosquitoes (G3) is not exactly the same as that of the transgenics (G3 x KIL). There is a possibility that the KIL background contributed some alleles conferring elevated Plasmodium resistance (or the opposite in the case of Sco-CP). I would find the results more convincing if a control of equivalent genetic background had been generated for each transgenic strain (in the process of homozygous line selection, the homozygous WT siblings could have been retained to serve as specific controls, though I know how demanding this work would have been...).

      Another caveat is that we don't know the precise kinetics (e.g. between 0-36h post blood meal) of the scorpine protein concentration in the midgut of each transgenic line, and we don't know at what time point after the blood meal parasites would be most susceptible to killing by scorpine (probably between 3 and 24h, after which they transform into protected cysts). Taken together, the scorpine data are not highly conclusive, and there remains much uncertainty about the efficacy of transgenically expressed Scorpine as an anti-Plasmodium molecule. I'm not requesting additional experiments (though future long-term assessments of these transgenic lines with new isogenic controls would be very interesting), but I invite the authors to tone down their claims about scorpine's potential effectiveness as an antimalarial effector in vivo. This does not decrease the importance of this work, of which scorpine is only one aspect. A candidate molecule had to be chosen for these proof-of-principle experiments. Scorpine may not have been a very lucky choice, but its moderate (or opposite) effect should be seen as an interesting result in itself. The way is now open to test other possible candidates.

    3. Reviewer #1:

      This is a compelling demonstration of a number of important steps that take population replacement gene drive for malaria control closer to reality. I have no major concerns and think the manuscript shows the authors have made substantial progress in a) taking Integral Gene Drive (which is a recent idea from senior author Windbichler) into mosquitoes, b) successfully removing marker genes to make the whole system more effective, c) demonstrating that the approach works to express a molecule to reduce parasite infection rates in the lab while also making it possible to test these effector molecules in natural settings without risk of accidental drive release, and d) also showing that drive is successful. My comments are only minor and I think the study is high impact.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This paper demonstrates a number of important steps necessary for implementing the recently proposed "integral gene drive" strategy. In this approach, endogenous mosquito genes are hijacked to express a heterologous effector peptide intended to render mosquitoes resistant to human pathogens. Such drives differ from standard gene drives by separating the Cas9 and effector components at different loci, with each one having biased inheritance. This could be useful if the Cas9 has a substantial fitness cost and could also more easily target conserved sites of important genes compared to a standard drive. While it remains to be seen how effective this approach will be in practice, the paper provides valuable insights into how such gene drives could work in mosquitoes.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We would like to express our utmost gratitude to the three anonymous reviewers for their constructive and insightful comments on our manuscript. We broadly agree with all comments made and have uploaded a preliminary revised version with changes highlighted in bold. We now deal with each of the reviewer comments in turn.

      Reviewer #1

      L50-52: Can you predict where the unmapped read came from? Could viral infections be the source as in land plants?

      Having done a crude examination of unmapped reads, we couldn't find compelling evidence of them being of viral origin. The unmapped fraction was in fact in the same range as that seen for other sRNA libraries in our lab, which we found to arise for a number of reasons such as sequencing errors, incomplete assembly, and differences between the sequenced lines and the reference line. All of these result in unmapped reads, and their number is further increased because we employed stringent mapping (0 mismatches).

      L67-68, which is the explanation?

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23nt peak the evidence for the 23nt doesn't seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper and we have thus decided to remove this sentence.

      Fig 1D the reference to the A,C,G,U 5' should be re-positioned within Figure 1D panel space.

      Thanks, this has been addressed.

      Figure 3: it could be a supplementary figure based on the relevance given in the manuscript to this point.

      We agree, and have moved Fig3 to Supplement.

      P5, line 107: while commenting on strand bias there seems to be a mistake in strong bias definition, it should be x 0.8, not "strong bias (0.2

      Thank you for pointing this out; we have now corrected this error in the text.

      P5, line 110: marked changes regarding locus size are not as striking in my opinion, in particular log size 6 and following, which is not marked in the graph (the cut off between 6 and 8). Maybe this curve should be split into two distribution graphs based on some important features (as repetitiveness?) that might allow a better definition of cut-offs.

      Thank you for pointing this out. You are correct that the changes in the density distribution are not as striking for locus size. A great deal of deliberation on our part went into deciding what to do about this. In the end, we decided that for the size classes there was benefit in having several different classes, with the understanding that additional, potentially redundant cut-offs would not adversely affect the analysis. In doing this, we were partially driven by the albeit subtle changes in the curve, but also by the desire to have size classes that were biologically relevant and informative. For example, a locus 3000nt captures the long tail. However, we neglected to fully explain these subtleties in our decision-making, something we have now rectified with some added explanation in the text. These choices were validated by the way size classes are differentially associated with different locus clusters in Figure 8.

      Fig 5: the legend has the C subfigure twice, the second should be D.

      Thank you for highlighting this. It has now been corrected.

      Table 1: I believe the data would be better presented in a plot, potentially something similar to the plot in Figure 1 A and B. The numbers are already presented in the supplementary spreadsheet.

      Thanks for pointing this out. We agree with this suggestion and have replaced Table 1 with a Figure (Fig 5) which is indeed a better way to present those results.

      Fig 6A: The boxplots regarding Stability of the clusters should be better described. What exactly does the y-axis in each "small plot" represent?

      Thank you for pointing this out, we understand that this isn't clear at the moment. Briefly, for this analysis we performed the clustering multiple times each time with a random sample of the loci (with replacement) of the same size as the original dataset. We then calculated the proportion of loci that retained their original clustering. We have clarified this in the figure legend and also elaborated on the approach in the methods section to ensure that it is better described.
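      The resampling procedure described in this reply can be sketched as follows. This is an illustration only, not the authors' code: it assumes k-means (with k-means++ seeding) as the clustering step, represents each locus by a hypothetical numeric feature matrix X (for example, coordinates on the retained MCA dimensions), and, as one possible way to score "retained their original clustering", labels every original locus with the bootstrap centroids and takes the best agreement over all relabellings of the arbitrary cluster ids. All function names are hypothetical.

```python
import itertools
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: later centroids are drawn with probability ~ squared distance."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # uniform draw if all points coincide
        centroids.append(X[rng.choice(len(X), p=p)])
    return np.array(centroids)


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)  # nearest centroid for each locus
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def bootstrap_stability(X, k, n_boot=20, seed=0):
    """Fraction of loci keeping their reference cluster in each bootstrap replicate."""
    rng = np.random.default_rng(seed)
    ref_labels, _ = kmeans(X, k, rng)
    scores = []
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=len(X), replace=True)  # resample loci with replacement
        _, cents = kmeans(X[idx], k, rng)
        # Assign every original locus to its nearest bootstrap centroid.
        boot_labels = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Cluster ids are arbitrary, so take the relabelling with maximal agreement.
        best = max(np.mean(np.array(perm)[boot_labels] == ref_labels)
                   for perm in itertools.permutations(range(k)))
        scores.append(best)
    return np.array(scores)
```

Exhaustive relabelling is only practical for small k (7! = 5040 permutations at k = 7); for larger k a Hungarian assignment on the contingency table would be the usual substitute.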

      P6, line 142: analyses of stability and variance shows 7 as the optimal k, while gap statistics and NMI suggested 6 as the optimal. It is not clear why 6 was preferred. The MCA section in Methods is unclear regarding this point too.

      Thank you for querying this. The process of choosing the appropriate value of k is a complicated one and we appreciate that the explanation could be clearer. After your comment, we re-visited our decision-making process and were reassured that a k value of 6 rather than 7 was indeed appropriate. The stability plots in Fig. 6A start with k=2 and it can be clearly seen for k=6 that stability is comparatively high for dimensions 7-10. Indeed, k values of 2,3 and 6 seem to be the only feasible values. k=7 is fairly unstable for all dimensions from 1-8. We have done some rewording of the methods to hopefully make this clearer.

      Fig S2-S5: please check legends, they are identical, although they should cover examples of loci in LC2 through LC5. These figures are not cited in the text, only S1 and S2.

      Thanks for pointing this out. This is now corrected and we have referenced all figures in the main text.

      Fig 9: I suggest using different colors in density plots to ease interpretation. LC tracks could share a color and Gene, TEs, DNA meth, and All loci should have a different color each.

      A good suggestion - this has been replotted with different colours.

      Supplementary Files S1: The full-annotated locus map should be provided as a spreadsheet file or as a text (.csv) file, not as a pdf file.

      Thanks for pointing this out. We originally submitted this file in GFF format and are not sure why it was converted. We will make sure it is provided in an appropriate format in the final version, especially having suffered from the pains of PDF tables ourselves in the past.

      I may be misunderstanding Fig. 6E, but it looks strange that the observed sum-of-squares is smooth, but the expected is not. Is it possible that the in-figure reference is inverted?

      Indeed, the colours were inverted. Thanks a lot for that spot, we have now swapped them around.

      Reviewer #2

      I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One approach that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs among all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, will generally overlap exons, and will have a very wide size distribution. In contrast, loci where the small RNAs are truly produced by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.
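      The flagging heuristic the reviewer describes (a low fraction of "DCL-sized" reads combined with strong single-strandedness) can be sketched per locus as below. This is a hypothetical illustration, not a method from the manuscript: the 20-22 nt "DCL-sized" window, the 0.5 fraction cut-off and the 0.8 strand-skew cut-off are all assumed values chosen for the example, and the LocusReads container is a made-up structure holding read-length and strand counts for one locus.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumed "DCL-sized" window (20-22 nt); not taken from the manuscript.
DCL_SIZES = range(20, 23)


@dataclass
class LocusReads:
    """Hypothetical per-locus summary of aligned sRNA reads."""
    name: str
    lengths: Counter = field(default_factory=Counter)  # read length (nt) -> read count
    plus: int = 0   # reads on the + strand
    minus: int = 0  # reads on the - strand


def dcl_fraction(locus: LocusReads) -> float:
    """Fraction of a locus's reads that fall inside the DCL-sized window."""
    total = sum(locus.lengths.values())
    if total == 0:
        return 0.0
    return sum(locus.lengths[n] for n in DCL_SIZES) / total


def strand_skew(locus: LocusReads) -> float:
    """Share of reads on the dominant strand (0.5 = balanced, 1.0 = single-stranded)."""
    total = locus.plus + locus.minus
    if total == 0:
        return 0.5
    return max(locus.plus, locus.minus) / total


def flag_possible_degradation(locus: LocusReads,
                              min_dcl_fraction: float = 0.5,
                              min_strand_skew: float = 0.8) -> bool:
    """Flag loci with few DCL-sized reads that are also strongly single-stranded."""
    return dcl_fraction(locus) < min_dcl_fraction and strand_skew(locus) >= min_strand_skew
```

Flagging rather than removing such loci keeps the full annotation intact while letting users filter out likely degradation products.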

      Thank you for this very insightful comment which has helped us to reflect on the methodological approach. While it is likely that there are some RNA breakdown products picked-up in the sRNA sequencing, we do not think that the locus-map as a whole is undermined by this. For example 54% of loci have a predominance for 21-nt sRNAs and 18% for 20-nt sRNAs, so the majority of sRNA loci do have a predominance for a specific RNA size.

      However, your point does raise a very valid concern with implications for the interpretation of LC4. Although we posit some explanations for these loci (e.g. DCL-mediated sRNA production without an accessory protein to provide PAZ domain-like sRNA measurement), given the very strong strand bias and association with genic regions we do agree that there is a risk that these loci predominantly represent degradation fragments. Therefore, we have now reworded how we discuss LC4 in the discussion to reflect this. This also reveals a key advantage of the clustering approach: should LC4 indeed represent degradation products, they have been successfully grouped together into a separate cluster, such that they don't undermine the insights gained from the other locus clusters.

      One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      Thank you for pointing this out. In pursuit of sharing as many details as possible, we appreciate that this can be too verbose, as rightfully noted here. In order not to compromise too much on detail, we have created a second, toned-down version as a CSV file, which includes essential details such as name, position and LC. As for the GFF, we kept all details in, since it loads quickly in a genome browser and also into other tools such as R, in which those features can be used as efficient filters.

      Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      We have included another table with the direct ENA accession numbers as well as numerous other metadata in Sup Table S6 (i.e. "Supp_Table_S6_library_ENA_accession").

      Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      The snapshots have been improved to ensure better readability.

      Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23-nt peak, the evidence for the 23-nt peak does not seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper, and we have thus decided to remove this sentence.

      Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      We agree and have changed it accordingly.

      Figure 4 caption: The acronym "CRSL" should be defined.

      CRSL has now been duly defined in the manuscript.

      Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      We have used the appropriate BibTeX code to reference this Zenodo share (https://zenodo.org/record/3862405/export/hx). The current citation format somehow omits some information. We hope this will be fixed by the publisher, but we have also provided the full DOI address in the “additional information” section just in case. We will keep an eye on how it comes out.

      Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Thank you for this suggestion, we have adapted the title to make it more descriptive.

      Reviewer #3

      …the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since at least some loci do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Thank you for these insightful comments. As we followed a very similar methodological approach to that used to produce the Arabidopsis sRNA locus map published in Hardcastle et al. (2018), we wanted to take the opportunity to compare the results and build upon the ongoing discussion concerning the evolution of sRNA mechanisms in Chlamydomonas (e.g. Valli et al. 2016). Your point about the possibility of an ancestral progenitor with greater diversity that was then lost is very valid. You are also of course correct about the limitations to what can be concluded from this study and the limited comparisons that can be made. We see our approach as a useful tool for hypothesis generation which can be complemented by more in-depth exploration in the future. With this in mind, and taking on board your comments, we have elaborated on our discussion of the evolutionary implications of our study, which we hope now gives a more balanced account.

      I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      That information was indeed missing; thanks for bringing it up. We have now included this in the gff file (column LC) as well as in another cleaner table (Supp_Table_S7_loci_class_annotation).

      Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      Thanks for pointing this out; this was an error on our part, and the captions have now been corrected. Unfortunately, due to the ongoing pandemic-related restrictions, we were unable to get a genome browser session running at this point to create more locus figures.

      If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      Thank you for this suggestion. In fact, not all annotation features were used predictively in the MCA and clustering process, and so these "supplementary" annotations as outlined in supplementary table S3 can provide some cross-validation. With that in mind, we have now included an additional heatmap as a supplementary figure which shows associations for some of these supplementary annotations as well as corresponding explanations in the text. Further validation is provided by the chromosome tracks in figure 9 showing the distinct genomic distributions of each locus cluster despite chromosomal location not being a factor in the clustering.

      Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      Thank you for pointing this out; we have now corrected this error.
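The inequality as quoted ("0.2 > x > 0.8") cannot be satisfied by any x; the intended definition, as Referee #1 also notes, is the union x < 0.2 or x > 0.8. A minimal sketch, assuming (not confirmed by this section) that strand bias is the fraction of a locus's reads on the plus strand; the function names are hypothetical:

```python
def strand_bias(plus_reads: int, minus_reads: int) -> float:
    """Fraction of a locus's reads mapping to the plus strand (assumed definition)."""
    total = plus_reads + minus_reads
    if total == 0:
        raise ValueError("locus has no reads")
    return plus_reads / total


def has_strong_bias(x: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Strong bias means x < low OR x > high; no x satisfies 0.2 > x > 0.8."""
    return x < low or x > high


print(has_strong_bias(strand_bias(95, 5)))   # x = 0.95
print(has_strong_bias(strand_bias(50, 50)))  # x = 0.5
```

Because the two tails are joined with `or`, a locus is flagged whichever strand dominates.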

      Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      Table 1 has been replaced by the new Fig 5, annotation/abbreviations should now be more obvious.

      Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Thank you for pointing this out. We have now moved the reference to the paragons to earlier in the section where we introduce the six clusters. We hope this is now clearer.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript presents a detailed map of sRNA (precursor) loci in the green alga Chlamydomonas reinhardtii based on large volumes of sequencing data (145 sRNA libraries). The locus map based on a false discovery rate of less than 0.05 had 6164 loci, covering 4.1% of the Chlamydomonas reference genome. Individual loci were annotated based on both intrinsic features, such as sRNA size, 5'-nucleotide, strand bias and phasing pattern, and extrinsic features, such as sRNA expression, genotype and overlap with genomic attributes (e.g., genes, transposons, methylation levels).

      By using the intrinsic and extrinsic features of each sRNA locus and Multiple Correspondence Analysis (MCA) approaches, the sRNA loci were clustered into six distinct classes, referred to as locus class (LC) 1-6. This strategy is partly validated by the grouping of well-characterized Chlamydomonas miRNAs into the same cluster, LC3.

      As the authors state, this data-driven approach is valuable for hypothesis generation since (with the possible exception of LC3) the biogenesis and function of most sRNA loci (and of the corresponding locus classes) remain uncharacterized in Chlamydomonas. The analysis provides a framework to facilitate future characterization of the diverse types of sRNAs in this model algal system.

      However, the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since at least some loci do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Some specific comments:

      1. I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      2. Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      3. If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      4. Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      5. Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      6. Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Significance

      Chlamydomonas reinhardtii is a model unicellular green alga, the lineage of which diverged from land plants approximately one billion years ago. Chlamydomonas encodes a great number of diverse small RNAs. However, the biogenesis and function of the majority of these sRNAs are not known. By grouping sRNA loci into specific classes (based on intrinsic and extrinsic features), this manuscript provides a framework that will facilitate the future characterization of sRNAs in Chlamydomonas and, very likely, in other algal species. This information may also contribute to our understanding of the evolution of sRNA loci within eukaryotes.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript describes the annotation of small RNA-producing loci from the green alga Chlamydomonas reinhardtii. A large number of small RNA-sequencing datasets were analyzed and used to create genome-wide annotations of small RNA-producing loci. These loci were annotated based on several features, and then classified into six major groups based on these features.

      Major comments:

      Are the key conclusions convincing? --> Yes.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? --> No

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation. --> Yes, additional analyses should be conducted, see itemized list below.

      Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments. --> Perhaps a few weeks to a month of analysis and revision time.

      Are the data and the methods presented in such a way that they can be reproduced? --> Yes.

      Are the experiments adequately replicated and statistical analysis adequate? --> Yes.

      SPECIFIC COMMENTS:

      1. I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One way that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs within all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, generally touching exons, and have a very wide size distribution. In contrast, loci where the small RNAs are truly created by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.
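The "DCL-sized" proportion suggested above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the 20-24 nt window, the 0.5 flagging threshold, the function names and the toy read lengths are all hypothetical choices.

```python
from collections import Counter

# Hypothetical settings: the "DCL-sized" window and the flagging threshold
# are illustrative assumptions, not values from the manuscript.
DCL_SIZES = range(20, 25)   # 20-24 nt inclusive
MIN_DCL_FRACTION = 0.5


def dcl_sized_fraction(read_lengths):
    """Fraction of a locus's reads whose length falls in the DCL-sized window."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_window = sum(n for length, n in counts.items() if length in DCL_SIZES)
    return in_window / total


def flag_possible_degradation(loci):
    """Return names of loci whose DCL-sized fraction is below the threshold."""
    return sorted(
        name for name, lengths in loci.items()
        if dcl_sized_fraction(lengths) < MIN_DCL_FRACTION
    )


# Toy loci with made-up read lengths: one narrow, 21-nt-dominated locus
# and one broad smear of sizes typical of random breakdown products.
loci = {
    "locus_A": [21] * 80 + [20] * 15 + [22] * 5,
    "locus_B": list(range(15, 41)) * 4,
}
print(dcl_sized_fraction(loci["locus_A"]))
print(flag_possible_degradation(loci))
```

A narrow size distribution gives a fraction near 1, while a wide smear (locus_B spans 15-40 nt) falls well below the threshold and would be flagged for closer inspection.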

      MINOR COMMENTS:

      2. One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      3. Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      4. Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      5. Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      6. Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      7. Figure 4 caption: The acronym "CRSL" should be defined.

      8. Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      9. Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.:

      This study provides a genome-wide annotation of small RNA-producing loci from Chlamydomonas reinhardtii. This will serve as a useful data resource for researchers working with this model system. The results overall confirm what is known from previous studies of Chlamy small RNAs: they are rather distinct from angiosperm small RNAs and from animal small RNAs.

      Place the work in the context of the existing literature (provide references, where appropriate).:

      This may be the first study to provide a genome-wide annotation (as opposed to a focused effort) for Chlamy small RNA populations.

      State what audience might be interested in and influenced by the reported findings:

      Chlamy researchers, especially those interested in gene silencing and genome annotations, and small RNA specialists with interest in annotations and in wide phylogenetic comparisons.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. :

      Plant microRNAs, siRNAs, genetics, and genomics.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In this study, Müller, Matthews, Vali, and Baulcombe have used data-driven machine learning approaches to annotate and classify the sRNA loci of Chlamydomonas reinhardtii. I found the manuscript very interesting and a handy handbook for how to annotate sRNA loci in different organisms. I believe this is not only a great resource paper on its own, but it also contains essential information for beginning to understand how Chlamydomonas silences TEs without an RdDM pathway. I have a few comments that may help to improve the manuscript.

      • L50-52: Can you predict where the unmapped reads came from? Could viral infections be the source, as in land plants?
      • L67-68: What is the explanation?
      • Fig 1D: the reference to the A, C, G, U 5' nucleotides should be repositioned within the Figure 1D panel space.
      • Figure 3: it could be a supplementary figure, given the relevance assigned to this point in the manuscript.
      • P5, line 107: in the discussion of strand bias there seems to be a mistake in the strong bias definition; it should be x < 0.2 or x > 0.8, not "strong bias (0.2 < x < 0.8)" as in the text.
      • P5, line 110: in my opinion, the marked changes in locus size are not that striking, in particular from log size 6 onward, which is not marked in the graph (the cut-off between 6 and 8). Maybe this curve should be split into two distribution graphs based on some important feature (such as repetitiveness) that might allow a better definition of cut-offs.
      • Fig 5: the legend has the C subfigure twice; the second should be D.
      • Table 1: I believe the data would be better presented in a plot, potentially something similar to the plot in Figure 1 A and B. The numbers are already presented in the supplementary spreadsheet.
      • Fig 6A: The boxplots regarding stability of the clusters should be better described. What exactly does the y-axis in each "small plot" represent?
      • P6, line 142: analyses of stability and variance show 7 as the optimal k, while the gap statistic and NMI suggested 6. It is not clear why 6 was preferred. The MCA section in Methods is unclear on this point too.
      • Fig S2-S5: please check the legends; they are identical, although they should cover examples of loci in LC2 through LC5. These figures are not cited in the text, only S1 and S2.
      • Fig 9: I suggest using different colors in the density plots to ease interpretation. LC tracks could share a color, and Gene, TEs, DNA meth, and All loci should each have a different color.
      • Supplementary File S1: The fully annotated locus map should be provided as a spreadsheet or as a text (.csv) file, not as a PDF file.
      • I may be misunderstanding Fig. 6E, but it looks strange that the observed sum of squares is smooth while the expected is not. Is it possible that the in-figure reference is inverted?

      Significance

      This is a very interesting article. It may look a little technical, but it provides useful information for people studying Chlamydomonas. In addition, the way the authors approached the annotation of sRNAs is very meticulous and elegant. I would suggest that people exploring small RNAs in non-model organisms use this article as a handbook for how to annotate sRNAs. In this way, the article will be of interest beyond the Chlamydomonas, and even the plant, research field.

    1. Reviewer #1:

      The manuscript by Wuertz-Kozak et al explores the relationship between early life stress and bone parameters in mice and humans. In the mouse studies, micro CT and qPCR analyses were done, while humans with depression and a history of childhood neglect had bone turnover markers measured and DXA scans done. Increased CTX levels were noted both in mice with early life stress and in certain groups of humans with depression. These investigators recommend that early life stress be further assessed as a risk factor for human bone disease.

      1) Although the authors acknowledge the limitations of controlling and even assessing accurately the kind of impacts (e.g., nutritional, activity-related, body weight changes, age when stress inflicted etc) that may operate during childhood stress and neglect, the human model is very problematic because of this heterogeneity. There do not appear to be good parallels between the mouse model and the human cohort.

      2) Bone cell proliferation and differentiation are proposed to be affected in the mouse model. Proliferation can be directly measured in many ways and should be formally tested. Similarly, the stage of osteoblast differentiation can be easily assessed by PCR with well-validated gene markers of early vs late differentiation. The hypothesis proposed in line 140 can be directly tested.

      3) What is the significance of the increased innervation that is reported in Figure 1 and the reduced neuronal receptor expression in the next figure? It would make sense that more nerve growth would lead to greater receptor expression. Is it also unexpected that NGF2 levels are so low when there is increased nerve innervation to the bone in MSUS mice?

      4) The authors propose a 'catabolic shift' in bone in the MSUS mice. A few unusual things have been reported in this regard. Most researchers would not consider osteoprotegerin a matrix gene (line 159). Furthermore, changes in osteocalcin, osteopontin and sclerostin mRNA would not be the most sensitive markers for the proposed catabolic shift. The proteins encoded by these genes are in the matrix, but they are the products of osteoblasts and osteocytes, and the bone formation marker P1NP is, per the authors, unchanged in the mice. It is CTX that is elevated, and more sensitive gene markers for a catabolic shift would perhaps be RANK-ligand, mCSF and osteoclastic genes.

      5) The Descriptive Result for the Human Study (line 172-184) is very difficult to follow. Many more key demographic, biochemical, and clinical characteristics of the human study populations need to be provided. The paper uses a wide age range of patients (18-65 years). Therefore some of the subjects will have gone through menopause and others who may not yet have reached peak BMD. This introduces a great deal of heterogeneity into the population being studied.

      6) What was the exposure to, and duration of use of, SSRIs in the population? These medications are implicated in reduced BMD and increased fracture rates in some studies.

      7) DXA results: (a) What site in the hip DXA is "H" or "collum femoris"? (b) One would have suspected that the total hip BMD and femoral neck BMD would have aligned with the results for the greater trochanter BMD, as shown in Table 1. Yet the 3 sites in the hip do not align. This suggests a weak relationship. (c) Lines 179-181, it seems that only ~33 subjects were included in the DXA studies. Given the heterogeneity of the population being studied in key parameters - age, sex etc - this would be an extremely small number to break into 4 groups as in Table 1, run statistical testing on, and report out on BMD results. This is a very under-powered study. BMD varies with age, sex, ethnicity, body size. Such characteristics need to be controlled to tease out an effect of childhood trauma and depression on bone.

      8) Micro CT data in the MSUS mice are driven by effects on body weight, and these data do not support a direct effect of postnatal stress on the bone itself.

      9) The human cohort needs to be better defined and described. It likely should not cover such a wide age range (18-65 years). Drug therapies for depression and their duration should be specified to compare the groups. A thorough medical assessment needs to be done on these subjects with screening labs and a basic screening medical history and physical examination. Many disorders known to affect bone could be missed (e.g., menopause, liver or kidney disease, etc). Alcohol consumption needs to be explored and clearly reported as well as the amount of smoking since both habits affect bone parameters.

    2. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      The authors examine the robustness of coupling of distinct oscillatory circuits of different frequencies across a range of temperatures. The two circuits have different means of generating oscillations and could therefore, potentially, be impacted to different degrees by temperature perturbations. Across all temperatures tested the two distinct rhythms increased their frequency but remained coordinated. The coordination was in the form of the previously-described integer-coupling where the cycle period of the slow rhythm was an integer multiple of that of the fast one. This is due to the fact that the slower rhythm was most likely to start at a given phase within the faster oscillation cycle. The temperature robustness of this coupling is an interesting and important result and the description and analysis are both well done.
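The integer-coupling criterion described above (the slow cycle period being an integer multiple of the fast one) can be illustrated with a minimal sketch. It assumes cycle periods have already been extracted from the recordings; the function name and the example periods are hypothetical illustration values, not data from the study.

```python
def integer_coupling(gastric_period_s, pyloric_period_s):
    """Return (nearest integer ratio, absolute deviation from that integer).

    A deviation near 0 means the slow (gastric) cycle spans a whole number
    of fast (pyloric) cycles; a deviation near 0.5 means it falls midway.
    """
    ratio = gastric_period_s / pyloric_period_s
    nearest = round(ratio)
    return nearest, abs(ratio - nearest)


# Made-up periods (s) at a cooler and a warmer temperature: both rhythms
# speed up, yet the period ratio can remain close to an integer.
for gastric, pyloric in [(10.2, 1.02), (5.0, 0.62)]:
    n, dev = integer_coupling(gastric, pyloric)
    print(n, round(dev, 3))
```

Tracking the deviation rather than the raw ratio separates "which integer" from "how tightly locked", which is the property whose temperature robustness the reviews discuss.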

      Major comments:

      The main finding of the paper is that a previously-described integer-coupling between two rhythms remains more or less intact across temperature variations. It is a nice descriptive finding, but rather disappointing in that there is so much more that could have been done rather easily that would have given much more depth to this finding. Most obviously, because it is known that the source of the coupling is the inhibitory synapse from the pyloric pacemaker to the gastric mill half-center, it is quite important to know how the strength of this synapse affects the interaction at different temperatures. That is, to expand what Bartos et al 1999 did across a range of temperatures. Short of that, it would have been nice at least to perturb the cycle period of the pyloric rhythm and see whether the interaction would remain robust across temperature despite changes in cycle period.

      While the study convinces the reader that integer coupling between pyloric and evoked gastric rhythms is robust to temperature changes, it does not attempt to explore the origin of this robustness, e.g. by using different methods to activate the gastric rhythm or testing if integer coupling is present with spontaneous gastric rhythms.

    2. Reviewer #2:

      In the present paper, Powell and colleagues investigated how coupled oscillatory circuits maintain their coordination over a wide range of temperature. To do so they used the stomatogastric system of the crab Cancer borealis that contains the fast (1Hz) pyloric network and the slow (0.1 Hz) gastric mill network. The two generated rhythms are coordinated such that there are an integer number of pyloric cycles per gastric cycle. Both rhythms exhibit temperature-induced frequency changes, but their coordination is well maintained even at high temperature. Therefore, this study shows that the relative coordination between rhythmic circuits can be maintained as temperature changes, thus ensuring appropriate physiological functions even under global perturbations.

      This study, which uses a fantastic model for investigating neural networks in general, addresses an important physiological question. However, I have a few concerns that could probably be clarified with some additional explanations in the text:

      -While the intrinsic temperature sensitivity of the pyloric rhythm has been nicely investigated in some previous excellent publications (most by the authors), that of the gastric rhythm is less well known. Stadele has shown that increasing the temperature leads to a breakdown of the gastric rhythm that can be rescued by modulatory afferents. What do we know about the temperature sensitivity of the afferent neurons that are stimulated to trigger the gastric rhythm here? Is there the possibility that what is observed also includes an effect of the temperature changes on these neurons (MCN1 function for example), or that the gastric temperature sensitivity described here in fact reflects that of the afferents?

      -All experiments were performed in conditions in which the gastric rhythm is triggered by stimulation of the two dorsal posterior esophageal nerves (dpons) that contain axons of modulatory afferent neurons. However, stimulating these nerves also modulates the pyloric network, which is also a target of those afferents (as stated in the text, lines 583-584). Isn't this a bias in the experiments and their interpretation? Also, because, as schematically represented in Fig 1, the pyloric pacemaker neuron AB has direct connections with the Int1 gastric neuron, which is itself connected to the LG gastric neuron, the simplest interpretation of the experiments would be that this connection is preserved and remains efficient even at high temperature. Is this ultimately one of the conclusions of the paper?

      -In the same vein, the sensitivity of the gastric rhythm to temperature changes has been studied here, but with the pyloric network, itself intrinsically sensitive to temperature changes, still active (Fig 3 and related text). What do we know about the intrinsic temperature sensitivity of the gastric rhythm when elicited by dpons stimulation but isolated from the pyloric network (AB neuron killed, for example)?

      -Data presented here show that coordination between PD and LG neurons is preserved after a temperature increase, but that this is not the case between PD and the DG neuron, which shows no phase coupling at high temperature (Fig 6). The PD neurons are used here as an indicator of the pyloric rhythm, while the LG neurons indicate the gastric rhythm. What would the authors conclude if the DG neuron had been used as the gastric rhythm indicator? How do you reconcile these observations?

    3. Reviewer #1:

      Powell and colleagues measured coordination robustness between pyloric and gastric rhythms in in vitro preparations of Cancer borealis exposed to temperature variations (7-23 °C). Using extracellular recordings, they first show that spontaneous rhythms are not stable, likely resulting from multiple physiological processes that are difficult to monitor. Therefore, they instead used bouts of activity reproducibly evoked by stimulation of a neuromodulatory pathway. As expected, cold temperatures slowed down both rhythms and warm temperatures accelerated them in a similar manner. Despite this variation in rhythm frequency across temperatures, coordination between pyloric and gastric rhythms was stable. This suggested that the activity of rhythmogenic neurons is coordinated across temperatures. Powell and colleagues also found that the gastric Lateral Gastric motor neuron (LG) was phase-locked with the Pyloric Dilatator neuron (PD), suggesting they may be involved in coordination robustness.

      The originality of the study is that the authors focused on the coordination of the pyloric (1 Hz) and gastric (0.1 Hz) networks. A large quantity of raw data is beautifully illustrated. The data analysis is sophisticated and convincingly supports the authors' interpretations. The text is exquisitely written in a clear style and pleasant to read. In my view, the study contains the first experiments of a potentially exceptionally interesting study, once more mechanistic insights are added. To further strengthen the relevance of the study, I would suggest pursuing one of the three options below to uncover the mechanisms underlying the effects described. 1) Could the authors design causality-based experiments to identify which neuron is responsible for the coordination of the rhythms at different temperatures? There are many interconnected neurons in Figure 1C. Even if LG is phase-locked to PD, is it possible that another neuron drives both PD and LG? If PD controls LG, would it be informative to reversibly switch off PD (e.g. with tonic hyperpolarisation) and observe the effect on gastric rhythm frequency at various temperatures?

      2) Could the authors use pharmacological tools to determine whether distinct neuromodulatory substances influence coordination robustness over specific temperature ranges but not others? Städele et al. 2015 PLoS Biol 13(9):e1002265 appear to have used a different way to evoke the rhythm, and their gastric rhythm crashed at a lower temperature (13 °C) than in the present study (27 °C). Do the authors think that the different stimulation approaches used in the two studies could recruit different neuromodulatory substances, resulting in different robustness profiles?

      3) Do the same intrinsic properties or synaptic connections underlie coordination robustness across temperatures? Modeling suggests that different conductances are involved in a temperature-dependent manner (Alonso and Marder 2020 eLife 9:e55470). Could the authors experimentally deactivate specific conductances in LG or PD, using dynamic clamp or pharmacological tools, and determine whether this reversibly disrupts the coordination between the pyloric and gastric networks in some temperature ranges but not in others?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This study addresses an important question about the physiology of coupled oscillatory neuronal networks operating over a wide range of temperatures. The stomatogastric system of the crab Cancer borealis contains the fast (~1 Hz) pyloric network and the slow (~0.1 Hz) gastric mill network. The two generated rhythms are coordinated so that there is a given number of pyloric cycles per gastric cycle. Powell and colleagues show that, upon stimulation of a neuromodulatory pathway, these coupled oscillatory circuits exhibit reproducible bouts of activity and maintain their coordination, and that this coordination is maintained over a wide range of temperatures, thus ensuring appropriate physiological function even under global perturbations. The authors show that the gastric Lateral Gastric motor neuron (LG) is phase-locked with the Pyloric Dilator neuron (PD), suggesting that these neurons may be involved in coordination robustness.
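
      The notion of "a given number of pyloric cycles per gastric cycle" can be sketched as a count of PD burst onsets falling between consecutive LG burst onsets. This is an illustrative sketch, not the study's method; it assumes onset times are already extracted, and the function name and toy timings are hypothetical.

```python
import numpy as np

def pyloric_cycles_per_gastric_cycle(pd_onsets, lg_onsets):
    """Number of PD burst onsets in each interval [lg[k], lg[k+1]).

    Each interval between consecutive LG onsets is treated as one gastric cycle.
    """
    pd = np.sort(np.asarray(pd_onsets, dtype=float))
    lg = np.sort(np.asarray(lg_onsets, dtype=float))
    # searchsorted gives, for each LG onset, how many PD onsets precede it;
    # differences between consecutive LG onsets are the per-cycle counts.
    before = np.searchsorted(pd, lg, side="left")
    return np.diff(before)

# Toy data (hypothetical): 1 Hz pyloric, 0.1 Hz gastric rhythm.
print(pyloric_cycles_per_gastric_cycle(np.arange(0.0, 100.0, 1.0),
                                       np.arange(0.5, 100.0, 10.0)))
```

If warming shortens both periods by the same factor, the counts are unchanged, which is one way to express coordination that is robust to temperature even though both frequencies change.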

    1. Author Response

      Summary:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported by X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work that pushes these limits even further, reporting a ribosome cryo-EM reconstruction with an overall resolution of 2 Å, and even better than that in the best areas of the map. The achieved resolution is impressive, and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these could be better developed. Instead, the usage of map-to-model Fourier shell correlation (already known in the field) is stressed to estimate the resolution, but it is not clear what the advantage is here, as the values are the same when estimated from half-map FSCs. Therefore, it is suggested that the discussion of the model-to-map FSC be toned down considerably (or even removed), and that more information about the new findings in the map be added, along the lines of the comments below.

      We thank the reviewers for their interest in this work, and for their helpful comments on the first version of the manuscript. We provide responses to the individual points below.

      Reviewer #1:

      This paper describes a 2 Å cryo-EM reconstruction of the E. coli 70S ribosome. This structure represents the highest resolution ribosome structure, by any method, available thus far and highlights interesting modifications that were not possible to see in previous structures. I'll let the ribosome experts comment on the relevance of these and focus my review on the cryo-EM technical parts. The paper is clearly written and the figures are informative and beautiful.

      The first author is particularly gratified that the figures were well received.

      Major comments:

      1) The authors make a big deal out of resolution assessment by model-to-map FSCs. It is unclear to me why they do this. First of all, model-to-map FSC is not a new resolution measure: it is in widespread use already. Second, it is unclear why the authors are so forceful in stating that it is better than the half-map FSC. They say "While map-to-model FSC carries intrinsic bias from the model's dependence on the map, in a high resolution context it does provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." What additional information does it provide? I would say it only provides true additional information if the atomic model comes from another experiment! In the way it is used here: by refining the model inside the very same map, there is a danger of increasing model-to-map FSC values through overfitting of the model (see also below). This danger is not recognized enough in the text (it is only hinted at in the sentence above), and overfitting is not measured explicitly for this case. Yes, half-map FSC measures self-consistency, but in practical terms (when done right!), this doesn't matter for the resolution estimate. The same is true for model-to-map FSCs: when done right they convey the right information, but the danger of self-consistency (through overfitting) also exists here. As the paper is mainly about the high-resolution ribosome structure, and no proper evaluation of the relative merits of half-map FSC versus model-to-map FSC is performed, I would suggest that the authors remove (or at least tone down considerably) their statements about resolution assessment from the manuscript.
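
      For readers less familiar with the metric under discussion, the sketch below computes a Fourier shell correlation between two cubic maps, which applies equally to two half-maps or to an experimental map and a map simulated from the model. It is a bare illustration under stated assumptions (unmasked cubic maps, integer-radius shells, no phase randomization or masking corrections), not the implementation used by the authors or by any refinement package. The 0.143 (half-map) and 0.5 (map-to-model) thresholds mentioned below are commonly used conventions.

```python
import numpy as np

def fsc(map1, map2, voxel_size):
    """Fourier shell correlation between two cubic 3D maps of equal shape.

    Returns (spatial frequency in 1/Å per shell, FSC per shell).
    Shell 0 is the DC term and is normally ignored.
    """
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    k = np.fft.fftfreq(n)  # cycles per voxel along each axis
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2) * n).astype(int)
    n_shells = n // 2
    keep = shell < n_shells  # drop corner voxels beyond the last full shell
    idx = shell[keep]
    num = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep], minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)
    curve = num / np.sqrt(p1 * p2)
    freq = np.arange(n_shells) / (n * voxel_size)
    return freq, curve

def resolution_at(freq, curve, threshold):
    """Resolution (Å) of the first shell after DC whose FSC falls below threshold."""
    for i in range(1, len(curve)):
        if curve[i] < threshold:
            return 1.0 / freq[i]
    return 1.0 / freq[-1]  # never crossed: report the last shell computed

# Typical use (maps loaded elsewhere):
#   freq, c = fsc(half_map_1, half_map_2, voxel_size); resolution_at(freq, c, 0.143)
#   freq, c = fsc(full_map, model_map, voxel_size);   resolution_at(freq, c, 0.5)
```

The reviewer's point is visible in the code: nothing in the calculation distinguishes a model that explains signal from one that has absorbed noise from the same map, so a high map-to-model FSC alone cannot rule out overfitting.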

      All three reviewers commented on our emphasis on using the map-to-model FSC criterion. We thank the reviewers for pointing out our motivation to discuss FSC metrics was not clear. We agree with the reviewers that the map-to-model FSC metric has been available for some time. However, in the ribosome field, the half-map FSC is still very commonly used as the sole resolution-dependent metric, including in recent literature that we cited (Nürenberg-Goloub, 2020; Tesina, 2020; Stojković, 2020; Pichkur, 2020; Halfon, 2019), as well as in a newer publication (Loveland, 2020, Nature, https://doi.org/10.1038/s41586-020-2447-x ). We mention some of the shortcomings of half-map FSC, which the third reviewer alludes to in their comment on “intense debate” in the field. While it is acknowledged as best practice to examine both maps and models, many visitors to the PDB likely will download only the model. Therefore we find it prudent to communicate confidence in the model resolution and not just the half-maps, particularly in this resolution regime. Again, this is not common in recent ribosome literature, which we will clarify in the Discussion. We will make changes throughout the manuscript to streamline and clarify our discussion of the two metrics, including an additional comparison to a newly released ribosome structure, as detailed below.

      When we discuss “additional information provided by map-to-model FSC,” we recognize that there may be semantic issues with the word “information” as map-to-model FSC depends on the same information content of the maps. However, the map-to-model FSC provides new information about the model quality to the reader. While half-map FSC tells us something about the best model one might achieve, new practical information lies in the authors’ handling of the model, which will vary among individuals (as discussed further below). Furthermore, model refinement procedures leverage well-defined chemical properties (i.e. bond lengths, angles, dihedrals, and steric restraints) that the map “knows” nothing about, which has value for keeping the realism in check. This is also why we originally included the sentence, “Sub-Ångstrom differences in nominal resolution as reported by half-map FSCs have significant bearing on chemical interactions at face value but may lack usefulness if map correlation with the final structural model is not to a similar resolution.” We will rewrite portions of this section for clarity.

      Comparisons to other recent high-resolution cryo-EM ribosome structures show discrepancies between the reported half-map FSC and the map-to-model FSC calculated by us (see the beginning of the section “High-resolution structural features of the 50S ribosomal subunit”), with the map-to-model FSC values indicating lower resolution. These structures report half-map FSCs only, which we could not replicate because the half-maps are unavailable, but we describe our calculation of the map-to-model FSC with their deposited maps. We did not explicitly highlight the comparisons with their reported half-map FSC resolutions in the original manuscript, and we will include further discussion to communicate our point more clearly. We will also include another comparison to the newly released structure by Pichkur et al. (Pichkur, 2020), which became available during the review process and is the closest to our map resolution. The map-to-model FSC with their model and map yields 2.29 Å resolution, while a simple rigid-body fit of our model into their map without further adjustment yields 2.07 Å. This difference highlights the practical insufficiency of focusing only on half-map FSC and the value of our model as a reference for future work.

      2) To test for the presence of overfitting of their atomic models in the maps, the authors should shake up the atomic models and refine them in the first independently refined half-map. The FSC of that model versus that half-map (FSC_work) should be compared with the FSC of that very same model versus the second half-map (FSC_test). Deviations between the two would be an indication of overfitting. If that were to be observed, the weights on the stereochemical restraints should be tightened until the overfitting disappears. The same weighting scheme should then be used for the final model refinement against the sum of the half-maps.
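
      The comparison the reviewer proposes can be sketched as follows. This is a toy illustration only: the model-shaking and refinement against the first half-map would be done in a refinement package and are not shown, so the "model map" here is simply supplied as an array, and the function names and synthetic maps are hypothetical. A model that has absorbed noise from half-map 1 scores higher against it (FSC_work) than against the unseen half-map 2 (FSC_test).

```python
import numpy as np

def fsc_curve(map_a, map_b):
    """FSC per integer Fourier shell (shell 0 = DC) for two cubic maps of equal shape."""
    n = map_a.shape[0]
    fa, fb = np.fft.fftn(map_a), np.fft.fftn(map_b)
    k = np.fft.fftfreq(n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2) * n).astype(int)
    keep = shell < n // 2
    idx = shell[keep]
    num = np.bincount(idx, weights=(fa * np.conj(fb)).real[keep])
    pa = np.bincount(idx, weights=(np.abs(fa) ** 2)[keep])
    pb = np.bincount(idx, weights=(np.abs(fb) ** 2)[keep])
    return num / np.sqrt(pa * pb)

def overfitting_gap(model_map, half1, half2):
    """Mean of FSC_work - FSC_test over non-DC shells.

    model_map is assumed to come from a model refined against half1 only;
    half2 plays the role of the independent test half-map.
    """
    work = fsc_curve(model_map, half1)
    test = fsc_curve(model_map, half2)
    return float(np.mean(work[1:] - test[1:])), work, test

# Synthetic illustration: shared "signal" plus independent noise in each half-map.
rng = np.random.default_rng(1)
signal = rng.standard_normal((32, 32, 32))
half1 = signal + rng.standard_normal(signal.shape)
half2 = signal + rng.standard_normal(signal.shape)
honest_gap, _, _ = overfitting_gap(signal, half1, half2)   # model explains signal only
overfit_gap, _, _ = overfitting_gap(half1, half1, half2)   # model reproduces half1 noise
print(round(honest_gap, 3), round(overfit_gap, 3))
```

In a real test the gap would be inspected shell by shell rather than averaged, since overfitting usually shows up first at high spatial frequency.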

      In lieu of what the reviewers have suggested, we think the additional map-to-model comparison of our model rigid-body docked into the 2.1 Å 50S map by Pichkur et al. provides reasonable evidence that our model suffers from minimal overfitting. Without any additional refinement of our model into their map, the map-to-model FSC resolution is 2.07 Å. We will include the new comparison in the revised manuscript.

      For model refinement, we used default parameters for phenix.real_space_refine, which internally optimizes weights for hundreds of different “chunks” during the refinement. This “black box” aspect does not give us facile control over the weighting scheme. However, we also note that the final model is not “fresh” out of Phenix; rather, the macromolecules have been meticulously reviewed and adjusted manually in Coot, with blurred maps to aid accurate modeling in areas that are less well connected or resolved. Real-space refinement (RSR) in Coot was also required to “stitch” sections of the model together, since the models were refined in multiple focus-refined maps. Further, we think that for models that are ⅔ RNA, manually optimizing the Ramachandran restraints is unlikely to provide much new insight into RSR of this structure.

      3) Figure 1–figure supplement 7: if radiation damage breaks the ribose rings, they should still be intact during early movie frames. This could be investigated by performing per-frame (or per-few-frames) reconstructions. The radiation damage argument would be a lot stronger if the density is present in early frames yet disappears in the later ones. There will of course be a trade-off between dose resolution and achievable spatial resolution, but it may be worth investigating.
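
      Per-few-frames reconstructions start from grouping the movie frames into consecutive blocks and knowing the accumulated dose each block covers. The sketch below does only that bookkeeping; the per-frame dose of 1 e⁻/Å² in the example is an assumption (loosely consistent with the authors' later statement that the first three frames correspond to ~3 electrons/Å²), and the function name is hypothetical.

```python
def dose_blocks(dose_per_frame, frames_per_block):
    """Group consecutive movie frames into blocks and report each block's dose window.

    Returns a list of (first_frame, stop_frame_exclusive, start_dose, end_dose),
    with doses in e-/Å^2 accumulated from the start of the exposure.
    """
    blocks = []
    cumulative = 0.0
    for start in range(0, len(dose_per_frame), frames_per_block):
        stop = min(start + frames_per_block, len(dose_per_frame))
        block_dose = sum(dose_per_frame[start:stop])
        blocks.append((start, stop, cumulative, cumulative + block_dose))
        cumulative += block_dose
    return blocks

# Hypothetical 40-frame movie at 1 e-/Å^2 per frame, reconstructed in blocks of 3 frames.
for first, stop, d0, d1 in dose_blocks([1.0] * 40, 3)[:3]:
    print(f"frames {first}-{stop - 1}: {d0:.0f}-{d1:.0f} e-/Å^2")
```

Each block's particles would then be reconstructed separately, so density that is present in the first block but fades in later ones would support the radiation-damage interpretation.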

      This is a great suggestion, and we have now carried out this analysis. We have performed the early-frame reconstruction and now have an alternative hypothesis that may make more sense. We will include the alternative hypothesis that we are likely seeing disorder due to conformational flexibility in the RNA backbone, rather than radiation damage, which seems unlikely given the features in the early-frame map. We will also update Figure 1–figure supplement 7 with new panels to aid this discussion.

      Reviewer #2:

      The manuscript by Watson et al. presents the structural analysis of a bacterial ribosome at high resolution. The achieved resolution is impressive and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these are missing or not well developed. Instead, the usage of map-to-model Fourier shell correlation (already known in the field) is stressed to estimate the resolution, but it is not clear what this actually brings here as the values are the same when estimated from half-map FSCs. The structure visualizes chemical modifications of ribosomal RNA and amino acids and water molecules, which together are interesting and important. However, here one would expect a comparison with structures of previously analyzed bacterial ribosomes, e.g. E. coli and T. thermophilus, e.g. from the same group and from the work by Fischer et al., Nature 2015: how far are the sites conserved? How do the maps compare? Are the same features seen? It is surprising that the main chemical modifications are not discussed and shown (only summarized in the Suppl. Data). Pseudouridines are mentioned, but how were these identified? It should be mentioned here that, due to their isomeric nature, these can be discussed only from their typical hydrogen-bond pattern. The paper discusses new sites with chemical modifications, but this could benefit from a more thorough discussion of existing biochemical data or from new biochemical characterization. The structural role of these modifications is not much described. The side chain of IAS119 has no density, hence one should be careful in interpreting an isomerization of this residue; I am not sure whether the data allow the conclusions to be made. The same applies to the mSAsp89 residue, for which the density is uncertain, hence it is not clear whether the conclusions stand on safe ground.

      We thank the reviewer for their interest in this work. We addressed our emphasis on the map-to-model FSC in response to reviewer #1.

      For the majority of rRNA modifications, we included the supplementary figure as a reference for comparison to the published 4YBB and 4Y4O maps and models. These modifications have been extensively described in the structural biology literature, including in the recent cryo-EM study of the 50S ribosomal subunit (Stojković, 2020) and warrant no detailed comment by us at this time. Instead, we focused on new features that were not previously observed, such as hypomodifications and new modifications. The new modifications are the isoAsp observed in uS11 and the thioamide modification in uL16.

      IAS119 modeling in uS11: We thoroughly compared Asn and isoAsp modeled at this residue, and will provide additional evidence that isoAsp is correctly modeled at residue 119. In the original maps, although the side chain density is weak, the backbone density is unequivocal. There is clear density for the extra methylene group (marked with an asterisk in Fig. 4A). We have now calculated a map of the 30S subunit using the first three frames in the image stacks, corresponding to a dose of ~3 electrons/Å². In this map, the side chain of isoAsp is more clearly visible (we will include a new figure panel with this density in the supplement). In addition to visual inspection, PHENIX provides a quantitative measure of the fit that also rules out Asn at this position. As we noted in the Methods, “Initial real-space refinement of the 30S subunit against the focused-refined map using PHENIX resulted in a single chiral volume inversion involving the backbone of N119 in ribosomal protein uS11, indicating that the L-amino acid was being forced into a D-amino acid chirality, as reported by phenix.real_space_refine.” Of the 10,564 chiral centers in the 30S subunit, only that for N119 stands out, having an energy residual nearly 2 orders of magnitude larger than the next highest deviation. This stereochemical problem was resolved by modeling isoAsp at this position. We will add these refinement details to the Methods.
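
      The chiral volume referred to here is, in general terms, the signed volume spanned by three substituents around a chiral center; refinement programs restrain it to an ideal value, and an inverted center shows up as a value of the opposite sign. The sketch below computes that signed volume. It is a generic illustration, not PHENIX's implementation: the substituent ordering and the toy coordinates are assumptions, whereas real restraint libraries define the order for each residue type.

```python
import numpy as np

def chiral_volume(center, a, b, c):
    """Signed volume (Å^3) spanned by substituents a, b, c around a chiral center.

    Computed as the scalar triple product (a - center) . ((b - center) x (c - center)).
    Swapping handedness (a mirror image) flips the sign.
    """
    center = np.asarray(center, dtype=float)
    va, vb, vc = (np.asarray(p, dtype=float) - center for p in (a, b, c))
    return float(np.dot(va, np.cross(vb, vc)))

def is_inverted(volume, ideal_volume):
    """True if the observed chiral volume has the opposite sign to the ideal one."""
    return np.sign(volume) != np.sign(ideal_volume)

# Toy coordinates (hypothetical): one handedness and its mirror image.
v = chiral_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
v_mirror = chiral_volume((0, 0, 0), (-1, 0, 0), (0, 1, 0), (0, 0, 1))
print(v, v_mirror, is_inverted(v_mirror, v))
```

A model forced into density that actually belongs to a different backbone geometry can drive one of these volumes through zero, which is consistent with the single outlier the authors describe.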

      Furthermore, as we noted in the manuscript, isoAsp has been identified in E. coli uS11 by biochemical means (see David, 1999). We examined the phylogenetic conservation of the neighboring sequences in uS11, finding that the N is nearly universal in bacteria and organelles, and D is nearly universal in archaea and eukaryotes (Figure 4 and original Figure 4–figure supplement 1). Finally, even in lower-resolution maps of the archaeal and eukaryotic ribosomes, we find that isoAsp better fits the density, visually with respect to the backbone, and quantitatively based on correlations between RSR models and the density (original Figure 4–figure supplement 2). We therefore think we have been careful in interpreting the isoAsp in uS11, structurally, phylogenetically, and in light of available biochemical evidence. We also provided an in-depth analysis of the neighboring 16S/18S rRNA residues that are in intimate contact with the isoAsp119 region of uS11. See Figure 4B and Supplementary Table 2 and accompanying description.

      mSAsp89: Density for mSAsp89 has been seen previously in the X-ray crystal structure of the 70S ribosome (Noeske, 2015). Here, we also see density for mSAsp89 at lower contour levels. See Figure 1–figure supplement 5. We should have noted in the legend of this panel that we used a lower contour level for mSAsp89 and m7G527, to reveal the modifications. This will be added. Notably, at higher contours that still enclose the standard nucleobase and amino acid side chains, we do not see clear density for the mSAsp89 and m7G527 modifications, in Figure 1–figure supplement 6. In the section of the manuscript covering hypomodifications in RNA, we will clarify this point.

      Pseudouridines: We will clarify how pseudouridines are inferred in the main text. These can be inferred if a solvent molecule or other polar atom is within hydrogen-bonding distance of the N3 in pseudouridine (would be C5 in uridine). We will update Figure 1–figure supplement 5 to better show solvent molecules within hydrogen bonding distance of pseudouridine N3 atoms.
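
      The inference described here reduces to a distance check around one atom. The sketch below lists nitrogen and oxygen atoms within a hydrogen-bonding cutoff of a target atom. The 3.5 Å cutoff, the (name, element, xyz) atom tuples, and the coordinates in the example are assumptions for illustration; in practice the coordinates would be read from the deposited model.

```python
import numpy as np

def polar_neighbors(target_xyz, atoms, cutoff=3.5):
    """Return (name, distance) for N/O atoms within cutoff Å of target_xyz, nearest first.

    atoms is an iterable of (name, element, (x, y, z)) tuples.
    """
    t = np.asarray(target_xyz, dtype=float)
    hits = []
    for name, element, xyz in atoms:
        if element not in ("N", "O"):
            continue  # only potential hydrogen-bond partners
        d = float(np.linalg.norm(np.asarray(xyz, dtype=float) - t))
        if d <= cutoff:
            hits.append((name, round(d, 2)))
    return sorted(hits, key=lambda h: h[1])

# Hypothetical atoms around a candidate pseudouridine N3 placed at the origin.
atoms = [
    ("HOH 1 O", "O", (2.8, 0.0, 0.0)),   # water within hydrogen-bonding distance
    ("C4'", "C", (1.5, 0.0, 0.0)),       # carbon: ignored
    ("HOH 2 O", "O", (5.0, 0.0, 0.0)),   # too far
]
print(polar_neighbors((0.0, 0.0, 0.0), atoms))
```

A distance alone does not prove a hydrogen bond, since geometry and the identity of the partner also matter, but it is the kind of evidence the authors propose to show for each pseudouridine site.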

      From a methodological point of view it would be interesting to discuss in more detail how this high resolution structure was obtained, what the specific aspects of high-resolution data collection were and which were the important parameters to refine the structure. Also, how were the thousands of water molecules validated? Regarding the discussion on electrostatic potentials, in contrast to what might be intuitive, the contribution of electron scattering is actually stronger at medium resolution, i.e. its effect does not need high resolution per se. The discussion on radiation damage is a hypothesis at this stage and should be done more carefully including processing of the data using less electron dose (see detailed points below). Taken together, this work describes some interesting findings, but some remain unclear in the discussion because for some no biochemical data are available yet. However, this analysis provides useful hints to design future experiments. Also, there are no developments of tools in this paper in contrast to what is stated.

      We will add some additional information to the Discussion and Methods. In terms of the water molecules, we have not gone through these one by one at this point. We actually do not claim to have introduced new tools, but we note that our water modeling spurred the incorporation of phenix.douse into the latest PHENIX releases. This will be more clearly stated, and we will acknowledge Pavel Afonine for helping us as he developed this functionality. (He indicated we should cite Liebschner, 2019.) Solvent modeling is ripe for future development, as we note in the Discussion.

      Although scattering is stronger at medium resolution, it is not absent at < 2 Å. See the recent atomic-resolution structures of ferritin for examples. In fact, we have now examined the 2.1 Å map deposited by Pichkur et al. (Pichkur, 2020), in which the thioamide is barely visible. The thioamide in the 2.2 Å map deposited by Stojković (Stojković, 2020) is not obviously visible. We will add panels showing this in the revised manuscript.

      We have now used the early frames to address the question of ribose damage and the carboxylate of IAS119 in uS11, as noted above.

      Reviewer #3:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported by X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work that pushes these limits even further, reporting a ribosome cryoEM reconstruction with features compatible with an overall resolution of about 2 Å, and better than that in the best areas of the map. With this level of detail, many fundamental aspects of translation and antibiotic interaction can be interpreted in physicochemical terms, greatly improving our understanding of this key component of bacterial cells. The manuscript is well presented, with clear evidence supporting the authors' claims and interpretations. Especially remarkable is the detailed and accurate handling of the reference list.

      We thank the reviewer for their interest in our work. In the revision, we will keep the references mostly as-is, but will add a few based on the revisions we need to make.

      Major concern:

      There is an intense debate within the cryoEM community regarding which is the best way to estimate the resolution of a cryoEM reconstruction. In this manuscript, the authors claim map-to-model FSC values could "in a high resolution context [...] provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." Regardless of the opinion of this reviewer about this specific point, if a map-to-model FSC is to be used as a claim of "high-resolution" a convincing overfitting test proving the absence of overfitting in the refined model should be presented. Otherwise, map-to-model FSC values may be artificially high due to unrealistic deformation of the model. The authors thus, should prove their refined model is not overfitted.

      This was a concern of all the reviewers, which we addressed above. We think the comparisons to other recent structures, especially the 2.1 Å 50S map by Pichkur et al., make the case for using the map-to-model FSC criterion.

    2. Reviewer #3:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported by X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work that pushes these limits even further, reporting a ribosome cryoEM reconstruction with features compatible with an overall resolution of about 2 Å, and better than that in the best areas of the map. With this level of detail, many fundamental aspects of translation and antibiotic interaction can be interpreted in physicochemical terms, greatly improving our understanding of this key component of bacterial cells. The manuscript is well presented, with clear evidence supporting the authors' claims and interpretations. Especially remarkable is the detailed and accurate handling of the reference list.

      Major concern:

      There is an intense debate within the cryoEM community regarding which is the best way to estimate the resolution of a cryoEM reconstruction. In this manuscript, the authors claim map-to-model FSC values could "in a high resolution context [...] provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." Regardless of the opinion of this reviewer about this specific point, if a map-to-model FSC is to be used as a claim of "high-resolution" a convincing overfitting test proving the absence of overfitting in the refined model should be presented. Otherwise, map-to-model FSC values may be artificially high due to unrealistic deformation of the model. The authors thus, should prove their refined model is not overfitted.

    3. Reviewer #2:

      The manuscript by Watson et al. presents the structural analysis of a bacterial ribosome at high resolution. The achieved resolution is impressive and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these are missing or not well developed. Instead, the usage of map-to-model Fourier shell correlation (already known in the field) is stressed to estimate the resolution, but it is not clear what this actually brings here as the values are the same when estimated from half-map FSCs. The structure visualizes chemical modifications of ribosomal RNA and amino acids and water molecules, which together are interesting and important. However, here one would expect a comparison with structures of previously analyzed bacterial ribosomes, e.g. E. coli and T. thermophilus, e.g. from the same group and from the work by Fischer et al., Nature 2015: how far are the sites conserved? How do the maps compare? Are the same features seen? It is surprising that the main chemical modifications are not discussed and shown (only summarized in the Suppl. Data). Pseudouridines are mentioned, but how were these identified? It should be mentioned here that, due to their isomeric nature, these can be discussed only from their typical hydrogen-bond pattern. The paper discusses new sites with chemical modifications, but this could benefit from a more thorough discussion of existing biochemical data or from new biochemical characterization. The structural role of these modifications is not much described. The side chain of IAS119 has no density, hence one should be careful in interpreting an isomerization of this residue; I am not sure whether the data allow the conclusions to be made. The same applies to the mSAsp89 residue, for which the density is uncertain, hence it is not clear whether the conclusions stand on safe ground.

      From a methodological point of view it would be interesting to discuss in more detail how this high resolution structure was obtained, what the specific aspects of high-resolution data collection were and which were the important parameters to refine the structure. Also, how were the thousands of water molecules validated? Regarding the discussion on electrostatic potentials, in contrast to what might be intuitive, the contribution of electron scattering is actually stronger at medium resolution, i.e. its effect does not need high resolution per se. The discussion on radiation damage is a hypothesis at this stage and should be done more carefully including processing of the data using less electron dose (see detailed points below). Taken together, this work describes some interesting findings, but some remain unclear in the discussion because for some no biochemical data are available yet. However, this analysis provides useful hints to design future experiments. Also, there are no developments of tools in this paper in contrast to what is stated.

      Overall, this work appears to be promising, but it could benefit from clearer explanations, further comparisons with previous structures and clearer formulation of the conclusions drawn. There is indeed a significant level of novelty in this study.

    4. Reviewer #1:

      This paper describes a 2 Å cryo-EM reconstruction of the E. coli 70S ribosome. This structure represents the highest resolution ribosome structure, by any method, available thus far and highlights interesting modifications that were not possible to see in previous structures. I'll let the ribosome experts comment on the relevance of these and focus my review on the cryo-EM technical parts. The paper is clearly written and the figures are informative and beautiful.

      Major comments:

      1) The authors make a big deal out of resolution assessment by model-to-map FSCs. It is unclear to me why they do this. First of all, model-to-map FSC is not a new resolution measure: it is in widespread use already. Second, it is unclear why the authors are so forceful in stating that it is better than the half-map FSC. They say "While map-to-model FSC carries intrinsic bias from the model's dependence on the map, in a high resolution context it does provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." What additional information does it provide? I would say it only provides true additional information if the atomic model comes from another experiment! In the way it is used here: by refining the model inside the very same map, there is a danger of increasing model-to-map FSC values through overfitting of the model (see also below). This danger is not recognized enough in the text (it is only hinted at in the sentence above), and overfitting is not measured explicitly for this case. Yes, half-map FSC measures self-consistency, but in practical terms (when done right!), this doesn't matter for the resolution estimate. The same is true for model-to-map FSCs: when done right they convey the right information, but the danger of self-consistency (through overfitting) also exists here. As the paper is mainly about the high-resolution ribosome structure, and no proper evaluation of the relative merits of half-map FSC versus model-to-map FSC is performed, I would suggest that the authors remove (or at least tone down considerably) their statements about resolution assessment from the manuscript.

      2) To test for overfitting of the atomic models to the maps, the authors should shake up (randomly perturb) the atomic models and refine them against the first independently refined half-map. The FSC between that model and that half-map (FSC_work) should then be compared with the FSC between the same model and the second half-map (FSC_test). A discrepancy between the two would indicate overfitting. If overfitting is observed, the weights on the stereochemical restraints should be tightened until it disappears. The same weighting scheme should then be used for the final model refinement against the sum of the half-maps.

      3) Figure 1-supplement 7: if radiation damage breaks the ribose rings, the rings should still be intact in the early movie frames. This could be investigated by performing per-frame (or per-few-frames) reconstructions. The radiation-damage argument would be much stronger if the density were present in early frames but disappeared in later ones. There will, of course, be a trade-off between dose resolution and achievable spatial resolution, but it may be worth investigating.
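      The FSC_work/FSC_test test in comment 2 reduces to comparing two FSC curves computed for the same model map. A minimal NumPy sketch, assuming the shaken-up model has already been refined against half-map 1 and converted to a density map on the same cubic grid as the half-maps (the shake-up, refinement and map simulation happen outside this snippet). The `fsc` helper is a bare, unmasked implementation with integer-radius shells, and the 0.02 tolerance is an arbitrary placeholder, not a published criterion.

```python
import numpy as np


def fsc(map1, map2):
    """Unmasked Fourier shell correlation of two same-shape cubic maps."""
    n = map1.shape[0]
    # Centre the zero frequency so shell radii are measured from index n // 2.
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    idx = np.arange(n) - n // 2
    kx, ky, kz = np.meshgrid(idx, idx, idx, indexing="ij")
    radius = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    out = np.zeros(n // 2)
    for r in range(n // 2):
        s = radius == r
        num = np.sum(f1[s] * np.conj(f2[s])).real
        den = np.sqrt(np.sum(np.abs(f1[s]) ** 2) * np.sum(np.abs(f2[s]) ** 2))
        out[r] = num / den if den > 0 else 0.0
    return out


def overfitting_check(model_map, half1, half2, tol=0.02):
    """Compare FSC_work (vs half1, used in refinement) with FSC_test (vs half2).

    Returns the per-shell gap FSC_work - FSC_test and whether any shell
    exceeds `tol`. A positive gap means the model agrees better with the
    map it was fitted to than with the independent half-map.
    """
    fsc_work = fsc(model_map, half1)
    fsc_test = fsc(model_map, half2)
    gap = fsc_work - fsc_test
    return gap, bool(np.any(gap > tol))
```

If the check flags a gap, the restraint weight would be tightened, the refinement against half-map 1 repeated, and the check re-run, as described in comment 2.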

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported by X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al. present a landmark work that pushes these limits even further, reporting a ribosome cryo-EM reconstruction with an overall resolution of 2 Å, and even better than that in the best areas of the map. The achieved resolution is impressive, and one therefore expects major findings, methodological highlights and comparisons with previous structures. However, these could be better developed. Instead, the manuscript emphasizes the use of map-to-model Fourier shell correlation (already known in the field) to estimate the resolution, but its advantage here is unclear, as the values are the same as those estimated from half-map FSCs. We therefore suggest that the discussion of the model-to-map FSC be toned down considerably (or even removed), and that more information about the new findings in the map be added, along the lines of the comments below.
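      Both resolution estimates discussed here come from the same calculation; they differ only in which pair of maps is correlated and which threshold is read off the curve (0.143 is the usual criterion for independently refined half-maps; 0.5 is commonly used for model-to-map FSC). A minimal NumPy sketch, assuming cubic maps on a common grid, no masking, and integer-radius shells; `resolution_at` reports the first shell that falls below the threshold rather than interpolating the crossing, which is a simplification.

```python
import numpy as np


def fsc(map1, map2):
    """Fourier shell correlation between two cubic maps of identical shape.

    Returns one value per integer-radius shell, from the origin (DC term)
    up to but excluding Nyquist (n // 2 shells).
    """
    if map1.shape != map2.shape or len(set(map1.shape)) != 1:
        raise ValueError("maps must be cubic and the same shape")
    n = map1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    idx = np.arange(n) - n // 2
    kx, ky, kz = np.meshgrid(idx, idx, idx, indexing="ij")
    radius = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    curve = np.zeros(n // 2)
    for r in range(n // 2):
        shell = radius == r
        num = np.sum(f1[shell] * np.conj(f2[shell])).real
        den = np.sqrt(np.sum(np.abs(f1[shell]) ** 2) * np.sum(np.abs(f2[shell]) ** 2))
        curve[r] = num / den if den > 0 else 0.0
    return curve


def resolution_at(curve, box_size, voxel_size, threshold):
    """Resolution (Å) of the first shell whose FSC drops below `threshold`."""
    for r in range(1, len(curve)):
        if curve[r] < threshold:
            return box_size * voxel_size / r
    return 2 * voxel_size  # never crossed: report Nyquist
```

Passing the two half-maps gives the half-map curve; passing a map simulated from the refined model together with the full map gives the model-to-map curve. The agreement of the two estimates is informative only to the extent that the model has not been overfitted to that same map.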

    1. Reviewer #3:

      In this paper, the authors propose an automated method for subcortical parcellation of the brain, given a set of manual delineations. One of its strongest points is the adoption of a Bayesian approach that combines priors on brain anatomy and on MRI acquisition. These priors are then used to estimate posterior probabilities for each voxel, which, after a series of operations, yield the final subcortical parcellation. The paper appears technically sound, and the proposed method is potentially relevant, given the importance of having reliable tools for subcortical brain delineation, especially in high-resolution datasets.
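      To make the per-voxel Bayesian step concrete, here is a minimal sketch that combines an anatomical prior with an intensity likelihood and takes the maximum a posteriori label. It assumes a Gaussian intensity model per label with known means and standard deviations; this is a generic illustration of Bayes' rule, not the likelihood or the subsequent operations used by the authors, and `voxel_posteriors` is a hypothetical name.

```python
import numpy as np


def voxel_posteriors(intensity, prior, means, stds):
    """Posterior label probabilities per voxel.

    intensity:   (V,) voxel intensities
    prior:       (V, K) prior probability of each of K labels at each voxel
    means, stds: (K,) parameters of an assumed Gaussian intensity model per label
    """
    x = intensity[:, None]  # shape (V, 1), broadcasts against (K,)
    likelihood = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    unnormalised = prior * likelihood  # Bayes' rule, up to a per-voxel constant
    return unnormalised / unnormalised.sum(axis=1, keepdims=True)


post = voxel_posteriors(
    intensity=np.array([0.0, 5.0]),
    prior=np.full((2, 2), 0.5),
    means=np.array([0.0, 5.0]),
    stds=np.array([1.0, 1.0]),
)
labels = post.argmax(axis=1)  # MAP label per voxel
```

With a flat prior, as in this toy usage, the decision is driven entirely by intensity; a spatially varying anatomical prior shifts the posterior toward labels that are plausible at that location.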

      I have some possible concerns and suggestions that might increase the quality of the paper:

      -From Figure 4, it is clear that estimated Dice coefficients decrease with age. As the authors note, this is likely because the priors were built from 10 subjects with an average age of 24.4 years, so the highest predicted performance is seen for subjects whose age range (18-40) lies around this average prior age. I know the authors mention in the paper that they plan to model the effects of age in the priors in future work. However, I wonder whether they could already partially address this question in the current work. Since the data used to test this age bias have already been manually delineated, could the authors generate new priors from this set of delineations, including subjects of all ages, and test whether the predicted Dice coefficients still depend on age, in the same way as in Figure 4?

      -Automated methods are usually sensitive to the number of subjects used to build the parcellation, with results from a larger training cohort being potentially more robust and generalizable. As noted earlier, I think one of the strongest points of the method presented in this paper is its Bayesian approach, which usually works well with small sample sizes and allows previous results to be updated as new data arrive. Still, I think it would be highly illustrative to show how the performance of the current method depends on the initial training size. Using the same set of delineations of the 105 subjects used to test the age bias, could the authors show the predicted performance when the priors are generated from training sets of varying size?

      -What value is used for the scale parameter delta that appears in the priors? Is it a free parameter? If so, do the results change when this parameter varies?
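      The three suggestions above share a small evaluation harness. The sketch below is one way to organise it, under stated assumptions: binary masks per structure; Pearson correlation as the age-dependence summary (a rank correlation or an age-adjusted regression would be equally reasonable); a majority-vote average of training masks standing in for the authors' Bayesian priors in the training-size experiment; and a hypothetical `evaluate(delta)` callable for the delta sweep. All function names are illustrative.

```python
import numpy as np


def dice(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def age_dependence(ages, dice_scores):
    """Pearson correlation between subject age and Dice (first comment)."""
    return float(np.corrcoef(ages, dice_scores)[0, 1])


def learning_curve(masks, sizes, n_repeats=10, seed=0):
    """Mean held-out Dice per training-set size (second comment).

    masks: (n_subjects, ...) binary masks of one structure. The prior is the
    voxelwise label frequency over the training subset, thresholded at 0.5:
    a crude majority-vote stand-in, not the authors' Bayesian model.
    """
    masks = np.asarray(masks, dtype=bool)
    rng = np.random.default_rng(seed)
    n = masks.shape[0]
    curve = {}
    for size in sizes:
        if not 0 < size < n:
            raise ValueError("each size must leave at least one test subject")
        scores = []
        for _ in range(n_repeats):
            perm = rng.permutation(n)
            train, test = perm[:size], perm[size:]
            prediction = masks[train].mean(axis=0) >= 0.5
            scores.extend(dice(prediction, masks[j]) for j in test)
        curve[size] = float(np.mean(scores))
    return curve


def sensitivity_sweep(evaluate, deltas):
    """Score at each delta and the max-min spread (third comment).

    `evaluate` is a hypothetical callable: build priors with this delta,
    parcellate, and return a summary score such as mean Dice.
    """
    scores = {d: float(evaluate(d)) for d in deltas}
    values = np.array(list(scores.values()))
    return scores, float(values.max() - values.min())
```

For the training-size question, one might run `learning_curve` on the 105 delineated subjects with sizes such as 5, 10, 20 and 40 and check where the held-out Dice plateaus. For delta, a log-spaced grid is a natural default for a scale parameter.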