  1. Jul 2024
    1. eLife assessment

      This important study, characterizing the epigenetic and transcriptomic response of a variety of cell types representative of somatic, germline, and pluripotent cells to BPS, reveals the cell type-specific changes in DNA methylation and the relationship with the genome sequence. The findings are convincing and provide a basis for future analyses in vivo. This work should be of interest to biomedical researchers who work on epigenetic reprogramming and epigenetic inheritance.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewing editor’s list of items remaining to be addressed followed by our responses/actions:

      (1) The order and organization of supplemental figures and tables is almost impossible to navigate. Please put them in order. 

      All the sections from the previous Supplementary files have been divided into individual Supplementary files so that each can be referenced without confusion from the text. All of the references in the body of the text and the author responses have been updated to reflect this change.

      (2) The question of sample sizes was partially addressed, with authors stating that cell culture work in iPSCs and PGCLCs was done in replicates of 3. Sertoli and granulosa cells were generated from pooled preps - how many individuals, were they littermates? 

      Sertoli and granulosa primary cultures were generated from littermates and each prep used 5 animals (males for Sertoli cells and females for granulosa cells). These changes have been added to the body of the text on pages 39 and 40.

      (3) Authors need to discuss the limitations of doing work in triplicates. Their PCA (Supplement Figure 9) reveals that in several cases samples from the same treatment were not discriminated by PC1 and/or PC2. This is especially true in e and f, the variance of which was explained by PC1 for cell type, but for which treatments showed poor discrimination by PC2. Some discussion of the limitations of sample size should be provided.

      Additional text has been added to what is now Supplementary file 15 to acknowledge this limitation imposed by the limited number of replicates (three) and the ability to resolve the differences in treatments by PCA in subplots e and f. However, we also note that the differences were sufficient to identify significant DMCs/DMRs/DEGs.

      Reviewer 2 also noted a potential weakness that “exposures are more complicated in a whole organism than in an isolated cell line.”

      We note that in our revised manuscript we included wording noting that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      Reviewer #1 (Public Review): 

      Critiques/Comments: 

      (1) A problem with in vitro work is that homogeneous cell lines/cultures are, by nature, absent from the rest of the microenvironment. The authors need to discuss this. 

      [Addressed on pages: 24-25] – We have added two sentences to the second paragraph of the Discussion section in which we now acknowledge this concern, but also point out that in vitro models of this sort also provide an experimental advantage in that they facilitate a deconvolution of the extensive complexity resident within the intact animal. Nevertheless, we acknowledge that this deconvolution requires ultimate validation of findings obtained within an in vitro model system to ensure they accurately recapitulate functions that occur in the intact animal in vivo.

      In response to Reviewer 2’s stated weakness of our study that “The weakness includes the fact that exposures are more complicated in a whole organism than in an isolated cell line,” please note that this added text includes the statement that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      (2) What are n's/replicates for each study? Were the same or different samples used to generate the data for RNA sequencing, methylation beadchip analysis, and EM-seq? This clarification is important because if the same cultures were used, this would allow comparisons and correlations within samples.  

      Addressed on pages: 39-45 and in new Supplementary file 15 – Additional text has been added in the Methods section to indicate that all samples involving cell culture models which include iPSCs and PGCLCs came from a single XY iPS cell line aliquoted into replicates and all primary cultures which included Sertoli and granulosa cells were generated from pooled tissue preps from mice and then aliquoted into replicates. Finally, all experiments in the study were performed on three replicates. Because this experimental design did indeed allow for comparisons among samples, we have added a new Supplementary file 15 which displays PCA plots showing clustering among control and treatment datasets, respectively, as well as distinctions between each cluster representing each experimental condition.

      (3) In Figure 1, it is interesting that the 50 uM BPS dose mainly resulted in hypermethylation whereas 100 uM appears to be mainly hypomethylation. (This is based on the subjective appearance of graphs). The authors should discuss and/or present these data more quantitatively. For example, what percentage of changes were hypo/hypermethylation for each treatment? How many DMRs did each dose induce? For the RNA-seq results, again, what were the number of up/down-regulated genes for each dose?  

      Addressed on pages: 6-7 and in new Supplementary files 1-3  – The experiment shown in Figure 1 was designed to 1) serve as proof of principle that cells maintained in culture could be susceptible to EDC-induced epimutagenesis at all, 2) determine if any response observed would be dose-dependent, and 3) identify a minimally effective dose of BPS to be used for the remaining experiments in this study (which we identified as 1 μM). We agree that it is interesting that the 50 µM dose of BPS induced predominantly hypermethylation changes whereas the 1 µM and 100 µM doses induced predominantly hypomethylation changes, but are not in a position to offer a mechanistic explanation for this outcome at this time. As the results shown satisfied our primary objectives of demonstrating that exposure of cells in culture to BPS could indeed induce DNA methylation epimutations, that this occurs in a dose-dependent manner, and that a dose of as low as 1 µM of BPS was sufficient to induce epimutagenesis, the data obtained satisfied all of the initial objectives of this experiment. That said, in response to the reviewer’s request we have now added text on pages 6-7 alluding to new Supplementary files 1-3 indicating the total number of DMCs and DMRs, as well as the number of DEGs, detected in response to exposure to each dose of BPS shown in Figure 1, as well as stratifying those results to indicate the numbers of hyper- and hypomethylation epimutations and up- and down-regulated DEGs induced in response to each dose of BPS. While, as noted above, investigating the mechanistic basis for the difference in responses induced by the 50 µM versus 1 and 100 µM doses of BPS was beyond the scope of the study presented in this manuscript, we do find this result reminiscent of the “U-shaped” response curves often observed in toxicology studies. Importantly, this result does demonstrate the elevated resolution and specificity of analysis facilitated by our in vitro cell culture model system.

      (4) Also in Figure 1, were there DMRs or genes in common across the doses? How did DMRs relate to gene expression results? This would be informative in verifying or refuting expectations that greater methylation is often associated with decreased gene expression.  

      Addressed on pages: 6-7 and new Supplementary files 1-6 – In general, we observed a coincidence between changes in DNA methylation and changes in gene expression (Supplementary files 1-3). Pertaining directly to the reviewer’s question about the extent to which we observed common DMRs and DEGs across all doses, while we only found 3 overlapping DMRs conserved across all doses tested, we did find an average of 51.25% overlap in DMCs and an average of 80.45% overlap in DEGs across iPSCs exposed to the different doses of BPS shown in Figure 1. In addition, within each dose of BPS tested in iPSCs, we also found that there was an overlap between DMCs and the promoters or gene bodies of many DEGs (Supplementary file 5). Specifically within gene promoters, we observed a correlation between hypermethylated DMCs and decreased gene expression and hypomethylated DMCs and increased gene expression, respectively (Supplementary file 6).
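
      The overlap figures quoted above depend on how "overlap" is defined, which the response does not spell out. The following is a minimal sketch under one assumed definition (shared items as a percentage of the smaller set, averaged over all pairs of conditions). The function names and probe identifiers are hypothetical and are not data from the study.

```python
from itertools import combinations


def percent_overlap(a, b):
    """Shared items as a percentage of the smaller of the two sets (assumed definition)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return 100.0 * len(a & b) / smaller


def mean_pairwise_overlap(sets_by_condition):
    """Average percent_overlap over every pair of conditions."""
    pairs = list(combinations(sets_by_condition.values(), 2))
    return sum(percent_overlap(a, b) for a, b in pairs) / len(pairs)


# Toy probe identifiers for three doses; purely illustrative.
dmcs = {
    "1uM": {"cg01", "cg02", "cg03", "cg04"},
    "50uM": {"cg02", "cg03", "cg05", "cg06"},
    "100uM": {"cg03", "cg04", "cg06", "cg07"},
}

# Items present at every dose, the analogue of "conserved across all doses".
shared_by_all = set.intersection(*dmcs.values())
average_overlap = mean_pairwise_overlap(dmcs)
```

      A different denominator (for example the union of the two sets) would give lower percentages for the same data, so the definition should be stated alongside any reported value.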

      (5) In Figure 2, was there an overlap in the hypo- and/or hyper-methylated DMCs? Please also add more description of the data in 2b to the legend including what the dot sizes/colors mean, etc. Some readers (including me) may not be familiar with this type of data presentation. Some of this comes up in Figure 4, so perhaps allude to this earlier on, or show these data earlier.  

      Addressed on pages: 8-9 and new Supplementary file 4 – We observed an average of 11.05% overlapping DMCs between different pairs of cell types, but we did not observe any DMCs that were shared among all four cell types. Indeed, this limited overlap of DMCs among different cell types exposed to BPS was the primary motivation for the analysis described in Figure 2. Thus, instead of focusing solely on direct overlap between specific DMCs, we instead examined similarities among the different cell types tested in the occurrence of epimutations within different annotated genomic regions. To better describe this, we have now added additional text to page 9. We have also added more detail to the legend for Figure 2 on page 8 to more clearly explain the significance of the dot sizes and colors, explaining that the dot sizes are indicative of the relative number of differentially methylated probes that were detected within each specific annotated genomic region, and that the dot colors are indicative of the calculated enrichment score reflecting the relative abundance of epimutations occurring within a specific annotated genomic region. The relative score is calculated by iterating down the list of DMCs and increasing a running-sum statistic when encountering a DMC within the specific annotated genomic region of interest and decreasing the sum when the epimutation is not in that annotated region. The magnitude of the increment depends upon the relative occurrence of DMCs within a specific annotated genomic region.
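
      The running-sum statistic described above resembles the enrichment score used in gene set enrichment methods. The sketch below shows one unweighted variant, assuming a step of 1/(number of hits) upward and 1/(number of misses) downward, with the score taken as the largest deviation from zero. The authors' exact weighting is not given in the response, so this is illustrative only, and the function name is hypothetical.

```python
def enrichment_score(ranked_dmcs, in_region):
    """Walk down a ranked DMC list and return the maximum deviation of a running sum.

    ranked_dmcs: DMC identifiers in ranked order.
    in_region:   set of identifiers that fall in the annotated region of interest.
    """
    n_hits = sum(1 for d in ranked_dmcs if d in in_region)
    n_miss = len(ranked_dmcs) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # the score is undefined without both hits and misses

    step_up = 1.0 / n_hits      # increment when a DMC lies in the region
    step_down = 1.0 / n_miss    # decrement when it does not

    running = 0.0
    best = 0.0
    for d in ranked_dmcs:
        running += step_up if d in in_region else -step_down
        if abs(running) > abs(best):
            best = running
    return best
```

      With this scaling the running sum always returns to zero at the end of the list, so a score near +1 means the region's DMCs cluster at the top of the ranking and a score near -1 means they cluster at the bottom.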

      (6) iPSCs were derived from male mice MEFs, and subsequently used to differentiate into PGCLCs. The only cell type from an XX female is the granulosa cells. This might be important, and should be mentioned and its potential significance discussed (briefly).  

      Addressed on page: 29 – We have added a new paragraph just before the final paragraph of the Discussion section in which we acknowledge that most of the cell types analyzed during our study were XY-bearing “male” cells and that the manner in which XX-bearing “female” cells might respond to similar exposures could differ from the responses we observed in XY cells. However, we also noted that our assessment of XX-bearing granulosa cells yielded results very similar to those seen in XY Sertoli cells suggesting that, at least for differentiated somatic cell types, there does not appear to be a significant sex-specific difference in response to exposure to a similar dose of the same EDC. That said, we also acknowledged that in cell types in which dosage compensation based on X-chromosome inactivation is not in place, differences between XY- and XX-bearing cells could accrue.

      (7) EREs are only one type of hormone response element. The authors make the point that other mechanisms of BPS action are independent of canonical endocrine signaling. Would authors please briefly speculate on the possibility that other endocrine pathways including those utilizing AREs or other HREs may play a role? In other words, it may not be endocrine signaling independent. The statement that the differences between PGCLCs and other cells are largely due to the absence of ERs is overly simplistic.  

      Addressed on page: 11 and in a new Supplementary file 8 – Previous reports have indicated that BPS does not have the capacity to bind with the androgen receptor (Pelch et al., 2019; Yang et al., 2024). However, there have been reports indicating that BPS can interact with other endocrine receptors including PPARγ and RXRα, which play a role in lipid accumulation and have the potential to be linked to obesity phenotypes (Gao et al., 2020; Sharma et al., 2018). To address the reviewer’s comment we assessed the expression of a panel of hormone receptors including PPARγ, RXRα, and AR in each of the cell types examined in our study and these results are now shown in a new Supplementary file 8. We show that in addition to not expressing either estrogen receptor (ERα or ERβ), germ cells also do not express any of the other endocrine receptors we tested including AR, PPARγ, and RXRα. Thus we now note that these results support our suggestion that the induction of epimutations we observed in germ cells in response to exposure to BPS appears to reflect disruption of non-canonical endocrine signaling. We also note that non-canonical endocrine signaling is well established (Brenker et al., 2018; Ozgyin et al., 2015; Song et al., 2011; Thomas and Dong, 2006). Thus we feel the suggestion that the effects of BPS exposure could conceivably reflect either disruption of canonical or non-canonical signaling in any cell type is well justified and that our data suggest that both of these effects appear to have accrued in the cells examined in our study as suggested in the text of our manuscript.

      (8) Interpretation of data from the GO analysis is similarly overly simplistic. The pathways identified and discussed (e.g. PI3K/AKT and ubiquitin-like protease pathways) are involved in numerous functions, both endocrine and non-endocrine. Also, are the data shown in Figure 6a from all 4 cell types? I am confused by the heatmap in 6c, which genes were significantly affected by treatment in which cell types?  

      Addressed on pages: 19-21 – Per the reviewer’s request, we have added text to indicate that Figure 6a is indeed data from all four cell types examined. We have also modified the text to further clarify that Figure 6c displays the expression of other G protein-coupled receptors which are expressed at similar, if not higher, levels than either ER in all cell types examined, and that these have been shown to have the potential to bind to either 17β-estradiol or BPA in rat models. As alluded to by the reviewer, this is indicative of a wide variety of distinct pathways and/or functions that can potentially be impacted by exposure to an EDC such as BPS. Thus, we have attempted to acknowledge the reviewer’s primary point that BPS may interact with a variety of receptors or other factors involved with a wide variety of different pathways and functions. Importantly, this illustrates the strength of our model system in that it can be used to identify potential impacted target pathways that can then be subsequently pursued further as deemed appropriate.

      (9) In Figure 7, what were the 138 genes? Any commonalities among them? 

      Addressed on page: 22 and in new Supplementary files 13 and 14 – We have now added a new supplemental Excel file (Supplementary file 13) that lists the 138 overlapping conserved DEGs that did not become reprogrammed/corrected during the transition from iPSCs to PGCLCs. In addition, we have added new text on page 22 and a new Supplementary file 14 which displays KEGG analysis of pathways associated with these 138 retained DEGs. We find that these genes are primarily involved with cell cycle and apoptosis pathways which, interestingly, have the potential to be linked to cancer development which is often linked to disruptions in chromatin architecture.

      (10) The Introduction is very long. The last paragraph, beginning line 105, is a long summary of results and interpretations that better fit in a Discussion section.

      Addressed on page: 6 – We have now significantly reduced the length and scope of the final paragraph of the Introduction per the reviewer’s recommendation.

      (11) Provide some details on husbandry: e.g. were they bred on-site? What food was given, and how was water treated? These questions are to get at efforts to minimize exposure to other chemicals.  

      Addressed on page: 37 – We have added additional text detailing that all mice used in the project were bred onsite, water was non-autoclaved conventional RO water, and our selection of 5V5R extruded feed for mice used in this study which was highly controlled for the presence of isoflavones and has been certified to be used for estrogen-sensitive animal protocols.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript uses cell lines representative of germ line cells, somatic cells, and pluripotent cells to address the question of how the endocrine-disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions whereas those in the germ cell type were predominantly in gene promoters. 

      Strengths: 

      The strengths of the paper include the use of various cell types to address the sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation. 

      Weaknesses: 

      The weaknesses include the lack of reporting of replicates, superficial bioinformatic analysis, and the fact that exposures are more complicated in a whole organism than in an isolated cell line. 

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors. 

      Reviewer #2 (Recommendations For The Authors): 

      Overall, this is an intriguing paper but more transparency in the replicates and methods and a more rigorous bioinformatic treatment of the data are required. 

      Specific comments: 

      (1) End of abstract "These results suggest a unique mechanism by which an EDC-induced epimutated state may be propagated transgenerationally following a single exposure to the causative EDC." This is overly speculative for an abstract. There is only epigenetic inheritance following mitosis or differentiation presented in this study. There is no meiosis and therefore no ability to assess multi- or transgenerational inheritance. 

      Addressed on page: 2 – We have modified the text at the end of the abstract to more precisely reflect our intended conclusions based on our data. In our view, the ability of induced epimutations to transcend meiosis per se is not as relevant to the mechanism of transgenerational inheritance as their ability to transcend major waves of epigenetic reprogramming that normally occur during development of the germ line. In this regard the transition from pluripotent iPSCs to germline PGCLCs has been shown to recapitulate at least the first portion of normal germline reprogramming, and now our data provide novel insight into the fate of induced epimutations during this process. Specifically, we show that a prevalence of epimutations was conserved during the iPSC → germ cell transition but that very few (< 5%) of the specific epimutations present in the BPS-exposed iPSCs were retained when those cells were induced to form PGCLCs. Rather, we observed apparent correction of a large majority of the initially induced epimutations during this transition, but this was accompanied by the apparent de novo generation of novel epimutations in the PGCLCs. We suggest, based on other recent reports in the literature, that this is a result of the BPS exposure inducing changes in the chromatin architecture in the exposed iPSCs such that when the normal germline reprogramming mechanism is imposed on this disrupted chromatin template there is both correction of many existing epimutations and the genesis of many novel epimutations. This observation has the potential to explain the long-standing question of why the prevalence of epimutations persists across multiple generations despite the occurrence of epigenetic reprogramming during each generation. Nevertheless, as noted above, we have modified the text at the end of the abstract to temper this interpretation given that it is still somewhat speculative at this point.

      (2) Doses used in the experiments. One needs to be careful when stating that the dose used is "below FDA's suggested safe environmental level established for BPA" because a different bisphenol is being used here (BPA vs BPS) and the safe level is that which the entire organism experiences. It is likely that cell lines experience a higher effective dose.  

      Addressed on pages: 3, 5, and 26 – We have now made a point of noting that our reference to an EPA-recommended “safe dose” of BPA was for humans and/or intact animals. Changes to this effect have been made in the second and sixth paragraphs of the Introduction section. In addition, we have added text at the end of the fourth paragraph of the Discussion section acknowledging that, as the reviewer suggests, the same dose of an EDC could exert greater effects on cells in a homogeneous culture than on the same cell type within an intact animal given the potential for mitigating metabolic effects in the latter. However, we also note that the ability we demonstrated to quantify the effects of such exposures on the basis of numbers of epimutations (DMCs or DMRs) induced could potentially be used in future studies to study this question by assessing the effects of a specific dose of a specific EDC on a specific cell type when exposed either within a homogeneous culture or within an intact animal.

      (3) Figure 1: In the dose response, what was the overlap in DMCs and DEGs among the 3 doses? Are the responses additive, synergistic, or completely non-overlapping? This is an important point that should be addressed. 

      Addressed on page: 6-7 and in Supplementary files 1-5 – Please see our response to Reviewer 1 critique #4 above where we address similar concerns. While we do find overlap among different cell types with respect to the DMCs, DMRs, and DEGs displayed in Figure 1, we found the effect to be only partially additive as opposed to synergistic in any apparent manner. The fold increase in DMCs, DMRs, and DEGs resulting from exposure to doses of 1 μM or 50 μM ranged from 2.5x to 4.4x, which was well below the 50x increase that would have been expected from a strictly additive effect, and the effect increased even less, if at all, in response to exposure to doses of 50 μM versus 100 μM BPS. Finally, as now noted in the Discussion section on page 25, our conclusion is that these results display a limited dose-dependent effect that was partially additive but also plateaued at the highest doses tested.
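
      The comparison with a 50x expectation can be made explicit with a short calculation. The sketch below reads the passage as comparing the 1 µM and 50 µM doses, which is an interpretation of the wording above, and treats "strictly additive" as meaning that the response scales in proportion to dose. The function name is hypothetical.

```python
def fraction_of_proportional(observed_fold, low_dose_um, high_dose_um):
    """Observed fold change as a fraction of a dose-proportional expectation."""
    expected_fold = high_dose_um / low_dose_um
    return observed_fold / expected_fold


# Doses and fold changes quoted in the response: 1 µM vs 50 µM, 2.5x to 4.4x.
for observed in (2.5, 4.4):
    share = fraction_of_proportional(observed, 1.0, 50.0)
    print(f"{observed}x observed is {share:.1%} of the 50x proportional expectation")
```

      Under these assumptions the observed increases amount to roughly 5% to 9% of what proportional scaling would predict, which is consistent with the authors' description of a limited, plateauing dose response.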

      (4) Methods: How many times was each exposure performed on a given cell type? This information should be in the figure legends and methods. In the case of multiple exposures for a given line, do the biological replicates agree? 

      Addressed on pages: 39-45 and in new Supplementary file 15 –  Please see our response to Reviewer 1 critique #2 where we address similar concerns with newly added text and analysis. We now note repeatedly on pages 39-45 that each analysis was conducted on three replicate samples, and we display the similarity among those replicates graphically in a new Supplementary file 15.

      (5) DNA methylation analyses. Very little analysis is presented on the BeadChip array other than hypermethylated/hypomethylated and genomic regions of DMCs. What is the range of methylation changes? Does it vary between hypo vs. hyper DMCs? How many array experiments were performed (biological replicates) and what stats were used to determine the DMCs? Are there DMCs in common among the various cell types? As an example of more meaningful analysis, one can plot the %5mC over a given array for comparisons between control and treated cell types. For more granularity, the %5mC can be presented according to the element type (enhancers vs promoters). 

      Addressed on pages: 10 and 39-45 and in new Supplementary files 1-5, 15 –  Please see our response to Reviewer 1 critique #2 above where we address similar concerns regarding the number of biological replicates used in this study. DMCs on the Infinium array are identified using mixed linear models. This general supervised learning framework identifies CpG loci at which differential methylation is associated with known control vs. treated co-variates. CpG probes on the array were defined as having differential changes that met both p-value and FDR (≤ 0.05) significance thresholds between treatment and control samples for each cell type analyzed. The range of medians across all samples was 0.0278 to 0.0059 for hypermethylated beta values and -0.0179 to -0.0033 for hypomethylated beta values. As noted above, we did observe an overlap in DMCs between cell types. Thus, we observed an average of 11.05% overlapping DMCs between two or more cell types but we did not observe any DMCs shared between all four cell types. We have added additional text on page 9 and new Supplementary files 1-5 to now more clearly describe that this limited similarity in direct overlap of DMCs was the underlying motivation for the analysis described in Figure 2. Finally, the enrichment dot plots shown in Figure 2 provide the information the reviewer requested regarding the %5mC observed at different annotated genomic element types.
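
      The dual threshold described above (raw p-value and FDR both at or below 0.05) can be illustrated with a small sketch. The response does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the p-values are toy numbers rather than output from the authors' mixed linear models.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_dmcs(pvalues, alpha=0.05):
    """Flag probes whose raw and adjusted p-values are both at or below alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [p <= alpha and q <= alpha for p, q in zip(pvalues, adjusted)]


# Toy per-probe p-values; only the first survives both thresholds.
toy_pvalues = [0.01, 0.04, 0.03, 0.20]
toy_calls = call_dmcs(toy_pvalues)
```

      In this toy case three probes pass the raw p-value cutoff but only one passes after adjustment, which shows why the adjusted threshold is the stricter of the two.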

      (6) The investigators correlate the number of DMCs in a given cell type with the presence of estrogen receptors. Does the correlation extend to the methylation difference (delta beta) at the statistically different probes?

      Addressed in a new Supplementary file 7 – We have added a new Supplementary file 7 in which we provide data addressing this question. In brief, we find that the delta betas of probes enriched at enhancer regions and associated with relative proximity to ERE elements in Sertoli cells, granulosa cells, and iPSCs appear very similar to those associated with DMCs not located within these enriched regions. However, when we compared the similarity of the two data sets with goodness of fit tests, we found these relatively small differences were, in fact, statistically significant based on a two-sample Kolmogorov-Smirnov test. These observed significant differences appear to indicate that there is higher variability among the delta betas associated with hypomethylation, but not hypermethylation, changes occurring at DMCs associated with enhancers, potentially suggesting a greater tendency for exposure to BPS to induce hypomethylation rather than hypermethylation changes, at least in these specific regions.
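
      For readers unfamiliar with the two-sample Kolmogorov-Smirnov test, the statistic is the largest vertical gap between the empirical cumulative distributions of the two samples. The sketch below computes only that statistic in plain Python; the function name is hypothetical and no significance calculation is included. In practice a library routine such as `scipy.stats.ks_2samp` would also return a p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x so ties are handled together.
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d
```

      Because the statistic is sensitive to any difference in distribution shape, with large numbers of probes even visually similar delta-beta distributions can differ significantly, which matches the authors' observation.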

      (7) Methylation changes relative to EREs are presented in multiple figures. Are other sequences enriched in the DMCs? 

      Addressed in a new Supplementary file 11. We profiled the genomic sequence within 500 bp of cell type-specific enriched DMCs that were either associated with enhancer regions in Sertoli, granulosa, or iPS cells or transcription factor binding sites in PGCLCs for the identification of higher abundance motif sequences. We then compared any motifs identified with the JASPAR database to potentially find transcription factors that could be binding to these regions. Interestingly we found that the two most common motifs across all cell types were associated with either the chromatin remodeling transcription factor HMG1A or the pluripotency factor KLF4.
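
      As a rough illustration of the first step described above, the sketch below extracts a window of up to 500 bp on either side of a position and tallies k-mers across windows. This is not the authors' motif-discovery pipeline, which is not specified in the response. The function names, the choice of k, and the idea of ranking raw k-mer counts are assumptions made for illustration.

```python
from collections import Counter


def window(sequence, position, flank=500):
    """Return the subsequence within `flank` bases of a 0-based position."""
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    return sequence[start:end]


def count_kmers(sequences, k=6):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```

      The most frequent k-mers from `count_kmers(...).most_common()` would then be candidates for comparison against a motif database; dedicated motif-discovery tools additionally model background frequencies and degenerate positions, which raw counts do not.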

      (8) Please present a correlation plot between the methylation differences and the adjacent DEGs. Again, the absence of consideration of the absolute changes in methylation and gene expression minimizes the impact of the data. 

      Addressed on pages 6, 7, and 17 and in a new Supplementary file 6 – We analyzed the relationship between DMCs at DEGs promoter regions and the corresponding change in expression of that DEG. Our data support a relationship between up-regulated genes showing decreased methylation in promoter regions and down-regulated genes showing increased methylation at promoter regions, although there were some exceptions to this relationship.
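
      The relationship described above (promoter hypermethylation with down-regulation, hypomethylation with up-regulation) can be summarized as the fraction of DMC and DEG pairs whose signs are opposite. The sketch below is one simple way to tabulate that; the function name and the example values are hypothetical and are not taken from the study.

```python
def inverse_concordance(pairs):
    """Fraction of (delta_beta, log2_fold_change) pairs with opposite signs.

    Pairs in which either value is exactly zero are skipped.
    """
    concordant = 0
    total = 0
    for delta_beta, log2_fc in pairs:
        if delta_beta == 0 or log2_fc == 0:
            continue
        total += 1
        if (delta_beta > 0) != (log2_fc > 0):
            concordant += 1
    return concordant / total if total else 0.0


# Toy promoter DMC / DEG pairs: two follow the expected inverse pattern, two do not.
toy_pairs = [(0.03, -1.2), (-0.02, 0.8), (0.01, 0.5), (-0.01, -0.3)]
toy_fraction = inverse_concordance(toy_pairs)
```

      A value well above 0.5 would support the expected inverse relationship, while the exceptions the authors mention correspond to the pairs counted as non-concordant.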

      (9) EM-Seq is mentioned in Figure 7 and in the material and methods. Where is it used in this study? 

      Addressed on page 22 – We now note in the text on page 22 that EM-seq was used during experiments assessing the propagation of BPS-induced epimutations during the iPSC → EpiLC → PGCLC cell state transitions to gather higher resolution data of changes to DNA methylation differences at the whole-epigenome level.

      References

      Brenker C, Rehfeld A, Schiffer C, Kierzek M, Kaupp UB, Skakkebæk NE, Strünker T. 2018. Synergistic activation of CatSper Ca2+ channels in human sperm by oviductal ligands and endocrine disrupting chemicals. Hum Reprod 33:1915–1923. doi:10.1093/humrep/dey275

      Gao P, Wang L, Yang N, Wen J, Zhao M, Su G, Zhang J, Weng D. 2020. Peroxisome proliferator-activated receptor gamma (PPARγ) activation and metabolism disturbance induced by bisphenol A and its replacement analog bisphenol S using in vitro macrophages and in vivo mouse models. Environ Int 134. doi:10.1016/J.ENVINT.2019.105328

      Ozgyin L, Erdos E, Bojcsuk D, Balint BL. 2015. Nuclear receptors in transgenerational epigenetic inheritance. Prog Biophys Mol Biol. doi:10.1016/j.pbiomolbio.2015.02.012

      Pelch KE, Li Y, Perera L, Thayer KA, Korach KS. 2019. Characterization of Estrogenic and Androgenic Activities for Bisphenol A-like Chemicals (BPs): In Vitro Estrogen and Androgen Receptors Transcriptional Activation, Gene Regulation, and Binding Profiles. Toxicol Sci 172:23–37. doi:10.1093/TOXSCI/KFZ173

      Sharma S, Ahmad S, Khan MF, Parvez S, Raisuddin S. 2018. In silico molecular interaction of bisphenol analogues with human nuclear receptors reveals their stronger affinity vs. classical bisphenol A. Toxicol Mech Methods 28:660–669. doi:10.1080/15376516.2018.1491663

      Song K-H, Lee K, Choi H-S. 2011. Endocrine Disrupter Bisphenol A Induces Orphan Nuclear Receptor Nur77 Gene Expression and Steroidogenesis in Mouse Testicular Leydig Cells. Endocrinology 143:2208–2215. doi:10.1210/endo.143.6.8847

      Thomas P, Dong J. 2006. Binding and activation of the seven-transmembrane estrogen receptor GPR30 by environmental estrogens: A potential novel mechanism of endocrine disruption. J Steroid Biochem Mol Biol 102:175–179. doi:10.1016/j.jsbmb.2006.09.017

      Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. 2024. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 58:2817–2829. doi:10.1021/ACS.EST.3C09779

    1. eLife assessment

      This manuscript provides important results that assessed the contribution of two catecholaminergic projections to the hippocampus during environment-guided reward behavior. The authors use 2-photon imaging in the hippocampus of behaving mice to provide solid evidence that there are dissociable roles of dopamine and norepinephrine in this structure. Although of great interest to the field of learning and memory, the results would be strengthened by additional data collected from dopaminergic projections to the hippocampus.

    2. Reviewer #1 (Public Review):

      Summary:

      Heer and Sheffield used 2-photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water reward at the end of the track and on test days, calcium activity was recorded from dopamine (DA) axons originating in ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and velocity to some extent, while LC input activity remained constant across the track, but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping to reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty vs. behavioral changes induced by novelty by removing periods in which animals were immobile and established that the activity observed in the first 2 laps reflected novelty-induced signal in LC axons.

      The revised manuscript included additional evidence of increased (but transient) signal in LC axons after a transition to a novel environment during periods of immobility, and also that a change from dark to familiar environment induces a peak in LC axon activity, showing that LC input to dCA1 may not solely signal novelty.

      Strengths:

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis at the resolution of single axons. The data analysis is thorough and possible confounding variables and data interpretation are carefully considered.

      The authors have addressed my concerns in a thorough manner. The reviewer also appreciates the increased transparency of reporting in the revised manuscript.

      Weaknesses:

      Listed below are some remaining comments.
      The increase in LC activity with any change in environment (from familiar to novel or from dark to familiar) suggests that LC input acts not solely as a novelty signal, but as a general arousal or salience signal in response to environmental changes. Based on this, I have a couple of questions:

      • Is the overall claim that LC input to the dHC signals novelty still valid based on observed findings - as claimed throughout the manuscript?
      • Would the omission of a reward be considered a salient change in the environment that activates LC signals, or is the LC not involved with processing reward-related information? Has the activity of LC and VTA axons been analysed in the seconds following reward presentation and/or omission?

    3. Reviewer #2 (Public Review):

      Summary:

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.

      The main findings were as follows:
      - In a familiar environment, activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.
      - VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.
      - In contrast, activity of LC axons ramped up before initiation of movement on the Styrofoam wheel.
      - In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC, axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.

      Strengths:

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that activity of VTA and LC axons projecting to dorsal CA1 reflects partly distinct environmental, behavioral and cognitive factors.

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.

      Weaknesses:

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.

      (2) Some aspects of the methodology would benefit from clarification.
      First, to help others to better scrutinize, evaluate and potentially to reproduce the research, the authors may wish to check if their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data (see below, Recommendations for the authors).
      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?
      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Fig. 3a, but as <0.2 cm/s for the imaging data analysis in Fig. 4 (see legends to these figures and also see Methods, from line 447, line 469, line 498)? I do not understand why, and it would be good if the authors explained this.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Fig. 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Fig. 3aIII, top panel). Given that LC and VTA axon activity were both increasing with velocity (Fig. 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.

      REVIEW OF THE REVISED MANUSCRIPT
      I thank the authors for their responses addressing some of the weaknesses I raised in my original comments.

      Regarding their clarification of some methodological issues [Point 2) above], I have a few additional comments:
      - I appreciate that the authors clearly state the sample sizes contributing to the data. However, sample size justifications (e.g. based on previous studies, considerations of statistical power, practical considerations or a combination of these factors) are still lacking.
      - It is good that the authors have now clearly indicated how many mice they excluded due to lack of GCaMP expression or due to failure to reach the behavioral criteria. They also indicated that they discarded some of the collected datasets, based on the visual assessment of imaging sessions and the registration metrics output by suite2p. I appreciate that this may be common practice (although I am not using 2-photon imaging myself). However, I note that to minimize the risk of experimenter bias and improve reproducibility, it would be preferable to have more clearly defined quantitative criteria for such exclusions.
      - The authors clarified in their response why they used two different linear tracks for their studies of VTA and LC axon activity. I would encourage them to include this clarification in the manuscript. From the authors' response, I understand that they chose the different track lengths to facilitate comparison to previous studies involving LC and VTA axon recordings. However, given that the present paper aimed to compare LC and VTA axon recordings, the use of different track lengths for LC and VTA axon recordings remains a limitation of the present paper.

    4. Reviewer #3 (Public Review):

      Summary:

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength to their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches to accommodate their unequal LC-CA1 and VTA-CA1 sample sizes. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of navigation and memory.

      Weaknesses:

      The conclusions of this manuscript are mostly well supported by the data. However, increasing the sample size of the VTA-CA1 group and using experimental methods that are identical among LC-CA1 and VTA-CA1 groups would help to fully support the authors' conclusions.

    1. Reviewer #3 (Public Review):

      Nitta et al. use a fly model of autosomal dominant optic atrophy to provide mechanistic insights into distinct disease-causing OPA1 variants. It has long been hypothesized that missense OPA1 mutations affecting the GTPase domain, which are associated with more severe optic atrophy and extra-ophthalmic neurologic conditions such as sensorineural hearing loss (DOA plus), impart their effects through a dominant negative mechanism, but no clear direct evidence for this exists particularly in an animal model. The authors execute a well-designed study to establish their model, demonstrating a mitochondrial phenotype and optic atrophy measured as axonal degeneration. They leverage this model to provide the first direct evidence for a dominant negative mechanism for 2 mutations causing DOA plus by expressing these variants in the background of a full hOPA1 complement.

      Strengths of the paper include well-motivated objectives and hypotheses, and overall solid design and execution. There is a thorough discussion of the interpretation and context of the findings. The results technically support their primary conclusions with minor limitations. First, while only partial rescue of the most clinically relevant metric for optic atrophy in this model is now acknowledged, the result nevertheless hamstrings the mechanistic experiments that follow. Second, the results statistically support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. In added experiments, the ability of WT hOPA1 and I382M but not 2708del, D438V or R445H to rescue ROS levels or mitophagy in the context of dOPA1 knockdown serves to support axonal number as a valid measure of mitochondrial function in this context. However, the critical experiment demonstrating a dominant negative effect was performed in the context of expressing WT hOPA1 along with a pathogenic variant, in which no differences in ROS, COXII expression or mitophagy were seen. This makes it difficult to conclude that the dominant negative effect of D438V and R445H on axon number is related to mitochondrial function.

      As an animal model of DOA that may serve for rapid assessment of suspected OPA1 variants, the results overall support utility of this model in identifying pathogenic variants but not in distinguishing haploinsufficiency from dominant negative mechanisms among those variants. The impact of this work in providing the first direct evidence of a dominant negative mechanism is understated considering how important this question is in development of genetic treatments for dominant optic atrophy.

      Comments on revised version:

      The authors have addressed the comments in my initial review. Through these modifications and those related to the comments from the other reviewers, the manuscript is strengthened.

      Comments on author responses to each of the reviews:

      Reviewer 1:

      Interpretation of data has been appropriately reorganized in the discussion.

      Quantified mitochondria in the model show no difference in number. There is reduced size and structural abnormalities on electron microscopy.

      Application of mito-QC revealed increased mitophagy.

      Regarding partial rescue of axonal number in the mutant model, statistical significance between control and rescue is still not depicted in Figure 4D. Detailing possible explanations for this has been addressed in the discussion. However, only partial rescue of the most clinically relevant metric for optic atrophy in this model hamstrings the mechanistic experiments that follow.

      Discussion regarding variant I382M has been improved.

      While reviewer 1's concerns about axonal number as a biomarker for OPA1 function are valid, it is worth noting that this is the most clinically relevant marker in the context of DOA. That said, I agree that the mechanistic DN/HI studies needed support using other measures of mitochondrial function, and the authors have done this. The ability of WT hOPA1 and I382M but not 2708del, D438V or R445H to rescue ROS levels or mitophagy in the context of dOPA1 knockdown serves to support axonal number as a valid measure of mitochondrial function in this context. However, the critical experiment demonstrating a dominant negative effect was performed in the context of expressing WT hOPA1 along with a pathogenic variant, in which no differences in ROS, COXII expression or mitophagy were seen. This makes it difficult to conclude that the (marginal) DN effect of D438V and R445H on axon number is related to mitochondrial function, and serves as a minor weakness of the paper.

      Which exons are included in the transcript, and therefore, which isoforms are expressed in the model, has been addressed.

      Reviewer 2:

      The authors have addressed the need to include greater methodological details.

      Language concerning the clinical utility of the model in informing treatment decisions has been appropriately modified. As pointed out by Reviewer 1, additional studies were needed to better establish the potential clinical utility of this model in screening DOA variants. The authors have completed those experiments, and the results overall support utility of this model in identifying pathogenic variants but not in distinguishing HI/DN mechanisms among those variants.

      Reviewer 3:

      The authors have addressed the partial rescue effect as above.

      The authors have not modified the text to acknowledge the marginal effect sizes in the critical experiment of the study that demonstrates a DN effect. Statistically, the results indeed support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. This remains a weakness of the study.

    2. eLife assessment

      This study provides valuable insights into the complex genetics of dominant optic atrophy. Leveraging a fly model, the investigators provide solid evidence, albeit with small effect sizes, for a dominant negative mechanism of certain pathogenic variants that tend to cause more severe phenotypes, a long-held hypothesis in the field. The work is of high interest to those in the optic atrophy and degeneration fields.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Strengths:

      The authors have generated a novel transgenic mouse line to specifically label mature differentiated oligodendrocytes, which is very useful for tracing the final destiny of mature myelinating oligodendrocytes. Also, the authors carefully compared the distribution of three progenitor cre mouse lines and suggested that Gsh-cre also labeled dorsal OLs, contrary to the previous suggestion that it only marks LGE-derived OPCs. In addition, the author also analyzed the relative contributions of OLs derived from three distinct progenitor domains in other forebrain regions (e.g. Pir, ac). Finally, the new transgenic mouse lines and established multiple combinatorial genetic models will facilitate future investigations of the developmental origins of distinct OL populations and their functional and molecular heterogeneity.

      Weaknesses:

      Since OpalinP2A-Flpo-T2A-tTA2 only labels mature oligodendrocytes but not OPCs, the authors cannot suggest that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation (line 118-9). It remains possible that LGE/CGE-derived OPCs migrate into the cortex but are later eliminated.

      We are glad that the reviewer appreciates our work and are grateful for the positive comments and the constructive suggestion. We agree with the reviewer that our methodology by itself cannot suggest whether the lack of LGE/CGE-derived-OLs in the neocortex is caused by competitive postnatal elimination or not. That is why we cited a parallel work by Li et al. (ref [17] in the original manuscript; ref [19] in the revised manuscript), in which in utero electroporation (IUE) failed to label LGE-derived OL lineage cells in both embryonic and early postnatal brains. Although they did not directly explore CGE using IUE, their fate mapping results using Emx1-Cre; Nkx2.1-Cre; H2B-GFP at P0 and P10 revealed a very low percentage of LGE/CGE-derived OL lineage cells. The lack of adult labeling in our study together with the lack of developmental labeling in the other study prompted us to hypothesize that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation. In the revised manuscript, we have expanded the discussion to explain this point more clearly.

      Reviewer #2 (Public Review):

      [...] Strengths:

      The strength and novelty of the manuscript lies in the elegant tools generated and used and which have the potential to elegantly and accurately resolve the issue of the contribution of different progenitor zones to telencephalic regions.

      We are glad that the reviewer appreciates our work and are grateful for the overall positive comments.

      Weaknesses:

      (1) Throughout the manuscript (with one exception, lines 76-78), the authors quantified OL densities instead of contributions to the total OL population (as a % of ASPA for example). This means that the reader is left with only a rough estimation of the different contributions.

      We thank the reviewer for this constructive suggestion. We have replaced the density quantification (Figure 2F and 3D in the original manuscript) with contributions to the total OL population (% of ASPA) (Figure 2J and 2N in the revised manuscript).

      (2) All images and quantifications have been confined to one level of the cortex and the potential of the MGE and the LGE/CGE to produce oligodendrocytes for more anterior and more posterior cortical regions remains unexplored.

      The quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. We apologize for not having stated and presented this information clearly enough, and for the confusion it may have caused. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200*) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      (3) Hence, the statement that "In summary, our findings significantly revised the canonical model of forebrain OL origins (Figure 4A) and provided a new and more comprehensive view (Figure 4B )." (lines 111, 112) is not really accurate as the findings are neither new nor comprehensive. Published manuscripts have already shown that (a) cortical OLs are mostly generated from the cortex [Tripathi et al 2011 (https://doi.org/10.1523/JNEUROSCI.6474-10.2011), Winkler et al 2018 (https://doi.org/10.1523/JNEUROSCI.3392-17.2018) and Li et al (https://doi.org/10.1101/2023.12.01.569674)] and (b) MGE-derived OLs persist in the cortex [Orduz et al 2019 (https://doi.org/10.1038/s41467-019-11904-4) and Li et al 2024 (https://doi.org/10.1101/2023.12.01.569674)]. Extending the current study to different rostro-caudal regions of the cortex would greatly improve the manuscript.

      As explained in the response to comment (2), our original quantifications included different rostro-caudal regions of the cortex. In the revised manuscript, we have added more schematics and representative images in the Supplementary Figure 2 for better illustration to resolve the concern of comprehensiveness.

      We thank the reviewer for listing and summarizing highly relevant published research along with the parallel study by Li et al. submitted to eLife. We apologize for the omission of the first two references in our original manuscript and have cited them in appropriate places (ref [10] and ref [11] in the revised manuscript). However, we believe these works do not compromise the novelty and significance of our work for the following reasons:

      (1) Tripathi et al. 2011 (ref [10] in the revised manuscript) analyzed OL lineage cells in the corpus callosum and the spinal cord, but not in the cortex and anterior commissure. Their analysis was performed in juvenile mice (P12/13), not in adulthood. Most importantly, their analysis of ventrally derived OL lineage cells relied on lineage tracing using Gsh2Cre, which in fact also labels OLs derived from Gsh2+ dorsal progenitors. In contrast, we analyzed mature OLs in the cortex, corpus callosum and anterior commissure in 2-month-old adult mice. We used intersectional and subtractive strategies to label OLs derived from dorsal, LGE/CGE and MGE/POA origins. Our strategy differentiated the two different ventral lineages (LGE/CGE vs. MGE/POA) and avoided mixed labeling of OLs from ventral and dorsal Gsh2+ progenitors.

      (2) Winkler et al. 2018 (ref [11] in the revised manuscript) analyzed OLs derived from dorsal progenitors but only quantified those in the gray matter and the white matter of somatosensory cortex. Their quantification relied on co-staining with Olig2/Sox10, and thereby included both oligodendrocyte precursors (OPCs) and OLs. In contrast, we analyzed mature OLs from three origins and quantified not only neocortical regions (Mo and SS) but also an archicortical region (Pir). Our analysis revealed that although dorsally derived OLs dominate neocortex, ventrally derived OLs, especially the LGE/CGE-derived ones, dominate piriform cortex.

      (3) Orduz et al. 2019 (ref [7] in the original manuscript and the revised manuscript) mainly focused on POA-derived OLs in the somatosensory cortex. Although they performed limited analysis on MGE/POA-derived OPCs at postnatal day 10 and 19, no quantification of MGE/POA-derived OLs was performed in terms of their density, contribution to the total OL population and spatial distribution in the cortex. In contrast, we performed systematic quantification on these aspects to demonstrate that MGE/POA-derived OLs make a small but sustained contribution to the cortex with a distribution pattern distinct from that of OLs derived from the dorsal origin.

      (4) Li et al. 2024 (ref [17] in the original manuscript and [19] in the revised manuscript) is a parallel study submitted to eLife. Their and our independent discoveries nicely complement each other. Using different sets of techniques and experiments but some shared genetic mouse models, we both found that LGE/CGE made a minimal contribution to neocortical OLs. Their analysis in the prenatal and early postnatal stages together with our analysis in the adult brain painted a more comprehensive picture of cortical oligodendrogenesis. The uniqueness of our work is that we performed systematic quantification of all three origins and uncovered the differential contributions to neocortex, piriform cortex, corpus callosum and anterior commissure.

      In summary, our work developed novel strategies to faithfully trace OLs from the three different origins and performed systematic analysis in the adult brain. Our data uncovered their differential contributions to neocortex, piriform cortex and the two commissural white matter tracts, which significantly differ not only from the canonical view but also from other previous studies in aspects discussed above. We believe our discoveries did significantly revise the canonical model of forebrain OL origins and provided a new and more comprehensive view.

      Reviewer #3 (Public Review):

      [...] Intriguingly, by using an indirect subtraction approach, they hypothesize that both Emx1-negative and Nkx2.1-negative cells represent the progenitors from lateral/caudal ganglionic eminences (LC), and conclude that neocortical OLs are not derived from the LC region. The authors claim that Gsh2 is not exclusive to progenitor cells in the LC region (PMID: 32234482). However, Gsh2 exhibits high enrichment in the LC during early embryonic development. The presence of a small population of Gsh2-positive cells in the late embryonic cortex could originate/migrate from Gsh2-positive cells in the LC at earlier stages (PMID: 32234482). Consequently, the possibility that cortical OLs derived from Gsh2+ progenitors in LC could not be conclusively ruled out. Notably, a population of OLs migrating from the ventral to the dorsal cortical region was detected after eliminating dorsal progenitor-derived OLs (PMID: 16436615).

      The indirect subtraction data for LC progenitors drawn from the OpalinFlp-tdTOM reporter in Emx1-negative and Nkx2.1-negative cells in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line present some caveats that could influence their conclusion. The extent of activity from the two Cre lines in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mice remains uncertain. The OpalinFlp-tdTOM expression could occur in the presence of either Emx1Cre or Nkx2.1Cre, raising questions about the contribution of the individual Cre lines. To clarify, the authors should compare the tdTOM expression from each individual Cre line, OpalinFlp::Emx1Cre::RC::FLTG or OpalinFlp::Nkx2.1Cre::RC::FLTG, with the combined OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line. This comparison is crucial as the results from the combined Cre lines could appear similar to those obtained with only one Cre line active.

      Overall, the authors provided intriguing findings regarding the origin and fate of oligodendrocytes from different progenitor cells in embryonic brain regions. However, further analysis is necessary to substantiate their conclusion about the fate of LC-derived OLs convincingly.

      We thank the reviewer for these thoughtful comments. We agree with the reviewer that the presence of Gsh2-positive cells in the late embryonic cortex by itself could not rule out the possibility that they originate/migrate from Gsh2-positive cells in the LC at earlier stages. Staining dorsal-lineage intermediate progenitors with Gsh2, or performing intersectional lineage tracing using Gsh2Cre along with a dorsal-specific Flp driver, would provide more direct evidence on this issue. Nonetheless, as our lineage tracing of LGE/CGE-derive OLs did not employ Gsh2Cre, the doubt on the identity of Gsh2+ cortical progenitors should not affect the interpretation of our data.

      Regarding the subtractional LCOL labeling strategy used in our study, we wonder if there was any misunderstanding by the reviewer. As stated in our manuscript (lines 59-61) and reiterated by the reviewer, OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG labels OLs derived from progenitors that express neither Emx1Cre nor Nkx2.1Cre. As these two progenitor pools do not overlap with each other, there is a purely additive effect of their actions. If there is any concern about efficiency and specificity, it would be that inadequate Cre-mediated recombination leads to mislabeling of dOLs or MPOLs as LCOLs (i.e., OLs derived from Emx1- or Nkx2.1-expressing progenitors were not successfully “subtracted” and thereby “wrongly” retained RFP expression). Therefore, the bona fide LGE/CGE-derived OLs would only be fewer but not more than the RFP+ LCOLs labeled by our subtractional strategy, even if any of the Cre lines did not work efficiently enough. In any case, this would not affect our conclusion that LGE/CGE-derived OLs make a minimal contribution to the neocortex, as the “ground truth” contribution by LGE/CGE could only be less but not more than what we have observed using the current strategy.
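
      The upper-bound argument above can be illustrated with a minimal sketch. It is not part of the analysis in the manuscript: the cell counts, the Cre efficiencies, and the names `Counts` and `rfp_labeled` are hypothetical, and the model assumes that every OL expresses Flp and that Cre recombination simply removes RFP from the reporter.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical numbers of mature OLs by progenitor origin."""
    dorsal: int  # from Emx1-lineage progenitors
    mgepoa: int  # from Nkx2.1-lineage progenitors
    lc: int      # from LGE/CGE progenitors (neither Cre expressed)


def rfp_labeled(counts: Counts, emx1_cre_eff: float, nkx21_cre_eff: float) -> float:
    """Expected number of RFP+ OLs under subtractive labeling.

    Assumption: an OL keeps RFP unless Cre recombined the reporter in its
    lineage. Efficiencies are fractions between 0 and 1.
    """
    escaped_dorsal = counts.dorsal * (1.0 - emx1_cre_eff)
    escaped_mgepoa = counts.mgepoa * (1.0 - nkx21_cre_eff)
    # True LC-derived OLs plus any cells that escaped subtraction.
    return counts.lc + escaped_dorsal + escaped_mgepoa


if __name__ == "__main__":
    cells = Counts(dorsal=900, mgepoa=80, lc=20)  # illustrative values only
    for eff in (1.0, 0.95, 0.8):
        print(eff, rfp_labeled(cells, eff, eff))
```

      Under these assumptions the RFP+ count equals the true LC-derived count only when both Cre lines are fully efficient, and it can only overestimate that count otherwise.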

      In support of our conclusion, a parallel study by Li et al. 2024 (ref [17] in the original manuscript; ref [19] in the revised manuscript) also provided independent experimental evidence that “any contribution of oligodendrocyte precursors to the developing cortex from the lateral ganglionic eminence is minimal in scope (quoted from its eLife assessment).” In addition, in their revision, they performed Gsh2 immunostaining in P0 Emx1Cre::HG-loxP mice and found nearly all Gsh2+ cells in the cortical SVZ were derived from the Emx1+ lineage. We are glad that this additional piece of evidence further clarified the case, but still want to emphasize that the subtractional strategy we took was designed purposefully to avoid the potential uncertainty of Gsh2Cre and to more faithfully label LGE/CGE-derived OLs. Therefore, the validity of our conclusion about the fate of LC-derived OLs should be independent of the question of the identity of Gsh2+ cortical progenitors and stands well by itself.

      We hope that these explanations have adequately addressed the reviewer’s concerns. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In Figures 2C, 2D, 2E and 3D, the authors should provide counts of labelled cells as a % of ASPA+ cells. This will give an accurate picture of the contribution of the different progenitor regions to OLs.

      The graphs in Figure 2F are unnecessary since they are simply repeats of C-E but re-arranged.

      We thank the reviewer for the valuable suggestions. These two recommendations are related, so we made the following changes. We replaced the density quantification in Figure 2F and 3D with % of ASPA (Figure 2J and 2N in the revised manuscript) to give an accurate picture of the contribution of the different progenitor regions to OLs, as suggested by the reviewer. We still retained the density counts in Figure 2C-E (Figure 2G-I in the revised manuscript). Together with quantifications of rostral-caudal and laminar distributions presented in Supplementary Figure 2, these data demonstrated that OLs from different origins display distinct spatial distribution patterns.

      At what ages were the quantifications performed in all the figures?

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section of the revised manuscript.

      In 2D, and 3B the GFP should have been activated but the authors do not show it or quantify it presumably because GFP would flood the sections in the presence of Emx1Cre. Nevertheless, since eGFP is shown in the diagram in 2B, the authors should mention why they chose not to show it.

      We thank the reviewer for the helpful comment and the suggestion. We have modified the schematic in Figure 2B and added an explanation in the figure legend (lines 308-313). We also added a schematic in Supplementary Figure 1A along with images of the GFP channel in Supplementary Figure 1D (lines 338-350).

      All the main figures and supplementary figures are too small to see properly.

      We are sorry that there was severe compression of images in the combined manuscript file at the conversion step during the initial submission. We apologize for the compromised image quality and re-uploaded full-size figures as individual files on bioRxiv soon after receiving the reviews. For the revised manuscript, we have also taken care to upload full-size figures at high resolution as individual files to ensure their quality of presentation.

      Supplementary Figure 2E is unnecessary and perhaps misleading the reader that cortical-derived OLs have a preference for the lower layers whereas the distribution may simply reflect the distribution of OLs in the cortex.

      We thank the reviewer for the helpful comment and the suggestion. We have removed this panel and replaced it with quantifications of relative laminar distributions of the total (ASPA+) OLs along with those from the three different origins (Supplementary Figure 2G in the revised manuscript). Indeed, the preference for the lower layers of dorsally-derived OLs mirrored the distribution of total OLs in the cortex, while the MGE/POA-derived OLs deviate significantly from others and exhibit higher preference towards layer 4.

      Quantification of labelled cells as a % of ASPA should also be performed in Supplementary Figure 3.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included quantifications of labelled cells as % of ASPA for both OpalinFlp::Emx1Cre::Ai65 and OpalinFlp::Nkx2.1Cre::Ai65 (Figure 2J and N). The sum of these two data sets will be equivalent to those of OpalinFlp::Emx1Cre::Nkx2.1Cre::Ai65 shown in Supplementary Figure 3, and thereby we did not perform additional quantifications to avoid redundant efforts.

      Imaging and quantification should be extended to more posterior regions of the cortex to find out whether the contribution is different from the areas already examined.

      We thank the reviewer for the suggestion on imaging and apologize for the confusion about the range of quantification. As explained in the response to comment (2) of weakness, the quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors should provide Opalin reporter expression data across various brain regions at different developmental stages to clarify the expression pattern of the reporter.

      We appreciate the reviewer’s comment. We chose to perform all quantifications in adult mice as Opalin is a well-established marker for differentiated OLs and the recombinase-dependent reporter expression is accumulative and irreversible. If there is any non-specific labeling at any earlier developmental stage, it would be retained and manifested at the timepoint we examined as well. In other words, the fact that we did not detect any non-specific labeling in the current dataset but only confined labeling in mature OLs ensured that no non-OL labeling was present at earlier timepoints. As shown in Figure 1D-F, reporter expression activated by the Opalin driver shows high OL specificity in all analyzed brain regions. This is further corroborated by results from combinatorially labeled samples (Figure 2 and Supplementary Figure 2), in which only OLs but not any other cell types were labeled in all analyzed brain regions too. Following the reviewers’ suggestions, we have added representative images of more rostral and more caudal cortical regions (Supplementary Figure 2B-D), which also showed highly specific OL labeling.

      (2) In Figure 1D, please specify the developmental stage of the mice used for staining.

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (line 199-200) of the revised manuscript.

      (3) The authors should clarify if the Opalin reporter is expressed in OPCs and astrocytes at developmental stages of mice, such as P0, P7, and P30.

      We appreciate the reviewer’s comment, but as explained in response to comment (1), Opalin is a well-established marker for differentiated OLs which is not expressed in OPCs or astrocytes. As shown in Figure 1D-E, reporter expression is confined to CC1+ differentiated OLs with no colocalization with Sox9 (astrocyte marker). In support of this observation, only ASPA+ differentiated OLs but no OPCs or astrocytes were labeled in any of the combinatorial lineage tracing samples generated using this line combined with progenitor-Cre lines. In addition to marker staining, we also did not observe any RFP+ cells with OPC or astrocyte morphology. As the recombinase-dependent reporter expression is accumulative and irreversible, the fact that no non-specific labeling was observed in the adult brain retrospectively proved the specificity of Opalin-Flp at earlier developmental stages.

      (4) In Figure 1E, authors should address why the efficiency of the tdTomato line is notably lower compared to that of H2B-GFP and whether the stability of reporters could impact the conclusions drawn.

      The difference in reporting efficiency is mainly caused by differences inherent to the two reporting systems. The TRE-RFP reporter is derived from Ai62, composed of a Tet response element and tdTomato inserted into the T1 TIGRE locus. The tdTomato expression is driven by tTA-TRE transcriptional activation. The HG-loxP reporter is derived from HG-Dual, composed of a CAG promoter, a frt-flanked STOP cassette, and H2B-GFP inserted into the Rosa26 locus. The H2B-GFP expression is driven by the CAG promoter after Flp-mediated removal of the STOP cassette. A Flp-dependent tdTomato reporter designed in the same way as the HG-FRT reporter would have similar efficiency. In fact, the RC::FLTG reporter can be viewed as such a reporter in the absence of Cre, which did show similarly high efficiency as HG-FRT and supported efficient subtractive labeling of LGE/CGE-derived OLs. We apologize for a typo in the title of the Y-axis of the right panel in the original Figure 1F which may have caused misunderstanding. The “RFP+CC1+/CC1” should be “XFP+CC1/CC1”. We have corrected this mistake and revised the figure legend for a clearer description of the data (lines 293-302 in the revised manuscript).

      (5) In Figure 2, please clarify the developmental stage of the mice used for staining. Authors should present the eGFP image in addition to tdTOM.

      We apologize for the omission of the age information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (line 199-200) of the revised manuscript. We thank the reviewer for the suggestion on the eGFP image and have presented it in Supplementary Figure 1 in the revised manuscript.

      (6) In Figure 2D, authors should display the eGFP image alongside the tdTomato image. It is difficult to assess the efficiency of Emx-Cre and Nkx2.1-Cre.

      We thank the reviewer for the suggestion on the eGFP image and have presented it in Supplementary Figure 1D in the revised manuscript. There are two reasons why we chose to present it in the supplementary figure instead of the main figure. First, we added ASPA staining in the green channel along with quantifications of RFP cells as % of ASPA in Figure 2 in the revised manuscript, following reviewer #2’s suggestion. Second, as pointed out by reviewer #2, GFP would flood the sections in the presence of Emx1Cre and could be quite distracting if it was shown together with RFP.

      We were not entirely sure what exactly the reviewer means by “assess the efficiency of Emx-Cre and Nkx2.1-Cre”, but we believe that the quantifications of RFP cells as % of ASPA clarified the contribution of each origin to the total OLs (Figure 2J and 2N in the revised manuscript).

      (7) Figure 3 depicts the entire brain, replicating the image presented in Figure 2. It would be beneficial to consolidate Figures 2 and 3, as they showcase identical brain scans of different regions.

      We thank the reviewer for the constructive suggestion and have consolidated Figures 2 and 3 in the original manuscript into Figure 2 in the revised manuscript.

    2. Reviewer #2 (Public Review):

      In this manuscript, Cai et al use a combination of mouse transgenic lines to re-examine the question of the embryonic origin of telencephalic oligodendrocytes (OLs). Their tools include a novel Flp mouse for labelling mature oligodendrocytes and a number of pre-existing lines (some previously generated by the last author in Josh Huang's lab) that allowed combinatorial or subtractive labelling of oligodendrocytes with different origins. The conclusion is that cortically-derived OLs are the predominant OL population in the motor and somatosensory cortex and underlying corpus callosum, while the LGE/CGE generates OLs for the piriform cortex and anterior commissure rather than the cerebral cortex. Small numbers of MGE-derived OLs persist long-term in the motor, somatosensory and piriform cortex.

      Strengths:

      The strength and novelty of the manuscript lie in the elegant tools generated and used. These have enabled the resolution of the issue regarding the contribution of different telencephalic progenitor zones to the cortical oligodendrocyte population.

      Comments on latest version:

      The revised manuscript by Cai et al has addressed all the issues raised. I have some minor comments:

      Figure 2: The y axis in figure 2L should be the same as the y axis in 2M to make the contribution to Mo and SS more clear.

      Figure 3: Although this is clear in the figure, A and B should be labelled as classical model and new model to help the reader understand immediately what the two figures show.

      Suppl Fig 2: It is not clear what 1-7 represent. It should be made clear in the legend which areas have been pooled into the different bins. The X axis should be labelled.

    3. eLife assessment

      In this study the authors revisited the question of the embryonic origin of telencephalic oligodendrocytes using some new and powerful genetic tools. There is convincing evidence to support previous suggestions of a predominantly cortical origin of oligodendrocytes in the cerebral cortex, however the new studies suggest that LGE/CGE-derived oligodendrocytes make a modest contribution in some areas, while MGE/POA-derived oligodendrocytes make a small but enduring contribution. The findings are valuable and should be of interest to developmental and myelin biologists.

    1. Author response

      Reviewer #1 (Public Review):

      […] Weaknesses:

      This work explores an interesting question on regulating myoD+ progenitors and the defects of this process in skeletal muscle differentiation by SRSF2 but spreads out in many directions rather than focusing on the key defects. A number of approaches are used, but they lack the robust mechanistic analysis of the defects that result in muscle differentiation. Specifically, the role of SRSF2 on splicing appears to be a misfit here and does not explain the primary defects in the migration of myoD+ progenitors. There are concerns about the scRNA-seq and many transcripts in muscle biology that are not expressed in muscle cells. Focusing on main defects and additional experimental evidence to clear the fusion vs. precocious differentiation vs. reduced differentiation will strengthen this work.

      (1) The analysis of RNA-seq data (Figure 2) is limited, and it is unclear how it relates to the work presented in this MS. The GO enrichment analysis is combined for both up and down-regulated DEG, thus making it difficult to understand the impact differently in both directions. Stac2 is a predominant neuronal isoform (while Stac3 is the muscle), and the Symm gene is not found in the HGNC or other databases. Could the authors provide the approved name for this gene? The premise of this work is based on defects in ECM processes resulting in the mis-targeting of the muscle progenitors to the nonmuscle regions. Which ECM proteins are differentially expressed?

      The GO enrichment analysis (Figure 2B) indicates that genes involved in skeletal muscle construction and function were significantly dysregulated, with both up-regulated and down-regulated genes observed, consistent with the phenotype analysis presented in Figure 1.

      We agree with the reviewer’s comments that Stac3 is the predominant muscle isoform with high expression in skeletal muscle tissues, while Stac2 is expressed at low levels in these tissues. Therefore, we decided to delete the Stac2 data from Figure 2C and will modify the text accordingly. We apologize for our errors.

      In response to the reviewer's comment regarding the Symm gene not being found in the HGNC or other databases, we carefully re-examined the genes presented in Figure 2C. We discovered that one of the genes is actually Synm, which encodes synemin, an intermediate filament protein. We will correct this in the manuscript.

      scRNA-seq analysis revealed defects in ECM processes in SRSF2-deficient myoblasts, which we believe likely resulted in the mis-targeting of muscle progenitors to non-muscle regions. However, comparing RNA-seq results from whole muscle tissues with scRNA-seq results is challenging.

      (2) Could authors quantify the muscle progenitors dispersed in nonmuscle regions before their differentiation? Which nonmuscle tissues MyoD+ progenitors are seen? Most of the tDT staining in the enlarged sections appears to be punctate without any nuclear staining seen in these cells (Figure 3 B, D E-F). Could authors provide high-resolution images? Also, in the diaphragm cross-sections in mutants, tdT labeling appears to be missing in some areas within the myofibers defined as cavities by the authors (marked by white arrows, Figure 3H). Could this polarized localization of tDT be contributing to specific defects?

      tdT staining revealed a substantial presence of MyoD-derived cells distributed beyond the muscle regions, as shown in Figure 3B. Quantifying the number of MyoD+ progenitors dispersed in non-muscle regions is not meaningful.

      tdT+ cells also include those that previously expressed MyoD but have since differentiated into myotubes and myofibers, which is why much of the tdT+ staining is not nuclear.

      MyoD+ cells deficient in SRSF2 either undergo apoptosis or premature differentiation. Consequently, tdT staining in SRSF2-KO muscles showed many irregularities in the muscle fibers.

      (3) Is there a difference in the levels of tDT in the myoD+ muscle progenitors that are mis-targeted vs the others that are present in the muscle tissues?

      tdT+ cells include those that previously expressed MyoD but have since differentiated into myotubes and myofibers, which are no longer MyoD+ cells. Additionally, tdT+ cells also include those currently expressing MyoD, which are MyoD+ cells.

      The fiber differences between WT and SRSF2-KO mice are easily discernible through tdT staining (Figure 2D and 3D); however, comparing the levels of tdT staining between the two groups is not meaningful.

      (4) scRNA is unsuitable for myotubes and myofibers due to their size exclusion from microfluidics. Could authors explain the basis for scRNA-seq vs SnRNA-seq in this work? How are SKM defined in scRNA-data in Figure 4? As the myofibers are small in KO, could the increased level of late differentiation markers be due to the enrichment of these small myotubes/myofibers in scRNA? A different approach, such as ISH/IF with the myogenic markers at E9.5-10.5, may be able to resolve if these markers are prematurely induced.

      SRSF2 is highly expressed in proliferative myoblasts, but its levels decline once differentiation begins. In our study, we used Myod1-Cre to delete the SRSF2 gene and performed the scRNA-seq analysis to examine the effects of SRSF2 deletion on the proliferation and differentiation of MyoD cells. Our analysis revealed that SRSF2 deletion caused proliferation defects and premature differentiation of MyoD cells (Figure 5G), leading to myofiber abnormalities.

      We determined that snRNA-seq analysis is not suitable for our study.

      Additionally, skeletal muscle cells (SKM) were defined based on the expression of skeletal muscle markers, as shown in Figure 4C.

      (5) TNC is a marker for tenocytes and is absent in skeletal muscle cells. The authors mentioned a downregulation of TNC in the KO SKM derived clusters. This suggests a contamination of the tenocytes in the control cells. In spite of the downregulation of multiple ECM genes showed by scRNA-seq data, the ECM staining by laminin in KO in Figure 3 appears to be similar to controls.

      Tenascin-C (Tnc) is also part of the extracellular matrix (ECM) family. scRNA-seq analysis revealed that multiple ECM genes were downregulated in SRSF2-KO myoblasts; however, this did not indicate that laminin was downregulated in the SRSF2-KO muscles.

      (6) The expression of many fusion genes, such as myomaker and myomerger, is reduced in KO, suggesting a primary fusion defect vs a primary differentiation defect. Many mature myofiber proteins exhibit an increased expression in disease states, suggesting them as a compensatory mechanism. Authors need to provide additional experimental evidence supporting precocious differentiation as the primary defect.

      Our analysis revealed that the deletion of SRSF2 caused premature differentiation of MyoD cells (Figure 5G), leading to abnormalities of myofiber formation. SRSF2 is highly expressed in proliferative myoblasts, but its expression declines quickly in myotubes. Therefore, it is unlikely that the low expression of SRSF2 in myotubes caused the primary fusion defect.

      (7) The fusion defects in KO are also evident in siRNA knockdown for SRSF2 and Aurka in C2C12, which mostly exhibits mononucleated myocytes in knockdowns. Also, a fusion index needs to be provided.

      SRSF2 knockdown and Aurka knockdown caused differentiation defects, including fusion defects. We quantified the percentages of both MyoG+ and MHC+ cells in the differentiation assay.

      (8) The last section of the role of SRSF2 on splicing appears to be a misfit in this study. Authors describe the Bin1 isoforms in centronuclear myopathy, but exon17 is not involved in myopathy. Is exon17 exclusion seen in other diseases/ splicing studies?

      Our study is the first to report that exon 17 inclusion of Bin1 is regulated by SRSF2. Specifically, the knockdown of Bin1 exon 17 caused severe differentiation defects in C2C12 myoblasts. The involvement of Bin1 exon 17 in myopathy requires further validation using clinical samples.

      Reviewer #2 (Public Review):

      […] Weaknesses: Although unbiased sequencing methods were used, their findings that SRSF2 serves as a transcriptional regulator and functions in alternative splicing events are not novel. The introduction and discussion are not clearly written. The authors did not raise clear scientific questions in the introduction. The last paragraph is only a copy-paste of the abstract. The discussion mainly repeats their results without clear discussion.

      While the role of SRSF2 as a transcriptional regulator involved in alternative splicing events is not novel, the specific SRSF2-regulated alternative splicing events and targeted genes in skeletal muscle have not been reported in other publications. We believe our interpretation of the data and comparison with related published studies are well presented in the Discussion section.

    1. eLife assessment

      This study presents valuable data on sensory integration in a model pre-motor neuron, the Mauthner cell. The authors use both stimulation of the optic tectum (a proxy for vision) and auditory stimulation to study the integration of these modalities in the Mauthner cell using convincing, technically demanding, and well done experiments. There are, however, concerns about the degree to which the two modalities interact; multisensory integration of subthreshold unisensory stimuli appears uncommon, and not significantly above events observed from single modalities. This work will be of interest to both synaptic physiologists and neurophysiologists working on sensory-motor integration.

    2. Reviewer #1 (Public Review):

      Summary:

      Otero-Coronel et al. address an important question for neuroscience - how does a premotor neuron capable of directly controlling behavior integrate multiple sources of sensory inputs to inform action selection? For this, they focused on the teleost Mauthner cell, long known to be at the core of a fast escape circuit. What is particularly interesting in this work is the naturalistic approach they took. Classically, the M-cell was characterized, both behaviorally and physiologically, using an unimodal sensory space. Here the authors make the effort (substantial!) to study the physiology of the M-cell taking into account both the visual and auditory inputs. They performed well-informed electrophysiological approaches to decipher how the M-cell integrates the information of two sensory modalities depending on the strength and temporal relation between them.

      Strengths:

      The empirical results are convincing and well-supported. The manuscript is well-written and organized. The experimental approaches and the selection of stimulus parameters are clear and informed by the bibliography. The major finding is that multisensory integration increases the certainty of environmental information in an inherently noisy environment.

      Weaknesses:

      Even though the manuscript and figures are well organised, I found myself struggling to understand key points of the figures.

      For example, in Figure 1 it is not clear what are actually the Tonic and Phasic components. The figure will benefit from more details on this matter. Then, in Figure 4 the label for the traces in panel A is needed since I was not able to pick up that they were coming from different sensory pathways.

      In line 338 it should be optic tectum and not "optical tectum".

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Otero-Coronel and colleagues use a combination of acoustic stimuli and electrical stimulation of the tectum to study MSI in the M-cells of adult goldfish. They first perform a necessary piece of groundwork in calibrating tectal stimulation for maximal M-cell MSI, and then characterize this MSI with slightly varying tectal and acoustic inputs. Next, they quantify the magnitude and timing of FFI that each type of input has on the M-cell, finding that both the tectum and the auditory system drive FFI, but that FFI decays more slowly for auditory signals. These are novel results that would be of interest to a broader sensory neuroscience community. By then providing pairs of stimuli separated by 50ms, they assess the ability of the first stimulus to suppress responses to the second, finding that acoustic stimuli strongly suppress subsequent acoustic responses in the M-cell, that they weakly suppress subsequent tectal stimulation, and that tectal stimulation does not appreciably inhibit subsequent stimuli of either type. Finally, they show that M-cell physiology mirrors previously reported behavioural data in which stronger stimuli underwent less integration.

      The manuscript is generally well-written and clear. The discussion of results is appropriately broad and open-ended. It's a good document. Our major concerns regarding the study's validity are captured in the individual comments below. In terms of impact, the most compelling new observation is the quantification of the FFI from the two sources and the logical extension of these FFI dynamics to M-cell physiology during MSI. It is also nice, but unsurprising, to see that the relationship between stimulus strength and MSI is similar for M-cell physiology to what has previously been shown for behavior. While we find the results interesting, we think that they will be of greatest interest to those specifically interested in M-cell physiology and function.

      Strengths:

      The methods applied are challenging and appropriate and appear to be well executed. Open questions about the physiological underpinnings of M-cell function are addressed using sound experimental design and methodology, and convincing results are provided that advance our understanding of how two streams of sensory information can interact to control behavior.

      Weaknesses:

      Our concerns about the manuscript are captured in the following specific comments, which we hope will provide a useful perspective for readers and actionable suggestions for the authors.

      Comment 1 (Minor):

      Line 124. Direct stimulation of the tectum to drive M-cell-projecting tectal neurons not only bypasses the retina, it also bypasses intra-tectal processing and inputs to the tectum from other sources (notably the thalamus). This is not an issue with the interpretation of the results, but this description gives the (false) impression that bypassing the retina is sufficient to prevent adaptation. Adding a sentence or two to accurately reflect the complexity of the upstream circuitry (beyond the retina) would be welcome.

      Comment 2 (Major):

      The premise is that stimulation of the tectum is a proxy for a visual stimulus, but the tectum also carries the auditory, lateral line, and vestibular information. This seems like a confound in the interpretation of this preparation as a simple audio-visual paradigm. Minimally, this confound should be noted and addressed. The first heading of the Results should not refer to "visual tectal stimuli".

      Comment 3 (Major):

      Figure 1 and associated text.

      It is unclear and not mentioned in the Methods section how phasic and tonic responses were calculated. It is clear from the example traces that there is a change in tonic responses and the accumulation of subthreshold responses. Depending on how tonic responses were calculated, perhaps the authors could overlay a low-passed filtered trace and/or show calculations based on the filtered trace at each tectal train duration.
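
      As an illustration of the kind of calculation suggested here, the following minimal sketch separates a trace into a slow (tonic) and a fast (phasic) part with a centered moving average, which acts as a crude low-pass filter. This is only an assumed approach, not the authors' method, which the manuscript does not describe; the function names and the window length are hypothetical.

```python
def moving_average(trace, window):
    """Centered moving average; samples near the edges use a shorter window."""
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    half = window // 2
    smoothed = []
    for i in range(len(trace)):
        lo = max(0, i - half)
        hi = min(len(trace), i + half + 1)
        segment = trace[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def split_tonic_phasic(trace, window):
    """Return (tonic, phasic): the smoothed trace and the residual around it."""
    tonic = moving_average(trace, window)
    phasic = [v - t for v, t in zip(trace, tonic)]
    return tonic, phasic


if __name__ == "__main__":
    # Hypothetical membrane potential samples (mV) with one fast deflection.
    example = [0.0, 0.0, 3.0, 0.0, 0.0]
    tonic, phasic = split_tonic_phasic(example, window=3)
    print(tonic)
    print(phasic)
```

      The window length would need to be chosen relative to the sampling rate and the stimulation frequency, so that individual stimulus-locked deflections are smoothed out while the slower depolarization is preserved.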

      Comment 4 (Minor):

      Figure 3 and associated text.

      This is a lovely experiment. Although it is not written in the text, it provides logic for the next experiment in choosing a 50 ms time interval. It would be great if the authors calculated the first timepoint at which the percentage of shunting inhibition is not significantly different from zero. This would provide a convincing basis for picking 50 ms for the next experiment. That said, I suspect that this time point would be earlier than 50 ms. This may explain and add further complexity to why the authors found mostly linear or sublinear integration, and perhaps the basis for future experiments to test different stimulus time intervals. Please move calculations to Methods.
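
      One way such a calculation could be set up is sketched below, using an exact sign-flip (one-sample permutation) test at each delay. The data values, the delays, and the function names are hypothetical, and the choice of test is an assumption made for illustration; a one-sample t-test or another method may suit the real data better.

```python
from itertools import product


def sign_flip_p_value(values):
    """Exact two-sided sign-flip test of the null hypothesis mean == 0.

    Enumerates all 2**n sign assignments, so it is only practical for small n.
    With fewer than 6 samples the smallest attainable p-value exceeds 0.05.
    """
    observed = abs(sum(values))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(values)):
        n_total += 1
        flipped = abs(sum(s * v for s, v in zip(signs, values)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            n_extreme += 1
    return n_extreme / n_total


def first_nonsignificant_delay(inhibition_by_delay, alpha=0.05):
    """Earliest delay whose inhibition is not significantly different from 0."""
    for delay in sorted(inhibition_by_delay):
        if sign_flip_p_value(inhibition_by_delay[delay]) >= alpha:
            return delay
    return None  # inhibition was significant at every tested delay


if __name__ == "__main__":
    # Hypothetical % shunting inhibition per cell, keyed by delay in ms.
    data = {
        5: [20, 22, 19, 25, 21, 23],
        20: [10, 12, 9, 11, 8, 13],
        50: [1, -2, 0.5, -1, 2, -0.5],
    }
    print(first_nonsignificant_delay(data))
```

      A non-significant result at a given delay does not show that inhibition is absent, so the estimate would be best reported together with effect sizes and sample sizes at each delay.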

      Comment 5 (Major):

      Figure 4C and lines 398-410.

      These are beautiful examples of M-cell firing, but the text suggests that they occurred rarely and nowhere close to significantly above events observed from single modalities. We do not see this as a valid result to report because there is insufficient evidence that the phenomenon shown is consistent or representative of your data.

    4. Author response:

      Answers to Reviewer #1 (Public Review):

      (1) Tonic and phasic components in Figure 1 are not clear.

      We will reformulate Figure 1A to show how the tonic and phasic components were measured. As this point was also raised by Reviewer #2 (Comment 3), we will explicitly clarify this in the Methods section. We will modify the color scheme to improve clarity.

      (2) Labeling of traces in Figure 4.

      We will add labels to traces informing which sensory pathways were stimulated to produce each response.

      (3) Optic tectum instead of optical tectum.

      We apologize for the error. We will replace “optical tectum” with “optic tectum” as also suggested by Reviewer #2.

      Answers to Reviewer #2 (Public Review):

      (1) Complexity of tectum upstream circuitry (Comments 1 and 2).

      Processing of visual information is certainly a major role of the tectum, but it is true that it also receives sensory inputs from other structures including sensory pathways. We will acknowledge this complexity in our revised manuscript along with suggestions for heading titles.

      (2) Figure 1 and associated text. 

      As mentioned in the provisional answer point 1 to Reviewer #1, we will reformulate Figure 1A and clarify how tonic and phasic responses were calculated.

      (3) Figure 3 and associated text.

      We will perform the analysis suggested by the reviewer and move calculations to the Methods section as requested.

      (4) Figure 5C and lines 398-410.

      We will consider omitting Figure 5C or clearly stating its value in the context of the rest of the data and our previous behavioral experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors explore mechanisms through which T-regs attenuate acute pain using a heat sensitivity paradigm. Analysis of available transcriptomic data revealed expression of the proenkephalin (Penk) gene in T-regs. The authors explore the contribution of T-reg Penk in the resolution of heat sensitivity.

      Strengths:

      Investigating the potential role of T-reg Penk in the resolution of acute pain is a strength.

      Weaknesses:

      The overall experimental design is superficial and lacks sufficient rigor to draw any meaningful conclusions.

      We hope that the reviewer will reconsider this severe criticism after examining the updated manuscript and results.

      For instance:

      (1) There were no TAM controls. What is the evidence that TAM does not alter heat-sensitive receptors?

      The impact of TMX on heat perception is not the object of this study. Nevertheless, it appears that heat sensitivity in WT controls (blue dots) is slightly diminished after TMX administration (Figure 5A), suggesting that heat-sensitive receptors are moderately altered by TMX per se. This reduction is much more pronounced in LOX mice. Thus, although it is possible that TMX plays a marginal role in heat sensitivity by itself, the results show a much more pronounced effect of TMX in LOX than in WT mice, in favor of a role for Penk Treg in heat sensitivity.

      (2) There are no controls demonstrating that recombination actually occurred. How do the authors know a single dose of TAM is sufficient?

      These results are now presented in figure S4. A 70% reduction in Penk mRNA is observed in Treg after a single administration of TMX.

      (3) Why was only heat sensitivity assessed? The behavioral tests are inadequate to derive any meaningful conclusions. Further, why wasn't the behavioral data plotted longitudinally?

      The longitudinal data are presented in figure S5A. New behavioral tests have been performed and the results are now shown in figure S5E-H. Importantly, heat sensitivity was observed in two independent laboratories with two different tests.

      Reviewer #2 (Public Review):

      Summary:

      The present study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily; this being experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Strengths:

      The manuscript is clear and reveals a previously unappreciated role of enkephalins, as released by immune cells, in sensory perception. The rationale in this manuscript is easy to follow, and conclusions are well supported by data.

      Weaknesses:

      The sensory deficit of Penk cKO appears to be quite limited compared to control littermates.

      Reviewer #3 (Public Review):

      Summary:

      Aubert et al investigated the role of PENK in regulatory T cells. Through the mining of publicly available transcriptome data, the authors confirmed that PENK expression is selectively enriched in regulatory but not conventional T cells. Further data mining suggested that OX40, 4-1BB as well as BATF, can regulate PENK expression in Tregs. The authors generated fate-mapping mice to confirm selective PENK expression in Tregs and activated effector T cells in the colon and spleen. Interestingly, transgenic mice with conditional deletion of PENK in Tregs resulted in hypersensitivity to heat, which the authors attributed to heat hyperalgesia.

      Strengths:

      The generation of transgenic mice with conditional deletion of PENK in foxp3 and PENK fate-mapping is novel and can potentially yield significant findings. The identification of upstream signals that regulate PENK is interesting but unlikely to be the main reason why PENK is predominantly expressed in Tregs as both BATF and TNFR are expressed in effector T cells.

      Weaknesses:

      There is a lack of direct evidence and detailed analysis of Tregs in the control and transgenic mice to support the authors' hypothesis. PENK was previously reported to be expressed in skin Tregs and play a significant role in regulating skin homeostasis: this should be considered as an alternative mechanism that may explain the changed sensitivity to heat observed in the paper.

      We now provide a detailed analysis of Treg with or without Penk, from their immunosuppressive functions to their colocalization with sensory neurons in the skin, supporting their function as natural analgesics. The alternate hypothesis relative to skin homeostasis is now clearly presented and discussed.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Most of my comments should be addressable in a revised manuscript but will require additional analysis.

      Major:

      - According to flow cytometry analysis, Penk is expressed mostly in Treg of the skin and colon. What may account for such restricted expression? Where could Treg-released enkephalins act?

      We now rephrased the paper to emphasize the known role of Batf in tissue Treg differentiation. We believe the Batf dependency of Penk expression is the reason why tissue Treg are more enriched in Penk than Treg from lymphoid organs. This is now clearly discussed.

      We also provide a new figure (Figure S1) showing that Batf and its cofactors AP1 and IRF4 were reported to bind to Penk regulatory regions. Altogether, the role of Batf in tissue Treg differentiation would explain why tissue Treg such as those of the colon and skin are particularly enriched in Penk. This is now clearly stated in the revised manuscript.

      As to where Treg-released enkephalins act, we performed immunostainings in the skin and observed that Treg could colocalize with sensory neurons (shown in a new figure 5, panel D). This observation raises the hypothesis that Treg-released enkephalins could act on sensory neurons locally.

      - Which mechanism can underlie heat hypersensitivity in Penk cKO mice? Which sensory neurons are involved? Are other sensory modalities affected, such as mechanical sensitivity?

      As stated above, we show that Treg can be in close contact with CGRP-producing thermal sensory neurons. These data are shown in figure 5D. We have also tested many other nociceptive stimuli (innocuous and noxious) and did not detect significant differences. These data are presented in supplementary figure S5. Whether enkephalins produced by Treg can change the stimulation threshold of various nerve fibers is currently being investigated by electrophysiology.

      - No control is provided to ensure that Penk is selectively excised in Treg cells in cKO mice.

      We have performed additional experiments with fluorescent probes to document Penk mRNA expression in cKO mice. The results on the specific expression of Penk mRNA in various subsets post-TMX are shown in a supplementary figure S4.

      - The authors acknowledge that Penk from Treg was previously studied in an animal model of inflammatory pain. However, which role these endogenous opioids play is unclear, especially since authors discovered that enkephalins are likely continuously released at steady states. This is not enough discussed in the narrative, which surprisingly does not separate the results from the discussion.

      The results and discussion are now separated in two sections.

      Minors:

      - Replace "Fox3 1" with "Fox31" (line 31), "functions 15" with "functions15" (line 43), "BATF 19" with "BATF19" (line 85).

      - Text mentions Figure S4 (line 125), which is most likely S3.

      Reviewer #3 (Recommendations For The Authors):

      Given the most significant finding of this paper is based on the heat-induced pain model, there is surprisingly little analysis of Tregs in this context. The authors analyzed spleen and colon Tregs at steady state, it is unclear whether any of these Tregs are involved in pain sensitivity directly. Skin Tregs or other relevant Tregs to this model should be analyzed in control and Lox mice. This is particularly relevant as PENK expression was previously reported in skin Tregs and plays a significant role in skin homeostasis (Yamazaki et al 2020 PNAS). Does PENK conditional deletion alter Treg frequencies, numbers, and immune suppressive function? Not even spleen or colon Treg were analyzed comparing control and lox mice.

      We now provide evidence showing unaltered immunosuppressive functions of Treg in the absence of Penk (Figure 4), and more importantly unaffected proportions of skin Treg in mice lacking Penk in Treg, at the very site of heat stimulation (Figure 5B-C). We also observed unaffected representation of Treg in the spleen and lymph nodes, but we do not feel that these data are necessary to interpret the results.

      Given the role of PENK in skin Tregs, could the observed effect in Figure 4 be due to altered skin homeostasis rather than sensitivity to pain?

      The reviewer is referring to a paper in which Penk in skin Treg plays a role in UV-damaged keratinocytes in vivo (Shime et al., 2020, PNAS). To our knowledge, a role for Penk produced by skin Treg in keratinocyte homeostasis at the steady state is currently unknown. Nevertheless, this hypothesis is now clearly stated and discussed in the manuscript.

      The authors stated that only after 7 days post tamoxifen treatment was heat hyperalgesia observed: deletion of PENK in Treg but not Tconv should be confirmed: is deletion only complete after 7 days or is the effect observed due to indirect effects of altered "normal" Treg function?

      We have performed a kinetic analysis to document Penk deletion at D3, D7 and D30 post-TMX. Results show a specific deletion of Penk in Treg at all time points, so we combined all the time points for the representation of the results (Figure S4). As for the indirect effects of “altered” normal function, we now provide the reader with a new figure (Figure 4), showing that Penk-deficient Treg are not impaired in their suppressive function in vitro and in vivo.

      Actual protein/peptide production of enkephalins by Tregs should be confirmed. It is also unclear which peptide(s) can be secreted and presumably responsible for the changes in heat sensitivity.

      This is a very interesting question that we addressed with a MENK ELISA, but we were unable to obtain reproducible results. An ongoing project will use mass spectrometry to fully characterize the peptides produced by Treg and activated Tconv.

      The analysis of PENK regulation by Tregs is interesting despite being entirely based on data mining. BATF is a pioneering factor expressed by all activated effector T cells. While the connection between BATF and PENK may explain why the authors observed PENK expression chiefly in activated effectors and Tregs, BATF cannot be the reason why PENK is "predominantly" expressed by Tregs. Similarly, 4-1BB and OX40 can be induced on effector T cells. Is PENK under the control of Foxp3? There are lots of publicly available datasets on Foxp3/IL-2 dependent Treg signatures through which this can be addressed.

      We now provide a supplementary figure (Figure S1), showing a compilation of ChIP-seq studies for various transcription factors in various T cell subsets. We provide the reader with a list of all the TFs that have been reported to bind in the regulatory regions of Penk. In agreement with our hypothesis, BATF, FOXP3, IRF4 and several others are present in that list. Further work is needed to decipher the exact contribution of each of those TFs to the regulation of Penk in Treg vs activated Tconv, which is beyond the scope of this report.

    2. eLife assessment

      This study presents a valuable finding on a new role of Foxp3+ regulatory T cells in sensory perception, which may have an impact on our understanding of somatosensory perception. The authors identified a previously unappreciated action of enkephalins released by immune cells in the resolution of pain and several upstream signals that can regulate the expression of the proenkephalin gene PENK in Foxp3+ Tregs. The generation of transgenic mice with conditional deletion of PENK in Foxp3+ cells and PENK fate-mapping is novel and generates compelling data; they also show a comprehensive analysis of Tregs in control and transgenic mice, longitudinal data on heat sensitivity and co-localization of PENK+ Tregs with thermal sensory neurons in the skin further supporting their hypothesis. The study would be of interest to the biologists working in the field of neuroimmunology and inflammation.

    3. Public Review:

      The study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily; this being experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Editors' note: The authors accepted most if not all the suggestions given by the reviewers and the revised version of the manuscript is substantially improved.

    1. eLife assessment

      Here, the authors developed a cell-based screening assay for the identification of small molecule inhibitors of nonsense-mediated decay (NMD), and used it to validate KVS0001, a new small molecule SMG1 kinase inhibitor derived from the existing inhibitor SMG1i-11, showing it inhibits NMD in cultured cells leading to expression of neoantigens from NMD-targeted genes and slows tumor growth of cancer cell lines possessing a significant number of out-of-frame indel mutations. The conclusions are supported by convincing evidence, and the significance of this work consists in the development of a new and very promising NMD inhibitor drug that acts as an inhibitor of the SMG1 NMD kinase and is effective in animal tumor studies. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal applications.

    2. Reviewer #1 (Public Review):

      Summary:

      This work identified new NMD inhibitors and tested them for cancer treatment, based on the hypothesis that inhibiting NMD could lead to the production of cancer neoantigens from the stabilized mutant mRNAs, thereby enhancing the immune system's ability to recognize and kill cancer cells. Key points of the study include:

      • Development of an RNA-seq based method for NMD analysis using mixed isogenic cells that express WT or mutant transcripts of STAG2 and TP53 with engineered truncation mutations.

      • Application of this method in a drug screen, which identified several potential NMD inhibitors.

      • Demonstration that one of the identified compounds, LY3023414, inhibits NMD by targeting the SMG1 protein kinase in the NMD pathway in cultured cells and mouse xenografts.

      • Due to the in vivo toxicity observed for LY3023414, the authors developed 11 new SMG1 inhibitors (KVS0001-KVS0011) based on the structures of the known SMG1 inhibitor SMG1i-11 and the SMG1 protein itself.

      • Among these, KVS0001 stood out for its high potency, excellent bioavailability and low toxicity in mice. Treatment with KVS0001 caused NMD inhibition and increased presentation of neoantigens on MHC-I molecules, resulting in the clearance of cancer cells in vitro by co-cultured T cells and cancer xenografts in mice by the immune system.

      These findings support the strategy of targeting the NMD pathway for cancer treatment and provide new research tools and potential lead compounds for further exploration.

      Strengths:

      The RNA-seq based NMD analysis, using isogenic cell lines with specific NMD-inducing mutations, represents a novel approach for the high-throughput identification of potential NMD modulators or genetic regulators. The effectiveness of this method is exemplified by the identification of a new activity of AKT1/mTOR inhibitor LY3023414 in inhibiting NMD.

      The properties of KVS0001 described in the manuscript as a novel SMG1 inhibitor suggest its potential as a lead compound for further testing the NMD-targeting strategies in cancer treatment. Additionally, this compound may serve as a useful research tool.

      The results of the in vitro cell killing assay and in vivo xenograft experiments in both immuno-proficient and immune-deficient mice indicate that inhibiting NMD could be a viable therapeutic strategy for certain cancers.

      Weaknesses:

      The authors did not address the potential effects of NMD/SMG1 inhibitors on RNA splicing. Given that the transcripts of many RNA-binding proteins are natural targets of NMD, inhibiting NMD could significantly alter splicing patterns. This, in turn, might influence the outcomes of the RNA-seq-based method for NMD analysis and result interpretation.

      While the RNA-seq based approach offers several advantages for analyzing NMD, the effects of NMD/SMG1 inhibitors observed through this method should be confirmed using established NMD reporters. This step is crucial to rule out the possibility that mutations in STAG2 or TP53 affect NMD in cells, as well as to address potential clonal variations between different engineered cell lines.

      The results from the SMG1/UPF1 knockdown and SMG1i-11 experiments presented in Figure 3 correlate with the effects seen for LY3023414, but they do not conclusively establish SMG1 as the direct target of LY3023414 in NMD inhibition. An epistatic analysis with LY3023414 and SMG1-knockdown is needed.

      Comment on the revised version:

      Although KVS0001 exhibits promising properties as an SMG1 inhibitor for cancer treatment, it remains unclear if it is superior to existing SMG1 inhibitors, as no direct comparisons have been made.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work identified new NMD inhibitors and tested them for cancer treatment, based on the hypothesis that inhibiting NMD could lead to the production of cancer neoantigens from the stabilized mutant mRNAs, thereby enhancing the immune system's ability to recognize and kill cancer cells. Key points of the study include:

      • Development of an RNA-seq based method for NMD analysis using mixed isogenic cells that express WT or mutant transcripts of STAG2 and TP53 with engineered truncation mutations.

      • Application of this method in a drug screen, which identified several potential NMD inhibitors.

      • Demonstration that one of the identified compounds, LY3023414, inhibits NMD by targeting the SMG1 protein kinase in the NMD pathway in cultured cells and mouse xenografts.

      • Due to the in vivo toxicity observed for LY3023414, the authors developed 11 new SMG1 inhibitors (KVS0001-KVS0011) based on the structures of the known SMG1 inhibitor SMG1i-11 and the SMG1 protein itself.

      • Among these, KVS0001 stood out for its high potency, excellent bioavailability, and low toxicity in mice. Treatment with KVS0001 caused NMD inhibition and increased presentation of neoantigens on MHC-I molecules, resulting in the clearance of cancer cells in vitro by co-cultured T cells and cancer xenografts in mice by the immune system.

      These findings support the strategy of targeting the NMD pathway for cancer treatment and provide new research tools and potential lead compounds for further exploration.

      Strengths:

      The RNA-seq-based NMD analysis, using isogenic cell lines with specific NMD-inducing mutations, represents a novel approach for the high-throughput identification of potential NMD modulators or genetic regulators. The effectiveness of this method is exemplified by the identification of a new activity of AKT1/mTOR inhibitor LY3023414 in inhibiting NMD.

      The properties of KVS0001 described in the manuscript as a novel SMG1 inhibitor suggest its potential as a lead compound for further testing the NMD-targeting strategies in cancer treatment. Additionally, this compound may serve as a useful research tool.

      The results of the in vitro cell killing assay and in vivo xenograft experiments in both immuno-proficient and immune-deficient mice indicate that inhibiting NMD could be a viable therapeutic strategy for certain cancers.

      Weaknesses:

      The authors did not address the potential effects of NMD/SMG1 inhibitors on RNA splicing. Given that the transcripts of many RNA-binding proteins are natural targets of NMD, inhibiting NMD could significantly alter splicing patterns. This, in turn, might influence the outcomes of the RNA-seq-based method for NMD analysis and result interpretation.

      This is a very important comment that highlights an important aspect of NMD and potential exciting downstream studies. We did not systematically assess RNA splicing in our work as we are not sure if inhibition of NMD would induce cancer-specific splicing that would allow for tumor targeting. It is well established that NMD can impact splicing, including modulating cryptic exon expression, but finding and assessing the antigenicity of targetable tumor-specific antigens constitutes a study in and of itself. Our own data in Figure 4C-F support this, as a point mutation near a splice site in TP53 strongly induced NMD, which was subsequently stopped by KVS0001 treatment. We feel that a systematic review of this effect is outside the scope of this manuscript. We’ve incorporated a comment into our discussion highlighting this deficiency, but certainly find the idea of mining RNA-splicing changes an exciting next endeavor.

      While the RNA-seq-based approach offers several advantages for analyzing NMD, the effects of NMD/SMG1 inhibitors observed through this method should be confirmed using established NMD reporters. This step is crucial to rule out the possibility that mutations in STAG2 or TP53 affect NMD in cells, as well as to address potential clonal variations between different engineered cell lines.

      This is possible, but we want to highlight that all hits from the screen were confirmed in a separate cell line with different clones. While this will not rule out effects on NMD due to STAG2 and TP53 knockdown, the final lead compound was also tested on different endogenous transcripts in both indel and normal transcripts controlled by NMD (i.e., ATF4) in multiple species (human and mouse). Importantly, many of these assays employed the non-mutated transcripts from heterozygous mutant cells to ensure that cis-acting NMD was being measured and to control for any trans-acting splicing or other unanticipated biochemical effects.

      The results from the SMG1/UPF1 knockdown and SMG1i-11 experiments presented in Figure 3 correlate with the effects seen for LY3023414, but they do not conclusively establish SMG1 as the direct target of LY3023414 in NMD inhibition. An epistatic analysis with LY3023414 and SMG1-knockdown is needed.

      This is a great comment, and is supported by the recent push to confirm drug targets by chemical probes or knockout followed by loss of further effect due to the application of the drug in question. We attempted to knock out SMG1 in multiple cell lines used in this study, including RPE1, MCF10A, NCI-H358 and LS180, and were unable to obtain clones that have biallelic out-of-frame indels. We were able to obtain multiple clones with in-frame indels. Based on our results and those in the publicly available database DepMap, we suspect this gene is likely essential, making a simple knockout unfeasible. While this uncertainty is important to keep in mind, we feel it does not detract from the reporting of a novel NMD screen that is mechanistically agnostic and of a novel in vivo active NMD inhibitor.

      Reviewer #2 (Public Review):

      Summary:

      Several publications during the past years provided evidence that NMD protects tumor cells from being recognized by the immune system by suppressing the display of neoantigens, and hence NMD inhibition is emerging as a promising anti-cancer approach. However, the lack of an efficacious and specific small-molecule NMD inhibitor with suitable pharmacological properties is currently a major bottleneck in the development of therapies that rely on NMD inhibition. In this manuscript, the authors describe their screen for identifying NMD inhibitors, which is based on isogenic cell lines that either express wild-type or NMD-sensitive transcript isoforms of p53 and STAG2. Using this setup, they screened a library of 2658 FDA-approved or late-phase clinical trial drugs and had 8 hits. Among them they further characterized LY3023414, showing that it inhibits NMD in cultured cells and in a mouse xenograft model, where it, however, was very toxic. Because LY3023414 was originally developed as a PI3K inhibitor, the authors claim that it inhibits NMD by inhibiting SMG1. While this is most likely true, the authors do not provide experimental evidence for this claim. Instead, they use this statement to switch their attention to another previously developed SMG1 inhibitor (SMG1i-11), of which they design and test several derivatives. Of these derivatives, KVS0001 showed the best pharmacological behavior. It upregulated NMD-sensitive transcripts in cultured cells and the xenograft mouse model and two predicted neoantigens could indeed be detected by mass spectrometry when the respective cells were treated with KVS0001. A bispecific antibody targeting T cells to a specific antigen-HLA complex led to increased IFN-gamma release and killing of cancer cells expressing this antigen-HLA complex when they were treated with KVS0001. 
Finally, the authors show that renal (RENCA) or lung cancer cells (LLC) were significantly inhibited in tumor growth in immunocompetent mice treated with KVS0001. Overall, this establishes KVS0001 as a novel and promising anti-cancer drug that by inhibiting SMG1 (and thereby NMD) increases the neoantigen production in the cancer cells and reveals them to the body's immune system as "foreign".

      Strengths:

      The novelty and significance of this work consists in the development of a novel and - judging from the presented data - very promising NMD inhibiting drug that is suitable for applications in animals. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal application. It will be still a long way with many challenges ahead towards an efficacious NMD inhibitor that is safe for use in humans, but KVS0001 appears to be a molecule that bears promise for follow-up studies. In addition, while the idea of inhibiting NMD to trigger neoantigen production in cancer cells and so reveal them to the immune system has been around for quite some time, this work provides ample and compelling support for the feasibility of this approach, at least for tumors with a high mutational burden.

      Main weaknesses:

      There is a disconnect between the screen and the KVS0001 compound that they describe and test in the second part of the manuscript, since KVS0001 is a derivative of the SMG1 inhibitors developed by Gopalsamy et al. in 2012 and not of the lead compound identified in the screen (LY3023414). Because of high toxicity in the mouse xenograft experiments, the authors did not follow up LY3023414 but instead switched to the published SMG1i-11 drug of Gopalsamy and colleagues, a molecule that is widely used among NMD researchers for NMD inhibition in cultured cells. Therefore, in my view, the description of the screen is obsolete, and the paper could just start with the optimization of the pharmacological properties of SMG1i-11 and the characterization of KVS0001. Even though the screen is based on an elegant setup and was executed successfully, it was ultimately a failure as it didn't reveal a useful lead compound that could be further optimized.

      This is a helpful observation from an outside perspective. From our point of view, we were only alerted to targeting SMG1 by the previously reported off-target effects of LY3023414 on SMG1 and the lack of a plausible explanation for how PIK3CA inhibition would efficiently inhibit NMD. We do feel that the screen is worth including for two reasons. First, it offers an unbiased approach for querying the entire NMD pathway for vulnerabilities useful to target. The library chosen was quite small, so the screen itself could be useful to others with larger libraries to test. Second, it did help identify SMG1 as the ideal target for NMD disruption. While targeting SMG1 is not novel, we felt it highlighted why we chose to develop KVS0001. To address this reviewer’s comment, we’ve included a couple of sentences in the results and discussion strengthening the point that the screen provided an unbiased approach to finding the best target in the pathway to disrupt NMD and elaborating on the transition from LY3023414 and the screen to the development of KVS0001.

      Additional points:

      - Compared to SMG1i-11, KVS0001 seems less potent in inhibiting SMG1 (higher IC50). It would therefore be important to also compare the specificity of both drugs for SMG1 over other kinases at the applied concentrations (1 uM for SMG1i-11, 5 uM for KVS0001). The Kinativ Assay (Fig. S13) was performed with 100 nM KVS0001, which is 50-fold less than the concentration used for functional assays and hence not really meaningful. In addition, more information on the pharmacokinetic properties and toxicology of KVS0001 would allow a better judgment of the potential of this molecule as a future therapeutic agent.

      We agree that the Kinativ assay may have poorly represented the activity of KVS0001 at the bioactive concentration. We have now added 1 uM Kinativ data, the highest concentration we were able to run, to Figure S13.

      - On many figures, the concentrations of the used drugs are missing. Please ensure that for every experiment that includes drugs, the drug concentration is indicated.

      We apologize for this oversight and have added all drug concentrations on the appropriate plots.

      - Do the authors have an explanation for why LY3023414 has a much stronger effect on the p53 than on the STAG2 nonsense allele (Figure 1B, S8), whereas emetine upregulates the STAG2 nonsense alleles more than the p53 nonsense allele (Figure S5). I find this curious, but the authors do not comment on it.

      This is an interesting observation. The short answer is we’re not sure. The speculative answer is that it is related to the distinctly different mechanisms of actions of the two inhibitors (see comments from reviewing editor below).

      - While it is a strength of the study that the NMD inhibitors were validated on many different truncation mutations in different cell lines, it would help readers if a table or graphic illustration was included that gives an overview of all mutant alleles tested in this study (which gene, type of mutation, in which cell type). In the current version, this information is scattered throughout the manuscript.

      This is an excellent suggestion. We’ve included a new table S1 which incorporates the details of each cell line and the genes used in each for this study.

      - Lines 194 and 302: That SMG1i-11 was highly insoluble in the hands of the authors is surprising. It is unclear why they used variant 11j, since variant 11e of this inhibitor is widely used among NMD researchers and readily dissolves in DMSO.

      As this referee notes, SMG1i-11 is soluble in DMSO in our hands as well, which enabled us to use it for our in vitro work. Unfortunately, the concentrations of DMSO required to dissolve the compound to suitable concentrations for in vivo work were too high to safely use in mice with our animal protocols. We also attempted to use ethanol, which did dissolve SMG1i-11 but led to a significant amount of toxicity in both the drug and vehicle control arms.

      - Line 296: The authors claim that they were able to show that LY3023414 inhibited the SMG1 kinase, which is not true. To show this, they would have for example to show that LY3023414 prevents SMG1-mediated UPF1 phosphorylation, as they did for KVS0001 and SMG1i-11 in Fig. 3F. Unless the authors provide this data, the statement should be deleted or modified.

      We’ve modified this statement as requested by the referee, now saying we suspected SMG1 was the target based on previously published work.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      Your paper has been assessed by two reviewers with expertise in the NMD field. They both find the identification and characterization of a new potent and selective inhibitor of the SMG1 NMD kinase with in vivo activity to represent a significant advance in the field, and one that could ultimately be of value as the basis for a novel cancer therapy. However, as you will see both reviewers have concerns about whether the SMG1 inhibitor screen you developed belongs in the paper because it was not used to identify the KVS0001 inhibitor, which instead was generated based on a previously published set of SMG1 inhibitors, and because the NMD inhibitor that did emerge from your screen, LY3023414, was not shown to be a direct inhibitor of SMG1 kinase activity. While it is an elegant screen, during the revision of the paper you could consider streamlining the manuscript by emphasizing how the screening assay was used to validate KVS0001, and bolstering the characterization of the new KVS0001 NMD inhibitor by conducting the proposed additional experiments.

      Each of the reviewers raises additional points that should be addressed in a revised version.

      The reviewing editor has two additional points:

      (1) While emetine inhibits NMD, it is not really a direct NMD inhibitor, as implied, but rather a potent protein synthesis elongation inhibitor that acts by binding to the E-site of the 40S ribosomal subunit, and is therefore, like anisomycin, another protein synthesis inhibitor, working indirectly to inhibit NMD. This should be acknowledged in the section where emetine is first used as an "NMD inhibitor".

      This has been included in the indicated section at the referee’s request.  

      (2) To establish that the observed phenotypic effects of KVS0001 are due to on-target inhibition of SMG1, the authors could generate and express an SMG1 point mutant that is resistant to KVS0001 inhibition, which could be based on the SMG1 catalytic domain structure that the authors used originally to design KVS0001. Inhibitor-resistant kinase mutants are the gold standard for demonstrating that the biological consequences of a novel protein kinase inhibitor are due to on-target effects. Admittedly, because SMG1 is such a huge protein, this may be technically challenging and is likely beyond the scope of the present paper.

      We agree with the reviewing editor on all counts: this would be an ideal experiment to run, but it is also beyond the scope of the present paper. As indicated in our discussion above with reviewer 1, SMG1 knockout was not possible in our hands, and we suspect it may be due to the gene being essential. Creating an inhibitor-resistant mutant could overcome this issue and create an ideal model to test the target for KVS0001. Unfortunately, finding such a mutant would likely require significant amounts of trial and error to create a resistant mutant that did not lose SMG1 function. In addition, SMG1 is huge, which creates technical issues for experimentation. Due to the anticipated amount of work for such a study, we believe this would be better accomplished in future studies.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors did not mention a new SMG1 inhibitor and its effects described in Cheruiyot et al, Cancer Res 2019 (PMID: 34215620).

      A comment regarding this discovery and its implications for our work was added to the discussion.

      (2) There is an inconsistency between the manuscript text and methods sections regarding the time of drug treatment (16 hours vs 14 hours) in the HTS screen.

      This has been double checked in our notebook and fixed to reflect 16hrs as the correct incubation time. Thank you for identifying that clerical oversight.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 61: The references to NMD reviews are very old (refs 20 and 21). I suggest citing more recent, up-to-date reviews instead.

      Two additional references, one from 2016 and another from 2023, have been added to increase support for this statement in the introduction.

      (2) Figure S1: Shouldn't the caption of the right panel (TP53 data) say "clone 221" rather than "clone 22"?

      This has been fixed.

      (3) Figure S18: Please indicate on the y-axis that you are displaying RPKM for p53.

      This has been fixed.

      (4) Figures 4D and S19: Please indicate concentrations used for all drugs.

      This has been fixed.

    4. Reviewer #2 (Public Review):

      Summary:

      Several publications during the past years provided evidence that NMD protects tumor cells from being recognized by the immune system by suppressing the display of neoantigens, and hence NMD inhibition is emerging as a promising anti-cancer approach. However, the lack of an efficacious and specific small molecule NMD inhibitor with suitable pharmacological properties is currently a major bottleneck in the development of therapies that rely on NMD inhibition. In this manuscript, the authors describe their screen for identifying NMD inhibitors, which is based on isogenic cell lines that either express wild-type or NMD-sensitive transcript isoforms of p53 and STAG2. Using this setup, they screened a library of 2658 FDA-approved or late-phase clinical trial drugs and had 8 hits. Among them they further characterized LY3023414, showing that it inhibits NMD in cultured cells and in a mouse xenograft model, where it, however, was very toxic. Because LY3023414 was originally developed as a PI3K inhibitor, the authors claim that it inhibits NMD by inhibiting SMG1. While this is most likely true, the authors do not provide experimental evidence for this claim. Instead, they use this statement to switch their attention to another previously developed SMG1 inhibitor (SMG1i-11), of which they design and test several derivatives. Of these derivatives, KVS0001 showed the best pharmacological behavior. It upregulated NMD-sensitive transcripts in cultured cells and the xenograft mouse model, and two predicted neoantigens could indeed be detected by mass spectrometry when the respective cells were treated with KVS0001. A bispecific antibody targeting T cells to a specific antigen-HLA complex led to increased IFN-gamma release and killing of cancer cells expressing this antigen-HLA complex when they were treated with KVS0001. 
Finally, the authors show that renal (RENCA) or lung cancer cells (LLC) were significantly inhibited in tumor growth in immunocompetent mice treated with KVS0001. Overall, this establishes KVS0001 as a novel and promising anti-cancer drug that by inhibiting SMG1 (and therewith NMD) increases the neoantigen production in the cancer cells and reveals them to the body's immune system as "foreign".

      Strengths:

      The novelty and significance of this work consist in the development of a novel and - judging from the presented data - very promising NMD inhibiting drug that is suitable for applications in animals. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal application. It will be still a long way with many challenges ahead towards an efficacious NMD inhibitor that is safe for use in humans, but KVS0001 appears to be a molecule that bears promise for follow-up studies. In addition, while the idea of inhibiting NMD to trigger neoantigen production in cancer cells and so reveal them to the immune system has been around for quite some time, this work provides ample and compelling support for the feasibility of this approach, at least for tumors with a high mutational burden.

      Main weaknesses:

      There is a disconnect between the screen and the KVS0001 compound, that they describe and test in the second part of the manuscript since KVS0001 is a derivative of the SMG1 inhibitors developed by Gopalsamy et al. in 2012 and not of the lead compound identified in the screen (LY3023414). Because of high toxicity in the mouse xenograft experiments, the authors did not follow up LY3023414 but instead switched to the published SMG1i-11 drug of Gopalsamy and colleagues, a molecule that is widely used among NMD researchers for NMD inhibition in cultured cells. Therefore, in my view, the description of the screen is obsolete, and the paper could just start with the optimization of the pharmacological properties of SMG1i-11 and the characterization of KVS0001. Even though the screen is based on an elegant setup and was executed successfully, it was ultimately a failure as it didn't reveal a useful lead compound that could be further optimized.

      Additional points:

      - Compared to SMG1i-11, KVS0001 seems less potent in inhibiting SMG1 (higher IC50). It would therefore be important to also compare the specificity of both drugs for SMG1 over other kinases at the actually applied concentrations (1 uM for SMG1i-11, 5 uM for KVS0001). The Kinativ Assay (Fig. S13) was performed with 100 nM KVS0001, which is 50-fold less than the concentration used for functional assays and hence not really meaningful. In addition, more information on the pharmacokinetic properties and toxicology of KVS0001 would allow a better judgment of the potential of this molecule as a future therapeutic agent.

      - On many figures, the concentrations of the used drugs are missing. Please ensure that for every experiment that includes drugs, the drug concentration is indicated.

      - Do the authors have an explanation for why LY3023414 has a much stronger effect on the p53 than on the STAG2 nonsense allele (Fig. 1B, S8), whereas emetine upregulates the STAG2 nonsense alleles more than the p53 nonsense allele (Fig. S5). I find this curious, but the authors do not comment on it.

      - While it is a strength of the study that the NMD inhibitors were validated on many different truncation mutations in different cell lines, it would help readers if a table or graphic illustration was included that gives an overview of all mutant alleles tested in this study (which gene, type of mutation, in which cell type). In the current version, this information is scattered throughout the manuscript.

      - Lines 194 and 302: That SMG1i-11 was highly insoluble in the hands of the authors is surprising. It is unclear why they used variant 11j, since variant 11e of this inhibitor is widely used among NMD researchers and readily dissolves in DMSO.

      - Line 296: The authors claim that they were able to show that LY3023414 inhibited the SMG1 kinase, which is not true. To show this, they would have for example to show that LY3023414 prevents SMG1-mediated UPF1 phosphorylation, as they did for KVS0001 and SMG1i-11 in Fig. 3F. Unless the authors provide this data, the statement should be deleted or modified.

      Comments on the revised version:

      - The authors have satisfactorily addressed all my "Additional points" listed above.

      - With the new publishing model of eLife, the authors ultimately decide on whether or not to follow reviewers' suggestions, and in this case, the authors decided (against my suggestion) to leave the screening part in the manuscript, although it did not result in a useful lead compound. They argue it helped them define in an unbiased way SMG1 as the ideal target for NMD disruption. I would counterargue that this has been known in the field for quite a while.

      - One last suggestion I have to the authors would be to modify the statement in the abstract "This led to the design of a novel SMG1 inhibitor", because what they call "novel" is, in reality, a chemical improvement of the pharmacological properties of a previously reported SMG1 inhibitor (Gopalsamy et al., 2012).

    1. eLife assessment

      This important study analyzes in an original way how tension pattern dynamics can reveal the contribution of active versus passive intercalation during tissue elongation. The authors develop a compelling, elegant analytical framework (isogonal tension decomposition) to disentangle the passive (adjacent tissues pulling) and active (local tension anisotropy) contributions to intercalation events. This allows the generation of global maps of tissue mechanics that will be extremely helpful in the field of biomechanics.

    2. Reviewer #3 (Public Review):

      In their article "The Geometric Basis of Epithelial Convergent Extension", Brauns and colleagues present a physical analysis of Drosophila axis extension that couples in toto imaging of cell contours (previously published dataset), force inference, and theory. They seek to disentangle the respective contributions of active vs passive T1 transitions in the convergent extension of the lateral ectoderm (or germband) of the fly embryo.

      The revision made by the authors has greatly improved their work, which was already very interesting, in particular the use of force inference throughout intercalation events to identify geometric signatures of active vs passive T1s, and the tension/isogonal decomposition. The new analysis of the Snail mutant adds a lot to the paper and makes their findings on the criteria for T1s very convincing.

      About the tissue scale issues raised during the first round of review. Although I do not find the new arguments fully convincing (see below), the authors did put a lot of effort to discuss the role of the adjacent posterior midgut (PMG) on extension, which is already great. That will certainly provide the interested readers with enough material and references to dive into that question.

      I still have some issues with the authors' interpretation on the role of the PMG, and on what actually drives the extension. Although it is clear that T1 events in the germ band are driven by active local tension anisotropy (which the authors show but was already well-established), it does not show that the tissue extension itself is powered by these active T1s. Their analysis of "fence" movies from Collinet et al 2015 (Tor mutants and Eve RNAi) is not fully convincing. Indeed, as the authors point out themselves, there is no flow in Tor mutant embryos, even though tension anisotropy is preserved. They argue that in Tor embryos the absence of PMG movement leaves no room for the germband to extend properly, thus impeding the flow. That suggests that the PMG acts as a barrier in Tor mutants - What is it attached to, then? The authors also argue that the posterior flow is reduced in "fenced" Eve RNAi embryos (which have less/no tension anisotropy), to justify their claim that it is the anisotropy that drives extension. However, previous data, including some of the authors' (Irvine and Wieschaus, 1994 - Fig 8), show that the first, rapid phase of germband extension is left completely unaffected in Eve mutants (that lack active tension anisotropy). Although intercalation in Eve mutants is not quantified in that reference, this was later done by others, showing that it is strongly reduced. Similarly, the Cyto-D phenotype from Clement et al 2017, in which intercalation is also strongly reduced, also displays normal extension.

    3. Reviewer #2 (Public Review):

      Main comment from 1st review:

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      Comments on the revised version:

      My main concern was that the authors did not use the analysis of mutant contexts such as Snail and Twist to confirm their predictions. They made a series of modifications, clarifying their conclusions. In particular, they now include an analysis of the Snail mutant and show that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished, supporting the idea that isogonal strain could be used as an indicator of external forces (Fig. 7 and S6).

      They further discuss their results in the context of what was published regarding the mutant backgrounds (fog, torso-like, scab, corkscrew, ksr) where midgut invagination is disrupted, and where the germ band buckles, and propose that this supports the importance of internal versus external forces driving GBE.

      Overall, these modifications, in addition to clarifications in the text, clearly strengthen the manuscript.

    1. eLife assessment

      This useful study describes a single set of label-chase mass spectrometry experiments to confirm the molecular function of YafK as a peptidoglycan hydrolase, and to describe the timing of its attachment to the peptidoglycan. Confirmation of the molecular function of YafK will be helpful in further studies to examine the function and regulation of the outer membrane-peptidoglycan link in bacteria. The evidence supporting the molecular function of YafK and that lpp molecules are shuffled on and off the peptidoglycan is solid; however, data supporting conclusions relating to the locations of lpp-peptidoglycan attachment are incomplete. The work will be of interest to researchers studying lipoproteins in Gram-negative bacteria.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses: 

      - Only one mutant (YafK) is used to make the conclusion. 

      The aim of the study is to determine the effect of the hydrolysis of the PG→Lpp bond on the dynamics of the tethering of Lpp to PG. Since YafK is the only enzyme catalyzing this reaction, it is appropriate to compare the wild-type strain to an isogenic yafK deletion mutant. Nonetheless, we carefully consider this comment and will investigate the dynamics of the tethering of Lpp to PG in mutants deficient in the production of the L,D-transpeptidases responsible for tethering Lpp to PG.

      Additional kinetic analyses were performed on strains relying on a single L,D-transpeptidase for Lpp tethering to PG. Escherichia coli produces three L,D-transpeptidases catalyzing the tethering of Lpp to PG (YbiS, YcfS, and ErfK). The corresponding genes were deleted from the chromosome of strain BW25113, thus generating strain BW25113Δ3. Plasmids encoding each one of these three enzymes were independently introduced in BW25113Δ3. Qualitatively, LC-MS analyses revealed similar kinetics for the four Tri-KR isotopologues purified from wild-type strain BW25113 and from the three BW25113Δ3 derivatives producing a single plasmid-encoded L,D-transpeptidase (YbiS, YcfS, or ErfK) under the control of a rhamnose-inducible promoter (Prha) of plasmid pHV30 (Voedts et al. EMBO J. 2021 40:e108126, doi: 10.15252/embj.2021108126) (see panel A in figure 1 below). Briefly, and as indicated in the first version of the main text, the old→new Tri→KR isotopologue was synthesized first. The new→new isotopologue was not detected 5 min after the medium switch. These results indicate that the newly-synthesized PG disaccharide-peptide subunits and Lpp are independently incorporated into the expanding PG polymer. The proportion of the new→old isotopologue exceeded that of the old→new isotopologue at around 40 min (for the strain producing ErfK) or 20 min (for the strains producing YbiS or YcfS). This is the hallmark of the activity of the YafK hydrolase that liberates existing (old) Lpp that can be tethered to a newly synthesized disaccharide-peptide subunit, thereby generating the new→old isotopologue. In the absence of the YafK hydrolase, the relative proportion of the new→old isotopologue is lower since this isotopologue can only result from the tethering of the preexisting free forms of Lpp to newly synthesized disaccharide-peptide units. 
The contribution of YafK to variations in the relative abundance of the four isotopologues was also investigated by combining the relative abundance of isotopologues containing either old versus new KR (panel B) or old versus new PG stem peptide (panel C) moieties. As discussed in the first version of the manuscript for strains BW25113 and BW25113ΔyafK, this analysis revealed that the existing (old) disaccharide-tripeptide moieties in the Tri→KR isotopologues disappear more rapidly than the existing (old) KR moieties due to the hydrolysis of the old→old Tri-KR isotopologue by YafK. These results indicate that the mode of tethering of Lpp to PG and the dynamic equilibrium between the PG-tethered and free forms of Lpp are similar for the YbiS, YcfS, and ErfK L,D-transpeptidases. Quantitatively, we also noticed that the overall decrease in the relative abundance of all Tri→KR isotopologues containing existing (old) moieties was slower for the strains producing only ErfK, YbiS, or YcfS than for the wild type and ΔyafK strains. This could be accounted for by an increase in the generation time of the former group of three strains. This is a limitation of our study because it precludes the comparison of the evolution of a particular isotopologue in several strains, as performed in Fig. 3 for strains BW25113 and BW25113ΔyafK. For this reason, we prefer to present these data in the rebuttal rather than in the manuscript. Indeed, presentation of the data in the main text would require introducing a new mode of presentation of the data (variations in the relative abundance of all four isotopologues in the same strain; see figure below) in addition to variations of the relative abundance of any one of the four isotopologues between strains (Fig. 3). 
Introduction of this additional mode of presentation of the data would complicate the initial manuscript in an unnecessary manner because the data obtained with mutants producing a single L,D-transpeptidase (ErfK, YbiS, or YcfS) confirmed the data obtained with the wild-type strains producing the three L,D-transpeptidases.

      Author response image 1.

      MS-based kinetic analysis of Lpp tethering to PG.

      - Time points to analyse Tri-KR isotopologues in WT (0, 10, 20, 40, 60 min) and yafK mutant (0, 15, 25, 40, 60 min) are not the same.

      The purpose of the experiments is to compare the kinetics of formation and hydrolysis of the PG→Lpp bond in the WT versus ΔyafK strains. Comparison of the kinetics is therefore possible even though the kinetics are not based on the exact same time points. Nonetheless, we will reproduce the kinetics experiment (see also answers to Reviewer 2) and use the same time points in these additional experiments.

      We have performed additional analyses to provide kinetic data for at least three biological repeats and for the same periods of incubation after the medium switch (0, 10, 20, 40, and 60 min). The full set of data, including means and standard deviations, appear in the additional Table S1. We have also updated Fig. 3 with the means calculated with these additional values. The conclusions of the first version of the manuscript are fully supported by the additional data requested by the reviewer. We have also revised Fig. 4 based on the full set of data appearing in Table S2.

      Reviewer #2 (Public Review): 

      Weaknesses: 

      - However, the authors make a few other conclusions from their data which are harder to understand the logic of, or to feel confident in based on the existing data. They claim that their 5-time point kinetic data indicates that new lpp is not substantially added to lipid II before it is added to the peptidoglycan, and that instead lpp is attached primarily to old peptidoglycan. I believe that this conclusion comes from the comparison of Figs. 3A and 3C, where it appears that new lpp is added to old peptidoglycan a few minutes before new lpp is added to new peptidoglycan. However, the very small difference in the timing of this result, the minimal number of time points and the complete lack of any presentation of calculated error in any of the data make this conclusion very tenuous. In addition, the authors conclude that lpp is not significantly attached to septal peptidoglycan. The logic behind this conclusion appears to be based on the same data, but the authors do not provide a quantitative model to support this idea.

      The reviewer is correct in stating that we claim that Lpp is not substantially added to lipid II before incorporation of the disaccharide-pentapeptide subunit into the expanding PG network. This conclusion is based on the paucity of PG-Lpp covalent adducts containing light PG and Lpp moieties at the earliest time points. To substantiate this finding more thoroughly, we will reproduce the kinetic experiments with more early time points. The paucity of the new→new PG-Lpp isotopologues also implies that Lpp might not be extensively tethered to septal peptidoglycan since the latter is assembled from newly synthesized PG (see our previous publication Atze et al. 2021 and references therein). Quantitatively, septal synthesis roughly accounts for one third of the total PG synthesis. It is therefore expected that tethering of Lpp to septal PG would represent one third of the total number of newly synthesized Lpp molecules tethered to PG. We therefore proposed that the paucity of new→new PG-Lpp isotopologues at early time points of the kinetics implies that Lpp is preferentially tethered to the side wall. This is only one of several conclusions that we reach in the present study and we were very careful in the wording of our results.

      We would first like to stress that our claim that Lpp is primarily attached to old peptidoglycan rather than to lipid II is indeed supported by the results presented in the first version of the manuscript. In fact, the opposite mechanism, i.e. Lpp linking to Lipid II, as established for the linking of proteins to PG by sortases in Gram-positive bacteria, would result in the exclusive tethering of newly synthesized Lpp to newly synthesized PG stems (Fig. 3). This is clearly not the case since the new→new isotopologues are present in small amounts 10 min after the medium switch and are not detectable at 5 min (data appearing in Table S1 and new mass spectra added to Supplementary file 1). Instead, our data indicate that newly synthesized Lpp is tethered to existing PG. Thus, the relevant comparison is not the absolute value of the delay in the appearance of isotopologues in Figs 3A and 3C, as suggested by the reviewer. Rather, the relevant comparison should take into consideration the following two modes of Lpp tethering to PG: (i) tethering of Lpp to Lipid II versus (ii) tethering of Lpp to existing PG independently from insertion of new subunits into the expanding PG. The former mode implies the exclusive formation of new→new isotopologues, which were not detected at early time points. The latter mode implies the prevalent formation of old→new isotopologues that were indeed preponderant at early time points. Thus, our analysis clearly eliminates the first mode of Lpp tethering to PG (tethering of Lpp to Lipid II) and validates the second one (tethering of Lpp to existing PG). As stated in our answers to reviewer 1, we have generated additional repeats and the full set of data, including means and SD values, appears in the additional Supplementary Tables S1 and S2.
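
      The logic of this comparison can be sketched in a few lines of Python. This is only an illustration of the reasoning, not analysis code used in the study; the function names and the mode labels ("lipid_II", "existing_PG") are hypothetical. Isotopologues are written as PG stem→Lpp (KR) moiety, each being old or new relative to the medium switch.

```python
# Illustrative sketch only: names and structure are assumptions, not the
# authors' analysis code. Isotopologues are written "<PG stem>→<Lpp KR moiety>",
# where each moiety is "old" or "new" relative to the medium switch.

def predicted_early_isotopologue(mode: str) -> str:
    """Isotopologue expected to carry newly synthesized Lpp shortly after the switch."""
    if mode == "lipid_II":
        # Lpp linked to the precursor: new Lpp ends up only on new PG stems.
        pg_stem = "new"
    elif mode == "existing_PG":
        # Lpp linked to the assembled polymer: new Lpp ends up mostly on old PG stems.
        pg_stem = "old"
    else:
        raise ValueError(f"unknown tethering mode: {mode}")
    return f"{pg_stem}→new"


def compatible_modes(observed_early: str) -> list[str]:
    """Tethering modes whose prediction matches the observed early isotopologue."""
    return [
        mode
        for mode in ("lipid_II", "existing_PG")
        if predicted_early_isotopologue(mode) == observed_early
    ]


# Observation reported in the text: old→new is preponderant at early time points.
print(compatible_modes("old→new"))
```

      With the early observation reported above (old→new preponderant, new→new not detected at 5 min), only the "existing_PG" mode remains compatible in this simplified picture.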

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      -All major reactions catalysed by L,D-transpeptidases must be studied using the labeling-mass spec technique and compared with YafK to strengthen the conclusions. 

      As described above (Figure 1), we explored the dynamics of Lpp tethering in mutants producing a single L,D-transpeptidase.

      -Experiments on the effect of YafK on the bacterial envelope and production of vesicles should be concluded to support the claims. 

      We have analyzed the extent of outer membrane vesicle (OMV) formation both in the wild type strain and in each one of the mutant strains characterized in this study by using a procedure described in detail in one of our previous publications (Hugonneau-Beaufet et al. Microbiol Spectr. 2023 11:e0521722, doi: 10.1128/spectrum.05217-22). Figure 2 below shows that loss of Lpp or of its tethering to PG, following deletion of genes encoding L,D-transpeptidases ErfK, YbiS, and YcfS, results in the formation of OMVs as revealed by the presence of the maltose-binding protein (MBP, 42 kDa) in the corresponding spent culture medium (as detected by immunoblotting). The RNA polymerase subunit RpoA (36 kDa), used as a control, was not detected in these spent culture media, indicating that loss of either Lpp alone or of ErfK, YbiS, and YcfS together was not associated with bacterial lysis. This analysis also showed that production of ErfK, YbiS, or YcfS alone was sufficient to prevent formation of OMVs. Finally, deletion of YafK, as expected, did not lead to OMV formation. These confirmatory results are out of the scope of the manuscript, which focuses on the dynamics of Lpp tethering to PG rather than on the role of that tethering in envelope stability.

      Author response image 2.

      Figure 2. Immuno-detection of OMV formation.

      Reviewer #2 (Recommendations For The Authors): 

      - Why so much background about previous results in the abstract? Previous results don't seem required for understanding the description of new results here. Maybe put a sentence about importance at the end, instead.

      The background information is important for two reasons. First, because it is important to stress that the method used to determine the structure and dynamics of the isotopologues is novel and has been validated in various ways, including the modeling of isotopic clusters, in a previous study (https://doi.org/10.7554/eLife.72863). Since the current study is an extension of this previous report it is relevant to introduce the type of information that can be obtained by this approach. Second, because it is also important to stress that kinetic analyses have been previously reported for the incorporation of disaccharide-peptide units into the expanding peptidoglycan (https://doi.org/10.7554/eLife.72863). In the current study, we focused on the mode of Lpp-to-PG tethering in the context of PG expansion that thus had to be introduced.

      - Abstract: tethering of lpp to septal pg is limited by what? Limited to what? Wording not clear.

      The unclear sentence has been rephrased. Revised version: “Newly synthesized septum PG appears to contain small amounts of tethered Lpp.”

      - The figure legend for fig 1b - I only see one red double arrow?

      Black double arrows indicate the position of glycosidic bonds cleaved by the muramidases. Their size was increased so that they appear more distinctly in the image.

      - Fig 3 and Fig 4- these should be shown with error. 

      The full set of data with means and standard deviations appears in Supplementary Tables S1 and S2.

      - This new-> old, old-> new annotation is confusing. Is the PG fragment or the lpp old or new? Are you distinguishing between which part is old and new by the ordering? Or, could either the PG fragment or the lpp be old to be annotated as old-> new? I think you are trying to explain it in the figure 3CD legend, but it could be presented more clearly. When you say respectively, do you mean that old->new means old muropeptide, new lpp? And new-> old means new muropeptide and old lpp? Why not just use the same annotation system you use in fig 2? Or, use subscripts to indicate old and new?

      The designation of isotopologues is correct and adequate to designate the products of transpeptidation catalyzed both by PBPs and L,D-transpeptidases. This nomenclature of transpeptidation products was introduced in the 1970s (see Schleifer and Kandler 1972 Bacteriological Reviews 36:407-477). In this bond designation, the acyl donor and the acyl acceptor appear left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond. For the Tri→KR isotopologues, the peptide stem acts as the acyl donor whereas Lpp acts as the acyl acceptor. There is therefore no ambiguity in the annotation. This also applies to the old→new-type annotation, which designates an old (existing) PG stem linked to new (neosynthesized) Lpp. In the figures, we used a color code to identify old (red) and new (purple) in the Tri→KR moieties. Since a color code cannot be used in the main text, we used the old→new-type of annotation. A sentence has been added at the end of the legend to Fig. 1b to introduce this nomenclature: “Please note that we used the standard nomenclature for transpeptidation products in which the acyl donor and the acyl acceptor appear left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond”.

      - Pg 5 - first paragraph. I'm struggling with the logic of your conclusion that lpp is not attached to lipid II - it seems that this conclusion is based on the timing of the appearance of the hybrid isotopes. You say you would expect the new-new ones to appear quickly, but how quickly would you expect that, and why? You do see new-new ones appearing fairly quickly, in 20 minutes, so I don't understand the logic of why that timing excludes the lipid II modification model. Please elaborate further.

      See answer above to reviewer 2 and analysis of samples collected shortly after the medium switch (Table S1). See also the revised version of Supplementary file 1 that shows mass spectra for peptidoglycan extracted 5 min after the medium switch.

      - The conclusion about tethering of lpp to septal PG also appears to be somewhat tenuous, which the authors concede when they use the word "might" in the section of the results. However, the language in the abstract is more definitive. Please tone down the language in the abstract, or provide more evidence to support this conclusion. At the least, you could add a little discussion of the numbers. At a given time in mixed culture, how much PG is being constructed at the septum? How does that percentage line up with the rate of PG label loss vs the rate of lpp label loss?

      -  Pg 5, bottom paragraph. I don't know what you mean by "there was no loss of old->old in the ∆yafK strains, " when you just a sentence above described the decrease. 

      The data of the MS analyses are presented as the relative abundance of isotopologues. If the old→old Tri→KR isotopologue present at the medium shift were not hydrolyzed by YafK, its absolute amount would remain constant over time. However, the relative abundance of the old→old isotopologue decreases by 50% in one generation because the total amount of the Tri→KR muropeptide doubles in one generation (as any of the bacterial constituents). In Fig. 3B, we indeed observed that the relative amount of old→old isotopologue is about 50% after one generation in the ΔyafK mutant indicating the persistence of the isotopologue. In contrast, production of YafK in the strain BW25113 results in lower abundance of this isotopologue (in the order of 90%). 

      To make this concept more explicit, we expanded the reasoning in the relevant paragraph of the revised version of the manuscript.
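      The dilution argument above can be illustrated with a minimal numerical sketch. It assumes only what the answer states: the total amount of the Tri→KR muropeptide doubles in each generation, and an old→old isotopologue that is not hydrolyzed keeps a constant absolute amount. The function name and the `fraction_removed` parameter are hypothetical and introduced purely for illustration; they do not correspond to any quantity fitted in the manuscript.

```python
def relative_abundance_old(generations: float, fraction_removed: float = 0.0) -> float:
    """Relative abundance of the old->old isotopologue after growth.

    Assumptions (illustrative only):
    - the total amount of muropeptide doubles every generation;
    - the absolute amount of old->old material is constant unless a
      fraction `fraction_removed` of it has been lost (e.g. by hydrolysis).
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if not 0.0 <= fraction_removed <= 1.0:
        raise ValueError("fraction_removed must lie between 0 and 1")
    surviving_old = 1.0 - fraction_removed   # old material still present
    total_fold_increase = 2.0 ** generations  # growth of the total pool
    return surviving_old / total_fold_increase


# No hydrolysis: pure dilution by growth halves the relative abundance.
print(relative_abundance_old(1))
# Any loss of old material pushes the value below the dilution-only curve.
print(relative_abundance_old(1, fraction_removed=0.5))
```

      Under these assumptions, a relative abundance close to one half after one generation is what persistence of the isotopologue would look like, whereas any value clearly below that indicates loss of old material in addition to dilution.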

      - Pg 6 - I don't understand how you are drawing a conclusion about the proteolytic degradation of lpp from these data. Please clarify your reasoning.

      In the analysis presented in Fig. 4, we investigated the relative abundance of old and new Lpp based on the relative abundance of old and new KR moieties in all four Tri-KR isotopologues. As stated in the preceding answer, the relative abundance of KR moieties should be 50% after one generation if no degradation of Lpp occurs. This is observed both for BW25113 (Fig. 4A) and for the ΔyafK mutant (Fig. 4B), thus supporting our claim that Lpp is not degraded. In contrast, the relative abundance of the old Tri moiety is lower than 50% for the wild type strain (Fig. 4C) but not for the ΔyafK mutant (Fig. 4D). This reflects the fact that YafK hydrolyzes the PG-Lpp bond and that Lpp released by this reaction can be cross-linked to neo-synthesized PG stems. Please note that, in this reaction, the substrate is a tetrapeptide donor stem (Fig. 1C).

    3. Reviewer #1 (Public Review):

      The authors present data on outer membrane vesicle (OMV) production in different mutants, but they state that this is beyond the scope of the current manuscript, which I disagree with. This data could provide valuable physiological context that is otherwise lacking. The preliminary blots suggest that YafK does not alter OMV biogenesis. I recommend repeating these blots with appropriate controls, such as blotting for proteins in the culture media, an IM protein, periplasmic protein and an OM protein to strengthen the reliability of these findings. Including this data in the manuscript, even if it does not directly support the initial hypothesis, would enhance the physiological relevance of the study. Currently, the manuscript relies completely on the experimental setup (labeling-mass spec) previously developed by the authors, which limits the broader scope and interpretability of this study.

      Additionally, the susceptibility of strains to detergents like SDS can be tested to provide much-needed physiological context to the study.

      In summary, the authors should consider revising the manuscript to improve clarity, substantiate their claims with more detailed evidence, and include additional experimental results that provide necessary physiological context to their study.

    4. Reviewer #2 (Public Review):

      Summary: The authors of this study have sought to better understand the timing and location of the attachment of the lpp lipoprotein to the peptidoglycan in E. coli, and to determine whether YafK is the hydrolase that cleaves lpp from the peptidoglycan.

      Strengths: The method is relatively straightforward. The authors are able to draw some clear conclusions from their results, that lpp molecules get cleaved from the peptidoglycan and then re-attached, and that YafK is important for that cleavage.

      Weaknesses: Figures 3 and 4 - why are the data shown here only two biological replicates, when there are 3-5 replicates shown in Tables S1 and S2? This makes it seem like you are cherry-picking your favorite replicates. Please present the data as the mean of all the replicates performed, with error shown on the graph.

      This work will have a moderate impact on the field of research in which the connections between the OM and peptidoglycan are being studied in E. coli. Since lpp is not widely conserved in gram negatives, the impact across species is not clear. The authors do not discuss the impact of their work in depth.

    1. eLife assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

    2. Reviewer #1 (Public Review):

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future - but most of the "predictions" from the model are actually findings that broadly match earlier experimental results, making them "postdictions". This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited parameter sweep of single parameters at a time, are unintuitive, and may be in conflict with previous findings of efficient spiking networks. This includes the following. Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a similar computation to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply correspond to the values that should come out of the derivation.

    4. Reviewer #3 (Public Review):

      Summary:

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.
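      The spiking rule summarized above (a neuron fires only if the spike lowers its population's loss) can be sketched in a few lines. This is a minimal one-dimensional illustration under stated assumptions, not the authors' model: the loss is taken to be a squared coding error plus a fixed cost per spike, and the names `should_spike`, `decoding_weight`, and `spike_cost` are hypothetical.

```python
def loss(signal: float, estimate: float, n_spikes: int, spike_cost: float) -> float:
    """Assumed loss: squared coding error plus a cost proportional to spike count."""
    return (signal - estimate) ** 2 + spike_cost * n_spikes


def should_spike(signal: float, estimate: float,
                 decoding_weight: float, spike_cost: float) -> bool:
    """Greedy rule: fire only if adding one spike strictly reduces the loss.

    A spike is assumed to shift the population readout by `decoding_weight`
    and to add one unit to the spike count.
    """
    loss_without = loss(signal, estimate, 0, spike_cost)
    loss_with = loss(signal, estimate + decoding_weight, 1, spike_cost)
    return loss_with < loss_without


# Large coding error: a spike that moves the estimate toward the signal pays off.
print(should_spike(signal=1.0, estimate=0.0, decoding_weight=0.5, spike_cost=0.1))
# Small coding error: the same spike would overshoot and cost energy, so stay silent.
print(should_spike(signal=0.1, estimate=0.0, decoding_weight=0.5, spike_cost=0.1))
```

      In this toy version, raising `spike_cost` makes the neuron more reluctant to fire, which is the sense in which a metabolic term trades coding accuracy against energy use.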

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      Assessment and context:

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks and known experimental constraints, and provides a few predictions.

    5. Author response:

      eLife assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

      We thank the Reviewers and the Reviewing Editor for taking the time to provide extremely valuable suggestions and comments, which will help us to substantially improve our paper. In what follows, we summarize our current plan to improve the paper in line with their suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Thanks for these insights and for this summary of our work.

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      We will better describe the ratio of numbers of E and I neurons found in real data, as suggested. The first submission already contained an analysis of how this ratio of neuron numbers depends on the weighting of the loss of E and I neurons and on the relative weighting of the encoding error vs the metabolic cost in the loss function (see Fig 6E). We will make sure that these results are suitably expanded and better emphasized in the revision. We will also include a new analysis of how the optimal parameters depend on the relative weighting of encoding error vs metabolic cost in the loss function when studying other parameters (namely: noise intensity, metabolic constant, ratio of mean I-I to E-I connectivity, time constants of single E and I neurons).

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      We agree that the main contribution of our manuscript in this respect is to show how efficient coding in spiking networks can lead to structured connectivity similar to that proposed in the above papers. We apologize if this was not clear enough in the previous version. We will make it clearer in the revision. We nevertheless think it useful to report the effects of perturbations within this network because the structure derived in our network is not identical to those studied in the above papers, and because these results give information about how lateral inhibition works in this network. Thus, we will keep presenting it in the revised version, although we will de-emphasize and simplify its presentation to give more emphasis to the novelty of the derivation of this connectivity rule from the principles of efficient coding.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      We will improve the Limitations paragraph in Discussion, and also anticipate caveats in tandem with results when needed, as suggested.

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future - but most of the "predictions" from the model are actually findings that broadly match earlier experimental results, making them "postdictions".

      This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

      We will better distinguish between predictions and postdictions in the revision.

      Reviewer #2 (Public Review):

      Summary: In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Thanks for these insights and for the kind words of appreciation of the strengths of our work.

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited parameter sweep of single parameters at a time, are unintuitive and may be in conflict with previous findings of efficient spiking networks. This includes the following. Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a similar computation as to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      We are addressing this issue in two ways. First, we will present joint sweeps over pairs of parameters whose combined variation is expected to influence optimality in ways that cannot be understood by varying one parameter at a time. Namely, we plan to jointly vary the noise intensity and the metabolic constant, as well as the ratio of E to I neuron numbers and the ratio of mean I-I to E-I connectivity. Second, we will identify a reasonable, realistic range of variation for each individual parameter, perform a Monte Carlo search for the optimal point within this range, and compare the results with those obtained by varying one or two parameters at a time. We will also add the suggested citation to Calaim et al. 2022 in regard to the points discussed above.
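
      To illustrate the two strategies described in this reply, the following is a minimal sketch of a joint grid sweep over two parameters and a Monte Carlo (uniform random) search within a bounded range. The function `toy_loss`, the parameter names `a` and `b`, and all numerical ranges are hypothetical placeholders; they are not the network loss or the parameters used in the manuscript.

```python
import itertools
import random


def toy_loss(a, b):
    """Hypothetical loss with an interaction term between two parameters.

    The product a * b means the best value of `a` depends on `b`,
    which a one-parameter-at-a-time sweep can miss.
    """
    return (a * b - 1.0) ** 2 + 0.1 * (a - 2.0) ** 2


def joint_grid_sweep(loss_fn, values_a, values_b):
    """Evaluate every (a, b) pair and return (best_loss, best_a, best_b)."""
    return min(
        (loss_fn(a, b), a, b) for a, b in itertools.product(values_a, values_b)
    )


def monte_carlo_search(loss_fn, range_a, range_b, n_samples=2000, seed=0):
    """Draw uniform random samples within the given ranges and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        a = rng.uniform(*range_a)
        b = rng.uniform(*range_b)
        candidate = (loss_fn(a, b), a, b)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


if __name__ == "__main__":
    grid_best = joint_grid_sweep(
        toy_loss, [0.5, 1.0, 1.5, 2.0, 2.5], [0.25, 0.5, 0.75, 1.0]
    )
    mc_best = monte_carlo_search(toy_loss, (0.5, 2.5), (0.25, 1.0))
    print("grid sweep best (loss, a, b):", grid_best)
    print("Monte Carlo best (loss, a, b):", mc_best)
```

      Comparing the two results is one way to check whether conclusions drawn from low-dimensional sweeps hold up under a broader search of the parameter space.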

      We will improve the comparison between the Excitatory-Inhibitory and the 1-Cell-Type model (see reply to the suggestions of Referee 3 for more details).

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply correspond to the values that should come out of the derivation.

      In the previously submitted manuscript we presented both the encoding error and the metabolic cost separately as a function of the parameters, so that readers could get an understanding of how stable optimal parameters would be to the change of the relative weighting of encoding error and metabolic cost. We will improve this work by adding the suggested calculations to provide quantitative measures of the dependence of the optimal network parameters and configurations on this relative weighting.
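
      As a rough illustration of how an optimal parameter can depend on the relative weighting of encoding error and metabolic cost, the sketch below combines two toy curves with a weight `w_error` (0.7 in the reviewer's description) and reports the minimizing parameter for several weightings. The quadratic `error` and `cost` curves and the parameter grid are hypothetical and chosen only so that the answer can be checked by hand; they are not the quantities computed in the manuscript.

```python
import numpy as np


def weighted_loss(error, cost, w_error=0.7):
    """Convex combination of encoding error and metabolic cost."""
    return w_error * error + (1.0 - w_error) * cost


def optimum_vs_weighting(param_values, error, cost, weights):
    """Return, for each weighting, the parameter value minimizing the loss."""
    return {
        w: float(param_values[np.argmin(weighted_loss(error, cost, w))])
        for w in weights
    }


# Hypothetical curves: error is smallest at 3, cost is smallest at 7.
params = np.linspace(0.0, 10.0, 101)
error = (params - 3.0) ** 2
cost = (params - 7.0) ** 2

if __name__ == "__main__":
    for w, p_opt in optimum_vs_weighting(params, error, cost, [0.0, 0.3, 0.5, 0.7, 1.0]).items():
        print(f"w_error = {w:.1f} -> optimal parameter = {p_opt:.1f}")
```

      For these toy curves the optimum moves linearly as 7 - 4 * w_error, so a weighting of 0.7 gives 4.2. A flat dependence on the weighting would indicate a robust optimum, whereas a steep one would indicate that the reported optimum is sensitive to this choice.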

      Reviewer #3 (Public Review):

      Summary: In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Thanks for this summary and for these kind words of appreciation of the strengths of our work.

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      We indeed removed non-Dalian connections because having only connections respecting Dale’s law is a major constraint for biological plausibility. Our logic was to consider efficient coding within the space of networks that satisfy this (and other) biological plausibility constraints. We did not intend to claim that removing the non-Dalian connections was the result of an analytical optimization. However, to get better insights into how Dale’s Law constrains or influences the design of efficient networks, we added a comparison of the coding properties of networks that either do or do not satisfy Dale’s law. We apologize if this was not sufficiently clear in the previous version and we will clarify this in revision. 

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      We will perform the suggested detailed comparisons between the network loss in the 1CT-model and E-I model and then revise or refine conclusions if and as needed, according to the results we will obtain.

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      We will try to make the presentation of the model more accessible to a non-computational audience.

      Assessment and context: Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks to known experimental constraints and provides a few predictions.

      Thanks for these kind words. We will make sure that these points emerge more clearly and in a more accessible way from the revised paper.

    1. Reviewer #3 (Public Review):

      Summary:

      In this work, Simon et al present a new computational tool to assess non-Brownian single-particle dynamics (aTrack). The authors provide a solid groundwork to determine the motion type of single trajectories via an analytical integration of multiple hidden variables, specifically accounting for localization uncertainty, directed/confined motion parameters, and, very novel, allowing for the evolution of the directed/confined motion parameters over time. This last step is, to the best of my knowledge, conceptually new and could prove very useful for the field in the future. The authors then use this groundwork to determine the motion type and its corresponding parameter values via a series of likelihood tests. This accounts for obtaining the motion type which is statistically most likely to be occurring (with Brownian motion as null hypothesis). Throughout the manuscript, aTrack is rigorously tested, and the limits of the methods are fully explored and clearly visualised. The authors conclude with allowing the characterization of multiple states in a single experiment with good accuracy and explore this in various experimental settings. Overall, the method is fundamentally strong, well-characterised, and tested, and will be of general interest to the single-particle-tracking field.

      Strengths:

      (1) The use of likelihood ratios gives a strong statistical relevance to the methodology. There is a sharp decrease in likelihood ratio between e.g. confinement of 0.00 and 0.05 and velocity of 0.0 and 0.002 (figure 2c), which clearly shows the strength of the method - being able to determine 2nm/timepoint directed movement with 20 nm loc. error and 100 nm/timepoint diffusion is very impressive.

      (2) Allowing the hidden variables of confinement and directed motion to change during a trajectory (i.e. the q factor) is very interesting and allows for new interpretations of data. The quantifications of these variables are, to me, surprisingly accurate, but well-determined.

      (3) The software is well-documented, easy to install, and easy to use.

      Weaknesses:

      (1) The aTrack principle is limited to the motions incorporated by the authors, with, as far as I can see, no way to add new analytical non-Brownian motion. For instance, being able to add a dynamical state-switching model (i.e. quick on/off switching between mobile and non-mobile, for instance, repeatable DNA binding of a protein), could be of interest. I don't believe this necessarily has to be incorporated by the authors, but it might be of interest to provide instructions on how to expand aTrack.

      (2) The experimental data does not very convincingly show the usefulness of aTrack. The authors mention that SPBs are directed in mitosis and not in interphase. This can be quantified and studied by microscopy analysis of individual cells and confirming the aTrack direction model based on this, but this is not performed. Similarly, the size of a confinement spot in optical tweezers can be changed by changing the power of the optical tweezer, and this would far more strongly show the quantitative power of aTrack.

      (3) The software has a very strict limit on the number of data points per trajectory, which is a user input. Shorter trajectories are discarded, while longer trajectories are cut off to the set length. It is not explained why this is necessary, and I feel it deletes a lot of useful data without clear benefit (in experimental conditions).

    2. Reviewer #1 (Public Review):

      Summary:

      Weiss and co-authors presented a versatile probabilistic tool. aTrack helps classify tracking behaviors and estimate important parameters for different types of single-particle motion: Brownian, confined, or directed. The tool can be used further to analyze populations of tracks and the number of motion states. This is a stand-alone software package, making it user-friendly for a broad group of researchers.

      Strengths:

      This manuscript presents a novel method for trajectory analysis.

      Weaknesses:

      (1) In the results section, is there any reason to choose the specific range of track length for determining the type of motion? The starting value is fine, and would be short enough, but do the authors have anything to report about how much is too long for the model?

      (2) Robustness to model mismatches is a very important section that the authors have uplifted diligently. Understanding where and how the model is limited is important. For example, the authors mentioned the limitation of trajectory length; do the authors have any information on the trajectory length range at which this method works accurately? This would be of interest to readers who would like to apply this method to their own data.

      (3) aTrack extracts certain parameters from the trajectories to determine the motion types. However, it is not very clear how certain parameters are calculated. For example, is the diffusion coefficient D calculated from fitting, and how is the confinement factor defined and estimated, with equations? This information will help the readers to understand the principles of this algorithm.

      (4) The authors mentioned the scenario where a particle may experience several types of motion simultaneously. How are these motions simulated and what do they mean in terms of motion types? Are they mixed motion (a particle switches motion types in the same trajectory) or do they simply present features of several motion types? It is not intuitive to the readers that a particle can be diffusive (Brownian) and directed at the same time.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data.

      Strengths:

      A potential advantage of the presented method is its wide applicability to different motion types.

      Weaknesses:

      (1) There has been a lot of similar work in this field. Even though the authors included many relevant citations in the introduction, it is still not clear what this work uniquely offers. Is it the first time that direct MLE of the time-series data was developed? Suggestions to improve would include (a) better wording in the introduction section, (b) comparing to other popular methods (based on MSD, step-size statistics (Spot-On, eLife 2018;7:e33125), for example) using the simulated dataset generated by the authors, (c) comparing to other methods using data set in challenges/competitions (Nat. Comm (2021) 12:6253).

      (2) The Hypothesis testing method presented here has a number of issues: first, there is no definition of testing statistics. Usually, the testing statistics are defined given a specific (Type I and/or Type II) error rate. There is also no discussion of the specificity and sensitivity of the testing results (i.e. what's the probability of misidentification of a Brownian trajectory as directed? etc). Related, it is not clear what Figure 2e (and other similar plots) means, as the likelihood ratio is small throughout the parameter space. Also, for likelihood ratio tests, the authors need to discuss how model complexity affects the testing outcome (as more complex models tend to be more "likely" for the data) and also how the likelihood function is normalized (normalization is not an issue for MLE but critical for ratio tests).

      (3) Relating to the mathematical foundation (Figure 1b). The measured positions are drawn as direct arrows from the real position states: this implies instantaneous localization. In reality, there is motion blur which introduces a correlation of the measured locations. Motion blur is known to introduce bias in SPT analysis; how does it affect the method here?

      (4) The authors did not go through the interpretation of the figure. This may be a matter of style, but I find the figures ambiguous to interpret at times.

      (5) It is not clear to me how the classification of the 5 motion types was accomplished.

      (6) Figure 3. Caption: what is ((d_{est}-0.1)/0.1)? Also panel labeled as "d" should be "e".
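
      To make the questions in point (2) concrete, here is a minimal sketch of a likelihood ratio test between two nested models for one-dimensional displacements: Brownian motion (zero drift) as the null and directed motion (free drift) as the alternative. This is a simplified textbook construction, not the aTrack likelihood: it assumes independent Gaussian displacements and ignores localization error, motion blur, and time-varying hidden variables. The function name and the simulated inputs are hypothetical.

```python
import math

import numpy as np


def lrt_directed_vs_brownian(displacements):
    """Likelihood ratio test: drift = 0 (null) versus free drift (alternative).

    Both models treat displacements as i.i.d. Gaussian with unknown variance.
    Returns the statistic 2 * (logL_alt - logL_null) and its p-value under
    Wilks' theorem (chi-squared with 1 degree of freedom, since the
    alternative has one extra free parameter).
    """
    dx = np.asarray(displacements, dtype=float)
    n = dx.size
    var_null = float(np.mean(dx ** 2))  # variance MLE with the mean fixed at 0
    var_alt = float(np.var(dx))         # variance MLE with the mean fitted
    # With both variances at their MLE, the log-likelihood difference
    # reduces to (n / 2) * log(var_null / var_alt); clip rounding error at 0.
    stat = max(0.0, n * math.log(var_null / var_alt))
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    brownian = rng.normal(loc=0.0, scale=1.0, size=200)
    directed = rng.normal(loc=1.0, scale=1.0, size=200)
    print("Brownian track:", lrt_directed_vs_brownian(brownian))
    print("Directed track:", lrt_directed_vs_brownian(directed))
```

      In this construction the more complex model can never fit worse, so the statistic is always non-negative; the chi-squared reference distribution is what accounts for the extra parameter. Rejecting the null when the p-value falls below a chosen threshold fixes the Type I error rate (the probability of calling a Brownian track directed), and the Type II error rate can then be estimated by simulation.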

    1. eLife assessment

      Rademacher and colleagues examined the effect of a chemogenetic approach on the integrity of the dopamine system in mice with chronically stimulated dopamine neurons. These findings are important: 1) This approach led to an axon-first degeneration over a time course of 2-4 weeks; 2) The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on dopaminergic neuron selective vulnerability mechanisms. Overall, the strength of the evidence is solid, but the behavior experiments that do not include a CNO control provide incomplete support for the findings.

    2. Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

    3. Author response:

      Reviewer #1 (Public Review):

      [...] Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have planned several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We anticipate that these additions will significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      [...] Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these insightful comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      This is an important point. Although we show that CNO does not produce degeneration of DA neuron terminals, we do not exclude a contribution to the behavioral changes. We agree that this behavioral control is necessary, and will address it in revision with a CNO-only running wheel cohort.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and will complete these experiments in revision.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We will explicitly clarify which mice had access to a running wheel in our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps sterically prevented mice from having access to a running wheel in their home cage.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not report an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than the VTA, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature showing that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel.

      In addition, we are not aware of prior studies that have chronically activated DREADDs to produce neurodegeneration. Other studies have shown that acute excitotoxic stressors can produce neuronal degeneration, but the chronic increase in activity is central to our approach.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      As discussed in greater detail in the results section below, our data suggest this may not be a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and will expand on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking and the existing data are difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). Beyond what is already discussed in the manuscript, additional support for increased activity in PD models includes:

      - Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      - Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      - Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We will include and further discuss these important examples in our revision.

      Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity. There will be additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models the possibility of chronically increased pacemaking, and interpretation of our results will be informed as we learn more about how the activity of DA neurons changes in humans in PD. We will discuss and elaborate on these important points in the revision.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether increased activity can also be an initial driver of dopamine neuron loss. We will clarify this point in our revision. In addition, the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al., Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al., Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al., J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we also don’t know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al., Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. While we briefly discuss calcium and neurodegeneration in the discussion, we will expand on this literature in both the introduction and discussion sections. We will carefully review and contextualize our work within existing frameworks of calcium and neurodegeneration (e.g. Surmeier & Schumacker, J Biol Chem 2013, PMID: 23086948; Verma et al., Transl Neurodegener 2022, PMID: 35078537). We believe that the novelty of our study lies in 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Supplemental Figure 1C, which was unchanged in CNO-treated animals compared to controls. We did not report the resting membrane potential because many of the DA neurons were spontaneously firing. However, in the revision we will report the initial membrane potential on first breaking into the cell for the whole-cell recordings, which did not vary between groups. This is still influenced by action potential activity, but is the timepoint in the recording least impacted by dialysis of the neuron by the internal solution. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D); thus, at least under these conditions, these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole-cell recordings (e.g. Figure S1C). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S1E). This finding is also consistent with increased activity of the DA neurons. We will add discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this insightful comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Nonetheless, we believe that it conveys useful complementary data. As suggested, we will discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. A "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. As part of the planned experiments for our revision, we will perform an additional stereologic analysis to further assess the loss of SNc dopamine neurons. We will also review and ensure the images are representative.
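
      As an illustrative aside on why group sizes of 4 or 5 limit what such post-hoc tests can detect, the sketch below estimates the power of a two-group comparison using a normal approximation. It is not the authors' analysis: the function name `approx_power`, the standardized effect size of 1.0, and the reduction of the design to a simple two-sample test are all assumptions chosen for illustration, and the approximation is somewhat optimistic compared with an exact t-test at very small n.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison.

    Uses the normal approximation to the two-sample test, so it slightly
    overestimates power when n_per_group is very small.
    effect_size is Cohen's d (a hypothetical value, not taken from the study).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)          # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)   # expected test statistic
    # Probability of exceeding the critical value in either tail
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical scenario: a one-standard-deviation difference between groups
for n in (4, 5, 10, 20):
    print(f"n per group = {n:2d}: approximate power = {approx_power(n, 1.0):.2f}")
```

      Under these assumptions, power stays well below the conventional 0.8 target at n = 4 or 5 and rises as animals are added, which is the general pattern behind the reviewer's concern about stereological counts from small cohorts.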

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We will include this caveat in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic-corrected traces for the same recording. In panel E, the traces follow time in ascending order. We will also include frequency and amplitude data for these recordings.
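
      For readers unfamiliar with the distinction between raw and isosbestic-corrected traces, the following is a minimal sketch of one common form of the correction: the isosbestic (calcium-independent) channel is linearly fitted to the calcium-dependent channel, and the fit is used as the baseline for dF/F. The authors' actual pipeline is not described in this response, so the function name `isosbestic_correct`, the linear-fit approach, and the synthetic traces are assumptions for illustration only.

```python
import numpy as np


def isosbestic_correct(signal, isosbestic):
    """Return dF/F using a linear fit of the isosbestic channel as baseline.

    signal:     calcium-dependent fluorescence trace
    isosbestic: calcium-independent reference trace recorded simultaneously
    """
    signal = np.asarray(signal, dtype=float)
    isosbestic = np.asarray(isosbestic, dtype=float)
    # Least-squares fit: signal ~ slope * isosbestic + intercept
    slope, intercept = np.polyfit(isosbestic, signal, 1)
    fitted = slope * isosbestic + intercept
    # Shared artifacts (bleaching, motion) are captured by the fitted baseline
    return (signal - fitted) / fitted


# Synthetic demonstration (hypothetical values, not data from the study)
t = np.linspace(0, 60, 600)                        # 60 s of samples
bleaching = 1.0 + 0.2 * np.exp(-t / 30)            # slow artifact shared by both channels
calcium = 0.05 * np.sin(2 * np.pi * 0.5 * t)       # calcium-dependent fluctuation
reference = 2.0 * bleaching
recorded = 3.0 * bleaching * (1 + calcium)
dff = isosbestic_correct(recorded, reference)
```

      Because this correction expresses the signal relative to a fitted baseline, it emphasizes relative fluctuations, which is why comparisons of absolute baseline fluorescence between animals rest on the raw traces and carry the caveat the reviewer raises.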

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      We will review the expression of activity-related genes in our dataset, although we must keep in mind that these genes may behave differently in the context of chronic activation as opposed to acutely increased activity. We will also include experiments assessing striatal dopamine levels by HPLC in the revision.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing to early PD samples when there is more limited SNc DA neuron loss. Please note the numbers of DA neurons within the areas we have selected for sampling (Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration and patients in whom degeneration is ongoing during their disease.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV).

      Control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by increasing intracellular calcium to increase neuronal excitability, and our results show increased Ca2+ by fiber photometry and changes to Ca2+-related genes, strongly suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point, as we acknowledged in the text. Additionally, we have planned revision experiments involving chronic isradipine treatment to further test the role of calcium in the mechanism of degeneration in this model.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we can sample SN DA neurons in early PD (see figure above), and in our view there is great value for such comparisons. We agree that discussion of appropriate caveats is warranted and this will be clearly addressed in the revision.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis; therefore, we will include additional electrophysiology experiments and add discussion of this important consideration.

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150) while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We will amend our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature, which has focused on shorter timescales. Thus, while we will revise the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the finding that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that, as with all preclinical models of PD, we cannot draw definitive conclusions about PD from these data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

    4. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

    5. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons. In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript. The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. A "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis. Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing)? This needs to be discussed.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results.

      The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

    1. Reviewer #1 (Public Review):

      Summary:

      Trutti and colleagues used 7T fMRI to identify brain regions involved in subprocesses of updating the content of working memory. Contrary to past theoretical and empirical claims that the striatum serves a gating function when new information is to be entered into working memory, the relevant contrast during a reference-back task did not reveal significant subcortical activation. Instead, the experiment provided support for the role of subcortical (and cortical) regions in other subprocesses.

      Strengths:

      The use of high-field imaging optimized for subcortical regions in conjunction with the theory-driven experimental design mapped well to the focus on a hypothetical striatal gating mechanism.

      Consideration of multiple subprocesses and the transparent way of identifying these, summarized in a table, will make it easy for future studies to replicate and extend the present experiment.

      Weaknesses:

      The reference-back paradigm seems to only require holding a single letter in working memory (X or O; Figure 1). It remains unclear how such low demand on working memory influences associated fMRI updating responses. It is also not clear whether reference-switch trials with 'same' response truly tax working-memory updating (and gate opening), as the working-memory content/representation does not need to be updated in this case. These potential design issues, together with the rather low number of experimental trials, raise concerns about the demonstrated absence of evidence for striatal gate opening.

      The authors provide a motivation for their multi-step approach to fMRI analyses. Still, the three subsections of fMRI results (3.2.1; 3.2.2; 3.3.3) for 4 subprocesses each (gate opening, gate closing, substitution, updating mode) made the Results section complex, and it was not always easy to understand why some but not other approaches revealed significant effects (such as the midbrain in gate opening).

      The many references to the role of dopamine are interesting, but the discussion of dopaminergic pathways and signals remains speculative and must be confirmed in future studies (e.g., with PET imaging).

    2. Reviewer #2 (Public Review):

      Summary:

      The study reported by Trutti et al. uses high-field fMRI to test the hypothesized involvement of subcortical structures, particularly the striatum, in WM updating. Specifically, participants were scanned while performing the Reference Back task (e.g., Rac-Lubashevsky and Kessler, 2016), which tests constructs like working memory gate opening and closing and substitution. While striatal activation was involved in substitution, it was not observed in gate opening. This observation is cited as a challenge to cortico-striatal models of WM gating, like PBWM (Frank and O'Reilly, 2005).

      Strengths:

      While there have been prior fMRI studies of the reference back task (Nir-Cohen et al., 2020), the present study overcomes limitations in prior work, particularly with regard to subcortical structures, by applying high-field imaging with a more precise definition of ROIs. And, the fMRI methods are careful and rigorous, overall. Thus, the empirical observations here are useful and will be of interest to specialists interested in working memory gating or the reference back task specifically.

      Weaknesses:

      I am less persuaded by the more provocative points regarding the challenge it presents to models like PBWM, made in several places by the paper. As detailed below, issues with conceptual clarity of the main constructs and their connection to models, like PBWM, along with some incomplete aspects of the results, make this stronger conclusion less compelling.

      (1) The relationship of the Nir-Cohen et al. (2020) task analysis of the reference back task, with its contrasts like gate opening and closing, and the predictions of PBWM is far from clear to me for several reasons.

      First, contrasts like gate opening and gate closing make strong finite state assumptions. As far as I know, this is not an assumption of PBWM, certainly not for gate opening. At a minimum, PBWM is default closed because of the tonic inhibition of cortico-thalamic dynamics by the globus pallidus. Indeed, this was even noted in the discussion of this paper, which seems to acknowledge this discrepancy, but then goes on to conclude that they have challenged the PBWM model anyway.

      Second, as far as I know, PBWM emphasizes go/no-go processes around constructs of input- and output-gating, rather than state shifts between gate opening and closing. While this relationship is less clear in reference back, substituting task-relevant items into working memory does appear to be an example of input gating, as modeled by PBWM. Thus, it is not clear to me why the substitution contrast would not be more of a test of input gating than the gate opening contrast, which requires assumptions that are not clearly required by the model, as noted above.

      Third, PBWM relies on striatal mechanisms to solve the problem of selective gating, inputting, or outputting items in memory while also holding on to others. Selective gating contrasts with global gating, in which everything in memory is gated or nothing. The reference back task is a test of global gating. It is an important distinction because non-striatal mechanisms that can solve global gating cannot solve selective gating. Indeed, this limitation of non-striatal mechanisms was the rationale for PBWM adding striatum. The connectivity of the striatum with the cortex permits this selectivity. It is not clear that the reference back task tests these selective demands in the first place. That limitation in this task was the rationale behind the recent Rac-Lubashevsky and Frank (2022) paper using the reference back 2 procedure that modifies the original reference back for selective gating.

      So, if the primary contribution of the paper is to test PBWM, as suggested by the first line of the abstract, then it is not clear that the reference back task in general, or the gate opening contrast in particular, is the best test of these predictions. Other contrasts (substitution), or indeed, tasks (reference back 2) would have been better suited.

      (2) In general, observations of univariate activity in the striatum have been notoriously variable in the context of WM. Indeed, Chatham et al. (2014) who tested working memory output gating - notably in a direct test of the predictions of PBWM - noted this variability. They too did not observe univariate activation in the striatum associated with selective output gating. Rather they found evidence of increased connectivity between the striatum and cortex during selective output gating. They argued that one account of this difference is that striatal gating dynamics emerge from the balance between the firing of both Go and NoGo cell populations that decide whether to gate or not. It is not always clear how this balance should relate to univariate activation in the striatum. Thus, the present study might also test cortico-striatal connectivity, rather than relying exclusively on univariate activation, in their test of striatal involvement in these WM constructs.

      (3) It is concerning that there was no behavioral cost for comparison switch vs. repeat trials. This differs from prior observations from the reference back (e.g., Nir-Cohen et al., 2020), and in general, is odd given the task switch/cue interpretation component. This failure to observe a basic behavioral effect raises a concern about how participants approached this task and how that might differ from prior reports of the reference back. If they were taking an unusual strategy, it further complicates the interpretation of these results and the implications they hold for theory.

      In summary, the present observations are useful, particularly for those interested in the reference back task. For example, they might call into question verbal theories and task analyses of the reference back task that tie constructs like gate-opening to striatal mechanisms. However, given the ambiguities noted above, the broader implications for models like PBWM, or indeed, other models of working memory gating, are less clear.

    1. eLife assessment

      This important work addresses the relationship between the transdiagnostic compulsivity dimension and confidence as well as confidence-related behaviours like reminder setting. The relationship between confidence and compulsive disorders has recently received a lot of attention and has been considered to be a key cognitive change. The authors paired an elegant experimental design and pre-registration to give convincing evidence of the relationship between compulsivity, reminder setting, and confidence. Future work should clarify the link of their findings with prediction error-related processes to test whether they could be causally related to their results, and further clarify some of the implications for their findings and refine hypotheses about confidence-related cognitive changes with compulsivity and OCD.

    2. Reviewer #1 (Public Review):

      Summary:

      Boldt et al. test several possible relationships between transdiagnostically-defined compulsivity and cognitive offloading in a large online sample. To do so, they develop a new and useful cognitive task to jointly estimate biases in confidence and reminder-setting. In doing so, they find that over-confidence is related to less utilization of reminder-setting, which partially mediates the negative relationship between compulsivity and reminder-setting. The paper thus establishes that, contrary to the over-use of checking behaviors in patients with OCD, greater levels of transdiagnostically-defined compulsivity predict less deployment of cognitive offloading. The authors offer speculative reasons as to why (perhaps it's perfectionism in less clinically-severe presentations that lowers the cost of expending memory resources), and set an agenda to understand the divergence in cognition between clinical and nonclinical samples. Because only a partial mediation had robust evidence, multiple effects may be at play, whereby compulsivity impacts cognitive offloading via overconfidence and also by other causal pathways.

      Strengths:

      The study develops an easy-to-implement task to jointly measure confidence and replicates several major findings on confidence and cognitive-offloading. The study uses a useful measure of cognitive offloading - the tendency to set reminders to augment accuracy in the presence of experimentally manipulated costs. Moreover, the study utilizes multiple measures of presumed biases - overall tendency to set reminders, the empirically estimated indifference point at which people engage reminders, and a bias measure that compares optimal indifference points to engage reminders relative to the empirically-observed indifference points. That the study observes convergence along all these measures strengthens the inferences made relating compulsivity to the under-use of reminder-setting. Lastly, the study does find evidence for one of several a priori hypotheses and sets a compelling agenda to try to explain why such a finding diverges from an ostensible opposing finding in clinical OCD samples and the over-use of cognitive offloading.

      Weaknesses:

      Although I think this design and study are very helpful for the field, I felt that a feature of the design might reduce the task's sensitivity to measuring dispositional tendencies to engage cognitive offloading. In particular, the design introduces prediction errors that could induce learning and interfere with natural tendencies to deploy reminder-setting behavior. These PEs comprise whether a given selected strategy will or will not be allowed to be engaged. We know individuals with compulsivity can learn even when instructed not to learn (e.g., Sharp, Dolan, and Eldar, 2021, Psychological Medicine), and that more generally, they have trouble with structure knowledge (e.g., Seow et al.; Fradkin et al.), and thus might be sensitive to these PEs. Thus, a dispositional tendency to set reminders might be differentially impacted for those with compulsivity after an NPE, where they want to set a reminder, but aren't allowed to. After such an NPE, they may be more inclined to avoid setting reminders. Those with compulsivity likely have superstitious beliefs about how checking behaviors lead to a resolution of catastrophes, which might in part originate from inferring structure in the presence of noise or from purely irrelevant sources of information for a given decision problem.

      It would be good to know if such learning effects exist, if they're modulated by PE (you can imagine PEs are higher if you are more incentivized - e.g., 9 points as opposed to only 3 points - to use reminders, and you are told you cannot use them), and if this learning effect confounds the relationship between compulsivity and reminder-setting.

      As a more subtle point, I think this study is better described as an exploration than as a deductive test of a particular model -> hypothesis -> experiment. Typically, when we test a hypothesis, we contrast it with competing models. Here, the tests were two-sided because multiple models, with mutually exclusive predictions (over-use or under-use of reminders), were tested. Moreover, it's unclear exactly how to make sense of what is called the direct mechanism, which is supported by partial (as opposed to complete) mediation.

    3. Reviewer #2 (Public Review):

      Summary:

      Boldt et al. investigated whether previously established relationships between transdiagnostic psychiatric symptom dimensions and confidence distortions would result in downstream influences on the confidence-related behaviour of reminder setting. 600 individuals from the general population completed a battery of psychiatric symptom questionnaires and an online reminder-setting task. In line with previous studies, individuals high in compulsivity (CIT) showed over-confidence in their task performance, whereas individuals high in anxious depression (AD) tended to be under-confident. Crucially, the over-confidence associated with CIT partially mediated a decreased tendency to use external reminders during task performance, whereas the under-confidence associated with AD did not result in any alteration in the external reminder setting. The authors suggest that metacognitive monitoring is impaired in CIT which has a knock-on effect on reminder setting behaviour, but that a direct link also exists between CIT and reduced reminder setting independently of confidence.

      Strengths:

      The study combines the latest advances in transdiagnostic approaches to psychopathology with a cleverly designed external reminder-setting task. The approach allows for investigation of what some of the downstream consequences associated with impaired metacognition in sub-clinical psychopathology may be.

      The experimental design and hypotheses were pre-registered prior to data collection.

      The manuscript is well written and rigorous analysis approaches are used throughout.

      Weaknesses:

      Participants only performed a single task so it remains unclear if the observed effects would generalise to reminder-setting in other cognitive domains.

      The sample consisted of participants recruited from the general population. Future studies should investigate whether the effects observed extend to individuals with the highest levels of symptoms (including clinical samples).

    1. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors designed an EEG experiment to investigate how listeners use temporal structure to optimise sensory detection. Listeners heard 2 seconds of noise and had to detect a faint tone in one of 3 temporal locations (equally spaced in time). In a minority of trials, no tone was presented. Focussing on these 'no tone' trials, the authors show that the EEG 'temporally tracks' the expected tone locations. This temporal tracking behaviour is also shown in a recurrent neural network trained on the same task. The authors interpret these findings as evidence of neural gain control in the service of sequential temporal anticipation.

      Strengths:

      The study uses an elegant experimental design and sophisticated EEG analyses. It is striking how clear the neural signatures are (of sequential expectation in the absence of sensory input). A further strength is the use of neural network modelling to elucidate the possible neural computations.

      Weaknesses:

      My first major comment concerns the theoretical implications of the study. An account based on gain control and temporal anticipation seems highly plausible. But are there other plausible accounts that the current data argue against? Or are there specific versions of gain control / temporal anticipation theories that the data supports and others that the data doesn't support? To develop the manuscript, I think the authors could relate their results in a more specific way to existing accounts, outlining not only what accounts their results favor but also which accounts their data falsify. In doing so I think the study will have a stronger influence on shaping the field.

      My second major comment concerns the consistent lag that is observed between tone location and neural/model responses. This would seem to be inconsistent with an anticipation account, which would instead predict zero or a negative lag. This should be discussed. While I agree the decrease in response magnitude that occurs with tone location is inconsistent with expectation violation, the positive lag that is observed seems more consistent with expectation violation than temporal anticipation/gain control.

      My third major comment is a suggestion to present some further analyses that I think will be informative. First is reporting more extensively the ERP results. This currently appears in one of the panels but there are no statistical tests reported in the main text and only the tone present data is shown. Given that expectation violation has been observed most consistently with ERPs, is there evidence of this in the 'no tone' trials and if so, does it correlate over participants with the power modulation effect or rate of false alarms? Doing this analysis will possibly be informative for assessing the plausibility of different functional accounts of the data e.g. expectation violation/prediction error. My second suggestion is to report the tone present trial data. When the tone is for example presented in the first location, does the response during tone locations 2 and 3 get suppressed? And does the same occur in the neural network model? If so, this would speak to a highly dynamic form of gain control (if the gain control account is correct).

    2. Reviewer #3 (Public Review):

      Summary:

      The study designs an EEG experiment to study how the brain better detects targets by exploiting information about when the target may appear. The study finds that the power fluctuations of alpha and beta oscillations can indicate the time intervals in which the target may appear. Furthermore, an RNN trained on the same task can also exploit such temporal information to better detect targets at the expected time intervals.

      Strengths:

      (1) The design of the experiment is elegant.

      (2) The EEG analysis approach is highly advanced.

      (3) The study combines human EEG experiments and computational modeling to address potential computational neural mechanisms.

      Weaknesses:

      The RNN is used both for modeling, which is commendable, and for simulating new psychophysics experiments, which can be problematic. In other words, it is very dangerous to predict human performance in a novel condition using an RNN and assume that prediction is the same as the actual human performance. Comparing the RNN performance in two different noise conditions cannot directly "suggest that the 2 Hz neural modulation observed in Corrected Cluster 234 served to enhance sensory sensitivity to the target tone at the anticipated temporal locations, while selectively suppressing sensory noise during irrelevant noise periods." Here, much stronger evidence would come from actually running the behavioral tests in two noise conditions in humans, but even that behavioral experiment cannot directly indicate the function of a neural response. In other words, the conclusion "additional analyses and perturbations on the RNNs indicated that the neural power modulations in the alpha-beta band resulted from selective suppression of irrelevant noise periods and heightened sensitivity to anticipated temporal locations" is not supported. The model does not have alpha or beta oscillations at all, which is OK, but directly concluding the function of alpha/beta oscillations based on the behavior of a model that does not have these oscillations is not appropriate.

      Relatedly, better detection of a target may reflect a change either in sensory processing or in decision-making, while the second possibility seems to be ignored.

      The results section has a lot of discussions, which should be moved to the discussion section.

    3. eLife assessment

      This valuable study provides insights into how the brain learns to better detect a target by predicting when the target may appear. Overall, solid evidence is provided that the power fluctuations of alpha- and beta-band oscillations can reflect the predicted occurrence time of the target, but some conclusions, especially ones related to the neural-network model and temporal gain control account, need further consideration. The study highlights an advanced EEG analysis approach as well as a close combination of human EEG analysis and computational modeling using recurrent neural networks.

    4. Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigated how the brain anticipates sequences of potential sensory events, using temporal predictability to enhance perception. To do so, they combined a tone detection task, electrophysiological recordings, and recurrent neural network models. The stimuli consisted of continuous white noise embedded with either a single tone presented at one of 3 equidistant (500ms) temporal locations, or no tone. The main analyses were carried out on no-tone trials, in which subjects only anticipated future events. First, a modulation power spectrum analysis revealed 4 frequency clusters, and a coupling analysis allowed the authors to group 3 of them together into cluster 234. The time course of the latter aligned with the temporal locations, reaching a local maximum following each of them. The power of cluster 234 during no-tone trials was positively correlated with behavioral performance (d') during tone trials, but not with false alarm rate. Then, the authors trained several continuous-time recurrent neural networks to model the experimental paradigm. After the networks were tuned to reflect the average d' of human subjects, a neural network analogue of EEG was extracted from the activity of neurons. The latter displayed a peak at 2Hz, its time course aligned with the temporal locations, reaching a local maximum both before and after each of them, and its d' score was higher for tones located at one of the temporal locations. A network trained with randomly occurring tones displayed no 2Hz activity and d' independent from tone location. Finally, the authors perturbed the excitatory/inhibitory ratio of neurons within the network, finding that more inhibition resulted in earlier peaks in the neural network activity.
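
      For readers less familiar with the sensitivity index d' referred to above, a minimal sketch of the standard signal-detection formula, d' = z(hit rate) - z(false-alarm rate), is given below. It is not taken from the manuscript: the function name, the example counts, and the log-linear correction (adding 0.5 to each count) are assumptions made for illustration.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (0.5 added to each count) keeps both rates
    strictly between 0 and 1, where the inverse normal CDF is finite.
    The correction is an illustrative assumption, not the authors' choice.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: a listener at chance and a listener who detects
# the tone on most tone trials while rarely reporting it on no-tone trials.
chance = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
sensitive = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
```

      Because d' combines hits and false alarms, a correlation with d' but not with false alarm rate, as summarized above, does not by itself separate a sensory change from a change in decision criterion.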

      Strengths:

      (1) The experimental paradigm introduced in this study is original and well-built, allowing for the study of the targeted phenomenon. The fact that relevant neural signals were found despite the absence of sensory cues proves the setup is promising, opening the way for future works that vary different parameters: number of tones, time between tones, complexity of the sequence of temporal locations, sequence of events, etc.

      (2) The statistical analysis was exhaustive: the authors consistently introduced controls for different conditions and alternative hypotheses, thoroughly explaining each step of the analysis as well as the choices behind them. The supplementary figures further helped understand the data and answer questions one might have. This comprehensive approach was well-appreciated.

      (3) The use of more biologically plausible networks, compared to traditional RNNs, to model the response of subjects is a promising approach, which can give clues as to the mechanism at play, but also make predictions that can then be proven (or disproven) by future experiments.

      The authors provided a work of good technical quality and reported their methods and findings transparently, making for good reproducibility and evaluation.

      Weaknesses:

      (1) The most glaring weakness of the paper lies in its interpretation of the different results. Conclusions are scattered around the paper, mostly unclear, and do not always make much sense with regard to the data. For example, the authors never address the absence of a peak before the first temporal location: why would subjects not "suppress" noise before the first temporal location given its (strong) predictability? Moreover, they immediately assume a functional role for the neural signature they found, as well as a direct link between the mechanisms at play in their RNN and the human brain, thus jumping to hasty and unreliable conclusions. The authors seemed to have a strong bias towards a hypothesis (predictive gain control) and tried to fit their data into it.

      - The authors cited very few relevant papers on related fields, notably on omission, and therefore did not build efficiently on previous works (e.g., Yabe, Raij, Schröger, Bekinschtein, Chait, Auksztulewicz...). Moreover, at several points in the paper, they make choices about their analysis or model without proper justification or cited sources, even when explicitly pointing to the existence of research supporting said choices.

      - Only a single electrode (out of 64) was used (Cz) to carry out every analysis. Without proper justification, this choice could be misinterpreted. Moreover, adopting instead a multivariate approach (incorporating all channels) would give more strength to the paper.

      - Overall, the observed electrophysiological results could be more simply explained by a mechanism akin to a go/no-go (a tone/no-tone) or omission response happening after each temporal location, as subjects have learned when to make that inference. The delay of the response with regards to temporal location would change due to error accumulation in time perception, rather than "the anticipation of the first temporal location facilitating the anticipation of the second", which makes little sense. Moreover, a response in Cz could be expected.

      - As for the results of the RNN, not only is the analogy with actual neurophysiological activity limited, both in principle (simple E/I dynamics) and in implementation (inference is only done at the end of each trial), but the authors do not address the activity before the first temporal location, which is a major difference with human data. Their assumption that both RNN and cluster 234 are functionally related to gain control is thus further flawed. Moreover, the analysis of the RNN is lacking: for example, the authors did not compare false positives/negatives across different delays, or analyze Wout.

      - The phrasing and introduction of the paper are misleading, as confusion can arise between predicting a sequence of events (several events in a row) and predicting a single event appearing at different potential locations. It should be clarified that the paper does not address sequences of events at any point.

      It seems the authors already drew their conclusion beforehand and fit the data to match this bias. As such, the interpretation of the data is messy, flawed, and often hasty, drawing erroneous conclusions and parallels.

      Overall, the manuscript is of good technical quality and communicated results very transparently, but the authors seem to have a strong confirmation bias towards temporal anticipation and gain control, thus leading to flawed interpretations.

    1. eLife assessment

      This study presents useful, yet preliminary findings on the transcriptomic changes in cardiac lymphatic cells after myocardial infarction in mice. The conclusions of the authors remain uncertain as sample sizes for lymphatic endothelial cells are very low. The single-cell transcriptomic data were analyzed using solid advanced methodology and may be used as a starting point for future studies of the impact of lymphatic cells on heart disease.

    2. Reviewer #1 (Public Review):

      Summary:

      Assessment of cardiac LEC transcriptomes post-MI may yield new targets to improve lymphatic function. scRNAseq is a valid approach as cardiac LECs are rare compared to blood vessel endothelial cells.

      Strengths:

      Extensive bioinformatics approaches employed by the group.

      Weaknesses:

      Too few cells are included in scRNAseq data set and the spatial transcriptomics data that was exploited has little relevance, or rather specificity, for cardiac lymphatics. This study seems more like a collection of preliminary transcriptomic data than a conclusive scientific report to help advance the field.

    3. Reviewer #2 (Public Review):

      Summary:

      This study integrated single-cell sequencing and spatial transcriptome data from mouse heart tissue at different time points post-MI. They identified four transcriptionally distinct subtypes of lymphatic endothelial cells and localized them in space. They observed that LECs subgroups are localized in different zones of infarcted heart with functions. Specifically, they demonstrated that LEC ca III may be involved in directly regulating myocardial injuries in the infarcted zone concerning metabolic stress, while LEC ca II may be related to the rapid immune inflammatory responses of the border zone in the early stage of MI. LEC ca I and LEC collection mainly participate in regulating myocardial tissue edema resolution in the middle and late stages post-MI. Finally, cell trajectory and Cell-Chat analyses further identified that LECs may regulate myocardial edema through Aqp1, and likely affect macrophage infiltration through the galectin9-CD44 pathway. The authors concluded that their study revealed the dynamic transcriptional heterogeneity distribution of LECs in different regions of the infarcted heart and that LECs formed different functional subgroups that may exert different bioeffects in myocardial tissue post-MI.

      Strengths:

      The study addresses a significant clinical challenge, and the results are of great translational value. All experiments were carefully performed, and their data support the conclusion.

      Weaknesses:

      (1) Language expression must be improved. Many incomplete sentences exist throughout the manuscript. A few examples: Lines 70-71: In order to further elucidate the effects and regulatory mechanisms of the lymphatic vessels in the repair process of myocardial injury following MI. Lines 71-73: This study, integrated single-cell sequencing and spatial transcriptome data from mouse heart tissue at different time points after MI from publicly available data (E-MTAB-7895, GSE214611) in the ArrayExpress and gene expression omnibus (GEO) databases. Line 88-89: Since the membrane protein LYVE1 can present lymphatic vessel morphology more clearly than PROX1.

      (2) The type of animal model (i.e., permanent MI or MI plus reperfusion) included in the ArrayExpress and Gene Expression Omnibus (GEO) databases must be clearly defined, as these two models may have completely different effects on lymphatic vessel development during post-MI remodeling.

      (3) Lines 119-120: Caution must be taken regarding Cav1 as a lymphatic marker because Cav1 is expressed in all endothelial cells, not only in LECs.

      (4) Figure 1 legend needs to be improved. RZ, BZ, and IZ need to be labeled in all IF images. Day 0 images suggest that RZ is the tissue section from the right ventricle. Was RZ for all other time points sampled from the right ventricular tissue section?

      (5) The discussion section needs to be improved and better focused on the findings from the current study.

    4. Reviewer #3 (Public Review):

      Summary:

      It has been demonstrated that cardiac lymphatics are essential for cardiac health and function. Moreover, post-myocardial infarction, targeting lymphatics by stimulating lymphangiogenesis has been shown to improve cardiac inflammation, fibrosis, and function. Then, the aim of this study was to evaluate the transcriptomic changes of cardiac lymphatic endothelial cells (LECs) after a myocardial infarction, which could reveal new therapeutic targets targeting lymphatic function. Moreover, investigating the cell-cell communication between lymphatic and immune cells would give critical information for a better understanding of the disease.

      Strengths:

      The use of scRNAseq data to evaluate LECs is an effective strategy considering the small proportion of LECs compared to blood endothelial cells. The authors also apply extensive bioinformatic analysis to three different data sets.

      Weaknesses:

      Among a total of 44,860 cells, only 242 LECs and 5,688 endothelial cells were identified. This small number of LECs is not representative and is insufficient to reliably distinguish four different clusters. The bioinformatic analysis is not supported by significant results in their in vivo and in vitro experiments.

    1. eLife assessment

      This study provides a valuable contribution to the development of small molecules that inhibit the aggregation of tau, a protein involved in several neurodegenerative diseases. The authors present convincing evidence that analogs of the plant alkaloid tryptanthrin can prevent the formation of larger aggregates by targeting the early stages of tau oligomerization. Nevertheless, further studies are needed to elucidate the precise mechanisms of action and to provide a detailed kinetic analysis. This work will be of interest to biochemists and biophysicists focused on designing small molecules to inhibit fibril formation.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper presents a class of small molecule inhibitors of tau aggregation which was discovered through a computational screen. Analogs were generated and tested for their ability to inhibit fibril formation.

      Strengths:

      A few of the analogs were found to have sub-stoichiometric activity. A comparison of unseeded and seeded aggregation kinetics suggests that these compounds preferentially target early-stage aggregation.

      Weaknesses:

      The authors state their interest is in finding compounds that target monomeric states of tau, but their only detection method is late-stage fibril formation. In this respect, they have not really defined a mechanism of action. They state their plan to use hydrogen-exchange mass spectrometry, but there are other techniques, such as single-molecule FRET and measurement of intramolecular reconfiguration. Additionally, there is information that can be gleaned from detailed kinetic modeling of the ThT kinetics to include monomer dynamics, formation of oligomers, and secondary nucleation of fibrils.

    3. Reviewer #2 (Public Review):

      Summary:

      James et al, in this study, build on their previous work investigating tau as a drug target. The authors identify tryptanthrin (TA) and its analogs as powerful inhibitors of tau4RD aggregation, even at low concentrations (nanomolar range). Interestingly, these analogs specifically target the initial stages of aggregation, where tau self-association first begins. This targeted approach effectively explains why such small amounts of tryptanthrin analogs are sufficient for inhibition. The study further shows that slight modifications to the structure of these molecules can significantly impact their effectiveness.

      Strengths:

      The experiments are well-designed and executed. The reviewer, in particular, appreciates the authors for the simple yet intelligent study design to understand the mechanism of aggregation inhibition by TA analogs.

      Weaknesses:

      Certain areas in the manuscript need clarifications, revisions, or additional supporting studies to strengthen the outcomes. For example, the authors mostly apply a single approach to assess tau aggregation or aggregation inhibition. Using additional techniques as suggested below will be helpful.

    1. eLife assessment

      This paper presents a valuable pipeline based on state-of-the-art analytical software that was used to study genetic pleiotropy between neuropsychiatric disorders. The presented evidence supporting the claims is convincing and now includes an appropriate comparison to previously published methods as well as a detailed exploration of the findings. The created pipeline can thus be used by researchers from diverse fields to study different combinations of diseases and traits.

    2. Reviewer #1 (Public Review):

      The authors investigate pleiotropy in the genetic loci previously associated to a range of neuropsychiatric disorders: Alzheimer's disease, amyotrophic lateral sclerosis (ALS), frontotemporal dementia, Parkinson's disease, and schizophrenia. The local statistical fine-mapping and variant colocalisation approaches they use have the potential to uncover not only shared loci but also shared causal variants between these disorders. There is existing literature describing the pleiotropy between ALS and these other disorders but here the authors apply state-of-the-art, local genetic correlation approaches to further refine any relationships.

      Complex disease and GWAS is not my area of expertise but the authors managed to present their methods and results in a clear, easy-to-follow manner. Their results statistically support several correlations between the disorders and, for ALS and AD, a shared variant in the vicinity of the lead SNP from the original ALS GWAS. Such findings could have important implications for our understanding of the mechanisms of such disorders and eventually the possibility of managing and treating them.

      The authors have built a useful pipeline that plugs together all the gold-standard, existing software to perform this analysis and made it openly available which is commendable. However, there is little discussion of what software is available to perform global and local correlation analysis and, if there are multiple tools available, why they consider the ones they selected to be the gold-standard.

      There is some mention of previous findings of genetic pleiotropy between ALS and these other disorders in the introduction, and discussion of their improved ALS-AD evidence relative to previous work. However, detailed comparisons of their other correlations to what was described before for the same pairs of disorders (if any) is missing. Adding this would strengthen the impact of this paper.

      Finally, being new to this approach I found the abstract a little confusing. Initially, the shared causal variant between ALS and AD is mentioned but immediately in the following sentence they describe how their study "suggested that disease-implicated variants in these loci often differ between traits". After reading the whole paper I understood that the ALS-AD shared variant was the exception but it may be best to restructure this part of the abstract. Additionally, in the abstract the authors state that different variants "suggests the role of distinct mechanisms across diseases despite shared loci". Is it not possible that different variants in the same regulatory region or protein-coding parts of a gene could be having the same effect and mechanism? Or does the methodology to establish that different variants are involved automatically mean that the variants are too distant for this to be possible?

      These concerns were addressed in the revised version of this manuscript.

    3. Reviewer #2 (Public Review):

      Summary:

      Spargo and colleagues present an analysis of the shared genetic architectures of schizophrenia and several late-onset neurological disorders. In contrast to many polygenic traits for which global genetic correlation estimates are substantial, global genetic correlation estimates for neurological conditions are relatively small, likely for several reasons. One is that assortative mating, which will spuriously inflate genetic correlation estimates, is likely to be less salient for late-onset conditions. Another, which the authors explore in the current manuscript, is that some loci affecting two or more conditions (i.e., pleiotropic loci) may have effects in opposite directions, or shared loci are sparse, such that the global genetic correlation signal washes out.

      The authors apply a local genetic correlation approach that assesses the presence and direction of pleiotropy in much smaller spatial windows across the genome. Then, within regions evidencing local genetic correlations for a given trait pair, they apply fine-mapping and colocalization methods to attempt to differentiate between two scenarios: that the two traits share the same causal variant in the region or that distinct loci within the region influence the traits. Interestingly, the authors only discover one instance of the former: an SNP in the HLA region appearing to confer risk for both AD and ALS. This is in contrast to six regions with distinct causal loci, and twenty regions with no clear shared loci.

      Finally, the authors have published their analysis pipeline such that other researchers might easily apply the same techniques to other collections of traits.

      Strengths:

      - All such analysis pipelines involve many decision points where there is often no clear correct option. Nonetheless, the authors clearly present their reasoning behind each such decision.

      - The authors have published their analytic pipeline such that future researchers might easily replicate and extend their findings.

      Weaknesses:

      - The majority of regions display no clear candidate causal variants for the traits, whether shared or distinct. Further, despite the potential of local genetic correlation analysis to identify regions with effects in opposing directions, all of the regions in which causal variants were identified for both traits evidenced positive correlations. The reasons for this aren't clear and the authors would do well to explore this in greater detail.

      - The authors very briefly discuss how their findings differ from previous analyses because of their strict inclusion for "high-quality" variants. This might be the case, but the authors do not attempt to demonstrate this via simulation or otherwise, making it difficult to evaluate their explanation.

      These concerns were addressed in the revised version of this manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The authors investigate pleiotropy in the genetic loci previously associated to a range of neuropsychiatric disorders: Alzheimer's disease, amyotrophic lateral sclerosis (ALS), frontotemporal dementia, Parkinson's disease, and schizophrenia. The local statistical fine-mapping and variant colocalisation approaches they use have the potential to uncover not only shared loci but also shared causal variants between these disorders. There is existing literature describing the pleiotropy between ALS and these other disorders but here the authors apply state of the art, local genetic correlation approaches to further refine any relationships. 

      Complex disease and GWAS is not my area of expertise but the authors managed to present their methods and results in a clear, easy to follow manner. Their results statistically support several correlations between the disorders and, for ALS and AD, a shared variant in the vicinity of the lead SNP from the original ALS GWAS. Such findings could have important implications for our understanding of the mechanisms of such disorders and eventually the possibility of managing and treating them. 

      The authors have built a useful pipeline that plugs together all the gold-standard, existing software to perform this analysis and made it openly available which is commendable. However, there is little discussion of what software is available to perform global and local correlation analysis and, if there are multiple tools available, why they consider the ones they selected to be the gold-standard. 

      There is some mention of previous findings of genetic pleiotropy between ALS and these other disorders in the introduction, and discussion of their improved ALS-AD evidence relative to previous work. However, detailed comparisons of their other correlations to what was described before for the same pairs of disorders (if any) is missing. Adding this would strengthen the impact of this paper. 

      Finally, being new to this approach I found the abstract a little confusing. Initially, the shared causal variant between ALS and AD is mentioned but immediately in the following sentence they describe how their study "suggested that disease-implicated variants in these loci often differ between traits". After reading the whole paper I understood that the ALS-AD shared variant was the exception but it may be best to restructure this part of the abstract. Additionally, in the abstract the authors state that different variants "suggests the role of distinct mechanisms across diseases despite shared loci". Is it not possible that different variants in the same regulatory region or protein-coding parts of a gene could be having the same effect and mechanism? Or does the methodology to establish that different variants are involved automatically mean that the variants are too distant for this to be possible?

      We thank reviewer one for their considered review of this manuscript and for highlighting points that would benefit from further exploration. Itemised responses are provided below.

      (1) The reviewer noted that we did not adequately explain our choice of software for global and local genetic correlation analysis, and why we consider the techniques chosen as gold standard. We agree that the paper would benefit from clarification around this aspect of the study.

      Briefly, we firstly selected LAVA for the local genetic correlation analysis because it offers several advantages above competing software and was developed by a reputable team previously known for developing MAGMA, which is well-established in the statistical genetics field. In the manuscript (page 8), we added the following clarification: “LAVA was the most appropriate local genetic correlation approach for this study for several reasons. First, unlike SUPERGNOVA and rho-HESS, LAVA makes specific accommodations for analysis of binary traits. Second, other tools focus on bivariate correlation between traits whilst LAVA offers this alongside multivariate tests such as multiple regression and partial correlation, enabling rigorous testing of pleiotropic effects. Lastly, LAVA is shown to provide results which are less biased than those from other tools.”

      LDSC was selected for the global genetic correlation analysis because the software is well-established and likely the most widely adopted global genetic correlation tool. Reflecting its prevalence, the software is also compatible with LAVA, which adjusts for sample overlap based on the bivariate intercept estimate returned by LDSC. Since global genetic correlations were not the primary focus of this study, having been tested across several previous investigations (see response 2), we did not prioritise comparison of correlation estimates from LDSC against other available software. In the manuscript (pages 7-8) we now include the following statement: “[LDSC] was also applied to derive ‘global’ (i.e., genome-wide) genetic correlation estimates between trait pairs and estimate sample overlap from the bivariate intercept. The latter of these outputs was taken forward as an input for the local genetic correlation analysis using LAVA (see 2.2.2.2). Since global genetic correlation analysis across the traits studied here is not novel and associations reported in past studies are congruent across different tools, the compatibility between LDSC and LAVA motivated our use of LDSC for this analysis”.

      (2) The second comment was that the paper would be strengthened by contextualising our study with detail around what is previously known about associations between the studied traits. Accordingly, we have added clarifying text at the end of the introduction, stating: “although previous studies have performed global genetic correlation analyses between various combinations of these traits {references}, this is the first to compare them at a genome-wide scale using a local genetic correlation approach”. In the discussion, we link back to these studies, stating that “Through genetic correlation analysis, we replicated genome-wide correlations previously described between the studied traits {references}”.

      (3) The reviewer highlighted that the abstract as originally written may mislead or confuse the reader and we agree that clarity could be improved with some restructuring. This has now been revised and should read more logically.

      (4) They also enquired about our reasons for suggesting that the implication of distinct variants for each trait from a colocalisation analysis suggests a distinct causal mechanism. We thank them for this question as it encouraged us to reconsider how best to present the results of this analysis. To answer their question:

      It is certainly true that nearby but distinct variants can confer the same effect. In a scenario where multiple distinct variants result in the same effect and thus increase susceptibility towards two or more related phenotypes, you would expect to find evidence of association to each relevant variant in GWAS across these related traits (even if the magnitude of the associations differ). Where biological mechanisms are shared, post-GWAS fine-mapping analysis would be expected to yield credible sets overlapping across the traits, and likewise, colocalisation analysis should converge on a set of credible SNPs that are candidates for the shared effect. Where multiple distinct variants confer the same effect, you would expect to see separate fine-mapping credible sets for these distinct variants that colocalise pairwise between the jointly-affected traits. Generally, therefore, evidence supporting the two distinct variants hypothesis would suggest the role of two distinct mechanisms except when certain credible sets identified through fine-mapping converge on a colocalised effect.

      There is a further caveat which we also explored in response to Reviewer two: if a region includes long-spanning LD (and hence a larger number of variants are considered in the analysis), then the colocalisation analysis is more likely to favour the two distinct variants hypothesis since the probability of the variants implicated in both traits being shared decreases. It is likely that support for the two independent variants hypothesis is correct in most of the comparisons from this study that favour this conclusion. This is because, generally, the fine-mapping credible sets do not overlap across trait pairs (Figure S4) and consequently the colocalisation analysis does not find any support for the shared variant hypothesis. An exception is the analysis of PD and schizophrenia at the MAPT locus on chromosome 17. We have accordingly added the following clarification to the manuscript (page 18): “However, the colocalisation analysis will increasingly favour the two independent variants hypothesis as the number of analysed variants increases. Hence, the wide-spanning LD of this region may have obstructed identification of variants and mechanisms shared between the traits.”

      Reviewer #2 (Public Review): 

      Summary: 

      Spargo and colleagues present an analysis of the shared genetic architectures of schizophrenia and several late-onset neurological disorders. In contrast to many polygenic traits for which global genetic correlation estimates are substantial, global genetic correlation estimates for neurological conditions are relatively small, likely for several reasons. One is that assortative mating, which will spuriously inflate genetic correlation estimates, is likely to be less salient for late-onset conditions. Another, which the authors explore in the current manuscript, is that some loci affecting two or more conditions (i.e., pleiotropic loci) may have effects in opposite directions, or shared loci are sparse, such that the global genetic correlation signal washes out.

      The authors apply a local genetic correlation approach that assesses the presence and direction of pleiotropy in much smaller spatial windows across the genome. Then, within regions evidencing local genetic correlations for a given trait pair, they apply fine-mapping and colocalization methods to attempt to differentiate between two scenarios: that the two traits share the same causal variant in the region or that distinct loci within the region influence the traits. Interestingly, the authors only discover one instance of the former: an SNP in the HLA region appearing to confer risk for both AD and ALS. This is in contrast to six regions with distinct causal loci, and twenty regions with no clear shared loci. 

      Finally, the authors have published their analysis pipeline such that other researchers might easily apply the same techniques to other collections of traits. 

      Strengths: 

      - All such analysis pipelines involve many decision points where there is often no clear correct option. Nonetheless, the authors clearly present their reasoning behind each such decision.

      - The authors have published their analytic pipeline such that future researchers might easily replicate and extend their findings.

      Weaknesses:

      - The majority of regions display no clear candidate causal variants for the traits, whether shared or distinct. Further, despite the potential of local genetic correlation analysis to identify regions with effects in opposing directions, all of the regions in which causal variants were identified for both traits evidenced positive correlations. The reasons for this aren't clear and the authors would do well to explore this in greater detail.

      - The authors very briefly discuss how their findings differ from previous analyses because of their strict inclusion for "high-quality" variants. This might be the case, but the authors do not attempt to demonstrate this via simulation or otherwise, making it difficult to evaluate their explanation. 

      We thank Reviewer two for their appraisal of this manuscript and kind comments regarding its strengths. We will now aim to address the identified weaknesses.

      (1) The reviewer comments that we did not adequately investigate why loci with causal variants identified in both traits all had positive local genetic correlations. We agree that it would be helpful to better understand the underlying reasons. To address this issue, we have added a new supplementary figure to compare the positive and negative local genetic correlation results (see Figure S2). In the main text we add the following clarification: “Although both positive and negative local genetic correlations passed the FDR-adjusted significance threshold, we observed only positive local genetic correlations in loci where fine-mapping credible sets were identified for both traits in the pair. This reflects that the correlation coefficients and variant associations from the analysed GWAS studies were generally stronger in the positively correlated loci (see Figure S2).”

      (2) The reviewer rightly suggests that the manuscript would benefit from an improved explanation of the somewhat inconsistent results for the colocalisation analysis of ALS and AD at the locus around the rs9275477 SNP from this work and a previous study.  We have now further investigated this and believe that the discrepancy results partly from an inherent empirical characteristic of the colocalisation analysis. We have explained this in the manuscript (page 22) as follows: “The previous study analysed a 200Kb window of over 2,000 SNPs around the lead genome-wide significant SNP from the ALS GWAS, rs9275477, and found ~0.50 posterior probability for each of the shared and two independent variant(s) hypotheses. The current analysis used 475 SNPs occurring within a semi-independent LD block of ~50kb in this locus. Since the posterior probability of the two independent variants hypothesis (H3) increases exponentially with the number of variants in the region whilst the shared variant hypothesis (H4) scales linearly, it is expected that our analysis would give stronger support for the latter. Given that the previous study defined regions for analysis based on an arbitrary window of ±100kb around each lead genome-wide significant SNP from the ALS GWAS and we defined each analysis region based on patterns of LD in European ancestry populations, it is reasonable to favour the current finding.”
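
      The dependence on region size described in this response can be illustrated with a minimal sketch. It assumes the commonly used per-SNP prior formulation for colocalisation, in which the prior weight of the two independent variants hypothesis (H3) is proportional to n(n-1)·p1·p2 and that of the shared variant hypothesis (H4) is proportional to n·p12, where n is the number of SNPs in the region. The values p1 = p2 = 1e-4 and p12 = 1e-5 are assumed defaults and are not taken from the manuscript; the function names are hypothetical. The sketch covers prior weights only, not posterior probabilities, which also depend on the association data.

```python
# Illustrative sketch (not the authors' pipeline): how prior weight shifts
# between the "two independent variants" (H3) and "shared variant" (H4)
# hypotheses as the number of SNPs in the analysed region changes.
# p1, p2 and p12 are assumed per-SNP priors, not values from the manuscript.


def hypothesis_prior_weights(n_snps, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return unnormalised prior weights for H1..H4 in a region of n_snps SNPs."""
    return {
        "H1": n_snps * p1,                      # causal variant for trait 1 only
        "H2": n_snps * p2,                      # causal variant for trait 2 only
        "H3": n_snps * (n_snps - 1) * p1 * p2,  # two distinct causal variants
        "H4": n_snps * p12,                     # one shared causal variant
    }


def h3_to_h4_prior_ratio(n_snps, **priors):
    """Prior odds of H3 relative to H4 for a region of n_snps SNPs."""
    weights = hypothesis_prior_weights(n_snps, **priors)
    return weights["H3"] / weights["H4"]


if __name__ == "__main__":
    # 475 SNPs is the region size quoted for the current analysis; 2000 stands
    # in for the "over 2,000 SNPs" quoted for the previous study.
    for n in (475, 2000):
        print(n, round(h3_to_h4_prior_ratio(n), 3))
```

      Under these assumed priors, the prior odds of H3 relative to H4 are roughly 0.47 for the 475-SNP region and roughly 2.0 for a 2,000-SNP window, so the larger window starts from a prior that leans towards two independent variants before any data are considered.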

    1. eLife assessment

      This fundamental work quantifies the stochastic dynamics of neural population activity in the lateral intraparietal area (LIP) of the macaque monkey brain during single perceptual decisions. These single-trial dynamics have been subject to intense debate in neuroscience, and they have significant implications for modelling decision-making in various fields including neuroscience and psychology. Through a combination of state-of-the-art recordings from many LIP neurons and theory-driven data analyses, the authors provide convincing evidence for the notion that single-trial neural population dynamics in LIP encode the decision variable postulated by the drift-diffusion model of decision-making.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, Steinemann et al. characterized the nature of stochastic signals underlying the trial-averaged responses observed in the lateral intraparietal cortex (LIP) of non-human primates (NHPs) while they performed the widely used random dot direction discrimination task. Ramp-up dynamics in the trial-averaged LIP responses have been reported in numerous papers before, but the temporal dynamics of these signals at the single-trial level have been subject to debate. Using large-scale neuronal recordings with Neuropixels in NHPs allows the authors to settle this debate rather compellingly. They show that drift-diffusion-like computations account well for the observed dynamics in LIP.

      Strengths:

      This work uses innovative technical approaches (Neuropixel recordings in behaving macaque monkeys). The authors tackle a vexing question that requires measurements of simultaneous neuronal population activity and hence leverage this advanced recording technique in a convincing way.

      They use different population decoding strategies to help interpret the results.

      They also compare decoders that rely on a data-driven approach, using dimensionality reduction of the full neural population space, with decoders that rely on more traditional ways of categorizing neurons based on hypotheses about their function. Intriguingly, although the functionally identified neurons are a modest fraction of the population, decoders that only rely on this fraction achieve comparable decoding performance to those relying on the full population. Moreover, decoding weights for the full population did not allow the authors to reliably identify the functionally identified subpopulation.

      The revision addressed the minor weaknesses to our satisfaction.

    3. Reviewer #2 (Public Review):

      Steinemann, Stine, and their co-authors studied the noisy accumulation of sensory evidence during perceptual decision-making using Neuropixels recordings in awake, behaving monkeys. Previous work has largely focused on describing the neural underpinnings through which sensory evidence accumulates to inform decisions, a process which on average resembles the systematic drift of a scalar decision variable toward an evidence threshold. The additional order of magnitude in recording throughput permitted by the methodology adopted in this work offers two opportunities to extend this understanding. First, larger-scale recordings allow for the study of relationships between the population activity state and behavior without averaging across trials. The authors' observation here of covariation between the trial-to-trial fluctuations of activity and behavior (choice, reaction time) constitutes interesting new evidence for the claim that neural populations in LIP encode the behaviorally-relevant internal decision variable. Second, using Neuropixels allows the authors to sample LIP neurons with more diverse response properties (e.g. spatial RF location, motion direction selectivity), making the important question of how decision-related computations are structured in LIP amenable to study. For these reasons, the dataset collected in this study is unique and potentially quite valuable. This revised manuscript addresses a number of questions regarding analyses which were unclear in the original manuscript, and as a result the study is a strong contribution toward our understanding of neural mechanisms of decision making.

    1. eLife assessment

      This study presents high-quality experiments and data analysis of C. elegans locomotion for spontaneous exploration as well as in the presence of an aversive stimulus. This important work shows that the activation of distinct turn types enhances escape performance as well as exploration. The strength of the evidence is still incomplete, particularly regarding optimal exploration and the identification of the range of the aversive stimulus at the boundary of the arena. The work will be of interest to a broad audience extending from movement ecology, to the biology of Caenorhabditis elegans.

    2. Reviewer #1 (Public Review):

      This is an interesting and thorough paper describing the modes of locomotion of the nematode C. elegans in the context of random exploration or response to an aversive stimulus. The authors collect extensive statistics on various locomotor states and compare findings to a minimal mathematical model inspired by the data. Their data reveal biases in two modes of turning, gradual and sharp, which define the path structure of the animal moving on an agar plate. The authors also find that animals tend to overcome inherent anatomical/physiological biases to locomotion when escaping aversive stimuli.

      Understanding animal navigation is a window for revealing efficient algorithms for exploration of space, and also allows testing of the extent to which we understand how the nervous system produces specific behaviors. This paper adds important analysis towards these goals. I have a couple of comments that may be worth considering:

      (1) The authors place a circular barrier of SDS near the edges of their plates and assume that this aversive stimulus is only sensed when the animal is near the barrier. However, it is possible that the SDS diffuses enough into the interior of the plate to affect the navigation statistics. In this case, the data they have accumulated may in fact be some sort of combination of exploratory locomotion and a general background SDS aversive stimulus. Can the authors control for this? Perhaps test the plates at different distances and times for SDS diffusion? Or replace the barrier with a physical one and not a chemical one?

      (2) The authors do not look at mutants or perturb the physiology in defined ways relevant to the locomotion being studied to test their model. Specifically, it would be of interest to identify neural circuits that govern some of the parameters in the model. Although the authors bring this up in their Discussion section, it seems appropriate for this paper, as it would considerably bolster the impact of the work.

    3. Reviewer #2 (Public Review):

      Summary:

      Turning behavior plays a crucial role in animal exploration and escape responses, regardless of the presence or absence of environmental cues. These turns can be broadly categorized into two categories: strong reorientations, characterized by sudden changes in path directionality, and smooth turns, which involve gradual changes in the direction of motion, leading to sinuosity and looping patterns. One of the key model animals to study these behaviors is the nematode Caenorhabditis elegans, in which the role of strong reorientations has been thoroughly studied. Despite their impact on trajectories, smooth turns have received less attention and remain poorly understood. This study addresses this gap in the literature, by studying the interplay between smooth turns and strong reorientations in nematodes moving in a uniform environment, surrounded by an aversive barrier. The authors use this set-up to study both exploration behavior (when the worm is far from the aversive barrier) and avoidance behavior (when the worm senses the aversive barrier). The main claims of the paper are that (1) during exploratory behavior, the parameters governing strong reorientations are optimized to compensate for the effect of smooth turns, increasing exploration efficiency, and (2) during avoidance, strong reorientations are biased towards the side that maximizes escape success. To support these two claims, the paper presents a detailed quantitative characterization of the statistics of smooth turns and strong reorientations. These results offer insights that may interest a diverse audience, including those in movement ecology, animal search behavior, and the study of Caenorhabditis elegans. In our opinion, the experimental work and data analysis are of the highest quality, resulting in a very clean characterization of C. elegans' turning behavior. 
      However, the experimental design and data analyses presented are not fully aligned with some of the central conclusions drawn, and in particular, we believe that further work is needed to fully support the claim that strong reorientations are optimized to increase exploration efficiency.

      Strengths:

      The authors have addressed important questions in movement ecology through hypothesis-driven experiments. The choice of C. elegans as a model organism to investigate the impact of turning dynamics on escape and exploration is well-justified by its limited repertoire of strong reorientation behaviors and consistent turning bias across strains and individuals. The quality of the experimental data is very high, using state-of-the-art techniques, and a set-up where a robust and reproducible avoidance response can be studied. The data analysis benefits from state-of-the-art techniques and a deep understanding of C. elegans' behavior, resulting in a very clean and very clear set of results. We particularly appreciated the use of a ventral/dorsal reference system (rather than a left/right one), which is more natural and insightful. As a result, the paper presents one of the best characterizations of C. elegans sharp turning behavior published to date. We find that the claim that strong reorientations are chosen in a way that optimizes avoidance behavior is solid and well-supported. The manuscript is well-written and maintains a coherent line of reasoning throughout.

      Weaknesses:

      Our primary concerns revolve around the significance and rigor of the research on exploratory behavior. First, we believe that the experimental arena was too small for accurately observing the unfolding of exploration. The movement of assayed animals was clearly impaired by boundary effects, which obscured key elements of C. elegans exploratory behavior such as the mean square displacement or large-scale trajectory structures emerging from curvature bias. Second, we think that the proof that strong reorientations are optimized to maximize exploration performance is too indirect: it relies on a particular model with some unrealistic assumptions and lacks a quantification of the gains provided by the optimization to the individuals. We believe that a more thorough and direct analysis would be needed to fully support the claim.

    1. eLife assessment

      The work provides valuable genomic resources to address the endocrine control of a life cycle transition in the Malabar grouper fish. The revised manuscript is more solid and the resources and experimental data help to build up a meaningful biological understanding of thyroid signaling in grouper fish.

    2. Reviewer #1 (Public Review):

      Summary and strength:

      The authors undertook to assemble and annotate the genome sequence of the Malabar grouper fish, with the aim to provide molecular resources for fundamental and applied research. Even though this is more mainstream, the task is still daunting and labor intensive. Currently, high quality and fully annotated genome sequences are of strategic importance in modern biology. The authors make use of the resource to address the endocrine control of an ecologically and developmentally relevant life cycle transition, metamorphosis. As opposed to amphibian and flat fish where body plan changes, fish metamorphosis is anatomically more subtle and much less known, although it is clear that thyroid hormone (TH) signaling is a key player. The authors thus provide a repertoire of TH-relevant gene expression changes during development and across post-embryonic transitions and correlate developmental stages with changes of gene expression. Overall, this work represents a significant advance in the field.

      Fish 'metamorphosis' is not well known because it is not as spectacular as that of amphibians. This work clearly provides technical and theoretical resources to address in a more systematic manner the molecular changes occurring during development and post-embryonic transitions. Heterochrony is a major source of functional and life cycle diversity in fish, which blurs our anatomy-based understanding of fish biology, and has a direct impact on the protocols and rearing procedures used to produce live stocks. This work illustrates how, by using genomics coupled to simple experimental endocrinology, one directly addresses these challenges.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Responses to recommendations

      Reviewer #1 (Recommendations For The Authors):

      Describe more precisely how gene expression graphs are built (tissues, read counts). For example, how were read counts normalized? Were they from DESeq2 data, which only works by comparing two samples? If so, all samples should be independently compared to a reference and the normalized expression value of the reference will change from sample to sample... thus introducing a pure technical artifact.

      We have added additional information about the normalisation method to the Material and Methods section (Lines 597-598: “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.”) and figure legends (lines 247, 286, 372, 404: “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.”) to address this recommendation.

      DESeq2 provides a reference-independent normalisation through a median of ratios method (a good explanation can be found here: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html). The normalised expression values are independent of any reference, and therefore will not change from sample to sample as suggested in this comment. In contrast, the pairwise comparisons are done when analysing significantly differentially expressed genes between two treatments using a Wald test, which is done against a reference and generates log2 fold change information and p-values; however, this is different to the normalisation we described above.
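      As an illustration only, the median of ratios idea mentioned above can be sketched in a few lines. This is a simplified stand-in written in plain Python, not the DESeq2 implementation and not code from the study; the function names (`size_factors`, `normalise`) and the toy counts are hypothetical. Each gene's geometric mean across samples serves as a pseudo-reference, and each sample's size factor is the median of its per-gene ratios to that pseudo-reference, so no real sample is singled out as the reference.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero count in any sample are skipped: their geometric
    # mean is zero and the log ratio is undefined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Toy data (hypothetical): the second sample is the first one sequenced
# at twice the depth; the last gene has a zero and is ignored when
# estimating size factors, but is still normalised afterwards.
toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(toy)
normalised = normalise(toy)
print(factors)
print(normalised)
```

      In this toy case the two size factors differ by a factor of two, and after division the first three genes have equal normalised values in both samples, which is the sense in which the normalised counts do not depend on a chosen reference sample.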

      Provide bioinformatics workflows and, if possible, the set of parameters used, the computing resources, etc. Were some assembly finishing steps carried out (by long-range PCR?) and experimental validations (especially for allele-specific transcripts, by conventional RT-PCR based on diagnostic mutations)?

      We have added additional information on the bioinformatics workflows where required, including parameters used (Lines 530, 536, 549-551, and 574-583.). No finishing steps other than HiC scaffolding were performed. No allele-specific analysis was done as part of this manuscript.

      To further improve transparency, we have also uploaded all the scripts used for this study to https://github.com/R-Huerlimann/Malabar_grouper_genome and the gene models and functional annotation to https://figshare.com/projects/Malabar_grouper_Epinephelus_malabaricus_genome_annotation/199909. This information has been added to the manuscript in lines 600-601 and 609-611.

      Reviewer #3 (Recommendations For The Authors):

      General author response:

      All the recommendations of this reviewer are very relevant and would certainly provide a lot of information, but they would constitute a full project in themselves, as they would imply establishing this grouper species as an experimental model in our lab. Currently we only have access to the larval and juvenile stages via a collaboration with the Okinawa Prefectural Sea Farming Center, which is an hour's drive from our lab, and this access is limited to the grouper spawning season. To do all that is suggested, we would need regular and easy access to the fish. This would require establishing this model in our marine station, which is not possible due to space and time issues. These groupers grow to a very large size (1-2 m in length, and up to 150 kg in weight) and only mature into males after > 6 years.

      First and foremost, I would advise the authors to extend their TH and cortisol levels measurements to the entire developmental time considered in their analysis.

      For the reasons stated above we could not perform these experiments. We must emphasize that the data regarding TH are available for a closely related species (e.g., Epinephelus coioides, de Jesus et al. 1998) and there is no reason to think that the situation will be drastically different in E. malabaricus. In addition, given that we have now studied several coral reef fish species in the same context (clownfish, surgeonfish, damselfish, gobies) we observed that the transcriptomic data are more robust, more sensitive, and more precise than hormone measurements. 

      Consider carrying out in situ hybridisation of TSH with putative CRH receptors to determine if thyrotrophin could be competent to respond to HPA axis signals.

      We agree that studying the interplay between corticoids and thyroid hormones at the neuroendocrine level would be desirable and we fully agree with the experiment suggested by the reviewer, but this is impossible in our current situation. We are not working with an established animal model like zebrafish or Xenopus, but with a large, long-lived marine fish that reproduces in spawning aggregations and whose husbandry is notoriously difficult.

      Consider conducting cortisol treatment experiments to functionally determine if indeed cortisol is involved in grouper metamorphosis.

      We tried to do TH and cortisol treatments specifically on the early larval stages corresponding to the early TH peak to see how this would impact the development of the fin spines, but our trials were unsuccessful. The larvae at that stage are extremely fragile and even putting them into small volumes of treatment drugs induced massive mortalities. Again, this would mean establishing this grouper species as a model organism and would require a massive effort to improve larval rearing as discussed above. We feel that our data stands on its own in the meantime and adds valuable information to the existing literature by studying a rarely investigated species.

      Responses to comments

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should read instead "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We made the suggested change of adding “sequence” in lines 32 and 121. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without any further information. 

      As for panel E of figure 2, it is not missing. The panel is located to the right, just below “Target Cells”.

      The shortcomings of the manuscript are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But all along the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue?), or whether it is a cumulative expression level computed across several tissues (and how it was computed) etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced. The individual tissues derived from juvenile fish were used for the genome annotation only, using ISOseq. The whole larval fish were used for the developmental analysis using RNAseq, as well as the genome annotation. We have added additional information in the figures and text that the results shown are from whole larvae, and added more detail to the material and methods section about which type of sample was analysed in which way.

      Specifically, we have added “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.” to lines 597-598 in the Material and Methods section, “Gene expression data was generated from whole larvae.” to line 191, and “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.” to the figure legends (lines 247, 286, 372, 404). Additionally, we have added clarifications in lines 489, 497, 530, and 536.

      The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and in-depth description of workflows and parameter settings is highly recommended. This can be made available through data store services and documents even benefit from DOIs. This provides others with more information to evaluate the resolution of this work. No doubt that it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We have added additional information on parameters used in the genome assembly, annotation and transcriptome analysis in lines 549-551, 577, 579, 580, and 582. Additionally, we have uploaded all scripts to github as outlined in the Code and Data Availability section (lines 599-614).

      The genome assembly did not use a specific workflow (e.g., nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow, of which a detailed description provided by Phase Genomics can be found in the supplementary material.

      Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      The T3/T4 levels are consistent with other published work in fish. In the present manuscript for grouper we have a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and comparable level of T3 to what was found in convict tang (Holzer et al. 2017; Figure 2) with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.

      The differences could be due to different structure of fish tissues and therefore different hormone extraction efficiency, different hormone measurement protocols, different fish physiology, or different fish size (e.g., the weighing of tiny grouper larvae is difficult and less precise than in convict tang). What is important is not the absolute level but the relative level, which shows the change across larval stages of a species measured with identical extraction and measurement protocols. This means our data are internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      The large number of differentially expressed genes is due to the fact that this is coming from a larval developmental transcriptome going from one day old larva to fully metamorphosed juveniles at around day 60. 

      While DESeq2 indeed works on the assumption that most genes are not differentially expressed, this affects normalisation but not hypothesis testing (Wald test, LRT or ANOVA). Moreover, normalisation in DESeq2 is fairly robust to violations of this assumption. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio for normalisation, and as long as the number of up- and down-regulated genes is relatively even, DESeq2 will be able to handle the data. As part of our general quality control for this project we consulted the MA plots, which do not show any overrepresented up or down expression patterns. Additionally, see Michael Love's comment on comparing different tissues, which is also applicable here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/):

      “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
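      The quoted point can be made concrete with a toy sketch. The numbers below are hypothetical and are not data from the study, and the helper `relative_size_factor` is a two-sample simplification of the median ratio idea, not DESeq2 itself. When many genes change but up- and down-regulation are balanced, the median ratio stays at the true depth ratio; when every gene shifts in the same direction, the median ratio absorbs the shift and the change becomes invisible after normalisation.

```python
import math
from statistics import median


def relative_size_factor(sample_a, sample_b):
    """Median of per-gene log ratios b/a, back-transformed.

    With only two samples this equals the ratio of their
    median-of-ratios size factors.
    """
    log_ratios = [
        math.log(b / a) for a, b in zip(sample_a, sample_b) if a > 0 and b > 0
    ]
    return math.exp(median(log_ratios))


# Both samples are assumed to be sequenced at the same depth.
baseline = [100.0] * 100

# Scenario 1: 60% of genes change, split evenly between 4-fold up and
# 4-fold down; the remaining 40% are unchanged.
balanced = [400.0] * 30 + [25.0] * 30 + [100.0] * 40

# Scenario 2: every gene doubles (a global shift in expression).
global_shift = [200.0] * 100

balanced_factor = relative_size_factor(baseline, balanced)
shift_factor = relative_size_factor(baseline, global_shift)
print(balanced_factor, shift_factor)
```

      In the balanced scenario the estimated factor is 1, matching the equal sequencing depth even though more than half of the genes differ. In the global-shift scenario the factor is 2 (up to floating-point error), so dividing by it would cancel the genuine doubling, which is the failure case described in the quote.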

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial considerations that are not proven by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments but despite that, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods: an early activation between D1 and D10, and a second one between D32 and juvenile stage. These data are interesting as they call for further examination of 1) the existence of an early larval developmental step also involving TH and corticosteroids and 2) the possible interaction of corticoids and TH during metamorphosis. This is a question that is certainly not settled yet in teleost fishes and which is of great interest.

      Point 1) is of particular interest and importance, since this early activation (unique to our knowledge in any teleost fish studied so far) raises a lot of new questions and will certainly be scrutinised by other groups in the years to come, therefore ensuring a good citation impact of this study. We hope that the reviewer, while disagreeing with some of our statements, will recognize that our study will be stimulating at that level and that this is what scientific studies should do.

      We acknowledge the descriptive nature of the data and the lack of functional experiments in the Discussion in lines 443 to 445: “This may suggest that in some aspect, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians, but functional experiments need to be conducted to confirm this hypothesis.” As stated above, doing such functional experiments would require establishing the grouper as an experimental model in our husbandry, which currently is not possible due to the large size of the adult fish.

      The consideration that cortisol is involved in metamorphosis in teleosts has never been shown, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in Senegalensis, the sole pre-otic CRH neuron number decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in flatfish metamorphosis (PMID: 25575457).  

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: indeed, “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones” and in our view this indeed corresponds to “being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis as the reviewer suggests, but simply that there is a possible involvement of cortisol together with TH in metamorphosis. We stand by this claim, as we indeed observed an activation of corticoid pathway genes around D32, which is sufficient to say it is involved. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study, as it would require setting up a full grouper life cycle in lab conditions, which is beyond the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue. We have clarified the Discussion on this point (lines 375-376, lines 439-464).

      We wrote that “There is contrasting evidence of communication between these two pathways during teleost fish larval development with some data suggesting a synergic and other an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitantly with an increase in TH levels has been observed in flatfish [26], golden sea bream [64] and silver sea bream [65]. Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (phenomenon occurring during flatfish metamorphosis) in flounder[27]. It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner [66]. On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp decreases cortisol levels[67], whereas cortisol exposure decreases TH levels in European eel [68]. Given this scattered evidence, the existence of a crosstalk active during teleost larval development and metamorphosis has never been formally demonstrated. The results we obtained in grouper are clearly indicating that HPI axis is activated during both early development and metamorphosis and that cortisol synthesis is activated during early development. This may suggest that in some aspect, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians [25], but functional experiments need to be conducted to confirm this hypothesis.” In the revised manuscript, we have also added the interesting case of the Senegal sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field. 

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that HPA axis genes are upregulated, which does not mean they are involved in regulating the HPT axis. The authors do not show that in thyrotrophs, any CRH receptor is expressed or in any other HPT axis-relevant cells and that changes in these genes correlate with changes in TSH expression. An in-situ hybridisation experiment showing co-expression on thyrotrophs of HPA genes and TSH could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see if this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression levels observed for many genes were very intriguing for us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on the morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even more risky on such tiny individuals just after hatching, and we encountered high mortality rates. We must add that because we cannot establish a full grouper life cycle under lab conditions, we have done these experiments in the context of a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments would be a full project in themselves, requiring a rearing system suitable for both larval survival and the economic constraints related to drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are restricted to a short period each year. So even if we strongly agree with the necessity of conducting such experiments, we think that this is not in the scope of the present paper, but something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raised the question of the role of TH so early in the larval development, outside of the metamorphosis period. 

      Regarding the respective levels of TSH and Tg, we first would like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree however that the strong increase of Tg and TPO expression is later than expected. Therefore, we have added the following sentence in lines 212 to 216: “The respective order of appearance of TSH and Tg (TSH at D32, Tg after) is consistent with what we would expect but a bit later than expected given the morphological transformation. It would be interesting to revisit this in a future series of experiments, with tighter temporal sampling to study how gene expression and morphological transformation align.”

      It is very difficult to conclude anything with the TH and cortisol levels measurements. The authors only measured up until D10, whereas they argue that metamorphosis occurs at D32. In this way, these measurements could be more helpful if they focus on the correct developmental time. The data is irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering that 1) TH levels have already been investigated in groupers coinciding with pigmentation changes and fin rays resorption (Figure 4 in de Jesus et al, 1998), 2) there is also evidence in numerous fish species that TH level increase is concomitant with increase of TH related genes, and 3) we observed in our data an increase in the expression of TH related genes as well as pigmentation changes and fin rays resorption. Based on our experience in fish metamorphosis and the literature we can say confidently that those observations indicate that metamorphosis is occurring between D32 and the juvenile stage. This clearly shows that our inference is correct. Additionally, we would like to reemphasize that from our experience in several fish species transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in the larval development (at D3), which is clearly outside of the metamorphosis period, we decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering if gene activation was accompanied by a hormonal increase, the measurements we made for TH and cortisol between D1 and D10 are relevant. In order to clarify our message further, we have changed some of the mentions of “metamorphosis” to “larval development” throughout the manuscript and added other improvements to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32 therefore, it is very hard for me to accept that this is the metamorphic stage. With the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down and made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of biology work against it. Moreover, in D10, the authors show the highest cortisol level and lowest T4 and T3 levels. These observations are irreconcilable, with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here. 

      (1) We clearly observed an increase of TSHb (that occurs between D18 and juvenile stage) and an increase of tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase of TRb). All this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis starts at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998) as they occur during the TH level increase, and they also happen to be under the control of TH in grouper (De Jesus et al 1998). Based on this study, but also on studies (conducted on many other teleost species) showing that the increase of TH levels is always associated with an activation of TH pathway genes and with morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and juvenile stage. We have improved the clarity of the manuscript in several places to make sure that our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in development (between D1 and D10, with a surge of trhrs, tg and tpo at D3). As this activation was very unexpected for us, we decided to focus the analysis of TH levels between D1 and D10, and very interestingly we observed a high level of T4 at D3, indicating that THs are instrumental very early in the larval development of the malabar grouper, which has never been shown before. We stated in lines 224-225 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. However, we agree that we could have explained more clearly that this early activation was very intriguing for us and that we wanted to investigate hormonal levels around that period. We never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction. We have added corresponding statements in the abstract (lines 39-43) and discussion (lines 447 to 449).

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of the corticoid pathway genes around metamorphosis (between D32 and juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test that. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. And we also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and juvenile stage. We have therefore made changes throughout the Introduction and Discussion to make this clearer.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages as they have already been measured during Epinephelus coioides metamorphosis, and the morphological changes observed in this species around the TH peak correspond to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). The main focus of this manuscript is the novel observation of an early activation period at D3, for which we needed TH levels to determine if they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during the oceanic larval dispersal. As we may have arrived at the explanation of this hypothesis too rapidly without setting up the context well enough, we have made changes to the introduction and discussion.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of the method of our paper and the quality of the results. We agree that there were several parts where our message was unclear. We have addressed these points in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We also made sure that our claims about TH/corticoid interaction during both periods remain hypothetical, as we cannot yet, despite trials, support them with functional experiments.

    1. eLife assessment

      This meta-analysis presents valuable findings that reexamine the function of butterfly eyespots in predator avoidance and report support for conspicuousness over mimicry. The analysis is robust, but the evidence supporting the importance of conspicuousness is incomplete due to the limitations of the literature, and this debate would benefit from additional experiments that would strengthen these claims. This paper is of interest to evolutionary biologists and ecologists working on the evolution of morphology and predator-prey interactions.

    2. Reviewer #1 (Public Review):

      Summary:

      The question of whether eyespots mimic eyes has certainly been around for a very long time and led to a good deal of debate and contention. This isn't purely an issue of how eyespots work either, but more widely an example of the potential pitfalls of adopting 'just-so-stories' in biology before conducting the appropriate experiments. Recent years have seen a range of studies testing eye mimicry, often purporting to find evidence for or against it, and not always entirely objectively. Thus, the current study is very welcome, rigorously analysing the findings across a suite of papers based on evidence/effect sizes in a meta-analysis.

      Strengths:

      The work is very well conducted, robust, objective, and makes a range of valuable contributions and conclusions, with an extensive use of literature for the research. I have no issues with the analysis undertaken, just some minor comments on the manuscript. The results and conclusions are compelling. It's probably fair to say that the topic needs more experiments to really reach firm conclusions but the authors do a good job of acknowledging this and highlighting where that future work would be best placed.

      Weaknesses:

      There are few weaknesses in this work, just some minor amendments to the text for clarity and information.

    3. Reviewer #2 (Public Review):

      Many prey animals have eyespot-like markings (called eyespots) which have been shown in experiments to hinder predation. However, why eyespots are effective against predation has been debated. The authors attempt to use a meta-analytical approach to address the issue of whether eye-mimicry or conspicuousness makes eyespots effective against predation. They state that their results support the importance of conspicuousness. However, I am not convinced by this.

      There have been many experimental studies that have weighed in on the debate. Experiments have included manipulating target eyespot properties to make them more or less conspicuous, or to make them more or less similar to eyes. Each study has used its own set of protocols. Experiments have been done indoors with a single predator species, and outdoors where, presumably, a large number of predator species predated upon targets. The targets (i.e, prey with eyespot-like markings) have varied from simple triangular paper pieces with circles printed on them to real lepidopteran wings. Some studies have suggested that conspicuousness is important and eye-mimicry is ineffective, while other studies have suggested that more eye-like targets are better protected. Therefore, there is no consensus across experiments on the eye-mimicry versus conspicuousness debate.

      The authors enter the picture with their meta-analysis. The manuscript is well-written and easy to follow. The meta-analysis appears well-carried out, statistically. Their results suggest that conspicuousness is effective, while eye-mimicry is not. I am not convinced that their meta-analysis provides strong enough evidence for this conclusion. The studies that are part of the meta-analysis are varied in terms of protocols, and no single protocol is necessarily better than another. Support for conspicuousness has come primarily from one research group (as acknowledged by the authors), based on a particular set of protocols.

      Furthermore, although conspicuousness is amenable to being quantified, for e.g., using contrast or size of stimuli, assessment of 'similarity to eyes' is inherently subjective. Therefore, manipulation of 'similarity to eyes' in some studies may have been subtle enough that there was no effect.

      There are a few experiments that have indeed supported eye-mimicry. The results from experiments so far suggest that both eye-mimicry and conspicuousness are effective, possibly depending on the predator(s). Importantly, conspicuousness can benefit from eye-mimicry, while eye-mimicry can benefit from conspicuousness.

      Therefore, I argue that generalizing based on a meta-analysis of a small number of studies that conspicuousness is more important than eye-mimicry is not justified. To summarize, I am not convinced that the current study rules out the importance of eye-mimicry in the evolution of eyespots, although I agree with the authors that conspicuousness is important.

    1. eLife assessment

      This important study utilizes humanized mice, in which human immune cells are introduced into immune-deficient mice, to provide solid evidence that two helper CD4 T-cell subsets, T-follicular helper (Tfh) and T-peripheral helper (Tph) cells, are able to drive both autoantibody production and induction of autoimmunity. The work will be of broad interest to medical scientists engaged in deciphering how human immune cells mediate immune responses and contribute to the development of autoimmune diseases.

    2. Reviewer #1 (Public Review):

      Summary:

      As our understanding of the immune system increases it becomes clear that murine models of immunity cannot always prove an accurate model system for human immunity. However, mechanistic studies in humans are necessarily limited. To bridge this gap many groups have worked on developing humanised mouse models in which human immune cells are introduced into mice allowing their fine manipulation. However, since human immune cells will attack murine tissues, it has proven complex to establish a human-like immune system in mice. To help address this, Vecchione et al have previously developed several models using human cell transfer into mice with or without human thymic fragments that allow negative selection of autoreactive cells. In this report they focus on the examination of the function of the B-helper CD4 T-cell subsets T-follicular helper (Tfh) and T-peripheral helper (Tph) cells. They demonstrate that these cells are able to drive both autoantibody production and can also induce B-cell independent autoimmunity.

      Strengths:

      A strength of this paper is that currently there is no well-established model for Tfh or Tph in HIS mice and that currently there is no clear murine Tph equivalent making new models for the study of this cell type of value. Equally, since many HIS mice struggle to maintain effective follicular structures Tfh models in HIS mice are not well established giving additional value to this model.

      Weaknesses:

      A weakness of the paper is that the models seem to lack a clear ability to generate germinal centres. For Tfh it is unclear how we can interpret their function without the structure where they have the greatest influence. In some cases, the definition of Tph does not seem to differentiate well between Tph and highly activated CD4 T-cells in general.

    3. Reviewer #2 (Public Review):

      Summary:

      Humanized mice, developed by transplanting human cells into immunodeficient NSG mice to recapitulate the human immune system, are utilized in basic life science research and preclinical trials of pharmaceuticals in fields such as oncology, immunology, and regenerative medicine. However, there are limitations to using humanized mice for mechanistic analysis as models of autoimmune diseases due to the unnatural T cell selection, antigen presentation/recognition process, and immune system disruption due to xenogeneic GVHD onset.

      In the present study, Vecchione et al. detailed the mechanisms of autoimmune disease-like pathologies observed in a humanized mouse (Human immune system; HIS mouse) model, demonstrating the importance of CD4+ Tfh and Tph cells for the disease onset. They clarified the conditions under which these T cells become reactive using techniques involving the human thymus engraftment and mouse thymectomy, showing their ability to trigger B cell responses, although this was not a major factor in the mouse pathology. These valuable findings provide an essential basis for interpreting past and future autoimmune disease research conducted using HIS mice.

      Strengths:

      (1) Experiments with mice transplanted with human thymus and HSCs were repeated with sufficient reproducibility, with each experiment sometimes taking over 30 weeks and requiring considerable effort. While the interpretation of the results is still debatable, these descriptions are valuable knowledge for this field of research.

      (2) Mechanistic analysis of T-B interaction in humanized mice, which has not been extensively addressed before, suggests part of the activation mechanism of autoreactive B cells. Additionally, the differences in pathogenicity due to T cell selection by either the mouse or human thymus are emphasized, which encompasses the essential mechanisms of immune tolerance and activation in both central and peripheral systems.

      Weaknesses:

      (1) In this manuscript, for example in Figure 2, the proportion of suppressive cells like regulatory T cells is not clarified, making it unclear to what extent the percentages of Tph or Tfh cells reflect immune activation. It would have been preferable to distinguish follicular regulatory T cells, at least. While Figure 3 shows Tregs are gated out using CD25- cells, it is unclear how the presence of Treg cells affects the immunogenic function of the overall cell population.

      (2) The definition of "Disease" discussed after Figure 6 should be explicitly described in the Methods section. It seems to follow Khosravi-Maharlooei et al. 2021. If the disease onset determination aligns with GVHD scoring, generally an indicator of T cell response, it is unsurprising that B cell contribution is negligible. The accelerated disease onset by B cell depletion likely results from lymphopenia-induced T cell activation. However, this result does not prove that these mice avoid organ-specific autoimmune diseases mediated by auto-antibodies and the current conclusion by the authors may overlook significant changes. For instance, would defining Disease Onset by the appearance of circulating autoantibodies alter the result of Disease-Free curve? Are there possibly histological findings at the endpoint of the experiment suggesting tissue damage by autoantibodies?

      (3) Helper functions, such as differentiating B cells into CXCR5+, were demonstrated for both Hu/Hu and Mu/Hu-derived T cells. This function seemed higher in Hu/Hu than in Mu/Hu. From the results in Figures 7-8, Hu/Hu Tph/Tfh cells have a stronger T cell identity and higher activation capacity in vivo on a per-cell basis than their Mu/Hu counterparts. However, Hu/Hu T cells lacked the ability to induce class-switching, in contrast to Mu/Hu T cells. The mechanisms causing these functional differences were not fully discussed. Discussions touching on possible changes in TCR repertoire diversity between Mu/Hu and Hu/Hu T cells would have been beneficial.

    1. eLife assessment

      This valuable and well-executed study describes how deletion of the autism spectrum disorder risk gene CNTNAP2 in mice increases dorsolateral striatal projection neuron excitability and promotes repetitive behaviors and cognitive inflexibility. The evidence supporting this claim is solid, although additional experimental evidence would strengthen claims of how corticostriatal activity is altered and linked to behavioral changes. The study provides a potential cellular explanation for the repetitive and inflexible behavior in Cntnap2 knockout mice and CNTNAP2 disorder in humans, which would interest both basic and translational neuroscientists.

    2. Reviewer #1 (Public Review):

      Summary:

      Cording et al. investigated how deletion of CNTNAP2, a gene associated with autism spectrum disorder, alters corticostriatal engagement and behavior. Specifically, the authors present slice electrophysiology data showing that striatal projection neurons (SPNs) are more readily driven to fire action potentials in response to stimulation of corticostriatal afferents, and this is due to increases in SPN intrinsic excitability rather than changes in excitatory or inhibitory synaptic inputs. The authors show that CNTNAP2 mice display repetitive behaviors, enhanced motor learning, and cognitive inflexibility. Overall the authors' conclusions are supported by their data, but a few claims could use some more evidence to be convincing.

      Strengths:

      The use of multiple behavioral techniques, both traditional and cutting-edge machine learning-based analyses, provides a powerful means of assessing repetitive behaviors and behavioral transitions/rigidity. Characterization of both excitatory and inhibitory synaptic responses in slice electrophysiology experiments offers a broad survey of the synaptic alterations that may lead to increased corticostriatal engagement of SPNs.

      Weaknesses:

      (1) The authors conclude that increased cortical engagement of SPNs is due to changes in SPN intrinsic excitability rather than synaptic strength (either excitatory or inhibitory). One weakness is that only AMPA receptor-mediated responses were measured. Though the holding potential used for experiments in Figure 1F-I wasn't clear, recordings were presumably performed at a hyperpolarized potential that limits NMDA receptor-mediated responses. Because the input-output experiments used to conclude that corticostriatal engagement of SPNs is elevated (Figure 1B-E) were conducted in the current clamp, it is possible that enhanced NMDA receptor engagement contributed to increased SPN responses to cortical stimulation. Confirming that NMDA receptor-mediated EPSC components are not altered would strengthen the main conclusion.

      (2) Data clearly show that SPN intrinsic excitability is increased in knockout mice. Given that CNTNAP2 has been linked to potassium channel regulation, it would be helpful to show and quantify additional related electrophysiology data such as negative IV curve responses and action potential hyperpolarization.

      (3) As it stands, the reported changes in dorsolateral striatum SPN excitability are only correlative with reported changes in repetitive behaviors, motor learning, and cognitive flexibility.

    3. Reviewer #2 (Public Review):

      Summary:

      This is an important study characterizing striatal dysfunction and behavioral deficits in Cntnap2-/- mice. There is growing evidence suggesting that striatal dysfunction underlies core symptoms of ASD but the specific cellular and circuit level abnormalities disrupted by different risk genes remain unclear. This study addresses how the deletion of Cntnap2 affects the intrinsic properties and synaptic connectivity of striatal spiny projection neurons (SPN) of the direct (dSPN) and indirect (iSPN) pathways. Using Thy1-ChR2 mice and optogenetics the authors found increased firing of both types of SPNs in response to cortical afferent stimulation. However, there was no significant difference in the amplitude of optically-evoked excitatory postsynaptic currents (EPSCs) or spine density between Cntnap2-/- and WT SPNs, suggesting that the increased corticostriatal coupling might be due to changes in intrinsic excitability. Indeed, the authors found Cntnap2-/- SPNs, particularly dSPNs, exhibited higher intrinsic excitability, reduced rheobase current, and increased membrane resistance compared to WT SPNs. The enhanced spiking probability in Cntnap2-/- SPNs is not due to reduced inhibition. Despite previous reports of decreased parvalbumin-expressing (PV) interneurons in various brain regions of Cntnap2-/- mice, the number and function (IPSC amplitude and intrinsic excitability) of these interneurons in the striatum were comparable to WT controls.

      This study also includes a comprehensive behavioral analysis of striatal-related behaviors. Cntnap2-/- mice demonstrated increased repetitive behaviors (RRBs), including more grooming bouts, increased marble burying, and increased nose poking in the holeboard assay. MoSeq analysis of behavior further showed signs of altered grooming behaviors and sequencing of behavioral syllables. Cntnap2-/- mice also displayed cognitive inflexibility in a four-choice odor-based reversal learning assay. While they performed similarly to WT controls during acquisition and recall phases, they required significantly more trials to learn a new odor-reward association during reversal, consistent with potential deficits in corticostriatal function.

      Strengths:

      This study provides significant contributions to the field. The finding of altered SPN excitability, the detailed characterization of striatal inhibition, and the comprehensive behavioral analysis are novel and valuable to understanding the pathophysiology of Cntnap2-/- mice.

      Weaknesses:

      (1) The approach based on Thy1-ChR2 mice has the advantage of overcoming issues caused by injection efficiency and targeting variability. However, the spread of oEPSC amplitudes across mice shown in panels G/I of Figure 1 is very high, with almost one order of magnitude difference between some mice. Given this is one of the most important points of the study, it will be important to further analyze and discuss what this variability might be due to. Typically, in acute slice recordings, the within-animal variability is larger than the variability across animals. From the sample sizes reported it seems the authors sampled a large number of animals, but with a relatively low number of neurons per animal (per condition). Could this be one of the reasons for this variability?

      (2) This is particularly important because the analysis of corticostriatal evoked APs in panels C and E is performed on pooled data without considering the variability in evoked current amplitudes across animals shown in G and I. Were the neurons in panels C/E recorded from the same mice as shown in G/I? If so, it would be informative to regress AP firing data (say at 20% LED) to the average oEPSC amplitude recorded on those mice at the same light intensity. However, if the low number of neurons recorded per mouse is due to technical limitations, then increasing the sample size of these experiments would strengthen the study.

      (3) On a similar note, there is no discussion of why iSPNs also show increased corticostriatal evoked firing in Figure 1E, despite the difference in intrinsic excitability shown in Figure 3. This suggests other potential mechanisms that might underlie altered corticostriatal responses. Given the role of Caspr2 in clustering K channels in axons, altered presynaptic function or excitability could also contribute to this phenotype, but potential changes in PPR have not been explored in this study.

      (4) Male and female SPNs have different intrinsic properties but the number and/or balance of M/F mice used for each experiment is not reported.

      (5) There is no mention of how membrane resistance was calculated, and no I/V plots are shown.

      (6) It would be interesting to see which behavior transitions most contribute to the decrease in entropy. Are these caused by repeated or perseverative grooming bouts? Or is this inflexibility also observed across other behaviors? The transition map in Figure S5 shows the overall number of syllables and transitions but not their sequence during behavior. Can this be analyzed by calculating the ratio of individual $u_i \times p_{i,j} \times \log_2 p_{i,j}$ factors across genotypes?
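
      The per-transition factors raised in point (6) can be illustrated with a minimal sketch. It assumes one syllable-by-syllable transition count matrix per genotype; the function name `entropy_terms` and the toy count matrices are hypothetical and are not taken from the study's MoSeq output. Here u_i is estimated as the fraction of all transitions that leave syllable i, and p_i,j as the probability of moving to syllable j given syllable i.

```python
import numpy as np

def entropy_terms(counts):
    """Return the matrix of contributions -u_i * p_ij * log2(p_ij)."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    # u_i: share of all observed transitions that start from syllable i
    u = row_totals / counts.sum()
    # p_ij: conditional transition probability (rows with no data stay 0)
    p = np.divide(counts, row_totals,
                  out=np.zeros_like(counts), where=row_totals > 0)
    # log2(p_ij), with 0 * log(0) treated as 0
    log_p = np.zeros_like(p)
    np.log2(p, out=log_p, where=p > 0)
    return -u * p * log_p

# Hypothetical toy data: two syllables, rows = current, columns = next
wt_counts = [[5, 5], [5, 5]]   # transitions spread evenly
ko_counts = [[9, 1], [1, 9]]   # transitions dominated by self-repeats

wt_terms = entropy_terms(wt_counts)
ko_terms = entropy_terms(ko_counts)

# Element-wise ratio of contributions across genotypes (NaN where WT term is 0)
ratio = np.divide(ko_terms, wt_terms,
                  out=np.full_like(wt_terms, np.nan), where=wt_terms > 0)

print("WT entropy (bits):", wt_terms.sum())
print("KO entropy (bits):", ko_terms.sum())
print("KO/WT ratio per transition:\n", ratio)
```

      Summing each matrix gives the overall transition entropy, while the element-wise ratio indicates which individual transitions account for a difference between genotypes. With real data, sparse transitions would likely need smoothing or a minimum-count filter before ratios are interpreted.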

    4. Reviewer #3 (Public Review):

      Summary:

      The authors analyzed Cntnap2 KO mice to determine whether loss of the ASD risk gene CNTNAP2 alters the dorsal striatum's function.

      Strengths:

      The results demonstrate that loss of Cntnap2 results in increased excitability of striatal projection neurons (SPNs) and altered striatal-dependent behaviors, such as repetitive, inflexible behaviors. Unlike other brain areas and cell types, synaptic inputs onto SPNs were normal in Cntnap2 KO mice. The experiments are well-designed, and the results support the authors' conclusions.

      Weaknesses:

      The mechanism underlying SPN hyperexcitability was not explored, and it is unclear whether this cellular phenotype alone can account for the behavioral alterations in Cntnap2 KO mice. No clear explanation emerges for the variable phenotype in different brain areas and cell types.

    1. eLife assessment

      This study represents an important contribution to the study of decision-making under risk, bringing an interdisciplinary approach spanning economic theory, behavioral neuroscience, and computational modeling to test how choice preference is influenced by rare and extreme events. The authors present evidence that rats are indeed sensitive to these rare and extreme events despite their infrequent occurrence, driven primarily by an almost complete avoidance of "Black Swans" - rare and extreme losses. The evidence for specific sensitivity to rare and extreme events however remains incomplete, owing in part to the difficulty of isolating the effect of these events beyond that arising from risk preferences more generally in both task design and in the computational modeling of the choice behavior. Given the approach here brings a relatively novel perspective, with a more detailed treatment of these confounds this paper will be of broad interest to those seeking to understand animal behavior through the lens of economic choice.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the impact of rare and extreme events on rodents' decision-making under risk, in gain and loss contexts. They describe the behavior of 20 rats performing a four-armed bandit task, where probabilistic gains (sugar pellets) and losses (time-out punishments) can - in some arms - incorporate extremely large - but rare - outcomes. They report that most rats are sensitive to rare and extreme outcomes despite their infrequent occurrence, and that this sensitivity is primarily driven by extreme loss events which they try to avoid, rather than extreme gains that they seek to obtain.

      They finally propose a modification of standard reinforcement-learning, which features a specific sensitivity to rare and extreme outcomes and can account for the observed behavior.

      Strengths:

      The manuscript really taps into a surprisingly neglected but very relevant aspect of decision-making: the effect of rare and extreme events (REE). The authors have developed an experimental setup that seemingly allows investigation of this aspect, which is not trivial given the idiosyncratic properties of rare and extreme events.

      The parameters of the experimental setup also seem to be well thought out: in the absence of REE, some options are objectively better than others (because, in expectation, they deliver more food overall or minimize time-out punishments), but this ordering reverses if REE are taken into account. This allows for a clean test of the integration of REE in the rodents' decision-making model.
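
      The reversal described above can be illustrated with a small worked example. This is only a sketch: the payoff schedules `robust` and `fragile`, the rarity `threshold`, and the helper names are hypothetical and are not the values or procedures used in the study.

```python
from fractions import Fraction as F

def expected_value(option):
    """Expected value of an option given as {outcome: probability}."""
    return sum(outcome * prob for outcome, prob in option.items())

def without_rare(option, threshold=F(1, 50)):
    """Drop outcomes rarer than `threshold` and renormalize the rest."""
    kept = {o: p for o, p in option.items() if p >= threshold}
    total = sum(kept.values())
    return {o: p / total for o, p in kept.items()}

# Hypothetical schedules (illustrative only, not taken from the paper).
robust = {2: F(1, 2), 1: F(1, 2)}                          # no rare event
fragile = {3: F(50, 100), 1: F(49, 100), -100: F(1, 100)}  # rare extreme loss

# Ignoring the rare event, the fragile option looks better in expectation.
print(expected_value(without_rare(fragile)), expected_value(without_rare(robust)))
# Including the rare event, the ordering reverses.
print(expected_value(fragile), expected_value(robust))
```

      Exact fractions are used so that the comparison does not depend on floating-point rounding.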

      The data is presented and analyzed in a very descriptive but exhaustive and transparent way, down to the description of individual rodents' behavior.

      Weaknesses:

      While the description and analyses of the behavioral patterns are rigorously done under the economic lens of risky decision-making, the authors' interpretation heavily relies on the assumption that rodents have built the correct model of the task during the training. Extensive details are provided about the training procedure, and the observed behavior at the end of the training, but it remains virtually impossible to disambiguate choices due to imperfect learning from choices made due to intrinsic preferences for risk or REE.

      By nature, gains (food pellets) and losses (time-out punishments) are somewhat incommensurable, so the asymmetry attributed to outcome valence is also open to interpretation. There might be some additional subtleties due to, e.g., satiety that could come from gaining REE (i.e. the delivery of 80 pellets from the Jackpot).

      In its current form, the paper is quite hard to digest. This is naturally the case with interdisciplinary work (here mixing economists and neurobiologists). But I am afraid that with the current frame, the paper is going to miss its target, in terms of audience.

      The proposed model seems somewhat disconnected from the behavioral patterns: while the model suggests an effect of REE at the decision stage (i.e. with specific decision weights for those rare events), this formalism seems at odds with the observation that REE (notably in the loss domain) have an impact on subsequent behavior (Black Swans tend to reinforce Total Sensitivity to REE), which rather suggests an effect at the learning stage.
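
      The distinction between a decision-stage and a learning-stage effect can be made concrete with a minimal sketch. It does not reproduce the authors' model: the softmax rule, the additive `bonus`, and the separate learning rate `alpha_rare` are assumptions chosen only to show where each kind of effect would enter.

```python
import math

def softmax(values, beta=1.0):
    """Numerically stable softmax with inverse temperature `beta`."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def decision_stage_probs(q, bonus, beta=1.0):
    """Decision-stage effect (assumed form): a fixed bonus or penalty is
    added to each option's learned value at choice time, whether or not
    the rare event was ever experienced."""
    return softmax([qi + bi for qi, bi in zip(q, bonus)], beta)

def learning_stage_update(q_value, outcome, alpha, alpha_rare, is_rare):
    """Learning-stage effect (assumed form): an experienced rare event is
    learned with its own rate, so it changes subsequent behavior."""
    rate = alpha_rare if is_rare else alpha
    return q_value + rate * (outcome - q_value)

# A fixed penalty shifts choice probabilities regardless of experience.
print(decision_stage_probs([1.0, 1.0], [0.0, -2.0]))
# A rare loss moves the learned value more than an ordinary loss would.
print(learning_stage_update(0.0, -10.0, alpha=0.1, alpha_rare=0.5, is_rare=True))
```

      Under the first form, behavior before and after an experienced rare event would look the same; under the second, it would differ, which is the pattern the observation above points to.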

      Discussion:

      This study convincingly demonstrates that REEs are processed rather uniquely, which makes sense given their evolutionary relevance. REE has indeed been somewhat neglected in previous research, and this study therefore opens an interesting new front on the fundamental aspects of decision under risk. The authors have devised an original theoretical and empirical framework that will be useful for the community, and the combination of economics analysis and rodent behavior constitutes a thought-provoking ground to think about the nature of risk preferences. The interpretation and mechanistic account of these aspects, as well as their generalizability outside the specific context of this study, remain to be strengthened.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper attempts to examine how rare, extreme events impact decision-making in rats. The paper used an extensive behavioural study with rats to evaluate how the probability and magnitude of outcomes impact preference. The paper, however, provides limited evidence for the conclusions because the design did not allow for the isolation of the rare, extreme events in choice. There are many confounding factors, including the outcome variance and presence of less-rare, and less-extreme outcomes in the same conditions.

      Strengths:

      (1) The major strength of the paper is the significant volume of behavioural data with a reasonable sample size of 20 rats.

      (2) The paper attempts to examine losses with rats (a notoriously tricky problem with non-human animals) by substituting time-outs as a proxy for losses. This allows for mixed gambles that have both gain and loss possible outcomes.

      (3) The paper integrates both a behavioural and a modelling approach to get at the factors that drive decision-making.

      (4) The paper takes seriously the question of what it means for an event to be rare, pushing to less frequent outcomes than usually used with non-human animals.

      Weaknesses:

      (1) The primary issue with this work is that the main experimental manipulation fails to isolate the rare, extreme events in choice. As I understand the task, in all the conditions with a rare extreme event (e.g., 80 pellets with probability epsilon), there is also a less-rare, less-extreme event (e.g., 12 pellets with probability 5). In addition, the variance differs between the two conditions. So, any impact attributable to the rare, extreme event could be due to the less rare event or due to the difference in the variance. The design does not support the conclusions. Finally, by deliberately confounding rarity and extremity, the design does not allow for assessing the impact of either aspect.

      (2) The RL-modelling work also fails to show a specific impact of the rare extreme event. As best as I can understand Eq 2, the model provides a free parameter that adds a bonus to the value of either the two options with high-variance gains (A and V in the paper) or to the two options with high-variance losses (F and V in the paper). This parameter only depends on whether this option could have possibly yielded the rare, extreme outcome (i.e., based on the generative probability) and was not connected to its actual appearance. That makes it a free parameter that just bumps up (or down) the probability of selecting a pair of options. In the case of the "black swan" or high-variance loss conditions, this seems very much like a loss aversion parameter, but an additive one instead of a multiplicative one.

      (3) The paper presented the methods and results with lots of neologisms and fairly obscure jargon (e.g., fragility, total REE sensitivity). That made it very hard to decipher exactly what was done and what was found. For example, on p. 4, the use of concave and convex was very hard to decipher; the text even has to repeat itself 3 times (i.e., "to repeat" and "in other words") and is still not clear. It would be much clearer (and probably accurate) to say that the options varied along the variance dimension, separately for gains and losses. Option A was low-variance gains and losses. Option B was low-variance losses and high-variance gains. Option C was high-variance losses and low-variance gains, and Option D was high-variance losses and gains. That tells much more clearly what the animals experienced without the reader having to master a set of new terminologies around fragility and robustness, which brings a set of theoretical assumptions unnecessarily into the description of the experimental design. In terms of results, "Black Swan" avoidance is more simply known as risk aversion for losses.

      (4) Were the probabilities shuffled or truly random (they seem to be fixed sequences, so neither)? What were the experienced probabilities? Given the fixed sequences, these experienced ("ex post") probabilities could differ tremendously from the scheduled ("ex ante") probabilities. It's quite possible that an animal never experienced the rare, extreme event for a specific option. It's even possible (if they only picked it on the 10th/60th choices by chance) that they only ever experienced that rare extreme event. This cannot be known given the information provided. The Supplemental info on p.55 only gives gross overall numbers but does not indicate what the rats experienced for each choice/option, which is what matters here. A simple table that indicates, for each of the 4 options, how often they were selected and how often the animals experienced each of the 6-8 possible outcomes would make it much clearer how closely the experience matched the planned outcomes. In addition, by restricting the rare outcome to either the 10th or 60th activations in a session, these are not random. Did the animals learn this association?

      (5) The choice data are only presented in an overprocessed fashion with a sum and a difference (in both figures and tables). The basic datum (probability/frequency of selecting each of the 4 options) is not provided directly, even if it can theoretically be inferred from the sum and the difference. To understand what the rats actually do, we first need to see how often they select each option, without these transformations.

      (6) There is insufficient detail provided on the inferential statistical tests (e.g., no degrees of freedom or effect sizes), and only limited information on exactly what tests were run and how (bootstrapping, but little detail). Without code or data (only summary information is provided in the supplement), this is difficult to evaluate. In addition, the studies seem not to be pre-registered in any way, leaving many researcher degrees of freedom. Were any alternative analysis pipelines attempted? Similarly, there were many sub-groupings of the animals, and then comparisons between them - were these post-hoc?

      (7) On p. 17, there is an attempt to look at the impact of a rare, extreme event by plotting a measure of preference for the 10 trials before/after the rare, extreme event. In the human literature, the main impact of experiencing a rare, extreme event is what is known as the wavy recency effect (See Plonsky et al. 2015 in Psych Review for example). What this means is that there tends to be some immediate negative recency (e.g., avoiding a rare gain) followed by positive recency (e.g., chasing the rare gain). Using a 10-trial window would thus obscure any impact of this rare, extreme event. An analysis that looks at a time course trial-by-trial could reveal any impact.

      (8) As I understood the method (p. 31), the assignment of options to physical locations was not random or counterbalanced, but deliberately biased to have one of the options in the preferred location. This would seem to create a bias towards a particular option and a bias away from the other options, which confounds the preference data in subsequent analyses.

      (9) Are delays really losses? This is a big assumption. Magnitude and delay are different aspects of experience, which are not necessarily commensurable and can be manipulated independently. And, for the model, how were these delays transformed into outcomes for the model? Eq 1 skips over that. Is there an assumption of linearity? In addition, I was not wholly clear if the delays meant fewer trials in a session or if the delays merely extended the session and meant longer delays until the next choice period.

      (10) The paper does not represent the existing literature on human risky decision-making (with and without rare events) accurately enough. Here are a few examples of misrepresented and/or missing literature:

      Most studies on decision-making do not only rely on p > 10% (as per p. 2). Maybe that is true with animals, but not a fair statement generally. Some do, and some don't. There is substantial literature looking at rarer events in both descriptions (most famously with Kahneman & Tversky's work), but also in experience (which is alluded to in reference 19). That reference is not only about the situation when choices are not repeated (e.g. the sampling paradigm), but also partial feedback and full-feedback situations.

      The literature on learning from rewarding experiences in humans is obliquely referenced but not really incorporated. In short, there are two main findings: first, people underweight rare events in experience; second, people overweight extreme outcomes in experience (both contrary to description). Some related papers are cited, but their content is not used or incorporated into the logic of the manuscript.

      One recent study systematically examined rarity and extremity in human risky decision-making, which seems very relevant here: Mason et al. (2024). Rare and extreme outcomes in risky choice. Psychonomic Bulletin & Review, 31, 1301-1308.

      There is a fair bit of research on the human perception of the risk of rare events (including from experience) and important events like climate. One notable paper is Newell et al (2015) in Nature Climate Change.
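
      The per-option table requested in point (4) could be produced from a trial log along the following lines. This is a sketch under assumptions: the log format (one `(option, outcome)` pair per trial) and the function name `experienced_table` are hypothetical, and the example values are made up rather than taken from the study.

```python
from collections import Counter, defaultdict

def experienced_table(trials):
    """Summarize what an animal actually experienced.

    trials: iterable of (option, outcome) pairs, one per choice.
    Returns {option: (times_selected, {outcome: experienced_frequency})}.
    """
    counts = defaultdict(Counter)
    for option, outcome in trials:
        counts[option][outcome] += 1
    table = {}
    for option, outcome_counts in counts.items():
        n = sum(outcome_counts.values())
        table[option] = (n, {o: c / n for o, c in outcome_counts.items()})
    return table

# Made-up log for one session (illustrative only).
log = [("A", 1), ("A", 1), ("A", 80), ("B", 3)]
for option, (n, freqs) in sorted(experienced_table(log).items()):
    print(option, n, freqs)
```

      Comparing these experienced frequencies with the scheduled probabilities, option by option and animal by animal, would show how closely experience matched the planned outcomes.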

    1. eLife assessment

      Lloyd et al. used an evolutionary comparative approach to study DNA damage repair in response to sleep deprivation in Astyanax mexicanus, highlighting how the cavefish population has evolved a reduced DNA damage response compared to the surface-dwelling population. The cavefish have elevated expression of signals commonly associated with aging but do not show evidence of reduced life span or increased age-linked pathology, a potentially valuable finding for the field of aging research. A link to alterations in sleep behaviour is outlined, but the evidence for such a link is incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      Lloyd et al. employ an evolutionary comparative approach to study how sleep deprivation affects DNA damage repair in Astyanax mexicanus, using the cave vs surface species evolution as a playground. The work shows, convincingly, that the cavefish population has evolved an impaired DNA damage response following both sleep deprivation and a classical paradigm of DNA damage (UV).

      Strengths:

      The study employs a thorough multidisciplinary approach. The experiments are well conducted and generally well presented.

      Weaknesses:

      Having a second experimental means of inducing DNA damage would strengthen and generalise the findings.

      Overall, the study represents a very important addition to the field. The model employed underlines once more the importance of using an evolutionary approach to study sleep and provides context and caveats to statements that perhaps were taken a bit too much for granted before. At the same time, the paper manages to have an extremely constructive approach, presenting the platform as a clear useful tool to explore the molecular aspects behind sleep and cellular damage in general. The discussion is fair, highlighting the strengths and weaknesses of the work and its implications.

    3. Reviewer #2 (Public Review):

      The manuscript investigates the relationship between sleep, DNA damage, and aging in the Mexican cavefish (Astyanax mexicanus), a species that exhibits significant differences in sleep patterns between surface-dwelling and cave-dwelling populations. The authors aim to understand whether these evolved sleep differences influence the DNA damage response (DDR) and oxidative stress levels in the brain and gut of the fish.

      Summary of the Study:

      The primary objective of the study is to determine if the reduced sleep observed in cave-dwelling populations is associated with increased DNA damage and altered DDR. The authors compared levels of DNA damage markers and oxidative stress in the brains and guts of surface and cavefish. They also analyzed the transcriptional response to UV-induced DNA damage and evaluated the DDR in embryonic fibroblast cell lines derived from both populations.

      Strengths of the Study:

      Comparative Approach:

      The study leverages the unique evolutionary divergence between surface and cave populations of A. mexicanus to explore fundamental biological questions about sleep and DNA repair.

      Multifaceted Methodology:

      The authors employ a variety of methods, including immunohistochemistry, RNA sequencing, and in vitro cell line experiments, providing a comprehensive examination of DDR and oxidative stress.

      Interesting Findings:

      The study presents intriguing results showing elevated DNA damage markers in cavefish brains and increased oxidative stress in cavefish guts, alongside a reduced transcriptional response to UV-induced DNA damage.

      Weaknesses of the Study:

      Link to Sleep Physiology:

      The evidence connecting the observed differences in DNA damage and DDR directly to sleep physiology is not convincingly established. While the study shows distinct DDR patterns, it does not robustly demonstrate that these are a direct result of sleep differences.

      Causal Directionality:

      The study fails to establish a clear causal relationship between sleep and DNA damage. It is possible that both sleep patterns and DDR responses are downstream effects of a common cause or independent adaptations to the cave environment.

      Environmental Considerations:

      The lab conditions may not fully replicate the natural environments of the cavefish, potentially influencing the results. The impact of these conditions on the study's findings needs further consideration.

      Photoreactivity in Albino Fish:

      The use of UV-induced DNA damage as a primary stressor may not be entirely appropriate for albino, blind cavefish. Alternative sources of genotoxic stress should be explored to validate the findings.

      Assessment of the Study's Achievements:

      The authors partially achieve their aims by demonstrating differences in DNA damage and DDR between surface and cavefish. However, the results do not conclusively support the claim that these differences are driven by or directly related to the evolved sleep patterns in cavefish. The study's primary claims are only partially supported by the data.

      Impact and Utility:

      The findings contribute valuable insights into the relationship between sleep and DNA repair mechanisms, highlighting potential areas of resilience to DNA damage in cavefish. While the direct link to sleep physiology remains unsubstantiated, the study's data and methods will be useful to researchers investigating evolutionary biology, stress resilience, and the molecular basis of sleep.

    4. Reviewer #3 (Public Review):

      Lloyd, Xia, et al. utilised the existence of surface-dwelling and cave-dwelling morphs of Astyanax mexicanus to explore a proposed link between DNA damage, aging, and the evolution of sleep. Key to this exploration is the behavioural and physiological differences between cavefish and surface fish, with cavefish having been previously shown to have low levels of sleep behaviour, along with metabolic alterations (for example chronically elevated blood glucose levels) in comparison to fish from surface populations. Sleep deprivation, metabolic dysfunction, and DNA damage are thought to be linked and to contribute to aging processes. Given that cavefish seem to show no apparent health consequences of low sleep levels, the authors suggest that they have evolved resilience to sleep loss. Furthermore, as extended wake and loss of sleep are associated with increased rates of damage to DNA (mainly double-strand breaks) and sleep is linked to repair of damaged DNA, the authors propose that changes in DNA damage and repair might underlie the reduced need for sleep in the cavefish morphs relative to their surface-dwelling conspecifics.

      To fulfill their aim of exploring links between DNA damage, aging, and the evolution of sleep, the authors employ methods that are largely appropriate, and comparison of cavefish and surface fish morphs from the same species certainly provides a lens by which cellular, physiological and behavioural adaptations can be interrogated. Fluorescence and immunofluorescence are used to measure gut reactive oxygen species and markers of DNA damage and repair processes in the different fish morphs, and measurements of gene expression and protein levels are appropriately used. However, although the sleep tracking and quantification employed are quite well established, issues with the experimental design relate to attempts to link induced DNA damage to sleep regulation (outlined below). Moreover, although the methods used are appropriate for the study of the questions at hand, there are issues with the interpretation of the data and with these results being over-interpreted as evidence to support the paper's conclusions.

      This study shows that a marker of DNA repair molecular machinery that is recruited to DNA double-strand breaks (γH2AX) is elevated in brain cells of the cavefish relative to the surface fish and that reactive oxygen species are higher in most areas of the digestive tract of the cavefish than in that of the surface fish. As sleep deprivation has been previously linked to increases in both these parameters in other organisms (both vertebrates and invertebrates), their elevation in the cavefish morph is taken to indicate that the cavefish show signs of the physiological effects of chronic sleep deprivation.

      It has been suggested that induction of DNA damage can directly drive sleep behaviour, with a notable study describing both the induction of DNA damage and an increase in sleep/immobility in zebrafish (Danio rerio) larvae by exposure to UV radiation (Zada et al. 2021 doi:10.1016/j.molcel.2021.10.026). In the present study, an increase in sleep/immobility is induced in surface fish larvae by exposure to UV light, but there is no effect on behaviour in cavefish larvae. This finding is interpreted as representing a loss of a sleep-promoting response to DNA damage in the cavefish morph. However, induction of DNA damage is not measured in this experiment, so it is not certain if similar levels of DNA damage are induced in each group of intact larvae, nor how the amount of damage induced compares to the pre-existing levels of DNA damage in the cavefish versus the surface fish larvae. In both this study with A. mexicanus surface morphs and the previous experiments from Zada et al. in zebrafish, observed increases in immobility following UV radiation exposure are interpreted as following from UV-induced DNA damage. However, in interpreting these experiments it is important to note that the cavefish morphs are eyeless and blind. Intense UV radiation is aversive to fish, and it has previously been shown in zebrafish larvae that (at least some) behavioural responses to UV exposure depend on the presence of an intact retina and UV-sensitive cone photoreceptors (Guggiana-Nilo and Engert, 2016, doi:10.3389/fnbeh.2016.00160). It is premature to conclude that the lack of behavioural response to UV exposure in the cavefish is due to a different response to DNA damage, as their lack of eyes will likely inhibit a response to the UV stimulus. Indeed, were the equivalent zebrafish experiment from Zada et al. to be repeated with mutant larval fish lacking the retinal basis for UV detection, it might be found that in this case too, the effects of UV on behaviour are dependent on visual function. Such a finding should prompt a reappraisal of the interpretation that UV exposure's effects on fish sleep/locomotor behaviour are mediated by DNA damage. An additional note, relating to both Lloyd, Xia, et al., and Zada et al., is that though increases in immobility are induced following UV exposure, in neither study have assays of sensory responsiveness been performed during this period. As a decrease in sensory responsiveness is a key behavioural criterion for defining sleep, it is, therefore, unclear that this post-UV behaviour is genuinely increased sleep as opposed to a stress-linked suppression of locomotion due to the intensely aversive UV stimulus.

      The effects of UV exposure, in terms of causing damage to DNA, inducing DNA damage response and repair mechanisms, and in causing broader changes in gene expression are assessed in both surface and cavefish larvae, as well as in cell lines derived from these different morphs. Differences in the suite of DNA damage response mechanisms that are upregulated are shown to exist between surface fish and cavefish larvae, though at least some of this difference is likely to be due to differences in gene expression that may exist even without UV exposure (this is discussed further below).

      UV exposure induced DNA damage (as measured by levels of cyclobutane pyrimidine dimers) to a similar degree in cell lines derived from both surface fish and cave fish. However, γH2AX shows increased expression only in cells from the surface fish, suggesting induction of an increased DNA repair response in these surface morphs, corroborated by their cells' increased ability to repair damaged DNA constructs experimentally introduced to the cells in a subsequent experiment. This "host cell reactivation assay" is a very interesting assay for measuring DNA repair in cell lines, but the power of this approach might be enhanced by introducing these DNA constructs into larval neurons in vivo (perhaps by electroporation) and by tracking DNA repair in living animals. Indeed, in such a preparation, the relationship between DNA repair and sleep/wake state could be assayed.

      Comparing gene expression in tissues from young (here 1 year) and older (here 7-8 years) fish from both cavefish and surface fish morphs, the authors found that there are significant differences in the transcriptional profiles in brain and gut between young and old surface fish, but that for cavefish being 1 year old versus being 7-8 years old did not have a major effect on transcriptional profile. The authors take this as suggesting that there is a reduced transcriptional change occurring during aging and that the transcriptome of the cavefish is resistant to age-linked changes. This seems to be only one of the equally plausible interpretations of the results; it could also be the case that alterations in metabolic cellular and molecular mechanisms, and particularly in responses to DNA damage, in the cavefish mean that these fish adopt their "aged" transcriptome within the first year of life.

      A major weakness of the study in its current form is the absence of sleep deprivation experiments to assay the effects of sleep loss on the cellular and molecular parameters in question. Without such experiments, the supposed link of sleep to the molecular, cellular, and "aging" phenotypes remains tenuous. Although the argument might be made that the cavefish represent a naturally "sleep-deprived" population, the cavefish in this study are not sleep-deprived, rather they are adapted to a condition of reduced sleep relative to fish from surface populations. Comparing the effects of depriving fish from each morph on markers of DNA damage and repair, gut reactive oxygen species, and gene expression will be necessary to solidify any proposed link of these phenotypes to sleep.

      A second important aspect that limits the interpretability and impact of this study is the absence of information about circadian variations in the parameters measured. A relationship between circadian phase, light exposure, and DNA damage/repair mechanisms is known to exist in A. mexicanus and other teleosts, and differences exist between the cave and surface morphs in these phenomena (Beale et al. 2013, doi: 10.1038/ncomms3769). Although the authors of the present study mention that their experiments do not align with these previous findings, they do not perform the appropriate experiments to determine if such a misalignment is genuine. Specifically, Beale et al. 2013 showed that white light exposure drove enhanced expression of DNA repair genes (including cpdp which is prominent in the current study) in both surface fish and cavefish morphs, but that the magnitude of this change was less in the cave fish because they maintained an elevated expression of these genes in the dark, whereas the darkness suppressed the expression of these genes in the surface fish. If such a phenomenon is present in the setting of the current study, this would likely be a significant confound for the UV-induced gene expression experiments in intact larvae, and undermine the interpretation of the results derived from these experiments: as samples are collected 90 minutes after the dark-light transition (ZT 1.5) it would be expected that both cavefish and surface fish larvae should have a clear induction of DNA repair genes (including cpdp) regardless of 90 s of UV exposure. The data in Supplementary Figure 3 is not sufficient to discount this potentially serious confound, as for larvae there is only gene expression data for time points from ZT2 to ZT 14, with all of these time points being in the light phase and not capturing any dynamics that would occur at the most important timepoints from ZT0-ZT1.5, in the relevant period after dark-light transition.
Indeed, an appropriate control for this experiment would involve frequent sampling at least across 48 hours to assess light-linked and developmentally-related changes in gene expression that would occur in 5-6dpf larvae of each morph independently of the exposure to UV.

      On a broader point, given the effects of both circadian rhythm and lighting conditions that are thought to exist in A. mexicanus (e.g. Beale et al. 2013), experiments involving measurements of DNA damage and repair, gene expression, reactive oxygen species, etc. at multiple times across more than one 24-hour cycle, in both light-dark and constant illumination conditions (e.g. constant dark), would be needed to substantiate the authors' interpretation that their findings indicate consistently altered levels of these parameters in the cavefish relative to the surface fish. Most of the data in this study is taken at only single time points.

      In summary, the authors show that there are differences in gene expression, activity of DNA damage response and repair pathways, response to UV radiation, and gut reactive oxygen species between the Pachón cavefish morph and the surface morph of Astyanax mexicanus. However, the data presented does not make the precise nature of these differences very clear, and the interpretation of the results appears to be overly strong. Furthermore, the evidence of a link between these morph-specific differences and sleep is unconvincing.

    1. eLife assessment

      This is an important characterization of mouse auditory cortex receptive field organization, using two-photon imaging of specific subpopulations. They demonstrate a degradation of tonotopic organization from the input to output neurons. The strength of the evidence is solid, but some controls are needed to further strengthen the conclusion.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Gu et al. employed novel viral strategies, combined with in vivo two-photon imaging, to map the tone response properties of two groups of cortical neurons in A1: the thalamocortical recipient (TR) neurons and the corticothalamic (CT) neurons. They observed a clear tonotopic gradient among TR neurons but not in CT neurons. Moreover, CT neurons exhibited high heterogeneity of their frequency tuning and broader bandwidth, suggesting increased synaptic integration in these neurons. By parsing out different projection-specific neurons within A1, this study provides insight into how neurons with different connectivity can exhibit different frequency response-related topographic organization.

      Strengths:

      This study reveals the importance of studying neurons with projection specificity rather than layer specificity since neurons within the same layer have very diverse molecular, morphological, physiological, and connectional features. By utilizing a newly developed rabies virus CSN-N2c GCaMP-expressing vector, the authors can label and image specifically the neurons (CT neurons) in A1 that project to the MGB. To compare, they used an anterograde trans-synaptic tracing strategy to label and image neurons in A1 that receive input from MGB (TR neurons).

      Weaknesses:

      - As cited in the introduction, it is well known that the tonotopic gradient is well preserved across all layers within A1. However, if the authors want to highlight the specificity of their virus tracing strategy and the populations that they imaged in L2/3 (TR neurons) and L6 (CT neurons), they should include control groups in which they image general excitatory neurons at the two depths and compare them to TR and CT neurons, respectively. This would show that it is not their imaging/analysis or behavioral paradigms that differ from those of other labs.

      - Figures 1D and G, the y-axis is Distance from pia (%). I'm not exactly sure what this means. How does % translate to real cortical thickness? 

      - For Figures 2G and 2H, is each circle a neuron or an animal? Why are they staggered on top of each other on the x-axis? If the x-axis is the distance from caudal to rostral, shouldn't each neuron have a different distance? Also, it seems that Figure 2H has more circles, which is why it has more variation and is thus not significant (for example, at 600 or 900 µm, 2G seems to have fewer circles than 2H).

      - Similarly, in Figures 2J and L, why are the circles staggered on the y-axis now? And is each circle now a neuron or a trial? It seems they have many more circles than Figure 2G and 2H. Also, I don't think doing a correlation is the proper stats for this type of plot (this point applies to Figures 3H and 3J).

      - What does the inter-quartile range of BF (IQRBF, in octaves) imply? What's the interpretation of this analysis? I am confused as to why TR neurons show high IQR in HF areas compared to LF areas, which means homogeneity among TR neurons (lines 213 - 216). On the same note, how is this different from the BF variability?  Isn't higher IQR equal to higher variability?

      - Figure 4A-B, there are no clear criteria on how the authors categorize V, I, and O shapes. The descriptions in the Methods (lines 721 - 725) are also very vague.
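      As a point of reference for the IQR question raised above, the following is a minimal sketch, assuming that IQRBF is the inter-quartile range of log2-transformed best frequencies within a local group of neurons. The manuscript's exact procedure is not stated in this review, and the function name and frequency values are hypothetical. Under this assumed definition, a larger IQR means a more heterogeneous group, which is the reading the reviewer's question relies on.

```python
import math
import statistics


def iqr_octaves(best_freqs_khz):
    """Inter-quartile range of best frequencies, in octaves (log2 units).

    Assumed definition for illustration only; not taken from the manuscript.
    """
    log_bf = [math.log2(f) for f in best_freqs_khz]
    q1, _, q3 = statistics.quantiles(log_bf, n=4, method="inclusive")
    return q3 - q1


# Hypothetical best frequencies (kHz) for two local groups; not data from the study.
homogeneous = [8.0, 8.0, 8.0, 8.0, 8.0]
heterogeneous = [2.0, 4.0, 8.0, 16.0, 32.0]

print(iqr_octaves(homogeneous))    # identical BFs give an IQR of zero octaves
print(iqr_octaves(heterogeneous))  # BFs spread over four octaves give a larger IQR
```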

    3. Reviewer #2 (Public Review):

      Summary:

      Gu, Liang, et al. investigated how auditory information is mapped and transformed as it enters and exits the auditory cortex. They use anterograde transsynaptic tracers to label and perform calcium imaging of thalamorecipient neurons in A1 and retrograde tracers to label and perform calcium imaging of corticothalamic output neurons. They demonstrate a degradation of tonotopic organization from the input to output neurons.

      Strengths:

      The experiments appear well executed, well described, and analyzed.

      Weaknesses:

      (1) Given that the CT and TR neurons were imaged at different depths, the question as to whether or not these differences could otherwise be explained by layer-specific differences is still not 100% resolved. Control measurements would be needed either by recording (1) CT neurons in upper layers, (2) TR in deeper layers, (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      (2) What percent of the neurons at the depths are CT neurons? Similar questions for TR neurons?

      (3) V-shaped, I-shaped, or O-shaped is not an intuitively understood nomenclature, consider changing. Further, the x/y axis for Figure 4a is not labeled, so it's not clear what the heat maps are supposed to represent.

      (4) Many references about projection neurons and cortical circuits are based on studies from visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as that of other sensory areas. Auditory cortex references should be used specifically, rather than sources reporting on S1 and V1.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good if not better than tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogeneous in terms of their best frequency (BF), even when focusing on the low vs high-frequency regions of the primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses is outside of behavioral context or when measured at single timepoints, given the plasticity, context-dependence, and receptive field 'drift' that can occur in the cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, the tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and the best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and reported that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, less than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and not being as thorough or complete with the TR imaging?

      (3) The authors' definitions of neuronal response type in the methods need more quantitative detail. The authors state: ""Irregular" neurons exhibited spontaneous activity with highly variable responses to sound stimulation. "Tuned" neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. "Silent" neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels.". The authors need to define what their thresholds are for 'highly variable', 'significant', and 'completely inactive'. Is best frequency the most significant response, the global max (even if another stimulus evokes a very close amplitude response), etc.
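      The distinction drawn in point (1) between best frequency and characteristic frequency can be sketched as follows. This is an illustration under stated assumptions: BF follows the definition quoted in point (3), CF is taken in its conventional sense as the frequency that evokes a response at the lowest sound level, and the response matrix, threshold, and function names are hypothetical rather than taken from the manuscript.

```python
def best_frequency(freqs_khz, responses):
    """BF: frequency with the highest response averaged across all sound levels."""
    n_levels = len(responses)
    mean_per_freq = [
        sum(responses[lvl][i] for lvl in range(n_levels)) / n_levels
        for i in range(len(freqs_khz))
    ]
    return freqs_khz[mean_per_freq.index(max(mean_per_freq))]


def characteristic_frequency(freqs_khz, levels_db, responses, threshold):
    """CF: frequency with the largest response at the lowest level that
    evokes any response at or above the (hypothetical) threshold."""
    for lvl_idx in sorted(range(len(levels_db)), key=lambda k: levels_db[k]):
        row = responses[lvl_idx]
        if max(row) >= threshold:
            return freqs_khz[row.index(max(row))]
    return None  # no response reached the threshold at any level


# Hypothetical dF/F responses: rows are sound levels, columns are frequencies.
freqs_khz = [4.0, 8.0, 16.0]
levels_db = [30, 50, 70]
responses = [
    [0.0, 0.3, 0.0],  # 30 dB SPL: only 8 kHz evokes a response
    [0.1, 0.5, 0.6],  # 50 dB SPL
    [0.2, 0.6, 1.2],  # 70 dB SPL: tuning has shifted towards 16 kHz
]

bf = best_frequency(freqs_khz, responses)
cf = characteristic_frequency(freqs_khz, levels_db, responses, threshold=0.2)
print(bf, cf)
```

      In this toy case the two measures disagree because the strongest responses at high sound levels sit at a different frequency from the response at threshold, which is why maps built from BF and from CF are not directly comparable.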

    1. eLife assessment

      This study resolves a cryo-EM structure of the GPCR, human GPR30, which responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. Understanding the ligand and the mechanism of activation is important to the field of receptor signaling and potentially facilitates drug development targeting this receptor. While the overall structures are solid, the identification of the bicarbonate binding site is only partly supported by the structural data and cell-based functional assays, leaving a major aim of the study incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, which was recently identified as a bicarbonate receptor by the authors' lab. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. However, the main claim of the paper, the identification of the bicarbonate binding site, is only partly supported by the structural and functional data, leaving the study incomplete.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling seem solid. The authors perform fairly extensive unbiased mutagenesis to identify a host of positions that are important to G-protein signaling. To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study a particularly important contribution to the field.

      Weaknesses:

      Without higher resolution structures and/or additional experimental assessment of the binding pocket, the assignment of the bicarbonate remains highly speculative. The local resolution is especially poor in the ECL loop region where the ligand is proposed to bind (4.3-4.8 Å range). Of course, sometimes it is difficult to achieve high structural resolution, but in these cases, the assignment of ligands should be backed up by even more rigorous experimental validation.

      The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. Thus, disruption of bicarbonate signaling by mutagenesis of the putative coordinating residues does not necessarily mean that bicarbonate binding has been disrupted. Moreover, the mutagenesis was apparently done prior to structure determination, meaning that residues proposed to directly surround bicarbonate binding, such as E218, were not experimentally validated. Targeted mutagenesis based on the structure would strengthen the story.

      Moreover, the proposed bicarbonate binding site is surprising in a chemical sense, as it is located within an acidic pocket. The authors cite several other structural studies to support the surprising observation of anionic bicarbonate surrounded by glutamate residues in an acidic pocket (references 31-34). However, it should be noted that in general, these other structures also possess a metal ion (sodium or calcium) and/or a basic sidechain (arginine or lysine) in the coordination sphere, forming a tight ion pair. Thus, the assigned bicarbonate binding site in GPR30 remains an anomaly in terms of the chemical properties of the proposed binding site.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work (PMID: 38413581). In the current body of work, they solved the first cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.21 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCR and also identified 4 extracellular pockets created by extracellular loops (ECLs) (Pockets A-D). Based on the polarity, location, and charge of each pocket, the authors hypothesized that pocket D is a good candidate for the bicarbonate binding site. To verify their structural observation, on top of the 10 mutations they generated in the previous work, the authors introduced another 11 mutations to map out the essential residues for the bicarbonate response on hGPR30. In addition, the human GPR30-G-protein complex model also allowed the authors to untangle the G-protein coupling mechanism of this special class A GPCR that plays an important role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication (PMID: 38413581), this study was carefully designed, and the authors used mutagenesis and functional studies to confirm their structural observations. This work provided high-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 4 extracellular pockets created by ECLs (Pockets A-D). The authors were able to filter out 3 of them and identified that pocket D was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they carefully mapped out nine amino acids that are critical for receptor reactivity.

      Weaknesses:

      It is unclear how novel the aspects presented in the new paper are compared to the most recent Nature Communications publication (PMID: 38413581). Some areas of the manuscript appear to be mixed with the previous publication. The work is still impactful to the field. The new and novel aspects of this manuscript could be better highlighted.

      I also have some concerns about the TGFα shedding assay the authors used to verify their structural observation. I understand that this assay was also used in the authors' previous work published in Nature Communications. However, there are still several things in the current data that raised concerns:

      (1) The authors confirmed the "similar expression levels of HA-tagged hGPR30" mutants by WB in Supplemental Figure 1A and B. However, compared to the hGPR30-HA (~6.5 when normalized to the housekeeping gene, Na-K-ATPase), several mutants of the key amino acids had much lower surface expression: S134A, D210A, C207A had ~50% reduction, D125A had ~30% reduction, and Q215A and P71A had ~20% reduction. This weakens the receptor reactivity measured by the TGFα shedding assay.

      (2) In the previous work, the authors demonstrated that hGPR30 signals through the Gq signaling pathway and can trigger calcium mobilization. Given that calcium mobilization is a more direct measurement for the downstream signaling of hGPR30 than the TGFα shedding assay, pairing the mutagenesis study with the calcium assay will be a better functional validation to confirm the disruption of bicarbonate signaling.

      (3) It was quite confusing for Figure 4B that all statistical analyses were done by comparing to the mock group. It would be clearer to compare the activity of the mutants to the wild-type cell line.

      Additional concerns about the structural data include:

      (1) E218 was in close contact with bicarbonate in Figure 4D. However, there is no functional validation for this observation. Including the mutagenesis study of this site in the cell-based functional assay will strengthen this structural observation.

      (2) For the flow chart of the cryo-EM data processing in Supplemental data 2, the authors started with 10,148,422 particles after template picking, then had 441,348 particles left after 2D classification/heterogeneous refinement, and finally ended with 148,600 particles for the local refinement for the final map. There seems to be a lot of heterogeneity in this purified sample. GPCRs usually have flexible and dynamic loop regions, which explains the poor resolution of the ECLs in this case. Thus, a solid cell-based functional validation is a must to assign the bicarbonate binding pocket to support their hypothesis.
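      To put the particle counts quoted in point (2) in proportion, the short calculation below works out how much of the data set survives each stage. It uses only the three numbers given in the review; the variable and function names are illustrative. Roughly 4.3% of the picked particles remain after 2D classification/heterogeneous refinement, and about 1.5% (around one third of that subset) contribute to the final map.

```python
# Particle counts as quoted in the review of the processing flow chart.
picked = 10_148_422    # after template picking
after_2d = 441_348     # after 2D classification / heterogeneous refinement
final = 148_600        # used for local refinement of the final map


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


retained_after_2d = percent(after_2d, picked)
retained_final = percent(final, picked)
final_of_2d = percent(final, after_2d)

print(f"after 2D classification: {retained_after_2d:.2f}% of picked particles")
print(f"final map:               {retained_final:.2f}% of picked particles")
print(f"final map:               {final_of_2d:.2f}% of the post-2D subset")
```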

    4. Reviewer #3 (Public Review):

      Summary:

      GPR30 responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. However, it remains unclear how GPR30 recognizes bicarbonate ions. This paper presents the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate. The structure together with functional studies aims to provide mechanistic insights into bicarbonate recognition and G protein coupling.

      Strengths:

      The authors performed comprehensive mutagenesis studies to map the possible binding site of bicarbonate.

      Weaknesses:

      Owing to the poor resolution of the structure, some structural findings may be overclaimed.

      Based on the EM maps shown in Figure 1a and Figure Supplement 2, densities for side chains in the receptor, particularly in the ECLs (around 4 Å), are poorly defined. At this resolution, it is unlikely that a disulfide bond (C130ECL1-C207ECL2) and bicarbonate ions can be observed. Moreover, the disulfide between ECL1 and ECL2 has not been observed in other GPCRs or in the published structure of GPR30 (PMID: 38744981). The density of this disulfide bond could be noise.

      The authors observed a weak density in pocket D, which they attribute to the bicarbonate ion. This ion is mainly coordinated by Q215 and Q138. However, the Q215A mutation only reduced but did not completely abolish the bicarbonate response, and the authors did not present data for the Q138A mutation. Therefore, Q215 and Q138 may not be bicarbonate binding sites. While H307A completely abolished the bicarbonate response, the authors proposed that this residue plays a structural role. Nevertheless, based on the structure, H307 is exposed and may be involved in binding bicarbonate. The assignment of bicarbonate in the structure is not supported by the data.

    1. eLife assessment

      This convincing study advances our understanding of the physiological consequences of the strong overexpression of non-toxic proteins in baker's yeast. The findings suggest that a massive protein burden results in nitrogen starvation and a shift in metabolism likely regulated via the TORC1 pathway, as well as defects in ribosome biogenesis in the nucleolus. The study presents findings and tools that are important for the cell biology and protein homeostasis fields.

    2. Reviewer #1 (Public Review):

      Summary:

      The study "Impact of Maximal Overexpression of a Non-toxic Protein on Yeast Cell Physiology" by Fujita et al. aims to elucidate the physiological impacts of overexpressing non-toxic proteins in yeast cells. By identifying model proteins with minimal cytotoxicity, the authors claim to provide insights into cellular stress responses and metabolic shifts induced by protein overexpression.

      Strengths:

      The study introduces a neutrality index to quantify cytotoxicity and investigates the effects of protein burden on yeast cell physiology. The study identifies mox-YG (a non-fluorescent mutant of a fluorescent protein) and Gpm1-CCmut (an inactive glycolytic enzyme) as proteins with the lowest cytotoxicity, capable of being overexpressed to more than 40% of total cellular protein while maintaining yeast growth. Overexpression of mox-YG leads to a state resembling nitrogen starvation, probably due to TORC1 inactivation, increased mitochondrial function, and decreased ribosomal abundance, indicating a metabolic shift towards more energy-efficient respiration and defects in nucleolar formation.

      Weaknesses:

      While the introduction of the neutrality index seems useful to differentiate between cytotoxicity and protein burden, the biological relevance of the effects of overexpression of the model proteins is unclear.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Fujita et al. characterized the neutrality indexes of several protein mutants in S. cerevisiae and uncovered that mox-YG and Gpm1-CCmut can be expressed at levels as high as 40% of total protein without causing severe growth defects. The authors then looked at the transcriptome and proteome of cells expressing excess mox-YG to investigate how protein burden affects yeast cells. Based on RNA-seq and mass-spectrometry results, the authors uncover that cells with excess mox-YG exhibit nitrogen starvation, respiration increase, inactivated TORC1 response, and decreased ribosomal abundance. The authors further showed that the decreased ribosomal amount is likely due to nucleoli defects, which can be partially rescued by nuclear exosome mutations.

      Strengths:

      Overall, this is a well-written manuscript that provides many valuable resources for the field, including the neutrality analysis on various fluorescent proteins and glycolytic enzymes, as well as the RNA-seq and proteomics results of cells overexpressing mox-YG. Their model on how mox-YG overexpression impairs the nucleolus and thus leads to ribosomal abundance decline will also raise many interesting questions for the field.

      Weaknesses:

      The authors concluded from their RNA-seq and proteomics results that cells with excess mox-YG expression showed increased respiration and TORC1 inactivation. I think it will be more convincing if the authors can show some characterization of mitochondrial respiration/membrane potential and the TOR responses to further verify their -omic results.

      In addition, the authors only investigated how overexpression of mox-YG affects cells. It would be interesting to see whether overexpressing other non-toxic proteins causes similar effects, or if there are protein-specific effects. It would be good if the authors could at least discuss this point, considering that the workload of doing another RNA-seq or mass spectrometry analysis might be too heavy.

    4. Reviewer #3 (Public Review):

      Summary:

      Protein overexpression is widely used in experimental systems to study the function of the protein, assess its (beneficial or detrimental) effects in disease models, or challenge cellular systems involved in synthesis, folding, transport, or degradation of proteins in general. Especially at very high expression levels, protein-specific effects and general effects of a high protein load can be hard to distinguish. To overcome this issue, Fujita et al. use the previously established genetic tug-of-war system to identify proteins that can be expressed at extremely high levels in yeast cells with minimal protein-specific cytotoxicity (high 'neutrality'). They focus on two versions of the protein mox-GFP, the fluorescent version and a point mutation that is non-fluorescent (mox-YG) and is the most 'neutral' protein on their screen. They find that massive protein expression (up to 40% of the total proteome) results in a nitrogen starvation phenotype, likely inactivation of the TORC1 pathway, and defects in ribosome biogenesis in the nucleolus.

      Strengths:

      This work uses an elegant approach and succeeds in identifying proteins that can be expressed at surprisingly high levels with little cytotoxicity. Many of the changes they see have been observed before under protein burden conditions, but some are new and interesting. This work solidifies previous hypotheses about the general effects of protein overexpression and provides a set of interesting observations about the toxicity of fluorescent proteins (that is alleviated by mutations that render them non-fluorescent) and metabolic enzymes (that are less toxic when mutated into inactive versions).

      Weaknesses:

      The data are generally convincing; however, in order to back up the major claim of this work - that the observed changes are due to general protein burden and not to the specific protein or condition - a broader analysis of different conditions would be highly beneficial.

      Major points:

      (1) The authors identify several proteins with high neutrality scores but only analyze the effects of mox/mox-YG overexpression in depth. Hence, it remains unclear which molecular phenotypes they observe are general effects of protein burden or more specific effects of these specific proteins. To address this point, a proteome (and/or transcriptome) of at least a Gpm1-CCmut expressing strain should be obtained and compared to the mox-YG proteome. Ideally, this analysis should be done simultaneously on all strains to achieve a good comparability of samples, e.g. using TMT multiplexing (for a proteome) or multiplexed sequencing (for a transcriptome). If feasible, the more strains that can be included in this comparison, the more powerful this analysis will be and can be prioritized over depth of sequencing/proteome coverage.

      (2) The genetic tug-of-war system is elegant but comes at the cost of requiring specific media conditions (synthetic minimal media lacking uracil and leucine), which could be a potential confound, given that metabolic rewiring and especially nitrogen starvation are among the observed phenotypes. I wonder if some of the changes might be specific to these conditions. The authors should corroborate their findings under different conditions. Ideally, this would be done using an orthogonal expression system that does not rely on auxotrophy (e.g. using antibiotic resistance instead) and can be used in rich, complex media like YPD. Minimally, using different conditions (media with excess or more limited nitrogen source, amino acids, different carbon source, etc.) would be useful to test the robustness of the findings towards changes in media composition.

      (3) The authors suggest that the TORC1 pathway is involved in regulating some of the changes they observed. This is likely true, but it would be great if the hypothesis could be directly tested using an established TORC1 assay.

      (4) The finding that the nucleolus appears to be virtually missing in mox-YG-expressing cells (Figure 6B) is surprising and interesting. The authors suggest possible mechanisms to explain this and partially rescue the phenotype by a reduction-of-function mutation in an exosome subunit. I wonder if this is specific to the mox-YG protein or a general protein burden effect, which the experiments suggested in point 1 should address. Additionally, could a mox-YG variant with a nuclear export signal be expressed that stays exclusively in the cytosol to rule out that mox-YG itself interferes with phase separation in the nucleus?

      Minor points:

      (5) It would be great if the authors could directly compare the changes they observed at the transcriptome and proteome levels. This can help distinguish between changes that are transcriptionally regulated versus more downstream processes (like protein degradation, as proposed for ribosome components).

    1. eLife assessment

      This paper reports important findings on giant organelle complexes containing endosomes and lysosomes (termed endosomal-lysosomal organelles form assembly structures [ELYSAs]) present in mouse oocytes and 1- to 2-cell embryos. The data showing the localization and dynamics of ELYSAs during oocyte/embryo maturation are convincing. This work will be of interest to general cell biologists and developmental biologists.

    2. Reviewer #1 (Public Review):

      In this manuscript, Satouh et al. report giant organelle complexes in oocytes and early embryos. Although these structures have often been observed in oocytes and early embryos, their exact nature has not been characterized. The authors named these structures "endosomal-lysosomal organelles form assembly structures (ELYSAs)". ELYSAs contain organelles such as endosomes, lysosomes, and probably autophagic structures. ELYSAs are initially formed in the perinuclear region and then migrate to the periphery in an actin-dependent manner. When ELYSAs are disassembled after the 2-cell stage, the V-ATPase V1 subunit is recruited to make lysosomes more acidic and active. The ELYSAs are most likely the same as the "endolysosomal vesicular assemblies (ELVAs)", reported by Elvan Böke's group earlier this year (Zaffagnini et al. doi.org/10.1016/j.cell.2024.01.031). However, it is clear that Satouh et al. identified and characterized these structures independently. These two studies could be complementary. Although the nature of the present study is generally descriptive, this paper provides valuable information about these giant structures. The data are mostly convincing, and only some minor modifications are needed for clarification and further explanation to fully understand the results.

    3. Reviewer #2 (Public Review):

      Satouh et al. report the presence of spherical structures composed of endosomes, lysosomes, and autophagosomes within immature mouse oocytes. These endolysosomal compartments have been named Endosomal-LYSosomal organellar Assembly (ELYSA). ELYSAs increase in size as the oocytes undergo maturation. ELYSAs are distributed throughout the oocyte cytoplasm of GV stage immature oocytes but these structures become mostly cortical in the mature oocytes. Interestingly, they tend to avoid the region which contains metaphase II spindle and chromosomes. They show that the endolysosomal compartments in oocytes are less acidic and therefore non-degradative but their pH decreases and becomes degradative as the ELYSAs begin to disassemble in the embryos post-fertilization. This manuscript shows that lysosomal switching does not happen during oocyte development, and the formation of ELYSAs prevents lysosomes from being activated. Structures similar to these ELYSAs have been previously described in mouse oocytes (Zaffagnini et al., 2024) and these vesicular assemblies are important for sequestering protein aggregates in the oocytes but facilitate proteolysis after fertilization. The current manuscript, however, provides further details of endolysosomal disassembly post-fertilization. Specifically, the V1-subunit of V-ATPase targeting the ELYSAs increases the acidity of lysosomal compartments in the embryos. This is a well-conducted study and their model is supported by experimental evidence and data analyses.

    4. Reviewer #3 (Public Review):

      Fertilization converts a cell defined as an egg to a cell defined as an embryo. An essential component of this switch in cell fate is the degradation (autophagy) of cellular elements that serve a function in the development of the egg but could impede the development of the embryo. Here, the authors have focused on the behavior during the egg-to-embryo transition of endosomes and lysosomes, which are cytoplasmic structures that mediate autophagy. By carefully mapping and tracking the intracellular location of well-established marker proteins, the authors show that in oocytes endosomes and lysosomes aggregate into giant structures that they term Endosomal LYSosomal organellar Assembl[ies] (ELYSA). Both the size distribution of the ELYSAs and their position within the cell change during oocyte meiotic maturation and after fertilization. Notably, during maturation, there is a net actin-dependent movement towards the periphery of the oocyte. By the late 2-cell stage, the ELYSAs are beginning to disintegrate. At this stage, the endo-lysosomes become acidified, likely reflecting the activation of their function to degrade cellular components.

      This is a carefully performed and quantified study. The fluorescent images obtained using well-known markers, using both antibodies and tagged proteins, support the interpretations, and the quantification method is sophisticated and clearly explained. Notably, this type of quantification of confocal z-stack images is rarely performed and so represents a real strength of the study. It provides sound support for the conclusions regarding changes in the size and position of the ELYSAs. Another strength is the use of multiple markers, including those that indicate the activity state of the endo-lysosomes. Altogether, the manuscript provides convincing evidence for the existence of ELYSAs and also for regulated changes in their location and properties during oocyte maturation and the first few embryonic cell cycles following fertilization.

      At present, precisely how the changes in the location and properties of the ELYSAs affect the function of the endo-lysosomal system is not known. While the authors' proposal that they are stored in an inactive state is plausible, it remains speculative. Nonetheless, this study lays the foundation for future work to address this question.

      Minor point: l. 299. If I am not mistaken, there is a typo. It should read that the inhibitors of actin polymerization prevent redistribution from the cytoplasm to the cortex during maturation.
      Minor point: A few statements in the Introduction would benefit from clarification. These are noted in the comments to the authors.

    1. eLife assessment

      The study describes a valuable new technology in the field of targeted protein degradation that allows identification of E3-ubiquitin ligases that target a protein of interest. The presented data are convincing; however, it is unclear whether the proposed system can be successfully used in high-throughput applications. This technology will serve the community in the initial stages of developing targeted protein degraders.

    2. Reviewer #1 (Public Review):

      Summary:

      PROTACs are heterobifunctional molecules that utilize the Ubiquitin Proteasome System to selectively degrade target proteins within cells. Upon introduction to the cells, PROTACs capture the activity of the E3 ubiquitin ligases for ubiquitination of the targeted protein, leading to its subsequent degradation by the proteasome. The main benefit of PROTAC technology is that it expands the "druggable proteome" and provides numerous possibilities for therapeutic use. However, there are also some difficulties, including the one addressed in this manuscript: identifying suitable target-E3 ligase pairs for successful degradation. Currently, only a few out of about 600 E3 ligases are used to develop PROTAC compounds, which creates the need to identify other E3 ligases that could be used in PROTAC synthesis. Testing the efficacy of PROTAC compounds has been limited to empirical tests, leading to lengthy and often failure-prone processes. This manuscript addressed the need for faster and more reliable assays to identify the compatible pairs of E3 ligases-target proteins. The authors propose using the RiPA assay, which depends on rapamycin-induced dimerization of FKBP12 protein with FRB domain. The PROTAC technology is advancing rapidly, making this manuscript both timely and essential. The RiPA assay might be useful in identifying novel E3 ligases that could be utilized in PROTAC technology. Additionally, it could be used at the initial stages of PROTAC development, looking for the best E3 ligase for the specific target.

      The authors described an elegant assay that is scalable, easy-to-use, and applicable to a wide range of cellular models. This method allows for the quantitative validation of the degradation efficacy of a given pair of E3 ligase-target proteins, using luciferase activity as a measure. Importantly, the assay also enables the measurement of kinetics in living cells, enhancing its practicality.

      Strengths:

      (1) The authors have addressed the crucial needs that arise during PROTAC development. In the introduction, they nicely describe the advantages and disadvantages of the PROTAC technology and explain why such an assay is needed.

      (2) The study includes essential controls in experiments (important when establishing a new assay), such as using the FRB vector without E3 ligase as a negative control, testing different linkers (which may influence the efficacy of the degradation), and creating and testing K-less vectors to exclude the possibility of luciferase or FKBP12 ubiquitination instead of WDR5 (the target protein). Additionally, the position of the luc in the FKBP12 vector and the position of VHL in the FRB vector are tested. Different E3 ligases are tested using previously identified target proteins, confirming the assay's utility and accuracy.

      (3) The study identified a "new" E3 ligase that is suitable for PROTAC technology (FBXL).

      Weaknesses:

      It is not clear how feasible it would be to adapt the assay for high-throughput screens. In some experiments, the efficacy of WDR5 degradation tested by immunoblotting appears to be lower than luciferase activity (e.g., Figure 2G and H).

    3. Reviewer #2 (Public Review):

      Summary:

      Adhikari and colleagues developed a new technique, rapamycin-induced proximity assay (RiPA), to identify E3-ubiquitin (ub) ligases of a protein target, aiming at identifying additional E3 ligases that could be targeted for PROTAC generation or ligases that may degrade a protein target. The study is timely, as expanding the landscape of E3-ub ligases for developing targeted degraders is a primary direction in the field.

      Strengths:

      The study's strength lies in its practical application of the FRB:FKBP12 system. This system is used to identify E3-ub ligases that would degrade a target of interest, as evidenced by the reduction in luminescence upon the addition of rapamycin. This approach effectively mimics the potential action of a PROTAC.

      Weaknesses:

      (1) While the technique shows promise, its application in a discovery setting, particularly for high-throughput or unbiased E3-ub ligase identification, may pose challenges. The authors should provide more detailed insights into these potential difficulties to foster a more comprehensive understanding of RiPA's limitations.

      (2) While RiPA will help identify E3 ligases, PROTAC design would still be empirical. The authors should discuss this limitation. Could the technology be applied to molecular glue generation?

      (3) Controls to verify the intended mechanism of action are missing, such as using a proteasome inhibitor or VHL inhibitors/siRNA to verify on-target effects. Verification of the target E3 ligase complex after rapamycin addition via orthogonal approaches, such as IP, should be considered.

      Minor concern:

      The graphs in Figure 1E are missing.

    1. Reviewer #1 (Public Review):

      This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through β-arrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning.

      The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are.

      For the metadynamics simulations, were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.

      It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as the Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022).

      What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it is not clear what distributions are being compared.
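
      For reference, the quantity in question can be sketched as follows. This is a minimal illustration of the KL divergence between two discrete probability distributions, not the authors' analysis: the two example histograms (imagined here as binned values of some structural feature in two macrostates) are hypothetical, and the function name `kl_divergence` is an assumption made for this sketch.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(P || Q) in nats for two discrete distributions on the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # Bins with zero probability under P contribute nothing.
            continue
        if q_i == 0.0:
            # P puts mass where Q has none: the divergence is infinite.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical, normalized histograms of one feature in two macrostates.
macrostate_a = [0.5, 0.5]
macrostate_b = [0.25, 0.75]

forward = kl_divergence(macrostate_a, macrostate_b)
reverse = kl_divergence(macrostate_b, macrostate_a)
```

      The measure is asymmetric, so `forward` and `reverse` differ; which pair of distributions is compared, and in which direction, is exactly what needs to be stated.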

      I suggest being more careful with the language of universality. It can be "supported", but "showing" or "proving" that it is universal would require looking at all possible chemicals in the class.

    2. Reviewer #2 (Public Review):

      Summary:

      The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics.

      Strengths:

      The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true for many cases, especially those with skewed binding affinities. In this regard, the authors' usage of TRAM, which harnesses both biased and unbiased simulations to extract the same quantities, provides a more appropriate way out.
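
      The detailed-balance condition referred to here, pi_i * T_ij = pi_j * T_ji for every pair of states, can be illustrated with a small sketch. It is not taken from the manuscript: the two transition matrices are hypothetical toy examples, and the helper names `stationary_distribution` and `satisfies_detailed_balance` are assumptions made for illustration.

```python
def stationary_distribution(T, max_iter=10000, tol=1e-12):
    """Estimate the stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new_pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def satisfies_detailed_balance(T, pi, atol=1e-8):
    """Check pi_i * T_ij == pi_j * T_ji (within atol) for every pair of states."""
    n = len(T)
    return all(
        abs(pi[i] * T[i][j] - pi[j] * T[j][i]) <= atol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical two-state chain: probability flux is balanced in both directions.
T_reversible = [[0.9, 0.1],
                [0.2, 0.8]]

# Hypothetical three-state chain with a net circulation 0 -> 1 -> 2 -> 0.
T_cyclic = [[0.1, 0.8, 0.1],
            [0.1, 0.1, 0.8],
            [0.8, 0.1, 0.1]]

reversible_ok = satisfies_detailed_balance(
    T_reversible, stationary_distribution(T_reversible))
cyclic_ok = satisfies_detailed_balance(
    T_cyclic, stationary_distribution(T_cyclic))
```

      In this toy case the two-state chain passes the check, while the cyclic chain has a valid stationary distribution but carries a net probability current and therefore fails it.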

      Weaknesses:

      (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case.

      (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues.

      (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much shorter biased simulation times, would fare against the experimental kinetics or even the unbiased simulated kinetics of the previous report.

      (4) The Methods section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations, which were spawned from a docked structure instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures.

      (5) The last part, which uses a machine learning-based approach to analyse allosteric interactions, seems very much forced, as there are numerous more traditional distance-based analyses with precedent that do a fair job of identifying allosteric interactions.

      (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of the two ligand binding mechanisms are.

    1. eLife assessment

      This study combines extensive published and new datasets to provide a useful single-cell multi-omics analysis of early cardiac lineage segregation, highlighting the mutual regulation of key regulators for cardiac specification. While the data presentation is robust, the computational methods for delineating cardiac lineage trajectories and the functional analyses are incomplete and require further clarification and additional experiments. If validated, these findings will be of significant interest to researchers in the fields of cardiac development and congenital heart disease.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors identified and described the transcriptional trajectories leading to CMs during early mouse development, and characterized the epigenetic landscapes that underlie early mesodermal lineage specification.

      The authors identified two transcriptomic trajectories from a mesodermal population to cardiomyocytes, the MJH and PSH trajectories. These trajectories are relevant to the current model for the First Heart Field (FHF) and the Second Heart Field (SHF) differentiation. Then, the authors characterized both gene expression and enhancer activity of the MJH and PSH trajectories, using a multiomics analysis. They highlighted the role of Gata4, Hand1, Foxf1, and Tead4 in the specification of the MJH trajectory. Finally, they performed a focused analysis of the role of Hand1 and Foxf1 in the MJH trajectory, showing their mutual regulation and their requirement for cardiac lineage specification.

      Strengths:

      The authors performed an extensive transcriptional and epigenetic analysis of early cardiac lineage specification and differentiation which will be of interest to investigators in the field of cardiac development and congenital heart disease. The authors considered the impact of the loss of Hand1 and Foxf1 in-vitro and Hand1 in-vivo.

      Weaknesses:

      The authors used previously published scRNA-seq data to generate two described transcriptomic trajectories.

      (1) Details of the re-analysis step should be added, including a careful characterization of the different clusters and marker genes, more details on the WOT analysis, and details on the time stamp distribution along the different pseudotimes. These details would be important to allow readers to gain confidence that the two major trajectories identified are realistic interpretations of the input data.

      The authors have also renamed the cardiac trajectories/lineages, departing from the convention applied in hundreds of papers, making the interpretation of their results challenging.

      (2) The concept of "reverse reasoning" applied to the Waddington-OT package for directional mass transfer is not adequately explained. While the authors correctly acknowledged Waddington-OT's ability to model cell transitions from ancestors to descendants (using optimal transport theory), the justification for using a "reverse reasoning" approach is missing. Clarifying the rationale behind this strategy would be beneficial.

      (3) As the authors used the EEM cell cluster as a starting point to build the MJH trajectory, it's unclear whether this trajectory truly represents the cardiac differentiation trajectory of the FHF progenitors:

      - This strategy implies that the FHF progenitors are mixed in the same cluster as the extra-embryonic mesoderm, but no specific characterization of potential different cell populations included in this cluster was performed to confirm this.

      - The authors identified the EEM cluster as a Juxta-cardiac field, without showing the expression of the principal marker Mab21l2 per cluster and/or on UMAPs.

      - As the FHF progenitors arise earlier than the Juxta-cardiac field cells, it must be possible to identify an early FHF progenitor population (Nkx2-5+; Mab21l2-) using the time stamp. It would be more accurate to use this FHF cluster as a starting point than the EEM cluster to infer the FHF cardiac differentiation trajectory.

      These concerns call into question the overall veracity of the trajectory analysis, and in fact, the discrepancies with prior published heart field trajectories are noted but the authors fail to validate their new interpretation. Because their trajectories are followed for the remainder of the paper, many of the interpretations and claims in the paper may be misleading. For example, these trajectories are used subsequently for annotation of the multiomic data, but any errors in the initial trajectories could result in errors in multiomic annotation, etc, etc.

      (4) As mentioned in the discussion, the authors identified the MJH and PSH trajectories as non-overlapping. However, the authors did not discuss major previously published data showing that both FHF and SHF arise from a common transcriptomic progenitor state in the primitive streak (DOI: 10.1126/science.aao4174; DOI: 10.1007/s11886-022-01681-w). The authors should consider and discuss the specifics of why they obtained two completely separate trajectories from the beginning, how these observations conflict with prior published work, and what efforts they have made at validation.

      (5) Figures 1D and E are confusing, as it's unclear why the authors selected only cells at E7.0. Also, panels 1D 'Trajectory' and 'Pseudotime' suggest that the CM trajectory moves from the PSH cells to the MJH. This result is confusing, and the authors should explain this observation.

      (6) Regarding the PSH trajectory, it's unclear how the authors can obtain a full cardiac differentiation trajectory from the SHF progenitors as the SHF-derived cardiomyocytes are just starting to invade the heart tube at E8.5 (DOI: 10.7554/eLife.30668).

      The above notes some of the discrepancies between the authors' trajectory analysis and the historical cardiac development literature. Overall, these discrepancies are glossed over and not adequately validated.

      (7) The authors mention analyzing "activated/inhibited genes" from Peng et al. 2019 but didn't specify when Peng's data was collected. Is it temporally relevant to the current study? How can "later stage" pathway enrichment be interpreted in the context of early-stage gene expression?

      (8) Motif enrichment: cluster-specific DAEs were analyzed for motifs, but the authors list specific TFs rather than TF families, which is all that motif enrichment can provide. The authors should either list TF families or state clearly that the specific TFs they list were not validated beyond motifs.

      (9) The core regulatory network is purely predictive. The authors again should refrain from language implying that the TFs in the CRN have any validated role.

      Regarding the in vivo analysis of Hand1 CKO embryos, Figures 6 and 7:

      (10) How can the authors explain the presence of a heart tube in the E9.5 Hand1 CKO embryos (Figure 6B) if, following the authors' model, the FHF/Juxta-cardiac field trajectory is disrupted by Hand1 CKO? A more detailed analysis of the cardiac phenotype of Hand1 CKO embryos would help to assess this question.

      (11) The cell proportion differences observed between Ctrl and Hand1 CKO in Figure 6D need to be replicated, and an appropriate statistical analysis must be performed before a definitive conclusion can be drawn about the impact of Hand1 CKO on cell proportions.

      (12) The in-vitro cell differentiations are unlikely to recapitulate the complexity of the heart fields in-vivo, but they are analyzed and interpreted as if they do.

      (13) The schematic summary of Figure 7F is confusing and should be adjusted based on the following considerations:
      (a) the 'Wild-type' side presents 3 main trajectories (SHF, Early HT and JCF), but uses a 2-color code and the authors described only two trajectories everywhere else in the article (aka MJH and PSH). It's unclear how the SHF trajectory (blue line) can contribute to the Early HT, when the Early HT is supposed to be FHF-associated only (DOI: 10.7554/eLife.30668). As mentioned previously in Major comment 3., this model suggests a distinction between FHF and JCF trajectories, which is not investigated in the article.
      (b) the color code suggests that the MJH (FHF-related) trajectory will give rise to the right ventricle and outflow tract (green line), which is contrary to current knowledge.

      Minor comments:

      (1) How were genes selected to generate Figure 1F? Is this a list of top differentially expressed genes over each pseudotime and/or between pseudotimes?

      (2) Regarding Figure 1G, it's unclear how inhibited signaling can have an increased expression of underlying genes over pseudotimes. Can the authors give more details about this analysis and results?

      (3) How do the authors explain the visible Hand1 expression in Hand1 CKO in Figure S7C 'EEM markers'? Is this an expected expression in terms of RNA which is not converted into proteins?

      (4) The authors do not address the potential presence of doublets (merged cells) within their newly generated dataset. While they mention using "SCTransform" for normalization and artifact removal, it's unclear if doublet removal was explicitly performed.

    3. Reviewer #2 (Public Review):

      Summary of goals:

      The aims of the study were to identify new lineage trajectories for the cardiac lineages of the heart, and to use computational and cell and animal studies to identify and validate new gene regulatory mechanisms involved in these trajectories.

      Strengths:

      The study addresses the long-standing yet still not fully answered questions of what drives the earliest specification mechanisms of the heart lineages. The introduction demonstrates a good understanding of the relevant lineage trajectories that have been previously established, and the significance of the work is well described. The study takes advantage of several recently published data sets and attempts to use these in combination to uncover any new mechanisms underlying early mesoderm/cardiac specification mechanisms. A strength of the study is the use of an in vitro model system (mESCs) to assess the functional relevance of the key players identified in the computational analysis, including innovative technology such as CRISPR-guided enhancer modulations. Lastly, the study generates mesoderm-specific Hand1 LOF embryos and assesses the differentiation trajectories in these animals, which represents a strong complementary approach to the in vitro and computational analysis earlier in the paper. The manuscript is clearly written and the methods section is detailed and comprehensive.

      Comments and Weaknesses:

      Overall: The computational analysis presented here integrates a large number of published data sets with one new data point (E7.0 single cell ATAC and RNA sequencing). This represents an elegant approach to identifying new information using available data. However, the data presentation at times becomes rather confusing, and relatively strong statements and conclusions are made based on trajectory analysis or other inferred mechanisms while jumping from one data set to another. The cell and in vivo work on Hand1 and Foxf1 is an important part of the study. Some additional experiments in both of these model systems could strongly support the novel aspects that were identified by the computational studies leading into the work.

      (1) Definition of MJH and PSH trajectory:
      The study uses previously published data sets to identify two main new differentiation trajectories: the MJH and the PSH trajectory (Figure 1). A large majority of subsequent conclusions are based on in-depth analysis of these two trajectories. For this reason, the method used to identify these trajectories (WOT, which seems a highly biased analysis with many manually chosen set points) should be supported by other commonly used methods such as RNA velocity analysis. This would inspire some additional confidence that the MJH and PSH trajectories were chosen in as unbiased and rigorous a manner as possible and that any follow-up analysis is biologically relevant.

      (2) Identification of MJH and PSH trajectory progenitors:
      The study defines various mesoderm populations from the published data set (Figure 1A-E), including nascent mesoderm, mixed mesoderm, and extraembryonic mesoderm. It further assigns these mesoderm populations to the newly identified MJH/PSH trajectories. Based on the trajectory definition in Figure 1A, it appears that both trajectories include all 3 mesoderm populations, albeit at different proportions, and it thus seems challenging to assign these as unique progenitor populations for a distinct trajectory, as is done in the epigenetic study by comparing clusters 8 (MJH) and 2 (PSH) (Figure 2). Along similar lines, the epigenetic analysis of clusters 2 and 8 did not reveal any distinct differences in H3K4me1, H3K27ac, or H3K4me3 at any of the time points analyzed (Figure 2F). While conceptually very interesting, the data presented do not seem to identify any distinct temporal patterns or differences in clusters 2 and 8 (Figure 2H), and thus don't support the conclusion as stated: "the combined transcriptome and chromatin accessibility analysis further supported the early lineage segregation of MJH and the epigenetic priming at gastrulation stage for early cardiac genes".

      (3) Function of Hand1 and Foxf1 during early cardiac differentiation:
      The study incorporated some functional studies by generating Hand1 and Foxf1 KO mESCs and differentiated them into mesoderm cells for RNA sequencing. These lines would present relevant tools to assess the role of Hand1 and Foxf1 in mesoderm formation, and a number of experiments would further support the conclusions, which are based for the most part on transcriptional analysis. For example, the study would benefit from quantification of mesoderm cells and subsequent cardiomyocytes during differentiation (via IF, or more quantitatively, via flow cytometry analysis). These data would help interpret any of the findings in the bulk RNAseq data, and help to assess the function of Hand1 and Foxf1 in generating the cardiac lineages. Conclusions such as "the analysis indicated that HAND1 and FOXF1 could dually regulate MJH specification through directly activating the MJH specific genes and inhibiting PSH specific genes" seem rather strong given the data currently provided.

      (4) Analysis of Hand1 cKO embryos:
      Adding a mouse model to support the computational analysis is a strong way to conclude the study. Given the availability of these early embryos, some of the findings could be strengthened by performing a similar analysis to Figure 7B&C and by including some of the specific EEM markers found to be differentially regulated to complement the structural analysis of the embryos.

      (5) Current findings in the context of previous findings:
      The introduction carefully introduces the concept of lineage specification and different progenitor pools. Given the enormous amount of knowledge already available on Hand1 and Foxf1, and their role in specific lineages of the early heart, some of this information should be added, ideally to the discussion where it can be put into context of what the present findings add to the existing understanding of these transcription factors and their role in early cardiac specification.

    4. Reviewer #3 (Public Review):

      (1) In Figure 1A, could the authors justify using E8.5 CMs as the endpoint for the second lineage and better clarify the chamber identities of the E8.5 CMs analysed? Why are the atrial genes in Figure 1C of the PSH trajectory not present in Table S1.1, which lists pseudotime-dependent genes for the MJH/PSH trajectories from Figure 1F?

      (2) Could the authors increase the resolution of their trajectory and genomic analyses to distinguish between the FHF (Tbx5+ HCN4+) and the JCF (Mab21l2+/ Hand1+) within the MJH lineage? Also, clarify if the early extraembryonic mesoderm contributes to the FHF.

      (3) The authors strongly assume that the juxta-cardiac field (JCF), defined by Mab21l2 expression at E7.5 in the extraembryonic mesoderm, contributes to CMs. Could the authors explain the evidence for this? Could the authors identify Mab21l2 expression in the left ventricle (LV) myocardium and septum transversum at E8.5 (see Saito et al., 2013, Biol Open, 2(8): 779-788)? If such a JCF contribution to CMs exists, the extent to which it influences heart development should be clarified or discussed.

      (4) Could the authors distinguish the Hand1+ pericardium from JCF progenitors in their single-cell data and explain why they excluded other cell types, such as the endocardium/endothelium and pericardium, or even the endoderm, as endpoints of their trajectory analysis? At the NM and MM mesoderm stages, how did the authors distinguish the earliest cardiac cells from the surrounding developing mesoderm?

      (5) Could the authors contrast their trajectory analysis with those of Lescroart et al. (2018), Zhang et al., Tyser et al., and Krup et al.?

      (6) Previous studies suggest that Mesp2 expression starts at E8 in the presomitic mesoderm (Saga et al., 1997). Could the authors provide in situ hybridization or HCR staining to confirm the early E7 Mesp2 expression suggested by the pseudo-time analysis of the second lineage?

      (7) Could the authors also confirm the complementary Hand1 and Lefty2 expression patterns at E7 using HCR or in situ hybridization? Hand1 expression in the first lineage is plausible, considering lineage tracing results from Zhang et al.

      (8) Could the authors explain why Hand1 and Lefty2+ cells are more likely to be multipotent progenitors, as mentioned in the text?

      (9) Could the authors comment on the low Mesp1 expression in the mesodermal cells (MM) of the MJH trajectory at E7 (Figure 1D)? Is Mesp1 transiently expressed early in MJH progenitors and then turned off by E7? Have all FHF/JCF/SHF cells expressed Mesp1?

      (10) Could the authors clarify if their analysis at E7 comprises a mixture of embryonic stages or a precisely defined embryonic stage for both the trajectory and epigenetic analyses? How do the authors know that cells of the second lineage are readily present in the E7 mesoderm they analysed (clusters 0, 1, and 2 for the multiomic analysis)?

      (11) Could the authors further comment on the active Notch signaling observed in the first and second lineages, considering that Notch's role in the early steps of endocardial lineage commitment, but not of CMs, during gastrulation has been previously described by Lescroart et al. (2018)?

      (12) In cluster 8, Figure 2D, it seems that levels of accessibility in cluster 8 are relatively high for genes associated with endothelium/endocardium development in addition to MJH genes. Could the authors comment and/or provide further analysis?

      (13) Can the authors clarify why they state that cluster 8 DAEs are primed before the full activation of their target genes, considering that Bmp4 and Hand1 peak activities seem to coincide with their gene expression in Figure 2G?

      (14) Did the authors extend the multiomic analysis to Nanog+ epiblast cells at E7 and investigate if cardiac/mesodermal priming exists before mesodermal induction (defined by T/Mesp1 onset of expression)?

      (15) In the absence of duplicates, it is impossible to statistically compare the proportions of mesodermal cell populations in Hand1 wild-type and knockout (KO) embryos or to assess for abnormal accumulation of PS, NM, and MM cells. Could the authors analyse the proportions of cells by careful imaging of Hand1 wild-type and KO embryos instead?

      (16) Could the authors provide high-resolution images for Figure 7 B-C-D as they are currently hard to interpret?

    1. eLife assessment

      This study represents a valuable addition to the catalog of mitochondrial proteins. With the use of methodology based on the bi-genomic split-GFP technology, the authors generate convincing data, including dually localized proteins and topological information, under various growth conditions in yeast. The study represents a starting point for further functional and/or mechanistic studies on mitochondrial protein biogenesis.

    2. Reviewer #1 (Public Review):

      Summary:

      The study conducted by the Schuldiner group advances the understanding of mitochondrial biology through the utilization of their bi-genomic (BiG) split-GFP assay, which they had previously developed and reported. This research endeavors to consolidate the catalog of matrix and inner membrane mitochondrial proteins. In their approach, a genetic framework was employed wherein a GFP fragment (GFP1-10) is encoded within the mitochondrial genome. Subsequently, a collection of strains was created, with each strain expressing a distinct protein tagged with the GFP11 fragment. The reconstitution of GFP fluorescence occurs upon the import of the protein under examination into the mitochondria.

      Strengths:

      Notably, this assay was executed under six distinct conditions, facilitating the visualization of approximately 400 mitochondrial proteins. Remarkably, 50 proteins were conclusively assigned to mitochondria for the first time through this methodology. The strains developed and the extensive dataset generated in this study serve as a valuable resource for the comprehensive study of mitochondrial biology. Specifically, it provides a list of 50 "eclipsed" proteins whose role in mitochondria remains to be characterized.

      Weaknesses:

      The work could include some functional studies of at least one of the newly identified 50 proteins.

    3. Reviewer #2 (Public Review):

      The authors addressed the question of how mitochondrial proteins that are dually localized, or localized to mitochondria only as a minor fraction, can be visualized on the whole-genome scale. For this, they used an established and previously published method called BiG split-GFP, in which GFP strands 1-10 are encoded in the mitochondrial DNA, and fused the GFP11 strand C-terminally to the yeast ORFs using the C-SWAT library. The generated library was imaged under different growth and stress conditions and yielded positive mitochondrial localization for approximately 400 proteins. The strength of this method is the detection of proteins that are dually localized with only a minor fraction within mitochondria, which so far has hampered their visualization due to strong fluorescent signals from other cellular localizations. The weakness of this method is that due to the localization of the GFP1-10 in the mitochondrial matrix, only matrix proteins and IM proteins with their C-termini facing the matrix can be detected. Also, proteins that are assembled into multimeric complexes (which will be the case for probably a high number of matrix and inner membrane-localized proteins) resulting in the C-terminal GFP11 being buried are likely not detected as positive hits in this approach. Taking these limitations into consideration, the authors provide a new library that can help in the identification of eclipsed protein distribution within mitochondria, thus further increasing our knowledge of the complete mitochondrial proteome. The approach of global tagging of the yeast genome is the logical consequence after the successful establishment of the BiG split-GFP for mitochondria. The authors also propose that their approach can be applied to investigate the topology of inner membrane proteins; however, for this, the inherent issue remains that it cannot be excluded that even the small GFP11 tag can impact protein biogenesis and topology. Thus, the approach will not overcome the need to assess protein topology via biochemical approaches on endogenous untagged proteins.

    4. Reviewer #3 (Public Review):

      Summary:

      Here, Bykov et al move the bi-genomic split-GFP system they previously established to the genome-wide level in order to obtain a more comprehensive list of mitochondrial matrix and inner membrane proteins. In this very elegant split-GFP system, the longer GFP fragment, GFP1-10, is encoded in the mitochondrial genome and the shorter one, GFP11, is C-terminally attached to every protein encoded in the genome of yeast Saccharomyces cerevisiae. GFP fluorescence can therefore only be reconstituted if the C-terminus of the protein is present in the mitochondrial matrix, either as part of a soluble protein, a peripheral membrane protein, or an integral inner membrane protein. The system, combined with high-throughput fluorescence microscopy of yeast cells grown under six different conditions, enabled the authors to visualize ca. 400 mitochondrial proteins, 50 of which were not visualised before and 8 of which were not shown to be mitochondrial before. The system appears to be particularly well suited for analysis of dually localized proteins and could potentially be used to study sorting pathways of mitochondrial inner membrane proteins.

      Strengths:

      Many fluorescence-based genome-wide screens were previously performed in yeast and were central to revealing the subcellular location of a large fraction of the yeast proteome. Nonetheless, these screens also showed that tagging with full-length fluorescent proteins (FP) can affect both the function and targeting of proteins. The strength of the system used in the current manuscript is that the shorter tag is beneficial for the detection of a number of proteins whose targeting and/or function is affected by tagging with full-length FPs.

      Furthermore, the system used here can nicely detect mitochondrial pools of dually localized proteins. It is especially useful when these pools are minor and their signals are therefore easily masked by the strong signals coming from the major, nonmitochondrial pools of the proteins.

      Weaknesses:

      My only concern is that the biological significance of the screen performed appears limited. The dataset obtained is largely in agreement with several previous proteomic screens but it is, unfortunately, not more comprehensive than them, rather the opposite. For proteins that were identified inside mitochondria for the first time here or were identified in an unexpected location within the organelle, it remains unclear whether these localizations represent some minor, missorted pools of proteins or are indeed functionally important fractions and/or productive translocation intermediates. The authors also allude to several potential applications of the system but do little to explore any of these directions.

    1. eLife assessment

      This important study examines the extent to which distinct developmental pathways that result in alternative morphs correlate with transcriptome differences in a marine annelid, Streblospio benedicti. The strengths of the study include the experimental design and dense temporal sampling, which together provide convincing evidence that the two morphs can be clearly distinguished at the transcriptome level, despite relatively modest overall differences. The work will be of particular interest to students of the evolution of development.

    2. Reviewer #1 (Public Review):

      Summary:

      Overall, this study provides a meticulous comparison of developmental transcriptomes between two sub-species of the annelid Streblospio benedicti. Different lineages of S. benedicti maintain one of two genetically programmed alternative life histories, the ancestral planktotrophic or derived lecithotrophic forms of development. This contrast is also seen at the inter-species level in many marine invertebrate taxa, such as echinoderms and molluscs. The authors report relatively (surprisingly?) modest differences in transcriptomes overall, but also find some genes whose expression is essentially morph-specific (which they term "exclusive").

      Strengths:

      The study is based on dense and appropriately replicated sampling of early development. The tight clustering of each stage/morph combination in PCA space suggests the specimens were accurately categorized. The similar overall trajectories of the two morphs were surprising to me for two stages: 1) the earliest stage (16-cell), at which we might expect maternal differences due to the several-fold difference in zygote size, and 2) the latest stage (1-week), where there appears to be the most obvious morphological difference. This is why we need to do experiments!

      The examination of F1 hybrids was another major strength of the study. It also produced one of the most surprising results: though intermediate in phenotype, F1 embryos have the most distinct transcriptomes, and reveal a range of fixed, compensatory differences in the parental lines. Further, the F1 lack expression of nearly all transcripts identified as morph-specific in the pure parental lines. Since the F1 larvae present intermediate traits combining the features of both morphs, this implies that morph-specific transcripts are not actually necessary for morph-specific traits. This is interesting and somewhat counter to what one might naively expect.

      Weaknesses:

      Overall I really enjoyed this paper, and in its revised form it addresses some concerns I had in the first version. I still see a few places where it can be tightened and made more insightful.

    3. Reviewer #2 (Public Review):

      The manuscript by Harry and Zakas determined the extent to which gene expression differences contribute to developmental divergence by using a model that has two distinct developmental morphs within a single species. Although the authors did collect a valuable dataset and trends in differential expression between the two morphs of S. benedicti were presented, we found limitations about the methods, system, and resources that the authors should address.

      We have two major points:

      (1) Background information about the biological system needs to be clarified in the introduction of this manuscript. The authors stated that F1 offspring can have intermediate larval traits compared to the parents (Line 81). However, the authors collected F1 offspring at the same time as the mother in the cross. If offspring have intermediate larval traits, their developmental timeline might be different than both parents and necessitate the collection of offspring at different times to obtain the same stages as the parents. Could the authors (1) explain why they collected offspring at the same time as parents given that other literature and Line 81 state these F1 offspring develop at intermediate rates, and (2) add the F1 offspring to Figure 1 to show morphological and timeline differences in development?

      Additionally, the authors state (Lines 83-85) that they detail the full-time course of embryogenesis for both the parents and the F1 crosses. However, we do not see where the authors have reported the full-time course for embryogenesis of the F1 offspring. Providing this information would shape the remaining results of the manuscript.

      (2) We have several concerns about the S. benedicti genome and steps regarding the read mapping for RNA-seq:

      The S. benedicti genome used (Zakas et al. 2022) was generated using the PP morph. The largest scaffolds of this assembly correspond to linkage groups, showing the quality of this genome. The authors should point out in the Methods and/or Results sections that the quality of this genome means that PP-specific gene expression can be quantified well. However, the challenges and limitations of mapping LL-specific expression data to the PP genome should be discussed.

      It is possible that the authors did not find exclusive gene expression in the LL morph because they require at least one gene to be turned on in one morph as part of the data-cleaning criteria. Because the authors are comparing all genes to the PP morph, they could be missing true exclusive genes responsible for the biological differences between the two morphs. Did they make the decision to only count genes expressed in one stage of the other morph because the gene models and mapping quality led to too much noise?

      The authors state that the mapping rates between the two morphs are comparable (Supplementary Figure 1). However, there is a lot of variation in mapping the LL individuals (~20% to 43%) compared to the PP individuals. What is the level of differentiation within the two morphs in the species (pi and theta)? The statistical tests for this comparison should be added and the associated p-value should be reported. The statistical test used to compare mapping rates between the two morphs may be inappropriate. The authors used Salmon for their RNA alignment and differential expression analysis, but it is possible that a different method would be more appropriate. For example, Salmon has some limitations as compared to Kallisto as others have noted. The chosen statistical test should be explained, as well as how RNA-seq data are processed and interpreted.

      What about the read mapping rate and details for the F1 LP and PL individuals? How did the offspring map to the P genome? These details should be included in Supplementary Figure 1. Could the authors also provide information about the number of genes expressed at each stage in the F1 LP and PL samples in S Figure 2? How many genes went into the PCA? Many of these details are necessary to evaluate the F1 RNA-seq analyses.

      Generally, the authors need to report the statistics used in data processing more thoroughly. The authors need to report the statistics used to (1) process and evaluate the RNA-seq data and (2) determine the significance between the two morphs (Supplementary Figures 1 and 2).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Overall, this study provides a meticulous comparison of developmental transcriptomes between two sub-species of the annelid Streblospio benedicti. Different lineages of S. benedicti maintain one of two genetically programmed alternative life histories, the ancestral planktotrophic or derived lecithotrophic forms of development. This contrast is also seen at the inter-species level in many marine invertebrate taxa, such as echinoderms and molluscs. The authors report relatively (surprisingly?) modest differences in transcriptomes overall but also find some genes whose expression is essentially morph-specific (which they term "exclusive").

      Strengths:

      The study is based on a dense and appropriately replicated sampling of early development. The tight clustering of each stage/morph combination in PCA space suggests the specimens were accurately categorized. The similar overall trajectories of the two morphs were surprising to me for two stages: 1) the earliest stage (16-cell), at which we might expect maternal differences due to the several-fold difference in zygote size, and 2) the latest stage (1-week), where there appears to be the most obvious morphological difference. This is why we need to do experiments!

      The examination of F1 hybrids was another major strength of the study. It also produced one of the most surprising results: though intermediate in phenotype, F1 embryos have the most distinct transcriptomes, and reveal a range of fixed, compensatory differences in the parental lines.

      Weaknesses:

      Overall I really enjoyed this paper, but I see a few places where it can be tightened and made more insightful. These relate to better defining the basis for "exclusive" expression (regulation or gene presence/absence?), providing more examples of how specific genes related to trophic mode behave, and placing the study in the context of similar work in other phyla.

      As suggested, we changed the term “exclusive expression” to “morph-specific” expression throughout the paper to clarify which genes are only expressed in one morph. We also added references to similar work in other phyla, such as recent work on lecithotrophic and planktotrophic development in species of Heliocidaris sea urchins, in the 4th paragraph of the discussion. We added additional data about the F1 hybrids in the “Gene expression of Genetic Crosses” section and the new Figure 8B. We find that gene expression in F1 offspring is divided between matching the maternal and paternal gene expression patterns, with slightly more genes matching paternal expression.

      Reviewer #2 (Public Review):

      The manuscript by Harry and Zakas determined the extent to which gene expression differences contribute to developmental divergence by using a model that has two distinct developmental morphs within a single species. Although the authors did collect a valuable dataset and trends in differential expression between the two morphs of S. benedicti were presented, we found limitations about the methods, system, and resources that the authors should address.

      We have two major points:

      (1) Background information about the biological system needs to be clarified in the introduction of this manuscript. The authors stated that F1 offspring can have intermediate larval traits compared to the parents (Line 81). However, the authors collected F1 offspring at the same time as the mother in the cross. If offspring have intermediate larval traits, their developmental timeline might be different than both parents and necessitate the collection of offspring at different times to obtain the same stages as the parents. Could the authors (1) explain why they collected offspring at the same time as parents given that other literature and Line 81 state these F1 offspring develop at intermediate rates, and (2) add the F1 offspring to Figure 1 to show morphological and timeline differences in development?

      Additionally, the authors state (Lines 83-85) that they detail the full-time course of embryogenesis for both the parents and the F1 crosses. However, we do not see where the authors have reported the full-time course for embryogenesis of the F1 offspring. Providing this information would shape the remaining results of the manuscript.

      (2) We have several concerns about the S. benedicti genome and steps regarding the read mapping for RNA-seq:

      The S. benedicti genome used (Zakas et al. 2022) was generated using the PP morph. The largest scaffolds of this assembly correspond to linkage groups, showing the quality of this genome. The authors should point out in the Methods and/or Results sections that the quality of this genome means that PP-specific gene expression can be quantified well. However, the challenges and limitations of mapping LL-specific expression data to the PP genome should be discussed.

      It is possible that the authors did not find exclusive gene expression in the LL morph because they require at least one gene to be turned on in one morph as part of the data-cleaning criteria. Because the authors are comparing all genes to the PP morph, they could be missing true exclusive genes responsible for the biological differences between the two morphs. Did they make the decision to only count genes expressed in one stage of the other morph because the gene models and mapping quality led to too much noise?

      The authors state that the mapping rates between the two morphs are comparable (Supplementary Figure 1). However, there is a lot of variation in mapping the LL individuals (~20% to 43%) compared to the PP individuals. What is the level of differentiation within the two morphs in the species (pi and theta)? The statistical tests for this comparison should be added and the associated p-value should be reported. The statistical test used to compare mapping rates between the two morphs may be inappropriate. The authors used Salmon for their RNA alignment and differential expression analysis, but it is possible that a different method would be more appropriate. For example, Salmon has some limitations as compared to Kallisto as others have noted. The chosen statistical test should be explained, as well as how RNA-seq data are processed and interpreted.

      What about the read mapping rate and details for the F1 LP and PL individuals? How did the offspring map to the P genome? These details should be included in Supplementary Figure 1. Could the authors also provide information about the number of genes expressed at each stage in the F1 LP and PL samples in S Figure 2? How many genes went into the PCA? Many of these details are necessary to evaluate the F1 RNA-seq analyses.

      Generally, the authors need to report the statistics used in data processing more thoroughly. The authors need to report the statistics used to (1) process and evaluate the RNA-seq data and (2) determine the significance between the two morphs (Supplementary Figures 1 and 2).

      (1) We clarified in the methods that F1 embryos are collected at the same stage (not absolute time) as the parental types. So the “16-cell” stage is comparable across planktotrophic, lecithotrophic and F1 offspring regardless of the absolute time taken to reach that stage (which differs by ~3 hours; see Figure 1).

      Figure 2A details every time point collected for all crosses. As mentioned in the methods, we were unable to collect two timepoints for one set of crosses (LP) due to limited tissue. However, we still cover the full development time from “16 cell” through “swimming larvae” stages, which is the full larval development time.

      (2) We appreciate the reviewer's concerns regarding the mapping to the reference genome. The S. benedicti genome is a largely complete and contiguous chromosome-length genome which we have now highlighted in the manuscript. However, the reference is only for the planktotrophic morph. So it is certainly possible that there could be mapping bias for lecithotrophic reads or F1 reads, as we point out in the discussion. While some bias is certainly possible, it is unlikely to be driving major differences in the results. We performed several tests to demonstrate this:

      (1) We conducted two-sided t-tests of the mapping rates between all sample groups in our dataset (PP, LL, PL, LP) to determine if there were significant differences in mapping rates among the populations. No significant differences were found. The specific results of these statistical tests are included in the updated manuscript in Supplementary Figure 1 and are as follows:

      Author response table 1.
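
      As an illustration of the pairwise comparison described in point (1), the sketch below computes a two-sample t statistic for every pair of sample groups. It is a minimal sketch under stated assumptions, not the authors' analysis: the response does not say which t-test variant was used, so Welch's unequal-variance form is assumed here, and the mapping rates are placeholder values rather than data from the study. The names `welch_t`, `mapping_rates`, and `pairwise` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Placeholder mapping rates (fraction of reads mapped); NOT values from the study.
mapping_rates = {
    "PP": [0.40, 0.42, 0.38, 0.41],
    "LL": [0.35, 0.43, 0.20, 0.39],
    "PL": [0.37, 0.41, 0.33, 0.40],
    "LP": [0.36, 0.39, 0.31, 0.42],
}

# One comparison per pair of groups: PP-LL, PP-PL, PP-LP, LL-PL, LL-LP, PL-LP.
pairwise = {
    (g1, g2): welch_t(mapping_rates[g1], mapping_rates[g2])
    for g1, g2 in combinations(mapping_rates, 2)
}
```

      A two-sided p-value follows from comparing |t| with a Student t distribution on `df` degrees of freedom; with SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the statistic and p-value directly. With six pairwise tests, a multiple-testing correction may also be worth considering.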

      (2) In response to the comment about sequence level divergence affecting mapping rate, we estimated pi (nucleotide diversity within a population) and dxy (genomic divergence between two populations) based on the sampled transcriptomic data of our Planktotrophic and Lecithotrophic populations. We used PIXY (Korunes, K.L. and Samuk, K., 2021) with its standard settings to estimate these values, with variant call files in bcf format produced with bcftools - one for all planktotrophic samples and one for all lecithotrophic samples in our dataset. We found that across regions of the transcriptome, the difference in pi between Planktotrophs and Lecithotrophs was between 0.11% and 4.2%. Genomic divergence across the transcriptome is also relatively minor: estimates of dxy ranged from 0.0049 to 0.0076. Given that these estimates show relatively modest differences in nucleotide diversity and overall sequence divergence, we maintain that it is unlikely that they significantly impact the results described in this study. From what we have seen in the literature, these values are not outside of other population studies that are mapping to a species reference derived from one population.
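
      For readers unfamiliar with the two statistics in point (2), the following toy sketch shows what pi and dxy measure: pi averages per-site differences over all pairs of sequences within one population, while dxy averages them over all pairs drawn from two different populations. It is an illustration only, assuming complete, aligned sequences of equal length; it is not pixy's implementation, which works from variant call files and accounts for missing data. The sequences and function names are hypothetical.

```python
from itertools import combinations, product


def per_site_diff(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def nucleotide_diversity(seqs):
    """pi: mean per-site difference over all pairs within one population."""
    pairs = list(combinations(seqs, 2))
    return sum(per_site_diff(a, b) for a, b in pairs) / len(pairs)


def dxy(pop_x, pop_y):
    """dxy: mean per-site difference over all between-population pairs."""
    pairs = list(product(pop_x, pop_y))
    return sum(per_site_diff(a, b) for a, b in pairs) / len(pairs)


# Toy aligned haplotypes; NOT data from the study.
plank = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
lecith = ["ACGAACGT", "ACGAACGT", "ACGAACGA"]

pi_plank = nucleotide_diversity(plank)
pi_lecith = nucleotide_diversity(lecith)
d_between = dxy(plank, lecith)
```

      In this toy case dxy exceeds both within-population pi values because one site is fixed for different bases in the two groups, which is the kind of signal that could lower mapping rates for reads from the non-reference morph.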

      We added the mapping rates of all samples in the Supplement (SFig. 1) as requested. We added the number of genes expressed at each stage in the Supplement (SFig. 2) as requested. We have also provided further details and figures (Fig 8B) on read mapping rates and statistics used in data processing, including those for F1 RNA-seq data.

    1. eLife assessment

      This study presents valuable findings on how the endocannabinoid system is involved in endometriosis progression using CNR1 and CNR2 knockout (KO) mouse models. The evidence supporting the authors' claims is incomplete; including bulk RNA-seq, flow cytometry, and imaging mass cytometry would have strengthened the study. This work might be of interest to medical scientists working on endometriosis.

    2. Reviewer #1 (Public Review):

      Summary:

      The endocannabinoid system (ECS) components are dysregulated within the lesion microenvironment and systemic circulation of endometriosis patients. Using endometriosis mouse models and genetic loss of function approaches, Lingegowda et al. report that canonical ECS receptors, CNR1 and CNR2, are required for disease initiation, progression, and T-cell dysfunction.

      Strengths:

      The approach uses genetic approaches to establish in vivo causal relationships between dysregulated ECS and endometriosis pathogenesis. The experimental design incorporates both bulk and single-cell RNAseq approaches, as well as imaging mass spectrometry to characterize the mouse lesions. The identification of immune-related and T-cell-specific changes in the lesion microenvironment of CNR1 and CNR2 knockout (KO) mice represents a significant advance.

      Weaknesses:

      Although the mouse phenotypic analyses involve a detailed molecular characterization of the lesion microenvironment using genomic approaches, detailed measurements of lesion size/burden and histopathology would provide a better understanding of how CNR1 or CNR2 loss contributes to endometriosis initiation and progression. The cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although this aspect of the approach is recognized as a major limitation, global CNR1 and CNR2 KO may affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or lead to preexisting alterations in host or donor tissues, which could affect lesion establishment and development in the surgically induced, syngeneic mouse model of endometriosis.

    3. Reviewer #2 (Public Review):

      Summary:

      The endocannabinoid system (ECS) regulates many critical functions, including reproductive function. Recent evidence indicates that dysregulated ECS contributes to endometriosis pathophysiology and the microenvironment. Therefore, the authors further examined the dysregulated ECS and its mechanisms in endometriosis lesion establishment and progression using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. The authors presented differential gene expressions and altered pathways, especially those related to the adaptive immune response in CNR1 and CNR2 ko lesions. Interestingly, the T-cell population was dramatically reduced in the peritoneal cavity lacking CNR2, and the loss of proliferative activity of CD4+ T helper cells. Imaging mass cytometry analysis provided spatial profiling of cell populations and potential relationships among immune cells and other cell types. This study provided fundamental knowledge of the endocannabinoid system in endometriosis pathophysiology.

      Strengths:

      Dysregulated ECS and its mechanisms in endometriosis pathogenesis were assessed using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. Not only endometriotic lesions but also peritoneal exudate (and splenic) cells were analyzed to understand the specific local disease environment under the dysregulated ECS.

      Providing the results of transcriptional profiles and pathways, immune cell profiles, and spatial profiles of cell populations support altered immune cell population and their disrupted functions in endometriosis pathogenesis via dysregulation of ECS.

      L386: Role of CNR2 in T cells: Finding nearly absent CD3+ T cells in the peritoneal cavity of CNR2 ko mice is intriguing.

      Interpretation of the results is well-described in discussion.

      Weaknesses:

      The study was terminated and characterized 7 days after EM induction surgery without the details for selecting the time point to perform the experiments.

      The authors also mentioned that altered eutopic endometrium contributes to the establishment and progression of endometriosis. This reviewer agrees with lines 324-325. If so, DEGs are likely identified between eutopic endometrium (with/without endometriosis lesion induction) and ectopic lesions. It would be nice to see the data (even though using publicly available data sets).

      Figure 7 CDEF. Please add the results of the statistical analyses and analyzed sample numbers. L444-450 cannot be reviewed without them.

      This reviewer agrees with lines 498-500. In contrast, retrograded menstrual debris is not decidualized. The section could be modified to avoid misunderstanding.

      The authors addressed all my concerns. I do not have any comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The endocannabinoid system (ECS) components are dysregulated within the lesion microenvironment and systemic circulation of endometriosis patients. Using endometriosis mouse models and genetic loss of function approaches, Lingegowda et al. report that canonical ECS receptors, CNR1 and CNR2, are required for disease initiation, progression, and T-cell dysfunction.

      Strengths:

      The approach uses genetic approaches to establish in vivo causal relationships between dysregulated ECS and endometriosis pathogenesis. The experimental design incorporates both bulk and single-cell RNAseq approaches, as well as imaging mass spectrometry to characterize the mouse lesions. The identification of immune-related and T-cell-specific changes in the lesion microenvironment of CNR1 and CNR2 knockout (KO) mice represents a significant advance.

      Weaknesses:

      Although the mouse phenotypic analyses involve a detailed molecular characterization of the lesion microenvironment using genomic approaches, detailed measurements of lesion size/burden and histopathology would provide a better understanding of how CNR1 or CNR2 loss contributes to endometriosis initiation and progression. The cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although this aspect of the approach is recognized as a major limitation, global CNR1 and CNR2 KO may affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or lead to preexisting alterations in host or donor tissues, which could affect lesion establishment and development in the surgically induced, syngeneic mouse model of endometriosis.

      We appreciate the reviewer's thoughtful and constructive feedback. We agree that the additional measurements of lesion size/burden and histopathology would provide valuable insights into the specific contributions of CNR1 and CNR2 to endometriosis progression. However, the focus of this study was on assessing the alterations in the complex immune microenvironment due to the absence of CNR1 and CNR2, given their close relation in regulating immune cell populations. We plan to incorporate these measurements in future studies to further strengthen the understanding of the disease pathogenesis. Regarding the potential effects of global knockout, the reviewer raises a valid concern. To address this, we will explore cell and/or tissue-specific knockout models in future experiments to better isolate the direct effects of CNR1 and CNR2 on the disease process, while minimizing potential confounding factors from systemic alterations.

      Reviewer #2 (Public Review):

      Summary:

      The endocannabinoid system (ECS) regulates many critical functions, including reproductive function. Recent evidence indicates that dysregulated ECS contributes to endometriosis pathophysiology and the microenvironment. Therefore, the authors further examined the dysregulated ECS and its mechanisms in endometriosis lesion establishment and progression using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. The authors presented differential gene expressions and altered pathways, especially those related to the adaptive immune response in CNR1 and CNR2 ko lesions. Interestingly, the T-cell population was dramatically reduced in the peritoneal cavity lacking CNR2, and the loss of proliferative activity of CD4+ T helper cells. Imaging mass cytometry analysis provided spatial profiling of cell populations and potential relationships among immune cells and other cell types. This study provided fundamental knowledge of the endocannabinoid system in endometriosis pathophysiology.

      Strengths:

      Dysregulated ECS and its mechanisms in endometriosis pathogenesis were assessed using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. Not only endometriotic lesions, but also peritoneal exudate (and splenic) cells were analyzed to understand the specific local disease environment under the dysregulated ECS.

      The transcriptional profiles and pathways, immune cell profiles, and spatial profiles of cell populations support altered immune cell populations and their disrupted functions in endometriosis pathogenesis via dysregulation of ECS.

      In line 386: Role of CNR2 in T cells. The finding that CD3+ T cells are nearly absent in the peritoneal cavity of CNR2 ko mice is intriguing.

      The interpretation of the results is well-described in the Discussion.

      Weaknesses:

      The study was terminated and characterized 7 days after EM induction surgery without the details for selecting the time point to perform the experiments.

      The authors also mentioned that altered eutopic endometrium contributes to the establishment and progression of endometriosis. This reviewer agrees with lines 324-325. If so, DEGs are likely identified between eutopic endometrium (with/without endometriosis lesion induction) and ectopic lesions. It would be nice to see the data (even though using publicly available data sets).

      Figure 7 CDEF. The results of the statistical analyses and analyzed sample numbers should be added. Lines 444-450 cannot be reviewed without them.

      This reviewer agrees with lines 498-500. In contrast, retrograded menstrual debris is not decidualized. The section could be modified to avoid misunderstanding.

      We would like to thank the reviewer for insightful comments, suggestions and acknowledging the importance of the work presented in this manuscript.

      Regarding the 7-day time point, we provided a rationale in lines 479-481 but agree that it is not sufficient; we have therefore added details on the selection of the 7-day time point for the experiments in the methods section (Mouse model of EM). We have also noted the suggestion to compare differentially expressed genes in the eutopic endometrium vs ectopic lesions. Since there are publications comparing the eutopic vs ectopic gene expression patterns (PMIDs: 33868805 and 18818281), including a study exploring the ECS genes in the endometrium throughout different menstrual cycles (PMID: 35672435), we believe additional analysis using the same dataset may not yield new information. However, we see the value in the reviewer’s comment, and we will look at the gene expression patterns in the uterine vs endometriosis-like lesions in our future studies with tissue or cell-specific CNR1 and CNR2 knockout models to understand the functional relevance of ECS in endometriosis initiation.

      Since the IMC study was exploratory for proof of concept, we did not have enough biological replicates for meaningful statistical validation (n = 2-3). We have clarified this information in the methods, results, and figure legends for appropriately representing the limitations of the current setup.

      Finally, we appreciate the feedback on the section discussing retrograded menstrual debris. Even though the menstrual debris may not be decidualized, some endometriotic lesions have the ability to decidualize based on their response to estrogen and progesterone in a cycling manner (PMID: 26450609), similar to the endometrium in the uterine cavity. We have clarified this in the revised MS.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      The mechanism of how alterations in ECS contribute to the observed cellular and molecular changes is unclear. Connecting CNR1 or CNR2 function to a specific cell type or cellular process would provide a more detailed understanding of how dysregulated ECS contributes to endometriosis pathogenesis.

      We agree that integrating the functions of CNR1 or CNR2 to specific cell types or cellular processes would strengthen the mechanistic insights presented in our study. This would help elucidate specific pathways by which dysregulated ECS leads to the alterations in immune cell populations, gene expression profiles, and other key aspects of endometriosis development and progression. This is a rapidly evolving field and at this stage, we do not have published information to reflect on this aspect in the revised manuscript.

      (1) As mentioned in the text, the ECS components being studied are widely expressed and may affect multiple aspects of endometriosis pathogenesis and symptomatology. However, the cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although these limitations are mentioned in the discussion, it is important to know if global CNR1 and CNR2 KO affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or if preexisting alterations in host or donor tissues affect lesion development in the surgically induced, syngeneic mouse model of endometriosis. This would also be the case in studies on immune system dysfunction or lesion microenvironment, as it is possible preexisting immune system dysfunction following CNR1 or CNR2 loss could alter the disease trajectory and lead to a misinterpretation of the findings. Some of these potential confounders could be addressed using crossover approaches in Figure 1A experimental design, but the donor tissues are reported to be matched to the recipients based on genotype.

      The reviewer raised an excellent point that the widespread expression of the ECS components studied in our manuscript may affect multiple aspects of endometriosis pathogenesis and symptomatology. Indeed, the cell or tissue-specific effects of CNR1 and CNR2 knockout are not fully incorporated into our experimental design, which could lead to potential confounding factors that may affect the interpretation of some of our findings. However, as outlined in our previous comments, we will incorporate the tissue/cell-specific knockout, as well as the crossover approaches, in future studies to elucidate whether the loss of CNR1 and CNR2 function is lesion driven. We agree that it is important to understand the impact of global CNR1 and CNR2 knockout on normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, and other potential preexisting alterations in the host or donor tissues that could influence lesion development in the syngeneic mouse model of endometriosis. As outlined in the MS (lines 59-62), there are studies highlighting pregnancy-specific impacts, including on implantation and impaired primary decidual zone formation. We did not find any baseline alterations in the systemic immune profiles between the CNR1 and CNR2 knockout mice and the WT mice without EM induction. However, the uterine environment has not been assessed to understand the baseline immune profile between the knockout mice and WT mice. We agree with the reviewer that preexisting immune system dysfunction following CNR1 or CNR2 loss could alter the disease trajectory related to immune system dysfunction or lesion microenvironment. We have highlighted this in the limitations section.

      (2) The phenotypic characterization of the endometriosis mouse model with or without CNR1 or CNR2 KO is very limited. To better understand how the observed cellular and molecular alterations correlate with endometriosis pathogenesis and severity in CNR1 and CNR2 K/O mice, a detailed characterization of lesion size differences and histopathology should be made. Importantly, the histopathological characterization of the lesions would complement the imaging mass spectrometry findings.

      We agree that more detailed characterization of the endometriosis lesions in our CNR1 and CNR2 knockout mouse models is required. As is evident from several of our previous publications, we have focused on detailed histopathological characterization of endometriotic lesions in our syngeneic mouse model of endometriosis, including a multiple time course study (Symons et al, 2020, FASEB). In the present investigation, we focused on cataloging spatial and transcriptomic changes, as we do not currently have any information on the global influence of CNR1 and CNR2 knockout on the endometriosis lesion microenvironment. Because we prioritized this aspect, we were not able to provide a detailed histological assessment of lesions. However, the IMC analysis provides a detailed, spatially resolved profile of the cellular composition and interactions within the endometriotic lesions, which we believe offers valuable insights into the mechanisms by which the dysregulated ECS may contribute to endometriosis pathogenesis. This quantitative, high-dimensional approach complements the transcriptional profiling and other analyses we have performed.

      (3) Given the effect sizes and variance observed with the ECS ligand measurements, an N = 4-5 biological samples for mouse phenotypic studies seems too low.

      The reviewer raises a valid point about the low sample size. As elaborated earlier, this was a proof-of-principle study to capture biologically significant alterations within the lesion and surrounding peritoneal microenvironment in the absence of the CNR1 and CNR2 receptors. This information is crucial for establishing the potential mechanisms by which the dysregulated ECS may contribute to the pathogenesis of endometriosis. Now that we have established the framework and baseline understanding of immune-inflammatory alterations, we will refine our future experimental approaches and include more samples if it becomes necessary.

      Reviewer #2 (Recommendations For The Authors):

      It is hard to read the labeling of figures. Please increase the font size of each figure.

      We have increased the font size of the labels where necessary to improve the readability.

      Supplementary Data 1, Table 1 seems like Supplementary Table 1. Please use the same labeling of the Supplementary tables and figures to avoid confusion.

      We have updated the labeling accordingly and ensured that all supplementary tables and figures are consistently labeled.

      This reviewer suggests depositing RNA-seq and IMC data to NCBI etc. and listing the accession number in the MS.

      Thank you for your recommendation to deposit the RNA-seq and imaging mass cytometry (IMC) data from our study in public repositories such as NCBI. We appreciate your suggestion, as data sharing is an important aspect of scientific transparency and reproducibility. Bulk mRNA sequencing data has been attached as a supplementary file and IMC data has been deposited on Mendeley Data (DOI: 10.17632/2ptns5yhzh.1).

      Please clarify L363.

      We have clarified this in the revised MS. The revised text now reads: “However, we did not find the same differences (T cell-related genes) in the UnD lesions of CNR2 k/o mice. Moreover, UnD lesions of CNR2 k/o mice showed significantly low number of DEGs (11 compared to 65 in the DD lesions from CNR2 k/o mice) suggesting a decidualization dependent response (Supplementary Data 3).”

      Figure 7B: It is hard to see/understand the results in L438-440. It might be helpful if % is added to the figure.

      We have added more tick marks to the y-axis of Figure 7B to make it easier for the reader to interpret the percentages of the different cell types.

      Figure 7 legend: 2nd D should be G.

      We have revised the legend accordingly.

      Supplementary Figure 6: It seems immune cells are clustered in CN1, which is different from Figure 7. To easily understand Suppl Fig 6AB, please add some details in the legend.

      We have revised the legend as suggested.

      The revised legend now reads: “A, B Representative image of 8 distinct cell types from CN analysis of DD and UnD lesions from WT, CNR1 k/o, and CNR2 k/o mice, respectively. C Heatmap representation of CN analysis shows distinct clustering patterns observed in the UnD lesions among the different genotypes. The clustering reveals distinct spatial patterns of immune cell populations within the UnD lesions, which appear to differ from the observations in Figure 7G. This suggests potential spatial heterogeneity in the immune landscape of EM like lesions under conditions of decidualization.”

    1. eLife assessment

      This valuable study details an aspect of plant immunity where ATG6 was not previously known to have a role. The results suggest a direct relationship between ATG6 and NPR1, a well-studied salicylic acid receptor protein, which could be of interest to researchers studying the regulation of plant immunity. While the data presented are compelling, there are concerns about the interpretation of results, particularly regarding discrepancies in fluorescence and protein blot data. Addressing these issues would improve the overall impact of the work and consistency with prior studies.

    2. Reviewer #1 (Public Review):

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1.

      The experiments are carefully designed and the data is convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      The authors have addressed most of my previous concerns.

    3. Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on revised version:

      The authors demonstrate the correlation between overexpression of atg6 and higher stability and activity of npr1. They claim a novel activity of atg6 in the nucleus.

      Overall, the experimental scope of the study is solid; however, the over-interpretation of the results substantially reduces the significance and value of this study for the target plant immunity readership.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study reports on a previously unrecognized function of ATG6 in plant immunity. The work is valuable because it proposes a direct interaction between ATG6 and a well-studied salicylic acid receptor protein, NPR1, which may interest researchers investigating plant immunity regulation. While the data presented are compelling, more information regarding the specificity of ATG6's role would improve the overall impact of the study, especially with an eye towards consistency with prior work.

      We also genuinely thank the editor and reviewers for the constructive and helpful suggestions and comments. These comments have greatly improved the quality and thoroughness of our manuscript. We have carefully studied these comments and have made the appropriate changes as far as possible. Additionally, some minor errors were also corrected during the revision process. New text is shown in blue in the revised manuscript. Our responses to the reviewer's comments are provided below each respective comment.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1.

      Strengths:

      The experiments are carefully designed and the data is convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      Thank you very much for recognizing our research.

      Weaknesses:

      - The authors can do a few additional experiments to test the role of ATG6 in plant immunity. I recommend that the authors test the interaction between ATGs and other NPR1 homologs (such as NPR2).

      Thank you for your valuable feedback. The Arabidopsis NPR family comprises six members: NPR1, NPR2, NPR3, NPR4, NPR5/PETIOLE 1 (BOP1), and NPR6/BOP2. NPR3/4 function in tandem as negative regulators to modulate SA signaling and plant immune responses (Ding et al., 2018). Similar to NPR1, NPR2 acts as a positive regulator of SA signaling (Castello et al., 2018). NPR5/BOP1 and NPR6/BOP2 primarily participate in the regulation of plant growth and development (McKim et al., 2008). This study specifically investigates the relationship between ATG6 and NPRs in plant resistance to pathogenic bacteria. Consequently, we experimentally confirmed the interaction between ATG6 and NPR1, NPR3, and NPR4 (Fig. 1 and Fig. S1 in the revised manuscript). It would be intriguing to further explore the interactions between ATG6 and other NPRs in the context of regulating plant growth and development in future research.

      -The concentration of SA used in the experiment (0.5-1 mM) seems pretty high. Does a lower concentration of SA induce ATG6 accumulation in the nucleus?

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). Consequently, to investigate the function of NPR1, many research groups typically employ higher concentrations of SA (e.g., 0.5 mM, 1 mM, or even 5 mM) (Spoel et al., 2009; Fu et al., 2012; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a). In our study, we observed an interaction between ATG6 and NPR1. To enhance the detection of the NPR1 protein, we standardized the SA concentration used in our experiments (Arabidopsis was treated with 0.5 mM SA; tobacco was treated with 1 mM SA). Subsequently, we analyzed the nuclear accumulation of ATG6 or NPR1 at these same, relatively high SA concentrations, consistent with those used in previous studies (Spoel et al., 2009; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a).

      -Does the silencing of ATG6 affect the cell death (or HR) triggered by AvrRPS4?

      Thank you for pointing this out. In this study, we examined changes in Pst DC3000/avrRps4-induced cell death in Col, amiRNAATG6 # 1, amiRNAATG6 # 2, npr1, NPR1-GFP, ATG6-mCherry and ATG6-mCherry × NPR1-GFP plants. The results of trypan blue staining showed that Pst DC3000/avrRps4-induced cell death in npr1, amiRNAATG6 # 1 and amiRNAATG6 # 2 was significantly higher compared to Col (Fig. S15 in the revised manuscript). Conversely, Pst DC3000/avrRps4-induced cell death in ATG6-mCherry, NPR1-GFP and ATG6-mCherry × NPR1-GFP was significantly lower compared to Col. Notably, Pst DC3000/avrRps4-induced cell death in ATG6-mCherry × NPR1-GFP was significantly lower compared to ATG6-mCherry and NPR1-GFP (Fig. S15 in the revised manuscript). These results suggest that ATG6 and NPR1 cooperatively inhibit Pst DC3000/avrRps4-induced cell death. The relevant description can be found in lines 394-404 of the revised manuscript.

      -SA and NPR1 are also required for immunity and are activated by other NLRs (such as RPS2 and RPM1). Is ATG6 also involved in immunity activated by these NLRs?

      Thank you for your valuable comments. The most notable event in the NLR-mediated ETI immune response is the induction of hypersensitive response-programmed cell death (HR-PCD) (Jones and Dangl, 2006; Yuan et al., 2021). SA plays a dual role in the ETI response. On one hand, the accumulation of SA during the R gene-mediated ETI defense response is directly linked to the onset of HR-PCD (Nawrath and Metraux, 1999). SA and NPR1 can enhance the ETI response by regulating the expression of downstream target genes (Falk et al., 1999; Feys et al., 2001; Ding et al., 2018; Liu et al., 2020). On the other hand, the activation of SA signaling can have a negative regulatory effect on HR-PCD during the ETI response. High levels of SA have been shown to significantly inhibit HR-PCD triggered by the avrRpt2 effector (Rate and Greenberg, 2001; Devadas and Raina, 2002; Jurkowski et al., 2004). Rate et al. discovered that the inhibition of HR-PCD by SA relies on NPR1 (Rate and Greenberg, 2001).

      Arabidopsis AtATG6 or its homologs in other species (such as NbBECLIN1, TaATG6s, etc.) have been identified as positive regulators in plant immunity, playing a crucial role in inhibiting cell death and preventing invasion by pathogenic microorganisms (Liu et al., 2005; Patel and Dinesh-Kumar, 2008; Yue et al., 2015). Patel et al. demonstrated that, akin to autophagy-deficient mutants previously documented, AtATG6 antisense (AtATG6-AS) plants treated with Pst DC3000/avrRpm1 exhibited diffuse cell death, indicating the necessity of ATG6 in restricting cell death (Patel and Dinesh-Kumar, 2008). In tobacco, deficiencies in BECLIN 1 result in the onset of diffuse HR-PCD, underscoring the essential role of BECLIN 1 in limiting HR-PCD (Liu et al., 2005). Despite the genetic evidence supporting the critical function of ATG6 in plant immunity, the precise molecular mechanisms through which ATG6 impedes the invasion of pathogenic microorganisms remain elusive.

      In our study, we uncovered that ATG6 interacts with NPR1 to hinder pathogen invasion and inhibit the initiation of cell death. In animals, members of the NLR family have been observed to interact with the autophagy-related protein LC3 to inhibit the survival of pathogens (Zhang et al., 2019). Similar mechanisms may exist in plants. However, whether NLRs directly activate ATG6 through interaction, and how NPR1-ATG6 interactions relate to NLR-mediated plant immunity, remain to be explored.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      However, the overall conclusions of the study are not well supported experimentally. The significance of the findings is low because of their mostly correlational nature, and lack of consistency with earlier reports on the same protein.

      Thank you for your valuable and constructive suggestions. In this article, we unveil a novel relationship in which ATG6 positively regulates NPR1 in plant immunity (Fig. 8 in the revised manuscript). ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and formation of SINCs-like condensates. This may be of interest to researchers studying the regulation of plant immunity. While there may be minor flaws in our current study, the significance of these findings cannot be overstated, as they have the potential to redirect scientific attention towards uncovering novel functions for autophagy genes.

      Based on the integrity and quality of the data as well as the depth of analysis, it is not yet clear if ATG6 is a specific regulator of NPR1 or if it is affecting NPR1's stability indirectly, through inducing an elevation of SA levels in plants. As such, the current study demonstrates a correlation between overexpression of ATG6, SA accumulation, and NPR1 stability, however, whether and how these components work together is not yet demonstrated.

      Thank you for your valuable feedback. Although, as the reviewer notes, our current data may have some limitations, scientific research is an ongoing process, and we are confident that future studies will address them. At a minimum, the present results report a previously undiscovered function of ATG6 in plant immunity. We propose a direct interaction between ATG6 and a well-studied salicylic acid receptor protein, NPR1. We unveil a novel relationship in which ATG6 positively regulates NPR1 in plant immunity (Fig. 8 in the revised manuscript). ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and formation of SINCs-like condensates. This may be of interest to researchers studying the regulation of plant immunity.

      Based on the provided biochemical data, it is not yet clear if the ATG6 functions specifically through NPR1 or through its paralogs NPR3 and NPR4, which are negative regulators of immunity. It is quite possible that interaction with NPR1 (or any NPR) is not the major regulatory step in the activity of ATG6 in plant immunity. The effect of ATG6 on NPR1 could well be indirect, through a change in the SA level and redox environment of the cell during the immune response. Both SA level and redox state of the cell were reported to induce accumulation of NPR1 in the nucleus and increase in stability.

      Thank you for your valuable feedback. In this study, we validated the interaction between ATG6 and NPR1 through various approaches and identified the key regions mediating their interaction. Our findings indicate that ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and the formation of SINC-like condensates. These results clearly demonstrate the involvement of ATG6 in the regulation of NPR1. Furthermore, we also found that ATG6 interacts with NPR3/4 (Fig. S1 in the revised manuscript). This is particularly relevant given that NPR3 and NPR4 have been shown to act as adaptors for the ubiquitin E3 ligase Cullin 3 (CUL3) to regulate the degradation of NPR1. Therefore, whether ATG6 regulates NPR1 through its interactions with NPR3/4 is an intriguing question worth exploring in future studies. We appreciate the reviewer's concerns and are committed to addressing them in our future research to further elucidate the complex regulatory mechanisms involving ATG6, NPR1, and other key players in plant immunity.

      Another major issue is the poor quality of the subcellular analyses. In contradiction to previous studies, ATG6 in this study is not localized to autophagosome puncta, which suggests that the soluble localization pattern presented here does not reflect the true localization of ATG6. Even if the authors propose a novel, non-canonical nuclear localization for ATG6, they still should have detected the canonical autophagy-like localization of this protein.

      Thank you for your valuable feedback. We ran predictions with NLS Mapper (https://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi) and identified two bipartite NLSs in ATG6, with the sequences "MRKEEIPDKSRTIPIDPNLPKWVCQNCHHS" and "DPNLPKWVCQNCHHS LTIVGVDSYAGKFFNDP". To further elucidate the nuclear localization of ATG6, we introduced Agrobacterium tumefaciens carrying ATG6-GFP into nls-mCherry tobacco leaves through transient transformation. Subsequently, we observed the localization of ATG6-GFP, along with the canonical autophagy-like patterns. Our findings revealed fluorescence signals of ATG6-GFP in both the cytoplasm and nuclei (Figure 2b). The nuclear-localized ATG6-GFP overlapped with the nuclear-localized marker, nls-mCherry (indicated by white arrows). Additionally, we observed punctate patterns indicative of canonical autophagy-like localization of ATG6-GFP fluorescence signals (indicated by red circles). Based on these results, we are more confident about the authenticity of ATG6's nuclear localization. The revised manuscript includes clearer images to support our observations.

      Recommendations for the Authors:

      Reviewer #2 (Recommendations For The Authors):

      The duration and concentration of SA treatments are quite variable between experiments which makes comparisons difficult.

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). Consequently, to investigate the function of NPR1, many scientists and research groups typically employ higher concentrations of SA (e.g., 0.5 mM, 1 mM, or even 5 mM) to elucidate its role (Spoel et al., 2009; Fu et al., 2012; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a). In our study, we observed an interaction between ATG6 and NPR1. To enhance the detection of the NPR1 protein, we standardized the SA concentration used in our experiments. In this study, for the treatment of Arabidopsis, we followed the protocols outlined in Saleh et al. and Spoel et al., utilizing 0.5 mM SA (Spoel et al., 2009; Saleh et al., 2015). For tobacco treatment, we adopted the methodology described in the study by Zavaliev et al., administering 1 mM SA (Zavaliev et al., 2020).

      The methods section does not explain some of the essential experimental conditions and reagents used in the study.

      Thank you for pointing this out. Due to word limitations we have placed the detailed experimental methods and reagents in Supplemental Data 1. In Supplemental Data 1, we provide a comprehensive overview of the experimental flow and conditions employed in our study.

      Lines 62-63: the C-terminal domain of all NPRs has a name (already defined as SA-binding domain (SBD)). Also, it would be worth referring to the structure of NPR1 (Kumar et al 2022, Nat) as the source of information about its domains.

      Thank you for pointing this out, we have changed this description in the revised manuscript (lines 62-63).

      Lines 66-69: NPR1 doesn't form monomers. A recent study showed that the basic functional unit of NPR1 is a dimer (Kumar et al 2022, Nat).

      Thank you for pointing this out. In the revised manuscript (line 67), “monomers” has been changed to “dimer”.

      Lines 89-95 and elsewhere: the term "invasion" has a very specific meaning and it doesn't necessarily refer to disease. A pathogen can invade the plant but cause no disease (e.g. ETI). Most plant genetic immune mechanisms act after pathogen invasion, not before it. Those cited works reported the disease resistance, not the invasion resistance.

      Thank you for pointing this out. We've changed the incorrect description in the revised manuscript (line 91).

      Lines 113-119: the truncation at the aa328 includes half of the ANK domain (repeats 1 and 2), not just BTB. The C-terminal truncation variant contains the other half (repeats 3 and 4) of the ANK domain, not the entire ANK domain. It also contains the SBD, not just the NLS. So, this kind of analysis cannot determine the role of ANK domain in the interaction, nor it can conclusively determine if the interaction is through SBD. The interaction should be tested with the SBD domain only in order to make this conclusion.

      Thank you for pointing this out, we have removed the inappropriate description and made the appropriate changes in the revised manuscript (lines 114 and 115).

      In Figure S1, the equally strong interaction of atg6 is found for NPR3/NPR4. Does that mean that atg6 functions also through these other NPRs? What's the significance of these data compared to NPR1-ATG6 interaction? This is especially important, because both NPR3 and NPR4 are predominantly nuclear proteins, and they are unlikely to significantly overlap with autophagy components in the cytoplasm.

      NPR1 and its paralogues NPR3/NPR4 frequently interact with other proteins to regulate plant immune responses (Backer et al., 2019; Chen et al., 2019). To identify ATGs that interact with NPRs, we performed yeast two-hybrid (Y2H) screens using NPRs as bait. Interestingly, ATG6 interacted with each of NPR1, NPR3 and NPR4, and different concentrations of SA treatment did not significantly affect their interaction (Fig. S1a). NPR1 is an important positive regulator of the plant immune response (Chen et al., 2021b). In Arabidopsis and N. benthamiana, ATG6 or its homologues were reported to act as positive regulators that enhance plant disease resistance to P. syringae pv. tomato (Pst) DC3000 and Pst DC3000/avrRpm1 bacteria (Patel and Dinesh-Kumar, 2008) and to tobacco mosaic virus (TMV) (Liu et al., 2005). Therefore, in this study we focused on investigating the biological significance of the interaction between ATG6 and NPR1. Whether the interaction between ATG6 and NPR3/4 also has an effect on plant immunity is a question that remains to be explored in future studies.

      In Figure 1c and elsewhere: why not use the anti-mCherry antibody to detect atg6-mcherry? Are we seeing the correct protein band of atg6-mcherry? Also, it is not clear what antibodies they used throughout the study: the sources and specificities of antibodies are not provided.

      Thank you for pointing this out. We initially synthesized the ATG6 antibody (anti-ATG6, 1:200, peptide, C-KEKKKIEEEERK, Abmart) in order to detect the endogenous ATG6 protein, and we also tested the specificity and potency of the ATG6 antibody (results are shown in Fig. S17). Additionally, in order to determine the location of the ATG6-mCherry bands, we also detected ATG6-mCherry in ATG6-mCherry Arabidopsis using the ATG6 antibody, with Col as a control (results are shown in Fig. S4). These results show that our synthesized ATG6 antibody effectively and clearly recognizes both ATG6 and ATG6-mCherry. Therefore, in this study, we used the ATG6 antibody to analyze both ATG6-mCherry and endogenous ATG6. Detailed antibody information is presented in Supplementary Data 1, table S4.

      In Figures 1d, 2a, and 2b, the subcellular localization pattern of atg6 contradicts what was published before (Fujiki et al 2007, Plant Phys; Liu et al 2018, FPlS; Xu et al 2017, Autophagy; Li et al 2018, Nat. Comm.). As an autophagy protein, atg6 was shown to localize to cytoplasmic puncta (autophagosomes), like atg8. No nuclear localization was found in those studies. The lack of puncta and the strong nuclear accumulation are signs that the localization of atg6 reported here has to be interpreted with caution. With the data provided, I am not convinced yet that we are looking at the correct ATG6 subcellular localization. Even if the authors propose a novel, non-canonical localization for atg6, they still should have detected the canonical autophagy-like localization of this protein.

      Thank you for your valuable feedback. To further elucidate the nuclear localization of ATG6, we introduced Agrobacterium tumefaciens carrying ATG6-GFP into nls-mCherry tobacco leaves through transient transformation. Subsequently, we observed the localization of ATG6-GFP, along with the canonical autophagy-like patterns. Our findings revealed fluorescence signals of ATG6-GFP in both the cytoplasm and nuclei (Figure 2b). The nuclear-localized ATG6-GFP overlapped with the nuclear-localized marker, nls-mCherry (indicated by white arrows). Additionally, we observed punctate patterns indicative of canonical autophagy-like localization of ATG6-GFP fluorescence signals (indicated by red circles). Based on these results, we are more confident about the authenticity of ATG6's nuclear localization. The revised manuscript includes clearer images to support our observations.

      It would make more sense to include the BiFC data (fig. S2) in the main figure, instead of the co-localization (fig. 1d) which cannot serve as evidence for interaction.

      Thank you for the feedback. We accept your suggestion. In Fig.1, we have replaced the co-localization image with a BiFC (Bimolecular Fluorescence Complementation) image to better illustrate the interaction.

      In Figure S2, the bifc signals have to be quantified to qualify as evidence for interaction. also, a subcellular marker has to be used (e.g. nuclear mcherry). From the current poor-quality images, one cannot determine where in the cell the presumed interaction takes place, nucleus or cytoplasm, or both. Also, no puncta are seen in these images.

      Thank you for pointing this out. Despite the lack of clarity in the images we provided, our BiFC results unequivocally demonstrate the interaction between ATG6 and NPR1 in both the cytoplasm and nucleus. Notably, as the reviewer pointed out, punctate signals were not observed in our images. This lack of punctate signals is consistent with previous studies (Figure 2) that have also shown BiFC results between autophagy-associated proteins ATG8s and their interacting partners. For instance, Fig 1G (Marshall et al. 2019, Cell), Fig 2F (Marshall et al. 2019, Cell), Fig 4B (Macharia et al. 2019, BMC Plant Biology), and Fig 3 (Zhou et al. 2018, Autophagy) all did not exhibit punctate signals, aligning closely with our findings.

      In Figure S3a, the nuclear localization is shown for stomata. It is known that stomata are especially strong expressors of the transgenes, and localization there could be an artefact of overaccumulation of the fusion protein. Also, why do they present the localization of atg6-gfp, if the analysis and the cross were made with atg6-mcherry?

      Thank you for pointing this out. In our previous experiments, we observed the localization of ATG6 in the nucleus of Arabidopsis thaliana plants overexpressing ATG6-GFP (Fig. S3a). To clearly visualize the location of the nucleus, we used the cytosolic DAPI dye, which readily stained the nuclei of the stomatal guard cells. This allowed us to easily identify the nuclear regions for our observations. Additionally, in Fig. 2a and Fig.S3b, we detected the fluorescence signal of ATG6-mCherry within the nucleus, further confirming the nuclear localization of ATG6. Moreover, the nuclear and cytoplasmic fractions were separated. Under SA treatment, ATG6-mCherry and ATG6-GFP were detected in the cytoplasmic and nuclear fractions in N. benthamiana (Fig. 2c and d). Similarly, ATG6 was also detected in the nuclear fraction of UBQ10::ATG6-GFP and UBQ10::ATG6-mCherry overexpressing plants (Fig. 2e and f).

      In Figure S3b, the images are low resolution and of poor quality. Why is atg6-mcherry expressed in a single cell if these are transgenic plants? The nuclear co-localization with npr1-gfp has to be shown more clearly with high-resolution images and also be quantified, because the expression of atg6-mcherry is not as uniform as npr1-gfp.

      Thank you for pointing this out. Contrary to the reviewer's assertion, the ATG6-mCherry fluorescence signal depicted in Figure S3b was not exclusive to a single cell. In fact, this fluorescence was also evident in other cells, albeit with relatively weaker intensity. This disparity in fluorescence intensity may be attributed to the irregularities in leaf structure at the time of image capture using the microscope. To bolster our conclusion, we further examined the fluorescence signals in the cells of the root elongation zone in ATG6-mCherry x NPR1-GFP, as depicted in the figure below. Our observations revealed that the fluorescence signals of ATG6-mCherry exhibited uniform distribution, with detection in both the cytoplasm and nucleus. We have replaced the original unclear image with a high-quality image.

      Lines 138-143: In fig. S3d, it would make more sense to show the WB on the hybrid npr1-gfp/atg6-mcherry plants with both anti-gfp and anti-mcherry antibodies to detect the free mcherry/gfp. Since the analysis of the level of free FP is done, then why didn't they test the free mcherry levels in Figure S4a? This would be more important than testing the free GFP in ATG6-GFP plants, because the imaging of atg6-mcherry was done in the hybrid plants (fig. S3b).

      Thank you for pointing this out. We initially synthesized the ATG6 antibody (anti-ATG6, 1:200, peptide, C-KEKKKIEEEERK, Abmart) in order to detect the endogenous ATG6 protein, and we also tested the specificity and potency of the ATG6 antibody (results are shown in Fig. S17). Additionally, in order to determine the location of the ATG6-mCherry bands, we also detected ATG6-mCherry in ATG6-mCherry Arabidopsis using the ATG6 antibody, with Col as a control (results are shown in Fig. S4). These results show that our synthesized ATG6 antibody effectively and clearly recognizes both ATG6 and ATG6-mCherry. Therefore, in this study, we used the ATG6 antibody to analyze both ATG6-mCherry and endogenous ATG6. Detailed antibody information is presented in Supplementary Data 1, table S4. In the previous experiments, we procured the mCherry antibody (mCherry-Tag Monoclonal Antibody(6B3), BD-PM2113, China) to immunolabel ATG6-mCherry. However, we encountered challenges with the potency of this mCherry antibody. Considering our budget constraints and the availability of our self-synthesized ATG6 antibody, we chose not to purchase another antibody from a different company to continue the Western blot experiment.

      In Figure 2c, there's no atg6-mcherry detected at time 0, in either cytoplasm or nucleus, yet the microscope images in panel a show strong accumulation in both compartments.

      Thank you for pointing this out. Previous studies have shown that ATG6 can also be degraded via the 26S proteasome pathway (Qi et al., 2017). We speculate that this phenomenon might be attributed to the rapid turnover of ATG6 at time 0.

      Lines 156-160: this statement is unsupported by the data. In fig. S5, the bands for native atg6 in the nuclear fraction are extremely weak, and they do not show the reverse pattern of change along the time points compared to the cytoplasmic fraction, which would indicate that the nuclear fraction is complementary to the cytoplasmic pool of the protein. The result more likely suggests that the majority of the ATG6 is in the cytoplasm, and that the weak bands detected in the nucleus are either background signal, or a contamination from the cytoplasmic pool. At this low protein level or poor immuno-detection the background signal is inevitable due to overexposure. Even though the actin marker is not detected in the nuclear fraction, it doesn't necessarily mean that there's no contamination from the cytoplasm in the nuclear fraction. The actin is just too abundant and can be detected at lower exposure.

      Thank you for pointing this out. In Fig. S5, we detected the subcellular localization of endogenous ATG6, although the image quality was somewhat low. Nevertheless, the cytosolic and nuclear localization of ATG6 could be clearly observed. In addition to this, we also verified the cytosolic and nuclear localization of ATG6 in Arabidopsis using confocal fluorescence microscopy and nucleoplasmic separation experiments. Actin and H3 were used as cytoplasmic and nuclear internal references, respectively (Fig. 2e and f). Furthermore, we observed the cytosolic and nuclear localization of ATG6 when we expressed ATG6-GFP or ATG6-mCherry in tobacco leaves through cis-transfection experiments (Fig. 2a-d). These results are consistent with the prediction of the subcellular location of ATG6 in the Arabidopsis subcellular database (https://suba.live/) (Fig. S3c). The reviewer's feedback has been valuable in helping us present these findings more clearly. We acknowledge the limitations in the image quality for the endogenous ATG6 localization, but we believe the combination of multiple experimental approaches, including the use of fluorescent protein fusions, provides robust evidence for the cytosolic localization of ATG6 in plant cells. Moving forward, we will continue to investigate the significance of ATG6's subcellular distribution and its potential dual roles in both the nucleus and the cytosol, particularly in the context of its interaction with the key immune regulator NPR1. We appreciate the reviewer's constructive comments, as they will help us strengthen the presentation and interpretation of our findings.

      In Figure 3a the images are of too low resolution to see the co-localization. The focal planes of the top and bottom panels are quite different: the top is focused on stomata, the bottom - on pavement cells. So, the number of the NPR1-GFP nuclei between these two focal planes is dramatically different. Also, it looks like the atg6-mcherry in these plants are predominantly in the cytoplasm, not the nucleus as the authors claim. A higher resolution and higher quality of images are required to determine this.

      Thank you for pointing this out. To ensure the clarity and accuracy of our confocal images, we have supplied a clearer image as supplementary evidence. The bright-field images distinctly show that both sets of images are in the same plane of focus. Furthermore, in the figure (third one in the fourth column), the nuclear localization of ATG6-mCherry is clearly visible, and ATG6-mCherry is co-localized with NPR1-GFP in the nucleus, as indicated by the white arrow.

      In Figure 3b, it is not indicated what exactly was measured and in what condition, mock or SA. If these are numbers of nuclei, then it should be indicated what size of the area was sampled, not just "section", and both mock and SA should be included in the measurements. Also, how many independent images have been sampled? what does the error bar represent? What does "normal" mean? Shouldn't this be a mock treatment?

      Thank you for pointing this out. The term "Normal" in this context refers to mock treatment, and we have revised the description for clarity. In Figure 3b, the graph illustrates the count of nuclear localizations of NPR1-GFP in ATG6-mCherry × NPR1-GFP and NPR1-GFP Arabidopsis plants following SA treatment. Statistical data were obtained from three independent experiments, each comprising five individual images, resulting in a total of 15 images analyzed for this comparison. Detailed descriptions were also added to the revised manuscript (Lines 568-570, 800-804).

      Lines 167-168: the proposed increase of NPR1-GFP in the nucleus could be simply due to a higher accumulation of SA in the hybrid plants, not because of the direct interaction of atg6.

      Thank you for pointing this out. Our results confirmed that ATG6 overexpression significantly increased nuclear accumulation of NPR1 (Fig. 3). Notably, the ratio (nucleus NPR1/total NPR1) in ATG6-mCherry × NPR1-GFP was not significantly different from that in NPR1-GFP, and there is a similar phenomenon in N. benthamiana (Fig. 3c-f). These results suggested that the increased nuclear accumulation of NPR1 by ATG6 might result from higher levels and more stable NPR1, rather than the enhanced nuclear translocation of NPR1 facilitated by ATG6. Furthermore, we found that under SA treatment, the protein levels of NPR1 were significantly higher in the ATG6-mCherry × NPR1-GFP line compared to the NPR1-GFP line (Fig. 5a). Notably, even in the absence of differences in SA levels between the two lines, we observed that ATG6 could delay the degradation of NPR1 under normal conditions (Fig. 6). These findings suggest that ATG6 employs both SA-dependent and SA-independent mechanisms to maintain the stability of the key immune regulator NPR1. In summary, we suggest that the increased nuclear accumulation of NPR1 is a dual effect of SA and ATG6.

      Lines 202-204: "Increased nuclear accumulation" implies increased translocation. However, they found that the ratio of NPR1-GFP does not change (Figure 3), so the reason for higher nuclear accumulation is not translocation, but abundance.

      Thank you for pointing this out. Our results confirmed that ATG6 overexpression significantly increased nuclear accumulation of NPR1 (Fig. 3). ATG6 also increases NPR1 protein levels and improves NPR1 stability (Fig. 5 and 6). Therefore, we consider that the increased nuclear accumulation of NPR1 in ATG6-mCherry × NPR1-GFP plants might result from higher levels and more stable NPR1 rather than the enhanced nuclear translocation of NPR1 facilitated by ATG6. To verify this possibility, we determined the ratio of nuclear NPR1-GFP to total NPR1-GFP. Notably, the ratio (nucleus NPR1/total NPR1) in ATG6-mCherry × NPR1-GFP was not significantly different from that in NPR1-GFP, and there is a similar phenomenon in N. benthamiana (Fig. 3c-f). We further analyzed whether ATG6 affects NPR1 protein levels and protein stability. Our results show that ATG6 increases NPR1 protein levels under SA treatment and that ATG6 maintains the protein stability of NPR1 (Fig. 5 and 6). These results suggest that the increased nuclear accumulation of NPR1 by ATG6 results from higher levels and more stable NPR1. The corresponding description is shown in the revised manuscript (lines 338-352).

      Lines 204-205: the co-localization in Figure 1d cannot be interpreted as interaction.

      Thank you for the feedback. We have replaced the co-localization image with a BiFC (Bimolecular Fluorescence Complementation) image to better illustrate the interaction in Fig 1d.

      What age of plants were used for the analysis in Figures 4 and S7? The age of the plant might significantly affect the free SA levels under control conditions.

      Thank you for the feedback. In Figures 4 and S7, 3-week-old plants were used to determine salicylic acid (SA) levels and the expression of target genes. The legends of Figures 4 and S7 provide detailed descriptions (lines 818-819).

      In Figure 5a they treat with SA, but the analysis in Figure S10 is done with the pathogen, so how can these data be correlated?

      Thank you for pointing this out. Previous studies have demonstrated that pathogen infestation rapidly increases the salicylic acid (SA) content in plants, and the elevated SA then activates plant immune responses. Therefore, both pathogen treatment and direct SA treatment can activate SA-dependent plant immune responses. The NPR1 protein is known for its instability. In Figure 5a, we utilized a 0.5 mM SA treatment to assess the changes in NPR1 protein levels, as the impact of SA treatment is more immediate and pronounced.

      Lines 241-242: In Figure 5b, it is not clear why there's no detection of NPR1-GFP and atg6-mcherry at time 0?? The levels of proteins in the transient assay are sufficiently high for detection by WB.

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). In addition, previous studies have shown that ATG6 can also be degraded via the 26S proteasome pathway (Qi et al., 2017). We speculate that this phenomenon might be attributed to the rapid turnover of NPR1 and ATG6 at time 0.

      In Figures 5c-d, the quality of these images is very poor, and they do not clearly show the signs. What structure was exactly measured in these images? There are so many fluorescent bodies there, that it is not clear what are we looking at. Also, it is not clear why they did not show the mcherry channel? It would be important to see if the bodies in SA-treated plants show co-localization with atg6-mcherry autophagosomes (if these exist at all).

      Thank you for pointing this out. Interestingly, similar to previous reports (Zavaliev et al., 2020), SA promoted the translocation of NPR1 into the nucleus, but a significant amount of NPR1 was still present in the cytoplasm (Fig. 3c and e). Previous studies have shown that SA increased NPR1 protein levels and facilitated the formation of SINCs in the cytoplasm, which are known to promote cell survival (Zavaliev et al., 2020). We therefore observed the fluorescence signal of SINCs-like condensates in the cytoplasm of tobacco leaves. After 1 mM SA treatment, more SINCs-like condensate fluorescence was observed in N. benthamiana co-transformed with ATG6-mCherry + NPR1-GFP compared to mCherry + NPR1-GFP (Fig. 5c-d and Supplemental movie 1-2). A clearer demonstration is provided in Supplemental movie 1-2. Additionally, we observed that the SINCs-like condensate signals partially co-localized with certain ATG6-mCherry autophagosome fluorescence signals.

      Lines 245-247: so, is it atg6 or SA that increases the NPR1 levels? If this is due to SA, then the whole study doesn't have novelty, because we already know from previous works that SA increases the stability of npr1.

      Thank you for pointing this out. Indeed, previous studies have shown that salicylic acid (SA) increases NPR1 levels and protein stability (Spoel et al., 2009; Saleh et al., 2015). In our experiments, we found that under SA treatment, the protein levels of NPR1 were significantly higher in the ATG6-mCherry × NPR1-GFP line compared to the NPR1-GFP line (Fig. 5a). Additionally, free SA levels were also significantly elevated in the ATG6-mCherry × NPR1-GFP line under pathogen challenge (Pst DC3000/avrRps4), but not under normal conditions (Fig. 4a). Furthermore, even in the absence of differences in SA levels between the two lines, we observed that ATG6 could delay the degradation of NPR1 under normal conditions (Fig. 6). These findings represent one of our new discoveries. These findings suggest that ATG6 employs both SA-dependent and SA-independent mechanisms to maintain the stability of the key immune regulator NPR1.

      Lines 313-316: npr1 and atg6 can function independently from each other, so the term "jointly" is misleading. Based on the overall data provided in this manuscript it cannot be concluded that the two proteins work in one complex to control plant immunity.

      Thank you for pointing this out. In the revised manuscript "jointly" has been changed to “cooperatively”.

      Lines 369-374: this speculation is beyond the main hypothesis claiming that atg6 functions through npr1. If atg6 can activate the transcription alone, then what is the significance of its activation of npr1? How can one distinguish between the two?

      Thank you for pointing this out. Transcription activation by transcription factors typically requires at least two conserved structural domains: a transcription activation domain and a DNA-binding domain. However, ATG6 does not possess these two typical conserved structural domains found in canonical transcription factors. Given this structural context, it is unlikely that ATG6 would be able to directly activate transcription on its own. Previous studies have shown that acidic activation domains (AADs) in transcriptional activators (such as Gal4, Gcn4 and VP16) play important roles in activating downstream target genes. Acidic amino acids and hydrophobic residues are the key structural elements of an AAD (Pennica et al., 1984; Cress and Triezenberg, 1991; Van Hoy et al., 1993). Chen et al. found that EDS1 contains two AAD domains and confirmed that EDS1 is a transcriptional activator with an AAD (Chen et al., 2021a). Here, we have similar results: ATG6 overexpression significantly enhanced the expression of PR1 and PR5 (Fig. 4b-c and S9), and an AAD domain containing acidic and hydrophobic amino acids is also found in ATG6 (148-295 AA) (Fig. S14). We speculate that ATG6 might act as a transcriptional coactivator to activate PRs expression synergistically with NPR1.

      Lines 389-400: the cell death due to AvrRPS4 in Col-0 ecotype is extremely weak as there's no complete receptor complex for this effector. So, one has to use a very high dose to induce cell death in Col-0, certainly higher than the one used for bacterial growth. The authors used the same dose in both assays, so it is likely that what we see as "cell death" is not an effector-triggered response, but rather symptom-associated for the virulent pathogen.

      Thank you for pointing this out. Indeed, as the reviewer pointed out, most cell death assays use higher concentrations of Pst DC3000/avrRps4 or Pst DC3000/avrRpt2, but they typically treat Arabidopsis for a relatively short period, usually less than 1 day (Hofius et al., 2009; Zavaliev et al., 2020). In this study, although we used relatively low Pst DC3000/avrRps4 (0.001) injections, we detected cell death after a relatively long period of Pst DC3000/avrRps4 infestation (3 days). Pst DC3000/avrRps4 multiplies significantly in host cells, and therefore we assumed that the propagated pathogens after 3 days of incubation would be sufficient to induce intense cell death. Consequently, we chose this concentration of Pst DC3000/avrRps4 for the experiment.

      Lines 407-416: why do you expect "delay of degradation" with autophagy inhibitor? Shouldn't it be the opposite? In Figure S14, if we compare the bands between 120min and 120min+ConA+WM, the effect of autophagy inhibitors is actually quite strong (0.47 vs 0.22), with about 50% more degradation of NPR1 in their presence. So, the conclusion that the degradation of NPR1 is autophagy-independent is wrong according to this result.

      Thank you for pointing this out. We have revised the inaccurate description, as outlined in the revised manuscript (lines 413-425).

      References

      Backer R, Naidoo S, van den Berg N. 2019. The NONEXPRESSOR OF PATHOGENESIS-RELATED GENES 1 (NPR1) and Related Family: Mechanistic Insights in Plant Disease Resistance. Front Plant Sci 10, 102.

      Castello MJ, Medina-Puche L, Lamilla J, et al. 2018. NPR1 paralogs of Arabidopsis and their role in salicylic acid perception. PLoS One 13, e0209835.

      Chen H, Li M, Qi G, et al. 2021a. Two interacting transcriptional coactivators cooperatively control plant immune responses. Sci Adv 7, eabl7173.

      Chen J, Mohan R, Zhang Y, et al. 2019. NPR1 Promotes Its Own and Target Gene Expression in Plant Defense by Recruiting CDK8. Plant Physiol 181, 289-304.

      Chen J, Zhang J, Kong M, et al. 2021b. More stories to tell: NONEXPRESSOR OF PATHOGENESIS-RELATED GENES1, a salicylic acid receptor. Plant Cell Environ.

      Cress WD, Triezenberg SJ. 1991. Critical structural elements of the VP16 transcriptional activation domain. Science 251, 87-90.

      Devadas SK, Raina R. 2002. Preexisting systemic acquired resistance suppresses hypersensitive response-associated cell death in Arabidopsis hrl1 mutant. Plant Physiol 128, 1234-1244.

      Ding Y, Sun T, Ao K, et al. 2018. Opposite Roles of Salicylic Acid Receptors NPR1 and NPR3/NPR4 in Transcriptional Regulation of Plant Immunity. Cell 173, 1454-1467 e1415.

      Falk A, Feys BJ, Frost LN, et al. 1999. EDS1, an essential component of R gene-mediated disease resistance in Arabidopsis has homology to eukaryotic lipases. Proc Natl Acad Sci U S A 96, 3292-3297.

      Feys BJ, Moisan LJ, Newman MA, et al. 2001. Direct interaction between the Arabidopsis disease resistance signaling proteins, EDS1 and PAD4. EMBO J 20, 5400-5411.

      Fu ZQ, Yan S, Saleh A, et al. 2012. NPR3 and NPR4 are receptors for the immune signal salicylic acid in plants. Nature 486, 228-232.

      Hofius D, Schultz-Larsen T, Joensen J, et al. 2009. Autophagic components contribute to hypersensitive cell death in Arabidopsis. Cell 137, 773-783.

      Jones JD, Dangl JL. 2006. The plant immune system. Nature 444, 323-329.

      Jurkowski GI, Smith RK, Jr., Yu IC, et al. 2004. Arabidopsis DND2, a second cyclic nucleotide-gated ion channel gene for which mutation causes the "defense, no death" phenotype. Mol Plant Microbe Interact 17, 511-520.

      Lee HJ, Park YJ, Seo PJ, et al. 2015. Systemic Immunity Requires SnRK2.8-Mediated Nuclear Import of NPR1 in Arabidopsis. Plant Cell 27, 3425-3438.

      Liu Y, Schiff M, Czymmek K, et al. 2005. Autophagy regulates programmed cell death during the plant innate immune response. Cell 121, 567-577.

      Liu Y, Sun T, Sun Y, et al. 2020. Diverse Roles of the Salicylic Acid Receptors NPR1 and NPR3/NPR4 in Plant Immunity. Plant Cell 32, 4002-4016.

      McKim SM, Stenvik GE, Butenko MA, et al. 2008. The BLADE-ON-PETIOLE genes are essential for abscission zone formation in Arabidopsis. Development 135, 1537-1546.

      Nawrath C, Metraux JP. 1999. Salicylic acid induction-deficient mutants of Arabidopsis express PR-2 and PR-5 and accumulate high levels of camalexin after pathogen inoculation. Plant Cell 11, 1393-1404.

      Patel S, Dinesh-Kumar SP. 2008. Arabidopsis ATG6 is required to limit the pathogen-associated cell death response. Autophagy 4, 20-27.

      Pennica D, Goeddel DV, Hayflick JS, et al. 1984. The amino acid sequence of murine p53 determined from a c-DNA clone. Virology 134, 477-482.

      Qi H, Xia FN, Xie LJ, et al. 2017. TRAF Family Proteins Regulate Autophagy Dynamics by Modulating AUTOPHAGY PROTEIN6 Stability in Arabidopsis. Plant Cell 29, 890-911.

      Rate DN, Greenberg JT. 2001. The Arabidopsis aberrant growth and death2 mutant shows resistance to Pseudomonas syringae and reveals a role for NPR1 in suppressing hypersensitive cell death. Plant J 27, 203-211.

      Saleh A, Withers J, Mohan R, et al. 2015. Posttranslational Modifications of the Master Transcriptional Regulator NPR1 Enable Dynamic but Tight Control of Plant Immune Responses. Cell Host Microbe 18, 169-182.

      Skelly MJ, Furniss JJ, Grey H, et al. 2019. Dynamic ubiquitination determines transcriptional activity of the plant immune coactivator NPR1. Elife 8.

      Spoel SH, Mou Z, Tada Y, et al. 2009. Proteasome-mediated turnover of the transcription coactivator NPR1 plays dual roles in regulating plant immunity. Cell 137, 860-872.

      Van Hoy M, Leuther KK, Kodadek T, et al. 1993. The acidic activation domains of the GCN4 and GAL4 proteins are not alpha helical but form beta sheets. Cell 72, 587-594.

      Yuan M, Ngou BPM, Ding P, et al. 2021. PTI-ETI crosstalk: an integrative view of plant immunity. Curr Opin Plant Biol 62, 102030.

      Yue J, Sun H, Zhang W, et al. 2015. Wheat homologs of yeast ATG6 function in autophagy and are implicated in powdery mildew immunity. BMC Plant Biol 15, 95.

      Zavaliev R, Mohan R, Chen T, et al. 2020. Formation of NPR1 Condensates Promotes Cell Survival during the Plant Immune Response. Cell 182, 1093-1108 e1018.

    1. eLife assessment

      This valuable manuscript systematically addresses the role of intracellular lipid transfer proteins on cellular lipid levels. It provides convincing evidence on the role of ORP9 and ORP11 in sphingolipid metabolism at the Golgi complex. This article will be of broad interest to cell biologists interested in lipid metabolism and membrane biology.

    2. Reviewer #1 (Public Review):

      Summary:

      In this well-designed study, the authors of the manuscript have analyzed the impact of individually silencing 90 lipid transfer proteins on the overall lipid composition of a specific cell type. They confirmed some of the evidence obtained by their own and other research groups in the past, and additionally, they identified an unreported role for ORP9-ORP11 in sphingomyelin production at the trans-Golgi. As they delved into the nature of this effect, the authors discovered that ORP9 and ORP11 form a dimer through a helical region positioned between their PH and ORD domains.

      Strengths:

      This well-designed study presents compelling new evidence regarding the role of lipid transfer proteins in controlling lipid metabolism. The discovery of ORP9 and ORP11's involvement in sphingolipid metabolism invites further investigation into the impact of the membrane environment on sphingomyelin synthase activity.

      Weaknesses:

      There are a couple of weaknesses evident in this manuscript. Firstly, there's a lack of mechanistic understanding regarding the regulatory role of ORP9-11 in sphingomyelin synthase activity. Secondly, the broader role of hetero-dimerization of LTPs at ER-Golgi membrane contact sites is not thoroughly addressed. The emerging theme of LTP dimerization through coiled domains has been reported for proteins such as CERT, OSBP, ORP9, and ORP10. However, the specific ways in which these LTPs hetero and/or homo-dimerize and how this impacts lipid fluxes at ER-Golgi membrane contact sites remain to be fully understood.

      Regardless of the unresolved points mentioned above, this manuscript presents a valuable conceptual advancement in the study of the impact of lipid transfer on overall lipid metabolism. Moreover, it encourages further exploration of the interplay among LTP actions across various cellular organelles.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine which lipid transfer proteins impact the lipids of Golgi apparatus, and they identified a reasonable number of "hits" where the lack of one lipid transfer protein affected a particular Golgi lipid or class of lipids. They then carried out something close to a "proof of concept" for one lipid (sphingomyelin) and two closely related lipid transfer proteins (ORP9/ORP11). They looked into that example in great detail and found a previously unknown relationship between the level of phosphatidylserine in the Golgi (presumably trans-Golgi, trans-Golgi Network) and function of the sphingomyelin synthase enzyme. This was all convincingly done - results support their conclusions - showing that the authors achieved their aims.

      Impact:

      There are likely to be 2 types of impact:

      (i) cell biology: sphingomyelin synthase and ORP9/11 will be studied in future in more informed ways to understand (a) the role of different Golgi lipids - this work opens that out and produces more questions than answers (b) the role of different ORPs: what distinguishes ORP11 from its paralog ORP10?

      (ii) molecular biochemistry: combining knockdown miniscreen with organelle lipidomics must be time-consuming, but here it is shown to be quite a powerful way to discover new aspects of lipid-based regulation of protein function. This will be useful to others as an example, and if this kind of workflow could be automated, then the possible power of the method could be widely applied.

      Strengths:

      Nicely controlled data;

      Wide-ranging lipidomics dataset with repeats and SDs - all data easily viewed.

      Simple take home message that PS traffic to the TGN by ORP9/11 is required for some aspect of SMS1 function.

      Weaknesses:

      Model and Discussion:

      Despite the authors saying that this has been addressed in their rebuttal, I still struggle to find any ideas about the aspect of SMS1 function that is being affected.

      As I mentioned before, even if no further experiments were carried out the authors could discuss possibilities. One might speculate what the PS is being used for. For example, is it a co-factor for integral membrane proteins, such as flippases? Is it a co-factor for peripheral membrane proteins, such as yet more LTPs? The model could include the work of Peretti et al (2008), which linked Nir2 activity exchanging PI:PA (Yadav et al, 2015) to the eventual function of CERT. Could the PS have a role in removing/reducing DAG produced by CERT?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The authors should possibly discuss more the other cases when LTPs of the same type of ORP9 and ORP10 have been found to dimerise. They should definitely cite and discuss the evidence reported in February this year in CMLS (see https://link.springer.com/article/10.1007/s00018-023-04728-5). In this paper, authors reported very similar findings as those the authors have in Figures 3, 4, S6, S7, and S8. Specifically, in this CMLS paper the authors find that ORP9 and ORP10 (not ORP11) interact through a central helical region and that ORP9 localises ORP10 to the ER-Golgi MCSs by providing ORP10 with a binding site for VAPs, where the heterodimer mediates the exchange of PtdIns(4)P for PtdSer. 

      We thank the reviewer for their recommendations. The mentioned paper had simply gone unnoticed by us and is now cited in the revised manuscript. Various other papers reporting on LTP dimerization are already cited in our manuscript: ORP9-ORP10 dimerization (Kawasaki et al. 2022), ORP9-ORP11 dimerization (Zhou et al. 2010), and ORP9-ORP10/11 dimerization (Tan and Finkel 2022). The revised manuscript now discusses the dimerization of CERT and OSBP while citing Gehin et al. 2023, Ridgway et al. 1992 and de la Mora et al. 2021.

      Reviewer #2 (Recommendations For The Authors): 

      Model and Discussion: 

      Give an idea about the aspect of SMS1 function that is being affected. Even if no further experiments were carried out, the authors could discuss possibilities. One might speculate what the PS is being used for. For example, is it a co-factor for integral membrane proteins, such as flippases? Is it a co-factor for peripheral membrane proteins, such as yet more LTPs? The model could include the work of Peretti et al (2008), which linked Nir2 activity exchanging PI:PA (Yadav et al, 2015) to the eventual function of CERT. Could the PS have a role in removing/reducing DAG produced by CERT? 

      We thank the reviewer for their recommendations. The same recommendations were also scripted in the public review, which we believe we answered sufficiently. 

      Other, Minor: 

      Make clear that there is no sterol readout (Fig 1C) 

      We would like to point out that Figure 1C has a sterol readout as CE refers to cholesterol esters.

      "PH domains of ORP9 and ORP11 localized only partially to the Golgi, unlike the PH domains of OSBP and CERT" (line 154). Say here where the non-Golgi ORP9 and ORP11 PH domain pool is - presumably in the cytoplasm.

      We thank the reviewer for their suggestion and rephrase the sentence accordingly. 

      Fig 7H-J: histograms not lines as these are separate unlinked categories

      We thank the reviewer for their suggestion. However, we think the original figure represents our findings in the best possible way. Our analysis regarding individual lipid species is also included in Supplementary figure 10.

      Reviewer #3 (Recommendations For The Authors): 

      (1) At the end of the intro, in summarizing their findings, the authors state (p3. lines 48-49) "These findings highlight how phospholipid and sphingolipid gradients along the secretory pathway are linked at ER-Golgi membrane contact sites." This should instead read "These findings highlight THAT phospholipid and sphingolipid gradients along the secretory pathway are linked at ER-Golgi membrane contact sites." 

      We thank the reviewer for their suggestion and change the sentence accordingly.

      (2) As noted in the public section, to show that ORP9/11 do indeed exchange lipids, an in vitro experiment demonstrating that ORP11 can transfer PI4P is essential. Ideally, it would be best to examine PS AND PI4P transfer by ORP9 AND 11 separately AND then by the ORP9/11 heterodimer. This could lend insights as to the function of the heterodimer. The He et al et Yu paper should provide guidelines for this. Why have the heterodimers? 

      We believe we addressed this point by showing the lipid transfer ability of the ORP9-ORP11 dimer. These findings are now part of the revised manuscript.

      (3) It would be interesting to discuss the roles of ORP9/ORP11 versus ORP9/ORP10... they seem so analogous, although this is at the discretion of the authors. 

      We thank the reviewer for their suggestion. Since the difference between ORP9-ORP10 and ORP9-ORP11 dimers was also raised by other reviewers, we decided to include this discussion in the manuscript. A section based on our answer to Reviewer #2 in Public Review is now part of the Discussions.

      (4) The authors used a melanoma cell line in their screens (p3, line 59). Could they explain why they used this cell line versus others? 

      We chose MelJuSo cells for various reasons. Mainly, MelJuSo cells are diploid, which eases generating knockouts in a screening setup compared to polyploid cancer cell lines (e.g. HeLa). Furthermore, our CRISPR/Cas9 screening protocols are optimized for this cell line.

    1. eLife assessment

      This work presents fundamental new insights into the conductivity of freshwater cable bacteria. The evidence supporting the conclusions, which was collected using appropriate techniques, is compelling. The work will be of interest to environmental microbiologists and the microbial electrochemistry community.

    2. Reviewer #2 (Public Review):

      Summary:

      In this work, Mohamed Y. El-Naggar and co-workers present a detailed electronic characterization of cable bacteria from Southern California freshwater sediments. The cable bacteria could be reliably enriched in laboratory incubations, and subsequent TEM characterization and 16S rRNA gene phylogeny demonstrated their belonging to the genus Candidatus Electronema. Atomic force microscopy and two-point probe resistance measurements were then used to map out the characteristics of the conductive nature, followed by microelectrode four-probe measurements to quantify the conductivity.

      Interestingly, the authors observe that some freshwater cable bacteria filaments displayed a higher degree of robustness upon oxygen exposure than what was previously reported for marine cable bacteria. Finally, a single nanofiber conductivity on the order of 0.1 S/cm is calculated, which matches the expected electron current densities linking electrogenic sulphur oxidation to oxygen reduction in sediment and is consistent with hopping transport.

      Strengths and weaknesses:

      A comprehensive study is applied to characterise the conductive properties of the sampled freshwater cable bacteria. Electrostatic force microscopy and conductive atomic force microscopy provide direct evidence of the location of conductive structures. Four-probe microelectrode devices are used to quantify the filament resistance, which presents a significant advantage over commonly used two-probe measurements that include contributions from contact resistances. While the methodology is convincing, I find that some of the conclusions seem to be drawn on very limited sample sizes, which display widely different behaviour. In particular:

      The authors observe that the conductivity of freshwater filaments may be less sensitive to oxygen exposure than previously observed for marine filaments. This is indeed the case for an interdigitated array microelectrode experiment (presented in Figure 5) and for a conductive atomic force microscopy experiment (described in line 391), but the opposite is observed in another experiment (Figure S1). It is therefore difficult to assess the validity of the conclusion until sufficient experimental replications are presented.

      The calculation of a single nanofiber conductivity is based on experiments and calculations with significant uncertainty, e.g. in the number of nanofibres in a single filament, which varies depending on the filament size (Frontiers in microbiology, 2018, 9: 3044.), and in the measured CB resistance, which does not scale well with inner probe separation (Figure 5). A more rigorous consideration of these uncertainties is required.

      Comments on revised version:

      The authors address all of the comments carefully.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work provides significant insight into freshwater cable bacteria (CB) and is an important contribution to the emerging CB literature. In this manuscript, Yang et al. describe current-voltage measurements on CB collected from two freshwater sources in Southern California. The studies use electrostatic and conductive atomic force microscopies, as well as four-probe measurements. These measurements are consistent with back-of-the-envelope calculations on conductivities needed to sustain CB function. The data shows that freshwater CB have a similar structure and function to the more studied marine cable bacteria.

      Strengths:

      Excellent measurements on a new class of cable bacteria.

      Weaknesses:

      The paper would benefit from additional analysis of the data.

      Reviewer #1 (Recommendations for The Authors):

      This work provides significant insight into freshwater cable bacteria (CB) and is an important contribution to the emerging CB literature. In this manuscript, Yang et al. describe current-voltage measurements on CB collected from two freshwater sources in Southern California. The studies use electrostatic and conductive atomic force microscopies, as well as four-probe measurements. These measurements are consistent with back-of-the-envelope calculations on conductivities needed to sustain CB function. The data shows that freshwater CB have a similar structure and function to the more studied marine cable bacteria. Minor comments follow.

      We are grateful to the reviewer for the encouraging feedback and for appreciating the central message of the preprint. Below we address the reviewer’s constructive comments.

      Additional information could be provided regarding the degraded cells where an 'empty cage' remains, as well as the polyphosphate granules, which were previously observed in marine CB (refs. 11 and 18). 

      We have edited the manuscript to note that the appearance of empty cages and the polyphosphate granules in freshwater cable bacteria is indeed consistent with these features as previously reported in marine CB. The size of polyphosphate granules in freshwater CB is comparable or slightly smaller than in marine CB (Sulu-Gambari et al., 2015). In the case of empty cages, these cells were previously described as ‘ghost filaments’ which had lost all cell membrane and cytoplasmic material (Cornelissen et al., 2018).

      Manuscript edits: a sentence regarding polyphosphate granules has been added into the manuscript from lines 307 - 308. “The size of polyphosphate granules in freshwater CB (70 nm – 400 nm) is comparable or slightly smaller than in marine CB (35)”.

      A sentence regarding the empty cages has been added into the manuscript (lines 303-305). “These empty cages were previously described as ‘ghost filaments’ which had lost all cell membrane and cytoplasm material (20).”

      The authors also state that the 'phase difference between the elevated ridges and interridge regions is proportional to the tip voltage squared,' and refer to Fig. 4D. This figure has only three data points with large error bars. The authors may wish to explain this finding and justify their analysis in greater detail.

      We thank the reviewer for pointing out that we presented this result but did not adequately describe its origin or significance. In general, the probe phase response of electrostatic force microscopy (EFM) can originate not only from the electrostatic interaction with the sample (i.e. the electrical properties of interest) but also from shorter range van der Waals forces (which are more reflective of probe-sample distance i.e. topography). To ensure that EFM is reporting electrical interactions, we performed these measurements using a two-pass technique, with the second pass retracing the topography measured during the first pass, but at a fixed height above the surface where the interactions are long range (electrostatic) rather than short range (vdW) or resulting from topography cross-talk. The purpose of the voltage change measurement (Fig. 4D) is to simply assess whether this procedure is successful, since electrostatic forces are proportional to the square of the voltage at a fixed height (F = ½ (∂C/∂z) V²). While the error bar of that measurement is high, due to the intrinsic noise in the dynamic (high frequency) EFM phase response measurement, we note that the purpose of this measurement is simply to assess that the interaction is due to the electrical interaction with the sample, before proceeding to actual conductance measurements (Figs. 5-8).

      Manuscript edits: we previously simply cited a reference where the reader can delve deeper into the origin of the square voltage signal. To put this into better context, we now include additional information (lines 461 - 475), noting the origin and purpose of the result as described above.
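
      The quadratic voltage dependence used as a sanity check above can be illustrated with a minimal sketch. It assumes only the relation F = ½ (∂C/∂z) V² quoted in the response; the capacitance gradient value and the function name are hypothetical placeholders, not values from the manuscript.

```python
def electrostatic_force(dc_dz, voltage):
    """Magnitude of the tip-sample electrostatic force at a fixed lift height.

    Implements F = 1/2 * (dC/dz) * V^2, the relation quoted in the response.
    """
    return 0.5 * dc_dz * voltage ** 2


# Hypothetical capacitance gradient (F/m); chosen only for illustration.
DC_DZ = 1e-10

tip_voltages = [1.0, 2.0, 3.0]  # volts, illustrative
forces = [electrostatic_force(DC_DZ, v) for v in tip_voltages]

# If the interaction is purely electrostatic, the signal relative to the
# first voltage should grow as (V / V0)^2, independent of dC/dz.
ratios = [f / forces[0] for f in forces]

for v, r in zip(tip_voltages, ratios):
    print(f"V = {v:.1f} V -> signal relative to 1 V: {r:.2f}")
```

      Because the ratio does not depend on the assumed ∂C/∂z, a measured signal that departs strongly from this quadratic trend would suggest a contribution from short-range forces or topography cross-talk.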

      It is interesting that the freshwater CB appear to be more resilient to air compared to marine CB (or at least some freshwater filaments, as the authors note that the level of resilience is filament-dependent). The authors indicate that salt affects oxygen solubility and there is a larger oxygen content in freshwater. Do the authors have thoughts on whether or not the differences between marine and freshwater CB could fit, or not fit, with the hypothesis that conductivity in air is lowered due to oxidation of the Ni/S species (ref. 25 in manuscript)? Could the freshwater CB have greater protection against oxidation?

      We thank the reviewer for highlighting this point. Indeed, our manuscript mentions the current hypothesis that conductivity of cable bacteria may be diminished upon oxidation of the Ni/S groups (lines 101 - 105 and 498 - 504). It remains unclear how this idea may lead to variability between marine and freshwater cables. Interestingly, however, a recent comparative bioRxiv preprint (Digel et al. 2023) noted significant differences in the morphology, number, and cross-sectional area of nanofibers between a freshwater and marine CB strain. These differences may lead to a different resiliency against oxidative degradation upon exposure to air. Specifically, even though the marine CB strain was characterized by a larger cross-sectional area per nanofiber, it had significantly fewer nanofibers, leading to 40% smaller total area than its freshwater counterpart. We have edited the manuscript to highlight these possible differences (at least in size) between freshwater and marine cables.

      Manuscript edits (lines 506 – 514) “For example, a recent comparative study (21) hints at significant differences in the morphology, number, and size of nanofibers when comparing a marine CB strain to a freshwater CB strain. Specifically, while the marine CB was characterized by a 50% larger cross-sectional area per nanofiber, the total nanofibers’ area was 40% smaller than the freshwater strain due to a smaller number of nanofibers per CB filament. Given the proposed central role of nanofibers in mediating electron transport along CB, it is possible that such differences may also lead to different degrees of tolerance against oxidative degradation upon exposure to air.”
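
      The two percentages quoted above imply a ratio of nanofiber counts between the strains. A short arithmetic sketch, assuming total area is simply the per-fiber area multiplied by the number of fibers (an assumption made here for illustration; variable names are hypothetical):

```python
# Ratios are marine / freshwater, taken from the percentages quoted above.
area_per_fiber_ratio = 1.5  # marine per-fiber area is 50% larger
total_area_ratio = 0.6      # marine total nanofiber area is 40% smaller

# Assuming total area = (area per fiber) * (number of fibers),
# the ratio of fiber counts follows by division.
fiber_count_ratio = total_area_ratio / area_per_fiber_ratio

print(f"Implied marine/freshwater fiber count ratio: {fiber_count_ratio:.2f}")
```

      Under that assumption, the marine strain would carry roughly 40% as many nanofibers per filament as the freshwater strain.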

      Figure 6D shows current-voltage measurements from three representative cables; there is a large variation, most notably between Cable 1 and Cables 2 and 3. Is this variation typical for different cables? Can the authors comment on the range of values observed and how many cables fit into different ranges? Any thoughts on the reasons behind the range?

      Figure 6 B and C (red and blue) are representative of most of the cable conductance measured using the point IV CAFM technique, with the Figure 6 A (green) IV curve being an example of the upper limit, which was less frequently observed. In total we measured ten cables using the point IV CAFM technique. These variations may stem from actual differences in the conductivity of separate CB filaments, the environment of the measurement, or limitations in the conductive AFM measurement techniques. These limitations include a large contact resistance due to the interaction of the small probe with the sample, which may lead to large variability depending on the contact point.  For this reason, we rely on 4-probe measurements (Fig. 8) for quantitative conductive analyses, rather than conductive AFM. It is important to note, however, that the conductive AFM measurements (Fig. 6 and Fig. 7) provide other complementary information including the demonstration of both transverse and longitudinal transport (lines 389-393) in Fig. 6 and the visualizing of the current carrying nanofibers in Fig. 7. 

      Manuscript edits: we have edited the manuscript (lines 413 - 418) to make it clear that the quantitative estimate of conductivity was made only using 4 probe measurements due to the limitations of CAFM or two-probe techniques.

      Can the authors comment on how the number of fibers per CB in their samples compares with the number of fibers in marine CB? Marine CB are known to have pinwheel junctions where the fibers come together before branching out again. This pinwheel design could play a role in the function of the CB or in its survival (see Adv. Biosys. 2020, 4, 2000006). Were pinwheel structures observed in freshwater CB? If so, how do they compare?

      From the previous studies, estimates of the number of fibers in marine CB appeared to vary significantly from 15 or 17 (Pfeffer et al., 2012) to 58 – 61 (Cornelissen et al., 2018). In our freshwater CB, we estimated the number of fibers at ~35 per CB (line 423), which is comparable to the count of 34 per freshwater CB recently reported by Digel et al., bioRxiv 2023. We cannot specifically comment on the pinwheel structure as we did not perform the transverse thin section TEM imaging necessary to observe the cell-cell junctions in this particular study.

      On lines 95-96, the authors discuss the fact that marine cable bacteria have a wide variance in their measured conductivities. While one may ask if the larger marine conductivities (near 80 S/cm) are representative, a conductivity of 0.1 S/cm is 2 orders of magnitude lower than this value, which the field generally refers to as a high conductivity. The authors should mention whether or not any of their specimens display the high conductivities seen in select marine cable bacteria specimens.

      It is indeed important to note that the ~80 S/cm figure refers to an upper end previously observed (ref. 22) for marine CB conductivity. In our manuscript (lines 525 - 526), we highlight that the previously observed range (including in that same study) is 10⁻²–10¹ S/cm and we were careful to qualify the previously reported upper end with ‘reaching as high as’ (line 97). Note that this places our measurement of 0.1 S/cm within the previously reported range. We have not observed freshwater CB conductivity near the upper end of the previously reported range, and generally propose that these types of measurements are better analyzed in the context of the biological function rather than ‘high vs. low’. Towards that end, the manuscript (lines 527-537) makes the argument that the 10⁻¹ S/cm figure may be sufficient to support the electrical currents mediated by CB in sediments. We have edited the manuscript to highlight that we did not observe single CB nanofiber conductivity near the upper limit previously observed in marine CB (lines 522-525).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Mohamed Y. El-Naggar and co-workers present a detailed electronic characterization of cable bacteria from Southern California freshwater sediments. The cable bacteria could be reliably enriched in laboratory incubations, and subsequent TEM characterization and 16S rRNA gene phylogeny demonstrated their belonging to the genus Candidatus Electronema. Atomic force microscopy and two-point probe resistance measurements were then used to map out the characteristics of the conductive nature, followed by microelectrode four-probe measurements to quantify the conductivity.

      Interestingly, the authors observe that some freshwater cable bacteria filaments displayed a higher degree of robustness upon oxygen exposure than what was previously reported for marine cable bacteria. Finally, a single nanofiber conductivity on the order of 0.1 S/cm is calculated, which matches the expected electron current densities linking electrogenic sulphur oxidation to oxygen reduction in sediment. This is consistent with hopping transport.

      Strengths and weaknesses:

      A comprehensive study is applied to characterize the conductive properties of the sampled freshwater cable bacteria. Electrostatic force microscopy and conductive atomic force microscopy provide direct evidence of the location of conductive structures. Four-probe microelectrode devices are used to quantify the filament resistance, which presents a significant advantage over commonly used two-probe measurements that include contributions from contact resistances. While the methodology is convincing, I find that some of the conclusions seem to be drawn on very limited sample sizes, which display widely different behavior. In particular:

      The authors observe that the conductivity of freshwater filaments may be less sensitive to oxygen exposure than previously observed for marine filaments. This is indeed the case for an interdigitated array microelectrode experiment (presented in Figure 5) and for a conductive atomic force microscopy experiment (described in line 391), but the opposite is observed in another experiment (Figure S1). It is therefore difficult to assess the validity of the conclusion until sufficient experimental replications are presented.

      We indeed acknowledge both in the abstract (line 23-26) and section 2.2 (lines 374-377) the variable nature of the sensitivity and filament-dependent response to air exposure. Our discussion (lines 498-506) considers the possible reasons for this variability:

      ‘While these observations showed a high degree of variability and therefore require a more detailed investigation, it is interesting to consider the possibility that the oxidative decline (or other damaging processes), thought to be a consequence of oxidation of Ni cofactors involved in electron transport (25), may not affect all sections of the cm long CB filaments simultaneously; under these conditions, IDA measurements, which probe multiple micrometer-scale electrode-crossing CB regions (e.g. 372 crossings in Figure 5 inset) may offer an advantage over techniques addressing entire CBs or specific CB regions. It is also interesting to consider an alternative possibility that the conductive properties of freshwater CB may be intrinsically more oxygen-resistant than marine CB’.

      To summarize, the manuscript points to the likelihood that the IDA technique used here may offer an advantage for detecting currents under damaging conditions since it interrogates multiple sections simultaneously. Furthermore, in a recent preprint from Digel et al. (2023), the conductivity of the only freshwater strain investigated in that study was among the highest compared to other marine CB strains. Therefore, the freshwater CB being more resistant is one possibility to be investigated based on these observations and results. We therefore present the latter as a possibility in the discussion.

      The calculation of a single nanofiber conductivity is based on experiments and calculations with significant uncertainty, e.g. in the number of nanofibers in a single filament, which varies depending on the filament size (Frontiers in microbiology, 2018, 9: 3044.), and in the measured CB resistance, which does not scale well with inner probe separation (Figure 5). A more rigorous consideration of these uncertainties is required.

      The reviewer raises an important point. For these calculations, we made sure to determine the representative number of fibers per cable and thickness of the nanofibers (~50 nm) from our own samples. We indeed assessed the possible variability across our different cable filaments and found the fiber numbers varied from 30 – 44 (with 35 used as a representative figure in the paper). For the scaling of resistance with inner probe separation, our 4P results estimated that the CB resistances are 47 MΩ and 240 MΩ for the 20 µm and 200 µm lengths, respectively, rather than an expected tenfold difference if the cable has a uniform conductivity along the entire filaments. This result suggests nonuniform conductivity in different sections of the CB filament. Since accounting for non-uniform conduction (and variability in fiber morphology/density) is clearly difficult, we were careful to limit our conclusion to an order of magnitude estimate (e.g. lines 522-525). Given the previously reported range of cable bacteria conductivity (10⁻²–10¹ S/cm), this places our estimate within this range. We have further edited the manuscript to note that our reported single nanofiber conductivity cannot be constrained further than the order of 0.1 S/cm due to our estimates in nanofiber diameter and per cable amount as well as the possibility of nonuniform conductivity along the CB length (lines 522-525).
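
      The order-of-magnitude argument above can be reproduced with a minimal sketch. It assumes the nanofibers act as parallel, uniform cylinders with a circular cross-section of 50 nm diameter, which is a simplification introduced here for illustration and not necessarily the geometry used in the manuscript; the function name is hypothetical.

```python
import math


def nanofiber_conductivity_s_per_cm(resistance_ohm, length_m, n_fibers, diameter_m):
    """Estimate conductivity (S/cm) from sigma = L / (R * A).

    A is the summed cross-section of n_fibers parallel cylinders
    (an assumed geometry, labeled as such in the lead-in).
    """
    total_area_m2 = n_fibers * math.pi * (diameter_m / 2.0) ** 2
    sigma_s_per_m = length_m / (resistance_ohm * total_area_m2)
    return sigma_s_per_m / 100.0  # 1 S/m = 0.01 S/cm


# Four-probe values quoted in the response: (resistance in ohm, length in m).
measurements = [(47e6, 20e-6), (240e6, 200e-6)]
fiber_counts = [30, 35, 44]  # observed range, with 35 as the representative value
DIAMETER_M = 50e-9           # ~50 nm nanofiber thickness, treated as a diameter

estimates = [
    nanofiber_conductivity_s_per_cm(r, length, n, DIAMETER_M)
    for (r, length) in measurements
    for n in fiber_counts
]

# A uniform conductor would show a tenfold resistance increase for a
# tenfold longer segment; the measured ratio is noticeably smaller.
resistance_ratio = measurements[1][0] / measurements[0][0]

print(f"Conductivity estimates span {min(estimates):.3f} to {max(estimates):.3f} S/cm")
print(f"Resistance ratio (200 um / 20 um): {resistance_ratio:.1f}")
```

      Under these assumptions every combination of segment length and fiber count stays within the same decade, which is in line with quoting the result only to the order of 0.1 S/cm.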

      Reviewer #2 (Recommendations for The Authors):

      Figure 4A: Please add scale- and color bar.

      Done - new Fig. 4 included with color bars for topography and phase. The inset of Fig. 4A denotes a 200 nm scale bar (and that scale is now mentioned in the figure caption).

      Figure 5: A time series graph might be more instructive.

      Done - we indeed appreciate this suggestion and find that it improved the clarity of Figure 5. An inset has been included in Figure 5 plotting the resistance R change over time under different conditions. This inset demonstrates that the resistance of the cable on the IDA was slowly decreasing in the N2/H2 anaerobic chamber, only to start increasing upon exposure to ambient air. After putting the cable back into the chamber, the resistance again decreased over time.

    1. Author Response:

      We thank the reviewers for their insightful feedback. In our revised version of the manuscript, we will address all points raised.

      Regarding the preprocessing (Reviewer 1), we agree that the StandardRat pipeline is optimal for newly acquired datasets. However, since this study involves reanalyzing an already published dataset (Ionescu et al., JNM, 2023), which was preprocessed, analyzed, and published before the StandardRat paper, we aimed to maintain the same preprocessing. This approach allows for consistent interpretation of the readout regarding functional and molecular connectivity in the context of our previously published findings. Nonetheless, we agree that providing full access to the data will enable other researchers to reproduce our results using the StandardRat preprocessing pipeline and perform additional analyses on this rich dataset. Therefore, we will provide full access to the data via an open repository, as the reviewer suggested.

      Regarding anesthesia, we acknowledge that this is a limitation of our study, as more recent studies have indicated superior protocols. However, we and others have shown that, while not ideal, isoflurane at the used dose maintains stable physiology and does not cause burst suppression in rats. We will amend our discussion to reflect these points.

      Regarding the other points, we will amend the manuscript to provide more detail on the experimental design, including the tracer application as suggested by Reviewer 2, and clarify parts of the analysis that are unclear in the current version. Additionally, we agree with Reviewer 2 that our current terminology may cause confusion, and we will amend it accordingly. We will also discuss the other points raised by the reviewers, such as the reduced sample size for the pharmacological cohort, as limitations in our discussion.

      Thank you for your understanding and the opportunity to improve our manuscript.

    2. eLife assessment

      This important paper on measuring molecular connectivity using combined serotonin PET and resting-state fMRI provides both novel methods for studying the brain as well as insights into the effects of ecstasy administration. The methods are solid, with a few doubts that need to be dispelled surrounding the high anaesthetic dose used.

    3. Reviewer #1 (Public Review):

      This paper by Ionescu et al. applies novel brain connectivity measures based on fMRI and serotonin PET both at baseline and following ecstasy use in rats. There are multiple strengths to this manuscript. First, the use of connectivity measures using temporal correlations of 11C-DASB PET, especially when combined with resting state fMRI, is highly novel and powerful. The effects of ecstasy on molecular connectivity of the serotonin network and salience network are also quite intriguing.

      I would like the authors to discuss and justify their use of high-dose (1.3%) isoflurane. A recent consensus paper on rat fMRI (Grandjean et al., "A Consensus Protocol for Functional Connectivity Analysis in the Rat Brain.") found that medetomidine combined with low-dose isoflurane provided optimal control of physiology and fMRI signal. To overcome any doubts about the effects of the high-dose anaesthetic, I'd encourage the authors to show the results of their functional connectivity specificity using the same or similar image processing protocol as described in that consensus paper. This is especially true since the fMRI ICs in Figure 2A appear fairly restricted.

      I'd also be interested to read more about why the cerebellum was chosen as a reference region, given that serotonin is highly expressed in the cerebellum, and what effects the choice of reference region has on their quantification.

      The PET ICs appear less bilateral than the fMRI ICs. Is that simply a thresholding artefact or is it a real signal?

      "The data will be made available upon reasonable request" is not sufficient - please deposit the data in an open repository and link to its location.

    4. Reviewer #2 (Public Review):

      Summary:

      The article aims to describe a novel methodology for the study of brain organization, in comparison to fMRI functional connectivity, under rest vs. controlled pharmacological stimulation.

      Strengths:

      Solid study design with pharmacological stimulation applied to assess the biological significance of functional and (novel) molecular connectivity estimates.

      Provides relevant information on the multivariate organization of serotoninergic system in the brain.

      Provides relevant information on the sensitivity of traditional (univariate PET analysis, fMRI functional connectivity) and novel (molecular connectivity) methods in measuring pharmacological effects on brain function.

      Weaknesses:

      While the study protocol is referenced in the paper, it would be useful to at least report whether the study uses bolus, constant infusion, or a combination of the two and the duration of the frames chosen for reconstruction. Minimal details on anesthesia should also be reported, clarifying whether an interaction between the pharmacological agent for anesthesia and MDMA can be expected (whole-brain or in specific regions).

      Some terminology is used in a somewhat unclear way. For example, "seed-based" usually refers to seed-to-voxel rather than ROI-to-ROI analysis, and it is confusing to have IC1 called the SERT network when in fact all ICs derived from DASB data are SERT networks. Perhaps a different wording could be used (IC1 = SERT xxxxx network; IC2 = SERT salience network).

      The limited sample size for the rats undergoing pharmacological stimulation might make the study (potentially) not particularly powerful. This would not be a problem if the observed MDMA effect is particularly consistent across rats. Information on inter-individual variability of FC, MC, and BPND could be provided in this regard.

    1. Reviewer #1 (Public Review):

      Little is known about the local circuit mechanisms in the preoptic area (POA) that regulate body temperature. This carefully executed study investigates the role of GABAergic interneurons in the POA that express neurotensin (NTS). The principal finding is that GABA-release from these cells inhibits neighboring neurons, including warm-activated PACAP neurons, thereby promoting hyperthermia, whereas NTS released from these cells has the opposite effect, causing a delayed activation and hypothermia. This is shown through an elegant series of experiments that include slice recordings alongside matched in vivo functional manipulations. The roles of the two neurotransmitters are distinguished using a cell-type-specific knockout of Vgat as well as pharmacology to block GABA and NTS receptors. Overall, this is an excellent study that is noteworthy for revealing local circuit mechanisms in the POA that control body temperature and also for highlighting how amino acid neurotransmitters and neuropeptides released from the same cell can have opposing physiologic effects.

    1. eLife assessment

      Through cellular, developmental, and physiological analysis, this valuable study identifies a gene that functions to regulate the relative growth of roots and shoots under salt stress. The holistic approach taken provides solid evidence that this gene, a member of a larger tandemly duplicated gene family initially highlighted by association mapping, as well as an upstream regulator contribute to salt tolerance. More robust statistical or biological support for some conclusions could further strengthen this manuscript. The manuscript will be of interest to plant biologists studying mechanisms of abiotic stress tolerance and gene family evolution.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root:shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and the development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Strengths:

      The holistic approach and integrative methodologies presented in the manuscript are essential for gaining a mechanistic understanding of a complex trait such as salt tolerance. The authors focused on At3g50160 but included in their analyses additional DUF247 paralogs, which further contributes to the strength of their approach. In addition, the authors considered the developmental stage (young seedlings, early or late vegetative stages) and growth conditions of the plants (agar plates or soil) when investigating the role of SR3G in salt tolerance and root or shoot development.

      Weaknesses:

      The authors' claims and interpretation of the results are not fully supported by the data and analyses. In several cases, the authors report differences that are not statistically significant (e.g., Figures 4A, 7C, 8B, S14, S16B, S17C), use inappropriate statistical tests (e.g., t-test instead of Dunnett Test/ANOVA as in Figures 10B-C, S19-23), present standard errors that do not seem to be consistent with the post-hoc Tukey HSD Test (e.g., Figures 4, 9B-C, S16B), or lack controls (e.g., Figure 5C-E, staining of the truncated versions with FM4-64 is missing).

      In other cases, traits of root system architecture and expression patterns are inconsistent between different assays despite similar growth conditions (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B), or T-DNA insertion alleles of WRKY75 that are claimed to be loss-of-function show comparable expression of WRKY75 as WT plants. Additionally, several supplemental figures are mislabeled (Figures S6-9), and some figure panels are missing (e.g., Figures S16C and S17E).

      Consequently, the authors' decisions regarding subsequent functional assays, as well as major conclusions about gene function, including SR3G function in root system architecture, involvement in root suberization, and regulation of cellular damage are incomplete.

    3. Reviewer #2 (Public Review):

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity, and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study that demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      The abstract and beginning of the Discussion section highlight the "new tool" developed here for measuring biomass accumulation. I feel that this distracts from the central aims of the study, which is really about the role of a specific gene in root development under salt stress. I would suggest moving the tool description to less prominent parts of the manuscript.

    1. eLife assessment

      This useful study presents a real-time transcriptomics analysis, with the aim of providing rapid access to sequenced data to reduce the costs associated with Oxford Nanopore long-read technology. Although the authors illustrate the compelling utility of this approach with three diverse experimental setups, issues with study design and analysis result in incomplete supporting evidence.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors developed three case studies: (1) transcriptome profiling of two human cell cultures (HEK293 and HeLa), (2) identification of experimentally enriched transcripts in cell culture (RiboMinus and RiboPlus treatments), and (3) identification of experimentally manipulated genes in yeast strains (gene knockouts or strains transformed with plasmids containing the deleted gene for overexpression). Sequencing was performed using the Oxford Nanopore Technologies (ONT), the only technology that allows for real-time analysis. The real-time transcriptomic analysis was performed using NanopoReaTA, a recent toolbox for comparative transcriptional analyses of Nanopore-seq data, developed by the group (Wierczeiko and Pastore et al. 2023). The authors aimed to show the use of the tool developed by them in data generated by ONT, evidencing the versatility of the tool and the possibility of cost reduction since the sequencing by ONT can be stopped at any time since enough data were collected.

      Strengths:

      Given that Oxford Nanopore Technologies offers real-time sequencing, it is extremely useful to develop tools that allow real-time data analysis in parallel with data generation. The authors demonstrated that this strategy is possible for both human cell lines and yeasts in the case studies presented. It is a useful strategy for the scientific community and it has the potential to be integrated into clinical applications for rapid and cost-effective quality checks in specific experiments such as overexpression of genes.

      Weaknesses:

      In relation to the RNA-Seq analyses, a greater number of replicates would have been needed for a proper statistical analysis. The experiments were conducted with a minimal number of replicates (2 replicates for case studies 1 and 2, and 3 replicates for case study 3).

      Regarding the experimental part, some problems were observed in the conversion to double-stranded and loading for Nanopore-Seq, which were detailed in Supplementary Material 2. This fact is probably reflected in the results where a reduction in the overall sequencing throughput and detected gene number for HEK293 compared to HeLa were observed (data presented in Supplementary Figure 2). It is necessary to use similar quantities of RNA/cDNA since the sequencing occurs in real-time. The authors should have standardized the experimental conditions to proceed with the sequencing and perform the analyses.

    3. Reviewer #2 (Public Review):

      Summary:

      Transcriptomics technologies play important roles in biological studies. Technologies based on second-generation sequencing, such as mRNA-seq, face some serious obstacles, including isoform analysis, due to short read length. Third-generation sequencing technologies perfectly solve these problems by having long reads, but they are much more expensive. The authors presented a useful real-time strategy to minimize the cost of sequencing with Oxford Nanopore Technologies (ONT). The authors performed three sets of experiments to illustrate the utility of the real-time strategy. However, due to the problems in experimental design and analysis, their aims are not completely achieved. If the authors can significantly improve the experiments and analysis, the strategy they proposed will guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way and help studies in both basic research and clinical applications.

      Strengths:

      The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequenced with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that the sequencing can be stopped once enough data is collected to save cost. Here, they described three sets of experiments: a comparison between two human cell lines, a comparison among RNA preparation procedures, and a comparison between genetically modified yeasts. Their results show that the real-time strategy works for different species and different RNA preparation methods.

      Weaknesses:

      However, especially considering that the computational tool NanopoReaTA is their previous work, the authors should present more helpful guidelines to perform real-time ONT analysis and more advanced analysis methods. There are four major weaknesses:

      (1) For all three sets of experiments, the authors focused on sample clustering and gene-level differential expression analysis (DEA), performed little analysis at the isoform level, and showed none of it in the main-text figures. Sample clustering and gene-level DEA can be done easily and well using mRNA-seq at a much lower cost. Even for initial data quality checking, mRNA-seq can first be run on an Illumina MiSeq/NextSeq, which is quick, before deep sequencing on a HiSeq/NovaSeq. The real power of third-generation RNA sequencing is isoform analysis, which the long read length makes possible. At least for now, PacBio Iso-seq is very expensive and one cannot analyze the data in real time. Thus, the authors should focus on the real-time isoform analysis of ONT to show the advantages.

      (2) The sample sizes are too small in all three sets of experiments: only two for sets 1 and 2, and three for set 3. For DEA, three is the minimal number for proper statistics, but a sample size of three always leads to very poor power. Nowadays, a proper transcriptomics study usually has a larger sample size. Besides the power issue, biological samples always contain many outliers for many reasons. It is crucial to show whether the real-time analysis also works for larger sample sizes, such as 10 per group, i.e., 20 samples in total. Will the performance still hold as the sample number increases? What is the maximum sample number for an ONT run? If the samples need to be split into multiple runs, how will the real-time analysis be adjusted? These questions are quite useful for researchers who plan to use ONT.

      (3) According to the manuscript, real-time analysis checks the sequencing data at a few time points. In statistics this is usually called sequential analysis or interim analysis, and it is commonly performed in clinical trials to save cost. Care must be taken while performing these analyses, as repeated checks on the data can inflate the type I error rate. Thus, the authors should develop a sequential analysis procedure for real-time RNA sequencing.

      (4) The experimental set 1 (comparison between two completely different human cell lines) and experimental set 2 (comparison among RNA preparation procedures) are not quite biologically meaningful. If it is possible, it is better for the authors to perform an experiment more similar to a real situation for biological discovery. Then the manuscript can attract more researchers to follow its guidelines.
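
      The concern in point (3) about repeated interim checks can be illustrated with a small calculation. The sketch below is not part of the manuscript or of NanopoReaTA; the function names are hypothetical. It assumes, for simplicity, that the looks are statistically independent, which gives an upper-end figure: looks at accumulating data are positively correlated, so the real inflation is smaller but still above the nominal level. It also shows the conservative Bonferroni per-look threshold as one simple correction.

```python
def familywise_error(alpha: float, looks: int) -> float:
    """Chance of at least one false positive over `looks` tests at level `alpha`.

    Assumes independent looks (an illustrative simplification).
    """
    return 1.0 - (1.0 - alpha) ** looks


def bonferroni_threshold(alpha: float, looks: int) -> float:
    """Per-look significance level that keeps the overall error at or below `alpha`."""
    return alpha / looks


if __name__ == "__main__":
    nominal = 0.05
    for k in (1, 2, 5, 10):
        inflated = familywise_error(nominal, k)
        per_look = bonferroni_threshold(nominal, k)
        print(f"looks={k:2d}  uncorrected error={inflated:.3f}  per-look level={per_look:.4f}")
```

      Group-sequential designs with alpha-spending boundaries are less conservative than Bonferroni, but they require the number and timing of looks to be planned in advance.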

    1. Author response:

      Reviewer #1 (Public Review):  

      Weaknesses:  

      The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, Kumar et al., work is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. Moreover, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.    

      An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs. 

      We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”.  TSA-seq is robust because it is based only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide-free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq.  This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq  and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq and the validation by light microscopy of the predictions of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).

      Reviewer 1 raised concerns regarding this FISH validation given that the HFF TSA-seq and DamID data was compared to IMR90 MERFISH measurements.  The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines.  We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy for our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to nuclear lamina, nucleoli, and nuclear speckles (Author response image 1).  Comparing HFF SON-TSA-seq data with published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006), the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the IMR90 SON TSA-seq versus MERFISH scatterplot.  We acknowledge the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate the conclusions drawn from such comparisons are solid and will only become that much stronger with future comparisons within the same cell line.

      Author response image 1.

      Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).

      In our revision, we will add justification of the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through comparison of available data sets. 

      Reviewer #2 (Public Review):  

      Weaknesses:  

      The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.

      We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies.  Our Center focused on the mapping of the genome relative to different nuclear locales and the correlation of this intranuclear positioning of the genome with functions- specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven versus hypothesis-driven scientific approach.  Our question fundamentally was whether we could gain new insights into nuclear genome organization through the integration of genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and their correlation with functional assays such as RNA-seq and Repli-seq.

      Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure.  We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology.  We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships. 

      Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included. 

      As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.

      Specific Comments in response to Reviewer 1:

      (1)  We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods.  In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and the orthogonal genomic method of lamin B1 DamID, which is reproduced using our new TSA-seq 2.0 protocol in this manuscript.  Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed a general agreement between the identification of NADs by nucleolar TSA-seq and the orthogonal genomic method of NAD-seq.  (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) Additionally, we also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq.  Additionally, the A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID.  More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.

      (2)  In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH.  Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.

      Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm.  First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids).  Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads.  Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.

      (3)  The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach.  Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP peroxidase.  This is fundamentally different from the significant dependence on antibodies and choice of marker proteins for molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag which depend on molecular proximity for labeling and/or pulldown of DNA.

      This robustness leads to specific predictions.  First, it predicts similar TSA-seq signals will be produced using antibodies against different marker proteins against the same nuclear compartment.  This is because the exponential decay constant (distance at which the signal drops by one half) for the spreading of the TSA is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions.  Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals.  Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins showed similar TSA-seq signals, with the highest correlation coefficients for the TSA-seq signals produced by the antibodies against two GC nucleolar marker proteins and the TSA-seq signals produced by the antibodies against two FC/DFC nucleolar marker proteins.

      Author response image 2.

      Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH.  The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116).  Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP.  Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.

      Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli.  We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.

      (4)  After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data is dependent on the degree to which the nuclear genome organization is similar between IMR90 and HFF fibroblasts.  However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (see Author response image 1).  With regard to SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 for IMR90 TSA-seq versus 0.86 for HFF TSA-seq).  Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 2).  We believe these correlations can be considered a lower bound on the actual correlations between the FISH distances and TSA-seq that we would have observed if we had performed both assays on the same cell line.

      (5)  Currently, we still require tens of millions of cells to perform each TSA-seq assay.  This requires significant expansion of cells and a resulting increase in passage numbers of the IMR90 cells before we can perform the TSA-seq. During this expansion we observe a noticeable slowing of the IMR90 cell growth as expected for secondary cell lines as we approach the Hayflick limit.  We still do not know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (ie percentage of cycling versus quiescent cells) and cell age.  Thus, even if we performed TSA-seq on IMR90 cells we would be comparing MERFISH from lower passages with a higher percentage of actively proliferating cells with TSA-seq from higher passages with a higher percentage of quiescent cells. 

      We are currently working on a new TSA-seq protocol that will work with thousands of cells.  We believe it is a better investment of time and resources to wait until this new protocol is optimized before we repeat TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data. 

      Specific Comments in response to Reviewer 2:

      (1)  As we acknowledge in our Response summary, we were limited in the degree to which we could actually follow-up our findings with experiments designed to test specific hypotheses generated by our data.  However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing.  This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina.  Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells.  Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      In response to Reviewer 2, we are modifying the text to make this clearer and to explicitly describe the hypothesis we were testing: that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression. To test this hypothesis, we used a double knockout (DKO) of LMNA and LBR to change distances relative to the nuclear lamina and then examined the effect on gene expression.

    1. eLife assessment

      This study develops a useful metric for quantifying codon usage adaptation - the Codon Adaptation Index of Species (CAIS). This metric permits direct comparisons of the strength of selection at the molecular level across species. The study is based on solid evidence, and the authors identify relationships between CAIS and the presence of disordered protein domains. Other correlations, such as the one between CAIS and body size, are weak and non-significant. In summary, the study introduces an interesting new approach to quantifying codon usage across species, which may be helpful in attempts to measure selection at the molecular level.

    2. Reviewer #2 (Public Review):

      Assessment

      This study develops a potentially useful metric for quantifying codon usage adaptation – the Codon Adaptation Index of Species (CAIS) – that is intended to allow for more direct comparisons of the strength of selection at the molecular level across species by controlling for interspecies variation in amino acid usage and GC content. As evidence to support their claim that CAIS better controls for GC content and amino acid usage across species, they note that CAIS has only a weak positive correlation with GC% (that does not stand up to multiple hypothesis testing correction) while CAI has a clear negative correlation with GC%. Using CAIS, they find better adapted species have more disordered protein domains; however, excitement about these findings is dampened because (1) this result is also observed using the effective number of codons (ENC) and (2) there are concerns over the interpretation of CAIS as a proxy for the effectiveness of selection.

      Public Review

      Summary

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that attempts to control for differences in amino acid usage and GC% across species. Using their new metric, the authors observe a positive relationship between CAIS and the overall “disorderedness” of a species' protein domains. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection sNe when mutation bias changes across species.

      Strengths

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance.

      (2) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. A significant improvement over the previous version is the implementation of a software tool for applying this method.

      (3) The authors do a better job of putting their results in the context of the underlying theory of CAIS compared to the previous version.

      (4) The paper is generally well-written.

      Weaknesses

      (1) The previously observed correlation between CAIS and body size was due to a bug when calculating phylogenetic independent contrasts. I commend the authors for acknowledging this mistake and updating the manuscript accordingly. I feel that the unobserved correlation between CAIS and body size should remain in the final version of the manuscript. Although it is disappointing that it is not statistically significant, the corrected results are consistent with previous findings (Kessler and Dean 2014).

      (2) I thank the authors for providing a more detailed explanation of the model's theoretical basis. However, I remain skeptical that shifts in CAIS across species indicate shifts in the strength of selection. I am leaving the math from my previous review here for completeness.

      As in my previous review, let’s take a closer look at the ratio of observed codon frequencies vs. expected codon frequencies under mutation alone, which was previously notated as RSCUS in the original formulation. In this review, I will keep using the RSCUS notation, even though it has been dropped from the updated version. The key point is this is the ratio of observed and expected codon frequencies. If this ratio is 1 for all codons, then CAIS would be 0 based on equation 7 in the manuscript – consistent with the complete absence of selection on codon usage. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.

      I think what the authors are attempting to do is “divide out” the effects of mutation bias (as given by Ei), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represents adaptive codon usage. Consider Gilchrist et al. GBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with Na synonymous codons is

      where ∆M is the mutation bias, ∆η is the strength of selection scaled by the strength of drift, and φg is the gene expression level of gene g. In this case, ∆M and ∆η reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which ∆M,∆η = 0. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), the Ei,g could be considered the expected observed frequency of codon i in gene g, E[Oi,g].

      Let’s re-write the  in the form of Gilchrist et al., such that it is a function of mutation bias ∆M. For simplicity we will consider just the two codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the term gr and 1 − gr can be written as

      where µx→y is the mutation rate from nucleotides x to y. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias . This can be expressed in terms of the equilibrium GC content by recognizing that

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process.

      If we do this, then

      Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for ∆η in equation (1).

      We can then calculate the expected RSCUS using equation (1) (using notation E[Oi]) and equation (6) for the two codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon ($\Delta M_{NNG}, \Delta\eta_{NNG} = 0$).

      This shows that the expected value of RSCUS for a two codon amino acid is expected to increase as the strength of selection ∆η increases, which is desired. Note that ∆η in Gilchrist et al. is formulated in terms of selection against a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If ∆η = 0 (i.e. selection does not favor either codon), then E[RSCUS] = 1. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if sNe (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances.

      Consider our 2-codon amino acid scenario. You can see how changing GC content without changing selection can alter the CAIS values calculated from these two codons. Particularly problematic appears to be cases of extreme mutation biases, where CAIS tends toward 0 even for higher absolute values of the selection parameter. Codon usage for the majority of the genome will be primarily determined by mutation biases, with selection being generally strongest in a relatively few highly-expressed genes. Strong enough mutation biases ultimately can overwhelm selection, even in highly-expressed genes, reducing the fraction of sites subject to codon adaptation.
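      The two-codon argument can be checked numerically with a minimal sketch. It assumes the multinomial-logit form commonly used for the selection-mutation-drift equilibrium, with codon weight exp(−∆M − ∆η·φ); the equation itself is not reproduced in this text, so that form is an assumption here. It also uses a Kullback-Leibler divergence between the selection-inclusive and mutation-only frequencies as a stand-in for one amino acid's contribution to CAIS, which may differ from the manuscript's equation 7. The function names are hypothetical.

```python
import math

def codon_freqs(delta_m, delta_eta, phi):
    """Equilibrium frequencies of synonymous codons (assumed logit form).

    delta_m, delta_eta: per-codon mutation bias and selection terms,
    both measured relative to a reference codon whose values are 0.
    phi: expression level of the gene.
    """
    weights = [math.exp(-dm - de * phi) for dm, de in zip(delta_m, delta_eta)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(observed, expected):
    """Kullback-Leibler divergence sum_i O_i * ln(O_i / E_i)."""
    return sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def two_codon_divergence(delta_m, delta_eta, phi=1.0):
    """Divergence of a two-codon amino acid from its mutation-only expectation."""
    # Codon 0 is the reference codon (delta_m = delta_eta = 0).
    observed = codon_freqs([0.0, delta_m], [0.0, delta_eta], phi)
    # Expected frequencies under mutation alone: same delta_m, no selection.
    expected = codon_freqs([0.0, delta_m], [0.0, 0.0], phi)
    return kl_divergence(observed, expected)

if __name__ == "__main__":
    # Selection is held fixed; only the mutation bias changes.
    for dm in (-3.0, 0.0, 3.0):
        print(dm, two_codon_divergence(dm, delta_eta=-1.0))
```

      With selection held fixed, the divergence is largest when mutation is unbiased and shrinks as the mutation bias becomes extreme in either direction, which is the pattern described above.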

      Peer review images 1 to 4. Panel titles: CAIS (Low Expression), CAIS (Average Expression), CAIS (High Expression).

      If we treat the expected codon frequencies as genome-wide frequencies, then we are basically assuming this genome is made up entirely of a single 2-codon amino acid with selection on codon usage being uniform across all genes. This is obviously not true, but I think it shows some of the potential limitations of the CAIS approach. Based on these simulations, CAIS seems best employed under specific scenarios. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content around 0.41, so I suspect their results are okay (assuming things like GC-biased gene conversion are not an issue). Outliers in GC content probably are best excluded from the analysis.

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. One potential challenge to CAIS is the non-monotonic changes in codon frequencies observed in some species (again, see Shah and Gilchrist 2011 and Gilchrist et al. 2015).

    3. Author response:

      The following is the authors’ response to the original reviews.

      In addition to our responses to reviewer suggestions below, a minor bug in the calculation of CAIS was brought to our attention by a reader of our preprint. We have corrected this bug and rerun analyses, whose results became slightly stronger as noise was removed. While we were doing that, someone pointed out to us that our equations were almost the same as Kullback-Leibler divergence, which explains why our metric performed so well. We have made the numerically trivial (see before vs. after figure below) mathematical change to use Kullback-Leibler divergence instead, and now have a better story, with a solid basis in information theory, as to why CAIS works.

      Author response image 1.
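      For reference, the Kullback-Leibler divergence between an observed distribution O and an expected distribution E over the same outcomes has the generic textbook form below. Reading the outcomes as codons i, with O_i the observed and E_i the GC-based expected frequency, is an assumption for illustration; how the manuscript sums or weights across amino acids is not shown in this text.

$$
D_{\mathrm{KL}}(O \,\|\, E) = \sum_i O_i \ln \frac{O_i}{E_i}
$$

      This quantity is non-negative and equals zero only when O_i = E_i for every i.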

      Unfortunately, we discovered a second bug that caused our PIC correction code to fail to perform the needed correction for phylogenetic confounding. The previously reported correlation of CAIS (or ENC) with body mass no longer survives PIC-correction. We have therefore removed this analysis from the manuscript. Our story now rests more on the theoretical basis of CAIS and ENC, and less on post facto validation, than it did previously. We now also present CAIS and ENC on a more equal footing. ENC results are slightly stronger, while CAIS has the complementary advantage of correcting for amino acid frequencies.

      The work involved in these changes, as well as some of the responses to reviews below, justifies changing the second author into a co-first author, and adding an additional coauthor (Hanon McShea) who discovered the second bug.

      Reviewer #1 (Public Review): 

      In this manuscript, the authors propose a new codon adaptation metric, Codon Adaptation Index of Species (CAIS), which they present as an easily obtainable proxy for effective population size. To permit between-species comparisons, they control for both amino acid frequencies and genomic GC content, which distinguishes their approach from existing ones. Having confirmed that CAIS negatively correlates with vertebrate body mass, as would be expected if small-bodied species with larger effective populations experience more efficient selection on codon usage, they then examine the relationship between CAIS and intrinsic structural disorder in proteins. 

      The idea of a robust species-level measure of codon adaptation is interesting. If CAIS is indeed a reliable proxy for the effectiveness of selection, it could be useful to analyze species without reliable life history- or mutation rate data (which will apply to many of the genomes becoming available in the near future). 

      A key question is whether CAIS, in fact, measures adaptation at the codon level. Unfortunately, CAIS is only validated indirectly by confirming a negative correlation with body mass. As a result, the observations about structural disorder are difficult to evaluate. 

      As discussed in the preamble above, we have replaced the body mass validation with a stronger theoretical basis in information theory.

      A potential problem is that differences in GC between species are not independent of life history. Effective population size can drive compositional differences due to the effects of GC-biased gene conversion (gBGC). As noted by Galtier et al. (2018), genomic GC correlates negatively with body mass in mammals and birds. It would therefore be important to examine how gBGC might affect CAIS, and to what extent it could explain the relationship between CAIS and body mass. 

      Suppose that gBGC drives an increase in GC that is most pronounced at 3rd codon positions in high-recombination regions in small-bodied species. In this case, could observed codon usage depart more strongly from expectations calculated from overall genomic GC in small vertebrates compared to large ones? The authors also report that correcting for local intergenic GC was unsuccessful, based on the lack of a significant negative relationship with body mass (Figure 3D). In principle, this could also be consistent with local GC providing a relatively more appropriate baseline in regions with high recombination rates. Considering these scenarios would clarify what exactly CAIS is capturing. 

      Figure 3 (previously Supplementary Figures S5A and S5B) shows that CAIS is negligibly correlated with %GC (not robust to multiple comparisons correction), and ENC not at all. We believe this is evidence against the possibility brought up by the reviewer, i.e. that Ne might affect gBGC (and hence global %GC). This relationship, if present, could act as a confounding effect, but it is not present within our species dataset. 

      Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that non-selective forces, including gBGC as well as conventional mutation biases, vary among species, and that they rather than selection determine each species’ genome-wide %GC. By correcting for genome-wide %GC, CAIS and ENC correct for both mutation bias and gBGC, in order to isolate the effects of selection.
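      To make the idea of a genomic-GC-based expectation concrete, here is a minimal sketch. It assumes each nucleotide of a codon is drawn independently, with G and C each at probability gc/2 and A and T each at (1 − gc)/2, and then normalizes among the synonymous codons of one amino acid. This is an illustrative assumption rather than the manuscript's exact formula, and the function names are hypothetical.

```python
def codon_probability(codon, gc):
    """Probability of a codon if each base is drawn independently from GC content."""
    p = 1.0
    for base in codon:
        # G and C share the GC fraction equally; A and T share the remainder.
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p

def expected_synonymous_freqs(codons, gc):
    """Expected relative frequencies among one amino acid's synonymous codons."""
    raw = {codon: codon_probability(codon, gc) for codon in codons}
    total = sum(raw.values())
    return {codon: value / total for codon, value in raw.items()}

if __name__ == "__main__":
    # Lysine has two codons that differ only at the third position.
    print(expected_synonymous_freqs(["AAA", "AAG"], gc=0.41))
```

      Under this assumption, for a two-codon amino acid whose codons differ only by A versus G at the third position, the expected frequency of the G-ending codon is simply the genome-wide GC fraction.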

      This argument, based on an average genomic region, is vulnerable to gene-rich genomic regions having differentially higher recombination rates and hence GC-biased gene conversion. However, we do not see the expected positive correlation between |local GC - global GC| and CAIS (see new Figure 5), again suggesting that gene conversion strength is not a confounding factor acting on CAIS.

      Given claims about "exquisitely adapted species", the case for using CAIS as a measure of codon adaptation would also be stronger if a relationship with gene expression could be demonstrated. RSCU is expected to be higher in highly expressed genes. Is there any evidence that the equivalent GC-controlled measure behaves similarly? 

      Correlations with gene expression are outside the scope of the current work, which is focused on producing and exploiting a single value of codon adaptation per species. It is indeed possible that our general approach of using Kullback-Leibler divergence to correct for genomic %GC could be useful in future work investigating differences among genes.  

      The manuscript is overall easy to follow, though some additional context may be helpful for the general reader. A more detailed discussion of how this work compares to the approach taken by Galtier et al. (2018), which accounted for GC content and gBGC when examining codon preferences, would be appropriate, for example. In addition, it would have been useful to mention past work that has attempted to explicitly quantify selection on codon usage. 

      One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences as a function of species. Our approach might therefore be robust to scenarios where different genes have different codon preferences (see Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne.

      Reviewer #2 (Public Review): 

      ## Summary 

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection $sN_e$ when the mutation bias changes across species.  

      ## Strengths 

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this). 

      We now cite Cope et al. as an example of how amino acid composition can act as a confounding factor.

      (2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected. 

      Unfortunately, our previous PIC correction code was buggy, and in fact the relationship with body size does not survive PIC correction (although it is strong prior to PIC correction). We have therefore removed it from the paper. However, the more novel result on protein disorder remains strong.

      (3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. 

      ## Weaknesses 

      (1) The main weakness of this work is that it lacks simulated data to confirm that it works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, a comparison to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s $S$ statistic, which is a function of tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if results are similar to $S$, CAIS has a noted advantage that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences. 

      The main limitation of dos Reis’s test in our view is that, like the better versions of CAI, it requires comparable orthologs across species. See also the discussion below re the benefits of proteome-wide approach. We now also note the advantage of not needing tRNA gene copy numbers and abundances. 

      Simulated datasets would be great, but we think it a nice addition rather than must-have, in particular because we are skeptical about whether our understanding of all relevant processes is good enough such that simulations would add much to our more heuristic argument along the lines of Figure 2. E.g. the complications of Gingold et al. 2014 cited above are pertinent, but incorporating them would make simulations quite involved. Instead, we now have a stronger theoretical justification for CAIS grounded in information theory. We have significantly expanded discussion of Figure 2 to give a clearer idea of the conceptual underpinnings of CAIS and ENC.

      The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. higher $N_e$ results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?" 

      Let's take a closer look at the formulation for RSCUS. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.

      I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by $E_i$), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with Na synonymous codons is

      where ∆M is the mutation bias, ∆η is the strength of selection scaled by the strength of drift, and φg is the gene expression level of gene g. In this case, ∆M and ∆η reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which ∆M,∆η = 0. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), the Ei,g could be considered the expected observed frequency of codon i in gene g, E[Oi,g].

      Let’s re-write the  in the form of Gilchrist et al., such that it is a function of mutation bias ∆M. For simplicity we will consider just the two codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the term gr and 1 − gr can be written as

      where µx→y is the mutation rate from nucleotides x to y. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias .This can be expressed in terms of the equilibrium GC content by recognizing that

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process. 

      If we do this, then 

      Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for ∆η in equation (1).

      We can then calculate the expected RSCUS using equation (1) (using notation E[Oi]) and equation (6) for the two codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon ($\Delta M_{NNG}, \Delta\eta_{NNG} = 0$).

      This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection $\Delta\eta$ increases, which is desired. Note that $\Delta\eta$ in Gilchrist et al. is formulated in terms of selection *against* a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If $\Delta\eta = 0$ (i.e. selection does not favor either codon), then $E[RSCUS] = 1$. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if $sN_e$ (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay. 

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. 

      We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. While we keep our more heuristic presentation, our revised manuscript now more clearly acknowledges that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. We believe our approach worked despite this because the phenomenon is driven by what is shown in Figure 2: Ne makes a difference by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene. We have made multiple changes to the text to make this point clearer.

      Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method. 

      Genome-wide %GC values are hard-coded because they were taken from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The more complicated code used to calculate the intergenic %GC, and the code used to calculate amino acid frequencies, are located at https://github.com/MaselLab/CodonAdaptation-Index-of-Species. Luckily, someone else just wrote a simpler end-to-end pipeline for us, on the basis of our preprint. We now note this in the Acknowledgements, and link to it: https://github.com/gavinmdouglas/handy_pop_gen/blob/main/CAIS.py.

    1. eLife assessment

      This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.

    2. Reviewer #1 (Public Review):

      In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system.

      Comments on revised version:

      I have read through the responses and the revised manuscript. I think together this results in an improved version.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.

      Strengths:

      Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.

      Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.

      Comments on revised version:

      The manuscript has been improved after revisions. The methods, data and analyses broadly support the claims with only minor weaknesses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.

      We thank the reviewers and the editor for their effort and expertise in reviewing our manuscript. We have made changes based on the reviews and believe this has greatly strengthened our manuscript. We appreciate their insightful comments and suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system, however falls short in some technical aspects of the bioinformatic analyses of the generated sequence data.

      (1) Four sequencing libraries were generated and then merged for analysis, however, the authors fail to document any parameters that would indicate that the clustering does not suffer from any batch effects.

      We thank the reviewer for this comment which has given us the opportunity to elaborate on this interesting point. Consequently, we have added evidence to show that the data do not suffer from batch effects between samples (e.g. between sorted samples 1 and 4, and unsorted samples 2 and 3). We now show that there are contributions to all clusters from sorted and unsorted samples and highlight the benefits to using both conditions in a cell atlas with unknown cell types.

      Accordingly, we have now added the following paragraph to line 153:

      There were contributions from sorted and unsorted samples in almost all clusters (except ciliary plates). We found that some cell/tissue types had similar recovery from both methods (e.g. Stem A, Muscle 2, and Tegument), others were preferentially recovered by sorting (e.g. Neuron 1, Neuron 4, and Stem E), and some were depleted by sorting (e.g. Parenchyma 1, Protonephridia, and Ciliary plates) (Supplementary Figure 1, Supplementary Table 4). This variation in recovery, therefore, enabled us to maximise the discovery and inclusion of different cell types in the atlas.

      We have now added a Supplementary Figure 1 showing the contribution of sorted and unsorted cells to the Seurat clusters. We have also included a Supplementary Table 4 detailing the cell number contribution for both conditions and the percentages in order to easily compare differential recovery between cell types.

      These are added to the manuscript.
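      A per-cluster contribution table of the kind described above could be tabulated along the following lines. This is a minimal sketch, not the authors' code: the function names and the toy cell counts are invented for illustration, and the only assumption is that each cell carries a cluster label and a sorted/unsorted label.

```python
from collections import Counter, defaultdict


def contribution_by_condition(cells):
    """Tabulate how many cells each condition contributes to each cluster.

    cells: iterable of (cluster, condition) pairs, one pair per cell.
    Returns {cluster: {condition: (count, percent_of_cluster)}}.
    """
    counts = defaultdict(Counter)
    for cluster, condition in cells:
        counts[cluster][condition] += 1

    table = {}
    for cluster, per_condition in counts.items():
        total = sum(per_condition.values())
        table[cluster] = {
            condition: (n, 100.0 * n / total)
            for condition, n in per_condition.items()
        }
    return table


def clusters_missing_condition(table, conditions):
    """Return clusters that received no cells from at least one condition."""
    return sorted(
        cluster
        for cluster, row in table.items()
        if any(condition not in row for condition in conditions)
    )


# Hypothetical toy input: these counts are NOT from the study.
toy_cells = (
    [("Stem A", "sorted")] * 3
    + [("Stem A", "unsorted")] * 1
    + [("Ciliary plates", "unsorted")] * 2
)
toy_table = contribution_by_condition(toy_cells)
toy_missing = clusters_missing_condition(toy_table, ["sorted", "unsorted"])
```

      In this toy input, one cluster draws cells from both conditions while the other is recovered from a single condition only, which is the pattern a batch-effect check of this kind is meant to surface.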

      (2) Additionally, the authors switch between analysis platforms without a clear motivation or explanation of what the fundamental differences between these platforms are. While in theory, any biologically robust observation should be recoverable from any permutation of analysis parameters, it has been recently documented that the two popular analysis platforms (Seurat in R and Scanpy in Python) indeed do things slightly differently and can give different results (https://www.biorxiv.org/content/10.1101/2024.04.04.588111v1). For this reason, I don't think that one can claim that Seurat fails to find clusters resolved by SAM without running a similar pipeline on the cluster alone as was done with SAM/Scanpy here. The manuscript itself needs to be checked carefully for misleading statements in this regard.

      We thank the reviewer for this comment and agree that it’s important to increase the clarity on this matter. We have added additional detail to explain that results of subclustering Neuron 1 using Seurat and SAM/ScanPy were broadly similar, but that we presented the results from the SAM/ScanPy analysis due to the strengths of SAM in detecting small differences in gene expression (Tarashansky et al., 2019 PMID: 31524596). We have included here the UMAP showing subclustering of Neuron 1 in Seurat for comparison.

      Author response image 1.

      UMAP showing subclustering of Neuron 1 cluster in Seurat (SCT normalisation, PC = 19, resolution = 0.3).

      We’ve added this additional text to the ‘Neuron abundance and diversity’ section on line 220:

      We explored whether Neuron 1 could be further subdivided into transcriptionally distinct cells by subclustering (Supplementary Figure 2; Supplementary Table 6) using the self-assembling manifold (SAM) algorithm (Tarashansky et al., 2019) with ScanPy (Wolf et al., 2018), given its reported strength in discerning subtle variation in gene expression (Tarashansky et al., 2019), although a similar topology was subsequently found using Seurat.

      (3) Similarly, the manuscript contains many statements regarding clusters being 'connected to', or forming a 'bridge' on the UMAP projection. One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (see Chari and Pachter 2023). To support these types of interpretations, the authors should provide evidence of gene expression transitions that support connectivity as well as stability estimates of such connections under different parameter conditions. Otherwise, these descriptors hold little value and should be dropped and the transcriptomic states simply defined as clusters with no reference to their positions on the UMAP.

      We thank the reviewer for this thoughtful comment. We agree and have rephrased those statements accordingly e.g. line numbers 218, 439, 543, and 557.

      (4) The underlying support for the clusters as transcriptomically unique identities is not well supported by the dot plots provided. The authors used very permissive parameters to generate marker lists, which hampers the identification of highly specific marker genes. This permissive approach can allow for extensive lists of upregulated genes for input into STRING/GO analyses, this is less useful for evaluating the robustness of the cluster states. Running the Seurat::FindAllMarkers with more stringent parameters would give a more selective set of genes to display and thereby increase the confidence in the reader as to the validity of profiles selected as being transcriptomically unique.

      The Reviewer is correct in noting that we used a permissive approach to enable a better understanding of the biology of each cluster, based on analysing enriched functions. However, we disagree about the suitability of the approach for finding markers. First, the permissive approach produced longer candidate lists, but those with the best AUC scores for each cluster are at the top of the list for each cluster. Second, some of the markers with lower expression also revealed interesting biology (e.g. Notum in the muscles). Furthermore, we used filtering on the marker genes lists to increase the minimum marker gene scores for analyses such as the GO analyses (details in the GO section of the methods). It’s important to stress that our approach also utilised validation by FISH for top marker genes, as well as biologically informative genes that were lower down the marker gene list.
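      The point about AUC-ranked marker lists can be illustrated with a small sketch. The exact marker-calling settings are not given in this response, so the code below is a generic illustration rather than the authors' pipeline: it computes, for each gene, the probability that a randomly chosen in-cluster cell has higher expression than a randomly chosen out-of-cluster cell (the area under the ROC curve), then ranks genes by that score. The function names, the `min_auc` cut-off and the toy expression values are all hypothetical.

```python
def marker_auc(in_cluster, out_cluster):
    """AUC of a gene as a classifier of cluster membership.

    Equals the fraction of (in-cluster, out-of-cluster) cell pairs in which
    the in-cluster cell has the higher expression value; ties count as 0.5.
    """
    wins = 0.0
    for x in in_cluster:
        for y in out_cluster:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(in_cluster) * len(out_cluster))


def rank_markers(expression_by_gene, min_auc=0.7):
    """Rank genes by AUC (best first), keeping those with AUC >= min_auc.

    expression_by_gene: {gene: (in_cluster_values, out_cluster_values)}.
    min_auc is an illustrative threshold, not one taken from the manuscript.
    """
    scored = [
        (gene, marker_auc(inside, outside))
        for gene, (inside, outside) in expression_by_gene.items()
    ]
    kept = [(gene, auc) for gene, auc in scored if auc >= min_auc]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical toy expression values; gene names are placeholders.
toy_expression = {
    "gene_specific": ([5, 6, 7, 8], [0, 0, 1, 0]),
    "gene_moderate": ([0, 2, 3, 4], [0, 1, 1, 5]),
    "gene_flat": ([1, 1, 1, 1], [1, 1, 1, 1]),
}
permissive = rank_markers(toy_expression, min_auc=0.5)
stringent = rank_markers(toy_expression, min_auc=0.7)
```

      With a permissive threshold the list is longer but still ordered with the most discriminating gene first; a stringent threshold simply truncates the same ranking, which is the trade-off discussed in this exchange.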

      (5) Figure 5B shows a UMAP representation of cell positions with a statement that the clustering disappears. As a visual representation of this phenomenon, the UMAP is a very good tool, however, to make this statement you need to re-cluster your data after the removal of this gene set and demonstrate that the data no longer clusters into A/B and C/D.

      We’ve added Supplementary Figure 13 to show that after removing WSR and ZSR genes and reclustering, the data no longer clusters in A/B and C/D, even at a higher resolution where clusters appear oversplit.

      Also, as a reader, these data beg the question: which genes are removed here? Is there an over-representation of any specific 'types' of genes that could lead to any hypotheses of the function? Perhaps the STRING/GO analyses of this gene set could be informative.

      We have performed GO-enrichment analyses on W-specific genes, Z-specific genes and both together compared to the rest of the genome, but we did not find very informative results (see Supplementary Table 13 that we have now added, line 464). This may be due to the large difference in size: there are approx. 900 Z-specific genes (two copies in males, one copy in females), but only approx. 30 W-specific genes, many of which have homologs in the Z-specific region of the genome. Instead we suggest that tissue-specific regulation of gene dosage compensation is the more likely explanation, as reported for other species (Valsecchi et al. 2018).

      (6) How do the proportions of cell types characterized via in situ here compare to the relative proportions of clusters obtained? It does not correspond to the percentages of the clusters captured (although this should be quantified in a similar manner in order to make this comparison direct: 10,686/20,478 = ~50% vs. 7%), how do you interpret this discrepancy? While this is mentioned in the discussion, there is no sufficient postulation as to why you have an overabundance of the stem cells compared to their presence in the tissue. While it is true that you could have a negative selection of some cell types, for example as stated the size of the penetration glands exceeds both that of the 10x capabilities (40 µm), and the 30 µm filters used in the protocol, this does not really address why over half of the captured cells represent 'stem cells'. A more realistic interpretation would be biological rather than merely technical. For example, while the composition of the muscle cells and the number of muscle transcriptomes captured are quite congruent at ~20%, the organism is composed of more than 50% of neurons, but only 15% of the transcriptomic states are assigned to neuronal. Could it be that a large fraction of the stem cells are actually neural progenitors? Are there other large inconsistencies between the cluster sizes and the fraction of expected cells? Could you look specifically at early transcription factors that are found in the neurons (or other cell types) within the various stem cell populations to help further refine the precursor/cell type relationships?

      Yes, it is really interesting that more than 50% of cells in the animal are neurons whereas more than 50% of cells in scRNAseq data are stem cells. This dataset provides a unique opportunity to compare tissue composition in the whole animal to the corresponding single cell RNAseq dataset.

      The table (in Supplementary Table 17) shows the percentage of cells from each tissue type in the miracidium (identified via in situ hybridisation of tissue-type marker genes) and in the scRNAseq to understand this phenomenon.

      This table shows that the single cell protocol used in this study negatively selected for nerves and tegument, and positively selected for stem and parenchyma. The composition of the muscle and protonephridia cells and the number of muscle and protonephridia transcriptomes captured are quite congruent.
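      The over- and under-representation described here can be expressed as a simple ratio of the two fractions. The sketch below uses only the approximate figures quoted in this exchange (10,686 of 20,478 transcriptomes assigned to stem cells versus about 7% of cells in the tissue; about 15% neuronal transcriptomes versus about 57% neurons in the animal), not the values in Supplementary Table 17, which are not reproduced here. The function name and the log2 presentation are illustrative choices.

```python
import math


def representation_ratio(fraction_in_scrnaseq, fraction_in_situ):
    """Ratio > 1 means the tissue is over-represented in the scRNAseq data."""
    return fraction_in_scrnaseq / fraction_in_situ


# Fractions as quoted (approximately) in the review and response text.
stem_scrnaseq = 10686 / 20478
quoted_fractions = {
    # tissue: (fraction of transcriptomes, fraction of cells in situ)
    "stem": (stem_scrnaseq, 0.07),
    "neuron": (0.15, 0.57),
}

# Positive log2 ratio: positively selected by the protocol; negative: depleted.
log2_ratios = {
    tissue: math.log2(representation_ratio(in_data, in_situ))
    for tissue, (in_data, in_situ) in quoted_fractions.items()
}
```

      On these quoted figures the stem cells come out with a positive log2 ratio and the neurons with a negative one, matching the direction of selection stated in the response.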

      This technical finding is also biologically consistent. For instance, the tegument cells span the body wall muscles, with the cell bodies below and a syncytial layer above. It is not known how the tegument fragments during the dissociation process, and which parts of the cells get packaged by the 10X GEMs. Because of the tegumental structure, the cells are likely prone to damage, and we speculate that this is why the tegument cells are under-represented in our 10X data. Unusually shaped fragments may not have been captured in 10X GEMs and, of those that were, damaged or distressed tegument cells/fragments may have been excluded post-sequencing by QC filters including cell calling, mitochondrial percentage and low transcript count (e.g. a tegumental fragment with 100 transcripts would not have passed QC). Stem cells are spherical with a large nucleus:cytoplasm ratio, likely making them more robust during dissociation and more likely to be captured in 10X GEMs.

      We don’t think that a large fraction of the stem cells are actually neural progenitors because:

      (1) we used previously reported marker genes of different tissue types to identify the single cell RNAseq clusters, e.g. Ago2-1 for stem cells, which has been used in multiple life stages.

      (2) The stem cell transcriptomes express many previously reported stem cell marker genes.

      (3) We found that the stem cells from the single cell data generally had higher numbers of transcripts than the other cell types which is consistent with the Wang et al. 2013 observation that RNA marker POPO-1 could distinguish germinal (stem) cells from other cell types as they are RNA rich.

      (4) We also found higher numbers of ribosomal related transcripts in our stem cell transcriptomes, which is consistent with Pan’s observation that part of the distinct morphology of stem cells is densely packed ribosomes in the cytoplasm.

      In order to elaborate on this discussion we have generated new visualisations:

      (1) A UMAP of the stem cell marker ago2-1 (Supplementary figure 10), to further illustrate our evidence in classifying the stem cell clusters

      (2) A co-expression plot of the stem cell marker ago2-1 with neural marker complexin to confirm that there is little coexpression (the most coexpression being in Neuron 1 and Stem F). We identified that 15.56% of cells in the Stem F cluster show some expression of complexin (neural marker), suggesting that a small fraction of Stem F may be early/precursor neurons, but the gene expression indicates that the majority of cells in Stem F are more likely to be stem cells than any other tissue type. There is little to no complexin expression in the other stem clusters.

      (3) Expression plots of the 5 neurogenins (TFs involved in neuronal differentiation) we could identify using WormBase ParaSite in these data. Four of the five showed very little expression, and not in specific clusters. The fifth (Smp_072470) showed slightly more expression, though still sparse, mostly across the stem and neural clusters, but not enough to indicate that any of the stem clusters are neural progenitors.
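      The per-cluster percentages cited in this list (for example, the share of Stem F cells with some complexin expression) amount to counting cells above a detection threshold within each cluster. A minimal sketch of that calculation follows; it is not the authors' code, the function names and the `min_count` threshold are assumptions, and the toy counts are invented.

```python
def fraction_expressing(counts, clusters, gene, min_count=1):
    """Fraction of cells per cluster with at least min_count reads of gene.

    counts: list of {gene: count} dicts, one per cell.
    clusters: list of cluster labels, parallel to counts.
    """
    totals, positive = {}, {}
    for cell, cluster in zip(counts, clusters):
        totals[cluster] = totals.get(cluster, 0) + 1
        if cell.get(gene, 0) >= min_count:
            positive[cluster] = positive.get(cluster, 0) + 1
    return {c: positive.get(c, 0) / n for c, n in totals.items()}


def fraction_coexpressing(counts, clusters, gene_a, gene_b, min_count=1):
    """Fraction of cells per cluster in which both genes are detected."""
    totals, positive = {}, {}
    for cell, cluster in zip(counts, clusters):
        totals[cluster] = totals.get(cluster, 0) + 1
        if cell.get(gene_a, 0) >= min_count and cell.get(gene_b, 0) >= min_count:
            positive[cluster] = positive.get(cluster, 0) + 1
    return {c: positive.get(c, 0) / n for c, n in totals.items()}


# Hypothetical toy cells; the counts are NOT from the study.
toy_counts = [
    {"ago2-1": 4},
    {"ago2-1": 2, "complexin": 1},
    {"ago2-1": 3},
    {"ago2-1": 5},
    {"complexin": 6},
    {"complexin": 3, "ago2-1": 1},
]
toy_clusters = ["Stem F"] * 4 + ["Neuron 1"] * 2

complexin_fraction = fraction_expressing(toy_counts, toy_clusters, "complexin")
both_fraction = fraction_coexpressing(toy_counts, toy_clusters, "ago2-1", "complexin")
```

      Raising `min_count` makes the detection call stricter, so any reported percentage of this kind depends on the threshold chosen.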

      Author response image 2.

      Coexpression UMAP showing the expression of stem cell marker Ago2-1 and neural marker complexin.

      Author response image 3.

      UMAPs showing the expression of five putative neurogenins of S. mansoni.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.

      Strengths:

      Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.

      Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.

      Weaknesses:

      The sample sizes upon which the in situ hybridization results and cell counts are based are either not stated (in most cases) or are very small (n=3). This lack of clarity about biological replicates and sample sizes makes it difficult for the reader to assess the robustness of the results and the extremely small sample sizes (when provided) are a missed opportunity to explore the variability of the system, or lack thereof.

      We have now added more details about the methods we used for validating cell type marker genes by in situ hybridisation. We have added to the methods that ‘We carried out at least three in situ hybridisation experiments for each marker gene we validated (each experiment was a biological replicate). From each experiment we imaged (by confocal microscopy) at least 10 miracidia (technical replicates) per marker gene experiment.’ on line 1036.

      In the figure legends we have added the number of miracidia that were screened, and documented the percentage of the screened larvae that showed the in situ gene expression pattern that is seen in the images in the figures, and that we described in the text.

      We manually segmented the nuclei of pan tissue marker genes, and we did this for one miracidium in the case of all tissues, except stem cells where we segmented stem cells in five larvae. Manual segmentation of gene expression in a confocal z-stack is very time consuming. We consider that the variability of different cell and tissue types (stereotypy) between miracidia is beyond the scope of this paper and can be investigated in future work.

      Although assigning transcripts to a given cell type is usually straightforward via in situ experiments, the authors fail to consider the potential difficulty of assigning the appropriate nuclei to cells with long cytoplasmic extensions, like neurons. In the absence of multiple markers and a better understanding of the nervous system, it seems likely that the authors have overestimated the number of neurons and misassigned other cell types based on their proximity to neural projections.

      This is a valid point, and we acknowledge the difficulties of assigning a nucleus to a cell using mRNA expression only and in the absence of a cell membrane marker. We tried to address this issue by labelling the cell membranes using an antibody against beta catenin after the HCR in situ protocol. This method has been used successfully on sections on slides (Schulte et al., 2024), but we failed to get usable results in our miracidia whole-mounts. The beta catenin localisation marked the membranes of the gland cells but didn’t do the same for the neurons or other cell types (see image below).

      Author response image 4.

      Image showing a maximum intensity projection of a subvolume of a confocal z-stack of a miracidium whole-mount in situ hybridisation (by HCR) for paramyosin counterstained with a beta catenin antibody (1:600 concentration of Sigma C2206). The cell membrane of a lateral gland is clearly labelled, but those of the neurons of the brain and the paramyosin+ muscle cells are not.

      Our observation that 57% of the cells in a miracidium are nerves is high compared to the C. elegans hermaphrodite adult, in which 302 out of 959 cells are neurons (Hobert et al., 2016), although few studies have equivalent data with which to make comparisons. Despite this, and the limitation described above, we believe that we have not overestimated the number of neural cells. During the process of validating the marker genes and closely examining gene expression in hundreds of miracidia, we noted that the nuclei of different tissue types are distinct and recognisable (see figure below). The nuclei of stem, tegument and parenchymal cells are comparatively large and spherical with obvious nucleoli (i). The four nuclei of the apical gland cell are angular, pentagonal in shape and sit adjoining each other (inside red dashed circle, i-iii); those of the two lateral glands are bilaterally symmetrical and surrounded by flask-shaped cytoplasm (arrows, iv). The nuclei of the body wall muscle cells are peripheral and flattened on the outer edge (iii). The notum+ muscle cell nuclei are anterior of the apical gland (manuscript Figure 2E). The only other two tissue types are the nerves and protonephridia, and their nuclei are smaller and more compact/condensed. In situ expression of the protonephridia marker suggests that 6 cells make up the protonephridial system (manuscript Figure 4 B&E). Therefore, by process of elimination, the remaining nuclei should belong to neurons. The complexin expression pattern supports this and we counted 209 nuclei that were surrounded by cpx transcript expression. To help the reader interpret this for themselves we have added confocal z-stacks of miracidia where tissue level markers have been multiplexed (supplementary videos 18-20). We counted all tissue type cells individually and the tissue type cell numbers added up to the overall cell count.

      Author response image 5.

      Image showing the diversity of nucleus morphology between tissue types in the miracidium.

      Biologically, it is not surprising that this larva is dominated by neural cells. It must navigate a complex aquatic environment and identify a suitable mollusc host in less than 12 hours. It is a non-feeding vehicle that must deliver the stem cells to a suitable environment where they can develop into the subsequent life cycle stage. Accordingly, the cell type composition reflects this challenge.

      The conclusion that germline genes are expressed in the miracidia stem cells seems greatly overstated in the absence of any follow-up validation. The expression scales for genes like eled and boule are more than 3 orders of magnitude smaller than those used for any of the robustly expressed genes presented throughout the paper. These scales are undefined, so it isn't entirely clear what they represent, but neither of these genes is detected at levels remotely high (or statistically significant) enough to survive filters for cluster-defining genes.

      Given that germ cells often develop early in embryogenesis and arrest the cell cycle until later in development, and that these transcripts reveal no unspliced forms, it seems plausible that the authors are detecting some maternally supplied transcripts that have yet to be completely degraded.

      We agree that the expression of genes such as eled and boule is low. We made this clear in the figure legends and text, and have now added scale information to the figure legends. We did not explore these genes as cluster-defining genes, partly due to their comparatively low levels of expression, but as genes already reported to be important in germ line specification. We found the expression of these genes to be consistent with our hypothesis that the Kappa stem cells may include germ line segregated cells, but our hypothesis does not rest on these lower-expressed genes.

      It is certainly possible that we have detected some maternally supplied transcripts in the miracidia stem cells. However, experiments to distinguish between zygotic and maternal transcripts using metabolic labelling of zygotic transcripts (e.g. Fishman et al. 2023) would be difficult in this species due to the hard egg capsule and its ectolecithal embryogenesis. Therefore this is out of scope for this work, but it would be a very interesting topic to follow up on and develop tools for.

      We have added these sentences to the Discussion ln 746 ‘Intriguingly, the presence of spliced-only copies of the germline defining genes eled and boule could suggest that they are maternal transcripts that have been restricted to the primordial germ cells during embryogenesis, as is the case in zebrafish embryos (Fishman et al., 2023). An alternative explanation is that unspliced transcripts exist for these lowly expressed genes but their abundance was below our threshold for detection.’

      Reviewer #1 (Recommendations For The Authors):

      Ln 138: specify the version of Seurat used, and reference the primary papers for this software. Also, from the dot plot shown here, these do not all appear to be supported by unique gene sets. How was the final clustering determined? This information is in the methods section, but a summary here could make it more robust for the readership.

      In addition to the details in the methods section, we have added the version and referenced the version-specific primary paper for Seurat when it is first mentioned. We have also summarised the methods used to select the final clustering when we first present the results to aid in clarity.

      We added to line 140 ‘Using Seurat (version 4.3.0) (Hao et al., 2021), 19 distinct clusters of cells were identified, along with putative marker genes best able to discriminate between the populations (Figure 1C & D and Supplementary Table 2 and 3). We used Seurat’s JackStraw and ElbowPlot, along with molecular cross-validation to select the number of principal components, and Seurat’s clustree to select a resolution where clusters were stable (Hao et al., 2021).’

      Ln 147: isn't seven stem cell clusters a lot? See comment in public review.

      We did not have preconceived expectations of the number of stem cell clusters, and were guided by the data and gene expression. In doing so we also discovered that four of those clusters were likely only two ‘biologically or functionally distinct’ clusters, but these split into four clusters based on the expression of genes on the sex-specific regions of the chromosomes, which was both unexpected and interesting.

      Figure 1D: gene model names are un-informative for the general reader. Can you provide any putative gene identities here to render this plot interpretable? For example in the main text you state that Smp-085540 is paramyosin; please use this annotation in all your visual material (as is used in Figure 2A).

      We have added gene names to the dotplots in all figures with the locus identifier (minus the ‘Smp’ prefix) in brackets after the gene name.

      Ln 191:196 Identification of the two muscle clusters as circular and longitudinal muscles is very well supported. However, it would be interesting to look specifically at the genes that are different here. Did the authors attempt to specifically pull out genes differentially expressed between these two groups, or only examine the output of FindAllMarkers at this point?

      We did indeed look specifically for genes differentially expressed between the muscle clusters, the results of which can be found in Supplementary Table 5 (Line 206). This analysis revealed “Wnt-11-1 (circular) and MyoD (longitudinal) were among the most differentially expressed genes”, which were important findings in our understanding of the muscle cells in the miracidium.

      Ln 207: "connected to stem F" - does this refer specifically to their relative positions on the UMAP in Figure 1C? One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (public review).

      We agree, and have rephrased accordingly.

      Ln 209:211: Here the authors switch from Seurat (R) as an analysis package, to SAM (python) for subset analysis of one large neural cluster. The results indicate that there may be small populations of transcriptomically distinct neural subtypes also within the neural1 cluster, but that the vast majority of these cells do not express unique transcriptomic profiles. Also in the supplementary material for this (SF1) there is a question of whether or not there is any clustering according to batch effects.

      In general, I find the neuronal section a little difficult to follow and it is unclear how many unique profiles are present and which are documented with in situ. I would recommend re-running the analysis on the entire neural subset (n1:5: complexin positive) and generating an inventory of putatively unique neural states with the associated in situ validation altogether in a main figure.

      In response to comments above we have both clarified our reasoning for using SAM analysis, and presented more details on possible batch effects. We have gone through the neural system results in order to make it clearer for the reader to follow.

      Ln 236: here the authors introduce a STRING analysis for the first time. Also, this method requires some introduction for the general audience in terms of its goals and general functionality and output.

      We used STRING analysis on some well defined clusters to provide additional clues about function. At the first mention of STRING (neuron 3 results) we have added the following statement to give more introduction to the reader: “STRING analysis of the top 100 markers of Neuron 3 predicted two protein interaction networks with functional enrichment: ….”

      Ln. 280:281. It is unclear why Steger et al is referenced here. In what way does a description of neural and glandular cell transcriptomic similarity in a cnidarian inform your data on a member of the Platyhelminthes? (which should also be referenced in the introduction: to which phylogenetic lineage does Schistosoma belong).

      We have now added that Schistosoma belongs to the Platyhelminthes on the first line of the introduction.

      Ln 295 we have added ‘We expected to find a discrete cluster(s) for the penetration glands, and that it would show similarities to the neural clusters (as glandular cells arise from neuroglandular precursor cells in other animals, such as the sea anemone, Nematostella vectensis, Steger et al., 2022).’

      Ln 339: explain the motivation for generating a further plate-based scRNA of the ciliary plates.

      We wished to include the ciliary plates alongside the gland cells for plate-based RNAseq because they are unique to the miracidium stage and we wanted to make sure we had captured them in this study.

      Ln 345: Define the tegumental cells for the general reader.

      We have added further description of tegument cells in the introduction and the tegument results section (e.g. on lines 61 and 366).

      Ln 365: "this cluster" is imprecise. Which cluster are we looking at here? Also: were flame cells already described morphologically at this stage, or is this the first description of the protonephridial system for this stage of the life cycle?

      We have now clarified which cluster we are talking about in the text. The flame cells have been described using TEM before (Pan, 1980).

      Stem Cells: also here you refer to cells as 'bridge' which refers to the configuration of the UMAP. While this is likely a biological representation of a different differentiation state, the nomination of this based solely on the UMAP representation should be avoided.

      We have rephrased this.

      Figure 5B: What is neuron 6? This was Neuron 3 in Figure 1.

      Thank you for spotting these mistakes in the labelling; we have now corrected them.

      Ln 421:438 - Here you represent a UMAP representation of the cell positions, but state that the clustering disappears. See comment in Public Review.

      Modified accordingly, see response in public review.

      Ln 472 "Cells in stem E, F, and G in silico clusters might be stressed/damaged/dying cells or cells in transcriptionally transitional states." Is there any evidence supporting either of these conclusions?

      We found that 15.56% of the cells in Stem F expressed the neural marker complexin, leading us to consider the possibility that a fraction of these cells may be neural precursors. Stem F also had some cells with a mitochondrial % near the maximum threshold we set, suggesting they could be experiencing some stress. Since we could not identify clear markers for these clusters, their function and a more specific identity, beyond ‘stem’, is not yet known.

      That the two stem cell populations contribute to different parts of the next life cycle stage is interesting. The combined analysis suffers from the same issues as the previous analysis in terms of sample distribution; are the 'grey' sporocyst cells also contributing to the stem A/B (kappa) C/D (delta/phi) clusters? This is not possible to tell from the plot as the miracidia may simply be plotted on the top. A different representation of sample contribution to clusters is warranted.

      We have made an alternative visualisation here to demonstrate that the miracidia cells are not plotted on top of the sporocyst stem cells. Unfortunately this visual is hampered as there is not a straightforward way to split the panels. In the figure below, the left pane shows the miracidia cells, and the right pane shows the sporocyst cells. Below that, we have included the original figure for comparison. It can be clearly seen that there are three miracidia tegument cells in the sporocyst tegument cluster, and one sporocyst cell in the miracidia stem cells (Stem E), but the miracidia A/B and C/D stem cells are not plotted on top of any sporocyst cells.

      Author response image 6.

      Methods: Why is the multiplet rate estimate at >50% for the unsorted sample?

      We have added more detail on this: “The estimated doublet rate was calculated based on 10X loading guidelines and adjusted for our sample concentrations”.

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more careful consideration of what was already known based on previous literature, which would help the authors to better put their results in context. For example, previous work suggested that one of the sporocyst stem cell populations (phi) gives rise to tegument and other temporary larval structures; this appears not to be mentioned here. The model in Figure 7 suggests that two of the stem cell populations are gone at day 15 post-infection; the literature shows that those cells can still be detected at this stage (there are just far fewer of them).

      We have added the definition of Kappa, Delta and Phi as per Wang et al (2018) in the stem cell results p13 ln 428.

      We have amended Figure 7 to include further elements from the Wang et al (2018) paper that show that mother sporocyst stem cells classified as delta and phi are still detectable on day 15 post-infection in mother sporocysts.

      We intentionally didn’t put too much emphasis on fitting our data to the model of Wang et al (2018), because a) it’s a different life cycle stage, b) the single cell data the model was based on came from 35 stem cells and was gathered using a different method, and c) more recent data (Diaz, Attenborough et al. 2024) with 119 stem cells from sporocysts did not recover the same populations of stem cells. We therefore linked our data to previous literature where it was relevant but focused on being led by the data we gathered (>10,000 stem cells).

      (2) To add some detail to the public comment about the lack of clarity about sample sizes and biological replicates, and how this leads to questions about the robustness of the results, Figures 4 B and F show the expression pattern for the same parenchyma marker (Smp_318890) in two different samples. The patterns appear quite distinctive. In B, the cell bodies are so clearly labeled that the signal appears oversaturated. In F the cell bodies are barely apparent. Based on the single-cell clustering, it should be possible to distinguish between Parenchyma clusters 1 and 2 based on the levels of this transcript. Careful quantification of signal intensity from multiple samples across multiple experiments might enable the authors to detect such differences.

      The reason the expression patterns look different between panels 4Bii and 4F is that in 4Bii we have manually segmented the nuclei of the parenchymal cells in order to count them, whereas in the images in 4F there is no segmentation. We have now made this clearer in the legend, and also in the legends of Figures 2, 3, and 5. If there was any signal intensity difference between parenchyma 1 and 2 cells based on expression of the marker gene, Smp_318890, it was not obvious. We carried out 6 experiments for parenchyma markers, multiplexing the pan-parenchyma marker, Smp_318890, with markers for parenchyma 2, but we were unable to distinguish between the two populations.

      (3) The authors find that the "somatic" stem cells in miracidia seem to combine attributes of the previously defined delta and phi stem cells from sporocysts. Because the 3 classes of sporocyst stem cells were defined by expression of nanos-2 and fgfrA, using those probes in in-situ experiments could have helped them resolve whether or not the miracidial cells represent precursors that can adopt either fate or if the heterogeneity is already present in miracidia.

      In silico expression of the marker genes for the 3 classes of sporocyst stem cells didn’t support those three classes in the miracidia stem cells (See supplementary table 10). We further subclustered the delta/phi cells to see if we could recover separate delta and phi populations but we were unable to do so. We therefore did not pursue in situ experiments of these genes. We instead prioritised cluster-defining genes in the miracidia stem cell populations rather than cluster-defining genes in the sporocyst (defined by Wang et al., 2018), but we still explored these in silico. For example, instead of using klf to define Kappa (Wang et al 2018), we used UPPA to validate the Kappa population as it showed similar expression to klf but higher expression levels and was specific to that population. However, like Wang et al 2018, we did use p53, which is a cluster marker of delta and phi in sporocysts, as it showed clear and high expression in our miracidia delta/phi population. We were guided by our data and our knowledge of the literature. More in-depth single cell RNAseq is needed from the mother and daughter sporocyst stages to understand the heterogeneity and fates of these stem populations.

      (4) Scale bars should be included throughout the figures and the scale should be defined either on the figure or in the legend. Similarly, all the scales used for velocity and expression analysis should be defined.

      We have added scale bars to all figures and legends.

      The statements “Gene expression has been log-normalised and scaled using Seurat(v. 4.3.0)”, “Gene expression has been normalised (CPM) and log-transformed using scvelo(v. 0.2.4)”, or “Library size was normalised and gene expression values were log-normalised using SAM (v1.0.1) and Scanpy (v1.8.2)” have been added to all figures as appropriate.
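
      For readers unfamiliar with these scales, the two transformations named in the legends can be sketched as follows. This is an illustrative sketch only, not the packages' own code: the function names and the example counts are hypothetical, and the scale factor of 10,000 assumes the Seurat default was left unchanged, which is not stated here.

```python
import math

def log_normalise(counts, scale_factor=10_000):
    """Divide one cell's counts by its library size, rescale, then take log(1 + x).

    scale_factor=10_000 is an assumption (the Seurat default); the value
    actually used in the study is not given in the response.
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has no library size to normalise by.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]

def log_cpm(counts):
    """Counts per million followed by log(1 + x): same recipe, larger scale factor."""
    return log_normalise(counts, scale_factor=1_000_000)

# Hypothetical raw counts for four genes in a single cell.
cell = [0, 3, 7, 90]
print(log_normalise(cell))
print(log_cpm(cell))
```

      Both scales leave zero counts at zero and compress large counts, so colour bars on the two kinds of plot are comparable in shape but not in absolute value.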

      (5) The table entitled In situ hybridization probes (Supplementary Table 15) contains no probe sequences, so any interested reader wishing to use these probes would have to design their own. To ensure the reproducibility of the results presented here, the authors should provide the probe sequences they used.

      In Supplementary Table 15 we have added the Molecular Instruments Lot number of all the probes used. Anyone wanting to repeat the experiment can order the same probes from the company.

      (6) It is unclear how useful the supplemental figures showing the STRING enrichment analyses will be for readers. Unannotated Smp gene identifiers provide no way to help readers digest the information in these hairballs. It would probably be best to replace the Smp names with useful annotations based on their orthologs; if not, these figures could probably be dropped entirely. (Also, the bottom panel of Supplementary Figure 7 has the word "Lorem" embedded on one of the connecting nodes.)

      “Lorem” has been removed.

      Many of the genes in these analyses do not have short descriptions, therefore we have used Smp gene identifiers in the STRING analysis supplementary figures. These ‘Smp_’ numbers can be used to search WormBase Parasite, where a description can be found and the history of the gene ID traced. This latter function facilitates searching for these genes in the literature and consistency between versions as gene models are updated.

      Minor edits

      (1) Figures 4A-D aren't cited in the text until after 4E-F are. It seems like moving the section on protonephridial cells (line 364) before the section on tegumental cells (line 345) better reflects the order of the figures.

      Thank you for flagging this; we have updated the in-text citations of Figure 4.

      (2) In-text references to Sarfati et al, 2021 should be to Nanes Sarfati, as listed in the references. Poteaux et al 2023 is cited in the text, but not in the reference list.

      Both of these have been fixed.

    1. eLife assessment

      This important work provides evidence that glutamate and GABA are released from different synaptic vesicles at supramammillary axon terminals onto granule cells of the dentate gyrus. The study uses complementary electrophysiological and anatomical experimental approaches. Together, these provide solid evidence that the co-release of glutamate and GABA from different vesicles within the same terminal could modulate granule cell firing in a frequency-dependent manner, although thorough elimination of alternative mechanisms would have strengthened the study. The work will be of interest to neuroscientists investigating co-release of neurotransmitters in various synapses in the brain and those interested in subcortical control of hippocampal function.

    2. Reviewer #1 (Public Review):

      This study of mixed glutamate/GABA transmission from axons of the supramammillary nucleus to dentate gyrus seeks to sort out whether the two transmitters are released from the same or different synaptic vesicles. This conundrum has been examined in other dual-transmission cases and even in this particular pathway, there are different views. The authors use a variety of electrophysiological and immunohistochemical methods to reach the surprising (to me) conclusion that glutamate and GABA-filled vesicles are distinct yet released from the same nerve terminals. The strength of the conclusion rests on the abundance of data (approaches) rather than the decisiveness of any one approach, and I came away believing that the boutons may indeed produce and release distinct types of vesicles, but have reservations. Accepting the conclusion, one is now left with another conundrum, not addressed even in the discussion: how can a single bouton sort out VGLUTs and VIAATs to different vesicles, position them in distinct locations with nm precision, and recycle them without mixing? And why do it this way instead of with single vesicles having mixed chemical content? For example, could a quantitative argument be made that separate vesicles allow for higher transmitter concentrations? I feel the paper needs to address these problems with some coherent discussion, at minimum.

      Major concerns:

      (1) Throughout the paper, the authors use repetitive optogenetic stimulation to activate SuM fibers and co-release glutamate and GABA. There are several issues here: first, can the authors definitively assure the reader that all the short-term plasticity is presynaptic and not due to ChR2 desensitization? This has not been addressed. Second, can the authors also say that all the activated fibers release both transmitters? If for example 20% of the fibers retained a one-transmitter identity and had distinct physiological properties, could that account for some of the physiological findings?

      (2) PPR differences in Figures 1F-I are statistically significant but still quite small. You could say they are more similar than different in fact, and residual differences are accounted for by secondary factors like differential receptor saturation.

      (3) The logic of the GPCR experiments needs a better setup. I could imagine different fibers released different transmitters and had different numbers of mGluRs, so that one would get different modulations. On the assumption that all the release is from a single population of boutons, then either the mGluRs are differentially segregated within the bouton, or the vesicles have differential responsiveness to the same modulatory signal (presumably a reduced Ca current). This is not developed in the paper.

      (4) The biphasic events of Figures 3 and S3: I find these (unaveraged) events a bit ambiguous. Another way to look at them is that they are not biphasic per se but rather are not categorizable. Moreover, these events are really tiny, perhaps generated by only a few receptors whose open probability is variable, thus introducing noise into the small currents.

      (5) Figure 4 indicates that the immunohistochemical analysis is done on SuM terminals, but I do not see how the authors know that these terminals come from SuM vs other inputs that converge in DG.

      (6) Figure 4E also shows many GluN1 terminals not associated with anything, not even Vglut, and the apparent numbers do not mesh with the statistics. Why?

      (7) Do the conclusions based on the fluorescence immuno mesh with the apparent dimensions of the EM active zones and the apparent intermixing of labeled vesicles in immuno EM?

      (8) Figure 6 is not so interesting to me and could be removed. It seems to test the obvious: EPSPs promote firing and IPSPs oppose it.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the release properties of glutamate/GABA co-transmission at the supramammillary nucleus (SuM)-granule cell (GC) synapses using in vitro electrophysiology and anatomical approaches at the light and electron microscopy level. They found that SuM to dentate granule cell synapses, which co-release glutamate and GABA, exhibit distinct differences in paired-pulse ratio, Ca2+ sensitivity, presynaptic receptor modulation, and Ca2+ channel-vesicle coupling configuration for each neurotransmitter. The study shows that glutamate/GABA co-release produces independent glutamatergic and GABAergic synaptic responses, with postsynaptic targets segregated. They show that most SuM boutons form distinct glutamatergic and GABAergic synapses in close proximity, characterized by GluN1 and GABAAα1 receptor labeling, respectively. Furthermore, they demonstrate that glutamate/GABA co-transmission exhibits distinct short-term plasticity, with glutamate showing frequency-dependent depression and GABA showing frequency-independent stable depression.

      Their findings suggest that these distinct modes of glutamate/GABA co-release by SuM terminals serve as frequency-dependent filters of SuM inputs.

      Strengths:

      The conclusions of this paper are mostly well supported by the data.

      Weaknesses:

      Some aspects of Supplementary Figure 1A and the table need clarification. Specifically, the claim that the authors have stimulated an axon fiber rather than axon terminals is not convincingly supported by the diagram of the experimental setup. Additionally, the antibody listed in the primary antibodies section recognizes the gamma2 subunit of the GABAA receptor, not the alpha1 subunit mentioned in the results and Figure 4.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Hirai et al investigated the release properties of glutamate/GABA co-transmission at SuM-GC synapses and reported that glutamate/GABA co-transmission exhibits distinct short-term plasticity with segregated postsynaptic targets. Using optogenetics, whole-cell patch-clamp recordings, and immunohistochemistry, the authors reveal distinct transmission modes of glutamate/GABA co-release as frequency-dependent filters of incoming SuM inputs.

      Strengths:

      Overall, this study is well-designed and executed; conclusions are supported by the results. This study addressed a long-standing question of whether GABA and glutamate are packaged in the same vesicles and co-released in response to the same stimuli in the SuM-GC synapses (Pedersen et al., 2017; Hashimotodani et al., 2018; Billwiller et al., 2020; Chen et al., 2020; Li et al., 2020; Ajibola et al., 2021). Knowledge gained from this study advances our understanding of neurotransmitter co-release mechanisms and their functional roles in the hippocampal circuits.

      Weaknesses:

      No major issues are noted. Some minor issues related to data presentation and experimental details are listed below.

    1. eLife assessment

      This study provides a novel and valuable alternative explanation for volatility-induced changes in choice behavior, commonly attributed to learning-rate adaptations. Through rigorous and comprehensive computational modeling of previously published data, the authors provide convincing support for the claim that apparent learning-rate adaptations may instead reflect a mixture of decision strategies. Furthermore, they demonstrate that differential weighting of the optimal decision strategy is predicted by psychopathology common to depression and anxiety. This work should be of interest to a wide range of scientists, including psychologists, neuroscientists, computer scientists, and clinicians.

    2. Reviewer #2 (Public Review):

      Summary:

      Previous research shows that humans tend to adjust learning in environments where stimulus-outcome contingencies become more volatile. This learning rate adaptation is impaired in some psychiatric disorders, such as depression and anxiety. In this study the authors reanalyze previously published data on a reversal learning task with two volatility levels. Through a new model they provide some evidence for an alternative explanation whereby the learning rate adaptation is driven by different decision-making strategies and not learning deficits. In particular, they propose that adjusting of learning can be explained by deviations from the optimal decision-making strategy (based on maximizing expected utility) due to response stickiness or focus on reward magnitude. Furthermore, a factor related to general psychopathology of individuals with anxiety and depression negatively correlated with the weight on the optimal strategy and response stickiness, while it correlated positively with the magnitude strategy (a strategy that ignores the probability of outcome).

      The main strength of the study is a novel and interesting explanation of an otherwise well-established finding in human reinforcement learning. This proposal is supported by rigorously conducted parameter retrieval and the comparison of the novel model to a wide range of previously published models. The authors explore from many angles, if and why the predictions from the new proposed model are superior to previously applied models.

      My previous concerns were addressed in the revised version of the manuscript. I believe that the article now provides a new perspective on a well-established learning effect and offers a novel set of interesting response models that can be applied to a wide array of decision-making problems.

      I see two limitations of the study not mentioned in the discussion of the manuscript. First, the task features binary inputs and responses, therefore unexpected uncertainty (volatility) is impossible to differentiate from the uncertainty about outcomes, and exploration is inseparable from random choices. Future work could validate these findings in task designs that allow these processes to be distinguished. Second, clinical results are based on a small sample of patients and should be interpreted with this in mind.

    3. Reviewer #3 (Public Review):

      Summary:

      This paper presents a new formulation of a computational model of adaptive learning amid environmental volatility. Using a behavioral paradigm and data set made available by the authors of an earlier publication (Gagne et al., 2020), the new model is found to fit the data well. The model's structure consists of three weighted controllers that influence decisions on the basis of (1) expected utility, (2) potential outcome magnitude, and (3) habit. The model offers an interpretation of psychopathology-related individual differences in decision-making behavior in terms of differences in the relative weighting of the three controllers.

      Strengths:

      The newly proposed "mixture of strategies" (MOS) model is evaluated relative to the model presented in the original paper by Gagne et al., 2020 (here called the "flexible learning rate" or FLR model) and two other models. Appropriate and sophisticated methods are used for developing, parameterizing, fitting, and assessing the MOS model, and the MOS model performs well on multiple goodness-of-fit indices. Parameters of the model show decent recoverability and offer a novel interpretation for psychopathology-related individual differences. Most remarkably, the model seems to be able to account for apparent differences in behavioral learning rates between high-volatility and low-volatility conditions even with no true condition-dependent change in the parameters of its learning/decision processes. This finding calls into question a class of existing models that attribute behavioral adaptation to adaptive learning rates.

      Weaknesses:

      The authors have responded to the weaknesses noted previously.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Point 1.1

      Summary: This paper describes a reanalysis of data collected by Gagne et al. (2020), who investigated how human choice behaviour differs in response to changes in environmental volatility. Several studies to date have demonstrated that individuals appear to increase their learning rate in response to greater volatility and that this adjustment is reduced amongst individuals with anxiety and depression. The present authors challenge this view and instead describe a novel Mixture of Strategies (MOS) model, that attributes individual differences in choice behaviour to different weightings of three distinct decision-making strategies. They demonstrate that the MOS model provides a superior fit to the data and that the previously observed differences between patients and healthy controls may be explained by patients opting for a less cognitively demanding, but suboptimal, strategy. 

      Strengths: 

      The authors compare several models (including the original winning model in Gagne et al., 2020) that could feasibly fit the data. These are clearly described and are evaluated using a range of model diagnostics. The proposed MOS model appears to provide a superior fit across several tests. 

      The MOS model output is easy to interpret and has good face validity. This allows for the generation of clear, testable, hypotheses, and the authors have suggested several lines of potential research based on this. 

      We appreciate the efforts in understanding our manuscript. This is a good summary.

      Point 1.2

      The authors justify this reanalysis by arguing that learning rate adjustment (which has previously been used to explain choice behaviour on volatility tasks) is likely to be too computationally expensive and therefore unfeasible. It is unclear how to determine how "expensive" learning rate adjustment is, and how this compares to the proposed MOS model (which also includes learning rate parameters), which combines estimates across three distinct decision-making strategies. 

      We are sorry for this confusion. Our actual motivation is that previous models consider only learning rate adaptation as the response to different levels of environmental volatility. A drawback of these models is that they require a large number of parameters in multi-context experiments. We feel that learning rate adaptation may not be the only mechanism, or at least that alternative explanations may exist. Understanding the true mechanisms is particularly important for rehabilitation purposes, especially in our case of anxiety and depression. To clarify, we have removed all claims that learning rate adaptation is “too complex to understand”.

      Point 1.3

      As highlighted by the authors, the model is limited in its explanation of previously observed learning differences based on outcome value. It's currently unclear why there would be a change in learning across positive/negative outcome contexts, based on strategy choice alone. 

      Thanks for mentioning this limitation. We want to highlight two aspects of our work.

      First, we developed the MOS6 model primarily to account for the learning rate differences between stable and volatile contexts, and between healthy controls and patients, not for those between positive and negative outcomes. In other words, our model does not eliminate the possibility of different learning rates for positive and negative outcomes.

      Second, Figure 3A shows that FLR (containing different learning parameters for positive/negative outcomes) performed even worse than MOS6 (which sets an identical learning rate for positive/negative outcomes). This result calls into question whether learning rate differences between positive/negative outcomes exist in our dataset.

      Action: We now include this limitation in lines 784-793 in discussion:

      “The MOS model is developed to offer context-free interpretations for the learning rate differences observed both between stable and volatile contexts and between healthy individuals and patients. However, we also recognize that the MOS account may not justify other learning rate effects based solely on strategy preferences. One such example is the valence-specific learning rate differences, where learning rates for better-than-expected outcomes are higher than those for worse-than-expected outcomes (Gagne et al., 2020). When fitted to the behavioral data, the context-dependent MOS22 model does not reveal valence-specific learning rates (Supplemental Note 4). Moreover, the valence-specific effect was not replicated in the FLR22 model when fitted to the synthesized data of MOS6.”

      Point 1.4

      Overall the methods are clearly presented and easy to follow, but lack clarity regarding some key features of the reversal learning task.

      Throughout the method the stimuli are referred to as "right" and "left". It's not uncommon in reversal learning tasks for the stimuli to change sides on a trial-by-trial basis or counterbalanced across stable/volatile blocks and participants. It is not stated in the methods whether the shapes were indeed kept on the same side throughout. If this is the case, please state it. If it was not (and the shapes did change sides throughout the task) this may have important implications for the interpretation of the results. In particular, the weighting of the habitual strategy (within the Mixture of Strategies model) could be very noisy, as participants could potentially have been habitual in choosing the same side (i.e., performing the same motor movement), or in choosing the same shape. Does the MOS model account for this? 

      We are sorry for the confusion. Yes, the two shapes did change sides throughout the task. We replaced “left” and “right” with “stimulus 1” and “stimulus 2”. We also acknowledge the possibility that participants may develop a habitual preference for a particular side rather than a shape. Because of the counterbalanced design, a habit for one side would introduce random selection noise into choices, which should be captured by the MOS model through the inverse temperature parameter.
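
      The reasoning that a side habit looks like random noise at the level of shapes can be illustrated with a small simulation. This is a hedged sketch rather than an analysis of the actual data: it assumes that the side on which each stimulus appears is drawn independently with probability 0.5 on every trial, and the function name is hypothetical.

```python
import random

def simulate_side_habit(n_trials=10_000, seed=0):
    """Return how often an agent that always picks the left side chooses stimulus 1.

    Assumption: stimulus 1 appears on the left with probability 0.5 on each
    trial, independently of everything else (a stand-in for counterbalancing).
    """
    rng = random.Random(seed)
    chose_stimulus_1 = 0
    for _ in range(n_trials):
        stimulus_1_on_left = rng.random() < 0.5
        # The agent ignores stimulus identity and always responds "left".
        if stimulus_1_on_left:
            chose_stimulus_1 += 1
    return chose_stimulus_1 / n_trials

print(simulate_side_habit())
```

      Under this assumption the proportion of stimulus 1 choices stays close to 0.5, so a pure side habit carries no information about stimulus identity and would be absorbed as choice noise rather than as a stimulus-based habit.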

      Point 1.5

      Line 164: "Participants received points or money in the reward condition and an electric shock in the punishment condition." What determined whether participants received points or money, and did this differ across participants? 

      Thanks! We have clarified the design in lines 187-188:

      “Each participant was instructed to complete two blocks of the volatile reversal learning task, one in the reward context and the other in the aversive context”,

      and in lines:

      “A total of 79 participants completed tasks in both feedback contexts. Four participants only completed the task in the reward context, while three participants only completed the aversive task.”

      Point 1.6

      Line 167: "The participant received feedback only after choosing the correct stimulus and received nothing else" Is this correct? In Figure 1a it appears the participant receives feedback irrespective of the stimulus they chose, by either being shown the amount 1-99 they are being rewarded/shocked, or 0. Additionally, what does the "correct stimulus" refer to across the two feedback conditions? It seems intuitive that in the reward version, the correct answer would be the rewarding stimulus - in the loss version is the "correct" answer the one where they are not receiving a shock? 

      Thanks for raising this issue. We removed the term “correct stimulus” and revised the lines 162-166 accordingly:

      “Only one of the two stimuli was associated with actual feedback (0 for the other one). The feedback magnitude, ranged between 1-99, is sampled uniformly and independently for each shape from trial to trial. Actual feedback was delivered only if the stimulus associated with feedback was chosen; otherwise, a number “0” was displayed on the screen, signifying that the chosen stimulus returns nothing.”
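
      The feedback rule quoted above can be written out as a short sketch. It is only an illustration of the description, not the task code: the function names are hypothetical, and `p_feedback_stimulus_1` (the probability that stimulus 1 is the one associated with feedback on a given trial) is an assumed parameter whose schedule is not specified in this passage.

```python
import random

def sample_trial(p_feedback_stimulus_1, rng):
    """Draw the displayed magnitudes and the stimulus associated with feedback."""
    # Magnitudes are sampled uniformly from 1-99, independently for each stimulus.
    magnitudes = (rng.randint(1, 99), rng.randint(1, 99))
    # Exactly one stimulus (index 0 or 1) is associated with actual feedback.
    feedback_stimulus = 0 if rng.random() < p_feedback_stimulus_1 else 1
    return magnitudes, feedback_stimulus

def outcome(choice, magnitudes, feedback_stimulus):
    """Feedback is delivered only if the chosen stimulus is the one associated with it."""
    return magnitudes[choice] if choice == feedback_stimulus else 0

rng = random.Random(0)
magnitudes, feedback_stimulus = sample_trial(0.7, rng)
print(magnitudes, feedback_stimulus, outcome(0, magnitudes, feedback_stimulus))
```

      Written this way, the same rule covers both the reward and the aversive context; only the meaning of the delivered magnitude (points or money versus shock) differs.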

      Point 1.7

      Line 176: "The whole experiment included two runs each for the two feedback conditions." Does this mean participants completed the stable and volatile blocks twice, for each feedback condition? (i.e., 8 blocks total, 4 per feedback condition). 

      Thanks! We have removed the term “block”, and now we refer to it as “context”. In particular, we removed phrases like “stable block” and “volatile block” and used “context” instead.

      Action: See lines 187-189 for the revised version.

      “Each participant was instructed to complete two runs of the volatile reversal learning task, one in the reward context and the other in the aversive context. Each run consisted of 180 trials, with 90 trials in the stable context and 90 in the volatile context (Fig. 1B).”

      Point 1.8

      In the expected utility (EU) strategy of the Mixture or Strategies model, the expected value of the stimulus on each trial is produced by multiplying the magnitude and probability of reward/shock. In Gagne et al.'s original paper, they found that an additive mixture of these components better-captured participant choice behaviour - why did the authors not opt for the same strategy here? 

      Thanks for asking this. Their strategy is essentially a mixture of PF+MO+HA, where PF stands for the feedback probability (e.g., 0.3 or 0.7) without multiplication by the feedback magnitude. Ours, in contrast, is EU+MO+HA, where EU stands for feedback probability × feedback magnitude. We compared the two, and the model using their strategy performed much worse than ours (see the red box below).

      Author response image 1.

      Thorough model comparison.

      Point 1.9

      How did the authors account for individuals with poor/inattentive responding, my concern is that the habitual strategy may be capturing participants who did not adhere to the task (or is this impossible to differentiate?). 

      The current MOS6 model distinguishes between the HA strategy and inattentive responding. Because of the counterbalanced design, the HA strategy requires participants to actively track the stimuli on the screen. In contrast, inattentive responding, such as repeating the same motor movement (see Point 1.4), should appear as random selection in the behavioral data, which should be accounted for by the inverse temperature parameter.

      Point 1.10

      The authors provide a clear rationale for, and description of, each of the computational models used to capture participant choice behaviour. 

      • Did the authors compare different combinations of strategies within the MOS model (e.g., only including one or two strategies at a time, and comparing fit?) I think more explanation is needed as to why the authors opted for those three specific strategies. 

      We appreciate this great advice. Following it, we conducted a thorough model comparison. Please refer to Figure R1 above. Detailed descriptions of all the models in Figure R1 are included in Supplemental Note 1.

      Point 1.11

      Please report the mean and variability of each of the strategy weights, per group. 

      Thanks. We now report the mean and variability of the strategy weights in lines 490-503:

      “We first focused on the fitted parameters of the MOS6 model. We compared the weight parameters of the three strategies (EU, MO, and HA) across groups and conducted statistical tests on their logits. The patient group showed a ~37% preference towards the EU strategy, which is significantly weaker than the ~50% preference in healthy controls (healthy controls’ EU logit: M = 0.991, SD = 1.416; patients’ EU logit: M = 0.196, SD = 1.736; t(54.948) = 2.162, p = 0.035, Cohen’s d = 0.509; Fig. 4A). Meanwhile, the patients exhibited a weaker preference (~27%) for the HA strategy compared to healthy controls (~36%) (healthy controls’ HA logit: M = 0.657, SD = 1.313; patients’ HA logit: M = -0.162, SD = 1.561; t(56.311) = 2.455, p = 0.017, Cohen’s d = 0.574), but a stronger preference for the MO strategy (36% vs. 14%; healthy controls’ MO logit: M = -1.647, SD = 1.930; patients’ MO logit: M = -0.034, SD = 2.091; t(63.746) = -3.510, p = 0.001, Cohen’s d = 0.801). Most importantly, we also examined the learning rate parameter in the MOS6 but found no group differences (t(68.692) = 0.690, p = 0.493, Cohen’s d = 0.151). These results strongly suggest that the differences in decision strategy preferences can account for the learning behaviors in the two groups without necessitating any differences in learning rate per se.”
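      For readers unfamiliar with the logit scale used in these tests, the sketch below shows one common way to map three unconstrained logits to weights that sum to 1. It assumes the weights are a softmax of the logits, which the response does not state explicitly; the function name is illustrative and this is not the authors' code.

```python
import math

def softmax(logits):
    """Map unconstrained logits to positive weights that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Mean logits quoted above for healthy controls, in the order EU, MO, HA.
hc_mean_logits = [0.991, -1.647, 0.657]
hc_weights = softmax(hc_mean_logits)
print(dict(zip(["EU", "MO", "HA"], (round(w, 3) for w in hc_weights))))
```

      Note that the softmax of group-mean logits generally differs from the group mean of per-participant weights, so the printed values need not match the percentages quoted in the text.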

      Point 1.12

      The authors compare the strategy weights of patients and controls and conclude that patients favour more simpler strategies (see Line 417), based on the fact that they had higher weights for the MO, and lower on the EU.

      (1) However, the finding that control participants were more likely to use the habitual strategy was largely ignored. Within the control group, were the participants significantly more likely to opt for the EU strategy, over the HA? 2) Further, on line 467 the authors state "Additionally, there was a significant correlation between symptom severity and the preference for the HA strategy (Pearson's r = -0.285, p = 0.007)." Apologies if I'm mistaken, but does this negative correlation not mean that the greater the symptoms, the less likely they were to use the habitual strategy?

      I think more nuance is needed in the interpretation of these results, particularly in the discussion. 

      Thanks. The healthy participants seemed more likely to opt for the EU strategy than for the HA strategy, although this difference did not reach significance (paired-t(53) = 1.258, p = 0.214, Cohen’s d = 0.242). We now systematically explore the role of HA. Like the MO, the HA saves cognitive resources, but it yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for the HA over the MO strategy may reflect a more sophisticated balance between reward and complexity within an agent: when healthier subjects run out of cognitive resources for the EU strategy, they resort to the HA strategy, adopting a simpler strategy while still achieving a reasonable hit rate. This explains the negative symptom-HA correlation. Given how effective the HA strategy is, it is not surprising that the healthy control participants rely on it more during decision-making.

      However, we are cautious about drawing strong conclusions from (1) the non-significant difference between EU and HA within healthy controls and (2) the negative symptom-HA correlation. The reason is that the MOS22, the context-dependent variant, (1) exhibited a significantly higher preference for EU over HA (paired-t(53) = 4.070, p < 0.001, Cohen’s d = 0.825) and (2) did not replicate this negative correlation (Supplemental Information Figure S3).

      Action: Simulation analysis on the effects of HA was introduced in lines 556-595 and Figure 4. We discussed the effects of HA in lines 721-733:

      “Although many observed behavioral differences can be explained by a shift in preference from the EU to the MO strategy among patients, we also explore the potential effects of the HA strategy. Compared to the MO, the HA strategy also saves cognitive resources but yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for the HA over the MO strategy may reflect a more sophisticated balance between reward and complexity within an agent (Gershman, 2020): when healthier participants exhaust their cognitive resources for the EU strategy, they may cleverly resort to the HA strategy, adopting a simpler strategy but still achieving a certain level of hit rate. This explains the stronger preference for the HA strategy in the HC group (Fig. 3A) and the negative correlation between HA preferences and symptom severity  (Fig. 5). Apart from shedding light on the cognitive impairments of patients, the inclusion of the HA strategy significantly enhances the model’s fit to human behavior (see examples in Daw et al. (2011); Gershman (2020); and also Supplemental Note 1 and Supplemental Figure S3).”

      Point 1.13

      Line 513: "their preference for the slowest decision strategy" - why is the MO considered the slowest strategy? Is it not the least cognitively demanding, and therefore, the quickest? 

      Sorry for the confusion. In Fig. 5C, we conducted simulations to estimate the learning speed for each strategy. As shown below, the MO strategy exhibits a flat learning curve. Our claim on the learning speed was based solely on simulation outcomes without referring to cognitive demands. Note that our analysis did not aim to compare the cognitive demands of the MO and HA strategies directly.

      Action: We explain the learning speed of the three strategies in lines 571-581.

      Point 1.14

      The authors argue that participants chose suboptimal strategies, but do not actually report task performance. How does strategy choice relate to the performance on the task (in terms of number of rewards/shocks)? Did healthy controls actually perform any better than the patient group? 

      Thanks for the suggestion. The answers are: (1) the EU strategy is the most rewarding, followed by the HA and then the MO (Fig. 5A), and (2) yes, healthy controls did perform better than patients in terms of hit rate (Fig. 2).

      Action: We included additional sections on the above analyses in lines 561-570 and lines 397-401.

      Point 1.15

      The authors speculate that Gagne et al. (2020) did not study the relationship between the decision process and anxiety and depression, because it was too complex to analyse. It's unclear why the FLR model would be too complex to analyse. My understanding is that the focus of Gagne's paper was on learning rate (rather than noise or risk preference) due to this being the main previous finding. 

      Thanks! Yes, our previous arguments were vague and confusing. We have removed all arguments of this kind.

      Point 1.16

      Minor Comments: 

      • Line 392: Modeling fitting > Model fitting 

      • Line 580 reads "The MO and HA are simpler heuristic strategies that are cognitively demanding."

      - should this read as less cognitively demanding? 

      • Line 517: health > healthy 

      • Line 816: Desnity > density 

      Sorry for the typos! They have all been fixed.

      Reviewer #2:

      Point 2.1

      Summary: Previous research shows that humans tend to adjust learning in environments where stimulus-outcome contingencies become more volatile. This learning rate adaptation is impaired in some psychiatric disorders, such as depression and anxiety. In this study, the authors reanalyze previously published data on a reversal-learning task with two volatility levels. Through a new model, they provide some evidence for an alternative explanation whereby the learning rate adaptation is driven by different decision-making strategies and not learning deficits. In particular, they propose that adjusting learning can be explained by deviations from the optimal decision-making strategy (based on maximizing expected utility) due to response stickiness or focus on reward magnitude. Furthermore, a factor related to the general psychopathology of individuals with anxiety and depression negatively correlated with the weight on the optimal strategy and response stickiness, while it correlated positively with the magnitude strategy (a strategy that ignores the probability of outcome). 

      Thanks for evaluating our paper. This is a good summary.

      Point 2.2

      My main concern is that the winning model (MOS6) does not have an error term (inverse temperature parameter beta is fixed to 8.804). 

      (1) It is not clear why the beta is not estimated and how were the values presented here chosen. It is reported as being an average value but it is not clear from which parameter estimation. Furthermore, with an average value for participants that would have lower values of inverse temperature (more stochastic behaviour) the model is likely overfitting.

      (2) In the absence of a noise parameter, the model will have to classify behaviour that is not explained by the optimal strategy (where participants simply did not pay attention or were not motivated) as being due to one of the other two strategies.

      We apologize for any confusion caused by our writing. We did set the inverse temperature as a free parameter and estimated it quantitatively during model fitting and comparison. We also created a table showing the free parameters of each model. The previous manuscript did state that the “temperature parameter beta is fixed to 8.804”, but this applied only to the model simulations, which were conducted to interpret some of the model behaviors.

      We agree with the concern that using the average inverse temperature could lead to overfitting to more stochastic behaviors. To mitigate this issue, we now use the median as a more representative value for the population during simulation. This change does not affect our conclusions (see simulation results in Figures 4&6).

      Action: We now use the term “free parameter” to emphasize that the inverse temperature was fitted rather than fixed. We also added a new table, “Table 1”, in line 458 to show all the free parameters of each model, and updated the simulation details in lines 363-391 for clarity.
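      To illustrate why the inverse temperature acts as the noise term the reviewer asks about, here is a minimal sketch of a two-option softmax choice rule. The stimulus values are hypothetical, the function name is illustrative, and 8.804 is simply the value quoted in the reviewer's comment; this is not the authors' implementation.

```python
import math

def p_choose_first(v1, v2, beta):
    """Two-option softmax: probability of choosing option 1 given values v1, v2."""
    return 1.0 / (1.0 + math.exp(-beta * (v1 - v2)))

values = (0.6, 0.4)  # hypothetical values of the two stimuli
for beta in (0.0, 1.0, 8.804):
    # beta = 0 gives random choice; larger beta gives more deterministic choice.
    print(beta, round(p_choose_first(*values, beta), 3))
```

      With beta at zero the rule ignores the values entirely, which is how random or inattentive responding can be absorbed by this parameter rather than by one of the strategy weights.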

      Point 2.3

      (3) A model comparison among models with inverse temperature and variable subsets of the three strategies (EU + MO, EU + HA) would be interesting to see. Similarly, comparison of the MOS6 model to other models where the inverse temperature parameter is fixed to 8.804).

      This is an important limitation because the same simulation as with the MOS model in Figure 3b can be achieved by a more parsimonious (but less interesting) manipulation of the inverse temperature parameter.

      Thanks, we added a comparison between the MOS6 and the two lesion models (EU + MO, EU + HA). Please refer to the figure below and Point 1.8.

      We also realize that the MO strategy could exhibit averaged learning curves similar to random selection. To confirm that patients' slower learning rates are due to a preference for the MO strategy, we compared the MOS6 model with a variant (see the red box below) in which the MO strategy is replaced by Random (RD) selection that assigns a 0.5 probability to both choices. This comparison showed that the original MOS6 model with the MO strategy better fits human data.

      Author response image 2.

      Point 2.4

      Furthermore, the claim that the EU represents an optimal strategy is a bit overstated. The EU strategy is the only one of the three that assumes participants learn about the stimulus-outcomes contingencies. Higher EU strategy utilisation will include participants that are more optimal (in maximum utility maximisation terms), but also those that just learned better and completely ignored the reward magnitude.

      Thank you for your feedback. We have revised the paper to remove all statements that the “EU strategy is optimal” and replaced them with “EU strategy is rewarding but complex”. We agree that both the EU strategy and the strategy that focuses only on feedback probability (i.e., ignores the reward magnitude; referred to as the PF strategy) are rewarding but more complex than the two simple heuristics. We also included the latter strategy in our model comparisons (see the next section, Point 2.5).

      Point 2.5

      The mixture strategies model is an interesting proposal, but seems to be a very convoluted way to ask: to what degree are decisions of subjects affected by reward, what they've learned, and response stickiness? It seems to me that the same set of questions could be addressed with a simpler model that would define choice decisions through a softmax with a linear combination of the difference in rewards, the difference in probabilities, and a stickiness parameter. 

      Thanks for suggesting this model. We included the proposed linear combination model (see “linear comb.” in the red box below) and found that it performed significantly worse than the MOS6.

      Action: We justified our model selection criterion in the Supplemental Note 1.

      Author response image 3.
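      For concreteness, the reviewer's proposed alternative (a softmax over a linear combination of the reward difference, the probability difference, and a stickiness term) could be sketched as follows. The parameter names, the ±1 coding of the previous choice, and the scaling of magnitudes to [0, 1] are assumptions made for illustration; the exact form the authors fitted is described in their Supplemental Note 1 and may differ.

```python
import math

def p_choose_1_linear(p1, p2, m1, m2, prev_choice, w_prob, w_mag, w_stick):
    """Logistic (two-option softmax) choice rule over a linear combination of cues.

    p1, p2: learned feedback probabilities of the two stimuli.
    m1, m2: feedback magnitudes, assumed here to be scaled to [0, 1].
    prev_choice: 1 or 2 for the previous choice, or None on the first trial.
    """
    if prev_choice is None:
        stick = 0.0
    else:
        stick = 1.0 if prev_choice == 1 else -1.0
    drive = w_prob * (p1 - p2) + w_mag * (m1 - m2) + w_stick * stick
    return 1.0 / (1.0 + math.exp(-drive))

# Hypothetical trial: stimulus 1 is more likely but smaller; stimulus 2 was chosen last.
print(round(p_choose_1_linear(0.7, 0.3, 0.2, 0.6, 2, w_prob=4.0, w_mag=2.0, w_stick=0.5), 3))
```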

      Point 2.6

      Learning rate adaptation was also shown with tasks where decision-making strategies play a less important role, such as the Predictive Inference task (see for instance Nassar et al, 2010). When discussing the merit of the findings of this study on learning rate adaptation across volatility blocks, this work would be essential to mention. 

      Thanks for mentioning this great experimental paradigm, which provides an ideal solution for dissociating probability learning from the decision process. We now discuss this paradigm and the associated papers in the Discussion (lines 749-751, 763-765, and 796-801).

      Point 2.7

      Minor mistakes that I've noticed:

      Equation 6: The learning rate for response stickiness is sometimes defined as alpha_AH or alpha_pi.

      Supplementary material (SM) Contents are lacking in Note1. SM talks about model MOS18, but it is not defined in the text (I am assuming it is MOS22 that should be talked about here).

      Thanks! Fixed.

      Reviewer #3:

      Point 3.1

      Summary: This paper presents a new formulation of a computational model of adaptive learning amid environmental volatility. Using a behavioral paradigm and data set made available by the authors of an earlier publication (Gagne et al., 2020), the new model is found to fit the data well. The model's structure consists of three weighted controllers that influence decisions on the basis of (1) expected utility, (2) potential outcome magnitude, and (3) habit. The model offers an interpretation of psychopathology-related individual differences in decision-making behavior in terms of differences in the relative weighting of the three controllers.

      Strengths: The newly proposed "mixture of strategies" (MOS) model is evaluated relative to the model presented in the original paper by Gagne et al., 2020 (here called the "flexible learning rate" or FLR model) and two other models. Appropriate and sophisticated methods are used for developing, parameterizing, fitting, and assessing the MOS model, and the MOS model performs well on multiple goodness-of-fit indices. The parameters of the model show decent recoverability and offer a novel interpretation for psychopathology-related individual differences. Most remarkably, the model seems to be able to account for apparent differences in behavioral learning rates between high-volatility and low-volatility conditions even with no true condition-dependent change in the parameters of its learning/decision processes. This finding calls into question a class of existing models that attribute behavioral adaptation to adaptive learning rates. 

      Thanks for evaluating our paper. This is a good summary.

      Point 3.2

      (1) Some aspects of the paper, especially in the methods section, lacked clarity or seemed to assume context that had not been presented. I found it necessary to set the paper down and read Gagne et al., 2020 in order to understand it properly.

      (3) Clarification-related suggestions for the methods section:

      - Explain earlier that there are 4 contexts (reward/shock crossed with high/low volatility). Lines 252-307 contain a number of references to parameters being fit separately per context, but "context" was previously used only to refer to the two volatility levels. 

      Action: We have placed the explanation as well as the table about the 4 contexts (stable-reward/stable-aversive/volatile-reward/volatile-aversive) earlier in the section that introduces the experiment paradigm (lines 177-186):

      “Participants were asked to complete this learning and decision-making task in four experimental contexts (Fig. 1A): two feedback contexts (reward or aversive) × two volatility contexts (stable or volatile). Participants received points in the reward context and an electric shock in the aversive context. The reward points in the reward context were converted into a monetary bonus by the end of the task, ranging from £0 to £10. In the stable context, the dominant stimulus (i.e., the stimulus that induces feedback with a higher probability) provided feedback with a fixed probability of 0.75, while the other one yielded feedback with a probability of 0.25. In the volatile context, the dominant stimulus’s feedback probability was 0.8, but the dominant stimulus switched between the two every 20 trials. Hence, this design required participants to actively learn and infer the changing stimulus-feedback contingency in the volatile context.”
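      The design described in this passage, together with the run length (90 stable and 90 volatile trials) and the feedback rule quoted under Points 1.6 and 1.7, can be sketched as a small schedule generator. The order of the stable and volatile halves, which stimulus is dominant first, and all function names are assumptions made for illustration only.

```python
import random

def make_run(n_stable=90, n_volatile=90, switch_every=20, seed=0):
    """Return one run as a list of trial dicts (stable half first, by assumption)."""
    rng = random.Random(seed)
    trials = []
    for t in range(n_stable + n_volatile):
        if t < n_stable:
            context, p_stim1 = "stable", 0.75
        else:
            segment = (t - n_stable) // switch_every  # index of the current 20-trial segment
            context, p_stim1 = "volatile", (0.8 if segment % 2 == 0 else 0.2)
        # Exactly one stimulus carries feedback on each trial.
        feedback_stim = 1 if rng.random() < p_stim1 else 2
        # Magnitudes are drawn uniformly and independently for each stimulus.
        magnitudes = (rng.randint(1, 99), rng.randint(1, 99))
        trials.append({"context": context, "p_stim1": p_stim1,
                       "feedback_stim": feedback_stim, "magnitudes": magnitudes})
    return trials

def outcome(trial, choice):
    """Feedback shown after choosing stimulus 1 or 2: its magnitude, or 0."""
    return trial["magnitudes"][choice - 1] if choice == trial["feedback_stim"] else 0

run = make_run()
print(len(run), run[0]["context"], run[90]["context"])
```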

      - It would be helpful to provide an initial outline of the four models that will be described since the FLR, RS, and PH models were not foreshadowed in the introduction. For the FLR model in particular, it would be helpful to give a narrative overview of the components of the model before presenting the notation. 

      Action: We now include an overview paragraph in the computational model section that outlines the four models and the hypotheses each embodies (lines 202-220).

      - The subsection on line 343, describing the simulations, lacks context. There are references to three effects being simulated (and to "the remaining two effects") but these are unclear because there's no statement in this section of what the three effects are.

      - Lines 352-353 give group-specific weighting parameters used for the simulations of the HC and PAT groups in Figure 4B. A third, non-group-specific set of weighting parameters is given above on lines 348-349. What were those used for?

      - Line 352 seems to say Figure 4A is plotting a simulation, but the figure caption seems to say it is plotting empirical data. 

      These paragraphs have been rewritten and the above-mentioned issues have been clarified. See lines 363-392.

      Point 3.2

      (2) There is little examination of why the MOS model does so well in terms of model fit indices. What features of the data is it doing a better job of capturing? One thing that makes this puzzling is that the MOS and FLR models seem to have most of the same qualitative components: the FLR model has parameters for additive weighting of magnitude relative to probability (akin to the MOS model's magnitude-only strategy weight) and for an autocorrelative choice kernel (akin to the MOS model's habit strategy weight). So it's not self-evident where the MOS model's advantage is coming from.

      An intuitive understanding of the FLR model is that it estimates stimulus value through a linear combination of feedback probability (PF) and (non-linear) magnitude. See equation:

      Also, the FLR model includes the mechanism of HA as:

      In other words, the FLR model considers the mechanisms of feedback probability (PF)+MO+HA (see Eq. XX in the original study), whereas our MOS considers the mechanisms of EU+MO+HA. The key qualitative difference between FLR and MOS is the use of the expected utility formula (EU) instead of the probability of feedback (PF). The advantage of our MOS model is supported by our model comparisons, which indicate that human participants multiply probability and magnitude rather than only considering probability. The EU strategy is also supported by a large body of literature (Gershman et al., 2015; Von Neumann & Morgenstern, 1947).

      Making decisions based on the multiplication of feedback probability and magnitude can often yield very different results compared to decisions based on a linear combination of the two, especially when the two magnitudes have a small absolute difference but a large ratio. Let’s consider two cases:

      (1) Stimulus 1: vs. Stimulus 2:

      (2) Stimulus 1: vs. Stimulus 2:

      The EU strategy may opt for stimulus 2 in both cases, since stimulus 2 always has a larger expected value. However, the PF+MO strategy is very likely to choose stimulus 1 in the first case. If we want PF+MO to also choose stimulus 2, in line with the EU strategy, we need to increase the weight on magnitude. Note that in this example we divided the magnitude value by 100 to ensure that probability and magnitude are on the same scale, for ease of illustration.
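      The contrast between the multiplicative and additive rules can be made concrete with a small worked example. The stimulus values below are hypothetical (they are not the values from the response), the additive rule uses a linear rather than non-linear magnitude term for simplicity, and `lam` is an illustrative name for the weight on probability.

```python
def eu_value(p, m):
    """Expected utility: probability times magnitude."""
    return p * m

def pf_mo_value(p, m, lam):
    """Additive rule: lam weights probability, (1 - lam) weights magnitude."""
    return lam * p + (1.0 - lam) * m

# Hypothetical stimuli as (probability, magnitude / 100): a small absolute
# magnitude difference (0.04) but a large magnitude ratio (5x).
stim1, stim2 = (0.7, 0.01), (0.3, 0.05)

def pick(value_fn, *args):
    """Return 1 or 2, whichever stimulus has the higher value under value_fn."""
    return 1 if value_fn(*stim1, *args) > value_fn(*stim2, *args) else 2

print("EU picks stimulus", pick(eu_value))
for lam in (0.5, 0.05):
    print(f"PF+MO (lam={lam}) picks stimulus", pick(pf_mo_value, lam))
```

      In this example the additive rule agrees with EU only when the weight on probability falls below 1/11, that is, when magnitude is weighted more than ten times as heavily as probability.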

      In the dataset reported by Gagne et al. (2020), the described scenario seems to occur more often in the aversive context than in the reward context. To accurately capture human behaviors, the FLR22 model requires a significantly larger weight for magnitude in the aversive context than in the reward context. Interestingly, when the weights for magnitude in different contexts are forced to be equal, the model (FLR6) fails, exhibiting almost chance-level performance throughout learning (Fig. 3E, G). In contrast, the MOS6 model, and even the RS3 model, perform well using one identical set of parameters across contexts. Both MOS6 and RS3 include the EU strategy during decision-making. These findings suggest that humans make decisions using the EU strategy rather than PF+MO.

      The focus of our paper is to show that a good-enough model can interpret the same dataset from a completely different perspective, not necessarily to explore improvements to the FLR model.

      Point 3.3

      One of the paper's potentially most noteworthy findings (Figure 5) is that when the FLR model is fit to synthetic data generated by the expected utility (EU) controller with a fixed learning rate, it recovers a spurious difference in learning rate between the volatile and stable environments. Although this is potentially a significant finding, its interpretation seems uncertain for several reasons: 

      - According to the relevant methods text, the result is based on a simulation of only 5 task blocks for each strategy. It would be better to repeat the simulation and recovery multiple times so that a confidence interval or error bar can be estimated and added to the figure. 

      - It makes sense that learning rates recovered for the magnitude-oriented (MO) strategy are near zero, since behavior simulated by that strategy would have no reason to show any evidence of learning. But this makes it perplexing why the MO learning rate in the volatile condition is slightly positive and slightly greater than in the stable condition. 

      - The pure-EU and pure-MO strategies are interpreted as being analogous to the healthy control group and the patient group, respectively. However, the actual difference in estimated EU/MO weighting between the two participant groups was much more moderate. It's unclear whether the same result would be obtained for a more empirically plausible difference in EU/MO weighting. 

      - The fits of the FLR model to the simulated data "controlled all parameters except for the learning rate parameters across the two strategies" (line 522). If this means that no parameters except learning rate were allowed to differ between the fits to the pure-EU and pure-MO synthetic data sets, the models would have been prevented from fitting the difference in terms of the relative weighting of probability and magnitude, which better corresponds to the true difference between the two strategies. This could have interfered with the estimation of other parameters, such as learning rate. 

      - If, after addressing all of the above, the FLR model really does recover a spurious difference in learning rate between stable and volatile blocks, it would be worth more examination of why this is happening. For example, is it because there are more opportunities to observe learning in those blocks?

      I would recommend performing a version of the Figure 5 simulations using two sets of MOS-model parameters that are identical except that they use healthy-control-like and patient-like values of the EU and MO weights (similar to the parameters described on lines 346-353, though perhaps with the habit controller weight equated). Then fit the simulated data with the FLR model, with learning rate and other parameters free to differ between groups. The result would be informative as to (1) whether the FLR model still misidentifies between-group strategy differences as learning rate differences, and (2) whether the FLR model still identifies spurious learning rate differences between stable and volatile conditions in the control-like group, which become attenuated in the patient-like group. 

      Many thanks for this great advice. Following your suggestions, we now conduct simulations using the median of the fitted parameters. The representative agents for healthy controls and patients have identical parameters except for the three preference parameters; the habit weights are not constrained to be equal. We ran 20 simulations for each representative agent, each comprising 4 task sequences sampled from the behavioral data. This allowed us to create error bars and perform statistical tests. We found that the differences in learning rates between stable and volatile conditions, as well as the learning rate adaptation differences between healthy controls and patients, still persisted.

      Combined with the discussion in Point 3.2, we justify why a mixture of strategies can account for learning rate adaptation as follows. Due to (unknown) differences in task sequences, the MOS6 model exhibits more MO-like behaviors due to the usage of the EU strategy. To capture this behavior pattern, the FLR22 model has to increase its weighting parameter 1-λ for magnitude, which could ultimately drive the FLR22 to adjust the fitted learning rate parameters, producing a learning rate adaptation effect. Our simulations suggest that estimating the learning rate by model fitting alone may not be the only way to interpret the data.

      Action: We included the simulation details in the Methods section (lines 381-391).

      “In one simulated experiment, we sampled the four task sequences from the real data. We simulated 20 experiments with the parameters of to mimic the behavior of the healthy control participants. The first three are the median of the fitted parameters across all participants; the latter three were chosen to approximate the strategy preferences of real health control participants (Figure 4A). Similarly, we also simulated 20 experiments for the patient group with the identical values of , and , but different strategy preferences   . In other words, the only difference in the parameters of the two groups is the switched and . We then fitted the FLR22 to the behavioral data generated by the MOS6 and examined the learning rate differences across groups and volatile contexts (Fig. 6). ”

      Point 3.4

      Figure 4C shows that the habit-only strategy is able to learn and adapt to changing contingencies, and some of the interpretive discussion emphasizes this. (For instance, line 651 says the habit strategy brings more rewards than the MO strategy.) However, the habit strategy doesn't seem to have any mechanism for learning from outcome feedback. It seems unlikely it would perform better than chance if it were the sole driver of behavior. Is it succeeding in this example because it is learning from previous decisions made by the EU strategy, or perhaps from decisions in the empirical data?

      Yes, intuitively the HA strategy appears to have no learning mechanism. In practice, however, it yields a higher hit rate than MO simply by learning from previous decisions made by the EU strategy. We ran simulations to confirm this (Figure 4B).
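      The point that a habit policy can inherit structure from another strategy can be illustrated with a toy mixture. The sketch assumes a delta-rule habit update toward the chosen stimulus and fixed, hypothetical EU and MO policies; the weights, the learning rate, and the function names are illustrative and are not taken from the manuscript.

```python
import random

def mixture_policy(pi_eu, pi_mo, pi_ha, w_eu, w_mo, w_ha):
    """Probability of choosing stimulus 1 under a weighted mixture of strategies."""
    return w_eu * pi_eu + w_mo * pi_mo + w_ha * pi_ha

def update_habit(pi_ha, choice, alpha_ha):
    """Move the habit policy toward the stimulus that was just chosen."""
    target = 1.0 if choice == 1 else 0.0
    return pi_ha + alpha_ha * (target - pi_ha)

rng = random.Random(0)
pi_ha = 0.5  # the habit policy starts indifferent between the two stimuli
for _ in range(200):
    # Hypothetical fixed policies: EU favours stimulus 1, MO is indifferent.
    p1 = mixture_policy(pi_eu=0.9, pi_mo=0.5, pi_ha=pi_ha,
                        w_eu=0.5, w_mo=0.1, w_ha=0.4)
    choice = 1 if rng.random() < p1 else 2
    pi_ha = update_habit(pi_ha, choice, alpha_ha=0.1)
print(round(pi_ha, 3))
```

      Because the habit policy only tracks past choices, any preference it develops here comes from the choices driven by the EU component, not from outcome feedback.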

      Point 3.5

      For the model recovery analysis (line 567), the stated purpose is to rule out the possibility that the MOS model always wins (line 552), but the only result presented is one in which the MOS model wins. To assess whether the MOS and FLR models can be differentiated, it seems necessary also to show model recovery results for synthetic data generated by the FLR model. 

      Sure, we conducted a model recovery analysis that includes all models, and it demonstrates that MOS and FLR can be fully differentiated. The results of the new model recovery analysis are shown in Fig. 7.

      Point 3.6

      To the best of my understanding, the MOS model seems to implement valence-specific learning rates in a qualitatively different way from how they were implemented in Gagne et al., 2020, and other previous literature. Line 246 says there were separate learning rates for upward and downward updates to the outcome probability. That's different from using two learning rates for "better"- and "worse"-than-expected outcomes, which will depend on both the direction of the update and the valence of the outcome (reward or shock). Might this relate to why no evidence for valence-specific learning rates was found even though the original authors found such evidence in the same data set? 

      Thanks. Following the suggestion, we have corrected our implementation of valence-specific learning rate in all models (see lines 261-268).

      “To keep consistent with Gagne et al., (2020), we also explored the valence-specific learning rate,

      is the learning rate for better-than-expected outcome, and for worse-than-expected outcome. It is important to note that Eq. 6 was only applied to the reward context, and the definitions of “better-than-expected” and “worse-than-expected” should change accordingly in the aversive context, where we defined for and for .

      No main effect of valence on learning rate was found (see Supplemental Information Note 3)
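      The distinction raised in this point, between learning rates keyed to the direction of the update and learning rates keyed to whether the outcome was better or worse than expected, can be sketched as follows. This is a minimal illustration assuming a simple delta-rule update of an outcome probability; the function name, argument names, and the "reward"/"aversive" labels are hypothetical and are not taken from the manuscript or from Gagne et al. (2020).

```python
def valence_specific_update(p, outcome, context, alpha_better, alpha_worse):
    """One delta-rule update of an estimated outcome probability.

    p        : current estimate that the outcome (reward or shock) occurs
    outcome  : 1 if the outcome occurred on this trial, 0 otherwise
    context  : "reward" or "aversive" (hypothetical labels)
    The learning rate depends on whether the trial was better or worse than
    expected, which combines the sign of the prediction error with the
    valence of the outcome.
    """
    delta = outcome - p  # prediction error
    if context == "reward":
        # More reward than expected is good news.
        better_than_expected = delta > 0
    elif context == "aversive":
        # Less shock than expected is good news.
        better_than_expected = delta < 0
    else:
        raise ValueError("context must be 'reward' or 'aversive'")

    alpha = alpha_better if better_than_expected else alpha_worse
    return p + alpha * delta
```

      Under this sketch, an upward update uses the "better" learning rate in the reward context but the "worse" learning rate in the aversive context, which is the difference from a scheme with separate rates for upward and downward updates.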

      Point 3.7

      The discussion (line 649) foregrounds the finding of greater "magnitude-only" weights with greater "general factor" psychopathology scores, concluding it reflects a shift toward simplifying heuristics. However, the picture might not be so straightforward because "habit" weights, which also reflect a simplifying heuristic, correlated negatively with the psychopathology scores. 

      Thanks. In contrast to the detrimental effects of “MO”, “habit” is actually beneficial for the task. Please refer to Point 1.12.

      Point 3.8

      The discussion section contains some pejorative-sounding comments about Gagne et al. 2020 that lack clear justification. Line 611 says that the study "did not attempt to connect the decision process to anxiety and depression traits." Given that linking model-derived learning rate estimates to psychopathology scores was a major topic of the study, this broad statement seems incorrect. If the intent is to describe a more specific step that was not undertaken in that paper, please clarify. Likewise, I don't understand the justification for the statement on line 615 that the model from that paper "is not understandable" - please use more precise and neutral language to describe the model's perceived shortcomings. 

      Sorry for the confusion. We have removed all abovementioned pejorative-sounding comments.

      Point 3.9

      4. Minor suggestions: 

      - Line 114 says people with psychiatric illness "are known to have shrunk cognitive resources" - this phrasing comes across as somewhat loaded. 

      Thanks. We have removed this argument.

      - Line 225, I don't think the reference to "hot hand bias" is correct. I understand hot hand bias to mean overestimating the probability of success after past successes. That's not the same thing as habitual repetition of previous responses, which is what's being discussed here. 

      Response: Thanks for mentioning this. We have removed all discussions about “hot hand bias”.

      - There may be some notational inconsistency if alpha_pi on line 248 and alpha_HA on line 253 are referring to the same thing. 

      Thanks! Fixed!

      - Check the notation on line 285 - there may be some interchanging of decimals and commas.

      Thanks! Fixed!

      Also, would the interpretation in terms of risk seeking and risk aversion be different for rewarding versus aversive outcomes? 

      Thanks for asking. If we understand correctly, risk-seeking and risk-aversion mechanisms are only present in the RS models, which fit the data clearly worse. We have thus decided not to over-interpret the fitted parameters of the RS models.

      - Line 501, "HA and PAT groups" looks like a typo. 

      - In Figure 5, better graphical labeling of the panels and axes would be helpful. 

      Response: Thanks! Fixed!

      REFERENCES

      Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204-1215.

      Gagne, C., Zika, O., Dayan, P., & Bishop, S. J. (2020). Impaired adaptation of learning to contingency volatility in internalizing psychopathology. eLife, 9.

      Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394.

      Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 273-278.

      Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior, 2nd rev.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.

      We now further discuss these issues and the perspectives offered by future genomic work on lines 462-485.  

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

      Our intuition is that ALE in its “dated” version does not necessarily fail on our dataset due to its size: ALE runs, but it provides unrealistic parameter estimates and is not able to output possible reconciliations, as mentioned in our Material and Methods section. We think this issue is mostly due to the fact that there is no pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between these trees with time-consistent switches is very difficult and ALE fails at estimating an amalgamated likelihood for such an unlikely scenario. We have now run the dated version of ALE independently on the smaller alpha and betacoronaviruses datasets. It still fails on the betacoronaviruses dataset. On the alphacoronaviruses dataset, it does output significant reconciliations; however, these reconciliations consist mostly of transfer and loss events, confirming that codiversification is unlikely in this clade.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that is a fairly prominent issue, that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We totally agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still reported low support for a scenario of codiversification, the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host switches involve unsampled intermediate hosts. To address the reviewer's comment, we now better underline the importance of sampling biases in our main text (see Discussion, lines 487-494) with supporting references (note that we did not find the Cohen et al. Nature Comm reference). We also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.

      We agree that distinguishing between alpha and beta coronaviruses provides useful additional insights. We have run separate cophylogenetic analyses for these two sub-clades and now report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (see 2017, PLOS Pathogens), thank you for providing this reference that is now cited. 

      Reviewer #1 (Recommendations For The Authors):

      (1) Overall I found this paper to be quite difficult to follow. The text needs clearer structure, which can be helped by writing in shorter paragraphs and adding section headings. For example, there are some very long paragraphs starting on L83, L176, L215, L511, and L598.

      We have now added section headings and divided these paragraphs into smaller ones.

      (2) It would be helpful to define some of the key terminology relating to the evolutionary interactions between the viruses and their hosts. Some of the terms that are typically used in the context include "coevolution", "cospeciation", "codivergence", and "codiversification". These have different meanings and need to be used carefully. The paper mostly deals with "codivergence" between coronaviruses and their host species.

      We now provide a list of definitions in Box S1. These definitions are as in our recent article clarifying the differences between these patterns/processes (Perez-Lamarque & Morlon 2024).

      Specific comments

      L83-L105: This paragraph can be written more concisely.

      We prefer to keep this paragraph like this as it contains key explanations that are necessary for understanding our approach and results.  

      Figure 1: The timescales of the trees are rather confusing. The different scales are indicated by the gray shading but this is easy to overlook. Maybe stretching or compressing the trees horizontally would help to emphasise the different timescales.

      Done.

      Figure 2: Note that the maximum clade credibility tree is a specific tree sampled from the posterior distribution - it is not a consensus tree. In the figure caption, the meaning of "location" is unclear.

      We have removed the word “consensus”, thank you for noting this. We have replaced “location” by “branching order”. 

      L461: How was the model chosen, and why were different models used in the BEAST and PhyloBayes analyses?

      We did our PhyloBayes analyses first and used the LG model following methodology outlined in previous studies using ALE (e.g. Groussin et al. 2017; Dorrell et al. 2021). Unfortunately, the LG model is not available in the default version of BEAST2 so we had to use a different model (the WAG model). We have now run BEAST2 with the LG model (thanks to the BEAST_CLASSIC package) and we obtained very similar results (see Figure below showing the BEAST consensus trees obtained with the WAG or LG models – they only slightly differ by the branching of the u7351 OTU). We have now added this information in the Methods section. 

      Author response image 1.

      L477: It is not clear to me how the PhyloBayes and BEAST analyses differ. Please expand the explanation of why PhyloBayes was used here.

      We have now clarified this (lines 594-597). 

      L568: Why not test explicitly for recombination?

      We did test for the occurrence of recombination using several approaches, including OpenRDP (https://github.com/PoonLab/OpenRDP), our own custom code, and Gubbins (Croucher et al. 2015). These tests were however inconclusive, indicating either the absence or presence of recombination, thus suggesting that the palmprint region is too short to infer anything about recombination. We thus do not exclude the possibility that recombination occurred, and test the robustness of our results to recombination by running our analyses on different sub-parts of the palmprint region. We have clarified this in our Material & Methods.

      L618: "DNA sequences" -> "RNA sequences"

      Done.

      The paper contains numerous minor grammatical errors and would benefit from careful proofreading and editing. Please check the use of plurals and apostrophes. Some of the errors are listed below:

      L49: "As several" -> "As with several"

      Done.

      L178: "reconciliates" -> "reconciles"?

      Done.

      L199: "extent" -> "extant"

      Done.

      L289: This sentence needs rephrasing to avoid a triple negative ("cannot ... reject ... not present")

      Done.

      L469: "temporary" -> "temporal"

      Done.

      L470: "neglectable" -> "negligible"

      Done.

      L577: "not only relying" -> "not relying only"

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The study is generally well-constructed and its results are convincing. However, considering the availability of a dated host tree, conducting a dated reconciliation analysis could be beneficial. Creating a smaller sub-dataset and performing a dated reconciliation analysis would likely be a valuable addition to the research.

      We have now run the dated version of ALE on both the alpha and betacoronaviruses subclades. ALE dated still does not output reconciliations on the betacoronaviruses dataset, but it does on the smaller alphacoronaviruses dataset. We found significant reconciliations, indicating that mammal-alphacoronavirus associations are not random with respect to phylogeny, but the reconciliations involved more host switch and loss events (38 switches + 29 losses) than cospeciation events (65), indicating cophylogenetic signal in the absence of phylogenetic congruence (Perez-Lamarque & Morlon 2024). We now present the results on lines 264-282.  

      Reviewer #3 (Recommendations For The Authors):

      I think the results are written in a very speculative way, with many sentence fragments that should really be part of the discussion.

      We have carefully checked our Results section and rephrased or removed formulations that may have been perceived as speculative.

      There are a lot of considerations in this manuscript about spread and future pandemics, but I think this is very far from the topic of this paper. When we quantified the coevolutionary risk of bats-betacovs in a recent paper (Forero et al. 2024, Virus Evol.), we only briefly touched upon this discussion because we compared our outputs with a measure of human population density. I don't think the manuscript needs to talk about epidemiology at all, and it would probably be more useful as a purely evo-bio piece.

      We think that it is useful to discuss the potential implications of our results for future pandemics, even though we agree that this discussion is rather speculative. We have removed the mention of predictions in the Abstract and have softened our wording in the Discussion.  

      References:

      Croucher, N.J., Page, A.J., Connor, T.R., Delaney, A.J., Keane, J.A., Bentley, S.D., et al. (2015). Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res., 43, e15.

      Dorrell, R.G., Villain, A., Perez-Lamarque, B., Audren de Kerdrel, G., McCallum, G., Watson, A.K., et al. (2021). Phylogenomic fingerprinting of tempo and functions of horizontal gene transfer within ochrophytes. Proc. Natl. Acad. Sci., 118, e2009974118.

      Edgar, R.C. et al. (2022). Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147.

      Groussin, M., Mazel, F., Sanders, J.G., Smillie, C.S., Lavergne, S., Thuiller, W., et al. (2017). Unraveling the processes shaping mammalian gut microbiomes over evolutionary time. Nat. Commun., 8, 14319.

      Perez-Lamarque, B. & Morlon, H. (2024). Distinguishing cophylogenetic signal from phylogenetic congruence clarifies the interplay between evolutionary history and species interactions. Syst. Biol.

    2. eLife assessment

      Maestri et al report the absence of phylogenetic evidence supporting codiversification of mammalian coronaviruses and their hosts, leading to the important conclusion that the evolutionary history of the virus and its hosts are decoupled through frequent host switches. The evidence for frequent host switching, derived from state-of-the-art probabilistic modeling of co-evolution, is convincing. The study adds a new perspective to the ongoing debate over the timescale of coronavirus evolution.

    3. Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

    4. Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      The authors could only use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset. The authors did attempt to address this issue in the revision, albeit with limited success.

    1. eLife assessment

      In this useful study, the authors tested the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes using modeling and behavioral experiments, claiming that bumblebees rely most on ground-views for homing. However, due to a lack of analysis of the bees' behavior during training and a lack of information as to how the homing behavior of bees develops over time, the evidence supporting their claims is currently incomplete. Moreover, there was concern that the experimental environment was not representative of natural scenes, thus limiting the findings of the study.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented. 

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

    3. Reviewer #2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views";

      line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";

      line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

    4. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views";

      line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";

      line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing. (Neither the camera nor the wall was moved between training and test.) We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.

      As we wanted to investigate if bees would use ground views or bird’s eye views to home in a dense environment, we think the catchment volumes would provide qualitatively similar, though quantitatively more detailed, information compared with catchment slices. Our approach of catchment slices is sufficient to predict whether ground or bird's-eye views perform better in leading to the nest, and we will, therefore, not include further computations of catchment volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, it is not always clear exactly where bees learn, or whether they learn continuously by weighting their visual experience according to their positions and orientations. This makes it difficult to categorise these flights accurately into learning and return flights. Additionally, homing models remain vague about the learning mechanisms at play during the learning flights. Therefore, we believe that continuous effort must be made to understand bees' learning and homing ability. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will provide these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our starting point was that, in nature, bumblebees have to find their inconspicuous nest hole, often within highly dense environments and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.
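
      For readers unfamiliar with the rotational image difference functions discussed in this exchange, the following is a minimal sketch of the general idea, not the authors' model. It assumes that a view is a small panoramic greyscale image stored as a list of rows, that a rotation of the agent corresponds to a circular shift of the image columns, and that views are compared by root-mean-square pixel difference. The function names and the toy panorama are hypothetical.

```python
import math


def rotational_idf(snapshot, current):
    """Return the RMS pixel difference for every azimuthal shift of `current`.

    Both views are panoramic images stored as lists of rows. Each row covers
    360 degrees of azimuth, so rotating the view is a circular column shift.
    """
    n_rows, n_cols = len(snapshot), len(snapshot[0])
    differences = []
    for shift in range(n_cols):
        squared_sum = 0.0
        for r in range(n_rows):
            for c in range(n_cols):
                diff = current[r][(c + shift) % n_cols] - snapshot[r][c]
                squared_sum += diff * diff
        differences.append(math.sqrt(squared_sum / (n_rows * n_cols)))
    return differences


def best_heading(snapshot, current):
    """Return (shift in degrees, RMS difference) at the minimum of the function."""
    ridf = rotational_idf(snapshot, current)
    best_shift = min(range(len(ridf)), key=ridf.__getitem__)
    return 360.0 * best_shift / len(ridf), ridf[best_shift]


# Toy panorama (assumption): one dark object on a bright background,
# 12 columns of 30 degrees each.
snapshot = [[1.0] * 12 for _ in range(3)]
for row in snapshot:
    row[2] = 0.0

# The same scene after the agent has turned by three columns (90 degrees).
current = [row[-3:] + row[:-3] for row in snapshot]

heading, residual = best_heading(snapshot, current)
print(heading, residual)
```

      In this toy case the current view is an exact rotation of the memorised one, so the minimum is unambiguous. With many identical objects, as in the array of identical landmarks that Reviewer 2 describes, several shifts can give similarly low differences, which is one way to picture the ambiguity raised in the review.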

    1. Reviewer #1 (Public Review):

      Summary:

      Das and Menon describe an analysis of a large open-source iEEG dataset (UPENN-RAM). From encoding and recall phases of memory tasks, they analyzed power and phase-transfer entropy as a measure of directed information flow in regions across a hypothesized tripartite network system. The anterior insula (AI) was found to have heightened high gamma power during encoding and retrieval, which corresponded to suppression of high gamma power in the posterior cingulate cortex (PCC) during encoding but not recall. In contrast, directed information flow from (but not to) AI to mPFC/PCC and dorsal posterior parietal/middle frontal cortex is high during both time periods when PTE is analyzed with broadband but not narrowband activity. They claim that these findings significantly advance an understanding of how network communication facilitates cognitive operations such as control over memory and that the AI of the salience network (SN) is responsible for governing the switch between the frontoparietal network (FPN) and default-mode network (DMN) when shifting between externally- and internally-driven processing.

      I find this question interesting and important and agree with the authors that iEEG presents a unique opportunity to investigate the temporal dynamics within network nodes. However, I am not convinced that their claims are supported by the results currently presented. In particular, the fact that network-level communication is not modulated significantly compared to rest and does not relate to behavior suggests that PTE analyses may not be tapping into task-relevant communication. Moreover, dissociation of network effects - present during both encoding and recall - from local power suppression effects - present only during encoding - suggests that these sets of results may index separate and not unitary task processes.

      Strengths:

      - The authors present results from an impressively sized iEEG sample. For reader context, this type of invasive human data is difficult and time-consuming to collect and many similar studies in high-level journals include 5-20 participants, typically not all of whom have electrodes in all regions of interest. It is excellent that they have been able to leverage open-source data in this way.

      - Preprocessing of iEEG data also seems sensible and appropriate based on field standards.

      - The authors tackle the replication issues inherent in much of the literature by replicating findings across task contexts, demonstrating that the principles of network communication evidenced by their results generalize in multiple task memory contexts. Again, the number of iEEG patients who have multiple tasks' worth of data is impressive.

      Weaknesses:

      • The motivation for investigating the tripartite network during memory tasks is not currently well-elaborated. Though the authors mention, for example, that "the formation of episodic memories relies on the intricate interplay between large-scale brain networks (p. 4)", there are no citations provided for this statement, and the reader is unable to evaluate whether the nodes and networks evidenced to support these processes are the same as networks measured here.

      • In addition, though the tripartite network has been proposed to support cognitive control processes, and the neural basis of cognitive control is the framed focus of this work, the authors do not demonstrate that they have measured cognitive control in addition to simple memory encoding and retrieval processes. Tasks that have investigated cognitive control over memory (such as those cited on p. 13 - Badre et al., 2005; Badre & Wagner, 2007; Wagner et al., 2001; Wagner et al., 2005) generally do not simply include encoding, delay, and recall (as the tasks used here), but tend to include some manipulation that requires participants to engage control processes over memory retrieval, such as task rules governing what choice should be made at recall (e.g., from Badre et al., 2005 Fig. 1: congruency of match, associative strength, number of choices, semantic similarity). Moreover, though there are task-responsive signatures in the nodes of the tripartite networks, concluding that cognitive control is present because cognitive control networks are active would be a reverse inference.

      • It is currently unclear if the directed information flow from AI to DMN and FPN nodes truly arises from task-related processes such as cognitive control or if it is a function of static brain network characteristics constrained by anatomy (such as white matter connection patterns, etc.). This is a concern because the authors did not find that influences of AI on DMN or FPN are increased relative to a resting baseline (collected during the task) or that directed information flow differs in successful compared to unsuccessful retrieval. I doubt that this AI influence is 1) supporting a switch between the DMN and FPN via the SN or 2) relevant for behavior if it doesn't differ from baseline-active task or across accuracy conditions. An additional comparison that may help investigate whether this is reflective of static connectivity characteristics would be a baseline comparison during non-task rest or sleep periods.

      • Related to the above concern, it is also questionable how directed information flow from AI facilitates switching between FPN and DMN during both encoding and recall if high gamma activity does not significantly differ in AI versus PCC or mPFC during recall as it does during encoding. It seems erroneous to conclude that the network-level communication is happening or happening with the same effect during both task time points when these effects are decoupled in such a way from the power findings.

      • Missing information about the methods used for time-frequency conversion for power calculation and the power normalization/baseline-correction procedure bars a thorough evaluation of power calculation methods and results.

      If revisions to the manuscript can address concerns about directed information flow possibly being due to anatomical constraints - such as by indicating that directed information flow is not present during non-task rest or sleep - this work may convey important information about the structure and order of communication between these networks during attention to tasks in general. However, the ability of the findings to address cognitive control-specific communication and the nature of neurophysiological mechanisms of this communication - as opposed to the temporal order and structure of recruited networks - may be limited.

      Because phase-transfer entropy (PTE) is presented as a "causal" analysis in this investigation, I also believe it is important to highlight for readers recent discussions surrounding the description of "causal mechanisms" in neuroscience (see "Confusion about causation" section from Ross and Bassett, 2024, Nature Neuroscience). A large proportion of neuroscientists (admittedly, myself included) use "causal" only to refer to a mechanism whose modulation or removal (with direct manipulation, such as by lesion or stimulation) is known to change or control a given outcome (such as a successful behavior). As Ross and Bassett highlight, it is debatable whether such mechanistic causality is captured by Granger "causality" (a.k.a. Granger prediction) or the parametric PTE, and the imprecise use of "causation" may be confusing. The authors could consider amending language regarding this analysis if they are concerned about bridging these definitions of causality across a wide audience.
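
      To make the distinction between prediction and intervention concrete, the sketch below shows one simple way a phase transfer entropy can be estimated. It is an illustration only and is not the estimator used in the manuscript under review. It assumes that instantaneous phases have already been extracted (commonly via a Hilbert transform of band-limited signals), that phases are discretised into equal-width bins, and that a single lag is used. The function names, bin count, and toy signals are all hypothetical.

```python
import math
import random
from collections import Counter


def _entropy(samples):
    """Shannon entropy (in nats) of a sequence of hashable symbols."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def _bin_phase(phase, n_bins):
    """Map a phase in radians to one of n_bins equal-width bins on the circle."""
    wrapped = phase % (2.0 * math.pi)
    return min(int(wrapped / (2.0 * math.pi) * n_bins), n_bins - 1)


def phase_transfer_entropy(phase_x, phase_y, lag, n_bins=8):
    """Histogram estimate of transfer entropy from x to y at one lag (samples).

    TE(x -> y) = H(y_t | y_past) - H(y_t | y_past, x_past), written here as a
    sum of joint entropies.
    """
    if lag < 1 or len(phase_x) != len(phase_y) or len(phase_x) <= lag:
        raise ValueError("need equal-length series longer than a positive lag")
    bx = [_bin_phase(p, n_bins) for p in phase_x]
    by = [_bin_phase(p, n_bins) for p in phase_y]
    y_now, y_past, x_past = by[lag:], by[:-lag], bx[:-lag]
    return (
        _entropy(list(zip(y_now, y_past)))
        + _entropy(list(zip(y_past, x_past)))
        - _entropy(y_past)
        - _entropy(list(zip(y_now, y_past, x_past)))
    )


# Toy data (assumption): y copies the phase of x five samples later, plus noise.
random.seed(0)
lag, n = 5, 5000
phase_x = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
phase_y = [random.uniform(0.0, 2.0 * math.pi) for _ in range(lag)] + [
    phase_x[t - lag] + random.gauss(0.0, 0.1) for t in range(lag, n)
]

pte_xy = phase_transfer_entropy(phase_x, phase_y, lag)
pte_yx = phase_transfer_entropy(phase_y, phase_x, lag)
print(pte_xy, pte_yx)
```

      In this construction the estimate from x to y should come out clearly larger than the reverse direction. Note that the quantity only measures how much the past phase of one signal improves prediction of the other; nothing in the computation involves manipulating either signal, which is the point of the terminological caution above.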

    2. eLife assessment

      In this manuscript, the authors present valuable findings on the apparent role of a salience-network anterior insula node in directing fronto-parietal and default-mode network activity within a tripartite network during control of memory, drawn from an impressive invasive human neurophysiological dataset. While we commend the use of a large intracranial EEG dataset to approach this question, the study at present is incomplete in its methodologies, analysis, and interpretation to support the authors' central claims. The manuscript could be improved by addressing the concerns described.

    3. Reviewer #2 (Public Review):

      In this study, the authors leverage a large public dataset of intracranial EEG (the University of Pennsylvania RAM repository) to examine electrophysiologic network dynamics involving the participation of salience, frontoparietal, and default mode networks in the completion of several episodic memory tasks. They do this through a focus on the anterior insula (AI; salience network), which they hypothesize may help switch engagement between the DMN and FPN in concert with task demands. By analyzing high-gamma spectral power and phase transfer entropy (PTE; a putative measure of information "flow"), they show that the AI shows higher directed PTE towards nodes of both the DMN and FPN, during encoding and recall, across multiple tasks. They further demonstrate that high-gamma power in the PCC/precuneus is decreased relative to the AI during memory encoding. They interpret these results as evidence of "triple-network" control processes in memory tasks, governed by a key role of the AI.

      I commend the authors on leveraging this large public dataset to help contextualize network models of brain function with electrophysiological mechanisms - a key problem in much of the fMRI literature. I also appreciate that the authors emphasized replicability across multiple memory tasks, in an effort to demonstrate conserved or fundamental mechanisms that support a diversity of cognitive processes. However, I believe that their strong claims regarding causal influences within circumscribed brain networks cannot be supported by the evidence as presented. In my efforts to clearly communicate these inadequacies, I will suggest several potential analyses for the authors to consider that might better link the data to their central hypotheses.

      (1) As a general principle, the effects that the authors show - both in regards to their high-gamma power analysis and PTE analysis - do not offer sufficient specificity for a reader to understand whether these are general effects that may be repeated throughout the brain, or whether they reflect unique activity to the networks/regions that are laid out in the Introduction's hypothesis. This lack of specificity manifests in several ways, and is best communicated through examples of control analyses.

      First, the PTE analysis is focused solely on the AI's interactions with nodes of the DMN and FPN; while it makes sense to focus on this putative "switch" region, the fact that the authors report significant PTE from the AI to nodes of both networks, in encoding and retrieval, across all tasks and (crucially) also at baseline, raises questions about the meaningfulness of this statistic. One way to address this concern would be to select a control region that would be expected to have little/no directed causal influence on these networks and repeat the analysis. Alternatively (or additionally), the authors could examine the time course of PTE as it evolves throughout an encoding/retrieval interval, and relate that to the timing of behavioral events or changes in high-gamma power. This would directly address an important idea raised in their own Discussion, "the AI is well-positioned to dynamically engage and disengage with other brain areas."

      Second, the authors state that high-gamma suppression in the PCC/precuneus relative to the AI is an anatomically specific signature that is not present in the FPN. This claim does not seem to be supported by their own evidence as presented in the Supplemental Data (Figures S2 and S3), which to my eye show clear evidence of relative suppression in the MFG and dPPC (e.g. S2a and S3a, most notably) which are notated as "significant" with green bars. I appreciate that the magnitude of this effect may be greater in the PCC/precuneus, but if this is the claim it should be supported by appropriate statistics and interpretation.

      (2) I commend the authors on emphasizing replicability, but I found their Bayes Factor (BF) analysis to be difficult to interpret and qualitatively inconsistent with the results that they show. For example, the authors state that BF analysis demonstrates "high replicability" of the gamma suppression effect in Figure 3a with that of 3c and 3d. While it does appear that significant effects exist across all three tasks, the temporal structure of high gamma signals appears markedly different between the two in ways that may be biologically meaningful. Moreover, it appears that the BF analysis did not support replicability between VFR and CATVFR, which is very surprising; these are essentially the same tasks (merely differing in the presence of word categories) and would be expected to have the highest degree of concordance, not the lowest. I would suggest the authors try to analytically or conceptually reconcile this surprising finding.

      To aid in interpretability, it would be extremely helpful for the authors to assess across-task similarity in high-gamma power on a within-subject basis, which they are well-powered to do. For example, could they report the correlation coefficient between HGP timecourses in paired-associates versus free-recall tasks, to better establish whether these effects are consistent on a within-subject basis? This idea could similarly be extended to the PTE analysis. Across-subject correlations would also be a welcome analysis that may provide readers with better-contextualized effect sizes than the output of a Bayes Factor analysis.
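
      The within-subject comparison suggested above could be sketched as follows. This is a hypothetical illustration under stated assumptions: each task's data are held in a dictionary mapping a subject identifier to a high-gamma power timecourse sampled on a common time axis, only subjects present in both tasks are compared, and similarity is summarised by the Pearson correlation coefficient. The subject labels and values are invented placeholders, not data from the study.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant timecourse")
    return cov / math.sqrt(var_a * var_b)


def within_subject_similarity(task_a, task_b):
    """Correlate timecourses for every subject that appears in both tasks."""
    shared = sorted(set(task_a) & set(task_b))
    return {subject: pearson_r(task_a[subject], task_b[subject]) for subject in shared}


# Placeholder timecourses (assumption): arbitrary units, four time bins each.
paired_associates = {
    "sub01": [0.1, 0.4, 0.9, 0.5],
    "sub02": [0.2, 0.1, 0.0, 0.3],
}
free_recall = {
    "sub01": [0.2, 0.8, 1.8, 1.0],
    "sub03": [0.5, 0.5, 0.6, 0.4],
}

similarity = within_subject_similarity(paired_associates, free_recall)
print(similarity)
```

      Only "sub01" contributes here because it is the only subject with both tasks. The resulting per-subject coefficients could then be summarised across subjects, which gives an effect size on a familiar scale alongside the Bayes Factor analysis.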

    1. Reviewer #1 (Public Review):

      Summary:

      The investigation delves into allosteric modulation within the glycosylated SARS-CoV-2 spike protein, focusing on the fatty acid binding site. This study uncovers intricate networks connecting the fatty acid site to crucial functional regions, potentially paving the way for developing innovative therapeutic strategies.

      Strengths:

      This article's key strength lies in its rigorous use of dynamic nonequilibrium molecular dynamics (D-NEMD) simulations. This approach provides a dynamic perspective on how the fatty acid binding site influences various functional regions of the spike. A comprehensive understanding of these interactions is crucial in deciphering the virus's behavior and identifying potential targets for therapeutic intervention.

      Weaknesses:

      The presented evidence is compelling but would be stronger if the study were supported by sequence analysis, giving a more complete view of the allosteric networks. The analysis of the simulation results is only partially aligned with the discussion, because an asymmetry in the perturbations generated by D-NEMD is observed across the replicates and the monomers, even with 210 nonequilibrium MD simulations of 10 ns each. While the authors claim that the strategy used in this article has been previously validated, the complexity of the spike and of the interactions analyzed requires a robust statistical analysis, which is not shown quantitatively. The investigation examines the allosteric modulation within the glycosylated SARS-CoV-2 spike protein, emphasizing the significance of the fatty acid binding site in influencing the structural dynamics and communication pathways essential for viral function, potentially facilitating the development of novel therapeutic strategies. The presented evidence is compelling but needs to be supported by sequence analysis, which would make the findings easier for the scientific community to interpret.

      Minor considerations:

      Figure S3 shows a discrepancy in the presentation of residue values S325 in the plots of Chains A, B, and C. While chain A shows a value near 0.1, chains B and C plots do not have any value.

      Please explain why the plots of figures S6, S7, and S8 show significant changes in several regions, such as RBM and Furin Site. Can these changes be explained?

      The flow of the allosteric interaction is complex to visualize just by looking at structures. Could you please include a diagram showing the flow of allosteric interactions (in a sequence diagram or using the structure of the protein)? Or could you include a vector showing how the perturbation done in the FA Active site takes contact with other relevant regions of the Spike protein?

    2. Reviewer #3 (Public Review):

      Summary:

      In a previous study, the authors analyzed the dynamics of the SARS-CoV2 spike protein through lengthy MD simulations and an out-of-equilibrium sampling scheme. They identified an allosteric interaction network linking a lipid-binding site to other structurally important regions of the spike. However, this study was conducted without considering the impact of glycans. It is now known that glycans play a crucial role in modulating spike dynamics. This new manuscript investigates how the presence of glycans affects the allosteric network connecting the lipid binding site to the rest of the spike. The authors conducted atomistic equilibrium and out-of-equilibrium MD simulations and found that while the presence of glycans influences the structural responses, it does not fundamentally alter the connectivity between the fatty acid site and the rest of the spike.

      Strengths:

      The manuscript's findings are based on an impressive amount of sampling. The methods and results are clearly outlined, and the analysis is conducted meticulously.

      Weaknesses:

      The study does not clearly show any new findings. The authors themselves acknowledge that the manuscript mainly presents negative results, indicating that glycans do not significantly impact the allosteric network previously reported in other publications. All the results in the paper are based on a single methodology, and additional independent approaches would be needed to confirm the robustness of these findings. Allosteric networks arise from subtle correlations in protein structural dynamics, and it is uncertain whether the results discussed in this manuscript stem exclusively from the chosen force field and other modeling and analysis decisions, or whether they reflect something real.

    3. eLife assessment

      This manuscript focuses on understanding if and how the glycosylation of SARS-CoV2 spike protein affects a putative allosteric network of interactions controlled by the binding of a fatty acid. The main conclusion is that glycans do not significantly affect the network of allosteric interactions. This useful information, albeit mainly consisting of negative results, is based on solid evidence. It will be of interest to scientists focusing on SARS-CoV2 protein structure and dynamics.

    4. Reviewer #2 (Public Review):

      This is a nice paper illustrating the use of equilibrium/non-equilibrium MD simulations to explore allosteric communication in the Spike protein. The results are described in detail and suggest a complex network of signal transmission patterns. The topic is not completely novel, as it has been studied before by the same authors, and the impact of glycosylation is moderate and localized at the furin site, so not many new conclusions emerge here. It is suggested that mutations are commonly found in the communication pathway, which is interesting, but the authors fail to provide evidence that this is related to positive selection and not simply to a random effect related to mutations at points that are not crucial for stability or function. One interesting point is the connection of the FA site with an additional site binding a heme group. It will be interesting to see reversibility, i.e., does removal of the ligand at this site produce a perturbation at the FA site? Does it produce other effects suggesting a cascade of allosteric effects? Finally, the paper lacks details to help reproducibility; in particular, I do not see details on the D-NEMD calculations.

    1. eLife assessment

      This is an important study, as PIM1/2 control of protein synthesis in differentiated cells has implications beyond T cells. The evidence is convincing in that it makes extensive use of the mouse knockout model and validation in mouse T cells with inhibitors. A rescue experiment in mouse KO T cells would be even stronger than the inhibitor studies to validate the KO phenotype and the evidence would be truly impressive if the results from the rescue experiment support the working model. Extending the observations to human T cells would also be a step towards translation and would further increase the potential impact of the work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editors’ recommendations for the authors

      The reviewers recommend the following: 

      (a) Digging deeper into the discussion of the density-dependent dispersal. 

      (b) Clarifying the microfluidic setup.  

      (c) Clarifying the description and interpretation of the transcriptomic evidence. 

      (d) Toning down carbon cycle connections (some reviewers felt the evidence did not fully support the claims). 

      We would like to thank the editors for their thoughtful evaluation of our manuscript and their clear suggestions. We have revised the manuscript in the light of these comments, as we outline below and address in detail in the point-by-point response to the reviewers’ comments that follows. 

      (a) We have expanded the discussion of density-dependent dispersal and revised Figure 2C to improve clarity. 

      (b) We have also added further information concerning the microfluidic setup in the results section and provide an illustration of the setup in a new figure panel, Figure 1A.

      (c) Addressing the reviewers’ comments on the transcriptomic analysis, we have added more information in the description and interpretation of the results. 

      (d) We have rephrased the text describing the role of degradation-dispersal cycles for carbon cycling to highlight it as the motivation of this study and emphasize the link to literature on foraging, without creating expectations of direct measurements of global carbon cycling.

      Public Reviews:

      Reviewer #1 (Public Review):

      [...]

      Weaknesses: 

      Much of the genetic analysis, as it stands, is quite speculative and descriptive. I found myself confused about many of the genes (e.g., quorum sensing) that pop up enriched during dispersal quite in contrast to my expectations. While the authors do mention some of this in the text as worth following up on, I think the analysis as it stands adds little insight into the behaviors studied. However, I acknowledge that it might have the potential to generate hypotheses and thus aid future studies. Further, I found the connections to the carbon cycle and marine environments in the abstract weak --- the microfluidics setup by the authors is nice, but it provides limited insight into naturalistic environments where the spatial distribution and dimensionality of resources are expected to be qualitatively different. 

      We thank the reviewer for their suggestions to improve our manuscript. We agree that the original manuscript would have benefitted from more detailed interpretation of the observed changes in gene expression. We have revised the manuscript to elaborate on the interpretation of the changes in expression of quorum sensing genes (see response to reviewer 1, comment 3), motility genes (see response to reviewer 1, comment 6), alginate lyase genes (see response to reviewer 1, comment 7 and reviewer 2, comment 2), and ribosomal and transporter genes (see response to reviewer 2, comment 2).

      In general, we think that the gene expression study not only supports the phenotypic observations that we made in the microfluidic device, such as the increased swimming motility when exposed to digested alginate medium, but also adds further insights. Our reasoning for studying the transcriptomes in well-mixed batch cultures was the inability to study gene expression dynamics to support the phenotypic observations about differential motility and chemotaxis in our microfluidics setup. The transcriptomic data clearly show that even in well-mixed environments, growth on digested alginate instead of alginate is sufficient to increase the expression of motility and chemotaxis genes. In addition, the finding that expression of alginate lyases and metabolic genes is increased during growth on digested alginate was revealed through the analysis of transcriptomes, something which would not have been possible in the microfluidic setup. We agree with the reviewer that our analyses implicate further, perhaps unexpected, mechanisms like quorum sensing in the cellular response to breakdown products, and that this represents an interesting avenue for further studies.

      Finally, we also agree with the reviewer that it would be good to be more explicit in the text that our microfluidic system cannot fully capture the complex dynamics of natural environments. Our approach does, however, allow the characterization of cellular behaviors at spatial and temporal scales that are relevant to the interactions of bacteria, and thus provides a better understanding of colonization and dispersal of marine bacteria in a manner that is not possible through in situ experiments. We have edited our manuscript to highlight this and modified our statements regarding carbon cycling towards emphasizing the role of degradation-dispersal cycles in remineralization of polysaccharides (see response to reviewer 1, comment 2).

      Reviewer #2 (Public Review):

      [...]

      Weaknesses: 

      The explanation of the microfluidics measurements is somewhat confusing but I think this could be easily remedied. The quantitative interpretation of the dispersal data could also be improved and I'm not clear if the data support the claim made. 

      We thank the reviewer for their comments and helpful suggestions. We have revised the manuscript with these suggestions in mind and believe that the manuscript is improved by a more detailed explanation of the microfluidic setup. We have added more information in the text (detailed in response to reviewer 2, comments 1 and 2) and have added a depiction of the microfluidic setup (Fig. 1A). We have also modified the presentation and discussion of the dispersal data (Fig. 2C), as described in detail below in response to reviewer 2, comment 4, and argue that they clearly show density-dependent dispersal. We believe that this modification of how the results are presented provides a more convincing case for our main conclusion, namely that the presence of degradation products controls bacterial dispersal in a density-dependent manner.  

      Reviewer #3 (Public Review):

      [...]

      Weaknesses: 

      I find this paper very descriptive and speculative. The results of the genetic analyses are quite counterintuitive; therefore, I understand the difficulty of connecting them to the observations coming from experiments in the microfluidic device. However, they could be better placed in the literature of foraging - dispersal cycles, beyond bacteria. In addition, the interpretation of the results is sometimes confusing. 

      We thank the reviewer for their suggestions to improve the manuscript. We have edited the manuscript to interpret the results of this study more clearly, in particular with regard to the fact that breakdown products of alginate cause cell dispersal (see response to reviewer 2, comment 1), gene expression changes of ribosomal proteins and transporters (see response to reviewer 2, comment 2), as well as genes relating to alginate catabolism (see response to reviewer 2, comment 3).

      To provide more context for the interpretation of our results we now also embed our findings in more detail in the previous work on foraging strategies and dispersal tradeoffs.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should clarify in more detail what they mean by density dependence in Figure 2. Usually density dependence refers to a per capita dependence, but here it seems that the per capita rate of dispersal might be roughly independent of density (Figure 2c; if you double the number of cells it doubles the number of cells leaving). Rather it seems the dispersal is such that the density of remaining cells falls below a threshold (~300 cells). 

      We thank the reviewer for raising this important point. To analyze the data more explicitly in terms of per capita dependence and so make the density dependence in the dispersal from the microfluidic chambers more clear, we have modified Figure 2C and edited the text. 

      In the modified Figure 2C, we computed the fraction of dispersed cells for each chamber (i.e the change in cell number divided by the cell number at the time of the nutrient switch). This quantity directly reveals the per-capita dependence, as mentioned by reviewer 1, and is now represented on the y-axis of Figure 2C instead of the absolute change in cell number. 

      These data demonstrate that the fraction of dispersed cells increases with increasing numbers of cells present in the chamber at the time of switching, with more highly populated chambers showing a higher fraction of dispersed cells. These findings indicate that there is a strong density dependence in the dispersal process.

      As pointed out by reviewer 1, another interesting aspect of the data is the transition at low cell number. The fraction of dispersed cells is negative in the case of the chamber with approximately 70 cells, consistent with no dispersal at this low density, and a moderate density increase as a function of continued growth.  

      In addition to the new analysis presented in Figure 2C, we have modified the paragraph that discusses this result as follows (line 208):

      “We indeed found that the nutrient switch caused few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”
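      The per-capita quantity described in this response (the change in cell number divided by the cell number at the time of the nutrient switch) can be illustrated with a short sketch. The chamber counts below are invented for illustration and are not the data behind Figure 2C; the function name `fraction_dispersed` is likewise hypothetical.

```python
def fraction_dispersed(cells_at_switch, cells_after):
    """Fraction of cells lost from a chamber after the nutrient switch.

    Positive values mean net dispersal; negative values mean the chamber
    gained cells (continued growth outweighed any dispersal).
    """
    if cells_at_switch <= 0:
        raise ValueError("cells_at_switch must be positive")
    return (cells_at_switch - cells_after) / cells_at_switch


# Hypothetical (cells at switch, cells some time after the switch) per chamber.
chambers = [(70, 77), (300, 240), (800, 400)]

for at_switch, after in chambers:
    print(at_switch, round(fraction_dispersed(at_switch, after), 2))
```

      Plotting this fraction against the cell number at the switch, rather than the absolute change in cell number, is what makes a density dependence visible: a constant fraction across chambers would indicate density-independent dispersal.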

      (2) The authors should tone down their claims about the carbon cycle in the abstract. I do not believe the results as they stand could be used to understand degradation-dispersal cycles in marine environments relevant to the carbon cycle, since these behaviors have been studied in microfluidic environments which in my understanding are quite different. As such, statements such as "degradation-dispersal cycles are an integral part in the global carbon cycle, we know little about how cells alternate between degradation and motility" and "Overall, our findings reveal the cellular mechanisms underlying bacterial degradation-dispersal cycles that drive remineralization in natural environments" are overstated in the abstract. 

      We appreciate the reviewer’s comments regarding the connections of our work with the carbon cycle. We have now rephrased these statements in our manuscript to describe a potential connection between our work and the marine carbon cycle. The colonization of polysaccharide particles by bacteria and subsequent degradation has been widely acknowledged to play a significant role in controlling the carbon flow in marine ecosystems (Fenchel, 2002; Preheim et al., 2011; Yawata et al., 2014, 2020). We still refer to carbon flow in the revised manuscript, though cautiously, as microbial remineralization of biomass, which is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000; Jiao et al., 2024). As stated in the previous version of the manuscript, the main motivation of our work was to study the growth behaviors of marine heterotrophic bacteria during polysaccharide degradation, especially to understand when bacteria depart already colonized and degraded particles and find novel patches to grow and degrade, a process that is poorly understood. Therefore, it is conceivable that degradation-dispersal cycles do play a role in the flow of carbon in marine ecosystems. However, we acknowledge that the carbon cycle is influenced by a multitude of biological and chemical processes, and the bacterial degradation-dispersal cycle might not be the sole mechanism at play.

      We also appreciate the reviewer’s comments highlighting that the complexity of natural environments is not fully captured in our microfluidics system. However, our microfluidics setup does allow us to quantify responses and behaviors of microbial groups at high spatial and temporal resolution, especially in the context of environmental fluctuations. Microbes in nature interact at small spatial scales and have to respond to changes in the environment, and the microfluidics setup enables the quantification of these responses. Moreover, dispersal of V. cyclitrophicus, the bacterium that we use in our study, has been previously observed even during growth on particulate alginate (Alcolombri et al., 2021), but the cues and regulation controlling dispersal behaviors have been unclear. Microfluidic experiments have now allowed us to study this process in a highly quantitative manner, and the results align well with observations from experiments in more nature-like settings. These quantitative experiments on bacterial strains isolated from marine particles are expected to constrain quantitative models of carbon degradation in the ocean (Nguyen et al., 2022).

      We have now adjusted our statements throughout our manuscript to reflect the knowledge gaps in understanding the triggers of degradation-dispersal cycles and their links with carbon flow in marine ecosystems. The revised manuscript, especially, contains the following statements (line 47 and line 60):

      “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      (3) The authors should clarify why they think quorum-sensing genes are increased in expression on digested alginate. The authors currently mention that QS could be used to trigger dispersal, but given the timescales of dispersal in Figure 2 (~half an hour), I find it hard to believe that these genes are expressed and have the suggested effect on those timescales. As such I would have expected the other way round - for QS genes to be expressed highly during alginate growth, so that density could be sensed and responded to. Please clarify. 

      We have now clarified this point in the revised manuscript. While the triggering of dispersal by quorum-sensing genes may indeed appear counterintuitive, and the response is rapid (we see dispersal of cells within 30-40 minutes), both observations are in line with previous studies in another model organism, Vibrio cholerae. The dispersal time is similar to the dispersal time of V. cholerae cells from biofilms, as described by Singh and colleagues (Figure 1E of Singh et al., 2017). In that case, induction of the quorum sensing dispersal regulator HapR was observed during biofilm dispersal within one hour after the switch of conditions (Fig. 2, middle panel of Singh et al., 2017). Even though the specific quorum sensing signaling molecules are probably different in our strain (there is no annotated homolog of the hapR gene in V. cyclitrophicus), we observed that the full set of quorum sensing genes was enriched in cells growing on digested alginate (as reported in line 314 and Fig. 4A).

      We have added this information in the manuscript (line 317): 

      “The set of quorum sensing genes was also positively enriched in cells growing on digested alginate (Fig. 4A and S4F, Table S13). This role in dispersal is in agreement with a previous study that showed induction of the quorum sensing master regulator in V. cholerae cells during dispersal from biofilms on a similar time scale as here (less than an hour) [28].”

      Reviewer #2 (Recommendations For The Authors):

      (1) Around line 144 - I don't really understand how you flow alginate through the microfluidic platform. It seems if the particles are transiently going through the microfluidic chamber then the flow rate and hence residence time of the alginate particles will matter a lot by controlling the time the cells have to colonize and excrete enzymes for alginate breakdown. Or perhaps the alginate is not particulate but is instead a large but soluble polymer? I think maybe a schematic of the microfluidic device would help -- there is an implicit assumption that we are familiar with the Dal Co et al device, but I don't recall its details and maybe a graphic added to Figure 1 would help. 

      a. In reviewing the Dal Co paper I see that cells are trapped and the medium flows through channels and the plane where the cells are held. I am still a little confused about the size of the polymeric alginate -- large scale (>1um) particles or very small polymers? 

      We have now provided a detailed description of our microfluidic experimental system. At the start of the experiments, cells are in fact not trapped within the microfluidic device, but grow and can move freely within a chamber designed with dimensions (sub-micron heights) so that growth occurs only as a monolayer. Cells were exposed to nutrients, either alginate or alginate digestion products, both in soluble form (not particles). These compounds were flowed into the device through a main channel, but entered the flow-free growth chambers by diffusion. To make these aspects of our experiments clearer, we have added further information on this in the Materials & Methods section (line 556), in the abstract (line 51), and in the results (line 123).

      To make our microfluidic setup clearer, we have followed this advice and added a schematic as Figure 1A and have added more information on the setup to the main text (line 153):

      “In brief, the microfluidic chips are made of an inert polymer (polydimethylsiloxane) bound to a glass coverslip. The PDMS layer contains flow channels through which the culture medium is pumped continuously. Each channel is connected to several growth chambers that are laterally positioned. The dimensions of these growth chambers (height: 0.85 µm, length: 60 µm, width: 90-120 µm) allow cells to freely move and grow as monolayers. The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion [15,16,4,17,8]. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement [4]. This setup combined with time-lapse microscopy allowed us to follow the development of cell communities over time.”

      (2) What makes this confusing is the difference between Figure 1C and Figure S2A -- the authors state that the difference in Figure 1C is due to dispersal, but is there flow through the microfluidic device? So what role does that flow through the device have in dispersal? Is the adhesion of the cell groups driven at all by a physical interaction with high molecular weight polymers in the microfluidic devices or is this purely a biological effect? Could this also be explained by different real concentrations of nutrients in the two cases? 

      We realize from this comment that the role of flow of the medium in the microfluidic setup was not clearly addressed in our manuscript. In fact, cells were not exposed to flow, and nutrients were provided to the growth chambers by diffusion. We have added a clearer explanation of this point on line 158:

      “The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion [15,16,4,17,8]. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement [4].”

      One purely physical effect that we anticipate is that a high viscosity of the medium could immobilize cells. To address this point, we measured the viscosity of both alginate and digested alginate and conclude that the increase in viscosity is not strong enough to immobilize cells. We added a statement in the text (line 170)

      “To test the role of increased viscosity of polymeric alginate in causing the increased aggregation of cells, we measured the viscosity of 0.1% (w/v) alginate or digested alginate dissolved in TR media. For alginate, the viscosity was 1.03±0.01 mPa·s (mean and standard deviation of three technical replicates) whereas the viscosity of digested alginate in TR media was found to be 0.74±0.01 mPa·s. Both these values are relatively close to the viscosity of water at this temperature (0.89 mPa·s [18]) and, while they may affect swimming behavior [19], they are insufficient to physically restrain cell movement [20].”

      as well as a section in the Materials and Methods (line 594):

      “Viscosity of the alginate and digested alginate solution

      We measured the viscosity of alginate solutions using shear rheology measurements. We used a 40 mm cone-plate geometry (4° cone) in a Netzsch Kinexus Pro+ rheometer. 1200 µL of sample was placed on the bottom plate, the gap was set at 150 µm and the sample trimmed. We used a solvent trap to avoid sample evaporation during measurement. The temperature was set to 25°C using a Peltier element. We measured the dynamic viscosity over a range of shear rates of 0.1 – 100 s⁻¹. We report the viscosity of each solution as the average viscosity measured over the shear rates 10 – 100 s⁻¹, where the shear-dependence of the viscosity was low.

      We measured the viscosity of 0.1% (w/v) alginate dissolved in TR media, which was 1.03±0.01 mPa·s (reporting the mean and standard deviation of three technical replicates). The viscosity of 0.1% digested alginate in TR media was found to be 0.74±0.01 mPa·s. This means that the viscosity of alginate in our microfluidic experiments is 36% higher than of digested alginate, but the viscosities are close to those expected of water (0.89 mPa·s at 25°C according to Berstad and colleagues [18]).”
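      The averaging step described in this Methods passage (reporting viscosity as the mean over the shear-rate window where shear dependence is low) might look like the sketch below. The shear-rate sweep and viscosity values are invented for illustration and are not the authors' measurements; `mean_viscosity` is a hypothetical helper name.

```python
def mean_viscosity(shear_rates, viscosities, lo=10.0, hi=100.0):
    """Average the viscosity readings whose shear rate lies in [lo, hi] (s^-1)."""
    if len(shear_rates) != len(viscosities):
        raise ValueError("shear_rates and viscosities must have the same length")
    selected = [v for rate, v in zip(shear_rates, viscosities) if lo <= rate <= hi]
    if not selected:
        raise ValueError("no measurements fall inside the shear-rate window")
    return sum(selected) / len(selected)


# Hypothetical sweep: shear rates in s^-1 and dynamic viscosities in mPa·s.
shear_rates = [0.1, 1.0, 10.0, 31.6, 100.0]
viscosities = [1.40, 1.10, 1.04, 1.03, 1.02]

print(round(mean_viscosity(shear_rates, viscosities), 3))
```

      Restricting the average to the upper part of the sweep discards the low-shear readings, which in this made-up example deviate most from the plateau.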

      While our microfluidic setup allows us to track the position and movement of cells in a spatially structured setting, these observations do not allow us to distinguish directly whether the differences in dispersal are a result of purely physical effects of polymers on cells or are a result of them triggering a biological response in cells that causes them to become sessile. It is known that bacterial appendages like pili interact with polysaccharide residues (Li et al., 2003). Therefore, it is quite plausible that cross-linking by polysaccharides can contribute to growth behaviors on alginate. However, our analysis of gene expression demonstrates that flagellum-driven motility is decreased in the presence of alginate compared to digested alginate, alongside other major changes in gene expression. In addition, our measures of dispersal show that dispersal of cells when exposed to digested alginate is density dependent. Both observations suggest that the patterns in dispersal are governed by decision-making processes by cells resulting in changes in cell motility, rather than being a product of purely physical interactions with the polymer.

      The finding that the viscosities of both alginate and digested alginate are similar to that of water suggests that diffusion of nutrients in the growth chambers should be similar. Therefore, we think that differences in real concentrations of nutrients are likely not contributing to the observed differences in behavior.

      (3) Why is Figure S1 arbitrary units? Does this have to do with the calibration of LC-MS? It would be better, it seems, to know the concentrations in real units of the monomer at least. 

      We agree with the reviewer that it would have been better to have absolute concentrations for these compounds. However, to calibrate the mass spectrometer signals (ion counts) to absolute concentrations for the different alginate compounds, we would need an analytical standard of known concentration. We are not aware of such a standard and thus report only relative concentrations. We agree that the y-axis label of Figure S1 should not contain ‘arbitrary’ units, as it shows a ratio (of measurements in the same arbitrary units). We have edited the labels of Figure S1 accordingly and the figure legend in line 26 of the Supplemental Material (“Relative concentrations…”).

      (4) Line 188 - density-dependent dispersal. The claim here is that "cells in chambers with many cells were more likely to disperse than cells in chambers with less cells." (my emphasis). Looking at the data in Figure 2C it appears that about 40% of the cells disperse irrespective of the density, before the switch to digested alginate. So it would seem that there is not a higher likelihood of dispersal at higher cell densities. For the very highest cell density, it does appear that this fraction is larger, but I'd be concerned about making this claim from what I understand to be a single experiment. To support the claim made should the authors plot Change in Cell number/Starting Cell number on the y-axis of Fig. 2C to show that the fraction is increasing? It would seem some additional data at higher starting cell densities would help support this claim more strongly. 

      We thank the reviewer for this comment, which is in line with a remark made by reviewer 1 in their comment 1. In response to these two comments (and as described above), we have edited Figure 2C and now plot the change in cell number relative to the starting cell number on the y-axis to directly show the density dependence. We observe a positive (approximately linear) relationship between the fraction of dispersed cells and the number of cells present in the chamber at the time of switching. This indicates that there is a density dependence in the dispersal process, with highly populated chambers showing a higher fraction of dispersed cells.

      In addition to the change in Figure 2C, we have modified the paragraph around line 208: “We indeed found that the nutrient switch caused few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”

      The highest cell number at the start of the switch that we include is about 800 cells. The maximum number of cells that can fit into a chamber is ca. 1000. Thus, 800 resident cells are close to the maximal density.

      (5) A comment -- I find the result of significant chemotaxis towards alginate but not the monomers of alginate to be quite surprising. The ecological relevance of this (line 219) seems like an important result that is worth expanding on a bit at least in the discussion. For now, my question is whether the authors know of any mechanism by which chemotaxis receptors could respond to alginate but not the monomer. How can a receptor distinguish between the two? 

      We agree that this result is surprising, given that oligomers can be more easily transported into the periplasm where sensing takes place, and they also provide an easier accessible nutrient source. Indeed, in case of the insoluble polymer chitin it has been shown that chemotaxis towards chitin is mediated by chitin oligomers (Bassler et al., 1991), which was suggested as a general motif to locate polysaccharide nutrient sources (Keegstra et al., 2022). However, a recent study has changed this perspective by showing widespread chemotaxis of marine bacteria towards the glucose-based marine polysaccharide laminarin, but not towards laminarin oligomers or glucose (Clerc et al., 2023). Together with our results on chemotaxis towards alginate (but not significantly toward alginate oligomers) this suggests that chemotaxis towards soluble polysaccharides can be mediated by direct sensing of the polysaccharide molecules.

      As recommended, we expanded the discussion of the ecological relevance and also added more information on possible mechanisms of selective sensing of alginate and its breakdown products (around line 479):

      “Direct chemotaxis towards polysaccharides may facilitate the search for new polysaccharide sources after dispersal. We found that the presence of degradation products not only induces cell dispersal but also increases the expression of chemotaxis genes. Interestingly, we found that V. cyclitrophicus ZF270 cells show chemotaxis towards polymeric alginate but not digested alginate. This contrasts with previous findings for bacterial strains degrading the insoluble marine polysaccharide chitin, where chemotaxis was strongest towards chitin oligomers [53], suggesting that oligomers may act as an environmental cue for polysaccharide nutrient sources [55]. However, recent work has shown that certain marine bacteria are attracted to the marine polysaccharide laminarin, and not laminarin oligomers [56]. Together with our results, this indicates that chemotaxis towards soluble polysaccharides may be mediated by the polysaccharide molecules themselves. The mechanism of this behavior is yet to be identified, but could be mediated by polysaccharide-binding proteins as have been found in Sphingomonas sp. A1 facilitating chemotaxis towards pectin [57]. Direct polysaccharide sensing adds complexity to chemosensing as polysaccharides cannot freely diffuse into the periplasm, which can lead to a trade-off between chemosensing and uptake [58]. Furthermore, most polysaccharides are not immediately metabolically accessible as they require degradation. But direct polysaccharide sensing can also provide certain benefits compared to using oligomers as sensory cues. First, it could enable bacterial strains to preferably navigate to polysaccharide nutrient sources that are relatively uncolonized and hence show little degradation activity. Second, strong chemotaxis towards degradation products could hinder a timely dispersal process as the dispersal then requires cells to travel against a strong attractant gradient formed by the degradation products.
Overall, this strategy allows cells to alternate between degradation and dispersal to acquire carbon and energy in a heterogeneous world with nutrient hotspots [44,59–61].”

      (6) Comment on lines 287-8 -- that the "positive enrichment of the gene set containing bacterial motility proteins matched the increase in motile cells that we observe in Fig 3E." I'm confused about what is meant by the word "matched" here. Is the implication that there is some quantitative correspondence between increased motility in Figure 3 and the change in expression in Figure 4? Or is the statement a qualitative one -- that motility genes are upregulated in the presence of digested alginate? Table S12 didn't help me answer this question. 

      We thank the reviewer for their helpful comment. Our original statement was a qualitative one - observing that gene expression enrichment in genes associated with bacterial motility aligned with our expectations based on the previous observation of an increase in motile cells. We have now changed the wording to highlight the qualitative nature of this statement (line 315):

      “The positive enrichment of the gene set containing bacterial motility proteins aligned with our expectations based on the increase in motile cells that we observed in Figure 3E (Fig. 4A, Table S12).”

      (7) Line 326 - what is the explanation for the production of public enzymes in the presence of digest? How does this square with the previous narrative about cells growing on alginate digest expressing motility genes and chemotaxing towards alginate? It seems like the story is a bit tenuous here in the sense that digested alginates stimulate both motility - which is hypothesized to drive the discovery of new alginate particles - and lyase enzymes which are used to degrade alginate. So do the high motility cells that are chemotaxing towards alginate also express lyases en route? I'm of the opinion that constructing narratives like these in the absence of a more quantitative understanding of the colonization and degradation dynamics of alginate particles presents a major challenge and may be asking more of the data than the data can provide. 

      a. I noted later that this is addressed later around lines 393 in the Discussion section.

      Indeed, the notion that the presence of breakdown products triggers motility and also increases the expression of alginate lyases and other metabolic genes for alginate catabolism seems counterintuitive. We have now expanded our discussion of these results to contextualize these findings (around line 443):

      "One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell [50]. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products may likely result in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes may likely decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments.
Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients."

      (8) I like Figure 6, and I think this hypothesis is a good result from this paper, but I think it would be important to emphasize this as a proposal that needs further quantitative analysis to be supported. 

      We have now edited the manuscript to make this point more clear. While both degradation and dispersal are well-appreciated parts of microbial ecology, the transitions and underlying mechanisms are unclear. We have edited the discussion to improve the clarity (line 419): 

      “This cycle of biomass degradation and dispersal has long been discussed in the context of foraging e.g., [44,45,13,46,47], but the cellular mechanisms that drive the cell dispersal remain unclear.”

      Also, we have updated Figure 6 to indicate more clearly which new findings this work proposes (now bold font) and which previous findings, made in different bacterial taxa and with different carbon sources, align with our work (now light font). We edited the figure legend accordingly (line 503):

      "By integrating our results with previous studies on cooperative growth on the same system, as well as results on dispersal cycles in other systems, we highlight where the specific results of this work add to this framework (bold font)."

      Minor comments 

      (1) Is there any growth on the enzyme used for alginate digestion? E.g. is the enzyme used to digest the alginate at sufficiently high concentrations that cells could utilize it for a carbon/nitrogen source? 

      We thank the reviewer for raising this point. We added the following paragraph as Supplemental Text to address it (line 179):

      “Protein amount of the alginate lyases added to create digested alginate

      Based on the following calculation, we conclude that the amount of protein added to the growth medium by the addition of alginate lyases is so small that we consider it negligible. In our experiment we used 1 unit/ml of alginate lyases in a 4.5 ml solution to digest the alginate. As the commercially purchased alginate lyases are 10,000 units/g, our 4.5 ml solution contains 0.45 mg of alginate lyase protein. The digested alginate solution was diluted 45x when added to culture medium. This means that we added 0.18 µg alginate lyase protein to 1 ml of culture medium.

      As a comparison, for 1 ml of alginate medium, 1,000 µg of alginate is added, and for 1 ml of Lysogeny broth (LB) culture medium, 3,500 µg of LB is added. Thus, the amount of alginate lyase protein that we added is ca. 5,000 to 20,000 times smaller than the amount of alginate or LB that one would add to support cell growth. Therefore, we expect the growth that the digestion of the added alginate lyases would allow to be negligible.”
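
      The comparison above can be retraced with a few lines of Python. This is only a minimal sketch of the arithmetic, not part of the analysis itself: the constant and function names are hypothetical, all numbers are the ones quoted in the supplemental text, and the per-millilitre enzyme amount (0.18 µg) is taken as stated rather than re-derived.

```python
# Values quoted in the supplemental text (names are illustrative only).
LYASE_UNITS_PER_ML = 1.0          # enzyme activity used for the digest
DIGEST_VOLUME_ML = 4.5            # volume of the digest solution
LYASE_UNITS_PER_G = 10_000.0      # specific activity of the purchased enzyme
LYASE_UG_PER_ML_MEDIUM = 0.18     # enzyme per ml of culture medium, as stated
ALGINATE_UG_PER_ML = 1_000.0      # alginate per ml of alginate medium
LB_UG_PER_ML = 3_500.0            # LB powder per ml of LB medium


def enzyme_mass_mg(units_per_ml: float, volume_ml: float, units_per_g: float) -> float:
    """Mass of enzyme (mg) corresponding to the total activity in the digest."""
    total_units = units_per_ml * volume_ml
    return total_units / units_per_g * 1_000.0  # g -> mg


def fold_excess(nutrient_ug_per_ml: float, enzyme_ug_per_ml: float) -> float:
    """How many times more nutrient than enzyme protein is present per ml."""
    return nutrient_ug_per_ml / enzyme_ug_per_ml


digest_enzyme_mg = enzyme_mass_mg(
    LYASE_UNITS_PER_ML, DIGEST_VOLUME_ML, LYASE_UNITS_PER_G
)
alginate_fold = fold_excess(ALGINATE_UG_PER_ML, LYASE_UG_PER_ML_MEDIUM)
lb_fold = fold_excess(LB_UG_PER_ML, LYASE_UG_PER_ML_MEDIUM)

if __name__ == "__main__":
    print(f"Enzyme in digest: {digest_enzyme_mg:.2f} mg")
    print(f"Alginate vs. enzyme: {alginate_fold:.0f}-fold")
    print(f"LB vs. enzyme: {lb_fold:.0f}-fold")
```

      The two fold-differences fall within the "ca. 5,000 to 20,000 times" range given in the text.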

      (2) The lines in Figure 2B are very hard to see. 

      We have addressed this comment by using thicker lines in Figure 2B.

      (3) The black background and images in Figure 3A and B are hard to see as well. 

      We have now replaced Figure 3A and B, now using a white background.

      (4) Typo at the beginning of line 251? 

      Unfortunately we failed to find the typo referred to. We are happy to address it if it still exists in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) I think there is not enough experimental evidence to conclude that the underlying cause of increased motility is the accumulation of digested alginate products. To conclusively show that this is the cause and not just some signal linked to cell density, perhaps the experiment should be repeated with a different carbon source. 

      We thank the reviewer for their comment, which made us realize that we did not make the nature of the dispersal cue clear. The gene expression data were obtained from batch cultures and measured at approximately the same bacterial densities, which indeed shows that digested alginate is a sufficient signal for an increase in motility gene expression. This agrees very well with our observation that cells growing on digested alginate in microfluidic chambers have an increased fraction of motile cells in comparison with cells exposed to alginate (Fig 3E). However, we did not mean to suggest that the observed dispersal by bacterial motility is not influenced by cell density. In fact, we see that dispersal (and hence the increase in cell motility) in microfluidic chambers that are switched from polymeric to digested alginate depends on the bacterial density in the chamber, with higher bacterial densities showing increased dispersal. This shows that the presence of alginate oligomers does trigger dispersal through motility, but this signal affects bacterial groups in a cell-density-dependent manner.

      Similar observations have been made in Caulobacter crescentus, which was found to form cell groups on the polymer xylan while cells disperse when the corresponding monomer xylose becomes available (D’Souza et al., 2021). We reference the additional work in lines 179 and 230. Taken together, these observations indicate a more general phenomenon in dispersal from polysaccharide substrates.

      (2) About the expression data: 

      • Ribosomal proteins and ABC transporters are enriched in cells grown on digested alginate and the authors discuss that this explains the difference in max growth rate between alginate and digested alginate. However, in Figure S2E the authors report no statistical difference between growth rates. 

      We have now edited the manuscript to clarify this point. We found that cells grown on degradation products reached their maximal growth rate around 7.5 hours earlier (Fig. S2D) and showed increased expression of ribosomal biosynthesis and ABC transporters in late-exponential phase (Fig. 4A). We consider this shorter lag time as a sign of a different growth state and therefore a possible reason for the difference in ribosomal protein expression.

      As the reviewer correctly points out, the maximum growth rates that were computed from the two growth curves were not significantly different (Fig. S2E). However, for our gene expression analysis, we harvested the transcriptome of cells that reached OD 0.39-0.41 (mid- to late-exponential phase). At this time point, the cell cultures may have differed in their momentary growth rate.

      We edited the manuscript to make this clearer (line 287):

      “Both observations likely relate to the different growth dynamics of V. cyclitrophicus ZF270 on digested alginate compared to alginate (Fig. S2A), where cells in digested alginate medium reached their maximal growth rate 7.5 hours earlier and thus showed a shorter lag time (Fig. S2D). As a consequence, the growth rate at the time of RNA extraction (mid-to-late exponential phase) may have differed, even though the maximum growth rate of cells grown in alginate medium and digested alginate medium were not found to be significantly different (Fig. S2E).”

      • The increased expression of transporters for lyases in cells grown on digested alginate (lines 273-274 and 325-328) is very confusing and the explanation provided in lines 412-420 is not very convincing. My two cents on this: Expression of more enzymes and induction of motility might be a strategy to be prepared for more likely future environments (after dispersal, alginate is the most likely carbon source they will find). This would be in line with observed increased chemotaxis towards the polymer rather than the monomer (Similar to C. elegans). 

      This comment is in line with reviewer 2, comment 7. In response to these two comments (and as described above), we expanded our discussion of these results to contextualize these findings (around line 443):

      “One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell [50]. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products may likely result in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes may likely decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. 
Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      Additionally, we agree with the intriguing comment that continued expression of alginate lyases may also prepare cells for likely future environments. Further studies that aim to answer whether marine bacteria are primed by their growth on one carbon source towards faster re-initiation of degradation on a new particle will be an interesting research question. We now address this point in our manuscript (line 458):

      “However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      (3) The yield reached by Vibrio on alginate is significantly higher than the yield in digested alginate, not similar, as stated in lines 133-134. Only cell counts are similar. Perhaps the author can correct this statement and speculate on the reason leading to this discrepancy: perhaps cells tend to aggregate in alginate despite the fact that these are well-mixed cultures. 

      We have edited the description of the OD measurements accordingly and agree with the reviewer that aggregation is indeed a possible reason for the discrepancy (line 141):

      “We also observed that the optical density at stationary phase was higher when cells were grown on alginate (Fig. S2B and C). However, colony counts did not show a significant difference in cell numbers (Fig. S3), suggesting that the increased optical density may stem from aggregation of cells in the alginate medium, as observed for other Vibrio species [7].”

      (4) I suggest toning down the importance of the results presented in this study for understanding global carbon cycling. There is a link but at present it is too much emphasized. 

      We have edited our statements regarding the carbon cycle. In the revised manuscript we stress the lack of direct quantifications of carbon cycling. We still refer to carbon flow in the revised manuscript, as we would argue that microbial remineralization of biomass is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000) and research on marine bacterial foraging investigates how bacterial cells manage to find and utilize this biomass.

      Our revised manuscript contains the following modified statements (line 47 and line 60): “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      References

      • Alcolombri, U., Peaudecerf, F. J., Fernandez, V. I., Behrendt, L., Lee, K. S., & Stocker, R. (2021). Sinking enhances the degradation of organic particles by marine bacteria. Nature Geoscience, 14(10), 775–780. https://doi.org/10.1038/s41561-021-00817-x
      • Bassler, B. L., Gibbons, P. J., Yu, C., & Roseman, S. (1991). Chitin utilization by marine bacteria. Chemotaxis to chitin oligosaccharides by Vibrio furnissii. Journal of Biological Chemistry, 266(36), 24268–24275. https://doi.org/10.1016/S0021-9258(18)54224-1
      • Chisholm, S. W. (2000). Stirring times in the Southern Ocean. Nature, 407(6805), 685–686. https://doi.org/10.1038/35037696
      • Chubukov, V., Gerosa, L., Kochanowski, K., & Sauer, U. (2014). Coordination of microbial metabolism. Nature Reviews. Microbiology, 12(5), 327–340. https://doi.org/10.1038/nrmicro3238
      • Clerc, E. E., Raina, J.-B., Keegstra, J. M., Landry, Z., Pontrelli, S., Alcolombri, U., Lambert, B. S., Anelli, V., Vincent, F., Masdeu-Navarro, M., Sichert, A., De Schaetzen, F., Sauer, U., Simó, R., Hehemann, J.-H., Vardi, A., Seymour, J. R., & Stocker, R. (2023). Strong chemotaxis by marine bacteria towards polysaccharides is enhanced by the abundant organosulfur compound DMSP. Nature Communications, 14(1), 8080. https://doi.org/10.1038/s41467-023-43143-z
      • Dal Co, A., van Vliet, S., Kiviet, D. J., Schlegel, S., & Ackermann, M. (2020). Short-range interactions govern the dynamics and functions of microbial communities. Nature Ecology and Evolution, 4(3), 366–375. https://doi.org/10.1038/s41559-019-1080-2
      • D’Souza, G., Ebrahimi, A., Stubbusch, A., Daniels, M., Keegstra, J., Stocker, R., Cordero, O., & Ackermann, M. (2023). Cell aggregation is associated with enzyme secretion strategies in marine polysaccharide-degrading bacteria. The ISME Journal. https://doi.org/10.1038/s41396-023-01385-1
      • D’Souza, G. G., Povolo, V. R., Keegstra, J. M., Stocker, R., & Ackermann, M. (2021). Nutrient complexity triggers transitions between solitary and colonial growth in bacterial populations. The ISME Journal, 15(9), 2614–2626. https://doi.org/10.1038/s41396-021-00953-7
      • D’Souza, G., Schwartzman, J., Keegstra, J., Schreier, J. E., Daniels, M., Cordero, O. X., Stocker, R., & Ackermann, M. (2023). Interspecies interactions determine growth dynamics of biopolymer-degrading populations in microbial communities. Proceedings of the National Academy of Sciences of the United States of America, 120(44), e2305198120. https://doi.org/10.1073/pnas.2305198120
      • Fenchel, T. (2002). Microbial Behavior in a Heterogeneous World. Science, 296(5570), 1068–1071. https://doi.org/10.1126/science.1070118
      • Jiao, N., Luo, T., Chen, Q., Zhao, Z., Xiao, X., Liu, J., Jian, Z., Xie, S., Thomas, H., Herndl, G. J., Benner, R., Gonsior, M., Chen, F., Cai, W.-J., & Robinson, C. (2024). The microbial carbon pump and climate change. Nature Reviews Microbiology. https://doi.org/10.1038/s41579-024-01018-0
      • Keegstra, J. M., Carrara, F., & Stocker, R. (2022). The ecological roles of bacterial chemotaxis. Nature Reviews Microbiology, 20(8), 491–504. https://doi.org/10.1038/s41579-022-00709-w
      • Konishi, H., Hio, M., Kobayashi, M., Takase, R., & Hashimoto, W. (2020). Bacterial chemotaxis towards polysaccharide pectin by pectin-binding protein. Scientific Reports, 10(1), 3977. https://doi.org/10.1038/s41598-020-60274-1
      • Li, Y., Sun, H., Ma, X., Lu, A., Lux, R., Zusman, D., & Shi, W. (2003). Extracellular polysaccharides mediate pilus retraction during social motility of Myxococcus xanthus. Proceedings of the National Academy of Sciences, 100(9), 5443–5448. https://doi.org/10.1073/pnas.0836639100
      • Martínez-Antonio, A., Janga, S. C., Salgado, H., & Collado-Vides, J. (2006). Internal sensing machinery directs the activity of the regulatory network in Escherichia coli. Trends in Microbiology, 14(1), 22–27. https://doi.org/10.1016/j.tim.2005.11.002
      • McDougald, D., Rice, S. A., Barraud, N., Steinberg, P. D., & Kjelleberg, S. (2012). Should we stay or should we go: Mechanisms and ecological consequences for biofilm dispersal. Nature Reviews Microbiology, 10(1), 39–50. https://doi.org/10.1038/nrmicro2695
      • Nguyen, T. T. H., Zakem, E. J., Ebrahimi, A., Schwartzman, J., Caglar, T., Amarnath, K., Alcolombri, U., Peaudecerf, F. J., Hwa, T., Stocker, R., Cordero, O. X., & Levine, N. M. (2022). Microbes contribute to setting the ocean carbon flux by altering the fate of sinking particulates. Nature Communications, 13(1), 1657. https://doi.org/10.1038/s41467-022-29297-2
      • Norris, N., Alcolombri, U., Keegstra, J. M., Yawata, Y., Menolascina, F., Frazzoli, E., Levine, N. M., Fernandez, V. I., & Stocker, R. (2022). Bacterial chemotaxis to saccharides is governed by a trade-off between sensing and uptake. Biophysical Journal, 121(11), 2046–2059. https://doi.org/10.1016/j.bpj.2022.05.003
      • Povolo, V. R., D’Souza, G. G., Kaczmarczyk, A., Stubbusch, A. K., Jenal, U., & Ackermann, M. (2022). Extracellular appendages govern spatial dynamics and growth of Caulobacter crescentus on a prevalent biopolymer. bioRxiv, 2022.06.13.495907. https://doi.org/10.1101/2022.06.13.495907
      • Preheim, S. P., Boucher, Y., Wildschutte, H., David, L. A., Veneziano, D., Alm, E. J., & Polz, M. F. (2011). Metapopulation structure of Vibrionaceae among coastal marine invertebrates. Environmental Microbiology, 13(1), 265–275. https://doi.org/10.1111/j.1462-2920.2010.02328.x
      • Schwartzman, J. A., Ebrahimi, A., Chadwick, G., Sato, Y., Orphan, V., & Cordero, O. X. (2021). Bacterial growth in multicellular aggregates leads to the emergence of complex lifecycles. bioRxiv, 2021.11.01.466752. https://doi.org/10.1101/2021.11.01.466752
      • Singh, P. K., Bartalomej, S., Hartmann, R., Jeckel, H., Vidakovic, L., Nadell, C. D., & Drescher, K. (2017). Vibrio cholerae Combines Individual and Collective Sensing to Trigger Biofilm Dispersal. Current Biology, 27(21), 3359-3366.e7. https://doi.org/10.1016/j.cub.2017.09.041
      • Ulrich, L. E., Koonin, E. V., & Zhulin, I. B. (2005). One-component systems dominate signal transduction in prokaryotes. Trends in Microbiology, 13(2), 52–56. https://doi.org/10.1016/j.tim.2004.12.006
      • Wall, M. E., Hlavacek, W. S., & Savageau, M. A. (2004). Design of gene circuits: Lessons from bacteria. Nature Reviews Genetics, 5(1), 34–42. https://doi.org/10.1038/nrg1244
      • Yawata, Y., Carrara, F., Menolascina, F., & Stocker, R. (2020). Constrained optimal foraging by marine bacterioplankton on particulate organic matter. Proceedings of the National Academy of Sciences, 117(41), 25571–25579. https://doi.org/10.1073/pnas.2012443117
      • Yawata, Y., Cordero, O. X., Menolascina, F., Hehemann, J.-H., Polz, M. F., & Stocker, R. (2014). Competition–dispersal tradeoff ecologically differentiates recently speciated marine bacterioplankton populations. Proceedings of the National Academy of Sciences, 111(15), 5622–5627. https://doi.org/10.1073/pnas.1318943111
      • Zöttl, A., & Yeomans, J. M. (2019). Enhanced bacterial swimming speeds in macromolecular polymer solutions. Nature Physics, 15(6), 554–558. https://doi.org/10.1038/s41567-019-0454-3
    1. Reviewer #1 (Public Review):

      The authors effectively delineate the differential distribution and behaviour of MNPs within the heart, noting that these cells can be characterised by their expression levels of csf1ra and mpeg1.1. Key findings include the identification of distinct origins for larval macrophage populations and the sustained presence of csf1ra-expressing cells on the surface of the adult heart. The study examines the embryonic development of these MNPs, revealing that csf1ra+ cells begin populating the heart from embryonic day 3, while mpeg1.1+ cells colonise the heart around day 10, with a significant increase by day 17. Given that the emergence of mpeg1.1+ cells coincides with the reported timing for the onset of haematopoietic stem cell-derived haematopoiesis, the authors combined kaede-lineage tracing experiments and mutant backgrounds to conclude that the earliest tissue-resident macrophages in the heart are derived from primitive haematopoiesis.

      The authors also note that the spatial distribution of MNPs varies, with csf1ra+ cells found on the atrium and ventricle surfaces, while mpeg1.1+ cells are initially located on the surface but later distributed throughout the cardiac tissue. Notably, the study demonstrates that tissue-resident macrophages proliferate rapidly following cardiac injury. The authors observe an increased number of proliferating csf1ra+ cells, especially in csf1ra mutant zebrafish, which likely correspond to primitive-derived tissue-resident macrophages that rapidly respond to injury and contribute to the reduced scarring observed in these mutants.

      This manuscript makes an important contribution to the field by enhancing our understanding of the ontogeny of tissue-resident macrophages in the heart and their cellular behaviour in a vertebrate model capable of heart regeneration.

      Strengths:

      This work presents a landmark study on the ontogeny and cellular behaviour of macrophages in the zebrafish heart as it comprehensively examines their development and distribution in both embryonic and adult stages.

      One of the key strengths of this study is its thorough cellular description using a range of available genetic tools. By employing transgenic lines to differentiate between a few MNP subtypes, the authors provide a detailed and nuanced understanding of these cells' origins, distribution, and behaviour. This approach allows for high-resolution characterisation of MNP populations, revealing significant insights into their potential role in cardiac homeostasis and regeneration.

      Furthermore, the study's findings are significant as they parallel those observed in mouse models, thereby reinforcing the validity and relevance of the zebrafish as a model organism for studying macrophage function in the context of cardiac injury. This comparative aspect underscores the evolutionary conservation of these cellular processes and enhances the study's impact.

      Another notable strength is the use of ex vivo imaging techniques, which enable the authors to observe and study the dynamic behaviour of MNPs in heart tissue in real-time. This live imaging capability is crucial for understanding how these cells interact with their environment, particularly in response to cardiac injury. The ability to visualise MNP proliferation and movement provides valuable insights into the mechanisms underlying tissue repair and regeneration.

      Weaknesses:

      While the manuscript offers significant insights into the ontogeny and behaviour of MNPs in the zebrafish heart, a few limitations described below should be considered:

      One potential issue lies in the lineage tracing experiments using the photoconversion Tg(csf1ra:Gal4); Tg(UAS:kaede) line. The authors photoconverted all cardiac tissue macrophages present at 2 days post-fertilisation (dpf) and examined the hearts of these fish at 21 dpf. Although photoconverted macrophages were still observed at 21 dpf, the majority of cells present in the heart at that time were non-photoconverted (cyan) csf1ra+ cells. While this suggests that early-seeded embryonic csf1ra+ macrophages are retained during late larval stages, the contribution of macrophages derived from haematopoietic stem cells (HSCs) might be overestimated. An important concern is that the kaede-converted cells could have proliferated during the embryonic timeframe analysed, thereby diluting and extinguishing the converted kaede protein. This dilution effect could lead to an underestimation of the contribution of primitive embryonic macrophages relative to the HSC-derived cells, resulting in an inaccurate assessment of the proportion of embryonic-derived tissue-resident macrophages over time.

      Moreover, the study reports no significant difference in immune cell numbers in the hearts of cmyb-/- mutants, which have normal primitive haematopoiesis but lack HSCs, at 5 dpf. Given the authors' suggestion that mpeg+ cells originate from the HSC wave, it is essential to assess the number of mpeg+ cells in these mutants at later stages. This assessment would clarify whether mpeg+ cells are indeed HSC-derived or if csf1ra+ cells later switch on mpeg expression. Without this additional data, conclusions about the origins of mpeg+ cells remain speculative.

      The study's reliance on available genetic tools, while a strength, also introduces limitations. The use of only a few transgenic lines will not fully capture the complexity and diversity of MNP populations, leading to an incomplete understanding of their roles and dynamics.

      Furthermore, while the use of ex vivo imaging provides dynamic insights into cell behaviour, it may not fully capture the complexity of in vivo conditions, possibly overlooking interactions and influences present in the living organism.

      The manuscript would benefit from increasing the sample sizes to ensure the robustness of the findings. The use of Phalloidin staining to delineate single cells more accurately would also enhance the precision of cell counting and improve the overall data quality.

      The study could also benefit from a more in-depth exploration of the functional consequences of MNP heterogeneity in the heart. While the cellular characterisations are thorough, the molecular and regulatory insights provided by the study are limited to a couple of RT-PCRs for some known genes.

      Overall, the manuscript by Moyse and colleagues significantly advances our understanding of the ontogeny and behaviour of macrophages in the zebrafish heart, revealing important parallels with mammalian models. However, the points above should be carefully considered when interpreting the results presented in this study.

    1. eLife assessment

      This paper explores the relationships among evolutionary and epidemiological quantities in influenza, and presents fundamental findings that substantially advance our understanding of the drivers of influenza epidemics. The authors use a rich set of data sources to gather and analyze compelling evidence on the roles of genetic distance, other influenza dynamics and epidemiological indicators in predicting influenza epidemics. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor. This paper also makes relevant data available to the research community.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors aimed to investigate the contribution of antigenic drift in the HA and NA genes of seasonal influenza A(H3N2) virus to their epidemic dynamics. Analyzing 22 influenza seasons before the COVID-19 pandemic, the study explored various antigenic and genetic markers, comparing them against indicators characterizing the epidemiology of annual outbreaks. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor.

      Major Strengths:

      The paper is well-organized, written with clarity, and presents a comprehensive analysis. The study design, incorporating a span of 22 seasons, provides a robust foundation for understanding influenza dynamics. The inclusion of diverse antigenic and genetic markers enhances the depth of the investigation, and the exploration of subtype interference adds valuable insights.

      Major Weaknesses:

      While the analysis is thorough, some aspects require deeper interpretation, particularly in the discussion of certain results. Clarity and depth could be improved in the presentation of findings, and minor adjustments are suggested. Furthermore, the evolving dynamics of H3N2 predominance post-2009 need better elucidation.

      Comments on revised version:

      The authors have addressed each of the comments well. I have no further comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors aimed to investigate the contribution of antigenic drift in the HA and NA genes of seasonal influenza A(H3N2) virus to their epidemic dynamics. Analyzing 22 influenza seasons before the COVID-19 pandemic, the study explored various antigenic and genetic markers, comparing them against indicators characterizing the epidemiology of annual outbreaks. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor. 

      Major Strengths: 

      The paper is well-organized, written with clarity, and presents a comprehensive analysis. The study design, incorporating a span of 22 seasons, provides a robust foundation for understanding influenza dynamics. The inclusion of diverse antigenic and genetic markers enhances the depth of the investigation, and the exploration of subtype interference adds valuable insights. 

      Major Weaknesses: 

      While the analysis is thorough, some aspects require deeper interpretation, particularly in the discussion of certain results. Clarity and depth could be improved in the presentation of findings. Furthermore, the evolving dynamics of H3N2 predominance post-2009 need better elucidation.  

      Reviewer #2 (Public Review): 

      Summary: This paper aims to achieve a better understanding of how the antigenic or genetic compositions of the dominant influenza A viruses in circulation at a given time are related to key features of seasonal influenza epidemics in the US. To this end, the authors analyze an extensive dataset with a range of statistical, data science and machine learning methods. They find that the key drivers of influenza A epidemiological dynamics are interference between influenza A subtypes and genetic divergence, relative to the previous one or two seasons, in a broader range of antigenically related sites than previously thought. 

      Strengths: A thorough investigation of a large and complex dataset. 

      Weaknesses: The dataset covers a 21-year period, which is substantial by epidemiological standards but quite small from a statistical or machine learning perspective. In particular, it was not possible to follow the usual process and test the predictive performance of the random forest model with an independent dataset. 

      Reviewer #3 (Public Review): 

      Summary: 

      This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. It's a strong paper representing a thorough and fascinating exploration of potential drivers, and it makes a trove of relevant data readily available to the community. 

      Strengths: 

      This paper makes links between epidemiological and evolutionary data for influenza. Placing each in the context of the other is crucial for understanding influenza dynamics and evolution, and this paper does a thorough job of this, with many analyses and nuances. The results on the extent to which evolutionary factors relate to epidemic burden, and on interference among influenza types, are particularly interesting. The GitHub repository associated with the paper is clear, comprehensive, and well-documented. 

      Weaknesses: 

      The format of the results section can be hard to follow, and we suggest improving readability by restructuring and simplifying in some areas. A range of choices was made about data preparation and scaling; the authors could explore the sensitivity of the results to some of these. 

      Response to public reviews

      We appreciate the positive comments from the reviewers and have implemented or responded to all of the reviewers’ recommendations.

      In response to Reviewer 1, we expand on the potential drivers and biological implications of the findings pointed out in their specific recommendations. For example, we now explicitly mention that antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study. We note that, after the 2009 A(H1N1) pandemic, the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons is lower compared to A(H3N2) dominant seasons prior to 2009. We propose that the weakening of A(H3N2) predominance may be linked to the diversification of A(H3N2) viruses during the 2010s, wherein multiple antigenically distinct clades with similar fitness circulated in each season, as opposed to a single variant with high fitness.

      In response to Reviewer 2, we agree that it would be ideal and best practice to measure model performance with an independent test set, but our dataset includes only ~20 seasons. Predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. In the revised manuscript, we provide more justification and clarification of our methodology. Instead of testing model performance on an independent test set, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (Kuhn & Johnson, 2019).

      In response to Reviewer 3, we follow the reviewer’s advice to put the Methods section before the Results section. Concerning Reviewer 3’s question about the sensitivity of our results to data preparation and rescaling, we provide more justification and clarification of our methodology in the revised manuscript. In our study, we adjust influenza type/subtype incidences for differences in reporting between the pre- and post-2009 pandemic periods and across HHS regions. We adjust for differences in reporting between the pre- and post-2009 periods because the US CDC and WHO increased laboratory testing capacity in response to the 2009 A(H1N1) pandemic, which led to substantial, long-lasting improvements to influenza surveillance that are still in place today. Figure 1 - figure supplement 2 shows systematic increases in influenza test volume in all HHS regions after the 2009 pandemic. Given the substantial increase in test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results when adjusting for both pre- and post-2009 pandemic reporting and regional reporting versus only adjusting for the pre- and post-2009 pandemic reporting.

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      (1) Line 155-156. Request for a reference for: "Given that protective immunity wanes after 1-4 years" 

      We now include two references (He et al. 2015 and Wraith et al. 2022), which were cited at the beginning of the introduction when referring to the duration of protective immunity for antigenically homologous viruses. (Lines 640-642 in revised manuscript)

      (2) Line 162-163: Request a further explanation of the negative correlation between seasonal diversity of HA and NA LBI values and NA epitope distance. Clarify biological implications to aid reader understanding. 

      In the revised manuscript we expand on the biological implications of A(H3N2) virus populations characterized by high antigenic novelty and low LBI diversity.

      Lines 649-653:

      “The seasonal diversity of HA and NA LBI values was negatively correlated with NA epitope distance (Figure 2 – figure supplements 5 – 6), with high antigenic novelty coinciding with low genealogical diversity. This association suggests that selective sweeps tend to follow the emergence of drifted variants with high fitness, resulting in seasons dominated by a single A(H3N2) variant rather than multiple cocirculating clades.”

      (3) Figure S3 legend t-2 may be marked as t-1. 

      Thank you for catching this. We have fixed this typo. Note: Figure S3 is now Figure 2 – figure supplement 5.

      (4) Lines 201-214. The key takeaways from the analysis of subtype dominance are ultimately not clear. It also misses the underlying dynamics that H3N2 predominance following an evolutionary change has waned since 2009.

      In the revised manuscript we elaborate on key takeaways concerning the relationship between antigenic drift and A(H3N2) dominance. We also add a caveat noting that A(H3N2) predominance is weaker during the post-2009 period, which may be linked to the diversification of A(H3N2) lineages after 2012. We do not know of a reference that links the diversification of A(H3N2) viruses in the 2010s to a particular evolutionary change. Therefore, we do not attribute the diversification of A(H3N2) viruses to a specific evolutionary change in A(H3N2) variants circulating at the time (A/Perth/16/2009-like strains (PE09)). Instead, we allude to the potential role of A(H3N2) diversification in creating multiple co-circulating lineages that may have less of a fitness advantage.

      Lines 681-703:

      “We explored whether evolutionary changes in A(H3N2) may predispose this subtype to dominate influenza virus circulation in a given season. A(H3N2) subtype dominance – the proportion of influenza positive samples typed as A(H3N2) – increased with H3 epitope distance (t – 2) (R² = 0.32, P = 0.05) and N2 epitope distance (t – 1) (R² = 0.34, P = 0.03) (regression results: Figure 4; Spearman correlations: Figure 3 – figure supplement 1). Figure 4 illustrates this relationship at the regional level across two seasons in which A(H3N2) was nationally dominant, but where antigenic change differed. In 2003-2004, we observed widespread dominance of A(H3N2) viruses after the emergence of the novel antigenic cluster, FU02 (A/Fujian/411/2002-like strains). In contrast, there was substantial regional heterogeneity in subtype circulation during 2007-2008, a season in which A(H3N2) viruses were antigenically similar to those circulating in the previous season. Patterns in type/subtype circulation across all influenza seasons in our study period are shown in Figure 4 – figure supplement 1. As observed for the 2003-2004 season, widespread A(H3N2) dominance tended to coincide with major antigenic transitions (e.g., A/Sydney/5/1997 (SY97) seasons, 1997-1998 to 1999-2000; A/California/7/2004 (CA04) season, 2004-2005), though this was not universally the case (e.g., A/Perth/16/2009 (PE09) season, 2010-2011). 

      After the 2009 A(H1N1) pandemic, A(H3N2) dominant seasons still occurred more frequently than A(H1N1) dominant seasons, but the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons was lower compared to A(H3N2) dominant seasons prior to 2009. Antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study (https://nextstrain.org/seasonal-flu/h3n2/ha/12y@2024-05-13) (Dhanasekaran et al., 2022; Huddleston et al., 2020; Yan et al., 2019). The decline in A(H3N2) predominance during the post-2009 period may be linked to the genetic and antigenic diversification of A(H3N2) viruses, wherein multiple lineages with similar fitness co-circulated in each season.”

      (5) Line 253-255: It would be beneficial to provide a more detailed interpretation of the statement that "pre-2009 seasonal A(H1N1) viruses may limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses." Elaborate on the cause-and-effect relationship within this statement.

      In the revised manuscript we suggest that seasonal A(H1N1) viruses may interfere with the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, because seasonal A(H1N1) viruses and A(H3N2) are more closely related, and thus may elicit stronger cross-reactive T cell responses.

      Lines 738-745:

      “The internal gene segments NS, M, NP, PA, and PB2 of A(H3N2) viruses and pre-2009 seasonal A(H1N1) viruses share a common ancestor (Webster et al., 1992) whereas A(H1N1)pdm09 viruses have a combination of gene segments derived from swine and avian reservoirs that were not reported prior to the 2009 pandemic (Garten et al., 2009; Smith et al., 2009). Non-glycoprotein genes are highly conserved between influenza A viruses and elicit cross-reactive antibody and T cell responses (Grebe et al., 2008; Sridhar, 2016). Because pre-2009 seasonal A(H1N1) viruses and A(H3N2) are more closely related, we hypothesized that seasonal A(H1N1) viruses could potentially limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, due to greater T cell-mediated cross-protective immunity.”

      (6) In the results section, many statements report statistical results of correlation analyses. Consider providing further interpretations of these results, such as the implications of nonsignificant correlations and how they support or contradict the hypothesis or previous studies. For example, the statement on line 248 regarding the lack of significant correlation between influenza B epidemic size and A(H3N2) epidemic metrics would benefit from additional discussion on what this non-significant correlation signifies and how it relates to the hypothesis or previous research. 

      In the Discussion section, we suggest that the lack of an association between influenza B circulation and A(H3N2) epidemic metrics is due to few T and B cell epitopes shared between influenza A and B viruses (Terajima et al., 2013).

      Lines 1005-1007 in revised manuscript (Lines 513-515 in original manuscript): 

      “Overall, we did not find any indication that influenza B incidence affects A(H3N2) epidemic burden or timing, which is not unexpected, given that few T and B cell epitopes are shared between the two virus types (Terajima et al., 2013).”

      Minor comments: 

      (1) Line 116-122: Include a summary statistical description of all collected data sets, detailing the number of HA and NA sequence data and their sources. Briefly describe subsampled data sets, specifying preferences (e.g., the number of HA or NA sequence data collected from each region). 

      In our revised manuscript we now include supplementary tables that summarize the number of A/H3 and A/N2 sequences in each subsampled dataset, aggregated by world region, for all seasons combined (Figure 2 - table supplements 1 - 2). We also include supplementary figures showing the number of sequences collected in each month and each season in North America versus the other nine world regions combined (Figure 2 - figure supplements 1 - 2). Subsampled datasets are plotted individually in the figures below but individual time series are difficult to discern due to minor differences in sequence counts across the datasets.

      (2) Figure 7A: Due to space limitations, consider rounding numbers on the x-axis to whole numbers for clarity. 

      Thank you for this suggestion. In the revised manuscript we round numbers in the axes of Figure 7A (Figure 9A in the revised manuscript) so that the axes are less crowded.

      (3) Figure 4C & Figure 4D: Note that Region 10 (purple) data were unavailable for seasons before 2009 (lines 1483-1484). Label each region on the map with its respective region number (1 to 10) and indicate this in the legend for easy identification. 

      In our original submission, the legend for Figure 4 included “Data for Region 10 (purple) were not available for seasons prior to 2009” at the end of the caption. We have moved this sentence, as well as other descriptions that apply to both C and D, so that they follow the sentence “C-D. Regional patterns of influenza type and subtype incidence during two seasons when A(H3N2) was nationally dominant.”

      In our revised manuscript, Figure 4, and Figure 4 - figure supplement 1 (Figure S10 in original submission) include labels for each HHS region.

      We did not receive specific recommendations from Reviewer #2. However, our responses to Reviewer #3 address the weaknesses of the study mentioned by Reviewer #2.

      Reviewer #3 (Recommendations For The Authors): 

      This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. 

      This is a workhorse of a paper, both in the volume of data analyzed and in the extent of the analysis. The data provided are a treasure trove for influenza modelers and for anyone interested in seeing influenza surveillance data in the context of evolution, and evolutionary information in the context of epidemiology. 

      L53 - end of sentence "and antigenic drift": not sure this fits, explain? I thought this sentence was in contrast to antigenic drift.

      Thank you for catching this. We did not intend to include “and antigenic drift” at the end of this sentence and have removed it (Line 59).

      Para around L115: would using primarily US data be a limitation, because it's global immunity that shapes success of strains? Or, how much does each country's immunity and vaccination and so on actually shape what strains succeed there, compared to global/international factors? 

      The HA and NA phylogenetic trees in our study are enriched with US sequences because our study focuses on epidemiological dynamics in the US, and we wanted to prioritize A(H3N2) viruses that the US human population encountered in each season. We agree with the reviewer that the world population may be the right scale to understand how immunity, acquired by vaccination or natural infection, may shape the emergence and success of new lineages that will go on to circulate globally. However, our study assesses the overall impact of antigenic drift on regional A(H3N2) epidemic dynamics in the US. In other words, our driving question is whether we can predict the population-level impact of an A(H3N2) variant in the US, conditional on this particular lineage having established in the US and circulating at relatively high levels. We do not assess the global or population-level factors that may influence which A(H3N2) virus lineages are successful in a given location or season.

      We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader. 

      Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”

      In the Results section, I found the format hard to follow, because of the extensive methodological details, numbers with CIs and long sentences. Sentences sometimes included the question, definitions of variables, and lists. For example at line 215 we have: "Next, we tested for associations between A(H3N2) evolution and epidemic timing, including onset week, defined as the winter changepoint in incidence [16], and peak week, defined as the first week of maximum incidence; spatiotemporal synchrony, measured as the variation (standard deviation, s.d.) in regional onset and peak timing; and epidemic speed, including seasonal duration and the number of weeks from onset to peak (Table 2, Figure S11)". I would suggest putting the methods section first, using shorter sentences, separating lists from the question being asked, and stating what was found without also putting in all the extra detail. Putting the methods section before the results might reduce the sense that you have to explain what you did and how in the results section too.

      Thank you for suggesting how to improve the readability of the Results section. In the revised manuscript, we follow the reviewer’s advice to put the Methods section before the Results section. Although eLife formatting requirements specify the order: Introduction, Results, Discussion, and Methods, the journal allows for the Methods section to follow the Introduction when it makes sense to do so. We agree with the reviewer that putting the Methods section before the Results section makes our results easier to follow because we no longer need to introduce methodological details at the beginning of each set of results.

      L285 in the RF you remove variables without significant correlations with the target variables, but isn't one of the aims of RF to uncover relationships where a correlation might not be evident, and in part to reveal combinations of features that give the targeted outcome? Also with the RF, I am a bit concerned that you could not use the leave-one-out approach because it was "unstable" - presumably that means that you obtain quite different results if you leave out a season. How robust are these results, and what are the most sensitive aspects? Are the same variables typically high in importance if you leave out a season, for example? What does the scatterplot of observed vs predicted epidemic size (as in Fig 7) look like if each prediction is for the one that was left out (i.e. from a model trained on all the rest)? In my experience, where the RF is "unstable", that can look pretty terrible even if the model trained on all the data looks great (as does Figure 7). In any case I think it's worth discussing sensitivity.

      (1) In response to the reviewer’s first question, we explain our rationale for not including all candidate predictors in random forest and penalized regression models. 

      Models trained with different combinations of predictors can have similar performance, and these combinations of predictors can include variables that do not necessarily have strong univariate associations with the target variable. The performance of random forest and LASSO regression models is not sensitive to redundant or irrelevant predictors (see Figure 10.2 in Kuhn & Johnson, 2019). However, if our goal is variable selection rather than strictly model performance, it is considered best practice to remove collinear, redundant, and/or irrelevant variables prior to training models (see section 11.3 in Kuhn & Johnson, 2019). In both random forest and LASSO regression models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection. In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores. Thus, failing to minimize multicollinearity prior to model training could result in some variables having low rankings and the appearance of being unimportant, because their importance scores are overshadowed by those of the highly correlated variables. Our rationale for preprocessing predictor data follows the philosophy of Kuhn & Johnson (2019), who recommend including the minimum possible set of variables that does not compromise model performance. Even if a particular model is insensitive to extra predictors, Kuhn and Johnson explain that “removing predictors can reduce the cost of acquiring data or improve the throughput of the software used to make predictions.”

      In the revised manuscript, we include more details about our steps for preprocessing predictor data. We also follow the reviewer’s suggestion to include all evolutionary predictors in variable selection analyses, regardless of whether they have strong univariate correlations with target outcomes, because the performance of random forest and LASSO regression models is not affected by redundant predictors. 

      Including additional predictors in our variable selection analyses does not change our conclusions. As reported in our original manuscript, predictors with strong univariate correlations with various epidemic metrics were the highest ranked features in both random forest and LASSO regression models.

      Lines 523-563:

      “Preprocessing of predictor data: The starting set of candidate predictors included all viral fitness metrics: genetic and antigenic distances between current and previously circulating strains and the standard deviation and Shannon diversity of H3 and N2 LBI values in the current season. To account for potential type or subtype interference, we included A(H1N1) or A(H1N1)pdm09 epidemic size and B epidemic size in the current and prior season and the dominant IAV subtype in the prior season (Lee et al., 2018). We included A(H3N2) epidemic size in the prior season as a proxy for prior natural immunity to A(H3N2). To account for vaccine-induced immunity, we considered four categories of predictors and included estimates for the current and prior seasons: national vaccination coverage among adults (18-49 years coverage × ≥ 65 years coverage), adjusted A(H3N2) vaccine effectiveness (VE), a combined metric of vaccination coverage and A(H3N2) VE (18-49 years coverage × ≥ 65 years coverage × VE), and H3 and N2 epitope distances between naturally circulating A(H3N2) viruses and the U.S. A(H3N2) vaccine strain in each season. We could not include a predictor for vaccination coverage in children or consider clade-specific VE estimates, because these data were not available for most seasons in our study.

      Random forest and LASSO regression models are not sensitive to redundant (highly collinear) features (Kuhn & Johnson, 2019), but we chose to downsize the original set of candidate predictors to minimize the impact of multicollinearity on variable importance scores. For both types of models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection (Kuhn & Johnson, 2019). In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores (Kuhn & Johnson, 2019). We first confirmed that none of the candidate predictors had zero variance or near-zero variance. Because seasonal lags of each viral fitness metric are highly collinear, we included only one lag of each evolutionary predictor, with a preference for the lag that had the strongest univariate correlations with various epidemic metrics. We checked for multicollinearity among the remaining predictors by examining Spearman’s rank correlation coefficients between all pairs of predictors. If a particular pair of predictors was highly correlated (Spearman’s 𝜌 > 0.8), we retained only one predictor from that pair, with a preference for the predictor that had the strongest univariate correlations with various epidemic metrics. Lastly, we performed QR decomposition of the matrix of remaining predictors to determine if the matrix is full rank and identify sets of columns involved in linear dependencies. This step did not eliminate any additional predictors, given that we had already removed pairs of highly collinear variables based on Spearman correlation coefficients. 

      After these preprocessing steps, our final set of model predictors included 21 variables, including 8 viral evolutionary indicators: H3 epitope distance (t – 2), HI log2 titer distance (t – 2), H3 RBS distance (t – 2), H3 non-epitope distance (t – 2), N2 epitope distance (t – 1), N2 non-epitope distance (t – 1), and H3 and N2 LBI diversity (s.d.) in the current season; 6 proxies for type/subtype interference and prior immunity: A(H1N1) and B epidemic sizes in the current and prior season, A(H3N2) epidemic size in the prior season, and the dominant IAV subtype in the prior season; and 7 proxies for vaccine-induced immunity: A(H3N2) VE in the current and prior season, H3 and N2 epitope distances between circulating strains and the vaccine strain in each season, the combined metric of adult vaccination coverage × VE in the current and prior season, and adult vaccination coverage in the prior season.”
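
      The pairwise collinearity filter described in the quoted passage can be sketched as follows. This is an illustrative Python sketch, not the code used in the analysis: the function names, the `priority` scores (standing in for "strength of univariate correlation with epidemic metrics"), and the toy predictor names are all hypothetical. It assumes zero-variance predictors have already been removed, as the passage states.

```python
from itertools import combinations


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; assumes neither input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_collinear(predictors, priority, threshold=0.8):
    """predictors: dict name -> list of values (one per season).
    priority: dict name -> score; the higher-scoring member of a
    highly correlated pair (rho > threshold) is retained."""
    kept = set(predictors)
    for a, b in combinations(sorted(predictors), 2):
        if a in kept and b in kept:
            if spearman(predictors[a], predictors[b]) > threshold:
                kept.discard(a if priority[a] < priority[b] else b)
    return sorted(kept)
```

      The threshold follows the stated rule (Spearman’s 𝜌 > 0.8); comparing the absolute value of 𝜌 instead would also catch strongly negative pairs, which is a design choice this sketch does not make.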

      (2) Next, we clarify our model training methodology to address the reviewer’s second point about using a leave-one-out cross-validation approach.

      We believe the reviewer is mistaken; we use a leave-one-season-out validation approach which lends some robustness to the predictions. In our original submission, we stated “We created each forest by generating 3,000 regression trees from 10 repeats of a leave-one-season-out (jackknife) cross-validated sample of the data. Due to the small size of our dataset, evaluating the predictive accuracy of random forest models on a quasi-independent test set produced unstable estimates.” (Lines 813-816 in the original manuscript)

      To clarify, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (see Section 3.4 in Kuhn & Johnson, 2019). To reduce noise, we generated 10 bootstrap resamples of each fold and averaged the RMSE and R² values of model predictions from resamples.
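
      To make the fold structure concrete, here is a minimal Python sketch of leave-one-season-out cross-validation with bootstrap resamples of each analysis set. It is an illustration only, not the analysis code: the function names and the `fit_predict` callback are hypothetical, and resampling at the season level (rather than at the level of individual rows) is an assumption of this sketch.

```python
import random


def leave_one_season_out(seasons):
    """Yield (analysis, assessment) pairs; each season is held out once."""
    for held_out in seasons:
        analysis = [s for s in seasons if s != held_out]
        yield analysis, held_out


def bootstrap_resamples(analysis, n_repeats=10, seed=0):
    """Draw n_repeats resamples (with replacement) of an analysis set."""
    rng = random.Random(seed)
    return [[rng.choice(analysis) for _ in analysis] for _ in range(n_repeats)]


def rmse(observed, predicted):
    """Root mean squared error of paired observations and predictions."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


def cross_validated_rmse(data, fit_predict, n_repeats=10, seed=0):
    """data: dict season -> list of (x, y) rows, e.g. one row per region.
    fit_predict(train_rows, test_xs) -> list of predictions.
    Within each fold the RMSE is averaged over bootstrap resamples;
    the fold averages are then averaged over held-out seasons."""
    fold_scores = []
    for analysis, held_out in leave_one_season_out(sorted(data)):
        test_xs = [x for x, _ in data[held_out]]
        test_ys = [y for _, y in data[held_out]]
        scores = []
        for resample in bootstrap_resamples(analysis, n_repeats, seed):
            train_rows = [row for season in resample for row in data[season]]
            scores.append(rmse(test_ys, fit_predict(train_rows, test_xs)))
        fold_scores.append(sum(scores) / len(scores))
    return sum(fold_scores) / len(fold_scores)
```

      Any model can be plugged in through `fit_predict`; the point of the sketch is only that every season serves once as the assessment set and is never part of the data used to predict itself.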

      Although it would be ideal and best practice to measure model performance with an independent test set, our dataset includes only ~20 seasons. We found that predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. Further, we suspect that large antigenic jumps in a small subset of seasons further contribute to variation in prediction accuracy across randomly selected test sets. Our rationale for using cross-validation instead of an independent test set is best described in Section 4.3 of Kuhn and Johnson’s book “Applied Predictive Modeling” (Kuhn & Johnson, 2013):

      “When the number of samples is not large, a strong case can be made that a test set should be avoided because every sample may be needed for model building. Additionally, the size of the test set may not have sufficient power or precision to make reasonable judgements. Several researchers (Molinaro 2005; Martin and Hirschberg 1996; Hawkins et al. 2003) show that validation using a single test set can be a poor choice. Hawkins et al. (2003) concisely summarize this point: “holdout samples of tolerable size [...] do not match the cross-validation itself for reliability in assessing model fit and are hard to motivate.” Resampling methods, such as cross-validation, can be used to produce appropriate estimates of model performance using the training set. These are discussed in length in Sect. 4.4. Although resampling techniques can be misapplied, such as the example shown in Ambroise and McLachlan (2002), they often produce performance estimates superior to a single test set because they evaluate many alternate versions of the data.”

      In our revised manuscript, we provide additional clarification of our methods (Lines 574-590):

      “We created each forest by generating 3,000 regression trees. To determine the best performing model for each epidemic metric, we used leave-one-season-out (jackknife) cross-validation to train models and measure model performance, wherein each “assessment” set is one season of data predicted by the model, and the corresponding “analysis” set contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of each model (Kuhn & Johnson, 2019). Due to the small size of our dataset (~20 seasons), evaluating the predictive accuracy of random forest models on a quasi-independent test set of 2-3 seasons produced unstable estimates. Instead of testing model performance on an independent test set, we generated 10 bootstrap resamples (“repeats”) of each analysis set (“fold”) and averaged the predictions of models trained on resamples (Kuhn & Johnson, 2013, 2019). For each epidemic metric, we report the mean root mean squared error (RMSE) and R² of predictions from the best tuned model. We used permutation importance (N = 50 permutations) to estimate the relative importance of each predictor in determining target outcomes. Permutation importance is the decrease in prediction accuracy when a single feature (predictor) is randomly permuted, with larger values indicating more important variables. Because many features were collinear, we used conditional permutation importance to compute feature importance scores, rather than the standard marginal procedure (Altmann et al., 2010; Debeer & Strobl, 2020; Strobl et al., 2008; Strobl et al., 2007).”

      (3) In response to the reviewer’s question about the sensitivity of results when one season is left out, we clarify that the variable importance scores in Figure 8 and model predictions in Figure 9 were generated by models tuned using leave-one-season-out cross-validation. 

      As explained above, in our leave-one-season-out cross-validation approach, each “assessment” set contains one season of data predicted by the model, and the corresponding “analysis” set (“fold”) contains the remaining seasons. We generated predictions of epidemic metrics and variable importance rankings by averaging the model output of 10 bootstrap resamples of each cross-validation fold. 

      In Lines 791-806, we describe which epidemic metrics have the highest prediction accuracy and report that random forest models tend to underpredict most epidemic metrics in seasons with high antigenic novelty:

      “We measured correlations between observed values and model-predicted values at the HHS region level. Among the various epidemic metrics, random forest models produced the most accurate predictions of A(H3N2) subtype dominance (Spearman’s 𝜌 = 0.95, regional range = 0.85 – 0.97), peak incidence (𝜌 = 0.91, regional range = 0.72 – 0.95), and epidemic size (𝜌 = 0.9, regional range = 0.74 – 0.95), while predictions of effective 𝑅𝑡 and epidemic intensity were less accurate (𝜌 = 0.81, regional range = 0.65 – 0.91; 𝜌 = 0.78, regional range = 0.63 – 0.92, respectively) (Figure 9). Random forest models tended to underpredict most epidemic targets in seasons with substantial H3 antigenic transitions, in particular the SY97 cluster seasons (1998-1999, 1999-2000) and the FU02 cluster season (2003-2004) (Figure 9).

      For epidemic size and peak incidence, seasonal predictive error – the root-mean-square error (RMSE) across all regional predictions in a season – increased with H3 epitope distance (epidemic size, Spearman’s 𝜌 = 0.51, P = 0.02; peak incidence, 𝜌 = 0.63, P = 0.004) and N2 epitope distance (epidemic size, 𝜌 = 0.48, P = 0.04; peak incidence, 𝜌 = 0.48, P = 0.03) (Figure 9 – figure supplements 1 – 2). For models of epidemic intensity, seasonal RMSE increased with N2 epitope distance (𝜌 = 0.64, P = 0.004) but not H3 epitope distance (𝜌 = 0.06, P = 0.8) (Figure 9 – figure supplements 1 – 2). Seasonal RMSE of effective 𝑅𝑡 and subtype dominance predictions did not correlate with H3 or N2 epitope distance (Figure 9 – figure supplements 1 – 2).”
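
      Seasonal predictive error, as defined in the quoted passage, pools the squared errors of all regional predictions within a season. A minimal Python sketch of that calculation follows; the row layout and function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def seasonal_rmse(rows):
    """rows: iterable of (season, region, observed, predicted).
    Returns dict season -> RMSE across that season's regional predictions."""
    squared_errors = defaultdict(list)
    for season, _region, observed, predicted in rows:
        squared_errors[season].append((observed - predicted) ** 2)
    return {s: (sum(e) / len(e)) ** 0.5 for s, e in squared_errors.items()}
```

      The resulting per-season values are what would then be correlated against H3 or N2 epitope distance.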

      I think the competition (interference) results are really interesting, perhaps among the most interesting aspects of this work. 

      Thank you! We agree that our finding that subtype interference has a greater impact than viral evolution on A(H3N2) epidemics is one of the more interesting results in the study.

      Have you seen the paper by Barrat-Charlaix et al? They found that LBI was not good predicting frequency dynamics (see https://pubmed.ncbi.nlm.nih.gov/33749787/); instead, LBI was high for sequences like the consensus sequence, which was near to future strains. LBI also was not positively correlated with epidemic impact in Figure S7.

      The local branching index (LBI) measures the rate of recent phylogenetic branching and approximates relative fitness among viral clades, with high LBI values representing greater fitness (Neher et al. 2014).

      Two of this study’s co-authors (John Huddleston and Trevor Bedford) are also co-authors of Barrat-Charlaix et al. 2021. Barrat-Charlaix et al. 2021 assessed the performance of LBI in predicting the frequency dynamics and fixation of individual amino acid substitutions in A(H3N2) viruses. Our study is not focused on predicting the future success of A(H3N2) clades or the frequency dynamics or probability of fixation of individual substitutions. Instead, we use the standard deviation and Shannon diversity of LBI values in each season as a proxy for genealogical (clade-level) diversity. We find that, at a seasonal level, low diversity of H3 or N2 LBI values in the current season correlates with greater epidemic intensity, higher transmission rates, and shorter seasonal duration.

      In the Discussion we provide an explanation for these correlation results (Lines 848-857): 

      “The local branching index (LBI) is traditionally used to predict the success of individual clades, with high LBI values indicating high viral fitness (Huddleston et al., 2020; Neher et al., 2014). In our epidemiological analysis, low diversity of H3 or N2 LBI in the current season correlated with greater epidemic intensity, higher transmission rates, and shorter seasonal duration. These associations suggest that low LBI diversity is indicative of a rapid selective sweep by one successful clade, while high LBI diversity is indicative of multiple co-circulating clades with variable seeding and establishment times over the course of an epidemic. A caveat is that LBI estimation is more sensitive to sequence sub-sampling schemes than strain-level measures. If an epidemic is short and intense (e.g., 1-2 months), a phylogenetic tree with our sub-sampling scheme (50 sequences per month) may not incorporate enough sequences to capture the true diversity of LBI values in that season.”

      Figure 1 - LBI goes up over time. Is that partly to do with sampling? Overall how do higher sampling volumes in later years impact this analysis? (though you choose a fixed number of sequences so I guess you downsample to cope with that). I note that LBI is likely to be sensitive to sequencing density. 

      Thank you for pointing this out. We realized that increasing LBI Shannon diversity over the course of the study period was indeed an artefact of increasing sequence volume over time. Our sequence subsampling scheme involves selecting a random sample of up to 50 viruses per month, with up to 25 viruses selected from North America (if available) and the remaining sequences evenly divided across nine other global regions. In early seasons of the study (late 1990s/early 2000s), sampling was often too sparse to meet the 25 viruses/month threshold for North America or for the other global regions combined (H3: Figure 2 - figure supplement 1; N2: Figure 2 - figure supplement 2). Ecological diversity metrics are sensitive to sample size, which explains why LBI Shannon diversity appeared to steadily increase over time in our original submission. In our revised manuscript, we correct for uneven sample sizes across seasons before estimating Shannon diversity and clarify our methodology. 

      Lines 443-482: 

      “Clade growth: The local branching index (LBI) measures the relative fitness of co-circulating clades, with high LBI values indicating recent rapid phylogenetic branching (Huddleston et al., 2020; Neher et al., 2014). To calculate LBI for each H3 and N2 sequence, we applied the LBI heuristic algorithm as originally described by Neher et al., 2014 to H3 and N2 phylogenetic trees, respectively. We set the neighborhood parameter 𝜏 to 0.4 and only considered viruses sampled between the current season 𝑡 and the previous season 𝑡 – 1 as contributing to recent clade growth in the current season 𝑡.  

      Variation in the phylogenetic branching rates of co-circulating A(H3N2) clades may affect the magnitude, intensity, onset, or duration of seasonal epidemics. For example, we expected that seasons dominated by a single variant with high fitness might have different epidemiological dynamics than seasons with multiple co-circulating clades with varying seeding and establishment times. We measured the diversity of clade growth rates of viruses circulating in each season by measuring the standard deviation (s.d.) and Shannon diversity of LBI values in each season. Given that LBI measures relative fitness among co-circulating clades, we did not compare overall clade growth rates (e.g., mean LBI) across seasons.

      Each season’s distribution of LBI values is right-skewed and does not follow a normal distribution. We therefore bootstrapped the LBI values of each season in each replicate dataset 1000 times (1000 samples with replacement) and estimated the seasonal standard deviation of LBI from resamples, rather than directly from observed LBI values. We also tested the seasonal standard deviation of LBI from log transformed LBI values, which produced qualitatively equivalent results to bootstrapped LBI values in downstream analyses.

      As an alternative measure of seasonal LBI diversity, we binned raw H3 and N2 LBI values into categories based on their integer values (e.g., an LBI value of 0.5 is assigned to the (0,1] bin) and estimated the exponential of the Shannon entropy (Shannon diversity) of LBI categories (Hill, 1973; Shannon, 1948). The Shannon diversity of LBI considers both the richness and relative abundance of viral clades with different growth rates in each season and is calculated as follows:

      $${}^{1}D = \exp\left(-\sum_{i=1}^{R} p_i \ln p_i\right)$$

      where ${}^{q}D$ is the effective number of categories or Hill numbers of order $q$ (here, clades with different growth rates), with $q$ defining the sensitivity of the true diversity to rare versus abundant categories (Hill, 1973). exp is the exponential function, $p_i$ is the proportion of LBI values belonging to the $i$th category, and $R$ is richness (the total number of categories). Shannon diversity ${}^{1}D$ ($q = 1$) estimates the effective number of categories in an assemblage using the geometric mean of their proportional abundances $p_i$ (Hill, 1973).

      Because ecological diversity metrics are sensitive to sampling effort, we rarefied H3 and N2 sequence datasets prior to estimating Shannon diversity so that seasons had the same sample size. For each season in each replicate dataset, we constructed rarefaction and extrapolation curves of LBI Shannon diversity and extracted the Shannon diversity estimate of the sample size that was twice the size of the reference sample size (the smallest number of sequences obtained in any season during the study) (iNEXT R package) (Chao et al., 2014). Chao et al. found that their diversity estimators work well for rarefaction and short-range extrapolation when the extrapolated sample size is up to twice the reference sample size. For H3, we estimated seasonal diversity using replicate datasets subsampled to 360 sequences/season; for N2, datasets were subsampled to 230 sequences/season.”

      Estimating the Shannon diversity of LBI from datasets with even sampling across seasons removes the previous secular trend of increasing LBI diversity over time (Figure 2 in revised manuscript).
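
      For readers who want to see the diversity calculation itself, the following Python sketch computes the Shannon diversity (Hill number of order 1, the exponential of the Shannon entropy) of binned LBI values. It is illustrative only: the function name is hypothetical, binning by integer ceiling is our reading of "an LBI value of 0.5 is assigned to the (0,1] bin", and the sketch does not perform the rarefaction and extrapolation step, which the manuscript carries out with the iNEXT R package.

```python
import math
from collections import Counter


def shannon_diversity(lbi_values):
    """Effective number of LBI categories: exp(-sum(p_i * ln p_i)).

    Values are binned by integer ceiling, so 0.5 and 1.0 both fall in
    the (0, 1] bin and 1.5 falls in the (1, 2] bin."""
    bins = Counter(math.ceil(v) for v in lbi_values)
    total = sum(bins.values())
    entropy = -sum((n / total) * math.log(n / total) for n in bins.values())
    return math.exp(entropy)
```

      A season whose LBI values all fall in one bin has a diversity of 1, and a season whose values are spread evenly over k bins has a diversity of k, which is why the quantity is read as an "effective number" of categories.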

      Figure 3 - I wondered what about the co-dominant times? 

      In Figure 3, orange points correspond to seasons in which A(H3N2) and A(H1N1) were codominant. We are not sure of the reviewer’s specific question concerning codominant seasons, but if it concerns whether antigenic drift is linked to epidemic magnitude among codominant seasons alone, we cannot perform separate regression analyses for these seasons because there are only two codominant seasons during the 22-season study period.

      Figure 4 - Related to drift and epidemic size, dominance, etc. -- when is drift measured, and (if it's measured in season t), would larger populations create more drift, simply by having access to more opportunity (via a larger viral population size)? This is a bit 'devil's advocate' but what if some epidemiological/behavioural process causes a larger and/or later peak, and those gave rise to higher drift?

      Seasonal drift is measured as the genetic or antigenic distance between viruses circulating during season t and viruses circulating in the prior season (𝑡 – 1) or two seasons ago (𝑡 – 2).

      Concerning the question about whether larger human populations lead to greater rates of antigenic drift, phylogeographic studies have repeatedly found that East-South-Southeast Asia are the source populations for A(H3N2) viruses (Bedford et al., 2015; Lemey et al., 2014), in part because these regions have tropical or subtropical climates and larger human populations, which enable year-round circulation and higher background infection rates. Larger viral populations (via larger host population sizes) and uninterrupted transmission may increase the efficiency of selection and the probability of strain survival and global spread (Wen et al., 2016). After A(H3N2) variants emerge in East-South-Southeast Asia and spread to other parts of the world, A(H3N2) viruses circulate via overlapping epidemics rather than local persistence (Bedford et al., 2015; Rambaut et al., 2008). Each season, A(H3N2) outbreaks in the US (and other temperate regions) are seeded by case importations from outside the US, genetic diversity peaks during the winter, and a strong genetic bottleneck typically occurs at the end of the season (Rambaut et al., 2008).

      Due to their faster rates of antigenic evolution, A(H3N2) viruses undergo more rapid clade turnover and dissemination than A(H1N1) and B viruses, despite similar global migration networks across A(H3N2), A(H1N1), and B viruses (Bedford et al., 2015). Bedford et al. speculate that there is typically little geographic differentiation in A(H3N2) viruses circulating in each season because A(H3N2) viruses tend to infect adults, and adults are more mobile than children. Compared to A(H3N2) viruses, A(H1N1) and B viruses tend to have greater genealogical diversity, geographic differentiation, and longer local persistence times (Bedford et al., 2015; Rambaut et al., 2008). Thus, some A(H1N1) and B epidemics are reseeded by viruses that have persisted locally since prior epidemics (Bedford et al., 2015).

      Theoretical models have shown that epidemiological processes can influence rates of antigenic evolution (Recker et al., 2007; Wen et al., 2016; Zinder et al., 2013), though the impact of flu epidemiology on viral evolution is likely constrained by the virus’s intrinsic mutation rate. 

      In conclusion, larger host population sizes and flu epidemiology can indeed influence rates of antigenic evolution. However, given that our study is US-centric and focuses on A(H3N2) viruses, these factors are likely not at play in our study, due to intrinsic biological characteristics of A(H3N2) viruses and the geographic location of our study.

      We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader.

      Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”

      Methods -- 

      L 620 about rescaling and pre- vs post-pandemic times : tell us more - how has reporting changed? could any of this not be because of reporting but because of NPIs or otherwise? Overall there is a lot of rescaling going on. How sensitive are the results to it? 

      It would be unreasonable to ask for a sensitivity analysis for all the results for all the choices around data preparation, but some idea where there is a reason to think there might be a dependence on one of these choices would be great.

      In response to the 2009 A(H1N1) pandemic, the US CDC and WHO increased laboratory testing capacity and strengthened epidemiological networks, leading to substantial, long-lasting improvements to influenza surveillance that are still in place today (https://www.cdc.gov/flu/weekly/overview.htm). At the beginning of the COVID-19 pandemic, influenza surveillance networks were quickly adapted to detect and understand the spread of SARS-CoV-2. The 2009 pandemic occurred over a time span of less than one year, and strict non-pharmaceutical interventions (NPIs), such as lockdowns and mask mandates, were not implemented. Thus, we attribute increases in test volume during the post-2009 period to improved virologic surveillance and laboratory testing capacity rather than changes in care-seeking behavior. In the revised manuscript, we include a figure (Figure 1 - figure supplement 2) that shows systematic increases in test volume in all HHS regions after the 2009 pandemic.

      Given the substantial increase in influenza test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results for Spearman correlations and regression models, when adjusting for the pre- and post-2009 pandemic time periods and regional reporting versus only adjusting for the pre-/post-2009 pandemic time periods. Below, we share adjusted versions of Figure 3 (regression results) and Figure 3 - figure supplement 1 (Spearman correlations). Each figure only adjusts for differences in pre- and post-2009 pandemic reporting.

      Author response image 1.

      Adjustment for pre- and post-2009 pandemic only

      Author response image 2.

      Adjustment for pre- and post-2009 pandemic only

      L635 - Why discretize the continuous LBI distribution and then use Shannon entropy when you could just use the variance and/or higher moments? (or quantiles)? Similarly, why not use the duration of the peak, rather than Shannon entropy? (though there, because presumably data are already binned weekly, and using duration would involve defining start and stop times, it's more natural than with LBI)

      We realize that we failed to mention in the methods that we calculated the standard deviation of LBI in each season, in addition to the exponential of the Shannon entropy (Shannon diversity) of LBI. Both the Shannon diversity of LBI values and the standard deviation of LBI values were negatively correlated with effective Rt and epidemic intensity and positively correlated with seasonal duration. The two measures were similarly correlated with effective Rt and epidemic intensity (Figure 3 - figure supplements 2 - 3), while the Shannon diversity of LBI had slightly stronger correlations with seasonal duration than s.d. LBI (Figure 5). Thus, both measures of LBI diversity appear to capture potentially biologically important heterogeneities in clade growth rates.

      Separately, we use the inverse Shannon entropy of the incidence distribution to measure the spread of an A(H3N2) epidemic during the season, following the methods of Dalziel et al. 2018. The peak of an epidemic is a single time point at which the maximum incidence occurs. We have not encountered “the duration of the peak” before in epidemiology terminology, and, to our knowledge, there is not a robust way to measure the “duration of a peak,” unless one were to measure the time span between multiple points of maximum incidence or designate an arbitrary threshold for peak incidence that is not strictly the maximum incidence. Given that Shannon entropy is based on the normalized incidence distribution over the course of the entire influenza season (week 40 to week 20), it does not require designating an arbitrary threshold to describe epidemic intensity.

      L642 - again why normalize epidemic intensities, and how sensitive are the results to this? I would imagine given that the RF results were unstable under leave-one-out analysis that some of those results could be quite sensitive to choices of normalization and scaling.

      Epidemic intensity, defined as the inverse Shannon entropy of the incidence distribution, measures the spread of influenza cases across the weeks in a season. Following Dalziel et al. 2018, we estimated epidemic intensity from normalized incidence distributions rather than raw incidences so that epidemic intensity is invariant under differences in reporting rates and/or attack rates across regions and seasons. If we were to use raw incidences instead, HHS regions or seasons could have the appearance of greater or lower epidemic intensity (i.e., incidence concentrated within a few weeks or spread out over several weeks), due to differences in attack rates or test volume, rather than fundamental differences in the shapes of their epidemic curves. In other words, epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season.

      In the methods section, we provide further clarification for why epidemic intensities are based on normalized incidence distributions rather than raw incidences.

      Lines 206-209: “Epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season. Following the methodology of Dalziel et al. 2018, epidemic intensity values were normalized to fall between 0 and 1 so that epidemic intensity is invariant to differences in reporting rates and/or attack rates across regions and seasons.”  
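
      The invariance argument can be illustrated with a short Python sketch of the inverse Shannon entropy of a normalized weekly incidence distribution. The function names are hypothetical, and the min-max rescaling to the interval [0, 1] is an assumption of this sketch; the exact normalization follows Dalziel et al. 2018 and is not restated in this response.

```python
import math


def inverse_entropy(weekly_incidence):
    """Inverse Shannon entropy of the normalized incidence distribution.

    Incidence is first divided by its seasonal total, so the result
    depends on the shape of the epidemic curve, not the case volume."""
    total = sum(weekly_incidence)
    props = [x / total for x in weekly_incidence if x > 0]
    entropy = -sum(p * math.log(p) for p in props)
    if entropy == 0:
        raise ValueError("all incidence falls in one week; entropy is zero")
    return 1.0 / entropy


def min_max_scale(values):
    """Rescale values to [0, 1] (assumed normalization, for illustration)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

      Multiplying every weekly count by a constant, for example a tenfold higher reporting rate, leaves `inverse_entropy` unchanged, whereas concentrating the same cases into fewer weeks raises it.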

      L643 - more information about what goes into Epidemia (variables, priors) such that it's replicable/understandable without the code would be good. 

      We now include additional information concerning the epidemic models used to estimate Rt, including all model equations, variables, and priors (Lines 210-276 in Methods).

      L667 did you do breakpoint detection? Why linear models? Was log(incidence) used? 

      In our original submission, we estimated epidemic onsets using piecewise regression models (Lines 666-674 in original manuscript), which model non-linear relationships with breakpoints by iteratively fitting linear models (Muggeo, 2003). Piecewise regression falls under the umbrella of parametric methods for breakpoint detection.

      We did not include results from linear models fit to log(incidence) or GLMs with Gaussian error distributions and log links, for two reasons. First, models fit to log-transformed data require non-zero values as inputs. Although breakpoint detection does not necessarily require weeks of zero incidence leading up to the start of an outbreak, limiting the time period for breakpoint detection to weeks with non-zero incidence (so that we could use log-transformed incidence) substantially pushed back our previous, more biologically plausible estimates of epidemic onset weeks. Second, as an alternative to limiting the dataset to weeks with non-zero incidence, we tried adding a small positive number to weekly incidences so that we could fit models to log-transformed incidence for the whole time period spanning epidemic week 40 (the start of the influenza season) to the first week of maximum incidence. Fitting models to log-transformed incidences produced unrealistic breakpoint locations, potentially because log transformations 1) linearize data, and 2) stabilize variance by reducing the impact of extreme values. Due to the short time span used for breakpoint detection, log transforming incidence diminishes abrupt changes in incidence at the beginning of outbreaks, making it difficult for models to estimate biologically plausible breakpoint locations. Log transformations of incidence may be more useful when analyzing time series spanning multiple seasons, rather than short time spans with sharp changes in incidence (i.e., the exponential growth phase of a single flu outbreak).

      As an alternative to piecewise regression, our revised manuscript also estimates epidemic onsets using a Bayesian ensemble algorithm that accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (BEAST - a Bayesian estimator of Abrupt change, Seasonal change, and Trend; Zhao et al., 2019). Although a few regional onset times differed across the two methods, our conclusions did not change concerning correlations between viral fitness and epidemic onset timing.

      We have rewritten the methods section for estimating epidemic onsets to clarify our methodology and to include the BEAST method (Lines 292-308):

      “We estimated the regional onsets of A(H3N2) virus epidemics by detecting breakpoints in A(H3N2) incidence curves at the beginning of each season. The timing of the breakpoint in incidence represents epidemic establishment (i.e., sustained transmission) rather than the timing of influenza introduction or arrival (Charu et al., 2017). We used two methods to estimate epidemic onsets: 1) piecewise regression, which models non-linear relationships with break points by iteratively fitting linear models to each segment (segmented R package) (Muggeo, 2008; Muggeo, 2003), and 2) a Bayesian ensemble algorithm (BEAST – a Bayesian estimator of Abrupt change, Seasonal change, and Trend) that explicitly accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (Rbeast R package) (Zhao et al., 2019). For each region in each season, we limited the time period of breakpoint detection to epidemic week 40 to the first week of maximum incidence and did not estimate epidemic onsets for regions with insufficient signal, which we defined as fewer than three weeks of consecutive incidence and/or greater than 30% of weeks with missing data. We successfully estimated A(H3N2) onset timing for most seasons, except for three A(H1N1) dominant seasons: 2000-2001 (0 regions), 2002-2003 (3 regions), and 2009-2010 (0 regions). Estimates of epidemic onset weeks were similar when using piecewise regression versus the BEAST method, and downstream analyses of correlations between viral fitness indicators and onset timing produced equivalent results. We therefore report results from onsets estimated via piecewise regression.”
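
      The idea behind breakpoint-based onset detection can be shown with a deliberately simple Python sketch. This is a brute-force stand-in, not the iterative algorithm of the segmented R package and not the BEAST method: it tries every admissible split of the series, fits a separate least-squares line to each side, and returns the first week of the second segment for the split with the lowest total squared error. The function names and the `min_segment` parameter are hypothetical.

```python
def ols_sse(xs, ys):
    """Sum of squared residuals of a least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def find_onset(weeks, incidence, min_segment=3):
    """Return the first week of the second segment for the best two-line fit.

    weeks must be distinct and ordered; each segment keeps at least
    min_segment points so that both lines are well defined."""
    best_week, best_sse = None, float("inf")
    for k in range(min_segment, len(weeks) - min_segment + 1):
        sse = ols_sse(weeks[:k], incidence[:k]) + ols_sse(weeks[k:], incidence[k:])
        if sse < best_sse:
            best_week, best_sse = weeks[k], sse
    return best_week
```

      In practice the series passed in would run from epidemic week 40 to the first week of maximum incidence, matching the detection window described in the quoted passage.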

      L773 national indicators -- presumably this is because you don't have regional-level information, but it might be worth saying that earlier so it doesn't read like there are other indicators now, called national indicators, that we should have heard of 

      In the revised manuscript, we move a paragraph that was at the beginning of the Results to the beginning of the Methods.

      Lines 123-132: 

      “Our study focuses on the impact of A(H3N2) virus evolution on seasonal epidemics from seasons 1997-1998 to 2018-2019 in the U.S.; whenever possible, we make use of regionally disaggregated indicators and analyses. We start by identifying multiple indicators of influenza evolution each season based on changes in HA and NA. Next, we compile influenza virus subtype-specific incidence time series for U.S. Department of Health and Human Services (HHS) regions and estimate multiple indicators characterizing influenza A(H3N2) epidemic dynamics each season, including epidemic burden, severity, type/subtype dominance, timing, and the age distribution of cases. We then assess univariate relationships between national indicators of evolution and regional epidemic characteristics. Lastly, we use multivariable regression models and random forest models to measure the relative importance of viral evolution, heterosubtypic interference, and prior immunity in predicting regional A(H3N2) epidemic dynamics.”

      In Lines 484-487 in the Methods, we now mention that measures of seasonal antigenic and genetic distance are at the national level. 

      “For each replicate dataset, we estimated national-level genetic and antigenic distances between influenza viruses circulating in consecutive seasons by calculating the mean distance between viruses circulating in the current season 𝑡 and viruses circulating during the prior season (𝑡 – 1 year; one season lag) or two prior seasons ago (𝑡 – 2 years; two season lag).”
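
      The lagged between-season distance described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: Hamming distance on aligned toy strings stands in for the study's genetic and antigenic distance measures, and the function names and the dictionary keyed by season start year are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def seasonal_distance(seqs_by_season: Dict[int, List[str]],
                      season: int, lag: int = 1) -> Optional[float]:
    """Mean pairwise distance between season t and season t - lag.

    Returns None when either season has no sequences.
    """
    current = seqs_by_season.get(season)
    prior = seqs_by_season.get(season - lag)
    if not current or not prior:
        return None
    pairs = list(product(current, prior))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)
```

      With `seqs = {2016: ["AAAA", "AAAT"], 2017: ["AATT"], 2018: ["ATTT", "TTTT"]}`, the one-season lag for 2018 is `seasonal_distance(seqs, 2018, lag=1)` and the two-season lag is `seasonal_distance(seqs, 2018, lag=2)`.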

      L782 Why Beta regression and what is "the resampled dataset" ? 

      Beta regression is appropriate for models of subtype dominance, epidemic intensity, and age-specific proportions of ILI cases because these data are continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). “The resampled dataset” refers to the “1000 bootstrap replicates of the original dataset (1000 samples with replacement)” mentioned in Lines 777-778 of the original manuscript. 

      In the revised manuscript, we include more background information about Beta regression models, and explicitly mention that regression models were fit to 1000 bootstrap replicates of the original dataset.

      Lines 503-507: 

      “For subtype dominance, epidemic intensity, and age-specific proportions of ILI cases, we fit Beta regression models with logit links. Beta regression models are appropriate when the variable of interest is continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). For each epidemic metric, we fit the best-performing regression model to 1000 bootstrap replicates of the original dataset.”
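
      As a hedged illustration of the two ingredients named above, the sketch below writes out the Beta log-likelihood under a logit link (mean-precision parameterization, following Ferrari & Cribari-Neto, 2004) and draws 1000 bootstrap replicates by sampling rows with replacement. It does not fit the model; maximizing the log-likelihood for each replicate would be left to an optimizer. The single-predictor form, the function names, and the fixed seed are assumptions for illustration and are not taken from the authors' repository.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (predictor x, outcome y with 0 < y < 1)


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_loglik(rows: Sequence[Row], intercept: float, slope: float,
                precision: float) -> float:
    """Log-likelihood of y ~ Beta(mu * phi, (1 - mu) * phi), logit(mu) = b0 + b1 * x."""
    total = 0.0
    for x, y in rows:
        mu = inv_logit(intercept + slope * x)
        a, b = mu * precision, (1.0 - mu) * precision
        total += (math.lgamma(precision) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return total


def bootstrap_replicates(rows: Sequence[Row], n_replicates: int = 1000,
                         seed: int = 0) -> List[List[Row]]:
    """Resample rows with replacement; each replicate has the original size."""
    rng = random.Random(seed)
    return [[rng.choice(rows) for _ in rows] for _ in range(n_replicates)]
```

      A useful sanity check is that an intercept and slope of zero with a precision of 2 gives a uniform density on (0, 1), so the log-likelihood is zero for any data.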

      The github is clear, comprehensive and well-documented, at least at a brief glance. 

      Thank you! At the time of resubmission, our GitHub repository is updated to incorporate feedback from the reviewers.

      References

      Altmann, A., Tolosi, L., Sander, O., & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), 1340-1347. https://doi.org/10.1093/bioinformatics/btq134

      Barrat-Charlaix, P., Huddleston, J., Bedford, T., & Neher, R. A. (2021). Limited Predictability of Amino Acid Substitutions in Seasonal Influenza Viruses. Mol Biol Evol, 38(7), 2767-2777. https://doi.org/10.1093/molbev/msab065

      Bedford, T., Riley, S., Barr, I. G., Broor, S., Chadha, M., Cox, N. J., Daniels, R. S., Gunasekaran, C. P., Hurt, A. C., Kelso, A., Klimov, A., Lewis, N. S., Li, X., McCauley, J. W., Odagiri, T., Potdar, V., Rambaut, A., Shu, Y., Skepner, E., . . . Russell, C. A. (2015). Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature, 523(7559), 217-220. https://doi.org/10.1038/nature14460

      Chao, A., Gotelli, N. J., Hsieh, T. C., Sander, E. L., Ma, K. H., Colwell, R. K., & Ellison, A. M. (2014). Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84(1), 45-67. https://doi.org/10.1890/13-0133.1

      Charu, V., Zeger, S., Gog, J., Bjornstad, O. N., Kissler, S., Simonsen, L., Grenfell, B. T., & Viboud, C. (2017). Human mobility and the spatial transmission of influenza in the United States. PLoS Comput Biol, 13(2), e1005382. https://doi.org/10.1371/journal.pcbi.1005382

      Dalziel, B. D., Kissler, S., Gog, J. R., Viboud, C., Bjornstad, O. N., Metcalf, C. J. E., & Grenfell, B. T. (2018). Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities. Science, 362(6410), 75-79. https://doi.org/10.1126/science.aat6030

      Debeer, D., & Strobl, C. (2020). Conditional permutation importance revisited. BMC Bioinformatics, 21(1), 307. https://doi.org/10.1186/s12859-020-03622-2  

      Dhanasekaran, V., Sullivan, S., Edwards, K. M., Xie, R., Khvorov, A., Valkenburg, S. A., Cowling, B. J., & Barr, I. G. (2022). Human seasonal influenza under COVID-19 and the potential consequences of influenza lineage elimination. Nat Commun, 13(1), 1721. https://doi.org/10.1038/s41467-022-29402-5

      Ferrari, S., & Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501  

      Garten, R. J., Davis, C. T., Russell, C. A., Shu, B., Lindstrom, S., Balish, A., Sessions, W. M., Xu, X., Skepner, E., Deyde, V., Okomo-Adhiambo, M., Gubareva, L., Barnes, J., Smith, C. B., Emery, S. L., Hillman, M. J., Rivailler, P., Smagala, J., de Graaf, M., . . . Cox, N. J. (2009). Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science, 325(5937), 197-201. https://doi.org/10.1126/science.1176225

      Grebe, K. M., Yewdell, J. W., & Bennink, J. R. (2008). Heterosubtypic immunity to influenza A virus: where do we stand? Microbes Infect, 10(9), 1024-1029. https://doi.org/10.1016/j.micinf.2008.07.002

      Hill, M. O. (1973). Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology, 54(2), 427-432. https://doi.org/10.2307/1934352

      Huddleston, J., Barnes, J. R., Rowe, T., Xu, X., Kondor, R., Wentworth, D. E., Whittaker, L., Ermetal, B., Daniels, R. S., McCauley, J. W., Fujisaki, S., Nakamura, K., Kishida, N., Watanabe, S., Hasegawa, H., Barr, I., Subbarao, K., Barrat-Charlaix, P., Neher, R. A., & Bedford, T. (2020). Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. Elife, 9, e60067. https://doi.org/10.7554/eLife.60067

      Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). Springer.

      Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. Chapman and Hall/CRC. 

      Lee, E. C., Arab, A., Goldlust, S. M., Viboud, C., Grenfell, B. T., & Bansal, S. (2018). Deploying digital health data to optimize influenza surveillance at national and local scales. PLoS Comput Biol, 14(3), e1006020. https://doi.org/10.1371/journal.pcbi.1006020

      Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., & Suchard, M. A. (2014). Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog, 10(2), e1003932. https://doi.org/10.1371/journal.ppat.1003932

      Muggeo, V. (2008). Segmented: An R Package to Fit Regression Models With Broken-Line Relationships. R News, 8, 20-25. 

      Muggeo, V. M. (2003). Estimating regression models with unknown break-points. Stat Med, 22(19), 3055-3071. https://doi.org/10.1002/sim.1545

      Neher, R. A., Russell, C. A., & Shraiman, B. I. (2014). Predicting evolution from the shape of genealogical trees. Elife, 3, e03568. https://doi.org/10.7554/eLife.03568  

      Rambaut, A., Pybus, O. G., Nelson, M. I., Viboud, C., Taubenberger, J. K., & Holmes, E. C. (2008). The genomic and epidemiological dynamics of human influenza A virus. Nature, 453(7195), 615-619. https://doi.org/10.1038/nature06945

      Recker, M., Pybus, O. G., Nee, S., & Gupta, S. (2007). The generation of influenza outbreaks by a network of host immune responses against a limited set of antigenic types. Proceedings of the National Academy of Sciences, 104(18), 7711-7716. https://doi.org/10.1073/pnas.0702154104

      Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379-423. 

      Smith, G. J., Vijaykrishna, D., Bahl, J., Lycett, S. J., Worobey, M., Pybus, O. G., Ma, S. K., Cheung, C. L., Raghwani, J., Bhatt, S., Peiris, J. S., Guan, Y., & Rambaut, A. (2009). Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature, 459(7250), 1122-1125. https://doi.org/10.1038/nature08182  

      Sridhar, S. (2016). Heterosubtypic T-Cell Immunity to Influenza in Humans: Challenges for Universal T-Cell Influenza Vaccines. Front Immunol, 7, 195. https://doi.org/10.3389/fimmu.2016.00195

      Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9, 307. https://doi.org/10.1186/1471-2105-9-307  

      Strobl, C., Boulesteix, A. L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, 8, 25. https://doi.org/10.1186/1471-2105-8-25

      Terajima, M., Babon, J. A., Co, M. D., & Ennis, F. A. (2013). Cross-reactive human B cell and T cell epitopes between influenza A and B viruses. Virol J, 10, 244. https://doi.org/10.1186/1743-422x-10-244

      Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M., & Kawaoka, Y. (1992). Evolution and ecology of influenza A viruses. Microbiological Reviews, 56(1), 152-179. https://doi.org/10.1128/mr.56.1.152-179.1992

      Wen, F., Bedford, T., & Cobey, S. (2016). Explaining the geographical origins of seasonal influenza A (H3N2). Proc Biol Sci, 283(1838). https://doi.org/10.1098/rspb.2016.1312

      Yan, L., Neher, R. A., & Shraiman, B. I. (2019). Phylodynamic theory of persistence, extinction and speciation of rapidly adapting pathogens. Elife, 8. https://doi.org/10.7554/eLife.44205  

      Zhao, K., Wulder, M. A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X., & Brown, M. (2019). Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, 111181. https://doi.org/10.1016/j.rse.2019.04.034

      Zinder, D., Bedford, T., Gupta, S., & Pascual, M. (2013). The Roles of Competition and Mutation in Shaping Antigenic and Genetic Diversity in Influenza. PLOS Pathogens, 9(1). https://doi.org/10.1371/journal.ppat.1003104

    1. eLife assessment

      This work by Shin et al. demonstrated that a different form of PTH (R25C PTH) generated a comparable anabolic signal to rhPTH 1-34 using a large animal model. This valuable finding may have therapeutic potential in promoting bone formation or the healing process, and the methods seem solid, although there remains a concern regarding the small sample size and surgical procedure.

    2. Reviewer #1 (Public Review):

      Summary:

      This study, titled "Enhancing Bone Regeneration and Osseointegration using rhPTH(1-34) and Dimeric R25CPTH(1-34) in an Osteoporotic Beagle Model," provides valuable insights into the therapeutic effects of two parathyroid hormone (PTH) analogs on bone regeneration and osseointegration. The research is methodologically sound, employing a robust animal model and a comprehensive array of analytical techniques, including micro-CT, histological/histomorphometric analyses, and serum biochemical analysis.

      Strengths:

      The use of a large animal model, which closely mimics postmenopausal osteoporosis in humans, enhances the study's relevance to clinical applications. The study is well-structured, with clear objectives, detailed methods, and a logical flow from introduction to conclusion. The findings are significant, demonstrating the potential of rhPTH(1-34) and dimeric R25CPTH(1-34) in enhancing bone regeneration, particularly in the context of osteoporosis.

      Weaknesses: There are no major weaknesses.

    3. Reviewer #2 (Public Review):

      Summary:

      This article explores the regenerative effects of recombinant PTH analogues on osteogenesis.

      Strengths:

      Although PTH is known to induce osteoclast activity and thereby accelerate bone resorption, its intermittent use has paradoxically become a common treatment for osteoporosis. Previous studies successfully demonstrated this phenomenon in vivo, but most of them used rodent models, which is an inherent limitation. In this article, the authors address this limitation by using a beagle model to assess the osseointegrative effect of recombinant PTH analogues. The authors clearly observed the regenerative effects of the PTH analogues and compared their efficacy using histologic, biochemical, and radiologic measurements in a combined surgical-endocrine large animal model. The data seem to be solid and have potential clinical implications.

      Weaknesses:

      All the issues that I raised have been resolved in the revision process.

      Overall, this paper is well-written and has clarity and consistency for a broader readership.

    4. Reviewer #3 (Public Review):

      Summary:

      The work submitted by Dr. Jeong-Oh Shin and co-workers aims to investigate the therapeutic efficacy of rhPTH(1-34) and R25CPTH(1-34) on bone regeneration and osseointegration of titanium implants using a postmenopausal osteoporosis animal model.

      In my opinion, the findings presented are not strongly supported by the data provided, since the methods used do not adequately support the primary claims.

      Strengths:

      Strengths include certain good technologies utilized to perform histological sections (i.e. the EXAKT system).

      Weaknesses:

      Certain weaknesses significantly lower the enthusiasm for this work. Most important: the limited number of samples/group. In fact, as presented, the work has an n=4 for each treatment group. This limited number of samples/group significantly impairs the statistical power of the study. In addition, the implants were surgically inserted following a "conventional implant surgery", implying that no precise/guided insertion was utilized. This weakness is, in my opinion, particularly significant since the amount of bone osteointegration may greatly depend on the bucco-lingual positioning of each implant at the time of the surgical insertion (which should, therefore, be precisely standardized across all animals and for all surgical procedures).

      Comments on current version:

      As mentioned in my first review, this work is significantly underpowered for the following reasons: 1) n=4 for each treatment group; 2) no randomization of the surgical sites receiving treatments; 3) implants surgically inserted without precision/guided surgery. The authors have not addressed these concerns.

      On a minor note: not sure why the authors present a methodology to evaluate the dynamic bone formation (line 272) but do not present results (i.e. by means of histomorphometrical analyses) utilizing this methodology.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer 1

      (Cys25)PTH(1-84) does not show efficacy surpassing that of the previously used rhPTH(1-34). This needs to be discussed biologically and clinically.

      Thank you very much for your valuable comments, which have helped us improve the manuscript. We appreciate your input and acknowledge that this aspect was not addressed in the discussion. We have included the following paragraph in the Discussion section.

      “This biological difference is thought to be due to dimeric R25CPTH(1-34) exhibiting a more preferential binding affinity for the RG versus R0 PTH1R conformation, despite having a diminished affinity for either conformation. Additionally, the potency of cAMP production in cells was lower for dimeric R25CPTH compared to monomeric R25CPTH, consistent with its lower PTH1R-binding affinity (Noh et al., 2024). One of the potential clinical advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow for a more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and bone resorption by osteoclast activity. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial in the context of long-term treatment strategies (Noh et al., 2024). In addition, the effects of the dimer were prominent, as it produced better bone formation than the control group.” (2nd paragraph, Discussion section)

      The terms (Cys25)PTH(1-84) and Dimeric R25CPTH(1-34) are being used interchangeably and incorrectly. A unification of these terms is necessary.

      We fully agree with the reviewer. R25CPTH(1-84) denotes the mutated human PTH, whereas rhPTH(1-34) and dimeric R25CPTH(1-34) are synthesized PTH analogs. To clarify the terminology, we have revised it throughout the manuscript; the changes appear in red.

      The figure legend is incorrect. Not all figures are described, and even though there are figures from A to I, only up to E is explained, or the content is different.

      We apologize for our negligence. As the reviewer suggested, we have corrected the figure legends, which precede the list of references in the manuscript, as follows.

      “Figure legends

      Figure 1. Micro-CT analysis. (A-D) Experimental design for the controlled delivery of rhPTH(1-34) and dimeric R25CPTH(1-34) in the ovariectomized beagle model, with representative images of injection and placement of the titanium implant. (E) Micro-CT analysis: bone mineral density (BMD), bone volume (TV; mm³), trabecular number (Tb.N; 1/mm), trabecular thickness (Tb.Th; µm), and trabecular separation (Tb.Sp; µm). Error bars indicate standard deviation. Data are shown as mean ± s.d. *p<0.05, **p<0.01, ***p<0.001; n.s., not significant. P, posterior; R, right.

      Figure 2. (A-I) Histological analysis of the different groups stained with Goldner’s trichrome. Bone is stained green and soft tissue red. Red arrows indicate positions around the implant threads with soft tissue and no bone. The area of bone formed was the widest in the rhPTH(1-34)-treated group. In the dimeric R25CPTH(1-34)-treated group, there is a greater amount of bone than in the vehicle-treated group. Green arrows indicate the bone formed over the implant. Blue dotted line, margin of bone and soft tissue. Scale bars: 1 mm.

      Figure 3. Histological analysis using Masson trichrome staining in the rhPTH(1-34)- and dimeric R25CPTH(1-34)-treated groups. (A-L) Masson trichrome-stained sections of cancellous bone in the mandible. The formed bone is stained red and collagen is stained blue. Black dotted boxes mark the magnified regions of trabecular bone in the mandible. Scale bars: A-C, G-I, 1 mm; D-F, J-L, 200 µm.

      Figure 4. Immunohistochemical analysis using TRAP staining for bone remodeling activity. (A-L) TRAP staining is used to evaluate bone remodeling by staining osteoclasts. Osteoclasts are stained purple. Black dotted boxes mark the magnified regions of trabecular bone in the mandible. (M, N) The number of TRAP-positive cells in the mandible of the rhPTH(1-34)- and dimeric R25CPTH(1-34)-treated beagles. Scale bars: A-C, G-I, 1 mm; D-F, J-L, 200 µm. Error bars indicate standard deviation. Data are shown as mean ± s.d. *p<0.05, **p<0.01; n.s., not significant.

      Figure 5. Measurement of biochemical marker dynamics in serum. The serum levels of calcium, phosphorus, P1NP, and CTX across three time points (T0, T1, T2) following treatment with dimeric R25CPTH(1-34), rhPTH(1-34), or control. (A-B) Calcium and phosphorus levels exhibit an upward trend in response to both PTH treatments compared to control, suggesting enhanced bone mineralization. (C) P1NP levels, indicative of bone formation, remain relatively unchanged across time and treatments. (D) CTX levels, associated with bone resorption, show no significant differences between groups. Data points for the dimeric R25CPTH(1-34), rhPTH(1-34), and control are marked by squares, circles, and triangles, respectively, with error bars representing confidence intervals.

      Supplementary Figure. Three-dimensional reconstructed image of the bone surrounding the implants. Three-dimensional reconstructed images of the peri-implant bone depicting the osseointegration after different therapeutic interventions. (A) Represents the bone response to recombinant human parathyroid hormone fragment (rhPTH 1-34) treatment, showing the most robust degree of bone formation around the implant in the three groups. (B) Shows the bone response to a modified PTH fragment (dimeric R25CPTH(1-34)), indicating a similar level of bone growth and integration as seen with rhPTH(1-34), although to a slightly lesser extent. (C) Serves as the control group, demonstrating the least amount of bone formation and osseointegration. The upper panel provides a top view of the bone-implant interface, while the lower panel offers a cross-sectional view highlighting the extent of bony ingrowth and integration with the implant surface.”

      In Figure 5, although the descriptions of T0, T1, T2 are mentioned in the method section, it would be more clear if there was a timeline like in Figure 1.

      Based on the reviewer’s advice, we have indicated the timing of T0, T1, and T2 in the materials & methods section describing the serum biochemical assay, and we have shown a timeline in figure 5.

      In Figure 5, instead of having calcium, phosphorus, P1NP, CTX graphs all under Figure 5, it would be more convenient for referencing in the text to label them as Figure 5A, Figure 5B, Figure 5C, Figure 5D.

      We understand the reviewer’s comment. As the reviewer suggested, we have corrected the labeling of Figure 5 in the text as follows.

      “The levels of calcium, phosphorus, CTX, and P1NP were analyzed over time using RM-ANOVA (Figure 5). There were no significant differences between the groups for calcium and phosphorus at time points T0 and T1 (Figure 5A). However, after the PTH analog was administered at T2 (Figure 5A), the levels were highest in the rhPTH(1-34) group, followed by the dimeric R25CPTH(1-34) group, and lowest in the control group, a difference that was statistically significant (Figure 5B,C; P < 0.05). The differences between the groups over time for CTX and P1NP were not statistically significant (Figure 5D, E).”

      Significance should be indicated in the figure (no asterisk present).

      As the reviewer suggested, we have added asterisks to Figure 5 to indicate significance.

      Addition of Figures in Text:

      Line 112: change from "figure 2" to "figure 1" / Line 115: mention "figure 1. E"

      Line 120: refer to "figure 1. E" / Line 123: change from "figure 3" to "figure 2"

      Line 128: refer to "figure 2.A-C" / Line 137: mention "figure 3"

      Line 138: refer to "figure 3. A-L" / Line 143: mention "figure 3. A-L"

      Line 144: refer to "figure 3. E,F,K,L" / Line 148: mention "figure 4"

      Line 150: refer to "figure 4 M,N" / Line 152: mention "figure 4. M,N"

      Line 155: refer to "figure 5" / Line 157: mention "figure 5"

      Line 159: refer to "figure 5" / Line 171: mention "figure 1 E"

      Line 175: refer to "figure 2 M, N"/ Line 194: mention "figure 3"

      Thank you for these detailed suggestions. We have corrected the figure references in the text; the changes appear in red.

      Response to Reviewer 2

      First, the authors should clarify why they compared the effects of rhPTH(1-34) and of dimeric R25C2 PTH(1-34)? In most of the parameters, rhPTH(1-34) seems to be superior to dimeric R25C2 PTH(1-34). Why did the authors insist that the anabolic effects of dimer were prominent? Even though implication of dimeric R25C2 PTH(1-34) was drawn from genetic mutation studies, the authors should describe more clearly in the discussion the potential clinical benefits of the dimeric R25C2 PTH(1-34) compared to rhPTH(1-34), especially if dimeric R25C2 PTH(1-34) has just partial agonistic effect in pharmacodynamics.

      Thank you for your insightful comments and questions regarding our comparison of rhPTH(1-34) and dimeric R25CPTH(1-34). rhPTH(1-34) is a well-characterized therapy for osteoporosis. Although rhPTH(1-34) generally showed superior outcomes in most parameters tested in this study, the dimeric R25CPTH(1-34) exhibited specific anabolic effects that are not as pronounced with rhPTH(1-34). We recognized R25CPTH(1-34) as an anabolic effector. One of the potential advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow for a more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and bone resorption by osteoclast activity. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial in the context of long-term treatment strategies. Also, based on our results, we note that the effects of the dimer were prominent, as it produced better bone formation than the control group. We appreciate your input and acknowledge that this aspect was not addressed in the discussion. As a result, we have included the following paragraph in the Discussion section.

      “This biological difference is thought to be due to dimeric R25CPTH(1-34) exhibiting a more preferential binding affinity for the RG versus R0 PTH1R conformation, despite having a diminished affinity for either conformation. Additionally, the potency of cAMP production in cells was lower for dimeric R25CPTH compared to monomeric R25CPTH, consistent with its lower PTH1R-binding affinity (Noh et al., 2024). One of the potential clinical advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow for a more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and bone resorption by osteoclast activity. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial in the context of long-term treatment strategies (Noh et al., 2024). In addition, the effects of the dimer were prominent, as it produced better bone formation than the control group.” (2nd paragraph, Discussion section)

      Second, please describe the intermittent and continuous application of PTH analogues. Many of the readers may misunderstand that the authors' daily injection of PTHs were actually to mimic the clinical intermittent application or continuous one. Incorporation of the author's intention for experimental design would be more helpful for readers.

      Thank you for your insightful comments regarding the need for clearer differentiation between intermittent and continuous applications of PTH analogs in this study. We appreciate your concern that the readers may not fully grasp whether our daily injection protocol was intended to mimic clinical intermittent or continuous PTH administration. To address this, we have revised the manuscript to explicitly clarify that the daily injections of rhPTH(1-34) and dimeric R25CPTH(1-34) were designed to simulate the intermittent dosing regimen commonly used in clinical practice. This regimen is known to maximize the anabolic effects on bone while minimizing potential catabolic actions associated with more frequent or continuous hormone exposure. We have added detailed explanations in the Introduction, Methods, and Discussion sections to help readers understand our experimental design and its relevance to clinical settings.

      Introduction section

      “Administration of parathyroid hormone (PTH) analogs can be categorized into two distinct protocols: intermittent and continuous. Intermittent rhPTH(1-34) therapy, typically characterized by daily injections, is clinically used to enhance bone formation and strength. This method leverages the anabolic effects of rhPTH(1-34) without significant bone resorption, which can occur with more frequent or continuous exposure. On the other hand, continuous rhPTH(1-34) exposure, often modeled in research as constant infusion, tends to accelerate bone resorption activities, potentially leading to bone loss (Silva and Bilezikian, 2015; Jilka, 2007). Understanding these differences is crucial for interpreting the therapeutic implications of rhPTH(1-34) in bone health.”

      Silva, B. C., & Bilezikian, J. P. (2015). Parathyroid hormone: anabolic and catabolic actions on the skeleton. Current Opinion in Pharmacology, 22, 41-50.

      Jilka, R. L. (2007). Molecular and cellular mechanisms of the anabolic effect of intermittent PTH. Bone, 40(6), 1434-1446.

      Materials and Methods section

      “Each animal received one injection per day, aimed at replicating the intermittent rhPTH(1-34) exposure proven beneficial for bone regeneration and overall skeletal health in clinical settings (Neer et al., 2001; Kendler et al., 2018). This regimen was chosen to investigate the potential anabolic effects of these specific PTH analogs under conditions closely resembling therapeutic use.”

      Neer, R. M., Arnaud, C. D., Zanchetta, J. R., Prince, R., Gaich, G. A., Reginster, J. Y., Hodsman, A. B., Eriksen, E. F., Ish-Shalom, S., Genant, H. K., Wang, O., and Mitlak, B. H. (2001). Effect of Parathyroid Hormone (1-34) on Fractures and Bone Mineral Density in Postmenopausal Women with Osteoporosis. The New England Journal of Medicine, 344(19), 1434-1441.

      Kendler, D. L., Marin, F., Zerbini, C. A. F., Russo, L. A., Greenspan, S. L., Zikan, V., Bagur, A., Malouf-Sierra, J., Lakatos, P., Fahrleitner-Pammer, A., Lespessailles, E., Minisola, S., Body, J. J., Geusens, P., Moricke, R., & Lopez-Romero, P. (2018). Effects of Teriparatide and Risedronate on New Fractures in Post-Menopausal Women with Severe Osteoporosis (VERO): A Multicenter, Double-Blind, Double-Dummy, Randomized Controlled Trial. The Lancet, 391(10117), 230-240.

      Discussion section

      “The use of daily injections in this study was intended to simulate intermittent PTH therapy, a well-established clinical approach for managing osteoporosis and enhancing bone regeneration. Intermittent administration of PTH, as opposed to continuous exposure, is critical for maximizing the anabolic response while minimizing the catabolic effects that are associated with higher frequency or continuous hormone levels. Our findings support the notion that even with daily administration, both rhPTH(1-34) and dimeric R25CPTH(1-34) promote bone formation and osseointegration, consistent with the outcomes expected from intermittent therapy. It is important for future research to consider the dosage and timing of administration to further optimize the therapeutic benefits of PTH analogs (Dempster et al., 2001; Hodsman et al., 2005).”

      Dempster, D. W., Cosman, F., Kurland, E. S., Zhou, H., Nieves, J., Woelfert, L., Shane, E., Plavetic, K., Müller, R., Bilezikian, J., & Lindsay, R. (2001). Effects of Daily Treatment with Parathyroid Hormone on Bone Microarchitecture and Turnover in Patients with Osteoporosis: A Paired Biopsy Study. Journal of Bone and Mineral Research, 16(10), 1846-1853.

      Hodsman, A. B., Bauer, D. C., Dempster, D. W., Dian, L., Hanley, D. A., Harris, S. T., Kendler, D. L., McClung, M. R., Miller, P. D., Olszynski, W. P., Orwoll, E., Yuen, C. K. (2005). Parathyroid Hormone and Teriparatide for the Treatment of Osteoporosis: A Review of the Evidence and Suggested Guidelines for Its Use. Endocrine Reviews, 26(5), 688-703.

      Third, please unify the nomenclature. Ensure consistency in the nomenclature throughout the article. Unify the naming conventions for PTH analogues, such as rhPTH(1-34) vs teriparatide and (Cys25)PTH(1-84) vs R25CPTH(1-34) vs R25CPTH(1-34) vs (1-84). Choose one nomenclature for each analogue and use it consistently throughout the article.

      We fully agree with the reviewer. R25CPTH(1-84) denotes the mutated human PTH, whereas rhPTH(1-34) and dimeric R25CPTH(1-34) are synthesized PTH analogs. To clarify the terminology, we have revised it throughout the manuscript; the changes appear in red.

      Response to Reviewer 3

      I would recommend rewriting the manuscript in a form that is more understandable to readers. In fact, it appears to me that this work was originally formatted in a way that would need the Materials and Methods to precede the Results. As presented (and as requested by the eLife formatting), the Materials and Methods are available only at the end, and, as a consequence, readers need to refer to the Materials and Methods to gain a general, initial understanding of the study design (i.e., the type of treatment for each group, etc., is not well specified in the Results section).

      Thank you for your constructive comments and suggestions regarding the manuscript. We appreciate your feedback on the overall organization of the manuscript. As the reviewer mentioned, the Materials and Methods were placed after the Discussion section in accordance with the format of eLife. To give readers a better initial understanding, a description of each experimental group has been added to the Results section as follows. Thank you again for your valuable comments.

      “This study evaluated and compared the efficacy of rhPTH(1-34) and the dimeric R25CPTH(1-34) in promoting bone regeneration and healing in a clinically relevant animal model. Beagle dogs were selected as the model due to their anatomical similarity to human oral structures, suitable size for surgeries, human-like bone turnover rates, and established oral health profiles, ensuring comparable and ethically sound research outcomes. The animals were divided into three groups: a control group injected with normal saline, a group injected with 40 μg/day PTH (Forsteo, Eli Lilly), and a group injected with 40 μg/day of the PTH analog. Animals in each group were injected subcutaneously for 10 weeks.”

    1. eLife assessment

      This valuable paper investigates how fish avoid thermal disturbances that occur on fast timescales. The authors use a creative experimental approach that quickly creates a vertical thermal interface, which they combine with careful behavioral analyses. The evidence supporting their results is solid, but there is a potential confounding factor between temperature and vertical positioning, and characterization of the thermal interface would greatly assist in interpreting the results.

    2. Reviewer #1 (Public Review):

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult.

    3. Reviewer #2 (Public Review):

      This paper investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies for warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While I found the paper interesting and convincing overall, there are a few omissions and choices in the presentation that limit interpretability and clarity.

      A main question concerns the thermal interface itself. The authors track this interface using a blue dye that is mixed in with either colder or warmer water before a gate is opened that leads to gravitational flow overlaying the two water temperatures. The dye likely allows the identification of convective currents, which could lead to rapid mixing of water temperatures. However, it is less clear whether it accurately reflects thermal diffusion. This is problematic as the authors identify upward turning behavior around the interface which appears to be the behavioral strategy for avoiding cold water but not warm water. Without knowing the extent of the gradient across the interface, it is hard to know what the fish are sensing. The authors appear to treat the interface as essentially static, leading them to the conclusion that turning away before the interface is reached is likely related to associative learning. However, thermal diffusion could very likely create a gradient across centimeters which is used as a cue by the fish to initiate the turn. In an ideal world, the authors would use a thermal camera to track the relationship between temperature and the dye interface. Absent that, the simulation that is mentioned in passing in the methods section should be discussed in detail in the main text, and results should be displayed in Figure 1. Error metrics on the parameters used in the simulation could then be used to identify turns in subsequent figures that likely are or aren't affected by a gradient formed across the interface.
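
      As a rough illustration of the scale at issue (this is not the authors' simulation, and none of the numbers come from the manuscript), the sketch below treats the interface as an initially sharp step that spreads by pure conduction, ignoring convection and mixing. The thermal diffusivity of water is taken as approximately 1.4e-7 m^2/s, and the layer temperatures and elapsed times are hypothetical placeholders. Under these assumptions the temperature follows an error-function profile, and the layer over which the temperature rises from 10% to 90% of the step grows with the square root of time.

```python
import math

# Approximate thermal diffusivity of water in m^2/s (assumed constant here).
ALPHA_WATER = 1.4e-7


def step_profile(z_m, t_s, t_cold, t_warm, alpha=ALPHA_WATER):
    """Temperature at height z_m (metres; 0 = initial interface, warm water above)
    after t_s seconds of pure conduction from an initially sharp step."""
    mid = 0.5 * (t_cold + t_warm)
    half = 0.5 * (t_warm - t_cold)
    return mid + half * math.erf(z_m / (2.0 * math.sqrt(alpha * t_s)))


def width_10_90(t_s, alpha=ALPHA_WATER):
    """Thickness (metres) of the layer spanning 10% to 90% of the temperature step."""
    # Solve erf(u) = 0.8 by bisection rather than hard-coding the constant.
    lo, hi = 0.0, 3.0
    for _ in range(60):
        m = 0.5 * (lo + hi)
        if math.erf(m) < 0.8:
            lo = m
        else:
            hi = m
    u = 0.5 * (lo + hi)
    # z = 2 * u * sqrt(alpha * t) on each side of the interface.
    return 4.0 * u * math.sqrt(alpha * t_s)


if __name__ == "__main__":
    # Hypothetical elapsed times since the gate was opened.
    for minutes in (1, 5, 10, 30):
        w_cm = 100.0 * width_10_90(60.0 * minutes)
        print(f"{minutes:>3} min: 10-90% gradient thickness ~ {w_cm:.1f} cm")
    # Hypothetical 8 C / 12 C layers, sampled 1 cm above the interface after 10 min.
    print(step_profile(0.01, 600.0, 8.0, 12.0))
```

      Even in this idealized conduction-only case, the gradient would be a few centimeters thick after about ten minutes, which is why a measured or simulated temperature profile matters for judging what cue is available to the fish before it reaches the dye boundary.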

      The authors assume that the thermal interface triggers the upward-turning behavior. However, an alternative explanation, which should be discussed, is that cold water increases the tendency for upward turns. This could be an adaptive strategy since, for temperatures > 4 °C, turning to swim upwards is likely a good strategy to reach warmer water.

      The paper currently also suffers from a lack of clarity which is largely created by figure organization. Four main and 38 supplemental figures are very unusual. I give some specific recommendations below but the authors should decide which data is truly supplemental, versus supporting important points made in the paper itself. There also appear to be supplemental figures that are never referenced in the text which makes traversing the supplements unnecessarily tedious.

      The N that was used as the basis for statistical tests and plots should be identified in the figures to improve interpretability. To improve rigor, the experimental procedures should be expanded. Specifically, the paper uses two thermal models which are not detailed at all in the methods section.

    4. Reviewer #3 (Public Review):

      In this study, the authors measured the behavioural responses of brown trout to the sudden availability of a choice between thermal environments. The data clearly show that these fish avoid colder temperatures than the acclimation condition, but generally have no preference between the acclimation condition or warmer water (though I think the speculation that the fish are slowly warming up is interesting). Further, the evidence is compelling that avoidance of cold water is a combination of thermotaxis and thermokinesis. This is a clever experimental approach and the results are novel, interesting, and have clear biological implications as the authors discuss. I also commend the team for an extremely robust, transparent, and clear explanation of the experimental design and analytical decisions. The supplemental material is very helpful for understanding many of the methodological nuances, though I admit that I found it overwhelming at times and wonder if it could be pruned slightly to increase readability. Overall, I think the conclusions are generally well-supported by the data, and I have no major concerns.

    1. eLife assessment

      This study describes the development and validation of an Automated Reproducible Mechano-stimulator (ARM), a valuable tool for standardizing and automating somatosensory behavior experiments. The data supporting the use of the ARM system are compelling, though the determination of whether that device emits any sounds, including in the ultrasonic range when in operation or when at rest, would add value to the study. Nevertheless, the ARM system is anticipated to be popular amongst somatosensory and pain researchers.

    2. Reviewer #1 (Public Review):

      Allodynia is commonly measured in the pain field using von Frey filaments, which are applied to a body region (usually hindpaw if studying rodents) by a human. While humans perceive themselves as being objective, as the authors noted, humans are far from consistent when applying these filaments. Not to mention, odors from humans, including those of different sexes, can influence animal behavior. There is thus a major unmet need for a way to automate this tedious von Frey testing process and to remove humans from the experiment. I have no major scientific concerns with the study, as the authors did an outstanding job of comparing this automated system to human experimenters in a rigorous and quantitative manner. They even demonstrated that their automated system can be used in conjunction with in vivo imaging techniques.

      While it is somewhat unclear how easy and inexpensive this device will be, I anticipate everyone in the pain field will be clamoring to get their hands on a system like this. And given the mechanical nature of the device and the propensity for mice to urinate on things, I also wonder how frequently the device breaks/needs to be repaired. Perhaps some details regarding the cost and reliability of the device would be helpful to include, as these are the two things that could make researchers hesitant to adopt immediately.

      The only major technical concern, which is easy to address, is whether the device generates ultrasonic sounds that rodents can hear when idle or operational, across the ultrasonic frequencies that are of biological relevance (20-110 kHz). These sounds are generally alarm vocalizations and can create stress in animals, and/or serve as cues of an impending stimulus (if indeed they are produced by the device).

    3. Reviewer #2 (Public Review):

      Summary:

      Burdge, Juhmka, et al describe the development and validation of a new automated system for applying plantar stimuli in rodent somatosensory behavior tasks. This platform allows the users to run behavior experiments remotely, removing experimenter effects on animals and reducing variability in the manual application of stimuli. The system integrates well with other automated analysis programs that the lab has developed, providing a complete package for standardizing behavior data collection and analysis. The authors present extensive validations of the system against manual stimulus application. Some proof of concept studies also show how the system can be used to better understand the effect of experimenters on behavior and the effects of how stimuli are presented on the micro features of the animal withdrawal response.

      Strengths:

      If widely adopted, ARM has the potential to reduce variability in plantar behavior studies across and within labs and provide a means to standardize results. The system is well validated, and the results are clearly and convincingly presented. Most claims are well supported by experimental evidence.

      Weaknesses:

      ARM seems like a fantastic system that could be widely adopted, but no details are given on how a lab could build ARM, thus its usefulness is limited.

      The ARM system appears to stop short of hitting the desired forces that von Frey filaments are calibrated toward (Figure 2). This may affect the interpretation of results.

      The authors mention that ARM generates minimal noise; however, if those sounds are paired with stimulus presentation they could still prompt a withdrawal response. Including some 'catch' trials in an experiment could test for this.

      The experimental design in Figure 2 is unclear: did each experimenter have their own cohort of 10 mice, or was a single cohort of mice shared? If shared, there's some concern about repeat testing.

    4. Reviewer #3 (Public Review):

      Summary:

      This report describes the development and initial applications of the ARM (Automated Reproducible Mechano-stimulator), a programmable tool that delivers various mechanical stimuli to a select target (most frequently, a rodent hindpaw). Comparisons to traditional testing methods (e.g., experimenter application of stimuli) reveal that the ARM reduces variability in the anatomical targeting, height, velocity, and total time of stimulus application. Given that the ARM can be controlled remotely, this device was also used to assess the effect of the experimenter's presence on reflexive responses to mechanical stimulation. Lastly, the ARM was used to stimulate rodent hind paws while measuring neuronal activity in the basolateral nucleus of the amygdala (BLA), a brain region that is associated with the negative effect of pain. This device, and similar automated devices, will undoubtedly reduce experimenter-related variability in reflexive mechanical behavior tests; this may increase experimental reproducibility between laboratories.

      Strengths:

      Clear examples of variability in experimenter stimulus application are provided and then contrasted with uniform stimulus application that is inherent to the ARM.

      Weaknesses:

      Limited details are provided for statistical tests and inappropriate claims are cited for individual tests. For example, in Figure 2, differences between researchers at specific forces are reported to be supported by a 2-way ANOVA; these differences should be derived from a post-hoc test that was completed only if the independent variable effects (or interaction effect) were found to be significant in the 2-way ANOVA. In other instances, statistical test details are not provided at all (e.g., Figures 3B, 3C, Figure 4, Figure 6G).

      One of the arguments for using the ARM is that it will minimize the effect that the experimenter's presence may have on animal behavior. In the current manuscript, the effects of the experimenter's presence on both habituation time and aspects of the withdrawal reflex are minimal for Researcher 2 and non-existent for Researcher 1. This is surprising given that Researcher 2 is female; the effect of experimenter presence was previously documented for male experimenters as the authors appropriately point out (Sorge et al. PMID: 24776635). In general, this argument could be strengthened (or perhaps negated) if more than N=2 experimenters were included in this assessment.

      The in vivo BLA calcium imaging data feel out of place in this manuscript. Is the point of Figure 6 to illustrate how the ARM can be coupled to Inscopix (or other external inputs) software? If yes, the following should be addressed: why do the up-regulated and down-regulated cell activities start increasing/decreasing before the "event" (i.e., stimulus application) in Figure 6F? Why are the paw withdrawal latencies and paw distance travelled values in Figures 6I and 6J, respectively, so much faster/shorter than those illustrated in Figure 5, where the same approach was used?

      Another advance of this manuscript is the integration of a 500 fps camera (as opposed to a 2000 fps camera) in the PAWS platform. To convince readers that the use of this more accessible camera yields similar data, a comparison of the results for cotton swabs and pinprick should be completed between the 500 fps and 2000 fps cameras. In other words, repeat Supplementary Figure 3 with the 2000 fps camera and compare those results to the data currently illustrated in this figure.

    1. eLife assessment

      This study makes a valuable contribution to understanding antiviral responses in fish by revealing a role for the cell cycle protein kinase CDK2 in type I interferon signaling. The evidence supporting the authors' claims is solid, including both in vivo and in vitro investigative approaches. However, the mechanisms underlying CDK2 activity are not completely established. This work will be of interest to cell biologists, immunologists, and virologists.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors set out to evaluate the regulation of interferon (IFN) gene expression in fish, using mainly zebrafish as a model system. Similar to more widely characterized mammalian systems, fish IFN is induced during viral infection through the action of the transcription factor IRF3 which is activated by phosphorylation by the kinase TBK1. It has been previously shown in many systems that TBK1 is subjected to both positive and negative regulation to control IFN production. In this work, the authors find that the cell cycle kinase CDK2 functions as a TBK1 inhibitor by decreasing its abundance through the recruitment of the ubiquitinylation ligase, Dtx4, which has been similarly implicated in the regulation of mammalian TBK1. Experimental data are presented showing that CDK2 interacts with both TBK1 and Dtx4, leading to TBK1 K48 ubiquitinylation on K567 and its subsequent degradation by the proteasome.

      Strengths:

      The strengths of this manuscript are its novel demonstration of the involvement of CDK2 in a process in fish that is controlled by different factors in other vertebrates and its clear and supportive experimental data.

      Weaknesses:

      The weaknesses of the study include the following. 1) It remains unclear whether the function described for CDK2 is regulatory, that is, it affects TBK1 levels during physiological responses such as viral infection or cell cycle progression, or if it is homeostatic, governing the basal abundance of TBK1 but not responding to signaling. 2) The authors have not explored whether the catalytic activity of CDK2 is required for TBK1 ubiquitinylation and, if so, what its target specificity is. 3) Given the multitude of CDK isoforms in fish, it remains unexplored whether the identified fish CDK2 homolog is a requisite cell cycle regulator or if its action in the cell cycle is redundant with other CDKs.