  1. Apr 2025
    1. eLife Assessment

      This valuable study provides a comprehensive description of the Nematostella vectensis matrisome - the genes encoding the proteins of the extracellular matrix. The authors combine new mass spectrometry data with bioinformatic analyses of previously published genomic and single-cell RNAseq data. Although this work will be of interest to biologists working on the evolution of the matrisome, as well as more broadly those working with non-bilaterian animals, in its current state it is incomplete due to the lack of rigorous criteria for manual curation and comprehensive annotation of the predicted matrisome.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript entitled "Molecular dynamics of the matrisome across sea anemone life history", Bergheim and colleagues report the prediction, using an established sequence analysis pipeline, of the "matrisome" - that is, the compendium of genes encoding constituents of the extracellular matrix - of the starlet sea anemone Nematostella vectensis. Re-analysis of an existing scRNA-Seq dataset allowed the authors to identify the cell types expressing matrisome components across different developmental stages. Last, the authors apply time-resolved proteomics to provide experimental evidence of the presence of the extracellular matrix proteins at three different stages of the life cycle of the sea anemone (larva, primary polyp, adult) and show that different subsets of matrisome components are present in the ECM at different life stages with, for example, basement membrane components accompanying the transition from larva to primary polyp and elastic fiber components and matricellular proteins accompanying the transition from primary polyp to the adult stage.

      Strengths:

      The ECM is a structure that has evolved to support the emergence of multicellularity and different transitions that have accompanied the complexification of multicellular organisms. Understanding the molecular makeup of structures that are conserved throughout evolution is thus of paramount importance.

      The in-silico predicted matrisome of the sea anemone has the potential to become an essential resource for the scientific community to support big data annotation efforts and to improve our understanding of the evolution of the matrisome and of ECM proteins, an important endeavor for understanding structure/function relationships. This study is also an excellent example of how integrating datasets generated using different -omic modalities can shed light on various aspects of ECM metabolism, from identifying the cell types of origin of matrisome components using scRNA-Seq to studying ECM dynamics using proteomics.

      Weaknesses:

      My concerns pertain to the three following areas of the manuscript:

      (1) In-silico definition of the anemone matrisome using sequence analysis:

      a) While a similar computational pipeline has been applied to predict the matrisome of several model organisms, the authors fail to provide a comprehensive definition of the anemone matrisome. In the text, the authors state that the anemone matrisome is composed of "551 proteins, constituting approximately 3% of its proteome" (see page 6, line 14), but Figure 1 lists 829 entries as part of the "curated" matrisome, Supplementary Table S1 lists the same 829 entries, and the authors state that "Here, we identified 829 ECM proteins that comprise the matrisome of the sea anemone Nematostella vectensis" (see page 17, line 10). Is the sea anemone matrisome composed of 551 or 829 genes? If we refer to the text, the additional 278 entries should not be considered part of the matrisome. Confusingly, however, some of them are listed as glycoproteins, and the "new_manual_annotation" entries proposed by the authors, which refer to the protein domains found in these additional proteins, suggest that some could or should in fact be classified as matrisome proteins. For example, shouldn't the two lectins encoded by NV2.3951 and NV2.3157 be classified as matrisome-affiliated proteins? In the matrisomes defined for other model organisms, receptors have typically been excluded from the "matrisome" but included in the "adhesome". For consistency with previously published matrisomes, the reviewer is left wondering whether the components classified as "Other" / "Receptor" should be excluded from the matrisome and moved to a separate "adhesome" list.

      In addition to receptors, the authors identify nearly 70 glycoproteins classified as "Other". Here, does "Other" mean "non-matrisome" or "another matrisome division" that is neither core nor associated? If the latter, could the authors try to propose a unifying term for these proteins? Unfortunately, since the authors do not provide the reasons for excluding these entries from the bona fide matrisome (list of excluding domains present, localization data), the reader is left wondering how to treat these entries.

      Overall, the study would gain in strength if the authors could be more definitive and, if needed, even propose novel additional matrisome annotations to include the components for now listed as "Other" (as was done, for example, for the Drosophila or C. elegans matrisomes).

      b) It is surprising that the authors do not provide the full, currently accepted protein names for the entries listed in Supplementary Table S1 and have instead used a "new_manual_annotation" that resembles formal protein names. This liberty is misleading. In fact, the "new_manual_annotation" seems biased toward describing the reason the proteins were positively screened through sequence analysis, but many of these annotations are misleading because more is known about the proteins, including evidence that they are not ECM proteins. The authors should at least provide the current protein names in addition to their "new_manual_annotations".

      c) To truly serve as a resource, the Table should provide links to each gene entry in the Stowers Institute for Medical Research genome database used and some sort of versioning (this could be added to columns A, B, or D). Such enhancements would facilitate the assessment of the rigor of the list beyond the manual QC of just a few entries.

      d) Since UniProt is the reference protein knowledge database, providing the UniProt IDs associated with the predicted matrisome entries would also be helpful, giving easy access to information on protein domains, protein structures, orthology information, etc.

      e) In conclusion, at present the study only provides a preliminary draft that should be more rigorously curated and enriched with more comprehensive and authoritative annotations if the authors want the list to become the reference anemone matrisome and serve the community.

      (2) Proteomic analysis of the composition of the mesoglea during the sea anemone life cycle:

      a) The product of 287 of the 829 genes proposed to encode matrisome components was detected by proteomics. What about the other ~550 matrisome genes? When and where are they expressed? The wording employed by the authors (see line 11, page 13) implies that only these 287 components are "validated" matrisome components. Is that to say that the other ~550 predicted genes do not encode components of the ECM? This should be discussed.

      b) Can the authors comment on how they have treated zero TMT values, or proteins for which a TMT ratio could not be calculated because they were unique to one life stage, for example?
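      One way to make the treatment of zero values explicit is sketched below. It assumes per-protein reporter intensities for two life stages and chooses to flag stage-unique proteins rather than impute a value; the function name `stage_log2_ratio` and the status labels are hypothetical and do not describe the authors' processing.

```python
import math
from typing import Optional, Tuple


def stage_log2_ratio(
    intensity_a: Optional[float], intensity_b: Optional[float]
) -> Tuple[Optional[float], str]:
    """Return (log2 of a/b, status) for one protein across two life stages.

    Zero or missing intensities never produce an infinite ratio; instead the
    protein is flagged so that stage-unique proteins can be reported separately.
    """
    a_present = intensity_a is not None and intensity_a > 0
    b_present = intensity_b is not None and intensity_b > 0
    if a_present and b_present:
        return math.log2(intensity_a / intensity_b), "quantified"
    if a_present:
        return None, "unique_to_a"
    if b_present:
        return None, "unique_to_b"
    return None, "not_detected"


# Hypothetical reporter intensities for one protein in larva (a) and polyp (b).
print(stage_log2_ratio(200.0, 50.0))  # (2.0, 'quantified')
print(stage_log2_ratio(80.0, 0.0))    # (None, 'unique_to_a')
```

      Flagging keeps stage-unique proteins visible instead of silently dropping them or producing infinite ratios; imputing a minimum value would be an alternative design choice, which would then need to be stated in the methods.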

      c) Could the authors provide a plot showing the distribution of protein abundances for each matrisome category in main Figure 4? In mammals, the bulk of the ECM is composed of collagens, followed by fibrillar ECM glycoproteins, with the other matrisome components being more minor. Is a similar distribution observed in the sea anemone mesoglea?
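      The per-category summary requested here amounts to grouping abundances by matrisome category and normalizing. A minimal sketch, assuming a table of (category, abundance) pairs where abundance is some non-negative measure such as summed reporter intensity; the function name and the toy categories and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def abundance_share_by_category(
    proteins: Iterable[Tuple[str, float]]
) -> Dict[str, float]:
    """Sum non-negative abundances per matrisome category and return each category's share."""
    totals: Dict[str, float] = defaultdict(float)
    for category, abundance in proteins:
        totals[category] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {category: 0.0 for category in totals}
    return {category: value / grand_total for category, value in totals.items()}


# Toy input only: (category, abundance) pairs, not data from the study.
toy = [("Collagens", 6.0), ("ECM glycoproteins", 3.0), ("Collagens", 1.0)]
print(abundance_share_by_category(toy))
```

      Plotting the per-protein values within each category, rather than only these shares, would also show whether a few highly abundant proteins dominate a category.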

      d) Prior proteomic studies on the ECM of vertebrate organisms have shown the importance of allowing certain post-translational modifications (PTMs) during database searches to maximize peptide-to-spectrum matching. These PTMs include the hydroxylation of lysines and prolines, which are collagen-specific PTMs. Multiple reports have shown that omitting these PTMs when analyzing LC-MS/MS data leads to underestimating the abundance of collagens and to the misidentification of certain collagens. The authors may want to re-analyze their dataset with these PTMs included in their search criteria to ensure that all collagen-derived peptides are captured.
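      The effect of omitting these modifications can be seen from precursor masses alone: each hydroxylation adds one oxygen (+15.9949 Da), so a hydroxylated peptide no longer matches the unmodified mass a search engine expects. The sketch below uses standard monoisotopic residue masses for G, P, and K only; the peptide GPPGPK is a made-up collagen-like example, and fragment-ion scoring and search-engine settings are out of scope.

```python
from typing import List

# Standard monoisotopic residue masses (Da), limited to the residues used here.
RESIDUE_MASS = {"G": 57.021464, "P": 97.052764, "K": 128.094963}
WATER = 18.010565          # added once per peptide for the free termini
HYDROXYLATION = 15.994915  # one oxygen per hydroxyproline or hydroxylysine


def peptide_mass(sequence: str, n_hydroxylations: int = 0) -> float:
    """Neutral monoisotopic mass of a peptide carrying n hydroxylated P/K residues."""
    sites = sum(1 for residue in sequence if residue in "PK")
    if not 0 <= n_hydroxylations <= sites:
        raise ValueError("n_hydroxylations must be between 0 and the number of P/K sites")
    base = sum(RESIDUE_MASS[residue] for residue in sequence) + WATER
    return base + n_hydroxylations * HYDROXYLATION


def candidate_masses(sequence: str) -> List[float]:
    """Precursor masses a search would have to consider with variable P/K hydroxylation."""
    sites = sum(1 for residue in sequence if residue in "PK")
    return [peptide_mass(sequence, n) for n in range(sites + 1)]


for n, mass in enumerate(candidate_masses("GPPGPK")):
    print(f"{n} hydroxylation(s): {mass:.4f} Da")
```

      A search that allows only the unmodified form considers just the first of these masses, so spectra from the hydroxylated forms go unmatched.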

      e) The authors should ensure that reviewers are provided with access to the private PRIDE repository so that the deposited data can also be evaluated. They should also ensure that sufficient metadata is provided in the SDRF format to allow the re-use of their LC-MS/MS datasets.

      (3) Supplementary tables:

      The supplementary tables are very difficult to navigate. They would become more accessible to readers and non-specialists if they were accompanied by brief legends or "README" tabs and if the headers were more detailed (see, for example, Table S2, what does "ctrl.ratio_Larvae_rep2" exactly refer to? Or Table S6 whose column headers using extensive abbreviations are quite obscure). Similarly, what do columns K to BX in Supplementary Table S1 correspond to? Without more substantial explanations, readers have no way of assessing these data points.

    3. Reviewer #2 (Public review):

      This work set out to identify all extracellular matrix proteins and associated factors present within the starlet sea anemone Nematostella vectensis at different life stages. Combining existing genomic and transcriptomic datasets, alongside new mass spectrometry data, the authors provide a comprehensive description of the Nematostella matrisome. In addition, immunohistochemistry and electron microscopy were used to image whole mount and de-cellularized mesoglea from all life stages. This served to validate the de-cellularization methods used for proteomic analyses, but also resulted in a very nice description of mesoglea structure at different life stages. A previously published developmental cell type atlas was used to identify the cell type specificity of the matrisome, indicating that the core matrisome is predominantly expressed in the gastrodermis, as well as cnidocytes. The analyses performed were rigorous and the results were clear, supporting the conclusions made by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Bergheim et al. investigates the molecular and developmental dynamics of the matrisome, a set of gene products that comprise the extracellular matrix, in the sea anemone Nematostella vectensis using transcriptomic and proteomic approaches. Previous work has examined the matrisome of the hydra, a medusozoan, but this is the first study to characterize the matrisome in an anthozoan. The major finding of this work is a description of the components of the matrisome in Nematostella, which turns out to be more complex than that previously observed in hydra. The authors also describe remodeling of the extracellular matrix that occurs in the transition from larva to primary polyp, and from primary polyp to adult. The authors interpret these data to support the previously proposed homology (Steinmetz et al. 2017) between the cnidarian endoderm and the bilaterian mesoderm.

      Strengths:

      The data described in this work are comprehensive (but see important considerations of reviewer #1) combining both transcriptome and proteomic interrogation of key stages in the life history of Nematostella and are of value to the community.

      Weaknesses:

      The authors offer numerous evolutionary interpretations of their results that I believe are unfounded. The main problem with extending these results, together with previous results from hydra, into an evolutionary synthesis that aims to reconstruct the matrisome of the ancestral cnidarian is that we are considering data from only two species. I agree with the authors' depiction of hydra as "derived" relative to other medusozoans and see it as potentially misleading to consider the hydra matrisome as an exemplar for the medusozoan matrisome. Given the organismal and morphological diversity of the phylum, a more thorough comparative study that compares matrisome components across a selection of anthozoan and medusozoan species using formal comparative methods to examine hypotheses is required.

      Specifically, I question the authors' interpretation of the evolutionary events depicted in this statement:

      "The observation that in Hydra both germ layers contribute to the synthesis of core matrisome proteins (Epp et al. 1986; Zhang et al. 2007) might be related to a secondary loss of the anthozoan-specific mesenteries, which represent extensions of the mesoglea into the body cavity sandwiched by two endodermal layers."

      Anthozoans and medusozoans are evolutionary sisters. Therefore, secondary loss of "anthozoan-like mesenteries" in hydrozoans is at least as likely as the gain of this character state in anthozoans. By extension, there is no reason to prefer the hypothesis that the state observed in Nematostella, where the gastroderm is responsible for the synthesis of the core matrisome components, is the ancestral state of the phylum.

      Moreover, the fossil evidence provided in support of this hypothesis (Ou et al. 2022) is not relevant here because the material described in that work is of a crown-group anthozoan, which diversified well after the origin of Anthozoa. The phylogenetic structure of Cnidaria has been extensively studied using phylogenomic approaches and is generally well supported (Kayal et al. 2018; DeBiasse et al. 2024). Based on these analyses, anthozoans are not on a "basal" branch, as the authors suggest. Cnidarian phylogeny bifurcates, with Anthozoa forming one clade and Medusozoa forming the other. From the data reported by Bergheim and co-workers, it is not possible to infer the evolutionary events that gave rise to the different matrisome states observed in Nematostella (an anthozoan) and hydra (a medusozoan).

      Furthermore, I take the observation in Fig 5 that anthozoan matrisomes generally exhibit a higher complexity than those of other cnidarian species to be more supportive of a lineage-specific expansion of matrisome components in the Anthozoa, rather than of those components being representative of an ancestral state for Cnidaria.

      Whatever the implication, I take strong issue with the statement that "the acquisition of complex life cycles in medusozoa, that are distinguished by the pelagic medusa stage, led to a secondary reduction in the matrisome repertoire." There is no causal link in any of the data or analyses reported by Bergheim and co-workers to support this statement and, as stated above, while we are dealing with limited data, insufficient to address this question, it seems more likely to me that the matrisome expanded in anthozoans, contrary to the authors' conclusions. While the discussion raises many interesting evolutionary hypotheses related to the origin of the cnidarian matrisome, which is of vital interest if we are to understand the origin of the bilaterian matrisome, a more thorough comparative analysis, inclusive of a much greater cnidarian species diversity, is required if we are to evaluate these hypotheses.

      DeBiasse MB, Buckenmeyer A, Macrander J, Babonis LS, Bentlage B, Cartwright P, Prada C, Reitzel AM, Stampar SN, Collins A, et al. 2024. A Cnidarian Phylogenomic Tree Fitted With Hundreds of 18S Leaves. Bulletin of the Society of Systematic Biologists [Internet] 3. Available from: https://ssbbulletin.org/index.php/bssb/article/view/9267

      Epp L, Smid I, Tardent P. 1986. Synthesis of the mesoglea by ectoderm and endoderm in reassembled hydra. J Morphol [Internet] 189:271-279. Available from: https://pubmed.ncbi.nlm.nih.gov/29954165/

      Kayal E, Bentlage B, Sabrina Pankey M, Ohdera AH, Medina M, Plachetzki DC, Collins AG, Ryan JF. 2018. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol Biol [Internet] 18:1-18. Available from: https://bmcecolevol.biomedcentral.com/articles/10.1186/s12862-018-1142-0

      Ou Q, Shu D, Zhang Z, Han J, Van Iten H, Cheng M, Sun J, Yao X, Wang R, Mayer G. 2022. Dawn of complex animal food webs: A new predatory anthozoan (Cnidaria) from Cambrian. The Innovation 3:100195.

      Steinmetz PRH, Aman A, Kraus JEM, Technau U. 2017. Gut-like ectodermal tissue in a sea anemone challenges germ layer homology. Nature Ecology & Evolution [Internet] 1:1535-1542. Available from: https://www.nature.com/articles/s41559-017-0285-5

      Zhang X, Boot-Handford RP, Huxley-Jones J, Forse LN, Mould AP, Robertson DL, Li L, Athiyal M, Sarras MP. 2007. The collagens of hydra provide insight into the evolution of metazoan extracellular matrices. J Biol Chem [Internet] 282:6792-6802. Available from: https://pubmed.ncbi.nlm.nih.gov/17204477/

    5. Author response:

      We appreciate the effort the reviewers have put into evaluating our work, and will take the opportunity to revise and improve our submission. In response to the reviewers' comments, we will carefully revisit our manuscript to address the concerns they have raised. Specifically, we will ensure that our revised version is coherent with our annotations and public databases, clarify any discrepancy between the investigated proteins and gene models, and re-examine our discussion of the evolutionary implications in light of their suggestions. We are confident that these revisions will strengthen our work and provide a clearer understanding of our research findings.

    1. eLife Assessment

      This important study utilizes single-cell RNA sequencing to reveal the heterogeneity of trans-sialidase-like superfamily gene expression in Trypanosoma cruzi populations. The approach is highly convincing, as it successfully assigns cells to specific developmental forms and highlights the variability in surface protein expression among trypomastigotes. However, while the findings are solid and contribute to the understanding of immune evasion mechanisms, the study would benefit from a more detailed exploration of the regulatory factors governing trans-sialidase expression. Strengthening this aspect would further enhance its impact on researchers studying T. cruzi pathogenesis and host-parasite interactions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to assess the variability in the expression of surface protein multigene families between amastigote and trypomastigote Trypanosoma cruzi, as well as between individuals within each population. The analysis presented shows higher expression of multigene family transcripts in trypomastigotes compared to amastigotes and that there is variation in which copies are expressed between individual parasites. Notably, they find no clear subpopulations expressing previously characterised trans-sialidase groups. The mapping accuracy to these multicopy genes requires demonstration to confirm this, and the analysis could be extended further to probe the features of the top expressed genes and the other multigene families also identified as variable.

      Strengths:

      The authors successfully process methanol-fixed parasites with the 10x Genomics platform. This approach is valuable for other studies where using live parasites for these methods is logistically challenging.

      Weaknesses:

      The authors describe a single experiment, which lacks controls or complementation with other approaches, and the investigation is limited to the trans-sialidase transcripts.

      It would be more convincing to show, either bioinformatically or by carrying out a controlled experiment, that the sequencing reads have been mapped accurately to different members of multigene families so that their expression can be distinguished. If mapping to the multigene families is inaccurate, this will affect the transcript counts and downstream analysis.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a valuable single-cell RNA-seq study on Trypanosoma cruzi, an important human parasite. It investigates the expression heterogeneity of surface proteins, particularly those from the trans-sialidase-like (TcS) superfamily, within amastigote and trypomastigote populations. The findings suggest a previously underappreciated level of diversity in TcS expression, which could have implications for understanding parasite-host interactions and immune evasion strategies. The use of single-cell approaches to delve into population heterogeneity is strong. However, the study does have some limitations that need to be addressed.

      The focus on single-cell transcriptional heterogeneity in surface proteins, especially the TcS family, in T. cruzi is novel. Given the important role of these proteins in parasite biology and host interaction, the findings have potential significance.

      Strengths:

      The key finding of heterogeneous TcS expression in trypomastigotes is well-supported. The analysis comparing multigene families, single-copy genes, and ribosomal proteins highlights the unusual nature of the variation in surface protein-coding genes.

      Weaknesses:

      While the manuscript identifies TcS heterogeneity, the functional implications of the different expression profiles remain speculative. The authors state it may reflect differences in infectivity, but no direct experimental evidence supports this.

      The manuscript lacks any functional validation of the single-cell findings. For instance, do the trypomastigote subpopulations identified based on TcS expression exhibit differences in infectivity, host cell tropism, or immune evasion? Such experiments would greatly strengthen the study.

      The authors identify a subpopulation of TcS genes that are highly expressed in many cells. However, it is unclear if these correspond to previously characterized TcS members with specific functions.

      The authors hypothesize that observed heterogeneity may relate to chromatin regulation. However, the study does not directly address these mechanisms. There are interesting connections to be made with what they identify as the colocalization of genes within chromatin folding domains, but the authors do not fully explore this. It would be insightful to address these mechanisms in future work.

      The merging of technical replicates needs further justification and explanation as they were not processed through separate experimental conditions. While barcodes were retained, it would be informative to know how well each technical replicate corresponds with the other. If both datasets were sequenced on the same lane, the inclusion of technical replicates adds noise to the analysis.

      While the number of cells sequenced (3192) seems reasonable, it's not clear how much the conclusions are affected by the depth of sequencing. A more detailed description of the sequencing depth and its impact on gene detection would be valuable.
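      A minimal check of how well the technical replicates correspond, assuming each replicate's cells have first been summed into pseudobulk counts per gene; the function names are hypothetical, and a Pearson correlation on log1p-transformed counts is one common choice among several.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a replicate has zero variance")
    return sxy / math.sqrt(sxx * syy)


def replicate_concordance(rep1: Dict[str, float], rep2: Dict[str, float]) -> float:
    """Correlate log1p pseudobulk counts over the genes present in both replicates."""
    genes = sorted(set(rep1) & set(rep2))
    x = [math.log1p(rep1[g]) for g in genes]
    y = [math.log1p(rep2[g]) for g in genes]
    return pearson(x, y)
```

      A value close to 1 would support treating the replicates as interchangeable before merging; per-cell metrics (genes and UMIs per barcode) compared between replicates would complement this bulk-level view.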

      While most of the methods are clear, the way in which the subsampled gene lists were generated could be more thoroughly described, as some details are not clear for the subsampling of single-copy genes.

      Some of the figures are difficult to interpret. For example, the color scaling in the heatmap of Supplementary Figure 3B is not self-explanatory and it is hard to extract meaningful conclusions from the graph.

    4. Reviewer #3 (Public review):

      The study aimed to address a fundamental question in T. cruzi and Chagas disease biology - how much variation is there in gene expression between individual parasites? This is particularly important with respect to the surface protein-encoding genes, which are mainly from massive repetitive gene families with 100s to 1000s of variant sequences in the genome. There is very little direct evidence for how the expression of these genes is controlled. The authors conducted a single-cell RNAseq experiment of in vitro cultured parasites with a mixture of amastigotes and trypomastigotes. Most of the analysis focused on the heterogeneity of gene expression patterns amongst trypomastigotes. They show that heterogeneity was very high for all gene classes, but surface-protein encoding genes were the most variable. In the case of the trans-sialidase gene family, many sequence variants were only detected in a small minority of parasites. The biology of the parasite (e.g. extensive post-transcriptional regulation) and potential technical caveats (e.g. high dropout rates across the genome) make it difficult to infer what this might mean for actual protein expression on the parasite surface.

      (1) Limit of detection and gene dropouts

      An average of ~1100 genes are detected per parasite, which indicates a dropout rate of over 90%. It appears that RNA for the "average" single-copy 'core' gene is only detected in around 3% of the parasites sampled (Figure 2c: ~100 / 3192). This may be comparable with some other trypanosome scRNAseq studies, but this still seems to be a major caveat to the interpretation that high cell-to-cell variability in gene expression is explained by biological rather than technical factors. The argument would be more convincing if the dropout rates and expression heterogeneity were shown to be minimal for well-known highly expressed genes, e.g. tubulin, GAPDH, and ribosomal RNAs. Admittedly, in their Final Remarks, the authors are very cautious in their interpretation, but it would be good to see a more thorough discussion of technical factors that might explain the low detection rates and how these could be tested or overcome in future work.
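      Both rates discussed here are simple ratios; the sketch below only restates that arithmetic. The function names are hypothetical, and the number of annotated genes is a parameter the reader must supply, since it is not given in this review.

```python
def detection_rate(cells_detecting_gene: int, total_cells: int) -> float:
    """Fraction of sampled cells in which at least one transcript of a gene was detected."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return cells_detecting_gene / total_cells


def dropout_rate(genes_detected_per_cell: float, annotated_genes: int) -> float:
    """Fraction of annotated genes with no detected transcript in a typical cell."""
    if annotated_genes <= 0:
        raise ValueError("annotated_genes must be positive")
    return 1 - genes_detected_per_cell / annotated_genes


print(f"{detection_rate(100, 3192):.3f}")  # 0.031
```

      The printed value corresponds to the ~3% detection figure quoted above for the "average" single-copy gene.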

      (2) Heterogeneity across the board

      The authors focus on the relative heterogeneity in RNA abundance for surface proteins from the multicopy gene families vs core genes. While multicopy gene sequences do show more cell-to-cell variability, the differences are modest (Figure 2D): average Gini values of roughly 0.99 vs 0.97 (single copy) or 0.95 (ribosomal). Other studies that have applied similar approaches in other systems describe Gini values of < 0.2-0.25 for evenly expressed "housekeeping" genes (PMIDs 29428416, 31784565). Values of >0.9, as observed here, indicate that the distribution for all gene classes is extremely skewed, so the biological relevance of the comparison is uncertain.
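      For reference, the Gini coefficient of a gene's per-cell counts can be computed from the sorted values with the standard rank-weighted formula below (the function name is illustrative). It also shows why sparse detection alone pushes values toward 1: for n cells the maximum is (n - 1)/n, reached when a single cell carries all counts.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 = perfectly even, (n - 1)/n = all in one cell."""
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        return 0.0  # convention chosen for this sketch: an undetected gene has no inequality
    ordered = sorted(values)
    n = len(ordered)
    weighted = sum(rank * v for rank, v in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))  # 0.0
print(gini([0, 0, 0, 1]))  # 0.75
```

      For a gene detected in only one of 3192 cells, `gini` returns 3191/3192 (about 0.9997), so values above 0.9 are expected whenever most cells record zero counts, whatever the gene class.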

      Nevertheless, this study does provide some tantalising evidence that the expression of surface genes may vary substantially between individual parasites in a single clonal population. The study is also amongst the very first to apply scRNAseq to T. cruzi, so the broader data set will be an important resource for researchers in the field.

    5. Author response:

      We sincerely thank all three reviewers for their time, comments, and valuable suggestions, which will help improve our manuscript. Below, we provide preliminary remarks addressing some of the key issues that have been raised.

      Reviewer 1:

      We agree with the reviewer on the challenge of accurately mapping reads to multigene families. We carefully considered this issue and addressed it by evaluating the performance of multiple aligners using simulated RNA-seq reads. Our results indicate that kallisto performs particularly well in this context, outperforming widely used aligners such as Bowtie2 and STAR. This is likely due to kallisto’s expectation-maximization (EM) algorithm (described in the Materials and Methods section), which employs a probabilistic model to assign reads from similar transcripts. Previous studies have demonstrated the effectiveness of this approach in quantifying highly repetitive sequences, such as transposons (doi.org/10.1093/bioinformatics/btv422). In the revised manuscript, we are considering the inclusion of a supplementary figure to further support the selection of the mapping algorithm.
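      To illustrate the idea behind EM-based assignment of multi-mapping reads, the sketch below iterates abundance estimates over equivalence classes (sets of transcripts a read is compatible with). It is a simplified illustration, not kallisto's implementation: it ignores effective transcript lengths and bias correction, and the transcript IDs are placeholders.

```python
from typing import Dict, Tuple


def em_abundances(
    equivalence_classes: Dict[Tuple[str, ...], int],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Estimate read counts per transcript when some reads are compatible with several.

    Simplified EM: transcript lengths and bias terms are ignored.
    """
    transcripts = sorted({t for cls in equivalence_classes for t in cls})
    total_reads = sum(equivalence_classes.values())
    if not transcripts or total_reads == 0:
        raise ValueError("need at least one read")
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}  # start from uniform abundances
    for _ in range(max_iter):
        # E-step: split each class's reads in proportion to the current abundances.
        expected = {t: 0.0 for t in transcripts}
        for cls, count in equivalence_classes.items():
            denom = sum(alpha[t] for t in cls)
            if count == 0 or denom == 0:
                continue
            for t in cls:
                expected[t] += count * alpha[t] / denom
        # M-step: the new abundances are the expected read fractions.
        new_alpha = {t: expected[t] / total_reads for t in transcripts}
        delta = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if delta < tol:
            break
    return {t: alpha[t] * total_reads for t in transcripts}


reads = {("TcS_a",): 30, ("TcS_b",): 10, ("TcS_a", "TcS_b"): 20}
print({t: round(c, 3) for t, c in em_abundances(reads).items()})  # {'TcS_a': 45.0, 'TcS_b': 15.0}
```

      The 20 shared reads are not split evenly: they are apportioned according to the evidence from uniquely assigned reads, which is why such a model can help with near-identical family members.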

      Reviewer 2:

      We believe that obtaining experimental evidence on the influence of multiple multigene families would represent a significant advancement in the field. However, we would like to emphasize that this is a short communication centered on a specific and biologically relevant observation within a single multigene family. The manuscript is not intended to comprehensively address all aspects of the experiment but rather to highlight what we consider an important biological phenomenon with potential functional implications.

      The influence of phenotypic heterogeneity and its possible advantages under environmental pressures has been previously proposed for Trypanosoma cruzi, related trypanosomatids, and other biological systems, ranging from bacteria to tumors (Seco-Hidalgo 2015, doi: 10.1098/rsob.150190 and Luzak 2021, doi: 10.1146/annurev-micro-040821-012953, for a comprehensive review on this topic). While the reviewer is correct in noting that our model does not demonstrate a functional role for TcS heterogeneity, the experimental approaches required to address this question in a large multigene family are highly complex and beyond the scope of this study. However, we acknowledge the importance of clarifying that the proposed functional implications remain speculative, so we will revise the manuscript accordingly.

      As the reviewer suggests, in the revised version of the manuscript, we will include additional analyses on the characteristics of frequently expressed TcS genes to identify common features that may explain their expression patterns.

      We appreciate the reviewer’s comments and suggestions regarding the clarity of methodological choices and the explanation of key concepts. Accordingly, we will refine the description of our methodology and ensure that our figures are more intuitive and self-explanatory.

      Reviewer 3:

      We recognize the limitations imposed by gene dropout in our data, as highlighted by the reviewer. In the manuscript, we have aimed to be transparent about this issue and discussed its impact in two separate sections (lines 110–121 and 175–181). To enhance clarity, we will revise these paragraphs to provide a more comprehensive discussion of this limitation. Unfortunately, gene dropout is an inherent limitation of 10x Genomics data. Trypanosomatids are not an exception in this regard, and the general metrics of the single-cell RNA-seq data in other reports are equivalent to those obtained in our experiment.

      Despite this important limitation, we believe that our comparative analyses (the contrast between TcS and ribosomal protein expression) provide valuable insights into a biological phenomenon with potential functional relevance for the parasite. Furthermore, we are actively working on generating single-cell RNA-seq data using alternative methodologies that improve gene dropout rates. We anticipate that these future studies will help clarify the extent of the phenomenon described in this work.

    1. eLife Assessment

      The study is useful for advancing spatial transcriptomics through its novel regression-based linear model (glmSMA) that integrates single-cell RNA-seq with spatial reference atlases, though its methodological framework remains incomplete regarding spatial communication applications and feature dependence. The approach demonstrates notable utility by enabling higher-resolution cell mapping across multiple biological systems and spatial platforms compared to existing tools.

    2. Reviewer #1 (Public review):

      Liu et al., present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.
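      The two penalties can be written as one objective. A plausible form, assumed here because the review does not give glmSMA's exact formulation, is 0.5‖y - Xβ‖² + λ1‖β‖₁ + λ2 βᵀLβ, where y is one cell's expression over shared genes, X is the reference atlas (genes × spots), β holds the cell's weights over spots, and L = D - A is the Laplacian of the spot adjacency graph. The sketch only evaluates this objective; it does not fit β, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def graph_laplacian(n_spots: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """L = D - A for an unweighted, undirected spot-adjacency graph (no self-loops)."""
    lap = [[0.0] * n_spots for _ in range(n_spots)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def smoothness_penalty(lap: Matrix, beta: Sequence[float]) -> float:
    """beta^T L beta, i.e. the sum of squared weight differences over neighboring spots."""
    n = len(beta)
    return sum(beta[i] * lap[i][j] * beta[j] for i in range(n) for j in range(n))


def objective(y: Sequence[float], X: Matrix, beta: Sequence[float],
              lap: Matrix, lam1: float, lam2: float) -> float:
    """0.5*||y - X beta||^2 + lam1*||beta||_1 + lam2*beta^T L beta (assumed form)."""
    residuals = [y_g - sum(x * b for x, b in zip(row, beta)) for y_g, row in zip(y, X)]
    fit = 0.5 * sum(r * r for r in residuals)
    sparsity = lam1 * sum(abs(b) for b in beta)
    return fit + sparsity + lam2 * smoothness_penalty(lap, beta)


# Three spots in a row (0-1-2).
lap = graph_laplacian(3, [(0, 1), (1, 2)])
print(smoothness_penalty(lap, [1.0, 1.0, 1.0]))  # 0.0
print(smoothness_penalty(lap, [1.0, 0.0, 0.0]))  # 1.0
```

      The L1 term favors placing a cell on few spots, while the Laplacian term favors weights that change gradually between neighboring spots; λ1 and λ2 set the balance the reviewer refers to.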

      Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.

      The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.

      (1) What were the assumptions made behind the model? One of them could be the linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and this would also vary across organ systems and species. Similarly, the regularization parameters can be tuned to balance sparsity and smoothness, but a given setting may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent errors with normal distribution and linear additive effects, a simplification that may overlook overdispersion or heteroscedasticity commonly observed in RNA-seq data.

      (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.

      (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.

      (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Weaknesses:

      (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.

      (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.

      (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.

      (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.

      (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.

      (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. It remains unclear whether glmSMA can be effectively applied to tissue types where anatomical regions are not distinctly separated, such as liver tissue. Further evaluation of such tissues would help determine the method's broader applicability.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.

      Strengths:

      (1) Comprehensive Benchmarking:

      Compared against CellTrek and Novosparc, glmSMA consistently achieved lower Kullback-Leibler divergence (KL divergence) scores, indicating better cell assignment accuracy.

      Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.

      (2) Experimental Validation with Multiple Real-World Datasets:

      The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.

      Validation through correlation analyses, Pearson's coefficient, and KL divergence supports the accuracy of glmSMA's predictions.

      Weaknesses:

      (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.

      (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not be true in various heterogeneous environments.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al., present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.

      Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.

      The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.

      We thank the reviewer for their thoughtful comments. We will clarify the model assumptions and the feature selection process to make it more understandable. To clarify, the performance of glmSMA does not depend on cell type. For some rare cell types, the small number of cells can lead to a drop in performance. To better illustrate our results and reduce cell type-specific biases, we will shuffle and randomly sample the cell types.

      (1) What were the assumptions made behind the model? One of them could be the linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and this would also vary across organ systems and species. Similarly, the regularization parameters can be tuned to balance sparsity and smoothness, but a given setting may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent errors with normal distribution and linear additive effects, a simplification that may overlook overdispersion or heteroscedasticity commonly observed in RNA-seq data.

      Thank you for this comment. We acknowledge that non-linear relationships may be present in complex tissues and may not be fully captured by a linear model.

      Our choice of a linear model was guided by examining the relationship between expression distance and physical distance in the current datasets, which include intestinal villus, mouse brain, and fly embryo.

      There is a linear correlation between expression distance and physical distance [Nitzan et al]. Within a given anatomical structure, cells in closer proximity exhibit more similar expression patterns. In tissues where non-linear relationships are more prevalent—such as the human PDAC sample—our mapping results remain robust. We acknowledge that we have not yet tested our algorithm in highly heterogeneous regions like the liver, and we plan to include such analyses in future work if necessary. Regarding the regularization parameters, we agree that the balance between sparsity and smoothness is sensitive to tissue-specific variation and data quality. In our current implementation, we explored a range of values to find robust defaults.
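      Whether expression distance tracks physical distance can be checked directly on a given dataset. The sketch below is an illustration only: it computes the Pearson correlation between all pairwise expression distances and all pairwise physical distances. The helper name `distance_correlation` and the synthetic one-dimensional tissue are hypothetical and not part of glmSMA.

```python
import numpy as np

def distance_correlation(expr, coords):
    """Pearson r between pairwise expression distances and pairwise physical
    distances (hypothetical helper). expr: cells x genes, coords: cells x dims."""
    expr = np.asarray(expr, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # All pairwise Euclidean distances in expression space and in physical space
    d_expr = np.linalg.norm(expr[:, None, :] - expr[None, :, :], axis=-1)
    d_phys = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Use each unordered pair once (upper triangle, diagonal excluded)
    iu = np.triu_indices(len(expr), k=1)
    return float(np.corrcoef(d_expr[iu], d_phys[iu])[0, 1])

# Synthetic 1-D tissue: 50 cells along an axis, two genes with opposite linear gradients
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
coords = x[:, None]
expr = np.column_stack([x, 1.0 - x]) + rng.normal(0.0, 0.01, size=(50, 2))

r_gradient = distance_correlation(expr, coords)
# Shuffling the positions destroys the spatial structure
r_shuffled = distance_correlation(expr, rng.permutation(coords))
print(f"gradient r = {r_gradient:.2f}, shuffled r = {r_shuffled:.2f}")
```

      Pairwise distances are not independent observations, so the usual Pearson p-value does not apply here; a permutation (Mantel-style) test is the standard way to assess significance.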

      (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.

      Thank you for this thoughtful comment. The performance of our method is indeed sensitive to the number and quality of selected features. To optimize feature selection, we employed multiple strategies, including Moran’s I statistic, identification of highly variable genes, and the Seurat pipeline to detect anchor genes linking the spatial transcriptomics data with the reference atlas. The number of selected markers depends on the quality of the data. For high-quality datasets, fewer than 100 markers are typically sufficient for accurate prediction. To address this more clearly, we will revise the manuscript to include detailed descriptions of our feature selection process and demonstrate how varying the number of selected features impacts performance.
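      Moran's I, one of the feature-selection criteria mentioned above, scores how strongly a gene's expression is spatially autocorrelated. Below is a minimal NumPy sketch, assuming binary rook (4-neighbor) weights on a regular grid; the helper names are hypothetical, and the weights used in the manuscript may be defined differently.

```python
import numpy as np

def grid_adjacency(n_rows, n_cols):
    """Binary rook (4-neighbor) adjacency matrix for cells on a regular grid."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:                 # neighbor below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:                 # neighbor to the right
                j = r * n_cols + c + 1
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(x, w):
    """Global Moran's I: (N / sum(w)) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

rows, cols = np.divmod(np.arange(100), 10)
w = grid_adjacency(10, 10)
gradient = cols.astype(float)                          # smooth left-to-right gradient
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)  # alternates between neighbors
print(morans_i(gradient, w))   # about 0.89: strong positive autocorrelation
print(morans_i(checker, w))    # about -1: perfect negative autocorrelation
```

      Genes with high positive Moran's I vary smoothly across the tissue and therefore carry positional information, which is why such a statistic is a natural filter for mapping features.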

      We evaluated our method across diverse tissue types and platforms—including Slide-seq, 10x Visium, and Virtual-FISH—which represent both sequencing-based and imaging-based spatial transcriptomics technologies. Our model consistently achieved strong performance across these settings. It's worth noting that the performance of other methods, such as CellTrek [Wei et al] and novoSpaRc [Nitzan et al], also depends heavily on feature selection. In particular, performance degrades substantially when fewer features are used.

      We do not believe that the observed performance is directly influenced by cell type composition. Major cell types are typically well-defined, and rare cell types comprise only a small fraction of the dataset. For these rare populations, a single misclassification can disproportionately impact metrics like KL divergence due to small sample size. However, this does not necessarily indicate a systematic cell type–specific bias in the mapping. To mitigate this issue, we will implement shuffling and sampling procedures to reduce potential bias introduced by rare cell types.
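      The point about rare cell types can be made concrete with a small calculation. This sketch assumes that, for one cell type, KL divergence is computed as KL(reference || predicted) between its reference distribution over spatial regions and the distribution of the cells mapped to it, with a small pseudocount. The direction, the pseudocount, and the helper names are our assumptions, not details taken from the manuscript.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) for two discrete distributions over the same regions.
    A small pseudocount eps keeps empty bins from producing log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def predicted_distribution(assigned_regions, n_regions):
    """Fraction of mapped cells of one cell type that fall in each region."""
    counts = np.bincount(assigned_regions, minlength=n_regions)
    return counts / counts.sum()

reference = np.array([1.0, 0.0, 0.0, 0.0])   # cell type confined to region 0

# Common type: 200 mapped cells, exactly one placed in the wrong region
common = predicted_distribution(np.array([0] * 199 + [1]), 4)
# Rare type: 5 mapped cells, exactly one placed in the wrong region
rare = predicted_distribution(np.array([0] * 4 + [1]), 4)

kl_common = kl_divergence(reference, common)   # about 0.005
kl_rare = kl_divergence(reference, rare)       # about 0.22
print(kl_common, kl_rare)
```

      The same single misassignment inflates the score by more than an order of magnitude for the rare type, which is why per-type KL values for small populations are noisy and benefit from resampling.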

      (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.

      Thank you for this valuable feedback. We agree that further elaboration on the connection between spatial proximity and cell–cell communication would enhance the biological interpretation of our results. While our current model focuses on inferring spatial relationships, we may add cell–cell communication analyses in a future version.

      (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.

      Thank you for the comment. For each cell, our algorithm generates a probability vector that indicates its likely spatial assignment along with coordinate information. We will include the resolution and the number of cells assigned to each spot in future versions. In our framework, each cell is mapped to one or more spatial locations with associated probabilities. Depending on the amount of regularization through L1 and L2 norms, a cell may be localized to a small patch or distributed over a broader domain. For the 10x Visium data, we applied a repelling algorithm to enhance visualization [Wei et al]. If a cell’s original location is already occupied, it is reassigned to a nearby neighborhood to avoid overlap. Users can also inspect the entire regularization path by varying the penalty terms.
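      A minimal sketch of how a single cell could be turned into a probability vector over locations under an L1 penalty and a graph-Laplacian penalty. It solves the non-negative objective ||g - R w||^2 + lam1 ||w||_1 + lam2 w^T L w by proximal gradient descent (ISTA) and then rescales w to sum to one. The solver, the non-negativity constraint, the default penalty values, and the name `map_cell` are assumptions for illustration; glmSMA's actual solver and normalization may differ.

```python
import numpy as np

def map_cell(g, R, L, lam1=0.1, lam2=0.1, n_iter=2000):
    """Hypothetical sketch: non-negative L1 + Laplacian-regularized regression of
    one cell's expression g (genes,) on reference profiles R (genes x locations),
    solved by ISTA. Returns a probability vector over locations."""
    g = np.asarray(g, dtype=float)
    R = np.asarray(R, dtype=float)
    L = np.asarray(L, dtype=float)
    # Step size = 1 / Lipschitz constant of the gradient of the smooth part
    lip = 2.0 * (np.linalg.norm(R, 2) ** 2 + lam2 * np.linalg.norm(L, 2))
    step = 1.0 / lip
    w = np.zeros(R.shape[1])
    for _ in range(n_iter):
        # Gradient of ||g - R w||^2 + lam2 * w^T L w
        grad = 2.0 * R.T @ (R @ w - g) + 2.0 * lam2 * (L @ w)
        # Proximal step for lam1 * ||w||_1 combined with the constraint w >= 0
        w = np.maximum(w - step * grad - step * lam1, 0.0)
    total = w.sum()
    return w / total if total > 0 else w

# Toy reference: 5 locations on a line; location k expresses marker gene k
R = np.eye(5)
adj = np.diag(np.ones(4), 1)
adj = adj + adj.T                       # path graph 0-1-2-3-4
L = np.diag(adj.sum(axis=1)) - adj      # graph Laplacian D - A
g = R[:, 2]                             # a cell whose profile matches location 2

p = map_cell(g, R, L)
print(np.round(p, 3))   # most mass on location 2, a little on neighbors 1 and 3
```

      In this formulation, raising lam2 tends to spread mass over a larger contiguous patch, while raising lam1 tends to concentrate it on fewer locations; sweeping both traces out the regularization path.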

      Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576(7785):132-137. doi:10.1038/s41586-019-1773-3

      Wei R, et al. Spatial charting of single-cell transcriptomes in tissues. Nature Biotechnology. 2022;40(8):1190-1199. doi:10.1038/s41587-022-01233-1

      Reviewer #2 (Public review):

      Summary:

      The authors propose a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Thank you for recognizing our contribution. Our goal was to develop a method that achieves higher spatial resolution in mapping single-cell data compared to existing tools. We are encouraged by the results and will continue to refine the approach to improve accuracy and generalizability across platforms and tissue types.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Thank you for this comment. We believe that evaluating our method across diverse tissue types—such as the mouse cortex, human PDAC, and intestinal villus—demonstrates its robustness and broad applicability. We plan to continue expanding these evaluations to additional tissue contexts and species to further validate the method’s generalizability.

      Weaknesses:

      (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.

      Thank you for the comment. We have tested our algorithm on the virtual FISH dataset from the fly embryo, which serves as an example of image-based spatial omics data. However, such datasets often contain a limited number of available genes. To address this, we will conduct additional testing on image-based data if needed. The Allen Brain Atlas provides high-quality ISH data, and we can select specific brain regions from this resource to further evaluate our algorithm if necessary [Lein et al]. Currently, we plan to focus more on the 10x Visium platform, as it supports whole-transcriptome profiling and offers a wide range of tissue samples for analysis.

      (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.

      Thank you for the comment. To clarify how ground truth is defined across different tissues, we provide the following details. Direct ground truth for cell locations is often unavailable in scRNA-seq data due to experimental constraints. To address this, we adopted alternative strategies for estimating ground truth in each dataset:

      - 10x Visium Data: We used the cell type distribution derived from spatial transcriptomics (ST) data as a proxy for ground truth. We then computed the KL divergence between this distribution and our model's predictions for performance assessment.

      - Slide-seq Data: We validated predictions by comparing the expression of marker genes between the reconstructed and original spatial data.

      - Fly Embryo Data: We used predicted cell locations from novoSpaRc as a reference for evaluating our algorithm.

      These strategies allowed us to evaluate model performance even in the absence of direct cell location data. In addition, we can apply multiple evaluation strategies within a single dataset.

      (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.

      Thank you for the comment. We will include additional details for this dataset in the supplementary figures. As the intestinal villus is a relatively simple tissue, most existing algorithms performed well on it. For this reason, we did not initially provide extensive details in the main text.

      (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.

      Thank you for the comment. To improve visualization, we will include anatomical structures alongside the mapping results in the next version, wherever such structures are available (e.g., mouse brain cortex, human PDAC sample, etc.). Regions will be color-coded to enhance clarity and make the spatial organization easier to interpret.

      (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.

      Thank you for the comment. The ground truth for DG and CA cells was not available. To better evaluate the model's performance, we will compute the KL divergence between the original and predicted cell type distributions, following the same approach used for the 10x Visium dataset.

      (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. It remains unclear whether glmSMA can be effectively applied to tissue types where anatomical regions are not distinctly separated, such as liver tissue. Further evaluation of such tissues would help determine the method's broader applicability.

      Thank you for the comment. We have already tested our algorithm on the fly embryo, where anatomical structures are not well defined or clearly separated. If needed, we can further apply glmSMA to more complex tissues such as the liver. To clarify the role of anatomical structures in our model: glmSMA does not require anatomical information as input. Instead, it leverages a distance matrix between cells to apply L2 norm regularization. Despite the absence of anatomical information, the model still demonstrates strong performance. We will include results to illustrate its effectiveness without anatomical input. Additionally, we plan to evaluate the model on tissues where anatomical regions are not clearly delineated.
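      To illustrate how a distance matrix alone, without anatomical labels, can drive the smoothness penalty, the sketch below builds an unnormalized graph Laplacian from a symmetrized k-nearest-neighbor graph and evaluates w^T L w, which equals the sum of squared differences across graph edges. The choice of k, the binary edge weights, and the name `knn_laplacian` are assumptions; glmSMA may weight edges by distance or build its graph differently.

```python
import numpy as np

def knn_laplacian(coords, k=4):
    """Unnormalized Laplacian L = D - A of a symmetrized k-nearest-neighbor graph
    built from pairwise Euclidean distances (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbor
    n = len(coords)
    a = np.zeros((n, n))
    nearest = np.argsort(d, axis=1)[:, :k]       # indices of the k closest points
    rows = np.repeat(np.arange(n), k)
    a[rows, nearest.ravel()] = 1.0
    a = np.maximum(a, a.T)                       # make the graph undirected
    return np.diag(a.sum(axis=1)) - a

rows, cols = np.divmod(np.arange(100), 10)
coords = np.column_stack([rows, cols]).astype(float)
L = knn_laplacian(coords, k=4)

smooth = cols / 9.0                                   # gradual left-to-right change
rough = np.where((rows + cols) % 2 == 0, 1.0, 0.0)    # alternates between neighbors
print(smooth @ L @ smooth, rough @ L @ rough)         # small vs large penalty
```

      Because the penalty only asks that nearby locations receive similar weights, it encodes spatial continuity without requiring any region labels.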

      Lein E, Hawrylycz M, Ao N, et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature. 2007;445:168-176. doi:10.1038/nature05453

      Reviewer #3 (Public review):

      Summary:

      The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.

      Strengths:

      (1) Comprehensive Benchmarking:

      Compared against CellTrek and Novosparc, glmSMA consistently achieved lower Kullback-Leibler divergence (KL divergence) scores, indicating better cell assignment accuracy.

      Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.

      (2) Experimental Validation with Multiple Real-World Datasets:

      The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.

      Validation through correlation analyses, Pearson's coefficient, and KL divergence supports the accuracy of glmSMA's predictions.

      We thank reviewer #3 for their positive feedback and thoughtful recommendations.

      Weaknesses:

      (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.

      We agree that the accuracy of glmSMA is influenced by the selection of marker genes, and that current FISH-based reference atlases may offer a limited gene set. To address this, we incorporate multiple feature selection strategies, including highly variable genes and spatially informative genes (e.g., via Moran’s I), to optimize performance within the available gene space. As more comprehensive reference atlases become available, we expect the model’s accuracy to improve further.
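      As an illustration of the highly-variable-gene criterion mentioned above, a simple dispersion-based ranking is sketched below. It is a deliberately simplified stand-in (library-size normalization, log1p, variance-to-mean ratio), not Seurat's variance-stabilizing selection or the exact procedure used for glmSMA; `top_variable_genes` and its defaults are hypothetical.

```python
import numpy as np

def top_variable_genes(counts, n_top=2, target_sum=1e4):
    """Rank genes by dispersion (variance / mean) of library-size-normalized,
    log1p-transformed expression. counts: cells x genes raw count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)           # library size per cell
    norm = np.log1p(counts / lib * target_sum)
    mean = norm.mean(axis=0)
    var = norm.var(axis=0, ddof=1)
    dispersion = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    return np.argsort(dispersion)[::-1][:n_top]       # highest dispersion first

# Synthetic counts: genes 0, 1, 4 are uniform; genes 2, 3 mark half of the cells
rng = np.random.default_rng(0)
n = 200
group = np.arange(n) < n // 2
counts = np.column_stack([
    rng.poisson(50, n),
    rng.poisson(50, n),
    np.where(group, rng.poisson(50, n), 0),
    np.where(group, rng.poisson(50, n), 0),
    rng.poisson(20, n),
])
top = top_variable_genes(counts, n_top=2)
print(top)   # the two group-specific genes
```

      In practice such a ranking would be intersected with genes present in the reference atlas, since only shared genes can anchor the mapping.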

      (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not be true in various heterogeneous environments.

      While this assumption effectively captures spatial continuity in many cases, we acknowledge that it may not hold across all biological contexts. To address this, we plan to refine our regularization strategy and evaluate the model's performance in heterogeneous tissue regions.

    1. eLife Assessment

      Kwon et al. present an important paper using a novel approach to estimating rotavirus vaccine efficacy using data from a passive surveillance network in the US. They provide convincing evidence to support their conclusion that using the whole genome, rather than previous use of two surface proteins, enhances our understanding of strain-specific vaccine efficacy. These findings have implications for this vaccine specifically as well as type-specific vaccine evaluation more generally.

    2. Reviewer #1 (Public review):

      Summary:

      Kwon et al present a very well-conducted and well-written sieve analysis of rotavirus infections in a passive surveillance network in the US, considering how relative vaccine efficacy changes with genetic distance from the vaccine strains including the whole genome. The results are compelling, supported by a number of sensitivity analyses, and the manuscript is generally easy to follow.

      Strengths:

      (1) The underlying study base, a surveillance network across multiple sites in the US.

      (2) The use of a test-negative design, which is well established for rotavirus, to estimate vaccine efficacy.

      (3) The use of genetic distance to measure differences between infecting and vaccine strains, and the innovative use of k-means clustering to make results more interpretable.

      (4) The secondary and sensitivity analyses that provide additional context and support for the primary findings.

      Weaknesses:

      (1) As identified by the authors, there is a limited sample size for the analysis of RV1 (monovalent rotavirus vaccine).

      (2) Sieve analyses were originally designed for randomized trials, in which setting their key assumptions are more likely to be met. There is little discussion in this paper of how those assumptions might be violated and what effect that might have on the results. The authors have access to some important confounders, but I believe some more discussion on potential biases in this observational study is warranted.

    3. Reviewer #2 (Public review):

      Summary:

      This study introduces a new metric for assessing the efficacy of rotavirus vaccines through the genetic distance clustering of strains. The authors analyzed variations in vaccine protection using whole genome sequencing.

      Strengths:

      Evaluating vaccine efficacy using whole genome sequencing can enhance our understanding of how pathogen evolution influences disease transmission and control.

      Weaknesses:

      While the study proposed a new method for evaluating vaccine efficacy using genetic information, its weaknesses arise from the insufficient evidence that analyses based on whole genome sequencing are more reliable than those that rely solely on VP7 and VP4 genotypes.

      Though most cases received the RV5 vaccine (n=119 compared to n=30 for RV1), Figure 2 and the primary focus of the paper concentrate on RV1, as the authors identified a stronger association with genetic distance.

      Additionally, it is unclear whether the difference between the two groups (j=0 versus j=1) is statistically significant for the analysis based on genetic distance to the RV1 strain, as well as for that based on minimum genetic distance to any of the RV5 vaccine strains. In both cases, the confidence intervals show substantial overlap.

      The authors do not seem to have used a criterion for model selection based on the number of clusters; therefore, k=2 may not represent the optimal number of clusters, particularly in relation to the genetic distance associated with the RV5 vaccine (Figure 1B), which does not appear to show a bimodal distribution.
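      One way to make the choice of k explicit is to compare candidate values with a criterion such as the mean silhouette width. The NumPy sketch below assumes a single scalar genetic distance per infecting strain and uses plain Lloyd's k-means with quantile initialization. It illustrates the kind of check the reviewer describes; it is not the clustering procedure used by Kwon et al., and the distances are synthetic.

```python
import numpy as np

def kmeans_1d(x, k, n_iter=100):
    """Lloyd's k-means on 1-D data, initialized at evenly spaced quantiles."""
    x = np.asarray(x, dtype=float)
    centers = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Keep the old center if a cluster becomes empty
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def mean_silhouette(x, labels):
    """Average silhouette width; higher values mean better-separated clusters."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() < 2:
            scores.append(0.0)          # singleton clusters score 0 by convention
            continue
        a = d[i, same].sum() / (same.sum() - 1)   # mean distance within own cluster
        b = min(d[i, labels == j].mean()          # nearest other cluster
                for j in np.unique(labels) if j != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
distances = np.concatenate([rng.normal(0.02, 0.003, 60),   # strains close to vaccine
                            rng.normal(0.08, 0.005, 40)])  # more distant strains
scores = {k: mean_silhouette(distances, kmeans_1d(distances, k)[0]) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

      For a distribution without clear bimodality, silhouette values for all k would be low and similar, which would itself be informative about how much weight a two-cluster split can bear.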

      Finally, outcomes for RV1 are highly associated with both homotypic and heterotypic antibody responses (Supplemental Figure 10), which have already been shown to impact vaccine effectiveness (The Pediatric Infectious Disease Journal 40(12):p 1135-1143, 2021, doi:10.1097/INF.0000000000003286). Given this strong association, the benefit of using genetic distance is unclear, as the GxPx genotype serves as a good proxy for genetic similarity.

    4. Reviewer #3 (Public review):

      Overall, this is an outstanding paper. It presents a novel approach to estimating rotavirus vaccine efficacy; is clearly written and presented; and has implications for this vaccine specifically as well as type-specific vaccine evaluation more generally. The analytical framework is creative, and it makes rigorous use of data and statistical approaches. It has long been argued that rotavirus immunity/vaccine performance operates beyond the scale of G/P genotyping. This paper is the first to demonstrate that convincingly, using data on all 11 viral genes and whole genome sequence analysis. I have only minor comments that I recommend should be addressed.

    5. Author response:

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      Kwon et al present a very well-conducted and well-written sieve analysis of rotavirus infections in a passive surveillance network in the US, considering how relative vaccine efficacy changes with genetic distance from the vaccine strains including the whole genome. The results are compelling, supported by a number of sensitivity analyses, and the manuscript is generally easy to follow.

      Strengths:

      (1) The underlying study base, a surveillance network across multiple sites in the US.

      (2) The use of a test-negative design, which is well established for rotavirus, to estimate vaccine efficacy.

      (3) The use of genetic distance to measure differences between infecting and vaccine strains, and the innovative use of k-means clustering to make results more interpretable.

      (4) The secondary and sensitivity analyses that provide additional context and support for the primary findings.

      Weaknesses:

      (1) As identified by the authors, there is a limited sample size for the analysis of RV1 (monovalent rotavirus vaccine).

      (2) Sieve analyses were originally designed for randomized trials, in which setting their key assumptions are more likely to be met. There is little discussion in this paper of how those assumptions might be violated and what effect that might have on the results. The authors have access to some important confounders, but I believe some more discussion on potential biases in this observational study is warranted.

      We appreciate the reviewer’s positive comments and the opportunity to discuss the application of sieve analysis in observational vaccine effectiveness studies, contrasting it with its traditional use in clinical trials assessing vaccine efficacy. We fully acknowledge the reviewer's point that sieve analysis was originally developed for, and is most frequently employed in, randomized controlled trials (RCTs).

      Sieve analysis, as defined by Gilbert et al. (2001), rests on three core assumptions: (A1) uniform susceptibility to infection among all participants, apart from vaccine-induced strain-specific effects; (A2) equal distribution of exposure to each strain s = 1, …, K between vaccine groups; and (A3) constant strain prevalence. RCTs ensure these through randomization. However, our observational design is vulnerable to violations of these assumptions, especially A1 and A3. To address A1 and A3, we adjusted for age (in years), sample collection year, and clinical setting (i.e., outpatient, inpatient, ED), aiming to account for both individual-level and temporal variation.
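      Under these assumptions, the sieve comparison in a test-negative design amounts to estimating VE separately for the cases in each genetic-distance cluster against the same test-negative controls, with VE = 1 - OR. The sketch below shows only that crude arithmetic on hypothetical counts; the study itself used covariate-adjusted logistic regression, and the cluster labels and counts here are illustrative assumptions.

```python
def crude_ve(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Crude test-negative VE = 1 - OR, where OR = (a * d) / (b * c).

    a = vaccinated cases, b = unvaccinated cases,
    c = vaccinated test-negative controls, d = unvaccinated test-negative controls.
    """
    odds_ratio = (vacc_cases * unvacc_controls) / (unvacc_cases * vacc_controls)
    return 1.0 - odds_ratio


# Hypothetical counts: all cases share the same test-negative controls,
# and cases are split by genetic-distance cluster j.
controls = {"vaccinated": 300, "unvaccinated": 100}
cases_by_cluster = {
    0: {"vaccinated": 20, "unvaccinated": 40},
    1: {"vaccinated": 30, "unvaccinated": 20},
}

for j, cases in cases_by_cluster.items():
    ve = crude_ve(cases["vaccinated"], cases["unvaccinated"],
                  controls["vaccinated"], controls["unvaccinated"])
    print(f"cluster j={j}: crude VE = {ve:.2f}")
```

      A lower VE in one cluster than in the other is the kind of signal a sieve analysis looks for; a formal comparison still requires covariate adjustment and a test of the difference between clusters.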

      A2 is particularly challenging in observational settings. We found that study site was correlated with both vaccination status (the main predictor) and strain distribution, potentially violating A2. However, adjusting for study site reversed the expected association. On further reflection, we concluded that the site-specific differences in strain distribution likely reflect the population-level effect of vaccination. We believe this effect outweighs any confounding by study site acting as an independent cause of both individual-level vaccination status and strain distribution. Adjusting for site would therefore have obscured a genuine population-level effect, so we elected not to do so. We will include further discussion of this point in the revised manuscript.

      Our study demonstrates the unique capacity of sieve analysis to disentangle individual- and population-level effects on vaccine effectiveness in observational settings. We will expand on these considerations, including the potential biases inherent to observational studies and the rationale for our analytical choices, within the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study introduces a new metric for assessing the efficacy of rotavirus vaccines through the genetic distance clustering of strains. The authors analyzed variations in vaccine protection using whole genome sequencing.

      Strengths:

      Evaluating vaccine efficacy using whole genome sequencing can enhance our understanding of how pathogen evolution influences disease transmission and control.

      Weaknesses:

      While the study proposed a new method for evaluating vaccine efficacy using genetic information, its weaknesses arise from the insufficient evidence that analyses based on whole genome sequencing are more reliable than those that rely solely on VP7 and VP4 genotypes.

      Though most cases received the RV5 vaccine (n=119 compared to n=30 for RV1), Figure 2 and the primary focus of the paper concentrate on RV1, as the authors identified a stronger association with genetic distance.

      Additionally, it is unclear whether the difference between the two groups (j=0 versus j=1) is statistically significant for the analysis based on genetic distance to the RV1 strain, as well as for that based on minimum genetic distance to any of the RV5 vaccine strains. In both cases, the confidence intervals show substantial overlap.

      The authors do not seem to have used a criterion for model selection based on the number of clusters; therefore, k=2 may not represent the optimal number of clusters, particularly in relation to the genetic distance associated with the RV5 vaccine (Figure 1B), which does not appear to show a bimodal distribution.

      Finally, outcomes for RV1 are highly associated with both homotypic and heterotypic antibody responses (Supplemental Figure 10), which have already been shown to impact vaccine effectiveness (The Pediatric Infectious Disease Journal 40(12):p 1135-1143, 2021, doi:10.1097/INF.0000000000003286). Given this strong association, the benefit of using genetic distance is unclear, as the GxPx genotype serves as a good proxy for genetic similarity. 

      We sincerely appreciate the reviewer's careful consideration of our manuscript and their constructive suggestions for improvement.

      Regarding the comparison of whole-genome sequencing with traditional VP7/VP4 genotyping, we agree that a more explicit comparison would strengthen our findings. To this end, we plan to incorporate a direct comparison of the genetic distance (GD) and genotype-specific vaccine effectiveness (VE) analyses into the main text. Additionally, we will conduct an analysis of VE based on homotypic, partially heterotypic, and fully heterotypic genotype groupings. This will provide a clearer demonstration of the potential added value of GD in refining VE estimates, particularly for future applications. Given the potential for reassortment among rotavirus gene segments, our analysis highlights that relying solely on the VP7/VP4 genotype can at times be misleading.

      Regarding k-means clustering, we wish to clarify that the selection of k=2 was not arbitrary. It was determined using the elbow method on the total within-cluster sum of squares (using the fviz_nbclust function in the factoextra R package, with n=5000 bootstrapping). While we acknowledge that other methods, such as the silhouette and gap statistics, may yield different optimal cluster numbers, we prioritized maximizing group sample sizes. We will explicitly state this model selection criterion in the methods section of the revised manuscript.
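      The elbow criterion can be illustrated on one-dimensional genetic distances. The sketch below is not fviz_nbclust: it swaps in an exact 1-D k-means (a dynamic program over split points of the sorted data) so that the within-cluster sum of squares (WSS) for each k is deterministic, and the distances are hypothetical. The "elbow" is the k beyond which adding clusters stops reducing WSS substantially.

```python
def within_ss(sorted_x, i, j):
    """Sum of squared deviations of sorted_x[i:j] from the segment mean."""
    seg = sorted_x[i:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def optimal_wss_1d(values, k):
    """Exact minimal total within-cluster sum of squares for k clusters in 1-D.

    In one dimension, optimal k-means clusters are contiguous runs of the sorted
    data, so a dynamic program over split points finds the optimum exactly.
    Requires 1 <= k <= len(values).
    """
    x = sorted(values)
    n = len(x)
    inf = float("inf")
    # best[c][j] = minimal WSS when the first j sorted points form c clusters
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = best[c - 1][i] + within_ss(x, i, j)
                if candidate < best[c][j]:
                    best[c][j] = candidate
    return best[k][n]


# Hypothetical genetic distances of case strains to one vaccine strain
distances = [0.010, 0.012, 0.015, 0.011, 0.080, 0.085, 0.090, 0.082]
wss = {k: optimal_wss_1d(distances, k) for k in range(1, 6)}
for k, value in wss.items():
    print(f"k={k}: WSS={value:.6f}")
```

      On clearly bimodal data like these, WSS drops sharply from k=1 to k=2 and only marginally afterwards; on a unimodal distribution the curve bends less distinctly, which is why other criteria (silhouette, gap statistic) can disagree.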

      We acknowledge the reviewer's concern regarding the overlapping confidence intervals and the statistical significance of the difference in VE between the j=0 and j=1 groups. One way to address this would be to modify our analysis. Instead of two separate logistic regression models (controls vs j=0 cases, and controls vs j=1 cases), we could fit a single multinomial logistic regression model with three outcome categories, namely controls (reference), j=0 cases, and j=1 cases, and then conduct a Wald test to directly compare the vaccination coefficients for the j=0 and j=1 cases relative to controls. We intend to explore this approach in the revised manuscript; because both case groups are compared within a single model, it will provide a more rigorous assessment of differences in VE.
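      The proposed comparison rests on the variance of a difference between two coefficients from one fitted model. A minimal sketch, assuming hypothetical coefficient estimates and (co)variances rather than output from any particular software, shows the Wald statistic and the VE implied by each log odds ratio (VE = 1 - exp(β)).

```python
import math


def ve_from_log_odds(beta):
    """Vaccine effectiveness implied by a log-odds-ratio coefficient: VE = 1 - exp(beta)."""
    return 1.0 - math.exp(beta)


def wald_diff(beta0, beta1, var0, var1, cov01):
    """Two-sided Wald test of H0: beta0 == beta1 for coefficients from one fitted model.

    Var(beta0 - beta1) = var0 + var1 - 2 * cov01 needs the covariance between the
    two estimates; a single multinomial fit reports it, whereas two separate
    logistic regressions sharing the same controls do not.
    """
    se = math.sqrt(var0 + var1 - 2.0 * cov01)
    z = (beta0 - beta1) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard-normal p-value
    return z, p


# Hypothetical vaccination coefficients for j=0 and j=1 cases versus controls
b0, b1 = -1.2, -0.4
z, p = wald_diff(b0, b1, var0=0.09, var1=0.16, cov01=0.02)
print(f"VE j=0: {ve_from_log_odds(b0):.2f}, VE j=1: {ve_from_log_odds(b1):.2f}")
print(f"Wald z = {z:.2f}, p = {p:.3f}")
```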

      Reviewer #3 (Public review):

      Overall, this is an outstanding paper. It presents a novel approach to estimating rotavirus vaccine efficacy, is clearly written and presented, and has implications for this vaccine specifically as well as for type-specific vaccine evaluation more generally. The analytical framework is creative, and the data and statistical approaches are used rigorously. It has long been argued that rotavirus immunity and vaccine performance operate beyond the scale of G/P genotyping. This paper is the first to demonstrate that convincingly, using data on all 11 viral genes and whole-genome sequence analysis. I have only minor comments, which I recommend be addressed.

      We sincerely thank the reviewer for their highly positive assessment of our manuscript. We will carefully address their minor comments and incorporate their recommendations in the revised manuscript, which we believe will further enhance the clarity and impact of our study.

    1. eLife Assessment

      This study reanalyzed previously published scRNA-seq and TCR-seq data to examine the proportion and characteristics of dual-TCR-expressing Treg cells in mice, presenting some useful insights into TCR diversity and immune regulation. However, the evidence is incomplete, particularly with respect to data interpretation, statistical rigor, and the functionality of dual-TCR Treg cells. The study is potentially of interest to immunologists studying T-cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents findings on dual TCR regulatory T cells (Tregs) using previously published single-cell RNA and TCR sequencing datasets. The authors aimed to quantify dual TCR Tregs in different tissues and analyze their characteristics. Rather than perform the difficult experiments needed to ascertain the functional role of dual receptors, this study relies entirely on scRNA-VDJ-seq data published by two other groups. The findings primarily confirm prior work rather than provide new insights, and the methodology has significant weaknesses that limit the study's impact. We have concerns about the scientific integrity of this work.

      Strengths:

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Weaknesses:

      (1) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      (2) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      (3) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      (4) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      (5) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      (6) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications:

      (1) Peng, Q., Xu, Y. & Yao, X. scRNA+ TCR-seq revealed dual TCR T cells antitumor response in the TME of NSCLC. J Immunother Cancer 12 (2024). https://doi.org/10.1136/jitc-2024-009376

      (2) Wang, H., Li, J., Xu, Y. & Yao, X. scRNA + BCR-seq identifies proportions and characteristics of dual BCR B cells in the peritoneal cavity of mice and peripheral blood of healthy human donors across different ages. Immun Ageing 21, 90 (2024). https://doi.org/10.1186/s12979-024-00493-6

      (3) Xu, Y. et al. scRNA+TCR-seq reveals the pivotal role of dual receptor T lymphocytes in the pathogenesis of Kawasaki disease and during IVIG treatment. Front Immunol 15, 1457687 (2024). https://doi.org/10.3389/fimmu.2024.1457687

      (4) Yuanyuanxu, Qipeng, Qingqingma & Yao, X. scRNA + TCR-seq revealed the dual TCR pTh17 and Treg T cells involvement in autoimmune response in ankylosing spondylitis. Int Immunopharmacol 135, 112279 (2024). https://doi.org/10.1016/j.intimp.2024.112279

      (5) Zhu, L. et al. scRNA-seq revealed the special TCR beta & alpha V(D)J allelic inclusion rearrangement and the high proportion dual (or more) TCR-expressing cells. Cell Death Dis 14, 487 (2023). https://doi.org/10.1038/s41419-023-06004-7

      (6) Zhu, L., Peng, Q., Wu, Y. & Yao, X. scBCR-seq revealed a special and novel IG H&L V(D)J allelic inclusion rearrangement and the high proportion dual BCR expressing B cells. Cell Mol Life Sci 80, 319 (2023). https://doi.org/10.1007/s00018-023-04973-8

      In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Appraisal of the Study's Aims and Conclusions:

      The authors set out to analyze dual TCR Tregs across tissues, but the lack of robust controls, incomplete analyses, and insufficient novelty limit the study's ability to achieve its aims. The results confirm prior findings but do not provide compelling evidence to support the broader claims about the characteristics or significance of dual TCR Tregs.

      Impact and Utility:

      While the study provides a descriptive analysis of dual TCR Tregs, its limited novelty and methodological weaknesses reduce its likely impact on the field. The methods and data could have utility for researchers interested in tissue-specific TCR gene usage, but additional rigor is required to make the findings broadly applicable.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript, "scRNA+TCR-seq Reveals the Proportion and Characteristics of Dual TCR Treg Cells in Mouse Lymphoid and Non-lymphoid Tissues" by Xu and Peng, et al. investigates whether co-expression of 2 T cell receptor (TCR) clonotypes can be detected in FoxP3+ regulatory CD4+ T cells (Tregs) and if it is associated with identifiable phenotypic effects. This paper presents data reanalyzing publicly available single-cell TCR sequencing and transcriptional analysis, convincingly demonstrating that dual TCR co-expression can be detected in Tregs, both in peripheral circulation as well as among Tregs in tissues. They then compare metrics of TCR diversity between single-TCR and dual TCR Tregs, as well as between Tregs in different anatomic compartments, finding the TCR repertoires to be generally similar though with dual TCR Tregs exhibiting a less diverse repertoire and some moderate differences in clonal expansion in different anatomic compartments. Finally, they examine the transcriptional profile of dual TCR Tregs in these datasets, finding some potential differences in the expression of key Treg genes such as Foxp3, CTLA4, Foxo3, Foxo1, CD27, IL2RA, and Ikzf2 associated with dual TCR-expressing Tregs, which the authors postulate implies a potential functional benefit for dual TCR expression in Tregs.

      Strengths:

      This report examines an interesting and potentially biologically significant question, given recent demonstrations that dual TCR co-expression is a much more common phenomenon than previously appreciated (approximately 15-20% of T cells) and that dual TCR co-expression has been associated with significant effects on the thymic development and antigenic reactivity of T cells. This investigation leverages large existing datasets of single-cell TCRseq/RNAseq to address dual TCR expression in Tregs. The identification and characterization of dual TCR Tregs is rigorously demonstrated and presented, providing convincing new evidence of their existence.

      Weaknesses:

      The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      The interpretations of the gene expression analyses are somewhat simplistic, focusing on the expression of single genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291).

    4. Reviewer #3 (Public review):

      Summary:

      This study addressed the TCR pairing types and CDR3 characteristics of Treg cells. By analyzing scRNA-seq and TCR-seq data, it claims that dual TCR Treg cells make up 10-20% of Treg cells in mouse lymphoid and non-lymphoid tissues and suggests that dual TCR Treg cells in different tissues may have complex biological functions.

      Strengths:

      The study addresses an interesting question of how dual-TCR-expressing Treg cells play roles in tissues.

      Weaknesses:

      This study is inadequate, particularly regarding data interpretation, statistical rigor, and the discussion of the functional significance of Dual TCR Tregs.

      Major Comments:

      (1) Definition of Dual TCR and Validity of Doublet Removal
      This study analyzes Treg cells with dual TCRs, but it is not clearly stated how the possibility of doublets was eliminated. The authors mention using DoubletFinder to detect doublets in the scRNA-seq data, but is this method alone sufficient?
      We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      (2) Inconsistency in the Proportion of Dual TCR T Cells in the Skin Across Figures
      In Figure 3D, the proportion of dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      (3) Issue of Cell Contamination
      In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity
      The manuscript states that single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      (5) Functional Evaluation of Dual TCR Tregs
      This study indicates gene expression differences among tissue-resident dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      (6) Appropriateness of Statistical Analysis
      When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. The authors should provide detailed information on the statistical tests applied to each analysis.
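      For per-gene comparisons such as those above, a false discovery rate correction is the usual companion to the per-gene test. A minimal sketch of the Benjamini-Hochberg adjustment follows, mirroring what R's p.adjust(method = "BH") computes; the gene p-values are hypothetical (imagine Wilcoxon rank-sum tests of single- versus dual-TCR Tregs within one tissue) and are not taken from the manuscript.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative only)
raw = {"Foxp3": 0.01, "CTLA4": 0.04, "Ikzf2": 0.03, "IL2RA": 0.005}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for gene, q in adjusted.items():
    print(f"{gene}: BH-adjusted p = {q:.3f}")
```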

    1. eLife Assessment

      This important study shows, for the first time, the structure and snapshots of the dynamics of the full-length soluble Angiotensin-I converting enzyme dimer. The combination of structural and computational approaches elucidates with convincing evidence the conformational dynamics of the complex and key regions mediating the conformational change. This work provides an example of how conformational heterogeneity can be used to gain insights into protein function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report four cryoEM structures (2.99 to 3.65 Å resolution) of the 180 kDa, full-length, glycosylated, soluble Angiotensin-I converting enzyme (sACE) dimer, with two homologous catalytic domains at the N- and C-terminal ends (ACE-N and ACE-C). ACE is a protease capable of effectively degrading Aβ. The four structures are C2 pseudo-symmetric homodimers and provide insight into sACE dimerization. These structures were obtained using discrete classification in cryoSPARC and show different combinations of open, intermediate, and closed states of the catalytic domains, resulting in varying degrees of solvent accessibility to the active sites.

      To deepen the understanding of the gradient of heterogeneity (from closed to open states) observed with discrete classification, the authors performed all-atom MD simulations and continuous conformational analysis of cryo-EM data using cryoSPARC 3DVA, cryoDRGN, and RECOVAR. cryoDRGN and cryoSPARC 3DVA revealed coordinated open-closed transitions across four catalytic domains, whereas RECOVAR revealed independent motion of two ACE-N domains, also observed with cryoSPARC-focused classification. The authors suggest that the discrepancy in the results of the different methods for continuous conformational analysis in cryo-EM could result from different approaches used for dimensionality reduction and trajectory generation in these methods.

      Strengths:

      This is an important study that shows, for the first time, the structure and the snapshots of the dynamics of the full-length sACE dimer. Moreover, the study highlights the importance of combining insights from different cryo-EM methods that address questions difficult or impossible to tackle experimentally while lacking ground truth for validation.

      Weaknesses:

      The open, closed, and intermediate states of ACE-N and ACE-C in the four cryo-EM structures from discrete classification were designated quantitatively (based on measured atomic distances on the models fitted into cryo-EM maps, Figure 2D). Unfortunately, atomic models were not fitted into cryo-EM maps obtained with cryoSPARC 3DVA, cryoDRGN, and RECOVAR, and the open/closed states in these cases were designated based on qualitative analysis. As the authors clearly pointed out, there are many other methods for continuous conformational heterogeneity analysis in cryo-EM. Among these methods, some allow analyzing particle images in terms of atomic models, like MDSPACE (Vuillemot et al., J. Mol. Biol. 2023, 435:167951), which result in one atomic model per particle image and can help in analyzing cooperativity of domain motions through measuring atomic distances or angular differences between different domains (Valimehr et al., Int. J. Mol. Sci. 2024, 25: 3371). This could be discussed in the article.
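      With one atomic model per particle, cooperativity between domain motions can be quantified by measuring a cleft-opening distance in each domain for every particle and correlating the two across particles. The sketch below assumes hypothetical reporter atoms and coordinates and does not reflect MDSPACE's own output format. A strong positive correlation would suggest coordinated opening, whereas a correlation near zero would be consistent with independent motion.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-particle models: coordinates (in Å) of two reporter atoms
# spanning the catalytic cleft of ACE-N and of ACE-C (illustrative values only).
particles = [
    {"ace_n": ((0, 0, 0), (18, 0, 0)), "ace_c": ((5, 5, 5), (5, 5, 22))},
    {"ace_n": ((0, 0, 0), (21, 0, 0)), "ace_c": ((5, 5, 5), (5, 5, 24.5))},
    {"ace_n": ((0, 0, 0), (24, 0, 0)), "ace_c": ((5, 5, 5), (5, 5, 28))},
    {"ace_n": ((0, 0, 0), (27, 0, 0)), "ace_c": ((5, 5, 5), (5, 5, 30))},
]

# Euclidean cleft distance per particle for each domain (math.dist, Python 3.8+)
n_open = [math.dist(*p["ace_n"]) for p in particles]
c_open = [math.dist(*p["ace_c"]) for p in particles]
r = pearson_r(n_open, c_open)
print(f"ACE-N vs ACE-C cleft-distance correlation: r = {r:.3f}")
```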

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable contribution to the field of ACE structural biology and dynamics by providing the first complete full-length dimeric ACE structure in four distinct states. The study integrates cryo-EM and molecular dynamics simulations to offer important insights into ACE dynamics. The depth of analysis is commendable, and the combination of structural and computational approaches enhances our understanding of the protein's conformational landscape. However, the strength of evidence supporting the conclusions needs refinement, particularly in defining key terms, improving structural validation, and ensuring consistency in data analysis. Addressing these points through major revisions will significantly improve the clarity, rigor, and accessibility of the study to a broader audience, allowing it to make a stronger impact in the field.

      Strengths:

      The integration of cryo-EM and MD simulations provides valuable insights into ACE dynamics, showcasing the authors' commitment to exploring complex aspects of protein structure and function. This is a commendable effort, and the depth of analysis is appreciated.

      Weaknesses:

      Several aspects of the manuscript require further refinement to improve clarity and scientific rigor as detailed in my recommendations for the authors.

    4. Reviewer #3 (Public review):

      Summary:

      Mancl et al. report four Cryo-EM structures of glycosylated and soluble Angiotensin-I converting enzyme (sACE) dimer. This moves forward the structural understanding of ACE, as previous analysis yielded partially denatured or individual ACE domains. By performing a heterogeneity analysis, the authors identify three structural conformations (open, intermediate open, and closed) that define the openness of the catalytic chamber and structural features governing the dimerization interface. They show that the dimer interface of soluble ACE consists of an N-terminal glycan and protein-protein interaction region, as well as C-terminal protein-protein interactions. Further heterogeneity mining and all-atom molecular dynamic simulations show structural rearrangements that lead to the opening and closing of the catalytic pocket, which could explain how ACE binds its substrate. These studies could contribute to future drug design targeting the active site or dimerization interface of ACE.

      Strengths:

      The authors make significant efforts to address ACE denaturation on cryo-EM grids, testing various buffers and grid preparation techniques. These strategies successfully reduce denaturation and greatly enhance the quality of the structural analysis. The integration of cryoDRGN, 3DVA, RECOVAR, and all-atom simulations for heterogeneity analysis proves to be a powerful approach, further strengthening the overall experimental methodology.

      Weaknesses:

      In general, the findings are supported by experimental data, but some experimental details and approaches could be improved. For example, cryoDRGN analysis is limited to the top 5 PCA components for ease of comparison with cryoSPARC 3DVA, but wouldn't expanding the cryoDRGN analysis to more components potentially identify further conformational states? The authors also say that they performed heterogeneity analysis on both datasets but only show data for one. The results for the first dataset should be shown and can be included in supplementary figures. In addition, the authors mention that they were not successful in performing cryoSPARC 3DFlex analysis, but they do not show these data or describe the conditions they used in the methods section. These data should be added and clearly described in the experimental section.

      Some cryo-EM data processing details are missing. Please add local resolution maps, box sizes, and Euler angle distributions and reference the initial PDB model used for model building.

    1. eLife Assessment

      This study presents a useful contribution to understanding zinc regulation of sperm physiology, specifically its inhibitory effects on the sperm-specific potassium channel Slo3. However, the evidence supporting the claims is incomplete, as critical experimental controls are lacking, key mechanistic aspects remain insufficiently explored, and experimental descriptions are often inadequate, making it difficult to fully assess the findings. Strengthening the study with additional electrophysiological recordings in sperm cells, improved imaging controls, and clearer methodological descriptions would enhance its impact and rigor.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Andriani et al. show intracellular zinc is exported from sperm during capacitation and suppresses the alkalinization-induced hyperpolarization in sperm. Intracellular zinc inhibits Slo3 current, which is enhanced by the co-expression of gamma subunit Lrrc52. Computational studies reveal that the Zn binding site on mSlo3 is located near E169 and E205, which are involved in the sustained zinc inhibition of mSlo3 current. The authors propose that intracellular zinc plays a key role in sperm capacitation by inhibiting the Slo3 channel.

      Strengths:

      Overall, the work appears well designed (e.g., the oocyte patch-clamp experiments) and clearly presented. Three-dimensional structural modeling and flooding simulations are also performed.

      Weaknesses:

      The simple mutagenesis analysis of E169 and E205 showed only a partial abolition of the zinc effect, so the molecular mechanism by which zinc inhibits Slo3 current is not yet fully established. The authors should consider performing more extensive experiments, such as creating double mutants or combination mutants involving other residues. Additionally, could other mechanisms explain the role of zinc in regulating the Slo3 current?

      While elucidating the mechanism of Slo3 regulation is interesting, there is substantial literature describing how zinc regulates channel function at the molecular level. Given this, the manuscript should provide a deeper understanding by clearly elucidating the molecular mechanism by which zinc regulates Slo3 current.

      The manuscript includes no experimental data on the mechanism of intracellular zinc export during sperm capacitation, despite its importance for the regulation of sperm function.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Andriani and colleagues are examining the potential role of Zn flux in sperm and its effect on Slo3 channels. This is an interesting question that is likely critical to how sperm function properly and Slo3 channels are a possible candidate for a downstream molecule that is impacted by Zn. In this paper, the authors use Zn imaging, sperm motility assays, and electrophysiology to show that Zn flux impacts sperm function. They then go on to look at the impact Zn has on Slo3 current and propose a binding site based on MD simulations. While the ideas are interesting, the experiments are not well described in many places making understanding the results very difficult. In addition, critical controls are missing throughout the paper.

      Strengths:

      The question of how Zn flux impacts membrane potential and sperm motility is an important one. Moreover, Slo3 presents an interesting candidate for the target of Zn regulation. The combination of methods used here also has the potential to uncover mechanisms of Zn regulation of Slo3.

      Weaknesses:

      Much of the paper lacks experimental description or detailed discussion, which makes interpretation quite difficult. Examples include:

      (1) Figure 1, particularly the Zn imaging, is not sufficiently described. How is the fluorescence intensity measured? In a representative ROI? Across the whole tail and head? Are the sperm immobile? If not, there is evidence from calcium measurements in cilia that motion artifacts can significantly distort these sorts of measurements. Were controls performed? Is the small amount of Zn seen in the tail above background?

      (2) The second half of Figure 1 is also not well described. What is the extracellular solution in the recordings? When you apply the Zn ionophore, do you expect influx or efflux? I assume efflux, based on the conclusions, but this should be discussed explicitly.

      (3) In Figure 2H, the Y axis is labeled "normalized current." Normalized to what? Why does neither curve end at 1? A better description of what this figure represents is needed.

      (4) The AlphaFold simulations are not well described. How many Zn binding sites were found? Are all of the histidine mutations in Figure 4 Supplement 1 the ones that were found?

      (5) There is no discussion of physiological intracellular Zn concentration. How much Zn is inside the sperm? How much is likely free vs. buffered? Is 100 µM a reasonable physiological concentration?

      There are a number of areas where the interpretation is not well supported by the data including:

      (6) You say in the Figure 4 supplement that "we did not observe any significant decrease in the percentage of current inhibition." But that is a rather misleading statement. There are large changes (increases) in the amount of zinc inhibition. These might be allosteric changes, but I don't think you can safely eliminate these as relevant Zn binding sites. Also, some of these mutations appear to allow at least some unbinding of Zn.

      (7) Following up on the above point, it seems unjustified to conclude that the D162S, E169A, and E205 mutants are part of the inhibitory binding site for Zn when the mutations have no effect on inhibition and affect only the washout. The mutations on the intracellular side also had an impact on the washout, so based on your data it seems equally likely that they are the critical residues.

      (8) Nowhere in the paper do you make the specific link between Zn flux and membrane hyperpolarization via Slo3. You show that Zn flux changes the ability of the sperm to hyperpolarize, and you show that Slo3 is inhibited by Zn, but the connection between the two is not demonstrated. There appears to be a specific Slo3 blocker. If you use it in sperm, does the Zn effect disappear?

      (9) In the second half of Figure 1, the authors suggest that there is "no hyperpolarization in 100 µM Zn." That is not really true. Hyperpolarization is reduced but not absent.

      (10) The claim that Lrrc52 with Slo3 shows higher current inhibition at pH 7.5 than at pH 8 is not well supported, because there are only 3 replicates in the pH 7.5 case. In addition, the claim in the text that 100 µM ZnCl2 "already inhibited mSlo3+Lrrc52 at pH7.5," in contrast to mSlo3 alone, is not tested statistically.

      In a number of places, better controls are needed.

      (11) How specific is this effect for Zn? Mg2+, for instance, is also a divalent cation present in the hundreds-of-µM range inside the cell. Does it exert the same effect? Each ion has its own preferred coordination geometry; does the binding predicted by MD show what you would expect for tetrahedral coordination of Zn? Did you test other divalent cations functionally or in silico?

      (12) For the VCF experiments, a significantly higher concentration of Zn (10 mM) was used. What is the reason for this? There is no discussion of how much a "puff" is. Assuming you are using the RNA injector, it is probably on the order of 50 nL or less. Assuming the volume of an oocyte is 1 µL, that would argue that the final concentration is 500 µM or higher. But this is also complicated by potential local effects of high Zn at the injection site, artifacts of injecting that much metal, and the fact that much of the Zn will likely be bound to other molecules inside the cell. Better controls are needed for this experiment.
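
      As a rough illustration of the estimate in point (12), assuming a puff of about 50 nL of 10 mM Zn into an oocyte of about 1 µL, with complete, uniform mixing and no binding (simplifying assumptions, not measured values), the total intracellular Zn would be:

      $$
      [\mathrm{Zn}]_{\mathrm{final}} \approx [\mathrm{Zn}]_{\mathrm{puff}} \cdot \frac{V_{\mathrm{puff}}}{V_{\mathrm{oocyte}} + V_{\mathrm{puff}}} = 10\ \mathrm{mM} \times \frac{50\ \mathrm{nL}}{1000\ \mathrm{nL} + 50\ \mathrm{nL}} \approx 0.48\ \mathrm{mM}
      $$

      This is on the order of 500 µM, roughly five times the 100 µM concentration discussed in the other points; local concentrations near the injection site could be much higher, and free Zn lower, for the reasons given in point (12).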

    4. Reviewer #3 (Public review):

      Summary:

      The study titled "Zinc is a Key Regulator of the Sperm-Specific K+ Channel (Slo3) Function" aims to investigate the role of intracellular zinc in sperm capacitation and its regulation of the sperm-specific Slo3 potassium channel. Capacitation is a crucial physiological process that enables sperm to fertilize an egg, and membrane hyperpolarization through Slo3 activation is a well-established event in this process. The authors propose that intracellular zinc dynamically decreases during capacitation and inhibits Slo3-mediated K⁺ currents, thereby playing a regulatory role in sperm function.

      Strengths:

      (1) Novel Contribution to Sperm Physiology.

      The study provides new insights into how zinc dynamics contribute to sperm capacitation, specifically through its direct inhibition of Slo3 activity.

      Previous research has focused primarily on extracellular zinc's effect on sperm function; this work expands the discussion to intracellular zinc regulation, an area with limited prior investigation.

      (2) Strong Electrophysiological Evidence.

      The study employs inside-out patch-clamp recordings in Xenopus oocytes to demonstrate zinc's direct inhibition of Slo3 currents.

      The observed slow dissociation of zinc from Slo3 suggests a long-lasting regulatory effect, adding to the understanding of ion channel modulation in sperm cells.

      (3) Molecular Mechanistic Insights

      Using Molecular Dynamics (MD) simulations and mutagenesis, the authors identify potential zinc-binding sites within Slo3's voltage-sensing domain (VSD), particularly E169 and E205.

      These computational predictions are supported by electrophysiological recordings, strengthening the argument that zinc directly binds and inhibits Slo3.

      (4) Physiological Relevance and Functional Implications

      The study suggests that zinc inhibition of Slo3 could contribute to sperm motility regulation during capacitation.

      The authors provide sperm motility assays as supporting evidence, showing that zinc chelation affects motility only after capacitation has begun, suggesting a dynamic role of intracellular zinc in the capacitation process.

      Weaknesses:

      While the study presents compelling electrophysiological data and molecular insights, there are several critical gaps that must be addressed before fully supporting the physiological relevance of the findings.

      (1) The authors should measure the effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.

      (2) Lack of Controls in Non-Capacitated Sperm

      The claim that zinc is exported from sperm during capacitation needs stronger experimental validation.

      The authors did not include a control group of non-capacitated sperm in key fluorescence imaging experiments, making it difficult to confirm that the observed zinc decrease is capacitation-specific rather than a general zinc redistribution process.

      To strengthen this conclusion, experiments should be performed in non-capacitating conditions to determine whether intracellular zinc levels remain unchanged.

      (3) Unclear Role of Zinc in Physiological Capacitation

      The study clearly demonstrates zinc inhibition of Slo3 but does not sufficiently establish how this affects capacitation at a functional level.

      Additional motility and capacitation markers should be analyzed to confirm that zinc influences sperm behavior beyond Slo3 inhibition.

      (4) Insufficient Data on Zinc-Slo3 Specificity

      The authors should consider using quinidine, a known washable Slo3 inhibitor, to confirm that zinc acts specifically on Slo3 channels rather than other endogenous ion channels.

      The study would benefit from including washout controls in the inside-out patch-clamp recordings, as seen in Figure 3-Supplement 1, to confirm that zinc inhibition is reversible or long-lasting.

      (5) Missing Discussion of Zinc's Role in CatSper Regulation

      The study focuses solely on Slo3 but does not mention CatSper, the principal Ca²⁺ channel essential for sperm capacitation.

      Zinc has been reported to inhibit CatSper activity, which could significantly impact sperm function.

      The discussion should address whether zinc's effect on Slo3 represents a broader regulatory mechanism influencing multiple ion channels during capacitation.
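
      Weakness (1) above asks for Slo3 currents normalized to cell capacitance. One common way to express that normalization, assuming currents are compared at the same test voltage (an assumption, not a detail specified in this review), is:

      $$
      J = \frac{I_{\mathrm{Slo3}}}{C_m}\ \left[\mathrm{pA/pF}\right], \qquad \text{fractional inhibition} = 1 - \frac{J_{\mathrm{Zn}}}{J_{\mathrm{control}}}
      $$

      Here $I_{\mathrm{Slo3}}$ is the Slo3 current (pA) at a given test voltage, $C_m$ is the membrane capacitance (pF) of the same cell, and $J_{\mathrm{Zn}}$ and $J_{\mathrm{control}}$ are the mean current densities with and without elevated intracellular zinc. Because $C_m$ scales with membrane area, this comparison corrects for cell-to-cell differences in size.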

      Final Assessment

      This work presents important findings on zinc regulation of Slo3 channels, supported by strong electrophysiological and molecular analyses. However, the physiological relevance of these findings remains unclear because of missing controls, and additional functional assays are needed. Addressing these issues would significantly enhance the manuscript's scientific rigor and impact.

    1. eLife Assessment

      This important study addresses a topic that is frequently discussed in the literature but is under-assessed, namely correlations among genome size, repeat content, and pathogenicity in fungi. Contrary to previous assertions, the authors found that repeat content is not associated with pathogenicity. Rather, pathogenic lifestyle was found to be better explained by the number of protein-coding genes, with other genomic features associated with insect association status. While the results are considered solid, confidence in the results would be deepened if the authors were to comprehensively account for potential biases stemming from the underlying data quality of the analyzed genomes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Lifestyles shape genome size and gene content in fungal pathogens" by Fijarczyk et al. presents a comprehensive analysis of a large dataset of fungal genomes to investigate what genomic features correlate with pathogenicity and insect associations. The authors focus on a single class of fungi, due to the diversity of lifestyles and availability of genomes. They analyze a set of 12 genomic features for correlations with either pathogenicity or insect association and find that, contrary to previous assertions, repeat content does not associate with pathogenicity. They discover that the number of protein-coding genes, as well as the total size of non-repetitive DNA, does correlate with pathogenicity. However, different features are associated with insect association. This work represents an important contribution to the attempts to understand what features of genomic architecture impact the evolution of pathogenicity in fungi.

      Strengths:

      The statistical methods appear to be properly employed and analyses thoroughly conducted. The manuscript is well written and the information, while dense, is generally presented in a clear manner.

      Weaknesses:

      My main concerns all involve the genomic data, how they were annotated, and the biases this could impart to the downstream analyses. The three main features I'm concerned with are sequencing technology, gene annotation, and repeat annotation.

      The collection of genomes is diverse and includes assemblies generated from multiple sequencing technologies including both short- and long-read technologies. Not only has the impact of the sequencing method not been evaluated, but the technology is not even listed in Table S1. From the number of scaffolds it is clear that the quality of the assemblies varies dramatically. This is going to impact many of the values important for this study, including genome size, repeat content, and gene number. Additionally, since some filtering was employed for small contigs, this could also bias the results.

      I have considerable worries that the gene annotation methods could impart biases that significantly affect the main conclusions. Only 5 reference training sets were used for the Sordariomycetes, and these are unequally distributed across the phylogeny. Augustus obviously performed less than ideally, as the authors reported that it under-annotated the genomes by 10%. I suspect it will have performed worse with increasing phylogenetic distance from the reference genomes. None of the species used for training were insect-associated, except for those generated by the authors for this study. As this feature was used to split the data, it could impact the results. Some major results, like exon length, rely explicitly on having good gene annotations, adding to these concerns. Looking manually at Ophiostoma in Table S1, there does seem to be a general trend that the genomes annotated with Magnaporthe grisea have shorter exons than those annotated with H294. I also wonder if many of the trends evident in Figure 5 are also the result of these biases. For example, clades H1 and G each contain a species used in training and show an increase in gene number.

      Unfortunately, the genomes available from NCBI will vary greatly in the quality of their repeat masking. While some will have been masked using custom libraries generated with software like RepeatModeler, others will probably have been masked with public databases like Repbase. As public databases are again biased towards certain species (Fusarium is well represented in Repbase, for example), this could have significant impacts on estimating repeat content. Additionally, even custom libraries can be problematic, as some software (like RepeatModeler) will include multicopy host genes, leading to bona fide genes being masked if proper filtering is not employed. A more consistent repeat masking pipeline would add to the robustness of the conclusions.

      To a lesser degree, I wonder what impact the use of representative genomes for a species has on the analyses. Some species vary greatly in genome size, repeat content, and architecture among strains. I understand that it is difficult to address in this type of analysis, but it could be discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on the genomic correlates of the transition to the pathogenic lifestyle in Sordariomycetes. The pathogenic lifestyle was found to be better explained by the number of genes, and in particular effectors and tRNAs, but this was modulated by the type of interacting host (insect or not insect) and the ability to be vectored by insects.

      Strengths:

      The main strength of this study lies in the size of the dataset, and the potentially high number of lifestyle transitions in Sordariomycetes.

      Weaknesses:

      The main weakness of the study is the lack of clarity in its conclusions.

      (1) This is due firstly to the presentation of the hypotheses. The introduction is poorly structured and contradictory in some places. It is also incomplete since, for example, fungus-insect associations are not mentioned in the introduction even though they are explicitly considered in the analyses.

      (2) The lack of clarity also stems from certain biases that are challenging to control in microbial comparative genomics. Indeed, defining lifestyles is complicated because many fungi exhibit different lifestyles throughout their life cycles (for instance, symbiotic phases interspersed with saprotrophic phases). In numerous fungi, the lifestyle referenced in the literature is merely the sampling substrate (such as wood or dung), which doesn't mean that this substrate is a crucial aspect of the life cycle. This issue is discussed by the authors, but they do not eliminate the underlying uncertainties.

    4. Reviewer #3 (Public review):

      Summary:

      This important study combines comparative genomics with other validation methods to identify the factors that mediate genome size evolution in Sordariomycetes fungi and their relationship with lifestyle. The study provides insights into genome architecture traits in this Ascomycete group, finding that, rather than transposons, the size of their genomes is often influenced by gene gain and loss. With an excellent dataset and robust statistical support, this work contributes valuable insights into genome size evolution in Sordariomycetes, a topic of interest to both the biological and bioinformatics communities.

      Strengths:

      This study is complete and well-structured.

      Bioinformatics analysis is always backed by good sampling and statistical methods. Also, the graphic part is intuitive and complementary to the text.

      Weaknesses:

      The work is great in general; I only had issues with interpreting Figure 1B.

      I struggled a bit to find the correspondence between this sentence: "Most genomic features were correlated with genome size and with each other, with the strongest positive correlation observed between the size of the assembly excluding repeats and the number of genes (Figure 1B)." and the Figure 1B. Perhaps highlighting the key p values in the figure could help.

    1. eLife Assessment

      The IBEX Knowledge-Base is an important tool that will enhance scientific collaboration by providing a centralized, community-driven resource for immunofluorescence imaging and reagent validation. Its detailed use cases, open-source design, and transparent reporting offer compelling evidence of its broad utility and impact in the life sciences. Overall, the resource sets a high standard as a blueprint for future community initiatives in reproducibility and standardization.

    2. Reviewer #1 (Public review):

      IBEX Knowledge Database

      Here, Anidi and colleagues present the IBEX knowledge base, a community tool developed to centralize knowledge and encourage its adoption by more users. The authors have done a fantastic job, and there is careful consideration of the many aspects of data management and FAIR principles. The manuscript needs no further work, as it is very well written and has detailed descriptions of data contribution as well as of the KB itself. Overall, it is a great initiative, especially the aim to inform about negative data and non-recommended reagents, which will positively affect the user community and scientific reproducibility.

      As such a large amount of work has been put into developing this community tool, it would be worth considering how it could serve other multiplex immunofluorescence methods (such as immunoSABER, 4i, etc.). For example, an extra tab could indicate the particular method that uses those reagents. This would also help as IBEX itself and related methods evolve in the future.

      The description of the software is rather minimal. In particular, there is software that was not developed specifically for IBEX but could be used for IBEX datasets (ASHLAR, WSIReg, VALIS, WARPY, QuPath, etc.). It would be nice to mention these.

      There is a concern about how negative data will be added, as no publication or peer-review process can back it up. Perhaps the particular conditions of the experiment should be very well described to allow future users to assess their validity. The proposed scheme, in which a reagent can be validated or recommended against by up to 4 different labs, should work well. It may be good to make sure that the validating researchers belong to different labs and are not simply different ORCIDs from the same group. The same applies to recommendations against a reagent.

      Keeping track of the protocol versions used is very valuable. Perhaps users should be able to validate independent versions, and it will be important to know how this information is kept.

      The final point I would make is that the need to fork a GitHub repository may deter some people from submitting data. For sporadic contributions, the authors could consider letting users either reach out to the main developers or use a submission form that helps users less experienced with the command line and GitHub, while still promoting contributions from the community.

      I am keen to see how the KB evolves and how it helps disseminate the use of this and other great techniques.

    3. Reviewer #2 (Public review):

      Summary:

      The paper introduces the IBEX Knowledge-Base (KB), a shared online resource designed to help scientists working with immunofluorescence imaging. It acts as a central hub where researchers can find and share information about reagents, protocols, and imaging methods. The KB is not static like traditional publications; instead, it evolves as researchers contribute new findings and refinements. A key highlight is that it includes results of both successful and unsuccessful experiments, helping scientists avoid repeating failed experiments and saving time and resources. The platform is built on open-access tools ensuring that the information remains available to everyone. Overall, the KB aims to collaboratively accelerate research, improve reproducibility, and reduce wasted effort in imaging experiments.

      Strengths:

      (1) The IBEX KB is built entirely on open-source tools, ensuring accessibility and long-term sustainability. This approach aligns with FAIR data principles and ensures that the KB remains adaptable to future advancements.

      (2) The KB also follows strict data organization standards, ensuring that all information about reagents and protocols is clearly documented and easy to find with little ambiguity.

      (3) The KB allows scientists to report both positive and negative results, reducing duplication of effort and speeding up the research process.

      (4) The KB is helpful for all researchers, but even more so for scientists in resource-limited settings. It provides guidance on finding affordable alternatives to expensive or discontinued reagents, making it easier for researchers with fewer resources to perform high-quality experiments.

      (5) The KB includes a community discussion forum where scientists can ask for advice, share troubleshooting tips, and collaborate with others facing similar challenges.

      Weaknesses:

      (1) The potential impact of the IBEX KB is very clear. However, the paper would benefit from further discussion of KB maintenance and outreach, and of how higher participation could be incentivized.

      (2) Use of resources like GitHub may limit engagement from non-coding members of the scientific community. Will there be alternative options like a user-friendly web interface to contribute more easily?

    4. Reviewer #3 (Public review):

      Summary:

      The authors have developed an interactive knowledge-base that uses crowdsourced information on antibodies and reagents for immunofluorescence imaging.

      Strengths:

      The authors provide an extremely relevant and needed platform for a community-based IF reagent and protocol knowledgebase, with a well-built interface. All the links on their website work, and the information provided (reagents, datasets, videos, and protocols) is very informative. The instructions for community researchers to contribute are clear, and they provide detailed instructions on how to technically proceed.

      Weaknesses:

      Reporting of the validation of antibodies could be improved. To increase public participation, they suggest reducing the amount of detail that one needs to submit to claim that something does not work. However, in our experience, this information is critical to share with the community.

    1. eLife Assessment

      This manuscript demonstrates that Oct4 overexpression synergizes with Notch inhibition (Rbpj knockout) to promote the conversion of adult murine Müller glia (MG) into bipolar cells. These findings are important as the authors used rigorous genetic lineage tracing (GLAST-CreER; Sun-GFP) to confirm that neurogenesis indeed originates from MGs, addressing a key issue in the field. The single-cell multiomic analyses are convincing, and while functional studies of MG-derived bipolar cells would strengthen the conclusions, they are beyond the scope of this study.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Le et al. aimed to explore whether AAV-mediated overexpression of Oct4 could induce neurogenic competence in adult murine Müller glia, a cell type that, unlike its counterparts in cold-blooded vertebrates, lacks regenerative potential in mammals. The primary goal was to determine whether Oct4 alone, or in combination with Notch signaling inhibition, could drive Müller glia to transdifferentiate into bipolar neurons, offering a potential strategy for retinal regeneration.

      The authors demonstrated that Oct4 overexpression alone resulted in the conversion of 5.1% of Müller glia into Otx2+ bipolar-like neurons by five weeks post-injury, compared to 1.1% at two weeks. To further enhance the efficiency of this conversion, they investigated the synergistic effect of Notch signaling inhibition by genetically disrupting Rbpj, a key Notch effector. Under these conditions, the percentage of Müller glia-derived bipolar cells increased significantly to 24.3%, compared to 4.5% in Rbpj-deficient controls without Oct4 overexpression. Similarly, in Notch1/2 double-knockout Müller glia, Oct4 overexpression increased the proportion of GFP+ bipolar cells from 6.6% to 15.8%.

      To elucidate the molecular mechanisms driving this reprogramming, the authors performed single-cell RNA sequencing (scRNA-seq) and ATAC-seq, revealing that Oct4 overexpression significantly altered gene regulatory networks. They identified Rfx4, Sox2, and Klf4 as potential mediators of Oct4-induced neurogenic competence, suggesting that Oct4 cooperates with endogenously expressed neurogenic factors to reshape Müller glia identity.

      Overall, this study aimed to establish Oct4 overexpression as a novel and efficient strategy to reprogram mammalian Müller glia into retinal neurons, demonstrating both its independent and synergistic effects with Notch pathway inhibition. The findings have important implications for regenerative therapies as they suggest that manipulating pluripotency factors in vivo could unlock the neurogenic potential of Müller glia for treating retinal degenerative diseases.

      Strengths:

      (1) Novelty: The study provides compelling evidence that Oct4 overexpression alone can induce Müller glia-to-bipolar neuron conversion, challenging the conventional view that mammalian Müller glia lacks neurogenic potential.

      (2) Technological Advances: The combination of Müller glia-specific labeling with genetically modified mouse lines, AAV GFAP promoter-mediated gene expression, single-cell RNA-seq, and ATAC-seq provides a comprehensive mechanistic dissection of glial reprogramming.

      (3) Synergistic Effects: The finding that Oct4 overexpression enhances neurogenesis in the absence of Notch signaling introduces a new avenue for retinal repair strategies.

      Weaknesses:

      (1) In this study, the authors did not perform a comprehensive functional assessment of the bipolar cells derived from Müller glia to confirm their neuronal identity and functionality.

      (2) Demonstrating visual recovery in a bipolar cell-deficiency disease model would significantly enhance the translational impact of this work and further validate its therapeutic potential.

    3. Reviewer #2 (Public review):

      Summary:

      The authors harness single-cell RNAseq data from zebrafish and mice to identify Oct4 as a candidate driver of neurogenesis. They then use adeno-associated virus vectors to show that while Oct4 overexpression alone converts rare adult Müller glia (MG) to bipolar cells, it synergizes with Notch pathway inhibition (achieved by Cre-mediated knockout of a floxed Rbpj allele) to drive this neurogenesis. Importantly, they genetically lineage-mark adult MG using a GLAST-CreER transgene and a Sun-GFP reporter, so that any MG-derived non-MG cells can be identified unambiguously. This is crucial because several high-profile papers made erroneous claims using short promoters in the viral delivery vector itself to mark MG, but those promoters are leaky and mark other non-MG cell types, making it impossible to definitively state whether the manipulations studied were actually causing neurogenesis or merely resulted from expression in pre-existing neurons. Once the authors establish Oct4 + Rbpj-KO synergy, they use snRNAseq/ATACseq to identify known and novel transcription factors that could play a role in driving neurogenesis.

      Strengths:

      The system to mark MG is stringent, so the authors are studying transdifferentiation, not artifactual effects due to leaky viral promoters. The synergy between Oct4 and Notch pathway blockade is notable. The single-cell results add the potential involvement of new players such as Rfx4 in adult-MG-neurogenesis.

      Weaknesses:

      The existing version is difficult to read due to an unusually high number of text errors (e.g., references to the wrong figure panels). A fuller explanation for the fraction of non-MG cells seen in control scRNAseq assays is required, particularly because the neurogenic trajectory that is enhanced in the Oct4/Rbpj-KO context is also evident in the control retina. Claims regarding the involvement of transcription factors in adult neurogenesis (such as Rfx4) need to be toned down unless they are backed up with functional data. It is possible that such factors are important, but equally, they may have no role or a redundant role, and without functional tests, it's impossible to say one way or the other.

      Overall, the authors achieved what they set out to do, and have made new insights into how neurogenesis can be stimulated in MG. Ultimately, a major long-term goal in the field is to replace lost photoreceptors as this is most relevant to many human visual disorders, and while this paper (like all others before it) does not generate rods or cones, it opens new strategies to coax MG to form a related neuronal cell type. Their approach underscores the benefits of using a gold-standard approach for lineage tracing.

    1. eLife Assessment

      This manuscript presents important information as to how adolescent alcohol exposure (AIE) alters pain behavior and relevant neurocircuits, with convincing data. The manuscript focuses on how AIE alters the basolateral amygdala to PFC (PV interneurons) to periaqueductal gray circuit, resulting in feed-forward inhibition. The manuscript is a detailed study of the role of alcohol exposure in regulating the circuit and reflexive pain; however, the role of the PV interneurons in mechanistically modulating this feed-forward circuit could be more strongly supported.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript by Obray et al., the authors show that adolescent ethanol exposure increases mechanical allodynia in adulthood. Additionally, they show that BLA-mediated inhibition of the prelimbic cortex is reduced, resulting in increased excitability in neurons that then project to the vlPAG. This effect was mediated by BLA inputs onto PV interneurons. The primary finding of the manuscript is that these AIE-induced changes further impact acute pain processing in the BLA-PrL-vlPAG circuit, although behavioral readouts after inducing acute pain did not differ between AIE rats and controls. These results provide novel insights into how AIE can have long-lasting effects on pain-related behaviors and neurophysiology.

      The manuscript was very well written and the experiments were rigorously conducted. The inclusion of both behavioral and neurophysiological circuit recordings was appropriate and compelling. The authors analyzed their data extensively and considered how many different factors may influence physiological activity and downstream behavior. The attention to SABV and appropriate controls was well thought out. The Discussion provided novel ideas for how to think about AIE and chronic pain, and proposed several interesting mechanisms. This was a very well executed set of experiments.

      Comments on revisions:

      The authors have addressed the concerns raised by the reviewers. Excellent work!

    3. Reviewer #2 (Public review):

      Summary:

      The study by Obray et al. entitled "Adolescent alcohol exposure promotes mechanical allodynia and alters synaptic function at inputs from the basolateral amygdala to the prelimbic cortex" investigated how adolescent intermittent ethanol exposure (AIE) affects the BLA -> PL circuit, with an emphasis on PAG-projecting PL neurons, and how AIE changes mechanical and thermal nociception. The authors found that AIE increased mechanical, but not thermal, nociception, and that an injection of an inflammatory agent did not produce changes in an ethanol-dependent manner. Physiologically, a variety of AIE-specific effects were found in PL neuron firing at BLA synapses, suggestive of AIE-induced alterations in neurotransmission at BLA-PVIN synapses.

      Strengths:

      This was a comprehensive examination of the effects of AIE on this neural circuit, with an in-depth dissection of the various neuronal connections within the PL.

      Sex was included as a biological variable, yet there were few to no sex differences in AIE's effects, suggestive of similar adaptations in males and females.

      Comments on revisions:

      The authors addressed the reviews of the first submission, which substantially strengthened the conclusions of the study, including by acknowledging unanswered questions for future studies to address.

    4. Reviewer #3 (Public review):

      Summary:

      Obray et al. investigate the long-lasting effects of adolescent intermittent ethanol (AIE) in rats, a model of alcohol dependence, on a neural circuit within prefrontal cortex. The studies are focused on inputs from the basolateral amygdala (BLA) onto parvalbumin (PV) interneurons and pyramidal cells that project to the periaqueductal gray (PAG). The authors found that AIE increased BLA excitatory drive onto parvalbumin interneurons and increased BLA feedforward inhibition onto PAG-projecting neurons.

      Strengths:

      Fully powered cohorts of male and female rodents are used, and the design incorporates both AIE and an acute pain model. The authors used several electrophysiological techniques to assess synaptic strength and excitability from a few complementary angles. The design and statistical analysis are sound, and the evidence supporting synaptic changes following AIE is convincing. The authors have also revised the Discussion to assimilate the findings within prior work from their lab and others.

      Weaknesses:

      (1) There is incomplete evidence supporting some of the conclusions drawn in this manuscript. The authors claim the changes in feedforward inhibition onto pyramidal cells are due to the changes in parvalbumin interneurons; however, the authors did not determine that PV cells mediate the feedforward BLA op-IPSCs and changes following AIE (this would require a manipulation to reduce/block PV-IN activity). This limitation in results and interpretation is important because prior work shows BLA-PFC feedforward IPSCs can be driven by somatostatin cells. Cholecystokinin cells are also abundant basket cells in PFC and have recently been shown to mediate feedforward inhibition from the thalamus and ventral hippocampus, so it is also possible that CCK cells are involved in the effects observed here.

      (2) The authors conclude that the changes in this circuit likely mediate long-lasting hyperalgesia, but this is not addressed experimentally. In some ways, the focused nature of the study is a benefit in this regard, as there is extensive prior literature linking this circuit with pain behaviors in alternative models (e.g., SNI), but it should be noted that these studies have not assessed hyperalgesia stemming from prior alcohol exposure. While the current studies do not include a causative behavioral manipulation, the strength of the association between BLA-PL-PAG function and hyperalgesia could be bolstered with current data if relationships were detected between electrophysiological properties and hyperalgesia.

      (3) It should be noted that asEPSC frequency can also reflect changes in the number of functional/detectable synapses. This measurement is also fairly susceptible to inter-animal differences in ChR2 expression. There are other techniques for assessing presynaptic release probability (e.g., PPR, MK-801 sensitivity) that would improve the interpretation of these studies if that is intended to be a point of emphasis.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Major Concerns/Public Review

      Comment 1: There is a mild disconnect between behavioral readout (reflexive pain) and neural circuits of interest (emotional). Considering that this circuit is likely engaged in the aversiveness of pain, it would have been interesting to see how carrageenan and/or AIE impacted non-reflexive pain measures. Perhaps this would reveal a potentiated or dysregulated phenotype that matches the neurophysiological changes reported. However, this critique does not take away from the value of the paper or its conclusions.

      We agree that including measures of non-reflexive pain would enhance future studies and potentially reveal a phenotype that is closely related to the observed changes in neurophysiology.

      Minor Concerns/Recommendations

      Comment 1: There are a few minor grammatical errors in the text, mostly in the captions. A close read should be able to identify these errors.

      We have fixed what grammatical errors we found.

      Reviewer #2:

      Major Concerns/Public Review

      No major concerns.

      Minor Concerns/Recommendations

      Comment 1: If pain sensitivity was assessed at 3 time points post carrageenan administration, why were these data averaged? Were there no differences between the time points? The data from the 3 time points should be presented, either in a figure, table, or supplementary materials.

      We averaged the pain sensitivity data across the 3 time points following carrageenan administration because we were trying to present this data in a more concise manner. Pain sensitivity did change over time following carrageenan administration. We have now included the unaveraged data in figure 2 (panels D, F, H, and J).

      Comment 2: For the optically-evoked EPSCs and IPSCs, were the peak amplitudes the max responses that could be obtained? If not, how were levels of ChR2 expression or light intensity controlled for?

      The peak amplitudes for EPSCs and IPSCs were half the maximal response that could be evoked by optical stimulation. The AMPA and NMDA currents were maximal responses as prior literature indicated some PVINs have small NMDA currents, and we wanted to ensure these currents would be detected reliably. We updated our methods section to include this information in the voltage clamp recordings section.

      Comment 3: In the example traces for the aEPSC experiment, the figure legend states that the "+" symbol indicates an asynchronous event. However, there are several "|" or "-" symbols in the figure. Perhaps this is an issue with the resolution of the figure and those are supposed to be "+"s.

      We have increased the resolution of the figures to ensure that the markings of the asynchronous events display properly. We apologize for not noticing that these symbols were not displayed correctly in the original figures included in the manuscript.

      Comment 4: For the von Frey and the Hargreaves test, were animals acclimated to the apparatus in the days leading up to the first test, or was the 5-minute pre-test the only acclimation that was done? This information needs to be provided. If the latter, there is concern that the animals did not fully acclimate to the apparatus and handling prior to testing, which should be taken into consideration in the interpretation of the behavioral analyses.

      The rats underwent handling once a day for three days prior to the first von Frey and Hargreaves tests. On the day prior to the first test, rats were acclimated to the von Frey and Hargreaves apparatuses. The acclimation period consisted of a 15-min exposure to the von Frey apparatus and a 30-min exposure to the Hargreaves apparatus for each animal. This information has been added to the revised methods section under the assessment of mechanical and thermal sensitivity heading.

      Reviewer #3:

      Major Concerns/Public Review

      Comment 1: There is incomplete evidence supporting some of the conclusions drawn in this manuscript. The authors claim that the changes in feedforward inhibition onto pyramidal cells are due to the changes in parvalbumin interneurons, but evidence is not provided to support that idea. PV cells do not fire action potentials spontaneously in slices (nor do they receive high levels of BLA activity while at rest in slices). It is possible that spontaneous GABA release from PV cells is increased after AIE, but the authors did not report sIPSC frequency. Second, the authors did not determine that PV cells mediate the feedforward BLA op-IPSCs and changes following AIE (this would require manipulation to reduce/block PV-IN activity). This limitation in results and interpretation is important because prior work shows BLA-PFC feedforward IPSCs can be driven by somatostatin cells. Cholecystokinin cells are also abundant basket cells in PFC and have recently been shown to mediate feedforward inhibition from the thalamus and ventral hippocampus, so it is also possible that CCK cells are involved in the effects observed here.

      The hypothesis that adolescent alcohol exposure could change spontaneous GABA release from PVINs is an interesting one that merits future exploration. Unfortunately, as the focus of this manuscript was on circuit-specific alterations in synaptic function, this experiment is somewhat outside the scope of the paper: sIPSCs and mIPSCs are not circuit-specific measures of GABA activity and would not isolate spontaneous release from the GABA interneurons that receive input from the BLA. Nevertheless, a future study investigating spontaneous GABA release from PVINs in the PrL would be a valuable complement to the present study.

      While we did not directly manipulate PVINs to demonstrate that decreased oIPSC amplitude at PrL<sup>PAG</sup> neurons following AIE is due solely to changes in PVINs, it is notable that both the intrinsic excitability of PVINs and the BLA-driven E/I balance at PVINs were reduced following AIE. These changes would be consistent with decreased PVIN output onto PrL<sup>PAG</sup> neurons. However, we agree that this does not preclude the possibility that changes in SST or CCK interneurons contribute to the observed decrease in BLA-driven inhibition at PrL<sup>PAG</sup> neurons following AIE. As such, we have altered the wording in the discussion to indicate that reduced BLA-driven feedforward inhibition of PrL<sup>PAG</sup> neurons may be related, at least in part, to the observed changes in PVINs.

      Comment 2: The authors conclude that the changes in this circuit likely mediate long-lasting hyperalgesia, but this is not addressed experimentally. In some ways, the focused nature of the study is a benefit in this regard, as there is extensive prior literature linking this circuit with pain behaviors in alternative models (e.g., SNI), but it should be noted that these studies have not assessed hyperalgesia stemming from prior alcohol exposure. While the current studies do not include a causative behavioral manipulation, the strength of the association between BLA-PL-PAG function and hyperalgesia could be bolstered by current data if there were relationships detected between electrophysiological properties and hyperalgesia. Have the authors assessed this? In addition, this study is limited by not addressing the specificity of synaptic adaptations to the BLA-PL-PAG circuit. For instance, PL neurons send reciprocal projections to BLA and send direct projections to the locus coeruleus (which the authors note is an important downstream node of the PAG for regulating pain).

      We have not assessed correlations between the electrophysiological properties and hyperalgesia. We feel that future studies using DREADDs to perform cell-type- and circuit-specific manipulations can better address the involvement of this circuitry in long-lasting hyperalgesia following AIE. With respect to the circuit specificity of the observed changes, we have previously evaluated the effects of AIE on pyramidal neurons projecting from the PrL to the BLA (PrL<sup>BLA</sup>). We found that following AIE there was no change in the intrinsic excitability of these neurons. In addition, the amplitude and frequency of sEPSCs and sIPSCs onto PrL<sup>BLA</sup> neurons were unchanged. While these results did not assess whether the BLA-PrL-BLA circuit undergoes synaptic adaptations similar to those observed in the BLA-PrL-vlPAG circuit, it is notable that the intrinsic excitability of PrL<sup>BLA</sup> neurons was unchanged following AIE. This indicates that the effects of AIE on the intrinsic excitability of pyramidal neurons in the PrL may be circuit specific. We agree that it would be interesting to study the effect of AIE on PrL neurons that project to the locus coeruleus; however, because of the well-defined role of the BLA-PrL-vlPAG circuit in pain, we chose to evaluate this circuit first.

      Comment 3: I have some concerns about methodology. First, 5-ms is a long light pulse for optogenetics and might induce action-potential-independent release. Does TTX alone block op-EPSCs under these conditions? Second, PV cells express a high degree of calcium-permeable AMPA receptors, which display inward rectification at positive holding potentials due to blockade by intracellular polyamines. Typically, this is controlled/promoted by including spermine in the internal solution, but I do not believe the authors did that. Nonetheless, the relatively low A/N ratios for this cell type suggest that CP-AMPA receptors were not sampled with the +40/+40 design of this experiment, raising concerns that the majority of AMPA receptors in these cells were not sampled during this experiment. Finally, it should be noted that asEPSC frequency can also reflect changes in the number of functional/detectable synapses. This measurement is also fairly susceptible to inter-animal differences in ChR2 expression. There are other techniques for assessing presynaptic release probability (e.g., PPR, MK-801 sensitivity) that would improve the interpretation of these studies if that is intended to be a point of emphasis.

      When we included TTX but not 4-AP, we did not observe any optically evoked responses, so we do not believe that the 5-ms pulse induced action-potential-independent release in these experiments. With respect to the second point, we did not include spermine in the internal solution for the AMPA/NMDA recordings in PVINs, and it is possible that endogenous polyamines interfered with recording CP-AMPA receptors in the +40/+40 design. To address this concern, we recalculated the AMPA/NMDA ratio for PVINs using an optically evoked AMPA current recorded while holding the cell at -70 mV. These data were collected at the end of the +40/+40 recording protocol because we were interested in assessing whether the ratio of the +40/-70 AMPA current differed across treatment conditions. As we observed no difference in the +40/-70 AMPA current ratio across treatment groups, we had originally used the +40 AMPA current to calculate the AMPA/NMDA ratio for PVINs, so that the method for calculating this ratio was uniform for PVINs and PrL<sup>PAG</sup> neurons. The methods, results, and Fig. 10 have been updated to reflect the recalculated AMPA/NMDA ratio for PVINs. Notably, the change in how the AMPA/NMDA ratio was calculated altered only the significance of the AIE x carrageenan interaction. Originally, this interaction displayed a trend toward significance (p = 0.0501); when the recalculated AMPA/NMDA ratio was analyzed, this interaction term became significant (p = 0.0131). We have also added the +40/-70 AMPA ratio to Fig. 10, as it might be of interest.
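      As a rough illustration of the two calculations described in the response above, the sketch below computes an AMPA/NMDA ratio from either the +40 mV or the -70 mV AMPA peak, plus the +40/-70 AMPA ratio. It assumes peak amplitudes have already been measured from the traces and compares them in absolute value; the function names and all amplitudes are hypothetical placeholders, not the authors' analysis code or data.

```python
"""Hypothetical sketch: AMPA/NMDA and +40/-70 AMPA ratios from peak amplitudes.

All amplitudes below are made-up placeholders (pA), not data from the study.
"""


def ampa_nmda_ratio(ampa_peak_pa: float, nmda_peak_pa: float) -> float:
    """Return |AMPA| / |NMDA|.

    Absolute values make the inward (-70 mV) and outward (+40 mV)
    sign conventions irrelevant.
    """
    if nmda_peak_pa == 0:
        raise ValueError("NMDA peak amplitude must be non-zero")
    return abs(ampa_peak_pa) / abs(nmda_peak_pa)


def ampa_pos40_neg70_ratio(ampa_pos40_pa: float, ampa_neg70_pa: float) -> float:
    """Return |AMPA at +40 mV| / |AMPA at -70 mV|.

    With a linear I-V relation reversing near 0 mV, driving force alone
    would give roughly 40/70 (about 0.57); values well below that are
    consistent with inward rectification, e.g. by calcium-permeable AMPA
    receptors blocked by intracellular polyamines.
    """
    if ampa_neg70_pa == 0:
        raise ValueError("AMPA peak at -70 mV must be non-zero")
    return abs(ampa_pos40_pa) / abs(ampa_neg70_pa)


if __name__ == "__main__":
    # Hypothetical peak amplitudes for one PVIN (pA).
    ampa_pos40 = 60.0    # outward AMPA current at +40 mV
    ampa_neg70 = -150.0  # inward AMPA current at -70 mV
    nmda_pos40 = 100.0   # NMDA current at +40 mV

    print(ampa_nmda_ratio(ampa_pos40, nmda_pos40))         # 0.6
    print(ampa_nmda_ratio(ampa_neg70, nmda_pos40))         # 1.5
    print(ampa_pos40_neg70_ratio(ampa_pos40, ampa_neg70))  # 0.4
```

      In this made-up example, the -70 mV AMPA peak yields a larger AMPA/NMDA ratio than the +40 mV peak. The two ratios are not interchangeable, since they combine currents recorded at different holding potentials and driving forces.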

      Finally, the point regarding aEPSC frequency reflecting not only release probability but also the number of functional/detectable synapses is an important consideration. For this manuscript, we intentionally selected aEPSC frequency for this reason. As the BLA to PrL projection continues to mature during adolescence, the number of BLA contacts onto GABA neurons in the PrL increases. Thus, we thought it possible that AIE would alter the number of detectable BLA inputs onto PVINs. We acknowledge that, as this measure is sensitive to differences in ChR2 expression between animals/slices, it can be difficult to interpret. We also agree that in the future it would be beneficial to include either PPR or MK-801 sensitivity to improve interpretability.

      Comment 4: In a few places in the manuscript, results following voluntary drinking experiments (especially Salling et al. and Sicher et al.) are discussed without clear distinction from prior work in vapor models of dependence.

      We have altered the manuscript to specifically note where voluntary drinking was used rather than vapor models.

      Comment 5: Discussion (lines 416-420). The authors describe some differing results with the literature and mention that the maximum current injection might be a factor. To me, this does not seem like the most important factor and potentially undercuts the relevance of the findings. Are the cells undergoing a depolarization block? Did the authors observe any changes in the rheobase or AP threshold? On the other hand, a more likely difference between this and previous work is that the proportion of PAG-projecting cells is relatively low, so previous work in L5 likely sampled many types of pyramidal cells that project to other areas. This is a key example where additional studies by the current group assessing a distinct or parallel set of pyramidal cells would aid in the interpretation of these results and help to place them within the existing literature. Along these lines, PAG-projecting neurons are Type A cells with significant hyperpolarization sag. Previous studies showed that adolescent binge drinking stunts the development of HCN channel function and ensuing hyperpolarization sag. Have the authors observed this in PAG-projecting cells? Another interesting membrane property worth exploring with the existing data set is the afterhyperpolarization / SK channel function.

      In discussing the maximum current injection as a factor in differing results on intrinsic excitability, we were principally considering how the additional data points increase the power of the analysis and thus the likelihood of detecting an effect. In focusing on this, however, we ignored other relevant and interesting factors that we should also have discussed. Additional analyses examining HCN and SK channel function have now been added to the manuscript and incorporated into the results section under the heading Adolescent Intermittent Ethanol Exposure and Carrageenan Enhanced the Intrinsic Excitability of Prelimbic Neurons Projecting to the Ventrolateral Periaqueductal Gray. We have also modified the third paragraph in the discussion to add additional context. Additional information on the biophysical properties of the neurons has been added to Figure 4.

      Minor Concerns/Recommendations

      Comment 1: Subheadings are vague ("Analysis of..."). They should be rephrased in active voice to describe key findings.

      The subheadings have been rephrased to describe key findings.

      Comment 2: Consider altering or consolidating the figure layout for clarity. For instance, it would be helpful for aEPSCs to be near the AMPA and NMDA experiments. The feedforward IPSCs could also be with the PV-IN recordings. This would be helpful in developing a cohesive picture of key findings. To that end, a working model or graphical abstract would be helpful.

      It doesn’t appear that this journal allows graphical abstracts, but we have added a model that summarizes the principal findings in the discussion.

      Comment 3: There are a lot of statistics punctuating the text in the Results. It can be hard to parse at times.

      We considered moving the statistics to tables, but this became unwieldy.

      Comment 4: The Discussion is quite long (10 paragraphs). Suggest consolidating to 3-4 most salient points.

      We appreciate this comment and have made some edits to the discussion, albeit without consolidating it to only 3-4 points.

    1. eLife Assessment

      This study provides a novel and critically important insight into the long-term use of DREADDs to modulate neuronal activity in nonhuman primates. The methods are compelling, demonstrating the peak dynamics and the subsequent stability of chemogenetic effects for 1.5 years, informing experimental designs and interpretation of highly impactful chemogenetic studies in macaques. The protocols, data, and outcomes can serve as guidelines for future experiments. Therefore, the findings will be of significant interest to the field of chemogenetics and may also be of broader interest to researchers and clinicians who seek to utilize viral vectors and/or related genetic technologies.

    2. Reviewer #1 (Public review):

      Summary:

      Inhibitory hM4Di and excitatory hM3Dq DREADDs are currently the most commonly utilized chemogenetic tools in the field of nonhuman primate research, but there is a lack of available information regarding the temporal aspects of virally-mediated DREADD expression and function. Nagai et al. investigated the longitudinal expression and efficacy of DREADDs to modulate neuronal activity in the macaque model. The authors demonstrate that both hM4Di and hM3Dq DREADDs reach peak expression levels after approximately 60 days and are stably expressed for a period of at least 1.5 years in the macaque brain. During this period, DREADDs effectively modulated neuronal activity, as evidenced by a variety of measures, including behavioural testing, functional imaging, and/or electrophysiological recording. Notably, some of the data suggest that DREADD expression may decline after two years. This is a novel finding and has important implications for the utilization of this technology for long-term studies, as well as its potential therapeutic applications. Lastly, the authors highlight that peak DREADD expression may be significantly influenced by the choice of viral titer and the expressed protein tag, emphasizing the importance of careful design and selection of viral constructs for neuroscientific research. This study represents a critical step in the field of chemogenetics, setting the scene for future development and optimization of this technology.

      Strengths:

      The longitudinal approach of this study provides important preliminary insights into the long-term utility of chemogenetics, which has not yet been thoroughly explored.

      The data presented are novel and inclusive, relying on well-established in vivo imaging methods, as well as behavioral and immunohistochemical techniques. The conclusions made by the authors are generally supported by a combination of these techniques. In particular, the utilization of in vivo imaging as a non-invasive method is translationally relevant and likely to make an impact in the field of chemogenetics, such that other researchers may adopt this method of longitudinal assessment in their own experiments. Rigorous standards have been applied to the datasets, and the appropriate controls have been included where possible.

      The number of macaque subjects (20) from which data were available is also notable. Behavioral testing was performed in 11 subjects, FDG-PET in 5, electrophysiology in 1, and [11C]DCZ-PET in 15. This is an impressive accumulation of work that will surely be appreciated by the growing community of researchers using chemogenetics in nonhuman primates.

      The implication that chemogenetic effects can be maintained for up to 1.5-2 years, followed by a gradual decline beyond this period, is an important development in knowledge. The limited duration of DREADD expression may present an obstacle in the translation of chemogenetic technology as a potential therapeutic tool, and it will be of interest for researchers to explore whether this limitation can be overcome. This study therefore represents a key starting point upon which future research can build.

      Weaknesses:

      Overall, the conclusions of the paper are mostly supported by the data but may be overstated in some cases, and some details are also missing or not easily recognizable within the figures. The provision of additional information and analyses would be valuable to the reader and may even benefit the authors' interpretation of the data.

      The conclusion that DREADD expression gradually decreases after 1.5-2 years is only based on a select few of the subjects assessed; in Figure 2, it appears that only 3 hM4Di cases and 2 hM3Dq cases are assessed after the 2-year timepoint. The observed decline appears consistent within the hM4Di cases, but not for the hM3Dq cases (see Figure 2C: the AAV2.1-hSyn-hM3Dq-IRES-AcGFP line is increasing after 2 years.)

      Given that individual differences may affect expression levels, it would be helpful to see additional labels on the graphs (or in the legends) indicating which subject and which region are being represented for each line and/or data point in Figure 1C, 2B, 2C, 5A, and 5B. Alternatively, for Figures 5A and B, an accompanying table listing this information would be sufficient.

      While the authors comment on several factors that may influence peak expression levels, including serotype, promoter, titer, tag, and DREADD type, they do not comment on the volume of injection. The range in volume used per region in this study is between 2 and 54 microliters, with larger volumes typically (but not always) being used for cortical regions like the OFC and dlPFC, and smaller volumes for subcortical regions like the amygdala and putamen. This may weaken the claim that there is no significant relationship between peak expression level and brain region, as volume may be considered a confounding variable. Additionally, because of the possibility that larger volumes of viral vectors may be more likely to induce an immune response, which the authors suggest as a potential influence on transgene expression, not including volume as a factor of interest seems to be an oversight.

      The authors conclude that vectors encoding co-expressed protein tags (such as HA) led to reduced peak expression levels, relative to vectors with an IRES-GFP sequence or with no such element at all. While interesting, this finding does not necessarily seem relevant for the efficacy of long-term expression and function, given that the authors show in Figures 1 and 2 that peak expression (as indicated by a change in binding potential relative to non-displaced radioligand, or ΔBPND) appears to taper off in all or most of the constructs assessed. The authors should take care to point out that the decline in peak expression should not be confused with the decline in longitudinal expression, as this is not clear in the discussion; i.e. the subheading, "Factors influencing DREADD expression," might be better written as, "Factors influencing peak DREADD expression," and subsequent wording in this section should specify that these particular data concern peak expression only.

    3. Reviewer #2 (Public review):

      Summary

      This paper reports histological, PET imaging, functional, and behavioural data evaluating the longevity of AAV2 infection in multiple brain areas of macaques in the context of DREADD experiments. The central aim is to provide unprecedented information about how long hM4Di or hM3Dq receptors remain expressed and effective in modulating brain function after vector injection. The data show peak expression 40 to 60 days after vector injection, stable expression for up to 1.5 years for hM4Di, and that hM3Dq expression remained mostly at 75% of peak after a year, declining to 50% after 2 years. DREADDs effectively modulated neuronal activity and behaviour for approximately two years, as evaluated with behavioural testing, neural recordings, or FDG-PET. A statistical evaluation revealed that vector titers, DREADD type, and tags contribute to the measured peak level of DREADD expression.

      The article presents a thorough discussion of the limitations and specificities of chemogenetic approaches in monkeys.

      Strength

      These are unique data in non-human primates (NHP), an animal model that not only features physiological and immunological characteristics similar to those of humans but also contributes to neurobiological functional studies on a long timescale, with experiments spanning months or years. This evaluation of the long-term efficacy of DREADDs will be very important for all laboratories using this approach in NHP, and also for future use of such approaches in experimental therapies. The longevity estimates are based on multiple approaches, including behavioural and neurophysiological ones, thus providing information on the functional efficacy of DREADD expression.

      Performing such an evaluation requires specific tools, like PET imaging, to which very few monkey laboratories worldwide have access. This study was done by the laboratory that developed the radiotracer used here, [11C]DCZ, which binds selectively to DREADDs and provides, using PET, quantitative in vivo measures of DREADD expression. This study and its data should thus be a reference in the field, providing estimates for planning future chemogenetic experiments.

      Publishing databases of experimental outcomes in NHP DREADD experiments is crucial for the community because such experiments are rare, expensive, and long. It contributes to refining experiments and reducing the overall number of animals used in the field.

      Weaknesses

      This study is a meta-analysis of several experiments performed in one lab. The upside is that it combines a large amount of data that might not have been published individually; the downside is that not all experiments were planned or matched, creating a lot of unexplained variance in the data. The authors nonetheless used these data judiciously, but planned and organized multicentric experiments would likely provide more information and help test more parameters, including some related to inter-individual variability and particular genetic constructs.

    4. Reviewer #3 (Public review):

      Summary

      This manuscript, from the developers of the novel DREADD-selective agonist DCZ (Nagai et al., 2020), utilizes a unique dataset in which multiple PET scans were performed in a large number of monkeys, including baseline scans before AAV injection, scans 30-120 days post-injection, and scans periodically over the course of the prolonged experiments. These were used to assess the short- and long-term dynamics of DREADD expression in vivo and to associate DREADD expression with the efficacy of manipulating neuronal activity or behavior. The goal was to provide critical insights into the practicality and design of multi-year studies using chemogenetics and to elucidate factors affecting expression stability.

      Strengths are systematic quantitative assessment of the effects of both excitatory and inhibitory DREADDs, quantification of both the short-term and longer-term dynamics, a wide range of functional assessment approaches (behavior, electrophysiology, imaging), and assessment of factors affecting DREADD expression levels, such as serotype, promoter, titer (concentration), tag, and DREADD type.

      Minor weaknesses are related to a few instances of suboptimal phrasing, and some room for improvement in time course visualization and quantification. These would be easily addressed in a revision.

      These findings will undoubtedly have a very significant impact on the rapidly growing but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Overall, the conclusions of the paper are mostly supported by the data but may be overstated in some cases, and some details are also missing or not easily recognizable within the figures. The provision of additional information and analyses would be valuable to the reader and may even benefit the authors' interpretation of the data.

      We thank the reviewer for the thoughtful and constructive feedback. We are pleased that the reviewer found the overall conclusions of our paper to be well supported by the data, and we appreciate the suggestions for improving figure clarity and interpretive accuracy. Below we address each point raised:

      The conclusion that DREADD expression gradually decreases after 1.5-2 years is based on only a select few of the subjects assessed; in Figure 2, it appears that only 3 hM4Di cases and 2 hM3Dq cases are assessed after the 2-year timepoint. The observed decline appears consistent within the hM4Di cases, but not for the hM3Dq cases (see Figure 2C: the AAV2.1-hSyn-hM3Dq-IRES-AcGFP line is increasing after 2 years).

      We agree that our interpretation should be stated more cautiously, given the limited number of cases assessed beyond the two-year timepoint. In the revised manuscript, we will clarify in both the Results and Discussion that the observed decline is based on a subset of animals. We will also state that while a consistent decline was observed in hM4Di-expressing monkeys, the trajectory for hM3Dq expression was more variable, with at least one case showing an increase in signal beyond two years.

      Given that individual differences may affect expression levels, it would be helpful to see additional labels on the graphs (or in the legends) indicating which subject and which region are being represented for each line and/or data point in Figure 1C, 2B, 2C, 5A, and 5B. Alternatively, for Figures 5A and B, an accompanying table listing this information would be sufficient.

      We thank the reviewer for these helpful suggestions. In response, we will revise the relevant figures as noted in the “Recommendations for the authors”, including simplifying visual encodings and improving labeling. We will also provide a supplementary table listing the animal ID and brain regions for each data point shown in the graphs.

      While the authors comment on several factors that may influence peak expression levels, including serotype, promoter, titer, tag, and DREADD type, they do not comment on the volume of injection. The range in volume used per region in this study is between 2 and 54 microliters, with larger volumes typically (but not always) being used for cortical regions like the OFC and dlPFC, and smaller volumes for subcortical regions like the amygdala and putamen. This may weaken the claim that there is no significant relationship between peak expression level and brain region, as volume may be considered a confounding variable. Additionally, because of the possibility that larger volumes of viral vectors may be more likely to induce an immune response, which the authors suggest as a potential influence on transgene expression, not including volume as a factor of interest seems to be an oversight.

      We thank the reviewer for raising this important issue. We agree that injection volume is a potentially confounding variable. In response, we will conduct an exploratory analysis including volume as an additional factor. We will also expand the Discussion to highlight the need for future systematic evaluation of injection volume, especially in relation to immune responses or transduction efficiency in different brain regions.

      The authors conclude that vectors encoding co-expressed protein tags (such as HA) led to reduced peak expression levels, relative to vectors with an IRES-GFP sequence or with no such element at all. While interesting, this finding does not necessarily seem relevant for the efficacy of long-term expression and function, given that the authors show in Figures 1 and 2 that peak expression (as indicated by a change in binding potential relative to non-displaced radioligand, or ΔBPND) appears to taper off in all or most of the constructs assessed. The authors should take care to point out that the decline in peak expression should not be confused with the decline in longitudinal expression, as this is not clear in the discussion; i.e. the subheading, "Factors influencing DREADD expression," might be better written as, "Factors influencing peak DREADD expression," and subsequent wording in this section should specify that these particular data concern peak expression only.

      We appreciate this important clarification. In response, we will revise the title to “Factors influencing peak DREADD expression levels”, and we will specify that our analysis focused on peak ΔBPND values around 60 days post-injection. We will also explicitly distinguish these findings from the later-stage changes in expression seen in the longitudinal PET data in both the Results and Discussion sections.

      Reviewer #2 (Public review):

      Weaknesses

      This study is a meta-analysis of several experiments performed in one lab. The upside is that it combines a large amount of data that might not have been published individually; the downside is that the experiments were not planned and matched in advance, which creates a lot of unexplained variance in the data. The authors nevertheless used this variance judiciously, but planned and coordinated multicentric experiments would likely provide more information and help test more parameters, including some related to inter-individual variability and particular genetic constructs.

      We thank the reviewer for bringing this important point to our attention. We fully agree that the retrospective nature of our dataset, compiled from multiple studies conducted within a single laboratory, introduces variability due to differences in constructs, injection sites, and timelines. While this reflects the real-world constraints of long-term NHP research, we acknowledge the need for more standardized approaches. We will add a statement in the revised Discussion emphasizing that future multicenter and harmonized studies would be valuable for systematically examining specific parameters and inter-individual variability.

      Reviewer #3 (Public review):

      Minor weaknesses are related to a few instances of suboptimal phrasing, and some room for improvement in time course visualization and quantification. These would be easily addressed in a revision.

      These findings will undoubtedly have a very significant impact on the rapidly growing but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

      We thank the reviewer for the positive assessment of our manuscript and for the constructive suggestions noted in the “Recommendations for the authors”. In response, we will carefully review and revise the manuscript to improve visualization and quantification.

    1. eLife Assessment

      This important study provides compelling insights into the differential impact of intrinsic and synaptic conductances on circuit robustness using computational models of the pyloric network from the crustacean stomatogastric ganglion. The results demonstrate that model networks are more sensitive to perturbations in intrinsic conductances than in synaptic conductances, highlighting the critical role of intrinsic plasticity in stabilizing neuronal networks. These findings underscore the importance of intrinsic plasticity, a crucial yet often overlooked factor in neuronal dynamics. The generality of these conclusions should be tested across diverse networks and functions.

    2. Reviewer #1 (Public review):

      The paper by Fournier et al. investigates the sensitivity of neural circuits to changes in intrinsic and synaptic conductances. The authors use models of the stomatogastric ganglion (STG) to compare how perturbations to intrinsic and synaptic parameters impact network robustness. Their main finding is that changes to intrinsic conductances tend to have a larger impact on network function than changes to synaptic conductances, suggesting that intrinsic parameters are more critical for maintaining circuit function.

      The paper is well-written, and the results are compelling. The authors addressed most of the minor comments I had and improved the manuscript.

      However, it remains unclear how general the results are and what the underlying mechanism is. Regarding generality, the authors changed the title and added a sentence in the discussion. At this point, they do not claim generality beyond the specific function they explore in the STG circuit. While this is acceptable, I still believe the paper would be much more insightful if it provided a more general statement and investigated the mechanism behind why, in their hands, synaptic parameters appear more resilient to changes than intrinsic parameters.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Fournier et al. investigates the sensitivity of neural circuits to changes in intrinsic and synaptic conductances. The authors use models of the stomatogastric ganglion (STG) to compare how perturbations to intrinsic and synaptic parameters impact network robustness. Their main finding is that changes to intrinsic conductances tend to have a larger impact on network function than changes to synaptic conductances, suggesting that intrinsic parameters are more critical for maintaining circuit function.

      The paper is well-written and the results are compelling, but I have several concerns that need to be addressed to strengthen the manuscript. Specifically, I have two main concerns:

      (1) It is not clear from the paper what the mechanism is that leads to the importance of intrinsic parameters over synaptic parameters.

      (2) It is not clear how general the result is, both within the framework of the STG network and its function, and across other functions and networks. This is crucial, as the title of the paper appears very general.

      I believe these two elements are missing in the current manuscript, and addressing them would significantly strengthen the conclusions. Without a clear understanding of the mechanism, it is difficult to determine whether the results are merely anecdotal or if they depend on specific details such as how the network is trained, the particular function being studied, or the circuit itself. Additionally, understanding how general the findings are is vital, especially since the authors claim in the title that "Circuit function is more robust to changes in synaptic than intrinsic conductances," which suggests a broad applicability.

      I do not wish to discourage the authors from their interesting result, but the more we understand the mechanism and the generality of the findings, the more insightful the result will be for the neuroscience community.

      Major comments

      (1) Mechanism

      While the authors did a nice job of describing their results, they did not provide any mechanism for why synaptic parameters are more resilient to changes than intrinsic parameters. For example, from Figure 5, it seems that there is mainly a shift in the sensitivity curves. What is the source of this shift? Can something be changed in the network, the training, or the function to control it? This is just one possible way to investigate the mechanism, which is lacking in the paper.

      (2) Generality of the results within the framework of the STG circuit

      (a) The authors did show that their results extend to multiple networks with different parameters (the 100 networks). However, I am still concerned about the generality of the results with respect to the way the models were trained. Could it be that something in the training procedure makes the synaptic parameters more robust than intrinsic parameters? For example, the fact that duty cycle error is weighted as it is in the cost function (large beta) could potentially affect the parameters that are more important for yielding low error on the duty cycle.

      (b) Related to (a), I can think of a training scheme that could potentially improve the resilience of the network to perturbations in the intrinsic parameters rather than the synaptic parameters. For example, in machine learning, methods like dropout can be used to make the network find solutions that are robust to changes in parameters. Thus, in principle, the results could change if the training procedure for fitting the models were different, or by using a different optimization algorithm. It would be helpful to at least mention this limitation in the discussion.

      (3) Generality of the function

      The authors test their hypothesis based on the specific function of the STG. It would be valuable to see if their results generalize to other functions as well. For example, the authors could generate non-oscillatory activity in the STG circuit, or choose a different, artificial function, maybe with different duty cycles or network cycles. It could be that this is beyond the scope of this paper, but it would be very interesting to characterize which functions are more resilient to changes in synapses, rather than intrinsic parameters. In other words, the authors might consider testing their hypothesis on at least another 'function' and also discussing the generality of their results to other functions in the discussion.

      (4) Generality of the circuit

      The authors have studied the STG for many years and are pioneers in their approach, demonstrating that there is redundancy even in this simple circuit. This approach is insightful, but it is important to show that similar conclusions also hold for more general network architectures, and if not, why. In other words, it is not clear if their claim generalizes to other network architectures, particularly larger networks. For example, one might expect that the number of parameters (synaptic vs intrinsic) might play a role in how resilient the function is with respect to changes in the two sets of parameters. In larger models, the number of synaptic parameters grows as the square of the number of neurons, while the number of intrinsic parameters increases only linearly with the number of neurons. Could that affect the authors' conclusions when we examine larger models?

      In addition, how do the authors' conclusions depend on the "complexity" of the nonlinear equations governing the intrinsic parameters? Would the same conclusions hold if the neurons had fewer intrinsic conductances or simplified ion channel models? All of these are interesting questions that the authors should at least address in the discussion.

      We thank Reviewer #1 for their valuable input. We agree with the reviewer that the generality of the results may have been overstated. To address this, we changed the title of the manuscript to make it more specific to rhythmic circuits, and we included a sentence to this effect in the discussion.

      (1) We were more interested in knowing which set of conductances is more robust in a population of models, rather than in a mechanism. If such a mechanism exists, it will be the subject of a different study.

      (2) (a) It is impossible to explore the whole parameter space of these models. Our method to find circuits will leave subsets of circuits out of the study. Our sole goal in constructing the model database was that the activities were similar but the conductances were different.  (b) Of course one could devise a cost function targeting circuits that are more or less robust to changes in one parameter. Whether those exist is a different matter. This is not what we intended to do.

      (3) For this, we would need a different circuit that produces non-oscillatory activity. A normal pyloric rhythm circuit always produces oscillatory activity unless it is “crashed” either by temperature or by perturbations, but even in that case, because we do not have a proper “control” activity (circuits crash in different ways), we would not be able to use the same approach.

      We think it is a valuable idea to perform a similar study in another small circuit with nonoscillatory (or rhythmic) activities. 

      (4) We did not explore the issue of how our results generalize to larger networks as it would be pure speculation. It could be potentially interesting to do a similar sensitivity analysis with a large network trained to perform a simple task. Our understanding is that many large trained networks are extremely sensitive to perturbations in synaptic weights, at the same time that the intrinsic properties of neurons in ANN are typically oversimplified and identical across units. 

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents an important exploration of how intrinsic and synaptic conductances affect the robustness of neural circuits. This is a well-motivated question, and overall, the manuscript is well written and has a logical progression.

      The focus on intrinsic plasticity as a potentially overlooked factor in network dynamics is valuable. However, while the stomatogastric ganglion (STG) serves as a well-characterized and valuable model for studying network dynamics, its simplified structure and specific dynamics limit the generalizability of these findings to more complex systems, such as mammalian cortical microcircuits.

      Strengths:

      Clean and simple model. Simulations are carefully carried out and parameter space is searched exhaustively.

      Weaknesses:

      (1) Scope and Generalizability:

      The study's emphasis on intrinsic conductance is timely, but with its minimalistic and unique dynamics, the STG model poses challenges when attempting to generalize findings to other neural systems. This raises questions regarding the applicability of the results to more complex circuits, especially those found in mammalian brains and those where the dynamics are not necessarily oscillating. This is even more so (as the authors mention) because synaptic conductances in this study are inhibitory, and changes to their synaptic conductances are limited (as the driving force for the current is relatively low).

      (2) Challenges in Comparison:

      A significant challenge in the study is the comparison method used to evaluate the robustness of intrinsic versus synaptic perturbations. Perturbations to intrinsic conductances often drastically affect individual neurons' dynamics, as seen in Figure 1, where such changes result in single spikes or even the absence of spikes instead of the expected bursting behavior. This affects the input to downstream neurons, leading to circuit breakdowns. For a fair comparison, it would be essential to constrain the intrinsic perturbations so that each neuron remains within a particular functional range (e.g., maintaining a set number of spikes). This could be done by setting minimal behavioral criteria for neurons and testing how different perturbation limits impact circuit function.

      (3) Comparative Metrics for Perturbation:

      Another notable issue lies in the evaluation metrics for intrinsic and synaptic perturbations. Synaptic perturbations are straightforward to quantify in terms of conductance, but intrinsic perturbations involve more complexity, as changes in maximal conductance result in variable, nonlinear effects depending on the gating states of ion channels. Furthermore, synaptic perturbations focus on individual conductances, while intrinsic perturbations involve multiple conductance changes simultaneously. To improve fairness in comparison, the authors could, for example, adjust the x-axis to reflect actual changes in conductance or scale the data post hoc based on the real impact of each perturbation on conductance. For example, in Figure 6, the scale of the intrinsic panels (e.g., g_Na-bar) is 500 times larger than that of the synaptic conductance panels (the row below), but the sodium conductance reaches its maximum perhaps only briefly during each spike, and for most of the time it is close to zero. Moreover, varying the sodium conductance over a range of 0-250 for such a nonlinear current is, in many ways, unthinkable. Has anyone ever measured two neurons with such a difference in sodium conductance? So, how can we tell that the ranges of the perturbations allow a meaningful comparison?

      We thank Reviewer #2 for their comments. We agree with both reviewers about scope and generalizability. We changed the title of the manuscript and included a sentence in the discussion to address this. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 63: Tau_b is tau in Fig 1B? What is the 'network period' tau_n? Both are defined in the methods, but it would be good to clarify here and also in the figure.

      This was fixed. Tau_b is the bursting period, and we indicated it in the figure. Network period means the period of the network activity; this was rewritten.

      (2) Line 74: "maximal conductances g_i." What is i? I can imagine what you meant, but it would be good to clarify the notation.

      There are multiple different currents. The letter ‘i’ is an index over the different current types. The sentence now reads as follows:

      "The activity of the network depends on the values of the maximal conductances ḡ_i, where i is an index corresponding to the different current types (Na, CaS, CaT, Kd, KCa, A, H, Leak, IMI)"

      (3) Line 78: "conductances are changed by a random amount." How much is the "random amount"? In percentages? 

      We fixed this sentence. This is how it reads now, 

      "The blue trace in Figure 1C corresponds to the activity of the same model when each of the intrinsic conductances is changed by a random amount within a range between 0 (completely removing the conductance) and twice its starting value, 2×ḡ_i, or equivalently, an increment of 100%."

      Similarly, in Line 87: "by a similar percent." Can you provide Figures 1E-F in percentages? Are the percentages the same?

      The phrase “by a similar percent” is misleading and unimportant. Thank you, we removed it.
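      For readers who want to reproduce the kind of perturbation described above, here is a minimal Python sketch. It assumes that each intrinsic maximal conductance is redrawn independently and uniformly between 0 and twice its starting value; the manuscript text quoted above says only "a random amount within a range", so the uniform distribution is an assumption. The function name perturb_conductances and the starting conductance values are hypothetical and illustrative, not taken from the study.

```python
import random


def perturb_conductances(gbar, rng=None):
    """Return a copy of `gbar` in which each maximal conductance is redrawn
    uniformly from [0, 2 * g] (assumed distribution; 0 removes the current,
    2 * g is a 100% increment)."""
    if rng is None:
        rng = random.Random()
    return {name: rng.uniform(0.0, 2.0 * g) for name, g in gbar.items()}


# Hypothetical, illustrative starting conductances for one model neuron
# (current types as listed in the response above; values are made up).
gbar = {"Na": 1000.0, "CaS": 10.0, "CaT": 5.0, "Kd": 100.0,
        "KCa": 20.0, "A": 50.0, "H": 0.05, "Leak": 0.01, "IMI": 0.5}

perturbed = perturb_conductances(gbar, random.Random(0))
for name in gbar:
    print(f"{name}: {gbar[name]:g} -> {perturbed[name]:.4g}")
```

      Seeding a dedicated random.Random instance keeps each perturbed model reproducible without touching the global random state.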

      (4) Line 113: Why did you add I_MI? Is it important for the results or for the conclusions?

      I_MI was added because the current is known to be there and it is not more or less important for the results or conclusions than any other current. 

      (5) Line 117: "We used a genetic algorithm to generate a database." Confusing. I guess you meant that you used genetic algorithms to optimize the cost function.

      Thank you for this comment. We fixed this sentence, see below. 

      “We used a genetic algorithm to optimize the cost function, and in this way generated a database of N = 100 models with different values of maximal conductances (Holland 88)."

      (6) Line 136: "The models in the database were constrained to produce solutions whose features were similar to the experimental measurements." Why are there differences in the features? Is this an optimization issue? I thought you wanted to claim that there are degenerate solutions, that is, solutions where the parameters are different, but the output is identical. Please clarify.

      The concept of degenerate solutions does not imply that the solutions are mathematically identical. In biology, it means that the solutions provide very similar functions but do so with different underlying parameters (in this case, maximal conductances). The activity of the pyloric network is slightly different across animals, and it also changes over time within the same individual. Variation across models reflects individual variation in the biological circuit, and it is a strength of our modeling approach. The circuits function equally well because they produce biologically realistic patterns, although the details of their activity patterns differ.

      (7) Line 139: "distributed (p > 0.05)." What test did you use? N? Similarly, at Lines 218, 241, 239, etc. Please be more rigorous when reporting statistical tests.

      Thank you. We now specify the test we utilized every time we report a p value. 

      (8) Line 143: "In this case, it is not possible to identify clusters, suggesting that there are no underlying relationships between the features in the model database." The 2D plot is misleading, as the features are in 11 dimensions. Claims should be about the 11D space, not projections onto 2D. In fact, I don't think you can rule out correlations between the features based on the 2D plots. For example, shouldn't there be correlations between the on and off phases and the burst durations?

      Thank you. These sentences were confusing and were removed. We added the following sentence to the end of that paragraph.

      "Because the feature vectors are similar, their t-SNE projections do not form groups or clusters."

      (9) Related to this, I don't understand this sentence: "Even though the conductances are broadly distributed over many-fold ranges, the output of the circuits results in tight yet uncorrelated distributions.”

      This sentence is confusing and was removed. 

      (10) Line 158: Repetition of Line 152: Figure 3 shows the currentscapes of each cell in two model networks.

      We removed the second instance of the repeated sentences. 

      (11) Line 160: "yet the activity of the networks is similar." Well, they are similar, but not identical. I could also say that the currentscapes are 'similar'. This should be better quantified and not left as a qualitative description.

      While this is an interesting point it will not change the results and conclusions of the present study. The network models are different since the values of their maximal conductances are distributed over wide ranges.  

      (12) Line 218: midpoint parameter? Is that b - the sharpness? Please be consistent. Regarding the mechanism (see above) - any ideas what leads to this shift in the sensitivity curves between the two types of parameters?

      Yes, we made a mistake. ‘b’ is the midpoint parameter. This was fixed in the text, thank you.
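      To make the reviewer's question about a "shift in the sensitivity curves" concrete, the sketch below assumes a decreasing logistic form for the fraction of perturbed circuits that remain pyloric as a function of perturbation size. The exact sigmoid used in the manuscript is not reproduced here, so this form, the width parameter a, the function name sensitivity_curve, and the example midpoints are all assumptions; only the role of b as the midpoint comes from the response above. In this form, a larger b shifts the curve to the right, meaning larger perturbations are tolerated before half of the circuits lose pyloric activity.

```python
import math


def sensitivity_curve(delta: float, b: float, a: float) -> float:
    """Assumed logistic sensitivity curve.

    delta: perturbation size (e.g. 1.0 = up to a 100% change)
    b:     midpoint, where half of the perturbed circuits remain pyloric
    a:     width (> 0); smaller a gives a steeper transition
    Returns the fraction of circuits expected to remain pyloric.
    """
    return 1.0 / (1.0 + math.exp((delta - b) / a))


# Hypothetical midpoints, chosen only to illustrate a rightward shift.
b_intrinsic, b_synaptic, width = 0.3, 0.8, 0.1
for delta in (0.2, 0.5, 1.0):
    print(delta,
          round(sensitivity_curve(delta, b_intrinsic, width), 3),
          round(sensitivity_curve(delta, b_synaptic, width), 3))
```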

      (13) Figure 6 illustrates why synaptic parameters are more robust, but it is not quantified. Why not provide a quantitative measure for this claim? For example, calculate the colored area within the white square for each pair, for each cell, and for each model. Show that these measures can predict improved robustness for one model over another and for synaptic vs. intrinsic parameters.

      The ratio of areas of the colored and non-colored regions in the whole hyperboxes (for intrinsic and synaptic conductances) is the number reported in the y-axis of the sensitivity curves when we include all conductances (and not just a pair). 

      We computed the ratios of the colored/noncolored areas in all panels in figure 6 and now report these quantities as follows, 

      "We computed the proportions of areas of the white boxes that correspond to pyloric activity. These values for the intrinsic conductances panels are PD = 0.58, LP = 0.50, PY = 0.49, and the proportions for the synaptic conductances panels are PDPY = 0.62, PDLP = 0.87, and LPPD = 0.94. The occupied areas for synaptic conductances are larger than in the intrinsic conductances panels, consistent with our finding that the circuits’ activities are more robust to changes in synaptic conductances versus changes in intrinsic conductances."

      "As before, we computed the proportion of areas of pyloric activity within the white boxes: PD = 0.61, LP = 0.55, PY = 0.52, and the proportions for the synaptic conductances panels are PDPY = 0.88, PDLP = 0.87, and LPPD = 0.83. These results provide an intuition of the complexities of GP. Not only are these regions hard-to-impossible to characterize in one circuit, but they are also different across circuits.”
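      The area proportions reported above can be thought of as the fraction of points in a two-dimensional conductance box that yield pyloric activity. The Python sketch below estimates such a fraction on a regular grid of cell centers. It is an illustration under stated assumptions, not the authors' procedure: the predicate is_pyloric stands in for "simulate the circuit at these two conductance values and classify the output as pyloric or not", and the toy triangular predicate in the usage example is hypothetical.

```python
from typing import Callable, Tuple


def pyloric_area_fraction(is_pyloric: Callable[[float, float], bool],
                          x_range: Tuple[float, float],
                          y_range: Tuple[float, float],
                          n: int = 100) -> float:
    """Fraction of an n-by-n grid of cell centers inside the box
    x_range x y_range for which `is_pyloric(x, y)` is True."""
    x0, x1 = x_range
    y0, y1 = y_range
    hits = 0
    for i in range(n):
        x = x0 + (i + 0.5) * (x1 - x0) / n
        for j in range(n):
            y = y0 + (j + 0.5) * (y1 - y0) / n
            if is_pyloric(x, y):
                hits += 1
    return hits / (n * n)


# Hypothetical stand-in for "simulate and classify": points in the
# lower-left triangle of the unit box are treated as pyloric, so the
# estimate should be close to one half.
frac = pyloric_area_fraction(lambda x, y: x + y <= 1.0, (0.0, 1.0), (0.0, 1.0))
print(f"estimated pyloric fraction: {frac:.3f}")
```

      A grid keeps the estimate deterministic; for the full multi-dimensional hyperboxes behind the sensitivity curves, random sampling of conductance vectors scales better than a grid.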

      (14) Does the sign of the synaptic weights affect the conclusions?

      We did not explore this issue because all chemical synapses in this network are inhibitory.

      (15) Line 492: typo: deltai.

      We fixed this.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 301 - you can also add Williams and Fletcher 2019 Neuron.

      We added the reference. Thank you. 

      (2) Line 316 - this is a strange comment, as these are exactly the regions in which intrinsic plasticity was shown (e.g., Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. "Compartmentalized dendritic plasticity and input feature storage in neurons." Nature 452.7186 (2008): 436-441).

      We did not understand this comment. 

      (3) I found only one citation of the work of Turrigiano, and the most relevant work is mentioned only in the Methods section. This is odd, as her work directly addresses how synaptic conductance perturbations result in changes in intrinsic conductances.

      We included more references to the work of Turrigiano to provide more context. 

      Desai, Niraj S., Lana C. Rutherford, and Gina G. Turrigiano. "Plasticity in the intrinsic excitability of cortical pyramidal neurons." Nature Neuroscience 2, no. 6 (1999): 515-520.

      Desai, Niraj S., Sacha B. Nelson, and Gina G. Turrigiano. "Activity-dependent regulation of excitability in rat visual cortical neurons." Neurocomputing 26 (1999): 101-106.

      (4) Line 329 - The list of citations is very limited regarding studies of excitation/inhibition balance, which began well before 2009. Please give some of the credit to the classics.

      We included the following additional references.

      Van Vreeswijk, Carl, and Haim Sompolinsky. "Chaos in neuronal networks with balanced excitatory and inhibitory activity." Science 274, no. 5293 (1996): 1724-1726.

      Rubin, Ran, L. F. Abbott, and Haim Sompolinsky. "Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity." Proceedings of the National Academy of Sciences 114, no. 44 (2017): E9366-E9375.

      Wang, Xiao-Jing. "Macroscopic gradients of synaptic excitation and inhibition in the neocortex." Nature Reviews Neuroscience 21, no. 3 (2020): 169-178.

      Lo, Chung-Chuan, Cheng-Te Wang, and Xiao-Jing Wang. "Speed-accuracy tradeoff by a control signal with balanced excitation and inhibition." Journal of Neurophysiology 114, no. 1 (2015): 650-661.

      (5) In Figure 1B, why does it say 'OFF' when the neuron is spiking?

      The label indicates the interval of time elapsed between the first spike in the PD neuron (taken as a reference), and the last spike in the burst (PD off). 

      Summary of changes to figures:

      Figure 1:

      Fixed labels indicating bursting period and burst duration.

      Figure 5:

      Added labels in panels C and D specifying the symbol corresponding to the sigmoidal parameter.

      Additional changes

      We changed the title of the manuscript as follows:

      "Rhythmic circuit function is more robust to changes in synaptic than intrinsic conductances."

      We included the following sentence at the end of the Discussion section:

      "We believe our results will hold for other rhythmic circuits and will be relevant for similar studies in other circuits with more complex functions.”

      We realized we made a mistake with the units for maximal conductances. They were incorrectly expressed in nS (nanosiemens) in the figure labels and correctly expressed in microsiemens in the Methods section. This was fixed, and conductances are now expressed in microsiemens consistently throughout the manuscript.

    1. eLife Assessment

      This observational study from the UK Biobank provides an important investigation into the associations between menopausal hormone therapy (mHT) and brain health in a large, population-based cohort of females in the UK. The study uses a convincing model of brain aging based on an open-source algorithm. While some modest adverse brain health characteristics were associated with current mHT use and older age at last use, the findings support neither a general neuroprotective effect of mHT nor severe adverse effects on the female brain. This work addresses a topic of great importance: the effects of menopausal hormone therapy on the brain need to be better understood in order to provide individualized, effective medical support to women going through menopause.

    2. Reviewer #1 (Public review):

      Summary:

      This study takes a detailed approach to understanding the effect of menopausal hormone therapy (MHT) on brain aging in females. Neuroimaging data from the UK Biobank are used to explore brain aging and show an unexpected association between current MHT use and poorer brain health outcomes relative to never-users. There is considerable debate about the benefits of MHT, and of estrogens in particular, for brain health, and this analysis illustrates that the effects are certainly not straightforward and require greater consideration.

      Strengths:

      (1) The detailed approach to obtaining important information about MHT use from primary care records. Prior studies have suggested that factors such as estrogen/progestin type, route of administration, duration, and timing of use relative to menopause onset can contribute to whether MHT benefits brain health.

      (2) Consideration of type of menopause (spontaneous or surgical) in the analysis, as well as sensitivity analyses to rule out the effect being driven by those with clinical conditions.

      (3) The incorporation of the brain age estimate along with hippocampal volume to address brain health.

      (4) The complex data are also well explained, and interpretations are reasonable.

      (5) Limitations of the UK Biobank data are acknowledged.

      Weaknesses:

      These have since been addressed by the authors in the revision.

    3. Reviewer #2 (Public review):

      Summary:

      In this observational study, Barth et al. investigated the association between menopausal hormone therapy and brain health in middle- to older-aged women from the UK Biobank. The study evaluated detailed MHT data (never, current, or past user), duration of mHT use (age first/last used), history of hysterectomy with or without bilateral oophorectomy, APOE ε4 genotype, and brain characteristics in a large, population-based sample. The researchers found that current mHT use (compared to never-users), but not past use, was associated with a modest increase in gray and white matter brain age gap (GM and WM BAG) and a decrease in hippocampal volumes. No significant association was found between the age of mHT initiation and brain measures among mHT users. Longer duration of use and older age at last MHT use post-menopause were associated with higher GM and WM BAG, larger WMH volumes, and smaller hippocampal volumes. In a sub-sample, after adjusting for multiple comparisons, no significant associations were found between detailed mHT variables (formulations, route of administration, dosage) and brain measures. The association between mHT variables and brain measures was not influenced by APOE ε4 allele carrier status. Women with a history of hysterectomy with or without bilateral oophorectomy had lower GM BAG compared to those without such a history. Overall, these observational data suggest that the association between mHT use and brain health in women may vary depending on the duration of use and surgical history.

      Strengths:

      The study has several strengths, including a large, population-based sample of women in the UK, and comprehensive details of demographic variables such as menopausal status, history of oophorectomy/hysterectomy, genetic risk factors for Alzheimer's disease (APOE ε4 status), age at mHT initiation, age at last use, duration of mHT, and brain imaging data (hippocampus and WMH volume).

      In a sub-sample, the study accessed detailed mHT prescription data (formulations, route of administration, dosage, duration), allowing the researchers to study how these variables were associated with brain health outcomes. This level of detail is generally missing in observational studies investigating the association of mHT use with brain health.

      Weaknesses:

      While the study has many strengths, it also has some weaknesses, which are properly discussed throughout the article. The manuscript indicates that the need for mHT use, which might be associated with menopausal symptoms, may itself be an indicator of preexisting neurological changes, potentially reflected in worse brain health scores, including higher BAG, lower hippocampal volume, and/or higher WMH volume. The authors noted that the UK Biobank lacks detailed information on menopausal symptoms and perimenopausal staging, limiting the study's ability to understand how these variables influence outcomes. The authors also highlighted that these results do not reflect causal relationships, and they caution that these findings should not guide individual-level decisions regarding the benefits versus risks of mHT use. However, the study raises new questions that should be addressed by randomized clinical trials to investigate the varying effects of MHT on brain health and dementia risk.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study takes a detailed approach to understanding the effect of menopausal hormone therapy (MHT) in the brain aging of females. Neuroimaging data from the UK Biobank is used to explore brain aging and shows an unexpected effect of current MHT use and poorer brain health outcomes relative to never users. There is considerable debate about the benefits of MHT and estrogens in particular for brain health, and this analysis illustrates that the effects are certainly not straightforward and require greater consideration.

      Strengths:

      (1) The detailed approach to obtaining important information about MHT use from primary care records. Prior studies have suggested that factors such as estrogen/progestin type, route of administration, duration, and timing of use relative to menopause onset can contribute to whether MHT benefits brain health.

      (2) Consideration of type of menopause (spontaneous or surgical) in the analysis, as well as sensitivity analyses to rule out the effect being driven by those with clinical conditions.

      (3) The incorporation of the brain age estimate along with hippocampal volume to address brain health.

      (4) The complex data are also well explained and interpretations are reasonable.

      (5) Limitations of the UK Biobank data are acknowledged.

      We thank the reviewer for their time and the positive evaluation of our manuscript.

      Weaknesses:

      (1) Lifestyle factors are listed, and the authors acknowledge group differences (at least between current users and never-users of MHT). I was not able to find the analyses showing these differences.

      We highlighted and tested for group differences in lifestyle scores, and the results are shown in Tables 1-3, column p-value. As highlighted in the Methods section (page 9): “The lifestyle score was calculated using a published formula (69), and included data on sleep, physical activity, nutrition, smoking, and alcohol consumption (see supplementary Note 3, Table S2)”. In line with Reviewer 1's suggestion to the authors, we have now included an additional table in the supplementary materials (Table S2) testing for group differences in the specific lifestyle factors that make up the lifestyle score. Please find a more detailed response below (Recommendations for the authors, Response to Comment 1).

      (2) The distribution of women who were not menopausal was unequal across groups, and while the authors acknowledge this, one wonders to what extent this explains the observed findings.

      We agree with the reviewer that the unequal distribution of women across groups can influence the observed findings. We have made minor edits to highlight this important topic more explicitly in the discussion:

      Discussion (page 21): “Current MHT users were significantly younger than past- and never-users, and around 67% were menopausal relative to over 80% in the past- and never-user groups. The unequal distribution of age and menopausal status across groups may have influenced the observed findings. For instance, a larger proportion of the current users might be in the perimenopausal phase, which is often associated with debilitating neurological and vasomotor symptoms (1). MHT is commonly prescribed to minimize such symptoms. Although MHT initiation during perimenopause has been associated with improved memory and hippocampal function, as well as lower AD risk later in life (15), the need for MHT might in itself be an indicator of neurological changes (71); here potentially reflected in higher BAG and lower hippocampal volumes. After the transition to menopause, symptoms might subside and some perimenopausal brain changes might revert or stabilize in the postmenopausal phase (5). Although the UK Biobank lacks detailed information on menopausal symptoms and perimenopausal staging, our results might be capturing subtle disturbances during perimenopause that later stabilize. This could explain why the largely postmenopausal groups of past MHT users and never-users present with lower GM and WM BAG than the current user group. Considering the critical window hypothesis emphasizing perimenopause as a key phase for MHT action (29,43), future longitudinal studies are crucial to clarify the interplay between neurological changes and MHT use across the menopause transition.”

      Discussion (page 25): “In addition, previous studies highlight that UK Biobank participants are considered healthier than the general population based on several lifestyle and health-related factors (89, 90). This healthy volunteer bias increases with age, likely resulting in a disproportionate number of healthier older adults. Together with the imbalance in age distributions across groups, this might explain the less apparent brain aging in the older MHT user groups. We have previously highlighted that age is negatively associated with the number of APOE ε4 carriers in the UK Biobank (21), which is indicative of survivor bias.”

      (3) While the interpretations are reasonable, and relevant theories (healthy cell and critical window) are mentioned, the discussion is missing a more zoomed-out perspective on the findings. While I appreciate wanting to limit speculation, the reader is left having to synthesize a lot of complex details on their own. A particularly difficult finding to reconcile is under what conditions these women benefit from MHT, when they do not, and why that may be.

      We thank the reviewer for this comment. As the presented data are cross-sectional and do not enable causal inference, we have refrained from a more zoomed-out interpretation of the results to avoid undue speculation. However, where applicable, we have discussed our findings in a broader context, such as the effects of MHT use on the brain across the menopausal transition (discussion page 21) and the effects of MHT use on the brain in the presence and absence of bilateral oophorectomy and/or hysterectomy (discussion page 25).

      To best inform the reader about the scope of our paper, we would like to highlight the following sentences in our discussion (page 24):

      “The current work represents the most comprehensive study of detailed MHT data, APOE ε4 genotype, and several brain measures in a large population-based cohort to date. Overall, our findings do not unequivocally support general neuroprotective effects of MHT, nor do they indicate severe adverse effects of MHT use on the female brain. The results suggest subtle yet complex relationships between MHT and brain health, highlighting the necessity for a personalized approach to MHT use. Importantly, our analyses provide a broad view of population-based associations and are not designed to guide individual-level decisions regarding the benefits versus risks of MHT use.”

      And the conclusion (page 25): “In conclusion, our findings suggest that associations between MHT use and female brain health might vary depending on duration of use and past surgical history. Although the effect sizes were generally modest, future longitudinal studies and RCTs, particularly focused on the perimenopausal transition window, are warranted to fully understand how MHT use influences female brain health. Importantly, considering risks and benefits, decisions regarding MHT use should be made within the clinical context unique to each individual.”

      Reviewer #1 (Recommendations for the authors):

      Can the authors provide:

      (1) More information about which aspects of lifestyle factors were different between the groups, and how these factors may have contributed to the observed findings (if possible, without burying this information in the supplemental)?

      We thank the reviewer for this suggestion. We have now added a table comparing the lifestyle factors contained in the lifestyle score by MHT user status, using t-tests (continuous variables) or χ² tests (categorical variables) (see Table S2). The results are referred to in the main manuscript under “Sample characteristics” in the Results section, and the table (Table S2) is provided in the supplements so as not to overburden the main text, in line with input from Reviewer 3.

      We updated the main text to refer to Table S2 and updated supplementary Note 3 (pages 2-3) to include the results of the comparison of the lifestyle factors contained in the lifestyle score by MHT user status.

      Methods, page 9: “The lifestyle score was calculated using a published formula (69), and included data on sleep, physical activity, nutrition, smoking, and alcohol consumption (see supplementary Note 3, Table S2).”

      Results, page 13: “Sample demographics including lifestyle score, stratified by MHT user group, surgical history among MHT users, and estrogen-only or combined MHT use, are summarized in Tables 1, 2, and 3, respectively. MHT user group differences for each lifestyle factor contained in the lifestyle score are shown in Table S2.”

      “Note 3| Lifestyle Score

      The lifestyle score was calculated based on sleep duration, time spent watching television, current and past smoking status, alcohol consumption frequency, physical activity level (number of days per week of moderate/vigorous activity for at least 10 minutes), intake of fruits and vegetables, and intake of oily fish, beef, lamb/mutton, pork, and processed meat (for details see (10)). Each unhealthy lifestyle factor (e.g., smoking) was scored with 1 point, and each participant's points were summed to generate an unweighted score from 0 to 9: the higher the lifestyle score, the unhealthier the participant's lifestyle.

      A comparison of the lifestyle factors contained in the lifestyle score by MHT user status is presented in Table S2. In summary, we found that current MHT users were more often smokers than never-users, had a higher alcohol intake than never- and past MHT users, reported the lowest fruit and vegetable intake relative to never-users and past MHT users, and reported lower moderate activity levels relative to past MHT users. Past MHT users reported higher alcohol intake than never-users, spent more time watching TV relative to never- and current-users, consumed more beef, pork, lamb/mutton, and processed meat than never-users, and reported lower vigorous activity levels relative to never-users. However, oily fish intake and fruit and vegetable intake were higher among past MHT users relative to never- and current-users. Self-reported sleep duration did not differ between MHT user groups.”
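      The scoring rule in Note 3 amounts to counting unhealthy flags. The Python sketch below assumes each factor has already been classified as healthy or unhealthy using the cut-offs of the published formula, which are not reproduced here. The nine factor keys (including grouping beef, lamb/mutton, and pork into one red-meat item) are hypothetical labels chosen for illustration, not the exact item definitions used in the study.

```python
from typing import Mapping

# Hypothetical factor keys. The grouping into nine items and the cut-offs that
# decide whether each item counts as "unhealthy" follow the published formula
# cited in Note 3 and are NOT reproduced here.
FACTORS = (
    "sleep", "tv_time", "smoking", "alcohol", "physical_activity",
    "fruit_veg", "oily_fish", "red_meat", "processed_meat",
)


def lifestyle_score(unhealthy: Mapping[str, bool]) -> int:
    """Unweighted score: 1 point per unhealthy factor, summed to a value in 0..9.

    A higher score means a less healthy lifestyle.
    """
    missing = [f for f in FACTORS if f not in unhealthy]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    return sum(1 for f in FACTORS if unhealthy[f])


# Hypothetical participant flagged as unhealthy on smoking and alcohol only.
example = {f: False for f in FACTORS}
example.update(smoking=True, alcohol=True)
print(lifestyle_score(example))
```

      Raising an error for missing factors, rather than silently treating them as healthy, avoids biasing scores downward for participants with incomplete questionnaire data.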

      (2) A more detailed description of the two main theories of MHT effects on the brain (healthy cell vs. critical window). Can the authors also provide a more thorough explanation of how the findings fit with these theories?

      We thank the reviewer for this comment. We have described our findings in the context of the critical window hypothesis (discussion, page 21, paragraph 2), the healthy cell bias hypothesis (discussion, page 22, paragraph 3), and healthy user bias hypothesis (discussion, page 22, paragraph 4). We refrained from a more thorough explanation to avoid undue speculations.

      (3) Reflect more on what the findings may indicate as to who benefits from MHT, and why. There are some references that the authors may want to add, particularly related to recent findings on premenopausal bilateral oophorectomies that also speak to when (and for whom) MHT use might be beneficial.

      We thank the reviewer for this feedback. We have included additional references in the revised manuscript as follows:

      Discussion, page 23: “It is also possible that the timing between MHT use and surgery is more tightly controlled and therefore more beneficial for brain aging (43). For instance, studies suggest that MHT may mitigate the potential long-term adverse effects of bilateral oophorectomy before natural menopause on bone mineral density as well as cardiovascular, cognitive, and mental health (79-81). In addition, a 2024 UK Biobank study found that ever-use of MHT was associated with decreased odds of Alzheimer’s disease in women with bilateral oophorectomy (82).”

      (79) Blumel JE, Arteaga E, Vallejo MS, et al. Association of bilateral oophorectomy and menopause hormone therapy with mild cognitive impairment: the REDLINC X study. Climacteric 2022;25:195-202.

      (80) Kaunitz AM, Kapoor E, Faubion S. Treatment of Women After Bilateral Salpingo-oophorectomy Performed Prior to Natural Menopause. JAMA 2021;326:1429-1430.

      (81) Stuursma A, Lanjouw L, Idema DL, de Bock GH, Mourits MJE. Surgical Menopause and Bilateral Oophorectomy: Effect of Estrogen-Progesterone and Testosterone Replacement Therapy on Psychological Well-being and Sexual Functioning; A Systematic Literature Review. J Sex Med 2022;19:1778-1789.

      (82) Calvo N, McFall GP, Ramana S, et al. Associated risk and resilience factors of Alzheimer's disease in women with early bilateral oophorectomy: Data from the UK Biobank. J Alzheimers Dis 2024;102:119-128.

      Reviewer #2 (Public review):

      Summary:

      In this observational study, Barth et al. investigated the association between menopausal hormone therapy and brain health in middle- to older-aged women from the UK Biobank. The study evaluated detailed MHT data (never, current, or past user), duration of mHT use (age first/last used), history of hysterectomy with or without bilateral oophorectomy, APOE ε4 genotype, and brain characteristics in a large, population-based sample. The researchers found that current mHT use (compared to never-users), but not past use, was associated with a modest increase in gray and white matter brain age gap (GM and WM BAG) and a decrease in hippocampal volumes. No significant association was found between the age of mHT initiation and brain measures among mHT users. Longer duration of use and older age at last MHT use post-menopause were associated with higher GM and WM BAG, larger WMH volumes, and smaller hippocampal volumes. In a sub-sample, after adjusting for multiple comparisons, no significant associations were found between detailed mHT variables (formulations, route of administration, dosage) and brain measures. The association between mHT variables and brain measures was not influenced by APOE ε4 allele carrier status. Women with a history of hysterectomy with or without bilateral oophorectomy had lower GM BAG compared to those without such a history. Overall, these observational data suggest that the association between mHT use and brain health in women may vary depending on the duration of use and surgical history.

      Strengths:

      (1) The study has several strengths, including a large, population-based sample of women in the UK, and comprehensive details of demographic variables such as menopausal status, history of oophorectomy/hysterectomy, genetic risk factors for Alzheimer's disease (APOE ε4 status), age at mHT initiation, age at last use, duration of mHT, and brain imaging data (hippocampus and WMH volume).

      (2) In a sub-sample, the study accessed detailed mHT prescription data (formulations, route of administration, dosage, duration), allowing the researchers to study how these variables were associated with brain health outcomes. This level of detail is generally missing in observational studies investigating the association of mHT use with brain health.

      We thank the reviewer for their time and the positive evaluation of our manuscript.

      Weaknesses:

      (1) While the study has many strengths, it also has some weaknesses. As highlighted in an editorial by Kantarci & Manson (2023), women with symptoms such as subjective cognitive problems, sleep disturbances, and elevated vasomotor symptoms combined with sleep disturbances tend to seek mHT more frequently than those without these symptoms. The authors of this study have also indicated that the need for mHT use, which might be associated with these symptoms, may be an indicator of preexisting neurological changes, potentially reflected in worse brain health scores, including higher BAG and lower hippocampal volume and/or higher WMH. However, the study could not report how many of the current users had these symptoms. Women with vasomotor symptoms who use mHT are more likely to remain longer in the healthcare system than those without these symptoms and no MHT use history. The authors noted that the UK Biobank lacks detailed information on menopausal symptoms and perimenopausal staging, limiting the study's ability to understand how these variables influence outcomes.

      We thank the reviewer for the succinct synopsis of the limitations highlighted in the discussion, page 21. We have now added the mentioned reference, the 2023 editorial by Kantarci & Manson, to the discussion as well (see reference 71).

      Discussion (page 21): “Current MHT users were significantly younger than past- and never-users, and around 67% were menopausal relative to over 80% in the past- and never-user groups. The unequal distribution of age and menopausal status across groups may have influenced the observed findings. For instance, a larger proportion of the current users might be in the perimenopausal phase, which is often associated with debilitating neurological and vasomotor symptoms (1). MHT is commonly prescribed to minimize such symptoms. Although MHT initiation during perimenopause has been associated with improved memory and hippocampal function, as well as lower AD risk later in life (15), the need for MHT might in itself be an indicator of neurological changes (71); here potentially reflected in higher BAG and lower hippocampal volumes. After the transition to menopause, symptoms might subside and some perimenopausal brain changes might revert or stabilize in the postmenopausal phase (5). Although the UK Biobank lacks detailed information on menopausal symptoms and perimenopausal staging, our results might be capturing subtle disturbances during perimenopause that later stabilize. This could explain why the largely postmenopausal groups of past MHT users and never-users present with lower GM and WM BAG than the current user group. Considering the critical window hypothesis emphasizing perimenopause as a key phase for MHT action (29,43), future longitudinal studies are crucial to clarify the interplay between neurological changes and MHT use across the menopause transition.”

      (2) Earlier observational studies have reported conflicting results regarding the association between mHT use and the risk of dementia and brain health. Contrary to some observational studies, three randomized trials (WHI, KEEPS, ELITE) (Espeland et al., 2013; Gleason et al., 2015; Henderson et al., 2016) demonstrated neither beneficial nor harmful effects of mHT (with varying doses and formulations) when initiated closer to menopause (<5 years). While strong efforts were made to run proper statistical analyses to investigate the association between mHT use and brain health, these results reflect mainly associations, not causal relationships, as also stated by the authors.

      We thank the reviewer for pointing that out.

      (3) Furthermore, observational studies have intrinsic limitations, such as a lack of control over switching mHT doses and formulations, a lack of laboratory measures to confirm mHT use, and reliance on self-reported data, which may not always be reliable. The authors caution that these findings should not guide individual-level decisions regarding the benefits versus risks of mHT use. However, the study raises new questions that should be addressed by randomized clinical trials to investigate the varying effects of MHT on brain health and dementia risk.

      We thank the reviewer for making our efforts in providing proper disclaimers in the discussion visible.

      Reviewer #2 (Recommendations for the authors):

      (1) The study could benefit from extending these findings by adding plasma biomarkers of AD and PET imaging markers to further study the association of mHT variables with brain health.

      We agree with the reviewer that such markers would be beneficial for elucidating the association between MHT variables and brain health. Unfortunately, these markers are not readily available in the UK Biobank.

      (2) The study's reliance on a predominantly white cohort limits the generalizability of the findings to more diverse populations. This homogeneity may not capture the full spectrum of responses to MHT across different ethnic and genetic backgrounds.

      We fully agree with the reviewer's statement and state this limitation in the discussion (page 25) as follows:

      “In addition to these inherent biases in aging cohorts, the ethnic background of the sample is homogeneous (> 96% white), further reducing the generalizability of the results.”

      (3) The study may benefit from editing the following information in the introduction: "In summary, WHIMS, HERS, and KEEPS mainly relied on orally administered CEE in older-aged or recently postmenopausal females." KEEPS used two routes and formulations (transdermal estradiol and oCEE, both with micronized progesterone).

      We thank the reviewer for catching this oversight. We removed the sentence to avoid ambiguity and revised the sentence specifically referring to the KEEPS study as follows:

      Introduction, page 3: “In contrast, administering oral CEE or transdermal estradiol plus micronized progesterone in recently postmenopausal females did not alter cognition in the Kronos Early Estrogen Prevention Study (KEEPS) (28).”

      (4) The study may benefit from editing the following statement in the introduction: "oral CEE use in combination with MPA seems to increase the risk for AD regardless of timing". I would suggest revising this statement, which is based on review article 29. The claim that oCEE has an adverse effect regardless of when it is started contradicts earlier randomized clinical findings. I think it is important to distinguish between the outcomes of randomized controlled trials and those of observational studies. The WHIMS (Shumaker et al., 2003), a randomized controlled trial, reported an increased risk of dementia with oCEE + MPA compared to placebo in women who were more than 10 years past the onset of menopause when therapy was initiated. Two other long-duration randomized trials that tested the effect of oral estrogen and progesterone treatment on cognitive function in women who started treatment shortly after menopause (within 3 or 6 years) did not find evidence that treatment benefits or harms cognitive function compared with placebo (Gleason et al., 2015; Henderson et al., 2016). A short-term (4-month) randomized trial (Maki et al., 2007; mentioned in ref 29) reported a potential negative effect of CEE/MPA on verbal memory in women who started HT shortly after menopause (within 3 years). That study did not investigate the risk of dementia, and HT use was short-term.

      We thank the reviewer for this detailed input. After checking the provided references, we rephrased the sentence as follows:

      Introduction, page 4: “Although emerging evidence supports this hypothesis (30, 31), oral CEE use in combination with MPA has been found to increase the risk for memory decline regardless of timing (26, 29, 32).”

      We believe this formulation is more in line with the evidence provided by Shumaker et al. 2003, Maki et al. 2007, and the other references provided in the review paper by Maki and colleagues (ref. 29). The reviewer further refers to Gleason et al. 2015 and Henderson et al. 2016; however, both RCTs used micronized progesterone, not MPA, and therefore do not bear on this statement.

      (26) Shumaker SA, Legault C, Rapp SR, et al. Estrogen plus progestin and the incidence of dementia and mild cognitive impairment in postmenopausal women: the Women's Health Initiative Memory Study: a randomized controlled trial. JAMA 2003;289:2651-2662.

      (29) Maki PM. Critical window hypothesis of hormone therapy and cognition: a scientific update on clinical studies. Menopause 2013;20:695-709.

      (32) Maki PM, Gast MJ, Vieweg AJ, Burriss SW, Yaffe K. Hormone therapy in menopausal women with cognitive complaints: a randomized, double-blind trial. Neurology 2007;69:1322-1330.

      Reviewer #3 (Public review):

      In this study, Barth et al. present results of detailed analyses of the relationships between menopausal hormone therapy (MHT), APOE ε4 genotype, and measures of anatomical brain age in women in the UK Biobank. While past studies have investigated the links between some of these variables (including work by the authors themselves), this new study adds more detailed MHT variables, surgical status, and additional brain aging measures. The UK Biobank sample is large, but it is a population cohort, and many of the MHT measures are self-reported (as the authors point out). Nevertheless, the authors present a solid analysis of the available information, which shows associations of MHT user status, length of MHT use, and surgical status with brain age. However, as the authors themselves state, the results do not unequivocally support a neuroprotective or adverse effect of MHT on the brain. I think this work strengthens the case for better-designed longitudinal studies investigating the effect of MHT on the brain in the peri- and postmenopausal stages.

      Strengths:

      (1) The authors addressed the statistical analyses rigorously. For example, multiple testing corrections, outlier removal, and sensitivity analyses were performed carefully. Ample background information is provided in the introduction, allowing even individuals not familiar with the field to understand the motivation behind the work. The discussion section also does a great job of addressing open questions and limitations. Very detailed results of all statistical tests are provided either in the main text or in the supplementary information.

      We thank the reviewer for their time and the positive evaluation of our manuscript.

      Weaknesses:

      (1) For me, the biggest weakness was the presentation of the results. As many variables are involved and past studies have investigated several of these questions, it would have helped to clarify which analyses and questions are specific to this study and what sets it apart from past studies. The information is present in the manuscript, but better organization might have helped. For example, a figure depicting the key questions near the beginning of the manuscript would have been very helpful for me. The tables also contain a lot of information, but I wonder if there might be a way to capture the most relevant information more succinctly (either in table format or in a figure) for the main text.

      We thank the reviewer for this comment. We agree that, with the large number of analyses, it can be hard to keep an overview. We have now added a figure summarizing the main and sensitivity analyses by sample.

      (2) Another concern I had was the linear models investigating the effects of these MHT variables on the brain age gap. The authors have included "age" as one of the parameters in this analysis. I wonder if adding a quadratic age term (age²) to the model might have improved the fit, since many brain phenotypes show quadratic age effects in the 40 to 80-year age range.

      We thank the reviewer for this suggestion. We reran the main analysis in the whole sample (model 1) with age² as an additional covariate and compared the gray matter brain age gap model fits using the corrected Akaike Information Criterion (AICc). All models with age² had a better fit than models without it (see Author response table 1). Hence, in the revised manuscript, we added a sensitivity analysis rerunning model 1 with age² to account for potential non-linear effects. The results were largely consistent. The manuscript was revised as follows to reflect the added analysis:

      Sensitivity analysis (Methods, Page 11): “To test whether the results were influenced by the inclusion of participants with ICD-10 diagnosis or by non-linear effects of age, the main analyses (models 1-2) were re-run excluding the sub-sample with diagnosed brain disorders (see supplementary Note 2) or adding age² as additional covariate, respectively.”

      Sensitivity analysis (Results, Page 20): “The results were consistent after removing participants with ICD-10 diagnoses known to impact the brain (see Table S9 for model 1 analyses and Table S10 for model 2 analyses), after additionally adjusting for age² (see Table S11), and after removing extreme values (see Table S12 for model 1 analyses).”

      Author response table 1.

      Gray matter brain age gap model selection based on corrected Akaike Information Criterion (AICc)

      Abbreviations and explanations of parameters: MHT = menopausal hormone therapy, K = number of estimated parameters for each model, AICc = the information criterion requested for each model, ΔAICc = the appropriate delta AIC component depending on the information criterion selected, ModelLik = the relative likelihood of the model given the data, AICcWT = Akaike weights to indicate the level of support in favor of any given model being the most parsimonious among the candidate model sets, LL = log-likelihood of each model.
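      The columns listed in the legend above follow standard Akaike-weight bookkeeping. As a minimal sketch (the log-likelihoods, parameter counts and sample size below are hypothetical, not the values in the table), AICc is AIC = 2K - 2LL plus the small-sample correction 2K(K+1)/(n - K - 1); ΔAICc is each model's distance from the best AICc, ModelLik = exp(-ΔAICc/2), and the weights normalize ModelLik to sum to 1:

```python
import math


def aicc(log_lik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: AIC + 2K(K+1)/(n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > K + 1")
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_table(models: dict[str, tuple[float, int]], n: int) -> dict[str, dict[str, float]]:
    """models maps a name to (log-likelihood LL, number of parameters K)."""
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    delta = {name: s - best for name, s in scores.items()}
    # Relative likelihood of each model given the data.
    rel_lik = {name: math.exp(-0.5 * d) for name, d in delta.items()}
    total = sum(rel_lik.values())
    return {
        name: {
            "AICc": scores[name],
            "dAICc": delta[name],
            "ModelLik": rel_lik[name],
            "AICcWt": rel_lik[name] / total,  # Akaike weight
        }
        for name in models
    }


# Hypothetical comparison: a base model vs. the same model plus an age^2 term (one extra parameter).
table = akaike_table({"base": (-1000.0, 10), "base + age^2": (-990.0, 11)}, n=5000)
for name, row in table.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

      The extra parameter is penalized through K, so the model with age² is preferred only if its log-likelihood gain outweighs that penalty.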

      Reviewer #3 (Recommendations for the authors):

      (1) Please note typo in Figures 2 and 3 legend "GM WM".

      We thank the reviewer for catching this typo and we changed it to BAG GM and BAG WM for all Figures for consistency.

    1. eLife Assessment

      This valuable study uses dynamic metabolic models to compare perturbation responses in a bacterial system, analyzing whether they return to their steady state or amplify beyond the initial perturbation. The evidence linking the emergent properties of perturbed metabolic systems to network topology and to sensitivity to specific metabolites is solid.

    2. Reviewer #1 (Public review):

      Summary:

      The author studied metabolic networks of central metabolism, focusing on how system trajectories return to their steady state. To quantify the response, perturbations were applied systematically in simulation, and the maximal deviation from the steady state (relative to the initial perturbation distance) was characterized. The author analyzed the perturbation responses and found that sparse networks and networks with more cofactors are more "stable", in the sense that perturbed trajectories deviate less along the path back to the steady state.
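      The quantity described above, the largest distance from steady state reached along a trajectory relative to the initial perturbation distance, can be illustrated on a toy system. This is a sketch under stated assumptions, not the models of the study: a hypothetical two-variable linear system x' = Ax with its steady state at the origin, forward-Euler integration, and a Euclidean-norm ratio (the study may normalize differently). A ratio above 1 means the perturbation is transiently amplified even though the steady state is stable:

```python
import math


def max_amplification(a, x0, dt=1e-3, t_end=20.0):
    """Return max_t ||x(t)|| / ||x(0)|| for x' = A x (forward Euler).

    The steady state is assumed to be x* = 0, so x(t) is the deviation from it.
    """
    norm0 = math.hypot(*x0)
    x = list(x0)
    peak = norm0
    for _ in range(int(t_end / dt)):
        # Evaluate the right-hand side at the current state, then take one Euler step.
        dx = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
        x = [x[i] + dt * dx[i] for i in range(len(x))]
        peak = max(peak, math.hypot(*x))
    return peak / norm0


# Hypothetical matrices: both have stable eigenvalues (-1, -2).
non_normal = [[-1.0, 10.0], [0.0, -2.0]]  # coupling term allows transient growth
diagonal = [[-1.0, 0.0], [0.0, -2.0]]     # deviations shrink monotonically
print(max_amplification(non_normal, [0.0, 1.0]))  # ≈ 2.5
print(max_amplification(diagonal, [0.0, 1.0]))    # 1.0
```

      Both systems return to the same steady state; only the coupling structure decides whether the trajectory first moves away from it.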

      Strengths and major contributions:

      The author compared three metabolic models and performed systematic perturbation analysis in simulation. This is the first work to characterize how perturbed trajectories deviate from equilibrium in large biochemical systems, and it reports interesting differences between sparse biological systems and randomly simulated reaction networks.

      Discussion and impact for the field:

      Metabolic perturbation is an important topic in cell biology and has important clinical implications for pharmacodynamics. The computational analysis in this study provides a starting point for future quantitative analysis of metabolism and homeostasis.

      Comments on latest version:

      In the latest version of this work, the author included NADH and NADPH in the analysis and compared the results with a sensitivity analysis. I think this paper is ready to be finalized, and many open questions inspired by this work can be studied in the future.

    3. Reviewer #2 (Public review):

      The authors have conducted a valuable comparative analysis of perturbation responses in three nonlinear kinetic models of E. coli central carbon metabolism found in the literature. They aimed to uncover commonalities and emergent properties in the perturbation responses of bacterial metabolism. They discovered that perturbations in the initial concentrations of specific metabolites, such as adenylate cofactors and pyruvate, significantly affect the maximal deviation of the responses from steady-state values. Furthermore, they explored whether the network connectivity (sparse versus dense connections) influences these perturbation responses. The manuscript is reasonably well written.

      Comments on latest version:

      The authors have adequately addressed my concerns.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reply to the comments of the second referee

      We sincerely appreciate the positive evaluation and the useful suggestions on our manuscript.

      (1) The authors identified key metabolites affecting responses to perturbations in two ways: (i) by fixing a metabolite's value and (ii) by performing a sensitivity analysis. It would be helpful for the modeling community to better understand the differences and similarities between the results obtained. Do both methods identify substrate-level regulators? Does freezing a metabolite's dynamics dramatically change the metabolic response (and if so, which responses differ most between the two cases)? Does the scope of the network affect these differences and similarities?

      Thank you for these suggestions. We compared the Sobol’ total sensitivity index with the absolute values of the change in the response coefficient (Figure S6 in the revised manuscript). There is no clear relationship between the two quantities. The Sobol’ sensitivity analysis quantifies how a perturbation of the concentration of a metabolite X contributes to the overall dynamics. By contrast, the analysis in which metabolite concentrations are fixed measures how strongly metabolite X helps propagate perturbations of the other metabolites throughout the metabolic network. In other words, in the Sobol’ analysis we evaluate the outcome when the perturbation is applied directly to metabolite X, whereas in the fixing-metabolites analysis we consider perturbations applied to other metabolites and assess how X influences them. We believe this conceptual difference explains why the two quantities do not correlate. We suspect that this lack of correlation is independent of the network’s scope, because each method evaluates a different aspect of the system. Both methods capture the effect of a metabolite's dynamics on the overall dynamics regardless of mechanism; that is, they do not distinguish whether a perturbation of the metabolite affects the overall dynamics stoichiometrically (as a reactant) or through substrate-level regulation. Identifying substrate-level regulation with these methods would therefore be challenging.
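      To make the "perturbation applied directly to X" view concrete, here is a minimal sketch of a Sobol’ total-order index estimated with the Jansen Monte Carlo estimator. The toy function and uniform inputs are hypothetical, and this is not necessarily the estimator or sampling scheme used in the study. For Y = X1 + 2·X2 with independent uniform inputs, the exact total indices are 0.2 and 0.8:

```python
import random
from statistics import pvariance


def sobol_total_indices(model, dim, n=20000, seed=0):
    """Jansen estimator of total-order Sobol' indices for inputs uniform on [0, 1]."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dim)] for _ in range(n)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    var_y = pvariance(f_a + f_b)  # total output variance
    totals = []
    for i in range(dim):
        # Matrix A with column i taken from B: only input i is resampled.
        f_ab = [model(row_a[:i] + [row_b[i]] + row_a[i + 1:]) for row_a, row_b in zip(a, b)]
        jansen = sum((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab)) / (2 * n)
        totals.append(jansen / var_y)
    return totals


def toy(x):
    """Hypothetical response: input 2 has twice the effect of input 1."""
    return x[0] + 2.0 * x[1]


print(sobol_total_indices(toy, dim=2))  # close to [0.2, 0.8]
```

      Because only input i is resampled while all others stay fixed, the index measures the output variance attributable to perturbing that input itself, which is a different question from how clamping that input alters the propagation of perturbations applied elsewhere.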

      (2) The issues the authors encountered when performing the sensitivity analysis can be approached in two ways. First, the authors can check the methods for computing conserved moieties nicely explained by Sauro's group (doi:10.1093/bioinformatics/bti800) and compute them for large-scale networks (but beware of metabolites that belong to several conserved pools). Second, the conserved pools of metabolites can be treated as variables in the sensitivity analysis; grouping multiple parameters is a common approach in sensitivity analysis.

      Thank you for this helpful suggestion. Following the method described in the reference, we have computed the Sobol’ sensitivity indices of NADH, NADPH, and Q8H2 (with their counterparts solved algebraically and treated as dependent variables). We have updated Figure S5 accordingly.
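      The conserved-moiety idea discussed in this exchange amounts to finding the left null space of the stoichiometric matrix N, that is, vectors l with lᵀN = 0. Each such vector defines a pool whose total stays constant, so one member (for example NAD+ given NADH) can be solved algebraically from the pool total. The sketch below is illustrative only: it uses exact Gaussian elimination with fractions rather than the factorization-based procedure of the cited reference, and the two-reaction network (R1: NAD + S -> NADH + P, R2: NADH -> NAD) is hypothetical:

```python
from fractions import Fraction


def left_null_space(stoich):
    """Basis of {l : l^T N = 0} for a species-by-reaction matrix N, via exact RREF of N^T."""
    # Work on N^T (reactions x species) with exact rational arithmetic.
    m = [[Fraction(stoich[s][r]) for s in range(len(stoich))] for r in range(len(stoich[0]))]
    rows, cols = len(m), len(m[0])
    pivots, row = [], 0
    for col in range(cols):
        pivot = next((r for r in range(row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this species is a free variable
        m[row], m[pivot] = m[pivot], m[row]
        lead = m[row][col]
        m[row] = [v / lead for v in m[row]]
        for r in range(rows):
            if r != row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[row])]
        pivots.append(col)
        row += 1
        if row == rows:
            break
    # One basis vector per free column: set it to 1 and back-substitute the pivots.
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        vec = [Fraction(0)] * cols
        vec[free] = Fraction(1)
        for r, p in enumerate(pivots):
            vec[p] = -m[r][free]
        basis.append(vec)
    return basis


species = ["NAD", "NADH", "S", "P"]
stoich = [[-1, 1], [1, -1], [-1, 0], [1, 0]]  # rows: species, columns: R1, R2
pools = left_null_space(stoich)
for pool in pools:
    print(" + ".join(name for name, c in zip(species, pool) if c != 0))
# NAD + NADH
# S + P
```

      Metabolites that appear in more than one pool, as the reviewer warns, show up as nonzero entries in several basis vectors, which is why the dependent variable for each pool has to be chosen consistently.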

    1. eLife Assessment

      This important study includes convincing evidence to show that behavioral measures and hippocampal representations when animals use task-relevant information and ignore irrelevant information do not depend on the medial prefrontal cortex. The results are expected to be of interest to those studying neural mechanisms of cognitive control and functions of associational brain regions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors examine the role of the medial prefrontal cortex (mPFC) in cognitive control, i.e. the ability to use task-relevant information and ignore irrelevant information, in the rat. According to the central-computation hypothesis, cognitive control in the brain is centralized in the mPFC; according to the local hypothesis, cognitive control is performed in task-related local neural circuits. Using the place avoidance task, which involves cognitive control, the authors predict that an effect of mPFC lesions on learning would support the central-computation hypothesis, whereas no effect of lesions would rather support the local hypothesis. The authors thus examine the effect of mPFC lesions on learning and retention of the place avoidance task. They also look at functional interconnectivity within a large network of areas that could be activated during the task by using cytochrome oxidase, a metabolic marker. In addition, electrophysiological unit recordings of CA1 hippocampal cells are made in a subset of (mPFC-lesioned or intact) animals to evaluate overdispersion, a firing property that reflects cognitive control in the hippocampus. The results indicate that mPFC lesions disrupted correlations of activity between functionally related regions. Behaviorally, lesions did not impair place avoidance learning and retention (though flexibility was altered during conflict training). In addition, hippocampal place cell overdispersion was decreased in lesioned rats only in the absence of a cognitive control challenge (pretraining). Cognitive control seen in hippocampal place cell activity (alternation of frame-specific firing) was not affected by the lesion. Overall, the absence of effects of mPFC lesions on cognitive control in the task or in hippocampal place cell firing supports the local hypothesis.
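      Overdispersion, mentioned above, is commonly computed from individual passes through a place cell's firing field; the definition assumed here is one common form, not necessarily the authors' exact analysis. For each pass, the observed spike count is compared with the count expected from the time-averaged rate map, z = (obs - exp)/√exp, and overdispersion is the variance of z. A Poisson-like cell gives a variance near 1, while larger values mean more pass-to-pass variability. The counts below are hypothetical, and using the population rather than the sample variance is a choice of this sketch:

```python
import math
from statistics import pvariance


def overdispersion(observed, expected):
    """Variance of per-pass z-scores, z = (obs - exp) / sqrt(exp).

    observed: spike counts per pass through the firing field.
    expected: counts predicted for the same passes from the time-averaged rate map.
    Passes with no expected spikes are skipped (a simplifying assumption).
    """
    z = [(obs - exp) / math.sqrt(exp) for obs, exp in zip(observed, expected) if exp > 0]
    if not z:
        raise ValueError("no passes with expected spikes > 0")
    return pvariance(z)


# Hypothetical passes with the same expected count but very different observed counts.
print(overdispersion([0, 12, 1, 9], [5, 5, 5, 5]))  # ≈ 5.25
```

      Under this definition, a cell that switches between two spatial frames fires far more or far less than the single time-averaged map predicts on individual passes, which inflates the variance of z.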

      Strengths:

      Straightforward hypothesis: clarification of the involvement of the mPFC in cognitive control is sought and achieved. Appropriate use of fully mastered methods (active place avoidance task, electrophysiological unit recordings, measure of the metabolic marker cytochrome oxidase) and rigorous analysis of the data. The conclusion is strongly supported by the data.

      Weaknesses:

      No notable weaknesses in the conception, execution of the study, or data analysis.

      Comments on revisions:

      The authors have satisfactorily addressed all my comments in the revised version.

    3. Reviewer #2 (Public review):

      Park et al. set out to test two competing hypotheses about the role of the medial prefrontal cortex (PFC) in cognitive control, the ability to use task-relevant cues and ignore task-irrelevant cues to guide behavior. The "central computation" hypothesis assumes that cognitive control relies on computations performed by the PFC, which then interacts with other brain regions to accomplish the task. Alternatively, the "local computation" hypothesis suggests that computations necessary for cognitive control are carried out by other brain regions that have been shown to be essential for cognitive control tasks, such as the dorsal hippocampus and the thalamus. If the central computation hypothesis is correct, PFC lesions should disrupt cognitive control. Alternatively, if the local computation hypothesis is correct, cognitive control would be spared after PFC lesions. The task used to assess cognitive control is the active place avoidance task in which rats must avoid a sector of a rotating arena using the stationary room cues and ignoring the local olfactory cues on the rotating platform. Performance on this task has previously been shown to be disrupted by hippocampal lesions and hippocampal ensembles dynamically represent the room and arena depending on the animal's proximity to the shock zone. They found no group (lesion vs. sham) differences in the three behavioral parameters tested: distance traveled, latency to enter the shock zone, and number of shock zone entries for both the standard task and the "conflict" task in which the shock zone was rotated by 180 degrees. The only significant difference was the savings index; the lesion group entered the new shock zone more often than the sham group during the first 5 minutes of the second conflict session. This deficit was interpreted as a cognitive flexibility deficit rather than a cognitive control failure. 
      Next, the authors compared cytochrome oxidase activity between sham and lesion groups in 14 brain regions and found that only the amygdala showed significant elevation in the lesion vs. sham group. Pairwise correlation analysis revealed a striking difference between groups: many correlations between regions were lost in the lesion group (between reuniens and hippocampus, and between reuniens and amygdala), and a correlation between dorsal CA1 and central amygdala appeared in the lesion group that was absent in the sham group. Finally, the authors assessed dorsal hippocampal representations of the spatial frame (arena vs. room) and found no differences between lesion and sham groups. The only difference in hippocampal activity was reduced overdispersion in the lesion group compared to the sham group on the pretraining session only, and this difference disappeared after the task began. Collectively, the authors interpret their findings as supporting the local computation hypothesis: computations necessary for cognitive control occur in brain regions other than the PFC.

      Strengths:

      The data were collected in a rigorous way with experimental blinding and appropriate statistical analyses.

      Multiple approaches were used to assess differences between lesion and sham groups, including behavior, metabolic activity in multiple brain regions, and hippocampal single unit recording.

      Weaknesses:

      Only male rats were used with no justification provided for excluding females from the sample.

      The conceptual framework used to interpret the findings was to present two competing hypotheses with mutually exclusive predictions about the impact of PFC lesions on cognitive control. The authors then use mainly null findings as evidence in support of the local computation hypothesis. They acknowledge that some people may question the notion that the active place avoidance task indeed requires cognitive control, but then call the argument "circular" because PFC has to be involved in cognitive control. This assertion does not address the possibility that the active place avoidance task simply does not require cognitive control.

      The authors did not link the CO activity with the behavioral parameters, even though the CO imaging was done on a subset of the animals that ran the behavioral task, nor do they make any attempt to interpret these findings in light of the two competing hypotheses posed in the introduction. Moreover, the discussion lacks any mechanistic interpretation of the findings. For example, there are no attempts to explain why amygdala activity and its correlation with dCA1 activity might be higher in the PFC-lesioned group.

      Publishing null results is important to avoid wasting animals, time, and money. This study's results will have a significant impact on how the field views the role of the PFC in cognitive control. Whether or not some people reject the notion that the active place avoidance task measures cognitive control, the findings are solid and can serve as a starting point for generating hypotheses about how brain networks change when deprived of PFC input.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Park and colleagues investigated how the medial prefrontal cortex (mPFC) influences behavior and hippocampal place cell activity during a two-frame active place avoidance task in rats. Rats learned to avoid the location of a mild shock within a rotating arena, with the shock zone being defined relative to distal cues in the room. Permanent chemical lesions of the mPFC did not impair the ability to avoid the shock zone by using the distal cues and ignoring proximal cues in the arena. In parallel, hippocampal place cells alternated between two spatial tuning patterns, one anchored to the distal cues and the other to the proximal cues, and this alternation was not affected by the mPFC lesion. Based on these findings, the authors argue that the mPFC is not essential for differentiating between task-relevant and irrelevant information.

      Strengths:

      This study was built on substantial work by the Fenton lab that validated their two-frame active place avoidance task and provided sound theoretical and analytical foundations. Additionally, the effectiveness of mPFC lesions was validated by several measures, enabling the authors to base their argument on the lack of lesion effects on behavior and place cell dynamics.

      Weaknesses:

      The authors define cognitive control as "the ability to judiciously use task-relevant information while ignoring salient concurrent information that is currently irrelevant for the task." (Lines 77-78). This definition is much simpler than the one by Miller and Cohen, "the ability to orchestrate thought and action in accordance with internal goals" (Ref. 1), and the one by Robbins, "processes necessary for optimal scheduling of complex sequence of behaviour" (Dalley et al., 2004, PMID: 15555683). Differentiating between task-relevant and irrelevant information is required in various behavioral tasks, such as differential learning, reversal learning, and set-shifting tasks. Previous rodent behavioral studies have shown that the integrity of the mPFC is necessary for set-shifting but not for differential or reversal learning (e.g., Enomoto et al., 2011, PMID: 21146155; Cho et al., 2015, PMID: 25754826). In the present task design, the initial training is a form of differential learning between proximal and distal cues, and the conflict training is akin to reversal learning. Therefore, the lack of lesion effects is somewhat expected. It would be interesting to test whether mPFC lesions impair set-shifting in their paradigm (e.g., the shock zone initially defined by distal cues and later by proximal cues). If the mPFC lesions do not impair this ability and associated hippocampal place dynamics, it will provide strong support for the authors' local-computation hypothesis.

      Comments on revisions:

      The authors fully addressed my comments. I do not have any additional suggestions.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary:

      The authors examine the role of the medial prefrontal cortex (mPFC) in cognitive control, i.e. the ability to use task-relevant information and ignore irrelevant information, in the rat. According to the central-computation hypothesis, cognitive control in the brain is centralized in the mPFC; according to the local hypothesis, cognitive control is performed in task-related local neural circuits. Using the place avoidance task, which involves cognitive control, the authors predict that an effect of mPFC lesions on learning would support the central-computation hypothesis, whereas no effect of lesions would rather support the local hypothesis. The authors thus examine the effect of mPFC lesions on learning and retention of the place avoidance task. They also look at functional interconnectivity within a large network of areas that could be activated during the task by using cytochrome oxidase, a metabolic marker. In addition, electrophysiological unit recordings of CA1 hippocampal cells are made in a subset of (lesioned or intact) animals to evaluate overdispersion, a firing property that reflects cognitive control in the hippocampus. The results indicate that mPFC lesions do not impair place avoidance learning and retention (though flexibility is altered during conflict training) and do not affect cognitive control seen in hippocampal place cell activity (alternation of frame-specific firing), but they reduce overdispersion, a measure of location-specific firing variability, during pretraining. The lesions nevertheless have some effect on functional interconnections. Overall, the results support the local hypothesis.

      Strengths:

      Straightforward hypothesis: clarification of the involvement of the mPFC in cognitive control is sought and achieved. Appropriate use of fully mastered methods (behavioral task, electrophysiological recordings, measure of the metabolic marker cytochrome oxidase) and rigorous analysis of the data. The conclusion is strongly supported by the data.

      Weaknesses:

      No notable weaknesses in the conception, execution of the study, or data analysis. The introduction does not mention important aspects of the work, i.e. the cytochrome oxidase measure and the electrophysiological recordings. The study is actually richer than expected from the introduction.

      The revised Introduction now includes:

      “We used cytochrome oxidase, a metabolic marker of baseline neuronal activity, to confirm the mPFC lesions were effective and that there are non-local network consequences despite the local lesion. We first evaluated cytochrome oxidase activity in regions known to be associated with performance in the active place avoidance task, or regions with known connectivity to the mPFC. We then evaluated covariance of activity amongst the regions in an effort to detect network consequences of the lesion.”

      Reviewer #2 (Public review): 

      Park et al. set out to test two competing hypotheses about the role of the medial prefrontal cortex (PFC) in cognitive control, the ability to use task-relevant cues and ignore task-irrelevant cues to guide behavior. The "central computation" hypothesis assumes that cognitive control relies on computations performed by the PFC, which then interacts with other brain regions to accomplish the task. Alternatively, the "local computation" hypothesis suggests that computations necessary for cognitive control are carried out by other brain regions that have been shown to be essential for cognitive control tasks, such as the dorsal hippocampus and the thalamus. If the central computation hypothesis is correct, PFC lesions should disrupt cognitive control. Alternatively, if the local computation hypothesis is correct, cognitive control would be spared after PFC lesions. The task used to assess cognitive control is the active place avoidance task in which rats must avoid a section of a rotating arena using the stationary room cues and ignoring the local olfactory cues on the rotating platform. Performance on this task has previously been shown to be disrupted by hippocampal lesions and hippocampal ensembles dynamically represent the room and arena depending on the animal's proximity to the shock zone. They found no group (lesion vs. sham) differences in the three behavioral parameters tested: distance traveled, latency to enter the shock zone, and number of shock zone entries for both the standard task and the "conflict" task in which the shock zone was rotated by 180 degrees. The only significant difference was the savings index; the lesion group entered the new shock zone more often than the sham group during the first 5 minutes of the second conflict session. This deficit was interpreted as a cognitive flexibility deficit rather than a cognitive control failure.
      Next, the authors compared cytochrome oxidase activity between sham and lesion groups in 14 brain regions and found that only the amygdala showed significant elevation in the lesion vs. sham group. Pairwise correlation analysis revealed a striking difference between groups: many correlations between regions were lost in the lesion group (between reuniens and hippocampus, and between reuniens and amygdala), and a correlation between dorsal CA1 and central amygdala appeared in the lesion group that was absent in the sham group. Finally, the authors assessed dorsal hippocampal representations of the spatial frame (arena vs. room) and found no differences between lesion and sham groups. The only difference in hippocampal activity was reduced overdispersion in the lesion group compared to the sham group on the pretraining session only, and this difference disappeared after the task began. Collectively, the authors interpret their findings as supporting the local computation hypothesis: computations necessary for cognitive control occur in brain regions other than the PFC.

      Strengths:

      (1) The data were collected in a rigorous way with experimental blinding and appropriate statistical analyses. 

      (2) Multiple approaches were used to assess differences between lesion and sham groups, including behavior, metabolic activity in multiple brain regions, and hippocampal single-unit recording.

      Weaknesses:

      (1) Only male rats were used with no justification provided for excluding females from the sample.

      This is a weakness we acknowledge. The experiments were performed at a time when we did not have female rats in the lab.

      (2) The conceptual framework used to interpret the findings was to present two competing hypotheses with mutually exclusive predictions about the impact of PFC lesions on cognitive control. The authors then use mainly null findings as evidence in support of the local computation hypothesis. They acknowledge that some people may question the notion that the active place avoidance task indeed requires cognitive control, but then call the argument "circular" because PFC has to be involved in cognitive control. This assertion does not address the possibility that the active place avoidance task simply does not require cognitive control. 

      We beg to differ that the possibility was not addressed. Prior to making the assertion, the manuscript describes the evidence that the active place avoidance task requires cognitive control. The evidence is multifold and includes task design, behavior, and electrophysiology; we argue that this is more evidence than has been provided for other tasks that are asserted to require cognitive control. Specifically, line 417 states:

      “We have previously demonstrated cognitive control in the active place avoidance task variant we used (Fig. 1) because the rats must ignore local rotating place cues to avoid the stationary shock zone. Even when the arena does not rotate, rats distinctly learn to avoid the location of shock according to distal visual room cues and local olfactory arena cues, such that the distinct place memories can be independently manipulated using probe trials [49, 50]. When the arena rotates as in the present studies, neural manipulations that impair the place avoidance are no longer impairing when the irrelevant arena cues are hidden by shallow water [14, 15, 51, 52]. Furthermore, persistent hippocampal neural circuit changes caused by active place avoidance training are not detected when shallow water hides the irrelevant arena cues to reduce the cognitive control demand [10, 31, 33]. While these findings unequivocally demonstrate the salience of relevant stationary room cues to use for avoiding shock and irrelevant arena cues to ignore during active place avoidance, the most compelling evidence of cognitive control comes from recording hippocampal ensemble discharge. Hippocampal ensemble discharge purposefully represents current position using stationary room information when the subject is close to the stationary shock zone and alternatively represents rotating arena information when the mouse is far from the stationary shock zone [Fig. 4; 10].”

      Line 436, however, acknowledges a fact that will always be true: no matter what anyone opines, until there are universally agreed-upon objective criteria, it is logically possible that active place avoidance does not require cognitive control. The revision states: “Despite this evidence from task design, behavioral observations, and direct electrophysiological representational switching as required to directly demonstrate cognitive control, one might still argue that it is logically possible that the active place avoidance task does not require cognitive control and this is why the mPFC lesion did not impair place avoidance of the initial shock zone. We consider such reasoning to be unproductive because it presumes that only tasks that require an intact mPFC can be cognitive control tasks. We nonetheless acknowledge that for some, we have not provided sufficient evidence that the active place avoidance requires cognitive control.”

      “We assert the evidence is compelling, and together these findings require rejecting the central-computation hypothesis that the mPFC is essential for the neural computations that are necessary for all cognitive control tasks.”

      (3) The authors did not link the CO activity with the behavioral parameters even though the CO imaging was done on a subset of the animals that ran the behavioral task nor did they make any attempt to interpret these findings in light of the two competing hypotheses posed in the introduction. Moreover, the discussion lacks any mechanistic interpretations of the findings. For example, there are no attempts to explain why amygdala activity and its correlation with dCA1 activity might be higher in the PFC lesioned group. 

      The CO study was performed to assess the effects of the lesion, as stated on line 262: “Cytochrome oxidase (CO), a sensitive metabolic marker for neuronal function [27], was used to evaluate whether lesion effects were restricted to the mPFC.” Furthermore, line 411 states: “Thus, CO imaging and electrophysiological evidence identify changes in the brain beyond the directly damaged mPFC area. In particular, the dorsal hippocampus loses the inhibitory input from mPFC [45, 46] and loses the metabolic correlation with the nucleus reuniens, which is thought to be a relay between the mPFC and the dorsal hippocampus [47, 48].”

      These CO measures assess baseline metabolic function, so it would be inappropriate to correlate them with the measures of behavior. Because the lesion and control groups do not differ on most measures of behavior, a relationship to CO measures is not expected. Importantly, even if there were differences in correlations between CO activity and behavioral measures, what could they mean? The study was designed to distinguish between two hypotheses, not to determine what CO differences could mean for behavior. As such, it is not at all clear how metabolic consequences of the lesion relate to the two hypotheses being evaluated, and so we consider it inappropriate to speculate. We did examine, and now include, the correlation between lesion size and conflict behavior. The Fig. 1 legend states: “Savings was not related to lesion size r = 0.009, p = 0.98. *p < 0.05.”

      (4) Publishing null results is important to avoid wasting animals, time, and money. This study's results will have a significant impact on how the field views the role of the PFC in cognitive control. Whether or not some people reject the notion that the active place avoidance task measures cognitive control, the findings are solid and can serve as a starting point for generating hypotheses about how brain networks change when deprived of PFC input. 

      We thank the reviewer for the acknowledgement.

      Reviewer #3 (Public review): 

      Summary:

      This study by Park and colleagues investigated how the medial prefrontal cortex (mPFC) influences behavior and hippocampal place cell activity during a two-frame active place avoidance task in rats. Rats learned to avoid the location of mild shock within a rotating arena, with the shock zone being defined relative to distal cues in the room. Permanent chemical lesions of the mPFC did not impair the ability to avoid the shock zone by using distal cues and ignoring proximal cues in the arena. In parallel, hippocampal place cells alternated between two spatial tuning patterns, one anchored to the distal cues and the other to the proximal cues, and this alternation was not affected by the mPFC lesion. Based on these findings, the authors argue that the mPFC is not essential for differentiating between task-relevant and irrelevant information. 

      Strengths:

      This study was built on substantial work by the Fenton lab that validated their two-frame active place avoidance task and provided sound theoretical and analytical foundations. Additionally, the effectiveness of mPFC lesions was validated by several measures, enabling the authors to base their argument on the lack of lesion effects on behavior and place cell dynamics. 

      Weaknesses:

      The authors define cognitive control as "the ability to judiciously use task-relevant information while ignoring salient concurrent information that is currently irrelevant for the task." (Lines 77-78). This definition is much simpler than the one by Miller and Cohen: "the ability to orchestrate thought and action in accordance with internal goals (Ref. 1)" and by Robbins: "processes necessary for optimal scheduling of complex sequence of behaviour." (Dalley et al., 2004, PMID: 15555683). Differentiating between task-relevant and irrelevant information is required in various behavioral tasks, such as differential learning, reversal learning, and set-shifting tasks. Previous rodent behavioral studies have shown that the integrity of the mPFC is necessary for set-shifting but not for differential or reversal learning (e.g., Enomoto et al., 2011, PMID: 21146155; Cho et al., 2015, PMID: 25754826). In the present task design, the initial training is a form of differential learning between proximal and distal cues, and the conflict training is akin to reversal learning. Therefore, the lack of lesion effects is somewhat expected. It would be interesting to test whether mPFC lesions impair set-shifting in their paradigm (e.g., the shock zone initially defined by distal cues and later by proximal cues). If the mPFC lesions do not impair this ability and associated hippocampal place dynamics, it will provide strong support for the authors' local computation hypothesis.

      Thank you for these comments. In addressing them, we have substantially revised the manuscript’s Introduction. While authors like those cited by the reviewer have defined cognitive control, those definitions are difficult to test rigorously, as it is almost a matter of opinion whether a subject is displaying “the ability to orchestrate thought and action in accordance with internal goals" or whether they are using "processes necessary for optimal scheduling of complex sequence of behaviour." What would such definitions of cognitive control predict about neuronal activity? We have deliberately used a simple, operational definition of cognitive control because it is physiologically testable. In the revision, starting at line 93, we have provided an excerpt from Miller and Cohen (2001) with discussion. The importance of that work is that it provides explicit neuronal criteria and a means to operationally define cognitive control. As stated on line 118: “Accordingly, cognitive control would be at work when there is sustained neuronal network representations of task-relevant information that suppresses or gates representations of salient task-irrelevant information in accord with purposeful judicious behavior.”

      We used an R+A- task variant in which there is a stationary room-frame shock zone and task-irrelevant arena-frame information. A strict correspondence to a set-shifting task design cannot be accomplished with active place avoidance, because an A+R- task that requires avoiding an arena-frame shock zone in the absence of a room-frame shock zone can be accomplished trivially if the subject chooses not to move when it is in a place with no shock. However, the R+A+ task variant, in which there is both a room-frame and an arena-frame shock zone, is readily learned (see cited work below). This task variant requires the subject to judiciously shift between avoiding the room-frame shock zone using stationary room information and avoiding the arena-frame shock zone using rotating arena information. This R+A+ task variant might meet the reviewer’s criteria for cognitive control. We have recorded hippocampal and entorhinal ensemble activity during the R+A+ task variant, and it is very similar to the activity during the R+A- task we used. Nonetheless, future work will investigate the effect of mPFC lesion on the R+A+ task variant.

      Cited work:

      Fenton AA, Wesierska M, Kaminsky Y, Bures J (1998), Both here and there: simultaneous expression of autonomous spatial memories in rats. Proc Natl Acad Sci U S A 95:11493-11498.

      Kelemen E, Fenton AA (2010), Dynamic grouping of hippocampal neural activity during cognitive control of two spatial frames. PLoS Biol 8:e1000403.

      Burghardt NS, Park EH, Hen R, Fenton AA (2012), Adult-born hippocampal neurons promote cognitive flexibility in mice. Hippocampus 22:1795-1808.

      Park EH, Keeley S, Savin C, Ranck JB, Jr., Fenton AA (2019), How the Internally Organized Direction Sense Is Used to Navigate. Neuron 101:1-9.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      (1) Incorporate the cytochrome oxidase and hippocampal recordings (rationale and hypothesis) in the introduction, explaining how these aspects are relevant to the general question. 

      We have done this as requested. See lines 159-173 of the revised introduction.

      (2) Figure 1C. On Day 4-5 (conflict training) in which the shock zone was relocated 180 deg from the initial location, the behavioral tracks did not show any presence of the rat in this sector (in particular for the lesion example). Figure 4 nevertheless indicates that entrances have been made (which was expected since rats have to know that the shock zone was relocated).

      Thanks for pointing this out. The tracks are from the end of the sessions. The labels have been changed to specify which trials the tracks are from.

      (3) Figure 1C. The caption is huge, as it contains the details of the statistical analyses. I would prefer to have these details in the text and keep the caption at a "reasonable" length. At the end of the caption (l. 190-191), it would be less confusing to keep the numbering of the training days (replace D1T1 with D2T1 and D2T9 with D3T9).

      As suggested, the statistical details have been moved to the main text and the numbering has been updated. Thank you.

      (4) It was worthwhile to show that the mPFC lesion had some effects in the present task, if only to validate the effectiveness of the lesion. This brain area has been shown to be important for planning, cognitive flexibility, etc. Indeed, the authors found that the savings index was greater in sham than in mPFC rats (overdispersion in hippocampal firing was also reduced in pretraining) and interpreted this result as impaired flexibility. Could an alternative explanation be a memory deficit? I nevertheless expected that impaired flexibility in mPFC rats would be expressed in conflict trials in the form of more entrances into the zone that was initially not associated with shock (at least in the first trials of Day 4). But this appears not to be the case.

      A memory deficit is unlikely to explain the difference between the groups on the first trial of Day 5. Memory in the lesion rats was tested multiple times, specifically at the start of each trial (time to first entrance), including on the 24-h retention test, and no deficits were observed. Performance on Day 9 trial 1 is worse in the lesion group than in the controls, but it is not parsimonious to attribute this to a simple memory deficit since 24-h memory was good and similar between lesion and control rats on days 3 and 4, and memory on Day 5 was equally poor in both the lesion and control rats, as measured by time to first entrance.  

      (5) Material and methods. The injected volume of ibotenic acid should be mentioned. 

      The injected volume (0.2 µl) has been added. See line 531.

      (6) The rationale for doing the conflict training session should be indicated somewhere. 

      The rationale is now provided. See lines 204-208.

      Reviewer #2 (Recommendations for the authors): 

      (1) Line 132: The statement that all sham rats improved while only 6/10 lesion rats improved is followed by a t-test, which tests the difference between means; it does not compare proportions. Also, what criterion was used to determine whether an improvement was seen? 

      The statistical comparison is now provided (line 230; test of proportions, z = 2.3, p = 0.03). Improvement was defined simply as numerically fewer entrances.

      (2) Line 138: This is a very long and confusing sentence. Consider revising for clarity. 

      The sentence (now line 234) was revised.

      (3) Figure 1B only includes data from 3 animals. Most published studies show the whole dataset by presenting the largest and smallest lesions. 

      Supplemental Figure S2 was added with all the lesions depicted and quantified.

      (4) Figure 1C suggestion to make the schematic shock zone line up with the shock zone shown for the tracking data. 

      Graphically, it looks better as drawn because it uses perspective to depict a three-dimensional structure.

      (5) Methods: Clarify if the shock zone location was the same across all rats. 

      Line 570 states that the shock zone was the same for all rats.

      (6) Line 158: "Behavioral tracks" is not clear. Suggest more precise wording.

      Reworded to “Tracked room-frame positions” (now line 249).

      (7) Line 166: "effect of trial" - should this be the main effect of trial?; "interaction" - should this be "group x trial" interaction? 

      Reworded (now line 181).

      (8) Line 167: "or their interaction" is awkward in the context of the sentence. 

      Reworded (now line 182).

      (9) Line 182: Avoid talking about "trends" as if they are almost significant unless the authors suspect that they did not have sufficient statistical power to detect differences. In that case, a power analysis should be provided. 

      Removed.

      (10) Line 190: "left:...right..." is hard to follow, especially with acronyms like D1T1. Consider revising for clarity. 

      Revised (now lines 246-248).

      (11) Line 195: "effectiveness of the PFC to impair" is unnecessarily verbose. 

      Reworded (now lines 255-257).

      (12) Savings results: There is a lot of variability in the lesion group. It would be interesting to know if the extent of the lesion correlates with savings.

      Savings was not related to lesion size. See line 259.

      (13) Line 300: The thalamic recording results are not reported in the results section (other than appearing in the table). Moreover, there is no detail about which thalamic nucleus these recordings are from.

      Lines 411 and 614 provide these details.  

      (14) Line 312: "no longer impair" contains a grammatical error. 

      Corrected (now line 422).

      (15) Line 325: "was not impairing" contains a grammatical error. 

      Corrected (now line 437).

      (16) Line 327: The sentence ending with "...opinion of others" seems unnecessarily confrontational. 

      Previous reviewers at other journals have maintained this position; we therefore included such a strong statement in our initial submission. However, we have now revised this statement to avoid appearing confrontational.

      (17) Line 329: Sentence is awkward. Consider revising. 

      Revised (now line 443).

      (18) Line 384: The authors should disclose if there was an objective metric for determining the adequacy of the lesion. 

      The lesion assessment and quantification are now better explained in the Methods under “Cytochrome oxidase activity and Nissl staining” (lines 708-714).

      (19) Line 385: The authors should clarify how they got from 15 rats (Line 376) to 10. 

      This information is provided in the methods.

      (20) Line 390: It is not clear why skin irritation in the cage mate would prevent the rat from being tested. 

      This has been explained in the Methods under “Behavioral analysis followed by cytochrome oxidase activity” (lines 515-518).

      (21) Methods section: The authors should describe how the tracking data were acquired. Overhead camera? Tracker based on luminance or body position? What software program was used? What was the sampling rate? 

      This is now better explained in the Methods under “Active place avoidance task” (lines 538-551).

      (22) Methods section: Include how fast the arena was rotating and other details about the task such as where rats were placed during the ITI. 

      This is now better explained in the Methods under “Active place avoidance task”.

      (23) Line 439: The recording system used (hardware & software) should be stated. 

      This is now included in the Methods (line 538).

      (24) Line 435: Though overdispersion calculation is described thoroughly, there is nothing in the paper that tells me what overdispersion means. 

      What the measure means is now described in the Methods under “Electrophysiology data analysis” (lines 646-650).

      (25) Line 561: The test used to assess effect sizes should be stated. 

      Effect sizes corresponding to the statistical tests are provided.

      Reviewer #3 (Recommendations for the authors): 

      (1) At the end of the conflict training, rats with mPFC lesions learned to avoid the new shock zone (Figure 1F, Block 16), but their place cells did not show room-preferring activity near the shock zone (Figure 4B). This observation questions whether spatial frame-specific representation is relevant for active avoidance. Can the authors clarify this point?

      This is a dynamic behavior, and the hippocampal dynamics match it, changing on a timescale of a few seconds, as we have shown in several published papers. The lack of a preference averaged over 20 minutes, when the rats are avoiding both the current and former shock zones during the conflict session, is pretty much what would be expected from such a coarse measurement. The important measure is the spatially resolved measure of room versus arena preference. Figure 4B shows that in the lesion rats there is generally less of a frame preference during conflict (consistent with poorer flexibility). However, Figure 4D quantifies the frame preference near and far from the shock zone, and by this measure there is no difference between the groups.

      (2) Related to the point above, the author might consider including panels in Figures 4C and D to show the neural activity during the pretraining and conflict training retention period. I assume p(room) will be comparable between the Near and Far segment in both sessions, but the p(room) may be higher in the Conflict training session than the Pretraining session. This would show that the mPFC lesion impairs suppressing the place cell activity encoding the old shock location. 

      Thanks for the suggestion. While we don’t think we can draw any strong conclusions from this analysis, we are happy to show it. The issue is that during conflict, the rats have two perfectly reasonable representations of where there was shock: the initial location, which was turned off to create the conflict, and the most recent, conflict location of shock. Importantly, these recordings were made during conflict retention, after we had turned off the shock for the retention recording (for the second time in the rat’s experience). Turning off the shock allows us to exactly match the physical conditions of pretraining, initial retention, and conflict retention, which was the experimental design’s goal. However, the experiential histories of the rats prior to initial retention and conflict retention cannot match, because during initial retention the rats had never experienced a changed shock zone whereas, by conflict retention, they had experienced multiple changes. Importantly, we have previously shown that mouse hippocampal ensembles represent both initial and conflict shock locations as the animals consider their options during conflict trials (see Dvorak et al. 2018, PLoS Biol 16:e2003354). Consequently, we cannot make any strong predictions about whether hippocampal activity during conflict retention should be room-frame preferring selectively in the vicinity of the current shock zone. As the reviewer surely appreciates from introspection, mental representations are mercifully not obliged to dictate behavior. In fact, that is what is interesting and controversial about cognitive control: it is a dynamic internal process, and the innovation of our work lies in demonstrating that one cannot rely on behavior alone to assess this process. Nonetheless, we did this analysis and now present it in the revised Fig. 4. During pretraining, both lesion and sham groups express no particular spatially modulated preference for either the room or the arena frame, as expected. During initial training, both groups express a room-frame preference in the vicinity of the shock zone, as we initially reported. By inspection, during conflict, the sham rats express a preference for room-frame activity in the vicinity of the most recent shock zone location; this preference is weaker than what is expressed during initial retention. The lesion rats do not show this preference. These impressions are quantified in revised Fig. 4D; the comparisons within the conflict retention sessions did not reach statistical significance. We leave it to the reader to interpret what that means. Thanks for the nudge.

      (3) The significant group difference in place cell overdispersion during the pretraining phase (Figure 3C) is interesting, but some readers would appreciate additional sentences on its functional implication. Does it mean the spatial tuning of place cells was disrupted by the mPFC lesion?

      Only the reliability of spatial firing was altered, not the spatial tuning.

      (4) Although the method section described how to calculate overdispersion and SFEP, some concise, intuitive descriptions of these measures in the result section would help readers understand these results.

      Overdispersion is now better explained. See lines 646-650.

      (5) I recommend adding a figure of the task performance of the rats used in the electrophysiological recording experiment and a table summarizing the number of cells recorded per animal. 

      We have included Table S2 with the cell counts and a summary of the performance of each of the rats in the electrophysiological recording experiment.

      (6) Readers would appreciate additional information on task apparatus, such as the size, appearance, and rotating speed of the arena, as well as stationary cues available in the room. 

      This is now provided in the Methods under “Active place avoidance task”.

      (7) Lines 425-416: "On the fourth day of the behavioral training, the rats had a single trial with the shock on to test retention of the training." Shouldn't it be "shock off"? 

      No, the shock was on, to prevent extinction learning and to increase the challenge of conflict learning.

    1. eLife Assessment

      The authors provide a useful summary of ten years of Brain Initiative funding including the historical development, the specific funding mechanisms, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed. The evidence for impact is incomplete due to the omission of a comparison group of funded grants.

    2. Reviewer #1 (Public review):

      Summary:

      This is a convincing description of approximately ten years of funding from the NIH BRAIN initiative. It is of particular value at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

      Strengths:

      The paper contains a fair bit of documentation so that the curious reader can actually parse what this BRAIN program funded.

      Weaknesses:

      There are too many acronyms, and the manuscript reads as if it were an internal NIH document, where the audience knows all of the NIH nomenclature and program details. It is not particularly friendly to the outside, lay reader.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide an important summary of ten years of Brain Initiative funding including a description of the historical development of the initiative, the specific funding mechanisms utilized, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed.

      Strengths:

      This is a useful perspective on an important funding initiative over a ten-year period. It is clearly written and the illustrations and analyses are mostly useful for understanding the impact of the initiative.

      Weaknesses:

      The major limitation is that the bibliographic analysis does not provide a comparison group of funded grants. Because work that successfully competes for funding is likely to be more impactful than all work in a given area, the normalization of citations to field medians may reflect this "grant review" effect, rather than anything special about the Brain Initiative. Hopefully, this speculation is incorrect (I would guess that it is), but it would be helpful to try to demonstrate this more directly by including a funded comparison group.

      There are also minor inconsistencies in the numbering of the figures that need to be cleared up.

    4. Author response:

      eLife Assessment

      The authors provide a useful summary of ten years of Brain Initiative funding including the historical development, the specific funding mechanisms, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed. The evidence for impact is incomplete due to the omission of a comparison group of funded grants.

      In this combined version, we include a comparison group of non-BRAIN Initiative R01s derived from the parent notice of funding opportunity from FY2014-2022. We performed a bibliometric analysis of the publications, citations, relative citation ratio (RCR), and budget productivity measures of the non-BRAIN parent R01s. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a convincing description of approximately ten years of funding from the NIH BRAIN initiative. It is of particular value at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

      Strengths:

      The paper contains a fair bit of documentation so that the curious reader can actually parse what this BRAIN program funded.

      Weaknesses:

      There are too many acronyms, and the manuscript reads as if it were an internal NIH document, where the audience knows all of the NIH nomenclature and program details. It is not particularly friendly to the outside, lay reader.

      In this version, we have attempted to minimize acronyms and explain NIH nomenclature and program details to make it more accessible to readers not familiar with NIH terminology.

      Reviewer #2 (Public review):

      Summary:

      The authors provide an important summary of ten years of Brain Initiative funding including a description of the historical development of the initiative, the specific funding mechanisms utilized, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed.

      Strengths:

      This is a useful perspective on an important funding initiative over a ten-year period. It is clearly written and the illustrations and analyses are mostly useful for understanding the impact of the initiative.

      Weaknesses:

      The major limitation is that the bibliographic analysis does not provide a comparison group of funded grants. Because work that successfully competes for funding is likely to be more impactful than all work in a given area, the normalization of citations to field medians may reflect this "grant review" effect, rather than anything special about the Brain Initiative. Hopefully, this speculation is incorrect (I would guess that it is), but it would be helpful to try to demonstrate this more directly by including a funded comparison group.

      In this version, we have provided a comparison group of parent R01s that are not funded through the BRAIN Initiative from FY2014-2022 in Figure 3. We include publication metrics and budget efficiency measures for this comparison group.  

      There are also minor inconsistencies in the numbering of the figures that need to be cleared up.

      We have updated the figure numbers.

    1. eLife Assessment

      The manuscript presents some useful accounts of experiences funding team projects within the BRAIN Initiative. These would be more appropriate to add to the companion manuscript since the present manuscript contains some overlapping analyses and does not stand well on its own. Therefore the evidence supporting the conclusions is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.

      Strengths:

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals.

      Weaknesses:

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1.

    3. Reviewer #2 (Public review):

      Summary:

      The authors review the history of the team projects within the Brain initiative and analyze their success in progression to additional rounds of funding and their bibliographic impact.

      Strengths:

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented.

      Weaknesses:

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping." I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic, and the Sankey plot could be simply summarized by rates of competing success.

    4. Author response:

      eLife Assessment 

      The manuscript presents some useful accounts of experiences funding team projects within the BRAIN Initiative. These would be more appropriate to add to the companion manuscript since the present manuscript contains some overlapping analyses and does not stand well on its own. Therefore the evidence supporting the conclusions is incomplete. 

      We appreciate the feedback on merging both manuscripts into one and have followed the advice in this version. 

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.  

      Strengths: 

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals. 

      Weaknesses: 

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1. 

      We respectfully disagree about the utility of the RCR, particularly because it is a metric that is normalized by both year and topical area. We have added a more detailed description of how the RCR is calculated on pages 6-7. Please note that Figure 1 is intended to highlight the funding opportunities, investments, and number of awards associated with small-lab (exploratory) versus team (elaborated, mature) research, rather than to describe publication metrics.  

      Reviewer #2 (Public review): 

      Summary: 

      The authors review the history of the team projects within the BRAIN Initiative and analyze their success in progressing to additional rounds of funding and their bibliographic impact. 

      Strengths: 

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented. 

      Weaknesses: 

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping." I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic, and the Sankey plot could simply be summarized by rates of competing success. 

      While we sincerely appreciate the feedback, we chose to retain these plots on domains and models to convey the broad spectrum of research topics covered by our TeamBCP awards. Further details on the awards can be obtained from the award links provided in the text. We also retained the Sankey plots because they visually depict how awards transition from one mechanism to another, how their funding sources evolve, and how their research trajectories advance. The plot is an example of our continuity analysis, which is reported only in the text and is not shown visually for the remaining BCP programs.

    1. eLife Assessment

      This important computational study investigates homeostatic plasticity mechanisms that neurons may employ to achieve and maintain stable target activity patterns. The work extends previous analyses of calcium-dependent homeostatic mechanisms based on ion channel density by considering activity-dependent shifts in channel activation and inactivation properties that operate on faster and potentially variable timescales. The model simulations demonstrate the potential functional importance of these mechanisms, but the evidence is incomplete and would be strengthened by more in-depth analyses and explicit exposition.

    2. Reviewer #1 (Public review):

      This computational study builds on a 1998 study from the Marder lab (Liu et al.), which proposed a model demonstrating activity-dependent homeostatic recovery of activity in individual bursting neurons, based on three "sensors" of intracellular calcium concentration. The original model modified the levels of ion channel conductances. The current model builds on that and adds activity-dependent modifications of the voltage dependence of these ionic currents, implemented to occur concurrently with changes in maximal conductances but on a different timescale. The faster timescale of change in voltage dependence is justified by the assumption that such changes can be produced by neuromodulatory chemicals or similar second-messenger-based mechanisms, which presumably act faster than the regulation of channel densities. The main finding is that the difference in timescales between the two homeostatic mechanisms (channel density vs. voltage dependence) could result in distinct subsets of parameters, depending on how fast the second-messenger mechanisms operate.

      This study is an interesting and noteworthy extension of the theoretical ideas proposed by the classic study of Liu et al, 1998. It addresses a very important question: How do two known mechanisms of modifications of neuronal activity that occur at different timescales interact within an activity-dependent homeostatic framework? However, the study and its presentation have some major shortcomings that should be addressed to strengthen the claim.

      Major comments:

      (1) The main issue that I have with this study is the lack of exploration of "why" the model produces the results it does. Because this is a model, it should be possible to find out why the three timescales of half-act/inact parameter modifications lead to different sets of results. Without this, it is simply an exploratory exercise. (The model does this, but we do not know the mechanism.) Perhaps this is enough as an interesting finding, but it remains unconvincing and (clearly) does not have the impact of describing a potential mechanism that could be explored experimentally.

      (2) A related issue is the use of bootstrapping to do statistics for a family of models, especially when the question is in fact the width of the distribution of output attributes. I don't buy this. One can run enough models to find say N number of models within a tight range (say 2% cycle period) and the same N number within a loose range (say 20%) and compare the statistics within the two groups with the same N.

      (3) The third issue is that many of the results that are presented (but not the main one) are completely expected. If one starts with gmax values that would never work (say, all of them 0), then it doesn't matter how much one moves the act/inact curves; one probably won't get the desired activity. Alternatively, if one starts with gmax values that are known to work and randomizes the act/inact midpoints, then the expectation would be that the model converges to something that works. This is Figure 1B and C, so there is no surprise. But it should work the other way around too. If one starts with random act/inact curves that would never work and fixes those, then why would one expect any set of gmax values to produce the desired response? I can easily imagine setting the half-act/inact values to values that never produce any activity with any gmax.

      (4) A potential response to my previous criticism would be that you put reasonable constraints on the gmax or half-act/inact values, or tie the half-act to the half-inact. But those are simply arbitrary, ad hoc decisions made to make the model work, much like the L8-norm used to amplify some errors. There is absolutely no reason to believe these are tied to the biology of the system.

      (5) The discussion of this manuscript is at once too long and inadequate. It goes into excruciating detail about things that are simply not explored in this study, such as phosphorylation mechanisms, justification of model assumptions about how these alterations occur, or even the biological relevance. (The whole model is an oversimplification: lack of anatomical structure, three calcium sensors, arbitrary assumptions, and how parameter bounds are implemented.) Lengthy justifications for why channel density and the half-act/inact of all currents obey the same time constant answer a question that no one asked. It is a simplified model to make an important point. The authors should make these parts concise and to the point. More importantly, the authors should discuss the mechanism through which these differences may arise. Even if it is not clear, they should speculate.

      (6) There should be some justification or discussion of the arbitrary assumptions made in the model/methods. I understand some of this is to resolve issues that had come up in previous iterations of this approach and in fact the Alonso et al, 2023 paper was mainly to deal with these issues. However, some level of explanation is needed, especially when assumptions are made simply because of the intuition of the modeler rather than the existence of a biological constraint or any other objective measure.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach the target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, the main conclusions of the paper and the insights brought by this computational work are difficult to grasp.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The writing is rather confusing, and the state of the art explaining the need for the study is unclear.

      (2) The main outcomes and conclusions of the study are difficult to grasp. What is predicted or explained by this new version of homeostatic regulation of neuronal activity?

    4. Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement activity-dependent changes in ion channel conductance to support homeostatic plasticity. While changes in the voltage-dependent properties of ion channels are known to modulate neuronal excitability, their role as a homeostatic plasticity mechanism interacting with channel conductance has been largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage-dependent properties can interact with plasticity in channel conductance to allow neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. These results also show that the rate of channel voltage-dependent shifts can influence the steady-state parameters reached as the model settles into a stable intrinsic bursting state. That is, the rate of these modifications shapes the range of channel conductances and half-(in)activation parameters, as well as activity characteristics such as burst period and duration. A major conclusion of the study is that altering the timescale of channel voltage dependence can seamlessly shift a neuron's activity characteristics, a mechanism that the authors argue neurons may employ to adapt to perturbations. While the study's conclusions are mostly well supported, additional analyses and simulations are needed.

      (1) A main conclusion of this study is that the speed at which (in)activation dynamics change determines the range of possible electrical patterns. The authors propose that neurons may dynamically regulate the timescale of these changes (a) to achieve alterations in electrical activity patterns, for example, to preserve the relative phase of neuronal firing in a rhythmic network, and (b) to adapt to perturbations. The results presented in Figure 4 clearly demonstrate that the timescale of (in)activation modifications impacts the range of activity patterns generated by the model as it transitions from an initial state of no activity to a final steady-state intrinsic burster. This may have important implications for neuronal development, as discussed by the authors.

      However, the authors also argue that the model neuron's dynamics, such as period and burst duration, could be dynamically modified by altering the timescale of (in)activation changes (Figure 6 and related text). The simulations presented here do not, however, test whether modifications in this timescale can shift the model's activity features once it reaches steady state. In fact, this is unlikely to be the case since, at steady state, the calcium targets are already satisfied. It is likely, however, as the authors suggest, that the rate at which (in)activation dynamics change may be important for neuronal adaptation to perturbations, such as changes in temperature or extracellular potassium. Yet the results presented here do not examine how modifying this timescale influences the model's response to perturbations. Adding simulations to characterize how alterations in the rate of (in)activation dynamics affect the model's response to perturbations, such as transiently elevated extracellular potassium (Figure 5), would strengthen this conclusion.

      (2) Another key argument in this study is that small, coordinated changes in channel (in)activation contribute to shaping neuronal activity patterns, but that these subtle effects may be obscured when averaging across a population of neurons. This may be the case; however, the results presented do not clearly demonstrate this point. It would be strengthened by identifying correlations, if they exist, between (in)activation curves, conductances, and the resulting bursting patterns of the models for the simulations presented in Figures 2 and 4, for example. Alternatively, or additionally, relationships between (in)activation curves could be probed by perturbing individual (in)activation curves and quantifying how the other model parameters compensate, which could clearly illustrate this point.

    5. Author response:

      We thank the reviewers for their detailed and constructive comments on our manuscript entitled “Activity-Dependent Changes in Ion Channel Voltage-Dependence Influence the Activity Patterns Targeted by Neurons.” We appreciate the time and effort the reviewers invested in critiquing our work and are grateful for the opportunity to clarify and improve our manuscript.

      As noted by the reviewers, the main message of the manuscript is that the intrinsic properties and activity characteristics of targeted bursters depend on the timescale of half-(in)activation alterations in the homeostatic mechanism. However, the concerns of the reviewers reveal that the manuscript is organized in ways that detract from this message. Below we respond to the points the reviewers raise and close by outlining the changes that we will make to the manuscript as a result. Our goal will be to streamline the message of the paper while addressing the concerns of the reviewers.

      Response to Reviewer #1:

      Point 1: We interpret the reviewer’s question about “mechanism” to be: why do half-(in)activation alterations redirect degenerate bursters to different parameter regions? (A separate aspect of “mechanism,” namely how these alterations might be biologically implemented, is already addressed in the paper.)

      We speculate that Figure 3 illustrates this process. As conductance densities slowly evolve, rapid half-(in)activation changes cause the sensor variable (α) to jump abruptly as it searches for a voltage-dependence configuration that meets the calcium targets (Figure 3A). The channel densities are then slightly altered, and the process repeats. Slowing the half-(in)activation alterations reduces these abrupt fluctuations (Figure 3B). Making the alterations infinitely slow effectively removes half-(in)activation changes altogether, leaving the system reliant solely on the slower alterations in maximal conductances (Figure 3C). Because each timescale of half-(in)activation produces a different channel repertoire at each time step, the neuron follows distinct trajectories through the space of activity characteristics and intrinsic properties over the long term.

      Point 2: We appreciate the reviewer’s skepticism regarding our statistical approach with the “Group of 5” and “Group of 20.” These groups arose from historical aspects of our analysis and this analysis does not directly advance the main point—that changes in the timescale of channel voltage-dependence alterations impact the properties of bursters to which the homeostatic mechanism converges. Therefore, we plan to remove the references to the Group of 5 and focus on how the Group of 20 responds to variations in the timescale of voltage-dependent alterations.

      Point 3: Our paper claims that the half-(in)activation mechanism is subordinate to the maximal conductance mechanism. We agree with the reviewer that making this claim requires more care. The simulations we ran are controls in the spirit described below.

      The reviewer notes that in our simulations, half-(in)activations are already near the range required for bursting, which forces maximal conductances to undergo larger changes and thus appear more critical. However, we note that the opposite can also occur: if half-(in)activation values were already positioned in ranges required for bursting, an arrangement of small maximal conductances might produce bursting. The latter might give the impression that maximal conductance alterations and half-(in)activation alterations are equally important. The simulations we ran simply suggested that this was not true for these models.

      Points 4-6: In Point 4, the reviewer highlights that some model choices (e.g., constraints on maximal conductance and half-(in)activation, use of the L8 norm) are not clearly justified. In Point 5, the reviewer suggests that the paper provides excessive detail about other model choices. Point 6 appears to reiterate concerns about insufficient justification for some modeling decisions.

      Our intent was to acknowledge every caveat, which led us to include a long section on Model Assumptions in the Discussion. However, as Point 5 notes, this makes the Discussion cumbersome. The Discussion should focus on the impact that the timescale of half-(in)activation alterations has on the family of bursters targeted by the homeostatic mechanism. Consequently, we will relocate the extended discussion of model assumptions from the Discussion to the Methods section. This section already touches on how the constraints on half-(in)activation alterations compare to earlier versions of the model (noted in Point 6) and will be expanded to further explain our choice of the L8 norm (Point 4).

      Response to Reviewer #2:

      Weakness 1: The reviewer notes that the writing is “rather confusing.” This likely arises from the fact that we did not consistently emphasize the core message: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by the homeostatic mechanism. We will address this by reorganizing the manuscript to make that focus clearer, and we outline these planned revisions at the end of these responses.

      The reviewer specifically points out that the state-of-the-art is not clearly articulated. We will reorganize the Introduction to highlight this. Briefly, work on activity-dependent homeostasis has historically focused on changes in channel density. This is supported by experiment and has been modelled theoretically. In comparison, changes in channel voltage-dependence, while documented, are less explored due to the challenges of measuring them. In this work, we attempt to study the impact that alterations in channel voltage-dependence have on activity-dependent homeostasis. To do this, we extend existing computational models of activity-dependent homeostasis—models that have hitherto only altered channel density—by incorporating a mechanism that also adjusts channel voltage-dependence.

      Weakness 2: The Discussion highlights two potential implications of our findings—one for neuronal development and another for activity recovery following perturbations. However, they were outlined after the Model Assumptions section which, as Reviewer 1 points out, is quite detailed and cumbersome.

      Another aspect that may contribute to the challenge in interpreting our results may be our conceptual approach to neuronal excitability, which relies on a computational model of activity-dependent homeostasis that abstracts much of the underlying biochemistry. Our message is general: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by a homeostatic mechanism. As such, the implications are general. Their value lies in circumscribing a conceptual framework from which experimentalists may devise and test new hypotheses. We do not aim to predict or explain any specific phenomenon in this work. To address this concern however, we will expand our discussion of how these findings may guide experimental considerations, particularly regarding neuronal development and activity recovery during perturbations, to better illustrate the practical utility of our results.

      Response to Reviewer #3:

      Point 1: This reviewer suggests that our core message—namely, that the timescale of half-(in)activation alterations affects the intrinsic properties and activity patterns targeted by a homeostatic mechanism—should also apply during perturbations. We plan to address this by extending our analysis on the Group of 20 models. We will perturb activity by increasing extracellular potassium concentration and change the timescale of half-(in)activation alterations during the perturbation. This should underscore how the neuron’s stabilized activity pattern depends on this timescale, reinforcing our central message.

      Point 2: In this part of the Discussion, we noted that multiple half-activation shifts collectively shape the neuron's global properties, and that averaging might obscure these effects. However, in light of the reviewers' comments, we recognize that this observation alone does not directly advance the paper's main message. To make it relevant, we would need to (1) identify correlations between intrinsic parameters (i.e., half-(in)activation and maximal conductance) and the resulting activity patterns, and (2) examine how these correlations shift under different timescales of half-(in)activation alterations. Since we have not performed that analysis, we will revise this part of the Discussion to clarify its connection to the paper's principal focus, noting that a deeper exploration of this notion using correlations will be the topic of future work.

      Conclusion: We outline updates we will make to the paper here.

      Introduction: In response to Reviewer 2, we will provide a clearer explanation of the state-of-the-art in activity-dependent homeostasis and highlight our specific contribution. We will emphasize that our conclusions, while generic, are relevant in experimental contexts.

      Results: We will reorganize this section to underscore the main point: the timescale of half-(in)activation alterations affects the intrinsic properties and activity characteristics of bursters in the homeostatic mechanism. Figure 1 will remain as it is. It shows assembly from random initial conditions, and we will explain that for these simulations the half-(in)activation mechanism must always be paired with a mechanism that alters maximal conductances, because half-(in)activation alterations alone cannot form bursters. Figure 2 will remain as is, but we will remove any discussion of the "Group of 5," addressing Reviewer 1's feedback. What is presently Figure 4 will then follow, illustrating how timescale differences shape the properties of 20 degenerate solutions. We then present Figure 3 to address Reviewer 1's critique on mechanism. Here we will explain how different timescales of half-(in)activation alteration cause the homeostatic mechanism to update channel properties differently, leading to distinct trajectories through the space of intrinsic properties and activity characteristics (as described in our response to Point 1 of Reviewer 1). Finally, following Point 1 of Reviewer 3, we will add a new figure highlighting the role of the half-(in)activation timescale during perturbation.

      Discussion: To streamline the Discussion, the “Model Assumptions” section will be moved to Methods. In line with Point 2 of Reviewer 3, we will clarify how the concept of "small half-(in)activation shifts lead to global changes in neuronal properties" aligns with our core message. Additionally, following Reviewer 2’s comments, we will expand our discussion of implications by including how experimentalists might use our findings to inform studies on perturbations and development.

      Methods: We will expand “Model Assumptions” to explain in more detail why we chose the L8 norm.
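      As background for the L8 norm mentioned in these reviews and responses: an L^p error norm with a large exponent p weights the largest error component most heavily, approaching the maximum error as p grows, whereas an L2 norm spreads weight more evenly across components. The minimal Python sketch below illustrates this with a hypothetical vector of four error terms; the manuscript's actual error function is not given in this text, so the values and the helper name `lp_norm` are assumptions for illustration only.

```python
def lp_norm(errors, p):
    """Return the L^p norm of a sequence of (hypothetical) error terms."""
    return sum(abs(e) ** p for e in errors) ** (1.0 / p)

# Hypothetical errors: three small mismatches and one large one.
errors = [0.1, 0.1, 0.1, 0.5]

l2 = lp_norm(errors, 2)            # sqrt(0.28), about 0.529
l8 = lp_norm(errors, 8)            # dominated by the 0.5 term
largest = max(abs(e) for e in errors)

print(f"L2 = {l2:.4f}, L8 = {l8:.4f}, max = {largest:.4f}")
```

      With p = 8, the three small errors contribute almost nothing, so the norm is effectively set by the single worst mismatch; this is one way a high-order norm "amplifies" some errors relative to others.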

    1. eLife Assessment

      This fundamental study concerns a model for transgenerational epigenetic inheritance, the learned avoidance by C. elegans of the PA14 pathogenic strain of Pseudomonas aeruginosa. The authors test the impact of procedural alterations made in another study, by Gainey et al., which claimed that transgenerational inheritance in this paradigm lacks robustness, despite this observation having been reported in multiple papers from the Murphy lab. The authors of the present study show that by following a non-standard avoidance protocol, Gainey et al. likely biased their measurements in a way that made it hard to observe learned avoidance. The authors also highlight the importance of bacterial growth conditions, showing that expression of the trigger molecule, the bacterial P11 RNA, which is necessary and sufficient to drive the transgenerational inheritance of the avoidance phenotype, is influenced by temperature. As expression of P11 was not verified by Gainey et al., this provides another explanation for the inability to observe transgenerational epigenetic inheritance. Together, the authors provide compelling and powerful arguments that the original phenomenon is robust and that it can be reproduced in the Murphy lab by following their original protocol precisely, including the use of azide to immobilize the worms at the food source. Overall, this study not only provides guidance for investigators in this experimental paradigm, but it also provides additional understanding of the differences between naïve preference, learned preference, and transgenerational epigenetic inheritance. The present study is therefore of broad interest to anyone studying genetics, epigenetics, or learned behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript from Kaletsky et al. is a response to a paper recently published by Craig Hunter's group (Gainey et al. 2024). The Murphy lab has previously shown that the learned avoidance of PA14 by C. elegans can be transmitted through four generations. In a series of detailed studies, they defined the mechanism of this transgenerational epigenetic inheritance (TEI), identifying both PA14 and C. elegans factors required for this effect (Moore et al., 2019, Kaletsky et al., 2020; Moore et al., 2021). PA14 produces a small RNA, P11, that is necessary and sufficient for transgenerational epigenetic inheritance of avoidance behaviour in C. elegans. In the worm, P11 decreases maco-1 expression, which in turn regulates daf-7.

      In the study by Gainey et al (eLife 2024), the authors report their attempt at replicating the original findings of the Murphy lab using a modified experimental setup. The Gainey study observed avoidance of PA14 and upregulation of daf-7::GFP in the F1 progeny of trained parents, but not in subsequent generations. Importantly, although they examined a number of different deviations of the protocol, they did not repeat the original experiment using the exact protocol outlined in the Moore or Kaletsky papers. Nevertheless, the authors concluded that "this example of TEI is insufficiently robust for experimental investigations".

      The manuscript by Kaletsky et al. attempts to provide an explanation as to why Gainey et al., were unable to observe transgenerational avoidance of PA14. They identify two discrepancies in the methodology used between the two studies and examine the possible impacts of these.

      One of the primary differences in protocols between the two papers is how avoidance is measured. The Murphy group uses the traditional method of adding azide to bacterial spots on the choice plates to trap worms once they have come close to the food spot. The animals are on the plate for 1 hour, but most have likely been immobilized before this time point. Gainey et al. omit the azide and instead shift animals to 4°C after 30-60 minutes of exposure to immobilize the worms for counting. Kaletsky et al. show that the choice of assay has a significant impact on measuring attraction and avoidance.

      While Gainey et al. assert that the addition of azide had no discernible effect on the choice assay results, these data are not shown in their paper. Kaletsky et al. test these conditions head-to-head with the same 1-hour exposure time, showing that with azide, the initial response to PA14 in untrained worms is attraction. By contrast, in the absence of azide, when cold temperature is used to immobilize the worms, the response recorded is aversion to PA14. The choice assay results generated by Kaletsky et al. without azide are consistent with the choice assays in untrained worms shown in the Gainey paper, demonstrating that this is likely one factor that contributed to the different outcomes reported in the Gainey paper.

      Kaletsky et al. propose that learned aversion to PA14 may occur within the 1-hour exposure time when worms are not trapped in their initial decision by azide. This is consistent with previous findings from another group (Ooi and Prahlad 2017) showing that 45 minutes of exposure is sufficient to overcome the attraction to PA14 and shift to avoidance of PA14. Importantly, the Gainey paper notes exposure times between 30 and 60 minutes before shifting worms to 4°C for counting; this window may have generated additional variability between assays.

      The second possibility explored by Kaletsky et al. is that the expression of P11 differed between the studies. Because P11 is required for TEI, differences in P11 expression are a reasonable explanation for the different observations between studies. Unfortunately, P11 levels were not measured in the Gainey study; it is therefore not possible to know whether low or absent levels of P11 explain the inability to observe TEI. Nevertheless, Kaletsky et al. test the potential for changes in one growth condition, temperature, to influence the production of P11. Indeed, the expression of P11 differs in PA14 grown at different temperatures, providing an additional explanation for the discrepancies.

      While it is possible that temperature is the culprit, another culture condition or media component may be suppressing P11 expression. Nevertheless, the fact that expression of P11 can so easily be modified demonstrates that P11 expression is not immune to differences in culture conditions. Given its role in nitrogen fixation, I would be surprised if it were not regulated by environmental conditions. Differences in iron content between media batches are notorious for altering bacterial phenotypes. Although outside the scope of this study, given the connection to biofilm formation, I would be curious whether iron levels have an impact on P11 expression. All in all, the data highlight the fact that P11 levels should be measured if TEI is not seen.

      Strengths:

      Overall, this is an excellent study that has provided additional understanding of the difference between naïve preference and TEI and provides guidance for investigators in replicating TEI experiments. The manuscript is very well written and provides additional understanding regarding the replication of TEI in response to P. aeruginosa.

      The manuscript provides an important discussion about differences in methodology and how they might reflect specific biology. Many examples of experimental deviations that have large impacts have simple biological explanations. I believe the authors have done an excellent job making this point.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public review):

      In addition to the study by Kaletsky et al. (2025), I read the bioRxiv and eLife versions, as well as the eLife reviewer comments, for Gainey et al. (2024), to which Kaletsky et al. respond.

      Kaletsky et al. provide detailed, rigorous, and reproducible protocols and results. The authors point out the critical methods that the Hunter group failed to follow/confirm (e.g. azide to paralyze animals during pathogenic learning/memory assays; the expression of the P11 small RNA that is both necessary and sufficient for TEI of avoidance behavior; a single condition for training - PA14 grown on plates at 25°C and training at 20°C for 24 hr - that the Hunter lab did not follow and could not reproduce). The Kaletsky et al. response is evidence-based, fair, level-headed and unbiased, which is in contrast to the Gainey et al. paper.

      Reading the eLife review of Gainey et al., I note that the reviewers repeatedly pointed out that the authors did not follow the Murphy lab's published protocols.

      Public response by Gainey et al. to Reviewer 2: "It remains possible that we misunderstood the published Murphy lab protocols, but we were highly motivated to replicate the results so we could use these assays to investigate the reported RNAi-pathway dependent steps, thus we read every published version with extreme care."

      Public response by Gainey et al. to Reviewer 3: "We agree that our study was not exhaustive in our exploration of variables that might be interfering with our ability to detect F2 avoidance."

      Gainey et al. provide reasons/excuses for not following published methods (notably their subjective decision to exclude the paralyzing agent sodium azide from their choice assays), yet their abstract reads "We conclude that this example of transgenerational inheritance lacks robustness." I strongly disagree with this conclusion.

    4. Reviewer #3 (Public review):

      A recent bioRxiv paper from Craig Hunter's lab (Gainey et al. 2024) calls into question several manuscripts reporting that the nematode C. elegans avoids the pathogenic bacterium Pseudomonas aeruginosa for several generations after initial exposure, concluding that this avoidance is neither robust nor repeatable. In that publication, the authors tried to rule out genetic drift of the pathogenic bacterial strains and of C. elegans, and examined several experimental conditions, including assay temperature and the effect of light.

      The papers that Gainey et al. call into question (Moore et al. 2019, Kaletsky et al. 2020, Moore et al. 2021 and Sengupta et al. 2024) found that Pseudomonas aeruginosa produces a small RNA (sRNA), P11, that is necessary and sufficient for pathogen avoidance in subsequent generations of C. elegans (up to the F4 generation). The Gainey et al. manuscript does not assess P11 production in its experiments.

      Here, the Murphy group reports several new findings that highlight differences from the work performed in the Hunter lab. First, the assay used to test C. elegans attraction to and avoidance of pathogenic bacteria differs between the two groups. In the Murphy lab papers, and many others in this field, worms choose between spots of non-pathogenic (E. coli) and pathogenic (P. aeruginosa) bacteria placed a few centimeters apart on a single plate. Each spot also contains an aliquot of NaN3, which paralyzes animals as soon as they reach their first bacterial choice. C. elegans initially choose the pathogenic bacteria and then learn to avoid them thereafter, so establishing this baseline attraction is essential for measuring subsequent avoidance. The Hunter lab did not use NaN3 and instead moved plates to 4°C to slow the worms' movements before counting the population. Furthermore, the Hunter lab allowed the choice to proceed for an hour before moving plates to 4°C; because the worms could move freely away from their initial choice in the absence of paralyzing NaN3, the initial attraction phase of the assay was difficult to capture.

      The second major advance is the Murphy group's finding that how P. aeruginosa is grown before the choice assay is critical. Growth on plates at 25°C, but not on plates at 20°C or in liquid at 37°C, produces transgenerational inheritance of pathogen avoidance. Interestingly, P. aeruginosa produces P11 only when grown on plates at 25°C. The Hunter group grew Pseudomonas at 37°C in liquid with gentle shaking, spotted it onto assay plates, grew it for 2 days at 25°C, and then equilibrated the plates to room temperature before the choice assay. The Hunter lab did not check P11 production in any of their experiments.

      The results from the Murphy group are solid, and the authors go on to identify C. elegans genes required for the transgenerational response to P. aeruginosa and P11. Furthermore, they repeat their experiments with additional members of the Pseudomonas clade, finding the same transgenerational avoidance response and new sRNAs responsible for avoidance of these newly tested species.

      Overall, the discrepancies between the Hunter work and the numerous papers from the Murphy group could complicate this area of research. However, this eLife paper plainly illustrates the straightforward nature of the experimental setup and reconfirms that P11 is necessary and sufficient to orchestrate the multigenerational response to pathogenic Pseudomonas. Ensuring that the Pseudomonas culture produces P11, and that the assay captures the initial bacterial choice, appear essential for observing transgenerational inheritance of the avoidance phenotype.

    1. eLife Assessment

      This valuable study combines real-time keypoint tracking with transdermal activation of sensory neurons to investigate sensory neuron recruitment in freely moving mice, and builds on the authors' prior work in stationary mice. The evidence supporting the utility of the system is solid, although a more thorough classification of the behavioral responses to nociceptor stimulation would strengthen the work. Importantly, future analyses could include other cutaneous sensory neuron subtypes, and could also be adapted for studying more complex behaviors. The work will be of interest to sensory biologists and pain researchers.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a system for delivering precisely controlled cutaneous stimuli to freely moving mice by coupling markerless real-time tracking to transdermal optogenetic stimulation, using the tracking signal to direct a laser via galvanometer mirrors. The principal claims are that the system achieves sub-mm targeting accuracy with a latency of <100 ms. The nature of mouse gait enables accurate targeting of forepaws even when mice are moving.

      Strengths:

      The study is of high quality and the evidence for the claims is convincing. There is increasing focus in neurobiology on studying neural function in freely moving animals engaged in natural behaviour. However, a substantial challenge is how to deliver controlled stimuli to sense organs under such conditions. The system presented here constitutes notable progress towards such experiments in the somatosensory system and is, in my view, a highly significant development that will be of interest to a broad readership.

      Weaknesses:

      (1) "laser spot size was set to 2.00 ± 0.08 mm² diameter (coefficient of variation = 3.85)" is unclear. Is 0.08 the SD or the SEM? (This is not stated.) Also, does this reflect systematic variation across the arena (or something else)? Readers will want to know how much the spot size varies across the arena, i.e. the SD. A CV of 3.85 implies an SD of ~7.7 mm², i.e. non-trivial variation in spot size, implying substantial differences in power delivery (and hence stimulus intensity) when the mouse is in different locations. If I have misunderstood, perhaps this helps the authors to clarify. Similarly, it would be informative to report the mean and SD (or mean and CV) for power and power density. In future refinements of the system, would it be possible or useful to vary laser power according to arena location?
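      Because CV = SD/mean, the two possible readings of the reported spot-size statistics can be checked with simple arithmetic. The sketch below assumes, as the reviewer asks, that 0.08 mm² is an SD rather than an SEM (the source does not say which). Under that assumption, 0.08/2.00 gives a CV of 4%, close to the reported 3.85, whereas reading 3.85 as a plain ratio rather than a percentage implies an SD of 7.7 mm². The function names are illustrative only.

```python
def coefficient_of_variation(mean, sd, as_percent=True):
    """Return CV = SD / mean, optionally expressed as a percentage."""
    cv = sd / mean
    return cv * 100 if as_percent else cv


def implied_sd(mean, cv, cv_is_percent):
    """Back-calculate the SD implied by a reported CV."""
    return mean * (cv / 100 if cv_is_percent else cv)


MEAN_AREA_MM2 = 2.00  # reported mean spot size
SD_AREA_MM2 = 0.08    # assumed to be an SD (not stated in the source)

# CV implied by mean and (assumed) SD, in percent.
print(round(coefficient_of_variation(MEAN_AREA_MM2, SD_AREA_MM2), 2))
# SD implied by CV = 3.85 read as a percentage vs. as a plain ratio.
print(round(implied_sd(MEAN_AREA_MM2, 3.85, cv_is_percent=True), 3))
print(round(implied_sd(MEAN_AREA_MM2, 3.85, cv_is_percent=False), 2))
```

      Reporting which convention is used (ratio or percent) and whether ± denotes SD or SEM would remove the ambiguity.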

      (2) "The video resolution (1920 x 1200) required a processing time higher than the frame interval (33.33 ms), resulting in real-time pose estimation on a sub-sample of all frames recorded". Given this, how was it possible to achieve 84 ms latency? An important issue for closed-loop research will relate to such delays. Therefore please explain in more depth and (in Discussion) comment on how the latency of the current system might be improved/generalised. For example, although the current system works well for paws it would seem to be less suited to body parts such as the snout that do not naturally have a stationary period during the gait cycle.

    3. Reviewer #2 (Public review):

      Parkes et al. combined real-time keypoint tracking with transdermal activation of sensory neurons to examine the effects of recruiting sensory neurons in freely moving mice. This builds on the authors' previous investigations involving transdermal stimulation of sensory neurons in stationary mice. They illustrate multiple scenarios in which their engineering improvements enable more sophisticated behavioral assessments, including (1) stimulation of animals in multiple states in large arenas, (2) multi-animal nociceptive behavior screening through thermal and optogenetic activation, and (3) stimulation of animals running through maze corridors. Overall, the experiments, and the methodology in particular, are clearly written. However, there are several concerns, as well as opportunities to describe the new capabilities more fully, that, if addressed, would make it more likely for the community to adopt this methodology:

      The characterization of laser spot size and power density is reported as a coefficient of variation, in which a value of ~3 is interpreted as uniform. My interpretation would differ - data spread so that the standard deviation is three times larger than the mean indicates there is substantial variability in the data. The 2D polynomial fit is shown in Figure 2 - Figure Supplement 1A and, if the fit is good, this does support the uniformity claim (range of spot size is 1.97 to 2.08 mm2 and range of power densities is 66.60 to 73.80 mW). The inclusion of the raw data for these measurements and an estimate of the goodness of fit to the polynomials would better help the reader evaluate whether these parameters are uniform across space and how stable the power density is across repeated stimulations of the same location. Even more helpful would be an estimate of whether the variation in the power density is expected to meaningfully affect the responses of ChR2-expressing sensory neurons.

      While the error between the keypoint and the laser spot was reported as ~0.7 to 0.8 mm MAE in Figure 2L, the authors report in the methods an additional error of 1.36 mm MAE between predicted keypoints and ground-truth labels during real-time tracking. This suggests that the overall error is not submillimeter, as claimed by the authors, but rather on the order of 1.5-2.5 mm, which is considerable given that the width of a hind paw is ~5-6 mm and fore paws are even smaller. In my opinion, the claim of submillimeter precision should be softened, and the authors should consider that the area of the paw stimulated may differ from trial to trial if, for example, the error is large enough that the spot overlaps the edge of the paw.
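      The reviewer's combined-error estimate can be roughly bracketed from the two reported error terms. The sketch below uses two simple combinations: a quadrature sum, which approximates the typical combined error if the two errors are independent (exact for SD or RMS errors, only approximate for MAE), and a linear sum, the worst case in which both errors always point in the same direction. The helper name is hypothetical, and neither combination is taken from the manuscript.

```python
import math


def combined_error_range(targeting_mae_mm, tracking_mae_mm):
    """Rough (typical, worst-case) total error from two error sources.

    typical: quadrature sum, assuming independent errors (exact for SD/RMS,
             only approximate for MAE).
    worst:   linear sum, if both errors always point in the same direction.
    """
    typical = math.hypot(targeting_mae_mm, tracking_mae_mm)
    worst = targeting_mae_mm + tracking_mae_mm
    return typical, worst


TRACKING_MAE_MM = 1.36  # keypoint vs ground-truth labels (reported in methods)
for targeting in (0.7, 0.8):  # keypoint vs laser spot (reported in Figure 2L)
    typical, worst = combined_error_range(targeting, TRACKING_MAE_MM)
    print(f"targeting {targeting} mm: ~{typical:.2f} to {worst:.2f} mm")
```

      With the reported values this gives roughly 1.5-1.6 mm (quadrature) to 2.1-2.2 mm (linear), consistent with the reviewer's point that the overall error is unlikely to be submillimeter.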

      As the major advance of this paper is the ability to stimulate animals during ongoing movement, it seems that the Figure 3 experiment misses an opportunity to evaluate state-dependent whole-body reactions to nociceptor activation. How does the behavioral response relate to the animal's activity just prior to stimulation?

      Given the characterization of full-body responses to activation of TrpV1 sensory neurons in Figure 4 and in the authors' previous work, stimulation of TrpV1 sensory neurons has surprisingly subtle effects as the mice run through the alternating T maze. The authors indicate that the mice are moving quickly and thus that precise targeting is required, but no evidence is shared about the precision of targeting in this context beyond images of four trials. From the characterization in Figure 2, at max speed (reported at 241 +/- 53 mm/s, which is faster than the high speeds in Figure 2), successful targeting occurs less than 50% of the time. Is the initial characterization consistent with the accuracy in this context? To what extent does inaccuracy in targeting contribute to the subtlety of affecting trajectory coherence and speed? Is there a relationship between animal speed and disruption of the trajectory?

    4. Reviewer #3 (Public review):

      Summary:

      To explore the diverse nature of somatosensation, Parkes et al. established and characterized a system for precise cutaneous stimulation of mice as they walk and run in naturalistic settings. This paper provides a framework for real-time body part tracking and targeted optical stimuli with high precision, ensuring reliable and consistent cutaneous stimulation. It can be adapted in somatosensation labs as a general technique to explore somatosensory stimulation and its impact on behavior, enabling rigorous investigation of behaviors that were previously difficult or impossible to study.

      Strengths:

      The authors characterized the closed-loop system to ensure that it is optically precise and can precisely target moving mice. The integration of accurate and consistent optogenetic stimulation of the cutaneous afferents allows systematic investigation of somatosensory subtypes during a variety of naturalistic behaviors. Although this study focused on nociceptors innervating the skin (Trpv1::ChR2 animals), this setup can be extended to other cutaneous sensory neuron subtypes, such as low-threshold mechanoreceptors and pruriceptors. This system can also be adapted for studying more complex behaviors, such as the maze assay and goal-directed movements.

      Weaknesses:

      Although the paper has strengths, its weakness is that some behavioral outputs could be analyzed in more detail to reveal different types of responses to painful cutaneous stimuli. For example, paw withdrawals were detected after optogenetic stimulation of the paw (Figures 3E and 3F). In standard pain assays, animals exhibit different types of responses to painful stimuli on the hind paw, such as paw lifting, biting, and flicking, each indicating a different level of pain. Improving the behavioral readouts from body part tracking would greatly strengthen this system by providing deeper insight into the role of somatosensation in naturalistic behaviors. Additionally, if the laser spot could be made smaller than its current area of 2 mm², it would allow activation of fewer cutaneous afferents, or even a single one, across different skin types in the paw, such as glabrous or hairy skin.

  2. Mar 2025
    1. eLife Assessment

      This important study quantifies how changes in fluorescence that are independent of neural activity might affect two-photon recordings made with diverse sensors. The researchers found widespread neural-activity-independent artifacts in two-photon imaging and provide convincing evidence that these artifacts are most likely caused by hemodynamic occlusion. Their findings underscore the importance of accounting for these artifacts when interpreting functional two-photon recordings.

    2. Reviewer #1 (Public review):

      Summary:

      Fluorescence imaging has become an increasingly popular technique for monitoring neuronal activity and neurotransmitter concentrations in the living brain. However, factors such as brain motion and changes in blood flow and oxygenation can introduce significant artifacts, particularly when activity-dependent signals are small. Yogesh et al. quantified these effects using GFP, an activity-independent marker, under two-photon and wide-field imaging conditions in awake behaving mice. They report significant GFP responses across various brain regions, layers, and behavioral contexts, with magnitudes comparable to those of commonly used activity sensors. These data highlight the need for robust control strategies and careful interpretation of fluorescence functional imaging data.

      Strengths:

      The effect of hemodynamic occlusion in two-photon imaging has been previously demonstrated in sparsely labeled neurons in V1 of anesthetized animals (see Shen and Kara et al., Nature Methods, 2012). The present study builds on these findings by imaging a substantially larger population of neurons in awake, behaving mice across multiple cortical regions, layers, and stimulus conditions. The experiments are extensive, the statistical analyses are rigorous, and the results convincingly demonstrate significant GFP responses that must be accounted for in functional imaging experiments.

      In the revised version, the authors have provided methodological details that were lacking in the previous version and expanded the discussion of alternative explanations for these GFP responses as well as potential mitigation strategies. They also added, among other information, a quantification of brain motion (Fig. S5) and the fraction of responsive neurons in the same experiment performed with GCaMP6f (Fig. 3D-3F).

      Weaknesses:

      (1) The authors have now included a detailed methodology for blood vessel area quantification, in which they detect blood vessels as dark holes in GFP images and measure vessel area by counting pixels below a given intensity threshold (lines 437-443). However, this approach has a critical caveat: any unspecific decrease in image fluorescence will increase the number of pixels below the threshold, leading to an apparent increase in blood vessel area even when the actual vessel size remains unchanged. As a result, this method inherently introduces a positive correlation between fluorescence decrease and vessel dilation, regardless of whether such a relationship truly exists.

      To address this issue, I recommend labelling blood vessels with an independent marker, such as a red fluorescent dye injected into the bloodstream. This approach would allow vessel dilation to be assessed independently of GFP fluorescence: dilation would cause opposite fluorescence changes in the green and red channels (i.e., a decrease in green due to hemodynamic occlusion and an increase in red due to the expanding vessel area). In my opinion, only when such anti-correlation is observed can one reliably infer a relationship between GFP signal changes and blood vessel dynamics.

      Because this relationship is central to the authors' conclusion regarding the nature of the observed GFP signals, including this experiment would greatly strengthen the paper's conclusion.
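      The threshold caveat in point (1) can be illustrated with a toy example: a uniform 5% dimming, with no change in the vessel itself, increases the number of sub-threshold pixels. The image, threshold, and dimming factor below are hypothetical and are not taken from the study; this is not the authors' segmentation code, only a sketch of the pixel-counting logic described in the review.

```python
def count_below_threshold(image, threshold):
    """Count pixels darker than the threshold (a 'vessel area' proxy)."""
    return sum(1 for row in image for px in row if px < threshold)


def dim_uniformly(image, factor):
    """Apply a uniform, non-specific change in brightness to every pixel."""
    return [[px * factor for px in row] for row in image]


# Hypothetical 10 x 10 frame: column 0 is a dark 'vessel' (intensity 20);
# columns 1-9 are GFP-labelled tissue with intensities 46, 52, ..., 94.
image = [[20 if col == 0 else 46 + 6 * (col - 1) for col in range(10)]
         for _ in range(10)]
THRESHOLD = 45

baseline_area = count_below_threshold(image, THRESHOLD)
# 5% dimming with no change in the vessel itself.
dimmed_area = count_below_threshold(dim_uniformly(image, 0.95), THRESHOLD)
print(baseline_area, dimmed_area)  # apparent vessel area doubles: 10 20
```

      The dimmest tissue column crosses the threshold, so the "vessel" appears to dilate although only overall brightness changed; an independent vessel label, as the reviewer suggests, avoids this coupling.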

      (2) Regarding mitigation strategy, the authors advocate repeating key functional imaging experiments using GFP, and state that their aim here is to provide a control for their 2012 study (Keller et al., Neuron). Given this goal, I find it important to discuss how these new findings impact the interpretation of their 2012 results, particularly given the large GFP responses observed.

      For example, Keller et al. (2012) concluded that visuomotor mismatch strongly drives V1 activity (Fig. 3A in that study). However, in the present study, mismatch fails to produce any hemodynamic/GFP response (Fig. 3A, 3B, rightmost bar), and the corresponding calcium response is also the weakest among the three tested conditions (Fig. 3D). How do these findings affect their 2012 conclusions?

      Similarly, the present study shows that GFP reveals twice as many responsive neurons as GCaMP during locomotion (Fig. 3A vs. Fig. 3D, "running"). Does this mean that their 2012 conclusions regarding locomotion-induced calcium activity need reconsideration? Given that more neurons responded with GFP than with GCaMP, the authors should clarify whether they still consider GCaMP a reliable tool for measuring brain activity during locomotion.

      (3) More generally, the authors should discuss how functional imaging data should be interpreted going forward, given the large GFP responses reported here. Even when key experiments are repeated using GFP, it is not entirely clear how one could reliably estimate underlying neuronal activity from the observed GFP and GCaMP responses.

      For example, consider the results in Fig. 3A vs. 3D: how should one assess the relative strength of neuronal activity elicited by running, grating, or visuomotor mismatch? Does mismatch produce the strongest neuronal activity, since it is least affected by the hemodynamic/GFP confounds (Fig. 3A)? Or does mismatch actually produce the weakest neuronal activity, given that both its hemodynamic and calcium responses are the smallest?

      In my opinion, such uncertainty makes it difficult to robustly interpret functional imaging results. Simply repeating experiments with GFP does not fully resolve this issue, as it does not provide a clear framework for quantifying the underlying neuronal activity. Does this suggest a need for a better mitigation strategy? What could these strategies be?

      In my opinion, addressing these questions is critical not only for the authors' own work but also for the broader field to ensure a robust and reliable interpretation of functional imaging data.

      (4) The authors now discuss various alternative sources of the observed GFP signals. However, I feel that they often appear to dismiss these possibilities too quickly, rather than appreciating their true potential impacts (see below).

      For example, the authors argue that brain movement cannot explain their data, as movement should only result in a decrease in observed fluorescence. However, while this might hold for x-y motion, movement in the axial (z) direction can easily lead to both fluorescence increase and decrease. Neurons are not always precisely located at the focal plane -- some are slightly above or below. Axial movement in a given direction will bring some cells into focus while moving others out of focus, leading to fluorescence changes in both directions, exactly as observed in the data (see Fig. S2).

      Furthermore, the authors state that they discard data with 'visible' z-motion. However, subtle axial movements that escape visual detection could still cause fluorescence fluctuations on the order of a few percent, comparable to the reported signal amplitudes.

      Finally, the authors state that "brain movement kinematics are different in shape than the GFP responses we observe". However, this appears to contradict what they show in Fig. 2A. Specifically, the first example neuron exhibits fast GFP transients locked to running onset, with rapid kinematics closely matching the movement speed signals in Fig. S5A. These fast transients are incompatible with slower blood vessel area signals (Fig. 4), suggesting that alternative sources could contribute significantly.

      In sum, the possibility that alternative signal sources could significantly contribute should be taken seriously and more thoroughly discussed.
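      The argument in point (4) that axial motion can change fluorescence in both directions can be illustrated with a toy model. The sketch below assumes a Gaussian axial intensity profile with a hypothetical width (sigma = 2 µm) and hypothetical cell positions; nothing here is derived from the study's data. A small shift of the focal plane brightens the cell it moves toward and dims the others, so a single movement produces both positive and negative changes of a few percent or more.

```python
import math


def axial_brightness(cell_z_um, focal_z_um, sigma_um=2.0):
    """Relative brightness of a cell under an assumed Gaussian axial profile."""
    return math.exp(-((cell_z_um - focal_z_um) ** 2) / (2 * sigma_um ** 2))


def percent_change(cell_z_um, shift_um, sigma_um=2.0):
    """Fluorescence change (%) when the focal plane moves by shift_um."""
    before = axial_brightness(cell_z_um, 0.0, sigma_um)
    after = axial_brightness(cell_z_um, shift_um, sigma_um)
    return 100.0 * (after - before) / before


# Hypothetical cells 1 um above, at, and 1 um below the nominal focal plane,
# and a hypothetical 0.5 um axial shift of the focal plane toward +z.
for z in (1.0, 0.0, -1.0):
    print(f"cell at {z:+.1f} um: {percent_change(z, 0.5):+.1f}%")
```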

      (5) The authors added a quantification of brain movement (Fig. S5) and claim that they "only find detectable brain motion during locomotion onsets and not the other stimuli." However, Fig. S5 presents brain 'velocity' rather than 'displacement'. A constant (non-zero) velocity in Fig. S5 B-D indicates that the brain continues to move over time, potentially leading to significant displacement from its initial position across all conditions. While displacement in the x-y plane is corrected, similar displacement in the z direction likely occurs concurrently and cannot easily be accounted for. To assess this possibility, the authors should present absolute displacement relative to pre-stimulus frames, as displacement -- not velocity -- determines the size of movement-related fluorescence changes.
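      The distinction between velocity and displacement can be made concrete by integrating a velocity trace over time. The sketch below uses a hypothetical constant drift of 1 µm/s sampled at 30 Hz (not values from Fig. S5): a velocity that looks small and flat still accumulates 10 µm of displacement over 10 s. If the plotted quantity is speed (a magnitude), the cumulative sum is a path length, which is an upper bound on net displacement; per-axis velocity components would be needed for the net value.

```python
from itertools import accumulate


def cumulative_displacement(velocity_um_per_s, frame_interval_s):
    """Cumulative displacement (um) from the start of the trace.

    Sums velocity * dt frame by frame (rectangle rule). With a signed,
    single-axis velocity this is net displacement; with unsigned speed it
    is path length, an upper bound on net displacement.
    """
    steps = (v * frame_interval_s for v in velocity_um_per_s)
    return list(accumulate(steps))


# Hypothetical trace: constant 1 um/s drift sampled at 30 Hz for 10 s.
FRAME_RATE_HZ = 30
trace = [1.0] * (10 * FRAME_RATE_HZ)
displacement = cumulative_displacement(trace, 1 / FRAME_RATE_HZ)
print(f"displacement after 10 s: {displacement[-1]:.1f} um")
```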

      (6) In lines 132-133, the authors draw an analogy between the effect of hemodynamic occlusion and liquid crystal display (LCD) function. However, there are fundamental differences between the two. LCDs modulate light transmission by rotating the polarization of light, which then passes through a crossed polarizer. In contrast, hemodynamic occlusion alters light transmission by changing the number and absorbance properties of hemoglobin. Additionally, LCDs do not involve 'emission' light: back-illumination travels through the liquid crystal layer only once, whereas hemodynamic occlusion affects both incoming excitation light and the emitted fluorescence. Given these fundamental differences, the LCD analogy may not be entirely appropriate.

    3. Reviewer #2 (Public review):

      - Approach

      In this study, Yogesh et al. aimed to characterize hemodynamic occlusion in two-photon imaging, where its effects on signal fluctuations are underappreciated compared with widefield imaging and fiber photometry. The authors used activity-independent GFP fluorescence, GCaMP, and GRAB sensors for various neuromodulators in two-photon and widefield imaging in a visuomotor context to evaluate the extent of hemodynamic occlusion in V1 and ACC. They found that GFP responses were comparable in amplitude to smaller GCaMP responses, while exhibiting context-, cortical region-, and depth-specific effects. After quantifying blood vessel diameter changes and the surrounding GFP responses, they argued that GFP responses were highly correlated with changes in local blood vessel size. Furthermore, when imaging with GRAB sensors for different neuromodulators, they found that sensors with lower dynamic range, such as GRAB-DA1m, GRAB-5HT1.0, and GRAB-NE1m, exhibited responses most likely masked by hemodynamic occlusion, while a sensor with larger SNR, GRAB-ACh3.0, showed responses much more distinguishable from blood vessel changes. They thoroughly investigate other factors that could contribute to these signals and demonstrate that hemodynamic occlusion is the primary cause.

      - Impact of revision

      This is an important update to the initial submission, adding substantial supplemental imaging and population data that provide greater detail to the analyses and increase confidence in the authors' conclusions.

      Specifically, inclusion of the supplemental figures 1 and 2 showing GFP expression across multiple regions and the fluorescence changes of thousands of individual neurons provides a clearer picture of how these effects are distributed across the population. Characterization of brain motion across stimulation conditions in supplemental figure 5 provides strong evidence that the fluorescence changes observed in many of the conditions are unlikely to be primarily due to brain motion associated imaging artifacts. The role of vascular area on fluorescence is further supported by addition of new analyses on vasoconstriction leading to increased fluorescence in Figures 4C1-4, complementing the prior analyses of vasodilation.

      The expanded discussion of other factors that could lead to these changes is thorough and welcome. The arguments against pH contributing to GFP fluorescence changes, based on GFP's insensitivity to pH changes in the expected range, are reasonable, as are the arguments regarding the other potential factors discussed.

      With respect to the authors' responses to prior critique, we agree that activity-dependent hemodynamic occlusion is best investigated under awake conditions. Measuring these dynamics under anesthesia could lead to an underestimation of their effects. Isoflurane anesthesia causes significant vasodilation and a large reduction in fluorescence intensity in non-functional mutant GRABs, which could saturate or occlude activity-dependent effects.

      - Strengths

      This work is of broad interest to two-photon imaging users and to GRAB developers and users. It thoroughly quantifies the hemodynamically driven GFP response, compares it to previously published GCaMP data in a similar context, and illustrates the contribution of hemodynamic occlusion to GFP and GRAB responses by characterizing local blood vessel diameter and fluorescence changes. These findings provide important considerations for the imaging community and a sobering look at the utility of these sensors for cortical imaging.

      Importantly, they draw clear distinctions between the temporal dynamics and amplitude of hemodynamic artifacts across cortical regions and layers. Moreover, they show context-dependent effects (darkness versus visual stimulation) on locomotion- and optogenetic light-triggered hemodynamic signals.

      The authors suggest that the signal-to-noise ratio of an indicator likely affects the ability to separate the hemodynamic response from the underlying fluorescence signal. With a new analysis (Supplemental Figure 4), they show that the relative degree of background fluorescence does not affect the size of the artifact.

      Most of the first-generation neuromodulator GRAB sensors showed relatively small responses, comparable to blood vessel changes in two-photon imaging. This emphasizes the need to improve the dynamic range and response magnitude of future sensors and encourages sensor users to consider removing hemodynamic artifacts when analyzing GRAB imaging data.

      - Weaknesses

      The largest weakness of the paper remains that, while the authors convincingly quantify hemodynamic artifacts across a range of conditions, they provide limited means of correcting for them. However, they now discuss the relative utility of some hemodynamic correction methods (e.g. from Ocana-Santero et al., 2024).

      The paper attributes the source of 'hemodynamic occlusion' primarily to blood vessel dilation, but leaves unanswered how much may be due to shifts in blood oxygenation. Figure 4 directly addresses the question of how much of the signal can be attributed to occlusion by measuring the blood vessel dilation, and has been improved by now showing positive fluorescence effects with vasoconstriction. They now also discuss the potential impact of oxygenation.

      Along these lines, the authors carefully quantified the correlation between local blood vessel diameter and GFP response (or between neuropil and blood vessel fluorescence with GRAB sensors). We are left to wonder to what extent this effect depends on proximity to the vessels. Do GFP/GRAB responses decorrelate from blood vessel activity in neurons farther from vessels (see Figure 5A and B in Neyhart et al., Cell Reports 2024)? The authors argue that the primary impact of occlusion comes from blood vessels above the imaging plane, but without a vascular reconstruction, their evidence for this is anecdotal.

      The choice of ACC as the frontal region provides a substantial contrast in location, brain movement, and vascular architecture as compared to V1. As the authors note, ACC is close to the superior sagittal sinus and thus is the region where the largest vascular effects are likely to occur. A less medial portion of M2 may have been a more appropriate comparison. The authors now include example imaging fields for ACC and interesting out-of-plane vascular examples in the supplementary figures that help assess these impacts.

      -Overall Assessment

      This paper is an important contribution to our understanding of how hemodynamic artifacts may corrupt GRAB and calcium imaging, even in two-photon imaging modes. While it would be wonderful if the authors were able to demonstrate a reliable way to correct for hemodynamic occlusion that did not rely on repeating the experiments with a non-functional sensor or fluorescent protein, the careful measurement and reporting of the effects here is, by itself, a substantial contribution to the field of neural activity imaging. Its results are important to anyone conducting two-photon or widefield imaging with calcium and GRAB sensors and deserve the attention of the broader neuroscience and in vivo imaging community.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors aimed to investigate whether hemodynamic occlusion contributes to fluorescent signals measured with two-photon microscopy. For this, they image the activity-independent fluorophore GFP in 2 different cortical areas, at different cortical depths and in different behavioral conditions. They compare the evoked fluorescent signals with those obtained with calcium sensors and neuromodulator sensors and evaluate their relationship to vessel diameter as a readout of blood flow.

      They find that GFP fluorescence transients are comparable to GCaMP6f stimuli-evoked signals in amplitude, although they are generally smaller. Yet, they are significant even at the single neuronal level. They show that GFP fluorescence transients resemble those measured with the dopamine sensor GRAB-DA1m and the serotonin sensor GRAB-5HT1.0 in amplitude and nature, suggesting that signals with these sensors are dominated by hemodynamic occlusion. Moreover, the authors perform similar experiments with wide-field microscopy, which reveal the similarity between the two methods in generating the hemodynamic signals. Together, the evidence presented calls for the development and use of high dynamic range sensors to avoid measuring signals that have an origin other than the one intended. In the meantime, the evidence highlights the need to control for those artifacts, such as with the parallel use of activity-independent fluorophores.

      Strengths:

      - Comprehensive study comparing different cortical regions in diverse behavioral settings in controlled conditions.

      - Comparison to the state-of-the-art, i.e. what has been demonstrated with wide-field microscopy.

      - Comparison to diverse activity-dependent sensors, including the widely used GCaMP.

      Comments on revisions:

      The authors have addressed my concerns well. I have no further comments.

    5. Author response:

      The following is the authors’ response to the current reviews.

      We thank you for the time you took to review our work and for your feedback! We have made only minor changes in this submission and primarily wanted to respond to the concerns raised by reviewer 1.

      Reviewer #1 (Public review): 

      Summary: 

      Fluorescence imaging has become an increasingly popular technique for monitoring neuronal activity and neurotransmitter concentrations in the living brain. However, factors such as brain motion and changes in blood flow and oxygenation can introduce significant artifacts, particularly when activity-dependent signals are small. Yogesh et al. quantified these effects using GFP, an activity-independent marker, under two-photon and wide-field imaging conditions in awake behaving mice. They report significant GFP responses across various brain regions, layers, and behavioral contexts, with magnitudes comparable to those of commonly used activity sensors. These data highlight the need for robust control strategies and careful interpretation of fluorescence functional imaging data. 

      Strengths: 

      The effect of hemodynamic occlusion in two-photon imaging has been previously demonstrated in sparsely labeled neurons in V1 of anesthetized animals (see Shen and Kara et al., Nature Methods, 2012). The present study builds on these findings by imaging a substantially larger population of neurons in awake, behaving mice across multiple cortical regions, layers, and stimulus conditions. The experiments are extensive, the statistical analyses are rigorous, and the results convincingly demonstrate significant GFP responses that must be accounted for in functional imaging experiments. 

      In the revised version, the authors have provided further methodological details that were lacking in the previous version and expanded discussions regarding alternative explanations of these GFP responses as well as potential mitigation strategies. They also added a quantification of brain motion (Fig. S5) and the fraction of responsive neurons when conducting the same experiment using GCaMP6f (Fig. 3D-3F), among other additional information. 

      Weaknesses: 

      (1) The authors have now included a detailed methodology for blood vessel area quantification, where they detect blood vessels as dark holes in GFP images and measure vessel area by counting pixels below a given intensity threshold (lines 437-443). However, this approach has a critical caveat: any unspecific decrease in image fluorescence will increase the number of pixels below the threshold, leading to an apparent increase in blood vessel area, even when the actual vessel size remains unchanged. As a result, this method inherently introduces a positive correlation between fluorescence decrease and vessel dilation, regardless of whether such a relationship truly exists. 
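      The caveat can be made concrete with a toy example. The sketch below assumes, hypothetically, that vessel area is estimated by counting pixels below a fixed intensity threshold, as described above; the pixel values, the threshold of 50, and the 10% dimming factor are illustrative choices, not values from the study. A uniform dimming of the whole frame, with the vessel left untouched, still increases the apparent vessel area.

```python
def vessel_area_px(pixels, threshold):
    """Count pixels darker than threshold (hypothetical vessel-area estimate)."""
    return sum(1 for p in pixels if p < threshold)

# Illustrative frame: three dark vessel pixels and six tissue pixels,
# some of which sit just above the threshold.
vessel = [10.0, 15.0, 20.0]
tissue = [52.0, 55.0, 60.0, 80.0, 100.0, 120.0]
frame = vessel + tissue
THRESHOLD = 50.0

# Dim every pixel by 10%, e.g. from a non-specific drop in fluorescence.
# The vessel itself does not change size.
dimmed = [0.9 * p for p in frame]

baseline_area = vessel_area_px(frame, THRESHOLD)
dimmed_area = vessel_area_px(dimmed, THRESHOLD)
print(baseline_area, dimmed_area)  # 3 5
```

      Two tissue pixels near the threshold are now counted as vessel, so the estimated area grows from 3 to 5 pixels although no dilation occurred.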

      To address this issue, I recommend labelling blood vessels with an independent marker, such as a red fluorescent dye injected into the bloodstream. This approach would allow vessel dilation to be assessed independently of GFP fluorescence -- dilation would cause opposite fluorescence changes in the green and red channels (i.e., a decrease in green due to hemodynamic occlusion and an increase in red due to the expanding vessel area). In my opinion, only when such anti-correlation is observed can one reliably infer a relationship between GFP signal changes and blood vessel dynamics. 

      Because this relationship is central to the author's conclusion regarding the nature of the observed GFP signals, including this experiment would greatly strengthen the paper's conclusion. 
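      The proposed two-channel check could be sketched as follows, assuming per-frame traces of GFP fluorescence near the vessel (green) and of an intravascular red dye within the vessel outline (red). The traces are synthetic and the helper name is hypothetical. A dilation should lower the green trace and raise the red one, giving a negative Pearson correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Synthetic dilation event: the vessel grows, then relaxes.
dilation = [0.0, 0.02, 0.04, 0.02, 0.0]
green = [1.0 - d for d in dilation]  # GFP occluded by the larger vessel
red = [1.0 + d for d in dilation]    # more labeled plasma in the vessel ROI

r = pearson_r(green, red)
print(round(r, 6))  # -1.0
```

      In a real recording the red trace would be measured independently of the green channel, so a negative correlation would not be built in the way it is for a threshold on the GFP image itself.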

      This is correct – a more convincing demonstration that blood vessels dilate or constrict anticorrelated with apparent GFP fluorescence would be a separate blood vessel marker. However, we don’t think this experiment is worth doing, as it is also not conclusive in the sense the reviewer may have in mind. The anticorrelation does not mean that occlusion drives all of the observed effect. Our main argument is instead that there is no other potential source than hemodynamic occlusion with sufficient strength that we can think of. The experiment one would want to do is block hemodynamic changes and demonstrate that the occlusion explains all of the observed changes. 

      (2) Regarding mitigation strategy, the authors advocate repeating key functional imaging experiments using GFP, and state that their aim here is to provide a control for their 2012 study (Keller et al., Neuron). Given this goal, I find it important to discuss how these new findings impact the interpretation of their 2012 results, particularly given the large GFP responses observed. 

      We are happy to discuss how the conclusions of our own work are influenced by this (see more details below), but the important response of the field should probably be to revisit the conclusions of a variety of papers published in the last two decades. This goes far beyond what we can do here. 

      For example, Keller et al. (2012) concluded that visuomotor mismatch strongly drives V1 activity (Fig. 3A in that study). However, in the present study, mismatch fails to produce any hemodynamic/GFP response (Fig. 3A, 3B, rightmost bar), and the corresponding calcium response is also the weakest among the three tested conditions (Fig. 3D). How do these findings affect their 2012 conclusions? 

      The average calcium response of L2/3 neurons to visuomotor mismatch is probably roughly similar to the average calcium response at locomotion onset (both are on the order of 1% to 5%, depending on indicator, dataset, etc.). In the Keller et al. (2012) paper, locomotion onset was about 1.5% and mismatch about 3% (see Figure 3A in that paper). What we quantify in Figure 3 of the paper here is the fraction of responsive neurons. Thus, mismatch drives strong responses in a small subset of neurons (approx. 10%), while locomotion drives a combination of weak responses in a large fraction of the neurons (roughly 70%) and large responses in a subset of neurons. A strong signal in a subset of neurons is what one would expect from a neuronal response; a weak signal from many neurons would be indicative of a contaminating signal. This all appears consistent. 
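      The distinction between average response and fraction of responsive neurons can be illustrated with hypothetical numbers loosely following the description above (roughly 10% of neurons responding strongly to mismatch, versus roughly 70% responding weakly to locomotion plus a strongly responding subset). The amplitudes and the 0.01 detection threshold are illustrative assumptions, not data from either paper.

```python
def summarize(responses, detection_threshold):
    """Return (mean response, fraction of neurons above detection_threshold)."""
    mean = sum(responses) / len(responses)
    frac = sum(1 for r in responses if r > detection_threshold) / len(responses)
    return mean, frac

# 100 hypothetical neurons, responses as dF/F.
mismatch = [0.30] * 10 + [0.0] * 90                 # strong, sparse
locomotion = [0.02] * 70 + [0.30] * 5 + [0.0] * 25  # weak, widespread, plus a strong subset

mm_mean, mm_frac = summarize(mismatch, 0.01)
loco_mean, loco_frac = summarize(locomotion, 0.01)
print(f"mismatch:   mean={mm_mean:.3f}, responsive={mm_frac:.2f}")
print(f"locomotion: mean={loco_mean:.3f}, responsive={loco_frac:.2f}")
```

      Both hypothetical populations have a mean response near 3% dF/F, yet the fraction of neurons above threshold differs several-fold.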

      Regarding influencing the conclusions of earlier work, the movement related signals described in the Keller et al. (2012) paper are probably overestimated, but are also apparent in electrophysiological recordings (Saleem et al., 2013). Thus, the locomotion responses reported in the Keller et al. (2012) paper are likely too high, but locomotion related responses in V1 are very likely real. The only conclusion we draw in the Keller et al. 2012 paper on the strength of the locomotion related responses is that they are smaller than mismatch responses (this conclusion is unaffected by hemodynamic contamination). In addition, the primary findings of the Keller et al. (2012) paper are all related to mismatch, and these conclusions are unaffected. 

      Similarly, the present study shows that GFP reveals twice as many responsive neurons as GCaMP during locomotion (Fig. 3A vs. Fig. 3D, "running"). Does this mean that their 2012 conclusions regarding locomotion-induced calcium activity need reconsideration? Given that more neurons responded with GFP than with GCaMP, the authors should clarify whether they still consider GCaMP a reliable tool for measuring brain activity during locomotion. 

      Comparisons of the fraction of significantly responsive neurons between GFP and GCaMP are not straightforward to interpret. One needs to factor in the difference in signal to noise between the two sensors. (Please note, we added the GCaMP responses here upon request of the reviewers). Note, there is nothing inherently wrong with the data, and comparisons within a dataset are easily made (e.g. more grating responsive neurons than running responsive neurons in GCaMP, and vice versa with GFP). The comparison across datasets is not as straightforward, because we define “responsive neurons” using a statistical test that compares response to baseline activity for each neuron. GFP labelled neurons are very bright and occlusion can easily be detected. Baseline fluorescence in GCaMP recordings is much lower and often close to or below the noise floor of the data (i.e. we only see the cells when they are active). Thus occlusion in GCaMP recordings is preferentially visible for cells that have high baseline fluorescence. Thus, in the GCaMP data we are likely underestimating the fraction of responsive neurons. 
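      The signal-to-noise argument can be sketched with simple arithmetic. Assuming, hypothetically, additive frame noise of fixed standard deviation (a noise floor) and a test that compares the mean of a response window with the mean of a baseline window, the same fractional occlusion is far easier to detect in a bright cell than in a dim one. The baseline values, noise level, and window length are illustrative only.

```python
import math

def occlusion_z(baseline_f, occlusion_fraction, noise_sd, n_frames):
    """z-score of an occlusion-induced drop between two n-frame windows.

    Assumes independent additive noise with standard deviation noise_sd per
    frame, so the difference of the two window means has standard error
    noise_sd * sqrt(2 / n_frames).
    """
    drop = baseline_f * occlusion_fraction
    standard_error = noise_sd * math.sqrt(2.0 / n_frames)
    return drop / standard_error

# Same 2% occlusion and same noise, very different baseline brightness.
z_bright = occlusion_z(baseline_f=1000.0, occlusion_fraction=0.02, noise_sd=20.0, n_frames=10)
z_dim = occlusion_z(baseline_f=100.0, occlusion_fraction=0.02, noise_sd=20.0, n_frames=10)
print(round(z_bright, 2), round(z_dim, 2))
```

      Under these assumptions the bright cell clears a conventional 1.96 cutoff while the dim cell does not, even though both are occluded by the same fraction.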

      Regarding whether GCaMP (or any other fluorescence indicator used in vivo) is a reliable tool, we are not sure we understand. Whenever possible, fluorescence-sensor based measurements should be corrected for hemodynamic contamination – to quantify locomotion related signals this will be more difficult than e.g. for mismatch, but that does not mean it is not reliable. 

      (3) More generally, the authors should discuss how functional imaging data should be interpreted going forward, given the large GFP responses reported here. Even when key experiments are repeated using GFP, it is not entirely clear how one could reliably estimate underlying neuronal activity from the observed GFP and GCaMP responses. 

      We are not sure we have a good answer to this question. The strategy for addressing this problem will depend on the specifics of the experiment, and the claims. Take the case of mismatch. Here we have strong calcium responses and no evidence of GFP responses. We would argue that this is reasonable evidence that the majority of the mismatch driven GCaMP signal is likely neuronal. For locomotion onsets, both GFP and GCaMP signals go in the same direction on average. Then one could use a comparison of response amplitude distributions to conservatively exclude all neurons with a GCaMP amplitude lower than, e.g., the 99th percentile of the GFP response, and so on. But we don’t think there is an easy generalizable fix for this problem.  
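      The conservative exclusion rule mentioned above could look like this, assuming per-neuron response amplitudes are available for both a GFP control dataset and a GCaMP dataset recorded under the same conditions. The function names and example amplitudes are hypothetical; `statistics.quantiles` is from the Python standard library (3.8+).

```python
import statistics

def gfp_threshold(gfp_amplitudes, percentile=99):
    """Return the given percentile (1-99) of the GFP control response amplitudes."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    cut_points = statistics.quantiles(gfp_amplitudes, n=100, method="inclusive")
    return cut_points[percentile - 1]

def exclude_below_threshold(gcamp_amplitudes, threshold):
    """Keep only GCaMP neurons (id -> amplitude) whose response exceeds threshold."""
    return {nid: amp for nid, amp in gcamp_amplitudes.items() if amp > threshold}

gfp = [i / 1000 for i in range(101)]  # illustrative GFP amplitudes, 0.000 to 0.100
gcamp = {"n1": 0.50, "n2": 0.05, "n3": 0.12}

threshold = gfp_threshold(gfp)
kept = exclude_below_threshold(gcamp, threshold)
print(sorted(kept), round(threshold, 3))  # ['n1', 'n3'] 0.099
```

      A rule like this trades sensitivity for specificity: it removes neurons whose GCaMP response lies within the range seen with GFP, at the cost of also discarding neurons with weak but genuine responses.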

      For example, consider the results in Fig. 3A vs. 3D: how should one assess the relative strength of neuronal activity elicited by running, grating, or visuomotor mismatch? Does mismatch produce the strongest neuronal activity, since it is least affected by the hemodynamic/GFP confounds (Fig. 3A)? Or does mismatch actually produce the weakest neuronal activity, given that both its hemodynamic and calcium responses are the smallest? 

      See above; the reviewer may be conflating “response strength” with “fraction of responsive neurons” here. Regarding the relationship between neuronal activity and hemodynamics, it is very likely not just the average activity of all neurons, but that of a specific subset, that drives blood vessel constriction and dilation. This would of course be a very interesting question to answer for the interpretation of hemodynamics-based measurements of brain activity, like fMRI, but it goes beyond the aim of the current paper.  

      In my opinion, such uncertainty makes it difficult to robustly interpret functional imaging results. Simply repeating experiments with GFP does not fully resolve this issue, as it does not provide a clear framework for quantifying the underlying neuronal activity. Does this suggest a need for a better mitigation strategy? What could these strategies be? 

      If the reviewer has a good idea - we would be all ears. We don’t have a better idea currently.  

      In my opinion, addressing these questions is critical not only for the authors' own work but also for the broader field to ensure a robust and reliable interpretation of functional imaging data. 

      We agree, having a solution to this problem would be important – we just don’t have one.  

      (4) The authors now discuss various alternative sources of the observed GFP signals. However, I feel that they often appear to dismiss these possibilities too quickly, rather than appreciating their true potential impacts (see below). 

      For example, the authors argue that brain movement cannot explain their data, as movement should only result in a decrease in observed fluorescence. However, while this might hold for x-y motion, movement in the axial (z) direction can easily lead to both fluorescence increase and decrease. Neurons are not always precisely located at the focal plane -- some are slightly above or below. Axial movement in a given direction will bring some cells into focus while moving others out of focus, leading to fluorescence changes in both directions, exactly as observed in the data (see Fig. S2). 

      The reviewer is correct that z-motion can result in an increase of apparent fluorescence (just as x-y motion can). On average, however, just like x-y motion, z-motion will always result in a decrease. This assumes that the user selecting regions of interest (the outlines of cells used to quantify fluorescence) will select these such that the distribution of selected cells is centered on the z-plane of the image. Thus, the distribution of the z-location of the cells relative to the imaging plane will be some Gaussian-like distribution centered on the z-plane of the image (with half the cells above the z-plane and half below). Because the peak of the distribution is located on the z-plane at rest, any z-movement, up or down, will move away from the peak of the distribution (i.e. most cells will decrease in fluorescence). This is the same argument as for why x-y motion always results in decreases (assuming the user selects regions of interest centered on the location of the cells at rest).  
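      The averaging argument can be checked numerically. The sketch below assumes, for illustration, a Gaussian axial point-spread function and a set of ROI-selected cells whose z-offsets are distributed symmetrically around the focal plane; the offsets and the 2 µm width are hypothetical values. Shifting the tissue by 1 µm brings some individual cells closer to focus but lowers the mean over the population.

```python
import math

def relative_brightness(z_offset_um, psf_sigma_um):
    """Fraction of in-focus fluorescence for a cell z_offset_um from the focal plane."""
    return math.exp(-z_offset_um ** 2 / (2 * psf_sigma_um ** 2))

def population_brightness(offsets_um, shift_um, psf_sigma_um):
    """Per-cell brightness after the tissue moves axially by shift_um."""
    return [relative_brightness(z + shift_um, psf_sigma_um) for z in offsets_um]

offsets = [-2.0, -1.0, 0.0, 1.0, 2.0]  # symmetric around the focal plane
sigma = 2.0

at_rest = population_brightness(offsets, 0.0, sigma)
moved = population_brightness(offsets, 1.0, sigma)

mean_rest = sum(at_rest) / len(at_rest)
mean_moved = sum(moved) / len(moved)
brighter_cells = sum(1 for a, b in zip(at_rest, moved) if b > a)
print(round(mean_rest, 3), round(mean_moved, 3), brighter_cells)
```

      In this example two of the five cells get brighter and three get dimmer, and the population mean drops; by symmetry, a shift of the same size in the opposite direction gives the same mean.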

      Furthermore, the authors state that they discard data with 'visible' z-motion. However, subtle axial movements that escape visual detection could still cause fluorescence fluctuations on the order of a few percent, comparable to the reported signal amplitudes. 

      Correct, but as explained above, z-motion will, on average, always result in a decrease in fluorescence.  

      Finally, the authors state that "brain movement kinematics are different in shape than the GFP responses we observe". However, this appears to contradict what they show in Fig. 2A. Specifically, the first example neuron exhibits fast GFP transients locked to running onset, with rapid kinematics closely matching the movement speed signals in Fig. S5A. These fast transients are incompatible with slower blood vessel area signals (Fig. 4), suggesting that alternative sources could contribute significantly. 

      We meant population average responses here. We have clarified this. Some of the signals we observed do indeed look like they could be driven by movement artifacts (whole brain motion, or probably more likely blood vessel dilation driven tissue distortion). We show this neuron to illustrate that this can also happen. However, to illustrate that this is a rare event, we also show the entire distribution of peak amplitudes and where this neuron falls within it.  

      In sum, the possibility that alternative signal sources could significantly contribute should be taken seriously and more thoroughly discussed. 

      All possible sources (we could think of) are explicitly discussed (in roughly equal proportion). Nevertheless, the reviewer is correct that our focus here is almost exclusively on what we think is the primary source of the problem. Given that – in my experience – this is also the one least frequently considered, I think the emphasis on – what we think is – the primary contributor is warranted.  

      (5) The authors added a quantification of brain movement (Fig. S5) and claim that they "only find detectable brain motion during locomotion onsets and not the other stimuli." However, Fig. S5 presents brain 'velocity' rather than 'displacement'. A constant (non-zero) velocity in Fig. S5 B-D indicates that the brain continues to move over time, potentially leading to significant displacement from its initial position across all conditions. While displacement in the x-y plane are corrected, similar displacement in the z direction likely occurs concurrently and cannot be easily accounted for. To assess this possibility, the authors should present absolute displacement relative to pre-stimulus frames, as displacement -- not velocity -- determines the size of movement-related fluorescence changes. 

      We use brain velocity here as a natural measure when using frame times as time bins. The problem with using a signed displacement is that if different running onsets move the brain in opposing directions, this can average out to zero. To counteract this, one can take the absolute displacement in a response window away from the position in a baseline time window. If this is done with time bins that correspond to frame times, this just becomes displacement per frame, i.e. velocity. Using absolute changes in displacement (i.e. velocity) is more sensitive than signed displacement. The responses for signed displacement are shown below (Author response image 1), but given that we are averaging signed quantities here, the average is not interpretable. 
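      The averaging problem with signed displacement can be illustrated with two hypothetical running onsets that move the brain by the same amount in opposite directions (positions in pixels, one value per imaging frame). Their signed displacements cancel in the average, whereas the absolute frame-to-frame displacement, i.e. speed in pixels per frame, does not.

```python
def mean_across_onsets(traces):
    """Average a list of equal-length traces frame by frame."""
    return [sum(values) / len(values) for values in zip(*traces)]

def abs_frame_displacement(positions):
    """Absolute displacement between consecutive frames (pixels per frame)."""
    return [abs(b - a) for a, b in zip(positions, positions[1:])]

onset_a = [0, 0, 1, 2, 3]     # brain moves in +x after onset
onset_b = [0, 0, -1, -2, -3]  # brain moves in -x after onset

signed_avg = mean_across_onsets([onset_a, onset_b])
speed_avg = mean_across_onsets([abs_frame_displacement(onset_a), abs_frame_displacement(onset_b)])
print(signed_avg)  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(speed_avg)   # [0.0, 1.0, 1.0, 1.0]
```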

      Author response image 1.

      Average signed brain displacement. 

      Regarding a constant drift, the reviewer might be misled by the fact that the baseline brain velocity is roughly 1 pixel per frame. The registration algorithm works in integer numbers of pixels only. 1 pixel per frame corresponds roughly to the noise floor of the registration algorithm. Registrations are done independently for each frame. As a consequence, the registration oscillates between a shift of 17 and 18 pixels – frame by frame – if the actual shift is somewhere between 17 and 18 pixels. This “jitter” results in a baseline brain velocity of about 1 pixel per frame. 
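      The registration jitter can be reproduced in a few lines, assuming an integer-pixel registration that rounds each frame's shift estimate independently. The true shift of 17.5 pixels and the small per-frame estimation errors are illustrative. Even with a stationary brain, the rounded shifts alternate between 17 and 18 pixels, which yields an apparent speed of about 1 pixel per frame; a true shift that sits on a whole pixel gives none.

```python
def integer_registration(true_shift_px, estimation_errors_px):
    """Round each frame's (noisy) shift estimate to whole pixels, frame by frame."""
    return [round(true_shift_px + err) for err in estimation_errors_px]

def mean_speed(shifts_px):
    """Mean absolute frame-to-frame change in registered shift (pixels per frame)."""
    steps = [abs(b - a) for a, b in zip(shifts_px, shifts_px[1:])]
    return sum(steps) / len(steps)

errors = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1]              # small, zero-mean errors
stationary_between = integer_registration(17.5, errors)  # true shift between pixels
stationary_on_grid = integer_registration(17.0, errors)  # true shift on a pixel

print(stationary_between, mean_speed(stationary_between))
print(stationary_on_grid, mean_speed(stationary_on_grid))
```

      The alternation here is exact because the errors alternate in sign; with real estimation noise, the baseline value depends on how often the estimate crosses a pixel boundary.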

      (6) In lines 132-133, the authors draw an analogy between the effect of hemodynamic occlusion and liquid crystal display (LCD) function. However, there are fundamental differences between the two. LCDs modulate light transmission by rotating the polarization of light, which then passes through a crossed polarizer. In contrast, hemodynamic occlusion alters light transmission by changing the number and absorbance properties of hemoglobin. Additionally, LCDs do not involve 'emission' light - back-illumination travels through the liquid crystal layer only once, whereas hemodynamic occlusion affects both incoming excitation light and the emitted fluorescence. Given these fundamental differences, the LCD analogy may not be entirely appropriate. 

      The mechanism of occlusion is, as the reviewer correctly points out, different for an LCD. In both cases however, there is a variable occluder between a light source and an observer. The fact that with hemodynamic occlusion the light passes through the occluder twice (excitation and emission) does not appear to hamper the analogy to us. We have rephrased to highlight the time varying occlusion part. 

      Reviewer #2 (Public review):

      -  Approach 

      In this study, Yogesh et al. aimed at characterizing hemodynamic occlusion in two-photon imaging, where its effects on signal fluctuations are underappreciated compared to those in widefield imaging and fiber photometry. The authors used activity-independent GFP fluorescence, GCaMP and GRAB sensors for various neuromodulators in two-photon and widefield imaging during a visuomotor context to evaluate the extent of hemodynamic occlusion in V1 and ACC. They found that the GFP responses were comparable in amplitude to smaller GCaMP responses, though exhibiting context-, cortical region-, and depth-specific effects. After quantifying blood vessel diameter change and surrounding GFP responses, they argued that GFP responses were highly correlated with changes in local blood vessel size. Furthermore, when imaging with GRAB sensors for different neuromodulators, they found that sensors with lower dynamic ranges such as GRAB-DA1m, GRAB-5HT1.0, and GRAB-NE1m exhibited responses most likely masked by the hemodynamic occlusion, while a sensor with larger SNR, GRAB-ACh3.0, showed much more distinguishable responses from blood vessel change. They thoroughly investigate other factors that could contribute to these signals and demonstrate hemodynamic occlusion is the primary cause. 

      -  Impact of revision 

      This is an important update to the initial submission, adding much supplemental imaging and population data that provide greater detail to the analyses and increase the confidence in the authors' conclusions. 

      Specifically, inclusion of the supplemental figures 1 and 2 showing GFP expression across multiple regions and the fluorescence changes of thousands of individual neurons provides a clearer picture of how these effects are distributed across the population. Characterization of brain motion across stimulation conditions in supplemental figure 5 provides strong evidence that the fluorescence changes observed in many of the conditions are unlikely to be primarily due to brain motion associated imaging artifacts. The role of vascular area on fluorescence is further supported by addition of new analyses on vasoconstriction leading to increased fluorescence in Figures 4C1-4, complementing the prior analyses of vasodilation. 

      The expansion of the discussion on other factors that could lead to these changes is thorough and welcome. The arguments against pH playing a role in the fluorescence changes of GFP, based on its insensitivity to changes in the expected pH range, are reasonable, as are the other discussed potential factors. 

      With respect to the authors' responses to prior critique, we agree that activity-dependent hemodynamic occlusion is best investigated under awake conditions. Measurement of these dynamics under anesthesia could lead to an underestimation of their effects. Isoflurane anesthesia causes significant vasodilation and a large reduction in fluorescence intensity in non-functional mutant GRABs. This could saturate or occlude activity-dependent effects. 

      - Strengths 

      This work is of broad interest to two-photon imaging users and to GRAB developers and users. It thoroughly quantifies the hemodynamic driven GFP response and compares it to previously published GCaMP data in a similar context, and illustrates the contribution of hemodynamic occlusion to GFP and GRAB responses by characterizing the local blood vessel diameter and fluorescence change. These findings provide important considerations for the imaging community and a sobering look at the utility of these sensors for cortical imaging. 

      Importantly, they draw clear distinctions between the temporal dynamics and amplitude of hemodynamic artifacts across cortical regions and layers. Moreover, they show context-dependent (dark versus during visual stimuli) effects on locomotion- and optogenetic light-triggered hemodynamic signals. 

      The authors suggest that the signal-to-noise ratio of an indicator likely affects the ability to separate the hemodynamic response from the underlying fluorescence signal. With a new analysis (Supplemental Figure 4), they show that the relative degree of background fluorescence does not affect the size of the artifact. 

      Most of the first-generation neuromodulator GRAB sensors showed relatively small responses, comparable to blood vessel changes in two-photon imaging, which emphasizes a need to improve the dynamic range and response magnitude of future sensors and encourages sensor users to consider removing hemodynamic artifacts when analyzing GRAB imaging data. 

      - Weaknesses 

      The largest weakness of the paper remains that, while they convincingly quantify hemodynamic artifacts across a range of conditions, they provide limited means of correcting for them. However, they now discuss the relative utility of some hemodynamic correction methods (e.g. from Ocana-Santero et al., 2024). 

      The paper attributes the source of 'hemodynamic occlusion' primarily to blood vessel dilation, but leaves unanswered how much may be due to shifts in blood oxygenation. Figure 4 directly addresses the question of how much of the signal can be attributed to occlusion by measuring the blood vessel dilation, and has been improved by now showing positive fluorescence effects with vasoconstriction. They now also discuss the potential impact of oxygenation. 

      Along these lines, the authors carefully quantified the correlation between local blood vessel diameter and GFP response (or neuropil fluorescence vs blood vessel fluorescence with GRAB sensors). We are left to wonder to what extent this effect depends on proximity to the vessels. Do GFP/GRAB responses decorrelate from blood vessel activity in neurons further from vessels (refer to Figure 5A and B in Neyhart et al., Cell Reports 2024)? The authors argue that the primary impact of occlusion is from blood vessels above the plane of imaging, but without a vascular reconstruction, their evidence for this is anecdotal. 

      The choice of ACC as the frontal region provides a substantial contrast in location, brain movement, and vascular architecture as compared to V1. As the authors note, ACC is close to the superior sagittal sinus and thus is the region where the largest vascular effects are likely to occur. A less medial portion of M2 may have been a more appropriate comparison. The authors now include example imaging fields for ACC and interesting out-of-plane vascular examples in the supplementary figures that help assess these impacts. 

      -Overall Assessment 

      This paper is an important contribution to our understanding of how hemodynamic artifacts may corrupt GRAB and calcium imaging, even in two-photon imaging modes. While it would be wonderful if the authors were able to demonstrate a reliable way to correct for hemodynamic occlusion which did not rely on doing the experiments over with a non-functional sensor or fluorescent protein, the careful measurement and reporting of the effects here is, by itself, a substantial contribution to the field of neural activity imaging. Its results are of importance to anyone conducting two-photon or widefield imaging with calcium and GRAB sensors and deserve the attention of the broader neuroscience and in vivo imaging community. 

      We agree with this assessment.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors aimed to investigate if hemodynamic occlusion contributes to fluorescent signals measured with two-photon microscopy. For this, they image the activity-independent fluorophore GFP in 2 different cortical areas, at different cortical depths and in different behavioral conditions. They compare the evoked fluorescent signals with those obtained with calcium sensors and neuromodulator sensors and evaluate their relationship to vessel diameter as a readout of blood flow.

      They find that GFP fluorescence transients are comparable to GCaMP6f stimuli-evoked signals in amplitude, although they are generally smaller. Yet, they are significant even at the single neuronal level. They show that GFP fluorescence transients resemble those measured with the dopamine sensor GRAB-DA1m and the serotonin sensor GRAB-5HT1.0 in amplitude and nature, suggesting that signals with these sensors are dominated by hemodynamic occlusion. Moreover, the authors perform similar experiments with wide-field microscopy, which reveal the similarity between the two methods in generating the hemodynamic signals. Together, the evidence presented calls for the development and use of high dynamic range sensors to avoid measuring signals that have an origin other than the one intended. In the meantime, the evidence highlights the need to control for those artifacts, such as with the parallel use of activity-independent fluorophores.

      Strengths:

      - Comprehensive study comparing different cortical regions in diverse behavioral settings in controlled conditions.

      - Comparison to the state-of-the-art, i.e. what has been demonstrated with wide-field microscopy.

      - Comparison to diverse activity-dependent sensors, including the widely used GCaMP.

      Comments on revisions:

      The authors have addressed my concerns well. I have no further comments.

      We agree with this assessment.  


      The following is the authors’ response to the original reviews.

      The major changes to the manuscript are:

      (1) We re-wrote the discussion, going over all possible sources of the signals we describe.

      (2) We added a quantification of brain motion as Figure S5.

      (3) We added an example of blood vessel contraction as Figure 4C.

      (4) We added data on the fraction of responsive neurons when measured with GCaMP as Figures 3D-3F.

      (5) We added example imaging sites from all imaged regions as Figure S1.

      (6) We added GFP response heatmaps of all neurons as Figure S2.

      (7) We added a quantification of the relationship between GFP response amplitude and expression level as Figure S4.

      A detailed point-by-point response to all reviewer concerns is provided below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Fluorescence imaging has become an increasingly popular technique for monitoring neuronal activity and neurotransmitter concentrations in the living brain. However, factors such as brain motion and changes in blood flow and oxygenation can introduce significant artifacts, particularly when activity-dependent signals are small. Yogesh et al. quantified these effects using GFP, an activity-independent marker, under two-photon and wide-field imaging conditions in awake behaving mice. They report significant GFP responses across various brain regions, layers, and behavioral contexts, with magnitudes comparable to those of commonly used activity sensors. These data highlight the need for robust control strategies and careful interpretation of fluorescence functional imaging data.

      Strengths:

      The effect of hemodynamic occlusion in two-photon imaging has been previously demonstrated in sparsely labeled neurons in V1 of anesthetized animals (see Shen and Kara et al., Nature Methods, 2012). The present study builds on these findings by imaging a substantially larger population of neurons in awake, behaving mice across multiple cortical regions, layers, and stimulus conditions. The experiments are extensive, the statistical analyses are rigorous, and the results convincingly demonstrate significant GFP responses that must be accounted for in functional imaging experiments. However, whether these GFP responses are driven by hemodynamic occlusion remains less clear, given the complexities associated with awake imaging and GFP's properties (see below).

      Weaknesses:

      (1) The authors primarily attribute the observed GFP responses to hemodynamic occlusion. While this explanation is plausible, other factors may also contribute to the observed signals. These include uncompensated brain movement (e.g., axial-direction movements), leakage of visual stimulation light into the microscope, and GFP's sensitivity to changes in intracellular pH (see e.g., Kneen and Verkman, 1998, Biophysical Journal). Although the correlation between GFP signals and blood vessel diameters supports a hemodynamic contribution, it does not rule out significant contributions from these (or other) factors. Consequently, whether GFP fluorescence can reliably quantify hemodynamic occlusion in two-photon microscopy remains uncertain.

      We concur; our data do not conclusively prove that the effect is driven only by hemodynamic occlusion. We have attempted to make this clearer throughout the manuscript. In particular, we have restructured the discussion to focus on this point. Regarding the specific alternatives the reviewer mentions here:

      a) Uncompensated brain motion. While this can certainly contribute, we think its effect on our interpretation is negligible, for the following reasons. First, to point out the obvious: as with all two-photon data we acquire in the lab, we only keep data with no visible axial (z) motion. Second, and more importantly, uncompensated brain motion results in a net decrease in fluorescence. Because regions of interest (ROIs) are selected to be centered on neurons (as opposed to being randomly placed, or placed next to, above, or below them), movement will, on average, result in a decrease in fluorescence as neurons are moved out of the ROIs. In the early days of awake two-photon imaging (when preparations were less stable), we used this decrease in fluorescence at movement onset as a sign that running onsets were selected correctly (i.e., with low variance). See, e.g., the dip in the running-onset trace at time zero in Figure 3A of Keller et al. (2012). Third, we find no evidence of any brain motion during visual stimulation, while the GFP responses during locomotion and visual stimulation are of similar magnitude. We have added a quantification of brain motion (Figure S5) and a discussion of this point to the manuscript.
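      The geometric argument in the second point can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the analysis used in the manuscript: the soma is modeled as a 2D Gaussian on a uniform background, the ROI is a fixed disk centered on it, and the soma size, ROI radius and shift values are arbitrary illustrative choices.

```python
import numpy as np

def roi_mean_after_shift(shift_px, size=64, sigma=4.0, roi_radius=5.0, background=10.0):
    """Mean fluorescence in a fixed ROI after the underlying 'soma' moves laterally.

    The Gaussian soma model and all default parameters are illustrative assumptions.
    """
    y, x = np.mgrid[0:size, 0:size]
    c = size / 2
    # Soma displaced by shift_px along x; uniform background elsewhere.
    image = background + 100.0 * np.exp(-((x - c - shift_px) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
    # The ROI stays where the soma was at rest (centered on it).
    roi = (x - c) ** 2 + (y - c) ** 2 <= roi_radius ** 2
    return image[roi].mean()

baseline = roi_mean_after_shift(0.0)
for shift in (1.0, 2.0, 4.0, -2.0):
    change = roi_mean_after_shift(shift) / baseline - 1.0
    print(f"shift {shift:+.0f} px: dF/F = {change:+.1%}")
```

      Because the ROI is anchored on the brightest part of the cell, a displacement in any lateral direction moves dimmer pixels into it, so in this model an uncompensated motion artefact is always a decrease.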

      b) Leakage of stimulation light. First, all light sources in the experimental room (the projector used for the mouse VR, the optogenetic stimulation light, and the computer monitors used to operate the microscope) are synchronized to the turnaround times of the resonant scanner of the two-photon microscope. Thus, the light sources in the room are turned off during each line scan of the resonant scanner and turned on during the turnaround periods. With a 12 kHz scanner, this results in a light cycle of 24 kHz (see Leinweber et al., 2014 for details). The system is not perfect: we occasionally detect light-leak responses at the image edges (along the resonant axis, as a result of the exponential off-kinetics of many LEDs and lasers). These are typically two orders of magnitude smaller than what one would get without synchronization, far smaller than a single-digit percentage change in GFP responses, and only detectable at the image edges. Second, while dark running onsets in visual cortex differ from running onsets with the VR turned on (Figures 5A and B), they are indistinguishable in ACC (Figure 5C). Thus, we can rule out stimulation-light artefacts.
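      The gating scheme described above can be sketched as follows. This is a simplified model, not the lab's control code: the mirror position is treated as a pure sinusoid, and the gating threshold (the part of the sweep counted as "turnaround") is a hypothetical value. Because the mirror turns around twice per period, gating the light to the turnarounds yields pulses at twice the scanner frequency.

```python
import numpy as np

def light_pulse_rate(f_scanner_hz=12_000.0, gate_threshold=0.9, duration_s=1e-3, dt_s=1e-8):
    """Light pulses per second when sources are gated to resonant-scanner turnarounds.

    Mirror position is modeled as sin(2*pi*f*t). Light is allowed only where
    |position| exceeds gate_threshold, i.e. near both ends of the sweep, and is off
    while lines are acquired. gate_threshold is a hypothetical value.
    """
    t = np.arange(0.0, duration_s, dt_s)
    position = np.sin(2 * np.pi * f_scanner_hz * t)
    light_on = np.abs(position) > gate_threshold
    # Each off-to-on transition marks the start of one light pulse.
    rising_edges = np.count_nonzero(light_on[1:] & ~light_on[:-1])
    return rising_edges / duration_s, light_on.mean()

rate_hz, duty = light_pulse_rate()
print(f"light pulses: {rate_hz / 1e3:.0f} kHz, duty cycle: {duty:.0%}")
```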

      c) GFP’s sensitivity to changes in pH. Activity results in a decrease in neuronal intracellular pH (https://pubmed.ncbi.nlm.nih.gov/14506304/, https://pubmed.ncbi.nlm.nih.gov/24312004/), and decreasing pH decreases GFP fluorescence (https://pubmed.ncbi.nlm.nih.gov/9512054/). Any pH contribution would therefore be expected to appear as a decrease in GFP fluorescence with increased activity.
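      The direction of the pH effect can be made concrete with a simple single-site titration model of GFP brightness. This is a sketch, not a calibrated model: the pKa of 6.0 and the resting and acidified pH values are assumptions chosen for illustration, and real values depend on the GFP variant and the cell type.

```python
def relative_gfp_brightness(ph, pka=6.0):
    """Fraction of GFP in the fluorescent (deprotonated) state, single-site titration.

    pka is an assumed value for illustration only.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

resting_ph, acidified_ph = 7.2, 7.1  # hypothetical intracellular pH values
change = relative_gfp_brightness(acidified_ph) / relative_gfp_brightness(resting_ph) - 1.0
print(f"dF/F for a 0.1 pH-unit acidification: {change:+.2%}")
```

      Whatever the exact pKa, brightness in this model decreases monotonically as pH falls, so activity-related acidification produces a negative ΔF/F.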

      To reiterate, we don’t think hemodynamic occlusion is the only possible source of the effects we observe, but we do think it is most likely the largest.

      (2) Regardless of the underlying mechanisms driving the GFP responses, these activity-independent signals must be accounted for in functional imaging experiments. However, the present manuscript does not explore potential strategies to mitigate these effects. Exploring and demonstrating even partial mitigation strategies could have significant implications for the field.

      We concur. In brief, however, we think the only viable mitigation strategy (that we are capable of) is to repeat functional imaging experiments with GFP imaging. To unpack this: there have been numerous efforts to mitigate these hemodynamic effects using isosbestic illumination. When we started to use such strategies in the lab for widefield imaging, we planned to calibrate the isosbestic correction using GFP recordings. The idea was that, if performed correctly, an isosbestic response should look like a GFP response. Try as we might, we could not get the isosbestic responses to look like GFP responses. We suspect this is because none of the light sources we used were perfectly matched to the isosbestic wavelength of the GCaMP variants we used (not for lack of trying; neither lasers nor LEDs with exact wavelength matches were available for purchase). A further complication was that the similarity (or dissimilarity) between isosbestic and GFP responses depended on brain region. Importantly, however, just because we could not successfully apply isosbestic corrections does not mean it cannot be done. Hence, for the widefield experiments, we resorted to mitigating the problem by repeating the key experiments using GFP imaging (see, e.g., Heindorf and Keller, 2024). Note that others have also argued that the best way to correct for hemodynamic artefacts is a correction based on GFP recordings (Valley et al., 2019). A second strategy we tried was using a second fluorophore (i.e. a red marker) in tandem with a GCaMP sensor. The problem here is that blood absorbs light at the two wavelengths very differently, and once again a correction of the GCaMP signal using the red channel was questionable at best. Thus, the only viable mitigation strategy we have found is to record GFP responses and test whether the effects postulated from calcium indicator data are also present in the GFP responses. This work is our attempt at a post-hoc mitigation of this problem for our own previous two-photon imaging studies.
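      The calibration problem with a second-color reference can be sketched with a toy model in which blood attenuates each channel exponentially with a wavelength-dependent coefficient (a Beer-Lambert-style assumption). The blood signal and both coefficients below are hypothetical. The only point is that when blood absorbs the two colors differently, a 1:1 subtraction leaves a large residual, and removing it requires a gain that has to be fit against ground truth; in vivo, such ground truth would have to come from something like a GFP recording.

```python
import numpy as np

def dff(f):
    """ΔF/F relative to the mean of the trace."""
    return f / f.mean() - 1.0

t = np.linspace(0.0, 60.0, 3000)                    # seconds
blood = 0.05 * (1.0 + np.sin(2 * np.pi * t / 10.0))  # hypothetical blood-volume signal (a.u.)
eps_green, eps_red = 1.0, 0.2                        # hypothetical relative absorption coefficients

green = dff(np.exp(-eps_green * blood))  # activity-independent green fluorophore (GFP-like)
red = dff(np.exp(-eps_red * blood))      # activity-independent red reference

naive = green - red                            # fixed 1:1 subtraction
gain = np.dot(red, green) / np.dot(red, red)   # least-squares gain, fit here on known ground truth
calibrated = green - gain * red

print(f"1:1 subtraction leaves {naive.std() / green.std():.0%} of the artefact")
print(f"fitted gain {gain:.1f} leaves {calibrated.std() / green.std():.1%} of the artefact")
```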

      (3) Several methodology details are missing from the Methods section. These include: (a) signal extraction methods for two-photon imaging data (b) neuropil subtraction methods (whether they are performed and, if so, how) (c) methods used to prevent visual stimulation light from being detected by the two-photon imaging system (d) methods to measure blood vessel diameter/area in each frame. The authors should provide more details in their revision.

      Please excuse us; this was an oversight. All details have been added to the methods.

      Reviewer #2 (Public Review):

      In this study, Yogesh et al. aimed to characterize hemodynamic occlusion in two-photon imaging, where its effects on signal fluctuations are underappreciated compared to those in widefield imaging and fiber photometry. The authors used activity-independent GFP fluorescence, GCaMP, and GRAB sensors for various neuromodulators in two-photon and widefield imaging in a visuomotor context to evaluate the extent of hemodynamic occlusion in V1 and ACC. They found that GFP responses were comparable in amplitude to smaller GCaMP responses, though they exhibited context-, cortical region-, and depth-specific effects. After quantifying blood vessel diameter changes and the surrounding GFP responses, they argued that GFP responses were highly correlated with changes in local blood vessel size. Furthermore, when imaging with GRAB sensors for different neuromodulators, they found that sensors with lower dynamic ranges, such as GRAB-DA1m, GRAB-5HT1.0, and GRAB-NE1m, exhibited responses most likely masked by hemodynamic occlusion, while a sensor with a larger SNR, GRAB-ACh3.0, showed responses much more distinguishable from blood vessel changes.

      Strengths

      This work is of broad interest to two-photon imaging users and to GRAB developers and users. It thoroughly quantifies the hemodynamically driven GFP response, compares it to previously published GCaMP data in a similar context, and illustrates the contribution of hemodynamic occlusion to GFP and GRAB responses by characterizing local blood vessel diameter and fluorescence changes. These findings provide important considerations for the imaging community and a sobering look at the utility of these sensors for cortical imaging.

      Importantly, they draw clear distinctions between the temporal dynamics and amplitudes of hemodynamic artifacts across cortical regions and layers. Moreover, they show context-dependent (dark versus visual stimulation) effects on locomotion- and optogenetic light-triggered hemodynamic signals.

      Most of the first-generation neuromodulator GRAB sensors showed relatively small responses, comparable to blood vessel changes in two-photon imaging. This emphasizes the need to improve the dynamic range and response magnitude of future sensors and encourages sensor users to consider removing hemodynamic artifacts when analyzing GRAB imaging data.

      Weaknesses

      (1) The largest weakness of the paper is that, while they convincingly quantify hemodynamic artifacts across a range of conditions, they do not quantify any methods of correcting for them. The utility of the paper could have been greatly enhanced had they tested hemodynamic correction methods (e.g. from Ocana-Santero et al., 2024) and applied them to their datasets. This would serve both to verify their findings (proving that hemodynamic correction removes the hemodynamic signal) and to act as a guide to the field for how to address the problem they highlight.

      See also our response to reviewer 1 comment 2.

      In the Ocana-Santero et al., 2024 paper, the authors also first use GFP recordings to identify the problem. The mitigation strategy they then propose, and use, is to image a second fluorophore that emits at a different wavelength concurrently with the functional indicator. They then simply subtract the two signals to correct for hemodynamics (we think; the paper states “divisive”, but the data shown are more consistent with a “subtractive” correction). However, the paper does not demonstrate that the hemodynamic signals in the red channel match those in the green channel; the evidence presented that this works is at best anecdotal. In our hands this does not work (meaning the red channel does not match GFP recordings). We suspect this is due to a combination of crosstalk from the simultaneously recorded functional channel and the fact that hemodynamic absorption is strongly wavelength-specific, or to something we are doing wrong. Either way, we cannot contribute a mitigation strategy here.
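      For reference, the two correction schemes mentioned above can be written as follows, with a the ΔF/F of the functional channel and b the ΔF/F of the reference channel. The function names are hypothetical and not taken from Ocana-Santero et al. (2024). Algebraically, (1 + a)/(1 + b) − 1 = (a − b)/(1 + b), so the two schemes differ by a factor of 1/(1 + b) and diverge as the reference change grows.

```python
def subtractive_correction(dff_signal, dff_reference):
    """Difference of the two ΔF/F traces: a - b."""
    return dff_signal - dff_reference

def divisive_correction(dff_signal, dff_reference):
    """Ratio of the two normalized traces, re-expressed as ΔF/F.

    (1 + a) / (1 + b) - 1 equals (a - b) / (1 + b).
    """
    return (1.0 + dff_signal) / (1.0 + dff_reference) - 1.0

for a, b in ((0.05, 0.02), (1.0, 0.5)):
    print(a, b, subtractive_correction(a, b), divisive_correction(a, b))
```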

      Given that GFP responses depend on brain area and cortical depth, it is not a stretch to postulate that they also depend on the genetically labeled cell type. Thus, any GFP calibration used for correction will need to be repeated for each cell type and brain area. Once experiments are repeated using GFP (the strategy we advocate; we don’t think there is a simpler way to do this), the “correction” is just a subtraction (or a visual comparison).
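      The subtraction described above can be sketched as a difference of event-triggered averages. This is a minimal sketch, not the authors' pipeline: the function names, window lengths and synthetic traces are hypothetical, and both populations are assumed to come from the same area, depth and cell type, recorded under the same paradigm.

```python
import numpy as np

def event_triggered_average(trace, onsets, pre, post):
    """Mean trace in a window from `pre` samples before to `post` samples after each onset."""
    windows = [trace[i - pre:i + post] for i in onsets if i - pre >= 0 and i + post <= len(trace)]
    return np.mean(windows, axis=0)

def gfp_subtracted_response(gcamp_traces, gcamp_onsets, gfp_traces, gfp_onsets, pre=10, post=30):
    """Average GCaMP onset response minus average GFP onset response.

    Each *_traces entry is one neuron's ΔF/F trace; each *_onsets entry lists the
    event onsets (sample indices) for that neuron's recording.
    """
    gcamp = np.mean([event_triggered_average(tr, on, pre, post) for tr, on in zip(gcamp_traces, gcamp_onsets)], axis=0)
    gfp = np.mean([event_triggered_average(tr, on, pre, post) for tr, on in zip(gfp_traces, gfp_onsets)], axis=0)
    return gcamp - gfp

# Hypothetical example: a 0.02 occlusion transient present with both indicators,
# plus a 0.10 calcium transient present only in the GCaMP recording.
n, onsets = 200, [50, 120]
gfp_trace, gcamp_trace = np.zeros(n), np.zeros(n)
for i in onsets:
    gfp_trace[i:i + 5] += 0.02
    gcamp_trace[i:i + 5] += 0.02 + 0.10
corrected = gfp_subtracted_response([gcamp_trace], [onsets], [gfp_trace], [onsets])
```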

      (2) The paper attributes the source of 'hemodynamic occlusion' primarily to blood vessel dilation, but leaves unanswered how much may be due to shifts in blood oxygenation. Figure 4 directly addresses the question of how much of the signal can be attributed to occlusion by measuring the blood vessel dilation, but notably fails to reproduce any of the positive transients associated with locomotion in Figure 2. Thus, an investigation into or at least a discussion of what other factors (movement? Hb oxygenation?) may drive these distinct signals would be helpful.

      See also our response to reviewer 1 comment 1.

      We have added to Figure 4 an example of a positive transient. At running onset, superficial blood vessels in cortex tend to constrict and hence result in positive transients.

      We now also mention changes in blood oxygenation as a potential source of hemodynamic occlusion. To be clear, blood oxygenation (or flow) changes in the absence of any fluorophore do not produce a two-photon signal. In case the reviewer was concerned about intrinsic signals: these are not detectable in two-photon imaging.

      (3) Along these lines, the authors carefully quantified the correlation between local blood vessel diameter and GFP response (or neuropil fluorescence vs blood vessel fluorescence with GRAB sensors). To what extent does this effect depend on proximity to the vessels? Do GFP/GRAB responses decorrelate from blood vessel activity in neurons further from vessels (refer to Figure 5A and B in Neyhart et al., Cell Reports 2024)?

      We did think about quantifying this, but doing so properly would require a 3D reconstruction of the blood vessel plexus above (with respect to the optical axis) the neuron of interest, as well as some knowledge of how each vessel dilates as a function of the stimulus. The main effect likely comes from blood vessels within the 45-degree illumination cone above the neuron (Author response image 2). Lateral proximity to a blood vessel is likely only of secondary relevance. Thus, such a measurement would be impractical and of little benefit to others.

      Author response image 2.

      A schematic representation of the cone of illumination.

      While imaging a neuron (the spot on the imaging plane at the focus of the cone of illumination), the blood vessels that primarily contribute to hemodynamic occlusion are those in the cone of illumination between the neuron and the objective lens. Blood vessels visible in the imaging plane (indicated by gray arrows) do not directly contribute to hemodynamic occlusion. Any dependence of a neuron's observed hemodynamic occlusion on its distance to these in-plane blood vessels is at best incidental.
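      The geometric criterion in the schematic can be sketched as a test of whether a point on a vessel lies inside the light cone between the objective and the neuron. This is an idealized sketch: it treats the 45 degrees mentioned above as the cone's half-angle (an assumption; the effective angle depends on the objective's NA and on how its back aperture is filled), and it ignores scattering. The function name and coordinates are hypothetical.

```python
import math

def in_illumination_cone(vessel_xy_um, vessel_depth_um, neuron_xy_um, neuron_depth_um, half_angle_deg=45.0):
    """True if a vessel point lies in the light cone between the objective and the neuron.

    Depths are measured downward from the pia. half_angle_deg is an assumed value.
    """
    height_above_focus = neuron_depth_um - vessel_depth_um
    if height_above_focus <= 0:
        # At or below the imaging plane: not in the light path to the objective.
        return False
    lateral = math.dist(vessel_xy_um, neuron_xy_um)
    # The cone's radius grows linearly with height above the focal point.
    return lateral <= height_above_focus * math.tan(math.radians(half_angle_deg))

# A vessel 150 µm above the neuron, directly over it, versus one in the imaging plane.
print(in_illumination_cone((0, 0), 50, (0, 0), 200))   # vessel above the neuron
print(in_illumination_cone((20, 0), 200, (0, 0), 200))  # vessel in the imaging plane
```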

      (4) Raw traces are shown in Figure 2 but we are never presented with the unaveraged data for locomotion of stimulus presentation times, which limits the reader's ability to independently assess variability in the data. Inclusion of heatmaps comparing event aligned GFP to GCaMP6f may be of value to the reader.

      We fear we are not sure what the reviewer means by “the unaveraged data for locomotion of stimulus presentation times”. We suspect this should read “locomotion or stimulus…”. We have added heat maps of the responses of all neurons of the data shown in Figure 1 – as Figure S2.

      (5) A more detailed analysis of the differences between the dynamics observed in GFP- versus GCaMP6f-expressing neurons could aid in identifying artifacts in otherwise clean data. The example neurons in Figure 2A hint at this, as each displays a unique waveform, and the question of whether certain properties of their dynamics can reveal the hemodynamic rather than indicator-driven nature of the signal is left open. E.g., do the decay and rise times differ significantly from those of GCaMP6f signals?

      The most informative distinction we have found is the difference in peak responses (Figure 2B). Decay and rise time measurements critically depend on the identification of “events”. Depending on how selective one is about what counts as an event (e.g., easy in example 1 of Figure 2, but more difficult in examples 2 and 3), one gets very different estimates of rise and decay times. Because peak amplitudes are lower in GFP responses, rise and decay time estimates will be either slower or noisier (depending on where the threshold for event detection is set).
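      The threshold dependence can be illustrated with a simple crossing-based event detector. The detector and the synthetic trace (one large and one small transient) are hypothetical and are not the analysis used in the manuscript; they only show that which transients count as "events", and therefore which ones enter any rise or decay time statistic, changes with the threshold.

```python
import numpy as np

def detect_events(trace, threshold):
    """Indices where the trace crosses the threshold from below (a simple onset detector)."""
    above = trace > threshold
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1

# Hypothetical trace: a large and a small transient with exponential decay on a flat baseline.
t = np.arange(300)
trace = np.zeros(t.size)
for onset, amplitude in ((50, 1.0), (200, 0.15)):
    trace[onset:] += amplitude * np.exp(-(t[onset:] - onset) / 20.0)

for threshold in (0.1, 0.5):
    print(threshold, detect_events(trace, threshold))
```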

      (6) The authors suggest that signal to noise ratio of an indicator likely affects the ability to separate hemodynamic response from the underlying fluorescence signal. Does the degree of background fluorescence affect the size of the artifact? If there was variation in background and overall expression level in the data this could potentially be used to answer this question. Could lower (or higher!) expression levels increase the effects of hemodynamic occlusion?

      There may be a misunderstanding (i.e., we might be misunderstanding the reviewer’s argument here). Our statement in the manuscript that the signal-to-noise ratio of an indicator matters is based on the simple consideration that hemodynamic occlusion is in the range of 0 to 2% ΔF/F. The larger the dynamic range of the indicator, the less of a problem 2% ΔF/F is. Imagine an indicator with average responses in the hundreds of % ΔF/F; for it, this would be a non-problem. For indicators with a dynamic range of less than 1%, a 2% artifact is a problem.

      Regarding “background” fluorescence, we are not sure what is meant here. If the reviewer means fluorescence from indicator molecules in processes (as opposed to the soma) that is typically ignored (or classified as neuropil), we are not sure how this would help. The occlusion effects are identical for somatic and for axonal or dendritic GFP (the source of the GFP fluorescence is not relevant for the occlusion effect). If the reviewer means “baseline” fluorescence: above a noise threshold, ΔF/F₀ should be independent of F₀ (i.e., the baseline fluorescence). This also holds in the data (see Figure S4). We might be stating the obvious: normalizing fluorescence activity as ΔF/F₀ makes the “occluder” effect constant for all values of F₀.
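      Both points can be checked with a minimal model in which occlusion scales the detected fluorescence by a time-varying transmission factor. The multiplicative model, the 2% transmission dip and the indicator response sizes are assumptions for illustration; in this model, an additive offset such as unsubtracted background would break the invariance.

```python
import numpy as np

def observed_dff(expression_level, transmission):
    """ΔF/F0 of a fluorophore whose emission is scaled by a time-varying transmission.

    Occlusion is modeled as purely multiplicative (an assumption): detected
    fluorescence = expression_level * transmission(t), with no additive offset.
    """
    f = expression_level * transmission
    f0 = f[0]  # baseline taken as the first sample
    return (f - f0) / f0

transmission = np.array([1.00, 0.99, 0.98, 0.99, 1.00])  # hypothetical 2% occlusion transient
dim = observed_dff(50.0, transmission)
bright = observed_dff(5000.0, transmission)

# How much a 2% artefact matters depends on the size of the indicator's response.
for indicator_response in (0.01, 0.20, 2.00):  # hypothetical ΔF/F0 amplitudes
    print(f"response {indicator_response:.0%}: artefact/response = {0.02 / indicator_response:.2f}")
```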

      (7) The choice of the phrase 'hemodynamic occlusion' may cause some confusion as the authors address both positive and negative responses in the GFP expressing neurons, and there may be additional contributions from changes in blood oxygenation state.

      Regarding the potential confusion over terminology: occlusion can both decrease and increase.

      Only under the (incorrect) assumption that occlusion is zero at baseline would this be confusing – no? If the reviewer has a suggestion for a different term, we’d be open to changing it.

      Regarding blood oxygenation – this is absolutely correct, we did not explicitly point this out in the previous version of the manuscript. Occlusion changes are driven by a combination of changes to volume and “opacity” of the blood. Oxygenation changes would be in the second category. We have clarified this in the manuscript.

      (8) The choice of ACC as the frontal region provides a substantial contrast in location, brain movement, and vascular architecture as compared to V1. As the authors note, ACC is close to the superior sagittal sinus and thus is the region where the largest vascular effects are likely to occur. The reader is left to wonder how much of the ROI may or may not have included vasculature in the ACC vs V1 recordings, as the only images of the recording sites provided are for V1. We are left unable to conclude whether the differences observed between these regions are due to the presence of visible vasculature, capillary blood flow, or differences in neurovascular coupling between regions. A less medial portion of M2 may have been a more appropriate comparison. At least, the inclusion of more example imaging fields for ACC in the supplementary figures would be of value.

      The choice of both V1 and ACC was simply driven by previous experiments we had already done in these areas with calcium indicators. And we agree: the relevant axis is likely distance from the midline, not the anterior-posterior axis, i.e., RSC and ACC are likely more similar to each other, as are V1 and lateral M2. We have made this point explicitly in the manuscript and have added sample fields of view as Figure S1.

      (9) In Figure 3, how do the proportions of responsive GFP neurons compare to GCaMP6f neurons?

      We have added the data for GCaMP responses.

      (10) How is variance explained calculated in Figure 4? Is this from a linear model and R^2 value? Is this variance estimate for separate predictors by using single variable models? The methods should describe the construction of the model including the design matrix and how the model was fit and if and how cross validation was run.

      Variance explained is simply the R² of a linear model. We have added this to the methods.
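      A minimal sketch of this computation, assuming a single predictor (for example, blood vessel cross-sectional area) fit by ordinary least squares with an intercept and, as an assumption here, no cross-validation. The names and data are hypothetical; this is not the manuscript's code. With one predictor and an intercept, R² equals the squared Pearson correlation.

```python
import numpy as np

def variance_explained(predictor, response):
    """R² of an ordinary least-squares fit: response ≈ a * predictor + b."""
    X = np.column_stack([predictor, np.ones_like(predictor)])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    residual = response - X @ coef
    return 1.0 - residual.var() / response.var()

# Hypothetical example: a response that depends linearly on vessel area plus a small deviation.
area = np.linspace(0.0, 1.0, 50)
response = 2.0 * area + 1.0 + 0.1 * np.sin(37.0 * area)
r2 = variance_explained(area, response)
print(f"variance explained: {r2:.3f}")
```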

      (11) Cortical depth is coarsely defined as L2/3 or L5, without numerical ranges in depth from pia.

      Layer 2/3 imaging was done at depths of 100–250 μm below the pia, and layer 5 imaging at 400–600 μm. This has been added to the methods.

      Overall Assessment:

      This paper is an important contribution to our understanding of how hemodynamic artifacts may corrupt GRAB and calcium imaging, even in two-photon imaging modes. Certain useful control experiments, such as intrinsic optical imaging in the same paradigms, were not reported, nor were any hemodynamic correction methods investigated. This limits both the mechanistic conclusions and the overall utility with respect to immediate applications by end users. Nevertheless, the paper is of significant importance to anyone conducting two-photon or widefield imaging with calcium and GRAB sensors and deserves the attention of the broader neuroscience and in vivo imaging community.

      Reviewer #3 (Public review):

      In this study, the authors aimed to investigate if hemodynamic occlusion contributes to fluorescent signals measured with two-photon microscopy. For this, they image the activity-independent fluorophore GFP in 2 different cortical areas, at different cortical depths and in different behavioral conditions. They compare the evoked fluorescent signals with those obtained with calcium sensors and neuromodulator sensors and evaluate their relationship to vessel diameter as a readout of blood flow.

      They find that GFP fluorescence transients are comparable in amplitude to stimulus-evoked GCaMP6f signals, although they are generally smaller. Yet they are significant even at the single-neuron level. They show that GFP fluorescence transients resemble those measured with the dopamine sensor GRAB-DA1m and the serotonin sensor GRAB-5HT1.0 in amplitude and nature, suggesting that signals from these sensors are dominated by hemodynamic occlusion. Moreover, the authors perform similar experiments with wide-field microscopy, which reveal the similarity between the two methods in generating hemodynamic signals. Together, the evidence presented calls for the development and use of high-dynamic-range sensors to avoid measuring signals that originate from something other than what they are intended to measure. In the meantime, the evidence highlights the need to control for these artifacts, for example with the parallel use of activity-independent fluorophores.

      Strengths:

      - Comprehensive study comparing different cortical regions in diverse behavioral settings in controlled conditions.

      - Comparison to the state-of-the-art, i.e. what has been demonstrated with wide-field microscopy.

      - Comparison to diverse activity-dependent sensors, including the widely used GCaMP.

      Weaknesses:

      (1) The kinetics of GCaMP is stereotypic. An analysis/comment on if and how the kinetics of the signals could be used to distinguish the hemodynamic occlusion artefacts from calcium signals would be useful.

      We might be misunderstanding what the reviewer means by “the kinetics of GCaMP are stereotypic”. The kinetics are clearly stereotypic if one has isolated single action potential responses in a genetically identified cell type. But data recorded in vivo looks very different, see e.g. example traces in figure 1g of (Keller et al., 2012). And these are selected example traces, the average GCaMP trace looks perhaps more like the three example traces shown in Figure 2 (this is not surprising if the GCaMP signals one records in vivo are a superposition of calcium responses and hemodynamic occlusion). All quantification of kinetics relies on identifying “events”. We cannot identify events in any meaningful way for most of the data (see e.g. examples 2 and 3 in Figure 2). The one feature we can reliably identify as differing between GCaMP and GFP responses is peak response amplitude (as quantified in Figure 2).

      (2) Is it possible that motion is affecting the signals to a certain degree? This issue is not made clear.

      See also our response to reviewer 1 comment 1. In brief, we have added a quantification of motion artefacts as Figure S5, and we argue that motion artefacts could only account for locomotion onset responses (there is no detectable brain motion during visual stimulation) and would predict a decrease in fluorescence (not an increase).

      (3) The causal relationship with blood flow remains open. Hemodynamic occlusion seems a good candidate causing changes in GFP fluorescence, but this remains to be well addressed in further research.

      We agree – we have made this clearer in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2A shows three neurons with convincing GFP responses, with amplitudes often exceeding 100%. However, after seeing these data, I actually feel less convinced that these responses are related to hemodynamic occlusion. Blood vessel diameter changes by at most a few percent during behavior -- how could such small changes lead to >100% changes in GFP fluorescence?

      My guess is that these responses might instead be related to motion artifacts, particularly given the strong correlation between these responses and running speed (Figure 2A). One possible way to test this is by examining a pixelwise map of fluorescence changes (dF/F) during running vs. baseline. If hemodynamic effects are involved, one would likely see a shadow of the involved blood vessels in this map. Conversely, if motion artifacts are the primary factor, the map of dF/F should resemble the spatial gradients of the mean fluorescence image. Examining pixelwise maps of dF/F will likely provide insights regarding the nature of the GFP signals.

      The underlying assumption (“blood vessel diameter changes by at most a few percent”) might be incorrect here. (Note also that the relevant quantity is likely the cross-sectional area, not the diameter.) See Figure 4A1 and B1 for quantification of example blood vessel area changes: both example vessels change area by approximately 50%. Also note that example 1 in Figure 2 is an extreme example, chosen to highlight that effects can be large. To illustrate that this is not typical, we also show the distribution of all neurons in Figure 2B and mark all three example cells; example 1 is at the very tail of the distribution.
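      To relate the two quantities, assume (as a simplification) a circular vessel cross-section, so that area scales with the square of the diameter. An approximately 50% change in area then corresponds to a diameter change of roughly 22% (for an increase) or 29% (for a decrease), well beyond a few percent.

```python
import math

def diameter_change_for_area_change(area_change):
    """Fractional diameter change of a circular cross-section for a fractional area change."""
    # Area is proportional to diameter**2, so diameter scales with sqrt(area).
    return math.sqrt(1.0 + area_change) - 1.0

for area_change in (0.5, -0.5):
    print(f"area {area_change:+.0%} -> diameter {diameter_change_for_area_change(area_change):+.1%}")
```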

      Regarding the analysis suggested, we have added examples of this for running onset to the manuscript (Figure S7). We have examples in which a blood vessel shadow is clearly visible. More typical however, is a general increase in fluorescence (on running onset) that we think is caused by blood vessels closer to the surface of the brain.
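      The pixelwise analysis suggested by the reviewer can be sketched as below. As a check of the motion signature, the synthetic movie contains pure lateral motion (a one-pixel shift during hypothetical "running" frames) and no hemodynamics; the resulting ΔF/F map closely tracks the baseline image's spatial gradient along the motion axis, normalized by the baseline. The soma model, shift and frame counts are assumptions, and this is not the code used for Figure S7.

```python
import numpy as np

def pixelwise_dff(movie, running):
    """Pixelwise ΔF/F of the mean running frame relative to the mean baseline frame.

    movie: array of shape (frames, height, width); running: boolean array of shape (frames,).
    """
    baseline = movie[~running].mean(axis=0)
    return movie[running].mean(axis=0) / baseline - 1.0, baseline

# Synthetic movie: one Gaussian 'soma' on a uniform background, shifted by one pixel
# along x during hypothetical running frames (pure motion, no hemodynamics).
y, x = np.mgrid[0:64, 0:64]
soma = 100.0 + 100.0 * np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 4.0 ** 2))
running = np.array([False] * 10 + [True] * 10)
movie = np.stack([np.roll(soma, 1, axis=1) if r else soma for r in running])

dff_map, baseline = pixelwise_dff(movie, running)
gradient_x = np.gradient(baseline, axis=1) / baseline
similarity = np.corrcoef(dff_map.ravel(), gradient_x.ravel())[0, 1]
print(f"correlation with the normalized x-gradient: {similarity:.2f}")
```

      A shadow cast by a dilating vessel would instead appear as a spatially coherent patch that does not follow the gradient of the mean image, which is what makes the two cases distinguishable in such maps.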

      (2) Figure 3A shows strong GFP responses during running, while visuomotor mismatch elicits virtually no GFP-responsive neurons. This finding is puzzling, as visuomotor mismatch has been shown by the same group to activate L2/3 neurons more strongly than running (see Figure 3A, Keller et al., 2012, Neuron). Stronger neuronal activation should, in theory, result in more pronounced hemodynamic effects and, therefore, a higher proportion of GFP-responsive neurons. The absence of GFP responses during visuomotor mismatch raises questions about whether GFP signals are directly linked to hemodynamic occlusion.

      An alternative explanation is that the strong GFP responses observed during running could instead be driven by motion artifacts, e.g., those associated with the increased head or body movements during running onsets. Such artifacts could explain the observed GFP responses, rather than hemodynamic occlusion.

      This might be a misunderstanding. Mismatch responses are primarily observed in mismatch neurons. These are superficial L2/3 neurons (possibly the population that corresponds to L2 neurons in higher mammals). The fact that mismatch responses are primarily observed in this superficial population is likely the reason they were discovered using two-photon calcium imaging (which tends to be biased toward superficial neurons, as the image quality is best there), and why they are seen in far fewer neurons when using electrophysiological techniques (Saleem et al., 2013), which are biased toward deeper neurons. In response to Reviewer #2, we have now also added a quantification of the fraction of neurons responsive to these stimuli when using GCaMP (Figure 3D-F). The fraction of neurons responsive to visuomotor mismatch is smaller than the fraction responsive to locomotion onset or to visual stimuli.

      Thus, based on “average” responses across all cortical cell types (our L2/3 recordings here are as unbiased across all of L2/3 as possible), the response profiles (strong running onset and visual responses, and weak MM responses) are, to a first approximation, probably also what one would expect in the blood vessel response profile. Complicating this, of course, is the fact that it is likely some cell-type-specific activity, rather than simply average neuronal activity, that contributes most to blood flow changes.

      See response to public review comment 1 for a discussion of alternative sources, including motion artefacts.

      (3) Given the potential confound associated with brain motion, the authors might consider quantifying hemodynamic occlusion effects under more controlled conditions, such as in anesthetized animals, where brain movement is minimal. They could use drifting grating stimuli, which are known to produce well-characterized blood vessel and hemodynamic responses in V1. The effects of hemodynamic occlusion could then be quantified by imaging the fluorescence of an activity-independent marker. For maximal robustness, GFP should ideally be avoided, due to its known sensitivity to pH changes, as noted in the public review.

      Brain motion is negligible during visual stimulation in the awake mouse as well (Figure S5). This is likely a better control than anesthetized recordings: anesthesia has strong effects on blood pressure, heart rate, breathing, etc., all of which would introduce more confounds.

      (4) Regardless of the precise mechanism driving the observed GFP response, these activity-independent signals must be accounted for in functional imaging experiments. This applies not only to experiments using small dynamic range sensors but also to those employing 'high dynamic range' sensors like GCaMP6, which, according to the authors, exhibit responses only ~2-fold greater than those of GFP.

      In this context, the extensive GFP imaging data are highly valuable, as they could serve as a benchmark for evaluating the effectiveness of correction methods. Ideally, effective correction methods should produce minimal responses when applied to GFP imaging data. With these data at hand, I strongly encourage the authors to explore potential correction methods, as such methods could have far-reaching impact on the field.

      As discussed above, we have tested a number of such correction approaches for both widefield and two-photon imaging and could never recover a response profile that resembles the GFP response. The “correction method” we have come to favor is repeating experiments using GFP (i.e., what we have done here).

      (5) Several correction approaches could be considered. For instance, the strong correlation between GFP responses and blood vessel diameter (as shown in Figure 4) could potentially be leveraged to predict and compensate for the activity-independent signals. Alternatively, expressing an activity-independent marker alongside the activity sensor in orthogonal spectral channels could enable simultaneous monitoring and correction of activity-independent signals. Finally, computational procedures to remove common fluctuations, measured from background or 'neuropil' regions (see, e.g., Kerlin et al., 2010, Neuron; Giovannucci et al., 2019, eLife), may help reduce the contamination in cellular ROIs. The authors could try some or all of these methods, and benchmark their effectiveness by assessing, e.g., the number of GFP-responsive neurons after correction.

      Over the years we have tried many of these approaches. A correction using a second fluorophore of a different color likely fails because blood absorption is strongly wavelength dependent, making it challenging to calibrate the correction factor. Neuropil “correction” on GCaMP data, even with the best implementations, is just a common-mode subtraction. The signal in the neuropil is, as the name implies, just an average of many axons and dendrites in the vicinity; most of these processes belong to nearby neurons, making a neuropil response simply an average response of the neurons in some neighborhood. Adding the problem of hemodynamic responses (which on small scales will also influence nearby neurons and neuropil similarly) makes disentangling the two effects impossible (i.e., neuropil subtraction makes the problem worse, not better). However, our failure to make these methods work does not necessarily mean that the methods are faulty. Hence we have chosen not to comment on any such method, and simply provide the only mitigation strategy that works in our hands: recording GFP responses.
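      A small synthetic illustration of why common-mode subtraction cannot remove a shared multiplicative artefact: if hemodynamic occlusion scales soma and neuropil fluorescence by the same factor, subtracting a scaled neuropil trace leaves the relative dip intact. This is a minimal sketch, not the authors' pipeline; it assumes the widely used form F_corr = F_cell − r·F_neuropil, a hypothetical contamination ratio r = 0.7, a hypothetical 5% occlusion dip, and an activity-free (GFP-like) soma.

```python
import numpy as np


def neuropil_subtract(f_cell, f_neuropil, r=0.7):
    """Common-mode neuropil correction F_corr = F_cell - r * F_neuropil.

    r = 0.7 is a hypothetical contamination ratio chosen for illustration.
    """
    return np.asarray(f_cell, dtype=float) - r * np.asarray(f_neuropil, dtype=float)


def dff(trace, n_baseline):
    """Delta F / F relative to the mean of the first n_baseline frames."""
    f0 = trace[:n_baseline].mean()
    return trace / f0 - 1.0


t = np.arange(100)
# Hypothetical occlusion: a 5% multiplicative dip shared by soma and neuropil.
occlusion = np.where((t >= 40) & (t < 60), 0.95, 1.0)
f_cell = 100.0 * occlusion  # GFP-like soma: no activity-dependent signal
f_np = 60.0 * occlusion     # neuropil with a lower baseline

corrected = neuropil_subtract(f_cell, f_np)
print(dff(f_cell, 40)[50], dff(corrected, 40)[50])  # both about -0.05
```

      The dip survives the correction unchanged. With real neural activity in the neuropil, the subtraction would additionally remove part of the neighborhood's activity from the somatic trace.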

      (6) Given the potential usefulness of the GFP imaging data, I encourage the authors to share these data in a public repository to facilitate the development of correction methods.

      Certainly: all of our data are always published. In the early years of the lab, they were deposited on an FMI repository (https://data.fmi.ch/); more recently, on Zenodo.

      (7) As noted in the public review, several methodology details are missing. Most importantly, I could not find the description in the Methods section explaining how fluorescence signals from individual neurons were extracted from two-photon imaging data. The existing section on 'Extraction of neuronal activity' appears to cover only the wide-field analysis, with details about two-photon analysis seemingly absent.

      Please excuse the omission – this has all been added to the methods. In brief, to answer your questions:

      Were regions of interest (ROIs) for individual cells identified manually or automatically?

      We use a mixture of manual and automatic methods for our two-photon data. ROIs were first selected by thresholding a spatially median-filtered version of the mean fluorescence image. This selection was then visually inspected and manually corrected where necessary, such that ROIs were at least 250 pixels in size and only labelled clearly identifiable neurons.
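      The automatic step described above (spatial median filter, threshold, minimum size of 250 pixels) can be sketched as follows. This is a minimal illustration, not the authors' code: the filter size, the threshold value, and the use of connected-component labelling to turn the thresholded mask into ROIs are assumptions, and the manual inspection and correction step is not represented.

```python
import numpy as np
from scipy import ndimage


def threshold_rois(mean_image, threshold, filter_size=3, min_pixels=250):
    """Candidate ROIs: spatial median filter, threshold, connected components,
    then discard components smaller than min_pixels.

    Returns an integer label image (0 = background, 1..k = ROIs).
    filter_size and threshold are hypothetical parameters.
    """
    smoothed = ndimage.median_filter(mean_image, size=filter_size)
    mask = smoothed > threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return labels
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    out = np.zeros_like(labels)
    new_id = 0
    for old_id, size in enumerate(sizes, start=1):
        if size >= min_pixels:  # keep only sufficiently large components
            new_id += 1
            out[labels == old_id] = new_id
    return out


# Synthetic mean image: one 20x20 "cell" (400 px) and one 10x10 blob (100 px).
img = np.zeros((100, 100))
img[10:30, 10:30] = 10.0
img[60:70, 60:70] = 10.0
rois = threshold_rois(img, threshold=5.0)
```

      The 10×10 blob is dropped by the size criterion, and the 3×3 median filter trims the four corner pixels of the larger square, leaving a single 396-pixel ROI.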

      Was fluorescence within each ROI calculated by averaging signals across pixels, or were signal de-mixing algorithms (e.g., PCA, ICA, or NMF) applied?

      We use the average fluorescence across pixels without any de-mixing algorithms here and in all our two-photon experiments. De-mixing algorithms can introduce a variety of artefacts.
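      Averaging across ROI pixels without de-mixing amounts to a plain per-frame mean over each labelled region. A minimal sketch, assuming a movie stored as a (frames, height, width) array and an integer label image like the one produced during ROI selection (array layout and names are illustrative):

```python
import numpy as np


def roi_traces(movie, labels):
    """Mean fluorescence across the pixels of each ROI, frame by frame.

    movie: array of shape (T, H, W); labels: (H, W) integers, 0 = background.
    Returns an array of shape (n_rois, T). No de-mixing or neuropil correction.
    """
    flat = movie.reshape(movie.shape[0], -1)
    lab = labels.ravel()
    ids = [i for i in np.unique(lab) if i > 0]
    return np.stack([flat[:, lab == i].mean(axis=1) for i in ids])


movie = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
labels = np.zeros((4, 4), dtype=int)
labels[:2, :2] = 1  # one 2x2 ROI in the top-left corner
traces = roi_traces(movie, labels)  # per-frame means: 2.5, 18.5, 34.5
```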

      Additionally, did the authors account for and correct the contribution of surrounding neuropil?

      No neuropil correction was applied. It would also be difficult to see how this would help. If the model of hemodynamic occlusion is correct, one would expect occlusion effects to change on the length scale of blood vessels (i.e., tens to hundreds of microns). Thus, the effect of occlusion on neuropil and cells should be similar. Neuropil “correction” is always based on the idea of removing signals that are common to both neuropil and somata, thereby complicating the interpretation of the resulting signal even further.

      Without these methodological details, it is difficult to accurately interpret the two-photon signals reported in the manuscript.

      (8) The rationale for using the average fluorescence of a ROI within the blood vessel as a proxy for blood vessel diameter is not entirely clear to me. The authors should provide a clearer justification for this approach in their revision.

      Consider an ROI placed within a blood vessel at the focus of the illumination cone (Author response image 3). Given that the point-spread function of two-photon imaging is roughly 0.5 μm laterally and 3 μm axially (indicated by the bicone), photons emitted from fluorescent tissue outside the blood vessel but within the two-photon excitation volume will contribute to the change in fluorescence in the ROI. An increase in blood vessel volume on dilation would decrease the number of emitted photons reaching the objective, first by pushing more of the fluorescent tissue outside of the two-photon volume, and second by presenting greater hemodynamic occlusion to the photons emitted by the fluorescent tissue immediately below the vessel. Conversely, on vasoconstriction more emitted photons reach the objective.

      In line with this argument, as shown in Figure 4A1-A2, B1-B2 and C1-C2, we find that the change in fluorescence of the blood vessel ROI varies inversely with the area of the blood vessel. Of course, the change in blood vessel ROI fluorescence is only a proxy for vessel size. Extracting blood vessel boundaries from individual two-photon frames was noisy and proved unreliable in the absence of specific dyes to label the vessel walls. We thus resorted to using blood vessel ROI fluorescence as a proxy for hemodynamic occlusion, and tested how much of the variance in GFP responses is explained by the change in blood vessel ROI response.
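      One common way to express "how much of the variance in GFP responses is explained" by the vessel ROI signal is the R² of a linear least-squares fit. The text does not state the exact procedure, so the sketch below is an assumption: a single linear regressor, paired samples (e.g., time points or trials), and hypothetical values. The residuals correspond to the regression-based correction proposed by the reviewer, which the authors report did not work in their hands.

```python
import numpy as np


def variance_explained(gfp, vessel):
    """Fit gfp = a + b * vessel by ordinary least squares.

    Returns (r_squared, array([a, b]), residuals).
    """
    gfp = np.asarray(gfp, dtype=float)
    vessel = np.asarray(vessel, dtype=float)
    design = np.column_stack([np.ones_like(vessel), vessel])
    coef, *_ = np.linalg.lstsq(design, gfp, rcond=None)
    residuals = gfp - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((gfp - gfp.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, coef, residuals


vessel = np.linspace(-0.1, 0.1, 50)  # hypothetical vessel-ROI dF/F values
gfp = 0.002 + 0.4 * vessel           # hypothetical, perfectly linear GFP dF/F
r2, coef, resid = variance_explained(gfp, vessel)
```

      In this noise-free example the fit recovers the generating line exactly, so R² is 1; with real data, R² falls between 0 and 1.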

      We have added an explanation to the manuscript, as suggested.

      Author response image 3.

      Average response of ROIs placed within blood vessels co-vary with hemodynamic occlusion.

      (9) I find that the Shen et al., 2012, Nature Methods paper has gone quite far to demonstrate the effect of hemodynamic occlusion in two photon imaging. Therefore, I suggest the authors describe and cite this work not only in the discussion but also in the introduction, where they can highlight the key questions left unanswered by that study and explain how their manuscript aims to address them.

      We have added the reference and point to the work in the introduction as suggested.

      Reviewer #3 (Recommendations for the authors):

      I appreciate very much that the study is presented in a very clear manner.

      A few comments that could clarify it even further:

      (1) Fig. 1: make clear on legend if it is an average of full FOVs.

      The traces shown are the average over ROIs (neurons) – we have clarified in the figure legend as suggested.

      (2) Give a more complete definition of hemodynamic occlusion to understand the hypothesis in the relationship between blood vessel dilation and GFP fluorescence (116-119). Maybe, move the phrase from conclusion "Since blood absorbs light, hemodynamic occlusion can affect fluorescence intensity measurements" (219-220).

      Very good point – we expanded on the definition in the introduction.

      (3) For clarity, mention in the main text the method used to assess how a parameter explains the variance (126-129).

      This has been implemented.

      (4) Discuss the possible relationship of the signals to neuronal activity.

      We have added this to the discussion.

      (5) Discuss if the measurements could provide any functional insights, whether they could be used to learn something about the brain.

      We have added this to the discussion.

    1. eLife Assessment

      This important study combines convincing evolution experiments with molecular and genetic techniques to study how a genetic lesion in MreB that causes rod-shaped cells to become spherical, with concomitant deleterious fitness effects, can be rescued by natural selection. The detailed mechanistic investigation increases our understanding of how mreB contributes to cell wall synthesis and shows how compensatory mutations may reestablish its homogeneity.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed experimental evolution of MreB mutants that have a slow-growing round phenotype and studied the subsequent evolutionary trajectory using analysis tools from molecular biology. It was remarkable and interesting that they found that the original phenotype was not restored (most common in these studies) but that the round phenotype was maintained.

      Strengths:

      The finding that the round phenotype was maintained during evolution rather than that the original phenotype, rod-shaped cells, was recovered is interesting. The paper extensively investigates what happens during adaptation with various different techniques. Also, the extensive discussion of the findings at the end of the paper is well thought through and insightful.

    3. Reviewer #3 (Public review):

      This paper addresses a long-standing problem in microbiology: the evolution of bacterial cell shape. Bacterial cells can take a range of forms, among the most common being rods and spheres. The consensus view is that rods are the ancestral form and spheres the derived form. The molecular machinery governing these different shapes is fairly well understood but the evolutionary drivers responsible for the transition between rods and spheres are not. Enter Yulo et al.'s work. The authors start by noting that deletion of a highly conserved gene called MreB in the Gram-negative bacterium Pseudomonas fluorescens reduces fitness but does not kill the cell (as happens in other species like E. coli and B. subtilis) and causes cells to become spherical rather than their normal rod shape. They then ask whether evolution for 1000 generations restores the rod shape of these cells when propagated in a rich, benign medium.

      The answer is no. The evolved lineages recovered fitness by the end of the experiment, growing just as well as the unevolved rod-shaped ancestor, but remained spherical. The authors provide an impressively detailed investigation of the genetic and molecular changes that evolved. Their leading results are:

      (1) The loss of fitness associated with MreB deletion causes high variation in cell volume among sibling cells after cell division;

      (2) Fitness recovery is largely driven by a single, loss-of-function point mutation that evolves within the first ~250 generations that reduces the variability in cell volume among siblings;

      (3) The main route to restoring fitness and reducing variability involves loss of function mutations causing a reduction of TPase and peptidoglycan cross-linking, leading to a disorganized cell wall architecture characteristic of spherical cells.

      The inferences made in this paper are on the whole well supported by the data. The authors provide a uniquely comprehensive account of how a key genetic change leads to gains in fitness and the spectrum of phenotypes that are impacted and provide insight into the molecular mechanisms underlying models of cell shape.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      As to the exceptionally minor issue of correction for multiple statistical tests (minor because the data and the error are presented in the text): we have now conducted one-way ANOVA to back the data displayed in Fig. 4A and Supp. Figs 19 and 21. In each case ANOVA revealed a highly significant difference among means; Dunnett’s post hoc test was then used to test each result against SBW25, with the multiple comparisons corrected for in the analysis.
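      The analysis described above (one-way ANOVA across genotypes, then Dunnett's test of each genotype against a single control) can be sketched with SciPy. This is an illustration on synthetic numbers, not the authors' analysis; the group layout (8 genotypes × 3 biological replicates) is an assumption chosen because it yields the F(7,16) degrees of freedom reported for Fig. 4. `scipy.stats.dunnett` requires SciPy 1.11 or later.

```python
import numpy as np
from scipy import stats


def anova_df(groups):
    """Degrees of freedom (between, within) for a one-way ANOVA: (k - 1, N - k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    return k - 1, n_total - k


def one_way_anova(groups):
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value, anova_df(groups)


def dunnett_vs_control(control, treatments):
    """Compare each treatment group with the control (scipy.stats.dunnett, SciPy >= 1.11)."""
    result = stats.dunnett(*treatments, control=control)
    return result.statistic, result.pvalue


# Hypothetical layout: 8 genotypes x 3 biological replicates, control first.
rng = np.random.default_rng(0)
groups = [rng.normal(loc=1.0 + 0.1 * i, scale=0.05, size=3) for i in range(8)]
f_stat, p_value, df = one_way_anova(groups)
print(df)  # (7, 16)
```

      With the control stored first, `dunnett_vs_control(groups[0], groups[1:])` returns one statistic and one family-wise adjusted p-value per non-control genotype.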

      This resulted in changes to the description of the statistical analysis in the following captions:

      To Figure 4.

      Where we previously referred to paired t-tests we now state: ANOVA revealed a highly significant difference among means [F(7,16) = 8.19, p < 0.001] with Dunnett’s post-hoc test adjusted for multiple comparisons showing that five genotypes (*) differ significantly (p < 0.05) from SBW25.

      To Supplementary Figure 19.

      Where we previously referred to paired t-tests we now state: ANOVA revealed a highly significant difference among means [F(7,16) = 16.74, p < 0.001] with Dunnett’s post-hoc test adjusted for multiple comparisons showing that three genotypes (*) differ significantly (p < 0.05) from SBW25.

      To Supplementary Figure 21.

      Where we previously referred to paired t-tests we now state: ANOVA revealed a highly significant difference among means [F(7,89) = 9.97, p < 0.0001] with Dunnett’s post-hoc test adjusted for multiple comparisons showing that SBW25 ∆mreB and SBW25 ∆PFLU4921-4925 are significantly different (*) from SBW25 (p < 0.05).


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors performed experimental evolution of MreB mutants that have a slow-growing round phenotype and studied the subsequent evolutionary trajectory using analysis tools from molecular biology. It was remarkable and interesting that they found that the original phenotype was not restored (most common in these studies) but that the round phenotype was maintained. 

      Strengths: 

      The finding that the round phenotype was maintained during evolution rather than that the original phenotype, rod-shaped cells, was recovered is interesting. The paper extensively investigates what happens during adaptation with various different techniques. Also, the extensive discussion of the findings at the end of the paper is well thought through and insightful. 

      Weaknesses: 

      I find there are three general weaknesses: 

      (1) Although the paper states in the abstract that it emphasizes "new knowledge to be gained", it remains unclear what this concretely is. On page 4 they state three research questions; these could be more extensively discussed in the abstract. Also, these questions read more like genetics questions while the paper is a lot about cell biological findings. 

      Thank you for drawing attention to the unnecessary and gratuitous nature of the last sentence of the Abstract. We are in agreement. It has been modified, and we have taken advantage of the additional word space to draw attention to the importance of the two competing (testable) hypotheses laid out in the Discussion. 

      As to new knowledge, please see the Results and particularly the Discussion. But beyond this, and as recognised by others, there is real value for cell biology in seeing how (and whether) selection can compensate for effects that are deleterious to fitness. The results will very often depart from those delivered by, for example, suppressor analyses or bottom-up engineering. 

      In the work recounted in our paper, we chose to focus, by way of proof-of-principle, on the most commonly observed mutations, namely, those within pbp1A. But beyond this gene, we detected mutations in other components of the cell shape / division machinery whose connections are not yet understood and which are the focus of on-going investigation.  

      As to the three questions posed at the end of the Introduction, the first concerns whether selection can compensate for deleterious effects of deleting mreB (a question that pertains to evolutionary aspects); the second seeks understanding of genetic factors; the third aims to shed light on the genotype-to-phenotype map (which is where the cell biology comes into play). Given space restrictions, we cannot see how we could usefully expand, let alone discuss, these three questions in the limited space available in the Abstract.   

      (2) It is not clear to me from the text what we already know about the restoration of MreB loss from suppressor studies (in the literature). Are there suppressor screens in the literature, and which parts of the findings are consistent with suppressor screens and which parts are new knowledge?  

      As stated in the Introduction, in a previous study with B. subtilis (which harbours three MreB isoforms and where the isoform named “MreB” is essential for growth under normal conditions), suppressors of MreB lethality were found to occur in ponA, which encodes a class A penicillin-binding protein (Kawai et al., 2009). This led to recognition that MreB plays a role in recruiting Pbp1A to the lateral cell wall. On the other hand, Patel et al. (2020) have shown that deletion of class A PBPs leads to an up-regulation of Rod complex activity. Although there is a connection between the Rod complex and class A PBPs, a further study has shown that the two systems work semi-autonomously (Cho et al., 2016). 

      Our work confirms a connection between MreB and Pbp1A, and has shed new light on how this interaction is established by means of natural selection, which targets the integrity of the cell wall. Indeed, the Rod complex and class A PBPs have complementary activities in the building of the cell wall, with each of the two systems able to compensate for the other in order to maintain cell wall integrity. Please see the major part of the Discussion. In terms of specifics, the connection between mreB and pbp1A shown by Kawai et al. (2009) is indirect because it is based on extragenic transposon insertions. In our study, the genetic connection is mechanistically demonstrated. In addition, we show that the evolutionary dynamics are rapid, and we enrich understanding of the genotype-to-phenotype map.

      (3) The clarity of the figures, captions, and data quantification need to be improved.  

      Modifications have been implemented. Please see responses to specific queries listed below.

      Reviewer #2 (Public Review): 

      Yulo et al. show that deletion of MreB causes reduced fitness in P. fluorescens SBW25 and that this reduction in fitness may be primarily caused by alterations in cell volume. To understand the effect of cell volume on proliferation, they performed an evolution experiment through which they predominantly obtained mutations in pbp1A that decreased cell volume and increased viability. Furthermore, they provide evidence to propose that the pbp1A mutants may have decreased PG cross-linking which might have helped in restoring the fitness by rectifying the disorganised PG synthesis caused by the absence of MreB. Overall this is an interesting study. 

      Queries: 

      Do the small cells of mreB null background indeed have no DNA? It is not apparent from the DAPI images presented in Supplementary Figure 17. A more detailed analysis will help to support this claim. 

      It is entirely possible that small cells have no DNA: if cell division is aberrant, division can occur prior to DNA segregation, resulting in cells with no DNA. It is clear from microscopic observation that both small and large cells do not divide. It is, however, true that we are unable to state, given our measures of DNA content, that small cells have no DNA. We have made this clear on page 13, paragraph 2.

      What happens to viability and cell morphology when pbp1A is removed in the mreB null background? If it is actually a decrease in pbp1A activity that leads to the rescue, then pbp1A- mreB- cells should have better viability, reduced cell volume and organised PG synthesis. Especially as the PG cross-linking is almost at the same level as the T362 or D484 mutant.  

      Please see fitness data in Supp. Fig. 13. Fitness of ∆mreBpbp1A is no different to that caused by a point mutation. Cells remain round.  

      What is the status of PG cross-linking in ΔmreB Δpflu4921-4925 (Line 7)? 

      This was not analysed as the focus of this experiment was PBPs. A priori, there is no obvious reason to suspect that ∆4921-25 (which lacks oprD) would be affected in PBP activity.

      What is the morphology of the cells in Line 2 and Line 5? It may be interesting to see if PG cross-linking and cell wall synthesis is also altered in the cells from these lines. 

      The focus of investigation was restricted to L1, L4 and L7. Indeed, it would be interesting to look at the mutants harbouring mutations in ftsZ, but this is beyond the scope of the present investigation (but is on-going). The morphology of L2 and L5 is shown in Supp. Fig. 9.

      The data presented in 4B should be quantified with appropriate input controls. 

      Band intensity has now been quantified (see new Supp. Fig. 20). The controls are SBW25, SBW25∆pbp1A, SBW25 ∆mreB and SBW25 ∆mreBpbp1A as explained in the paper.

      What are the statistical analyses used in 4A and what is the significance value? 

      Our oversight. These were reported in Supp. Fig. 19, but should also have been presented in Fig. 4A. Data are means of three biological replicates. The statistical tests are comparisons between each mutant and SBW25, and assessed by paired t-tests.  

      A more rigorous statistical analysis indicating the number of replicates should be done throughout. 

      We have checked and made additions where necessary and where previously lacking. In particular, details are provided in Fig. 1E, Fig. 4A and Fig. 4B. For Fig. 4C we have produced quantitative measures of heterogeneity in new cell wall insertion. These are reported in Supp. Fig. 21 (and referred to in the text and figure caption) and show that patterns of cell wall insertion in ∆mreB are highly heterogeneous.

      Reviewer #3 (Public Review): 

      This paper addresses an understudied problem in microbiology: the evolution of bacterial cell shape. Bacterial cells can take a range of forms, among the most common being rods and spheres. The consensus view is that rods are the ancestral form and spheres the derived form. The molecular machinery governing these different shapes is fairly well understood but the evolutionary drivers responsible for the transition between rods and spheres are not. Enter Yulo et al.'s work. The authors start by noting that deletion of a highly conserved gene called MreB in the Gram-negative bacterium Pseudomonas fluorescens reduces fitness but does not kill the cell (as happens in other species like E. coli and B. subtilis) and causes cells to become spherical rather than their normal rod shape. They then ask whether evolution for 1000 generations restores the rod shape of these cells when propagated in a rich, benign medium. 

      The answer is no. The evolved lineages recovered fitness by the end of the experiment, growing just as well as the unevolved rod-shaped ancestor, but remained spherical. The authors provide an impressively detailed investigation of the genetic and molecular changes that evolved. Their leading results are: 

      (1) The loss of fitness associated with MreB deletion causes high variation in cell volume among sibling cells after cell division. 

      (2) Fitness recovery is largely driven by a single, loss-of-function point mutation that evolves within the first ~250 generations that reduces the variability in cell volume among siblings. 

      (3) The main route to restoring fitness and reducing variability involves loss of function mutations causing a reduction of TPase and peptidoglycan cross-linking, leading to a disorganized cell wall architecture characteristic of spherical cells. 

      The inferences made in this paper are on the whole well supported by the data. The authors provide a uniquely comprehensive account of how a key genetic change leads to gains in fitness and the spectrum of phenotypes that are impacted and provide insight into the molecular mechanisms underlying models of cell shape. 

      Suggested improvements and clarifications include: 

      (1) A schematic of the molecular interactions governing cell wall formation could be useful in the introduction to help orient readers less familiar with the current state of knowledge and key molecular players. 

      We understand that this would be desirable, but there are numerous recent reviews with detailed schematics that we think the interested reader would be better consulting. These are referenced in the text.

      (2) More detail on the bioinformatics approaches to assembling genomes and identifying the key compensatory mutations are needed, particularly in the methods section. This whole subject remains something of an art, with many different tools used. Specifying these tools, and the parameter settings used, will improve transparency and reproducibility, should it be needed. 

      We overlooked providing this detail, which has now been corrected by provision of more information in the Materials and Methods. In short, we used breseq with the clonal option and default parameters. Additional analyses were conducted using Geneious. The breseq output files (which include all read data) are provided at https://doi.org/10.17617/3.CU5SX1.

      (3) Corrections for multiple comparisons should be used and reported whenever more than one construct or strain is compared to the common ancestor, as in Supplementary Figure 19A (relative PG density of different constructs versus the SBW25 ancestor). 

      The data presented in Supp Fig 19A (and Fig 4A) do not involve multiple comparisons. In each instance the comparison is between SBW25 and each of the different mutants. A paired t-test is thus appropriate.

      (4) The authors refrain from making strong claims about the nature of selection on cell shape, perhaps because their main interest is the molecular mechanisms responsible. However, I think more can be said on the evolutionary side, along two lines. First, they have good evidence that cell volume is a trait under strong stabilizing selection, with cells of intermediate volume having the highest fitness. This is notable because there are rather few examples of stabilizing selection where the underlying mechanisms responsible are so well characterized. Second, this paper succeeds in providing an explanation for how spherical cells can readily evolve from a rod-shaped ancestor but leaves open how rods evolved in the first place. Can the authors speculate as to how the complex, coordinated system leading to rods first evolved? Or why not all cells have lost rod shape and become spherical, if it is so easy to achieve? These are important evolutionary questions that remain unaddressed. The manuscript could be improved by at least flagging these as unanswered questions deserving of further attention. 

      These are interesting points, but our capacity to comment is entirely speculative. Nonetheless, we have added an additional paragraph to the Discussion that expresses an opinion that has yet to receive attention:

      “Given the complexity of the cell wall synthesis machinery that defines rod shape in bacteria, it is hard to imagine how rods could have evolved prior to cocci. However, the cylindrical shape offers a number of advantages. For a given biomass (or cell volume), shape determines the surface area of the cell envelope, with the sphere having the smallest possible surface area. As shape sets the surface/volume ratio, it also determines the ratio between supply (proportional to the surface) and demand (proportional to cell volume). From this point of view, it is more efficient to be cylindrical (Young 2006). This also holds for surface attachment and biofilm formation (Young 2006). But above all, for growing cells, the ratio between supply and demand is constant in rod-shaped bacteria, whereas it decreases for cocci. This requires that spherical cells evolve complex regulatory networks capable of maintaining the correct concentration of cellular proteins despite changes in surface/volume ratio. From this point of view, rod-shaped bacteria offer opportunities to develop unsophisticated regulatory networks.”
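      The geometric argument about supply (surface) and demand (volume) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the authors' analysis: a coccus grows as a sphere, and a rod is modelled as a cylinder with hemispherical caps that grows only in length at a fixed, hypothetical radius of 0.5 µm. Under that model the rod's S/V approaches the constant 2/r, while the sphere's S/V falls as V^(-1/3).

```python
import math


def sphere_sv(volume):
    """Surface/volume ratio of a sphere of the given volume: S/V = 3 / r."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 3.0 / radius


def rod_sv(volume, radius):
    """S/V of a rod modelled as a cylinder with hemispherical caps.

    The radius is held fixed and growth adds cylinder length, so S/V tends to 2 / radius.
    """
    caps_volume = 4.0 / 3.0 * math.pi * radius ** 3
    if volume < caps_volume:
        raise ValueError("volume smaller than the two hemispherical caps")
    length = (volume - caps_volume) / (math.pi * radius ** 2)
    surface = 2.0 * math.pi * radius * length + 4.0 * math.pi * radius ** 2
    return surface / volume


for v in (2.0, 4.0, 8.0):  # hypothetical volumes in um^3; rod radius 0.5 um
    print(v, round(sphere_sv(v), 2), round(rod_sv(v, 0.5), 2))
```

      Quadrupling the volume lowers the sphere's S/V by a factor of 4^(1/3) ≈ 1.59, whereas the rod's S/V changes by less than 10% over the same range and converges towards 2/r.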

      why not all cells have lost rod shape and become spherical.

      Please see Kevin Young’s 2006 review on the adaptive significance of cell shape.

      The value of this paper stems both from the insight it provides on the underlying molecular model for cell shape and from what it reveals about some key features of the evolutionary process. The paper, as it currently stands, provides more on which to chew for the molecular side than the evolutionary side. It provides valuable insights into the molecular architecture of how cells grow and what governs their shape. The evolutionary phenomena emphasized by the authors - the importance of loss-of-function mutations in driving rapid compensatory fitness gains and that multiple genetic and molecular routes to high fitness are often available, even in the relatively short time frame of a few hundred generations - are well understood phenomena and so arguably of less broad interest. The more compelling evolutionary questions concern the nature and cause of stabilizing selection (in this case cell volume) and the evolution of complexity. The paper misses an opportunity to highlight the former and, while claiming to shed light on the latter, provides rather little useful insight. 

      Thank you for these thoughts and comments. However, we disagree that the experimental results are an overlooked opportunity to discuss stabilising selection. Stabilising selection occurs when selection favours a particular phenotype causing a reduction in underpinning population-level genetic diversity. This is not happening when selection acts on SBW25 ∆mreB leading to a restoration of fitness. Driving the response are biophysical factors, primarily the critical need to balance elongation rate with rate of septation. This occurs without any change in underlying genetic diversity.  

      Recommendations for the authors:  

      Reviewer 1 (Recommendations for the Authors): 

      Hereby my suggestion for improvement of the quantification of the data, the figures, and the text. 

      -  p 14, what is the unit of elongation rate?  

      At first mention we have made clear that the unit is min⁻¹.

      -  p 14, please give an error bar for both p=0.85 and f=0.77, to be able to conclude they are different 

      The error on the probability p is estimated as the 95% confidence interval using the formula $1.96\sqrt{p(1-p)/N}$, where N is the total number of cells. This has been added to the “probability” paragraph of the Image Analysis section in the Materials and Methods. 
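      The standard normal-approximation (Wald) interval for a proportion matches this description (95% level, z = 1.96, N cells): p ± 1.96·√(p(1−p)/N). A minimal sketch, assuming that is the formula used; the cell count below is hypothetical and serves only to show the call.

```python
import math


def wald_halfwidth(p, n, z=1.96):
    """Half-width of the normal-approximation (Wald) 95% CI for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


def wald_interval(p, n, z=1.96):
    """Wald interval clipped to [0, 1]."""
    h = wald_halfwidth(p, n, z)
    return max(0.0, p - h), min(1.0, p + h)


n_cells = 400  # hypothetical number of cells, for illustration only
print(wald_interval(0.85, n_cells), wald_interval(0.77, n_cells))
```

      Non-overlap of two such intervals is a conservative criterion for a difference; a two-proportion z-test compares the two probabilities directly.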

      We also added errors on p measurement in the main text.

      -  p 14, all the % differences need an errorbar 

      The error bars and means are given in Fig 3C and 3D.

      -  Figure 1B: add units to compactness, and what does it represent? Is the cell size the estimated volume (that is mentioned in the caption)? Shouldn't the datapoints have error bars? 

      Compactness is defined in the “Image Analysis” section of the Material and Methods. It is a dimensionless parameter. The distribution of individual cell shapes / sizes are depicted in Fig 1B. Error does arise from segmentation, but the degree of variance (few pixels) is much smaller than the representations of individual cells shown.

      -  Figure 1C caption: are there 50,000 cells?

      Correct. The figure caption has been amended.

      -  Figure 1D: the elongation rate was first described as a volume per minute, but the units shown here are those of a rate. How is it normalized?

      The elongation rate is explained in the Materials and Methods (see the Image Analysis section) and is not a volume per minute. It is the rate constant r in dV/dt = r·V, so the unit of r is min^-1. Page 9 includes a specific mention of the unit of r.
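      Because dV/dt = r·V integrates to V(t) = V(0)·e^(rt), r can be estimated as the slope of ln V against time. The Python sketch below does this with an ordinary least-squares fit. It assumes volumes from a single cell, sampled at known times in minutes; the function name and the synthetic data are hypothetical and this is not the authors' analysis code.

```python
import math
from typing import Sequence


def elongation_rate(times_min: Sequence[float], volumes: Sequence[float]) -> float:
    """Estimate r (min^-1) in dV/dt = r*V from a volume time series.

    Integrating gives ln V(t) = ln V(0) + r*t, so r is the least-squares
    slope of ln V against t.
    """
    if len(times_min) != len(volumes) or len(times_min) < 2:
        raise ValueError("need at least two paired (time, volume) points")
    if any(v <= 0 for v in volumes):
        raise ValueError("volumes must be positive to take logarithms")
    log_v = [math.log(v) for v in volumes]
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(log_v) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, log_v))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic data generated with r = 0.02 min^-1 (hypothetical values).
    ts = [0, 5, 10, 15, 20]
    vs = [2.0 * math.exp(0.02 * t) for t in ts]
    r = elongation_rate(ts, vs)
    print(f"r = {r:.4f} min^-1, doubling time = {math.log(2) / r:.1f} min")
```

      Fitting in log space makes r independent of the starting volume, which is why it carries units of min^-1 rather than a volume per minute.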

      -  Figure 1E, how many cells (n) per replicate? 

      Our apologies. We have corrected the figure caption, which now reads:

      “Proportion of live cells in ancestral SBW25 (black bar) and ΔmreB (grey bar) based on LIVE/DEAD BacLight Bacterial Viability Kit protocol. Cells were pelleted at 2,000 x g for 2 minutes to preserve ΔmreB cell integrity. Error bars are means and standard deviation of three biological replicates (n>100).”

      -  Figure 1G: how does this compare to the wild type?

      The volume of wild-type SBW25 is 3.27 µm^3 (within the “white zone”). This is mentioned in the text.

      -  Figure 2B, is this really volume, not size? And can you add microscopy images? 

      The x-axis is volume (see Materials and Methods, subsection image analysis). Images are available in Supp. Fig. 9.

      -  Figure 3A: what do L1, L4 and L7 refer to? Is it correct that the same lines are picked for WT and ΔmreB?

      Thank you for pointing this out. This was an earlier nomenclature. It was shorthand for the mutants that are specified everywhere else by genotype and has now been corrected. 

      -  Figure 3C: either write out what p is (i.e., which probability), or add a simple cartoon of what is plotted.

      The value p is the probability of proceeding to the next generation and is explained in the Materials and Methods (subsection Image Analysis). We feel this is intuitive and does not require a cartoon. We have nonetheless added a sentence to the Materials and Methods to aid clarity.

      -  Figure 4B can you add a ladder to the gel? 

      No ladder was included, but the controls provide all the necessary information. The band corresponding to PBP1A is defined by presence in SBW25, but absence in SBW25 ∆pbp1A.

      -  Figure 4c, can you improve the quantification of these images? How were these selected and how well do they represent the community? 

      We apologise for the lack of a quantitative description of the data presented in Fig 4C. This has now been corrected. In brief, we measured the intensity of the fluorescent signal in 10 to 14 cells and computed the mean and standard deviation of pixel intensity for each cell. To rule out possible artifacts associated with variation in mean intensity, we calculated the ratio of the standard deviation to the square root of the mean. These data reveal heterogeneity in cell wall synthesis and provide strong statistical support for the claim that cell wall synthesis in ∆mreB is significantly more heterogeneous than in the control. The data are provided in new Supp. Fig. 21.
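      A minimal Python sketch of the per-cell metric described above (standard deviation of pixel intensity divided by the square root of the mean). It assumes each cell is supplied as a list of background-corrected pixel intensities and uses the population standard deviation; the response does not state which estimator was used, and the function names and toy values are hypothetical. One plausible motivation for the square-root normalization, assuming roughly Poisson imaging noise, is that the standard deviation of a uniformly labelled cell then scales with the square root of its mean, so the ratio is not driven by overall brightness.

```python
import math
import statistics
from typing import Sequence


def heterogeneity_index(pixels: Sequence[float]) -> float:
    """SD of pixel intensity divided by sqrt(mean intensity) for one cell."""
    if len(pixels) < 2:
        raise ValueError("need at least two pixels per cell")
    mean = statistics.fmean(pixels)
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    # Population SD is an assumption; statistics.stdev would give the sample SD.
    return statistics.pstdev(pixels) / math.sqrt(mean)


def summarize(cells: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Mean and sample SD of the index across one group of cells."""
    indices = [heterogeneity_index(c) for c in cells]
    if len(indices) < 2:
        raise ValueError("need at least two cells")
    return statistics.fmean(indices), statistics.stdev(indices)


if __name__ == "__main__":
    # Toy intensities (hypothetical): an evenly labelled cell vs a patchy one.
    even = [100, 102, 98, 101, 99]
    patchy = [40, 160, 60, 150, 90]
    print(heterogeneity_index(even), heterogeneity_index(patchy))
```

      Both toy cells have the same mean (100), so the larger index of the patchy cell reflects uneven labelling rather than brightness.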

      Minor comments: 

      -  It would be interesting if the findings of this experimental evolution study could be related to comparative studies (if any have been carried out).

      Little such comparison is possible, but Hendrickson and Yulo have separately published a portion of the originally posted preprint. We include a citation to that paper.

      -  p 13, halfway through the page, the second paragraph lacks a conclusion, why do we care about DNA content? 

      It is a minor observation that was included by way of providing a complete description of cell phenotype.  

      -  p 17, "suggesting that ... loss-of-function": I do not understand what this is based on.

      We show that the fitness of a pbp1A deletion is indistinguishable from the fitness of one of the pbp1A point mutants. This fact establishes that the point mutation had the same effects as a gene deletion thus supporting the claim that the point mutations identified during the course of the selection experiment decrease (or destroy) PBP1A function.

      -  p 25, at the top of the page: do you have a reference for the statement that a disorganized cell wall architecture is suited to the topology of spherical cells? 

      The statement is a conclusion that comes from our reasoning. It stems from the fact that it is impossible to cover the surface of a sphere entirely with parallel strands.

    1. eLife Assessment

      The authors provide important insights into a system of insect camouflage where a coating of self-made nano-particles (brochosomes) reduces the reflection of UV-light leading to lower predation by spiders. Compelling evidence is provided by micro-UV-Vis spectroscopy, electron microscopy, transcriptome and proteome analysis, histology, in-vivo predation assays and gene knock-downs. The phylogenetic analyses provide evidence that the genes coding for the brochosome proteins are clade-specific and have diversified by gene duplication.

    2. Reviewer #1 (Public review):

      Summary:

      Evading predation is of utmost importance for most animals and camouflage is one of the predominant mechanisms. Wu et al. set out to test the hypothesis of a unique camouflage system in leafhoppers. These animals coat themselves with brochosomes, which are spherical nanostructures that are produced in the Malpighian tubules and are distributed on the cuticle after eclosion. Based on previous findings on reflectivity properties of brochosomes, the authors provide convincing evidence that these nanostructures indeed reduce reflectivity of the animals thereby reducing predation by jumping spiders. Further, they identify four proteins, which are essential for proper development and function of brochosomes: In RNAi experiments, the regular brochosome structure is lost, the reflectivity reduced and the respective animals are prone to increased predation. Finally, the authors provide phylogenetic sequence analyses and speculate about the evolution of these genes.

      Strengths:

      The study is very comprehensive, including careful optical measurements, EM and TM analysis of the nanoparticles and their production line in the Malpighian tubules, in vivo predation tests and knock-down experiments to identify essential proteins. Indeed, the results are very convincingly in line with the starting hypothesis such that the study robustly assigns a new biological function to the brochosome coating system.

      A key strength of the study is that the biological relevance of the brochosome coating is convincingly shown by an in vivo predation test using a known predator from the same habitat.

      Another major step forward is an RNAi screen, which identified four proteins that are essential for the brochosome structure (BSMs). After the respective RNAi knock-downs, the brochosomes show curious malformations that are interesting in terms of the self-assembly of these nanostructures. The optical and in vivo predation tests provide excellent support for the model that the RNAi knock-down leads to a change in brochosome structure, which reduces reflectivity, which in turn leads to a decrease of the antipredatory effect.

      Conclusion:

      The authors successfully tested their hypothesis in a multidisciplinary approach and convincingly assigned a new biological function to the brochosomes system. The results fully support their claims on the involvement of the four BSM genes in brochosome structure, the relevance of brochosomes for predation avoidance and they provide evidence for the evolution of these genes.

      The work is a very interesting study case of the evolutionary emergence of a new system to evade predators. Based on this study, the function of the BSM genes could now be studied in other species to provide insights into putative ancestral functions. Further, studying the self-assembly of such highly regular complex nano-structures will be strongly fostered by the identification of the four key structural genes.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the optical properties of brochosomes produced by leafhoppers. They hypothesize that brochosomes reduce light reflection on the leafhopper's body surface, aiding in predator avoidance. Their hypothesis is supported by experiments involving jumping spiders. Additionally, the authors employ a variety of techniques including micro-UV-Vis spectroscopy, electron microscopy, transcriptome and proteome analysis, and bioassays. This study is highly interesting, and the experimental data is well-organized and logically presented.

      Strengths:

      The use of brochosomes as a camouflage coating has been hypothesized since 1936 (R.B. Swain, Entomol. News 47, 264-266, 1936) with evidence demonstrated by similar synthetic brochosome systems in a number of recent studies (S. Yang, et al. Nat. Commun. 8:1285, 2017; L. Wang, et al., PNAS. 121: e2312700121, 2024). However, direct biological evidence or relevant field studies have been lacking to directly support the hypothesis that brochosomes are used for camouflage. This work provides the first biological evidence demonstrating that natural brochosomes can be used as a camouflage coating to reduce the leafhoppers' observability to their predators. The design of the experiments is novel.

      Weaknesses:

      (1) The observation that brochosome coatings become sparse after 25 days in both male and female leafhoppers, resulting in increased predation by jumping spiders, is intriguing. However, since leafhoppers consistently secrete and groom brochosomes, it would be beneficial to explore why brochosomes become significantly less dense after 25 days.

      (2) The authors demonstrate that brochosome coatings reduce UV (specular) reflection compared to surfaces without brochosomes, which can be attributed to the rough geometry of brochosomes as discussed in the literature. However, it would be valuable to investigate whether the proteins forming the brochosomes are also UV absorbing.

      (3) The experiments with jumping spiders show that brochosomes help leafhoppers avoid predators to some extent. It would be beneficial for the authors to elaborate on the exact mechanism behind this camouflage effect. Specifically, why does reduced UV reflection aid in predator avoidance? If predators are sensitive to UV light, how does the reduced UV reflectance specifically contribute to evasion?

      (4) An important reference regarding the moth-eye effect is missing. Please consider including the following paper: Clapham, P. B., and M. C. Hutley. "Reduction of lens reflection by the 'Moth Eye' principle." Nature 244: 281-282 (1973).

      (5) The introduction should be revised to accurately reflect the related contributions in literature. Specifically, the novelty of this work lies in the demonstration of the camouflage effect of brochosomes using jumping spiders, which is verified for the first time in leafhoppers. However, the proposed use of brochosome powder for camouflage was first described by R.B. Swain (R.B. Swain, Notes on the oviposition and life history of the leafhopper Oncometopta undata Fabr. (Homoptera: Cicadellidae), Entomol. News. 47: 264-266 (1936)). Recently, the antireflective and potential camouflage functions of brochosomes were further studied by Yang et al. based on synthetic brochosomes and simulated vision techniques (S. Yang, et al. "Ultra-antireflective synthetic brochosomes." Nature Communications 8: 1285 (2017)). Later, Lei et al. demonstrated the antireflective properties of natural brochosomes in 2020 (C.-W. Lei, et al., "Leafhopper wing-inspired broadband omnidirectional antireflective embroidered ball-like structure arrays using a nonlithography-based methodology." Langmuir 36: 5296-5302 (2020)). Very recently, Wang et al. successfully fabricated synthetic brochosomes with precise geometry akin to those natural ones, and further elucidated the antireflective mechanisms based on the brochosome geometry and their role in reducing the observability of leafhoppers to their predators (L. Wang et al. "Geometric design of antireflective leafhopper brochosomes." Proceedings of the National Academy of Sciences 121: e2312700121 (2024)).

      Comments on revisions:

      In this revision, the authors have addressed some of the key concerns I raised in our previous comments. However, a few issues remain unaddressed. Additionally, the new experimental data introduced in the manuscript require further clarification, which I outline below.

      (1) As I pointed out in my previous review comments, "The use of brochosomes as a camouflage coating has been hypothesized since 1936 (R.B. Swain, Entomol. News 47, 264-266, 1936) with evidence demonstrated by similar synthetic brochosome systems in a number of recent studies (S. Yang, et al. Nat. Commun. 8:1285, 2017; L. Wang, et al., PNAS. 121: e2312700121, 2024). However, direct biological evidence or relevant field studies have been lacking to directly support the hypothesis that brochosomes are used for camouflage." While the authors did cite the original hypothesis proposed by R.B. Swain (1936), they have omitted important references that provide evidence on the use of antireflective properties of brochosomes for camouflage in a synthetic setting (see for example, Fig. 5a of S. Yang, et al. Nat. Commun. 8:1285, 2017). The authors are recommended to revise the Abstract and Introduction accordingly to ensure a fair and accurate representation of the existing literature.

      (2) The antireflection mechanisms of brochosome structures have been discussed in detail, specifically, how their geometries (i.e., brochosome diameter and pore size) contribute to reducing UV reflectance (L. Wang, et al., PNAS. 121: e2312700121, 2024 and P. Banergee, et al., Advanced Photonics Research 4:2200343, 2023). The authors should incorporate these recent findings into their discussion (line 381 - line 383 of the manuscript).

      (3) The authors presented new data on brochosomes deposited on a quartz slide and measured their reflectance across UV, visible and infrared wavelengths. Since reflectance is highly sensitive to the uniformity of brochosome coverage on the substrate, it is crucial to quantify this coverage across the measurement area for comparison. While the authors include SEM images to illustrate the packing of brochosomes on both the leafhopper wing and the quartz substrate (Fig. S7) at a microscopic scale (~10 µm field of view), it would be beneficial to also provide SEM images at a larger scale (e.g., 100 µm to 1 mm) and to quantify the density of brochosomes per unit area for comparison.

      (4) For the negative control using acetone to remove the brochosomes from the leafhopper wing, have the authors confirmed the absence of brochosomes after treatment? If so, the authors should state this explicitly for clarity.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Evading predation is of utmost importance for most animals and camouflage is one of the predominant mechanisms. Wu et al. set out to test the hypothesis of a unique camouflage system in leafhoppers. These animals coat themselves with brochosomes, which are spherical nanostructures that are produced in the Malpighian tubules and are distributed on the cuticle after eclosion. Based on previous findings on the reflectivity properties of brochosomes, the authors provide very good evidence that these nanostructures indeed reduce the reflectivity of the animals thereby reducing predation by jumping spiders. Further, they identify four proteins, which are essential for the proper development and function of brochosomes. In RNAi experiments, the regular brochosome structure is lost, the reflectivity reduced and the respective animals are prone to increased predation. Finally, the authors provide some phylogenetic sequence analyses and speculate about the evolution of these essential genes.

      Strengths:

      The study is very comprehensive, including careful optical measurements, EM and TM analysis of the nanoparticles and their production line in the Malpighian tubules, in vivo predation tests, and knock-down experiments to identify essential proteins. Indeed, the results are very convincingly in line with the starting hypothesis such that the study robustly assigns a new biological function to the brochosome coating system.

      A key strength of the study is that the biological relevance of the brochosome coating is convincingly shown by an in vivo predation test using a known predator from the same habitat.

      Another major step forward is an RNAi screen, which identified four proteins that are essential for the brochosome structure (BSMs). After the respective RNAi knock-downs, the brochosomes show curious malformations that are interesting in terms of the self-assembly of these nanostructures. The optical and in vivo predation tests provide excellent support for the model that the RNAi knock-down leads to a change in brochosome structure, which reduces reflectivity, which in turn leads to a decrease of the antipredatory effect.

      Thank you very much for your positive feedback and insightful comments on our manuscript. We are delighted that you acknowledge the efforts we have made in studying the components and functions of Brochosomal proteins. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings identified in our original submission. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      The reduction of reflectivity by aberrant brochosomes or after ageing is only around 10%. This may seem little to have an effect in real life. On the other hand, the in vivo predation tests confirm an influence. Hence, this is not a real weakness of the study - just a note to reconsider the wording for describing the degree of reflectivity.

      Thank you for your valuable suggestions. Based on your recommendations, we have revised the manuscript accordingly. Although the absolute reduction in light reflection due to Brochosomal coverage is approximately 10%, the relative decrease in light reflection on the leafhopper's surface is nearly 30%. Specifically, in the ultraviolet region, the reflection is reduced from about 30% to 20%, and in the visible light region, it is reduced from 20% to 10%. For detailed revisions, please refer to lines 151-156 of the revised manuscript.
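      To make the distinction between absolute and relative reduction explicit, here is a small Python sketch. The inputs are the approximate UV and visible reflectance values quoted in this response; the helper name is hypothetical.

```python
def reflectance_reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    if before_pct <= 0:
        raise ValueError("baseline reflectance must be positive")
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


if __name__ == "__main__":
    # Approximate values from the text: UV ~30% -> ~20%, visible ~20% -> ~10%.
    for band, before, after in [("UV", 30.0, 20.0), ("visible", 20.0, 10.0)]:
        abs_drop, rel_drop = reflectance_reduction(before, after)
        print(f"{band}: {abs_drop:.0f} points absolute, {rel_drop:.0f}% relative")
```

      The same 10-point absolute drop corresponds to a larger relative drop when the baseline reflectance is lower.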

      The single-gene knockdowns seemed to lead to a very low penetrance of malformed brochosomes (Figure Supplement 3). Judging from the overview slides, less than 1% of brochosomes may have been affected. A quantification of regular versus abnormal particles in both wild-type and RNAi treatments would have helped to exclude the possibility that the aberrant brochosomes shown merely reflect a putative level of "normal" background defects. Of note, the quadruple knock-down of all BSMs seemed to lead to a high penetrance (Figure 4), which was already reflected in the microtubule production line. While the data shown are convincing, a quantification might strengthen the argument.

      While the RNAi effects seemed to be very specific to brochosomes and therefore very likely specific, an off-target control for RNAi was still missing. Finding the same/similar phenotype with a non-overlapping dsRNA fragment in one off-target experiment is usually considered required and sufficient. Further, the details of the targeted sequence will help future workers on the topic.

      Thank you for your valuable suggestions. Based on your recommendations, we have synthesized dsRNA targeting two non-overlapping regions of the coding sequences for four Brochosomal structural protein genes. These dsRNAs were injected individually and in combination for each gene. Our RNAi experiments for each BSM gene demonstrated that both individual and combined injections significantly suppressed the expression of the target genes, with the combined injection yielding slightly better silencing efficiency. Statistical analysis of the SEM observations revealed that the combined injection of dsRNAs targeting two non-overlapping regions led to a 60-70% reduction in the surface area coverage of Brochosomes. Additionally, approximately 20% of the remaining Brochosomes exhibited significant morphological changes. For detailed revisions, please refer to lines 199-211 of the revised manuscript, as well as Figures 3A and 3C, and Supplementary Figures 4 and 5.

      The main weakness in the current manuscript may be the phylogenetic analysis and the model of how the genes evolved. Several aspects were not clearly or consistently stated such that I felt unsure about what the authors actually think. For instance: Are all the 4 BSMs related to each other or only BSM2 and 3? If so, not only BSM2 and 3 would be called "paralogs" but also the other BSMs. If they were all related, then a phylogenetic tree including all BSMs should be shown to visualize the relatedness (including the putative ancestral gene if that is the model of the authors). Actually, I was not sure about how the authors think about the emergence of the BSMs. Are they real orphan genes (i.e. not present outside the respective clade) or was there an ancestral gene that was duplicated and diverged to form the BSMs? Where in the phylogeny does the first of the BSMs or ancestral proteins emerge (is the gene found in Clastoptera arizonana the most ancestral one?)? Maybe, the evolution of the BSMs would have to be discussed individually for each gene as they show somewhat different patterns of emergence and loss (BSM4 present in all species, the others with different degrees of phylogenetic restriction).

      Thank you very much for your constructive feedback on our phylogenetic analysis and the modeling of gene evolution. We fully agree with your insights and acknowledge that the evolutionary analysis of BSM genes remains somewhat ambiguous. This ambiguity is primarily due to the limited research on the precise structural protein composition of Brochosomes. While proteomics studies have analyzed and discussed the structural proteins of Brochosomes, the accurate composition of these proteins is still poorly understood. In this study, we identified four BSM proteins, but given the intricate structure of Brochosomes as proteinaceous spheres, we believe there may be additional BSM genes that have not yet been identified. Moreover, despite the presence of over ten thousand species within the Cicadomorpha, only three species have genome sequences available, and fewer than a hundred species have transcriptome sequencing data. The scarcity of research on Brochosomes, as well as the limited availability of genomic and transcriptomic data, poses significant challenges for our phylogenetic analysis and understanding of BSM gene evolution.

      Based on your suggestions, we have revised the manuscript accordingly. Specifically, we have updated Figure 5C by including ten additional species from Cercopoidea, Cicadoidea, and Fulgoroidea to better illustrate that BSM genes are true orphan genes. We have also added a phylogenetic tree of BSM genes within Cicadidae in Supplementary Figure 3. Additionally, we have expanded the discussion of BSM gene evolution in the manuscript (lines 503-556). For detailed revisions, please refer to Figure 5C, Supplementary Figure 3, and lines 507-585 of the revised manuscript.

      Related to these questions I remained unsure about some details in Figure 5. On what kind of analysis is the phylogeny based? Why are some species not colored, although they are located on the same branch as colored ones? What is the measure for homology values - % identity/similarity? The homology labels for Nephotetix cincticeps and N. virescens seem to be flipped: the latter is displayed with 100% identity for all genes with all proteins while the former should actually show this. As a consequence of these uncertainties, I could not fully follow the respective discussion and model for gene evolution.

      Thank you very much for your insightful comments and suggestions. We have carefully considered your feedback and have thoroughly revised our manuscript accordingly. Specifically, we have enhanced the description of the phylogenetic analysis process to provide greater clarity and transparency, with the detailed methods now included in lines 789-798. Regarding Figure 5C, we appreciate your attention to the coloring scheme. We would like to clarify that the family Cicadellidae comprises 25 subfamilies, many of which are represented by only one species in our figure. To ensure clarity and meaningful representation, we have chosen to color only those subfamilies with more than three species, thereby avoiding visual clutter and emphasizing the most relevant taxonomic groups. Additionally, we have corrected the inverted homology labels for Nephotetix cincticeps and Nephotetix virescens to ensure the accuracy and consistency of our data presentation.

      Conclusion:

      The authors successfully tested their hypothesis in a multidisciplinary approach and convincingly assigned a new biological function to the brochosomes system. The results fully support their claims - only the quantification of the penetrance in the RNAi experiments would be helpful to strengthen the point. The author's analysis of the evolution of BSM genes remained a bit vague and I remained unsure about their respective conclusions.

      The work is a very interesting study case of the evolutionary emergence of a new system to evade predators. Based on this study, the function of the BSM genes could now be studied in other species to provide insights into putative ancestral functions. Further, studying the self-assembly of such highly regular complex nano-structures will be strongly fostered by the identification of the four key structural genes.

      Reviewer #1 (Recommendations for the authors):

      Main manuscript:

      Please consider the annotated pdf with suggestions for wording and comments at the authors' discretion:

      Thank you very much for your detailed suggestions and comments provided in the annotated PDF. We have carefully reviewed each of your points and have revised the manuscript accordingly. All changes have been highlighted in red text for your convenience. The revised manuscript with tracked changes is available for your review. We believe these revisions have improved the clarity and quality of our manuscript. Thank you again for your valuable feedback.

      Supplementary Figure 2 C:

      Y-axes:

      - label: "surface coverage in %"

      - there are different scale values for the different days (e.g. 80-105 for day 5 and 0-80 at day 25). As a comparison between days is interesting, it would help to have the same scale values for all. That would show the decrease more intuitively.

      Thank you very much for your suggestion regarding the Y-axis in Supplementary Figure 2C. We agree that using a consistent scale across all time points is essential for clear and intuitive comparison. In the revised manuscript, we have standardized the Y-axis scale for Supplementary Figure 2C to a uniform range of 0-100% for all days. This change allows for a more straightforward visualization of the decreasing trend in surface coverage over time.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigate the optical properties of brochosomes produced by leafhoppers. They hypothesize that brochosomes reduce light reflection on the leafhopper's body surface, aiding in predator avoidance. Their hypothesis is supported by experiments involving jumping spiders. Additionally, the authors employ a variety of techniques including micro-UV-Vis spectroscopy, electron microscopy, transcriptome and proteome analysis, and bioassays. This study is highly interesting, and the experimental data is well-organized and logically presented.

      Strengths:

      The use of brochosomes as a camouflage coating has been hypothesized since 1936 (R.B. Swain, Entomol. News 47, 264-266, 1936) with evidence demonstrated by similar synthetic brochosome systems in a number of recent studies (S. Yang, et al. Nat. Commun. 8:1285, 2017; L. Wang, et al., PNAS. 121: e2312700121, 2024). However, direct biological evidence or relevant field studies have been lacking to directly support the hypothesis that brochosomes are used for camouflage. This work provides the first biological evidence demonstrating that natural brochosomes can be used as a camouflage coating to reduce the leafhoppers' observability to their predators. The design of the experiments is novel.

      We are extremely grateful for your positive feedback and insightful comments on our manuscript. We are delighted that you have recognized the efforts we have put into our research on how brochosomes serve as a camouflage coating to reduce the detectability of leafhoppers to their predators. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings of the original version. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      (1) The observation that brochosome coatings become sparse after 25 days in both male and female leafhoppers, resulting in increased predation by jumping spiders, is intriguing. However, since leafhoppers consistently secrete and groom brochosomes, it would be beneficial to explore why brochosomes become significantly less dense after 25 days.

      Thank you very much for your valuable suggestions. We appreciate your interest in the reduced density of brochosomes on the surface of leafhoppers after 25 days. We believe that the primary reason for the decreased density of brochosomes on the leafhopper surface after 25 days is the reduced synthesis and secretion of brochosomes. The Malpighian tubules are the main sites for brochosome synthesis. As shown in Figure 2D and Supplementary Figure 1, the thick glandular segments of the Malpighian tubules in both male and female leafhoppers begin to atrophy 15 days after reaching adulthood. This indicates a gradual decline in brochosome synthesis and secretion after day 15 of adulthood. Following your suggestion, we have revised the discussion section of the manuscript to elaborate on this observation. The detailed changes can be found in lines 474-491 of the revised manuscript.

      (2) The authors demonstrate that brochosome coatings reduce UV (specular) reflection compared to surfaces without brochosomes, which can be attributed to the rough geometry of brochosomes as discussed in the literature. However, it would be valuable to investigate whether the proteins forming the brochosomes are also UV absorbing.

      Thank you very much for your valuable suggestions. Following your advice, we have successfully expressed four BSM genes in a prokaryotic system, purified the corresponding proteins, and applied them to quartz glass surfaces. We then measured the light reflectance of the quartz glass surfaces coated with these purified proteins. The results showed that the purified BSM proteins did not exhibit better antireflective properties compared to the control GST protein. For more details, please refer to Supplementary Figure 8 in the revised manuscript.  We believe that the excellent antireflective properties of brochosomes are fundamentally due to their unique geometric shapes. The hollow pores within the brochosomes, with diameters of approximately 100 nm, are significantly smaller than most wavelengths in the visible spectrum. When light passes through these tiny pores, diffraction occurs, while light passing through the ridges of the brochosomes causes scattering. The interference between the diffracted and scattered light from these pores and ridges results in the observed extinction characteristics of brochosomes. We have incorporated these insights into the discussion section of the revised manuscript (lines 416-425 and lines 432-442 of the revised manuscript).

      (3) The experiments with jumping spiders show that brochosomes help leafhoppers avoid predators to some extent. It would be beneficial for the authors to elaborate on the exact mechanism behind this camouflage effect. Specifically, why does reduced UV reflection aid in predator avoidance? If predators are sensitive to UV light, how does the reduced UV reflectance specifically contribute to evasion?

      Thank you very much for your valuable suggestions. Based on your advice, we have included a detailed discussion on how reducing ultraviolet (UV) reflection can help insects avoid predation. The revised content can be found in lines 445-460 of the revised manuscript.

      “UV light serves as a crucial visual cue for various insect predators, enhancing foraging, navigation, mating behavior, and prey identification (Cronin & Bok, 2016; Morehouse et al., 2017; Silberglied, 1979). Predators such as birds, reptiles, and predatory arthropods often rely on UV vision to detect prey (Church et al., 1998; Li & Lim, 2005; Zou et al., 2011). However, UV reflectance from insect cuticles can disrupt camouflage, increasing the risk of detection and predation, as natural backgrounds like leaves, bark, and soil typically reflect minimal UV light (Endler, 1997; Li & Lim, 2005; Tovee, 1995). To mitigate this risk, insects often possess anti-reflective cuticular structures that reduce UV and broad-spectrum light reflectance. This strategy is widespread among insects, including cicadas, dragonflies, and butterflies, and has been shown to decrease predator detection rates (Hooper et al., 2006; Siddique et al., 2015; Zhang et al., 2006). For example, the compound eyes of moths feature hexagonal protuberances that reduce UV reflectance, aiding nocturnal concealment (Blagodatski et al., 2015; Stavenga et al., 2005). In butterflies, UV reflectance from eyespots on wings can attract predators, but reducing UV reflectance or eyespot size can lower predation risk and enhance camouflage (Chan et al., 2019; Lyytinen et al., 2004). Hence, the reflection of ultraviolet light from the insect cuticle surface increases the risk of predation by disrupting camouflage (Tovee, 1995).”

      (4) An important reference regarding the moth-eye effect is missing. Please consider including the following paper: Clapham, P. B., and M. C. Hutley. "Reduction of lens reflection by the 'Moth Eye' principle." Nature 244: 281-282 (1973).

      Thank you very much for pointing out the omission of the important reference on the “moth eye” effect. We sincerely apologize for the oversight. Based on your suggestion, we have now included the seminal paper by Clapham and Hutley (1973) in the revised manuscript. The reference has been added to both the Introduction and Discussion sections to provide a more comprehensive context for our discussion on anti-reflective structures in insects.

      (5) The introduction should be revised to accurately reflect the related contributions in literature. Specifically, the novelty of this work lies in the demonstration of the camouflage effect of brochosomes using jumping spiders, which is verified for the first time in leafhoppers. However, the proposed use of brochosome powder for camouflage was first described by R.B. Swain (R.B. Swain, Notes on the oviposition and life history of the leafhopper Oncometopia undata Fabr. (Homoptera: Cicadellidae), Entomol. News. 47: 264-266 (1936)). Recently, the antireflective and potential camouflage functions of brochosomes were further studied by Yang et al. based on synthetic brochosomes and simulated vision techniques (S. Yang, et al. "Ultra-antireflective synthetic brochosomes." Nature Communications 8: 1285 (2017)). Later, Lei et al. demonstrated the antireflective properties of natural brochosomes in 2020 (C.-W. Lei, et al., "Leafhopper wing-inspired broadband omnidirectional antireflective embroidered ball-like structure arrays using a nonlithography-based methodology." Langmuir 36: 5296-5302 (2020)). Very recently, Wang et al. successfully fabricated synthetic brochosomes with precise geometry akin to those natural ones, and further elucidated the antireflective mechanisms based on the brochosome geometry and their role in reducing the observability of leafhoppers to their predators (L. Wang et al. "Geometric design of antireflective leafhopper brochosomes." Proceedings of the National Academy of Sciences 121: e2312700121 (2024)).

      Thank you very much for your valuable suggestions regarding the revision of the introduction to accurately reflect the relevant contributions in the literature. Based on your feedback, we have thoroughly revised the introduction and added the suggested references to provide a comprehensive context for our study. The details of these revisions can be found in lines 84-94 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) In Figure 2E, the data for Male-5d appears to be missing. Please verify and ensure all relevant data is included.

      Thank you for pointing out the issue regarding the data presentation in Figure 2E. We apologize for any confusion caused by the overlapping data points and the inconspicuous color used for Male-5d. We have carefully reviewed the data and confirmed that all relevant data points, including Male-5d, are present in the dataset. In the revised manuscript, we have adjusted the color scheme for Male-5d and Female-5d in Figure 2E so that both curves are clearly distinguishable, even where they overlap. This adjustment should make the data trends easier to observe accurately. We appreciate your attention to detail, and we believe these revisions have improved the clarity and readability of the figure.

      (2) In Figure 6, please clarify the reflectance data in the inset. Clearly explain what the blue and light blue curves represent.

      Thank you for your suggestion regarding Figure 6. We have revised the figure to improve clarity. The light blue curve now represents the reflectance measurements of leafhoppers with higher brochosome coverage, while the dark blue curve corresponds to those with lower coverage. These changes, along with updated labels in the figure legend, ensure that the data are clearly distinguishable and easy to interpret. We appreciate your feedback and believe these revisions have enhanced the overall clarity of the figure.

    1. eLife Assessment

      This timely and important study used functional near-infrared spectroscopy hyperscanning to examine the neural correlates of how group identification influences collective behavior. The work provides incomplete evidence to indicate that the synchronization of brain activity between different people underlies collective performance and that changes in brain activity patterns within individuals may, in turn, underlie this between-person synchrony. This study will be of interest to researchers investigating the neuroscience of social behaviour.

    2. Reviewer #1 (Public review):

      The article provides a timely and well-written examination of how group identification influences collective behaviors and performance using fNIRS and behavioral data.

      Comments on revisions:

      Most Reviewer concerns have been addressed in the revised manuscript, but some limitations persist with respect to core aspects of study design (e.g., long block durations and lack of counter-balancing) and analysis (i.e., the potential circularity of some analyses, the insufficiency of a mediation model to demonstrate causality, and a lack of clarity concerning the model used to map task activation).

      Editor's note: Although the Reviewers found the reviews generally responsive, some fundamental concerns remain which will not be changed by further revision.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses (clarifications needed):

      (1) Experimental Design:

      The study does not mention whether the authors examined sex differences or any measures of attractiveness or hierarchy among participants (e.g., students vs. teachers). Including these variables could provide a more nuanced understanding of group dynamics.

      We are grateful to the reviewer for raising this valuable point. We now state that future studies should examine sex differences and measures of attractiveness or hierarchy among participants (e.g., students vs. teachers) (p. 27).

      “Finally, future research should investigate additional variables, including sex differences and measures of attractiveness or hierarchy among participants, such as students versus teachers.”  p. 27

      (2) fNIRS Data Acquisition:

      The authors' approach to addressing individual differences in anatomy is lacking in detail. Understanding how they identified the optimal channels for synchrony between participants would be beneficial. Was this done by averaging to find the location with the highest coherence?

      We apologize for missing some details here. We have included the following information in the fNIRS data acquisition and fNIRS data analyses to clarify the details (pp. 8 and 12).

      We used one-sample t-tests to compare GNS between the baseline and task sessions and thereby identify channels of interest. This analysis did not search for the location of maximum coherence; rather, it identified the channels whose GNS differed significantly between the two sessions, which we designated as relevant to the group decision-making task. In addition, we selected the PFC and left TPJ as our regions of interest based on the existing literature.

      “Two optode probe sets were used to cover each participant's prefrontal and left TPJ regions (Figure S1). The DLPFC plays a crucial role in group decision-making processes, with findings suggesting that individuals exhibiting reduced prefrontal activity were more prone to out-group exclusion and demonstrated stronger in-group preferences (Goupil et al., 2021; Jankovic, 2014; Yang et al., 2020). Similarly, the left TPJ has been previously reported to be associated with decision-making and information exchange (Freitas et al., 2019; Tindale et al., 2019).”  p. 8

      “Time-averaged GNS (also averaged across channels in each group) was compared between the baseline session (i.e., the resting phase) and the task session (from reading information to making decisions) using a series of one-sample t-tests. Here, p-values were thresholded by controlling for FDR (p < 0.05; Benjamini & Hochberg, 1995). When determining the frequency band of interest, the time-averaged GNS was also averaged across channels. After that, we analyzed the time-averaged GNS of each channel. Then, channels showing significant GNS were regarded as regions of interest and included in subsequent analyses.” p. 12
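      The FDR thresholding step above (Benjamini & Hochberg, 1995) can be illustrated with a minimal sketch of the step-up procedure. The p-values below are hypothetical, not results from the study, and the function name is illustrative only.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if rejected at FDR level q
    under the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Step-up rule: reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-channel p-values from task-vs-baseline GNS tests.
channel_p = [0.001, 0.008, 0.032, 0.038, 0.20]
print(benjamini_hochberg(channel_p))  # → [True, True, True, True, False]
```

      Note the step-up behavior: the third channel (p = 0.032) exceeds its own threshold (3/5 × 0.05 = 0.03) but is still rejected because a higher-ranked channel (p = 0.038 ≤ 0.04) passes.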

      (3) Behavioral Analysis:

      For group identification, the analysis currently uses a dichotomous approach. Introducing a regression model to capture the degree of identification could offer more granular insights into how varying levels of group identification affect collective behavior and performance.

      Thank you for your suggestion. As suggested, we have conducted a regression analysis to examine how varying levels of group identification affect collective performance, with group identification scores as the independent variable and collective performance as the dependent variable (pp. 9 and 15).

      “Moreover, we employed a regression model to examine how varying levels of group identification affect collective performance, using group identification scores as the independent variable and collective performance as the dependent variable.”  p.9

      “The results from the regression model highlighted a significant association between the degree of group identification and collective performance (β = 0.45, t = 4.56, p = 0.019).”  p.15

      (4) Single Brain Activation Analysis:

      The application of the General Linear Model (GLM) is unclear, particularly given the long block durations and absence of multiple trials. Further explanation is needed on how the GLM was implemented under these conditions.

      Thank you for your suggestion. We have added more details to this section (p. 11).

      “In the GLM analysis, HbO was the dependent variable, and separate regressors were defined for the different task stages (a. reading information, b. sharing private information, c. discussing information, d. decision). Each regressor was then convolved with the hemodynamic response function (HRF), and regression analysis yielded the activation β value of each participant in each channel for each task stage.”  p.11
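      To make the stage-wise GLM described above concrete, here is a minimal, dependency-free sketch. It assumes a 1 Hz sampling rate, a 30-s rest period followed by four 40-s stages (the actual stages were longer, e.g. 5 min for reading), an SPM-style double-gamma HRF, and a synthetic, noise-free HbO trace. None of these values come from the study, the helper names are hypothetical, and the authors' actual toolbox is not specified in this response.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def canonical_hrf(dt=1.0, length=32.0):
    """SPM-style double-gamma HRF sampled every dt seconds, normalized to sum to 1."""
    n = int(length / dt)
    h = [gamma_pdf(i * dt, 6) - gamma_pdf(i * dt, 16) / 6 for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def convolve(signal, kernel):
    """Causal convolution, truncated to the length of the signal."""
    return [sum(kernel[j] * signal[i - j] for j in range(min(i + 1, len(kernel))))
            for i in range(len(signal))]


def boxcar(n, onset, duration):
    """1 during [onset, onset + duration), 0 elsewhere (in samples)."""
    return [1.0 if onset <= i < onset + duration else 0.0 for i in range(n)]


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_glm(y, regressors):
    """OLS betas for y ~ intercept + regressors, via the normal equations."""
    X = [[1.0] + [reg[t] for reg in regressors] for t in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical timeline at 1 sample/s: 30 s rest, then four 40-s task stages.
stages = ["reading", "sharing", "discussing", "decision"]
n = 30 + 40 * len(stages)
hrf = canonical_hrf()
regs = [convolve(boxcar(n, 30 + 40 * s, 40), hrf) for s in range(len(stages))]
# Synthetic, noise-free HbO: offset 1.0, response 2.0 in "reading" and 0.5 in "discussing".
hbo = [1.0 + 2.0 * regs[0][t] + 0.5 * regs[2][t] for t in range(n)]
betas = fit_glm(hbo, regs)
print({name: round(b, 3) for name, b in zip(["intercept"] + stages, betas)})
```

      Because the synthetic signal is built from the convolved regressors, the fit recovers the stage β values used to generate it; with real HbO data, each β is the per-channel, per-stage activation estimate referred to above.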

      (5) Within-group Neural Synchrony (GNS) Calculation:

      The method for calculating GNS could be improved by using mutual information instead of pairwise summation, as suggested by Xie et al. (2020) in their study on fMRI triadic hyperscanning. Additionally, the explanation of GNS calculation is inconsistent. At one point, it is mentioned that GNS was averaged across time and channels, while elsewhere, it is stated that channels with the highest GNS were selected. Clarification on this point is essential.

      We thank the reviewer for raising this point. We used a conventional GNS calculation approach (described at Line 296 of the manuscript): wavelet transform coherence (WTC) was computed for each pair of group members, and the pairwise values were then averaged to obtain GNS. Further details regarding the second question have been added to the article (p. 12).
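      The "compute per pair, then average" structure of GNS can be sketched as follows. As a loudly stated simplification, a Pearson correlation stands in for the time-averaged WTC (which would require a wavelet implementation); the function names and the toy triad are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Stand-in pairwise synchrony measure; NOT wavelet transform coherence."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def group_neural_synchrony(members, pairwise=pearson):
    """Average a pairwise synchrony measure over every pair of group members.

    members: one time series per member, for a single channel.
    pairwise: callable (x, y) -> float; a time-averaged WTC would go here.
    """
    pairs = combinations(range(len(members)), 2)  # a triad yields 3 dyads
    return mean(pairwise(members[i], members[j]) for i, j in pairs)


# Hypothetical triad: members 1 and 2 co-vary, member 3 runs opposite.
triad = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(group_neural_synchrony(triad))  # mean of r = 1, -1, -1 (about -0.333)
```

      Passing the pairwise measure as a parameter keeps the averaging step independent of how synchrony is computed for each dyad.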

      (6) Placement of fNIRS Probes:

      The probes were only placed in the frontal regions, despite literature suggesting that the superior temporal sulcus (STS) and temporoparietal junction (TPJ) regions are crucial for triadic team performance. A justification for this choice or inclusion of these regions in future studies would be beneficial.

      The original manuscript clearly stated the use of two optode probe sets to encompass the prefrontal and left TPJ regions of each participant (see Figure S1, p. 8).

      (7) Interpretation of fNIRS Data:

      Given that fNIRS signals are slow, similar to BOLD signals in fMRI, the interpretation of Figure 6 raises concerns. It suggests that it takes several minutes (on the order of 4-5 minutes) for people to collaborate, which seems implausible. More context or re-evaluation of this interpretation is needed.

      The question you have pointed out is very pertinent, and we have added more explanation for this result (pp. 25-26).

      As previous studies have shown, the hemodynamic signal measured by fNIRS rises slowly relative to the underlying neuronal activity; that is, it lags behind it (Turner et al., 1998). In social interactions such as group decision-making, neural synchronization emerges with a delay because people need time and repeated exchanges to coordinate efficiently and converge on shared preferences (Zhang et al., 2019). For example, a study of group consensus found that participants showed significant neural alignment only after completing a period of dialogue (Sievers et al., 2024). In cooperative tasks, neural synchronization increases as tacit understanding between two participants improves (Cui et al., 2012). Neural synchronization therefore depends on interaction accumulated over a period of time, and we believe that the 4-5 minutes of collaboration shown in Figure 6 may reflect the time team members need to establish consensus and shared preferences, as captured by the temporal dynamics of neural synchronization.

      Moreover, previous studies of neural synchronization during social interaction and group decision-making found that substantial neural synchronization emerged around 50-55 seconds into a teaching task involving prior knowledge (Liu et al., 2019) and was observed approximately 6 minutes into a discussion period (Xie et al., 2023). Together, these results support the time course of the fNIRS responses observed in our study (pp. 25-26).

      “Our study also demonstrated significant increases in single-brain activation, DLPFC-OFC functional connectivity, and GNS at 7, 12, and 17 minutes, respectively, after task initiation. Together, these increases constitute the two-in-one neural model we propose to explain how group identification influences collective performance. As previous studies have shown, the hemodynamic signal measured by fNIRS rises slowly relative to neuronal activity; that is, it lags behind it (Turner et al., 1998). In social interactions such as group decision-making, neural synchronization emerges with a delay because people need time and repeated exchanges to coordinate efficiently and converge on shared preferences (Zhang et al., 2019). For example, participants exhibit significant neural alignment, but only after completing a period of dialogue (Sievers et al., 2024). In cooperative tasks, neural synchronization increases as cooperation efficiency between two participants improves (Cui et al., 2012). The generation of neural synchronization therefore depends on interaction over a period of time, which can affect estimates of collaboration time. Prior research has shown that significant neural synchronization between teacher and students emerged 50-55 seconds after the start of a teaching task involving prior knowledge, indicating that students and teacher were pursuing the shared goal of learning (Liu et al., 2019). Moreover, a notable increase in GNS was observed approximately 6 minutes into a group discussion period, as members worked to discuss and solve the problem (Xie et al., 2023). These findings are consistent with ours. Therefore, the time points we identified may reflect the temporal dynamics of the neural processes underlying team collaboration.” pp. 25-26

      Reviewer #2 (Public review):

      Weaknesses:

      The authors need to clearly articulate their hypothesis regarding why neural synchronization occurs during social interaction. For example, in line 284, it is stated that "It is plausible that neural synchronization is closely associated with group identification and collective performance...", but this is far from self-evident. Neural synchronization can occur even when people are merely watching a movie (Hasson et al., 2004), and movie-watchers are not engaged in collective behavior. There is no direct link between the IBS and collective behavior. The authors should explain why they believe inter-brain synchronization occurs in interactive settings and why they think it is related to collective behavior/performance.

      Thank you for bringing these points to our attention. We have clarified the relationship between neural synchronization and collective behavior in the Introduction (p. 4). Moreover, to test whether neural synchronization stems merely from a shared task or environment, we pseudo-randomized the grouping of participants and created a null distribution of 1,000 pseudo-groups, as described in Lines 311-315. This approach allowed us to rule out neural synchronization arising from factors other than social interaction and to identify neural patterns associated with collective performance (p. 12).

      “Moreover, Ni et al. (2024) indicated that neural synchronization was linked to the strength of social-emotional communication and connections between individuals. An increase in neural synchronization has also been shown to predict the coordination and cooperation abilities of group members (Lu et al., 2023). Therefore, we hypothesize that neural synchronization may be related to group performance.” p.4

      “After that, the nonparametric permutation test was conducted on the observed interaction effects on GNS of the real group against the 1,000 permutation samples. By pseudo-randomizing the data of all participants, a null distribution of 1000 pseudo-groups was generated (e.g., time series from member 1 in group 1 were grouped with member 2 in group 2 & member 3 in group 3). The GNS of 1,000 reshuffled pseudo-groups was computed, and the GNS of the real groups was assessed by comparing it with the values generated by 1000 reshuffled pseudo-groups.” p.12
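      The pseudo-group procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: it tests mean GNS rather than the interaction effects on GNS, a Pearson correlation stands in for WTC, the +1 correction in the p-value is one common convention, and all data and names are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Stand-in pairwise synchrony measure; NOT wavelet transform coherence."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gns(members):
    """Stand-in GNS: mean pairwise synchrony across all member pairs."""
    values = [pearson(members[i], members[j])
              for i, j in combinations(range(len(members)), 2)]
    return sum(values) / len(values)


def pseudo_group_null(groups, n_perm=1000, seed=0):
    """Null distribution of mean GNS over pseudo-groups.

    Each pseudo-group takes member k from a different, randomly chosen real
    group, so its members never interacted with one another.
    """
    rng = random.Random(seed)
    size = len(groups[0])
    null = []
    for _ in range(n_perm):
        pseudo_gns = []
        for _ in range(len(groups)):
            donors = rng.sample(range(len(groups)), size)  # distinct source groups
            pseudo_gns.append(gns([groups[g][k] for k, g in enumerate(donors)]))
        null.append(sum(pseudo_gns) / len(pseudo_gns))
    return null


def permutation_p(observed, null):
    """One-sided permutation p-value with the +1 correction."""
    return (sum(v >= observed for v in null) + 1) / (len(null) + 1)


# Hypothetical data: 8 triads whose members share a group-specific signal plus noise.
rng = random.Random(1)
groups = []
for _ in range(8):
    shared = [rng.gauss(0, 1) for _ in range(50)]
    groups.append([[s + 0.3 * rng.gauss(0, 1) for s in shared] for _ in range(3)])

observed = sum(gns(g) for g in groups) / len(groups)
null = pseudo_group_null(groups, n_perm=1000)
print(round(observed, 3), permutation_p(observed, null))
```

      Because every pseudo-group mixes members from different real groups, any synchrony it shows can only come from the shared task structure, which is exactly what the null distribution is meant to capture.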

      The authors state that "GNS in the OFC was a reliable neuromarker, indicating the influence of group identification on collective performance," but this claim is too strong. Please refer to Figure 4B. Do the authors really believe that collective performance can be predicted given the correlation with the large variance shown? There is a significant discrepancy between observing a correlation between two variables and asserting that one variable is a predictive biomarker for the other.

      Thank you for your suggestion. We have revised the relevant statement (p. 18).

      “Through correlation and regression model analysis, we found that in group decision-making, the increase in group identity would affect group performance by improving GNS in the OFC brain region.”  p.18

      Why are the individual answers being analyzed as collective performance (See, L-184)? Although these are performances that emerge after the group discussion, they seem to be individual performances rather than collective ones. Typically, wouldn't the result of a consensus be considered a collective performance? The authors should clarify why the individual's answer is being treated as the measure of collective performance.

      We appreciate the reviewer's insightful comment. Our decision to use individual responses as a measure of collective performance rests on two considerations. First, previous studies using hidden profile tasks have used averaged individual scores to represent collective performance (e.g., Stasser et al., 1995; Wittenbaum et al., 1996; Brockner et al., 2022). Second, although consensus outcomes are typically regarded as collective outputs, we argue that in the context of this study, individual responses are not independent entities but extensions of the group decision-making process. The collective deliberation process substantially shaped individual thinking and decision-making in this study: through group discussions, members shared perspectives, adjusted their stances, and formulated their responses based on collective insights. The participants' responses were therefore shaped by the dynamics of group conversation, serving as an indirect measure of group performance and potentially indicating the efficacy of collective deliberation.

      Performing SPM-based mapping followed by conducting a t-test on the channels within statistically significant regions constitutes double dipping, which is not an acceptable method (Kriegeskorte et al., 2009). This issue is evident in, for example, Figures 3A and 4A.

      Please refer to the following source: https://www.nature.com/articles/nn.2303

      We have carefully reviewed the articles provided by the reviewer, and we acknowledge the concerns regarding selective analysis and double dipping in our statistical approach. To address this, we believe it is important to clarify this issue further in the Discussion section (pp.26-27).

      Our study introduces a novel perspective while utilizing conventional fNIRS-based hyperscanning analyses (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), methods that are widely endorsed within the field. In our analysis, significant channels were first identified using a one-sample t-test, followed by additional analyses including ANOVA, independent samples t-tests, and other procedures. We would like to emphasize that the statistical assumptions underlying the one-sample t-test and paired-sample t-test in our study maintain a level of independence. Moreover, to further mitigate concerns about the potential for double dipping, we employed permutation testing to validate the robustness of our results and ensure that our findings are not influenced by biases inherent in the selection of significant regions.

      We recognize the importance of rigorous statistical practices and are committed to upholding the highest standards of analysis. As such, we have revisited our methodology and included a more detailed explanation of the steps taken to avoid double dipping and ensure the integrity of our analyses in the revised manuscript.

      “Although our study offers a new perspective, our analyses follow the conventional fNIRS-based hyperscanning approach (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), which is widely accepted among fNIRS hyperscanning researchers. For example, we first identified significant channels through one-sample t-tests and then conducted further analyses, such as ANOVAs or independent-samples t-tests. Selective analysis is a powerful tool and is perfectly justified whenever the results are statistically independent of the selection criterion under the null hypothesis (Kriegeskorte et al., 2009). However, it can lead to double dipping and to missed information. In this study, for example, the TPJ was excluded from further analysis because it showed no statistically significant activation. Future studies should make the selection step explicit and ensure the reliability of the results with appropriate statistical methods (e.g., cross-validation, independent data sets, or techniques that control for selection bias).” pp. 26-27
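      One of the remedies named above, selecting channels on data independent of those used for testing, can be sketched as a split-half procedure. This is an illustration only, not an analysis the authors report: the t threshold of 2.0 (rather than an FDR criterion), the channel names, and the simulated GNS changes are all hypothetical.

```python
import random
from statistics import mean, stdev


def one_sample_t(values):
    """t statistic for H0: population mean == 0."""
    return mean(values) / (stdev(values) / len(values) ** 0.5)


def split_half_selection(effects, t_threshold=2.0, seed=0):
    """Select channels on one half of the groups, then test them on the other half.

    effects: dict mapping channel name -> per-group GNS change (task minus baseline).
    Returns the selected channels and their t statistics on the held-out half,
    which are not inflated by the selection step.
    """
    rng = random.Random(seed)
    n_groups = len(next(iter(effects.values())))
    order = list(range(n_groups))
    rng.shuffle(order)
    select_half, test_half = order[: n_groups // 2], order[n_groups // 2:]
    selected = [ch for ch, v in effects.items()
                if one_sample_t([v[i] for i in select_half]) > t_threshold]
    held_out = {ch: one_sample_t([effects[ch][i] for i in test_half]) for ch in selected}
    return selected, held_out


# Hypothetical GNS changes for 30 groups: one channel with a true effect, one without.
rng = random.Random(7)
effects = {
    "ch_effect": [rng.gauss(0.5, 0.2) for _ in range(30)],
    "ch_null": [rng.gauss(0.0, 0.2) for _ in range(30)],
}
selected, held_out = split_half_selection(effects)
print(selected, {ch: round(t, 2) for ch, t in held_out.items()})
```

      The price of independence is statistical power, since each half uses only part of the groups; cross-validation across several splits is a common way to recover some of it.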

      In several key analyses within this study (e.g., single-brain activation in the paragraph starting from L398, neural synchronization in the paragraph starting from L393), the TPJ is mentioned alongside the DLPFC. However, in subsequent detailed analyses, the TPJ is entirely ignored.

      We thank the reviewer for the careful review and valuable comment. The TPJ is referenced in certain analyses within this paper (as detailed in paragraphs L414 and L440); however, it is not investigated or discussed in the subsequent, more detailed analyses because it showed no statistically significant activation in the analyzed data. As the reviewer points out, pursuing further analyses through ROIs selected in this way has limitations, which we also address in the Discussion section (p. 27).

      The method for analyzing single-brain activation is unclear. Although it is mentioned that a GLM (general linear model) was used, it is not specified what regressors were prepared, nor which regressor's β-values are reported as brain activity. Without this information, it is difficult to assess the validity of the reported results.

      We have revised the relevant description to clarify the analyses of single-brain activation (p. 11).

      While the model illustrated in Figure 7 seems to be interesting, for me, it seems not to be based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented.

      We regret the confusion that has arisen. Firstly, as highlighted by the reviewer, the model depicted in Figure 7 is not directly derived from the causal analysis conducted in this study. Our investigation did not directly explore the causal relationships among the three indicators; instead, we constructed a model based on correlations and potential mechanisms. In the revised manuscript, we have explicitly stated that Figure 7 represents a descriptive model (p.22).

      Regarding Figure 5D, the reviewer noted that while it may offer some explanatory value, it lacks the analytical detail needed to make the figure's meaning clear. We have clarified the details of the analysis for Figure 5 (pp. 13-14). The model in Figure 5D examined the relationship between the similarity in individual-collective performance and the correlation of brain activation, and tested whether the effect of each individual's single-brain activation on the corresponding group's GNS was moderated by their brain activation connectivity.

      “Finally, we employed correlation and moderation analyses to assess whether brain activation connectivity could explain the relationship between individuals’ single-brain activation and the corresponding group’s GNS. We examined the relationship between the similarity in individual-collective performance and the correlation of brain activation, as well as whether the effect of each individual’s single-brain activation on the corresponding group’s GNS was moderated by their brain activation connectivity. We used the PROCESS macro in SPSS to test the proposed moderation effect. Specifically, we applied Model 1 with 5000 bootstrap resamples to examine the interaction between the independent variable (i.e., single-brain activation) and the moderator (i.e., brain activation connectivity) in predicting the dependent variable (i.e., GNS). Before analysis, all variables in the moderation model were mean-centered to reduce multicollinearity and improve the interpretability of the interaction term.”  pp. 13-14
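      The moderation model described above amounts to the regression GNS = b0 + b1·X + b2·W + b3·X·W with mean-centered X (single-brain activation) and W (activation connectivity), where b3 is the moderation effect. The sketch below fits that regression and computes a percentile bootstrap confidence interval for b3 with 5000 resamples. It is not the PROCESS macro itself; the data are simulated and the function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def moderation_fit(x, w, y):
    """OLS fit of y = b0 + b1*xc + b2*wc + b3*xc*wc, with x and w mean-centered.

    Returns [b0, b1, b2, b3]; b3 is the interaction (moderation) coefficient.
    """
    mx, mw = sum(x) / len(x), sum(w) / len(w)
    X = [[1.0, a - mx, c - mw, (a - mx) * (c - mw)] for a, c in zip(x, w)]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(4)]
    return solve(xtx, xty)


def bootstrap_interaction_ci(x, w, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the interaction coefficient b3."""
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        b = moderation_fit([x[i] for i in idx], [w[i] for i in idx], [y[i] for i in idx])
        estimates.append(b[3])
    estimates.sort()
    return estimates[int(n_boot * alpha / 2)], estimates[int(n_boot * (1 - alpha / 2)) - 1]


# Hypothetical standardized data for 60 participants.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(60)]  # single-brain activation
w = [rng.gauss(0, 1) for _ in range(60)]  # brain activation connectivity (moderator)
y = [0.5 + 0.3 * a + 0.2 * c + 0.8 * a * c + rng.gauss(0, 0.1) for a, c in zip(x, w)]  # GNS
coefs = moderation_fit(x, w, y)
ci = bootstrap_interaction_ci(x, w, y)
print("b3 =", round(coefs[3], 3), "95% CI:", ci)
```

      Mean-centering changes the lower-order coefficients b1 and b2 (they become effects at the average of the other variable) but leaves the interaction coefficient b3 unchanged, which is why centering mainly aids interpretation.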

      “Building on the above results, we have developed a two-in-one neural model that explains how group identification influences collective performance. This descriptive model aims to illustrate the potential interrelationships among these indicators and establish a conceptual framework to inspire forthcoming research endeavors.”  p.21

      The details of the experiment are not described at all. While I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study.

      As suggested, we have clarified the details of the experiment in the manuscript.

      (1) As stated in the public review, the details of the experiment are not described at all and while I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study. In points a-e below, I list the aspects that I could not fully understand, but I am not asking for direct answers to these points. Instead, please provide a detailed description of the experiment so that it can be replicated.

      Thank you for your suggestion; we have responded to each question sequentially and elaborated on the experiment specifics to ensure replicability.

      (a) Please provide more detailed information about the Group Identification Task. How much did each participant speak (was there any asymmetry in the amount of speaking, and was there any possibility that the asymmetry influenced the identification rating)? Did the three participants interact in person, or online? Are they isolated from experimenters? How was the rating conducted, what I mean is that it's a PC-based rating?

      We apologize for the lack of detail in our description of the procedures for the experiment.

      For the first question, we drew on previous studies that manipulated group identity while controlling the content of pre-task conversations. Specifically, the high-identity group engaged in self-introductions and identified similarities among the three members, whereas the low-identity group discussed topics related to the current semester's classes (Xie et al., 2023; Yang et al., 2020). Both discussions lasted the same three minutes, ensuring that the number of exchanges in the two groups remained comparable, and there was almost no asymmetry in the amount of speaking. We also conducted a manipulation check, which confirmed the effectiveness of our identity manipulation (pp. 5-6).

      Xie, E., Li, K., Gu, R., Zhang, D., & Li, X. (2023). Verbal information exchange enhances collective performance through increasing group identification. NeuroImage, 279, 120339.

      Yang, J., Zhang, H., Ni, J., De Dreu, C. K., & Ma, Y. (2020). Within-group synchronization in the prefrontal cortex associates with intergroup conflict. Nature Neuroscience, 23(6), 754-760.

      “Both discussions were conducted for the same duration of three minutes, ensuring that the number of exchanges between the two groups remained comparable.”  p.5-6

      For the second question, the three participants interacted offline in a face-to-face setting, while the experimenter remained outside the laboratory (p.6).

      “The three participants conducted face-to-face offline interaction throughout the manipulation process.” p.6

      For the third question, at the beginning of the experimental task, participants were isolated from the experimenters (p.6).

      “In addition to explaining the next phase of the task and controlling the timer, experimenters would be isolated from participants.” p.6

      For the last question, the rating of group identification was conducted through a questionnaire presented on participants’ phones (p.6).

      “The questionnaire was presented on participants’ phones.” p.6

      (b) The procedures of the Main Task are also unclear. For the Reading Information (5 min): How was the information presented? PC-based or paper-based? How were the participants seated? Did they read it independently?

      We apologize for the missing details. We have included the following information in the article.

      For the first and last questions, each participant received a sheet of paper presenting the common information and their private information, and read it independently (p.6).

      “Each participant would get a piece of paper, which presented the information. Participants could read independently.” p.6

      Regarding seating, the three participants sat around a table with no partitions between them. They could communicate face-to-face only during the discussion stage (p.6).

      “They sat around a table without partitions between each other.” p.6

      “In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (c) For Sharing Private Information: The authors stated that participants shared text messages using Tencent Meeting. If so, how, and with what devices? How was the information displayed on the screen? Were the participants even in the same room?

      Thank you for the reminder. We have now added more details (p.6). First, the experimenter sent the Tencent Meeting link to the participants. After joining the meeting on their mobile phones, participants could type the information they wanted to share in the meeting's chat box. They were in the same room; because Tencent Meeting retained the shared messages, participants could view them at any time.

      “During the group sharing, participants entered Tencent Meeting via their mobile phones and were able to text their private information in the chat box to their group members for 5 minutes.” p.6

      (d) For Discussing Information: It's a verbal interaction. How did they interact with one another? What was the distance between them? I found a very small picture in Figure 8, but that is the only information about the experimental setting provided by the authors.

      We apologize for the missing details. As explained in the article, this stage involved verbal communication, so participants talked face-to-face in one room. We have included the following information in the article (p.6).

      “Participants were sitting and communicating around a table. The distance between adjacent participants was about 15 cm, and the distance between face-to-face participants was about 40 cm. In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (e) For the Decision Process (5 min): How did they answer (verbally, in writing, or via computer-based input), and how did the experimenters record these answers?

      The questions were presented on paper, so participants wrote their answers on the sheet and the experimenters scored the answers from these paper sheets. We have included the following information in the article (p.7).

      “After discussion, all triads were given 5 minutes to answer the following questions (i) the probability of three suspects, 0%-100% for each suspect; (ii) the motivation and tool of crime; and (iii) deduced the entire process of crime. The three questions were presented on paper, allowing participants to write their answers directly on the same sheet. Subsequently, three independent raters used these paper questionnaires to record and calculate the scores for each group.” p.7

      (2) I find the model presented in Figure 7 to be intriguing. Understanding why inter-brain synchronization occurs and how it is supported by specific single-brain activations or intra-brain functional connectivity is indeed a critical question for researchers conducting hyperscanning studies. However, the content depicted in this model is not based on the results of this study, because the study did not investigate the causal relationships among the three metrics. I guess Figure 5D may be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented. Please include a detailed explanation.

      The specific answers are available on page 5 of our response letter.

      (3) The analysis of single-brain activation analysis (and probably other analyses) focuses on the period from reading to making decisions (L237). Why was this entire interval chosen for analysis? Reading does not involve social interaction. As mentioned in a previous comment, the details of the tasks are unclear, so it's difficult to understand what was actually done in the reading period. Anyway, why were these different phases combined as the focus of analysis? Please clarify the reasoning behind this choice.

      Thank you for your feedback. The decision to analyze the entire interval, spanning from reading to decision-making, was primarily made to grasp the continuum of information processing comprehensively. While reading itself lacks social interaction, it serves as the foundation for subsequent decision-making, during which participants' cognitive states and affective responses gradually evolve. Therefore, examining these two phases collectively enables a more thorough investigation into how information influences decision-making. Furthermore, considering the task details remain ambiguous, we aim to uncover the underlying cognitive and affective mechanisms through a holistic analysis.

      (4) The method for analyzing single-brain activation is unclear. Please provide a detailed description of the analysis methods.

      Thank you for your suggestion; we have added more details in the Methods section (p.11).

      “In the GLM model analysis, HbO was the dependent variable, and the regression amount was set to different task stages (a. Reading information, b. Sharing private information, c. Discussion information, d. Decision). After that, we convolved the regression factor with the Hemodynamic Response Function (HRF), and obtained the brain activation β value of each participant in each channel at different task stages through regression analysis.”  p.11
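      A minimal sketch of the stage-wise GLM described in this passage may help readers follow it: one boxcar per task stage (a-d), convolved with an HRF, then an OLS fit per channel that yields one β per stage. Everything beyond that outline is an assumption made for illustration: the double-gamma HRF with SPM-style shape parameters, the 1 Hz sampling rate, the 60 s rest baseline before the task, and the 600 s discussion stage (the discussion length is not given in this excerpt). The function names are hypothetical and do not come from any toolbox the authors used.

```python
import math
import numpy as np

def canonical_hrf(fs, duration=32.0):
    """Double-gamma HRF with SPM-style shapes (peak 6, undershoot 16, ratio 1/6); assumed, not from the study."""
    t = np.arange(0.0, duration, 1.0 / fs)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()  # unit sum, so a sustained boxcar plateaus at 1

def design_matrix(stage_durations, fs, rest=60.0):
    """One HRF-convolved boxcar per stage (stages run back to back after a rest period), plus an intercept."""
    n_rest = int(rest * fs)
    n_total = n_rest + sum(int(d * fs) for d in stage_durations.values())
    hrf = canonical_hrf(fs)
    columns, start = [], n_rest
    for duration in stage_durations.values():
        stop = start + int(duration * fs)
        boxcar = np.zeros(n_total)
        boxcar[start:stop] = 1.0
        columns.append(np.convolve(boxcar, hrf)[:n_total])  # regressor convolved with the HRF
        start = stop
    columns.append(np.ones(n_total))  # intercept (baseline level)
    return np.column_stack(columns)

def stage_betas(hbo, X):
    """OLS for all channels at once; hbo is (n_samples, n_channels). Returns (n_stages, n_channels)."""
    beta, *_ = np.linalg.lstsq(X, hbo, rcond=None)
    return beta[:-1]  # drop the intercept row

# Hypothetical usage on simulated HbO for two channels (not the study's data).
fs = 1.0
stages = {"reading": 300, "sharing": 300, "discussion": 600, "decision": 300}
X = design_matrix(stages, fs)
true_betas = np.array([[0.2, 0.5], [0.1, -0.3], [0.8, 0.4], [0.3, 0.0]])
rng = np.random.default_rng(0)
hbo = X[:, :4] @ true_betas + 0.05 + rng.normal(0.0, 0.01, size=(X.shape[0], 2))
betas = stage_betas(hbo, X)
```

The rest period matters in this sketch: if the four stages tiled the whole recording, the convolved regressors would sum to an almost constant column and become nearly collinear with the intercept.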

      (5) In the periods of Reading Information and Sharing Private Information, there appears to be no social interaction between participants (Figure 1D). However, Figure 6 shows an increase in brain activity correlation even during the first 10 minutes (it corresponds to the Reading and Sharing period). Why does inter-brain correlation (GNS, in this study) increase even though there is no interaction between participants? Please provide an explanation.

      The sharing stage is in fact interactive: participants exchanged their private information with one another through Tencent Meeting. Previous research suggests that increased correlations in brain activity can be attributed to (1) intrinsic cognitive processes, whereby participants display similar cognitive and emotional responses, fostering shared cognitive processing and brain activity synchronization despite limited external interaction; (2) emotional connections, as divulging private information elicits emotional responses that can be neurally correlated among individuals; and (3) environmental influences, whereby a shared environment and context elicit shared neural responses among participants even in the absence of direct social engagement. Together, these factors can increase brain activity correlations without active interaction. Our primary focus, however, is on the phase characterized by significant synchronized brain activity.

      Minor Comments:

      (6) Equation 1 Explanation: There is no explanation of Equation 1. It mentions Yi as the collective score, but what constitutes the collective score Yi is not defined in the manuscript. Additionally, while "i" is referred to as an item (in Line 196), the meaning of "item" is not clear. Therefore, the meaning of this equation is not understood.

      We apologize for this confusion. We have added a description in the manuscript (p.9).

      “In Eq.1, x is the individual score, y is the collective score (y is calculated from the three per capita scores), and i stands for the group number for the item. So, x_i means the individual score of participants in group i, and y_i means the collective score of group i. d(x, y) represents the distance from the individual to the collective score.” p.9

      (7) Equation 2 Explanation: There is no explanation for Equation 2. Please provide descriptions for all variables such as S, t, and w.

      The meaning of s, t, and W was already stated in the first version of the manuscript (p.12).

      As shown in L291-293: Here, t denotes the time, s denotes the wavelet scale, 〈⋅〉 represents a smoothing operation in time, and W is the continuous wavelet transform (Grinsted, Moore, & Jevrejeva, 2004).
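      For reference, the wavelet transform coherence defined by Grinsted, Moore and Jevrejeva (2004), whose variable definitions the passage above follows, is usually written as below. We assume Eq. 2 matches this standard form, although the manuscript's exact notation may differ. Here W^X and W^Y would be the continuous wavelet transforms of the two signals being compared (for example, two participants' HbO series), * denotes complex conjugation, and Grinsted et al. apply the smoothing operator ⟨·⟩ in both time and scale.

```latex
\begin{aligned}
W^{XY}(s,t) &= W^{X}(s,t)\, W^{Y*}(s,t), \\
R^{2}(s,t) &= \frac{\left|\left\langle s^{-1} W^{XY}(s,t) \right\rangle\right|^{2}}
  {\left\langle s^{-1} \left|W^{X}(s,t)\right|^{2} \right\rangle
   \left\langle s^{-1} \left|W^{Y}(s,t)\right|^{2} \right\rangle}.
\end{aligned}
```

By construction 0 ≤ R²(s,t) ≤ 1, so the quantity can be read as a localized squared correlation between the two signals at scale s and time t.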

      (8) Acronyms: Please define all acronyms upon their first appearance (e.g., CFI, TLI, RMSEA in L380).

      We apologize for these mistakes, and we have added full explanations for abbreviations upon their first use (p.16).

      “The mediation model demonstrated a satisfactory fit (CFI = 0.93, TLI = 0.93, RMSEA = 0.04) (CFI, Comparative Fit Index; TLI, Tucker-Lewis Index; RMSEA, Root-Mean-Square Error of Approximation), suggesting that the perceived group identification of each individual affected the alterations in single-brain activations in the DLPFC, consequently leading to variations in their performance (β_a = 0.16, t = 2.20, p = 0.030; β_b = 0.26, t = 3.56, p < 0.001; β_c = 0.18, t = 2.34, p = 0.020) (Figure 3C).” p.16
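      The quoted result describes a simple mediation chain (group identification, then DLPFC activation, then performance). The sketch below illustrates only the product-of-coefficients logic behind such a model: path a regresses the mediator on the predictor, path b regresses the outcome on the mediator while holding the predictor fixed, and the indirect effect is a × b. It is not the authors' analysis. The reported CFI/TLI/RMSEA are SEM fit indices and the coefficients appear to be standardized, whereas this sketch uses plain unstandardized OLS on simulated data. Whether β_c denotes the total or the direct effect is not stated in the excerpt, so the code returns both. All names and data here are hypothetical.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """OLS with an intercept; returns one slope per predictor (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def simple_mediation(x, m, y):
    """Product-of-coefficients mediation for x -> m -> y (unstandardized OLS)."""
    (a,) = ols_slopes(m, x)           # path a: predictor -> mediator
    b, c_prime = ols_slopes(y, m, x)  # path b (mediator -> outcome, x held fixed) and direct effect c'
    (c_total,) = ols_slopes(y, x)     # total effect c
    return {"a": a, "b": b, "c_prime": c_prime, "c_total": c_total, "indirect": a * b}

# Hypothetical simulated data: x = identification, m = DLPFC beta, y = individual performance.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + 0.2 * x + rng.normal(size=n)
est = simple_mediation(x, m, y)
```

With OLS and an intercept in every regression, the decomposition c = c' + a·b holds exactly in the sample, which makes it a convenient sanity check.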

      (9) Hyperscanning fMRI Studies: Since there are hyperscanning fMRI studies analyzing communication among three people (e.g., Xie et al., 2020, PNAS), it would be beneficial to cite this research. pnas.org/doi/pdf/10.1073/pnas.1917407117.

      As suggested, we have cited this paper. (p.4)

      (10) Line 272; Line 275: Should these references be to Benjamini & Hochberg (1995)?

      As suggested, we have revised our citation.

      (11) Research Objectives: The authors' aim seems to be understanding the relationship between Group Identification Level (High or Low), collective performance, and inter-brain synchronization (GNS). If so, shouldn't the results shown in Figure 6 illustrate how these differ between High and Low groups?

      We are grateful to the reviewer for this insightful comment. This study aimed to investigate the impact of group identity levels on collective performance and interbrain synchronization. Our analysis primarily focused on inter-group disparities to elucidate the potential influence of varying levels of group identification on collective behavior and neural synchrony, as highlighted by the reviewer. It is important to note that the relationship between group identification levels and collective performance, as well as neural synchronization, may represent a continuous or correlational process, rather than a binary comparison between two distinct groups. Notably, we treated group identification as a continuous variable and, consequently, Figure 6 was designed to illustrate trends in the association between group identification levels and both collective performance and neural synchronization, without conducting significance tests between groups. We are confident that the depiction in Figure 6 effectively captures the evolving dynamics between group identification levels and both collective performance and neural synchronization.

      (12) Figure 6 Star-Marker: What is the star marker shown in Figure 6? Please provide an explanation.

      We apologize for this confusion. We have added this explanation to the article. (p.21)

      “The red star sign indicates that at this time point, the neural signal began to increase significantly.” p.21

      (13) Pearson's Correlation: Use "Pearson's correlation" instead of "Pearson correlation."

      Thank you for your comment; we have changed “Pearson correlation” to “Pearson’s correlation” in a total of 10 places in the original text (pp. 9, 11, 13, 15, 16, 19, 23).

      “Moreover, the Pearson’s correlation was used to examine the relationship between group identification_2 and collective performance.” p.9

      “Subsequently, we used Pearson’s correlation analyses to investigate the relationship between single-brain activation and individual performance.” p.11

      “Second, the Pearson’s correlation between GNS and collective performance was performed.” p.13

      “Following that, we analyzed Pearson’s correlations between the original HbO data in the region related to individual and collective performance, denoted as brain activation connectivity (Lu et al., 2010).” p.13

      “Subsequently, the Pearson’s correlation between the quality of information exchange and collective performance was assessed.” p.15

      “Furthermore, the results of the Pearson’s correlation indicated that groups with higher group identification were more likely to exhibit better collective performance (r = 0.38, p = 0.003) (Figure 2B).” p.15

      “The Pearson’s correlation and its associated analyses were based on the data from group identification_2. *p < 0.05.” p.16

      “We first extracted the HbO brain activities related to individual performance (e.g., DLPFC, CH4) and collective performance (e.g., OFC, CH21) of each group member and conducted a Pearson’s correlation between the two.” p.19

      “Subsequently, Pearson’s correlation was used to test whether individual differences in the similarity in individual-collective performance were reflected by DLPFC-OFC connectivity.” p.19

      “Pearson’s correlation showed that the higher quality of information exchange, the better collective performance (r = 0.36, p = 0.007) (Figure 8C).” p.23

      (14) MNI Coordinates: The MNI coordinates for each channel are listed in the supporting information. How were these coordinates measured? Were they consistent for all participants? Was MRI conducted for each participant to obtain these coordinates?

      Thank you for the reminder; we have included the necessary information in the revised version. First, we should clarify that we referred to previous literature to determine the placement of the optode probe sets. After data collection, we used the Vpen positioning system to locate the optodes precisely and obtain their MNI coordinates. These coordinates were largely consistent across participants (p.8).

      “For each participant, one 3 × 5 optode probe set (8 emitters and 7 detectors forming 22 measurement points with 3 cm optode separation, see Table S1 for detailed MNI coordinates) was placed over the prefrontal cortex (reference optode is placed at Fpz, following the international 10-20 system for positioning). The other 2 × 4 probe set (4 emitters and 4 detectors forming 10 measurement points with 3 cm optode separation, see Table S2 for detailed MNI coordinates) was placed over the left TPJ (reference optode is placed at T3, following the international 10-20 system for positioning). The probe sets were examined and adjusted to ensure consistency of the positions across the participants. After the completion of data collection, we utilized the Vpen positioning system to accurately locate the detection light poles, ultimately obtaining the MNI positioning coordinates.”  p.8
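      The emitter, detector and channel counts quoted above follow from the probe geometry if, as is common for such grids, emitters and detectors alternate in a checkerboard and every horizontally or vertically adjacent emitter-detector pair forms one channel. That layout is an assumption here, not something the excerpt states. The hypothetical helper below checks it against the stated 8/7/22 (3 × 5) and 4/4/10 (2 × 4) figures.

```python
def grid_channels(rows: int, cols: int) -> tuple[int, int, int]:
    """Return (emitters, detectors, channels) for a rows x cols optode grid.

    Assumes a checkerboard of emitters and detectors, so every horizontally or
    vertically adjacent pair is one emitter and one detector, i.e. one channel.
    """
    emitters = sum(1 for r in range(rows) for c in range(cols) if (r + c) % 2 == 0)
    detectors = rows * cols - emitters
    channels = rows * (cols - 1) + (rows - 1) * cols  # horizontal + vertical neighbour pairs
    return emitters, detectors, channels

print(grid_channels(3, 5))  # (8, 7, 22): prefrontal probe set
print(grid_channels(2, 4))  # (4, 4, 10): left TPJ probe set
```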

    1. eLife Assessment

      This study unveils important data describing cell states of olfactory ensheathing cells, and how these cell states may relate to repair after spinal cord injury. The framework used for characterizing these cells is solid. This work will be of interest to stem cell biologists and spinal cord injury researchers.

    2. Reviewer #1 (Public review):

      The goal of this study was to identify the phenotype of olfactory ensheathing cells (OECs) that have been associated with neural tissue repair, and investigate the properties of these cells that can be used to identify them. OECs modify inhibitory glial scar formation, enabling axon regeneration past the scar border and into the lesion center. Single-cell RNA sequencing revealed diverse subtypes of OECs expressing novel marker genes associated with progenitor, axonal regeneration, repair, and microglia-like functions, suggesting their potential roles in wound healing, injury repair, and axonal regeneration. Additionally, the study identified secreted molecules such as Reelin and Connective tissue growth factor, which are important for neural repair and axonal outgrowth, further supporting the multifunctional nature of OECs in facilitating spinal cord injury recovery. This is an extremely well-written and impactful series of experiments from a renowned leader in the field. The experimental questions are timely, with similar therapeutic approaches being prepared for clinical trial. The results address a gap that has persisted in the field for several decades, posed as a question by many scientists long before the technology existed to answer it. This highlights the importance of these experiments and the results reported here. The authors have also included a thoughtful discussion that highlights the importance of their data in the context of prior research. They have carefully interpreted their results and also indicate where additional studies in future work will continue to expand our knowledge of these important cells and their potential use for neural repair.

    3. Reviewer #2 (Public review):

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      Strengths

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Concerns about quantification raised during the review were suitably addressed by the authors.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Reviews:

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      Strengths

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Challenges

      Despite the breadth of information presented, and although many of the suggestions in the initial review were addressed well, some points related to quantification and discussion of sex differences are not fully addressed in this revision.

      (1) The request for quantification of OEC bridges is not fully addressed. We note that this revision includes the following statement (page 6): "We note, however, that such bridge formation is rare following a severe spinal cord injury in adult mammals." However, the title of the paper states that olfactory ensheathing cells promote neural repair and the abstract states that "OECs transplanted near the injury site modify the inhibitory glial scar and facilitate axon regeneration past the scar border and into the lesion." Statements such as these make it more crucial to include quantification of OEC bridges, because if single images are shown of remarkable, unusual bridges, but only one sentence acknowledges the low frequency of this occurrence, then this information taken together might present the wrong takeaway to readers.

      Including some sort of quantification of bridging, whether it be the number of rats exhibiting bridges, the percentage area of OECs near a lesion site, or some other meaningful analysis, would add rigor and clarity to the manuscript.

      In short, regarding quantification of OEC bridges: across our last two studies combined, we observed bridges in 3/13 OB-OEC-transplanted rats versus 0/16 control rats (p=0.042 by two-sample proportion test; Thornton et al., 2018; Dixie, 2019). In addition to the new data on bridge formation shown in the current manuscript, our previous and most impressive data, showing serotonergic axons (5-HT-labeled, red) that crossed the entire lesion site, are shown below (from Thornton et al., 2018). The image, together with Supplemental video 1 (https://ars.els-cdn.com/content/image/1-s2.0-S0014488618302632-mmc1.mp4), shows a reconstruction of multiple sections containing serotonergic axons that bridge the injury site in one OEC-transplanted, completely transected rat (1/5 OEC- vs. 0/5 fibroblast-transplanted rats). The video also shows retrogradely transported pseudorabies virus taken up by a few scattered neurons (green dots) within and above the lesion site, additional evidence suggesting axonal regeneration.
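      The reported p = 0.042 for 3/13 versus 0/16 can be reproduced with the pooled two-sample z-test for proportions (normal approximation), which we assume is the "two-sample proportion test" meant here. The sketch uses only the standard library. With counts this small, and zero events in the control group, the normal approximation is rough, so the value is best read as approximate.

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sample z-test for proportions; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                              # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))              # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

# Bridges in OEC-transplanted rats (3/13) vs controls (0/16), pooled across both studies.
z, p = two_proportion_z_test(3, 13, 0, 16)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.03, p = 0.042
```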

      In addition to adding bridge quantification in the Results section, we now discuss quantified results on physiological and anatomical evidence of axon regeneration across the injury site from five of the six large spinal cord injury (SCI) studies conducted by the Phelps and Edgerton laboratories. Our studies used the most difficult SCI model, a complete, thoracic spinal cord transection in adult rats, followed by OB-OEC transplantation. This is the only model in which axon regeneration can be differentiated from axon sparing found in incomplete SCIs. An introductory paragraph now summarizes and references data generated from these studies that specifically address how OECs modify the injury site and facilitate axonal outgrowth into and across the lesion core. While relatively few axons cross the entire injury site to reach the caudal spinal cord, many more axons project into the injury site of OEC-transplanted rats than into that of control rats. Quantification of axonal outgrowth into the lesion site of completely transected, OEC-transplanted rats from three previous long-term studies is now discussed in the Introduction. Based on both the physiological and anatomical evidence reviewed from our previous work, we hope the editors and Reviewer agree that our previous studies have shown that OECs promote axonal outgrowth and modify the injury site.

      Page 5, Introduction:

      “Together with collaborators, we conducted six spinal cord injury studies in adult rats with a completely transected, thoracic spinal cord model followed by OB-OEC transplantation (Kubasak et al., 2008; Takeoka et al., 2011; Ziegler et al., 2011; Khankan et al., 2016; Thornton et al., 2018; Dixie, 2019). Results from five of our six studies showed physiological and anatomical evidence of axonal regeneration into and occasionally across the injury site. In 6-8-month-long studies, Takeoka et al. (2011) and Ziegler et al. (2011) reported physiological evidence of motor connectivity across the transection in OEC- but not media-transplanted rats. These experiments used transcranial electric stimulation of the motor cortex or brainstem to detect motor-evoked potentials (MEPs) with EMG electrodes in hindlimb muscles at 4 and 7 months post-transection. After 7 months, 70% of OEC-treated rats responded to stimulation with hindlimb MEPs (motor cortex, 5/20; brainstem, 12/20; Takeoka et al., 2011). A complete re-transection above the original transection was carried out one month later and all MEPs in OEC-injected rats were eliminated. These results provide physiological evidence of axon conductivity across the injury site in OEC-treated rats. Additionally, three of our long-term studies evaluated anatomical axonal outgrowth of the descending serotonergic raphespinal pathway into and through the injury site. Significantly more serotonergic-labeled axons crossed the rostral inhibitory scar border (Takeoka et al., 2011) or occupied a larger area within the injury site core (Thornton et al., 2018; Dixie, 2019) in OEC-transplanted rats than in fibroblast or media controls. In addition, significantly more neurofilament-labeled axons were found within the lesion core of OEC-transplanted versus control rats (Thornton et al., 2018; Dixie, 2019).”

      Page 7, Results: We revised the sentence below and added additional information.

      “We note, however, that such bridge formation is rare following severe spinal cord injury in adult mammals and was detected in 2 out of 8 OEC-transplanted rats and 0/11 media or fibroblast-transplanted controls in this study (Dixie, 2019). Combined with the 1/5 OEC-transplanted rats with axons crossing the injury and 0/5 fibroblast controls in our previous study (Thornton, 2018), we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls (p=0.042, two-sample proportion test). Bridge formation, in conjunction with the additional physiological and anatomical evidence of axonal connections across the injury site presented in our previous studies, strongly supports the capacity of OECs in neural repair.”

      Page 46, Figure legend 1: We added statistical data to the legend

      “Bridge formation across the injury site was observed in 2 of 8 OEC-transplanted and 0 of 11 fibroblast- or media-transplanted spinal cord transected rats. Combined with the 1/5 OEC-transplanted rats with axons crossing the injury and 0/5 fibroblast controls in our previous study (Thornton, 2018), we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls (p=0.042, two-sample proportion test).”

      (2) The additional discussion of sex differences in OEC bridging elaborates on the choice to study female rats, citing bladder challenges in male rats, but does not note salient clinical implications of this choice. Men account for ~80% of spinal cord injuries and likely also have worsened urinary tract issues, so it would be important to acknowledge this clinical fact and consider including males in future studies.

      Response: We agree that studying SCI repair in male rodents is very important, as most people with these injuries are male. We found one publication, by Walker et al. (2019, Journal of Neurotrauma 36:1974-1984), that examined sex differences in age-matched male and female rats after a moderate contusion SCI. They examined a number of histological and functional features and did not find many differences between the sexes. Compared with studies of moderate SCI, studies using a completely transected spinal cord model must carry out manual bladder expression at least twice a day throughout the entire 5- to 7-month study to maintain kidney health. Because male urethras are much longer than those of females, males are much more likely than females to die from kidney disease during complicated, long-term studies such as ours. Fortunately, most SCIs in humans are contusions rather than complete transections, so an incomplete contusion model is most appropriate for studying sex differences. We modified the previous statement in our Discussion section as below.

      Page 25, Discussion

      “We acknowledge that in humans, males account for ~80% of spinal cord injuries (National Spinal Cord Injury Statistical Center, 2024) and sustain more serious urinary tract issues than females. We examined females in the current study due to practical experimental considerations, but it is necessary to examine males in future studies.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It is strongly recommended that some sort of quantification of bridging be included in the figures or in a table, whether this is the number of rats showing bridges, the percent area of OECs near the lesion site, or some other meaningful analysis.

      As discussed in our response to Challenge (1) above, we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls across our two most recent studies. In addition, we added evidence of physiological and anatomical axonal connections across the injury site from our previous studies. We have added this information to the Introduction, Results, and Figure legend 1.

      (2) It is recommended that clinical sex differences in spinal cord injury (with ~80% occurring in men) be acknowledged in the Discussion. This clinical fact could be directly mentioned without much justification.

      See Challenge (2) above and addition to the Discussion on page 25.

      (3) Figs. 1, 5, 6: There is still no quantification included for these figures, which detracts from the ability of readers to understand the context and importance of these results. It is recommended to include quantification for these figures.

      Response regarding quantification associated with Figures 1, 5 and 6:

      Regarding Figure 1: We have discussed the additions to the text of the Introduction, Results and the legend of Figure 1 in detail on pages 2-3 of this response. These are important new additions to our paper.

      Regarding Figure 5: We added quantitative information regarding the analysis of Connective Tissue Growth Factor (Ctgf) expression in the injury site.

      Page 10-11, Results:

      “We found high levels of Ctgf expression in GFP-OECs (n=4 rats) that bridged much of the injury site and also detected Ctgf on nearby cells (Figure 5d, d1-2). GFP-labeled fibroblast transplantations (n=3 rats) served as controls and also expressed Ctgf.”

      Page 36, Methods:

      “To examine Ctgf expression in the spinal cord lesion site, we processed 1 slide per animal with ~6 equally spaced sagittal sections throughout the spinal cord from the Khankan et al. (2016) study. Our aim was to assess whether transplanted OECs (n=4 rats) and transplanted fibroblasts (n=3 rats) express CTGF in the injury site.”

      Regarding Figure 6: The statistics for Figure 6 are found on page 13 of the Results section and page 38 of the Methods section. We now added the statistics to the Figure 6 legend on page 49.

      Page 13, Results:

      “To determine if the proliferative OECs differ in appearance from adult OECs, and whether there is concordance between our OEC subtypes based on gene expression markers and previously described morphology-based OEC subtyping (Franceschini & Barnett, 1996), we analyzed OECs identified with the anti-Ki67 nuclear marker and anti-Ngfr<sup>p75</sup> (Figure 6g-h). Of the Ki67-positive OECs in our cultures, 24% ± 8% were strongly Ngfr<sup>p75</sup>-positive and spindle-shaped, whereas 76% ± 8% were flat and weakly Ngfr<sup>p75</sup>-labeled (n=4 cultures, p = 0.023). Here we show that a large proportion (~three-quarters) of proliferative OECs are characterized by a large, flat morphology and weak Ngfr<sup>p75</sup> expression, resembling the previously described morphology-based astrocyte-like subtype. Our results indicate that the two OEC classification schemes partially overlap, showing similarities but also differences between the two methods.”

      Page 38, Methods: Morphological analyses of Ki67 OEC subtypes

      “To determine if OEC progenitor cells marked with Ki67 immunoreactivity have a distinctive morphology, purified and fixed OEC cultures from 4 rats were processed with anti-Ngfr<sup>p75</sup> and anti-Ki67 and counterstained with Hoechst (Bis-benzimide, 1:500, Sigma-Aldrich, #B2261). Images were acquired from 7-10 randomly selected fields/sample using an Olympus AX70 microscope and Zen image processing and analysis software (Carl Zeiss). We distinguished the larger, flat ‘astrocyte-like’ OECs from the smaller, fusiform ‘Schwann cell-like’ OECs, and recorded their expression of Ngfr<sup>p75</sup> and Ki67. Cell counts from each field were averaged per rat and then averaged into a group mean ± SEM. A Student's t-test was used to compare the proportions of Ki67-positive OECs showing each Ngfr<sup>p75</sup>-labeled morphology. Statistical significance was set at p < 0.05.”

      Page 49, Figure 6 legend:

      “Of the OEC progenitors that express Ki67, 76% ± 8% display low levels of Ngfr<sup>p75</sup> immunoreactivity and a “flat” morphology (g2, h2; green nuclei, arrowheads). The remaining Ki67-expressing OECs express high levels of Ngfr<sup>p75</sup> and are fusiform in shape (24% ± 8%, n=4 cultures, Student's t-test, p = 0.023).”
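      The Methods describe a two-level average: counts are averaged over fields within each rat, and the per-rat values are then combined into a group mean ± SEM. The sketch below illustrates that bookkeeping under stated assumptions: field counts are hypothetical (flat, fusiform) pairs of Ki67-positive OECs, and percentages are computed from each rat's mean counts (the text does not say whether percentages were formed per field or per rat). It also shows why both reported values carry the same SEM (± 8%): per rat, the fusiform percentage is 100 minus the flat percentage, so the two have identical spread.

```python
from math import sqrt
from statistics import mean, stdev


def rat_percentages(fields):
    """fields: hypothetical (flat, fusiform) counts of Ki67+ OECs, one pair per field.
    Counts are averaged over fields first, then converted to percentages."""
    mean_flat = mean(flat for flat, _ in fields)
    mean_fusiform = mean(fus for _, fus in fields)
    total = mean_flat + mean_fusiform
    return 100 * mean_flat / total, 100 * mean_fusiform / total


def mean_sem(values):
    """Group mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def summarize(rats):
    """rats: one list of field counts per rat.
    Returns ((mean, SEM) for flat %, (mean, SEM) for fusiform %)."""
    per_rat = [rat_percentages(fields) for fields in rats]
    return mean_sem([p[0] for p in per_rat]), mean_sem([p[1] for p in per_rat])
```

      Because the two percentages are complementary within each rat, a comparison between them carries the same information as testing whether the flat percentage differs from 50%.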

      (4) Fig. 9: Quantification is still not included in the figure for these Western blots, although it is appreciated that the authors included some quantification in their response letter. Including this in the figure would provide clarification for the reader.

      Thank you for your suggestion. We have now added the quantification to Figure 9, together with a description of the western blot quantification method in the Methods and an updated figure legend.

      Page 32, Methods:

      “For quantification, ImageJ software (NIH) was used to analyze the densitometric data. Western blot images of the 400, 300, and 150 kDa bands were converted to grayscale, and a Region of Interest (ROI) frame that captured the entire band in each lane was defined manually using the "Rectangular" tool. The same ROI frame was applied to each band to record its integrated density, “Grey Mean Value”. Background measurements were quantified in the same way, and background subtraction was performed by deducting the inverted background value from the inverted band value. For relative quantification, target protein bands were normalized to the corresponding loading control (GAPDH) to derive normalized protein expression (fold change). Band intensities were quantified in triplicate for each sample. Data were analyzed with the Mann-Whitney U test to compare normalized protein expression between the Reln<sup>-/-</sup> group and each of the other groups. A one-sided p-value was calculated to test the hypothesis that protein expression levels in the other groups are greater than those in the Reln<sup>-/-</sup> group (negative control). Statistical significance was set at p < 0.05. Analysis was performed using GraphPad Prism (version 9).”
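      The inversion, background subtraction, and loading-control normalization described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes 8-bit grayscale images (0 = black, 255 = white) and takes one grey value per ROI. The quoted Methods mention both “integrated density” and “Grey Mean Value”, so which measurement is fed in is itself an assumption; the helper names are hypothetical.

```python
from statistics import mean

MAX_GREY = 255  # assumption: 8-bit grayscale images


def inverted(grey_value):
    """Invert so that darker (stronger) bands give larger numbers."""
    return MAX_GREY - grey_value


def band_signal(band_grey, background_grey):
    """Background-subtracted signal: inverted band minus inverted background."""
    return inverted(band_grey) - inverted(background_grey)


def normalized_expression(target_reps, gapdh_reps):
    """Target/GAPDH ratio from replicate readings.

    target_reps, gapdh_reps: lists of (band_grey, background_grey) tuples,
    e.g. the three technical replicates measured per sample.
    """
    target = mean(band_signal(b, bg) for b, bg in target_reps)
    loading = mean(band_signal(b, bg) for b, bg in gapdh_reps)
    return target / loading
```

      Averaging the replicates before forming the ratio, as done here, is one choice; averaging per-replicate ratios is another, and the Methods do not specify which was used.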

      Page 52, Figure legend 9f:

      “(f) Quantitation of multiple isoforms of Reelin from 4-15% gradient gels. Positive and negative controls are Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse cortices. Both rat tissue from the ONL (n=3) and CM (n=9) contain more 400 and 300 kDa Reelin compared to the Reln<sup>-/-</sup> mouse. Bars represent the standard deviation of the mean. One-sided Mann-Whitney U test was used to test that protein expression levels in the other groups are greater than those in the Reln<sup>-/-</sup> group, indicative of significant expression of Reln in the test groups. *p < 0.05.”
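      One way to obtain the one-sided Mann-Whitney p-value described above is an exact permutation count, sketched here in plain Python. It is an illustration only and does not reproduce how GraphPad Prism computes the test; the helper names are hypothetical. Ties count as 0.5 in the U statistic, and the p-value is the fraction of all relabelings of the pooled data whose U is at least the observed one.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_one_sided_p(x, y):
    """P(U >= U_obs) under H0, enumerating every split of the pooled values.
    Alternative hypothesis: values in x tend to be greater than values in y."""
    pooled = list(x) + list(y)
    u_obs = mann_whitney_u(x, y)
    count = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mann_whitney_u(xs, ys) >= u_obs:
            count += 1
    return count / total
```

      Enumeration is cheap for small groups (e.g., 3 vs 9 values gives 220 splits). With three values in each group the exact one-sided p-value cannot fall below 1/20 = 0.05, which bounds what triplicate comparisons can show.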

    1. eLife Assessment

      This study reports important findings about pre-saccadic foveal prediction and the extent to which it is influenced by the visibility of the saccade target relative to its background. The research methodology and results make a convincing case that foveal congruency effects develop when salient local contrast variations at the saccade target location can be used to direct the eye movement. This work should be of broad interest to visual neuroscientists, as well as those interested in understanding perception in the context of eye movements and in modeling visually guided actions.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides new insights into the phenomenon of pre-saccadic foveal prediction previously reported by the same authors. In particular, this study examines to what extent this phenomenon varies based on the visibility of the saccade target. Visibility is defined as the contrast level of the target with respect to the noise background, and it is related to the signal-to-noise ratio of the target. A more visible target facilitates the planning and execution of oculomotor behavior; however, as speculated by the authors, it can also benefit foveal prediction even if the foveal stimulus visibility is maintained constant. Remarkably, the authors show that presenting a highly visible saccade target is beneficial for foveal vision, as detection of stimuli with an orientation similar to that of the saccade target is improved; the lower the saccade target visibility, the less prominent this effect. The results are convincing and the research methodology is technically sound.

      Comments on revisions:

      The authors addressed all the concerns raised in the previous rounds of reviews.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors ran a dual task. Subjects monitored a peripheral location for a target onset (to generate a saccade to), and they also monitored a foveal location for a foveal probe. The foveal probe could be congruent or incongruent with the orientation of the peripheral target. In this study, the authors manipulated the conspicuity of the peripheral target, and they saw changes in performance in the foveal task.

      Comments on revisions:

      The authors have addressed all comments. Thanks.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This study examines to what extent the phenomenon of pre-saccadic foveal prediction varies based on the visibility of the saccade target. Visibility is defined as the contrast level of the target with respect to the noise background, and it is related to the signal-to-noise ratio of the target. A more visible target facilitates the planning and execution of oculomotor behavior; however, as speculated by the authors, it can also benefit foveal prediction even if the foveal stimulus visibility is maintained constant. Remarkably, the authors show that presenting a highly visible saccade target is beneficial for foveal vision, as detection of stimuli with an orientation similar to that of the saccade target is improved; the lower the saccade target visibility, the less prominent this effect.

      Strengths:

      The results are convincing and the research methodology is technically sound.

      Weaknesses:

      It is still unclear why the pre-saccadic enhancement would oscillate for targets with higher opacity levels, and what would be the benefit of this oscillatory pattern. The authors do not speculate too much on this and loosely relate it to feedback processes, which are characterized by neural oscillations in a similar range.

      We thank the reviewer for their assessment. We intentionally decided to describe the oscillatory pattern without claiming to be able to pinpoint its origin. The finding was incidental and, based on psychophysical data alone, we would not feel comfortable doing anything but loosely relating it to potential mechanisms on an explicitly speculative basis. In the potential explanation we provide in the manuscript, the oscillatory pattern would likely not serve a benefit; rather, it would constitute an innate consequence and, thus, a coincidental perceptual signature of potential feedback processes.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors ran a dual task. Subjects monitored a peripheral location for a target onset (to generate a saccade to), and they also monitored a foveal location for a foveal probe. The foveal probe could be congruent or incongruent with the orientation of the peripheral target. In this study, the authors manipulated the conspicuity of the peripheral target, and they saw changes in performance in the foveal task. However, the changes were somewhat counterintuitive.

      We regret that our findings remain counterintuitive to the reviewer even after our extensive explanations in the previous revision round and the corresponding changes in the manuscript. We repeat that both the decrease in foveal Hit Rates and the increase in foveal enhancement with increasing target contrast were expected and preregistered prior to data collection.

      Strengths:

      The authors use solid analysis methods and careful experimental design.

      Comments on revisions:

      The authors have addressed my previous comments.

      One minor thing is that I am confused by their assertion that there was no smoothing in the manuscript (other than the newly added time course analysis). Figure 3A and Figure 6 seem to have smoothing to me.

      When the reviewer suggested that the “data appear too excessively smoothed” in the first revision, we assumed that they were referring to pre-saccadic foveal Hit and False Alarm rates, not to fitted distributions. As we state in the legend of Figure 3A (as well as in Figures 6 and S1), the “smoothed” curves constitute the probability density distributions of our raw data. Concerning the energy maps resulting from reverse correlation analyses, we described our procedure in detail in our initial article (Kroell & Rolfs, 2022):

      “Using this method, we obtained filter responses for 260 SF*ori combinations per noise image (Figure 6 in Materials and methods, ‘Stimulus analysis’). SFs ranged from 0.33 to 1.39 cpd (in 20 equal increments). Orientations ranged from –90° to 90° (in 13 equal increments). To normalize the resulting energy maps, we z-transformed filter responses using the mean and standard deviation of filter responses from the set of images presented in a certain session. To obtain more fine-grained maps, we applied 2D linear interpolations by iteratively halving the interval between adjacent values 4 times in each dimension. To facilitate interpretability, we flipped the energy maps of trials in which the target was oriented to the left. In all analyses and plots, +45° thus corresponds to the target’s orientation while –45° corresponds to the other potential probe orientation. Filter responses for all response types are provided at https://osf.io/v9gsq/.”

      We have added a pointer to this explanation to the current manuscript (see line 836).
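      The three processing steps in the quoted passage (session-wise z-transform, interval-halving interpolation, and orientation flipping) can be sketched in plain Python. Assumptions: each energy map is a list of rows with rows indexing the 20 SFs and columns the 13 orientations from –90° to 90°; z-scores use the mean and population SD pooled over all maps of a session (per-cell normalization is another possible reading); helper names are hypothetical. The halving wording resembles the refinement option of MATLAB's `interp2(V, k)`, but the quoted text does not name the software used.

```python
from statistics import mean, pstdev


def zscore_session(maps):
    """z-transform all filter responses with the mean and SD pooled over
    every energy map (noise image) of one session."""
    values = [v for m in maps for row in m for v in row]
    mu, sd = mean(values), pstdev(values)  # population SD is an assumption
    return [[[(v - mu) / sd for v in row] for row in m] for m in maps]


def halve_intervals(values):
    """Insert the linear midpoint between each pair of neighbours."""
    out = []
    for a, b in zip(values, values[1:]):
        out.extend([a, (a + b) / 2])
    out.append(values[-1])
    return out


def refine(grid, times=4):
    """Repeat midpoint insertion along both axes (bilinear refinement)."""
    for _ in range(times):
        grid = [halve_intervals(row) for row in grid]              # orientation axis
        cols = [halve_intervals(list(col)) for col in zip(*grid)]  # SF axis
        grid = [list(row) for row in zip(*cols)]
    return grid


def flip_orientation(grid):
    """Mirror the symmetric orientation axis (-90..+90 deg), mapping -45 onto +45."""
    return [row[::-1] for row in grid]
```

      Four halvings turn the 20 × 13 grid into 305 × 193 points (each halving maps n points to 2n − 1), and because the orientation axis is symmetric about 0°, reversing each row maps every angle onto its mirror image.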

      Another minor comment is related to the comment of Reviewer 1 about oscillations. Another possible reason for what looks like oscillations is saccadic inhibition. When the foveal probe appears, it can reset the saccade generation process. When aligned to saccade onset, this appears as a characteristic change in different parameters that is time-locked to saccade onset (about 100 ms earlier). So maybe the apparent oscillation is a manifestation of such resetting and not really an oscillation. So I agree with Reviewer 1 about removing the oscillation sentence from the abstract.

      While we understand that a visible probe will result in saccadic inhibition (White & Rolfs, 2016), we are unsure how a resetting of the saccade generation process would manifest as increased perceptual enhancement of a specific peripheral target orientation in the presaccadic fovea. Moreover, as we describe in our initial article (Kroell & Rolfs, 2022), we updated the background noise image every 50 ms and embedded our probe stimulus into the surrounding noise using smooth orientation filters and raised cosine masks to avoid a disruptive influence of probe appearance on movement planning and execution (Hanning, Deubel, & Szinte, 2019). Indeed, we demonstrated that the appearance of the foveal probe did not disrupt saccade preparation, that is, it did not increase saccade latencies compared to ‘probe absent’ trials in which no foveal probe was presented (see Kroell & Rolfs, 2022; sections “Parameters of included saccades in Experiment 1” and “Parameters of included saccades in Experiment 2”). In the current submission, saccade latencies in ‘probe present’ trials exceeded those in ‘probe absent’ trials by a mere 4.7 ± 2.3 ms.

      Additionally, to inspect the variation in saccade execution frequency directly, we aligned the number of saccade onsets to the onset of the foveal probe stimulus (see Author response image 1). In line with what we described in a previous paradigm employing flickering bandpass-filtered noise patches (Kroell & Rolfs, 2021; 10.1016/j.cortex.2021.02.021), we observed a regular variation in saccade execution frequency that reflected the duration of an individual background noise image (50 ms in this investigation, i.e., a 20 Hz rhythm). In other words, the repeated dips in saccade frequency are likely caused by the flickering background noise and not by the onset of the foveal probe, which would produce a single dip ~100 ms after probe onset.

      Given these results, we do not see a straightforward explanation for how a variation of saccade execution frequency at 20 Hz would boost peripheral-to-foveal feature prediction before the saccade at ~10 Hz. Nonetheless, we removed the sentence referencing oscillations from the Abstract.
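      Aligning saccade onsets to probe onset, as in Author response image 1, amounts to a peri-event histogram. The sketch below assumes per-trial saccade and probe onset times in milliseconds, a hypothetical 10 ms bin width, and a -200 to 400 ms window (the response states neither). It adds a simple normalized autocorrelation to check for a periodicity such as the 50 ms (20 Hz) noise refresh; a repeated pattern every 5 bins would indicate that rhythm, whereas a probe-triggered reset would show up as a single dip.

```python
import math


def peri_event_histogram(saccade_onsets_ms, probe_onsets_ms, window=(-200, 400), bin_ms=10):
    """Count saccade onsets per time bin relative to the same trial's probe onset."""
    lo, hi = window
    counts = [0] * math.ceil((hi - lo) / bin_ms)
    for sacc, probe in zip(saccade_onsets_ms, probe_onsets_ms):
        rel = sacc - probe  # negative = saccade started before the probe appeared
        if lo <= rel < hi:
            counts[int((rel - lo) // bin_ms)] += 1
    return counts


def autocorrelation(counts, lag_bins):
    """Normalized autocorrelation of the histogram at a lag given in bins."""
    m = sum(counts) / len(counts)
    dev = [c - m for c in counts]
    num = sum(dev[i] * dev[i + lag_bins] for i in range(len(dev) - lag_bins))
    den = sum(d * d for d in dev)
    return num / den if den else 0.0
```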

      Author response image 1.

       

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, the authors did a good job in addressing the points I raised. Two new sections were added to the manuscript: one addressing how the mechanisms of foveal prediction would play out in natural viewing conditions, and another examining in more depth the potential neural mechanisms implicated in foveal prediction. I found these two sections to be quite speculative and, at points, a bit convoluted, but they could help the reader get the bigger picture. I still do not have a clear sense of why the pre-saccadic enhancement would oscillate for targets with higher opacity levels, and what would be the benefit of this oscillatory pattern. The authors do not speculate too much on this and loosely relate it to feedback processes, which are characterized by neural oscillations in a similar range.

      Please see our response to ‘Weaknesses’.

      I still find this a loose connection and would suggest removing the following phrase from the abstract "Interestingly, the temporal frequency of these oscillations corresponded to the frequency range typically associated with neural feedback signaling". 

      We have removed this phrase.

      Finally, the authors should specify how much of this oscillation is due to oscillations in the Hit Rates (HR) of congruent trials vs. oscillations in the HR of incongruent trials, or both.

      We fitted separate polynomials to congruent and incongruent Hit Rates instead of their difference. Peaks in enhancement relied on both, oscillatory increases in congruent Hit Rates and simultaneous decreases in incongruent Hit Rates. In other words, enhancement peaks appear to reflect a foveal enhancement of target-congruent feature information along with a concurrent suppression of target-incongruent features. We added this paragraph and Figure 4 to the Results section.
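      Fitting congruent and incongruent Hit Rates separately can be sketched as ordinary least-squares polynomial fits. Because least squares is linear in the data, the difference of the two fits equals the fit to the difference (for the same time points and degree), so a peak in the fitted enhancement decomposes exactly into a congruent rise and an incongruent dip. The polynomial degree, the rescaling of time to [-1, 1], and the helper names are assumptions; the response does not state them.

```python
def polyfit(t, y, degree):
    """Least-squares coefficients c[0] + c[1]*t + ... via the normal equations."""
    n = degree + 1
    A = [[sum(ti ** (j + k) for ti in t) for k in range(n)] for j in range(n)]
    b = [sum(yi * ti ** j for ti, yi in zip(t, y)) for j in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, n))) / A[r][r]
    return c


def polyval(c, t):
    """Evaluate the fitted polynomial at time t."""
    return sum(ck * t ** k for k, ck in enumerate(c))
```

      The normal equations are adequate for low degrees on a rescaled time axis; for higher degrees, a QR-based solver such as `numpy.polynomial.polynomial.polyfit` is numerically safer.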

      Additional changes:

      Two figures had accidentally been labeled as Figure 5 in our first revision. We corrected the figure legends and all corresponding figure references in the text.

    1. eLife Assessment

      This important study reveals that the nucleolar protein Treacle undergoes liquid-liquid phase separation in vitro and in vivo. It provides convincing evidence that the ability of Treacle to form phase-separated condensates is necessary for the proper formation of the fibrillar center of the nucleolus, rRNA transcription, and rDNA repair. These findings will be of interest to the communities studying biomolecular condensates, nucleolar organization, and ribosome biogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Velichko et al. argues that the ability of the nucleolar protein Treacle to form phase-separated condensates is necessary for its function in nucleolar organization, rRNA transcription, and rDNA repair. These findings may be of interest to the communities studying biomolecular condensates, nucleolar organization, and ribosome biogenesis. The authors propose that Treacle's ability to undergo liquid-liquid phase separation is the key to its role as a scaffold for the fibrillar center (FC) of the nucleolus. The experiments in this study were designed and performed well, particularly the overexpression studies, which were done in the absence of endogenous protein and accounted for protein expression levels.

      Comments on revisions:

      I am satisfied with the authors' revisions; my earlier concerns have been addressed thoroughly, and the manuscript is considerably improved. This study is important for our understanding of the role of Treacle in nucleolar organization and function, as well as general principles of cellular compartmentalization that involve biomolecular condensates.

    3. Reviewer #2 (Public review):

      Summary:

      Velichko et al. investigate the role played by the long intrinsically disordered protein Treacle in nucleolar morphology and function, with an interest in its potential ability to undergo condensation. The authors explore Treacle's role in core functions of the nucleolus (rRNA biogenesis and DNA repair), which has been a subject of continual investigation since truncation of Treacle was identified as the primary genetic cause of Treacher Collins syndrome. They show that knockout of Treacle leads to de-mixing of canonical markers of the FC (UBF, RPA194) and DFC (FBL) phases of the nucleolus. They also show that replacing Treacle with mutants that either remove the central region of Treacle (∆83-1121) or reduce the segregation of charged residues by scrambling them (CS, charge-scrambled) results in different FRAP behavior of the condensates that form upon Treacle overexpression. These data give new insight into the role played by the charge-segregated central region of Treacle in its potential to undergo condensation.
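      The segregation of charged residues that the CS mutant reduces can be quantified with patterning metrics such as the κ parameter of Das and Pappu. The review does not say which metric, if any, was used, so the sketch below is illustrative: it computes only the unnormalized δ term (mean squared deviation of local charge asymmetry from the whole-sequence value over overlapping 5-residue blobs), counts K/R as positive and D/E as negative, and treats histidine as neutral. Full κ additionally divides δ by its maximum over all permutations of the sequence.

```python
POSITIVE = set("KR")  # assumption: His treated as neutral
NEGATIVE = set("DE")


def charge_asymmetry(segment):
    """sigma = (f+ - f-)^2 / (f+ + f-), defined as 0 for an uncharged segment."""
    n = len(segment)
    f_pos = sum(aa in POSITIVE for aa in segment) / n
    f_neg = sum(aa in NEGATIVE for aa in segment) / n
    charged = f_pos + f_neg
    return 0.0 if charged == 0 else (f_pos - f_neg) ** 2 / charged


def charge_delta(sequence, blob=5):
    """Mean squared deviation of blob asymmetry from whole-sequence asymmetry,
    over all overlapping blobs (the unnormalized delta of kappa)."""
    sigma = charge_asymmetry(sequence)
    blobs = [sequence[i:i + blob] for i in range(len(sequence) - blob + 1)]
    return sum((charge_asymmetry(b) - sigma) ** 2 for b in blobs) / len(blobs)
```

      For example, a blocky sequence such as EEEEEKKKKK scores far higher than the alternating EKEKEKEKEK, which is the direction in which a charge scramble is expected to move a segregated region.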

      Strengths:

      The characterization of changes to nucleolar morphology upon Treacle knockout is the main strength of this study. The authors' characterization of effects on the canonical markers of the FC and DFC phases supports the idea that Treacle has a scaffolding function. While the effect of Treacle perturbations has been studied before, this has often been investigated in the context of organismal development or rRNA biogenesis and less often at the sub-cellular level, as the authors have done here.

      Another strength of this study is its characterization of the effects of the charge scramble mutant. The authors find that replacing endogenous Treacle with this mutant reduces the bulk dynamics of Treacle as assessed by FRAP, de-mixes FBL from the DFC, lowers pre-rRNA synthesis, and abolishes the recruitment of the DNA-damage response factor TOPBP1.

      Weaknesses:

      The conclusion that Treacle is a core scaffold of the FC is weakly supported. Recombinant Treacle has intrinsic potential to condense, and its condensation responds to solution conditions as expected (i.e., condensates fail to form at high salt but do form in the presence of an aliphatic alcohol). It should be kept in mind that all proteins will condense at sufficiently high concentrations and under crowding. The authors observed condensation at 100 µM protein and 5% PEG8000.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides evidence that the protein Treacle plays an essential role in the structure and function of the fibrillar center (FC) of the nucleolus, which is surrounded by the dense fibrillar component (DFC) and the granular component (GC). The authors provide new evidence that, like the DFC and GC, the functional FC compartment involves a biomolecular condensate that contains Treacle as a key component. Treacle is essential for transcription of the rDNA as well as for proper rRNA processing, which the authors tie to a role in maintaining separation of FC components from the DFC. In vitro and in vivo experiments highlight that Treacle is itself capable of undergoing condensation in a manner that depends on concentration and charge-charge interactions, but is not affected by 1,6-hexanediol, which disrupts weak hydrophobic interactions. Attempting to generate separation-of-function mutants, the authors provide further evidence of complex interactions that drive proper condensation in the FC mediated by both the central repeat (low-complexity, likely driving the condensation) and C-terminal domain (which appears to target the specificity of the condensation to the proper location). Using mutant forms of Treacle defective in condensation, the authors provide evidence that these same protein forms are also disrupted in supporting Treacle's functions in rDNA transcription and rRNA processing. Last, the authors suggest that cells lacking Treacle are defective in the DNA damage response at the rDNA in response to VP16.

      Strengths:

      In general, the data are of high quality, the experiments are well-designed and the findings are carefully interpreted. The findings of the work complement prior high-impact studies of the DFC and GC that have identified constituent proteins as the lynchpins of the biomolecular condensates that organize the nucleolus into its canonical three concentric compartment structure and are therefore likely to be of broad interest. The attempts to generate separation-of-function mutants to dissect the contribution of condensation to Treacle function are ambitious and critical to demonstrating the relevance of this property to the biology of the FC. The methods applied to investigate Treacle function are complementary and appropriate, and the findings integrate well into a compelling narrative.

      Weaknesses:

      While the separation of function mutants of Treacle are a major strength of the work, further studies will be required to fully explore the relevance of Treacle condensation to the stability of the rDNA repeats.

    5. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      The interpretation of results obtained with opto-Treacle (related to Figure 2C) may be expanded.

      We thank the reviewer for their insightful comment regarding the interpretation of the results obtained with opto-Treacle. We understand the concern that the difference in the size of the condensates formed by opto-Treacle (Figure 2C) compared to Treacle-2S or other constructs may raise questions about the role of tetramerization in driving condensate formation, as 2S is known to tetramerize while FusionRed is not susceptible to multimerization.

      To address this concern, we emphasize that we have demonstrated that overexpressed Treacle forms large condensates even in the absence of any fluorescent protein, as included in the revised manuscript. This observation supports the conclusion that Treacle's ability to form condensates is intrinsic and does not depend on the multimerization capacity of the fluorescent tag.

      We believe that the observed difference in condensate size between opto-Treacle and Treacle-2S, Treacle-GFP, or untagged Treacle arises primarily from the time available for condensate assembly. Opto-Treacle condensation occurs rapidly, within approximately 10 seconds of blue light illumination, whereas Treacle-2S, Treacle-GFP, or untagged Treacle undergo condensation over the extended period of 24–48 hours of protein overexpression. This temporal difference likely accounts for the disparity in condensate size, as longer assembly times allow for larger and more mature condensates to form.

      Given this reasoning, we consider it unnecessary to further emphasize the size differences in the main text of the article, as we believe the underlying explanation is clear and supported by the data. Nonetheless, we are open to incorporating additional clarifications if the reviewer deems it necessary.

      The authors might reconsider referring to Treacle as a scaffold. Ultimately, the scaffold for the nucleolus is the rDNA with its bound proteins. Scaffold proteins, by definition, bind multiple protein partners and facilitate the formation of multiprotein complexes, a role not really attributed to homotypic LLPS.

      We thank the reviewer for raising this important point regarding the use of the term "scaffold" in relation to Treacle. We fully acknowledge that rDNA, along with its associated protein complexes, serves as the primary structural scaffold for the nucleolus. However, we believe that referring to Treacle as a scaffold is appropriate and justified within the specific context of our study.

      First, we emphasize that we describe Treacle as a scaffold specifically for nucleolar fibrillar centers (FCs), rather than for the nucleolus as a whole. This distinction is important, as our work focuses on the role of Treacle in organizing FC components, rather than the broader structural organization of the nucleolus.

      Second, as the reviewer notes, scaffold proteins are defined by their ability to bind multiple protein partners and facilitate the formation of multiprotein complexes. Our findings demonstrate that Treacle's condensation properties promote the binding and retention of key rDNA-associated protein partners, including RPA194, UBF, and Fibrillarin, within the FCs. This activity aligns with the functional definition of a scaffold protein, as Treacle supports the spatial organization and cooperative interactions of FC components essential for rRNA transcription and processing. Therefore, while we appreciate the reviewer's observation regarding the central role of rDNA as a nucleolar scaffold, we maintain that the use of the term "scaffold" to describe Treacle's role in organizing FCs is consistent with its demonstrated functional properties.

      If the authors decide to add an "Ideas and Speculation" subsection to their Discussion, it may be interesting to discuss the following outstanding questions: Does Treacle undergo homotypic or heterotypic LLPS? Does its overexpression favor homotypic interactions? How does it segregate the FC and DFC compartments? By exclusion? How does phase-separated Treacle interact with other proteins?

      We thank the reviewer for these insightful questions. While we believe that adding a dedicated "Ideas and Speculation" subsection would be redundant, we have already addressed the questions regarding Treacle’s homotypic or heterotypic LLPS and its interactions with other proteins in the revised "Discussion" section. Additionally, we have included a new section in the manuscript specifically focused on investigating the role of Treacle condensation in its interactions with protein partners, further expanding on these points.

      In Materials and Methods, smFISH section -"probes were designed as described (Yao et al, 2019) and labeled with FITS on the 3'ends" - was it meant to say FITC (i.e. Fluorescein)?

      We thank the reviewer for catching this error. This was indeed a typo, and we have corrected it to "FITC (i.e., Fluorescein)" in the revised text.

      Reviewer #2 (Recommendations for the Authors):

      Regarding recombinant Treacle, the main concern is that the authors may not be observing the condensation of Treacle itself. The quality of the purchased recombinant Treacle is unclear (this reviewer could not find Treacle listed on the vendor website despite using the supplied catalog number or various search terms). Furthermore, it is not clear whether the condensates observed are Treacle or potentially the Dextran crowder. Only small percentages (1%-5%) of either Dextran or PEG are needed to induce phase separation in two-component mixtures of these polymers, and PEG may be present in the Treacle storage buffer. In addition to clarifying the state of the recombinant Treacle, these concerns could be further assuaged by directly visualizing Treacle forming condensates (via fluorescent N-terminal tagging) and by filling in more of the phase space to observe the loss of condensates below a threshold concentration of Treacle. In general, the gold standard for establishing condensation of a given protein is mapping the full binodal phase diagram of the protein. Understanding that protein is a limited resource, most groups simply map the lower-concentration arm of the binodal, and this is sufficient to characterize a protein as having intrinsic condensation behavior. A similar mapping effort for Treacle would be welcomed.
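      The lower-concentration arm of the binodal mentioned above is often approximated by titrating protein at fixed buffer conditions and recording whether condensates appear. The sketch below brackets the saturation concentration from such yes/no observations for one condition; repeating it across, for example, salt concentrations traces out the low-concentration arm. The geometric midpoint (suited to a roughly log-spaced dilution series) and the helper name are assumptions, not part of the manuscript.

```python
from math import sqrt


def bracket_csat(observations):
    """observations: dict mapping protein concentration (uM) -> True if condensates
    were seen at that concentration, for one fixed buffer/crowder condition.
    Returns (highest clear conc., lowest conc. with condensates, estimate) or None."""
    clear = [c for c, seen in observations.items() if not seen]
    droplets = [c for c, seen in observations.items() if seen]
    if not clear or not droplets:
        return None  # the titration does not bracket the threshold
    lo, hi = max(clear), min(droplets)
    if lo > hi:
        raise ValueError("non-monotonic titration: condensates below a clear point")
    return lo, hi, sqrt(lo * hi)  # geometric midpoint for a log-spaced series
```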

      We thank the reviewer for their thoughtful comments and for highlighting concerns regarding the interpretation of our experiments with commercial recombinant Treacle. We recognize the importance of ensuring that the observed condensation properties are intrinsic to Treacle and not influenced by potential contaminants, storage buffer components, or tags on the protein.

      To address these concerns, we have re-evaluated the condensation properties of Treacle using a recombinant fragment independently purified in our laboratory. Specifically, we expressed and purified a Treacle fragment (amino acids 291–426), which includes two S/E-rich low-complexity regions (LCRs) and two linker regions, in E. coli. The protein was expressed as a TEV-cleavable maltose-binding protein (MBP) fusion, purified under native conditions via amylose resin, and subjected to TEV cleavage. This was followed by ion-exchange chromatography and extensive dialysis to remove any remaining impurities. These additional steps ensured that the purified Treacle fragment was of high purity and free from confounding components, such as polyethylene glycol (PEG). We have included detailed descriptions of this protocol in the revised manuscript.

      Using this purified Treacle fragment, we confirmed its intrinsic condensation behavior in vitro. In the presence of 5% PEG8000 as a crowding agent, the fragment formed liquid-like condensates that exhibited spherical morphology and dynamic fusion events, key hallmarks of liquid-liquid phase separation (LLPS). Additionally, we demonstrated that the condensation of this Treacle fragment was sensitive to changes in pH and salt concentration but unaffected by 1,6-hexanediol treatment, suggesting that the condensates are stabilized predominantly by electrostatic interactions (Fig. 4B of the revised manuscript). Importantly, these findings provide robust evidence that Treacle possesses intrinsic phase-separation properties. All results from the commercial Treacle protein used in the initial version of the manuscript have been replaced with data obtained using this independently purified recombinant fragment.

      We understand that the condensation behavior of the fragment may not fully capture the behavior of full-length Treacle. Nevertheless, the in vitro experiments provide valuable mechanistic insights into the biophysical properties of Treacle. Furthermore, as emphasized in the revised manuscript, our study primarily focuses on understanding the condensation and functional role of Treacle in a cellular context, where we observe its critical involvement in organizing nucleolar structure and regulating rRNA transcription. These cellular experiments highlight the biological relevance of Treacle’s condensation behavior.

      With regard to mapping the binodal phase diagram of Treacle, we concur with the reviewer that such an effort would be ideal for a more comprehensive characterization of Treacle’s condensation properties. However, the limited availability of purified protein currently precludes a detailed mapping effort. Despite this limitation, we believe the qualitative assessments of Treacle’s condensation under varying conditions, now included in the revised manuscript, sufficiently demonstrate its intrinsic ability to phase-separate.

      In conclusion, we are grateful for the reviewer’s feedback, which has allowed us to refine our methodology and strengthen the evidence supporting the intrinsic condensation properties of Treacle. We are confident that the revised manuscript provides a robust and thorough characterization of Treacle’s phase-separation behavior and its functional role in the cell, addressing the reviewer’s concerns. Thank you for your constructive recommendations, which have significantly improved the quality of our work.

      Replacing 'liquid-phase' and 'liquid' with 'liquid-like' would make the language consistent with other papers in the field and more accurately reflect the degree of material state analysis carried out in the study.

      We thank the reviewer for this insightful recommendation. In response to the suggestion, we have revised the manuscript to replace the terms "liquid-phase" and "liquid" with "liquid-like" throughout the text. This change ensures consistency with terminology commonly used in the field and more accurately reflects the degree of material state analysis performed in our study. We believe this adjustment improves the clarity and precision of our findings, aligning the manuscript with standard practices in the field. Thank you for helping us enhance the quality of the presentation.

      The 'unclear' nature of the condensation behavior of the FC phase of the nucleolus is listed as a motivation for carrying out the study in the introduction; the authors could note here two recent papers that have investigated the nature of FC condensation: Jaberi-Lashkari et al. 2023 and King et al. 2024. The reviewer notes that while these were both pre-printed in late 2022, they were only recently published.

      We thank the reviewer for bringing these recent studies to our attention. In response to the suggestion, we have cited the papers by Jaberi-Lashkari et al. (2023) and King et al. (2024) in both the introduction and discussion sections of the revised manuscript. These references are highly relevant to the context of our study and provide valuable insights into the condensation behavior of the FC phase of the nucleolus. We agree that incorporating these works strengthens the framing of our study and situates it more effectively within the broader field. Thank you for this constructive recommendation.

      The statement that Treacle is "the main molecule present in the FC" is a substantial claim that does not need to be made to promote the authors' case, nor is it well supported by the provided reference (Gal et al., 2022).

      We thank the reviewer for pointing out this overstatement in our original manuscript. In response, we have revised the text to provide a more accurate and well-supported description. Specifically, we have replaced the claim that Treacle is "the main molecule present in the FC" with a statement highlighting its direct interactions with UBF and RNA Pol I, as well as its colocalization with these proteins within the FC. This revision ensures alignment with the provided references and more accurately reflects the current understanding of Treacle's role in the FC. We appreciate the reviewer's attention to this detail, which has helped us improve the clarity and accuracy of our manuscript.

      The statement that "Treacle is one of the most intrinsically disordered proteins" is vague and unnecessarily grand. Treacle is a fully intrinsically disordered protein; these comprise 5% of the human proteome (Tsang et al. 2020), so Treacle is, indeed, unusual in that regard.

      We thank the reviewer for highlighting the vague and unnecessarily broad nature of the original statement. In response, we have revised the text to provide a more precise and accurate description of Treacle's structural properties. Specifically, we replaced the claim that "Treacle is one of the most intrinsically disordered proteins" with the statement that "According to protein structure predictors (e.g., AlphaFold, IUPred2, PONDR, and FuzDrop), Treacle is a fully intrinsically disordered protein." This wording reflects the unique nature of Treacle while remaining scientifically accurate and supported by reliable computational predictions. We appreciate the reviewer's feedback, which has allowed us to improve the rigor and clarity of our manuscript.

      A comment on the implications of the immobile pool of Treacle (which appears to be ~50% in WT and across a range of mutants) would be welcome. Additionally, the limitations of FRAP for interrogating material properties of condensed material in living systems are provided in Goetz and Mahamid, 2020. In this paper, the authors review instances where the ultrastructure of condensate is known and where FRAP data is available. They show that crystalline assemblies can recover faster than apparently liquid, spherical assemblies. A comment in the text about how these limitations apply to this study would be welcome.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the immobile pool of Treacle and the limitations of FRAP for characterizing material properties in living systems. As noted in our response to the public review, we believe the ~50% recovery rate after photobleaching observed in our experiments is best explained by the redistribution of Treacle molecules within the condensate, rather than significant exchange with the surrounding phase. This interpretation is strongly supported by the full- and half-FRAP analyses included in the revised manuscript, which demonstrated internal mixing dynamics within the condensates.

      There appears to be a typo in the following sentence: "The highly positively charged CD serves as the nucleation center for RD but exhibits ambivalent phase properties, transitioning from LLPS to LSPS in the absence of rRNA." The LLPS to LSPS behavior was observed for mutants to the central domain (RD), not the C-terminal domain (CD).

      Throughout, the authors report single snapshots of representative cells and single line traces. Analysis of the key morphological feature across the population of cells would help the reader understand how widespread the observed phenotype is.

      We thank the reviewer for raising this important point regarding the representation of morphological features across the cell population. To address this concern, we have included widefield micrographs of cell fields in the revised figures to provide a more comprehensive view of the phenotypes observed.

      The statement that "The phase behavior of polymers is determined by interactions through associative motifs, referred to as stickers, separated by spacers, which are not the primary driving forces for phase separation" could be improved by pointing out that this is potentially incomplete for describing the kind of condensation that highly charged polymers undergo. The high charge and charge segregation of Treacle suggest that it is a blocky polyampholyte and that it condenses by coacervation. Models of associative polymers can be useful for describing coacervation; however, the driving forces for coacervation are less understood and have been proposed to include an entropic component (see Sathyavageeswaran et al. 2024, Sing and Perry 2020 and work from their groups as well as the Obermeyer (Columbia) and Tirrell (U. Chicago) groups).

      We thank the reviewer for highlighting this important aspect of the phase behavior of charged polymers and for suggesting relevant references. In response, we have revised the discussion section of the manuscript to include a more nuanced explanation of the condensation mechanisms for highly charged polymers such as Treacle. Specifically, we now describe Treacle as a blocky polyampholyte, suggesting that its condensation behavior may be driven by coacervation mechanisms. The relevant references have been added to the discussion section of the revised manuscript.

      In addition to the above, the authors may consider citing two recent publications from the Pappu group (King et al. Cell 2024 and King et al. Nucleus 2024) that directly investigate the condensation potential of K-rich and E/D-rich 'grammars' on nucleolar proteins and show that, like the authors, the K-rich region is essential for localization and is conserved across nucleolar proteins.

      We thank the reviewer for bringing these relevant publications to our attention. The suggested references from the Pappu group (King et al., Cell 2024, and King et al., Nucleus 2024) have been added to the introduction and discussion sections of the revised manuscript, and their findings have been appropriately integrated into our analysis.

      The authors could consider replacing the use of LLPS with a more generic term such as "condensation" or "biomolecular condensation." LLPS of polymers is a segregative transition driven by its incompatibility with the surrounding solvent. As indicated, Treacle is likely to be undergoing some form of coacervation (which is predominantly an associative transition), which can be generically described as condensation. See Pappu et al. 2023 for more details.

      We thank the reviewer for their insightful suggestion. Following the reviewer's recommendation, we have replaced the term "LLPS" with "condensation" or "coacervation" throughout the manuscript, where appropriate. Additionally, we have referenced Pappu et al. (2023) and other studies to provide further context and clarity regarding the distinctions between these terms.

      The authors cite Yao et al. 2019, but do not cite the follow-up study (Wu et al. 2021) or provide a statement on how the Chen group finds a role for the RGG domain of FBL in keeping certain canonical markers of the FC and DFC de-mixed.

      We thank the reviewer for pointing out these important references. The relevant citations, including Wu et al. (2021), have been added to the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      The following comment is true but could be broadened to include examples of structured regions promoting biomolecular condensation. "In biological systems, phase separation is mainly a characteristic of multivalent or intrinsically disordered proteins (Banani et al, 2017; Shin & Brangwynne, 2017; Uversky, 2019)."

      We have expanded the statement as recommended by the reviewer: "In biological systems, phase separation is facilitated by a combination of multivalent interactions mediated by intrinsically disordered proteins and site-specific interactions that drive percolation."

      Related to Figure 1.

      The authors report Treacle-dependent EU incorporation (Figure 1D), but are there any changes more broadly to nucleolar number or size as a consequence? How do the authors interpret the observation that the quantitative effect of AMD treatment is more extreme than that of Treacle depletion (Figure 1E)?

      We thank the reviewer for raising these important points. Regarding nucleolar number and morphology, we did not observe a change in the number of nucleoli upon Treacle depletion. However, nucleoli appeared more regularly rounded under these conditions, which we interpret as a consequence of the decreased rDNA transcription activity caused by Treacle depletion. A similar rounding of nucleoli is also observed upon actinomycin D (AMD) treatment, which is consistent with reduced transcriptional activity.

      As for the more pronounced effect of AMD compared to Treacle depletion on EU incorporation, this can be explained by the fundamentally different mechanisms through which these conditions affect transcription. Treacle depletion reduces the local concentration of transcription factors at rDNA sites, thereby impairing transcription initiation and elongation to a certain extent. However, under Treacle depletion, RNA polymerase I still retains the ability to bind to the promoter and support a residual level of transcription. In contrast, AMD acts as a potent intercalator in GC-rich regions of rDNA, physically blocking the ability of RNA polymerase I to move along rDNA, resulting in near-complete cessation of rRNA synthesis.

      Related to Figure 2.

      The authors observe that AMD leads to coalescence of individual Treacle-2S+ bodies (e.g. Figure 2E) - does this suggest that ongoing rRNA transcription is required to prevent such events?

      Thank you for your thoughtful question. Indeed, our observations strongly suggest that ongoing rRNA transcription is required to prevent the coalescence of Treacle-2S+ bodies, as observed upon AMD treatment. This interpretation aligns with the findings of Tetsuya Yamamoto et al., who demonstrated that nascent ribosomal RNA (pre-rRNA) acts as a surfactant to suppress the growth and fusion of fibrillar centers (FCs) in the nucleolus. Their work highlighted that nucleolar condensates formed via liquid-liquid phase separation (LLPS) tend to grow to minimize surface energy, provided sufficient components are available. However, the transcription of pre-rRNA stabilizes FCs by maintaining multiple microphases, preventing coalescence unless transcription is inhibited.

      According to Yamamoto et al., nascent pre-rRNAs tethered to FC surfaces by RNA Polymerase I generate lateral pressure that counteracts interfacial tensions, effectively suppressing FC fusion. This activity is analogous to the surfactant properties of molecules in physical systems. When transcription is inhibited (e.g., by AMD), the loss of nascent rRNA allows condensates to coalesce, consistent with the behavior we observe.

      We further propose that the AMD-induced coalescence of Treacle-2S+ bodies reflects the loss of this surfactant-like effect, as transcriptional activity ceases. This theory is also supported by the observation that Treacle condensates in the nucleoplasm, where rRNA transcription is absent, form larger structures. Collectively, these insights highlight the critical role of ongoing rRNA transcription in maintaining the structural integrity and dynamic organization of nucleolar substructures.

      Related to Figure 3.

      In the figure panels B-H the DAPI signal in gray obscures the Treacle localization, especially in Figure 3H. A non-merged image for each of these examples for the Treacle localization would be very helpful.

      We thank the reviewer for this observation. To address this, we have included wide-field images without the DAPI overlay for the deletion mutant lacking the 1121-1488 region. These are now presented in Supplementary Figure S5G of the revised manuscript.

      Related to Figure 5.

      Only a single representative nucleus is shown in the PLA analysis presented in Figure 5B.

      Quantification to assess the robustness of this response with the addition of VP16 is needed. The authors use ChIP and immunocytochemistry as orthogonal methods, so it would be best to show both for each manipulation that is performed; the immunostaining of TOPBP1 in the Treacle KD cells in S5A should be in the main Figure 5 to complement transfection of constructs as in Figure 5D.

      We appreciate the reviewer’s comment. To address this, we performed a quantitative analysis of PLA fluorescence signals in control and etoposide-treated cells, and the results are now presented in Supplementary Figure S8C. Additionally, as recommended, we have transferred the results of the immunocytochemistry of TOPBP1 in Treacle KD and Treacle KN cells to the main figure, now included as Figures 7D-E in the revised manuscript.

    1. eLife Assessment

      This is important work and provides a significant advance in our understanding of mechanosensation in the epidermis. The evidence presented is convincing and, barring a few minor weaknesses, strongly implicates activation of epidermal cells and store-operated calcium entry in the activation of nociceptive neurons innervating that tissue. This work will be of broad interest to neurobiologists, epithelial cell biologists, and mechanobiologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this meticulously conducted study, the authors show that Drosophila epidermal cells can modulate escape responses to noxious mechanical stimuli. First, they show that activation of epidermal cells evokes many types of behaviors including escape responses. Subsequently, they demonstrate that most somatosensory neurons are activated by activation of epidermal cells, and that this activation has a prolonged effect on escape behavior. In vivo analyses indicate that epidermal cells are mechanosensitive and require the store-operated calcium channel Orai. Altogether, the authors conclude that epidermal cells are essential for nociceptive sensitivity and sensitization, serving as primary sensors of noxious stimuli.

      Strengths:

      The manuscript is clearly written. The experiments are logical and complementary. They support the authors' main claim that epidermal cells are mechanosensitive and that epidermal mechanically evoked calcium responses require the store-operated calcium channel Orai. Epidermal cells activate nociceptive sensory neurons as well as other somatosensory neurons in Drosophila larvae, and thereby prolong escape rolling evoked by mechanical noxious stimulation.

      Weaknesses:

      In several places the text is unclear. For example, core details are missing in the protocols, including the level of LED intensity used, which are necessary for other researchers to reproduce the experiments. Secondly, the rationales are missing for some experiments (for experiments X, Y, and Z). It would be helpful to clarify for your readers why the experiments (for example Figure 3S2) were performed. Finally, for most experiments, the epidermal cells are activated for 60 s, which is long when considering that nocifensive rolling occurs on a timescale of milliseconds. It would be informative to know the shortest duration of epidermal cell activation that is sufficient for observing the behavioral phenotype (prolongation of escape behavior) and activation of sensory neurons.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide compelling evidence that stimulation of epidermal cells in Drosophila larvae results in the stimulation of sensory neurons that evoke a variety of behavioral responses. Further, the authors demonstrate that epidermal cells are inherently mechanoresponsive and implicate a role for store-operated calcium entry (mediated by Stim and Orai) in the communication to sensory neurons.

      Strengths:

      The study represents a significant advance in our understanding of mechanosensation. Multiple strengths are noted. First, the genetic analyses presented in the paper are thorough with appropriate consideration to potential confounds. Second, behavioral studies are complemented by sophisticated optogenetics and imaging studies. Third, identification of roles for store-operated calcium entry is intriguing. Lastly, conservation of these pathways in vertebrates raises the possibility that the described axis is also functional in vertebrates.

      Weaknesses:

      The study has a few conceptual weaknesses that are arguably minor. The involvement of store-operated calcium entry implicates ER calcium store release. Whether mechanical stimulation evokes ER calcium release in epidermal cells and how this might come about (e.g., which ER calcium channels, roles for calcium-induced calcium release etc.) remains unaddressed. On a related note, the kinetics of store-operated calcium entry are very distinct from those required for SV release. The link between SOC and epidermal cell-to-neuron transmission is not reconciled. Finally, it is not clear how optogenetic stimulation of epidermal cells results in the activation of SOC.

      Revised manuscript:

      The authors have adequately addressed my original concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary.

      In this meticulously conducted study, the authors show that Drosophila epidermal cells can modulate escape responses to noxious mechanical stimuli. First, they show that activation of epidermal cells evokes many types of behaviors including escape responses. Subsequently, they demonstrate that most somatosensory neurons are activated by activation of epidermal cells, and that this activation has a prolonged effect on escape behavior. In vivo analyses indicate that epidermal cells are mechanosensitive and require the store-operated calcium channel Orai. Altogether, the authors conclude that epidermal cells are essential for nociceptive sensitivity and sensitization, serving as primary sensors of noxious stimuli.

      Strengths.

      The manuscript is clearly written. The experiments are logical and complementary. They support the authors' main claim that epidermal cells are mechanosensitive and that epidermal mechanically evoked calcium responses require the store-operated calcium channel Orai. Epidermal cells activate nociceptive sensory neurons as well as other somatosensory neurons in Drosophila larvae, and thereby prolong escape rolling evoked by mechanical noxious stimulation.

      Weaknesses.

      Core details are missing in the protocols, including the level of LED intensity used, which are necessary for other researchers to reproduce the experiments. For most experiments, the epidermal cells are activated for 60 s, which is long when considering that nocifensive rolling occurs on a timescale of milliseconds. It would be informative to know the shortest duration of epidermal cell activation that is sufficient for observing the behavioral phenotype (prolongation of escape behavior) and activation of sensory neurons.

      (1) We agree with the reviewer that the LED intensity is an important detail of the experimental paradigm. We updated the methods to include intensity measurements for the stimuli used throughout the manuscript.

      (2) The Reviewer asks about the shortest duration of epidermal cell activation sufficient for observing the behavior phenotype. We note in the manuscript that behavioral responses to optogenetic epidermal stimulation are apparent within 2 seconds of stimulus (see Figure 2F); this is consistent with our calcium imaging data in which C4da response reaches its maximum within 2-3 sec of stimulation.

      Reviewer #1 (Recommendations):

      (1) The epidermal cells in this study are activated for 60 s. In the real world, the nociceptive stimulation (a poke, such as penetration by the ovipositor of a parasitic wasp) that evokes escape rolling is short. Does optogenetic activation of 1 s or less still evoke rolling? For example, it is unclear in Figure 4K how long the epidermal cells need to be activated before the poke stimulus prolongs rolling. Is it possible to test behavior and GCaMP activity in sensory neurons when epidermal cells are briefly (1 second) activated?

      As described above, behavioral responses to optogenetic epidermal stimulation are apparent within 2 seconds of stimulus (see Figure 2F); this is consistent with our calcium imaging data in which C4da response reaches its maximum within 2-3 sec of stimulation. The kinetics are consistent with a role for epidermal cells in modulating neuronal responses to nocifensive stimuli, and similar to the response kinetics observed in mammalian epidermal cells that modulate neuronal touch and pain responses (Maksimovic et al., 2014; Woo et al., 2014; Mikesell et al., 2022).

      (2) The protocol for optogenetic screening states that the authors used a 488-nm LED. Why was a 488-nm LED used instead of the 610-nm LED for Chrimson activation? No information (except figure 4K) about the light intensity is provided in the figure legend or the protocol section. Please state the LED intensity used for all optogenetic experiments (GCaMP imaging, behavioral experiments, etc.).

      We used 488 nm light for the initial screen for technical reasons. The screen was conducted by students at the MBL Neurobiology course (hence the affiliation; student authors are included in the manuscript), and the only LED available to us at that time delivered insufficient illumination at longer wavelengths to be useful. We chose to include the students’ data because (1) we found that the 488 nm light alone did not induce rolling in our setup, (2) we repeated and extended the studies with the epidermal drivers using a higher resolution imaging platform and longer wavelength stimulation (all studies other than Fig. 1), and (3) we observed qualitatively similar results when we repeated stimulation with all drivers using 561 nm light.

      We agree that the LED intensity is an important detail of the experimental paradigm. We updated the methods to include intensity measurements for the stimuli used throughout the manuscript. We also include the intensities here:

      - 30 μW/mm^2 for calcium imaging experiments Fig 3B-E, Fig 4A, Fig 3S1A-D, Fig 4S1A

      - 300 μW/mm^2 for behavior studies in Fig 2B-E, Fig 1S6, Fig 2S1, Fig 3E-F, Fig 3S2A-C

      - 25 μW/mm^2 for behavior studies in Fig 4E-J

      - 1.16 μW/mm^2 for behavior studies in Fig 4K

      (3) Lines 150 - 152: Although the authors refer to "a stereotyped behavior sequence" in Fig 2D, there are no data supporting this claim in Fig 2. Rather, the data appear to represent proportions of different types of behavior at each time point, rather than behavior sequences. If the authors wish to claim that the data show stereotyped behavior sequences, they should analyze the data using a different method (e.g., Markov models).

      We agree that in the absence of additional analysis we should avoid commenting on stereotypy of behavior sequences; we therefore adjusted the text to reflect the tendency of nociceptive behaviors to precede non-nociceptive behaviors. The raster plots shown in Supplemental Fig. 2A illustrate this point: in larvae exhibiting nociceptive behaviors, these behaviors appear first, followed by backing and frequently freezing. As one quantitative readout of this sequence we show that the latency of rolling (nociceptive) is shorter compared with backing or freezing (non-nociceptive) (Fig. 2F, Fig. S2G).

      (4) Figure 3A-E: a cursory glance at the data suggests that the most responsive sensory neurons are C1da, with all sensory neurons activated. However, at the behavioral level, only some sensory neurons are activated. If all sensory neurons were activated by Chrimson, what behavioral phenotypes would the authors expect to see? Would it be the same as epidermal activation?

      The Reviewer raises an interesting question, but we intentionally avoid comparing the response properties among sensory neurons because of differences in driver strength. Likewise, extrapolating “activation” at the behavioral level is exceedingly difficult if/when multiple sensory neurons are simultaneously activated. In response to the Reviewer’s specific question, when all da neurons are activated simultaneously, larvae largely exhibited hunching rather than rolling (Hwang et al., 2007). We find that epidermal stimulation rarely elicits hunching; instead, epidermal stimulation generally triggers nocifensive behaviors followed by non-nocifensive behaviors such as backing and freezing, suggesting an order or priority in neurons activated by epidermal cells (or different response times). Defining the mechanisms by which epidermal cells communicate with different types of sensory neurons is therefore a top priority for future studies.

      (5) Figure 3S2: The behavior phenotypes in Fig. 3E, F and Fig. 3S2 seem slightly different. I suggest adding some comments on how behavior phenotypes differ depending on the GAL4 used. Specifically, is there increased freezing in some genotypes (e.g., ppk-LexA or NompC-lexA)? Can you show this without TNT data? Is this a background effect or a specific GAL4 phenotype?

      We currently do not have the driver-only control for this experiment, but our effector-only control experiment (see Fig. 3S2A) suggests that larvae carrying the AOP-TNT insertion exhibit enhanced nociceptive behavioral responses. This point is addressed in our manuscript by the following (copied from the figure legend):

      “We note that although baseline rolling probability is elevated in all genetic backgrounds containing the AOP-LexA-TnT insertion, silencing C4da and C3da neurons significantly attenuates responses to epidermal stimulation.”

      (6) Calcium-free solution is used in Figure 3. Why do the authors still observe calcium influx? Does this mean that internal calcium stores are released? If so, does the calcium influx represent an action potential? How do the authors focus their LED stimulation to activate epidermal cells and avoid activation of the imaging laser?

      The specimens were imaged in calcium-free solution to minimize movement artifacts. However, the CNS is wrapped by glial cells and over short timescales such as those used for the imaging we speculate that extracellular calcium persists in the CNS.

      (7) It is unclear when animals begin to crawl after the epidermal cells are mechanically stimulated. How do the authors distinguish between peristaltic crawling and a poke by Orai receptors? Although the in vitro experiments beautifully show radial tensions, it is unclear to what extent A-P axis tension (peristaltic crawling) and radial tension (poke) differ. It might be helpful to explain in the discussion section how epidermal cells are selectively activated.

      The Reviewer raises an interesting question about the types and thresholds of forces required to elicit epidermal responses. We cannot eliminate the possibility that peristaltic crawling (or crawling through a 3D substrate) stimulates epidermal cells to a certain degree. Indeed, our results demonstrate a dose-dependent response of Drosophila epidermal cells and human keratinocytes to radial stretch. However, we do not have any information about selectivity in response to different stimuli, though we agree that this is an intriguing avenue for future studies. For example, we do not know whether stretch-responsive cells are more or less responsive to poke. However, a salient feature of our studies is the recruitment of greater numbers of responders with increasing stimulus intensity; we therefore added the following statement to the discussion to clarify our model:

      “Finally, we find that epidermal cells exhibit a dose-dependent response to radial stretch; we therefore anticipate that the output of epidermal cells is likewise dependent on the stimulus intensity. Hence, rather than a fixed threshold beyond which epidermal cells are selectively activated, we hypothesize that increasing stimulus intensities drive increasing signal outputs to neurons.”

      (8) Some Protocols are missing. For example, in Figure 4, many stimulus combinations were used to test behavior. How were stimuli of different modalities applied to the animals? Further details need to be provided in the protocols.

      We thank the Reviewer for identifying this oversight. The methods section of our original submission detailed most of the stimulus combinations but omitted the opto + mechano combination (4F). We updated our methods to correct these omissions.

      (9) It might be helpful if the authors could provide a sample video for each behavior to clarify how they were each defined.

      Our manuscript includes a table with a detailed description of the behaviors (Table S2), and we added two annotated videos that show representative behavioral responses to optogenetic nociceptor or epidermis stimulation.

      (10) A supplementary summary table of genotypes might be helpful for the reader.

      Experimental genotypes are provided in the figure legends, and a detailed list of all alleles used in the study as well as their source is provided in supplemental table S1.

      Reviewer #2 (Public Review):

      Summary.

      The authors provide compelling evidence that stimulation of epidermal cells in Drosophila larvae results in the stimulation of sensory neurons that evoke a variety of behavioral responses. Further, the authors demonstrate that epidermal cells are inherently mechanoresponsive and implicate a role for store-operated calcium entry (mediated by Stim and Orai) in the communication to sensory neurons.

      Strengths.

      The study represents a significant advance in our understanding of mechanosensation. Multiple strengths are noted. First, the genetic analyses presented in the paper are thorough, with appropriate consideration of potential confounds. Second, behavioral studies are complemented by sophisticated optogenetics and imaging studies. Third, the identification of roles for store-operated calcium entry is intriguing. Lastly, conservation of these pathways in vertebrates raises the possibility that the described axis is also functional in vertebrates.

      Weaknesses.

      The study has a few conceptual weaknesses that are arguably minor. The involvement of store-operated calcium entry implicates ER calcium store release. Whether mechanical stimulation evokes ER calcium release in epidermal cells and how this might come about (e.g., which ER calcium channels, roles for calcium-induced calcium release etc.) remains unaddressed. On a related note, the kinetics of store-operated calcium entry is very distinct from that required for SV release. The link between SOC and epidermal cells-neuron transmission is not reconciled. Finally, it is not clear how optogenetic stimulation of epidermal cells results in the activation of SOC.

      (1) The involvement of store-operated calcium entry implicates ER calcium store release. Whether mechanical stimulation evokes ER calcium release in epidermal cells and how this might come about (e.g., which ER calcium channels, roles for calcium-induced calcium release etc.) remains unaddressed.

      Our studies suggest that mechanically evoked responses in epidermal cells involve both ER calcium release and store-operated calcium entry. Notably, we show that depletion of ER calcium stores before mechanical stimulation, by treating with thapsigargin, reduces (but does not eliminate) mechanically evoked calcium responses in fly epidermal cells (Fig. 6C-6F). Likewise, fly epidermal cells and human keratinocytes both exhibit mechanically evoked calcium responses in the absence of extracellular calcium (10 mM EGTA to chelate all free calcium ions). These data support a model whereby mechanical stimuli trigger both calcium release from ER stores and calcium influx. Indeed, several cell types have been shown to display mechanically evoked release of calcium from stores. For example, mechanical stimulation of enteroendocrine cells of the gut epithelium results in both calcium release from ER stores and calcium influx across the plasma membrane (Knutson et al., 2023). Similar to our findings, Knutson et al. found that depleting stores decreased mechanically evoked calcium signals by over 70% in these gut epithelial cells. In our revised manuscript we have more clearly emphasized these points.

      We agree with the reviewer that deciphering the mechanisms by which mechanical stimuli promote ER calcium release and subsequent store-operated calcium entry is an exciting topic to explore. One potential mechanism is the activation of a mechanosensitive receptor that promotes calcium release from the ER via calcium-induced calcium release or IP3 production, as has been proposed for enteroendocrine cells. A recent paper demonstrated that the ER itself is mechanosensitive and that mechanical stimuli promote calcium release via the opening of calcium-permeable ion channels in the ER membrane (Song et al., 2024). Determining the relative contributions of store-operated calcium entry and ER calcium release, and deciphering their underlying mechanisms, will require a thorough investigation of ER calcium channels and receptors; we therefore believe this is beyond the scope of the present manuscript and merits publication on its own. However, we now include this in our discussion as an exciting new direction we aim to pursue.

      (2) The kinetics of store-operated calcium entry is very distinct from that required for SV release. The link between SOC and epidermal cell-neuron transmission is not reconciled.

      The Reviewer raises an interesting point regarding the mode of epidermal cell-neuronal communication. We demonstrated a requirement for dynamin-dependent vesicle release from epidermal cells in mechanical sensitization. However, the nature of the vesicular pool, the mode and kinetics of release, and the type of neuromodulator released remain to be characterized. Hence, it is not clear that the kinetics of synaptic vesicle release are an appropriate comparison. Our studies do demonstrate that behavioral responses to optogenetic epidermal stimulation are relatively slow (on the order of seconds), which is compatible with the kinetics of store-operated calcium entry. Furthermore, the primary functional output we define for epidermal mechanosensory responses, mechanical nociceptive sensitization, is apparent 10 s after the stimulus and persists for minutes in our behavior assays. Consistent with this model, studies of the mammalian touch dome have shown that touch-sensitive Merkel cells secrete neurotransmitters to modulate neurons and promote sustained action potential firing on a similar timescale. Likewise, mechanically evoked ER calcium release promotes sustained secretion of serotonin from enterochromaffin cells.

      (3) It is not clear how optogenetic stimulation of epidermal cells results in the activation of SOC.

      We appreciate the opportunity to clarify our results. We demonstrate that optogenetic epidermal stimulation elicits behavioral responses in larvae and calcium responses in somatosensory neurons, but we do not claim that optogenetic epidermal stimulation elicits SOC. Our optogenetic studies demonstrate the capacity for epidermal stimulation to modulate somatosensory function, but we characterize contributions of SOC only to mechanical stimuli, which are more physiologically relevant. However, it is worth noting that CsChrimson is a calcium-permeable channel, suggesting that an increase in intracellular calcium may trigger epidermal-evoked neuronal responses and behaviors during optogenetic stimulation.

      References

      Hwang, RY, Zhong, L, Xu, Y, Johnson, T, Zhang, F, Deisseroth, K, and Tracey, WD (2007). Nociceptive neurons protect Drosophila larvae from parasitoid wasps. Curr Biol 17, 2105–2116.

      Knutson, KR, Whiteman, ST, Alcaino, C, Mercado-Perez, A, Finholm, I, Serlin, HK, Bellampalli, SS, Linden, DR, Farrugia, G, and Beyder, A (2023). Intestinal enteroendocrine cells rely on ryanodine and IP3 calcium store receptors for mechanotransduction. J Physiol 601, 287–305.

      Maksimovic, S, Nakatani, M, Baba, Y, Nelson, AM, Marshall, KL, Wellnitz, SA, Firozi, P, Woo, S-H, Ranade, S, Patapoutian, A, et al. (2014). Epidermal Merkel cells are mechanosensory cells that tune mammalian touch receptors. Nature 509, 617–621.

      Mikesell, AR, Isaeva, O, Moehring, F, Sadler, KE, Menzel, AD, and Stucky, CL (2022). Keratinocyte PIEZO1 modulates cutaneous mechanosensation. Elife 11, e65987.

      Song, Y, Zhao, Z, Xu, L, Huang, P, Gao, J, Li, J, Wang, X, Zhou, Y, Wang, J, Zhao, W, et al. (2024). Using an ER-specific optogenetic mechanostimulator to understand the mechanosensitivity of the endoplasmic reticulum. Dev Cell 59, 1396–1409.e5.

      Woo, S-H, Ranade, S, Weyer, AD, Dubin, AE, Baba, Y, Qiu, Z, Petrus, M, Miyamoto, T, Reddy, K, Lumpkin, EA, et al. (2014). Piezo2 is required for Merkel-cell mechanotransduction. Nature 509, 622–626.

    1. eLife Assessment

      This valuable study proposes using a rigorous computational model to assess memory deficits in Alzheimer's Disease with the goal of developing an early diagnosis tool for the disease. Using an established mouse model of the disease, the authors studied multiple behavioral tasks and ages with the goal of showing similarities in behavioral deficits across tasks. Using the model, the authors indicate specific deficits in memory (overgeneralization and overdifferentiation) in mice with the transgene for the disease. However, the evidence presented is incomplete as certain concerns remain regarding the interpretation of the behavioral results and the validation of the model fit.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show certain memory deficits in a mouse knock-in model of Alzheimer's Disease (AD). They show that the observed memory deficits can be explained by a computational model, the latent cause model of associative memory. The memory tasks used include the fear memory task (CFC) and the 'reverse' Barnes maze. Research on AD is important given its known huge societal burden. Likewise, better characterization of the behavioral phenotypes of genetic mouse models of AD is also imperative to advance our understanding of the disease using these models. In this light, I applaud the authors' efforts.

      Strengths:

      (1) Combining computational modelling with animal behavior in genetic knock-in mouse lines is a promising approach, which will be beneficial to the field and potentially explain any discrepancies in results across studies as well as provide new predictions for future work.

      (2) The authors' usage of multiple tasks and multiple ages is also important to ensure generalization across memory tasks and 'modelling' of the progression of the disease.

      Weaknesses:

      (1) I have some concerns regarding the interpretation of the behavioral results. Since the computational model rests on the authors' interpretation of the behavioral results, this in turn makes the model's explanatory power difficult to judge. For the CFC data, why do knock-in mice have stronger memory in test 1 (Figure 2C)? Does this mean the knock-in mice have better memory at this time point? Is this explained by the latent cause model? Are there compensatory changes in these mice leading to better memory? The authors use a discrimination index across tests to infer a deficit in reinstatement, but this indicates a deficit in reinstatement relative to memory strength in test 1. The interpretation of these differential DIs is not straightforward. This is evident when test 1 is compared with test 2, i.e., the time point after extinction, which also shows a significant difference across groups (Figure 2F) in the same direction as the reinstatement. A clarification of all these points will help strengthen the authors' case.

      (2) I have some concerns regarding the interpretation of the Barnes maze data as well, where there already seems to be a deficit in the memory at probe test 1 (Figure 6C). Given that there is already a deficit in memory, would not a more parsimonious explanation of the data be that general memory function in this task is impacted in these mice, rather than the authors' preferred interpretation? How does this memory weakening fit with the CFC data showing stronger memories at test 1? While I applaud the authors for using multiple memory tasks, I am left wondering if the authors tried fitting the latent cause model to the Barnes maze data as well.

      (3) Since the authors use the behavioral data for each animal to fit the model, it is important to validate that the model fits the control and experimental groups equally well (i.e., no significant differences in residuals). If that is the case, one can compare the differences in model results across groups (Figures 4 and 5). Some further estimates of the performance of the model across groups would help.

      (4) Is there an alternative model the authors considered, which was outweighed in terms of prediction by this model? One concern here is also parameter overfitting. Did the authors try leaving out some data (trials/mice) and predicting their responses based on the fit derived from the training data?

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript proposes that the use of a latent cause model for the assessment of memory-based tasks may provide improved early detection of Alzheimer's Disease as well as more differentiated mapping of behavior to underlying causes. To test the validity of this model, the authors use a previously described knock-in mouse model of AD and subject the mice to several behaviors to determine whether the latent cause model may provide informative predictions regarding changes in the observed behaviors. They include a well-established fear learning paradigm in which distinct memories are believed to compete for control of behavior. More specifically, it's been observed that animals undergoing fear learning and subsequent fear extinction develop two separate memories for the acquisition phase and the extinction phase, such that the extinction does not simply 'erase' the previously acquired memory. Many models of learning require a separate context or state to be added during the extinction phase, typically by assuming that a new state arises at the time of extinction. The Niv research group (Gershman et al. 2017) has shown that a latent cause model applied to this behavior can elegantly predict the formation of latent states based on a Bayesian approach, and that these latent states can allow the acquisition and extinction memories to persist independently. The authors of this manuscript leverage this approach to test whether the production of internal states, or the inference and learning of those states, is disrupted in knock-in mice that show both a build-up of amyloid-beta plaques and a deterioration in memory as the mice age.

      Strengths:

      I think the authors' proposal to leverage the latent cause model and test whether it can lead to improved assessments in an animal model of AD is a promising approach for bridging the gap between clinical and basic research. The authors use a promising mouse model and apply this to a paradigm in which the behavior and neurobiology are relatively well understood - an ideal situation for assessing how a disease state may impact both the neurobiology and behavior. The latent cause model has the potential to better connect observed behavior to underlying causes and may pave a road for improved mapping of changes in behavior to neurobiological mechanisms in diseases such as AD.

      Weaknesses:

      I have several substantial concerns which I've detailed below. These include important details on how the behavior was analyzed, how the model was used to assess the behavior, and the interpretations that have been made based on the model.

      (1) There is substantial data to suggest that during fear learning in mice separate memories develop for the acquisition and extinction phases, with the acquisition memory becoming more strongly retrieved during spontaneous recovery and reinstatement. The Gershman paper, cited by the authors, shows how the latent causal model can predict this shift in latent states by allowing for the priors to decay over time, thereby increasing the posterior of the acquisition memory at the time of spontaneous recovery. In this manuscript, the authors suggest a similar mechanism of action for reinstatement, yet the model does not appear to return to the acquisition memory state after reinstatement, at least based on the examples shown in Figures 1 and 3. Rather, the model appears to mainly modify the weights in the most recent state, putatively the 'extinction state', during reinstatement. Of course, the authors must rely on how the model fits the data, but this seems problematic based on prior research indicating that reinstatement is most likely due to the reactivation of the acquisition memory. This may call into question whether the model is successfully modeling the underlying processes or states that lead to behavior and whether this is a valid approach for AD.

      (2) As stated by the authors in the introduction, the advantage of the fear learning approach is that the memory is modified across the acquisition-extinction-reinstatement phases. Although perhaps not explicitly stated by the authors, the post-reinstatement test (test 3) is the crucial test for whether there is reactivation of a previously stored memory, with the general argument being that the reinvigorated response to the CS can't simply be explained by relearning the CS-US pairing, because re-exposure to the US alone leads to an increased response to the CS at test. Of course, there are several explanations for why this may occur, particularly when also considering the context as a stimulus. This is what I understood to be the justification for the use of a model, such as the latent cause model, that may better capture and compare these possibilities within a single framework. As such, it is critical to look at the level of responding to both the context alone and to the CS. It appears that the authors only look at the percent freezing during the CS, and it is not clear whether this is due to the contextual US learning during the US re-exposure or to an increased response to the CS, presumably caused by reactivation of the acquisition memory. For example, the instance of the model shown in Figure 1 indicates that the 'extinction state', or state z6, develops a strong weight for the context during the reinstatement phase of presenting the shock alone. This state then leads to increased freezing during the final CS probe test as shown in the figure. By not comparing the difference in the evoked freezing CR at the test (ITI vs CS period), the purpose of the reinstatement test is lost in the sense of whether a previous memory was reactivated: was the response to the CS restored above and beyond the freezing to the context?

      I think the authors must somehow incorporate these different phases (CS vs ITI) into their model, particularly since this type of memory retrieval that depends on assessing latent states is specifically why the authors justified using the latent causal model.

      (3) This is related to the second point above. If the question is about the memory processes underlying memory retrieval at the test following reinstatement, then I would argue that the model parameters that are not involved in testing this hypothesis be fixed prior to the test. Unlike the Gershman paper that the authors cited, the authors fit all parameters for each animal. Perhaps the authors should fit certain parameters on the acquisition and extinction phase, and then leave those parameters fixed for the reinstatement phase. To give a more concrete example, if the hypothesis is that AD mice have deficits in differentiating or retrieving latent states during reinstatement which results in the low response to the CS following reinstatement, then perhaps parameters such as the learning rate should be fixed at this point. The authors state that the 12-month-old AD mice have substantially lower learning rate measures (almost a 20-fold reduction!), which can be clearly seen in the very low weights attributed to the AD mouse in Figure 3D. Based on the example in Figure 3D, it seems that the reduced learning rate in these mice is most likely caused by the failure to respond at test. This is based on comparing the behavior in Figures 3C to 3D. The acquisition and extinction curves appear extremely similar across the two groups. It seems that this lower learning rate may indirectly be causing most of the other effects that the authors highlight, such as the low σx, and the changes to the parameters for the CR. It may even explain the extremely high K. Because the weights are so low, this would presumably lead to extremely low likelihoods in the posterior estimation, which I guess would lead to more latent states being considered as the posterior would be more influenced by the prior.

      (4) Why didn't the authors use the latent causal model on the Barnes maze task? The authors mention in the discussion that different cognitive processes may be at play across the two tasks, yet it has been suggested that reversal tasks are solved using latent states that allow the animal to flip between the two task states. In this way, the latent cause model seems very fitting. Indeed, it may even be a better way to assess changes in σx, as there are presumably 12 observable stimuli/locations.

    4. Reviewer #3 (Public review):

      Summary:

      This paper seeks to identify underlying mechanisms contributing to memory deficits observed in Alzheimer's disease (AD) mouse models. By understanding these mechanisms, they hope to uncover insights into subtle cognitive changes early in AD to inform interventions for early-stage decline.

      Strengths:

      The paper provides a comprehensive exploration of memory deficits in an AD mouse model, covering the early and late stages of the disease. The experimental design was robust, confirming age-dependent increases in Aβ plaque accumulation in the AD model mice and using multiple behavior tasks that collectively highlighted difficulties in maintaining multiple competing memory cues, with deficits most pronounced in older mice.

      In the fear acquisition, extinction, and reinstatement task, AD model mice exhibited a significantly higher fear response after acquisition compared to controls, as well as a greater drop in fear response during reinstatement. These findings suggest that AD mice struggle to retain the fear memory associated with the conditioned stimulus, with the group differences being more pronounced in the older mice.

      In the reversal Barnes maze task, the AD model mice displayed a tendency to explore the maze perimeter rather than the two potential target holes, indicating a failure to integrate multiple memory cues into their strategy. This contrasted with the control mice, which used the more confirmatory strategy of focusing on the two target holes. Despite this, the AD mice were quicker to reach the target hole, suggesting that their impairments were specific to memory retrieval rather than basic task performance.

      The authors strengthened their findings by analyzing their data with a leading computational model, which describes how animals balance competing memories. They found that AD mice showed somewhat of a contradiction: a tendency to both treat trials as more alike than they are (lower α) and similar stimuli as more distinct than they are (lower σx) compared to controls.

      Weaknesses:

      While conceptually solid, the model struggles to fit the data and to support the key hypothesis about AD mice's ability to retain competing memories. These issues are evident in Figure 3:

      (1) The model misses key trends in the data, including the gradual learning of fear in all groups during acquisition, the absence of a fear response at the start of the experiment, the increase in fear at the start of day 2 of extinction (especially in controls), and the more rapid reinstatement of fear observed in older controls compared to acquisition.

      (2) The model attributes the higher fear response in controls during reinstatement to a stronger association with the context from the unsignaled shock phase, rather than to any memory of the conditioned stimulus from acquisition.

      These issues lead to potential overinterpretation of the model parameters. The differences in α and σx are being used to make claims about cognitive processes (e.g., overgeneralization vs. overdifferentiation), but the model itself does not appear to capture these processes accurately.

      The authors could benefit from a model that better matches the data and that can capture the retention and recollection of a fear memory across phases.

      Conclusion:

      Overall, the data support the authors' hypothesis that AD model mice struggle to retain competing memories, with the effect becoming more pronounced with age. While I believe the right computational model could highlight these differences, the current model falls short in doing so.

    5. Author response:

      We appreciate the reviewers’ constructive comments and suggestions. We plan the following revisions to address the public reviews.

      Regarding model selection (from Reviewers 1 and 3)

      We will test whether the latent cause model has greater explanatory power for the observed reinstatement data than at least two other models, including the Rescorla-Wagner model. For each model, the prediction errors across all trials and those in the test 3 trial (reinstatement) will be calculated for individual animals. The explanatory power of the models will be discussed based on these results.
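      As a rough illustration of the per-animal comparison described above, the sketch below implements the standard Rescorla-Wagner update, in which the associative strength of each cue present on a trial changes in proportion to a shared prediction error (the US asymptote minus the summed strength of the present cues), and then computes the error over all trials and on a single test trial. The helper names, the single shared learning rate, the coding of context and tone as binary cues, and the use of summed associative strength directly as the predicted CR (freezing) are assumptions for illustration, not the authors' implementation; the latent cause model itself is not sketched here.

```python
import numpy as np


def rescorla_wagner(cues, us, alpha=0.3, lam=1.0):
    """Trial-by-trial Rescorla-Wagner predictions (hypothetical helper).

    cues : (n_trials, n_cues) 0/1 array marking which cues (e.g. context, tone)
           are present on each trial.
    us   : (n_trials,) 0/1 array, 1 if the shock (US) is delivered.
    alpha: learning rate, assumed shared by all cues.
    lam  : asymptotic associative strength supported by the US.
    Returns the summed associative strength on each trial, computed before
    that trial's update; here it is used directly as the predicted CR.
    """
    cues = np.asarray(cues, dtype=float)
    us = np.asarray(us, dtype=float)
    w = np.zeros(cues.shape[1])           # associative strength of each cue
    preds = np.empty(len(us))
    for t in range(len(us)):
        v = cues[t] @ w                   # summed prediction from present cues
        preds[t] = v
        delta = lam * us[t] - v           # prediction error on this trial
        w += alpha * delta * cues[t]      # only cues present on trial t change
    return preds


def prediction_errors(observed_cr, predicted_cr, test_index):
    """Return (mean squared error over all trials, squared error on one test trial)."""
    err = np.asarray(observed_cr, dtype=float) - np.asarray(predicted_cr, dtype=float)
    return float(np.mean(err ** 2)), float(err[test_index] ** 2)
```

      For one animal, coding each trial as [context, tone], passing its observed freezing as `observed_cr`, and setting `test_index` to the position of the test 3 trial would yield the two quantities named in the plan: the whole-session error and the reinstatement-test error.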

      Regarding model validation (from Reviewers 1, 2, and 3)

      We acknowledge the reviewers’ concerns about potential parameter overfitting and misinterpretation. First, we will run simulations of the latent cause model under other possible conditions to test whether our original conditions can be justified, and we will clarify how certain parameters affect the predicted CR. Second, we will confirm whether the prediction errors are comparable between experimental groups, present the correlations between parameters, and discuss these results in the revision.

      To evaluate the effect of context in explaining reinstatement in the latent cause model, simulations of CR in test 3 when only context or tone is presented will also be performed and discussed with the behavioral data.

      Regarding the interpretation of the behavioral data (from Reviewers 1, 2, and 3)

      We will clarify our interpretation of the behavioral data by incorporating the additional analyses mentioned above; for example, to clarify the contribution of context in test 3, we will provide data on the CR before the tone presentation in our revision. In addition, we will further discuss how we expected and interpreted the reversal Barnes maze results based on the memory modification characteristics estimated in the reinstatement test.

      Regarding the application of the latent cause model to the reversal Barnes maze task (from Reviewers 1 and 2)

      We acknowledge the reviewers’ suggestions to apply the latent cause model to our Barnes maze results to strengthen the link and consistency. To further clarify the reason for including the Barnes maze results, we will explicitly discuss how associative learning is involved in spatial learning in the revision. However, we will not be able to apply the latent cause model directly to the Barnes maze data, for the following reasons. As we noted in the Results and Discussion, the latent cause model was built on associative learning and cannot be directly applied to the Barnes maze data. The cognitive processes in the Barnes maze task involve maintaining a spatial representation of the environment, integrating the animal's own position with the expected goal, and evaluating potential actions. Importantly, the chosen actions in this task directly affect subsequent observations, whereas in a simple associative learning paradigm an animal’s response based on an expected outcome typically does not alter future observations.

      Thus, although associative learning (e.g., associations between the spatial cue and the location of the escape box) is certainly a critical building block and contributes to performance in the Barnes maze task, this mechanism alone cannot fully explain the animal’s navigation in the maze. We agree that obtaining solid modeling results in the reversal Barnes maze task is an important direction, but extending the latent cause model for this purpose is beyond the scope of this study. We have suggested some possible approaches in the Discussion and will elaborate further on these conceptual distinctions and on how the latent cause framework assists in the interpretation of results.

    1. eLife Assessment

      This useful study reports datasets on gene expression and chromatin accessibility profiles of spermatogonia at different postnatal ages in mice. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics are solid. This study may be of interest to biomedical researchers working on male germline stem cells and male fertility.

    1. eLife Assessment

      Huang and colleagues examined neural responses in mouse anterior cingulate cortex (ACC) during a discrimination-avoidance task. The authors present useful findings that ACC neurons encode primarily post-action variables over extended periods rather than the outcomes or values of those actions. Though the methodological approach was sound, the evidence ruling out alternative explanations is incomplete and requires substantial control analyses.

    2. Reviewer #1 (Public review):

      Summary:

      In the current study, Huang et al. examined ACC responses during a novel discrimination-avoid task. The authors concluded that ACC neurons primarily encode post-action variables over extended periods, reflecting the animal's preceding actions rather than the outcomes or values of those actions. Specifically, they identified two subgroups of ACC neurons that responded to different aspects of the actions. This work represents admirable efforts to investigate the role of ACC in task-performing mice. However, in my opinion, alternative explanations of the data were not sufficiently explored, and some key findings were not well supported.

      Strengths:

      The development of the new discrimination-avoid task is applauded. Single-unit electrophysiology in task-performing animals represents an admirable effort, and the datasets are valuable. The identification of different groups of encoding neurons in ACC can be potentially important.

      Weaknesses:

      One major conclusion is that ACC primarily encodes the so-called post-action variables (specifically shuttle crossing). However, only a single example session was included in Figure 2, while in Supplementary Figure 2 a considerable fraction of ACC neurons appears to respond to the onset of movement or to ramp up their activity prior to movement onset. How did the authors reach the conclusion that ACC neurons preferentially respond to shuttle crossing?

      In Figure 4, it was concluded that ACC neurons respond to action independent of outcome. Since these neurons are active on both correct and incorrect shuttle trials but not on stay trials, they seem to primarily respond to overt movement. If so, the rationale for linking ACC activity and adaptive behavior/associative learning is not very clear to me. Further analyses are needed to test whether their firing rates correlate with locomotion speed or acceleration/deceleration. On a similar note, to what extent are the action state neurons actually responding to locomotion-related signals? And can ACC activity actually differentiate correct vs. incorrect stays?

      Given that a considerable number of ACC neurons encode 'action content', it is not surprising that by including all neurons the model is able to make accurate predictions in Figure 6. How would the model performance change if the content neurons were removed?

      Moving on to Figure 7. Since Figure 4 showed that ACC neurons respond to movement regardless of outcome, it is somewhat puzzling how ACC activity can be linked to future performance.

      Two mice contributed about 50% of all the recorded cells. How robust are the results when analyzing mouse by mouse?

      Lastly, the development of the new discrimination-avoid task is applauded. However, a major missing piece here is a demonstration of the importance of ACC in this task and of which aspects of this behavior require ACC.

    3. Reviewer #2 (Public review):

      Summary:

      The current dataset utilized a 2x2 factorial shuttle-escape task in combination with extracellular single-unit recording in the anterior cingulate cortex (ACC) of mice to determine ACC action coding. The contribution of neocortical signaling to action-outcome learning, as assessed by behavioral tasks outside the prototypical reward vs. non-reward or punished vs. non-punished designs, is an important and relevant research topic, given that ACC plays a clear role in several human neurological and psychiatric conditions. The authors present useful findings regarding the role of ACC in action monitoring and learning. The core methods themselves (electrophysiology and behavior) are adequate; however, the analyses are incomplete, since ruling out alternative explanations for neural activity, such as movement itself, requires substantial control analyses, and details on statistical methods are not clear.

      Strengths:

      (1) The factorial design nicely controls for sensory coding and value coding, since the same stimulus can signal different actions and values.

      (2) The figures are mostly well-presented, labeled, and easy to read.

      (3) Additional analyses, such as the 2.5/7.5s windows and place-field analysis, are nice to see and indicate that the authors were careful in their neural analyses.

      (4) The n-trial + 1 analysis where ACC activity was higher on trials that preceded correct responses is a nice addition, since it shows that ACC activity predicts future behavior, well before it happens.

      (5) The authors identified ACC neurons that fire to shuttle crossings in one direction or to crossings in both directions. This is very clear in the spike rasters and population-scaled color images. While other factors such as place fields, sensory input, and their integration can account for this activity, the authors discuss this and provide additional supplemental analyses.

      Weaknesses:

      (1) The behavioral data could use slightly more characterization, such as separating stay versus shuttle trials.

      (2) Some of the neural analyses could use the necessary and sufficient comparisons to strengthen the authors' claims.

      (3) Many of the neural analyses seem to utilize long time windows, not leveraging the very real strength of recording spike times. Specifics on the exact neural activity binning/averaging, tests, classifier validation, and methods for quantification are difficult to find.

      (4) The neural analyses seem to suggest that ACC neurons encode one variable or the other, but are there any that multiplex? Given the overwhelming evidence of multiplexing in the ACC, a bit more discussion of its presence or absence is warranted.

    4. Reviewer #3 (Public review):

      Summary:

      The authors record from the ACC during a task in which animals must switch contexts to avoid shock as instructed by a cue. As expected, they find neurons that encode context, some encoding of actions prior to the context switch, and neurons that encode post-action variables. The primary novelty of the task seems to be the dynamic encoding of action-outcome in a discrimination-avoidance domain, whereas this is traditionally studied using operant methods. While I'm not sure that this task is all that novel, I can't recall this being applied to the frontal cortex before, and this extends the well-known action/context/post-context encoding of ACC to the discrimination-avoidance domain.

      While the analysis is well done, there are several points that I believe should be elaborated upon. First, I had questions about several details (see point 3 below). Second, I wonder why the authors downplayed the clear action coding of ACC ensembles. Third, I wonder if the purported 'novelty' of the task (which I'm not sure of) and the pseudo-debate on ACC's role undermine the real novelty: action/context/outcome encoding of ACC in discrimination-avoidance and early learning.

      Strengths:

      Recording frontal cortical ensembles during this task is particularly novel, and the analyses are sophisticated. The task has the potential to generate elegant comparisons of action and outcome.

      Weaknesses:

      I had some questions that might help me understand this work better.

      (1) I wonder if the field would agree that there is a true 'debate' and 'controversy' about the ACC and conflict monitoring, or if this is a pseudo-debate (Line 34). The authors cite 2 very old papers to support this point. I might reframe this in terms of the frontal cortex studying action-outcome associations in discrimination-avoidance, as the bulk of evidence in rodents comes from overtrained operant behavior, and in humans comes from high-level tasks, and humans are unlikely to receive aversive stimuli such as shocks.

      (2) Does the purported novelty of the task undermine the argument? While I don't have an exhaustive knowledge of this behavior, the novelty lies in applying this task to the ACC. There are many paradigms in which a shock triggers some action that could be antecedents to this task.

      (3) The lack of details was confusing to me:

      a) How many total mice? Are the same mice in all analyses? Are the same neurons? Which training day? Is it 4 mice in Figure 3? Five mice in line 382? An accounting of mice should be in the methods. All data points and figures should have the number of neurons and mice clearly indicated, along with a table. Without these details, it is challenging to interpret the findings.

      b) How many neurons are from which stage of training? In some figures, I see 325, in some ~350, and in S5/S2B, 370. The number of neurons should be clearly indicated in each figure, and perhaps a table.

      c) Were the tetrodes driven deeper each day? Should depth be used as a regressor in all analyses?

      d) Was it really ACC (Figure 2A)? Some shanks are in M2? All electrodes from all mice need to be plotted as a main figure with the drive length indicated.

      e) It's not clear which sessions and how many go into which analysis

      f) How many correct and incorrect trials (<7?) are there per session?

      g) Why 'up to 10 shocks' on line 358? What amplitudes were tried? What does scrambled mean?

      (4) Why do the authors downplay pre-action encoding? It is clearly evident in the PETHs, and the classifiers are above chance. It's not surprising that post-shuttle classification is so high because the behavior has occurred. This is most evident in Figure S2B, which likely should be a main figure.

      (5) The statistics seem inappropriate. A linear mixed effects model accounting for between-mouse variance seems most appropriate. Statistical power or effect size is needed to interpret these results. This is important in analyses like Figure 7C or 6B.

      (6) Better behavioral details might help readers understand the task. These can be pulled from Figures S2 and S5. This is particularly important in a 'novel' task.

      (7) Can the authors put post-action encoding on the same classification accuracy axes as Figure 6B? It'd be useful to compare.

      (8) What limitations are there? I can think of several - number of animals, lack of causal manipulations, ACC in rodents and humans.

      Minor:

      (1) Each PCA analysis needs a scree plot to understand the variance explained.

      (2) Figure 4C - y and x-axes have the same label?

      (3) What bin size do the authors use for machine learning (Not clear from line 416)?

      (4) Why not just use PCA instead of 'dimension reduction' (of which there are many)?

      (5) Would a video enhance understanding of the behavior?

    5. Author response:

      We thank the reviewers for their insightful feedback. Incorporating their recommendations will greatly enhance our manuscript for resubmission. Based on the review, it seems a major challenge to the interpretation of our study surrounds whether locomotion, itself, is responsible for increased ACC activity during our task. This was a shared concern for us during our analysis. We included data in our initial submission hoping to address these concerns. Specifically, we show that post-action activity outlasts movement termination, in most cases on the order of seconds after termination (Supplementary Fig 2). Likewise, post-action activity is not tied to shuttle initiations, as ACC activity onset can vary greatly before and after initiation (Supplementary Fig 2). Lastly, the unique nature of action content neurons further supports a distinction from locomotor activity. They selectively fire for specific directions and, as a result, do not fire during movement in the opposite direction. Despite these findings, we agree with the reviewers that including additional analyses, such as examining firing rates with respect to locomotion speed and acceleration/deceleration, will greatly strengthen our claim of ACC’s role in post-action activity. In our resubmission, we will seek to perform such analyses, among others, to fully elucidate the role of locomotion in ACC post-action activity.

      Reviewers also pointed out an overall lack of details surrounding our task, analysis, statistical methods and experimental approaches. We will consider all the recommendations from the reviewers and integrate them into our resubmission to provide more detailed information. Notably, we will adjust our approach in describing our task. Reviewers raised concerns regarding the perceived novelty of the task, as it shares many similarities with previous discrimination-avoidance tasks. What distinguishes our task is how the meaning (safety vs shock) of the context and sensory stimuli dynamically changes based on the current environment (context x sound). This requires not only the discrimination of contextual and sensory stimuli but also the inter-modal integration of stimuli, which varies throughout the task. Sound A/B leads to different outcomes depending on the context, and similarly, the meaning of the context shifts in a sound-dependent manner.

      Lastly, in our follow-up submission we will work to include more robust analyses that exploit the temporal resolution of our recordings. We will also provide greater clarity on how each individual animal contributes to our overall findings. To conclude, we would like to once again thank our reviewers for their feedback and evaluation of our manuscript. We look forward to making the necessary adjustments for our future submission.

    1. eLife Assessment

      This manuscript by Kaur et al. identifies differential gene expression observed in distinct mouse lung cell populations, namely myeloid and lymphoid cells, upon short-term exposure to e-cig aerosols with various flavors. Their findings are potentially useful because the single-cell sequencing data provides a reference for future studies of genes and cellular pathways that are most affected by e-cig aerosols and their components. However, the evidence is incomplete due to limited statistical analyses and few biological replicates, as well as a lack of experimental validation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tackled the public concern about E-cigarettes among young adults by examining the lung immune environment in mice using single-cell RNA sequencing, discovering a subset of Ly6G- neutrophils with reduced IL-1 activity and increased CD8 T cells following exposure to tobacco-flavored e-cigarettes. Preliminary serum cotinine (nicotine metabolite) measurements validated the effective exposure to fruit, menthol, and tobacco-flavored e-cigarettes with air and PG:VG serving as control groups. They also highlighted the significance of metal leaching, which fluctuated over different exposure durations to flavored e-cigarettes, underscoring the inherent risks posed by these products. The scRNAseq analysis of e-cig exposure to flavors and tobacco demonstrated the most notable differences in the myeloid and lymphoid immune cell populations. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Further sub-clustering revealed a flavor-specific rise in Ly6G- neutrophils and heightened activation of cytotoxic T cells in response to tobacco-flavored e-cigarettes. These effects varied by sex, indicating that immune changes linked to e-cig use are dependent on gender. By analyzing the expression of various genes and employing gene ontology and gene enrichment analysis, they identified key pathways involved in this immune dysregulation resulting from flavor exposure. Overall, this study affirmed that e-cigarette exposure can suppress the neutrophil-mediated immune response, subsequently enhancing T cell toxicity in the lung tissue of mice.

      Strengths:

      This study used single-cell RNA sequencing to comprehensively analyze the impact of e-cigarettes on the lung. The study pinpointed alterations in immune cell populations and identified differentially expressed genes and pathways that are disrupted following e-cigarette exposure. The manuscript is well written, the hypothesis is clear, the experiments are logically designed with proper control groups, and the data is thoroughly analyzed and presented in an easily interpretable manner. Overall, this study suggested novel mechanisms by which e-cigs impact lung immunity and created a dataset that could benefit the lung immunity field.

      Weaknesses:

      (1) The authors included a valuable control group, the PG:VG group, since PG:VG is the foundation of the e-liquid formulation. However, most of the comparative analyses use the air group as the control. Further analysis comparing the air group to the PG:VG group, and the PG:VG group to the individual flavored e-cig groups, will provide clearer insights into the true source of irritation. This is done for a few analyses but not consistently throughout the paper. Flavor-specific effects should be discussed in greater detail. For example, Figure 1E shows that the Fruit flavor group exhibits more severe histological pathology, but similar effects were not corroborated by the single-cell data.

      (2) The characterization of Ly6g+ vs Ly6g- neutrophils is interesting and potentially very impactful. Key results like this from scRNAseq analyses should be validated by qPCR and flow cytometry.

      Also, a recent study by Ruscitti et al. reported Ly6g+ macrophages in the lung, which can potentially confound the cell type analysis. A more detailed marker gene and sub-population analysis of the myeloid clusters could rule out this potential confounding factor.

    3. Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavors of e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low number of replicates and a lack of effective validation methods, which reduce the robustness and rigor of the findings.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives good preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      The major weakness is the low number of replicates and the limited analysis methods. Two biological n per group is not an acceptable basis for any solid conclusions. The validation data were too limited (only cell % data) and did not always support the findings (e.g. Figure 4D does not match 4C). Often n seems to be combined and only one data point is shown; it is not at all clear how the groups were analysed and how many cells in each group were compared.

      Other specific weaknesses were identified in addition to the ones above:

      (1) Only 71,725 cells means only 7,172 per group, which is 3,586 per animal - how many of these were neutrophils, T-cells, and macrophages? This was not shown and could be too low.

      (2) The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comment, but in general, the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells.

      (3) There is no rigorous quantification of Ly6G+ and Ly6G- cells in the flow cytometry data.

      (4) Eosinophils are heavily involved in lung biology but are missing from the analysis.

      (5) The figures had no titles so were difficult to navigate.

      (6) PGVG is not defined and not introduced early enough.

      (7) Neutrophils are not well known to proliferate, so any claims about proliferation need to be accompanied by validation such as BrdU or other proliferation assays.

      (8) It was not clear how statistics were chosen and why Table S2 had a good comparison (two-way ANOVA with gender as a variable) but this was not used for other data particularly when looking at more functional RNA markers (Table S2 also lacks the interaction statistic which is most useful here).

      (9) Many statistics are only vs air control, but it would be more useful as a flavour comparison to see these vs PGVG. In some cases, the carrier PGVG looks worse than some of the flavours (which have nicotine).

      (10) The n number is a large issue, but in Figures such as 4, 6, and 7 it could be a bigger factor. The number of significant genes identified has been determined by chance rather than by any real difference. For example, is Il1b not identified in Fruit flavour vs air because there wasn't enough n, while in Air vs Tobacco it randomly hit the significance mark? This is but one example of the problems with the analysis and conclusions.

      (11) The data in Figure 7A is confusing, if this is a comparison to air, then why does air vs air not equal 1? Even if this was the comparison to the average of air between males and females, then this doesn't explain why CCL12 is >1 in both. Is this z-score instead? Regardless the data is difficult to interpret in this format.

      (12) Individual n was not shown for almost all experiments - e.g. Figure 1D - what is this representative of? Figure 2D - is this bulk-grouped data for all cells and all mice? The heatmaps are also pooled from 2n and don't show the variability.

    4. Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with the innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is increased in abundance in the treatment groups, while gene expression remains consistent, which could indicate impaired function. The increased abundance of CD4+ and CD8+ T cells, along with their associated markers for proliferation and cytotoxicity, is thought to result from activation following this decline in the neutrophil-mediated immune response.

      Strengths:

      (1) Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      (2) Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      (3) The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      (4) Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that the data collected was relevant.

      Weaknesses:

      (1) The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models.

      (2) Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      (3) Statistical analyses lack rigor and are not always displayed with the most appropriate graphical representation.

      (4) Overall, the paper and its discussion are relatively limited and do not delve into the significance of the findings or how they fit into the bigger picture of the field.

      (5) The manuscript lacks validation of findings in tissue by other methods such as staining.

      (6) This paper provides a foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors tackled the public concern about E-cigarettes among young adults by examining the lung immune environment in mice using single-cell RNA sequencing, discovering a subset of Ly6G- neutrophils with reduced IL-1 activity and increased CD8 T cells following exposure to tobacco-flavored e-cigarettes. Preliminary serum cotinine (nicotine metabolite) measurements validated the effective exposure to fruit, menthol, and tobacco-flavored e-cigarettes with air and PG/VG serving as control groups. They also highlighted the significance of metal leaching, which fluctuated over different exposure durations to flavored e-cigarettes, underscoring the inherent risks posed by these products. The scRNAseq analysis of e-cig exposure to flavors and tobacco demonstrated the most notable differences in the myeloid and lymphoid immune cell populations. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Further sub-clustering revealed a flavor-specific rise in Ly6G- neutrophils and heightened activation of cytotoxic T cells in response to tobacco-flavored e-cigarettes. These effects varied by sex, indicating that immune changes linked to e-cig use are dependent on gender. By analyzing the expression of various genes and employing gene ontology and gene enrichment analysis, they identified key pathways involved in this immune dysregulation resulting from flavor exposure. Overall, this study affirmed that e-cigarette exposure can suppress the neutrophil-mediated immune response, subsequently enhancing T cell toxicity in the lung tissue of mice.

      Strengths:

      This study used single-cell RNA sequencing to comprehensively analyze the impact of e-cigarettes on the lung. The study pinpointed alterations in immune cell populations and identified differentially expressed genes and pathways that are disrupted following e-cigarette exposure. The manuscript is well written, the hypothesis is clear, the experiments are logically designed with proper control groups, and the data is thoroughly analyzed and presented in an easily interpretable manner. Overall, this study suggested novel mechanisms by which e-cigs impact lung immunity and created a dataset that could benefit the lung immunity field.

      We thank the reviewer for identifying the strengths of our work.

      Weaknesses:

      The authors included a valuable control group, the PG/VG group, since PG/VG is the foundation of the e-liquid formulation. However, most of the comparative analyses use the air group as the control. Further analysis comparing the air group to the PG/VG group, and the PG/VG group to the individual flavored e-cig groups, will provide clearer insights into the true source of irritation. This is done for a few analyses but not consistently throughout the paper. Flavor-specific effects should be discussed in greater detail. For example, Figure 1E shows that the Fruit flavor group exhibits more severe histological pathology, but similar effects were not corroborated by the single-cell data.

      We thank the reviewer for this query. We agree that the PG/VG group is the foundation of the e-liquid formulation, and hence comparisons with this group are important for understanding the effect of individual flavors on the cell population. Though we compared the flavored e-cig groups with the PG/VG group, we did not discuss these comparisons in detail within the manuscript to avoid confusion in interpreting such a large dataset. However, we will include the comparisons with the PG/VG group as a Supplement File in our revised manuscript to facilitate proper interpretation of our omics data by interested readers.

      While we agree that flavor-specific effects might be of interest, we did not explore them in detail, as the sale of fruit-flavored e-liquids is now regulated in the US. Thus, from a regulatory point of view, the effects of tobacco- and menthol-flavored e-liquids hold the most interest. Since fruit flavors were on the market at the time this study was conducted, we have still included the data. However, studying them further was not the focus of this work. Nevertheless, interested readers of our manuscript can access our dataset for further analyses and interpretation of our results.

      The characterization of Ly6g+ vs Ly6g- neutrophils is interesting and potentially very impactful. Key results like this from scRNAseq analyses should be validated by qPCR and flow cytometry.

      Also, a recent study by Ruscitti et al. reported Ly6g+ macrophages in the lung, which can potentially confound the cell type analysis. A more detailed marker gene and sub-population analysis of the myeloid clusters could rule out this potential confounding factor.

      We agree with the reviewer that the loss of Ly6G on neutrophils is a very interesting finding, and we are in the process of designing neutrophil-specific experiments to study the impact of e-cig exposure on neutrophil maturation and function, which will be discussed in subsequent work by our group. However, to address the concerns raised by the reviewer, we are staining lung tissue samples from air- and differently flavored e-cig aerosol-exposed mice for Ly6G and S100A8 (a universal neutrophil marker) to assess the infiltration of Ly6g+ vs Ly6g- neutrophils within the lungs of exposed and unexposed mice. This would also address whether these populations are neutrophils or of another myeloid origin, as suggested by recent publications. We will share the results of these experiments in the revised manuscript and update our interpretations accordingly with better validation.

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavors of e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low number of replicates and a lack of effective validation methods, which reduce the robustness and rigor of the findings.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives good preliminary data that can be used to create new hypotheses in this area.

      We appreciate the reviewer for recognizing the strength of this work.

      Weaknesses:

      The major weakness is the low number of replicates and the limited analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and did not always support the findings (e.g. Figure 4D does not match 4C). Often n seems to be combined and only one data point is shown, it is not at all clear how the groups were analyzed and how many cells in each group were compared.

      We thank the reviewer for this critique, which will help us improve our analyses. We understand that the low number of replicates in this work makes it difficult to draw solid conclusions from the analyses, but this was a pilot study designed to understand the changes in the mouse lung upon acute exposure to flavored e-cig aerosols at the single-cell level. So far, the e-cig field has primarily focused on toxicological studies that help regulatory bodies set standards and enforce laws governing the manufacture, sale and distribution of e-cig products. However, adolescents and young adults are still getting access to these products, and there is little to no understanding of how this may affect lung health upon acute and chronic exposures. Single-cell technology is a powerful tool for analyzing gene expression changes within cell populations to study cell heterogeneity and function. Yet it is costly, which makes such analyses on large sample sizes impractical. This pilot study was designed to obtain initial leads for future studies involving larger sample sizes and chronic exposures. We nevertheless intend to share our results with the scientific community, given the value of such a dataset for a wider audience interested in the mechanistic underpinnings of e-cig exposures in vivo.

      We understand that the validations in our current work are limited, and so we are in the process of conducting immunostaining to validate a few targets identified in this work. We would also add that validating single-cell findings using classical methods such as ELISA, qPCR or flow cytometry is sometimes difficult, as many of these techniques still assess the tissue as a whole, while the changes shown in single-cell analyses mainly pertain to a single cell type. This could be a probable reason why the scRNA-seq results do not align with our flow cytometry findings. The findings from this pilot study have now better informed us to design an effective flow panel for our future studies. Regarding the statistics and the number of cells in each analysis, we will provide a detailed account to allow better interpretation of our results.

      Only 71,725 cells means only 7,172 per group, which is 3,586 per animal - how many of these were neutrophils, T-cells, and macrophages? This was not shown and could be too low.

      We agree that the number of cells could be too low; for this reason, we did not study gene expression variations at the finest level of cell identity. We classified the cell clusters into general annotations (myeloid, lymphoid, endothelial, stromal and epithelial) and identified changes in gene expression. Of these, only the two clusters with more than ~1000 cells per cell type per group (myeloid and lymphoid) were studied in detail. We will include the cell count information in the revised manuscript to allow better interpretation of our results.

      The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comment, but in general, the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells.

      This is a well-made point, and we thank the reviewer for this comment. We agree that the dynamic range of RNA measurement is limited and that, with low cell numbers, this could lead to bias. We are in the process of validating the findings regarding the presence of Ly6G+ and Ly6G- cells in our control and treated lungs, and the outcome will be discussed in the revised manuscript. We will also provide the cell number for the Ly6G- cell cluster for each sample, together with a more detailed discussion of our findings. Because of the small sample size and limited cell capture, some limitations are hard to overcome; these will be further elaborated upon in our revisions.

      There is no rigorous quantification of Ly6G+ and Ly6G- cells in the flow cytometry data.

      We understand that flow-based quantification of our scRNA-seq findings would be interesting. However, flow cytometry and the preparation of single-cell suspensions for sequencing were performed in parallel in this study. We used a basic flow panel with single markers to identify individual immune cell types. We did identify changes in the Ly6G population in our treated and control samples using scRNA-seq and intend to include it as a marker in our future flow cytometry studies. Unfortunately, the same analyses could not be performed for the current batch of samples. We will nevertheless include results from IHC staining to identify the Ly6G+ and Ly6G- populations in lung tissues from control and treated mice in the revised manuscript to address some of the concerns raised here.

      Eosinophils are heavily involved in lung biology but are missing from the analysis.

      We used RBC lysis buffer to remove excess RBCs during lung digestion when preparing the single-cell suspension for scRNA-seq in this study. Reports suggest that RBC lysis could adversely affect eosinophil number and function. We did not identify any cell cluster expressing eosinophil markers in our scRNA-seq data, and we believe our lung digestion protocol may be the reason. We have measured eosinophil numbers by flow cytometry in these samples and found significant changes. However, because we could not identify an eosinophil cluster in the scRNA-seq data, we did not include these results in the final manuscript. To avoid confusion and maintain transparency, we will include our flow cytometry results in the revised manuscript.

      The figures had no titles so were difficult to navigate.

      We will make necessary adjustments to the data representation and include the titles to enable easy navigation of the Figures.

      PG/VG is not defined and not introduced early enough.

      We agree that PG/VG is an important control in e-cig studies. This is why this group was included, and we also performed comparisons with this group in the scRNA-seq studies. However, to reduce the complexity of the study, we only presented comparisons with the Air control in this manuscript. We will include the comparisons with the PG/VG group as a Supplementary File in the revised manuscript so that interested readers can access these results and make the necessary interpretations for future research.

      Neutrophils are not well known to proliferate, so any claims about proliferation need to be accompanied by validation such as BrdU or other proliferation assays.

      We thank the reviewer for this suggestion; however, we cannot perform BrdU or other proliferation assays on neutrophils at this time. We plan to include these in the study designs of our future work, but limited funding prevents further experimentation to support this claim in the current study. We clearly state that this is only an scRNA-seq finding that requires further study, to avoid over-interpretation of our results.

      It was not clear how statistics were chosen and why Table S2 had a good comparison (two-way ANOVA with gender as a variable) but this was not used for other data particularly when looking at more functional RNA markers (Table S2 also lacks the interaction statistic which is most useful here).

      We thank the reviewer for raising this concern. This is a valid point, and we will include all the necessary information regarding the statistics and other related parameters in the revised manuscript.

      Many statistics are only vs air control, but it would be more useful as a flavor comparison to see these vs PG/VG. In some cases, the carrier PG/VG looks worse than some of the flavors (which have nicotine).

      We will include the comparisons with PG/VG as a supplementary file in our revised manuscript; however, we do not intend to describe all of those changes in detail in the main manuscript.

      The n number is a large issue, but in Figures such as 4, 6, and 7 it could be a bigger factor. The number of significant genes identified has been determined by chance rather than any real difference, e.g. Is Il1b not identified in Fruit flavor vs air because there wasn't enough n, while in Air vs Tobacco, it randomly hit the significance mark. This is but an example of the problems with the analysis and conclusions.

      While we agree in part with the concern raised here, we wish to point out that every experiment has limitations. In our opinion, an omics study is not necessarily aimed at finding changes at the transcript level with absolute certainty, but rather at identifying probable cell and gene targets to validate in subsequent work. We never claim that our findings are absolute outcomes; rather, we note the limitation of sample number and the need for further research at every step. The strength of this work is that it is the first study of its kind to examine changes in lung cell populations at the single-cell level upon e-cig aerosol exposure. This study has provided us with interesting gene and cell targets that we are now validating in further work. We still strongly believe that a dataset like this is a useful resource for a wider audience to inform efficient study designs, and hence that it merits publication and discussion amongst our peers.

      The data in Figure 7A is confusing, if this is a comparison to air, then why does air vs air not equal 1? Even if this was the comparison to the average of air between males and females, then this doesn't explain why CCL12 is >1 in both. Is this z-score instead? Regardless the data is difficult to interpret in this format.

      We thank the reviewer for pointing this out. We realize that the data may be difficult to understand because of the color scaling of the heatmap. We will change the graphical representation and include the actual fold-change values in our revised manuscript to allow easy interpretation of these results.

      Individual n was not shown for almost all experiments - e.g. Figure 1D - what is this representative of? Figure 2D - is this bulk-grouped data for all cells and all mice? The heatmaps are also pooled from 2n and don't show the variability.

      While we have included a pictorial representation of the n number in Figure 1A and stated the n number in the legend of each figure, we understand that this may be difficult to navigate. We will address this more clearly in the revised manuscript.

      However, with respect to the second comment, we respectfully disagree with the reviewer. Each scRNA-seq dataset had two samples, one male and one female, as clearly shown in the current figures. The pooling of cells mentioned in the comment occurred when the cell suspension was prepared from each sex/group at the start of sequencing. We have no means of showing the variability among pooled samples, which we acknowledge as a shortcoming of our work. Therefore, with respect to the representation of the heatmaps and the data analyses, we have included all the information needed to uphold the transparency of our study design and data visualization for each figure, and we would like to keep the current representations.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up-and-down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      (1) Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      (2) Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      (3) The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      (4) Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that the data collected was relevant.

      We thank the reviewer for identifying the key strengths of our work and listing it in a concise and well-rounded fashion.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models.

      This study was not designed to examine the effects of chronic exposure on lung tissue. We were interested in delineating the effects of acute exposure, for which the proposed study design was chosen. Our group has previously performed similar exposures in work that has been well received by the community. We agree that chronic exposures would be interesting to examine; however, that was not the purpose of this pilot study. We will now explicitly mention this aspect in the revised manuscript.

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We thank the reviewer for this observation, and we will include the necessary validations and details of the sex-based statistical analyses in the revised version of this manuscript.

      Statistical analyses lack rigor and are not always displayed with the most appropriate graphical representation.

      We thank the reviewer and will include all the necessary statistical details in the revised manuscript.

      Overall, the paper and its discussion are relatively limited and do not delve into the significance of the findings or how they fit into the bigger picture of the field.

      We are in the process of performing a few validation experiments and intend to add further data to this manuscript to strengthen our findings. However, as the reviewer pointed out, the strength of this work lies in the first scRNA-seq analysis of mice exposed in vivo to differently flavored e-cig aerosols. We also show cell-specific differential gene expression and address some of the major questions in e-cig research, including the day-to-day release of metals from the same coil. The limited sample number makes it difficult to draw solid conclusions from this work, which we have discussed as a shortcoming. However, the major strength of this work lies not in identifying specific trends but in exploring possible cell and gene targets, so that the study can be expanded to longer (chronic) exposures with a larger sample group.

      The manuscript lacks validation of findings in tissue by other methods such as staining.

      We are conducting studies and will include the validation experiments and staining in the revised manuscript to support our findings.

      This paper provides a foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for this observation. The cell numbers for some clusters (especially epithelial cells) were too low. So, although we performed differential gene expression analyses on all cell clusters, we refrained from discussing them in the manuscript to avoid over-interpretation of our results. Only clusters with sufficiently high cell numbers (~1000 cells per sex per group) were used to plot the heatmaps. We will also include the cell numbers for each cell type in the revision to allow better interpretation of our data. Furthermore, the raw data from this study will be freely available to the public upon publication of this manuscript, enabling interested readers to study the cell types of interest in detail based on their research requirements. This dataset will be a useful resource for the community to inform and design future studies.

    1. eLife Assessment

      This important study suggests that adolescent mice exhibit less accuracy than adult mice in a sound discrimination task when the sound frequencies are very similar. While the evidence supporting this observation is solid, demonstrating that this effect arises from cognitive differences between adolescent and adult mice requires more thorough documentation of task performance, as well as control of impulsivity and baseline licking. The authors should also clarify how difficult and easy trials are interleaved in the task and provide a more comprehensive discussion of the cortical inactivation results in relation to the overall task difficulty.

    2. Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure-tone frequency Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across 4 auditory cortical regions to quantify the neuronal activity related to the behavior. At the single-cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were due not only to learning but also to age, as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavioral results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      (1) The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across ages. The experiments with optogenetics and novice mice complete the research question in a convincing way.

      (2) The analysis, including the systematic comparison of task performance across the two age groups, is most interesting and reveals differences in learning (or learning strategies?) that are compelling.

      (3) Neuronal recording during both behavioral training and passive sound exposure is particularly powerful and allows interesting conclusions.

      Weaknesses:

      (1) The presentation of the paper must be strengthened. Inconsistencies, mislabeling, duplicated text, typos, and inappropriate color code should be changed.

      (2) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data do not show that for the easy task (Figure 1F and Figure 1H).

      (3) The recording electrodes cover regions in the primary and secondary cortices. It is well known that these two regions process sounds quite differently (for example, one has tonotopy, the other does not), and separating recordings from both regions is important to conclude anything about sound representations. The authors show that the conclusions are the same across regions for Figure 4, but is it also the case for the subsequent analysis? In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas? If this is not the case, how is it compatible with the published literature?

      (4) Some analysis interpretations should be more cautious. For example, I do not understand how the lick bias, defined, according to the Methods, from the inverse normal (z-score) transform of the hit rate plus that of the false alarm rate (Figure 1j?, l.749-750), should reflect a cognitive difficulty (l. 161-162, l.171). A lower lick rate in general could reflect a weaker ability to withhold licking, as indicated on l.164, but also many other things, such as a lower frustration threshold, lower satiation, more energy, etc.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how, and how well, adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish, and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex while mice performed the task or listened passively, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choices that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation, and high-density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them. The results of the optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Praegel et al. sought to understand how adolescent and adult mice differ in auditory cortical processing, performance on a go/no-go sound-guided task, and learning. They report that behavioral performance is superior in adults. They also report that neuronal representations of both the acoustic stimulus and behavioral choice are weaker and more sluggish in adolescents compared to adults and that these differences were larger in expert mice than in novices. The neural basis of adolescent auditory cognition is an important topic (both clinically and from a basic science perspective) and vastly understudied. However, many aspects of the study fell short, thereby undermining the primary conclusions drawn by the authors. My major concerns are as follows:

      (1) The authors report that "adolescent mice showed lower auditory discrimination performance compared to adults" and that this performance deficit was due to (among other things) "weaker cognitive control". I'm not fully convinced of this interpretation, for a few reasons. First, the adolescents may simply have been thirstier, and therefore more willing to lick indiscriminately. The high false alarm rates in that case would not reflect a "weaker cognitive control" but rather, an elevated homeostatic drive to obtain water. Second, even the adult animals had relatively high (~40%) false alarm rates on the freely moving version of the task, suggesting that their behavior was not particularly well controlled either. One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      (2) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      (3) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions? And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

    5. Author response:

      Reviewer #1:

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We will revise the manuscript to correct the abovementioned issues.

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We will carefully review, verify claims, and correct conclusions where needed.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We will analyse the data in Figure 7 separately for AUDp and secondary auditory cortices to test regional differences. Additionally, we will provide a table summarizing key neuronal firing properties for each area during passive recordings to clarify how activity varies across cortical subregions and developmental stages.

      D) Some analysis interpretations should be more cautious. (..) A lower lick rate in general could reflect a weaker ability to withhold licking- as indicated on l.164, but also so many other things, like a lower frustration threshold, lower satiation, more energy, etc).

      We will address issues around lick bias including alternative explanations, such as differences in motivation or impulsivity.

      Reviewer #2:

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We will edit the discussion and clarify these points. In addition, we will adjust and extend the methodology section to clarify the rationale of our analysis.

      B) The results of the optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We agree that the effects observed in our optogenetic manipulation warrant further discussion. We will extend the analysis and discussion of ACx silencing.

      Reviewer #3:

      A) One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      We recognize the need for a more nuanced analysis for the head-fixed version of the task. We will extend the behavioral analysis and provide more details to clarify these points.

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We appreciate the reviewer pointing out instances where our citations may not fully support our claims. We will carefully review the relevant citations and revise them to ensure they accurately reflect the findings of the cited studies. We will update references in lines 64–66 and 72–74 to better align with the specific stimulus types and developmental timelines discussed.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We agree that pooling neurons from multiple auditory cortical regions could potentially obscure region-specific differences. However, we addressed this concern by analyzing regional differences in neuronal firing properties, as shown in Supplementary Figures S4-1 and S4-2, and Supplementary Tables 2 and 3. Additionally, we examined stimulus-related and choice-related activity across regions and found no significant differences, as presented in Supplementary Figure S4-3. Please see our response to Reviewer 1, where we further elaborate on this point.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We acknowledge that other cortical layers are also of interest and may contribute differently to auditory processing across development. Our focus on layers 5/6 was motivated by both methodological considerations and biological relevance. These layers contain many of the principal output neurons of the auditory cortex, and are therefore well positioned to influence downstream decision-making circuits. We will clarify this rationale in the revised manuscript and note the limitations of our approach.

    1. eLife Assessment

      This study presents important information about the role of mu opioid receptors in synaptic communication between the medial habenula and the interpeduncular nucleus. The authors provide solid evidence that mu opioid receptor activation has differential effects on glutamate release from substance P neurons and cholinergic neurons, with a canonical reduction in release from the former but a novel increase in release from the latter. They also show that blocking potassium channels can unmask a nicotinic cholinergic synaptic response that is also facilitated by mu opioid receptor activation. This work will be of interest to those studying the interpeduncular nucleus, as well as the larger neuroscience community studying opioids and motivated behavior.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors demonstrate for the first time that opioid signaling has opposing effects on the same target neuron depending on the source of the input. Further, the authors provide evidence to support the role of potassium channels in regulating a brake on glutamatergic and cholinergic signaling, with the latter finding being developmentally regulated and responsive to opioid treatment. This evidence solves a conundrum regarding cholinergic signaling in the interpeduncular nucleus that evaded elucidation for many years.

      Strengths:

      This manuscript provides 3 novel and important findings that significantly advance our understanding of the medial habenula-interpeduncular circuitry:

      (1) Mu opioid receptor (mOR) activation reduces postsynaptic glutamatergic currents elicited from substance P neurons while simultaneously enhancing postsynaptic glutamatergic currents from cholinergic neurons, with the latter being developmentally regulated.

      (2) Substance P neurons from the Mhb provide functional input to the rostral nucleus of the IPN, in addition to the previously characterized lateral nuclei.

      (3) Potassium channels (Kv1.2) provide a brake on neurotransmission in the IPN.

      Weaknesses:

      Overall I find the data presented compelling, but I feel that the number of observations is quite low (typically n=3-7 neurons, typically one per animal). While I understand that only a few slices can be obtained for the IPN from each animal, the strength of the novel findings would be more convincing with more frequent observations (larger n, more than one per animal). The findings here suggest that the authors have identified a novel mechanism for the normal function of neurotransmission in the IPN, so it would be expected to be observable in almost any animal. Thus it is not clear to me why the authors investigated so few neurons per slice and chose to combine different treatments into one group (e.g. Figure 2f), even if the treatments have the same expected effect.

      There are also significant sex differences in nAChR expression in the IPN that might not be functionally apparent using the low n presented here. It would be helpful to know which of the recorded neurons came from each sex, rather than presenting only the pooled data.

      There are also some particularly novel observations that are presented but not followed up on, and this creates a somewhat disjointed story. For example, in Figure 2, the authors identify neurons in which no response is elicited by light stimulation of ChAT-neurons, but the application of DAMGO (mOR agonist) un-silences these neurons. Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons?

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Chittajallu and colleagues present compelling evidence that mu opioid receptor (MOR) activation can potentiate synaptic neurotransmission in a medial habenula to interpeduncular nucleus (mHb-IPN) subcircuit. While projections from mHb tachykinin 1 (Tac1) neurons onto lateral IPN neurons show a canonical opioid-induced depression of glutamate release, excitatory neurotransmission in mHb choline acetyltransferase (ChAT) projections to the rostral IPN is potentiated by opioids. This process may require the inhibition of voltage-gated potassium channels (Kv1.2) and results in augmented co-release of glutamate and acetylcholine. This function emerges around age P27 in mice, when MOR expression in the IPN peaks.

      Strengths:

      Carefully executed electrophysiological experiments with appropriate controls. Interesting description of a neurodevelopmental change in the effects of opioids on mHb-IPN signaling.

      Weaknesses:

      The genetic strategy used to target the mHb-IPN pathway (constitutive expression in all ChAT+ and Tac1+ neurons) is not specific to this projection. In addition, a braking mechanism involving Kv1.2 has not been identified.

    4. Reviewer #3 (Public review):

      Summary:

      Here the authors describe the role of mORs in synaptic glutamate release from substance P and cholinergic neurons in the medial habenula to interpeduncular nucleus (IPN) circuit in adult mice. They show that mOR activation reduces evoked glutamate release from substance P neurons yet increases evoked glutamate and ACh release from cholinergic neurons. Unlike glutamate release, ACh release is only detected when potassium channels are blocked with 4-AP or dendrotoxin, implicating Kv1.2. The authors also report a previously unidentified glutamatergic input to the IPR arising from SP neurons and describe the developmental timing of mOR-mediated facilitation in adolescent mice.

      Strengths:

      (1) The experiments provide new insight into the role of mORs in controlling evoked glutamate release in a circuit with high levels of mORs and established roles in relevant behaviors.

      (2) The experimental design is generally rigorous, and the results are clear-cut. The conclusions are largely supported by the data.

      (3) The findings will be of interest to those working in the field.

      Weaknesses:

      (1) The mechanistic underpinnings of the most interesting results are not pursued. For example, the experiments do not provide new insight into the differential effects of Gi/o-coupled mORs on evoked versus spontaneous glutamate/ACh release, nor into the differential threshold for glutamate versus ACh release.

      (2) The significance of the ratio of AMPA to nAChR EPSCs shown in Figure 6 is unclear, since nAChR EPSCs measured in the presence of K+ channel blockers are compared to AMPA EPSCs measured under control conditions (presumably 4-AP would also increase AMPA EPSCs).

      (3) The authors note that blocking Kv1 channels typically enhances transmitter release by slowing action potential repolarization. The idea that Kv1 channels serve as a brake on ACh release in this system would be strengthened by showing that these channels are the target of neuromodulators or that they contribute to activity-dependent regulation that allows the brake to be released.

    1. eLife Assessment

      This study provides proof-of-principle data for the use of trained immunity to modulate macrophage interactions with tumours. The study makes a valuable contribution to the field of trained immunity. However, the study is incomplete without in vivo data, which would have made the claims in the paper stronger by providing meaningful context for the in vitro experiments that were conducted.

    2. Reviewer #1 (Public review):

      Summary:

      The authors sought to determine whether trained innate immunity modulates antibody-dependent cellular phagocytosis (ADCP) and/or efferocytosis.

      Strengths:

      The use of primary murine macrophages, and not a cell line, is considered a strength.

      The trained immunity-mediated changes to phagocytosis affected both melanoma and breast cancer cells. The broad effect is consistent with trained immunity.

      Weaknesses:

      The most significant weakness, also noted by the authors in the discussion, is the lack of in vivo data. Without these data, it is not possible to put the in vitro data in context. It is unknown if the described effects on efferocytosis will be relevant to the in vivo progression of cancer.

    3. Reviewer #2 (Public review):

      Summary:

      The authors follow up their preclinical work on beta-glucan-induced trained immunity in murine tumor models that they published in Cell in 2020. In particular, they focus on the role of trained immunity in efferocytosis of cancer cells.

      Strengths:

      While properly conducted, the work is underwhelming and fully depends on in vitro observations performed with co-cultures of bone marrow derived macrophages from beta-glucan-treated mice and tumor cell lines. From these in vitro studies, the authors conclude that trained immunity induction has no effect on antibody-dependent cellular phagocytosis, while it decreases efferocytosis.

      Weaknesses:

      It would be important to study these phenomena in tumor mouse models in vivo. The authors clearly have the expertise, as they have shown in previous studies. This is especially important because the in vitro observation appears to conflict with the in vivo anti-tumor effect found in mice prophylactically treated with beta-glucan. Clearly, trained immunity is associated with diverse cellular responses and mechanisms, some of which may promote tumor growth, as the current manuscript suggests, but in the absence of in vivo studies, it is merely a mechanistic exercise whose relevance is difficult to determine.

    4. Reviewer #3 (Public review):

      Summary:

      Using a mouse model, Chatzis et al. showed that β-glucan-trained macrophages have decreased phagocytic activity toward apoptotic tumor cells, accompanied by lower levels of secreted IL-1β.

      Strengths:

      This finding has a potential impact on designing new cancer immunotherapeutic approaches by targeting macrophage efferocytosis.

      Weaknesses:

      Whether this finding applies to other scenarios is undetermined.

      (1) Does the decrease of efferocytosis also occur in human monocytes/macrophages after training?

      (2) Both β-glucan and BCG are well-established inducers of trained innate immunity. The authors showed that β-glucan decreased efferocytosis via IL-1β, so it would be interesting to know whether BCG has a similar effect.

    1. eLife Assessment

      Using fMRI-based pRF mapping, this important study presents a novel method to estimate the visual field (VF) and VF loss or potential restoration through analysis of contrast sensitivity patterns in the early visual cortex. While the approach is very interesting and the evidence supporting the claims of the authors is solid, some methodological concerns need to be addressed. The work will be of interest to researchers in vision/clinical vision, neuroscience, and brain imaging.

    2. Reviewer #1 (Public review):

      Summary:

      Integrating large-field stimulation with a retinotopic atlas, this study introduces an fMRI-based method for measuring contrast sensitivity across the visual field. Retinotopy was assessed using pRF mapping and a calibrated Benson atlas. The authors validate their method by replicating known patterns of contrast sensitivity across eccentricities and visual field quadrants in healthy subjects and demonstrate its potential clinical utility through case studies of both simulated and real visual field loss.

      Strengths:

      The new method is promising, with potential clinical utility in assessing visual field loss.

      Weaknesses:

      The current claims need more supporting evidence.

      In the first experiment, have the statistics undergone multiple comparison corrections (e.g., Line 441-442)? Given the small sample size, incorporating additional statistical tests (such as the Bayes Factor) could strengthen the analysis.

      The authors claim that "structure-based atlases can replace the need for pRF mapping in cases where it might otherwise be difficult or impossible to collect pRF data." This claim needs further scrutiny. Currently, only one simulated condition of visual field loss was examined in one subject. Also, in Figure 7, contrast sensitivity in the periphery differs between pRF mapping and the Benson atlas. How do the authors explain this discrepancy?

      Overall, the writing could be significantly improved.

    3. Reviewer #2 (Public review):

      Summary:

      This study uses functional MRI to evaluate visual contrast sensitivity across the visual field at the level of the visual cortex, testing the method in a small group of normally sighted individuals and one individual with sight loss as proof of principle. The results suggest a promising technique that measures vision objectively across the visual field and overcomes the requirement for careful fixation, which is often challenging for those with low vision or sight loss.

      Strengths:

      (1) Objective measure of central vision: The proposed method may provide a more comprehensive and objective assessment of residual visual function in individuals with sight loss. This may be particularly useful for those with central visual field loss without the requirement of stable fixation or subjective motor responses.

      (2) More sensitive measure: The use of slope to calculate contrast sensitivity across a range of contrasts within the brain is clever and likely more sensitive than single threshold measurements or standard clinical measures of visual acuity using letter charts. Standard supra-threshold (high contrast) tests are not ideal for capturing residual vision or partial vision loss.

      (3) Good agreement with standard atlas: The Benson atlas provides a good estimate of visual field maps within V1 based on anatomical landmarks, and the authors take steps to refine this informed by cortical magnification and V1 surface area (brain size) for each individual participant. This could allow the technique to be generalised without the need to collect lengthy individual mapping data from every participant.

      (4) Within-subject reproducibility: The measurements appear to be sensitive and reproducible, particularly in those with normal vision, and are consistent with known features of visual sensitivity differences in different parts of the visual field.

      (5) Potential tool to measure visual field sensitivity in controls: Even if the proposed methods are not ideal for widespread clinical translation, they do offer an exciting tool to test hypotheses about visual field differences in healthy controls. For example, there seems to be an increase in sensitivity on either side of the simulated ring scotoma (Figure 6 - perhaps due to the release of lateral inhibition?). Reliability measures suggest that individual differences are consistent in healthy controls (although not tested statistically, perhaps due to the small sample size?). Whether they reflect behaviourally meaningful differences in visual field sensitivity could be tested in individuals by comparing them to behavioural measures across the visual field.

      (6) Potential tool to test novel treatments: The proposed techniques could be used to test within-subject changes in visual function in environments that are equipped to measure and analyse fMRI data, including clinical trials aimed at determining the success of novel treatments. Further testing should reveal whether the method is suitable for testing low-vision patients with unstable fixation (e.g., nystagmus) and whether this affects slope and contrast sensitivity estimates. In theory, it should not have a substantial effect, except perhaps in regions near the stimulus edges.

      Weaknesses:

      (1) Questionable sensitivity to differences in patients. The variability in heat maps across healthy control participants is somewhat surprising. Do differences between individuals represent actual visual sensitivity differences, or are they an artifact of the measurement technique, e.g., due to signal-to-noise differences introduced by local variations in brain anatomy? Will the substantial variance across controls allow for a sufficiently stable baseline to detect meaningful differences in individual patients? Also, as the authors rightly point out, the Benson atlas does not model differences along meridians, so upper/lower field differences might not be detectable.

      (2) Effects of unstable fixation/eye movements not explicitly tested: The methods state, 'In all tasks, participants were asked to report when the color of a central fixation dot changed', suggesting participants maintained fairly good fixation. Most of the results seem to pertain to measurements where central fixation is required. How does unstable fixation affect measurements?

      (3) Potential for clinical translation. Although it is a sensitive measure, functional MRI is costly, is not available in all clinical settings, requires significant post-processing analyses, and may be contraindicated in some individuals due to safety (e.g., metallic implants) or other concerns (e.g., claustrophobia). These could present significant barriers to widespread clinical translation if this were the ultimate goal of the study.

      (4) Limited range of spatial frequencies. The spatial frequencies tested were still quite low (0.3 and 3 cpd) compared to measures such as visual acuity. Extending the measurements to higher spatial frequencies could allow better characterization of central vision, although not necessarily of peripheral vision.

    4. Reviewer #3 (Public review):

      Summary:

      Chow-Wing-Bom et al. introduce an innovative wide-field visual stimulation setup for 3T experiments that enables stimulation up to a diameter of 40° visual angle while allowing continuous gaze tracking. Using this setup, the authors systematically investigate contrast sensitivity across the visual field by presenting subjects with sinusoidal gratings varying in contrast and spatial frequency. Their findings confirm the expected organization of contrast sensitivity, demonstrating a preference for high spatial frequencies in the central field and lower frequencies in the periphery. They also extend these measurements to eccentricities up to 20°, which exceeds previous fMRI-based reports. Moreover, the study explores the potential of using contrast sensitivity calculations as a method for detecting visual field defects, as demonstrated in both a healthy subject with an artificial, ring-shaped scotoma and a patient with LHON.

      Strengths:

      (1) The manuscript is well written and provides comprehensive methodological details, ensuring high transparency and reproducibility.

      (2) The visual stimulation setup represents a significant technical advance by enabling wide-field stimulation with continuous eye tracking, which is crucial for both research and potential clinical applications.

      (3) The study confirms established findings regarding the organization of contrast sensitivity while extending them to a larger eccentricity range.

      (4) The efforts to establish a measure for visual field losses align with current efforts to develop objective alternatives to conventional perimetry.

      Weaknesses:

      (1) The authors should more strongly emphasize their findings on the organization of contrast sensitivity, particularly in light of the stimulation extent provided by the wide-field setup.

      (2) Certain methodological aspects require further clarification, particularly regarding the correction of eccentricity values from the Benson atlas. It is not clear which V1 masks were used for each analysis, which could have a substantial impact on the reported differences between the two approaches (pRF mapping and atlas-based pRF parameters).

      (3) Minor inconsistencies in reporting, e.g., the introduction of a second session in the Results section.

      (4) The conclusion that high-contrast patterns as in pRF mapping are not optimal to test for subtle but potentially clinically relevant changes in the visual field coverage is very valid. The suggested use of contrast sensitivity can therefore be a potentially well-suited parameter for estimating visual field losses. The presented work is an interesting starting point and the proposed method of using contrast sensitivity as a measure for partial vision loss should further be explored.

    1. eLife Assessment

      The medicinal leech preparation is an amenable system in which to understand the neural basis of locomotion. Here a previously identified non-spiking neuron was studied in leech and found to alter the mean firing frequency of a crawl-related motoneuron, which fires during the contraction phase of crawling. The findings are valuable and the experiments were diligently done and generally solid; however, the presentation could improve. The work could be taken to the next level by further experiments, and by providing more overall context for the study.

    2. Reviewer #1 (Public review):

      Summary:

      The Szczupak lab published a very interesting paper in 2012 (Rodriguez et al. J Neurophysiol 107:1917-1924) on the effects of the segmentally distributed non-spiking (NS) cell on crawl-related motoneurons. As far as I can tell, the working model presented in 2012 for how the NS cell impacts the crawling motor pattern is the same functional model presented in this new paper. Unfortunately, the Discussion neither addresses nor cites any of the findings of the previous paper in the context of NS alterations of fictive crawling. Aside from different-looking figures and some new analyses, the results and conclusions are the same.

      Strengths:

      The figures are well illustrated.

      Weaknesses:

      The paper is a mix of what appears to be two different studies and abruptly switches gears to examine how closely the crawl pattern in the intact animal matches the fictive crawl pattern in the isolated ganglion. Unfortunately, previous studies from other labs are not cited even though identical results have been obtained and similar conclusions were drawn. Thus, the results will lack novelty for those who are familiar with the leech preparation. The lack of appropriate citations and discussion of previous studies also prevents the scientific community from fully appreciating the impact of the data presented and the science it was built upon.

      (1) Results, Lines 167-170: "While multiple extracellular recordings have been performed previously (Eisenhart et al., 2000), these results present the first quantitative analysis of motor units activated throughout the crawling cycle. The In-Phase units are expected to control the contraction stage by exciting or inhibiting the longitudinal or circular muscles, respectively, and the Anti-Phase units to control the elongation stage by exciting or inhibiting the circular or longitudinal muscles, respectively."

      The first line above is misleading. The study by Puhl and Mesce (2008, J. Neurosci, 28:4192- 420) contains a comprehensive analysis of the motoneurons active during fictive crawling with the aim of characterizing their roles and phase relationships and solidifying the idea that the oscillator for crawling resides in a single ganglion. Intracellular recordings from a number of key crawl-related motoneurons were made in combination with extracellular recordings of motoneuron DE-3, a key monitor of crawling. In their paper, it was shown that motoneurons AE, VE-4, DI-1, VI-2, and CV were all correlated with crawl activity, and fired repeatedly either in phase or out-of-phase with DE-3. They were shown to be either excitatory or inhibitory.

      At a minimum, the above paper should be cited. The submitted paper would be strengthened if some of these previously identified motoneurons were again recorded with intracellular electrodes and concomitant NS cell stimulation. The power of the leech preparation is that cells can be identified as individuals with dual somatic (intracellular) and axonal recordings (extracellular). The shortfall of this aspect of the study (Figure 5) is that the extracellular units have not been identified here. In fact, these units might not even be motoneurons. They could represent activity from the centrally located sensory neurons, dopamine-modulated afferent neurons or peripherally projecting modulatory neurons. Essentially, they may not have much to do with the crawl motor pattern at all.

      (2) Results Lines 206-210: "with the elongation and contraction stages of in vivo behavior. However the isometric stages displayed in vivo have no obvious counterpart in the electrophysiological recordings. It is important to consider that the rhythmic movement of successive segments along the antero-posterior axis of the animal requires a delay signal that allows the appropriate propagation of the metachronal wave, and this signal is probably absent in the isolated ganglion."

      The so-called isometric stages, indeed, have an electrophysiological counterpart due in part to the overlapping activities across segments. This submitted paper would be considerably strengthened if it referred to the body of work that has examined how the individual crawl oscillators operate in a fully intact nerve cord, excised from the body but with all the ganglia (and cephalic ganglion) attached. Puhl and Mesce 2010 (J. Neurosci 30: 2373-2383) and Puhl et al. 2012 (J. Neurosci, 32:17646 -17657) have shown that "appropriate propagation of the metachronal wave" requires the brain, especially cell R3b-1. They also show that the long-distance projecting cell R3b-1 synapses with the CV motoneuron, providing rhythmic excitatory input to it.

      For this and other reasons, the paper would be much more informative and exciting if the impacts of the NS cell were studied in a fully intact nerve cord. Those studies have never been done, and it would be exciting to see how and if the effects of NS cell manipulation deviated from those in the single ganglion.

      (3) Discussion Lines 322-324. "The absence of descending brain signals and/or peripheral signals are assumed as important factors in determining the cycle period and the sequence at which the different behavioral stages take place."

      The authors could strengthen their paper by including a more complete picture of what is known about the control of crawling. For example, Puhl et al. 2012 (J Neurosci, 32:17646-17657) demonstrated that the descending brain neuron R3b-1 plays a major role in establishing the crawl-cycle frequency. With increased R3b-1 cell stimulation, DE-3 periods substantially shortened throughout the entire nerve cord. Thus, the importance of descending brain inputs should not be merely assumed; empirical evidence exists.

      (4) Discussion Lines 325-327: "the sequence of events, and the proportion of the active cycle dedicated to elongation and contraction were remarkably similar in both experimental settings. This suggests that the network activated in the isolated ganglion is the one underlying the motor behavior."

      The results and conclusions drawn in the current manuscript mirror those previously reported by Puhl and Mesce (2008, J. Neurosci, 28:4192- 420) who first demonstrated that the essential pattern-generating elements for leech crawling were contained in each of the segmental ganglia comprising the nerve cord. Furthermore, the authors showed that the duty cycle of DE-3, in a single ganglion treated with dopamine, was statistically indistinguishable from the DE-3 duty cycle measured in an intact nerve cord showing spontaneous fictive crawling, in an intact nerve cord induced to crawl via dopamine, and in the intact behaving animal. What was statistically significant, however, was that the DE-3 burst period was greatly reduced in the intact animal (i.e., a higher crawl frequency), which was replicated in the submitted paper.

      In my opinion, the novelty of the results reported in the submitted manuscript is diminished in the light of previously published studies. At a minimum, the previous studies should be cited, and the authors should provide additional rationale for conducting their studies. They need to explain in the discussion how their approach provided additional insights into what has already been reported.

    3. Reviewer #2 (Public review):

      The paper is well written. The findings are clearly presented, and the data seem solid overall. I do, however, have a few major and some minor comments representing some concerns. My major comments are below.

      (1) This may seem somewhat semantic, yet it has implications for the way the data are presented and, moreover, for the conclusions drawn: a single ganglion cannot show fictive crawling. It can demonstrate rhythmic patterns of activity that may serve in the (fictive) crawling motor pattern. The latter is a result of the intrinsic connectivity within a single ganglion AND the inter-ganglionic connections and interactions (coupling) among the sequential ganglia. It may be affected by both short-range and long-range connections (e.g., descending inputs) along the ganglion chain.

      (2) The point above is even more critical where the authors set out to compare the motor pattern in single ganglia with that of intact animals. It would have made much more sense to add a description of the motor pattern of a chain of interconnected ganglia, which would be expected to better resemble the intact animal. Furthermore, this project would have benefitted from a three-way comparison (isolated ganglion, interconnected ganglia, intact animal).

      (3) Two previous studies by the same group are repeatedly mentioned (Rela and Szczupak, 2003; Rodriguez et al., 2009) and serve as a basis for the current work. The aim of one of these previous studies was to assess the role of the NS neurons in regulating the function of motor networks. The other (Rodriguez et al., 2009) reported on a neuron (the NS) that can regulate the crawling motor pattern. LL 71-74 of the current report presents the aim of this study as evaluating the role of the known connectivity of the premotor NS neuron in shaping the crawling motor pattern. The authors should make it very clear what indeed served as background knowledge, what exactly was known about the circuitry beforehand, and what is different and new in the current study.

    1. eLife assessment

      The work is interesting in its characterization of a large number of antibiotic persisters from a wild-type strain. Previous work was typically limited to directly observing either high-persister strains or smaller numbers of wild-type persisters. Therefore, this work sheds new light on the elusive non-dormant persisters present in exponentially growing cultures and should help resolve previous conflicting observations.

    2. Reviewer #1 (Public Review):

      The work of Umetani et al. monitors the death of about 100,000 cells caused by lethal antibiotic treatments in a microfluidic device. They observe that the surviving bacteria are in either a dormant or a non-dormant state prior to the antibiotic treatment. They then study the relative abundances of these different persister cells when varying the physiological state of the culture. In agreement with previous observations, they observe that late stationary phase cultures harbor a high number of dormant persister cells and that this number decreases as the culture approaches exponential growth but remains non-zero, suggesting that exponential-phase cultures contain different types of persister bacteria. These results were qualitatively similar in a rich and a poor medium. Further characterization of the growing persister bacteria shows that they often form L-forms, have low RpoS-mCherry expression levels, and grow only slightly more slowly than non-persister bacteria. Taken together, these results provide a detailed view of persister bacteria and the way they may survive extensive antibiotic treatments. However, to represent a substantial advance on previous knowledge, a deeper analysis of the persister bacteria should be done.

    3. Reviewer #2 (Public Review):

      The main question asked by Umetani et al. is whether persister cells to ampicillin arise preferentially from dormant, non-dividing cells or from cells that are actively growing before antibiotic exposure. The authors tracked persister cells generated from populations at different growth phases and in different culture media using a microfluidic device coupled to fluorescence microscopy, which is challenging due to the low frequency of these persister cells. One of the main conclusions is that the majority of persisters arising in exponentially growing populations originated from cells that were actively dividing before the antibiotic treatment, reinforcing the idea that dormancy is not a prerequisite for persister formation. The authors made use of a fluorescent reporter monitoring RpoS activity (RpoS-mCherry fusion) and observed that RpoS levels in these persister cells were low. In the few lineages that exhibited no growth before the ampicillin treatment, RpoS levels were low as well, indicating that RpoS is not a predictive marker for persistence. By performing the same experiment with early and late stationary phase cultures, the authors observed that the proportion of persister cells originating from cells that were dormant before the ampicillin treatment is significantly increased under these conditions. In the late stationary phase condition, dormant cells expressed high levels of RpoS. The authors suggested that RpoS-mCherry proteins form aggregates, which they propose to be a characteristic of 'deep dormancy'. These cells were mostly unable to restart growth after antibiotic removal, while others with the lowest levels of RpoS tended to become persisters. Confirming that these cells indeed contain protein aggregates, as well as determining their physiological state, appears to be crucial.

    4. Reviewer #3 (Public Review):

      In their manuscript, Umetani et al. address the question of the origin of persister bacteria using single-cell approaches. Persistence refers to a physiological state in which bacteria are less sensitive to antibiotic therapy although they have not acquired a resistance mutation; importantly, the concept of persistence has been refined in the past decade to distinguish it from tolerance, in which bacteria are only transiently insensitive. Since persister cells are very rare in growing populations (typically 10⁻⁵ or 10⁻⁶), it is very challenging to observe them directly. It had been proposed that individual cells surviving antibiotics are not growing at the start of the treatment, but recent studies (nicely reviewed in the introduction) in which persister bacteria were observed directly do not support this link. Following a similar line, the authors nonetheless still aim at "investigating whether non-growing cells are predominantly responsible for bacterial persistence". Based on new experimental data, they claim the contrary: that most surviving cells were "actively growing before drug exposure" and that their work "reveals diverse survival pathways underlying antibiotic persistence".

      The main strengths of the manuscript are in my opinion:

      - To report on direct observation of E. coli persisters to ampicillin (200 µg/mL) in 5 different growth media (typically 20 persisters or more per condition, one condition with only 12), which constitutes without a doubt an experimental tour de force.

      - To aim at bridging the population level and the single-cell level by measuring relevant variables for each and analyzing them jointly.

      - To demonstrate that in most conditions a large fraction of surviving cells was actively growing before drug exposure.

      In addition, although it is well known that E. coli does not need to maintain its rod shape to survive and divide, I found the extent to which morphology can be affected in persister cells and their progeny in their data very remarkable, since this really challenges our understanding of E. coli's "lifestyle" (the swimming amoeba-like cells in Supp Video 11 are mind-blowing!).

      Unfortunately, these positive aspects are counter-balanced by several shortcomings in the way experiments are analyzed and interpreted, which I explain below. Moreover, the manuscript is written in a way that makes it very hard to find important information on how experiments are done and is likely to leave the reader with an impression of confusion about what the main findings actually are.

      My major concerns are the following:

      (1) The main interpretation framework proposed by the authors is to assess whether cells not growing before drug exposure (so-called "dormant") are more or less likely to survive the treatment than growing ones ("non-dormant"). Fig 2A and Fig 3G show the main conclusions of the article from this perspective, that growing cells can survive the treatment and that the fraction of persisters in a given condition is not explained by the fraction of "dormant" cells, respectively. With this analysis, the authors essentially assume that "dormant" cells are of the same type in their different conditions, which ignores the progress in this field over the last decade (Balaban et al. 2019). I argue on the contrary that the observation of "diverse modes of survival in antibiotic persistence" is expected from their experimental design. In particular, the sensitivity of E. coli to beta-lactams such as ampicillin is expected to be much lower during the lag out of the stationary phase, a phenomenon which has been coined "tolerance"; hence in the Late Stationary condition, two subpopulations coexist for which different response to ampicillin is expected. I propose steps toward a more compelling interpretation of the experimental data. Should this point be taken seriously by the authors, it, unfortunately, implies a major rewriting of the article, including its title.

      (2) The way the authors describe their experiments with bacteria in the stationary phase is very problematic. For instance, they write that they "sampled cells from early and late stationary phases (...) and exposed them to 200 μg/mL of Amp in both batch and single-cell cultures." For any reader in a hurry (hence skipping the methods and/or the supplementary figure), this suggests that bacteria sampled in the stationary phase were exposed to the drug right away (either by adding the drug to the stationary phase sample or, more classically, by transferring cells to fresh media with antibiotics). However, it turns out that, after sampling and loading in the microfluidic device, bacteria are grown for 2 h in LB (or 4 h in M9) - I don't know what to think of such a blatant omission. The names chosen for each condition should reflect their most important aspects; here "stationary" is simply not appropriate - maybe something like "post early stationary" instead. In any case, I believe that this point further highlights the misconception pointed out in (1) and implies that the average reader will be at best confused, and probably misled.

      (3) Figures 4 and 5 are of very minor significance, and the methodology used in Fig 4 is questionable. The authors measure the abundance of an RpoS-mCherry translational fusion because its "high expression has been suggested to predict persistence". The rationale for this (that an RpoS-mCherry fusion would be a proxy for intracellular ppGpp levels, and in turn predict persistence) has never been firmly established, and the standards used in the article where this reporter was introduced (Maisonneuve, Castro-Camargo, and Gerdes 2013) are notoriously low (which eventually led to its retraction) - I don't know what to think of the fact that the authors cite a review by this group rather than their retracted article. While transcriptional fusions of promoters regulated by RpoS have been proposed to measure its regulatory activity (Patange et al. 2018), the combination of self-regulation and complex post-translational regulation of rpoS makes the physical meaning of the reporter used here completely unclear. Moreover, this translational fusion is introduced without any of the necessary controls to demonstrate that the activity of RpoS is not impaired by the addition of the fluorescent protein. Fig 5 simply reports the existence of persisters to ciprofloxacin that were growing before the treatment. This might be a new observation, but it is not unexpected given that a similar observation has been made with a similar drug, ofloxacin (Goormaghtigh and van Melderen 2019), as pointed out in the introduction. There is no further quantitative claim on this.

      (4) The authors don't mention the dead volume or the speed of media exchange in their device. Hopefully, it is short compared to the duration of the treatment; however, it is challenging to remove all antibiotics after the treatment, and even 1e-3 or 1e-4 of the treatment concentration may already affect regrowth in fresh media. If this is described in another article, it would be worth adding a comment in the main text.

      (5) Fig 2A supports the main finding that a significant fraction of bacteria surviving the treatment are growing before drug exposure, but it uses a poorly chosen representation.

      - In order to compare between conditions, one would like to see the fraction of each type in the population.

      - The current representation (of a fraction of each type among surviving cells) requires a side-by-side comparison with a random sample (which will practically be equivalent to the fraction of each type among killed cells) in order to be informative.

    5. Author response:

      Reviewer #1 (Public Review):

      The work of Umetani et al. monitors the death of about 100,000 cells caused by lethal antibiotic treatments in a microfluidic device. They observe that the surviving bacteria are either in a dormant or in a non-dormant state prior to the antibiotic treatment. They then study the relative abundances of these different persister cells when varying the physiological state of the culture. In agreement with previous observations, they observe that late stationary phase cultures harbor a high number of dormant persister cells and that this number decreases as the culture becomes more exponential but remains non-zero, suggesting that exponential-phase cultures contain different types of persister bacteria. These results were qualitatively similar in a rich and a poor medium. Further characterization of the growing persister bacteria shows that they often form L-forms, have low RpoS-mCherry expression levels, and grow only slightly more slowly than non-persister bacteria. Taken together, these results draw a detailed view of persister bacteria and the way they may survive extensive antibiotic treatments. However, in order to represent a substantial advance on previous knowledge, a deeper analysis of the persister bacteria should be done.

      We thank the reviewer for suggesting the addition of more detailed analyses of persister cells. As we wrote in our response to Essential Revision 1, we now include a new section titled “Response of growing persisters to Amp exposure is heterogeneous” (pages 11-12) and present the results of detailed analyses of single-cell dynamics of growth and cell morphology over the pre-exposure, exposure, and post-exposure periods (Fig. 2D and H, Fig. 4B and D, Fig. 4 – figure supplement 1 and 2, Fig. 5B and D, Fig. 5 – figure supplement 1, Fig. 8B and D, and Fig. 8 – figure supplement 1). The new results characterize differential responses to Amp treatment among growing persister cells (Fig. 4A-D, Fig. 4 – figure supplement 1, Fig. 4 – figure supplement 2A, Fig. 5A-D, and Fig. 5 – figure supplement 1); comparable division rates of MG1655 between non-surviving cells and persister cells growing prior to antibiotic treatment (Fig. 4E and Fig. 8E), except for the post-exponential phase cell populations of MF1 treated with Amp in LB medium and the post-exponential phase cell populations of MG1655 treated with Amp in M9 medium (Fig. 4 – figure supplement 2B and Fig. 5E); and the presence of persister cells to CPFX that avoid filamentation after the treatment (Fig. 8C and D, and Fig. 8 – figure supplement 1). We believe that these new analyses provide new insights into the diverse dynamics and survival modes of antibiotic persistence at the single-cell level and represent an important contribution to the field.

      Reviewer #2 (Public Review):

      The main question asked by Umetani et al. is whether persister cells to ampicillin arise preferentially from dormant, non-dividing cells or from cells that are actively growing before antibiotic exposure. The authors tracked persister cells generated from populations at different growth phases and in different culture media using a microfluidic device coupled to fluorescence microscopy, which is challenging given the low frequency of these persister cells. One of the main conclusions is that the majority of persisters arising in exponentially growing populations originated from cells that were actively dividing before the antibiotic treatment, reinforcing the idea that dormancy is not a prerequisite for persister formation. The authors made use of a fluorescent reporter monitoring RpoS activity (an RpoS-mCherry fusion) and observed that RpoS levels in these persister cells were low. In the few lineages that exhibited no growth before the ampicillin treatment, RpoS levels were also low, indicating that RpoS is not a predictive marker for persistence. By performing the same experiment with early and late stationary phase cultures, the authors observed that the proportion of persister cells originating from dormant cells before the ampicillin treatment was significantly increased under these conditions. In the late stationary phase condition, dormant cells expressed high levels of RpoS. The authors suggested that RpoS-mCherry proteins form aggregates, which they proposed to be a characteristic of 'deep dormancy'. These cells were mostly unable to resume growth after antibiotic removal, whereas others with the lowest levels of RpoS tended to be persisters. Confirming that these cells indeed contain protein aggregates, as well as determining their physiological state, appears to be crucial.

      We thank reviewer #2 for pointing out the critical issue with the RpoS-mCherry fusion that we used to quantify RpoS expression levels in single cells in the original manuscript. As explained in our reply to the comments below, we performed the suggested experiment and confirmed that RpoS function was impaired by tagging it with mCherry. To resolve this issue, we repeated almost all the experiments using the wild-type strain MG1655 and confirmed the reproducibility of the main results (Fig. 3, Fig. 3 – figure supplement 1, and Fig. 7). Because of this change of the main strain used in this study, we removed the results on the correlation between RpoS expression and the persistence trait from the revised manuscript, as they may not reflect the behavior of intact RpoS. However, we decided to keep some of the results with the MF1 strain, such as the population killing curves and the survival mode analyses, because they also provide insight into the role of RpoS in antibiotic persistence. In particular, we found both beneficial and detrimental effects of RpoS on antibiotic persistence, depending on culture conditions and the duration of antibiotic treatment (Fig. 1 – figure supplement 3 and Fig. 6 – figure supplement 1). We have therefore included these results and related discussions in the revised manuscript.

      Reviewer #3 (Public Review):

      In their manuscript, Umetani et al. address the question of the origin of persister bacteria using single-cell approaches. Persistence refers to a physiological state in which bacteria are less sensitive to antibiotic therapy although they have not acquired a resistance mutation; importantly, the concept of persistence has been refined in the past decade to distinguish it from tolerance, where bacteria are only transiently insensitive. Since persister cells are very rare in growing populations (typically 1e-5 or 1e-6), it is very challenging to observe them directly. It had been proposed that individual cells surviving antibiotics are not growing at the start of the treatment, but recent studies (nicely reviewed in the introduction) in which persister bacteria were observed directly do not support this link. Following a similar line, the authors nonetheless still aim at "investigating whether non-growing cells are predominantly responsible for bacterial persistence". Based on new experimental data, they claim the contrary: that most surviving cells were "actively growing before drug exposure" and that their work "reveals diverse survival pathways underlying antibiotic persistence".

      We thank the reviewer for this helpful comment, which suggested to us that some revisions in our Introduction would better place our study in the context of previous understanding of antibiotic persistence. As mentioned in our response to Essential Revision 4 and the second comment of Reviewer 1's Recommendations for the authors, we have modified the Introduction to more appropriately place our study in the context of the field.

      The main strengths of the manuscript are in my opinion:

      - To report on direct observation of E. coli persisters to ampicillin (200 µg/mL) in 5 different growth media (typically 20 persisters or more per condition, one condition with only 12), which constitutes without a doubt an experimental tour de force.

      - To aim at bridging the population level and the single-cell level by measuring relevant variables for each and analyzing them jointly.

      - To demonstrate that in most conditions a large fraction of surviving cells was actively growing before drug exposure.

      In addition, although it is well known that E. coli does not need to maintain its rod shape to survive and divide, I found the extent to which morphology can be affected in persister cells and their progeny in their data very remarkable, since this really challenges our understanding of E. coli's "lifestyle" (the swimming amoeba-like cells in Supp Video 11 are mind-blowing!).

      We are grateful to the reviewer for articulating the strengths of this study.

      Unfortunately, these positive aspects are counter-balanced by several shortcomings in the way experiments are analyzed and interpreted, which I explain below. Moreover, the manuscript is written in a way that makes it very hard to find important information on how experiments are done and is likely to leave the reader with an impression of confusion about what the main findings actually are.

      We thank the reviewer for pointing out these important issues regarding the original manuscript. Please see our replies below regarding how we addressed each specific comment to resolve the issue. To make the experimental methods and procedures more accessible and interpretable, we have added more explanations of the experimental details to the Results and Methods sections. Furthermore, since we understood that some of the confusion came from the insufficient explanation of the preculture procedures for the microfluidic experiments, we have modified the schematic illustration of the method shown in Fig. S1 in the original manuscript and moved it to the first main figure in the revised manuscript (Fig. 1C and D). We have also added an illustration that explains the cultivation procedures for the batch culture experiments as Fig. 6A.

      My major concerns are the following:

      (1) The main interpretation framework proposed by the authors is to assess whether cells not growing before drug exposure (so-called "dormant") are more or less likely to survive the treatment than growing ones ("non-dormant"). Fig 2A and Fig 3G show the main conclusions of the article from this perspective, that growing cells can survive the treatment and that the fraction of persisters in a given condition is not explained by the fraction of "dormant" cells, respectively. With this analysis, the authors essentially assume that "dormant" cells are of the same type in their different conditions, which ignores the progress in this field over the last decade (Balaban et al. 2019). I argue on the contrary that the observation of "diverse modes of survival in antibiotic persistence" is expected from their experimental design. In particular, the sensitivity of E. coli to beta-lactams such as ampicillin is expected to be much lower during the lag out of the stationary phase, a phenomenon which has been coined "tolerance"; hence in the Late Stationary condition, two subpopulations coexist for which different response to ampicillin is expected. I propose steps toward a more compelling interpretation of the experimental data. Should this point be taken seriously by the authors, it, unfortunately, implies a major rewriting of the article, including its title.

      We thank the reviewer for bringing to our attention the point that may have caused confusion in the original manuscript. 

      The primary purpose of this manuscript was not to assess whether non-growing cells prior to drug exposure are more or less likely to survive treatment than growing cells. Rather, we wanted to examine how different persister cell dynamics emerge at the single-cell level depending on previous cultivation history, growth media, and antibiotic types. We believe that this point is clearer in the revised manuscript with the newly added single-cell dynamics data (Fig. 2D, 2H, 4B, 4D, Fig. 4 – figure supplement 1 and 2A, Fig. 5B, 5D, Fig. 5 – figure supplement 1, Fig. 8B, 8D, and Fig. 8 – figure supplement 1). 

      We also did not mean to imply that "dormant cells" were of the same type under different conditions, as we were aware of the diversity of cellular states of non-growing cells, as well as the reduced sensitivity of cells to antibiotics during the lag out of stationary phase. We believe that one of the reasons this point may have been unclear is that in the previous version we had referred to all cells that were not growing prior to antibiotic treatment as "dormant cells", a term that is often used in a more restricted way to refer to cells under prolonged growth arrest. Therefore, in the revised manuscript, we have avoided the term "dormant cells" and instead simply referred to these as "non-growing cells". Accordingly, we have changed the title of the paper from "Observation of non-dormant persister cells reveals diverse modes of survival in antibiotic persistence" to "Observation of persister cell histories reveals diverse modes of survival in antibiotic persistence".

      To further address these points, we have improved the description of the experimental procedures for the single-cell measurements (see also the reviewer's next comment). Because of the experimental design, the non-growing persisters of the MF1 strain found in the post-exponential phase cell populations must be of a different type than those found in the post-early and post-late stationary phase cell populations. All early and late stationary phase cells were maintained in a non-growing state by flowing conditioned media prepared from the early and late stationary phase cultures until the start of the time-lapse measurements. Thus, aside from potential physiological heterogeneity, the non-growing cells prior to drug treatment are all long-lagging cells. In contrast, for the post-exponential phase condition, we maintained exponential growth conditions from the start of the second preculture to the start of antibiotic treatment, including the period of sample preparation for time-lapse measurements. Given the exponential dilution of non-growing cells by population growth, the non-growing persisters are unlikely to be long-lagging cells (see our response to Reviewer 2's third comment in "Recommendations for the authors"). We now describe these experimental procedures in more detail in the Results section (L161-178, L287-297). In addition, we discuss the diversity of cellular states of both non-growing and growing cells in the Discussion, citing the literature (L545-557).

      (2) The way the authors describe their experiments with bacteria in the stationary phase is very problematic. For instance, they write that they "sampled cells from early and late stationary phases (...) and exposed them to 200 μg/mL of Amp in both batch and single-cell cultures." For any reader in a hurry (hence skipping the methods and/or the supplementary figure), this suggests that bacteria sampled in the stationary phase were exposed to the drug right away (either by adding the drug to the stationary phase sample or, more classically, by transferring cells to fresh media with antibiotics). However, it turns out that, after sampling and loading in the microfluidic device, bacteria are grown for 2 h in LB (or 4 h in M9) - I don't know what to think of such a blatant omission. The names chosen for each condition should reflect their most important aspects; here "stationary" is simply not appropriate - maybe something like "post early stationary" instead. In any case, I believe that this point further highlights the misconception pointed out in (1) and implies that the average reader will be at best confused, and probably misled.

      We again thank the reviewer for pointing out the insufficient explanation of the method for the single-cell measurements and the helpful recommendation regarding our nomenclature for different conditions. As mentioned above, we now present the previous supplementary figure that schematically explains the experimental procedure as the first main figure to clarify how we prepared the cells loaded into the microfluidic device for single-cell measurements (Fig. 1C and D). Also, following the reviewer's suggestion, we now refer to the conditions as "post-exponential phase," "post-early stationary phase," and "post-late stationary phase" in the revised manuscript. 

      We included a 2-hour (or 4-hour in M9) cultivation period in fresh medium in the batch cultures used for measuring killing curves to make the cultivation conditions prior to antibiotic treatment as similar as possible between batch and microfluidic experiments. We have clarified that the post-early stationary and post-late stationary phase cell populations were cultivated in fresh medium before antibiotic treatment (L264-269, Fig. 6A), so that readers can more easily recognize the experimental conditions.

      (3) Figures 4 and 5 are of very minor significance, and the methodology used in Fig 4 is questionable. The authors measure the abundance of an RpoS-mCherry translational fusion because its "high expression has been suggested to predict persistence". The rationale for this (that an RpoS-mCherry fusion would be a proxy for intracellular ppGpp levels, and in turn predict persistence) has never been firmly established, and the standards used in the article where this reporter was introduced (Maisonneuve, Castro-Camargo, and Gerdes 2013) are notoriously low (which eventually led to its retraction) - I don't know what to think of the fact that the authors cite a review by this group rather than their retracted article. While transcriptional fusions of promoters regulated by RpoS have been proposed to measure its regulatory activity (Patange et al. 2018), the combination of self-regulation and complex post-translational regulation of rpoS makes the physical meaning of the reporter used here completely unclear. Moreover, this translational fusion is introduced without any of the necessary controls to demonstrate that the activity of RpoS is not impaired by the addition of the fluorescent protein. Fig 5 simply reports the existence of persisters to ciprofloxacin that were growing before the treatment. This might be a new observation, but it is not unexpected given that a similar observation has been made with a similar drug, ofloxacin (Goormaghtigh and van Melderen 2019), as pointed out in the introduction. There is no further quantitative claim on this.

      We thank the reviewer for pointing out the issue of the RpoS-mCherry fusion. As we mentioned in our response to Essential Revision 2 and to the comment from reviewer #2, we tested the sensitivity of this fluorescent reporter strain to oxidative stress and confirmed that it is as sensitive as the ΔrpoS strain (Fig. 1 – figure supplement 1C). Therefore, RpoS function appears to be defective in this strain, as now explained in the Results (L69-79). After confirming the problem with the RpoS-mCherry fusion, we removed all analyses and related arguments that relied on the RpoS expression level (previous Figure 4). In addition, we repeated almost all the experiments with the original MG1655 strain to confirm that the observed results are not specific to the problematic reporter strain.

      Regarding the experiments with CPFX, we have added a more detailed analysis of single-cell dynamics and found that, contrary to the reported results for ofloxacin, not all persister cells show filamentation after drug withdrawal (Fig. 8C and D, Fig. 8 – figure supplement 1). In addition, we performed new microfluidic experiments in which we treated post-late stationary phase cells with CPFX (Fig. 3). In contrast to the Amp treatment result and to the previous study that reported persistence of post-stationary phase cell populations to ofloxacin (ref. 20), all the persisters for which we identified pre-exposure growth traits in this condition grew normally prior to CPFX treatment. These newly added analyses and experiments clarify the significance of the CPFX experiments.

      (4) The authors don't mention the dead volume or the speed of media exchange in their device. Hopefully, it is short compared to the duration of the treatment; however, it is challenging to remove all antibiotics after the treatment, and even 1e-3 or 1e-4 of the treatment concentration may already affect regrowth in fresh media. If this is described in another article, it would be worth adding a comment in the main text.

      We thank the reviewer for bringing up this important point. We have added the perfusion chamber volume and medium flow rate information in the Methods section (L809-817).   

      In the study in which two of the authors participated, the medium exchange rate across the semipermeable membrane was evaluated in a similar device with similar microchamber dimensions (ref. 26). There, we confirmed that the medium exchange was completed within 5 min, which is much shorter than the period of antibiotic treatment and post-antibiotic treatment periods for observing regrowth. We have also included this information in the main text with the reference (L58-63).

      Despite the relatively high medium exchange rate, we cannot formally exclude the possibility that a small amount of antibiotic remains in the device, e.g. due to non-specific adsorption on the internal surfaces of the microchambers. In such cases, residual antibiotics may influence the physiological states of the cells and the regrowth kinetics in the post-exposure periods, as suggested by the reviewer. However, the frequencies of persister cells in our single-cell measurements are comparable to those in the batch culture measurements. Therefore, antibiotic removal in our device is at least as efficient as in the batch culture assay. To clarify this point, we have added a paragraph to the Discussion with a reference that reviews the influence of antibiotics at concentrations significantly lower than the MICs (L482-489).

      (5) Fig 2A supports the main finding that a significant fraction of bacteria surviving the treatment are growing before drug exposure, but it uses a poorly chosen representation.

      - In order to compare between conditions, one would like to see the fraction of each type in the population.

      - The current representation (of a fraction of each type among surviving cells) requires a side-by-side comparison with a random sample (which will practically be equivalent to the fraction of each type among killed cells) in order to be informative.

      We have changed the style of the previous Fig. 2A to show the fraction of each type in the population instead of the fraction of each type among surviving cells (Fig. 3 and Fig. 3-figure supplement 1).

    1. eLife Assessment

      This important study compares the cortical projections to primary motor and sensory areas originating from the ipsilateral and contralateral hemispheres. Results show that, while there is substantial symmetry between the two hemispheres regarding the areas sending projections to these primary cortical areas, contra-hemispheric projections had more inputs from layer 6 neurons than ipsi-projecting ones. The evidence is compelling and the conclusions are supported by rigorous analyses.

    2. Reviewer #1 (Public review):

      Weiler, Teichert, and Margrie systematically analyzed long-range cortical connectivity, using a retrograde viral tracing strategy to identify layer- and region-specific cortical projections onto the primary visual, primary somatosensory, and primary motor cortices. Their analysis revealed several hundred thousand inputs into each region, with inputs originating from almost all cortical regions but dominated in number by connections within cortical sub-networks (e.g. anatomical modules). Generally, the relative areal distribution of contralateral inputs followed the distribution of corresponding ipsilateral inputs. The largest proportion of inputs originated from layer 6a cells, and this layer 6 dominance was more pronounced for contralateral than ipsilateral inputs, which suggests that these connections provide predominantly feedback inputs. The hierarchical organization of input regions was similar between ipsi- and contralateral regions, except for within-module connections, where ipsilateral connections were much more feedforward than contralateral ones. These results contrast with earlier studies, which suggested that contralateral inputs come only from the same region (e.g. V1 to V1) and from L2/3 neurons. The conclusions of this paper are well supported by the data and analysis, and useful follow-up analyses and discussions are presented in the supplemental figures. Taken together, these results provide valuable data supporting a view of interhemispheric connectivity in which layer 6 neurons play an important role in providing modulatory feedback.

    3. Reviewer #2 (Public review):

      Summary:

      Weiler et al. use retrograde tracers, two-photon tomography, and automatic cell detection to provide a detailed quantitative description of the laminar and areal sources of ipsi- and contralateral cortico-cortical inputs to two primary sensory areas and a primary motor area. They found considerable bilateral symmetry in the areas providing cortico-cortical inputs. However, although the same regions in both hemispheres tended to supply inputs, a larger proportion of inputs from contralateral areas originated from deeper layers (L5 and L6).

      Strengths:

      The study applies state-of-the-art anatomical methods, and the data is very effectively presented and carefully analyzed. The results provide many novel insights into the similarities and differences of inputs from the two hemispheres. While over the past decade there have been many studies quantitatively and comprehensively describing cortico-cortical connections, by directly comparing inputs from the ipsi- and contralateral hemispheres, this study fills an important gap in the field. It should be of great utility and an important reference for future studies on interhemispheric interactions.

      Weaknesses:

      Overall, I do not find any major weakness in the analyses or their interpretation. However, one must keep in mind that the study only analyses inputs projecting to three areas. This is not an inherent flaw of the study; however, it warrants caution when extrapolating the results to callosal projections terminating in other areas. Because the inputs studied terminate in two primary sensory areas and the primary motor cortex, some of the conclusions could potentially be different for inputs terminating in higher-order sensory and motor areas. Given that primary areas were injected, few feedforward connections were sampled in the ipsilateral hemisphere. The study finds that while ipsilateral projections from the visual cortex to the barrel cortex are feedforward given their fILN values, those from the contralateral visual cortex are feedback instead. This is now acknowledged in the revised discussion.

      Another issue that is left unexplored is that, in the current analyses, the barrel and primary visual cortices are analyzed as uniform structures. It is well established that both the laminar sources of callosal inputs and their terminations differ between the monocular and binocular areas of the visual cortex (border with V2L). Similarly, callosal projections terminating at the border of S1 (the A row of whiskers) differ from those terminating in other parts of S1. Thus, some of the conclusions regarding the laminar sources of callosal inputs might depend on whether one is analyzing inputs terminating or originating in these border regions. This is now acknowledged in the revised version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weiler, Teichert, and Margrie systematically analyzed long-range cortical connectivity, using a retrograde viral tracing strategy to identify layer and region-specific cortical projections onto the primary visual, primary somatosensory, and primary motor cortices. Their analysis revealed several hundred thousand inputs into each region, with inputs originating from almost all cortical regions but dominated in number by connections within cortical sub-networks (e.g. anatomical modules). Generally, the relative areal distribution of contralateral inputs followed the distribution of corresponding ipsilateral inputs. The largest proportion of inputs originated from layer 6a cells, and this layer 6 dominance was more pronounced for contralateral than ipsilateral inputs, which suggests that these connections provide predominantly feedback inputs. The hierarchical organization of input regions was similar between ipsi- and contralateral regions, except for within-module connections, where ipsilateral connections were much more feed-forward than contralateral. These results contrast with earlier studies which suggested that contralateral inputs only come from the same region (e.g. V1 to V1) and from L2/3 neurons. Thus, these results provide valuable data supporting a view of interhemispheric connectivity in which layer 6 neurons play an important role in providing modulatory feedback.

      The conclusions of this paper are mostly well-supported by the data and analysis, but additional consideration of possible experimental biases is needed.

      We thank the reviewer for their positive feedback on our manuscript.

      Further discussion or analysis is needed about possible biases in uptake efficiency for different cell types. Is it possible that the nuclear retro-AAV has a tropism for layer 6 axons? Quantitative comparisons with results obtained with alternative methods such as rabies virus (Yao et al., 2023) or anterograde tracing (Harris et al., 2019) may be helpful for this.

      We appreciate this technical comment. For the reasons indicated below, we are confident that our AAV approach successfully and rather comprehensively labels inputs to the three target areas. Firstly, in the brains in which we injected our retrograde nuclear-AAV tracer into VISp, SSp-bfd or MOp, we found several instances where layer 5 and/or layer 2/3 was the dominant cortical projection layer (please see e.g. Figure 3 heatmaps). This was true for both ipsilateral and contralateral projections.

      Secondly, by way of comparison, Yao et al., 2023 injected rabies virus into VISp (but not into SSp-bfd or MOp), and their results show notable similarities to ours: 1) they show that contralateral inputs to VISp (and to higher visual areas) were mainly located in Layers 5 and 6; 2) retrogradely labelled neurons in higher visual areas revealed an anatomical hierarchy that reflects both the known functional hierarchy of the mouse cortical visual system and the hierarchy revealed by our retro-AAV approach. Because AAV- and rabies-based tracing lead to similar results, this is further evidence against a tropism-related bias of our AAV tracer. That said, direct comparisons between our study and Yao et al., 2023 should be viewed with some caution, since Yao et al. injected rabies virus into specific Cre-driver lines, in which the virus targets individual genetically defined cell types in specific layers. Importantly, because no specific Cre-driver line is available for L6 cortico-cortical (L6 CC) cells, these cells could not be targeted by their approach, and the Yao et al. dataset therefore overlooks their contribution.

      Thirdly, in a recent study (Weiler et al., 2024) we found that in a specific pathway (SSp-bfd → VISp) both retro-AAV and the more traditional non-viral tracer cholera toxin subunit B (CTB) identified neurons in Layer 6 as the main source of projection neurons. The same result for this pathway was reported by Bieler et al. (2017) using Fluorogold for retrograde tracing. Thus, the described dominance of Layer 6 projection neurons in specific pathways is likely not the result of a tropism of retro-AAV tracers.

      We have now further extended the summary of these points in the Discussion section of our revised manuscript (e.g. lines 457-463).

      Quantitative analysis of the injection sites should be included to account for possible biases. For example, L6 neurons are known to be the main target of contralateral inputs into the visual cortex (Yao et al., 2023). Thus, if the injections are biased towards or against layer 6 neurons, this may change the layer distribution of retrogradely labeled input cells. Comparison across biological replicates may help reveal sensitivity to particular characteristics of the injections.

      In response to the reviewer's feedback, we have now quantified the injection volume per cortical layer, as shown in the revised Fig. S3D. Our results indicate that the injections were not biased toward Layer 6. Instead, the injected tracer volumes in Layers 1, 4, 5, and 6 were similar across all animals and injected areas. However, we observed that the injected tracer volume in Layer 2/3 tended to be higher than in other layers. Although the tracer volumes in Layer 2/3 appeared to be higher, the proportion of input neurons located in Layer 2/3 for most of the cortical projection areas was consistently lower than that from Layer 6. These findings provide strong evidence against an injection bias towards L6 inputs.

      The possibility of labelling axons of passage within the white matter should be addressed. This could potentially lead to false positive connections, contributing to the broad connectivity from most cortical regions that were observed.

      For clarification, please see Fig. S2B in our revised manuscript. In this panel we plot the average percentage volume of the viral boli in the target areas and in all other nearby structures, including the white matter. The percentage of virus injected into the white matter (WM) was 0.0824 ± 0.0759% for VISp and 0.0650 ± 0.0481% for SSp-bfd injections. Notably, injections into MOp showed no leakage into the white matter (0%). These minimal volumes of virus in the white matter are unlikely to significantly influence the observed profile of widespread connectivity. We have also added a sentence to the Results section (lines 84-86) stating that we only used brains in which transduction of the white matter was below 0.1%.

      Reviewer #2 (Public review):

      Summary:

      Weiler et al. use retrograde tracers, two-photon tomography, and automatic cell detection to provide a detailed quantitative description of the laminar and area sources of ipsi- and contralateral cortico-cortical inputs to two primary sensory areas and a primary motor area. They found considerable bilateral symmetry in the areas providing cortico-cortical inputs. However, although the same regions in both hemispheres tended to supply inputs, a larger proportion of inputs from contralateral areas originated from deeper layers (L5 and L6).

      Strengths:

      The study applies state-of-the-art anatomical methods, and the data is very effectively presented and carefully analyzed. The results provide many novel insights into the similarities and differences of inputs from the two hemispheres. While over the past decade there have been many studies quantitatively and comprehensively describing cortico-cortical connections, by directly comparing inputs from the ipsi and contralateral hemispheres, this study fills in an important gap in the field. It should be of great utility and an important reference for future studies on inter-hemispheric interactions.

      We thank the reviewer for this encouraging feedback on our manuscript.

      Weaknesses:

      Overall, I do not find any major weakness in the analyses or their interpretation. However, one must keep in mind that the study only analyses inputs projecting to three areas. This is not an inherent flaw of the study; however, it warrants caution when extrapolating the results to callosal projections terminating in other areas. Because the inputs studied terminate in two primary sensory areas and the primary motor cortex, some of the conclusions could potentially be different for inputs terminating in higher-order sensory and motor areas. Given that primary areas were injected, few feedforward connections were sampled in the ipsilateral hemisphere. The study finds that while ipsilateral projections from the visual cortex to the barrel cortex are feedforward given their fILN values, those from the contralateral visual cortex are feedback instead. One is left to wonder whether this is due to the cross-modal nature of these particular inputs and whether the same rule (that contralateral inputs consistently exhibit feedback characteristics regardless of the hierarchical relationship of their ipsilateral counterparts with the target area) would also apply to feedforward inputs within the same sensory cortices.

      We acknowledge that our findings for primary sensory and motor target areas may not hold for functionally different areas such as the anterior cingulate cortex, retrosplenial cortex or frontal lobe, which might be expected to receive strong feedforward cortical input. As a first step towards understanding the organization of global cortical input, however, we focused on primary sensory and motor areas. We have now added a sentence to the Discussion section of our manuscript that highlights the importance of investigating the hierarchical organization of intra- and interhemispheric input onto higher cortical areas or within subregions of a given sensory area.

      Another issue that is left unexplored is that, in the current analyses, the barrel and primary visual cortices are analyzed as uniform structures. It is well established that both the laminar sources of callosal inputs and their terminations differ between the monocular and binocular areas of the visual cortex (border with V2L). Similarly, callosal projections terminating at the border of S1 (a row of whiskers) differ from those terminating in other parts of S1. Thus, some of the conclusions regarding the laminar sources of callosal inputs might depend on whether one is analyzing inputs terminating or originating in these border regions.

      The aim of the present study was to analyse the global projectome to the VISp, SSp-bfd and MOp, irrespective of which subregions were included. Importantly, we purposely injected rather large bolus volumes to achieve large sample sizes of target neurons in each cortical layer.  For SSp-bfd, we utilised our previously reconstructed barrel map (Weiler et al., 2024) to precisely map our viral injection sites onto the barrels (Author response image 1). Analysis revealed that the six injection sites consistently encompassed 7–13 barrels (Author response image 1, three exemplary injection sites). Additionally, we determined the centres of mass for each injection site and mapped them onto the barrel map. Four of the injection sites were located in the lateral part of SSp-bfd, two in the central region, and none in the medial part. Notably, the injection sites within SSp-bfd exhibited significant overlap. As a result, a selective analysis of callosal projections targeting these injection sites would likely not yield distinct projection patterns, as the projectomes would inevitably include projections to surrounding barrels, leading to contamination.

      Author response image 1.

      Left: exemplary injection sites mapped onto the 3D barrel map of SSp-bfd within the Allen Mouse Brain Atlas. Barrels were reconstructed using specialized software as described previously (Weiler et al., 2024). Right: centres of mass of all SSp-bfd injection sites mapped onto the 3D barrel map.

      Because our injections covered a significant proportion of the respective primary sensory target area, any further subdivision of these data is not possible; this would require more tailored injections into clearly defined subareas. Investigating the separate projectomes onto these subregions (e.g. onto V1M and V1B) remains an important research question that we will address, at least in part, in a future study.

      Finally, while the paper emphasizes that projections from L6 "dominate" intra and contralateral cortico-cortical inputs, the data shows a more nuanced scenario. While it is true that the areas for which L6 neurons are the most common source of cortico-cortical projections are the most abundant, the picture becomes less clear when considering the number of neurons sending these connections. In fact, inputs from L2/3 and L5 combined are more abundant than those from L6 (Figure 3B), challenging the view that projections from L6 dominate ipsi- and contralateral projecting cortico-cortical inputs.

      We agree that, in the case of the barrel cortex, layer 5 contributes substantially in terms of the number of brain areas projecting from the ipsilateral and contralateral hemispheres. We have replaced the term “dominates” in the title, abstract and throughout the manuscript where relevant.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The sections analyzing the role of L6 towards feedback (pg. 11-13, Figure 6) were a bit verbose and confusing to me. Three possible models are proposed:

      (1) a decrease in L23 projections, (2) an increase in L56 projections, or (3) both.

      However, what is being quantified appears to be the fraction of inputs, with L23, L5, and L6 summing to 1. Thus, a decrease in L23 would necessarily result in an increase in L56 projections. It seems like it would make more sense to quantify the percent change in the total number of inputs (rather than the fraction) from each layer so that the 3 models are actually independent possibilities.
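      The reviewer's constraint can be made concrete with a minimal Python sketch using invented counts (not data from the study). The helper `layer_fractions` is hypothetical and simply normalizes L2/3, L5 and L6 counts within one hemisphere; because the fractions always sum to 1, any drop in the L2/3 fraction is exactly offset by a rise in the combined L5+L6 fraction, whereas absolute counts carry no such constraint.

```python
def layer_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Normalize per-layer counts so the fractions within one hemisphere sum to 1."""
    total = sum(counts.values())
    return {layer: n / total for layer, n in counts.items()}


# Hypothetical per-layer input counts for one source area (illustrative only).
ipsi = {"L2/3": 300, "L5": 300, "L6": 400}
contra = {"L2/3": 40, "L5": 80, "L6": 120}

f_ipsi = layer_fractions(ipsi)
f_contra = layer_fractions(contra)

# Change in the L2/3 fraction and in the combined deep-layer (L5 + L6) fraction.
delta_l23 = f_contra["L2/3"] - f_ipsi["L2/3"]
delta_deep = (f_contra["L5"] + f_contra["L6"]) - (f_ipsi["L5"] + f_ipsi["L6"])
print(round(delta_l23, 4), round(delta_deep, 4))  # prints: -0.1333 0.1333
```

      The two changes are equal and opposite by construction, which is why fractional data alone cannot separate "fewer L2/3 inputs" from "more L5/L6 inputs".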

      The issue with the suggested analysis is that, with one exception (one area projecting to MOp), the number of projection neurons in contralateral areas is always ~60-80% lower compared to their ipsilateral counterparts. Consequently, this is also true for the number of projection neurons in the different cortical layers. Thus, quantifying the percentage change from the ipsilateral to the contralateral hemisphere in the total number of inputs from each layer will always result in negative values. 

      Nevertheless, we addressed the reviewer’s point by calculating the preservation index, 1 - (ipsi - contra)/(ipsi + contra), for the sensory-motor areas, independently for the absolute number of neurons within L2/3, L5 and L6 of the cortical areas projecting to VISp, SSp-bfd and MOp (see Author response image 2). When analysing the shift from the ipsilateral to the contralateral hemisphere, we observed that significantly more projection neurons were preserved in L6 than in L2/3 for VISp and SSp-bfd. This shows that the number of L6 projection neurons declines less from the ipsilateral to the contralateral hemisphere than the number of L2/3 projection neurons. However, our focus was on the fraction of projection neurons within each layer relative to the other layers per hemisphere (see Fig. 6 of our manuscript). This measure is critical for distinguishing between feedforward and feedback connectivity. Calculating the change for each layer independently unfortunately does not provide insights into this comparison, as it does not capture the relative distribution of projection neurons across layers, which is central to our analysis. Therefore, we chose to present the data as layer fractions normalised within each hemisphere separately, enabling a comparison of relative changes between hemispheres, as shown in Fig. 6 of the manuscript. We agree that with our approach a decrease in the fraction of L2/3 neurons would necessarily lead to an increase in the fraction of L5+6 neurons. However, as we analysed the fractional change for L5 and L6 separately, we found that the fraction of projection neurons in L5 generally showed only minor changes, while the fraction of L6 projection neurons increased substantially (Fig. 6C). In addition, excluding L5 from the ipsi- or contralateral default network had significant effects on the fILN in only a relatively small number of projection areas, whereas excluding L6 resulted in significant changes in many more projection areas.

      Author response image 2.

      Preservation index for L2/3, L5 and L6 of the 24 sensory-motor areas projecting onto the three target areas VISp, SSp-bfd and MOp.
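      A minimal Python sketch of the preservation index, assuming it is computed from absolute per-layer counts as 1 - (ipsi - contra)/(ipsi + contra). The function name and counts are hypothetical. Under this reading the index equals 2 * contra / (ipsi + contra): it is 1 when a layer has as many contralateral as ipsilateral neurons and 0 when it has none contralaterally, so a higher value for L6 than for L2/3 means that L6 counts decline less across hemispheres.

```python
def preservation_index(ipsi: int, contra: int) -> float:
    """Assumed form: 1 - (ipsi - contra) / (ipsi + contra), i.e. 2 * contra / (ipsi + contra)."""
    if ipsi + contra == 0:
        raise ValueError("no labelled neurons in either hemisphere")
    return 1 - (ipsi - contra) / (ipsi + contra)


# Hypothetical absolute (ipsi, contra) counts per layer for one source area.
counts = {
    "L2/3": (500, 50),
    "L5": (400, 100),
    "L6": (600, 240),
}
for layer, (ipsi, contra) in counts.items():
    print(layer, round(preservation_index(ipsi, contra), 3))
# prints:
# L2/3 0.182
# L5 0.4
# L6 0.571
```

      Because each layer is scored on its own counts, the index can rank layers by how well they are preserved without the sum-to-one coupling of layer fractions.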

      Reviewer #2 (Recommendations for the authors):

      I feel that there are a few conclusions that could be strengthened in the paper:

      (1) The laminar sources of callosal inputs and their terminations differ between the monocular and binocular areas of the visual cortex (border with V2L). Similarly, callosal inputs close to the border of S1 with S2 differ from those in the rest of the barrel cortex. From the Methods section and Figure S2, it seems that some injections targeted the V1 binocular zone while others were aimed at the monocular zone. Thus, it would be of interest to compare the laminar distribution and fILN of the contra inputs to the binocular and monocular zones (and to the S1 border vs the rest, if possible within this dataset).

      Please see the answer for the reviewer’s second point in the public review (above).

      (2) The results are currently a bit unclear on whether the contra inputs reflect the cortical hierarchy. Figure 4E-F makes it clear that the ipsi and contra fILN values do not always match. However, the plots in Figure 4D and Figure S6 suggest that, while the contra fILN values are always higher, there might be a correlation between the ipsi and contra fILN. This could be addressed by directly plotting contra vs ipsi fILN.

      Similarly, it would be useful to directly address whether there is any hint of the visual hierarchy, as calculated in Figure S5, for the contra inputs.

      Regarding the reviewer's first point: We appreciate this comment. We do indeed find a positive correlation between ipsilateral and contralateral fILN across the individual cortical areas for all three targets (please see Author response image 3 below). This is an interesting observation that indicates a high degree of preservation of the rank order of the anatomical hierarchy in the input arising from both hemispheres. We have included this as a new Figure 4F in the manuscript and added a sentence to the Results (lines 282-288).

      Regarding the reviewer's second point: we find that the hierarchical ranking of the higher visual cortical areas is preserved, although more weakly, in the contralateral hemisphere (see Author response image 3 below).

      Author response image 3.

      Rank ordered average fILN values (± sem) of higher visual cortical areas of the ventral and dorsal visual stream for the ipsilateral and contralateral hemisphere.
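      A minimal Python sketch of the ipsi-versus-contra fILN comparison. The definition of fILN is not restated in this exchange, so the sketch assumes it is the fraction of labelled input neurons in infragranular layers, (L5 + L6)/(L2/3 + L5 + L6); the area names and counts are invented, and the plain Pearson correlation is an assumption about the correlation measure used.

```python
from math import sqrt


def filn(l23: int, l5: int, l6: int) -> float:
    """Assumed definition: fraction of labelled input neurons in infragranular layers (L5 + L6)."""
    return (l5 + l6) / (l23 + l5 + l6)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical (L2/3, L5, L6) counts per source area in each hemisphere.
ipsi_counts = {"area_A": (500, 300, 200), "area_B": (300, 300, 400), "area_C": (100, 300, 600)}
contra_counts = {"area_A": (60, 60, 80), "area_B": (30, 50, 120), "area_C": (10, 40, 150)}

areas = sorted(ipsi_counts)
ipsi_filn = [filn(*ipsi_counts[a]) for a in areas]
contra_filn = [filn(*contra_counts[a]) for a in areas]

# Contralateral fILN is higher for every area, yet the rank order across areas is
# preserved, which shows up as a strong positive correlation.
r = pearson_r(ipsi_filn, contra_filn)
print(f"r = {r:.2f}")  # prints: r = 0.99
```

      This illustrates how contralateral inputs can be uniformly more "feedback-like" (higher fILN) while still preserving the hierarchical rank order of their ipsilateral counterparts.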

      (3) I find the emphasis in the title and other parts of the paper on Layer 6 corticocortical cells dominating the anatomical organization of intra- and interhemispheric feedback a bit of an overstatement. While it is true that the areas for which L6 is the most abundant source of cortico-cortical projections are the most abundant (Figure 3C), this is less clear when focusing on the number of neurons sending corticocortical connections (Figure 3B). Ipsi connections are roughly divided 1/3, 1/3, 1/3 between L2/3, L5 and L6. In the contra hemisphere, while projections from L6 neurons are the most abundant, they are not a majority and are fewer than those of L2/3 and L5 together. I suggest revising the statement about L6 cells dominating cortico-cortical connections to more accurately reflect these nuances.

      (4) The observations from Figure 3 discussed above suggest that L6 inputs dominate in areas with less abundant projections to the injected areas. Is this the case? Is the fraction of L6 inputs inversely correlated with the number of inputs from that area?

      Please see the following correlation plots of the total number of inputs versus the fraction of L6 inputs per area for both the ipsilateral and contralateral hemispheres. In the ipsilateral hemisphere, we find a negative correlation between the total number of inputs and the L6 input fraction for VISp and, to a lesser degree, for SSp-bfd. Interestingly, we find the opposite correlation for ipsilateral MOp and for contralateral VISp, SSp-bfd and MOp (Author response image 4, Author response table 1). While this is an interesting finding, the correlations were often weak, and frequently absent within individual animals and across the three target areas (Author response table 1). Thus, these correlations do not appear to be a general feature of cortical connectivity.

      Author response image 4.

      Total number of cells versus fraction of cells within L6 per cortical brain area (average across animals) for the ipsilateral (top) and contralateral (bottom) hemisphere for the three target areas VISp, SSp-bfd and MOp.

      Author response table 1: Respective correlations between total numbers of cells and fraction of cells within L6 per cortical brain area for the ipsilateral and contralateral hemisphere for the three target areas (significant correlations highlighted with green).

      Minor issues:

      (5) Where was the mouse in Figure 3A injected?

      In this exemplary mouse the retrograde tracer was injected into VISp. We added this information in the Figure legend of Figure 3A1. 

      (6) Clarify in panel 4F that the position of the circle corresponds to the area location.

      Done as suggested. 

      References

      Bieler M, Sieben K, Cichon N, Schildt S, Röder B, Hanganu-Opatz IL. 2017. Rate and Temporal Coding Convey Multisensory Information in Primary Sensory Cortices. eNeuro 4. doi:10.1523/ENEURO.0037-17.2017

      Weiler S, Rahmati V, Isstas M, Wutke J, Stark AW, Franke C, Graf J, Geis C, Witte OW, Hübener M, Bolz J, Margrie TW, Holthoff K, Teichert M. 2024. A primary sensory cortical interareal feedforward inhibitory circuit for tacto-visual integration. Nat Commun 15:3081. doi:10.1038/s41467-024-47459-2

      Yao S, Wang Q, Hirokawa KE, Ouellette B, Ahmed R, Bomben J, Brouner K, Casal L, Caldejon S, Cho A, Dotson NI, Daigle TL, Egdorf T, Enstrom R, Gary A, Gelfand E, Gorham M, Griffin F, Gu H, Hancock N, Howard R, Kuan L, Lambert S, Lee EK, Luviano J, Mace K, Maxwell M, Mortrud MT, Naeemi M, Nayan C, Ngo N-K, Nguyen T, North K, Ransford S, Ruiz A, Seid S, Swapp J, Taormina MJ, Wakeman W, Zhou T, Nicovich PR, Williford A, Potekhina L, McGraw M, Ng L, Groblewski PA, Tasic B, Mihalas S, Harris JA, Cetin A, Zeng H. 2023. A whole-brain monosynaptic input connectome to neuron classes in mouse visual cortex. Nat Neurosci 26:350–364. doi:10.1038/s41593-022-01219-x

    1. eLife Assessment

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. The study is useful in terms of both method development and biological advances, as the authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction. However, the current experimental data are considered incomplete, as many key experimental details are missing.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction or localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking. A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      No description of how mass spectrometry was done and what type of validation was done.

      The threshold for enrichment seems arbitrary.

      Inconsistent nomenclature and punctuation usage.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (in contrast to the more specific avidin binding in mice; Figure 2B). How are the authors certain that the enrichment is specific to the electrical synapse?

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HC expresses Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

    3. Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies (including immunohistochemistry, in vitro interaction assays, and immunoprecipitation), they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize the proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches, including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation, providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold for mouse vs >1 log2 fold for zebrafish) would benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 could increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses, which serve distinct functional roles in retinal signal processing, a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junctions is above chance level (e.g. EPS15L1, HIP1R, SNAP91, ITSN in Figure 3B).

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction or localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking.

      A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      We agree that we should have included a comprehensive comparison of proteins captured in the different species. We are assembling this table, and it will be included in the revised manuscript. There is, indeed, significant conservation of many of the proteins enriched in both species.

      No description of how mass spectrometry was done and what type of validation was done.

      Since the mass spec was outsourced to a core facility, we had not included methodological details. We have requested these and will include full details in the revised version of the manuscript. In terms of “validation,” enrichment of proteins at electrical synapses was determined based on capture relative to control samples (non-transgenic zebrafish retinas or non-transgenic mouse retinas infected with the dGBP-TurboID virus) captured and processed at the same time. Actual validations based on protein co-localization and pull-downs are the subject of the rest of the manuscript and could only be done for a fraction of the identified proteins. This type of validation can be pursued in many future studies.

      The threshold for enrichment seems arbitrary.

      Yes, the thresholds are somewhat arbitrary. This is because experiments that captured larger total amounts of protein (mouse retina samples) had a higher signal-to-noise ratio than those that captured smaller total amounts of protein (zebrafish retina). This allowed us to use a more stringent threshold in the mouse dataset to focus on high-probability captured proteins.

      Inconsistent nomenclature and punctuation usage.

      We have scanned through the manuscript and updated terms that were used inconsistently in the interim revision of the manuscript.

      To describe the mass spec procedure, we will get in touch with the mass spec facility and provide the details in the next round of submission.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (in contrast to the more specific avidin binding in mice, Figure 2B). How are the authors certain that the enrichment is specific at the electrical synapse?

      The enrichment of the proteins we identified is specific for electrical synapses because we compared the abundance of all candidates between Cx35b-V5-TurboID and wild-type retinas. Proteins that are components of electrical synapses will only show up in the Cx35b-V5-TurboID condition. The western blot (Strep-HRP) in Figure 1C shows the differences in streptavidin labeling and hence the enrichment of proteins that are part of electrical synapses. Moreover, while the background appears to be quite abundant in sections, biotinylation is a rare posttranslational modification and mainly occurs in carboxylases: the two intense bands that show up above 50 and 75 kDa. The background mainly originates from these two proteins.

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      We agree that “frequently associated” is too strong as a statement. We corrected this and instead wrote “that Cx34.7 was only expressed in the outer plexiform layer (OPL) where it was associated with Cx35b at some gap junctions” in line 150. There are many gap junctions at which Cx35b is not colocalized with Cx34.7. 

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HCs express Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.

      Thank you for this suggestion! There should be no Cx36-GFP expression in horizontal cells, which means that the nanobody cannot bind to anything in these cells. Moreover, to distinguish specific signals from non-specific background, we included wild-type retinas throughout all experiments. This condition controls for non-specific biotinylation.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

      It appears that a portion of the figure legend was left out of the submitted version of the manuscript.  We have put the legend for panels A through C back into the manuscript in the interim revision.

      Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies (including immunohistochemistry, in vitro interaction assays, and immunoprecipitation), they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize the proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches (including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation), providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold change for mouse vs >1 log2 fold change for zebrafish) would benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 could increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses, connections that serve distinct functional roles in retinal signal processing, a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

      Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Thank you for these comments.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junctions is above chance level (e.g. EPS15L1, HIP1R, SNAP91, ITSN in Figure 3B).

    1. eLife Assessment

      This study presents a valuable finding on the importance of the plasma metabolome in glaucoma risk prediction. The evidence supporting the claims of the authors is solid and the work offers insights for the design of protective therapeutic strategies for glaucoma. The authors have addressed the concerns of the reviewers and reported on the limitations of the study.

    2. Reviewer #1 (Public review):

      Summary:

      The Authors explore associations between plasma metabolites and glaucoma, a primary cause of irreversible vision loss worldwide. The study relies on measurements of 168 plasma metabolites in 4,658 glaucoma patients and 113,040 controls from the UK Biobank. The Authors show that metabolites improve the prediction of glaucoma risk based on polygenic risk score (PRS) alone, albeit weakly. The Authors also report a "metabolomic signature" that is associated with a reduced risk (or "resilience") for developing glaucoma among individuals in the highest PRS decile (reduction of risk by an estimated 29%). The Authors highlight the protective effect of pyruvate, a product of glycolysis, for glaucoma development and show that this molecule mitigates elevated intraocular pressure and optic nerve damage in a mouse model of this disease.

      Strengths:

      This work provides additional evidence that glycolysis may play a role in the pathophysiology of glaucoma. Previous studies have demonstrated the existence of an inverse relationship between intraocular pressure and retinal pyruvate levels in animal models (Hader et al. 2020, PNAS 117(52)) and pyruvate supplementation is currently being explored for neuro-enhancement in patients with glaucoma (De Moraes et al. 2022, JAMA Ophthalmology 140(1)). The study design is rigorous and relies on validated standard methods. Additional insights gained from a mouse model are valuable.

      Weaknesses:

      Caution is warranted when examining and interpreting the results of this study. Among all participants (cases and controls) glaucoma status was self-reported, determined on the basis of ICD codes or previous glaucoma laser/surgical therapy. This is problematic as it is not uncommon for individuals in the highest PRS decile to have undiagnosed glaucoma (as shown in previous work by some of the authors of this article). The Authors acknowledge a "relatively low glaucoma prevalence in the highest decile group" but do not explore how undiagnosed glaucoma may affect their results. This also applies to all controls selected for this study. The Authors state that "50 to 70% of people affected [with glaucoma] remain undiagnosed". Therefore, the absence of self-reported glaucoma does not necessarily indicate that the disease is not present. Validation of the findings from this study in humans is, therefore, critical. This should ideally be performed in a well-characterized glaucoma cohort, in which case and control status has been assessed by qualified clinicians.

      The authors indicate that within the top PRS decile, participants with glaucoma are more likely to be of white ethnicity, while they are more likely to be of Black and Asian ethnicity if they are in the bottom half of PRS. Have the Authors explored how sensitive their predictions are to ethnicity? Since their cohort is predominantly of European ancestry (85.8%), would it make sense to exclude other ethnicities to increase the homogeneity of the cohort and reduce the risk for confounders that may not be explicitly accounted for?

      The authors discuss the importance of pyruvate and lactate for retinal ganglion cell survival along with that of several lipoproteins for neuroprotection. However, there is a distinction to be made between locally produced/available glycolysis end products and lipoproteins and those circulating in the blood. It may be useful to discuss this in the manuscript, and for the Authors to explore if plasma metabolites may be linked to metabolism that takes place past the blood-retinal barrier.

      Comments on revisions:

      The Authors have addressed all of my concerns.

    3. Reviewer #2 (Public review):

      Summary

      The authors have used the UK Biobank data to interrogate the association between plasma metabolites and glaucoma.

      (1) They initially assessed plasma metabolites as predictors of glaucoma: The addition of NMR-derived metabolomic data to existing models containing clinical and genetic data was marginal.

      (2) They then determined whether certain metabolites might protect against glaucoma in individuals at high genetic risk: Certain molecules in bioenergetic pathways (lactate, pyruvate and citrate) conferred protection.

      (3) They provide support for protection conferred by pyruvate in a murine model.

      Weaknesses

      (1) Although it is an invaluable treasure trove of data, selection bias and self-reporting are inescapable problems when using the UK Biobank data for glaucoma research. The high-impact glaucoma-related GWAS publications (Ref 26 and 27) referenced in support of the method suffer the same limitations. This doesn't negate the conclusions but must be taken into consideration. The authors might note that it is somewhat reassuring that the proportion of glaucoma cases (4%) is close to what would be expected in a population-based study of 40-69-year-olds of predominantly white ethnicity.

      (2) As noted by the authors, a limitation is the predominantly white ethnicity profile that comprises the UK Biobank.

      (3) Also as noted by the authors, the study is cross-sectional and is limited by the "correlation does not imply causation" issue.

      (4) The optimal collection, transport and processing of the samples for NMR metabolite analysis is critical for accurate results. Strict policies were in place for these procedures, but deviations from protocol remain an unknown influence on the data.

      (5) In addition, all UK Biobank blood samples had unintended dilution during the initial sample storage process at UK Biobank facilities. (Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat Commun 14, 604 (2023).) Samples from aliquot 3, used for the NMR measurements, suffered from 5-10% dilution. (Allen, Naomi E., et al. Wellcome Open Research 5 (2021): 222.) Julkunen et al. report that "The dilution is believed to come from mixing of participant samples with water due to seals that failed to hold a system vacuum in the automated liquid handling systems. While this issue is likely to have an impact on some of the absolute biomarker concentration values, it is expected to have limited impact on most epidemiological analyses."

      Strengths

      The huge sample size supports a powerful statistical analysis and the opportunity for the inclusion of multiple covariates and interactions without overfitting the models.

      The authors have constructed a robust methodology and statistical design.

      The manuscript is well-written, and the study is logically presented.

      The figures are of good quality.

      Broadly, the conclusions are justified by the findings.

      Impact

      The findings advance personalized prognostics for glaucoma that combine metabolomic and genetic data. In addition, the protective effect of certain metabolites may guide further research on novel therapeutic strategies.

      Comments on revisions:

      The authors have thoughtfully and comprehensively addressed my comments. I have no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors explore associations between plasma metabolites and glaucoma, a primary cause of irreversible vision loss worldwide. The study relies on measurements of 168 plasma metabolites in 4,658 glaucoma patients and 113,040 controls from the UK Biobank. The authors show that metabolites improve the prediction of glaucoma risk based on polygenic risk score (PRS) alone, albeit weakly. The authors also report a "metabolomic signature" that is associated with a reduced risk (or "resilience") for developing glaucoma among individuals in the highest PRS decile (reduction of risk by an estimated 29%). The authors highlight the protective effect of pyruvate, a product of glycolysis, for glaucoma development and show that this molecule mitigates elevated intraocular pressure and optic nerve damage in a mouse model of this disease.

      Strengths:

      This work provides additional evidence that glycolysis may play a role in the pathophysiology of glaucoma. Previous studies have demonstrated the existence of an inverse relationship between intraocular pressure and retinal pyruvate levels in animal models (Hader et al. 2020, PNAS 117(52)) and pyruvate supplementation is currently being explored for neuro-enhancement in patients with glaucoma (De Moraes et al. 2022, JAMA Ophthalmology 140(1)). The study design is rigorous and relies on validated, standard methods. Additional insights gained from a mouse model are valuable.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      Caution is warranted when examining and interpreting the results of this study. Among all participants (cases and controls) glaucoma status was self-reported, determined on the basis of ICD codes or previous glaucoma laser/surgical therapy. This is problematic as it is not uncommon for individuals in the highest PRS decile to have undiagnosed glaucoma (as shown in previous work by some of the authors of this article). The authors acknowledge a "relatively low glaucoma prevalence in the highest decile group" but do not explore how undiagnosed glaucoma may affect their results. This also applies to all controls selected for this study. The authors state that "50 to 70% of people affected [with glaucoma] remain undiagnosed". Therefore, the absence of self-reported glaucoma does not necessarily indicate that the disease is not present. Validation of the findings from this study in humans is, therefore, critical. This should ideally be performed in a well-characterized glaucoma cohort, in which case and control status has been assessed by qualified clinicians.

      We appreciate the comment regarding the challenges of glaucoma ascertainment in UK Biobank. This is a valid limitation, as glaucoma in UK Biobank is based on self-reports and hospital records rather than comprehensive ophthalmologic examinations for all participants. To the best of our knowledge, there is no comparably sized dataset where all participants have undergone standardized glaucoma assessments, comprehensive metabolomic profiling, and high-throughput genotyping. Work is currently ongoing to link UK Biobank data to ophthalmic EMR data, which will help confirm self-reported diagnoses. This work is not complete, and the coverage of the cohort from such linkage is uncertain at present. Nonetheless, several factors speak to the validity of our findings. The top members of the metabolomic signature associated with resilience in the top decile of the glaucoma polygenic risk score (PRS), lactate (P=8.8E-12) and pyruvate (P=1.9E-10), remained robustly significant after appropriate adjustment for multiple comparisons, with additional validation for pyruvate in a human-relevant glaucoma mouse model. Strikingly, using participants in the lowest quartile of both glaucoma PRS and metabolic risk score (MRS) as the reference group, subjects in the highest quartile of both had 25-fold higher odds of glaucoma. An effect size this large for a putative glaucoma determinant has only been seen for intraocular pressure (IOP), which is now largely accepted to be in the causal pathway of the disease.

      The Discussion now contains the following statement: “A second limitation is that glaucoma ascertainment in the UK Biobank is based on self-reported diagnoses and hospital records rather than comprehensive ophthalmologic examinations. Nonetheless, it is reassuring that the prevalence of glaucoma in our sample (~4%) is similar to a directly performed disease burden estimate in a comparable, albeit slightly older, United Kingdom sample (2.7%)(79)”. (Lines 379-382)

      The authors indicate that within the top PRS decile, participants with glaucoma are more likely to be of white ethnicity, while they are more likely to be of Black and Asian ethnicity if they are in the bottom half of PRS. Have the authors explored how sensitive their predictions are to ethnicity? Since their cohort is predominantly of European ancestry (85.8%), would it make sense to exclude other ethnicities to increase the homogeneity of the cohort and reduce the risk for confounders that may not be explicitly accounted for?

      Comparing data in Tables 3 and 4 of the manuscript, we observe that, on a percentage basis, more individuals have glaucoma in the top 10% of genetic risk than in the bottom 50% of risk across all ancestral groups. We recently reported that the risk of glaucoma increases with each standard deviation increase in the glaucoma PRS across ancestral groups in the UK Biobank, utilizing a slightly different sample size (see Author response table 1 below).(1) Since the PRS is applicable across ancestral groups, we aim to make our results as generalizable as possible; therefore, we prefer to report our findings for all ethnic groups rather than restrict our results to Europeans.

      Author response table 1.

      Performance of the mtGPRS Across Ancestral Groups in the UK Biobank

      Abbreviations: mtGPRS, multitrait analysis of GWAS polygenic risk score; OR, odds ratio; CI, confidence interval.(1)

      UK Biobank ancestry was genetically inferred based on principal component analysis. The OR represents the risk associated with each standard deviation change in mtGPRS and is adjusted for multiple covariates including age, sex, and medical comorbidities.

      In the discussion, we stated that “... we chose to analyze Europeans and non-Europeans together to make the results as generalizable as possible.” (Lines 378-379)

      The authors discuss the importance of pyruvate and lactate for retinal ganglion cell survival, along with that of several lipoproteins for neuroprotection. However, there is a distinction to be made between locally produced/available glycolysis end products and lipoproteins and those circulating in the blood. It may be useful to discuss this in the manuscript, and for the authors to explore if plasma metabolites may be linked to metabolism that takes place past the blood-retinal barrier.

      As the reviewer points out, it is crucial to interpret the results for lipoproteins in the context of their ability to cross the blood-retinal barrier. Even for smaller metabolites like pyruvate and lactate, it is essential to consider local production versus serum-derived molecules in mediating any neuroprotective effects. Our murine data suggest that exogenous pyruvate contributed to neuroprotection. However, for the other glycolysis-related metabolites (lactate and citrate), we cannot rule out the possibility that locally produced metabolites may also contribute to neuroprotection. None of the lipoproteins identified as potential resilience biomarkers had an adjusted P-value of less than 0.05. Nevertheless, HDL analytes can cross blood-ocular barriers to enter the aqueous humor.(2) Therefore, it is also possible for serum-derived HDL to influence retinal ganglion cell homeostasis. Overall, much more research is needed to clarify the roles of locally produced versus serum-derived factors in conferring resilience to genetic predisposition to glaucoma.

      We have added the following sentences to the discussion:

      “Notably, although our validation data confirm the neuroprotective effects of exogenous pyruvate, it remains possible that endogenously produced pyruvate within ocular tissues may also contribute to RGC protection.” (Lines 329-331)

      “Furthermore, as HDL analytes can cross blood-ocular barriers,(78) there is a plausible route for serum-derived HDL to influence RGC homeostasis. Nonetheless, the relative contributions of circulating lipoproteins versus local synthesis within ocular tissues remain unclear and warrant further investigation.” (Lines 355-358)

      “Incorporating ocular physiology and blood-retinal barrier considerations when interpreting lipoproteins as potential resilience biomarkers will be critical for future studies aimed at understanding and therapeutically targeting increased glaucoma risk.” (Lines 360-363)

      Reviewer #2 (Public review):

      Summary

      The authors have used the UK Biobank data to interrogate the association between plasma metabolites and glaucoma.

      (1) They initially assessed plasma metabolites as predictors of glaucoma: The addition of NMR-derived metabolomic data to existing models containing clinical and genetic data was marginal.

      (2) They then determined whether certain metabolites might protect against glaucoma in individuals at high genetic risk: Certain molecules in bioenergetic pathways (lactate, pyruvate, and citrate) conferred protection.

      (3) They provide support for protection conferred by pyruvate in a murine model.

      Strengths

      (1) The huge sample size supports a powerful statistical analysis and the opportunity for the inclusion of multiple covariates and interactions without overfitting the models.

      (2) The authors have constructed a robust methodology and statistical design.

      (3) The manuscript is well written, and the study is logically presented.

      (4) The figures are of good quality.

      (5) Broadly, the conclusions are justified by the findings.

      We thank the reviewer for these supportive comments.

      Weaknesses

      (1) Although it is an invaluable treasure trove of data, selection bias and self-reporting are inescapable problems when using the UK Biobank data for glaucoma research. The high-impact glaucoma-related GWAS publications (references 26 and 27) referenced in support of the method suffer the same limitations. This doesn't negate the conclusions but must be taken into consideration. The authors might note that it is somewhat reassuring that the proportion of glaucoma cases (4%) is close to what would be expected in a population-based study of 40-69-year-olds of predominantly white ethnicity.

      While there are limitations when open-angle glaucoma (OAG) is ascertained by self-report, as discussed above, we agree with the reviewer that the prevalence of glaucoma is consistent with data from population-based studies of Europeans who are 40-69 years of age. 

      We also want to point out that references 26 and 27 indicate glaucoma self-reports can be an acceptable surrogate for OAG that is ascertained by clinical evaluation. Consider the methodologic details for each study:

      Reference 26 is a 4-stage genome-wide meta-analysis to identify loci for OAG from 21 independent populations. The phenotypic definition of OAG was based on clinical assessment in the discovery stage, and 7286 glaucoma self-reports from the UK Biobank served as an effective replication set.  It is also important to note that 120 out of the 127 discovered OAG loci were nominally replicated in 23andMe, where glaucoma was ascertained entirely by self-report.

      Reference 27 is a genome-wide meta-analysis to identify genetic loci for IOP, an important endophenotype for OAG. The study identified 112 loci for IOP. These loci were incorporated into a glaucoma prediction model in the NEIGHBORHOOD study and the UK Biobank. The area under the receiver operating characteristic curve was 0.76 and 0.74, respectively, in these studies. While the AUCs were similar, OAG was ascertained clinically in NEIGHBORHOOD and largely by self-report in UK Biobank.

      Finally, a strength of the UK Biobank is that selection bias is minimized. Participants need not be insured or affiliated with the study in any way other than being UK residents. There is indeed a healthy volunteer bias in the UK Biobank: ambulatory individuals who tend to be health-conscious and willing to donate their time and provide biological specimens tend to participate. We agree with the reviewer that the use of self-reported cases does not negate the conclusions, and hopefully future iterations of the UK Biobank in which self-reports are clinically validated will confirm these findings, which already have some validation in a preclinical glaucoma model.

      We have added the following sentence to the first action item above regarding our case ascertainment method: “Nonetheless, it is reassuring that the prevalence of glaucoma in our sample (~4%) is similar to a directly performed disease burden estimate in a comparable, albeit slightly older, United Kingdom sample (2.7%).”(3) (Lines 381-383)

      (2) As noted by the authors, a limitation is the predominantly white ethnicity profile that comprises the UK Biobank. 

      (3) Also as noted by the authors, the study is cross-sectional and is limited by the "correlation does not imply causation" issue.

      While the epidemiological arm of our study was cross-sectional, the studies testing the ability of pyruvate to mitigate the glaucoma phenotype in mice with the Lmx1b mutation were prospective.

      We have already pointed out in the Discussion that pyruvate supplementation reduced glaucoma incidence in a human-relevant genetic mouse model.

      (4) The optimal collection, transport, and processing of the samples for NMR metabolite analysis is critical for accurate results. Strict policies were in place for these procedures, but deviations from protocol remain an unknown influence on the data.

      Comments 4 and 5 are related and will be addressed after comment 5.

      (5) In addition, all UK Biobank blood samples had unintended dilution during the initial sample storage process at UK Biobank facilities (Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat Commun 14, 604 (2023)). Samples from aliquot 3, used for the NMR measurements, suffered from 5-10% dilution (Allen, Naomi E., et al. Wellcome Open Research 5 (2021): 222). Julkunen et al. report that "The dilution is believed to come from mixing of participant samples with water due to seals that failed to hold a system vacuum in the automated liquid handling systems. While this issue is likely to have an impact on some of the absolute biomarker concentration values, it is expected to have limited impact on most epidemiological analyses."

      We thank the reviewer for making us aware of the unintended dilution of aliquot 3, which was used for NMR metabolomics in UK Biobank participants. Although ~98% of samples experienced a 5-10% dilution, this would not affect our reported results, which did not rely on absolute biomarker concentrations: all metabolites in the main tables were probit-transformed and analyzed as continuous variables per 1 standard deviation increase. Nonetheless, in the supplemental material, we show that the unadjusted median levels of pyruvate (in mmol/L) were higher in participants without glaucoma than in those with glaucoma, both in the population overall and in those in the top decile of glaucoma risk.
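
      The point about dilution can be illustrated with a minimal Python sketch. The response does not state the exact form of the probit transformation; the sketch assumes a rank-based inverse normal transformation with a Blom-type offset, and the concentrations are hypothetical. Because such a transformation depends only on ranks, multiplying every sample by the same dilution factor leaves the transformed values unchanged.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal ("probit") transform.

    Assumes a Blom-type offset c = 3/8; other offsets (e.g. 0.5) are also used.
    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across any run of values tied with values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical pyruvate concentrations (mmol/L), for illustration only.
pyruvate = [0.06, 0.09, 0.11, 0.08, 0.15, 0.07]
# The same 8% dilution applied to every sample.
diluted = [x * 0.92 for x in pyruvate]

print(rank_inverse_normal(pyruvate) == rank_inverse_normal(diluted))  # True
```

      The invariance is exact only for a uniform dilution. If the dilution varied between about 5% and 10% across samples, samples with very similar concentrations could swap ranks, so their transformed values would shift slightly rather than not at all.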

      Furthermore, we see no evidence in the literature that unidentified protocol deviations might impact metabolite results in UK Biobank participants. For example, a recent study evaluated the relationship between a weighted triglyceride-raising polygenic score (TG.PS) and type 3 hyperlipidemia (T3HL) in the Oxford Biobank (OBB) and the UK Biobank. In both biobanks, metabolomics was performed on the Nightingale NMR platform. A one standard deviation increase in TG.PS was associated with a 13% and 15.2% increased risk of T3HL in the OBB and UK Biobank, respectively.(4) Replication of the OBB result in the UK Biobank suggests there are no additional concerns regarding the processing of UK Biobank samples for NMR metabolomics. Of course, we remain vigilant for protocol deviations that might call our results into question and will seek to validate our findings in other biobanks in future research.

      Impact

      The findings advance personalized prognostics for glaucoma that combine metabolomic and genetic data. In addition, the protective effect of certain metabolites may inform further research on novel therapeutic strategies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Given the uncertainty in the proportion of controls with undiagnosed glaucoma, it may be appropriate to include a sensitivity analysis in the manuscript. The authors could then provide the readers with an estimate of how sensitive their predictions are to the proportion of undiagnosed individuals among controls.

      Since UK Biobank participants did not undergo standardized clinical assessments, it is not possible to perform sensitivity analyses as we don’t know which controls might have glaucoma, although we can offer the following comments.

      We are performing a cross-sectional, prospective, detailed glaucoma assessment of participants in the top and bottom 10 percent of genetic predisposition recruited from BioMe at Icahn School of Medicine at Mount Sinai and Mass General Brigham Biobank at Harvard Medical School. We find that 21% of people in the top decile of genetic risk have glaucoma,(5) which compares reasonably well to the 15% of people in the top 10% of genetic risk in the UK Biobank. This underscores the assertion that our definition of glaucoma in the UK Biobank, while not ideal, is a reasonable surrogate for a detailed clinical assessment.

      Currently, 10,077 participants in the top decile of glaucoma genetic predisposition did not meet our definition of glaucoma. If we assume that the glaucoma prevalence is 3% and that 50% of people with glaucoma are undiagnosed, this would translate to approximately 150 additional cases misclassified as controls. Depending on their pyruvate (and other metabolite) levels, these misclassified cases could bias our result toward the null, have no impact on it, or contribute to a false-positive result.
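
      The estimate of roughly 150 misclassified cases follows from applying the two stated assumptions to the 10,077 non-cases in the top genetic-risk decile:

```latex
10{,}077 \times \underbrace{0.03}_{\text{assumed prevalence}} \times \underbrace{0.50}_{\text{assumed undiagnosed fraction}} \approx 151
```

      That is, about 1.5% of those controls.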

      We have already addressed the issue of a lack of standardized exams in the UK Biobank and the need for more studies to confirm our findings.

      Reviewer #2 (Recommendations for the authors):

      (1) I am curious about the proposed reason for some individuals having metabolic profiles conferring resilience. Plasma pyruvate levels are normally distributed. Is it simply the case that some individuals with naturally high levels of pyruvate are fortuitously protected against glaucoma? Some sort of self-regulation mechanism seems unlikely.

      Thank you for your insightful question regarding the potential mechanism underlying the association between pyruvate levels and glaucoma resilience. Modest inter-individual differences in pyruvate levels may have significant physiological implications, particularly in the context of neurodegeneration and metabolic stress. One possibility is that individuals with naturally higher pyruvate levels may benefit from pyruvate's known neuroprotective and metabolic support functions(6–8), which could confer resilience against the oxidative and bioenergetic challenges associated with glaucoma. Pyruvate is important for cellular metabolism, redox balance, and mitochondrial function, processes that are increasingly implicated in glaucomatous neurodegeneration.(9) Elevated pyruvate levels support mitochondrial ATP production(10), buffer oxidative stress,(11) and influence metabolic flux(12,13) through pathways such as the tricarboxylic acid cycle and NAD+/NADH homeostasis. This is consistent with prior studies suggesting that mitochondrial dysfunction contributes to retinal ganglion cell vulnerability in glaucoma.(14–17) While a direct self-regulation mechanism may seem unlikely, both genetic and environmental factors can influence pyruvate metabolism, which could lead to subtle but clinically meaningful variations in its levels. Our findings are supported by validation in a mouse model, which suggests that the association is unlikely to be fortuitous and may instead reflect an underlying biological process that merits further mechanistic investigation. Future studies incorporating longitudinal metabolic profiling and functional validation in human-derived models will help clarify this relationship.

      (2) Conceivably, the higher levels of pyruvate and lactate may have resulted from recent exercise and may reflect individuals with high levels of exercise that confers resilience against glaucoma by independent mechanisms such as improved blood flow. Any way to rule that out from the UK Biobank data?

      Thank you for raising this important point. To account for the potential confounding effects of physical activity, we adjusted for metabolic equivalents of task (METs), a standardized measure of physical activity available in the UK Biobank, in our models. By incorporating METs as a covariate, we aimed to minimize confounding of the associations with plasma pyruvate and lactate by individual exercise levels. This reduces the likelihood that the observed associations are solely attributable to differences in physical activity. However, we acknowledge that a longitudinal analysis of exercise patterns would provide further clarity on this relationship.

      (3) It may be worth mentioning that the retinal ganglion cells contain a plasma membrane monocarboxylate transporter that supports pyruvate and lactate uptake from the extracellular space.

      Thank you for this extremely insightful suggestion on retinal ganglion cell (RGC) expression of monocarboxylate transporters, which can facilitate the uptake of pyruvate and lactate from the extracellular space. This is relevant for our study, given the high metabolic demands of RGCs and their reliance on both glycolytic and oxidative metabolism for neuroprotection and survival.

      We acknowledged this in the discussion section of the manuscript by adding the following statement: "RGCs express monocarboxylate transporters, which facilitate the uptake of extracellular pyruvate and lactate, improving energy homeostasis, neuronal metabolism, and survival.” (Lines 309-311)

      (4) The mechanism of protection in the mice, at least in part, is likely due to the lower IOP in the pyruvate-treated animals. Did the authors investigate the influence of pyruvate on IOP in the UK Biobank data (about 110,000 individuals had IOP measurements)?

      Thank you for suggesting this investigation. We ran the suggested analysis among 68,761 individuals with both IOP measurements and metabolomic profiling. For participants using ocular hypotensive agents, pretreatment IOP was imputed by dividing the measured IOP by 0.7, which corresponds to assuming an average treatment-related IOP reduction of 30%.
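
      Written out, the imputation is a single scaling step. Reading the divisor 0.7 as a 30% average treatment effect is our interpretation, and the 14 mmHg value is a hypothetical example:

```latex
\mathrm{IOP}_{\text{pre}} = \frac{\mathrm{IOP}_{\text{measured}}}{0.7}
\quad\Longleftrightarrow\quad
\mathrm{IOP}_{\text{measured}} = (1 - 0.30)\,\mathrm{IOP}_{\text{pre}},
\qquad \text{e.g. } \frac{14\ \text{mmHg}}{0.7} = 20\ \text{mmHg}.
```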

      We plotted the relationship between IOP and probit-transformed pyruvate levels. We compared participants with pyruvate levels at least 2 standard deviations above the mean (red dashed line; a probit-transformed value of 2, corresponding to an absolute concentration of 0.15 mmol/L) with participants below this threshold and found a statistically significant difference between the groups (p=0.017, Welch two-sample t-test). We have not added this analysis to the manuscript, but readers can find the data here, as the reviews are public. Regarding the dilution issue addressed above, our main analyses used probit-transformed metabolite levels analyzed as continuous variables per 1 SD increase, rather than absolute concentrations.

      Author response image 1.
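
      A minimal Python sketch of the comparison described above. It assumes participants are split at a probit-transformed pyruvate value of 2 and compared on IOP with Welch's unequal-variance t statistic; the function names and data are hypothetical. The sketch stops at the t statistic and Welch-Satterthwaite degrees of freedom; the p-value would then come from Student's t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def split_by_threshold(probit_pyruvate, iop, threshold=2.0):
    """Split IOP values by whether probit-transformed pyruvate is >= threshold."""
    high = [v for z, v in zip(probit_pyruvate, iop) if z >= threshold]
    low = [v for z, v in zip(probit_pyruvate, iop) if z < threshold]
    return high, low


# A probit value of 2 lies at about the 97.7th percentile of a standard normal.
print(round(NormalDist().cdf(2.0), 4))  # 0.9772

# Hypothetical data: probit-transformed pyruvate and IOP (mmHg).
z = [0.1, 2.3, -1.0, 2.1, 0.5, 2.6, -0.3]
iop = [16.0, 14.5, 17.2, 15.0, 16.8, 14.0, 15.9]
high, low = split_by_threshold(z, iop)
t, df = welch_t(high, low)
```

      Because the threshold sits near the 97.7th percentile, the high-pyruvate group holds only about 2.3% of participants, so the two groups differ greatly in size; Welch's test does not assume equal variances, which makes it a reasonable choice for such an unbalanced comparison.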

      (5) Line 88: I suggest changing "patients" to "affected individuals". The term "patients" tends to imply that the individual has already been diagnosed, but the idea being conveyed is about underdiagnosis in the population.

      Thank you for your suggestion.

      We have changed "patients" to "affected individuals" in the introduction. (Line 90)

      (6) Line 93: "However, glaucoma is also significantly affected by environmental and lifestyle factors,10-14". Although lifestyle risk factors such as caffeine intake, alcohol, smoking, and air pollution have been reported, the associations are generally weak and inconsistently reported. Consider modifying this notion to stress the emerging evidence around gene-environment interactions (reference 14) rather than environmental factors per se, with the implication that genes + metabolism may be greater than the sum of the parts.

      Thank you for this thoughtful suggestion to highlight gene-environment interactions, where genetic susceptibility may amplify or mitigate the impact of metabolic and environmental influences on glaucoma progression. We have revised the statement to better reflect the synergistic effects of genetics and metabolism rather than considering environmental factors in isolation.

      Revised sentence for inclusion in the introduction of the manuscript: "Glaucoma risk is influenced by both genetic and metabolic factors, with emerging evidence suggesting that gene-environment interactions may play a greater role in conferring disease risk than independent exposures alone.” (Lines 95-97)

      (7) Lines 156-161: In model 4, the very small increase in AUC when metabolic data are added to clinical and genetic data may be better interpreted as a marginal difference that was statistically significant because of the very large sample size but is not clinically significant, rather than as evidence that it "modestly enhances the prediction of glaucoma".

      Thank you for your suggested comment.

      We have adjusted the wording by changing “modestly” to “marginally” to address that the statistical significance is in the context of the study’s large sample size in the results section (Line 162) and throughout the manuscript.

      NB: We made other minor edits to correct grammatical errors, improve clarity, and streamline the revised manuscript. Furthermore, the paragraph regarding slit lamp examination in the Methods was inadvertently omitted but has been added back in the revised manuscript (Lines 571-579).

      References:

      (1) Kim J, Kang JH, Wiggs JL, et al. Does Age Modify the Relation Between Genetic Predisposition to Glaucoma and Various Glaucoma Traits in the UK Biobank? Invest Ophthalmol Vis Sci. 2025;66(2):57. doi:10.1167/iovs.66.2.57

      (2) Cenedella RJ. Lipoproteins and lipids in cow and human aqueous humor. Biochim Biophys Acta BBA - Lipids Lipid Metab. 1984;793(3):448-454. doi:10.1016/0005-2760(84)90262-5

      (3) Minassian DC, Reidy A, Coffey M, Minassian A. Utility of predictive equations for estimating the prevalence and incidence of primary open angle glaucoma in the UK. Br J Ophthalmol. 2000;84(10):1159-1161. doi:10.1136/bjo.84.10.1159

      (4) Pieri K, Trichia E, Neville MJ, et al. Polygenic risk in Type III hyperlipidaemia and risk of cardiovascular disease: An epidemiological study in UK Biobank and Oxford Biobank. Int J Cardiol. 2023;373:72-78. doi:10.1016/j.ijcard.2022.11.024

      (5) Zhao H, Pasquale LR, Zebardast N, et al. Screening by glaucoma polygenic risk score to identify primary open-angle glaucoma in two biobanks: An updated report. ARVO 2025 meeting. Published online 2025.

      (6) Zilberter Y, Gubkina O, Ivanov AI. A unique array of neuroprotective effects of pyruvate in neuropathology. Front Neurosci. 2015;9. doi:10.3389/fnins.2015.00017

      (7) Quansah E, Peelaerts W, Langston JW, Simon DK, Colca J, Brundin P. Targeting energy metabolism via the mitochondrial pyruvate carrier as a novel approach to attenuate neurodegeneration. Mol Neurodegener. 2018;13(1):28. doi:10.1186/s13024-018-0260-x

      (8) Gray LR, Tompkins SC, Taylor EB. Regulation of pyruvate metabolism and human disease. Cell Mol Life Sci. 2014;71(14):2577-2604. doi:10.1007/s00018-013-1539-2

      (9) Harder JM, Guymer C, Wood JPM, et al. Disturbed glucose and pyruvate metabolism in glaucoma with neuroprotection by pyruvate or rapamycin. Proc Natl Acad Sci. 2020;117(52):33619-33627. doi:10.1073/pnas.2014213117

      (10) Kim MJ, Lee H, Chanda D, et al. The Role of Pyruvate Metabolism in Mitochondrial Quality Control and Inflammation. Mol Cells. 2023;46(5):259-267. doi:10.14348/molcells.2023.2128

      (11) Wang X, Perez E, Liu R, Yan LJ, Mallet RT, Yang SH. Pyruvate Protects Mitochondria from Oxidative Stress in Human Neuroblastoma SK-N-SH Cells. Brain Res. 2007;1132(1):1-9. doi:10.1016/j.brainres.2006.11.032

      (12) Tilton WM, Seaman C, Carriero D, Piomelli S. Regulation of glycolysis in the erythrocyte: role of the lactate/pyruvate and NAD/NADH ratios. J Lab Clin Med. 1991;118(2):146-152.

      (13) Li X, Yang Y, Zhang B, et al. Lactate metabolism in human health and disease. Signal Transduct Target Ther. 2022;7(1):305. doi:10.1038/s41392-022-01151-3

      (14) Zhang ZQ, Xie Z, Chen SY, Zhang X. Mitochondrial dysfunction in glaucomatous degeneration. Int J Ophthalmol. 2023;16(5):811-823. doi:10.18240/ijo.2023.05.20

      (15) Ju WK, Perkins GA, Kim KY, Bastola T, Choi WY, Choi SH. Glaucomatous optic neuropathy: Mitochondrial dynamics, dysfunction and protection in retinal ganglion cells. Prog Retin Eye Res. 2023;95:101136. doi:10.1016/j.preteyeres.2022.101136

      (16) Jassim AH, Inman DM, Mitchell CH. Crosstalk Between Dysfunctional Mitochondria and Inflammation in Glaucomatous Neurodegeneration. Front Pharmacol. 2021;12. doi:10.3389/fphar.2021.699623

      (17) Yang TH, Kang EYC, Lin PH, et al. Mitochondria in Retinal Ganglion Cells: Unraveling the Metabolic Nexus and Oxidative Stress. Int J Mol Sci. 2024;25(16):8626. doi:10.3390/ijms25168626

    1. eLife Assessment

      The authors examine the role of Numb, a Notch inhibitor, in intestinal stem cell (ISC) self-renewal in Drosophila during homeostasis and regeneration. The findings are important, as the authors demonstrate that ISC maintenance is reduced when both BMP signaling and Numb expression are reduced. The strength of evidence is convincing, as large sample sizes and statistical analyses are provided.

    2. Reviewer #1 (Public review):

      Summary:

      By way of background, the Jiang lab has previously shown that loss of the type II BMP receptor Punt (Put) from intestinal progenitors (ISCs and EBs) caused them to differentiate into EBs, with a concomitant loss of ISCs (Tian and Jiang, eLife 2014). The mechanism by which this occurs was activation of Notch in Put-deficient progenitors. How Notch was upregulated in Put-deficient ISCs was not established in this prior work. In the current study, the authors test whether a very low level of Dl was responsible. However, co-depletion of Dl and Put led to a phenotype similar to that of Put depletion alone, suggesting that Dl was not the mechanism. They next investigate genetic interactions between BMP signaling and Numb, an inhibitor of Notch signaling. Prior work from Bardin, Schweisguth and other labs has shown that Numb is not required for ISC self-renewal. However, the authors wanted to know whether loss of both the BMP signal transducer Mad and Numb would cause ISC loss. This result was observed for RNAi depletion from progenitors and for mad, numb double mutant clones. Of note, ISC loss was observed in 40% of mad, numb double mutant clones, whereas 60% of these clones had an ISC. They then employed a two-color tracing system called RGT to look at the outcome of ISC divisions (asymmetric (ISC/EB) or symmetric (ISC/ISC or EB/EB)). Control clones had 69%, 15% and 16%, respectively, whereas mad, numb double mutant clones had much lower ISC/ISC (11%) and much higher EB/EB (37%). They conclude that loss of Numb in moderate BMP loss-of-function mutants increased symmetric differentiation, which caused ISC loss. They also reported that numb<sup>15</sup> and numb<sup>4</sup> clones had a moderate but significant increase in ISC-lacking clones compared to control clones, supporting the model that Numb plays a role in ISC maintenance. Finally, they investigated the relevance of these observations during regeneration. After bleomycin treatment, there was a significant increase in ISC-lacking clones and a significant decrease in clone size in numb<sup>4</sup> and numb<sup>15</sup> clones compared to control clones. Because bleomycin treatment has been shown to cause variation in BMP ligand production, the authors interpret the results for numb clones after bleomycin treatment as demonstrating an essential role of Numb in ISC maintenance during regeneration.

      Strengths

      i. Data are quantified with statistical analysis

      ii. Experiments have appropriate controls and large numbers of samples

      iii. Results demonstrate an important role of Numb in maintaining ISC number during regeneration and a genetic interaction between Mad and Numb during homeostasis.

      Weaknesses

      None noted in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This work assesses the genetic interaction between the Bmp signaling pathway and the factor Numb, which can inhibit Notch signalling. It follows up on the previous studies of the group (Tian, eLife, 2014; Tian, PNAS, 2014) regarding BMP signaling in controlling stem cell fate decision as well as on the work of another group (Sallé, EMBO, 2017) that investigated the function of Numb on enteroendocrine fate in the midgut. This is an important study providing evidence of a Numb-mediated back up mechanism for stem cell maintenance.

      Strengths:

      (1) Experiments are consistent with these previous publications while also extending our understanding of how Numb functions in the ISC.

      (2) Provides an interesting model of a "back up" protection mechanism for ISC maintenance.

      Weaknesses:

      (1) Aspects of the experiments could be better controlled or annotated:

      (a) As they "randomly chose" the regions analyzed, it would be better to have all from a defined region (R4 or R2, for example) or to at least note the region, as there are important regional differences for some aspects of midgut biology.

      (b) It is not clear to me why MARCM clones were induced and the flies then grown at 18°C. It would help to explain why they used this unconventional protocol.

      (2) There are technical limitations with trying to conclude from double-knockdown experiments in the ISC lineage, such as those in Figure 1 where Dl and put are both being knocked down: depending on how fast both proteins are depleted, it may be that only one of them (put, for example) is inactivated and affects the fate decision prior to the other one (Dl) being depleted. Therefore, it is difficult to definitively conclude that the decision is independent of Dl ligand.

      (3) Additional quantification of many phenotypes would be desired.

      (a) It would be useful to see esg-GFP cells/total cells and not just per field, as the density might change (2E for example).

      (b) Similarly, for 2F and 2G, it would be nice to see the % of ISCs/total cells and EBs/total cells and not only per esg-GFP+ cell.

      (c) Fig1: There is no quantification; specifically, it would be interesting to know how many esg+ cells are Su(H)-lacZ positive in the Put- Dl- condition compared to WT or Put- alone. What is the n?

      (d) Fig2: Pros+ cells are not seen in the image? Are they all Dl-lacZ+?

      (e) Fig3: it would be nice to have clone size quantification instead of the distribution between groups of 2-cell, 3-cell and 4-cell clones.

      (f) How many times were experiments performed?

      (4) The authors do not comment on the reduction of clone size in DSS treatment in Figure 6K. How do they interpret this? Does it conflict with their model of Bleo vs DSS?

      (5) There is probably a mistake in the sentence on lines 314-316: "Indeed, previous studies indicate that endogenous Numb was not undetectable by Numb antibodies that could detect Numb expression in the nervous system".

      Comments on revisions:

      The authors have by and large addressed my main points.

    4. Reviewer #3 (Public review):

      Summary:

      The authors provide an in-depth analysis of the function of Numb in the adult Drosophila midgut. Based on RNAi combinations and double mutant clonal analyses, they propose that Numb inhibits the Notch pathway to maintain intestinal stem cells and acts as a backup mechanism, together with the BMP pathway, in maintaining midgut stem cell-mediated homeostasis.

      Strengths:

      Overall, this is a carefully constructed series of experiments, and the results and statistical analyses provide believable evidence that Numb has a role, albeit weak compared to other pathways, in sustaining ISCs and in promoting regeneration, especially after damage by bleomycin, which may damage enterocytes and therefore disrupt the BMP pathway more. The results overall support their claim.

      The data are highly coherent, and support a genetic function of Numb, in collaborating with BMP signaling, to maintain the number and proliferative function of ISCs in adult midguts. The authors used appropriate and sophisticated genetic tools of double RNAi, mutant clonal analysis and dual marker stem cell tracing approaches to ensure the results are reproducible and consistent. The statistical analyses provide confidence that the phenotypic changes are reliable albeit weaker than many other mutants previously studied.

      Weaknesses:

      In the absence of Numb itself, the midgut shows a weak reduction in ISC number (Fig. 3 and 5), as well as a weak, albeit not statistically significant, reduction in ISC clone size/proliferation. I think the authors have published similar experiments with BMP pathway mutants. The mad1-2 allele used here, as discussed below, may not be very representative of other BMP pathway mutants. Therefore, it could be beneficial to compare ISC numbers and clone sizes with those from other BMP experiments, to give readers a clearer picture of how these two pathways individually contribute (stronger/weaker effects) to ISC number and gut homeostasis.

      The main weakness of this manuscript is the analysis of the BMP pathway components, especially the mad1-2 allele. The mad RNAi and mad1-2 alleles (P insertion) are supposed to be weak alleles, which might be suitable for the genetic enhancement assays performed here together with numb RNAi. However, the mad1-2 allele, and sometimes the mad RNAi, showed weakly increased ISC clone size. This is somewhat counter-intuitive, as they would be expected to show similar ISC loss and ISC clone size reduction.

      A much stronger phenotype was observed when numb mutants were subjected to treatment with the tissue-damaging agent bleomycin, which causes damage in different ways than DSS. Bleomycin has previously been shown to cause mainly enterocyte damage and is therefore more likely to disrupt BMP signaling from ECs. Accordingly, this treatment together with loss of numb led to a highly significant reduction of ISCs in clones and a reduction of clone size/proliferation. One possible improvement: it is not clear whether the authors discussed the nature of the two numb mutant alleles used in this study and how their strength compares with that of the RNAi line. Because the phenotypes are weak and more variable, the choice of specific reagents is important.

      Furthermore, the use of possible activating alleles of either or both pathways to test genetic enhancement or synergistic activation would provide strong support for the claims.

      For the revision, the authors have provided detailed responses, comments, and a revised manuscript that together satisfactorily answer all my questions. The manuscript reads well and the flow of information is quite clear. I do not have further concerns and support the manuscript moving forward.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      By way of background, the Jiang lab has previously shown that loss of the type II BMP receptor Punt (Put) from intestinal progenitors (ISCs and EBs) caused them to differentiate into EBs, with a concomitant loss of ISCs (Tian and Jiang, eLife 2014). The mechanism by which this occurs was activation of Notch in Put-deficient progenitors. How Notch was upregulated in Put-deficient ISCs was not established in this prior work. In the current study, the authors test whether a very low level of Dl was responsible. However, co-depletion of Dl and Put led to a phenotype similar to that of Put depletion alone, suggesting that Dl was not the mechanism. They next investigate genetic interactions between BMP signaling and Numb, an inhibitor of Notch signaling. Prior work from Bardin, Schweisguth and other labs has shown that Numb is not required for ISC self-renewal. However, the authors wanted to know whether loss of both the BMP signal transducer Mad and Numb would cause ISC loss. This result was observed for RNAi depletion from progenitors and for mad, numb double mutant clones. Of note, ISC loss was observed in 40% of mad, numb double mutant clones, whereas 60% of these clones had an ISC. They then employed a two-color tracing system called RGT to look at the outcome of ISC divisions (asymmetric (ISC/EB) or symmetric (ISC/ISC or EB/EB)). Control clones had 69%, 15% and 16%, respectively, whereas mad, numb double mutant clones had much lower ISC/ISC (11%) and much higher EB/EB (37%). They conclude that loss of Numb in moderate BMP loss-of-function mutants increased symmetric differentiation, which caused ISC loss. They also reported that numb<sup>15</sup> and numb<sup>4</sup> clones had a moderate but significant increase in ISC-lacking clones compared to control clones, supporting the model that Numb plays a role in ISC maintenance. Finally, they investigated the relevance of these observations during regeneration. After bleomycin treatment, there was a significant increase in ISC-lacking clones and a significant decrease in clone size in numb<sup>4</sup> and numb<sup>15</sup> clones compared to control clones. Because bleomycin treatment has been shown to cause variation in BMP ligand production, the authors interpret the results for numb clones after bleomycin treatment as demonstrating an essential role of Numb in ISC maintenance during regeneration.

      Strengths:

      (i) Most data is quantified with statistical analysis

      (ii) Experiments have appropriate controls and large numbers of samples

      (iii) Results demonstrate an important role of Numb in maintaining ISC number during regeneration and a genetic interaction between Mad and Numb during homeostasis.

      Weaknesses:

      (i) No quantification for Fig. 1

      Quantification of Fig.1 has been added. 

      (ii) The premise is a bit unclear. Under homeostasis, strong loss of BMP (Put) leads to loss of ISCs, presumably regardless of Numb level (which was not tested). But moderate loss of BMP (Mad) does not show ISC loss unless Numb is also reduced. I am confused as to why numb does not play a role in Put mutants. Did the authors test whether concomitant loss of Put and Numb leads to even more ISC loss than Put mutation alone?

      We have tested the genetic interaction between put and numb using Put RNAi and Numb RNAi driven by esg<sup>ts</sup>. According to the results in this study and our previously published data, put mutant clones or esg<sup>ts</sup> > Put-RNAi induced a rapid loss of ISCs (within 8 days). We did not observe further enhancement of the stem cell loss phenotype in Put and Numb double RNAi guts.

      (iii) I think that the use of the word "essential" is a bit strong here. Numb plays an important role but in either during homeostasis or regeneration, most numb clones or mad, numb double mutant clones still have ISCs. Therefore, I think that the authors should temper their language about the role of Numb in ISC maintenance.

      We have revised the language and changed “essential” to “important”.

      Reviewer #2 (Public review):

      Summary:

      This work assesses the genetic interaction between the Bmp signaling pathway and the factor Numb, which can inhibit Notch signalling. It follows up on the previous studies of the group (Tian, eLife, 2014; Tian, PNAS, 2014) regarding BMP signaling in controlling stem cell fate decision as well as on the work of another group (Sallé, EMBO, 2017) that investigated the function of Numb on enteroendocrine fate in the midgut. This is an important study providing evidence of a Numb-mediated back up mechanism for stem cell maintenance.

      Strengths:

      (1) Experiments are consistent with these previous publications while also extending our understanding of how Numb functions in the ISC.

      (2) Provides an interesting model of a "back up" protection mechanism for ISC maintenance.

      Weaknesses:

      (1) Aspects of the experiments could be better controlled or annotated:

      (a) As they "randomly chose" the regions analyzed, it would be better to have all from a defined region (R4 or R2, for example) or to at least note the region as there are important regional differences for some aspects of midgut biology.

      Thank you for the suggestion. In fact, we conducted all the analyses in region 4 (R4); we have added a statement to clarify this in the revised manuscript.

      (b) It is not clear to me why MARCM clones were induced and then flies grown at 18°C. It would help to explain why they used this unconventional protocol.

      We kept the flies at 18°C to avoid spontaneous clone induction.

      (2) There are technical limitations with trying to conclude from double-knockdown experiments in the ISC lineage, such as those in Figure 1 where Dl and put are both being knocked down: depending on how fast both proteins are depleted, it may be that only one of them (put, for example) is inactivated and affects the fate decision prior to the other one (Dl) being depleted. Therefore, it is difficult to definitively conclude that the decision is independent of Dl ligand.

      In our hands, Dl-RNAi is very effective and abolished N pathway activity (as determined by the N pathway reporter Su(H)-lacZ) after RNAi for 8 days (Fig. 1D). Therefore, the ectopic Su(H)-lacZ expression in Punt Dl double RNAi guts (Fig. 1E) is unlikely to be due to residual Dl expression. Nevertheless, we have changed the statement “BMP signaling blocks ligand-independent N activity” to “Loss of BMP signaling results in ectopic N pathway activity even when Dl is depleted”.

      (3) Additional quantification of many phenotypes would be desired.

      (a) It would be useful to see esg-GFP cells/total cells and not just field as the density might change (2E for example).

      We focused on the R4 region for quantification, where cell density did not change appreciably across experimental groups. In addition, we examined many guts for quantification. It is therefore very unlikely that the difference in esg-GFP+ cell number is caused by a change in cell density.

      (b) Similarly, for 2F and 2G, it would be nice to see the % of ISC/ total cell and EB/total cell and not only per esgGFP+ cell.

      Unfortunately, we didn’t have the suggested quantification. However, we believe that quantification of the percentage of ISC or EB among all progenitor cells, as we did here, provides a meaningful measurement of the self-renewal status of each experimental group.

      (c) Fig1: There is no quantification - specifically it would be interesting to know how many esg+ are su(H)lacZ positive in Put- Dl- condition compared to WT or Put- alone. What is the n?

      Quantification of Fig. 1 has been added.

      (d) Fig2: Pros + cells are not seen in the image? Are they all DllacZ+?

      Anti-Pros and anti-E(spl)mβ-CD2 were stained in the same channel (magenta). Pros+ cells exhibited “dot-like” nuclear staining, while CD2 staining outlined the cell membrane of EBs. We have clarified this in the revised figure legend.

      (e) Fig3: it would be nice to have clone size quantification instead of the distribution among 2-cell, 3-cell, and 4-cell clone groups.

      Because clone size is heterogeneous within each genotype, we grouped clones by size (2, 3-6, 6-8, >8 cells) and quantified the distribution of individual groups for each genotype, which clearly showed an overall reduction in clone size for mad numb double mutant clones. We and others have used the same clone size analysis in previous studies (e.g., Tian and Jiang, eLife 2014).

      (f) How many times were experiments performed?

      All experiments were performed at least 3 times.

      (4) The authors do not comment on the reduction of clone size in DSS treatment in Figure 6K. How do they interpret this? Does it conflict with their model of Bleo vs DSS?

      Guts containing numb<sup>4</sup> clones treated with DSS exhibited a slight reduction in clone size, as evidenced by a higher percentage of 2-cell clones and a lower percentage of >8-cell clones. This reduction is less pronounced in guts containing numb<sup>15</sup> clones. However, the percentage of Dl<sup>+</sup>-containing clones is similar between DSS- and mock-treated guts. It is possible that ISC proliferation is slightly reduced due to the numb<sup>4</sup> mutation or the genetic background of this stock.

      (5) There is probably a mistake in the sentence on lines 314-316: "Indeed, previous studies indicate that endogenous Numb was not undetectable by Numb antibodies that could detect Numb expression in the nervous system".

      We have modified the sentence.

      Reviewer #3 (Public review):

      Summary:

      The authors provide an in-depth analysis of the function of Numb in adult Drosophila midgut. Based on RNAi combinations and double mutant clonal analyses, they propose that Numb has a function in inhibiting Notch pathway to maintain intestinal stem cells, and is a backup mechanism with BMP pathway in maintaining midgut stem cell mediated homeostasis.

      Strengths:

      Overall, this is a carefully constructed series of experiments, and the results and statistical analyses provide believable evidence that Numb has a role, albeit weak compared to other pathways, in sustaining ISCs and in promoting regeneration, especially after damage by bleomycin, which may damage enterocytes and therefore disrupt the BMP pathway more. The results overall support their claim.

      The data are highly coherent, and support a genetic function of Numb, in collaborating with BMP signaling, to maintain the number and proliferative function of ISCs in adult midguts. The authors used appropriate and sophisticated genetic tools of double RNAi, mutant clonal analysis and dual marker stem cell tracing approaches to ensure the results are reproducible and consistent. The statistical analyses provide confidence that the phenotypic changes are reliable albeit weaker than many other mutants previously studied.

      Weaknesses:

      In the absence of Numb itself, the midgut shows a weak reduction in ISC number (Fig. 3 and 5), as well as a weak, albeit not statistically significant, reduction in ISC clone size/proliferation. I think the authors published similar experiments with BMP pathway mutants. As stated below, the mad<sup>1-2</sup> allele used here may not be very representative of other BMP pathway mutants. Therefore, it could be beneficial to compare ISC numbers and clone sizes with those from other BMP experiments to give readers a clearer picture of how these two pathways individually contribute (stronger/weaker effects) to ISC number and gut homeostasis.

      Thanks for the comment. We tested other components of the BMP pathway in our previous study (Tian et al., 2014). More complete loss of BMP signaling (for example, Put clones, Put RNAi, Tkv/Sax double mutant clones, or double RNAi) resulted in ISC loss regardless of the status of numb, suggesting a more predominant role of BMP signaling in ISC self-renewal compared with Numb. We speculate that the weak stem cell loss phenotype associated with numb mutant clones in an otherwise wild-type background could be due to fluctuation of BMP signaling in homeostatic guts.

      The main weakness of this manuscript is the analysis of the BMP pathway components, especially the mad<sup>1-2</sup> allele. The mad RNAi and mad<sup>1-2</sup> (P insertion) alleles are supposed to be weak alleles, which might make them suitable for genetic enhancement assays together with numb RNAi. However, the mad<sup>1-2</sup> allele, and sometimes the mad RNAi, showed weakly increased ISC clone size. This is somewhat counterintuitive, as one would expect them to show similar ISC loss and ISC clone size reduction.

      We used mad<sup>1-2</sup> and mad RNAi here to test the genetic interaction with numb because our previous studies showed that partial loss of BMP signaling under these conditions did not cause stem cell loss and could therefore provide a sensitized background to determine the role of Numb in ISC self-renewal. The increased ISC proliferation/clone size associated with mad<sup>1-2</sup> and mad RNAi is due to the fact that reduction of BMP signaling in either ECs or EBs non-autonomously induces stem cell proliferation. However, in mad numb double mutant clones, there was a reduction in clone size due to loss of ISCs in many clones.

      A much stronger phenotype was observed when numb mutants were subjected to treatment with the tissue-damaging agent bleomycin, which causes damage in different ways than DSS. Bleomycin was previously shown to cause mainly enterocyte damage and is therefore more likely to disrupt BMP signaling from ECs. Accordingly, this treatment together with loss of numb led to a highly significant reduction of ISCs in clones and a reduction in clone size/proliferation. One point that could be improved: it is not clear whether the authors discussed the nature of the two numb mutant alleles used in this study and how their strength compares with that of the RNAi. Because the phenotypes are weak and variable, the choice of specific reagents is important.

      We have included information about the two numb alleles in the “Materials and Methods”. numb<sup>15</sup> is a null allele, and the nature of numb<sup>4</sup> has not been elucidated. According to Domingos, P.M. et al., numb<sup>15</sup> induced a more severe phenotype than numb<sup>4</sup> did. Consistently, we also found that more numb<sup>15</sup> mutant clones were devoid of stem cells than numb<sup>4</sup> mutant clones.

      Furthermore, the use of possible activating alleles of either or both pathways to test genetic enhancement or synergistic activation will provide strong support for the claims.

      Activation of BMP (esg<sup>ts</sup>>Tkv<sup>CA</sup>) alone induced stem cell tumors (Tian et al., 2014), whereas overexpression of Numb did not increase stem cell number, although overexpression of Numb in wing discs produced phenotypes indicative of N inhibition (our unpublished observation). This makes it difficult to test the synergistic effect of activating both BMP and Numb.

      Reviewer #1 (Recommendations for the authors):

      - Cartoon of RGT in Fig 4 needs to be improved. We need to know what chromosome harbors the esgts. It is not sufficient to simply put the location of the ubi-GFP and ubi-RFP (on 19A) and not show the location of other components of the RGT system.

      Thank you for the suggestion. We have revised the cartoon in Fig. 4 to include all three pairs of chromosomes and indicate where the esg<sup>ts</sup> driver and UAS-RNAi are located. In addition, we have included the genotypes for all the genetic experiments in the Methods section.

      - Quantification of the results in Fig. 1

      Quantification of Fig. 1 has been added.

      - The authors need to explain the premise more carefully (see above) and explain whether or not they tested put, numb double knockdowns.

      We have explained why we did not test put numb double RNAi (see above).

      Reviewer #2 (Recommendations for the authors):

      The number of times the experiments have been performed would be useful to include.

      This information has been added in the figure legends.

    1. eLife Assessment

      This valuable study shows that an odorant that is typically thought of as a repellant actually activates both attractant and repellant olfactory neurons in C. elegans. Solid evidence is provided that nematode worms can integrate signals using different pathways to drive different behavioral responses to the same cue. These findings will be of interest to scientists interested in combinatorial coding in sensory systems.

    2. Reviewer #1 (Public review):

      The authors investigated the response of worms to the odorant 1-octanol (1-oct) using a combination of microfluidics-based behavioral analysis and whole-network calcium imaging. They hypothesized that 1-oct may be encoded through two simultaneous, opposing afferent pathways, a repulsive pathway driven by ASH and an attractive pathway driven by AWC, with the ultimate chemotactic outcome likely determined by the balance between them.

      It is not surprising that 1-octanol is encoded as attractive at low concentrations and repulsive at higher concentrations. However, the novel aspect of this study is the discovery of the combinatorial coding of 1-oct in the periphery, where it serves as both an attractant and a repellent. Furthermore, the study uses this dual encoding as a model to explore the neural basis of sensory-driven behaviors at a whole-network scale in this organism. The basic conclusions of this study are well supported by the behavioral and imaging experiments, though there are certain aspects of the manuscript that would benefit from further clarification.

      A key issue is that several previous studies have demonstrated combinatorial, concentration-dependent coding of odorants in the nematode peripheral nervous system. Specifically, ASH and AWC are the primary sensory neurons for repellent and attractive responses, respectively. However, other neurons such as AWB, AWA, and ADL are also involved in the coding process. These neurons likely communicate with different interneurons to contribute to 1-oct-induced outputs. The authors' conclusion that loss of tax-4 reduces attractive responses and that osm-9 mutations reduce repulsive responses is not entirely convincing. TAX-4 is required for both AWC (an attractive neuron) and AWB (a repulsive neuron), and osm-9 is essential for ASH, ADL, and AWA (attraction-associated). Therefore, the observed effects on the attractive and repulsive responses could be more complex. Additionally, the interpretation of results involving the use of IAA to reduce the contribution of AWC at lower concentrations lacks clarity. A more effective approach might involve transgenically expressed miniSOG or the histamine-gated chloride channel HisCl1 to specifically inhibit AWC neurons.

      The authors did not observe any increased correlation between motor command interneurons and sensory neurons, which is consistent with the absence of a consistent relationship between state transitions and 1-oct application. Furthermore, they did not observe significant entrainment of AIB activity with the 2.2 mM 1-oct application. This might be due to the animals being anesthetized with 1 mM tetramisole hydrochloride, which could affect neural activity and/or feedback from locomotion. It is unclear whether subtracting AVA activity from AIB activity provides a valid measure. Similarly, it is unclear how the behavioral data from freely moving worms compares to the whole-network calcium imaging results obtained from immobilized worms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used whole-network imaging to identify sensory neurons that responded to the repellant 1-octanol. While several olfactory neurons responded to the initial onset of odor pulses, two neurons consistently responded to all the pulses, ASH and AWC. ASH typically activates in response to repellants, and AWC typically activates in response to the removal of attractants. However, in this case, AWC activated in response to the removal of 1-octanol, which was unexpected because 1-octanol is a harmful repellant to the worm. The authors further investigated this phenomenon by testing different concentrations of 1-octanol in a chemotaxis assay and found that at lower (less harmful) concentrations the odor is actually an attractant, but becomes repulsive at higher concentrations. The amplitude of the ASH response appeared to be modulated by concentration, but this was not true for AWC. The authors propose a model where the behavioral response of the worm is the result of integrating these two opposing drives, where repulsion is a result of the increased ASH activity overriding the positive drive from AWC. The authors further tested this theory by testing mutants that ablated the AWC response (tax-4) or ASH response (osm-9), which produced results consistent with their hypothesis. While the interneuron(s) that integrate these signals to influence behavior were not identified, the authors did find that increasing concentrations of 1-octanol did increase the likelihood of AVA activity, a neuron that drives reversals (and hence, behavioral repulsion).

      Strengths:

      This was simple and elegant work that identified specific neurons of interest which generated a hypothesis, which was further tested with mutants that altered neuronal activity. The authors performed both neuronal imaging and behavioral experiments to verify their claims.

      Weaknesses:

      tax-4, but not osm-9, mutants were used in chemotaxis and imaging assays. It would have been nice to have osm-9 results for these assays as well. The mutants are not specific to AWC and ASH; cell-specific rescue in these neurons would have strengthened the proposed model.

    4. Reviewer #3 (Public review):

      Summary:

      This work describes how two chemosensory neurons in C. elegans drive opposite behaviors in response to a volatile cue. Because they have different concentration dependencies, this leads to different behavioral responses (attraction at low concentration and repulsion at high concentration). It has been known that many odorants that are attractive at low concentrations are aversive at high concentrations, and the implicated neurons (at least AWC for attraction and ASH for repulsion) have been well established. Nonetheless, studying behavior and neural responses in a common context (odor pulses, as opposed to gradients) provides a clear picture of how these sensory neurons may guide the dose-dependent response by separately modulating odor entry and odor exit behaviors.

      Strengths:

      (1) There is good evidence that worms are attracted to low concentrations and repelled by high concentrations of 1-oct. Calcium imaging also makes it clear that dose dependence is stronger for ASH than AWC.

      (2) There is good evidence for concentration-dependent responses via ASH (Figure 4E) and for inhibition of attraction via tonic IAA (Figure 7A).

      (3) This work presents calcium imaging and behavior with the same stimulus (sudden pulses in volatile odor concentration), while previous studies often focus on using neuronal responses to pulses to understand the navigation of gentle gradients.

      Weaknesses:

      (1) It is not clear precisely how important AWC is (compared to other cells) for the attractive response, though the presence of odor-off behavior implicates it. This could be resolved by looking at additional mutants (tax-4 is broad).

      (2) Relatedly, dose-dependent chemotaxis data (Figure 4C, D) should be provided for osm-9 animals to get a sense of the degree to which dose-dependence is explained by ASH.

      (3) Figure 4A, B should include average traces with errors, as there are several ways the responses can vary across conditions.

      (4) The data in Figure 6G does not appear to have error bars. Also, it would help to include a more conventional demonstration of AIB responding to stimuli (e.g. averaging stimulus-aligned responses as a percent of the fluorescence value at stimulus onset to perform the desired subtraction). Subtracted calcium traces are harder to interpret. As it stands, the evidence that sensory signals are persisting in AIB and not being shunted by proprioceptive feedback in microfluidic devices is not strong.

    5. Author response:

      We thank the reviewers for their thoughtful comments on our submitted manuscript.

      The major point from all three reviewers was that the sensory inputs may be more complex than simply ASH and AWC, since mutations in osm-9 and tax-4 will affect many more sensory neurons. We fully agree. The differential effects of osm-9 and tax-4 allowed us to recognize that there were two distinct afferent pathways operating simultaneously, mediating repulsion and attraction separately. However, it remains to be determined which sensory neurons are contributing to each pathway. We have planned a full analysis of the sensory inputs, not limited to just ASH and AWC, using neuron-specific rescue and neuron-specific chemogenetic inactivation (using HisCl1). While this analysis falls outside the scope of the present study, we will perform the inactivations of ASH and AWC and include the data in the revised version of this study. We expect to demonstrate whether ASH and AWC inputs are sufficient or whether other sensory neurons make significant contributions. Additionally, we will include chemotaxis dose-response data for osm-9 mutants as part of this analysis and make the requested minor corrections in data presentation.

    1. eLife Assessment

      This valuable study sets new standards in analyzing the ultrastructure of insect eyes, which have long served as models for understanding how vision works. The way it describes an entire eye with the resolution of electron microscopy is convincing. On top of this, a miniaturized visual system provides additional, remarkable insights towards understanding optimized solutions.

    2. Reviewer #2 (Public review):

      Summary:

      Makarova et al. provide the first complete cellular-level reconstruction of an insect eye. Using the extremely miniaturized parasitoid wasp Megaphragma viggianii and improved, optimized volumetric EM methods, they describe the size, volume, and position of every single cell in the insect compound eye.

      These data have previously only been inferred from TEM cross-sections taken in different parts of the eye, but in this study and in the associated 3D datasets, videos, and stacks, one can observe the exact position and orientation of each cell in 3D space.

      The authors have made a very rigorous effort to describe and assess the variation in each cell type and have also compared two classes of ommatidia, dorsal rim and non-dorsal rim, together with the associated visual apparatus of each, confirming previously known findings about the distribution and internal structure that support polarization detection in these insects.

      Strengths:

      The paper is well written and strives to compare the data with previous literature wherever possible. It also goes beyond cell morphology, calculating the optical properties of the different ommatidia, estimating light sensitivity and spatial resolution limits from rhabdom diameter and focal length, and showing how these vary across the eye.

      Finally, the authors provide very informative and illustrative videos showing how the cones, lenses, photoreceptors, pigment cells, and even the mitochondria are arranged in 3D space, comparing the structure of the dorsal rim and non-dorsal rim ommatidia. They also describe three 'ectopic' photoreceptors in more anatomical detail providing images and videos of them.

      Comments on revisions:

      The updates improve the manuscript.

    3. Reviewer #3 (Public review):

      Summary:

      The article presents a meticulous and quantitative anatomical reconstruction of the compound eye of a tiny wasp at the level of subcellular structures, cellular and optical organization of the ommatidia and reveals the ectopic photoreceptors, which are decoupled from the eye's dioptrical apparatus.

      Strengths:

      The graphic material is of very high quality, beautifully organized and presented in a logical order. The anatomical analysis is fully supported by quantitative numerical data at all scales, from organelles to cells and ommatidia, which should be a valuable source for future studies in cellular biology and visual physiology. The 3D renders are highly informative and a real eye candy.

      Weaknesses:

      The claim that the large dorsal part of the eye is the dorsal rim area (DRA), supported by anatomical data on rhabdomere geometry and connectomics in authors' earlier work, would eventually greatly benefit from additional evidence, obtained by other methods.

      Comments on revisions:

      Thank you for considering my remarks and advice. All is fine.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      Weaknesses:

      As this paper only uses anatomical analyses, no functional interpretations of cell function are tested.

      The aim of this paper was to describe the ultrastructural organization of compound eyes in the extremely small wasp Megaphragma viggianii. The authors successfully achieved this aim and provided an incredibly detailed description of all cell types with respect to their location, volume, and dimensions. As this is the first of its kind, the results cannot easily be compared with previous work. The findings are likely to be an important reference for future work that uses similar techniques to reconstruct the eyes of other insect species. The FIB-SEM method used is being used increasingly often in structural studies of insect sensory organs and brains and this work demonstrates the utility of this method.

      We thank you for your high assessment of our work. Unfortunately, it is hard to test our functional interpretations with electrophysiological methods due to the extremely small size of the animal. Studies on three-dimensional ultrastructural datasets obtained using vEM have only just started to appear, and we hope that much more data will become available for comparison in the near future.

      Reviewer #2:

      Thank you for your work and for your high assessment of our manuscript.

      Reviewer #3:

      Weaknesses:

      The claim that the large dorsal part of the eye is the dorsal rim area (DRA), supported by anatomical data on rhabdomere geometry and connectomics in authors' earlier work, would eventually greatly benefit from additional evidence, obtained by immunocytochemical staining, that could also reveal a putative substrate for colour vision. The cell nuclei that are located in the optical path in the DRA crystalline cone have only a putative optical function, which may be either similar to pore canals in hymenopteran DRA cornea (scattering) or to photoreceptor nuclei in camera-type eyes (focussing), both explanations being mutually exclusive.

      We thank the Reviewer for the high assessment of our study and for the detailed analysis of our manuscript. Your comments and recommendations are highly valued and helped us to improve the text. We understand that immunocytochemical methods could strengthen our findings and supply additional evidence, but this is not technically possible at present. Megaphragma is a very challenging model organism for such methods. We are currently optimizing the staining protocol, which is necessary because of the high level of autoluminescence and insufficient penetration of dyes into the samples.

      Recommendations for the authors:

      Reviewer #1:

      I do not have any major concerns about the content of the paper.

      There are some minor spelling and grammatical errors throughout the text but these can be identified most readily using a spelling/grammar check.

      We have revised the text, checked the spelling, and fixed the grammatical errors throughout the text.

      I suggest consistency when referring to the capitalization of the term 'non-DRA' as it is sometimes 'Non-DRA' in the text.

      We have fixed the term “non-DRA” throughout the text. Thank you.

      Also, check carefully the spelling of headings in the tables, as there are a few mistakes, in Tables 1 and 5 in particular.

      The grammar errors have been fixed.

      Figure 7 legend: an explanation of the abbreviation RPC should be added.

      We have done so.

      Reviewer #2:

      (1) The paper presents the data in great detail. However, since this is the first time the technique has been applied to reconstruct whole insect eyes, even if in a small insect, it would be worth outlining in the Methods section which innovations in staining, scanning, or sample preparation enabled these improvements, and providing a roadmap for extending this method to larger insects if possible.

      The whole method, including sample preparation, staining, and scanning, was described in every detail in our previous paper (Polilov et al., 2021). Because the methodology is complex and already fully documented there, we consider it unnecessary to repeat every stage of the technique in the present paper and have therefore described it more briefly.

      (2) The optical modelling needs a statement in the Discussion providing a disclaimer on parameters such as sensitivity. Anatomical measurements can provide limits and some measure, but the inherent optics are also key, and it is worth qualifying these as estimates that give a sense of the variation in morphology; only when coupled with optical and potentially neural measurements could one confirm the true sensitivity and acceptance angle.

      In the absence of experimental data or precise computational models of Megaphragma vision, we have tried to discuss the functions of structures cautiously, based on their morphology, ultrastructure, first-order visual connectome, and analogies with other species. This is reflected in the Methods and in those sections of our paper that contain functional interpretations.

      Reviewer #3:

      (1) The finding that the CNS neurons are enucleated, while the compound eye contains cell nuclei, deserves another word. I would confidently say that the optical demands of a miniaturized compound eye (the minimal size of the optics due to diffraction, the rhabdomere size, and the minimal thickness of optically insulating granules) are such that further cellular miniaturization is not possible, and these minimal sizes render the cells that build the eye large enough to accommodate cell nuclei. This is in my opinion a parsimonious explanation, yet speculative, and I leave it up to you whether to embrace it.

      We agree with the Reviewer and understand the limiting factors and the optical demands of a miniaturized compound eye. According to our data, nuclei occupy a considerable volume in the eye (the cells of the compound eye contain more nuclei than the whole brain), and on average the cell volume is larger than in Trichogramma, which is minute but larger than Megaphragma. However, as the Reviewer rightly noted, this explanation is speculative; therefore, we would prefer not to include it.

      (2) Our current understanding of DRA optics and function is limited and I claim that your interpretation of the cell nuclei in the DRA dioptrical apparatuses is inappropriate. Please consider a few articles on hymenopteran DRA, starting with the one below and the citing literature:

      Meyer, E.P., Labhart, T. Pore canals in the cornea of a functionally specialized area of the honey bee's compound eye. Cell Tissue Res. 216, 491-501 (1981). https://doi.org/10.1007/BF00238646

      Honeybee DRA has a milky appearance under a stereomicroscope and can be discerned from the outside. This is due to pore canals in the cornea. I happen to be studying this exact structure and its function right now. I found that the effect of those canals is not so much extended receptor acceptance angles, but rather minimized light gain. This is counterintuitive, but consider the following. The DRA photoreceptors must encode the limited range of polarization contrasts with the maximal working dynamic range (= voltage) of the photoreceptors, which results in a very steep stimulus-response curve.

      Physiologically such a curve is due to very high transduction gain and a high cell input resistance. In most of the retina, small contrasts are transcoded by LMC neurons, but DRA receptors are long visual fibres and must do the job themselves. The skylight intensity (especially antisolar, where the polarized pattern is maximal) varies little during the day. Hence, the DRA receptors work almost at a fixed intensity range. In order to prevent receptor saturation and keep steep contrast coding, the corneal lenses in DRA have a built-in diffusor ring, which diminishes the light influx. Unfortunately, I have yet to publish this and I may be wrong, of course. But if I look into your data, I see consistently smaller corneal lenses and crystalline cones in the DRA, plus the cell nuclei obstructing the incident light. I think this is similar to the optics of honeybee DRA.

      You do not support your claim that the nuclei additionally focus light by optical calculations, but cite literature on camera-type eyes, which is not OK.

      In any case, I think it is fair to limit the discussion by saying that the nuclei may have an optical role. Further evidence from hymenopteran and vertebrate literature is controversial. “so that the nuclei act as extra collecting lenses, as was reported for rod cells of nocturnal vertebrates (Solovei et al., 2009; Błaszczak et al., 2014)” - please consider omitting this.

      We thank the Reviewer for this piece of advice. We have rewritten the text to omit the comparison with vertebrates, but have kept the citations as an illustration of the fact that nuclei can perform an optical role.

      “Since the nuclei in DRA and non-DRA ommatidia are arranged differently in cone cells, we suggest that the nuclei of the cone cells of DRA ommatidia in M. viggianii perform some optical role, facilitating the specialization of this group of ommatidia. The optical function for nuclei was described for rod cells of nocturnal vertebrates, where chromatin inside the cell nucleus has a direct effect on light propagation (Solovei et al., 2009; Błaszczak et al., 2014; Feodorova et al., 2020).”

      (3) Please consider comparing the structure and function of ectopic receptors with the eyelet in Drosophila (i.e. https://doi.org/10.1523/JNEUROSCI.22-21-09255.2002 )

      We thank the Reviewer for this advice and have added the following comparison to the text:

      “The position of ePR, their morphology and synaptic targets look similar to the eyelet (extraretinal photoreceptor cluster) discovered in Drosophila (Helfrich-Förster et al., 2002). Eyelets are remnants of the larval photoreceptors, Bolwig’s organs in Drosophila (Hofbauer and Buchner, 1989). Unlike Drosophila, Trichogrammatidae are egg parasitoids and their central nervous system differentiation is shifted to the late larva and even early pupa (Makarova et al., 2022). According to the available data on the embryonic development of Trichogrammatidae, no photoreceptor cells were found during the larval stages (Ivanova-Kazas, 1954, 1961).”

      Therefore, whether the two structures are analogous remains an open question.

      (4) Minor remarks:

      “but also to trace the pathways that connect the analyzer with the brain.” - I find the word analyzer a bit stretched here; sure, the DRA is polarization analyzer, but if the main retina was monochromatic, it would only be a detector, not an analyzer.

      The sentence was changed according to the Reviewer’s advice.

      Table I: thikness -> thickness, wigth -> width

      We have fixed these misprints.

      “The cross-section of Non-DRA ommatidia has a strongly spherical shape” - perhaps circular, not spherical. And not necessary to say “strongly”

      The wording was changed according to the Reviewer’s advice.

      “which can be rarely visualized in the cell's projections not far from the basement membrane.” - I'd suggest saying “which are nearly absent in retinula axons”

      The wording was changed according to the Reviewer’s advice.

      “The pigment granules of the retinula cells have an elongated nearly oval shape” - please consider replacing 'elongated nearly oval' with 'prolate' (try googling for “prolate” or “oblate spheroids”; the adjective describes precisely what you wanted to say)

      We thank the Reviewer for this piece of advice but prefer to leave our original phrasing, because it is more readily understandable.

      “The results of our morphological analysis of all ommatidia in Megaphragma are consistent with the light-polarization related features in Hymenoptera and other insects” - please add citations, see my comment on the DRA above.

      We have added the citations according to the Reviewer’s advice.

      “The group of short PRs (R1-R6)” - please consider renaming into “short visual fibre photoreceptors” (as opposed to “long visual fibre PRs”; hence SVFs and LVFs). This naming is quite common.

      The naming was changed according to the Reviewer’s advice.

      “The total rhabdom shortening in M. viggianii ommatidia probably favors polarization and absolute sensitivity,” - please see comments on DRA. Wide rhabdom means also a wider acceptance angle.

      Shortening of DRA rhabdoms does not result in their widening compared to other rhabdoms, so it is difficult to say how this may be related to sensitivity. The comments on DRA given earlier have been taken into account.

      “Ommatidia located across the diagonal area of the eye are more sensitive to light” - I don't understand what is diagonal area.

      We have deleted the sentence.

      “Estimated optical sensitivity of the eyes very close to those reported for diurnal hymenopterans with apposition eyes (Greiner et al., 2004; Gutiérrez et al., 2024) and possess around 0.19 {plus minus} 0.04 μm2 sr. M. viggianii have reasonably huge values of acceptance angle Δρ, and thus should result in a low spatial resolution” - please correct English here. “eyes IS very close”, “should result in a low”

      The grammatical errors were fixed.

      Table 6 legend: “SPC - secondary pigment cells.” -> “SPC – secondary pigment cells.”

      Citation “(Makarova et al., 2025).” - probably 2015

      The typos were fixed.

      Methods, FIB-SEM: I can't understand the sentence “The volumetric data of lenses and cones, some linear measurements (lens thickness, cone length, cone width, curvature radius) and to visualize the complete 3D-model of eye we use (measure or reconstruct) the elements from another eye (left).”

      The sentence is a continuation of the previous one. We have rewritten it as follows to clarify the meaning and moved it to the 3D reconstruction section:

      “The right eye, on which the reconstruction was performed, has several regions damaged during milling (see Appendix 1С), which hinder the complete reconstruction of lenses and cones in a few ommatidia. Therefore, for the volumetric data on lenses and cones and for some linear measurements (lens thickness, cone length, cone width, curvature radius), we used (measured or reconstructed) the corresponding elements from the other (left) eye.”

      “The cells of single interfacet bristles were not reconstructed, because of damaging on right eye and worst quality of section on the left.” - please change to “The cells of the single interfacet bristle were not reconstructed, because of damage to the right eye and inferior quality of the sections of the left eye.”

      The text has been changed as follows:

      “The cells of single interfacet bristles were not reconstructed, because of the damage present in the right eye and because of the generally lower quality of this region on the left eye.”

      “Morphometry. Each ommatidia was” -> “Morphometry. Each ommatidium was”

      The grammatical error has been fixed.

    1. eLife Assessment

      This study presents useful data on sex differences in gene expression across organs of four mouse taxa. While the methods and analysis are largely sound, the strength of evidence is solid only in parts and the conclusions drawn from the results are not always appropriate.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript by Xie and colleagues presents transcriptomic experiments that measure gene expression in eight different tissues taken from adult female and male mice from four species. These data are used to make inferences regarding the evolution of sex-biased gene expression across these taxa.

      Strengths:

      The experimental methods and data analysis appear appropriate. The authors promote their study as unprecedented in its size and technical precision.

      Weaknesses:

      The manuscript does not present a clear set of novel evolutionary conclusions. The major findings recapitulate many previous comparative transcriptomics studies - gene expression variation is prevalent between individuals, sexes, and species; and genes with sex-biased expression evolve more rapidly than genes with unbiased expression - but it is not clear how the study extends our understanding of gene expression or its evolution.

      Many gene expression differences between individual animals are selectively neutral, because these differences in mRNA concentration are buffered at the level of translation, or differences in protein abundance have no effect on cellular or organismal function. The hypothesis that sex-biased genes are enriched for selectively neutral expression differences is supported by the excess of inter-individual expression variance and inter-specific expression differences in sex-biased genes. A higher rate of adaptive coding evolution is inferred among sex-biased genes as a group, but it is not clear whether this signal is driven by many sex-biased genes experiencing a little positive selection, or a few sex-biased genes experiencing a lot of positive selection, so the relationship between expression and protein-coding evolution remains unclear. It is likely that only a subset of the gene expression differences detected here will have phenotypic effects relevant for fitness or medicine, but without some idea of how many or which genes comprise this subset, it is difficult to interpret the results in this context.

      Throughout the paper the concepts of sexual selection and sexually antagonistic selection are conflated; while both modes of selection can drive the evolution of sexually dimorphic gene expression, the conditions that promote each mode and the consequences of each differ, and the manuscript is not clear about the significance of the results for either mode of selection.

      The manuscript's conclusion that "most of the genetic underpinnings of sex-differences show no long-term evolutionary stability" is not supported by the data, which measured gene expression phenotypes but did not investigate the underlying genetic variation causing these differences between individuals, sexes, or species. Furthermore, most of the gene expression differences are observed between sex-specific organs such as testes and ovaries, which are downstream of the sex-determination pathway that is conserved in these four mouse species, so these conclusions are limited to gene expression phenotypes in somatic organs shared by the sexes.

      The differences between sex-biased expression in mice and humans are attributed to differences in the two species' effective population sizes; but the human samples have significantly more environmental variation than the mouse samples taken from age-matched animals reared in controlled conditions, which could also explain the observed pattern.

      The smoothed density plots in Figure 5 are confusing and misleading. Examining the individual SBI values in Table S9 reveals that all of the female and male SBI values for each species and organ are non-overlapping, with the exception of the heart in domesticus and mammary gland in musculus, where one male and one female individual fall within the range of the other sex. The smoothed plots therefore exaggerate the overlap between the sexes; in particular, the extreme variation shown in the SBI in the mammary glands in spretus females and spicilegus males is hard to understand given the normalized values in Table S3. The R code used to generate the smoothed plots is not included in the Github repository, so it is not possible to independently recreate those plots from the underlying data.

      The correlations provided in Table S9 are confusing - most of the reported correlations are 1.0, which are not recovered when using the SBI values in Table S9, and which does not support the manuscript's assertion that sex-biased gene expression can vary between organs within an individual. Indeed, using the SBI values in Table S9, many correlations across organs are negative, which is expected given the description of the result in the text.

    3. Reviewer #3 (Public review):

      This manuscript reports interesting data on sex differences in expression across several somatic and reproductive tissues among four mouse species or subspecies. The focus is on sex-biased expression in the somatic tissues, where the authors report high rates of turnover such that the majority of sex-biased genes are only sex-biased in one or two taxa. The authors show sex-biased genes have higher expression variance than unbiased genes but also provide some evidence that sex-bias is likely to evolve from genes with higher expression variance. The authors find that sex-biased genes (both female- and male-biased) experience more adaptive evolution (i.e., higher alpha values) than unbiased genes. The authors develop a summary statistic (Sex-Bias Index, SBI) of each individual's degree of sex-bias for a given tissue. They show that the distribution of SBI values often overlap considerably for somatic (but not reproductive) tissues and that SBI values are not correlated across tissues, which they interpret as indicating an individual can be relatively "male-like" in one tissue and relatively "female-like" in another tissue.

      Though the data are interesting, there are some disappointing aspects to how the authors have chosen to present the work. For example, their criteria for sex-bias requires an expression ratio of one sex to the other of 1.25. A reasonably large fraction of the "sex-biased genes" have ratios just beyond this cut-off (Fig. S1). A gene which has a ratio of 1.27 in taxa 1 can be declared as "sex-biased" but which has a ratio of 1.23 in taxa 2 will not be declared as "sex-biased". It is impossible to know from how the data are presented in the main text the extent to which the supposed very high turnover represents substantial changes in dimorphic expression. A simple plot of the expression sex ratio of taxa 1 vs taxa 2 would be illuminating but the authors declined this suggestion.
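      To make the threshold concern concrete, a minimal Python sketch, assuming (hypothetically) that the 1.25 fold-change cutoff is the only classification criterion, so any accompanying significance test is ignored. The function names and the TPM values are illustrative, not taken from the manuscript. It contrasts the binary label with the continuous log2(F/M) ratio that a taxon-vs-taxon plot would show.

```python
import math


def classify_sex_bias(female_mean: float, male_mean: float, cutoff: float = 1.25) -> str:
    """Label a gene using only a female/male expression-ratio cutoff.

    Assumption: the fold-change threshold is the only criterion modelled here;
    any significance test applied alongside it is ignored.
    """
    ratio = female_mean / male_mean
    if ratio >= cutoff:
        return "female-biased"
    if ratio <= 1 / cutoff:
        return "male-biased"
    return "unbiased"


def log2_sex_ratio(female_mean: float, male_mean: float) -> float:
    """Continuous alternative: log2(F/M), symmetric around 0 for the two sexes."""
    return math.log2(female_mean / male_mean)


# Hypothetical gene with almost identical dimorphism in two taxa.
taxon1 = (127.0, 100.0)  # F/M ratio 1.27
taxon2 = (123.0, 100.0)  # F/M ratio 1.23

print(classify_sex_bias(*taxon1), classify_sex_bias(*taxon2))  # female-biased unbiased
print(round(log2_sex_ratio(*taxon1) - log2_sex_ratio(*taxon2), 3))
```

      The binary labels disagree although the log2 ratios differ by less than 0.05, which is why plotting the continuous ratios of two taxa against each other would separate genuine turnover from threshold effects.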

      I was particularly intrigued by the authors' inference of the proportion of adaptive substitutions ("alpha") in different gene sets. They show that alpha is higher for sex-biased than for unbiased genes and nicely show that the genes that are unbiased in the focal taxon but sex-biased in the sister taxon also have low alpha. Stronger evidence that sex bias is associated with adaptive evolution would come from estimating alpha only for those genes that are sex-biased in the focal taxon but not in the sister taxon (the current version estimates alpha on all sex-biased genes within the focal taxon, both those that are sex-biased and those that are unbiased in the sister taxon).
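      The suggested analysis amounts to a set difference followed by an alpha estimate. A minimal Python sketch, assuming the textbook McDonald-Kreitman form alpha = 1 - (Ds * Pn) / (Dn * Ps) with counts pooled over the gene set; the estimator actually used in the manuscript may differ (e.g. corrections for slightly deleterious polymorphisms), and all gene names and counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MKCounts:
    pn: int  # nonsynonymous polymorphisms within the focal taxon
    ps: int  # synonymous polymorphisms within the focal taxon
    dn: int  # nonsynonymous fixed differences to the outgroup
    ds: int  # synonymous fixed differences to the outgroup


def pooled_alpha(counts: list[MKCounts]) -> float:
    """alpha = 1 - (Ds * Pn) / (Dn * Ps), with counts summed over a gene set."""
    pn = sum(c.pn for c in counts)
    ps = sum(c.ps for c in counts)
    dn = sum(c.dn for c in counts)
    ds = sum(c.ds for c in counts)
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical counts and gene sets, for illustration only.
mk = {
    "g1": MKCounts(pn=2, ps=10, dn=8, ds=10),
    "g2": MKCounts(pn=3, ps=10, dn=12, ds=10),
    "g3": MKCounts(pn=6, ps=10, dn=5, ds=10),
}
biased_in_focal = {"g1", "g2", "g3"}
biased_in_sister = {"g3"}

# Genes that are sex-biased in the focal taxon only.
focal_only = sorted(biased_in_focal - biased_in_sister)
print(focal_only, pooled_alpha([mk[g] for g in focal_only]))  # ['g1', 'g2'] 0.75
```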

      The authors' Sex-Bias Index is measured in an individual sample as: SBI = median(TPM of female-biased genes) - median(TPM of male-biased genes). This index has some strange properties when one works through some toy examples (though any summary statistic will have limitations). The authors do little to discuss the merits and limitations of this metric. It would have been interesting to examine their two key points (degree of overlap between the sexes' distributions and correlation across tissues) using other individual measures of sex bias.
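      A toy example of the kind mentioned above, as a minimal Python sketch of the SBI formula exactly as stated (median TPM of female-biased genes minus median TPM of male-biased genes in one sample). The gene names and TPM values are hypothetical.

```python
from statistics import median


def sex_bias_index(tpm: dict[str, float], female_biased: list[str], male_biased: list[str]) -> float:
    """SBI for one sample: median TPM of female-biased genes minus median TPM of male-biased genes."""
    return median(tpm[g] for g in female_biased) - median(tpm[g] for g in male_biased)


female_genes = ["f1", "f2", "f3"]
male_genes = ["m1", "m2", "m3"]

# Hypothetical TPM values for one individual.
sample = {"f1": 10.0, "f2": 20.0, "f3": 30.0, "m1": 5.0, "m2": 8.0, "m3": 40.0}
print(sex_bias_index(sample, female_genes, male_genes))  # 12.0

# Doubling a female-biased gene that is not the median leaves the SBI unchanged.
sample_b = dict(sample, f3=60.0)
print(sex_bias_index(sample_b, female_genes, male_genes))  # 12.0
```

      Two properties follow directly: the SBI ignores changes in any gene other than the two medians, and because it is a difference of absolute TPMs rather than of ratios, doubling every value in a sample doubles the SBI (here to 24.0) without any change in relative sex bias.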

      Figure 5 shows symmetric Gaussian-looking distributions of SBI, but it makes me wonder to what extent this is the magic of model-fitting software, as there are only 9 data points underlying each distribution. Whereas Figure 5 shows many broadly overlapping distributions for SBI, Figure 6 seems to suggest the sexes are quite well separated for SBI (e.g., brain in MUS, heart in DOM).

      Fig. S1 should be shown as the log(F/M) ratio so it is easier to see the symmetry, or lack thereof, of female- and male-biased genes.

      It is important to note that for the variance analysis, IQR/median was calculated for each gene within each sex for each tissue. This is a key piece of information that should be in the methods or legend of the main figure (not buried in Supplemental Table 17).
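      For readers, the variance measure described here can be sketched in a few lines of Python. This is a minimal sketch, assuming quartiles computed with the standard library's "inclusive" interpolation; the quartile method used in the manuscript is not stated here and would change values slightly. The tissue, gene name and TPMs are hypothetical.

```python
from statistics import median, quantiles


def relative_iqr(values: list[float]) -> float:
    """Interquartile range divided by the median (quartiles via the 'inclusive' method)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / median(values)


# Hypothetical TPMs of one gene, computed separately per (tissue, sex, gene).
expression = {
    ("liver", "female", "geneA"): [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    ("liver", "male", "geneA"): [4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0],
}
for key, tpms in expression.items():
    print(key, round(relative_iqr(tpms), 3))
```

      Because the statistic is computed within each sex, a gene can be highly variable in one sex and almost invariant in the other, as in this toy case (0.8 versus 0.0).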

    4. Author response:

      The following is the authors’ response to the current reviews.

      We are disappointed that the reviewers do not acknowledge that our data constitute a major step forward for the field. We will prepare a revised version that takes care of the remaining small issues concerning the technical descriptions and a detailed response to the current round of comments. We will also add a summary of the major new findings of our study.


      The following is the authors’ response to the original reviews.

      We appreciate the time of the reviewers and their detailed comments, which have helped to improve the manuscript.

      Our study presents the largest systematic dataset so far on the evolution of sex-biased gene expression in animals. It is also the first that explores the patterns of individual variation in sex-biased gene expression, and the SBI is an entirely new procedure to directly visualize these variance patterns in an intuitive way.

      Also, we should like to point out that our study contradicts recent conclusions that had suggested that a substantial set of sex-biased genes has conserved functions between humans and mice and that mice can therefore be informative for gender-specific medicine studies. Our data suggest that only a very small set of genes are conserved in their sex-biased expression between mice and humans in more than one organ.

      In the revised version we have made the following major updates:

      - added a rate comparison of gene regulation turnover between sex-biased and non-sex-biased genes

      - added additional statistics to the variance comparisons and selection tests

      - added a regulatory module analysis that shows that much of the gene turnover happens within modules

      - added a mosaic pattern analysis that shows the individual complexity of sex-biased patterns

      - extended introduction and discussion

      Reviewer #1 (Public Review):

      The authors describe a comprehensive analysis of sex-biased expression across multiple tissues and species of mouse. Their results are broadly consistent with previous work, and their methods are robust, as the large volume of work in this area has converged toward a standardized approach.

      I have a few quibbles with the findings, and the main novelty here is the rapid evolution of sex-biased expression over shorter evolutionary intervals than previously documented, although this is not statistically supported. The other main findings, detailed below, are somewhat overstated.

      (1) In the introduction, the authors conflate gametic sex, which is indeed largely binary (with small sperm, large eggs, no intermediate gametic form, and no overlap in size) with somatic sexual dimorphism, which can be bimodal (though sometimes is even more complicated), with a large variance in either sex and generally with a great deal of overlap between males and females. A good appraisal of this distinction is at . This distinction in gene expression has been recognized for at least 20 years, with observations that sex-biased expression in the soma is far less than in the gonad.

      For example, the authors frame their work with the following statement:

      "The different organs show a large individual variation in sex-biased gene expression, making it impossible to classify individuals in simple binary terms. Hence, the seemingly strong conservation of binary sex-states does not find an equivalent underpinning when one looks at the gene-expression makeup of the sexes"

      The authors use this conflation to set up a straw man argument, perhaps in part due to recent political discussions on this topic. They seem to be implying one of two things. a) That previous studies of sex-biased expression of the soma claim a binary classification. I know of no such claim, and many have clearly shown quite the opposite, particularly studies of intra-sexual variation, which are common - see https://doi.org/10.1093/molbev/msx293, https://doi.org/10.1371/journal.pgen.1003697, https://doi.org/10.1111/mec.14408, https://doi.org/10.1111/mec.13919, https://doi.org/10.1111/j.1558-5646.2010.01106.x for just a few examples. Or b) They are the first to observe this non-binary pattern for the soma, but again, many have observed this. For example, many have noted that reproductive or gonad transcriptome data cluster first by sex, but somatic tissue clusters first by species or tissue, then by sex (https://doi.org/10.1073/pnas.1501339112, https://doi.org/10.7554/eLife.67485)

      Figure 4 illustrates the conceptual difference between bimodal and binary sexual conceptions. This figure makes it clear that males and females have different means, but in all cases the distributions are bimodal.

      I would suggest that the authors heavily revise the paper with this more nuanced understanding of the literature and sex differences in their paper, and place their findings in the context of previous work.

      We are sorry that our introduction seems to have been too short to make our points sufficiently clear. Of course, overlapping somatic variation has been shown for morphological characters, but we were aiming to assess this at the sex-biased transcriptome level. Previous studies looking at sex-biased genes were usually limited by the techniques that were available at their times, resulting in a focus on gonads in most studies and almost all have too few individuals included to study within-group variation. We detail this below for the papers that are mentioned by the referee. In view of this, we cite them now as examples for the prevalent focus on gonadal comparisons in most studies. Only Scharmann et al. 2021 on plant leaf dimorphism is indeed relevant for our study with respect to its general findings and we make now extensive reference to it. In addition, we have generally modified the introduction and substantially extended the discussion to make our points clear.

      Snell-Rood 2010: the paper focuses on sex-specific morphological structures in beetles. It samples six somatic tissues for four individuals each of each class. Analysis is done via microarray hybridizations. While categorical differences were traced, variability between individuals was not discussed. By today's standards, microarrays have too much technical variability to even consider such a discussion.

      Pointer et al. 2013: this paper studies three sexual phenotypes in a bird species, females, dominant males and subordinate males. Tissues include telencephalon, spleen and left gonad. The focus of the analysis is on the gonads, since only few sex-biased genes were found in spleen and brain (according to suppl. Table S1, 0 for the spleen and 2 for the brain). No inferences could be made on somatic variation.

      Harrison 2015: this paper focuses on gonads plus spleen in six bird species with between 2-6 individuals for each sex collected. In the spleen, only one female-biased gene and no male-biased gene was detected. Hence, the data do not allow inference of patterns of somatic variation.

      Dean et al. 2016: this paper compares four categories of fish caught around nests, with four to seven individuals per category. Only gonads were analyzed, hence no inferences could be made about somatic variability between individuals.

      Cardoso et al. 2017: this paper tests categories of fish with alternative reproductive tactics based on brain transcriptomes. While it uses 9-10 individuals per category, it uses pools for sequencing with two pools per category. This does not allow any inference on individual variation.

      Todd et al. 2017: this paper focuses on three categories of a fish species, females and dominant and sneaker males. It uses brain and gonads as samples with five individuals each for each category. For the brain, more differentially expressed genes were found between the two types of males than between females and males (3 and 9, respectively). The paper focuses on individual gene descriptions and does not mention somatic variation.

      Scharmann 2021: the paper focuses on 10 species of plants with sexually dimorphic leaves. 5-6 individuals were sampled per sex. The major finding is that sex-biased gene expression does not correlate with the degree of sexual dimorphism of the leaves. The study also shows a fast evolution of sex-biased expression and states that signatures of adaptive evolution are weak. But it does not discuss variance patterns within populations.

      (2) The authors also claim that "sexual conflict is one of the major drivers of evolutionary divergence already at the early species divergence level." However, making the connection between sex-biased genes and sexual conflict remains fraught. Although it is tempting to use sex-biased gene expression (or any form of phenotypic dimorphism) as an indicator of sexual conflict, resolved or not, as many have pointed out, one needs measures of sex-specific selection, ideally fitness, to make this case (https://doi.org/10.1086/595841, 10.1101/cshperspect.a017632). In many cases, sexual dimorphism can arise in one sex only without conflict (e.g. 10.1098/rspb.2010.2220). As such, sex-biased genes alone are not sufficient to discriminate between ongoing and resolved conflict.

      We imply sexual conflict as a driver of genomic divergence patterns in a similar way as it has been done by many authors before (e.g. Mank 2017a, Price et al. 2023, Tosto et al. 2023). While we fully appreciate the point of the referee, we do not really see where we deviate from the standard wording that is used in the context of genomic data. In such data, it is of course usually assumed that they represent resolved conflicts (Figure 1D in Cox and Calsbeek), where selection differentials would not be measurable anyway. (Please note also that the phylogenetic approach used in Oliver and Monteiro 2010 becomes rather problematic in view of introgressive hybridization patterns in butterflies.) We have extended the discussion to address this.

      (3) To make the case that sex-biased genes are under selection, the authors report alpha values in Figure 3B. Alpha value comparisons like this over large numbers of genes often have high variance. Are any of the values for male- female- and un-biased genes significantly different from one another? This is needed to make the claim of positive selection.

      Sorry, we had accidentally not included the statistics in the final version of the figure. We have added this now in the supplementary table but have also generally changed the statistical approach and the design of the figure.

      Reviewer #2 (Public Review):

      The manuscript by Xie and colleagues presents transcriptomic experiments that measure gene expression in eight different tissues taken from adult female and male mice from four species. These data are used to make inferences regarding the evolution of sex-biased gene expression across these taxa. The experimental methods and data analysis are appropriate; however, most of the conclusions drawn in the manuscript have either been previously reported in the literature or are not fully supported by the data.

      We are not aware of any study that has analyzed somatic sex-biased expression in such a large set of closely related, taxonomically well-resolved animal taxa. Only the study by Scharmann et al. 2021 on plant leaves comes close to it, but even this did not specifically analyze intragroup variation. Of course, some of our results confirm previous conclusions, but we should still like to point out that they go far beyond them.

      There are two ways the manuscript could be modified to better strengthen the conclusions.

      First, some of the observed differences in gene expression have very little to no effect on other phenotypes, and are not relevant to medicine or fitness. Selectively neutral gene expression differences have been inferred in previous studies, and consistent with that work, sex-biased and between-species expression differences in this study may also be enriched for selectively neutral expression differences. This idea is supported by the analysis of expression variance, which indicates that genes that show sex-biased expression also tend to show more inter-individual variation. This perspective is also supported by the MK analysis of molecular evolution, which suggests that positive selection is more prevalent among genes that are sex-biased in both mus and dom, and genes that switch sex-biased expression are under less selection at the level of both protein-coding sequence and gene expression.

      We have now revisited these points by additional statistical analysis of the variance patterns and an extended discussion under the heading "Neutral or adaptive?". 

      As an aside, I was confused by (line 176): "implying that the enhanced positive selection pressure is triggered by their status of being sex-biased in either taxon." - don't the MK values suggest an excess of positive selection on genes that are sex-biased in both taxa?

      There are different sets of genes that are sex-biased in these two taxa - hence this observation is actually a strong argument for selection on these genes. We have changed the corresponding text to make this clearer.

      Without an estimate of the proportion of differentially expressed genes that might be relevant for broader physiological or organismal phenotypes, it is difficult to assess the accuracy and relevance of the manuscript's conclusions. One (crude) approach would be to analyze subsets of genes stratified by the magnitude of expression differences; while there is a weak relationship between expression differences and fitness effects, on average large gene expression differences are more likely to affect additional phenotypes than small expression differences.

      We agree that it remains a challenge to show functional effects for the sex-biased genes. The argument that they should have a function is laid out above (and stated in many reviews on the topic). Using the expression level as a proxy of function does not seem justified, given the current literature. For example, genes that are highly connected in modules are not necessarily highly expressed (e.g. transcription factors). Also, genes may be highly expressed in a rare cell type of an organ and have an important function there, but this would not show up across the RNA of the whole organ. The most direct functional relationship between sex-biased expression and phenotype comes from the human data in Naqvi et al. 2019, which we had cited.

      Another perspective would be to compare the within-species variance to the between-species variance to identify genes with an excess of the latter relative to the former (similar logic to an MK test of amino acid substitutions).

      Such an analysis was actually our initial motivation for this study. However, the new (and surprising!) result is that the status of being sex-biased shows such a high turnover that not many genes are left per organ where one could even try to make such a test. However, we have extended the variance analysis with reciprocal gene sets (as we had done for the MK test) and extended the discussion on the topic, including citation of our prior work on these questions.

      Second, the analysis could be more informative if it distinguished between genes that are expressed across multiple tissues in both sexes that may show greater expression in one sex than the other, versus genes with specialized function expressed solely in (usually) reproductive tissues of one sex (e.g. ovary-specific genes). One approach to quantify this distinction would be metrics like those defined by [Yanai I, et al. 2005. Genome-wide midrange transcription profiles reveal expression-level relationships in human tissue specification. Bioinformatics 21:650-659.] These approaches can be used to separate out groups of genes by the extent to which they are expressed in both sexes versus genes that are primarily expressed in sex-specific tissue such as testes or ovaries. This more fine-grained analysis would also potentially inform the section describing the evolution/conservation of sex-biased expression: I expect there must be genes with conserved expression specifically in ovaries or testes (these are ancient animal structures!) but these may have been excluded by the requirement that genes be sex-biased and expressed in at least two organs.

      Given that our study focuses on somatic sex-biased genes, we refrain in this paper from a comparative analysis of genes that are expressed only in the sex organs. With respect to sex-biased gene expression shared between somatic tissues, we show in Figure 8 that there are very few such genes (8 female-biased and 3 male-biased). A separate statistical treatment is not possible for this small set of genes.
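      The Yanai et al. (2005) paper cited by the reviewer introduced the tissue-specificity index tau, which is a common way to make the broad-versus-gonad-specific distinction quantitative. A minimal sketch, assuming one non-negative (typically log-transformed) expression value per tissue; the function name and example values are illustrative, and the authors chose not to apply this stratification.

```python
def tissue_specificity_tau(expression):
    """Tissue-specificity index tau (Yanai et al. 2005) for one gene.

    expression: one non-negative value per tissue (often log-transformed).
    Returns 0.0 for uniform expression and 1.0 for expression in one tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Each tissue contributes how far it falls short of the peak tissue.
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical log2(TPM + 1) values across organs.
print(tissue_specificity_tau([5.0, 5.0, 5.0, 5.0, 5.0]))  # 0.0
print(tissue_specificity_tau([0.0, 0.0, 0.0, 0.0, 8.0]))  # 1.0
print(tissue_specificity_tau([0.0, 0.0, 4.0, 8.0]))       # about 0.833
```

      Genes with tau near 1 whose peak lies in the ovary or testis would form the gonad-specific set the reviewer describes; any threshold separating them from broadly expressed genes would be a further, separate choice.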

      There are at least three examples of statements in the discussion that at the moment misinterpret the experimental results.

      The discussion frames the results in the context of sexual selection and sexually antagonistic selection, but these concepts are not synonymous. Sexual selection can shape phenotypes that are specific to one sex, causing no antagonism; and fitness differences between males and females resulting from sexually antagonistic variation in somatic phenotypes may not be acted on by sexual selection. Furthermore, the conditions that promote each kind of selection, and their consequences, can differ, so they should be treated separately for the purposes of this discussion.

      We cannot make such a distinction for gene expression patterns - and we are not aware that this has been done before in the literature (except where gene expression was directly linked to a morphological structure). We have updated this discussion accordingly.

      The discussion claims that "Our data show that sex-biased gene expression evolves extremely fast" but a comparison or expectation for the rate of evolution is not provided. Many other studies have used comparative transcriptomics to estimate rates of gene expression evolution between species, including mice; are the results here substantially and significantly different from those previous studies? Furthermore, the experimental design does not distinguish between those gene expression phenotypes that are fixed between species as compared to those that are polymorphic within one or more species which prevents straightforward interpretation of differences in gene expression as interspecific differences.

      Our statement referred to the comparison between somatic and gonadal gene turnover, as well as to the comparison with humans. We have now included an additional analysis for a direct comparison with non-sex-biased genes in the same populations (Figure 2B). Note that gene expression variances cannot become fixed anyway; they can only come to differ in average and magnitude.

      The conclusion that "Our results show that most of the genetic underpinnings of sex differences show no long-term evolutionary stability, which is in strong contrast to the perceived evolutionary stability of two sexes" - seems beyond the scope of this study. This manuscript does not address the genetic underpinnings of sex differences (this would involve eQTL or the like), rather it looks at sex differences in gene expression phenotypes.

      This comes back to the points discussed above about the validity to infer function from sex-biased expression. We have updated the text to clarify this.

      Simply addressing the question of phenotypic evolutionary stability would be more informative if genes expressed specifically in reproductive tissues were separated from somatic sex-biased genes to determine if they show similar patterns of expression evolution.

      Our study is generally focused on somatic gene expression. The comparison with reproductive tissues serves merely as a reference. Since they are of course very different tissues, they should not be compared with each other in the same way. We have now specifically addressed this point in the discussion.

      Reviewer #3 (Public Review):

      This manuscript reports some interesting and important patterns. The results on sex-bias in different tissues and across four taxa would benefit from alternative (or additional) presentation styles. In my view, the most important results are with respect to alpha (fraction of beneficial amino acid changes) in relation to sex-bias (though the authors treat this as a somewhat minor point in this version).

      The part that the authors emphasize I don't find very interesting (i.e., the sexes have overlapping expression profiles in many nongonadal tissues), nor do I believe they have the appropriate data necessary to convincingly demonstrate this (which would require multiple measures from the same individual).

      This is the first study to report such overlaps, and we show that this is not always the case (e.g. liver and kidney data in mice). We are not aware of any predictions of what such patterns would look like and how they would evolve, so why should such a new finding not be interesting? Concerning the appropriateness of the data, we do not agree with the point the referee makes; see our response below.

      This study reports several interesting patterns with respect to sex differences in gene expression across organs of four mice taxa. An alternative presentation of the data would yield a clearer and more convincing case that the patterns the authors claim are legitimate.

      I recommend that the authors clarify what qualifies as "sex-bias".

      This is defined by the statistical criteria that we have applied, following the general standard of papers on this topic.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) "However, already Darwin has pointed out that the phenotypes of the sexes should evolve fast". I think the authors mean "Darwin was quick to point out that sex-specific phenotypes evolve quickly".

      We have modified this text part.

      (2) Non-gonadal is more often referred to as somatic. I would encourage the authors to use this more common term for accessibility.

      We have adopted this term.

      (3) Figure 5 is interesting, however, it is difficult to know whether the decreased bimodality in humans compared to mice is biological or technical due to the differences in the underlying data. For example, the mouse samples tightly controlled age and environmental conditions within each species. It is not possible to do that with human samples, and there are very good reasons to think that these factors will affect variance in both sexes.

      Yes, this is certainly true, and we also know this from other comparative data between mice and humans. Still, this is the reality of human samples versus the artificial conditions of laboratory mice. We now take this up in the discussion.

      (4) Line 273. The large numbers of cells needed for single-cell analysis require that most studies pool multiple samples, however these pools are helpful in themselves. This approach was used by https://doi.org/10.1093/evlett/qrad013 to quantify the degree of sex-bias within cell types across multiple tissues and to compare how bulk and single-cell sex-bias measures compare. Sex-bias in some somatic cell types was very high, even when bulk sex-bias in those tissues was not. This suggests that the bulk data the authors use in this study may in fact obscure the pattern of sex-bias.

      Yes, we agree, and this is exactly how we did the analysis and interpretation, based on the cited paper.

      (5) Line 379 "Total RNAs were" should be "Total RNA was"

      Corrected

      References cited in this review and which should be included in the manuscript :

      Sam L Sharpe, Andrew P Anderson, Idelle Cooper, Timothy Y James, Alexandra E Kralick, Hans Lindahl, Sara E Lipshutz, J F McLaughlin, Banu Subramaniam, Alicia Roth Weigel, A Kelsey Lewis, Sex and Biology: Broader Impacts Beyond the Binary, Integrative, and Comparative Biology, Volume 63, Issue 4, October 2023, Pages 960-967.

      Included

      Pointer MA, Harrison PW, Wright AE, Mank JE (2013) Masculinization of Gene Expression Is Associated with Exaggeration of Male Sexual Dimorphism. PLOS Genetics 9(8): e1003697.

      Included

      Erica V Todd, Hui Liu, Melissa S Lamm, Jodi T Thomas, Kim Rutherford, Kelly C Thompson, John R Godwin, Neil J Gemmell, Female Mimicry by Sneaker Males Has a Transcriptomic Signature in Both the Brain and the Gonad in a Sex-Changing Fish, Molecular Biology and Evolution, Volume 35, Issue 1, January 2018, Pages 225-241.

      Included

      Cardoso SD, Gonçalves D, Goesmann A, Canário AVM, Oliveira RF. Temporal variation in brain transcriptome is associated with the expression of female mimicry as a sequential male alternative reproductive tactic in fish. Mol Ecol. 2018; 27: 789-803.

      Included

      Dean, R., Wright, A.E., Marsh-Rollo, S.E., Nugent, B.M., Alonzo, S.H. and Mank, J.E. (2017), Sperm competition shapes gene expression and sequence evolution in the ocellated wrasse. Mol Ecol, 26: 505-518.

      Included

      Emilie C. Snell-Rood, Amy Cash, Mira V. Han, Teiya Kijimoto, Justen Andrews, Armin P. Moczek, Developmental decoupling of alternative phenotypes: insights from the transcriptomes of horn-polyphenic beetles, Evolution, Volume 65, Issue 1, 1 January 2011.

      Not included, since its technical approach is not really comparable.

      Harrison PW, Wright AE, Zimmer F, Dean R, Montgomery SH, Pointer MA, Mank JE (2015) Sexual selection drives evolution and rapid turnover of male gene expression. Proceedings of the National Academy of Sciences, USA 112: 4393-4398.

      Included

      Mathias Scharmann, Anthony G Rebelo, John R Pannell (2021) High rates of evolution preceded shifts to sex-biased gene expression in Leucadendron, the most sexually dimorphic angiosperms. eLife 10:e67485.

      Included

      Sexually Antagonistic Selection, Sexual Dimorphism, and the Resolution of Intralocus Sexual Conflict. Robert M. Cox and Ryan Calsbeek, The American Naturalist 2009 173:2, 176-187.

      Included

      Ingleby FC, Flis I, Morrow EH. Sex-biased gene expression and sexual conflict throughout development. Cold Spring Harb Perspect Biol. 2014 Nov 6;7(1):a017632.

      Included

      Oliver JC, Monteiro A 2011. On the origins of sexual dimorphism in butterflies. Proc Biol Sci 278: 1981-1988.

      Included

      Iulia Darolti, Judith E Mank, Sex-biased gene expression at single-cell resolution: cause and consequence of sexual dimorphism, Evolution Letters, Volume 7, Issue 3, June 2023, Pages 148-156.

      Included

      Reviewer #2 (Recommendations For The Authors):

      I am concerned the smoothed density plots in Figure 4 may be providing a misleading sense of the distributions since each distribution is inferred from only 9 values. A boxplot might better represent the data to the reader.

      Boxplots with 9 values are much more difficult for a reader to interpret; this is the very reason why one tends to smooth them. Smoothing also makes them similar to the standard plots used for showing morphological variation between the sexes. Note that the original data are available for the individual values, if these are of special interest in some cases. In addition, our new “mosaic” analysis (Figure 6) provides another presentation for readers.

      Line 235: "the overall numbers are lower" I assume this is the number of genes included in the analyses, but this should be explicitly stated.

      Clarified in the text

      The analysis of gene expression from different brain regions in control individuals from the Alzheimer's study (line 273) suffers from low power and it is not clear to me how much taking samples from different brain regions eliminates the issue of different cell types within a sample (the stated motivation for this analysis). While I support publishing negative results, this section does not feel like it adds much to the manuscript and could be cut in my opinion.

      This is actually a study on single cell types, differentiating each of them. We are sorry that the text was apparently unclear about this. Given that there are studies showing the importance of looking at single-cell data, we still think that it is a suitable analysis. We have updated the text to make this clearer.

      It might be useful to separate out X-linked genes from autosomal genes to see if they show consistent patterns with regard to sex-bias.

      We have added this information in suppl. Table S2 and included some description in the text.

      Reviewer #3 (Recommendations For The Authors):

      Comments follow the order of the Results section:

      (1) The latter half of this line in the Methods is too vague to be helpful: "We have explored a range of cutoffs and found that a sex-bias ratio of 1.25-fold difference of MEDIAN expression values combined with a Wilcoxon rank sum test and Benjamini-Hochberg FDR correction (using FDR <0.1 as cutoff) (Benjamini & Hochberg, 1995) yields the best compromise between sensitivity and specificity". What precisely is meant by "the best compromise between sensitivity and specificity"?

      We now explain that this was based on pre-tests comparing randomized with actual data. We agree that this is ultimately a subjective decision, but there is no single standard in the literature, especially when somatic organs are included. We consider our criteria rather stringent.
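      The criteria quoted in (1) can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code (which they state is on GitHub): it uses the normal approximation to the Wilcoxon rank-sum test with a tie correction (an exact test, for example via `scipy.stats.mannwhitneyu`, may be preferable with nine animals per sex), applies Benjamini-Hochberg across genes, and then requires a 1.25-fold difference of medians. Function names and the data layout are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    if sigma == 0:
        return 1.0  # every value tied: no evidence of a difference
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / sigma  # continuity correction
    return math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step up from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_sex_bias(genes, ratio_cutoff=1.25, fdr_cutoff=0.1):
    """genes: dict gene -> (female TPM list, male TPM list) for one organ."""
    names = list(genes)
    qvalues = benjamini_hochberg([rank_sum_p(*genes[g]) for g in names])
    calls = {}
    for name, q in zip(names, qvalues):
        f_med, m_med = median(genes[name][0]), median(genes[name][1])
        if q < fdr_cutoff and f_med > 0 and f_med >= ratio_cutoff * m_med:
            calls[name] = "female-biased"
        elif q < fdr_cutoff and m_med > 0 and m_med >= ratio_cutoff * f_med:
            calls[name] = "male-biased"
        else:
            calls[name] = "unbiased"
    return calls


high = [10, 11, 12, 13, 14, 15, 16, 17, 18]
low = [5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9]
genes = {"geneF": (high, low), "geneM": (low, high),
         "geneU": (list(range(1, 10)), list(range(1, 10)))}
print(classify_sex_bias(genes))
# {'geneF': 'female-biased', 'geneM': 'male-biased', 'geneU': 'unbiased'}
```

      The sketch makes the reviewer's point about arbitrariness visible: `ratio_cutoff` and `fdr_cutoff` act jointly, so the same genes can change category when either is moved.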

      (2) The 1.25 number for sex bias is, ultimately, an arbitrary cut-off. It is common in this literature to choose some arbitrary level and, in this sense, the authors are following common practice. The choice of 1.25 should be stated in the main text as it is a lower (but not unreasonable) value than has been used in many other papers.

      It is not only the cutoff, but also the Wilcoxon test and FDR correction that defines the threshold. See also comment above.

      (3) In truth, dimorphism is continuous rather than discrete (i.e, greater or less than 1.25 fold different). Thus, where possible it would be useful to present results in a fashion that allows readers to see the continuous range of ratios rather than having to worry about whether the patterns are due to the rather arbitrary choices of how genes were binned into sex-bias categories.

      It is necessary to work with cutoffs in such cases, and this is the usual practice for any such paper. However, we now provide plots of the female/male ratio distributions in Figure 1 Figure supplement 1.

      a) Number of genes that are female- / male-biased. I would like to be able to see a version of Figure 1 showing the full distribution of TPM ratios rather than bar graphs of the numbers of (arbitrarily defined) female- and male-biased genes. This will be, of course, a larger figure (a full distribution rather than 2 bars for each species for each organ) and so could be relegated to Supplementary Material (assuming the message of that figure is the same as the current Figure 1).

      This is a very unusual request, given that no other paper has done this either. It would indeed result in an unmanageable figure size, or in many separate figures that would be difficult to scrutinize. Note that there would be one plot of two (female and male) TPM distributions for each sex-biased gene in each organ and each taxon, leading to hundreds of thousands of plots. We think that providing the general distributions as plots (see above) and the original data as supplements is sufficient.

      b) Turnover of genes with sex bias. This important issue is addressed in Figure 2. First, it is not precisely clear what "percentages of sums of shared genes for any pairwise comparison" in Figure 2 legend means and no further detail is given in the Methods; this must be made clearer or the info in Figure 2 is meaningless. Regardless, this approach again relies heavily on the arbitrary criterion of defining sex-bias. Thus, I would like to see correlation plots of the log(TPM ratio) between taxa as done in the classic multispecies fly paper of Zhang et al. 2007. In Figure 2 it is quite clear that male-biased genes evolve with respect to sex bias more rapidly than female-biased genes.

      We have provided a better explanation of this analysis. Note that the Zhang et al. 2007 paper was not focussing on somatic expression and covers a much broader evolutionary spectrum. Hence, the results are not comparable. Also, we doubt that it would be so helpful to generate a huge figure with all these plots.
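      The correlation plots the reviewer requests (in the style of Zhang et al. 2007) would, in their simplest form, correlate per-gene log sex ratios between two taxa without any sex-bias cutoff. A minimal sketch, assuming median TPM per sex as input, a pseudocount of 1 to avoid taking the log of zero, and Pearson's r over genes present in both taxa; names and toy values are hypothetical, and this is not an analysis reported in the manuscript.

```python
import math
from statistics import mean


def log2_sex_ratio(female_tpm, male_tpm, pseudocount=1.0):
    """log2(F/M) with a pseudocount (assumed value) so zero TPM is allowed."""
    return math.log2((female_tpm + pseudocount) / (male_tpm + pseudocount))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sex_ratio_correlation(taxon_a, taxon_b):
    """taxon_*: dict gene -> (median female TPM, median male TPM) for one organ."""
    shared = sorted(set(taxon_a) & set(taxon_b))
    xa = [log2_sex_ratio(*taxon_a[g]) for g in shared]
    xb = [log2_sex_ratio(*taxon_b[g]) for g in shared]
    return pearson(xa, xb)


dom = {"g1": (15, 7), "g2": (3, 7), "g3": (9, 9)}
mus = {"g1": (31, 15), "g2": (7, 15), "g3": (4, 4), "g4": (10, 2)}
print(sex_ratio_correlation(dom, mus))  # 1.0: same direction and size of bias
```

      With real data a rank correlation (Spearman) may be more robust to a few extreme ratios; either choice treats sex bias as continuous, which is the reviewer's concern in (3).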

      (4) Is there a simpler explanation for the results in the "Variance patterns" section? The total variance for any variable can be decomposed into the variance within and among "groups". If we use "sex" as the group, then there are genes - labelled sex-biased genes - that were identified as such, in essence, because they have high among-group variance. Given that we then know a priori at the start of this section of sex-biased genes have high among-group variance, is it at all surprising that they have higher total variance than the unbiased genes (which we know a priori have low among-group variance)? Perhaps I misunderstood the point of this section. Maybe it would be more meaningful to examine the WITHIN-SEX variance (averaged across the two sexes) instead.

      We did calculate IQR/median (“normalized variance”) with the nine mice for each gene and each sex in each organ, hence sex is not a variance factor in this calculation. The algorithm steps are outlined in suppl. Table S17. We have now also added a variance calculation for reciprocal gene sets and added an extended discussion of these results.
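      The normalized variance described in this response (IQR/median per gene, per sex, per organ across the nine mice) can be sketched as follows. The exact algorithm steps are in suppl. Table S17 and are not reproduced here; the quartile convention used below (the default "exclusive" method of Python's `statistics.quantiles`) and the function names are assumptions.

```python
from statistics import median, quantiles


def normalized_variance(values):
    """IQR / median for one gene in one sex and organ."""
    q1, _, q3 = quantiles(values, n=4)  # default "exclusive" method (assumed)
    med = median(values)
    return (q3 - q1) / med if med > 0 else float("nan")


def within_sex_variances(female_tpm, male_tpm):
    """Computed separately per sex, so the female-male difference never enters."""
    return normalized_variance(female_tpm), normalized_variance(male_tpm)


female = [1, 2, 3, 4, 5, 6, 7, 8, 9]
male = [2 * x for x in female]  # twice the expression, same relative spread
print(within_sex_variances(female, male))  # (1.0, 1.0)
```

      Because each sex is summarized on its own, a gene can be strongly sex-biased (here a twofold difference) while both sexes show the same normalized variance, which is the distinction the reviewer's question in (4) turns on.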

      (5) Analysis of alpha for sex-biased genes. This was the most interesting part of this manuscript to me.

      (a) More information about what SNVs were used is required.

      i. Were only sites where SPR was fixed used? (If not, how was polarization done?)

      ii. Were sites only considered diverged if they were fixed for different bases in DOM and MUS? (If not, what was the criteria?)

      iii. Using, say, DOM as the focal species, a site must be polymorphic in DOM. But did its status (polymorphic/fixed) in MUS matter?

      We have added a more detailed description of this to the Methods section. Direct answers to the three questions: (i) yes; (ii) yes; (iii) no: because DOM and MUS are two recently separated subspecies of Mus musculus, a variant may predate their separation, and there may be gene flow between them.
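      Taken together, the three answers define a site filter. A minimal sketch, assuming each taxon's data at a site are summarized as the set of bases observed across its sampled individuals; the DOM/MUS/SPR roles follow the text, while the function name, the set representation, and the "uninformative" label are hypothetical. Assigning a fixed difference to a particular lineage and handling missing data are not covered.

```python
def classify_site(focal, other, outgroup):
    """Classify one site for an MK-type analysis with `focal` as focal taxon.

    Each argument is the set of bases observed at the site in that taxon,
    e.g. {"A"} (fixed) or {"A", "G"} (polymorphic).
    """
    if len(outgroup) != 1:
        return "excluded"  # (i) only sites fixed in the SPR outgroup are used
    if len(focal) > 1:
        return "polymorphic"  # (iii) the status in the other taxon is ignored
    if len(other) == 1 and focal != other:
        return "divergent"  # (ii) fixed for different bases in DOM and MUS
    return "uninformative"


# DOM as focal taxon, MUS as the other taxon, SPR as outgroup.
print(classify_site({"A", "G"}, {"A"}, {"A"}))       # polymorphic
print(classify_site({"A", "G"}, {"A", "G"}, {"A"}))  # polymorphic
print(classify_site({"G"}, {"A"}, {"A"}))            # divergent
print(classify_site({"G"}, {"A"}, {"A", "C"}))       # excluded
```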

      (b) A particularly interesting part of the analysis is the investigation of alpha for genes that are NOT sex-biased in one taxa but are sex-biased in the other. At the moment (as I understand it), alpha is only calculated for these genes in the taxa where they are NOT sex-biased (and this alpha value can be compared to the alpha of sex-biased genes and of unbiased genes in that taxa). I would like to see both sets of genes (set 1: those sex-biased in MUS and not in DOM; set 2: those sex-biased DOM and not in MUS) analyzed in each of the 2 species, with results presented in a 2x2 table.

      By definition of these categories, these genes are sex-biased in the respective other taxon, hence the values are already in the table. They are named as “reciprocal”.

      (c) No confidence intervals are given for the alpha values, despite the legend of Figure 3 referring to them.

      These were accidentally omitted; we have now included the full table in suppl. Table S6, and Figure 3 was modified to show violin plots of the bootstrap distributions.
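      For orientation, alpha in a McDonald-Kreitman framework is commonly estimated as alpha = 1 - (Ds*Pn)/(Dn*Ps), where Dn and Ds are nonsynonymous and synonymous fixed differences and Pn and Ps are nonsynonymous and synonymous polymorphisms. The sketch below pools counts over a gene set and resamples genes with replacement to obtain a bootstrap distribution and a 95% percentile interval. The pooled estimator, the number of replicates, the seed, and the per-gene count layout are all assumptions, not the authors' stated procedure.

```python
import random


def mk_alpha(dn, ds, pn, ps):
    """Proportion of adaptive nonsynonymous substitutions, 1 - (Ds*Pn)/(Dn*Ps)."""
    return 1 - (ds * pn) / (dn * ps)


def pooled_alpha(genes):
    """Sum (Dn, Ds, Pn, Ps) over genes, then compute alpha once."""
    dn, ds, pn, ps = (sum(column) for column in zip(*genes))
    return mk_alpha(dn, ds, pn, ps)


def bootstrap_alpha(genes, n_boot=1000, seed=1):
    """Point estimate plus a 95% percentile interval from resampling genes."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        sample = rng.choices(genes, k=len(genes))
        try:
            replicates.append(pooled_alpha(sample))
        except ZeroDivisionError:
            continue  # replicate drew no Dn or no Ps; skip it
    replicates.sort()
    lower = replicates[int(0.025 * (len(replicates) - 1))]
    upper = replicates[int(round(0.975 * (len(replicates) - 1)))]
    return pooled_alpha(genes), lower, upper


# Hypothetical per-gene counts (Dn, Ds, Pn, Ps) for one gene set.
gene_set = [(3, 2, 1, 4), (5, 4, 2, 3), (1, 1, 0, 2), (4, 2, 1, 5)]
estimate, low, high = bootstrap_alpha(gene_set)
print(f"alpha = {estimate:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

      The sorted `replicates` list is the kind of distribution a violin plot would display; with small gene sets, replicates that draw no divergence or no synonymous polymorphism are undefined and are skipped here.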

      The author's creation and use of a "sex-bias index" (SBI). My greatest skepticism of this manuscript is with respect to the value of their manufactured index, SBI. Of course, it is possible to create such an index but does this literature really need this index or does this just add to the "clutter" in the literature for this field? Is it helping to illuminate important patterns? This index is presumably some attempt to quantify how "male-like" or "female-like" overall expression is for a given individual (for a given organ). It is calculated as SBI = (MEDIAN of all female-biased tpm) - (MEDIAN of all male-biased tpm).

      (6) A main result that comes from this is that the sexes tend to overlap for these values for most nongonad tissues but are clearly distinct for gonadal tissues. I do not think this result would come as a surprise to almost anyone and I'm far from convinced that this metric is a good way to quantify that point. Let's consider testes vs. ovaries. Compared to non-gonadal tissues, I am reasonably certain that not only are there many more genes that are classified as "sex-biased" in gonads but also the magnitude of sex-bias among these genes is typically much greater than it is for the so-called sex-biased genes in nongonadal tissue (density plots requested in #3a would make this clear). In other words, males and females are, on average, very different with respect to expression in gonads so even allowing for variation within each sex will still result in a clear separation of all individuals of the two sexes. In contrast, males and females are, on average, much less different in, say, heart so when we consider the variation within each sex, there is overlap. One could imagine a variety of different metrics which could be used to make this point. The merits of "SBI" are unclear. It is a novel metric and its properties are poorly understood. (A simple alternative would be looking at individual scores along the axis separating mean/median males and females; almost certainly, for gonads, this would be very similar to PC scores for PC1.)

      As throughout the text, we use gonadal comparisons only as a general reference, not as the main result. The main result that we stress is the fast turnover of these patterns, including the shift from binary to overlapping for kidney and liver in mouse. We consider this a new finding. Even if it does "not come as a surprise to almost anyone", isn't it great that one no longer has to guess but finally has real data on this?

      We have now also added a mosaic analysis to show that the SBI can be used as summary measure in different presentations.

      The use of a single PC axis is not a good alternative, since it discards the information from the other axes.

      We have now included an explicit discussion on the usefulness of the SBI.

      (7) For simplicity, let's assume all males are identical and all females are identical. Let's imagine that heart and kidney have the exact same set of sex-biased genes. There are 20 female-biased genes; they all happen to be identical in expression level (within tissue) and look like this:

      Tissue    Female TPM    Male TPM    TPM ratio (F:M)
      Heart     4             2           2
      Kidney    40            20          2

      And there are 20 male-biased genes that look like this:

      Tissue    Female TPM    Male TPM    TPM ratio (F:M)
      Heart     1             3           1/3
      Kidney    10            30          1/3

      Most people would describe these two tissues as equally sex-biased.

      However, the SBIs would be:

      Tissue    Female SBI       Male SBI          Sex difference (F - M)
      Heart     4 - 1 = 3        2 - 3 = -1        4
      Kidney    40 - 10 = 30     20 - 30 = -10     40

      Is it a desirable property that by this metric these two tissues have wildly different SBI values for each sex as well as for the difference between sexes? (At the very least, shouldn't you make readers aware of these strange properties of SBI so they can decide how much value they put into them?)

      Actually, in this example the simple ratio between the expression levels has a strange property, since it does not reflect a much higher expression of the relevant genes in the kidney. The SBI is actually more suitable for making such cases clear. Of course, this is under the assumption that expression level has a meaning for the phenotype, but this is the general assumption for all RNA-Seq experiment comparisons.
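      The SBI definition quoted in the review (median TPM of female-biased genes minus median TPM of male-biased genes, per individual) can be written directly as code, which also reproduces the arithmetic of the example in (7). A minimal sketch, assuming one dictionary of TPM values per individual and organ; gene names and values are the hypothetical ones from the example, and the authors' later rescaling step is not included because its details are not given here.

```python
from statistics import median


def sex_bias_index(tpm, female_biased, male_biased):
    """SBI for one individual: median TPM of female-biased genes minus
    median TPM of male-biased genes (tpm: dict gene -> TPM)."""
    return (median([tpm[g] for g in female_biased])
            - median([tpm[g] for g in male_biased]))


female_genes = [f"f{i}" for i in range(20)]
male_genes = [f"m{i}" for i in range(20)]


def individual(f_level, m_level):
    """Toy individual in which all genes of each class share one TPM value."""
    tpm = {g: f_level for g in female_genes}
    tpm.update({g: m_level for g in male_genes})
    return tpm


example = {"heart": {"female": (4, 1), "male": (2, 3)},
           "kidney": {"female": (40, 10), "male": (20, 30)}}
for organ, levels in example.items():
    f_sbi = sex_bias_index(individual(*levels["female"]), female_genes, male_genes)
    m_sbi = sex_bias_index(individual(*levels["male"]), female_genes, male_genes)
    print(organ, f_sbi, m_sbi, f_sbi - m_sbi)
# heart 3.0 -1.0 4.0
# kidney 30.0 -10.0 40.0
```

      Applying the same function to log2-transformed TPM would give identical heart and kidney values (for example, log2(4) - log2(1) = log2(40) - log2(10) = 2), whereas on the raw scale they differ tenfold; which behaviour is preferable is exactly the point debated in (7) and the response.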

      (8) With respect to Figure 4, why do females often have mean SBI values close to zero or even negative (e.g., kidney, mammary glands)? Is this simply because the female-biased genes tend to have lower TPM than the male-biased genes? It seems that the value zero for this metric is really not very biologically meaningful because this metric is a difference of two things that are not necessarily expected to be equal.

      This is the extra information about the expression levels that is gained via the SBI values (see comment above). However, we noticed that people can get confused about this. We have now added a re-scaling step to focus completely on the variance information in these plots.

      (9) Interpreting variances. A substantial fraction of the latter half of the manuscript focuses on interpreting variances among individual samples. This is problematic because there is no replication within individuals (i.e., "repeatability"), thus it is impossible to infer the extent to which the observed variance among individuals of a given group (e.g., among females) is due to true biological differences among individuals or is simply due to noise (i.e., "measurement error" in the broad sense). Is the larger variance for mammary glands than liver or gonads just due to measurement error? What is the evidence?

      This point was of course a major issue during the times where microarrays were used for transcriptome studies. However, the first systematic RNA-Seq studies showed already that the technical replicability is so high, that technical replicates are not required. In fact, practically all RNA-Seq studies are done without technical replicates for this reason.

      (10) Because I have little confidence in the SBI metric (#7-8) and in interpreting within sex variances (#9), I found little value in the human results and how SBI distributions (and degree of overlap between sexes) compare between humans and mice.

      We disagree - the currently published view is that there are thousands of sex-biased genes in humans, and this has implications for gender-specific medicine (Oliva et al. 2020). Our results show a much more nuanced picture in this respect.

      (11) I found even less value in the single-cell data. It too suffers from the issues above. Further, as the authors more or less state, the data are too limited to say much of value here. It is impossible to tell to what extent the results are simply due to data limitations.

      We have pointed out that it is still valuable to have them. They are good enough to exclude the possibility that only a small set of cells drives the overall pattern across an organ. We have further clarified this in the text.

      (12) The code for data analysis should be posted on GitHub or some other repository.

      The code for the sex-biased gene detection and analysis has been posted on GitHub (see Code availability in the manuscript).

    1. eLife Assessment

      This valuable study unravels the mechanisms underlying mammalian sperm-oocyte recognition and penetration, shedding light on cross-species interactions. It provides solid evidence that exposure of sperm to oviductal fluid or OVGP1 proteins from bovine, murine, or human sources imparts species-specific zona pellucida (ZP) recognition, ensuring that only sperm from the corresponding species can penetrate the ZP, regardless of its origin. These findings hold significant potential for reproductive biology, offering insights to enhance porcine in vitro fertilization (IVF), which frequently suffers from polyspermy, as well as advancing human IVF through improved intrinsic sperm selection.

    2. Reviewer #1 (Public review):

      Summary:

      This interesting manuscript first shows that human, murine, and feline sperm penetrate the zona pellucida (ZP) of bovine oocytes recovered directly from the ovary, although first cleavage rates are reduced. Similarly, bovine sperm can penetrate superovulated murine oocytes recovered directly from the ovary. However, bovine oocytes incubated with oviduct fluid (30 min) are generally impenetrable by human sperm.

      Thereafter, the cytoplasm was aspirated from murine oocytes - obtained from the ovary or oviduct. Binding and penetration by bovine and human sperm were reduced in both groups relative to homologous (murine) sperm. However, heterologous (bovine and human) sperm penetration was further reduced in oviduct vs. ovary derived empty ZP. These data show that outer (ZP) not inner (cytoplasmic) oocyte alterations reduce heterologous sperm penetration as well as homologous sperm binding.

      This was repeated using empty bovine ZP incubated, or not, with bovine oviduct fluid. Prior oviduct fluid exposure reduced non-homologous (human and murine) empty ZP penetration, polyspermy, and sperm binding. This demonstrates that species-specific oviduct fluid factors regulate ZP penetrability.

      To test the hypothesis that OVGP1 is responsible, the authors obtained histidine-tagged bovine and murine OVGP1 and DDK-tagged human OVGP1 proteins. Tagging was to enable purification following over-expression in BHK-21 or HEK293T cells. The authors confirm these recombinant OVGP1 proteins bound to both murine and bovine oocytes. Moreover, previous data using oviduct fluid was mirrored using bovine oocytes supplemented with homologous (bovine) recombinant OVGP1, or not. This confirms the hypothesis, at least in cattle.

      Next, the authors exposed bovine and murine empty ZP to bovine, murine, and human recombinant OVGP1, in addition to bovine, murine, or human sperm. Interestingly, both species-specific ZP and OVGP1 seem to be required for optimal sperm binding and penetration.

      Lastly, empty bovine and murine ZP were treated with neuraminidase, or not, with or without pre-treatment with homologous OVGP1. In each case, neuraminidase reduced sperm binding and penetration. This further demonstrates that both ZP and OVGP1 are required for optimal sperm binding and penetration.

      In summary, the authors demonstrate that two mechanisms seem to underpin mammalian sperm recognition and penetration, the first being specific (ZP-mediated) and the second non-specific (OVGP1 mediated).

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Oviductin sets the species-specificity of the mammalian zona pellucida", de la Fuente et al analyze the species specificity of sperm-egg recognition by looking at sperm binding and penetration of zonae pellucidae from different mammalian species and find a role for the oviductal protein OVGP1 in determining species specificity.

      Strengths:

      By combining sperm, oocytes, zona pellucida (ZP), and oviductal fluid from different mammalian species, they elucidate the essential role of OVGP1 in conferring species-specific fertilization.

      Weaknesses:

      Mice with OVGP1 deletion are viable and fertile. It would be quite interesting to investigate the species-specificity of sperm-ZP binding in this model. That would indicate whether OVGP1 is the only glycoprotein involved in determining species-specificity. Alternatively, the authors could immunodeplete OVGP1 from oviductal fluid and then ascertain whether this depleted fluid retains the ability to impede cross-species fertilization.

      Comments on revisions:

      This resubmission addresses most of my comments and concerns.