  1. Aug 2025
    1. eLife Assessment

      This useful study advances our understanding of how organisms respond to chronic oxidative stress. Using the nematode C. elegans, the authors identified key neuronal signaling molecules and their receptors that are required for stress signaling and survival. The evidence supporting the conclusions is solid, with rigorous genetics, stress response analysis, and transcriptional profiling. This research will be of broad interest to neuroscientists and researchers working in the field of oxidative stress regulation.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers aimed to identify which neurotransmitter pathways are required for animals to withstand chronic oxidative stress. This work thus has important implications for disease processes that are caused by or linked to oxidative stress. This work identified specific neurotransmitters and receptors that coordinate stress resilience, both prior to and during stress exposure. Further, the authors identified specific transcriptional programs coordinated by neurotransmission that may provide stress resistance.

      Strengths:

      The manuscript is very clearly written with a well-formulated rationale. Standard C. elegans genetic analysis and rescue experiments were performed to identify key regulators of the chronic oxidative stress response. These findings were enhanced by transcriptional profiling that identified differentially expressed genes that likely affect survival when animals are exposed to stress.

      Weaknesses:

      Where the gar-3 promoter drives expression was not discussed in the context of the rescue experiments in Figure 7.

    3. Reviewer #2 (Public review):

      In this paper, Biswas et al. describe the role of acetylcholine (ACh) signaling in protection against chronic oxidative stress in C. elegans. They showed that disruption of ACh signaling in either unc-17 mutants or gar-3 mutants led to sensitivity to toxicity caused by chronic paraquat (PQ) treatment. Using RNA-seq, they found that approximately 70% of the genes induced by chronic PQ exposure in wild type failed to upregulate in these mutants. The overexpression of gar-3 selectively in cholinergic neurons was sufficient to promote protection against chronic PQ exposure in an ACh-dependent manner. The study points to a previously undescribed role for ACh signaling in providing organism-wide protection from chronic oxidative stress, likely through the transcriptional regulation of numerous oxidative stress-response genes. The paper is well-written, and the data are robust, though some conclusions seem preliminary and are not fully supported by the current data. While the study identifies the muscarinic ACh receptor gar-3 as an important regulator of the response to PQ, the specific neurons in which gar-3 functions were not unambiguously identified, and the sources of ACh that regulate GAR-3 signaling and the identities of the tissues targeted by gar-3 were not addressed, limiting the scope of the study.

      Major Comments:

      (1) The site of action of cholinergic signaling for protection from PQ was not adequately explored. The authors' conclusion that cholinergic motor neurons are protective is based on studies using overexpression of gar-3 and an unc-17 allele that may selectively disrupt ACh in cholinergic motor neurons (Figure 9F), but these approaches are indirect. To more directly address the site of action, the authors should conduct rescue experiments using well-defined heterologous promoters. Figure 7G shows that gar-3 expressed under a 7.5 kb promoter fragment fully rescues the defect of gar-3 mutants, but the authors did not report where this promoter fragment is expressed, nor did they conduct rescue experiments of the specific tissues where gar-3 is known to be expressed (cholinergic neurons, GABAergic neurons, pharynx, or muscles). UNC-17 rescue experiments could also be useful to address the site of action. Does expression of unc-17 selectively in cholinergic motor neurons rescue the stress sensitivity of unc-17 mutants (or restore resistance to gar-3(OE); unc-17 mutants)? These experiments may also address whether ACh acts in an autocrine or paracrine manner to activate gar-3, which would be an important mechanistic insight to this study that is currently lacking.

      (2) The genetic pan-neuronal silencing experiments presented in Figure 1 motivated the subsequent experiments, but the authors did not relate these observations to ACh/gar-3 signaling. For example, the authors did not address whether silencing just the cholinergic motor neurons at the different times tested has the same effects on survival as pan-neuronal silencing.

      (3) It is assumed that protection occurs through inter-tissue signaling of ACh to target tissues, where it impacts gene expression. While this is a reasonable assumption, it has not been directly shown here. It is recommended that the authors examine GFP reporter expression of a sampling of the genes identified in this study (including proteasomal genes that the authors highlight) that are regulated by unc-17 and gar-3. This would serve to independently confirm the RNA-seq data and to identify target tissues that are subject to gene expression regulation by ACh, which would significantly strengthen the study.

    4. Author response:

      Reviewer #1 (Recommendations for the authors):

      “The gar-3 promoter expression pattern was not discussed in the context of rescue experiments.”

      We agree that the expression pattern of the gar-3 promoter used in our rescue experiments should be clarified. We will include a description of the tissues where the 7.5 kb gar-3 promoter fragment is expressed, based on both prior studies and our own expression data. We will also discuss how the gar-3 cell and tissue expression pattern relates to both our analysis of gar-3 expression in the genome edited strain we generated as well as the observed rescue effects.

      Reviewer #2 (Recommendations for the authors):

      (1) The site of action of cholinergic signaling was not adequately explored.

      We plan to perform additional rescue experiments using heterologous promoters to drive gar-3 expression in specific tissues (e.g. cholinergic neurons, muscle). These experiments will help clarify the sufficiency of unc-17 expression in specific cell types for rescue. However, we point out that cell-specific unc-17 knockdown by RNAi using the unc-17b promoter (expression largely restricted to ventral cord ACh motor neurons) increases sensitivity to PQ in our long-term survival assays. Combined with our analysis of unc-17(e113) mutants, we believe our data offer robust support of a requirement for unc-17 expression in cholinergic motor neurons.

      (2) Pan-neuronal silencing experiments were not connected to ACh/GAR-3 signaling.

      We will expand our discussion to relate the pan-neuronal silencing results to our analysis of ACh signaling. We used the pan-neuronal silencing to motivate further analysis of various neurotransmitter systems. We note that our studies implicate both glutamatergic and cholinergic systems in protective responses to oxidative stress. The effects of silencing on survival during long-term PQ exposure may therefore be derived solely from cholinergic neurons, glutamatergic neurons, or a combination of both neuronal populations. We hope the reviewer will agree that distinguishing between these possibilities may be quite complicated and is not central to the main message of our paper. We therefore suggest this additional analysis lies outside the scope of this revision.

      (3) Inter-tissue signaling and transcriptional regulation by ACh were assumed but not directly shown.

      We will generate GFP reporters for a subset of genes (including proteasomal genes) identified in our RNA-seq analysis or assess their expression by quantitative RT-PCR to validate cholinergic regulation. These experiments will help to identify target tissues and confirm transcriptional regulation by cholinergic signaling.

      We appreciate the opportunity to revise our manuscript and believe that these additions will significantly strengthen the mechanistic insights and overall impact of our study. Please let us know if further clarification is needed.

    1. eLife Assessment

      This important work by Lesser et al provides a first and comprehensive description of Drosophila wing proprioceptors at an EM resolution. By linking peripheral neurons with information on their morphology and connectivity in the central nervous system, the authors provide new hypotheses and tools to study proprioceptive motor control of the wing in the fruit fly. The evidence and techniques supporting this work are solid, and this resource will contribute to connectome-based modeling of fly behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Lesser et al provide a comprehensive description of Drosophila wing proprioceptive sensory neurons at the electron microscopy resolution. This "tour-de-force" provides a strong foundation for future structural and functional research aimed at understanding wing motor control in Drosophila with implications for understanding wing control across other insects.

      Strengths:

      (1) The authors leverage previous research that described many of the fly wing proprioceptors, and combine this knowledge with EM connectome data such that they now provide a near-complete morphological description of all wing proprioceptors.

      (2) The authors cleverly leverage genetic tools and EM connectome data to tie the location of proprioceptors on the wings with axonal projections in the connectome. This enables them to both align with previous literature as well as make some novel claims.

      (3) In addition to providing a full description of wing proprioceptors, the authors also identified a novel population of sensors on the wing tegula that make direct connections with the B1 wing motor neurons, implicating the role of the tegula in wing movements that was previously underappreciated.

      (4) Despite being the most comprehensive description so far, it is reassuring that the authors clearly state the missing elements in the discussion.

      Weaknesses:

      (1) The authors do their main analysis on data from the FANC connectome but provide corresponding IDs for sensory neurons in the MANC connectome. I wonder how the connectivity matrix compares across FANC and MANC if the authors perform a similar analysis to the one they have done in Figure 2. This could be a valuable addition and potentially also pick up any sexual dimorphism.

      (2) The authors speculate about the presence of gap junctions based on the density of mitochondria. I'm not convinced about this, given that mitochondrial densities could reflect other things that correlate with energy demands in sub-compartments.

      (3) I'm intrigued by how the tegula CO is negative for iav. I wonder if authors tried other CO labeling genes like nompc. And what does this mean for the nature of this CO. Some more discussion on this anomaly would be helpful.

      (4) The authors conclude there are no proprioceptive neurons in sclerite pterale C based on Chat-Gal4 expression analysis. It would be much more rigorous if authors also tried a pan-neuronal driver like nsyb/elav or other neurotransmitter drivers (Vglut, GAD, etc) to really rule this out. (I hope I didn't miss this somewhere.)

      Overall, I consider this an exceptional analysis that will be extremely valuable to the community.

    3. Reviewer #2 (Public review):

      Summary:

      Lesser et al. present an atlas of Drosophila wing sensory neurons. They proofread the axons of all sensory neurons in the wing nerve of an existing electron microscopy dataset, the female adult fly nerve cord (FANC) connectome. These reconstructed sensory axons were linked with light microscopy images of full-scale morphology to identify their origin in the periphery of the wing and encoded sensory modalities. The authors described the morphology and postsynaptic targets of proprioceptive neurons as well as previously unknown sensory neurons.

      Strengths:

      The authors present a valuable catalogue of wing sensory neurons, including previously undescribed sensory axons in the Drosophila wing. By providing connectivity information together with linked genetic driver lines, this research facilitates future work on the wing motor-sensory network and applications relating to Drosophila flight. The findings were linked to previous research as well as their putative role in the proprioceptive and nerve cord circuitry, providing testable hypotheses for future studies.

      Weaknesses:

      (1) With future use as an atlas, it should be noted that the evidence is based on sensory neurons on only one side of the nerve cord. Fruit flies have stereotyped left/right hemispheres in the brain and left/right hemisegments in the nerve cord. The comparison of left and right neurons of the nervous system can give a sense of how robust the morphological and connectivity findings are. Here, the authors have not compared the left and right side sensory axons from the wing nerve, leaving potential for developmental variability across samples and left/right hemisegments.

      (2) Not all links between the EM reconstructions and driver lines are convincing. To strengthen these, for all EM-LM matches in Figures 3-7, rotated views of the driver line (matching the rotated EM views) should be shown to provide a clearer comparison of the data. In particular, Figure 3G and Figure 7B are not very convincing based on the images shown. MCFO imaging of the driver lines in Figure 3G and 7B would make this position stronger if a clone that matches the EM reconstruction could be identified.

      (3) Figure 7B looks like the driver line might have stochastic expression in the sensory neuron, which further reduces confidence in the result shown in Figure 7C. Is this expression pattern in the wing consistently seen? Many split-GAL4s have stochastic expression. The evidence would be strengthened if the authors presented multiple examples (~4-5) of each driver line's expression pattern in the supplement.

      (4) Certain claims in this work lack quantitative evidence. On line 128, for instance, "Overall, our comprehensive reconstruction revealed many morphological subgroups with overlapping postsynaptic partners, suggesting a high degree of integration within wing sensorimotor circuits." If a claim of subgroups having shared postsynaptic partners is being made, there should be quantitative evidence. For example, the cosine similarity amongst members of each group could be compared to the cosine similarity of shuffled/randomised sets of axons from different groups. The heat map of cosine similarity in Figure 2B alone is not sufficient.

      (5) Similarly, claims about putative electrical connections to b1 motor neurons are very speculative. The authors state that "their terminals contain very densely packed mitochondria compared to other cells", without providing a quantitative comparison to other sensory axons. There is also no quantitative comparison to the one example of another putative electrical connection from the literature. Further, it should be noted that this connection from Trimarchi and Murphey, 1997, is also stated as putative on line 167, which further weakens this evidence. Quantification would strongly strengthen this position. Identification of an example of high mitochondrial density at a confirmed electrical connection would be even better. In the related discussion section "A potential metabolic specialization for flight circuitry", it should be more clearly noted that the dense mitochondria could be unrelated to a putative electrical connection. If the authors have an alternative hypothesis about the mitochondria density, this should be stated as well.

      (6) It would be appropriate to cite previous work using a similar strategy to match sensory axons to their cell bodies/dendrites at the periphery using driver lines and connectomics (see Figure 5 for example in the following paper: https://doi.org/10.7554/eLife.40247 ).

      The methods section is very sparse. For the sake of replicability, all sections should be expanded upon.
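
      The shuffled-set comparison suggested in point (4) could look roughly like the sketch below. It is only an illustration under stated assumptions: the connectivity matrix, the group labels, and all function names are hypothetical toy stand-ins, not the authors' data or code. It compares the mean cosine similarity of axon pairs within the same morphological group against a null distribution obtained by shuffling the group labels.

```python
import numpy as np

def cosine_similarity_matrix(conn):
    """Pairwise cosine similarity between rows (axons) of an axon x target matrix."""
    norms = np.linalg.norm(conn, axis=1, keepdims=True)
    unit = conn / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

def mean_within_group_similarity(sim, labels):
    """Mean similarity over all distinct axon pairs that share a group label."""
    same_group = labels[:, None] == labels[None, :]
    distinct_pairs = np.triu(np.ones_like(same_group, dtype=bool), k=1)
    return sim[same_group & distinct_pairs].mean()

def shuffle_test(conn, labels, n_shuffles=2000, seed=0):
    """Compare observed within-group similarity to label-shuffled groupings."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    sim = cosine_similarity_matrix(np.asarray(conn, dtype=float))
    observed = mean_within_group_similarity(sim, labels)
    null = np.array([
        mean_within_group_similarity(sim, rng.permutation(labels))
        for _ in range(n_shuffles)
    ])
    # One-sided p-value with the usual +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_shuffles)
    return observed, null, p_value

# Hypothetical toy data: two groups of six axons with disjoint target sets.
rng = np.random.default_rng(1)
group_a = rng.poisson(5, size=(6, 10)) * np.r_[np.ones(5), np.zeros(5)]
group_b = rng.poisson(5, size=(6, 10)) * np.r_[np.zeros(5), np.ones(5)]
conn = np.vstack([group_a, group_b])
labels = np.array(["A"] * 6 + ["B"] * 6)

observed, null, p_value = shuffle_test(conn, labels)
print(f"observed={observed:.3f}, null mean={null.mean():.3f}, p={p_value:.4f}")
```

      Shuffling labels keeps the group sizes fixed, so the null reflects what within-group similarity would look like if group membership carried no information about postsynaptic partners.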

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to identify the peripheral end-organ origin in the fly's wing of all sensory neurons in the anterior dorsomedial nerve. They reconstruct the neurons and their downstream partners in an electron microscopy volume of a female ventral nerve cord, analyse the resulting connectome, and identify their origin with a review of the literature and imaging of genetic driver lines. While some of the neurons were already known through previous work, the authors expand on the identification and create a near-complete map of the wing mechanosensory neurons at synapse resolution.

      Strengths:

      The authors elegantly combine electron microscopy, neuron morphology, connectomics, and light microscopy methods to bridge the gap between fly wing sensory neuron anatomy and ventral nerve cord morphology. Further, they use EM ultrastructural observations to make predictions on the signaling modality of some of the sensory neurons and thus their function in flight.

      The work is as comprehensive as state-of-the-art methods allow to create a near-complete map of the wing mechanosensory neurons. This work will be of importance to the field of fly connectomics and modelling of fly behavior, as well as a useful resource to the Drosophila research community.

      Through this comprehensive mapping of neurons to the connectome, the authors create a lot of hypotheses on neuronal function, partially already confirmed with the literature and partially to be tested in the future. The authors achieved their aim of mapping the periphery of the fly's wing to axonal projections in the ventral nerve cord, beautifully laying out their results to support their mapping.

      The authors identify the neurons in a previously published connectome of a male fly ventral nerve cord to enable cross-individual analysis of connections. Further, together with their companion paper, Dhawan et al. 2025, describing the haltere sensory neurons in the same EM dataset, they cover the entire mechanosensory space involved in Drosophila flight.

      Weaknesses:

      The connectomic data are only available upon request; the inclusion of a connectivity table of the reconstructed neurons would aid analysis reproducibility and cross-dataset comparisons.

    1. eLife Assessment

      This fundamental study identifies specific neural mechanisms through which HIF-1 signaling in ADF serotonergic neurons extends lifespan in C. elegans, revealing that downstream signaling in multiple types of neurons, as well as other neuromodulators like GABA, tyramine, and NLP-17, is required for this effect. The strength of the evidence is largely convincing, as the authors establish the necessity and causality of key neuronal components using multiple genetic tools and functional dissection in a well-validated model organism.

    2. Reviewer #1 (Public review):

      Summary:

      In this study by Kitto et al., the authors set out to identify specific signaling components regulating the hypoxic response from the neurons to the periphery and which components are required for lifespan extension. Their previous work had shown that expression of a stabilized HIF-1 mutant in the nervous system extends lifespan through the serotonin receptor SER-7 and leads to the induction of fmo-2 in the intestine. In the current study, they mapped the precise neural circuits required for this response, as well as the signaling mediators. Their work reveals that neurotransmitters GABA and tyramine, and the neuropeptide NLP-17, act downstream of neuronal HIF-1 to convey a "hypoxic signal" to peripheral tissues. Through cell-type-specific expression studies, targeted knockouts, and comprehensive lifespan analysis, the authors provide robust evidence to support their conclusions. The insights gained from the study both move the field forward by advancing our understanding of neuro-peripheral hypoxic signaling and lay the groundwork for potential therapeutic strategies aimed at the modulation of such signaling pathways.

      Strengths:

      (1) This study provides new evidence further delineating signaling components required for hypoxic signaling-mediated longevity, from the nervous system to the periphery. The authors use a rigorous approach in which they express a stabilized HIF-1 mutant selectively in ADF, NSM, and HSN serotonergic neurons, followed by cell-type-specific tph-1 knockouts, to pinpoint ADF-dependent serotonin signaling as essential for both lifespan extension and intestinal fmo-2 induction.

      This was followed by generating 11 transgenic lines that drive SER-7 expression under distinct neuron-specific promoters, to systematically tease out in which of 27 candidate neurons SER-7 functions to mediate hypoxia-induced longevity. This ultimately highlighted the RIS interneuron as the required signaling hub.

      (2) As the intestine lacks direct neuronal innervation, the authors employ neuron-specific RNAi (TU3311 strain) and dense core vesicle analyses to identify that the neuropeptide NLP-17 is required to transmit the hypoxic signal from RIS to induce fmo-2 in the intestine.

      (3) Overall, the paper is very well written. The experiments were carried out carefully and thoroughly, and the conclusions drawn are also well supported by the results they are showing.

      Weaknesses:

      Overall, I don't see many weaknesses. One point relates to their read-outs, which rely heavily on lifespan measurements and fmo-2 induction without evaluating other physiological processes that serotonin or NLP-17 might affect. For translational relevance, it would be valuable to assess or mention potential adverse effects, such as changes in reproduction, pharyngeal pumping, or proteostasis capacity (proteostasis capacity specifically in the tissue showing fmo-2 upregulation).

      While lifespan assays and fmo-2 expression do provide strong evidence, incorporating additional markers of stress resistance could strengthen the link between hypoxic signaling and organismal health as well.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to identify the specific neurons, neurotransmitters, and neuropeptides that mediate the longevity effects of the hypoxic response in C. elegans. By genetically dissecting the pathway downstream of HIF-1, they define a neural circuit involving ADF serotonergic neurons, the SER-7 receptor in the RIS interneuron, tyraminergic signaling from RIM, and neuropeptide NLP-17, ultimately linking neuronal hypoxic sensing to pro-longevity signaling in the intestine.

      Strengths:

      The study employs a diverse genetic toolkit, including neuron-specific transgenes, tissue-specific knockouts and rescues, RNAi knockdowns, allowing the authors to pinpoint causality, sufficiency, and necessity with high resolution. The comprehensive mapping of cell-nonautonomous signaling adds depth to our understanding of how HIF and serotonin signaling interface with aging pathways. The conclusions are supported by consistent survival assays and fmo-2 gene expression analyses.

      Weaknesses:

      A key limitation is the lack of clear evidence showing epistasis of so many identified molecular/neuronal components downstream of HIF-1 and serotonin. Thus, the mechanisms of how a diverse set of molecules/neurons coordinate and mediate neuronal HIF-1 effects on intestinal fmo-2 and longevity remain murky. Some rescue strategies may inadvertently cause non-physiological expression. Additionally, environmental hypoxia was not tested in parallel, so the claim on "hypoxia response" throughout the manuscript is not justified by genetic manipulation alone, and the translational relevance of the genetic manipulations remains somewhat uncertain.

    4. Reviewer #3 (Public review):

      Summary:

      This study found that ADF serotonergic neurons have a significant role in extending lifespan mediated by HIF-1, as well as serotonin receptor SER-7 in the GABAergic RIS interneurons. The author focuses on the sufficiency and necessity of components from the central nervous system and how they contribute to aging upon hypoxia.

      Previous work from the lab has identified that the stabilization of HIF-1 in neurons is sufficient to extend lifespan through the serotonin receptor, SER-7, which subsequently activates fmo-2 in the intestine and leads to lifespan extension. Building on this, the author sought to determine which serotonergic neurons are involved and found that serotonin signaling in ADF neurons is required for lifespan extension mediated by HIF-1.

      The author next tested which subset of neurons requires ser-7 expression to rescue the hypoxic response. They found that ser-7 expression in multiple neurons is sufficient to induce fmo-2, with the top candidate being the RIS neuron. Ablation of the RIS neuron did not extend lifespan, suggesting that ser-7 expression in the RIS neuron is required for lifespan extension, positioning it as a key component in the longevity signaling pathway.

      The author also investigated neurotransmitters and found that GABA and tyramine are important components in this circuit. They showed that the tyramine receptor called tyra-3 is required for vhl-1-mediated longevity. Given that tyra-3 is expressed in oxygen- and carbon dioxide-sensing neurons, the author demonstrated that these sensing neurons work downstream of serotonin signaling. Lastly, the author screened neuropeptide/receptor binding pairs and identified NLP-17 as playing a role in hypoxia-mediated longevity.

      Originality and Significance:

      This research is significant in that it uncovers components that are sufficient and necessary for lifespan extension via the hypoxic response. It provides comprehensive data supporting longevity induced by HIF-1-mediated hypoxic response, in conjunction with fmo-2, a longevity gene, as demonstrated in previous work from the lab. Moreover, it provides a number of new transgenic worm tools for C. elegans and aging communities.

      Data and Methodology:

      (1) The experiments were thoroughly conducted, especially the generations of strains using different neuron-type promoters and crossing into mutant strains to demonstrate sufficiency and necessity.

      (2) Some figure legends from the text do not match what the data show (Figure 6E, F, G).

      (3) The lifespan graph legends are confusing and could use some revamping for better clarification.

      Conclusions:

      This study provides insights into how hypoxic response regulates aging in a cell non-autonomous manner, outlining a potential circuit involving neurons, neurotransmitters, and neuropeptides.

    1. eLife Assessment

      This study presents a valuable application of a video-text alignment deep neural network model to improve neural encoding of naturalistic stimuli in fMRI. The authors found that models based on multimodal and dynamic embedding features of audiovisual movies predicted brain responses better than models based on unimodal or static features. The evidence supporting the claims is generally solid, with clear benchmarking against baseline models. The work will be of interest to researchers in cognitive neuroscience and AI-based brain modeling.

    2. Reviewer #1 (Public review):

      Summary:

      This study compares four models - VALOR (dynamic visual-text alignment), CLIP (static visual-text alignment), AlexNet (vision-only), and WordNet (text-only) - in their ability to predict human brain responses using voxel-wise encoding modeling. The results show that VALOR not only achieves the highest accuracy in predicting neural responses but also generalizes more effectively to novel datasets. In addition, VALOR captures meaningful semantic dimensions across the cortical surface and demonstrates impressive predictive power for brain responses elicited by future events.

      Strengths:

      The study leverages a multimodal machine learning model to investigate how the human brain aligns visual and textual information. Overall, the manuscript is logically organized, clearly written, and easy to follow. The results well support the main conclusions of the paper.

      Weaknesses:

      (1) My primary concern is that the performance difference between VALOR and CLIP is not sufficiently explained. Both models are trained using contrastive learning on visual and textual inputs, yet CLIP performs significantly worse. The authors suggest that this may be due to VALOR being trained on dynamic movie data while CLIP is trained on static images. However, this explanation remains speculative. More in-depth discussion is needed on the architectural and inductive biases of the two models, and how these may contribute to their differences in modeling brain responses.

      (2) The methods section lacks clarity regarding which layers of VALOR and CLIP were used to extract features for voxel-wise encoding modeling. A more detailed methodological description is necessary to ensure reproducibility and interpretability. Furthermore, discussion of the inductive biases inherent in these models - and their implications for brain alignment - is crucial.

      (3) A broader question remains insufficiently addressed: what is the purpose of visual-text alignment in the human brain? One hypothesis is that it supports the formation of abstract semantic representations that rely on no specific input modality. While VALOR performs well in voxel-wise encoding, it is unclear whether this necessarily indicates the emergence of such abstract semantics. The authors are encouraged to discuss how the computational architecture of VALOR may reflect this alignment mechanism and what implications it has for understanding brain function.

      (4) The current methods section does not provide enough details about the network architectures, parameter settings, or whether pretrained models were used. If so, please provide links to the pretrained models to facilitate reproducible science.

    3. Reviewer #2 (Public review):

      Summary:

      Fu and colleagues have shown that VALOR, a model of multimodal and dynamic stimulus features, better predicts brain responses compared to unimodal or static models such as AlexNet, WordNet, or CLIP. The authors demonstrated the robustness of their findings by generalizing encoding results to an external dataset. They demonstrated the models' practical benefit by showing that semantic mappings were comparable to another model that required labor-intensive manual annotation. Finally, the authors showed that the model reveals predictive coding mechanisms of the brain, which held a meaningful relationship with individuals' fluid intelligence measures.

      Strengths:

      Recent advances in neural network models that extract visual, linguistic, and semantic features from real-world stimuli have enabled neuroscientists to build encoding models that predict brain responses from these features. Higher prediction accuracy indicates greater explained variance in neural activity, and therefore a better model of brain function. Commonly used models include AlexNet for visual features, WordNet for audio-semantic features, and CLIP for visuo-semantic features; these served as comparison models in the study. Building on this line of work, the authors developed an encoding model using VALOR, which captures the multimodal and dynamic nature of real-world stimuli. VALOR outperformed the comparison models in predicting brain responses. It also recapitulated known semantic mappings and revealed evidence of predictive processing in the brain. These findings support VALOR as a strong candidate model of brain function.
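
      For readers unfamiliar with the approach, the core of a voxel-wise encoding model can be sketched as below. This is a generic illustration on simulated data, assuming closed-form ridge regression and Pearson correlation on held-out time points as the accuracy measure; it is not the authors' pipeline, and details such as hemodynamic delays and regularization tuning are deliberately left out. All names and sizes are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping features (time x features) to voxels (time x voxels)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson r between measured and predicted time courses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / np.where(denom == 0, 1.0, denom)

# Simulated stand-ins for stimulus features and brain responses.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 300, 100, 20, 50
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
true_weights = rng.normal(size=(n_features, n_voxels))
Y_train = X_train @ true_weights + rng.normal(scale=2.0, size=(n_train, n_voxels))
Y_test = X_test @ true_weights + rng.normal(scale=2.0, size=(n_test, n_voxels))

# Fit on training data, then score predictions on held-out data.
weights = fit_ridge(X_train, Y_train, alpha=10.0)
accuracy = voxelwise_correlation(Y_test, X_test @ weights)
print(f"mean held-out correlation across voxels: {accuracy.mean():.3f}")
```

      Comparing feature spaces such as those from VALOR, CLIP, AlexNet, or WordNet then amounts to repeating this fit with a different feature matrix and comparing the per-voxel accuracies.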

      Weaknesses:

      The authors argue that this modeling contributes to a better understanding of how the brain works. However, upon reading, I am less convinced about how VALOR's superior performance over other models tells us more about the brain. VALOR is a better model of the audiovisual stimulus because it processes multimodal and dynamic stimuli compared to other unimodal or static models. If the model better captures real-world stimuli, then I almost feel that it has to better capture brain responses, assuming that the brain is a system that is optimized to process multimodal and dynamic inputs from the real world. The authors could strengthen the manuscript if the significance of their encoding model findings were better explained.

      In Study 3, the authors show high alignment between WordNet and VALOR feature PCs. Upon reading the method together with Figure 3, I suspect that the alignment almost has to be high, given that the authors projected VALOR features onto Huth et al.'s PC space. Could the authors conduct non-parametric permutation tests, such as shuffling the VALOR features prior to mapping onto Huth et al.'s PC space, and then calculating the Jaccard scores? I imagine that the null distribution would be positively shifted. Still, I would be convinced if the alignment is higher than this shifted null distribution for each PC. If my understanding of this is incorrect, I suggest editing the relevant Method section (line 508) because this analysis was not easy to understand.
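      The permutation test requested above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the fixed projection vector `weights`, and the top-fraction binarization used to compute the Jaccard score are all hypothetical choices. In the actual analysis the mapping onto the reference PC space would presumably be refitted for each shuffle, which is what could shift the null distribution upward.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index between two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def top_mask(scores, frac=0.2):
    """Boolean mask marking the top `frac` of entries by value."""
    k = max(1, int(round(frac * scores.size)))
    thresh = np.sort(scores)[-k]
    return scores >= thresh

def permutation_test(features, weights, reference_pc, n_perm=1000, frac=0.2, seed=0):
    """Compare observed Jaccard overlap with a shuffled-feature null.

    features:     (n_items, n_dims) model features (hypothetical stand-in)
    weights:      (n_dims,) fixed projection onto one reference PC (assumption)
    reference_pc: (n_items,) scores of the same items on the reference PC
    """
    rng = np.random.default_rng(seed)
    ref_mask = top_mask(reference_pc, frac)
    observed = jaccard(top_mask(features @ weights, frac), ref_mask)

    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle rows so features no longer correspond to the right items.
        shuffled = features[rng.permutation(features.shape[0])]
        null[i] = jaccard(top_mask(shuffled @ weights, frac), ref_mask)

    # One-sided p-value with the usual +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value
```

      Running this once per PC and comparing `observed` with the corresponding `null` would indicate whether the reported alignment exceeds what the projection alone produces.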

      In Study 4, the authors show that individuals whose superior parietal gyrus (SPG) exhibited high prediction distance had high fluid cognitive scores (Figure 4C). I had a hard time believing that this was a hypothesis-driven analysis. The authors motivate the analysis by stating that "SPG and PCu have been strongly linked to fluid intelligence" (line 304). Did the authors conduct only two analyses (SPG with fluid intelligence and PCu with fluid intelligence), without relating other brain regions to other individual differences measures? Even if so, the authors should have reported the same r-value and p-value for PCu-fluid intelligence. If SPG-fluid intelligence indeed holds specificity in terms of statistical significance compared to all possible scenarios that were tested, is this a result one would reasonably expect, and could the authors explain the specificity? Also, the authors should explain why they considered fluid intelligence to be the proxy of one's ability to anticipate upcoming scenes during movie watching. I would have understood the rationale better if the authors had at least aggregated predictive scores for all brain regions that held significance into one summary statistic and found a significant correlation with the fluid intelligence measure.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, the authors aim to improve neural encoding models for naturalistic video stimuli by integrating temporally aligned multimodal features derived from a deep learning model (VALOR) to predict fMRI responses during movie viewing.

      Strengths:

      The major strength of the study lies in its systematic comparison across unimodal and multimodal models using large-scale, high-resolution fMRI datasets. The VALOR model demonstrates improved predictive accuracy and cross-dataset generalization. The model also reveals inherent semantic dimensions of cortical organization and can be used to evaluate the integration timescale of predictive coding.

      This study demonstrates the utility of modern multimodal pretrained models for improving brain encoding in naturalistic contexts. While not conceptually novel, the application is technically sound, and the data and modeling pipeline may serve as a valuable benchmark for future studies.

      Weaknesses:

      The overall framework of using data-driven features derived from pretrained AI models to predict neural response has been well studied and accepted by the field of neuroAI for over a decade. The demonstrated improvements in prediction accuracy, generalization, and semantic mapping are largely attributable to the richer temporal and multimodal representations provided by the VALOR model, not a novel neural modeling framework per se. As such, the work may be viewed as an incremental application of recent advances in multimodal AI to a well-established neural encoding pipeline, rather than a conceptual advance in modeling neural mechanisms.

      Several key claims are overstated or lack sufficient justification:

      (1) Lines 95-96: The authors claim that "cortical areas share a common space," citing references [22-24]. However, these references primarily support the notion that different modalities or representations can be aligned in a common embedding space from a modeling perspective, rather than providing direct evidence that cortical areas themselves are aligned in a shared neural representational space.

      (2) The authors discuss semantic annotation as if it is still a critical component of encoding models. However, recent advances in AI-based encoding methods rely on features derived from large-scale pretrained models (e.g., CLIP, GPT), which automatically capture semantic structure without requiring explicit annotation. While the manuscript does not systematically address this transition, it is important to clarify that the use of such pretrained models is now standard in the field and should not be positioned as an innovation of the present work. Additionally, the citation of Huth et al. (2012, Neuron) to justify the use of WordNet-based annotation omits the important methodological shift in Huth et al. (2016, Nature), which moved away from manual semantic labeling altogether.

      Since the 2012 dataset is used primarily to enable comparison in study 3, the emphasis should not be placed on reiterating the disadvantages of semantic annotation, which have already been addressed in prior work. Instead, the manuscript's strength lies in its direct comparison between data-driven feature representations and semantic annotation based on WordNet categories. The authors should place greater emphasis on analyzing and discussing the differences revealed by these two approaches, rather than focusing mainly on the general advantage of automated semantic mapping.

      (3) The authors use subject-specific encoding models trained on the HCP dataset to predict group-level mean responses in an independent in-house dataset. While this analysis is framed as testing model generalization, it is important to clarify that it is not assessing traditional out-of-distribution (OOD) generalization, where the same subject is tested on novel stimuli, but rather evaluating which encoding model's feature space contains more stimulus-specific and cross-subject-consistent information that can transfer across datasets.

      Within this setup, the finding that VALOR outperforms CLIP, AlexNet, and WordNet is somewhat expected. VALOR encodes rich spatiotemporal information from videos, making it more aligned with movie-based neural responses. CLIP and AlexNet are static image-based models and thus lack temporal context, while WordNet only provides coarse categorical labels with no stimulus-specific detail. Therefore, the results primarily reflect the advantage of temporally-aware features in capturing shared neural dynamics, rather than revealing surprising model generalization. A direct comparison to pure video-based models, such as Video Swin Transformers or other more recent video models, would help strengthen the argument.

      Moreover, while WordNet-based encoding models perform reasonably well within-subject in the HCP dataset, their generalization to group-level responses in the Short Fun Movies (SFM) dataset is markedly poorer. This could indicate that these models capture a considerable amount of subject-specific variance, which fails to translate to consistent group-level activity. This observation highlights the importance of distinguishing between encoding models that capture stimulus-driven representations and those that overfit to individual heterogeneities.

    1. eLife Assessment

      This important Research Advance builds on the authors' previous work delineating the roles of the rodent perirhinal cortex and the basolateral amygdala in first- and second-order learning. The convincing results show that serial exposure of non-motivationally relevant stimuli influences how those stimuli are encoded within the perirhinal cortex and basolateral amygdala when paired with a shock. This manuscript will be interesting for researchers in cognitive and behavioral neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      This study advances the lab's growing body of evidence exploring higher-order learning and its neural mechanisms. They recently found that NMDA receptor activity in the perirhinal cortex was necessary for integrating stimulus-stimulus associations with stimulus-shock associations (mediated learning) to produce preconditioned fear, but it was not necessary for forming stimulus-shock associations. On the other hand, basolateral amygdala NMDA receptor activity is required for forming stimulus-shock memories. Based on these facts, the authors assessed: (1) why the perirhinal cortex is necessary for mediated learning but not direct fear learning, and (2) the determinants of perirhinal cortex versus basolateral amygdala necessity for forming direct versus indirect fear memories. The authors used standard sensory preconditioning and variants designed to manipulate the novelty and temporal relationship between stimuli and shock and, therefore, the attentional state under which associative information might be processed. Under experimental conditions where information would presumably be processed primarily in the periphery of attention (temporal distance between stimulus/shock or stimulus pre-exposure), perirhinal cortex NMDA receptor activation was required for learning indirect associations. On the other hand, when information would likely be processed in focal attention (novel stimulus contiguous with shock), basolateral amygdala NMDA activity was required for learning direct associations. Together, the findings indicate that the perirhinal cortex and basolateral amygdala subserve peripheral and focal attention, respectively. The authors provide support for their conclusions using careful, hypothesis-driven experimental design, rigorous methods, and integrating their findings with the relevant literature on learning theory, information processing, and neurobiology. Therefore, this work will be highly interesting to several fields.

      Strengths:

      (1) The experiments were carefully constructed and designed to test hypotheses that were rooted in the lab's previous work, in addition to established learning theory and information processing background literature.

      (2) There are clear predictions and alternative outcomes. The provided table does an excellent job of condensing and enhancing the readability of a large amount of data.

      (3) In a broad sense, attention states are a component of nearly every behavioral experiment. Therefore, identifying their engagement by dissociable brain areas and under different learning conditions is an important area of research.

      (4) The authors clearly note where they replicated their own findings, report full statistical measures, effect sizes, and confidence intervals, indicating the level of scientific rigor.

      (5) The findings raise questions for future experiments that will further test the authors' hypotheses; this is well discussed.

      Weaknesses:

      As a reader, it is difficult to interpret how first-order fear could be impaired while preconditioned fear is intact; it requires a bit of "reading between the lines".

    3. Reviewer #2 (Public review):

      Summary:

      This paper continues the authors' research on the roles of the basolateral amygdala (BLA) and the perirhinal cortex (PRh) in sensory preconditioning (SPC) and second-order conditioning (SOC). In this manuscript, the authors explore how prior exposure to stimuli may influence which regions are necessary for conditioning to the second-order cue (S2). The authors perform a series of experiments which first confirm prior results shown by the author - that NMDA receptors in the PRh are necessary in SPC during conditioning of the first-order cue (S1) with shock to allow for freezing to S2 at test; and that NMDA receptors in the BLA are necessary for S1 conditioning during the S1-shock pairings. The authors then set out to test the hypothesis that the PRh encodes associations in a peripheral state of attention, whereas the BLA encodes associations in a focal state of attention, similar to the A1 and A2 states in Wagner's theory of SOP. To do this, they show that BLA is necessary for conditioning to S2 when the S2 is first exposed during a serial compound procedure - S2-S1-shock. To determine whether pre-exposure of S2 will shift S2 to a peripheral focal state, the authors run a design in which S2-S1 presentations are given prior to the serial compound phase. The authors show that this restores NMDA receptor activity within the PRh as necessary for the fear response to S2 at test. They then test whether the presence of S1 during the serial compound conditioning allows the PRh to support the fear responses to S2 by introducing a delay conditioning paradigm in which S1 is no longer present. The authors find that PRh is no longer required and suggest that this is due to S2 remaining in the primary focal state.

      Strengths:

      As with their earlier work, the authors have performed a rigorous series of experiments to better understand the roles of the BLA and PRh in the learning of first- and second-order stimuli. The experiments are well-designed and clearly presented, and the results show definitive differences in functionality between the PRh and BLA. The first experiment confirms earlier findings from the lab (and others), and the authors then build on their previous work to more deeply reveal how these regions differ in how they encode associations between stimuli. The authors have done a commendable job of pursuing these questions.

      Table 1 is an excellent way to highlight the results and provide the reader with a quick look-up table of the findings.

      Weaknesses:

      The authors have attempted to resolve the question of the roles of the PRh and BLA in SPC and SOC, which the authors have explored in previous papers. Laudably, the authors have produced substantial results indicating how these two regions function in the learning of first- and second-order cues, providing an opportunity to narrow in on possible theories for their functionality. Yet the authors have framed this experiment in terms of an attentional framework and have argued that the results support this particular framework and hypothesis - that the PRh encodes peripheral and the BLA encodes focal states of learning. This certainly seems like a viable and exciting hypothesis, yet I don't see why the results have been completely framed and interpreted this way. It seems to me that there are still some alternative interpretations that are plausible and should be included in the paper.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript presents a series of experiments that further investigate the roles of the BLA and PRH in sensory preconditioning, with a particular focus on understanding their differential involvement in the association of S1 and S2 with shock.

      Strengths:

      The motivation for the study is clearly articulated, and the experimental designs are thoughtfully constructed. I especially appreciate the inclusion of Table 1, which makes the designs easy to follow. The results are clearly presented, and the statistical analyses are rigorous. My comments below mainly concern areas where the writing could be improved to help readers more easily grasp the logic behind the experiments.

      Weaknesses:

      (1) Lines 56-58: The two previous findings should be more clearly summarized. Specifically, it's unclear whether the "mediated S2-shock" association occurred during Stage 2 or Stage 3. I assume the authors mean Stage 2, but Stage 2 alone would not yet involve "fear of S2," making this expression a bit confusing.

      (2) Line 61: The phrase "Pavlovian fear conditioning" is ambiguous in this context. I assume it refers to S1-shock or S2-shock conditioning. If so, it would be clearer to state this explicitly.

      (3) Regarding the distinction between having or not having Stage 1 S2-S1 pairings, is "novel vs. familiar" the most accurate way to frame this? This terminology could be misleading, especially since one might wonder why S2 couldn't just be presented alone on Stage 1 if novelty is the critical factor. Would "outcome relevance" or "predictability" be more appropriate descriptors? If the authors choose to retain the "novel vs. familiar" framing, I suggest providing a clear explanation of this rationale before introducing the predictions around Line 118.

      (4) Line 121: This statement should refer to S1, not S2.

      (5) Line 124: This one should refer to S2, not S1.

      (6) Additionally, the rationale for Experiment 4 is not introduced before the Results section. While it is understandable that Experiment 4 functions as a follow-up to Experiment 3, it would be helpful to briefly explain the reasoning behind its inclusion.

    1. eLife Assessment

      This manuscript describes the identification and characterization of 12 specific phosphomimetic mutations in the recombinant full-length human tau protein that trigger tau to form fibrils. This fundamental study will allow in vitro mechanistic investigations. The presented evidence is convincing. This manuscript will be of interest to all scientists in the amyloid formation field.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      The very well-written manuscript by Lövestam et al. from the Scheres/Goedert groups entitled "Twelve phosphomimetic mutations induce the assembly of recombinant full-length human tau into paired helical filaments" demonstrates the in vitro production of the so-called paired helical filament Alzheimer's disease (AD) polymorph fold of tau amyloids through the introduction of 12 point mutations that attempt to mimic the disease-associated hyper-phosphorylation of tau. The presented work is very important because it enables disease-related scientific work, including seeded amyloid replication in cells, to be performed in vitro using recombinant-expressed tau protein.

      Comments on revised version:

      The manuscript is significantly improved, as also indicated by Reviewer 2, with the 100% formation of the PHF and the additional experiments to elucidate the potential mechanism of the PTMs. This is great work.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript addresses an important impediment in the field of Alzheimer's disease (AD) and tauopathy research by showing that 12 specific phosphomimetic mutations in full-length tau allow the protein to aggregate into fibrils with the AD fold and the fold of chronic traumatic encephalopathy fibrils in vitro. The paper presents comprehensive structural and cell-based seeding data indicating the improvement of their approach over previous in vitro attempts on non-full-length tau constructs. The main weakness of this work results from the fact that only up to 70% of the tau fibrils form the desired fibril polymorphs. In addition, some of the figures are of low quality and confusing.

      Strengths:

      This study provides significant progress towards a very important and timely topic in the amyloid community, namely the in vitro production of tau fibrils found in patients.

      The 12 specific phosphomimetic mutations presented in this work will have an immediate impact in the field since they can be easily reproduced.

      Multiple high-resolution structures support the success of the phosphomimetic mutation approach.

      Additional data show the seeding efficiency of the resulting fibrils, their reduced tendency to bundle, and their ability to be labeled without affecting core structure or seeding capability.

      Comments on revised version:

      Generally, I am satisfied with the revisions. Specifically, the new results showing 100% formation of PHF are a significant improvement.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary and Strengths:

      The very well-written manuscript by Lövestam et al. from the Scheres/Goedert groups entitled "Twelve phosphomimetic mutations induce the assembly of recombinant full-length human tau into paired helical filaments" demonstrates the in vitro production of the so-called paired helical filament Alzheimer's disease (AD) polymorph fold of tau amyloids through the introduction of 12 point mutations that attempt to mimic the disease-associated hyper-phosphorylation of tau. The presented work is very important because it enables disease-related scientific work, including seeded amyloid replication in cells, to be performed in vitro using recombinant-expressed tau protein.

      Weaknesses: 

      The following points are asked to be addressed by the authors:

      (i) In the discussion it would be helpful to note the findings that in AD the chemical structure of tau (including phosphorylation) is what defines the polymorph fold and not the buffer/cellular environment. It would be further interesting to discuss these findings with respect to the relationship between disease and structure. The presented findings suggest that due to a cellular/organismal alteration, such as aging or Abeta aggregation, tau is specifically hyper-phosphorylated, which then leads to its aggregation into the paired helical filaments that are associated with AD.

      We have added an extra sentence to the Introduction to emphasise this possibility: “Besides the cellular environment in which they assemble, different tau folds may also be determined by chemical modifications of tau itself.”

      In addition, the last paragraph of the Discussion now reads: “It could be that, besides different cellular environments in which the filaments assemble, different posttranslational modification patterns are also important for the assembly of tau into protofilament folds that are specific for the other tauopathies.”

      (ii) The conditions used for each assembly reaction are a bit hard to keep track of and somewhat ambiguous. In order to help the reader, I would suggest making a table to show conditions used for each type of assembly (including the diameter / throw of the orbital shaker) and the results (structural/biological) of those conditions. For example, presumably the authors did not have ThT in the samples used for cryo-EM but the methods section does not specify this. Also, the presence of trace NaCl is proposed as a possible cause for the CTE fold to appear in the 0N4R sample (page 4), but no explanation is given of why this particular sample would have more NaCl than the others. Furthermore, it appears that NaCl was actually used in the seeded assembly reactions that produced the PHF and not the CTE fold. This would seem to indicate the CTE structure of 0N4R PAD12 is not actually induced by NaCl (like it was for tau297-391). In order for the reader to better understand the reproducibility of the polymorphs, it would be helpful to indicate in how many different conditions and how many replicates with new protein preparations each polymorph was observed (could be included in the same table).

      We have added a new table (Table 1) with the buffer conditions, protein concentration and shaking speed and time, for all structures described in this paper. We never added ThT to assembly reactions that were used for cryo-EM.

      We did not use NaCl in the seeded assembly reactions (we used sodium citrate). We don’t really know why 0N4R PAD12 tau more readily forms the CTE fold. The observation that it does so prompted us to use 0N3R for all ensuing experiments. 

      (iii) It is not clear how the authors calculate the percentage of each filament type. In Figure 1 it is stated "discarded solved particles (coloured) and discarded filaments in grey" which leaves the reviewer wondering what a "discarded solved particle" is and which filaments were discarded. From the main text one guesses that the latter is probably false positives from automated picking but if so, these should not be referred to as filaments. Also, are the percentages calculated for filaments or segments? In any case, it would be more helpful in such a report to know the best estimate of the ratio of identified filament types without confusing the reader with a measure of the quality of the picking algorithm. Please clarify. Also, a clarification is asked for the significance of the varying degrees of PHF and AD monomer filaments in the various assembly conditions. It could be expected that there is significant variability from sample to sample but it would be interesting to know if there has been any attempt to reproduce the samples to measure this variability. If not, it might be worth mentioning so that the % values are taken with an appropriately sized grain of salt. Finally, the representation of the data in Figure 1 would seem to imply that the 0N3R forms less or no monofilament AD fold because no cross-section is shown for this structure, however it is very similar to (or statistically the same as) the 1:1 mix of 0N3R:0N4R.

      In the revised manuscript, we have used bi-hierarchical clustering of filaments, where each segment (or particle) is classified based on both 2D class assignment and to which filament it belongs (this method is based on [Porthula et al (2019), Ultramicroscopy 203, 132-138] and was further developed in [Lövestam et al (2024) Nature 7993, 119-125]). Based on the assumption that filament type does not change within a single filament, we have observed that this gives excellent classification results, and that this approach allows classification of many, even small minority, filament types. Using this approach, we now quantify the different filament types based on the number of segments extracted from filaments classified in this way.
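      As a rough illustration of segment-based quantification, the sketch below assigns each filament a single type by majority vote over its segments' class labels and then reports the fraction of segments per type. It is a deliberate simplification and not the bi-hierarchical clustering method cited above; the function name and the `(filament_id, class_label)` input format are assumptions made for the example.

```python
from collections import Counter, defaultdict

def filament_type_fractions(segments):
    """Fraction of segments per filament type.

    segments: iterable of (filament_id, class_label) pairs, one per segment.
    Each filament is assumed to have one type along its whole length, so all
    of its segments are counted under its majority label (hypothetical rule).
    """
    by_filament = defaultdict(list)
    for filament_id, label in segments:
        by_filament[filament_id].append(label)

    totals = Counter()
    for labels in by_filament.values():
        majority_label, _ = Counter(labels).most_common(1)[0]
        totals[majority_label] += len(labels)

    n_segments = sum(totals.values())
    return {label: count / n_segments for label, count in totals.items()}

# Example: one filament with a stray mislabelled segment, one singlet filament.
example = [("f1", "PHF")] * 3 + [("f1", "singlet")] + [("f2", "singlet")] * 2
fractions = filament_type_fractions(example)
```

      Counting segments rather than filaments weights each type by filament length, which is the distinction the reviewer asked about.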

      Moreover, we have also addressed the problem of having singlets among the PHF preparation: it turns out that when we wait longer, simply transferring samples out of the shaker after one week and incubating them quiescently at 37 ºC for two more weeks, the singlets disappear and only PHFs remain. Filaments made for the fluorophore labelling in the revised Figure 3 were also prepared using the new protocol. In total, we have N=7 replicates with a mean of 95.3% PHFs and a standard deviation of 9.4%. The revised text in the Results section reads:

      “To further increase the proportions of PHFs-to-singlet ratio, we removed the plate from the shaker after one week and incubated it quiescently at 37 ºC for two more weeks. This resulted in 100% PHFs formed (Figure 1 – figure supplement 4). When repeated seven times, on average 95.3% PHFs formed, with 25% of singlets formed in a single outlier (Figure 1 – figure supplement 5)” 

      (iv) The interpretation of the NMR data on soluble tau that the mutations on the second site are suppressing in part long range dynamic interaction around the aggregation-initiation site (FIA) is sound. It is in particular interesting to find that the mutations have a similar effect as the truncation at residue 391. An additional experiment using solvent PREs to elaborate on the solvent exposed sequence-resolved electrostatic potential and the intra-molecular long range interactions would likely strengthen the interpretation significantly (Iwahara, for example, Yu et al, in JACS 2024). Figure 6D Figure supplement shows the NMR cross peak intensities between tau 151-391 and PAD12tau151-391. Overall the intensities of the PAD12 tau construct are more intense which could be interpreted with less conformational exchange between long range dynamic interactions. There are however several regions which do not show any intensity anymore when compared with the corresponding wildtype construct such as 259-262, 292-294 which should be discussed/explained.

      While long-range intramolecular interactions of tau have previously been reported through the use of spin labels (Mukrasch et al 2009 PLoS Biol 7(2): e1000034), we have been hesitant to introduce paramagnetic agents into our samples for two reasons. First, the bulky size of the spin label may affect filament formation or influence the dynamic properties of the protein. Second, covalent addition of the spin label requires mutation of the primary sequence to both remove native cysteine residues and add cysteines at the desired label location. We have previously shown that mutation of cysteine 322 to alanine leads to the formation of tau filaments with a structure that is different from the PHF (Santambrogio et al (2025) bioRxiv 2025.03.29.646137). 

      Instead, we have included in the revised manuscript new NMR and cryo-EM data that provide further support for the model that a FIA-like interaction between residues <sub>392</sub>IVYK<sub>395</sub> and residues <sub>306</sub>VQIVYK<sub>311</sub> has an inhibiting effect on filament nucleation in unmodified full-length tau. A mutant of tau297-441 where residues <sub>392</sub>IVYK<sub>395</sub> have been deleted and that does not contain the four PAD12 mutations in the carboxy-terminal domain behaves similarly in the NMR experiment as the tau297-441 construct with those four PAD12 mutations. Moreover, full-length 0N3R tau with the eight PAD12 mutations in the amino-terminal fuzzy coat and with the deletion of <sub>392</sub>IVYK<sub>395</sub>, but without the four PAD12 mutations in the carboxy-terminal domain, assembles readily into amyloid filaments (of which we also solved a cryo-EM structure, see the revised Figure 6B). These observations provide mechanistic insights into the previously proposed paper-clip model [Jeganathan (2008), J Biol Chem 283, 32066-32076], where interactions between the fuzzy coat inhibit filament formation of unmodified full-length tau, and phosphorylation in the fuzzy coat interferes with these interactions, thus leading to filament nucleation. Of course, the identification of residues <sub>392</sub>IVYK<sub>395</sub> for this interaction also explains why truncation of tau at residue 391 leads to spontaneous assembly. We have introduced a new Figure 7 to the revised manuscript to explain this model in more detail. The corresponding new section in the Results reads:

      “To investigate this further, we also tested a tau construct comprising residues tau297-441 without the phosphomimetic mutations, but with a deletion of residues (Δ392-395). Filaments formed rapidly and the cryo-EM structure showed that the ordered core consisted of the amino-terminal part of the construct spanning residues 297-318 (Figure 6B). NMR analysis (Figure 6 – figure supplement 5B) showed that the tau297-441 Δ392-395 construct exhibited similar backbone rigidity properties to the tau297-441 PAD12 construct, despite peak locations and local secondary structural propensities being more similar to the wildtype tau297-441 (Figure 6 – figure supplement 5A; Figure 6 – figure supplement 6). HSQC peak intensities in the 297-319 and 392-404 regions of tau297-441 Δ392-395 (Figure 6A, expanded from Figure 6 - figure supplement 5C) were like those in the tau297-441 PAD12. These data suggest that the IVYK deletion has a similar effect as the phosphomimetics on residues 396, 400, 403 and 404 on disrupting an intra-molecular interaction between the FIA core region and the carboxy-terminal domain, which may therefore be mediated by interactions between the two IVYK motifs that are similar to those observed in the FIA (Lövestam et al, 2024).”

      A new section in the Discussion now reads:

      “Our NMR data provide insights into the mechanism by which phosphorylation in the fuzzy coat of tau, or truncations of tau, lead to the formation of filaments with ordered cores of residues that are themselves not phosphorylated. HSQC peak intensity differences between unmodified tau 297-441, PAD12 tau 297-441 and tau297-391 suggest that phosphorylation of the fuzzy coat, particularly near the <sub>392</sub>IVYK<sub>395</sub> motif in the carboxy-terminal domain, affects the conformation of the residues of tau that become ordered in the FIA (Lövestam et al., 2024). Removal of residues <sub>392</sub>IVYK<sub>395</sub> in the carboxy-terminal domain of tau 297-441 led to rapid filament formation in the absence of phosphomimetics, while HSQC peak intensity differences for this construct indicate similar backbone rigidity compared to tau 297-441 without the deletion, but with the four PAD12 mutations in the carboxy-terminal domain. Combined, these observations support a model where the <sub>392</sub>IVYK<sub>395</sub> motif in unmodified full-length tau monomers interacts with the <sub>308</sub>IVYK<sub>311</sub> motif, thus inhibiting filament formation by preventing the formation of the nucleating species, the FIA. Phosphorylation of nearby residues 396, 400, 403 and 404, or truncation at residue 391, disrupt this interaction and lead to filament formation. This model agrees with the previously proposed hairpin-like model of tau (Jeganathan et al., 2008), although the corresponding interaction between the amino-terminal domain of tau and the core-forming region remains unknown (Figure 7).”

      Due to the challenging nature of the assignment, it was not possible to assign all residues in the HSQC of the tau151-391 and the PAD12 tau151-391 samples, including residues 259-262 and 292-294 for PAD12 tau151-391. To make this clearer, we have marked residues that are not assigned with an asterisk in the revised version of Figure 6 – figure supplement 1.  

      (v) Concerning the Cryo-EM data from the different hyper-phosphorylation mimics, it would seem that the authors could at least comment on the proportion of monofilament and paired-filaments even if they could not solve the structures. Nonetheless, based on their previous publications, one would also expect that they could show whether the non-twisted filaments are likely to have the same structure (by comparing the 2D classes to projections of non-twisted models). Also, it is very interesting to note that the twist could be so strongly controlled by the charge distribution on the non-structured regions (and may also be related to the work by Mezzenga on twist rate and buffer conditions). Is the result reported in Figure 2 a one-off case or was it also reproducible?

      As also indicated in the main text, the assembly conditions for the PAD12+4, PAD12-4 and PAD12+/-4 constructs were kept the same as those for the PAD12 construct. It is possible that further optimisation of the conditions could again lead to twisting filaments, but we chose not to pursue this route. With unlimited resources and time, one could assess in detail which of the PAD12 mutations are required and which ones could be omitted to form PHFs. However, this would require a lot of work and cryo-EM time. For now, we chose to prioritise reporting conditions that do work to reproducibly make PHFs in the laboratory (using the PAD12 construct) and leave the more detailed analysis of other constructs for future studies. 

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript addresses an important impediment in the field of Alzheimer's disease (AD) and tauopathy research by showing that 12 specific phosphomimetic mutations in full-length tau allow the protein to aggregate into fibrils with the AD fold and the fold of chronic traumatic encephalopathy fibrils in vitro. The paper presents comprehensive structural and cell-based seeding data indicating the improvement of their approach over previous in vitro attempts on non-full-length tau constructs. The main weakness of this work results from the fact that only up to 70% of the tau fibrils form the desired fibril polymorphs. In addition, some of the figures are of low quality and confusing. 

      As also explained in our response to reviewer #1, we have performed better quantification of filament types in the revised manuscript, and we have investigated how to get rid of the singlets. In the revised manuscript, we report that singlets disappear as time passes and that one can obtain 100% pure PHFs by quiescently incubating samples for another two weeks, after shaking for a week.

      Strengths: 

      This study provides significant progress towards a very important and timely topic in the amyloid community, namely the in vitro production of tau fibrils found in patients.

      The 12 specific phosphomimetic mutations presented in this work will have an immediate impact in the field since they can be easily reproduced.

      Multiple high-resolution structures support the success of the phosphomimetic mutation approach. Additional data show the seeding efficiency of the resulting fibrils, their reduced tendency to bundle, and their ability to be labeled without affecting core structure or seeding capability.

      Weaknesses: 

      Despite the success of making full-length AD tau fibrils, still ~30% of the fibrils are either not PHF, or not accounted for. A small fraction of the fibrils are single filaments and another ~20% are not accounted for. The authors mention that ~20% of these fibrils were not picked by the automated algorithm. However, it would be important to get additional clarity about these fibrils. Therefore, it would improve the impact of the paper if the authors could manually analyze passed-over particles to see if they are compatible with PHF or fall into a different class of fibrils. In addition, it would be helpful if the authors could comment on what can be done/tried to get the PHF yield closer to 90-100%.

      As mentioned above, in the revised manuscript we show that the singlets disappear over time and we now include a description of a method that leads to 100% PHF formation.

      Reviewer #1 (Recommendations for the authors):

      Minor points: 

      (a) In Figure 6 the dashed purple vertical lines overlap with the black bars, rendering a grey color, which is confusing because grey bars are used for the shorter construct. It is suggested to improve the colors (remove transparency on the purple?)

      We thank the reviewers for their suggestions for improving the visualisation of our data. We have recoloured the tau297-391 data from grey to gold and moved the dashed lines to the back of the image to remove the apparent colour changes.

      (b) Is there any support for the suggestion that "part of the second microtubule-binding repeat is ordered" being "related to this construct forming filaments with only a single protofilament"? It seemed to have come out of nowhere.

      There is no further support for this statement, but we thought it would be worth hypothesizing about this observation. 

      (c) Figures 1 and 4 E is better described as a "main chain trace" or "backbone trace" although the latter usually refers to only CA positions. Ribbon usually refers to something else in representations of protein structures. 

      This has been changed into “main chain trace” in Figures 1 and 4. 

      (d) Figure 1 Supplement 3: Panel letters in the legend do not match. 

      This has been fixed.

      Reviewer #2 (Recommendations for the authors): 

      The introduction is a bit lengthy (e.g. 3rd paragraph of introduction) and could benefit from focusing on the specific question the manuscript addresses. 

      We have shortened the Introduction. It now contains ~1150 words, which we hope provides a better compromise between length and sufficient background information.

      Figure captions are generally not helpful in conveying a message to the reader.

      Figure 1 - figure supplement 3 is quite confusing. The 4 structures in A) do not correspond to the grids in B-E. What is this figure supposed to show?

      This confusion was probably the result of incorrect labelling of panels in the legend, which was also pointed out by reviewer #1. This has been fixed in the revised manuscript.

      Page 11: Although I know what you mean, 'linear increase of ThT fluorescence' is not the correct term. 

      We have replaced “linear” with “rapid”.

      Page 15: Although line shape and peak intensity can be related you are not reporting on line shape or width but simply on peak intensity. Therefore, I wouldn't talk about the result of a 'line shape analysis'.

      We have changed the wording accordingly. 

      Figure 6 (and supplement 1) are confusing and too small to be readable in print. It might be sufficient to show the CSP and upload the remaining data to the BMRB. 

      We have made a clearer version of the main NMR Figure 6 in the revised manuscript showing the most pertinent NMR data and have moved the previous version into the figure supplements. We designed these figures to be viewed as full page A4 panels, ideally seen in one image as they show multiple comparisons of different experiments and constructs.

      As such we feel these will be best viewed on screen as part of the eLife web document. We have uploaded HSQC spectra and assignments to the BMRB (see below).

      Figure 6 supplement 3 might benefit from pointing out key residues in the overlay.

      We have added the labels (this is now Figure 6 supplement 4).

      Data availability: Please upload the assignments to the BMRB together with key spectra (e.g. HSQCs). 

      We have uploaded HSQC data along with our assignments to the BMRB. The accession codes are 52694 – tau297-441 wt; 52695 – tau297-441 PAD-12; 52696 – tau151-391 wt; 52697 – tau151-391 PAD-12; and 53230 – tau297-441 delta392-395. These accession codes have been added to the manuscript. 

      The quality of some of the figures (specifically Figure 1 - supplement 3 and Figure 6) is not suitable for publication. 

      For the original submission to bioRxiv, we produced a single PDF with a manageable file size. We will liaise with the eLife staff to ensure the images used in the version of record will be suitable for publication.

    1. eLife Assessment

      This important work presents a stochastic branching process model of tumour-immune coevolution, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this model to investigate how tumour-immune interactions influence tumour outcome and the summary statistics of sequencing data of bulk and single-cell sequencing of a tumour. The evidence is compelling and the work will be of interest to cancer-immune biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      The topic of tumor-immune co-evolution is an important, understudied topic with, as the authors noted, a general dearth of good models in this space. The authors have made important progress on the topic by introducing a stochastic branching process model of antigenicity / immunogenicity and measuring the proportion of simulated tumors which go extinct. The model is extensively explored and authors provide some nice theoretical results in addition to simulated results, including an analysis of increasing cancer/immune versus cyclical cancer/immune dynamics. The analysis appropriately builds upon the foundation of other work in the field of predicting site frequency spectrum, but extends the results into cancer-immune co-evolution in an intuitive computational framework.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The topic of tumor-immune co-evolution is an important, understudied topic with, as the authors  noted, a general dearth of good models in this space. The authors have made important progress on the topic by introducing a stochastic branching process model of antigenicity/immunogenicity and measuring the proportion of simulated tumors that go extinct. The model is extensively explored, and the authors provide some nice theoretical results in addition to simulated results. 

      We thank the reviewer for the positive comments on our work.

      Major comments 

      The text in lines 183-191 is intuitively and nicely explained. However, I am not sure all of it follows from the figure panels in Figure 2. For example, the authors refer to a mutation that has a large immunogenicity, but it's not shown how many mutations, or the relative size of the mutations in Figure 2. The same comment holds true for the claim that spikes also arise for mutations with low antigenicity. 

      We thank the reviewer for helping us to further specify this statement in our original submission. We have now added Muller plots in a new Appendix Figure (Figure A3) presenting the relative abundances of different types of effector cells in the population over time. Each effector type is colour-coded with its antigenicity and immunogenicity. To align with this Appendix Figure (Figure A3), we also updated our Figure 2, which is now generated under the same realisation as Figure A3. We can now see clearly that the spikes in the mean values of the antigenicity and immunogenicity over the whole effector populations in the new Figure 2B&2D indeed correspond to the expansion of single or several antigenic mutations recruiting the specific effector cell types. For example, in Figure 2B, we can see that the spikes of low average antigenicity and high immunogenicity (around time 11) happen at the same time as an effector type in Figure A3 with such a trait (coloured in green) arises and takes over the population. We have rewritten the related part of our Results section (Line 192 - Line 222 in the main text and Appendix A6).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors developed a model of tumour-immune dynamics, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this  model to investigate how tumour-immune interactions influence tumour outcome and summary  statistics of sequencing data. 

      Strengths: 

      This novel modeling framework addresses an important and timely topic. The authors consider the useful question of how bulk and single-cell sequencing may provide insights into the tumour-immune interactions and selection processes. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      One set of conclusions presented in the paper is the presence of cyclic dynamics between effector/cancer cells, antigenicity, and immunogenicity. However, these conclusions are supported in the manuscript by two sample trajectories of stochastic simulations, and these provide mixed support for the conclusions (i.e. the phasing asynchrony described in the text does not seem to apply to Figure 2C). 

      We have now developed a method to quantify the cyclic dynamics in our system (Appendix A7), in which we track the directional changes in the phase portrait of the abundances of the cancer and effector cells. We first tested this method in a non-evolving stochastic predator-prey system, where it correctly captures the number of cycles (Figure A7). We then used this method to quantify the number of cycles observed between cancer and effector cells under different mutation rates (Figure A5), as well as whether they are counter-clockwise or clockwise cycles (Figure A6). Our results showed that cyclic dynamics are observed more often when mutation rates are higher, and that the majority of those cycles are counter-clockwise. When the mutation rate is high, we observe an increase in clockwise cycles, which have been observed in predator-prey systems and explained through coevolution. However, even under high mutation rates, counter-clockwise cycles are still the more frequent type. 

      In our simulations, we rarely observed out-of-phase cycles, which were present by chance in our original Figure 2. We have now removed the statement about out-of-phase cycles and replaced it with a more systematic analysis of the cyclic dynamics, as described above (Line 192 to 207 in the revised version). We thank the reviewer for the constructive comment, which motivated us to improve our analysis significantly. 
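
      The idea of counting cycles from directional changes in the phase portrait can be illustrated with a minimal sketch. This is not the authors' Appendix A7 method, which is not reproduced here; it assumes, purely for illustration, that a cycle is one full rotation of the (cancer, effector) trajectory around its centroid, and the function name `count_cycles` and the toy circular trajectory are hypothetical.

```python
import math


def count_cycles(cancer, effector):
    """Count full rotations of a 2D trajectory around its centroid.

    Returns (counter_clockwise, clockwise). Assumption: one cycle is one
    complete 2*pi rotation in the (cancer, effector) phase plane.
    """
    cx = sum(cancer) / len(cancer)
    cy = sum(effector) / len(effector)
    # Angle of each point as seen from the centroid.
    angles = [math.atan2(y - cy, x - cx) for x, y in zip(cancer, effector)]

    total = 0.0  # cumulative (unwrapped) rotation angle
    mark = 0.0   # angle at which the last full cycle was counted
    ccw = cw = 0
    for a0, a1 in zip(angles, angles[1:]):
        # Wrap each step into [-pi, pi) so jumps across the branch cut vanish.
        step = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
        total += step
        while total - mark >= 2 * math.pi:
            ccw += 1
            mark += 2 * math.pi
        while total - mark <= -2 * math.pi:
            cw += 1
            mark -= 2 * math.pi
    return ccw, cw


# Toy trajectory: 3.25 counter-clockwise turns around a circle.
steps = 400
ts = [6.5 * math.pi * i / steps for i in range(steps + 1)]
toy_cancer = [10 + math.cos(t) for t in ts]
toy_effector = [5 + math.sin(t) for t in ts]
print(count_cycles(toy_cancer, toy_effector))
```

      For noisy stochastic trajectories, a rotation-based count like this would likely need smoothing or a minimum-amplitude threshold, which is left out of the sketch.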

      Similarly, the authors also find immune selection effects on the shape of the mutational burden in Figure 5 D/H using a qualitative comparison between the distributions and theoretical predictions in  the absence of immune response. However the discrepancy appears quite small in panel D, and  there are no quantitative comparisons provided to evaluate the significance. An analysis of the robustness of all the conclusions to parameter variation is missing. 

      We have now added a statistical analysis using the Wasserstein distance between the simulated mutation burden distribution and the theoretical (neutral) expectation in Figure 5 C, D, G, H, as well as in Figure A11 C&D when there is no cancer-immune interaction. The measured Wasserstein distances agree with our statement that higher immune effectiveness leads to larger deviation from the neutral expectation.
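
      For readers unfamiliar with the metric, the one-dimensional Wasserstein (earth mover's) distance between two empirical distributions equals the area between their cumulative distribution functions. The sketch below is a generic illustration, not the authors' analysis code; it assumes that both the simulated mutation burdens and the neutral expectation are available as samples, and the function name `wasserstein_1d` and the example values are hypothetical.

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """First-order Wasserstein distance between two 1D empirical samples.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and
    F_b are the empirical cumulative distribution functions.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    # Both CDFs are constant between consecutive pooled values.
    points = sorted(set(a) | set(b))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance


# Hypothetical mutation burdens per cell: simulated vs. neutral reference.
simulated = [12, 15, 15, 18, 21, 25]
neutral = [10, 12, 14, 15, 17, 19]
print(wasserstein_1d(simulated, neutral))
```

      A distance of zero means the two empirical distributions coincide, and a pure shift of every value by a constant yields exactly that constant, which makes the metric easy to interpret in units of mutations.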

      Lastly, the role of the Appendix results in the main messages of the paper is unclear. 

      We agree with the reviewer and have now removed the Appendix section “Deterministic Analysis”. 

      Reviewing Editor Comments: 

      I find the abstract too long. For example, "Knowledge of this coevolutionary system and the selection taking place within it can help us understand tumour-immune dynamics both during tumorigenesis but also when treatments such as immunotherapies are applied." can be shortened to: "Knowledge of this coevolutionary system can help us understand tumour-immune dynamics both during tumorigenesis and during immunotherapy treatments." 

      We agree and have taken the suggestion of the reviewer to shorten our abstract.

      Reviewer #1 (Recommendations for the authors): 

      The discussion at lines 134-140, centered around Figure A1, is an important and nicely constructed feature of the model. 

      Reviewer #2 (Recommendations for the authors): 

      I suggest that the authors conduct a more in-depth analysis of their conclusions on cyclic dynamics over a large set of sample paths.

      Done and please see our detailed response to the reviewer 2 above.

      In addition, statistical comparisons between the observed mutational burden distribution and  theoretical predictions in the absence of immune selection should be carried out to support their conclusions. In all cases, conclusions should be tested extensively for robustness/sensitivity to parameters. 

      Done and please see our detailed response to the reviewer 2 above.

      Here are some specific suggestions/comments: 

      (1) Please provide a precise mathematical description of the model to complement Figure 1. 

      We have significantly revised our “Model” section to provide a precise mathematical description of our model (Line 138 - 148). Please also see our document showing the difference between the revised version and original submission.

      (2) Section on "Interactions dictate outcome of tumour progress" and Figure 3: please define 'tumour outcome' - are the heatmaps produced in Figure 3 tumor size reflecting whether or not the population has reached level K before a particular time? Also, I do not see a definition for the 'slow-growing' tumour proportion plotted in Figure 3CF or in the accompanying text. 

      We have now added the definition of “tumour outcome” in our “Model” section (line 171 to 176), where we explain our model parameters and quantities measured in the following “Results” section.

      (3) Figure 5C/G: the green dotted vertical line is difficult to see. 

      We have now changed the mean of the simulations to solid red lines instead of using the green dotted vertical lines previously.

      (4) Appendix A1 text under (A2) should U/N be U/C? N does not appear to be defined. 

      We have now removed the previous A1 section. Please see our response to reviewer 2 as well.

      (5) Text under (A5): it is unclear what is meant by "SFS must be heavy tailed (that is, more heterogeneous)" -- a more precise statement regarding tail decay rate and associated consequences would be more helpful. 

      We have now removed the previous A section, which contained the original text "...SFS must be heavy-tailed".

      (6) Section A4 and Figure A1: can these calculations be compared to simulations? 

      We have now removed the previous A section on the deterministic analysis, as it is indeed not very relevant to our stochastic simulations. Please see our response to reviewer 2 as well.

      (7) Also, in general, please clarify how the results in the Appendix are used in the main text conclusions or provide insights relevant to these conclusions. If they are not, one can consider removing them.  

      We have now removed the previous A section on the deterministic analysis. The remaining sections are about stochastic simulations and extended figures which support our main figures.

      (8) Figure A2: the two lines are difficult to tell apart on each panel. Please consider different styles.

      We have changed one of the dotted lines to be solid. This figure is now Figure A1 in our revision.

    1. eLife Assessment

      This important study introduces a new class of spectrally tunable, dye-based calcium sensors optimized for imaging in organelles with high calcium concentrations, such as the endoplasmic reticulum and mitochondria. The experimental evidence supporting the applicability of these sensors is convincing, with thorough validation in cultured cells and neurons. The work will be of high interest to researchers studying calcium signaling dynamics in subcellular compartments.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Moret et al. details the development and characterisation of novel ER- and mitochondria-targeted genetically encoded chemogenic Ca2+ sensors.

      Strengths:

      Compared to existing probes, these sensors exhibited superior responsiveness, brightness, and photostability within the red and far-red emission spectrum, enabling triple compartment Ca2+ measurements (ER, mitochondria, cytosol) and the detection of Ca2+ dynamics in axons and dendrites.

      Weaknesses:

      The data are robust and convincing, although the manuscript text lacks precision.

    3. Reviewer #2 (Public review):

      Summary:

      Moret et al. present an engineered family of fluorescent calcium indicators based on HaloCamp, a HaloTag-based sensor system that utilizes Janelia Fluorophores (JF dyes) to report calcium dynamics. By introducing single or multiple amino acid substitutions, the authors reduce HaloCamp's calcium affinity, making these low-affinity variants well-suited for imaging calcium transients in high-calcium environments such as the endoplasmic reticulum (ER) and mitochondria. The study validates the sensors' dissociation constants (Kd), spectra, and multiplex capabilities. It demonstrates improved performance compared to existing tools when targeted to subcellular compartments in mammalian cells and cultured neurons. The sensors can be tuned across the red-to-far-red spectrum via JF585 and JF635 labeling, enabling flexible multiplexed imaging. For example, the authors show that HaloCamp can be targeted to mitochondria and used alongside other green and red sensors, allowing simultaneous imaging of calcium dynamics in the cytosol, ER, and mitochondria. Overall, they achieve their goals, and the data demonstrate that HaloCamp variants are effective for detecting ER and mitochondrial calcium changes under physiological conditions. The presented experiments support the conclusions. However, some key aspects, such as sensor kinetics and axonal validation, would benefit from further analysis.

      This work is likely to have an important impact on the fields of calcium imaging and organelle physiology. The modular design of HaloCamp and its compatibility with a wide range of fluorophores offer a broad application range for cell biologists and neuroscientists.

      Strengths:

      (1) The authors introduce the first tunable, dye-based, low-affinity HaloTag calcium sensors for subcellular imaging, addressing a significant unmet need for ER and mitochondrial calcium detection.

      (2) The ability to pair HaloCamp with JF585 and JF635 extends the spectral range, facilitating multiplexed imaging with existing calcium indicators.

      (3) The sensors are validated in a range of subcellular compartments (ER, mitochondria, cytosol) in both mammalian cells and neurons.

      (4) The authors successfully demonstrate simultaneous imaging of three compartments using orthogonal sensors, a technically impressive feat.

      (5) Kd values are measured, and fluorescent responses are tested under physiologically relevant stimulation.

      Weaknesses:

      (1) The authors do not quantify the kinetics (e.g., decay tau or off-rate) of the fluorescent signals, particularly after stimulation. For example, in the ER imaging experiments in neurons, the decay of the HaloCamp fluorescence after field stimulation (20 APs @ 20 Hz) is not analyzed or compared to ER-GCaMP6-210 or R-CEPIer.

      (2) It remains unclear whether the observed decay represents the sensor's off-kinetics or actual physiological calcium clearance from the ER. A comparison between sensors or an independent measurement of ER clearance rates in vitro would clarify this.

      (3) The choice of 20 APs at 20 Hz is not justified. Specifically, single APs or low-frequency stimulations are not tested, leaving unclear what the detection threshold of the new sensors is.

      (4) In neuron experiments, the authors report measuring ER calcium in axons based presumably on morphology, but no specific justification for selection, markers, or post hoc labeling is described.

      (5) Figure 5 assumes that all three indicators (cytosolic, ER, and mitochondrial) are fast enough to report calcium dynamics in response to histamine. This assumption is not fully validated. Cross-controls (e.g., expressing GCaMP6-210 in mitochondria and HaloCamp in the ER) would strengthen confidence that the sensors are correctly reporting dynamic changes.

      (6) It is not clear why Thapsigargin leads to depletion in HeLa cells and neurons in experiments shown in Figure 1E, but not in 2B upon field stimulation.
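
      As an illustration of the kinetic quantification requested in weakness (1), a decay time constant can be estimated from a post-stimulation fluorescence trace by fitting a single exponential. The sketch below is a generic example, not an analysis from the manuscript; it assumes a baseline-subtracted, strictly positive signal that follows A·exp(-t/tau), and the function name `fit_decay_tau` and the synthetic trace are hypothetical.

```python
import math


def fit_decay_tau(times, signal):
    """Estimate tau for signal ~ A * exp(-t / tau).

    Uses ordinary least squares on log(signal) versus time, so the
    signal must be baseline-subtracted; non-positive samples are skipped.
    """
    pairs = [(t, math.log(s)) for t, s in zip(times, signal) if s > 0]
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pairs)
    slope = sxy / sxx  # equals -1 / tau for a pure exponential
    return -1.0 / slope


# Synthetic trace: amplitude 3, tau = 2 s, sampled every 0.1 s.
times = [0.1 * i for i in range(50)]
trace = [3.0 * math.exp(-t / 2.0) for t in times]
print(fit_decay_tau(times, trace))
```

      Comparing tau values obtained this way across sensors imaged under identical stimulation would help separate sensor off-kinetics from physiological calcium clearance, although a log-linear fit is sensitive to noise near baseline and a nonlinear fit may be preferable for real traces.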

    1. eLife Assessment

      This study presents useful findings on the molecular mechanisms driving female-to-male sex reversal in the ricefield eel (Monopterus albus) during aging, which would be of interest to biologists studying sex determination. The manuscript describes an interesting mechanism potentially underlying sex differentiation in M. albus. However, the current data are incomplete and would benefit from more rigorous experimental approaches.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the molecular mechanism by which warm temperature induces female-to-male sex reversal in the ricefield eel (Monopterus albus), a protogynous hermaphroditic fish of significant aquacultural value in China. The study identifies Trpv4 - a temperature-sensitive Ca²⁺ channel - as a putative thermosensor linking environmental temperature to sex determination. The authors propose that Trpv4 causes Ca²⁺ influx, leading to activation of Stat3 (pStat3). pStat3 then transcriptionally upregulates the histone demethylase Kdm6b (aka Jmjd3), leading to increased dmrt1 gene expression and ovo-testes development. This work aims to bridge ecological cues with molecular and epigenetic regulators of sex change and has potential implications for sex control in aquaculture.

      Strengths:

      (1) This study proposes the first mechanistic pathway linking thermal cues to natural sex reversal in adult ricefield eel, extending the temperature-dependent sex determination paradigm beyond embryonic reptiles and saltwater fish.

      (2) The findings could have applications for aquaculture, where skewed sex ratios apparently limit breeding efficiency.

      Weaknesses:

      (A) Scientific Concerns:

      (1) There is insufficient replication and data transparency. First, the qPCR data are presented as bar graphs without individual data points, making it impossible to assess variability or replication. Please show all individual data points and clarify n (sample size) per group. Second, the Western blotting is only shown as single replicates. If repeated 2-3 times as stated, quantification and normalization (e.g., pStat3/Stat3, GAPDH loading control) are essential. The full, uncropped blots should be included in the supplementary data.

      (2) The biological significance of the results is not clear. Many reported fold changes (e.g., kdm6b modulation by Stat3 inhibition, sox9a in S3A) are modest (<2-fold), raising concerns about biological relevance. Can the authors define thresholds of functional relevance or confirm phenotypic outcomes in these animals?

      (3) The specificity of key antibodies is not validated. Key antibodies (Stat3, pStat3, Foxl2, Amh) were raised against mammalian proteins. Their specificity for ricefield eel proteins is unverified. Validation should include siRNA-mediated knockdown with immunoblot quantification with 3 replicates. Homemade antibodies (Sox9a, Dmrt1) also require rigorous validation.

      (4) Most of the imaging data (immunofluorescence) is inconclusive. Immunofluorescence panels are small and lack monochrome channels, which severely limits interpretability. Larger, better-contrasted images (showing the merge and the monochrome of important channels) and quantification would enhance the clarity of these findings.

      (B) Other comments about the science:

      (1) In S3A, sox9a expression is not dose-responsive to Trpv4 modulation, weakening the causal inference.

      (2) An antibody against Kdm6b (if available) should be used to confirm protein-level changes.

      In sum, the interpretations are limited by the above concerns regarding data presentation and reagent specificity.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents valuable findings on the molecular mechanisms driving the female-to-male transformation in the ricefield eel (Monopterus albus) during aging. The authors explore the role of temperature-activated TRPV4 signaling in promoting testicular differentiation, proposing a TRPV4-Ca²⁺-pSTAT3-Kdm6b axis that facilitates this gonadal shift.

      Strengths:

      The manuscript describes an interesting mechanism potentially underlying sex differentiation in M. albus.

      Weaknesses:

      The current data are insufficient to fully support the central claims, and the study would benefit from more rigorous experimental approaches.

      (1) Overstated Title and Claims:

      The title "TRPV4 mediates temperature-induced sex change" overstates the evidence. No histological confirmation of gonadal transformation (e.g., formation of testicular structures) is presented. Conclusions are based solely on molecular markers such as dmrt1 and sox9a, which, although suggestive, are not definitive indicators of functional sex reversal.

      (2) Temperature vs Growth Rate Confounding (Figure 1E):

      The conclusion that warm temperature directly induces gonadal transformation is confounded by potential growth rate effects. The authors state that body size was "comparable" between 25°C and 33°C groups, but fail to provide supporting data. In ectotherms, growth is intrinsically temperature-dependent. Given the known correlation between size and sex change in M. albus, growth rate, rather than temperature per se, may underlie the observed sex ratio shifts. Controlled growth-matched comparisons or inclusion of growth rate metrics are needed.

      (3) TRPV4 as a Thermosensor - Insufficient Evidence:

      The characterisation of TRPV4 as a direct thermosensor lacks biophysical validation. The observed transcriptional upregulation of Trpv4 under heat (Figure 2) reflects downstream responses rather than primary sensor function. Functional thermosensors, including TRPV4, respond to heat via immediate ion channel activity (typically measurable within seconds), not via mRNA expression over hours. No patch-clamp or electrophysiological data are provided to confirm TRPV4 activation thresholds in eel gonadal cells. Additionally, the Ca²⁺ imaging assay (Figure 2F) lacks essential details: the timing of GSK1016790A/RN1734 administration relative to imaging is unclear, making it difficult to distinguish direct channel activity from indirect transcriptional effects.

      (4) Cellular Context of TRPV4 Activity Is Unclear:

      In situ hybridisation suggests TRPV4 expression shifts from interstitial to somatic domains under heat (Figures 2H, S2C), implying potential cell-type-specific roles. However, the study does not clarify: (i) whether TRPV4 plays the same role across these cell types, (ii) why somatic cells show stronger signal amplification, or (iii) the cellular composition of explants used in in vitro assays. Without this resolution, conclusions from pharmacological manipulation (e.g., GSK1016790A effects) cannot be definitively linked to specific cell populations.

      (5) Rapid Trpv4 mRNA Elevation and Channel Function:

      The authors report a dramatic increase in Trpv4 mRNA within one day of heat exposure (Figures 4D, S2B). Given that TRPV4 is a membrane channel, not a transcription factor, its rapid transcriptional sensitivity to temperature raises mechanistic questions. This finding, while intriguing, seems more correlational than functional. A clearer explanation of how TRPV4 senses temperature at the molecular level is needed.

      (6) Inconclusive Evidence for the Ca²⁺-pSTAT3-Kdm6b Axis:

      Although the authors propose a TRPV4-Ca²⁺-pSTAT3-Kdm6b-dmrt1 pathway, intermediate steps remain poorly supported. For example, western blot data (Figures 3C, 4B) do not convincingly demonstrate significant pSTAT3 elevation at 34 °C. Higher-resolution and properly quantified blots are essential. The inferred signalling cascade is based largely on temporal correlation and pharmacological inhibition, which are insufficient to establish direct regulatory relationships.

      (7) Species-Specific STAT3-Kdm6b Regulation Is Unresolved:

      The proposed activation of Kdm6b by pSTAT3 contrasts with findings in the red-eared slider turtle (Trachemys scripta), where pSTAT3 represses Kdm6b. This divergence in regulatory direction between the two TSD species is surprising and demands further justification. Cross-species differences in binding motifs or epigenetic context should be explored. Additional evidence, such as luciferase reporter assays (using wild-type and mutant pSTAT3 binding motifs in the Kdm6b promoter), is needed to confirm direct activation. A rescue experiment testing whether Kdm6b overexpression can compensate for pSTAT3 inhibition would also greatly strengthen the model.

      (8) Immunofluorescence - Lack of Structural Markers:

      All immunofluorescence images should include structural markers to delineate gonadal boundaries. Furthermore, image descriptions in the figure legends and main text lack detail and should be significantly expanded for clarity.

      (9) Pharmacological Reagents - Mechanisms and References:

      The manuscript lacks proper references and mechanistic descriptions for the pharmacological agents used (e.g., GSK1016790A, RN1734, Stattic). Established literature on their specificity and usage context should be cited to support their application and interpretation in this study.

      (10) Efficiency of Experimental Interventions:

      The percentage of gonads exhibiting sex reversal following pharmacological or RNAi treatments should be reported in the Results. This is critical for evaluating the strength and reproducibility of the interventions.

    1. eLife Assessment

      This important work advances our understanding of DNA methylation and its consequences for susceptibility to DNA damage. This work presents evidence that DNA methylation can accentuate the genomic damage propagated by DNA damaging agents as well as potentially being an independent source of such damage. The experimental results reported are sound. The evidence presented to support the conclusions drawn is convincing and alternative interpretations are considered. The work will be of broad interest to biochemists, cell and genome biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Introduction of cytosine-5 DNA methylation sensitizes cells to oxidative damage" proposes that 5mC modifications to DNA, despite being ancient and widespread throughout life, represent a vulnerability, making cells more susceptible to both chemical alkylation and, of more general importance, reactive oxygen species. Sarkies et al. take the innovative approach of introducing an enzymatic genome-wide cytosine methylation system (DNA methyltransferases, DNMTs) into E. coli, which normally lacks such a system. They provide compelling evidence that the introduction of DNMTs increases the sensitivity of E. coli to chemical alkylation damage. Surprisingly, they also show that DNMTs increase the sensitivity to reactive oxygen species and propose that the DNMT-generated 5mC presents a target for the reactive oxygen species that is especially damaging to cells. Evidence is presented that DNMT activity directly or indirectly produces reactive oxygen species in vivo, which is an important discovery if correct, though the mechanism for this remains obscure.

      I am satisfied that points #2, #3 and #4, relating to non-additivity, transcriptional changes and ROS generation, have been appropriately addressed in this revised manuscript. The most important point (previously #1) has not been addressed beyond the acknowledgement in the results section that: "Alternatively, 3mC induction by DNMT may lead to increased levels of ssDNA, particularly in alkB mutants, which could increase the risk of further DNA damage by MMS exposure and heighten sensitivity." This slightly misrepresents the original point that 5mC, the main enzymatic product of DNMTs, rather than or in addition to 3mC, is likely to lead to transient, damage-susceptible ssDNA, especially in an alkB-deficient background. More centrally to the main claims of this manuscript, the authors have not resolved whether methylated cytosine introduced into bacteria is deleterious in the context of genotoxic stress because of the oxidative modification to 5mC and 3mC, or because of oxidative/chemical attack on ssDNA that is transiently exposed in the repair processing of 5mC and 3mC, especially in an alkB-deficient background. This is a crucial distinction because chemical vulnerability of 5mC would likely be a universal property of cytosine methylation across life, whereas the widespread exposure of ssDNA is expected to be a peculiarity of introducing cytosine methylation into a system that did not evolve with that modification as a standard component of its genome.

      These two models make different predictions about the predominant mutation types generated. In the authors' system, which uses M.SssI to target C in a CG context, if oxidative damage to 5mC dominates, then mutations are expected to be predominantly in a CG context; if ssDNA exposure effects dominate, then the mutations are expected to be more widely distributed. Sequencing post-exposure clones could resolve this.
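
      A minimal sketch of how this prediction could be checked once clones are sequenced, assuming point mutations have already been called as 0-based positions against a single-strand reference. The function names and the toy inputs in the test are hypothetical and are not taken from the manuscript; a real analysis would also need a statistical test against the background CG content.

```python
def cg_context_fraction(reference, mutation_positions):
    """Fraction of mutated positions that fall within a CG dinucleotide.

    reference: reference sequence, one strand, 5'->3' (assumed input format).
    mutation_positions: 0-based positions of called point mutations.
    A position counts as CG-context if it is the C of "CG", or the G of "CG"
    (that G pairs with the C of the CG on the opposite strand).
    """
    ref = reference.upper()
    if not mutation_positions:
        raise ValueError("no mutations supplied")
    in_cg = 0
    for pos in mutation_positions:
        is_c_of_cg = ref[pos] == "C" and pos + 1 < len(ref) and ref[pos + 1] == "G"
        is_g_of_cg = ref[pos] == "G" and pos > 0 and ref[pos - 1] == "C"
        if is_c_of_cg or is_g_of_cg:
            in_cg += 1
    return in_cg / len(mutation_positions)


def cg_site_fraction(reference):
    """Fraction of reference positions that are part of any CG dinucleotide.

    This is the background expectation if mutations were spread uniformly.
    """
    ref = reference.upper()
    covered = set()
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            covered.update((i, i + 1))
    return len(covered) / len(ref)
```

      Under the first model, the observed CG-context fraction would be expected to sit well above the background fraction; under the ssDNA model, the two would be expected to be closer.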

      Strengths:

      This work is based on an interesting initial premise and is well motivated in the introduction, and the manuscript is clearly written. The results themselves are compelling.

      Weaknesses:

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specifically the authors have not resolved whether oxidative modification to 5mC and 3mC, or chemical attack to ssDNA that is transiently exposed in the repair processing of 5mC and 3mC is the principal source of the observed genotoxicity. The authors acknowledge this potential alternative model in their discussion of the revised manuscript.

    3. Reviewer #2 (Public review):

      5-methylcytosine (5mC) is a key epigenetic mark in DNA and plays a crucial role in regulating gene expression in many eukaryotes including humans. The DNA methyltransferases (DNMTs) that establish and maintain 5mC are conserved in many species across eukaryotes, including animals, plants, and fungi, mainly in a CpG context. Interestingly, 5mC levels and distributions are quite variable across phylogenies with some species even appearing to have no such DNA methylation.

      This interesting and well-written paper continues some of the authors' work published several years ago. In that previous paper, the laboratory demonstrated that DNA methylation pathways coevolved with DNA repair mechanisms, in particular with the alkylation repair system. They discovered that DNMTs can introduce alkylation damage into DNA in the form of 3-methylcytosine (3mC). (This appears to be an error in the DNMT enzymatic mechanism, in which 3mC is generated instead of the preferred product 5-methylcytosine (5mC) because the flipped target cytosine binds to the active site pocket of the DNMT in an inverted orientation.) The presence of 3mC is potentially toxic and can cause replication stress, which this paper suggests may explain the loss of DNA methylation in different species. They further showed that the ALKB2 enzyme plays a crucial role in repairing this alkylation damage, further emphasizing the link between DNA methylation and DNA repair.

      The co-evolution of DNMTs with DNA repair mechanisms suggests there can be distinct advantages and disadvantages of DNA methylation to different species, which might depend on their environmental niche. In environments that expose species to high levels of DNA damage, high levels of 5mC in their genome may be disadvantageous. The present paper sets out to examine the sensitivity of an organism to genotoxic stresses, such as alkylating and oxidizing agents, as a consequence of DNMT activity. Since such a study in eukaryotes would be complicated by DNA methylation controlling gene regulation, the authors cleverly utilize Escherichia coli (E. coli) and incorporate into it DNMTs from other bacteria that methylate the cytosines of DNA in a CpG context like that observed in eukaryotes; the active sites of these enzymes are very similar to those of eukaryotic DNMTs and use essentially the same catalytic mechanism (also, this strain of E. coli does not specifically degrade this methylated DNA).

      The experiments in this paper more than adequately show that E. coli expressing these DNMTs (compared to the same strain without the DNMTs) does indeed show increased sensitivity to alkylating agents, and this sensitivity was even greater than expected when a DNA repair mechanism was inactivated. Moreover, they show that E. coli expressing this DNMT is more sensitive to oxidizing agents such as H2O2 and has exacerbated sensitivity when a DNA repair glycosylase is inactivated. Both propensities suggest that DNMT activity itself may generate additional genotoxic stress. Intrigued that DNMT expression itself might induce sensitivity to oxidative stress, the experimenters used a fluorescent sensor to show that H2O2-induced reactive oxygen species (ROS) are markedly enhanced with DNMT expression. Importantly, they show that DNMT expression alone gave rise to increased ROS amounts, and that H2O2 addition and DNMT expression together have a greater effect than the linear combination of the two separately. They also carefully checked that the increased sensitivity to H2O2 was not caused by some effect of DNMT expression and activity on the expression of detoxification genes. Finally, by using mass spectrometry, they show that DNMT expression led to production of the 5mC oxidation derivatives 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC) in DNA. 5fC is a substrate for base excision repair while 5hmC is not; more 5fC was observed. Introduction of non-bacterial enzymes that produce 5hmC and 5fC into the DNMT-expressing bacteria again showed a greater sensitivity than expected. Remarkably, in their assay with addition of H2O2, bacteria showed no growth with this dual expression of DNMT and these enzymes.

      Overall, the authors conduct well thought-out and simple experiments to show that a disadvantageous consequence of DNMT expression leading to 5mC in DNA is increased sensitivity to oxidative stress as well as alkylating agents.

      Again, the paper is well-written and organized. The hypotheses are well-examined by simple experiments. The results are interesting and can impact many scientific areas, from our understanding of the evolutionary pressures that the environment places on an organism to our understanding of how the environment of a malignant cell in the human body may lead to cancer.

      In a new revised version of the paper, the authors have adequately addressed issues put forth by other reviewers.

    4. Reviewer #3 (Public review):

      Summary:

      Krwawicz et al. present evidence that expression of DNMTs in E. coli (1) results in the introduction of alkylation damage that is repaired by AlkB; (2) confers hypersensitivity to alkylating agents such as MMS (exacerbated by loss of AlkB); (3) confers hypersensitivity to oxidative stress (H2O2 exposure); (4) results in a modest increase in ROS in the absence of exogenous H2O2 exposure; and (5) results in the production of oxidation products of 5mC, namely 5hmC and 5fC, leading to cellular toxicity. The findings reported here have interesting implications for the concept that such genotoxic and potentially mutagenic consequences of DNMT expression (resulting in 5mC) could be selectively disadvantageous for certain organisms. The other aspect of this work which is important for understanding the biological endpoints of genotoxic stress is the notion that DNA damage per se somehow induces elevated levels of ROS.

      Strengths:

      The manuscript is well-written, and the experiments have been carefully executed providing data that support the authors' proposed model presented in Fig. 7 (Discussion, sources of DNA damage due to DNMT expression).

      Weaknesses:

      (1) The authors have established an informative system relying on expression of DNMTs to gauge the effects of such expression and subsequent induction of 3mC and 5mC on cell survival and sensitivity to an alkylating agent (MMS) and exogenous oxidative stress (H2O2 exposure). The authors state (p4) that Fig. 2 shows that "Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to WT C2523, supporting the conclusion that the expression of DNMTs increased the levels of alkylation damage." This is a confusing statement and requires revision, as ALL cells shown in Fig. 2 are expressing DNMTs and have been treated with MMS. It is the absence of AlkB and the expression of DNMTs that causes the MMS sensitivity.

      (2) It would be important to know whether the increased sensitivity (toxicity) to DNMT expression and MMS is also accompanied by substantial increases in mutagenicity. The authors should explain in the text why mutation frequencies were not also measured in these experiments.

      (3) Materials and Methods. ROS production monitoring. The "Total Reactive Oxygen Species (ROS) Assay Kit" has not been adequately described. Who is the Vendor? What is the nature of the ROS probes employed in this assay? Which specific ROS correspond to "total ROS"?

      (4) The demonstration (Fig. 4) that DNMT expression results in elevated ROS and its further synergistic increase when cells are also exposed to H2O2 is the basis for the authors' discussion of DNA damage-induced increases in cellular ROS. S. cerevisiae does not possess DNMTs/5mC, yet exposure to MMS also results in substantial increases in intracellular ROS (Rowe et al, (2008) Free Rad. Biol. Med. 45:1167-1177. PMC2643028). The authors should be aware of previous studies that have linked DNA damage to intracellular increases in ROS in other organisms and should comment on this in the text.

      Comments for the revised manuscript:

      In this revised manuscript, the authors have satisfactorily addressed the issues raised in the review of the original submission and have significantly improved these studies.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specifically the authors have not resolved whether oxidative modification to 5mC and 3mC, or chemical attack to ssDNA that is transiently exposed in the repair processing of 5mC and 3mC is the principal source of the observed genotoxicity.

      (1) Original query which still stands: As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single stranded DNA (PMID:20663718), occurring at ~20,000 times the rate seen in double stranded DNA. Consequently the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single stranded DNA tracts when repaired in an AlkB deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2 or TET based conversion of 5mC to 5hmC or 5fC, neither of which can be directly repaired and both of which depend on single strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been [adequately] considered.

      We thank the reviewer for expanding on their previous comment. We completely agree with the possibility that they raise and have added an extra paragraph in the discussion to expand on our consideration of the role of ssDNA in DNMT-induced DNA damage, which we reproduce here:

      "The observation that TET overexpression sensitizes cells expressing DNMTs to oxidative stress strongly suggests that the site of DNA damage is the modified cytosine itself. However, we do not currently have definitive evidence supporting this. As mentioned in the results section, the presence of unrepaired 3mC may lead to increased levels of ssDNA; it is also possible that 5mC itself may increase ssDNA levels. Loss of alkB would be expected to increase the amount of ssDNA. Thus DNA damage surrounding modification sites, but not specifically localised to it, might be the cause of the increased sensitivity. These two different models make different predictions. If modified cytosines are the source of the damage, mutations arising would be predominantly located at CG dinucleotides. Alternatively, ssDNA exposure would result in distributed mutations that would not necessarily be located at CG sites. The highly biased spectrum of mutations that can be screened through the Rif resistance assay does not allow us to address this currently. However, future experiments to create mutation accumulation lines could allow us to address the question systematically on a genome-wide level."

    1. eLife Assessment

      This study presents DeepTX, a valuable methodological tool that integrates mechanistic stochastic models with single-cell RNA sequencing data to infer transcriptional burst kinetics at genome scale. The approach is broadly applicable and of interest to subfields such as systems biology, bioinformatics, and gene regulation. The evidence supporting the findings is solid, with appropriate validation on synthetic data and thoughtful discussion of limitations related to identifiability and model assumptions.

    2. Joint Public Review:

      In this work, the authors present DeepTX, a computational tool for studying transcriptional bursting using single-cell RNA sequencing (scRNA-seq) data and deep learning. The method aims to infer transcriptional burst dynamics-including key model parameters and the associated steady-state distributions-directly from noisy single-cell data. The authors apply DeepTX to datasets from DNA damage experiments, revealing distinct regulatory patterns: IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU alters burst frequency in human cancer cells, driving apoptosis or survival depending on dose. These findings underscore the role of burst regulation in mediating cell fate responses to DNA damage.
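
      For readers unfamiliar with the two quantities contrasted above, the following is a minimal sketch of the standard moment-based approximation for burst size and burst frequency. It assumes the bursty limit of the telegraph model, in which steady-state counts are negative binomial and rates are expressed in units of the mRNA degradation rate. It is not the DeepTX inference procedure and ignores technical noise such as capture efficiency; the function name is hypothetical.

```python
from statistics import mean, pvariance


def bursty_moment_estimates(counts):
    """Moment estimates of (burst_frequency, burst_size) for one gene.

    Assumes a negative binomial steady state with
        mean = burst_frequency * burst_size
        Fano factor (variance / mean) = 1 + burst_size
    so burst_size = Fano - 1 and burst_frequency = mean / burst_size.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("gene has no counts")
    fano = pvariance(counts) / m
    if fano <= 1:
        # Poisson-like or under-dispersed data carry no bursting signal here.
        raise ValueError("Fano factor <= 1; bursty approximation not applicable")
    burst_size = fano - 1
    burst_frequency = m / burst_size
    return burst_frequency, burst_size
```

      In this approximation, a treatment that raises the Fano factor at a fixed mean would read out as larger but rarer bursts, whereas a change in the mean at a fixed Fano factor would read out as a change in burst frequency.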

      The main strength of this study lies in its methodological contribution. DeepTX integrates a non-Markovian mechanistic model with deep learning to approximate steady-state mRNA distributions as mixtures of negative binomial distributions, enabling genome-scale parameter inference with reduced computational cost. The authors provide a clear discussion of the framework's assumptions, including reliance on steady-state data and the inherent unidentifiability of parameter sets, and they outline how the model could be extended to other regulatory processes.

      The revised manuscript addresses many of the original concerns, particularly regarding sample size requirements, distributional assumptions, and the biological interpretation of inferred parameters. However, the framework remains limited by the constraints of snapshot data and cannot yet resolve dynamic heterogeneity or causality. The manuscript would also benefit from a broader contextualisation of DeepTX within the landscape of existing tools linking mechanistic modelling and single-cell transcriptomics. Finally, the interpretation of pathway enrichment analyses still warrants clarification.

      Overall, this work represents a valuable contribution to the integration of mechanistic models with high-dimensional single-cell data. It will be of interest to researchers in systems biology, bioinformatics, and computational modelling.

    3. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      In this work, the authors develop a new computational tool, DeepTX, for studying transcriptional bursting through the analysis of single-cell RNA sequencing (scRNA-seq) data using deep learning techniques. This tool aims to describe and predict the transcriptional bursting mechanism, including key model parameters and the steady-state distribution associated with the predicted parameters. By leveraging scRNA-seq data, DeepTX provides high-resolution transcriptional information at the single-cell level, despite the presence of noise that can cause gene expression variation. The authors apply DeepTX to DNA damage experiments, revealing distinct cellular responses based on transcriptional burst kinetics. Specifically, IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU affects burst frequency in human cancer cells, leading to apoptosis or, depending on the dose, to survival and potential drug resistance. These findings underscore the fundamental role of transcriptional burst regulation in cellular responses to DNA damage, including cell differentiation, apoptosis, and survival. Although the insights provided by this tool are mostly well supported by the authors' methods, certain aspects would benefit from further clarification.

      The strengths of this paper lie in its methodological advancements and potential broad applicability. By employing the DeepTXSolver neural network, the authors efficiently approximate stationary distributions of mRNA counts through a mixture of negative binomial distributions, establishing a simple yet accurate mapping between the kinetic parameters of the mechanistic model and the resulting steady-state distributions. This innovative use of neural networks allows for efficient inference of kinetic parameters with DeepTXInferrer, reducing computational costs significantly for complex, multi-gene models. The approach advances parameter estimation for high-dimensional datasets, leveraging the power of deep learning to overcome the computational expense typically associated with stochastic mechanistic models. Beyond its current application to DNA damage responses, the tool can be adapted to explore transcriptional changes due to various biological factors, making it valuable to the systems biology, bioinformatics, and mechanistic modelling communities. Additionally, this work contributes to the integration of mechanistic modelling and -omics data, a vital area in achieving deeper insights into biological systems at the cellular and molecular levels.  
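
      To make the phrase "mixture of negative binomial distributions" concrete, the sketch below evaluates such a mixture's probability mass function. The parameterisation (shape r, success probability p, counting failures before r successes) and the example weights used in the test are assumptions chosen for illustration; they are not the output format or fitted values of DeepTXSolver.

```python
from math import exp, lgamma, log


def nb_pmf(k, r, p):
    """P(K = k) for a negative binomial with shape r > 0 and success prob 0 < p < 1.

    K counts failures before the r-th success, so the mean is r * (1 - p) / p.
    Computed in log space to stay stable for large k.
    """
    log_pmf = (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
               + r * log(p) + k * log(1 - p))
    return exp(log_pmf)


def nb_mixture_pmf(k, weights, shapes, probs):
    """P(K = k) under a weighted mixture of negative binomial components."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * nb_pmf(k, r, p) for w, r, p in zip(weights, shapes, probs))
```

      A mixture of this form can represent bimodal or heavy-tailed count distributions that a single negative binomial cannot, which is presumably why it is a convenient target for approximating steady-state mRNA distributions.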

      We thank the reviewers for their positive opinion on our manuscript. As reflected in our detailed responses to the reviewers’ comments, we will make significant changes to address their concerns comprehensively.

      This work also presents some weaknesses, particularly concerning specific technical aspects. The tool was validated using synthetic data, and while it can predict parameters and steady-state distributions that explain gene expression behaviour across many genes, it requires substantial data for training. The authors account for measurement noise in the parameter inference process, which is commendable, yet they do not specify the exact number of samples required to achieve reliable predictions. Moreover, the tool has limitations arising from assumptions made in its design, such as assuming that gene expression counts for the same cell type follow a consistent distribution. This assumption may not hold in cases where RNA measurement timing introduces variability in expression profiles.

      We thank the reviewers for their detailed and constructive feedback on our work. We will address the key concerns raised in the following points:

      (1) Clarification on the required sample size: We tested the robustness of our inference method on simulated datasets by varying the number of single-cell samples. Our results indicated that the predictions of burst kinetics parameters become accurate when the number of cells reaches 500 (Supplementary Figure S3d, e). This sample size is smaller than the data typically obtained with current single-cell RNA sequencing (scRNA-seq) technologies, such as 10x Genomics and Smart-seq3 (Zheng GX et al., 2017; Hagemann-Jensen M et al., 2020). Therefore, we believe that our algorithm is well-suited for inferring burst kinetics from existing scRNA-seq datasets, where the sample size is sufficient for reliable predictions. We will clarify this point in the main text to make it easier for readers to use the tool.

      (2) Assumption-related limitations: One of the fundamental assumptions in our study is that the expression counts of each gene are independently and identically distributed (i.i.d.) among cells, which is a commonly adopted assumption in many related works (Larsson AJM et al., 2019; Ochiai H et al., 2020; Luo S et al., 2023). However, we acknowledge the limitations of this assumption. The expression counts of the same gene in each cell may follow distinct distributions even within the same cell type, and dependencies between genes could exist in realistic biological processes. We recognize this and will discuss the limitations arising from these assumptions in depth, highlighting them as an important direction for future research.

      The authors present a deep learning pipeline to predict the steady-state distribution, model parameters, and statistical measures solely from scRNA-seq data. Results across three datasets appear robust, indicating that the tool successfully identifies genes associated with expression variability and generates consistent distributions based on its parameters. However, it remains unclear whether these results are sufficient to fully characterise the transcriptional bursting parameter space. The parameters identified by the tool pertain only to the steady-state distribution of the observed data, without ensuring that this distribution specifically originates from transcriptional bursting dynamics.

      We appreciate the reviewers’ comments and the opportunity to clarify our study’s contributions and limitations. Although we agree that assessing whether the results from these three realistic datasets can characterize the transcriptional burst parameter space is challenging, as it depends on the properties of the data and the biological conditions, we firmly believe that DeepTX has the capacity to characterize the full parameter space. This belief stems from the extensive parameters and samples we input during model training and inference across a sufficiently large parameter range (Method 1.3). Furthermore, the training of the model is both flexible and scalable, allowing for the expansion of the transcriptional burst parameter space as needed. We will clarify this in the text to enable readers to use DeepTX more flexibly.

      On the other hand, we agree that parameter identification is based on the steady-state distribution of the observed data (static data), which loses information about the fine dynamic process of the burst kinetics. In principle, tracking the gene expression of living cells can provide the most complete information about real-time transcriptional dynamics across various timescales (Rodriguez J et al., 2019).

      However, it is typically limited to only a small number of genes and cells, and so cannot be used to investigate general principles of transcriptional burst kinetics on a genome-wide scale. Therefore, leveraging both the steady-state distribution of scRNA-seq data and mathematical dynamic modelling to infer genome-wide transcriptional bursting dynamics represents a critical and emerging frontier in this field. For example, the statistical inference framework based on the Markovian telegraph model, as demonstrated in (Larsson AJM et al., 2019), offers a valuable paradigm for understanding underlying transcriptional bursting mechanisms. Building on this, our study considered a more generalized non-Markovian model that better captures transcriptional kinetics, employing a deep learning method under conditions such as DNA damage. This provided a powerful framework for comparative analyses of how DNA damage induces alterations in transcriptional bursting kinetics across the genome. We will highlight the limitations of current inference using steady-state distributions in the text and look ahead to future research directions for inference using time series data across the genome.

      A primary concern with the TXmodel is its reliance on four independent parameters to describe gene state-switching dynamics. Although this general model can capture specific cases, such as the refractory and telegraph models, accurately estimating the parameters of the refractory model using only steady-state distributions and typical cell counts proves challenging in the absence of time-dependent data.

      We thank the reviewers for highlighting this critical concern regarding the TXmodel's reliance on four independent parameters to describe gene state-switching dynamics. We acknowledge that estimating the parameters of the TXmodel using only steady-state distributions and typical single-cell RNA sequencing (scRNA-seq) data poses significant challenges, particularly in the absence of time-resolved measurements.

      As described in our response to the previous point, while time-resolved data can provide richer information than static scRNA-seq data, it is currently limited to a small number of genes and cells, whereas static scRNA-seq data typically capture genome-wide expression. Our framework leverages deep learning methods to link mechanistic models with static scRNA-seq data, enabling the inference of genome-wide dynamic behaviors of genes. This provides a potential pathway for comparative analyses of transcriptional bursting kinetics across the entire genome.

      Nonetheless, the refractory model and the telegraph model are important models for studying transcriptional bursts. We will discuss and compare them in terms of the accuracy of inferred parameters.

      Certainly, we agree that inferring the molecular mechanisms underlying transcriptional burst kinetics using time-resolved data remains a critical future direction. We will include a brief discussion on the role and importance of time-resolved data in addressing these challenges in the discussion section of the revised manuscript.

      The claim that the GO analysis pertains specifically to DNA damage response signal transduction and cell cycle G2/M phase transition is not fully accurate. In reality, the GO analysis yielded stronger p-values for pathways related to the mitotic cell cycle checkpoint signalling. As presented, the GO analysis serves more as a preliminary starting point for further bioinformatics investigation that could substantiate these conclusions. Additionally, while GSEA analysis was performed following the GO analysis, the involvement of the cardiac muscle cell differentiation pathway remains unclear, as it was not among the GO terms identified in the initial GO analysis.

      We thank the reviewer for this valuable feedback and for pointing out the need for clarification regarding the GO and GSEA analyses. We agree that the connection between the cardiac muscle cell differentiation pathway identified in the GSEA analysis and the GO terms from the initial analysis requires further clarification. This discrepancy arises because GSEA examines broader sets of pathways and may capture biological processes not highlighted by GO analysis due to differences in the statistical methods and pathway definitions used. We will revise the manuscript to address this point, explicitly discussing the distinct yet complementary nature of GO and GSEA analyses and providing a clearer interpretation of the results.

      As the advancement is primarily methodological, it lacks a comprehensive comparison with traditional methods that serve similar functions. Consequently, the overall evaluation of the method, including aspects such as inference accuracy, computational efficiency, and memory cost, remains unclear. The paper would benefit from being contextualised alongside other computational tools aimed at integrating mechanistic modelling with single-cell RNA sequencing data. Additional context regarding the advantages of deep learning methods, the challenges of analysing large, high-dimensional datasets, and the complexities of parameter estimation for intricate models would strengthen the work.

      We greatly appreciate your insightful feedback, which highlights important considerations for evaluating and contextualizing our methodological advancements. Below, we emphasize the advantages of our approach over previous models from both the modeling perspective and the inference perspective. As our work is rooted in a model-based approach to describing the transcriptional bursting process underlying gene expression, the commonly employed classic telegraph model (Markovian) and non-Markovian models are the relevant points of comparison:

      Classic telegraph model: The classic telegraph model allows for the derivation of approximate analytical solutions through numerical integration, enabling efficient parameter point estimation via maximum likelihood methods, e.g., as explored in (Larsson AJM et al., 2019). Although exact analytical solutions for the telegraph model are not available, certain moments of its distribution can be explicitly derived. This allows for an alternative approach to parameter inference using moment-based estimation methods, e.g., as explored in (Ochiai H et al., 2020). However, it is important to note that higher-order sample moments can be unstable, potentially leading to significant estimation bias. 

      Non-Markovian Models: For non-Markovian models, analytical or approximate analytical solutions remain elusive. Previous work has employed pseudo-likelihood approaches, leveraging statistical properties of the model’s solutions to estimate parameters, e.g., as explored in (Luo S et al., 2023). However, the method may suffer from low inference efficiency.

      In our current work, we leverage deep learning to estimate the parameters of the TXmodel, which is a non-Markovian model. First, we represent the model's solution as a mixture of negative binomial distributions, which is obtained by the deep learning method. Second, through integration with the deep learning architecture, the model parameters can be optimized using automatic differentiation, significantly improving inference efficiency. Furthermore, by employing a Bayesian framework, our method provides posterior distributions for the estimated dynamic parameters, offering a comprehensive characterization of uncertainty. Compared to traditional methods such as moment-based estimation or pseudo-likelihood approaches, we believe our approach not only achieves higher inference efficiency but also delivers posterior distributions for kinetic parameters, enhancing the interpretability and robustness of the results. We will present and emphasize the computational efficiency and memory cost of our method in the revised version.

      Recommendations for the authors:

      There are various noise sources in biological processes. How transcriptional bursting fits within those, as well as the reasons to focus only on this source, needs to be clearly discussed in the introduction of the manuscript. Related to this last point, transcriptional bursting might not be the only mechanism to take advantage of the stochastic nature of biomolecular processes to make decisions. Once again, what are the implications of assuming this as the underlying mechanism?

      We thank the reviewer for this valuable comment. We fully agree that biological systems are subject to multiple stochastic sources, which arise from both intrinsic and extrinsic noise (Eling N et al., 2019). Intrinsic noise is primarily driven by stochastic biochemical effects that directly influence mRNA and protein expression in a gene-specific manner, at the DNA, epigenetic, transcriptional, and translational levels. Extrinsic noise arises from cell-specific fluctuations, such as changes in cell size, cell cycle, or cell signaling. Given that DNA damage most directly perturbs transcription and translation processes, focusing on intrinsic noise sources is appropriate for mechanistically modeling gene-specific expression variability, particularly since this variability can be captured at the genome-wide scale by scRNA-seq data.

      Among various intrinsic noise sources, transcriptional bursting offers a mechanistically well-characterized and quantifiable representation of gene expression variability (Tunnacliffe E & Chubb JR, 2020). It reflects the dynamic switching between active and inactive gene states and has been observed consistently across prokaryotic and eukaryotic cells (Eling N et al., 2019). Moreover, transcriptional bursting kinetics, defined by burst size and frequency, can be inferred from scRNA-seq data at the single-gene level using steady-state assumptions, making it an analytically tractable and biologically meaningful feature for large-scale inference (Rodriguez J & Larson DR, 2020).

      We acknowledge that transcriptional bursting is not the only mechanism through which cells can utilize stochasticity for fate decisions. Other processes, such as translational noise and chromatin accessibility, may also contribute. However, given the data modality (static scRNA-seq) and the established theoretical framework for bursting, we assume transcriptional bursting as a representative and interpretable proxy of stochastic regulation. This assumption enables us to extract meaningful insights while remaining open to future model extensions, incorporating additional regulatory layers as more data types become available.

      In this version of the manuscript, we have revised the introduction section to better clarify the rationale of this assumption and to more explicitly emphasize the important role of transcriptional bursting within stochastic noise.

      More careful discussion of how the proposed method differentiates from previous work that employs scRNA-seq to elucidate the diverse sources of noise (pp.3).

      We thank the reviewer for this suggestion. Our proposed method differs from previous work that utilizes scRNA-seq data to study diverse noise sources in several respects (Ochiai H et al., 2020; Eling N et al., 2019; Morgan MD & Marioni JC, 2018). Specifically, DeepTX infers genome-wide burst kinetics by directly matching the full steady-state distribution of a mechanistic stochastic model to the observed scRNA-seq data, rather than relying solely on low-order statistics such as mean and variance. Moreover, by adopting a non-Markovian process that allows multi-step promoter switching, DeepTX extends beyond the classic telegraph model to better capture the complex molecular events underlying transcriptional activation and repression. Crucially, we used a deep-learning–based solver to obtain these intractable steady-state distributions rapidly and accurately. This combination of richer data usage, more realistic mechanistic assumptions, and scalable neural-network–accelerated computation lays the groundwork for incorporating additional noise sources into a unified inference framework in future work.

      In this version of the manuscript, we have revised the discussion section to highlight the difference with previous works.

      The paper could benefit from being contextualised alongside other computational tools that aim to integrate mechanistic modelling with single-cell RNA sequencing data. This is an active area of research, and works such as Sukys and Grima (bioRxiv, 2024), Garrido-Rodriguez et al. (PLOS Computational Biology, 2021), Maizels (2024), and others could provide valuable context.

      We thank the reviewer for suggesting these relevant works. Garrido-Rodriguez et al. (PLOS Comput. Biol., 2021) integrated single-cell and bulk transcriptomic data into mechanistic pathway models to infer signaling dynamics, an approach complementary to our mapping of burst kinetic parameters onto pathway enrichment for linking transcriptional bursting to functional outcomes. Sukys and Grima (bioRxiv, 2024; now in Nucleic Acids Res., 2025) demonstrated that cell-cycle stage and cellular age significantly modulate burst frequency and size, highlighting the potential to enhance DeepTX by incorporating cell-cycle–dependent variability into genome-wide burst inference. Maizels et al. (Philos. Trans. R. Soc. Lond. B. Biol. Sci., 2024) reviewed methods for capturing single-cell temporal dynamics across multi-omic modalities, underscoring how more highly time-resolved data could refine and validate steady-state burst inference frameworks to better resolve causal gene-expression mechanisms.

      We have cited these studies and discussed their relevance to DeepTX in the discussion section.

      As the advancement is primarily methodological, it lacks a comprehensive comparison with traditional methods that serve similar functions. Consequently, the overall evaluation of the method, including aspects such as inference accuracy, computational efficiency, and memory cost, remains unclear. We suggest incorporating these experiments to provide readers with a more complete understanding of the proposed method's performance.

      We thank the reviewer for the constructive suggestion regarding a comprehensive comparison with previous methods. To address this, in this version we compared DeepTX with our previous work, txABC, which uses approximate Bayesian computation to infer parameters of the generalized telegraph model (Luo S et al., 2023). DeepTX achieved improvements in inference accuracy and computational efficiency (Supplementary Figure S4). For memory cost, single-gene inference with DeepTX requires approximately 70 MB on average, only a small fraction of the total memory available on standard computing devices (typically exceeding 10 GB), while exhibiting superior inference efficiency compared to txABC. We have mentioned this in the third result section.

      Discuss the validity of the assumption of the static snapshot provided by the scRNA-seq data as in steady-state (i.e., stationary distribution), and the implications of this assumption being untrue (for the proposed method).

      We thank the reviewer for the comment regarding the stationary assumption. We assume that each scRNA-seq snapshot approximates the steady-state (stationary) distribution of transcript counts because (i) typical single-cell experiments sample large, asynchronously dividing populations that collectively traverse many transcriptional burst cycles, and (ii) in the absence of a synchronized perturbation, mRNA production and degradation reach a dynamic balance on timescales much shorter than overall cell-type changes. Under these conditions, the empirical count distribution closely mirrors the model’s stationary solution, justifying steady-state inference of burst size and frequency from a single time point. This assumption is commonly adopted in probabilistic models of transcriptional bursting (Larsson AJM et al., 2019; Raj A & van Oudenaarden A, 2008).

      However, this steady-state assumption has some limitations. First, in some scenarios, the cell system may exhibit highly transient transcriptional programs that do not satisfy stationarity, for example immediately following a synchronized developmental stimulus such as serum shock–induced activation of immediate-early genes, leading to biased or misleading parameter estimates. Second, because DeepTX infers the mean burst frequency and size across the population, it cannot recover the underlying time-resolved dynamics or distinguish heterogeneous kinetic subpopulations.

      We have added a statement in the discussion to acknowledge these limitations and suggest future extensions, such as incorporating time-series measurements or latent pseudotime covariates, to address non-stationarity and recover temporal burst dynamics.

      On page 3, "traditional telegraph model" is mentioned without any context. This model, and particularly the implications for the current work, might not be obvious to the reader. Take one or two sentences to give the reader context.

      We thank the reviewer for this helpful comment. We acknowledge that the mention of the "traditional telegraph model" on page 3 may not be immediately clear to all readers. The traditional telegraph model is a mathematical framework commonly used to describe gene expression burst dynamics, in which genes stochastically switch between active (ON) and inactive (OFF) states, with exponentially distributed waiting times for state transitions. To provide the necessary context, we added a brief introduction to the traditional telegraph model and its relevance to our work in the revised manuscript.

      A primary concern with the model used in Figure 2a (TXmodel) is its reliance on four independent parameters to describe gene state switching dynamics. While this general model can encompass specific cases such as the refractory model (Science 332, 472 (2011)) and the telegraph model, accurately estimating the parameters of the refractory model using only steady-state distributions and typical cell numbers (10³-10⁴) is challenging without time-dependent data. To address this, we suggest that the authors provide parameter inference results for each individual parameter, rather than only for burst size and burst frequency, based on synthetic data. This would help clarify the model's effectiveness and improve understanding of its estimation precision.

      We thank the reviewer for highlighting this important concern. We agree that the lack of time-resolved measurements may affect the accuracy of inferences about dynamic parameters, especially through the unidentifiability of parameters inferred from steady-state distributions, i.e., multiple parameter sets leading to the same steady-state distribution. The unidentifiability of individual parameters is a common and critical problem in systems biology studies. To address this issue, for example, Trzaskoma et al. developed StochasticGene, a computationally efficient software suite that uses Bayesian inference to analyze arbitrary gene regulatory models and quantify parameter uncertainty across diverse data types (Trzaskoma P et al., 2024). Browning et al. adopt a Bayesian approach to parameter estimation by incorporating prior knowledge through a prior distribution and classify a parameter as practically nonidentifiable if it cannot be uniquely determined beyond the confidence already provided by the prior (Browning AP et al., 2020). Hence, in DeepTX, we employed a Bayesian approach based on loss potential to infer the posterior distributions of the parameters (Figure 3E).

      Although DeepTX also encounters the issue of unidentifiability for individual parameters (Supplementary Figure S11), the multimodal nature of the posterior distribution suggests that multiple distinct parameter sets can produce similarly good fits to the observed data, highlighting the inherent non-identifiability of the model. Nevertheless, in the multimodal posterior distribution, at least one of the posterior peaks aligns closely with the ground truth, thereby demonstrating the validity of the inferred result. Moreover, inference results on synthetic data confirm that the BS and BF can be accurately estimated (Supplementary Figure S3b and S3c). We also performed robustness analyses on synthetic datasets. As shown in Supplementary Figure S3d and S3e, our model reliably recovers the ground-truth burst kinetics of models when the number of cells reaches ~1000, which is within the range of typical single-cell RNA-seq experiments. 

      We have explicitly pointed out the potential issue of unidentifiability due to the lack of temporal resolution information in the discussion section. 

      Notably, transcription is always a multi-step process (depending on the granularity with which the process is described). What do the authors mean by saying that "DNA damage turns transcription into a multi-step process rather than a single-step process"?

      We thank the reviewer for pointing out the lack of precision in our original statement. We agree that the phrasing could be misleading. Transcription is inherently a multi-step process, but most mechanistic studies simplify it to a single-step “telegraph” model for tractability. In the context of DNA damage, however, damage-induced pausing and repair-mediated delays introduce additional intermediary states in the transcription cycle that cannot be approximated by a single step. To capture these damage-specific interruptions, DeepTX explicitly considers a multi-step promoter switching framework rather than combining all transitions into one. What we originally wanted to express was the necessity of multi-step process modeling. We have replaced the original sentence in the introduction with: “However, the presence of DNA damage necessitates modeling the transcriptional process as a multistep process, rather than a single-step process, to capture the additional complexity introduced by the damage”.

      It is unclear why the authors have chosen a different definition in Equation (2) rather than the commonly used burst frequency, 1/(k_deg * tau_off), as reported in the literature. Unlike the traditional definition, which is unit-free, the definition in Eq. (2) includes units, raising questions about its interpretability and consistency with established conventions. Clarifying this choice would improve the understanding and consistency of the methodology.

      We thank the reviewer for raising this important point. We acknowledge that there are multiple definitions of burst frequency (BF) in the literature. Here, we explain the differences between these definitions, including the one we use and the traditional one.

      First, the definition of burst frequency we adopt has been widely used in the recent literature, for example by Zoller et al. (Zoller B et al., 2018), Hoppe et al. (Hoppe C et al., 2020) and Ramsköld et al. (Ramsköld D et al., 2024). Its reciprocal is the average time the promoter takes to complete one full stochastic cycle between its active and inactive states. Second, the traditional definition can be regarded as a simplified version of our definition, under the assumptions that τ<sub>on</sub> is negligible and k<sub>deg</sub> = 1 (i.e., rate parameters are normalized to be unit-free). Although it is reasonable to neglect the active time τ<sub>on</sub>, as it is typically much shorter than the inactive time under some conditions, we chose a more complete definition of burst frequency so that it applies to more general situations. In addition, by defining the burst frequency as 1/(τ<sub>on</sub> + τ<sub>off</sub>), the mean transcription level can be analytically represented as the product of burst size and burst frequency.
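
The relationship between the two definitions can be checked numerically. The sketch below is illustrative only and rests on assumptions not spelled out in this response: burst frequency is taken as the reciprocal of the mean ON plus OFF cycle time, burst size as the synthesis rate times the mean ON time, and the steady-state mean follows from the fraction of time the promoter spends ON. The function names are hypothetical and are not part of DeepTX.

```python
def burst_frequency(tau_on: float, tau_off: float) -> float:
    """Bursts per unit time: reciprocal of the mean ON + OFF cycle duration (assumed definition)."""
    return 1.0 / (tau_on + tau_off)


def burst_size(k_syn: float, tau_on: float) -> float:
    """Mean number of transcripts produced during one ON period (assumed definition)."""
    return k_syn * tau_on


def mean_mrna(k_syn: float, tau_on: float, tau_off: float, k_deg: float = 1.0) -> float:
    """Steady-state mean: synthesis rate x fraction of time ON / degradation rate."""
    return k_syn * tau_on / (tau_on + tau_off) / k_deg


# Illustrative values, with rates normalized so that k_deg = 1.
tau_on, tau_off, k_syn = 2.0, 8.0, 50.0
bf = burst_frequency(tau_on, tau_off)
bs = burst_size(k_syn, tau_on)
traditional_bf = 1.0 / tau_off  # the 1/(k_deg * tau_off) form with k_deg = 1
```

With these numbers the product `bs * bf` matches `mean_mrna(k_syn, tau_on, tau_off)`, whereas the traditional form `traditional_bf` overestimates the cycle rate because it ignores the ON time.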

      This explanation has been clarified in the methods 1.2 section.

      The authors mention the need to model "more realistic gene expression processes". How is this exactly being incorporated into the model?

      We thank the reviewer for raising this important question. To incorporate "more realistic gene expression processes" into our model, DeepTX accounts for two critical aspects that are often oversimplified in traditional approaches:

      (1) Integration of gene expression and sequencing processes: Observations from scRNA-seq data are influenced by both the intrinsic gene expression processes and the subsequent sequencing procedure. Traditional models often focus solely on gene expression, neglecting the stochastic effects introduced by the sequencing process. Our model explicitly incorporates both the gene expression and sequencing processes, providing a more comprehensive and realistic representation of the observed data.

      (2) Modeling gene expression as a multi-step process: Gene expression is inherently a multi-step process. However, traditional telegraph models typically simplify gene state switching as a single-step process for tractable analysis, often assuming Markovian dynamics where transition waiting times follow exponential distributions. In contrast, our model accounts for the multi-step nature of gene state transitions by allowing the waiting times to follow non-exponential (non-Markovian) distributions. This model is more suitable for gene expression dynamics that cannot be simplified to a single-step process, such as DNA damage, which may introduce an intermediate state to represent pausing and repair in the transcription process.

      By addressing these factors, our model better reflects the complexity and stochastic nature of gene expression processes, aligning more closely with the data generated from biological systems. We have added detailed explanations after this sentence for clarification in the first result section.

      Better explanation of the previously developed TXmodel, and the assumption of a non-Markovian system. In particular, it isn't clear how using arbitrary distributions for the waiting times implies a non-Markovian process (as the previous state(s) of the system is not used to inform the transition probability, at least as explained in pp. 4). Without a clear discussion of the so-called arbitrary waiting time distribution, it isn't clear how these represent a mechanistic model. In general, a more careful discussion of the "mechanistic" model is needed.

      We thank the reviewer for this thoughtful comment. In the revised manuscript, we provide a more detailed explanation of the relationship between the TXmodel and the non-Markovian system. Specifically, we clarify the following points:

      (1) Why non-Markovian system: In a Markovian system, the waiting times for events are exponentially distributed, meaning that the state transitions depend solely on the current state and are memoryless (Van Kampen NG, 1992). However, when the waiting times follow non-exponential distributions, such as Gamma or Weibull distributions, the state transitions are no longer independent of the system's previous states. This introduces memory into the system, making it non-Markovian.

      (2) Why mechanistic model: First, it is important to clarify that regardless of whether the waiting time is arbitrary or exponential (corresponding to non-Markovian and Markovian systems), our TXmodel is a mechanistic model because it models the dynamic process of transcription bursts with interpretable kinetic parameters. Second, although we introduced arbitrarily distributed waiting times, reasonable selection of waiting time distributions can still make the distribution parameters mechanistically interpretable. For example, in the context of modeling ON and OFF state switching times using a Gamma distribution, the two parameters have clear interpretations: the shape parameter represents the number of sequential exponential (memoryless) steps required for the transition to occur, capturing the complexity or multi-step nature of the switching process, while the scale parameter denotes the average duration of each of these steps. We have added the explanation in methods 1.2 section.
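
The interpretation of the Gamma shape parameter as a number of sequential memoryless steps can be illustrated with a short simulation. This is a minimal sketch using only the Python standard library; the step count (3) and mean step duration (2.0) are arbitrary illustrative values, and `sample_dwell_time` is a hypothetical helper rather than a DeepTX function.

```python
import random
import statistics


def sample_dwell_time(n_steps: int, step_mean: float, rng: random.Random) -> float:
    """Total waiting time for n_steps sequential exponential steps, each with mean step_mean."""
    return sum(rng.expovariate(1.0 / step_mean) for _ in range(n_steps))


rng = random.Random(0)
n_samples = 50_000

# Dwell times built mechanistically, one exponential step at a time.
summed = [sample_dwell_time(3, 2.0, rng) for _ in range(n_samples)]

# Dwell times drawn directly from Gamma(shape=3, scale=2.0).
direct = [rng.gammavariate(3, 2.0) for _ in range(n_samples)]

mean_summed = statistics.fmean(summed)
var_summed = statistics.pvariance(summed)
mean_direct = statistics.fmean(direct)
```

Both samples should have a mean near shape × scale = 6 and a variance near shape × scale² = 12, consistent with the two constructions describing the same distribution. A single exponential step (shape 1) recovers the Markovian case.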

      Include a brief discussion about the metric used to compare distributions (and introduce KL abbreviation).

      We thank the reviewer for this suggestion. In the second result section and methods section 1.3 of the revised manuscript, we have included a brief discussion to introduce and clarify the metric used to compare distributions. Specifically, we have given more explanation of the Kullback-Leibler (KL) divergence, which is a widely used metric for quantifying the difference between two probability distributions. We also ensured that the abbreviation "KL" is properly introduced when it first appears in the text, along with a concise description of its mathematical definition and interpretation within the context of our analysis.
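
For readers unfamiliar with the metric, the discrete KL divergence can be sketched in a few lines. This is a generic textbook implementation, not the DeepTX code; which distribution plays the role of `p` and which of `q` in the manuscript is not stated here, and the flooring constant `eps` is an assumption added for numerical safety.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_n p(n) * log(p(n) / q(n)) for two discrete distributions.

    Terms with p(n) == 0 contribute zero; q(n) is floored at eps to avoid log(0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same support")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
```

The divergence is zero only when the two distributions coincide, and it is not symmetric: `kl_pq` and `kl_qp` differ, so the order of the arguments matters when it is used as a loss.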

      What does the "CTM" model stand for (in supplementary information)? And "TX" model?

      We thank the reviewer for highlighting this point. We revised the supplementary information to explicitly define the "CTM" and "TX" models and clarify their distinctions.

      CTM model: The "CTM" model refers to the classic telegraph model, a widely used model for capturing Markovian gene expression burst kinetics. The CTM describes stochastic gene expression as a sequence of four biochemical reactions involving two gene states (ON and OFF), mRNA transcription and degradation.

      Here, k<sub>off</sub> is the rate at which the gene switches from OFF to ON (OFF → ON), k<sub>on</sub> is the rate at which the gene switches from ON to OFF (ON → OFF), k<sub>syn</sub> is the rate of mRNA synthesis in the ON state (ON → ON + mRNA), and k<sub>deg</sub> is the rate of mRNA degradation (mRNA → ∅). In this model, gene switching between active and inactive states is governed by a memoryless Markovian process, where the waiting times for transitions follow exponential distributions (Van Kampen NG, 1992).
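
The four CTM reactions can be simulated directly with the Gillespie algorithm. The sketch below is a minimal, standard-library illustration and not the authors' implementation; it follows the naming used in this response (`k_off` is the OFF to ON rate, `k_on` the ON to OFF rate), and the parameter values and the function name `simulate_telegraph` are hypothetical.

```python
import random


def simulate_telegraph(k_off, k_on, k_syn, k_deg, t_end, rng):
    """Gillespie simulation of the classic telegraph model; returns the mRNA count at t_end."""
    t, gene_on, mrna = 0.0, False, 0
    while True:
        rates = [
            0.0 if gene_on else k_off,  # OFF -> ON
            k_on if gene_on else 0.0,   # ON -> OFF
            k_syn if gene_on else 0.0,  # ON -> ON + mRNA
            k_deg * mrna,               # mRNA -> degraded
        ]
        total = sum(rates)
        t += rng.expovariate(total)     # exponential waiting time to the next reaction
        if t > t_end:
            return mrna
        r = rng.random() * total        # choose a reaction in proportion to its rate
        if r < rates[0]:
            gene_on = True
        elif r < rates[0] + rates[1]:
            gene_on = False
        elif r < rates[0] + rates[1] + rates[2]:
            mrna += 1
        elif mrna > 0:
            mrna -= 1


rng = random.Random(1)
k_off, k_on, k_syn, k_deg = 1.0, 1.0, 20.0, 1.0
counts = [simulate_telegraph(k_off, k_on, k_syn, k_deg, t_end=15.0, rng=rng) for _ in range(1000)]
sample_mean = sum(counts) / len(counts)
expected_mean = k_syn * k_off / (k_off + k_on) / k_deg
```

Each simulated value plays the role of one cell in a static snapshot. The sample mean should lie close to `expected_mean`, the synthesis rate multiplied by the fraction of time the gene is ON and divided by the degradation rate.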

      TX model: In contrast, the "TX" model is a more generalized telegraph model for transcriptional processes.

      Different from the CTM, the waiting times for state transitions between ON and OFF in the TX model follow arbitrary waiting time distributions. This implies that the future state of the system depends not only on the current state but may also be influenced by its historical trajectories. Consequently, the TX model exhibits non-Markovian behavior. We have added more detailed description on these two models in section 1.1 of supplementary text.

      Leaky transcription (in the OFF promoter state) is not considered. What would be the implications of its presence in the data?

      We thank the reviewer for pointing out the potential role of leaky transcription in our analysis. We acknowledge that leaky transcription, occurring in the promoter OFF state, was not explicitly considered in our current model. We excluded it on the assumption that the leaky transcription rate is relatively small and its impact on the observed data is negligible. This assumption is consistent with previous studies that similarly disregard leaky transcription in gene expression modeling due to its minimal contribution to the overall dynamics (Larsson AJM et al., 2019).

      However, we recognize that the leaky transcription should be considered, particularly in systems where the leaky rate is significant relative to the active transcription rate. In such cases, it may introduce additional variability to the observed expression levels or obscure the distinction between ON and OFF states. We have added relevant statements in the discussion section.

      In the main text, the waiting time for state transitions is described by two parameters, while in the methods/supplementary information only one parameter is considered per distribution (without a clear discussion of the so-called "dwell time distributions").

      We thank the reviewer for this comment. We recognize the need to clarify the discrepancy between the descriptions of waiting times in the main text and supplementary materials.

      Dwell time distribution refers to the probability distribution of the time in which a gene remains in a particular transcriptional state (ON or OFF) before transitioning to the other state. While in Markovian models the dwell time follows an exponential distribution, more complex or non-Markovian regulatory mechanisms may give rise to Gamma, Weibull, or other non-exponential dwell time distributions.

      In our model, the dwell time distributions in the OFF and ON states are each characterized by a parameter vector w, whose dimensionality depends on the specific form of the distribution. For example, when an exponential distribution is assumed, w consists of a single rate parameter; in contrast, for distributions such as the Gamma or Weibull, w includes two parameters. In the main text, both dwell time distributions are modeled using Gamma distributions, whereas in the Supplementary Materials, we assume exponential distributions for both, resulting in a single-parameter representation. We have added relevant statements in the methods 1.2 section.

      Related, but more general, across the manuscript there are problems with the consistency in terminology. This is especially problematic with the figures. It makes it incredibly hard to follow the work. Better integration of the information, and consistency with the terminology, would improve the understanding for the reader.

      We thank the reviewer for the valuable feedback. To enhance clarity and readability, we have carefully revised the manuscript to ensure consistent terminology throughout the text and figures, for example by unifying terms such as "untreatment" and "control" under the single label "control".

      One of the four main assumptions behind the model is that "the solution of the model can be explained by a mixed negative binomial distribution". The logic and implications of this assumption need to be discussed in the paper. (Methods, pp.13.) All four assumptions need to be carefully argued in the paper. 

      We appreciate the reviewer’s comment regarding the assumptions underlying our model. Here, we would like to clarify the rationale and implications of each assumption. 

      Assumption 1 (The gene expression of cells was in a stationary distribution during sequencing.) has been extensively used in previous studies for the inference and modeling of scRNA-seq data, demonstrating effectiveness in capturing mRNA expression distributions and inferring underlying dynamic parameters (Larsson AJM et al., 2019; Luo S et al., 2023; Ramsköld D et al., 2024; Gupta A et al., 2022).

      The rationale for Assumption 2 (Gene expression counts of the same cell type follow the same distribution.) is as follows: cell types are typically defined based on gene expression profiles or functional characteristics. Cells with similar functions often exhibit consistent transcriptional programs, leading to approximately identical gene expression distributions. This assumption has been widely adopted in previous research (Larsson AJM et al., 2019; Gupta A et al., 2022).

      Regarding Assumption 3 (The solution of the model can be approximated by a mixed negative binomial distribution.), in the most general formulation, a chemical master equation (CME) model of a biological system converges to a stationary distribution P(n;θ) over n∈ℕ, and P(n;θ) affords a real Poisson representation (Gardiner CW & Chaturvedi S, 1977): P(n;θ) = ∫ P<sub>Poisson</sub>(n; λ) dF(λ;θ), where F is a mixing cumulative distribution function (CDF). If such a Poisson representation exists, we can always write down a finite approximation over K Poisson kernels: Q(n;θ) = Σ<sub>k=1..K</sub> w<sub>k</sub> P<sub>Poisson</sub>(n; λ<sub>k</sub>), where the w<sub>k</sub> are weights on a K-dimensional simplex. Further, as K → ∞, Q → P. However, convergence in the number of kernels K is typically slow. Negative binomial kernels P<sub>NB</sub>(n; m<sub>k</sub>, l<sub>k</sub>), which are continuous Poisson mixtures with a gamma mixing density, can accelerate convergence in K (Gorin G et al., 2024). Hence, the solution of the TX model can be approximated by a mixed negative binomial distribution.
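
      The statement that a negative binomial kernel is a Poisson mixture with a gamma mixing density can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the shape/scale parameterization, and the midpoint-rule integration settings are assumptions chosen for illustration.

```python
import math

def poisson_pmf(n, lam):
    # Poisson probability of count n with mean lam (lam > 0), computed in log space.
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def gamma_pdf(lam, shape, scale):
    # Gamma density with the given shape and scale, evaluated at lam > 0.
    return math.exp((shape - 1) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def nb_pmf(n, shape, scale):
    # Closed-form negative binomial obtained by integrating the Poisson mean
    # against a Gamma(shape, scale) mixing density.
    log_coeff = math.lgamma(n + shape) - math.lgamma(n + 1) - math.lgamma(shape)
    return math.exp(log_coeff + n * math.log(scale / (1 + scale))
                    - shape * math.log(1 + scale))

def gamma_poisson_mixture(n, shape, scale, upper=200.0, steps=200_000):
    # Midpoint-rule approximation of the mixture integral over the Poisson mean.
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        total += poisson_pmf(n, lam) * gamma_pdf(lam, shape, scale)
    return total * h
```

      With a shape of 2 and a scale of 3 (hypothetical values), the numerical mixture and the closed form agree to within the integration error for each count tested, which is the sense in which a single negative binomial kernel stands in for a continuum of Poisson kernels.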

      For Assumption 4 (The state space sampled from a sufficiently long single simulation is statistically equivalent to that obtained from multiple simulations at steady state in gene expression models.), when a sample trajectory of the model is simulated for a sufficiently long period, it is assumed to have traversed the entire stationary state space (Kuntz J et al., 2021). Therefore, by performing truncated statistical analysis on the trajectory, the corresponding stationary distribution of the model can be obtained. We have added the explanation in methods 1.1 section.

      The authors propose that the waiting times between promoter states follow a non-exponential distribution, but the choice of gamma distribution and the implications for the method and the biological conclusions need to be discussed.

      We thank the reviewer for this comment. To account for the impact of DNA damage on the transcription process, our model assumes that both the "ON" and "OFF" states of the promoter consist of multiple underlying sub-states. A switch from the "ON" state to the "OFF" state therefore proceeds through several intermediate steps, each with an exponentially distributed waiting time, and the same holds for a switch from the "OFF" state to the "ON" state, which may pass through several "OFF" sub-states. The total dwell time in each state is consequently a sum of multiple exponential waiting times. Such a multi-step process with exponential waiting times can be mapped to a single-step reaction with a non-exponential waiting time distribution. We therefore use a Gamma distribution for the dwell time of promoter switching, which for an integer shape parameter is the convolution of multiple identical exponential distributions (corresponding to a sum of independent exponential variables). Other non-exponential distributions, such as those discussed in our previous study (Zhang J & Zhou T, 2019), may also be considered, and alternative choices could be made depending on the specific characteristics of the system. We have added the explanation in methods 1.2 section.
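
      The mapping from several exponential steps to a single Gamma-distributed dwell time can be illustrated by simulation. This is a sketch under stated assumptions: all sub-states share one rate (so the sum is an Erlang, that is, a Gamma with integer shape), and the function names, step count, and rate are hypothetical.

```python
import random
import statistics

def multistep_dwell_time(n_steps, rate, rng):
    # Total dwell time of a state made of n_steps sequential sub-states,
    # each left after an exponential waiting time with the same rate.
    return sum(rng.expovariate(rate) for _ in range(n_steps))

def summarize(n_steps, rate, n_samples=20_000, seed=0):
    # Returns the sample mean, variance, and squared coefficient of variation.
    rng = random.Random(seed)
    samples = [multistep_dwell_time(n_steps, rate, rng) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples)
    return mean, var, var / mean ** 2

if __name__ == "__main__":
    for k in (1, 5):
        mean, var, cv2 = summarize(n_steps=k, rate=2.0)
        print(f"steps={k}: mean={mean:.3f}, variance={var:.3f}, CV^2={cv2:.3f}")
```

      For a Gamma distribution with shape k and rate r the mean is k/r, the variance is k/r², and the squared coefficient of variation is 1/k. A single step (k = 1) recovers the exponential case with CV² = 1, while more steps give a narrower, non-exponential dwell time.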

      BF - burst frequency; BS - burst size. These terms represent the main data output, but they are only mathematically defined in the methods, and never the intuition of the specific expression explained (e.g., why not using tON/(tON+tOFF) as BF instead of 1/(tON+tOFF), and why not kSYN*tON as BS instead of kSYN*tON).

      We appreciate the reviewer’s comment and agree that clarifying the biological intuition behind the mathematical definitions of burst frequency (BF) and burst size (BS) is important. Below, we provide a more detailed explanation of these definitions.

      BF: The definition of burst frequency we adopt, BF = 1/(τ<sub>on</sub> + τ<sub>off</sub>), where τ<sub>on</sub> and τ<sub>off</sub> are the mean durations of the active and inactive states, has been widely used in previous literature, for example by Zoller et al. (Zoller B et al., 2018), Hoppe et al. (Hoppe C et al., 2020) and Ramsköld et al. (Ramsköld D et al., 2024). It is the reciprocal of the average time the promoter takes to complete one full stochastic cycle between its active and inactive states.

      BS: The definition of burst size we adopt, BS = k<sub>syn</sub>τ<sub>on</sub>, is consistent with the definition proposed by the reviewer. Burst size refers to the average number of mRNA transcripts produced during a single transcriptional activation event of a gene. It reflects the quantity of gene product synthesized per activation and is influenced by the rate of transcription and the duration of the active state of the gene. Our definition aligns with this biological interpretation, where k<sub>syn</sub> is the transcription rate and τ<sub>on</sub> is the average duration of the active state.

      In addition, the mean transcription level can be analytically represented as the product of burst size and burst frequency. This analytical result has been included in the methods 1.2 section of revised manuscript.
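
      The relationship between these quantities can be written out as a short calculation. This is an illustrative sketch only: the function name is hypothetical, `tau_on` and `tau_off` stand for the mean ON and OFF durations (the reviewer's tON and tOFF), the synthesis rate is assumed to be normalized by the degradation rate, and the value of `k_syn` in the usage example is made up.

```python
def burst_kinetics(k_syn, tau_on, tau_off):
    # Burst size: transcripts produced during one ON period.
    burst_size = k_syn * tau_on
    # Burst frequency: ON/OFF cycles completed per unit time.
    burst_frequency = 1.0 / (tau_on + tau_off)
    # Fraction of time spent ON; a dimensionless occupancy, not a frequency.
    fraction_on = tau_on / (tau_on + tau_off)
    # Mean expression under degradation-normalized synthesis.
    mean_expression = burst_size * burst_frequency
    return burst_size, burst_frequency, fraction_on, mean_expression

if __name__ == "__main__":
    # tau values are the medians quoted from Gupta A et al., 2022; k_syn is hypothetical.
    print(burst_kinetics(k_syn=10.0, tau_on=0.12, tau_off=3.7))
```

      The product BS × BF equals k<sub>syn</sub> multiplied by the fraction of time spent in the ON state, which is why tON/(tON+tOFF) enters the mean through the product rather than serving as the frequency itself.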

      One can assume from the methods that omegaON and omegaOFF are the vector of (2) parameters describing the distribution, but the reader would benefit from some clarity here. The authors claim that they proved that the distribution moments can be obtained through an iterative process. How much does this rely on the assumption of an underlying binomial distribution?

      We thank the reviewer for this helpful suggestion. To clarify, the vectors omegaON and omegaOFF (w<sub>on</sub> and w<sub>off</sub>) represent the parameters characterizing the waiting time distributions of the promoter's active and inactive states, respectively. The exact form and interpretation of these vectors depend on the specific distributional choice for the waiting times. For instance, when the waiting time follows a Gamma distribution with shape parameter α > 0 and scale parameter β > 0, then w<sub>on</sub> = (α, β). Conversely, when the waiting time follows a Weibull distribution with shape parameter k > 0 and scale parameter l > 0, then w<sub>on</sub> = (l, k). We have clarified it in the Methods 1.2 section of the revised manuscript.

      Regarding the question about the binomial distribution, in our work we use the binomial moment method to compute distributional statistics of the chemical master equation (Zhang J et al., 2016). The binomial moments of the mRNA stationary distribution P(m) are defined as b<sub>k</sub> = Σ<sub>m≥k</sub> C(m, k) P(m), k = 0, 1, 2, ..., where C(m, k) = m!/(k!(m−k)!) denotes the binomial coefficient (combinatorial number). This technique is a mathematical tool for moment calculation and is not based on the assumption that the underlying distribution is a binomial distribution (Luo S et al., 2023). Hence, our approach is general and does not require the distribution itself to follow a binomial form.
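
      To make clear that binomial moments are a summary of any count distribution rather than a distributional assumption, the sketch below computes them for an arbitrary probability mass function. The function names are hypothetical, and the Poisson and two-point distributions are chosen only as examples; the k-th binomial moment is the sum over m of "m choose k" times P(m).

```python
import math

def binomial_moment(pmf, k):
    # k-th binomial moment of a count distribution given as a list pmf[m] = P(m).
    return sum(math.comb(m, k) * p for m, p in enumerate(pmf))

def mean_and_variance(pmf):
    # The first two binomial moments determine the mean and the variance.
    b1 = binomial_moment(pmf, 1)
    b2 = binomial_moment(pmf, 2)
    return b1, 2 * b2 + b1 - b1 ** 2

def poisson_pmf(lam, n_max):
    # Truncated Poisson distribution on 0..n_max, computed in log space.
    return [math.exp(m * math.log(lam) - lam - math.lgamma(m + 1))
            for m in range(n_max + 1)]
```

      For a Poisson distribution with mean λ the k-th binomial moment is λ<sup>k</sup>/k!, but the same function applies unchanged to a bimodal or otherwise non-binomial distribution.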

      More details about the parameter sampling are required. For instance, why are the specific ranges chosen and their implications? And is the space explored in logarithmic scale?  

      We thank the reviewer for the insightful comment regarding parameter sampling. In our study, we considered five parameters: k<sub>off</sub>, r<sub>off</sub>, k<sub>on</sub>, r<sub>on</sub> and k<sub>syn</sub>. The parameters k<sub>off</sub> and k<sub>on</sub> represent the number of intermediate reaction steps involved in transcriptional state transitions. These values were sampled uniformly from the range 1 to 15, which aligns with biological evidence indicating that most genes undergo either direct (single-step) transitions or a small number of intermediate steps, typically fewer than ten (Tunnacliffe E & Chubb JR, 2020). This range is sufficient to capture both widely used single-step models and more detailed multi-step mechanisms without introducing biologically implausible complexity.

      Among these parameters, r<sub>off</sub> and r<sub>on</sub> denote the rate constants governing stochastic transitions between the OFF and ON transcriptional states, respectively. The mean duration of the OFF state, which corresponds to the time between transcriptional bursts, is given by τ<sub>off</sub> = k<sub>off</sub> / r<sub>off</sub>, and falls within the range τ<sub>off</sub> ∈ (0.1, 150). Experimental measurements report a median value of approximately 3.7 (Gupta A et al., 2022), which is well contained within this range. Similarly, the mean duration of the ON state, referred to as the burst duration, is defined by τ<sub>on</sub> = k<sub>on</sub> / r<sub>on</sub>, and spans the interval τ<sub>on</sub> ∈ (0.1, 1500). The experimentally observed median value of 0.12 (Gupta A et al., 2022) confirms that the parameter range adequately captures biologically realistic dynamics.

      The parameter k<sub>syn</sub> represents the normalized synthesis rate after accounting for molecular degradation. Its range was chosen based on empirical observations of transcriptional burst sizes, which typically vary from single molecules to several dozen (Gupta A et al., 2022). Considering the relationship BS = k<sub>syn</sub>τ<sub>on</sub>, where τ<sub>on</sub> is the mean duration of the ON state, the selected range of k<sub>syn</sub> ensures that the experimentally observed burst sizes are well represented within the defined parameter space. We have added the explanation in methods 1.2 section and supplementary text 4.

      We fully recognize the advantages of logarithmic sampling, particularly when parameters span several orders of magnitude. Logarithmic scaling ensures balanced exploration across wide ranges and prevents sampling bias towards larger values. However, in our work, we applied Sobol sampling directly within the original (linear) parameter space. Although we did not explicitly transform parameters into logarithmic scale, Sobol sequences provide low-discrepancy, quasi-random coverage, which promotes uniform sampling across bounded domains (Sobol IM, 1967). Further, if necessary, we can adaptively enlarge the parameter range, run the simulation algorithm to generate new samples, and train a new model that covers the larger range.
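
      The difference between linear and logarithmic coverage can be quantified with a short calculation. The sketch below is an illustration under assumptions, not a description of the authors' sampler: it treats a single parameter as exactly uniform on either the linear or the logarithmic scale, and it borrows the interval (0.1, 150) from the OFF-duration range quoted earlier purely as an example.

```python
import math

def linear_fraction_below(x, lower, upper):
    # Share of points below x when sampling uniformly on the linear scale.
    return (x - lower) / (upper - lower)

def log_fraction_below(x, lower, upper):
    # Share of points below x when sampling uniformly on the logarithmic scale.
    return math.log(x / lower) / math.log(upper / lower)

if __name__ == "__main__":
    lower, upper, threshold = 0.1, 150.0, 1.5
    print(f"linear: {linear_fraction_below(threshold, lower, upper):.4f}")
    print(f"log:    {log_fraction_below(threshold, lower, upper):.4f}")
```

      On this example interval, under 1% of linearly uniform points fall below 1.5, compared with roughly 37% of log-uniform points, which is the imbalance that logarithmic scaling is meant to address.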

      On page 15, the rationale for selecting the parameter space is unclear. This is crucial, as fully connected neural networks typically exhibit poor extrapolation beyond their training parameter space. If the parameter space of an experimental dataset significantly differs from the training range, the inference results may become unreliable. We suggest further clarification on how the alignment between the parameter spaces of the experimental data and the training dataset can be ensured to maintain inference accuracy.

      We appreciate the reviewer’s insightful comment regarding the extrapolation limitations of fully connected neural networks. To address this concern, we have implemented a truncation strategy during inference, which constrains the inferred parameters to remain within the bounds of the training parameter space. This ensures that the neural network operates within a regime where its predictive accuracy has been validated, thereby enhancing the robustness of our results. Additionally, we have carefully selected the training parameter space to be reasonable, based on the characteristics of the experimental data. These ranges have been validated through domain knowledge and data analysis, ensuring that even when the experimental data approaches the boundaries of the training range, the inference results remain reliable and accurate.
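
      A truncation step of this kind can be sketched as a simple clipping operation. The code below is a hypothetical illustration rather than the authors' implementation: the function name and dictionary layout are assumptions, the bounds for k<sub>on</sub> and k<sub>off</sub> follow the 1 to 15 range stated in this response, and the bound for k<sub>syn</sub> is invented for the example.

```python
def clip_to_training_range(params, bounds):
    # Project each inferred parameter back onto the interval used for training.
    clipped = {}
    for name, value in params.items():
        lower, upper = bounds[name]
        clipped[name] = min(max(value, lower), upper)
    return clipped

if __name__ == "__main__":
    bounds = {
        "k_on": (1.0, 15.0),
        "k_off": (1.0, 15.0),
        "k_syn": (1.0, 400.0),  # hypothetical bound, for illustration only
    }
    proposal = {"k_on": 0.4, "k_off": 7.5, "k_syn": 512.0}
    print(clip_to_training_range(proposal, bounds))
```

      Applying the projection after every optimizer update keeps the network from being queried outside the region where it was trained; estimates that end up exactly on a bound are worth flagging, since they may indicate that the true value lies outside the training range.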

      On page 16, it is unclear why the authors chose to incorporate the Fano factor instead of using the coefficient of variation or variance. Clarifying the reasoning behind the selection of the Fano factor over these other statistical measures would provide better insight into its relevance for their analysis.  

      We thank the reviewer for raising this point. Although the loss term is described using the Fano factor, its formulation actually involves both the variance and the mean. Specifically, the loss we use is: . We chose to use the Fano factor because it is particularly well-suited for quantifying transcriptional noise in systems where the mean expression level varies across conditions or parameters. Unlike variance, the Fano factor normalizes variability by the mean, making it more robust for comparing noise levels across genes or regulatory regimes with different expression levels. Compared to the squared coefficient of variation (CV²), which normalizes the variance by the square of the mean, the Fano factor tends to be less sensitive to low expression regimes and is commonly used in stochastic gene expression studies, especially when the distribution is skewed or overdispersed (i.e., variance exceeds the mean). This makes it a more appropriate metric in our context, where transcriptional bursting often leads to overdispersed expression distributions. We have added an explanation in the methods 1.3 of revised manuscript to explain this choice.
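
      The three noise measures under discussion differ only in how the variance is normalized. The sketch below computes them side by side for a small, made-up vector of counts; the function name is hypothetical and the population variance is used as a simplifying assumption.

```python
import statistics

def noise_measures(counts):
    # Mean, variance, Fano factor (variance / mean) and CV^2 (variance / mean^2).
    mean = statistics.fmean(counts)
    var = statistics.pvariance(counts)
    return {
        "mean": mean,
        "variance": var,
        "fano": var / mean,
        "cv2": var / mean ** 2,
    }

if __name__ == "__main__":
    print(noise_measures([0, 0, 1, 2, 7]))
```

      A Poisson distribution has a Fano factor of 1 at any mean, so values above 1 indicate overdispersion relative to that baseline, whereas CV² for a Poisson distribution equals 1/mean and therefore grows as expression becomes low.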

      On page 17, the definition of "sample" is unclear. Does it refer to the number of parameters sets or to the simulated trajectories generated by stochastic simulation algorithms?

      We thank the reviewer for the valuable feedback. The term "sample" in this context refers to the data points used in the neural network training set. To eliminate any ambiguity, we included a precise mathematical definition of a "sample", (θ<sub>i</sub>, P<sub>simulation,i</sub>), in the methods 1.3 section of the revised manuscript.

      Additionally, it is unclear how the authors determined the number of simulated trajectories per parameter set to ensure training accuracy. Furthermore, it would be relevant to address whether including moments during neural network training is beneficial.

      We appreciate the reviewer’s insightful questions regarding the simulation and training process. To clarify, for each parameter set, we did not simulate multiple trajectories to obtain the corresponding distribution. Instead, we simulated the system for a sufficiently long period to ensure that the system reached a steady-state distribution. From this steady-state data, we then used interpolation methods to derive the corresponding distribution for each parameter set.

      On the other hand, the moments were calculated theoretically without any approximations, providing higher accuracy. By incorporating the moments into the training process, we can effectively mitigate potential biases arising from insufficient sampling of the simulated data. Moreover, our experiments on the synthetic dataset demonstrate that introducing the moments as a loss function significantly enhances the model's performance on the test set (Figure 2E).
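
      The idea of estimating a stationary distribution from one long trajectory can be sketched with a stochastic simulation of a simple two-state model. This is an illustration under assumptions and not the authors' simulator: it uses exponential (Markovian) switching instead of Gamma dwell times, it estimates the distribution from time-weighted occupancy rather than interpolation, and all names and rate values are hypothetical.

```python
import random
from collections import defaultdict

def simulate_telegraph(k_on, k_off, k_syn, k_deg, t_end, seed=0):
    # Gillespie simulation of one long trajectory; returns the fraction of
    # time spent at each mRNA count as an estimate of the stationary distribution.
    rng = random.Random(seed)
    t, gene_on, m = 0.0, False, 0
    occupancy = defaultdict(float)
    while True:
        rates = [
            k_off if gene_on else k_on,  # promoter switches state
            k_syn if gene_on else 0.0,   # synthesis, only while ON
            k_deg * m,                   # degradation
        ]
        total = sum(rates)
        dt = rng.expovariate(total)
        if t + dt >= t_end:
            occupancy[m] += t_end - t
            break
        occupancy[m] += dt
        t += dt
        u = rng.random() * total
        if u < rates[0]:
            gene_on = not gene_on
        elif u < rates[0] + rates[1]:
            m += 1
        else:
            m -= 1
    return {count: w / t_end for count, w in sorted(occupancy.items())}

if __name__ == "__main__":
    pmf = simulate_telegraph(k_on=1.0, k_off=1.0, k_syn=10.0, k_deg=1.0, t_end=5000.0)
    print(sum(count * p for count, p in pmf.items()))
```

      For this Markovian example the exact stationary mean is k_syn · k_on / (k_on + k_off) / k_deg, so the simulated mean can be compared against a theoretical moment, which mirrors the role the exact moments play in correcting sampling error during training.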

      What is the intuition behind the choice of alpha_cg? On page 18, the rationale for setting the sampling probability to 0.5 is unclear. Could this parameter be inferred rather than being preset?  

      We thank the reviewer for the insightful comment regarding the choice of α<sub>cg</sub>. We acknowledge that the typical values of this parameter in related literature often fall within a narrower range (e.g., 0.06–0.32) (Zheng GX et al., 2017; Macosko EZ et al., 2015). However, our decision to set α<sub>cg</sub> was based on a trade-off between sampling efficiency and computational tractability in our specific application context. While it is indeed possible to infer α<sub>cg</sub> as a learnable parameter, we opted for a fixed value in this work to reduce model complexity and avoid unidentifiability issues. In addition, we conducted inference under different capture efficiencies (0.5, 0.3, and 0.2), and found that the inferred burst size (BS) and burst frequency (BF) remained strongly correlated across these conditions (Supplementary Figure S12). This indicates that variations in capture efficiency do not significantly impact the outcomes of downstream enrichment analyses. Nevertheless, we agree that adaptively learning α<sub>cg</sub> could be a promising direction, and we plan to explore this in future work. We have added the explanation in methods 1.4 section.
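
      The effect of a capture efficiency on observed counts can be illustrated with binomial thinning, in which each transcript is detected independently with a fixed probability. This is a minimal sketch under that assumption; the function name and the example counts are hypothetical, and the code does not reproduce the authors' inference procedure.

```python
import random

def downsample_counts(true_counts, capture_efficiency, seed=0):
    # Keep each transcript independently with probability capture_efficiency.
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(count) if rng.random() < capture_efficiency)
        for count in true_counts
    ]

if __name__ == "__main__":
    true_counts = [20] * 2000
    for alpha in (0.5, 0.3, 0.2):
        observed = downsample_counts(true_counts, alpha)
        print(alpha, sum(observed) / len(observed))
```

      Under this thinning model the observed mean is the true mean multiplied by the capture efficiency, so a different fixed value rescales the counts uniformly across genes, which is consistent with the strong correlation across efficiencies reported above.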

      On page 19, the authors employed gradient descent for parameter inference. However, as this method is sensitive to initial values, it is unclear how the starting points were selected.

      We sincerely thank the reviewer for highlighting the sensitivity of gradient-based optimization methods to initial values. To address this concern, we adopted a black-box optimization strategy in the form of the adaptive differential evolution (DE) algorithm (Das S & Suganthan PN, 2010) to derive robust initial parameters for the parameter inference. The adaptive DE algorithm enables global exploration across a broad parameter space, thereby reducing the risk of convergence to suboptimal local minima. This yielded reasonably good initial estimates, which were subsequently refined using gradient-based optimization to identify high-quality solutions characterized by a vanishing gradient norm. This hybrid strategy, which combines global and local search, is widely adopted in optimization literature to alleviate the risk of entrapment in local optima (Ahandani MA et al., 2014). We have clarified this detail in the third result of the revised manuscript.
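
      The two-stage structure (global exploration followed by gradient-based refinement) can be sketched on a toy problem. The code below deliberately swaps in a plain random search for the adaptive differential evolution stage to stay short, so it illustrates the hybrid idea only; the one-dimensional double-well loss, the function names, and the step size are all assumptions.

```python
import random

def loss(x):
    # Toy double-well loss with a local minimum near x = 1.13
    # and the global minimum near x = -1.30.
    return x ** 4 - 3 * x ** 2 + x

def grad(x):
    # Analytic derivative of the toy loss.
    return 4 * x ** 3 - 6 * x + 1

def global_stage(lower, upper, n_candidates=200, seed=0):
    # Stand-in for the global optimizer: keep the best of many random candidates.
    rng = random.Random(seed)
    candidates = [rng.uniform(lower, upper) for _ in range(n_candidates)]
    return min(candidates, key=loss)

def local_stage(x0, learning_rate=0.01, n_steps=2000):
    # Plain gradient descent started from the global-stage estimate.
    x = x0
    for _ in range(n_steps):
        x -= learning_rate * grad(x)
    return x

if __name__ == "__main__":
    x0 = global_stage(-3.0, 3.0)
    x_star = local_stage(x0)
    print(x0, x_star, loss(x_star), grad(x_star))
```

      Gradient descent started at a point such as x = 2 would settle in the shallower well, whereas starting from the best global candidate places the refinement in the deeper basin and ends at a point with a vanishing gradient.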

      Furthermore, clarification on how the gradients were computed - whether through finite difference approximation or other methods - would offer additional insight into the robustness and accuracy of their approach.

      We thank the reviewer for the valuable feedback. Regarding the computation of gradients, we use the chain rule in neural networks, and the gradients are computed through backpropagation. Specifically, we rely on automatic differentiation to efficiently calculate the gradients. Unlike finite difference approximation, automatic differentiation directly computes the derivative of the loss function with respect to each parameter, ensuring accurate gradient calculations (Baydin AG et al., 2018). We have clarified this detail in the discussion section of the revised manuscript.
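
      The contrast between automatic differentiation and finite differences can be shown with a small example. The sketch below uses forward-mode differentiation with dual numbers, which is simpler to write than the reverse-mode backpropagation used in neural network frameworks but applies the same chain rule exactly; the class, function names, and test function are hypothetical.

```python
class Dual:
    # A number carrying its value and its derivative with respect to the input.
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied to the derivative part.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def autodiff_derivative(f, x):
    # Seed the input with derivative 1 and read the derivative of the output.
    return f(Dual(x, 1.0)).deriv

def finite_difference_derivative(f, x, h=1e-6):
    # Forward difference; carries truncation error that shrinks with h.
    return (f(x + h) - f(x)) / h

def f(x):
    return x * x * x + 2 * x
```

      For f(x) = x³ + 2x at x = 2, the dual-number derivative is exactly 14, while the forward difference carries a small truncation error that depends on the step size h.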

      The paper presents several comparisons between continuous and discrete distributions in Figure 2B and Supplementary Figures S4, S6, and S8, described as a "comparison between mRNA distribution and inferred distribution by DeepTX for scRNA-seq data" or a "comparison between SSA results and DeepTX prediction results." This may lead to confusion for the reader, as the paper focuses on transcriptional bursting, a process where we would typically expect the distributions to be discrete. Clarifying this point would help align the figures with the main topic and enhance the reader's understanding.

      We sincerely thank the reviewer for this insightful comment. We understand the concern that the distributions shown in Figure 2B and Supplementary Figures S4, S6, and S8 may appear to be continuous, which could be confusing given that transcriptional bursting naturally results in discrete mRNA count distributions.

      We have clarified that in all these figures, both the empirical mRNA distributions derived from scRNA-seq data and the model-predicted distributions from DeepTX are inherently discrete. To visualize the empirical distributions, we used histograms where the x-axis corresponds to discrete mRNA copy numbers and the y-axis represents the normalized frequency (density). To illustrate the DeepTX-inferred probability mass function, we plotted the predicted probabilities at each integer count as points and connected them with lines for clarity. While the connecting lines give the appearance of continuity, this is a standard graphical convention used to better show trends and model fit in discrete distributions.

      We suggest that Figure 3E could present the error as a percentage of the parameter value, as this would provide a more equitable comparison and better illustrate the relative accuracy of the parameter estimation.

      We thank the reviewer for the suggestion regarding Figure 3E. We agree that presenting the error as a percentage of the parameter value would offer a more equitable basis for comparison and better highlight the relative accuracy of our parameter estimation. Accordingly, we have revised Figure 3E to include the relative percentage error for each parameter.

      Figure 4A could be improved for better legibility. The contour plots are somewhat confusing, and the light blue points are difficult to distinguish. Additionally, the x-axis label "Untreatment" appears throughout the manuscript-could this term be referring to the control experiment?

      We thank the reviewer for the constructive feedback. We have revised Figure 4A to improve its clarity and legibility. Specifically, we adjusted the display style of the contour plots and enhanced the visibility of the light green points to make them more distinguishable.

      Additionally, we recognize the potential confusion caused by the term "Untreatment" and have replaced it with "Control" throughout the revised manuscript to ensure consistency and accuracy in terminology.

      Figure 4B was unclear, and further explanation would be helpful for understanding its purpose.

      We thank the reviewer for the feedback. The purpose of Figure 4B is to illustrate the relationship between bursting kinetics and the mean and variance of the model. In the revised manuscript, we will provide a more detailed explanation of how the figure captures these relationships, highlighting the key insights it offers into the underlying dynamics.

      Figure 4B illustrates the quantitative relationships among BS, BF, and gene expression noise within the framework of the transcriptional model. In this log-log-log 3D space, the mean expression level is constrained on a blue plane defined by the equation log(BS) + log(BF) = log(Mean), highlighting that the product of burst size and burst frequency determines the mean expression level. The orange plane represents a scaling relationship between expression noise and burst kinetics, expressed as log(BS) + log(BF) = k·log(Noise), where k is a constant indicating how the burst kinetics co-vary with noise. Notably, the trajectory of the green sphere demonstrates that, under a fixed mean expression level (i.e., remaining on the blue plane), an increase in gene expression noise arises primarily from an increase in burst size. We have revised the caption of Figure 4B.

      In Figure 4D, two of the GO analysis terms are highlighted in red, but the meaning behind this emphasis is not clear. The same question applies to Figure 5E, where the green dots are missing from the plot. Clarification on these points would enhance the overall clarity.

      We appreciate the reviewer’s thoughtful comments. We have added further clarification regarding the enrichment analysis results presented in Figure 4D. Specifically, we highlighted the "cell cycle G2/M phase transition" pathway because a delay in the G2/M phase transition has been shown to increase the probability of cell differentiation, which is a key aspect of our study. In addition, since IdU treatment is known to induce DNA damage, we emphasized the DNA damage-related pathway to support the biological relevance and consistency of our enrichment results. Similarly, in Figure 5E, we highlighted the apoptosis-related pathway. Apoptosis in this context is closely associated with cellular responses to toxic substances and mitochondrial dynamics. The enrichment of pathways related to these processes enables us to hypothesize the underlying mechanisms driving apoptosis in our system. Further, the absence of green dots in Figure 5E was due to an error in the figure caption. We have revised the figure caption accordingly to accurately describe all elements presented in the figure.

      Clarify axis labels in figures, particularly the y-axis in Figure 5A and the x-axis in Figure 6G. In the first case, it isn't clear what this "value" represents. In the second case, the x-label is very confusing. As I understand the figure description, in these plots you are always comparing the G0 arrested genes between control and treated cells. But the x-label says "G0 (0 D)", "Cycle (50 D)".

      We thank the reviewer for pointing out the issues with the axis labels. We have made the necessary revisions to eliminate any confusion. In Figure 5A, the label for the y-axis has been changed from "value" to "log2 (value)" for clarity. The "value" on the y-axis is the value of the statistical measure indicated at the top of each panel. In Figure 6G, the x-axis label "Cycle (50 D)" has been updated to "G0 (50 D)" to accurately reflect the comparison between the G0-arrested genes in control and treated cells. We have revised the text of Figure 5A and Figure 6G.

      Figure 6 uses a QS metric (quality score), but the definition of this metric is not provided. Including a brief explanation of its meaning would be helpful for clarity.  

      We thank the reviewer for the feedback. In this version, we have provided an explanation of the QS (Quality Score) metric in supplementary text 3 for better clarity. The QS is calculated based on the difference in z-scores derived from GSVA (gene set variation analysis) of gene sets upregulated and downregulated during the quiescent phase, and is defined as QS = z(up genes) − z(down genes), as described in the literature (Wiecek AJ et al., 2023). z(up genes) represents the standardized enrichment score of the gene set upregulated during quiescence in each sample. A higher value indicates that the quiescence-associated upregulated genes are actively expressed, suggesting that the sample is more likely to be in a quiescent (G0) state. z(down genes) corresponds to the standardized enrichment score of genes downregulated during quiescence. A lower value implies effective suppression of these genes, which is also consistent with quiescence. The difference score QS serves as an integrated indicator of the quiescent state: a higher value reflects simultaneous activation of quiescence-associated upregulated genes and repression of downregulated genes, indicating a gene expression profile that strongly aligns with the G0/quiescent state. A lower or negative value suggests a deviation from the quiescent signature, potentially reflecting a proliferative state or failure to enter quiescence.
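
      The arithmetic of the score can be sketched as follows. This is an illustration only: it assumes that enrichment scores for the two gene sets have already been obtained (the GSVA step is not reproduced), that standardization is performed across samples using the population standard deviation, and that the function names and input values are hypothetical.

```python
import statistics

def z_scores(values):
    # Standardize a list of per-sample enrichment scores across samples.
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def quiescence_scores(up_enrichment, down_enrichment):
    # QS per sample: z(up genes) minus z(down genes).
    z_up = z_scores(up_enrichment)
    z_down = z_scores(down_enrichment)
    return [u - d for u, d in zip(z_up, z_down)]

if __name__ == "__main__":
    # Three hypothetical samples with opposite trends in the two gene sets.
    print(quiescence_scores([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

      In this toy input the third sample has the highest enrichment of quiescence-upregulated genes and the lowest enrichment of quiescence-downregulated genes, so it receives the largest QS, while the first sample receives a negative score.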

      In Figure 6G, light grey lines are shown, but their significance is unclear. It would be useful to specify what these lines represent.

      We thank the reviewer for this observation. In Figure 6G, each point represents a single gene, and the light grey lines indicate the trend of changes in the corresponding bursting kinetics values, mean, and variance of each gene. We have added the explanation in the caption of Figure 6G.

      Additionally, the manuscript should include references to the specific pathways used in the GO analysis to provide more context for the reader.

      We thank the reviewer for the suggestion. We have included references to the specific pathways used in the GO analysis in the revised manuscript to provide additional context for the readers.

      In the discussion, sentences like "IdU drug treatment-induced BS enhancement delays the cell mitosis phase transition, impacting cell reprogramming and differentiation" are problematic as they imply causality, which I believe cannot be determined through the present analysis. The strength of the conclusions needs to be better argued (or toned down).

      We acknowledge that the original sentence lacked precision and may have conveyed a misleading implication of causality not fully supported by our current analysis. In the discussion section of revised manuscript, we have rephrased the statement to present a more nuanced interpretation: IdU drug treatment-induced BS enhancement of genes may be associated with a delayed transition in the cell mitosis phase, which could potentially influence cell reprogramming and differentiation.  

      Other (minor) comments:

      On pp. 10, "the BS down-regulates differential genes were mainly enriched..." appears to have a grammatical error/typo, "down-regulated"?

      We have made the correction, revising “down-regulates” to “down-regulated” for grammatical consistency.

      Equation 2 doesn't match Figure 1A.

      We have made the correction. The definition BF = 1/(τ<sub>on</sub> + τ<sub>off</sub>) in Equation 2 is correct. We have revised the definition of BF in Figure 1A to ensure consistency with Equation 2.

      References

      Zheng, G.X., Terry, J.M., Belgrader, P., Ryvkin, P., Bent, Z.W., Wilson, R., Ziraldo, S.B., Wheeler, T.D., McDermott, G.P., Zhu, J., Gregory, M.T., Shuga, J., Montesclaros, L., Underwood, J.G., Masquelier, D.A., Nishimura, S.Y., Schnall-Levin, M., Wyatt, P.W., Hindson, C.M., Bharadwaj, R., Wong, A., Ness, K.D., Beppu, L.W., Deeg, H.J., McFarland, C., Loeb, K.R., Valente, W.J., Ericson, N.G., Stevens, E.A., Radich, J.P., Mikkelsen, T.S., Hindson, B.J., Bielas, J.H. 2017. Massively parallel digital transcriptional profiling of single cells. Nature Communications 8: 14049. DOI: https://dx.doi.org/10.1038/ncomms14049, PMID: 28091601

      Hagemann-Jensen, M., Ziegenhain, C., Chen, P., Ramsköld, D., Hendriks, G.J., Larsson, A.J.M., Faridani, O.R., Sandberg, R. 2020. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nature Biotechnology 38: 708-714. DOI: https://dx.doi.org/10.1038/s41587-020-0497-0, PMID: 32518404

      Larsson, A.J.M., Johnsson, P., Hagemann-Jensen, M., Hartmanis, L., Faridani, O.R., Reinius, B., Segerstolpe, A., Rivera, C.M., Ren, B., Sandberg, R. 2019. Genomic encoding of transcriptional burst kinetics. Nature 565: 251-254. DOI: https://dx.doi.org/10.1038/s41586-018-0836-1, PMID: 30602787

      Ochiai, H., Hayashi, T., Umeda, M., Yoshimura, M., Harada, A., Shimizu, Y., Nakano, K., Saitoh, N., Liu, Z., Yamamoto, T., Okamura, T., Ohkawa, Y., Kimura, H., Nikaido, I. 2020. Genome-wide kinetic properties of transcriptional bursting in mouse embryonic stem cells. Science Advances 6: eaaz6699. DOI: https://dx.doi.org/10.1126/sciadv.aaz6699, PMID: 32596448

      Luo, S., Wang, Z., Zhang, Z., Zhou, T., Zhang, J. 2023. Genome-wide inference reveals that feedback regulations constrain promoter-dependent transcriptional burst kinetics. Nucleic Acids Research 51: 68-83. DOI: https://dx.doi.org/10.1093/nar/gkac1204, PMID: 36583343

      Rodriguez, J., Ren, G., Day, C.R., Zhao, K., Chow, C.C., Larson, D.R. 2019. Intrinsic dynamics of a human gene reveal the basis of expression heterogeneity. Cell 176: 213-226.e218. DOI: https://dx.doi.org/10.1016/j.cell.2018.11.026, PMID: 30554876

      Luo, S., Zhang, Z., Wang, Z., Yang, X., Chen, X., Zhou, T., Zhang, J. 2023. Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model. Royal Society Open Science 10: 221057. DOI: https://dx.doi.org/10.1098/rsos.221057, PMID: 37035293

      Eling, N., Morgan, M.D., Marioni, J.C. 2019. Challenges in measuring and understanding biological noise. Nature Reviews Genetics 20: 536-548. DOI: https://dx.doi.org/10.1038/s41576-019-0130-6, PMID: 31114032

      Tunnacliffe, E., Chubb, J.R. 2020. What is a transcriptional burst? Trends in Genetics 36: 288-297. DOI: https://dx.doi.org/10.1016/j.tig.2020.01.003, PMID: 32035656

      Rodriguez, J., Larson, D.R. 2020. Transcription in living Cells: molecular mechanisms of bursting. Annual Review of Biochemistry 89: 189-212. DOI: https://dx.doi.org/10.1146/annurev-biochem-011520-105250, PMID: 32208766

      Morgan, M.D., Marioni, J.C. 2018. CpG island composition differences are a source of gene expression noise indicative of promoter responsiveness. Genome Biology 19: 81. DOI: https://dx.doi.org/10.1186/s13059-018-1461-x, PMID: 29945659

      Raj, A., van Oudenaarden, A. 2008. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135: 216-226. DOI: https://dx.doi.org/10.1016/j.cell.2008.09.050, PMID: 18957198

      Trzaskoma, P., Jung, S., Pękowska, A., Bohrer, C.H., Wang, X., Naz, F., Dell’Orso, S., Dubois, W.D., Olivera, A., Vartak, S.V. 2024. 3D chromatin architecture, BRD4, and Mediator have distinct roles in regulating genome-wide transcriptional bursting and gene network. Science Advances 10: eadl4893. DOI: https://dx.doi.org/https://www.science.org/doi/10.1126/sciadv.adl4893, PMID: 

      Browning, A.P., Warne, D.J., Burrage, K., Baker, R.E., Simpson, M.J. 2020. Identifiability analysis for stochastic differential equation models in systems biology. Journal of the Royal Society Interface 17: 20200652. DOI: https://dx.doi.org/10.1098/rsif.2020.0652, PMID: 33323054

      Zoller, B., Little, S.C., Gregor, T. 2018. Diverse spatial expression patterns emerge from unified kinetics of transcriptional bursting. Cell 175: 835-847.e825. DOI: https://dx.doi.org/10.1016/j.cell.2018.09.056, PMID: 30340044

      Hoppe, C., Bowles, J.R., Minchington, T.G., Sutcliffe, C., Upadhyai, P., Rattray, M., Ashe, H.L. 2020. Modulation of the promoter activation rate dictates the transcriptional response to graded BMP signaling levels in the drosophila embryo. Dev Cell 54: 727-741.e727. DOI: https://dx.doi.org/10.1016/j.devcel.2020.07.007, PMID: 32758422

      Ramsköld, D., Hendriks, G.J., Larsson, A.J.M., Mayr, J.V., Ziegenhain, C., Hagemann-Jensen, M., Hartmanis, L., Sandberg, R. 2024. Single-cell new RNA sequencing reveals principles of transcription at the resolution of individual bursts. Nature Cell Biology 26: 1725-1733. DOI: https://dx.doi.org/10.1038/s41556-024-01486-9, PMID: 39198695 Van Kampen, N.G. 1992. Stochastic Processes in Physics and Chemistry. Elsevier.

      Gupta, A., Martin-Rufino, J.D., Jones, T.R., Subramanian, V., Qiu, X., Grody, E.I., Bloemendal, A., Weng, C., Niu, S.Y., Min, K.H., Mehta, A., Zhang, K., Siraj, L., Al' Khafaji, A., Sankaran, V.G., Raychaudhuri, S., Cleary, B., Grossman, S., Lander, E.S. 2022. Inferring gene regulation from stochastic transcriptional variation across single cells at steady state. Proceedings of the National Academy of Sciences 119: e2207392119. DOI: https://dx.doi.org/10.1073/pnas.2207392119, PMID: 35969771

      Gardiner, C.W., Chaturvedi, S. 1977. The Poisson representation. I. A new technique for chemical master equations. Journal of Statistical Physics 17: 429-468. DOI: https://dx.doi.org/https://doi.org/10.1007/BF01014349, PMID: 

      Gorin, G., Carilli, M., Chari, T., Pachter, L. 2024. Spectral neural approximations for models of transcriptional dynamics. Biophysical Journal 123: 2892-2901. DOI: https://dx.doi.org/10.1016/j.bpj.2024.04.034, PMID: 38715358

      Kuntz, J., Thomas, P., Stan, G.-B., Barahona, M. 2021. Stationary distributions of continuous-time Markov chains: a review of theory and truncation-based approximations. SIAM Review 63: 3-64. DOI: 

      Zhang, J., Zhou, T. 2019. Computation of stationary distributions in stochastic models of cellular processes with molecular memory. bioRxiv: 521575. DOI: https://dx.doi.org/https://doi.org/10.1101/521575, PMID: 

      Zhang, J., Nie, Q., Zhou, T. 2016. A moment-convergence method for stochastic analysis of biochemical reaction networks. The Journal of chemical physics 144. DOI: 

      Sobol, I.M. 1967. On the distribution of points in a cube and the approximate evaluation of integrals. USSR Comput. Math. Math. Phys. 7: 784-802. DOI: https://dx.doi.org/10.1016/0041-5553(67)90144-9, PMID: 

      Macosko, E.Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., Tirosh, I., Bialas, A.R., Kamitaki, N., Martersteck, E.M., Trombetta, J.J., Weitz, D.A., Sanes, J.R., Shalek, A.K., Regev, A., McCarroll, S.A. 2015. Highly parallel genome-wide expression profiling of individual cells using nanoliter dsroplets. Cell 161: 1202-1214. DOI: https://dx.doi.org/10.1016/j.cell.2015.05.002, PMID: 26000488

      Das, S., Suganthan, P.N. 2010. Differential evolution: A survey of the state-of-the-art. IEEE transactions on evolutionary computation 15: 4-31. DOI: https://dx.doi.org/10.1109/TEVC.2010.2059031, PMID: 

      Ahandani, M.A., Vakil-Baghmisheh, M.-T., Talebi, M. 2014. Hybridizing local search algorithms for global optimization. Computational Optimization and Applications 59: 725-748. DOI: https://dx.doi.org/https://doi.org/10.1007/s10589014-9652-1, PMID: 

      Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M. 2018. Automatic differentiation in machine learning: a survey. Journal of machine learning research 18: 1-43. DOI: https://dx.doi.org/https://dl.acm.org/doi/abs/10.5555/3122009.3242010, PMID: 

      Wiecek, A.J., Cutty, S.J., Kornai, D., Parreno-Centeno, M., Gourmet, L.E., Tagliazucchi, G.M., Jacobson, D.H., Zhang, P., Xiong, L., Bond, G.L., Barr, A.R., Secrier, M. 2023. Genomic hallmarks and therapeutic implications of G0 cell cycle arrest in cancer. Genome Biology 24: 128. DOI: https://dx.doi.org/10.1186/s13059-023-02963-4, PMID: 37221612

    1. eLife Assessment

      This study reveals that female moths use ultrasonic sounds emitted by dehydrated plants to guide their oviposition decisions. It highlights sound as an additional sensory modality in host searching, adding an important piece to the puzzle of how insects and plants interact. Through convincing experimental approaches, the authors provide insights that advance our understanding of plant-insect interactions.

    2. Reviewer #2 (Public review):

      This paper presents an interesting and fresh approach, as it investigates whether female moths utilize plant-emitted ultrasounds, particularly those associated with dehydration stress, in their egg-laying decision-making process. It provides the first empirical evidence suggesting that acoustic information may contribute to insect-plant interactions.

      The revised version is significantly strengthened by the addition of supplementary data and improved explanations. The authors present robust results across multiple experiments, enhancing the credibility of their conclusions.

      Female moths showed a preference for moist, fresh plants over dehydrated ones in experiments using actual plants. Additionally, when both plants were fresh but ultrasonic sounds specific to dehydrated plants were presented from one side, the moths chose the silent plant. However, in experiments without plants, contrary to the hypothesis derived from the above results, the moths preferred to oviposit near ultrasonic playback mimicking the sounds of dehydrated plants. 

      These results clearly indicate that moths can perceive plant presence through sound. The findings also highlight the need for future investigation into the multi-modal nature of moth decision-making, as acoustic cues alone may not fully explain the behavioral choices observed across different contexts.

      Overall, the results are intriguing, and I think the experiments are very well designed. The authors successfully demonstrate that plant-derived acoustic signals influence oviposition behavior in female moths, thereby achieving the study's objectives. The experimental design and analysis protocols are reproducible and well suited for adaptation to other species.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors demonstrate that female Spodoptera littoralis moths prefer to oviposit on well-watered tomato plants and avoid drought-stressed plants. The study then recorded the sounds produced by drought-stressed plants and found that they produce 30 ultrasonic clicks per minute. Thereafter, the authors tested the response of female S. littoralis moths to clicks with a frequency of 60 clicks per minute in an arena with and without plants and in an arena setting with two healthy plants of which one was associated with 60 clicks per minute. These experiments revealed that in the absence of a plant, the moths preferred to lay eggs on the side of the arena in which the clicks could be heard, while in the presence of a plant the S. littoralis females preferred to oviposit on the plant where the clicks were not audible. In addition, the authors also tested the response of S. littoralis females in which the tympanic membrane had been pierced, making the moths unable to detect the click sounds. As hypothesised, these females placed their eggs equally on both sides of the arena.

      Finally, the authors explored whether the female oviposition choice might be influenced by the courtship calls of S. littoralis males which emit clicks in a range similar to a drought-stressed tomato plant. However, no effect was found of the clicks from ten males on the oviposition behaviour of the female moths, indicating that the females can distinguish between the two types of clicks. Besides these different experiments, the authors also investigated the distribution of egg clusters within a longer arena without a plant, but with a sugar-water feeder. Here it was found that the egg clusters were mostly aggregated around the feeder and the speaker producing 60 clicks per minute. Lastly, video tracking was used to observe the behaviour in the arena without a plant, which demonstrated that the moths gradually spent more time at the arena side with the click sounds.

      We thank the reviewers for their helpful comments. We agree with the summary, but would like to note that in the control experiment (Figure 2) we used a click rate of 30 clicks per minute—a design choice driven by the editor’s feedback. We have clarified this and, to further probe the system’s dynamics, added a second experiment employing the same click rate (30 clicks per minute) with a dehydrated plant (see details below). In both experiments, females again showed a clear tendency to oviposit nearer the speaker; these findings are described in the updated manuscript.

      (2) The study addresses a very interesting question by asking whether female moths incorporate plant acoustic signals into their oviposition choice. Unfortunately, I find it very difficult to judge how big the influence of the sound on the female choice really is, as the manuscript does not provide any graphs showing the real numbers of eggs laid on the different plants, but instead only provides graphs with the Bayesian model fittings for each of the experiments. In addition, the numbers given in the text seem to be relatively similar with large variations, e.g. Figure 1B3: 1.8 ± 1.6 vs. 1.1 ± 1.0. Furthermore, the authors do not provide access to any of the raw data or scripts of this study, which also makes it difficult to assess the potential impact of this study. Hence, I would very much like to encourage the authors to provide figures showing the measured values as boxplots including the individual data points, especially in Figure 1, and to provide access to all the raw data underlying the figures.

      We acknowledge that there are researchers who favor Bayesian graphical representation versus raw data visualization. Therefore, we have added chartplots of the raw data from Figure 1 in the supplementary section. We are aware of the duplication in presentation and apologize for this redundancy.  

      Regarding the variance and means we obtained in our experiment, we have analyzed all raw data using the statistical model presented, and if statistical significance was found despite a particular mean difference or variance, this is meaningful from a biological perspective. One can certainly discuss whether this difference has biological importance, but it should be remembered that in this experimental system, we are trying to isolate the acoustic signal from a complex system that includes multiple signals. Therefore, at no point have we suggested that this is a standalone factor, but rather proposed it as an informative and significant component.

      In addition to the experiments described above, we conducted an experiment in which we counted both eggs and clusters. The results indicate that cluster counts are a reliable proxy for reproductive investment at a given location. In this experiment, we present cluster numbers alongside egg counts (Figure 2).

      Furthermore, we apologize for the technical error that prevented our uploaded data files from reaching the reviewers. We have also uploaded updated data and code.

      (3) Regarding the analysis of the results, I am also not entirely convinced that each night can be taken as an independent egg-laying event, as the amount of eggs and the place where the eggs are laid by a female moth surely depends on the previous oviposition events. While I must admit that I am not a statistician, I would suggest, from a biological point of view, that each group of moths should be treated as a replicate and not each night. I would therefore also suggest to rather analyse the sum of eggs laid over the different consecutive nights than taking the eggs laid in each night as an independent data point.

      We thank the reviewer for this question. This is a valid point that we will address in three aspects:

      First, regarding our statistical approach, we used a model that takes into account the sequence of nights and examines whether there is an effect of the order of nights, i.e., we used GLMMs, with the night nested within the repetition. This is equivalent to treating the data as a repeated measure and is, to the best of our knowledge, the common way to treat such data.

      Second, following the reviewer's comment, we also reran the statistics of the third experiment (i.e., “sound gradient experiments”, Figure 2 and Supplementary figure 4) when only taking the first night when the female/s laid eggs to avoid the concern of dependency. This analysis revealed the same result – i.e., a significant preference for the sound stimulus. We have now updated our methods and results section to clarify this point.  

      Third, an important detail that may not have been clearly specified in the methods: at the end of each night, we cleaned the arena of counted egg clusters using a cloth with ethanol, so that on the subsequent night we would not expect there to be evidence of previous oviposition, although this would not exclude some sort of physiological or cognitive memory. We have now updated our methods section to clarify this important procedural point.

      (4) Furthermore, it did not become entirely clear to me why a click frequency of 60 clicks per minute was used for most experiments, while the plants only produce clicks at a range of 30 clicks per minute. Independent of the ecological relevance of these sound signals, it would be nice if the authors could provide a reason for using this frequency range. Besides this, I was also wondering about the argument that groups of plants might still produce clicks in the range of 60 clicks per minute and that the authors' tests might therefore still be reasonable. I would agree with this, but only in the case that a group of plants with these sounds would be tested. Offering the choice between two single plants while providing the sound from a group of plants is in my view not the most ecologically reasonable choice. It would be great if the authors could modify the argument in the discussion section accordingly and further explore the relevance of different frequencies and dB levels.

      This is an excellent point. We originally increased the click rate to generate a strong signal. However, it was important for us to verify that there was ecological relevance in the stimulus we implemented in the system. For this purpose, we recorded a group of dehydrated plants at a distance of ~20cm and we measured a click rate of 20 clicks per minute (i.e., 0.33 Hz) (see Methods section). Therefore, as mentioned at the beginning of this letter, in the additional experiment described in Figure 2, we reduced the click frequency to 30 clicks per minute, and at this lower rate, the effect was maintained. Increasing plant density would probably lead to a higher rate of 30 clicks per minute.

      (5) Finally, I was wondering how transferable the findings are towards insects and Lepidopterans in general. Not all insects possess a tympanic organ and might therefore not be able to detect the plant clicks that were recorded. Moreover, I would imagine that generalist herbivorous like Spodoptera might be more inclined to use these clicks than specialists, which very much rely on certain chemical cues to find their host plants. It would be great if the authors would point more to the fact that your study only investigated a single moth species and that the results might therefore only hold true for S. littoralis and closely related species, but not necessary for other moth species such as Sphingidae or even butterflies.

      Good point. Our research uses a specific model system of one moth species and one plant species in a particular plant-insect interaction where females select host plants for their offspring. As with any model-based research that attempts to draw broader conclusions, we've taken care to distinguish between our direct findings and potential wider implications. We believe our system may represent mechanisms relevant to a wider group of herbivorous insects with hearing capabilities, particularly considering that several moth families and other insect orders can detect ultrasound. However, additional research examining more moth and plant species is necessary to determine how broadly applicable these findings are. We have made these clarifications in the text.

      Reviewer #2 (Public review):

      (6) The results are intriguing, and I think the experiments are very well designed. However, if female moths use the sounds emitted by dehydrated plants as cues to decide where to oviposit, the hypothesis would predict that they would avoid such sounds. The discussion mentions the possibility of a multi-modal moth decision-making process to explain these contradictory results, and I also believe this is a strong possibility. However, since this remains speculative, careful consideration is needed regarding how to interpret the findings based solely on the direct results presented in the results section.  

      Thank you for this insightful observation. We agree that the apparent attraction of females to dehydrated-plant sounds contradicts our initial prediction. Having observed this pattern consistently across multiple setups, we have now added a targeted choice experiment to the revised manuscript: here female moths were offered a choice between dehydrated plants broadcasting their natural ultrasonic emissions and a control. These results, detailed in the Discussion and presented in full in the Supplementary Materials (Supplementary Figure 4), show that when only a dehydrated plant is available, moths would prefer it for oviposition, supporting our hypothesis that in the absence of a real plant, the plant’s sounds might represent a plant.

      (7) Additionally, the final results describing differences in olfactory responses to drying and hydrated plants are included, but the corresponding figures are placed in the supplementary materials. Given this, I would suggest reconsidering how to best present the hypotheses and clarify the overarching message of the results. This might involve reordering the results or re-evaluating which data should appear in the main text versus the supplementary materials.

      Thank you for this suggestion. We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues. We agree that a detailed investigation of multimodal interactions deserves a separate study, which we plan to pursue in future work. 

      (8) There were also areas where more detailed explanations of the experimental methods would be beneficial.

      Thank you for highlighting this point. We have expanded and clarified the Methods section to provide comprehensive detail on our experimental procedures.

      Reviewer #1 (Recommendations for the authors):

      (9) Line 1: Please include the name of the species you tested also in the title as your results might not hold true for all moth species.

      We do not fully agree with this comment. Please see comment 5.

      (10) Line 19-20: Please rephrase the sentence so that it becomes clear that the "dehydration stress" refers to the plant and not to the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (11) Line 31: Male moths might provide many different signals to the females, maybe better "male sound signals" or similar.

      Thank you for the suggestion; we have clarified the text accordingly.

      (12) Line 52-53: Maybe mention here that not all moth species have evolved these abilities.

      Thank you for the suggestion; we have clarified the text accordingly.

      (13) Line 77: add a space after 38.

      Thank you for the suggestion; we have clarified the text accordingly.

      (14) Line 88: Maybe change "secondary predators" to "natural enemies".

      Thank you for the suggestion; we have clarified the text accordingly.

      (15) Line 134: Why is "notably" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (16) Line 140-144: If you did perform the experiment also with the more ecological relevant playback rate, why not present these findings as your main results and use the data with the higher playback frequency as additional support?

      Thank you for this suggestion. We agree that the ecologically relevant playback data are important, as described in detail at the beginning of this letter and also in comment 4. However, to preserve a clear and cohesive narrative, we have maintained the original ordering of this section. The various experiments conducted in Figure 1 differ in several components from Figure 2 and from the work that examined sounds in plant groups in the appendices. Therefore, we find it more appropriate to use them as supporting evidence for the main findings rather than creating a comparison between different experimental systems. For this reason, we chose to keep them as a separate description. The ecological playback findings (Lines 140–144) remain fully described in the Results and serve to reinforce the main observations without interrupting the manuscript's flow.

      (17) Line 146: Please explain already here how you deafened the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (18) Line 181: should it be "male moths' " ?

      Thank you for the suggestion; we have clarified the text accordingly.

      (19) Line 215: Why is "without a plant" in italics? I would suggest using normal spelling/formatting rules here.

      Thank you for the suggestion; we have clarified the text accordingly.

      (20) Line 234: I do not understand why this type of statistic was used to analyse the electroantennogram (EAG) results. Would a rather simple Student's t-test or a Wilcoxon rank-sum test not have been sufficient? I would also like to caution you not to overinterpret the data derived from the EAG: as you combined the entire headspace into one mixture, it is no longer possible to derive information on the different volatiles in the blends. The differences you observe might therefore mostly be due to the amount of emitted volatiles.

      We have reorganized the manuscript and removed the olfactory response data from the current version to maintain a focused narrative on acoustic cues (See comment 7). 

      (21) Line 268: It might be nice to add an additional reference here referring to the multimodal oviposition behaviour of the moths.

      Thank you for the suggestion; we have clarified the text accordingly.

      (22) Line 284: If possible, please add another reference here referring to the different cues used by moths during oviposition.

      Thank you for the suggestion; we have clarified the text accordingly.

      (23) Line 336: What do you mean by "closed together"?

      Thank you for the suggestion; we have clarified the text accordingly.

      (24) Line 434-436: Please see my overall comments. I do not think that you can call it ecologically relevant if the signal emitted by multiple plants is played in the context of just a single plant.

      Please see comments 1 and 4.

      (25) Line 496: Please change "stats" to statistics.

      Thank you for the suggestion; we have clarified the text accordingly.

      (26) Line 522-524: I am not sure whether simply listing their names does give full credit to the work these people did for your study. Maybe also explain how they contributed to your work.

      Thank you for the suggestion; we have clarified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      (27) L54 20-60kHz --> 20Hz-60kHz or 20kHz - 60kHz?

      OK. We have replaced it.

      (28) L124 Are the results for the condition where nothing was placed and the condition where a decoy silent resistor was placed combined in the analysis? If so, were there no significant differences between the two conditions? Comparing these with a condition presenting band-limited noise in the same frequency range as the drought-stressed sounds might also have been an effective approach to further isolate the specific role of the ultrasonic emissions.

      We used both conditions due to technical constraints and pooled them together for analysis; statistical tests confirmed no significant differences between them. This clarification has now been added to the Methods section, including the results of the statistical test.

      (29) L125 (Fig. 1A), see Exp. 1 in the Methods). -> (Fig.1B. See Exp.1 in the Methods).

      Thank you for the suggestion; we have clarified the text accordingly.

      (30) L132 "The opposite choice to what was seen in the initial experiment (Fig.1B)"

      Thank you for the suggestion; we have clarified the text accordingly.

      (31) L137-143 If you are writing about results, why not describe them with figures and statistics? The current description reads like a discussion.

      These findings were not among our primary research questions; however, we believe that including them in the Results section underscores the experimental differences. In our opinion, introducing an additional figure or expanding the statistical analysis at this point would disrupt the narrative flow and risk confusing the reader.

      (32) L141 "This is higher than the rate reported for a single young plant" Are you referring to the tomato plants used in the experiments? It might be helpful to include in the main text the natural click rate emitted by tomato plants, as this information is currently only mentioned in the Methods section.

      See comment 4.  

      (33) L191 Is the main point here to convey that the plant playback effect remained significant even when the sound presentation frequency was reduced to 30 clicks per minute? The inclusion of the feeder element, however, seems to complicate the message. To simplify the results, moving the content from lines 185-202 to the supplementary materials might be a better approach. Additionally, what is the rationale for placing the sugar solution in the arena? Is it to maintain the moths' vitality during the experiment? Clarifying this in the methods section would help provide context for this experimental detail.

      In this series of experiments, we manipulated four variables—single moths, ultrasonic click rate, arena configuration (from a two-choice design to an elongated enclosure), and the response metric (total egg counts rather than cluster counts)—to evaluate moth oviposition under more ecologically realistic conditions. We demonstrate the system’s robustness and validity in a more realistic setting (by tracking individual moths, counting single eggs, etc.).  

      As noted in the text, feeders were included to preserve the moths’ natural behavior and vitality. We have further clarified this in the revised manuscript.

      (34) L215 Is the click presentation frequency 30 or 60 per minute? Since Figure 3 illustrates examples of moth movement from the experiment described in Figure 1, it might be more effective to present Figure 3 when discussing the results of Figure 1 or to include it in the supplementary materials for better clarity and organization.

      See comments 1 and 4. As mentioned in the above 

      (35) L291 Please provide a detailed explanation of the experiments and measurements for the results shown in Figure S3 (and Figure S2). If the multi-modal hypothesis discussed in the study is a key focus, it might be better to include these results in the main results section rather than in the supplementary materials.

      Thank you for this suggestion. Figure S2 was removed; see comments above. We have now added the context to Figure S3.

      (36) L303 It might be helpful to include information about the relationship between the moth species used in this study and tomato plants somewhere in the text. This would provide an important context for understanding the ecological relevance of the experiments.

      Thank you for the suggestion; we have clarified the text accordingly.

      (37) Table 1 The significant figures in the numbers presented in the tables should be consistent.

      Thank you for the suggestion; we have clarified the text accordingly.

      (38) L341 The text mentions that experiments were conducted in a greenhouse, but does this mean the arena was placed inside the greenhouse? Also, the term "arena" is used - does this refer to a sealed rectangular case or something similar? For the sound presentation experiments, it seems that the arena cage was placed inside a soundproof room. If the arena is indeed a case-like structure, were there any specific measures taken to prevent sound scattering within the case, such as the choice of materials or structural modifications?

      Here, “arena” refers to the plastic boxes used throughout this study. In this particular experiment, we presented plants alone—reflecting ongoing debate in the literature—and used these trials as a baseline for our subsequent sound-presentation experiments, during which we measured sound intensity as described in the Methods section. All sound-playback experiments were conducted in sound-proof rooms, and acoustic levels were measured beforehand—sound on the control side fell below our system’s detection threshold. 

      (39) L373 "resister similar to the speaker" Could you explain it in more detail? I think this would depend on the type of speaker used, particularly whether it includes magnets. From an experimental perspective, presenting different sounds such as white noise from the speaker might have been a better control. Was there a specific reason for not doing so? Additionally, the study does not clearly demonstrate whether the electric and magnetic field environments on both sides of the arena were appropriately controlled. Without this information, it is difficult to evaluate whether using a resistor as a substitute was adequate.

      Thank you for this comment. We have now addressed this point in the Discussion. We acknowledge that we did not account for the magnetic field, which might have differed between the speaker and the resistor. We agree that using an alternative control, such as white noise, could have been informative, and we now mention this as a limitation in the revised Methods.

      (40) L435 60Hz? The representation of frequencies in the text is inconsistent, with some values expressed in Hz and others as "clicks per second." It would be better to standardize these units for clarity, such as using Hz throughout the manuscript.

      We agree that this is confusing. We reviewed the text and made sure that "clicks per second" refers to the number of clicks produced, whereas Hz is used only in the context of sound frequencies.

      (41) L484 "we quantified how many times each individual crossed the center of the arena" Is this data being used in the results?

      Yes. This is mentioned in the text just before Figure 3 (L220).

    1. eLife Assessment

      IL-10 balances protective and deleterious immune functions in mice and humans, but whether IL-10 also controls avian intestinal homeostasis remains unclear. By generating genetic knockouts, Meunier et al. established that a complete lack of IL-10 strengthened immunity against enteric bacteria in chickens, while also aggravating infection-inflicted inflammatory tissue damage and dysbiosis upon parasite infection. Unlike in mouse models, however, IL-10 deficiency did not provoke spontaneous colitis in chickens. The findings presented are valuable, and the strength of evidence is convincing. The observation may have implications for the livestock industry, and additional studies involving genetic knockouts may further unravel conserved and distinct avian IL-10 controls.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Meunier et al. investigated the functional role of IL-10 in avian mucosal immunity. While the anti-inflammatory role of IL-10 is well established in mammals, and several confirmatory knockout models are available in mice, IL-10's role in avian mucosal immunity is so far correlative. In this study, the authors generated two different models of IL-10 ablation in chickens: a whole-body knockout model and an enhancer KO model leading to reduced IL10 expression. The authors first performed in vitro LPS stimulation-based experiments, and then used two different in vivo infection models employing C. jejuni and E. tenella, to demonstrate that complete ablation of IL10 leads to enhanced inflammation-related pathology and gene expression, and enhanced pathogen clearance. At a steady-state level, however, IL-10 ablation did not lead to spontaneous colitis.

      Strengths:

      Overall, the study is well executed and establishes an anti-inflammatory role of IL-10 in birds. While the results are expected and not surprising, this appears to be the first report to conclusively demonstrate IL-10's anti-inflammatory role upon its genetic ablation in an avian model. Given the applicability of this information in combating pathogen infection in livestock species in sustainable industries like poultry, the study is worth publishing.

      Weaknesses:

      The study is primarily a confirmation of the already established anti-inflammatory role of IL-10.

      Comments on revised version:

      The authors have incorporated most of the points raised, and provided a reasonable argument for not considering DSS mediated colitis as an additional model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to investigate the functional role of IL10 in mucosal immunity in chickens. CRISPR technology was employed to generate IL10 knockout chickens targeting both the exon and putative enhancer regions. IL10 expression was either abolished (exon knockout) or reduced (enhancer knockout). IL-10 plays an important role in the composition of the caecal microbiome. Across challenges with various enteric pathogens, deficient IL10 expression was associated with enhanced pathogen clearance, but with more severe lesion scores and body weight loss.

      Strengths:

      IL10 expression was both abolished and reduced using knockouts studied in vitro and in vivo, birds were challenged in vivo with a broad range of enteric pathogens, and various parameters were examined to evaluate the functional role of IL10 in mucosal immunity.

      Weaknesses:

      Overexpression of IL10 either in vitro or in vivo may further support the findings from this study.

      Comments on revised version:

      The authors' response and justifications are appropriate.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, Meunier et al. investigated the functional role of IL-10 in avian mucosal immunity. While the anti-inflammatory role of IL-10 is well established in mammals, and several confirmatory knockout models are available in mice, IL-10's role in avian mucosal immunity is so far correlative. In this study, the authors generated two different models of IL-10 ablation in chickens: a whole-body knockout model and an enhancer KO model leading to reduced IL10 expression. The authors first performed in vitro LPS stimulation-based experiments, and then used two different in vivo infection models employing C. jejuni and E. tenella, to demonstrate that complete ablation of IL10 leads to enhanced inflammation-related pathology and gene expression, and enhanced pathogen clearance. At a steady-state level, however, IL-10 ablation did not lead to spontaneous colitis.

      Strengths: 

      Overall, the study is well executed and establishes an anti-inflammatory role of IL-10 in birds. While the results are expected and not surprising, this appears to be the first report to conclusively demonstrate IL-10's anti-inflammatory role upon its genetic ablation in the avian model. Provided this information is applicable in combating pathogen infection in livestock species in sustainable industries like poultry, the study will be of interest to the field. 

      Weaknesses: 

      The study is primarily a confirmation of the already established anti-inflammatory role of IL-10. 

      We do not agree that this work is primarily confirmatory. The anti-inflammatory role of IL10 was indeed known previously from studies in mammals. The much more general insight from the current study is our demonstration of the intrinsic trade-off between inflammation and tolerance in the response to both the microbiome (which was significantly altered in the IL10 knockout birds) and mucosal pathogens. The study of Eimeria challenge in particular highlights the fact that it may be better for the host to tolerate a potential pathogen than to take on the cost of elimination.

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to investigate the functional role of IL10 in mucosal immunity in chickens. CRISPR technology was employed to generate IL10 knockout chickens targeting both the exon and putative enhancer regions. IL10 expression was either abolished (exon knockout) or reduced (enhancer knockout). IL-10 plays an important role in the composition of the caecal microbiome. Across challenges with various enteric pathogens, deficient IL10 expression was associated with enhanced pathogen clearance, but with more severe lesion scores and body weight loss.

      Strengths: 

      IL10 expression was both abolished and reduced using knockouts studied in vitro and in vivo, birds were challenged in vivo with a broad range of enteric pathogens, and various parameters were examined to evaluate the functional role of IL10 in mucosal immunity.

      Weaknesses: 

      Overexpression of IL-10 either in vitro or in vivo may further support the findings from this study. 

      An overexpression experiment, regardless of outcome, would not necessarily support or invalidate the findings of the current study. It would address the question of whether the absolute concentration of IL10 produced alters the outcome of an infection.

      Reviewer #1 (Recommendations for the authors): 

      The following are the recommendations that, in my opinion, will be helpful to enhance the quality of the study. 

      Major point: 

      At steady state, the authors did not observe any sign of spontaneous colitis. Since IL-10 KO in mice leads to an enhanced pathological score upon DSS-mediated induction of colitis, and several colitis models are well established in birds, it would be worthwhile to test the consequence of experimentally inducing colitis in this context.

      One of the novel features of this study is the observation that the microbiome is modified in the IL10KO HOM chicks, which may serve to mitigate potential spontaneous pathology; we now mention this in the discussion. We agree that it could be worthwhile in the future to look at additional challenge models. However, we would argue that the Eimeria challenge is an adequate experimentally induced model of colitis to demonstrate the increased inflammation that occurs in an IL10-deficient bird. This is further supported by evidence of enhanced inflammatory responses in the caeca of IL10KO HOM birds challenged with Campylobacter or Salmonella relative to WT controls. See the revised manuscript (pages 12-13).

      Minor points: 

      (1) In Figure 2B, the authors should confirm whether the ROS-AV163 groups also have LPS treatment. 

      The legend for Figure 2B already states that neutralizing anti-IL10 antibody was added to LPS-stimulated BMDMs: “Nitric oxide production was assessed by measuring nitrite levels using Griess assay for LPS-stimulated BMDMs […] in the absence or presence of neutralizing anti-IL10 antibody ROS-AV163”. However, for added clarity we have now modified the x-axis label for Figure 2B (“+ROS-AV163” replaced by “+LPS +anti-IL10”) and we have also made minor changes to the figure legend. See in the revised manuscript (page 33).

      (2) In Figure 3F, the authors should discuss why the duodenum of KO birds has enhanced infiltration compared to WT? 

      We are not sure what the reviewer is referring to here. Although not specifically mentioned in Figure 3F, there is no statistically significant difference in cellular infiltration in the duodenum of IL10KO WT and HOM birds raised in our specified pathogen-free (SPF) facility, nor in the duodenum of IL10KO WT and HOM birds raised in our conventional facility (Mann-Whitney U tests, p>0.1 in both cases); this can be seen in the sums of histopathological scores shown in Figures 3C (SPF facility) and 3E (conventional facility). Figure 3F shows that there is a statistically significant difference in cellular infiltration scores in the duodenum and proximal colon of both IL10KO WT and HOM birds based on the environment they are raised in (SPF vs conventional). We have made minor changes to the text to clarify this. See in the revised manuscript (page 7).

      (3) The authors should discuss the observed differences in the C. jejuni colonization results among the two cohorts at week 1 and week 2 post-infection. 

      Numbers of C. jejuni in the caeca of IL10KO HOM birds were markedly lower than for WT controls at 1-week post-infection in cohort 1, and at both time intervals post-infection in cohort 2 (Figure 4A). This reached statistical significance at 1-week post-infection in cohort 1 and at 2-weeks post-infection in cohort 2. It is evident from Figure 4A that considerable inter-animal variance existed in each group, and in the IL10KO HOM birds in particular. This is typical of C. jejuni colonisation in chickens, where bacterial population structures have been reported to be variable and unpredictable (Coward et al., Appl Environ Microbiol 2008, PMID: 18424530). Similar variation between time intervals, birds and repeated experiments has been reported when evaluating vaccines against C. jejuni colonisation (e.g. Buckley et al., Vaccine 2010, PMID: 19853682; Nothaft et al., Front Microbiol 2021, PMID: 34867850). We performed two independent studies for this reason. Taken together, we consider that our data provide convincing evidence of elevated pro-inflammatory responses upon C. jejuni infection in IL10KO HOM birds relative to WT controls that are associated with reduced bacterial burden. Our data are also consistent with a published observation that a commercial broiler line with low IL10 expression had correspondingly elevated expression of CXCLi-1, CXCLi-2 and IL-1b (Humphrey et al., mBio 2014, reference 33 in our original submission). We have added text to the discussion to capture the points above. See the revised manuscript (page 13).
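
      For readers unfamiliar with the Mann-Whitney U test cited in the response to minor point (2), the following is a minimal Python sketch of an exact, permutation-based version suitable for small groups. The function names and the example scores are hypothetical, and the response does not state which software or variant of the test was used, so this illustrates the technique rather than the authors' analysis.

```python
from itertools import combinations


def u_statistic(x, y):
    """U for sample x: count pairs with x > y, scoring ties as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_mann_whitney(x, y):
    """Return (U, two-sided exact p-value) by enumerating every regrouping.

    Only practical for small groups: the number of regroupings grows
    combinatorially with the group sizes.
    """
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    centre = n_x * n_y / 2  # expected U under the null hypothesis
    u_obs = u_statistic(x, y)
    observed_dev = abs(u_obs - centre)

    hits = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count regroupings at least as extreme as the observed one.
        if abs(u_statistic(group_x, group_y) - centre) >= observed_dev:
            hits += 1
    return u_obs, hits / total


# Hypothetical histopathology scores for two small groups of birds.
wt_scores = [1, 2, 3]
hom_scores = [4, 5, 6]
u, p = exact_mann_whitney(wt_scores, hom_scores)
print(f"U = {u}, exact two-sided p = {p}")
```

      With three birds per group the most extreme possible separation still gives p = 0.1, which shows why small cohorts with variable scores often cannot reach conventional significance thresholds.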

      Reviewer #2 (Recommendations for the authors): 

      For the animal challenging experiments, both IL10KO HOM and IL10EnKO HOM chickens were used for Eimeria challenge, but not for Salmonella and Campylobacter. Could the authors justify why? 

      The Eimeria challenge produced a much higher and more reproducible level of inflammation than either of the bacterial challenge models. Within the parasite challenge cohorts, IL10KO HET and IL10EnKO HOM birds were only marginally different from WT controls (e.g. parasite replication: Figures 5A and B; lesion scores: Figures 5E and F; body weight gain: Figures 5G and H). Given the more limited response and the inter-individual variation in the bacterial challenge models, we felt that analysis of a sufficiently large cohort of the IL10KO HOM was appropriate, while additional cohorts of IL10KO HET and IL10EnKO HOM birds large enough to detect statistically significant differences could not be justified.

      In the M&M, there was no mention of the number of birds generated for IL10EnKO HOM, HET, etc. 

      Full details of bird numbers can be found in SI Appendix Table S1 “Number of IL10KO and IL10EnKO WT, HET and HOM chicks hatched in the NARF SPF chicken facility in the first (G1) and second (G2) generations”. Table S1 is already referred to in the Results section “Generation of IL10-deficient chickens”; we have now also clearly referred to it in the “Animals” and “Generation of surrogate host chickens and establishment of the IL10KO and IL10EnKO lines under SPF conditions” sections of the Materials and Methods. In all three sections we have also added some text to clarify that the table details G1 and G2 bird numbers. See in the revised manuscript (pages 5, 15, 17).

      In the Campylobacter challenge, the results from cohort 1 and cohort 2 were not consistent at either 1 or 2 weeks post-infection. There is not much discussion of this inconsistency. What is the final conclusion: a significant difference at week 1, at week 2, at neither, or at both? What would happen if an additional cohort were conducted for Salmonella and Eimeria? 

      As noted in response to Reviewer 1 (minor point 3), we have now added text to the discussion on the partial inconsistency between independent C. jejuni challenge studies. We do not feel that additional experiments to address this comment are required. Highly significant increases in the infiltration of lymphoplasmacytic cells and heterophils were detected in IL10KO HOM chickens relative to WT controls in the caeca, a key site of Campylobacter colonisation. This was consistently observed in two independent cohorts at both 1- and 2-weeks post-infection (SI Appendix Figures S7 and S8) and was reflected in similar patterns of expression of pro-inflammatory genes at these intervals in both cohorts (Figure 4B). As our laboratory has observed substantially less variation between repeated Salmonella challenges, a single study was performed, but with adequate power to detect statistical differences.  The effects of E. tenella infection in IL10KO WT and HOM birds were replicated (compare Figure 4 with data from day 6 in Figure 5).

    1. eLife Assessment

      The authors present software (TEKRABber) to analyze how the expression of transposable elements (TEs) and of their silencing factors, KRAB zinc finger (KRAB-ZNF) genes, is correlated in experimentally validated datasets. TEKRABber is used to reconstruct regulatory networks of KRAB-ZNFs and TEs during human brain evolution and in Alzheimer's disease. The direction of the work is important, with potentially significant interest from others looking for a tool for correlative gene expression analysis across individual genomes and species. However, the reviews identified biases and shortcomings in the pipeline that could lead to an unacceptable number of false positive and negative signals and thus impact the conclusions, leaving the work in its current form incomplete.

    2. Reviewer #1 (Public review):

      The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no evidence that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals that would severely affect the conclusions one can draw from the analysis.

      My main concerns are provided below:

      One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: while specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE-derived transcripts (LINEs, SVAs), the majority of other, older TE classes do not behave this way and are either neutral to the genome or may have some enhancer activity (as mapped in the program they refer to, 'TEffectR'). A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: they often become part of transcripts due to alternative splicing. As such, the presence of Alu-derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has big implications for how reads from TEs, as used in this study, should be interpreted: the TE expression used to correlate with KRAB ZNF expression simply does not represent the species-specific influences of TEs that the authors are after.

      With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (which often happens for Alus) or because the TE is present in an inefficiently spliced-out intron (which happens a lot), leading to TE-derived reads that reflect the TE being part of that intron rather than being actively expressed. As a result, the data as analysed do not reliably indicate the expression of TEs (as the authors intend) and should be filtered for any reads that come from the above scenarios: these reads have nothing to do with KRAB ZNF control, do not represent actively expressed TEs, and should therefore be removed. In my lab's experience with brain (and other) tissues, the proportion of RNA-sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, so applying this filtering is expected to have a huge impact on the results and conclusions of this study.

      Another potential problem that I don't see addressed is that, due to the high level of similarity among the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot simply be used to plot KZNF expression values without significant work and manual curation to safeguard proper cross-species ortholog annotation. The work of Thomas and Schneider (2011) has studied this in great detail, but genome assemblies of non-human primates tend to be highly inaccurate in assigning the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in the macaque genome. So, the expression of human-specific ZNF-B, which derived from the parental ZNF-A, is likely to be compared in their DESeq analysis to the expression of ZNF-A in macaque RNA-seq data. In other words, without a significant amount of manual curation, the DESeq analysis is prone to lead to false comparisons, which makes the strategy and the TEKRABber software approach described highly biased and unreliable.

      There is no doubt that there are differences in the expression and activity of KRAB-ZNFs and TEs, respectively, that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analysis of RNA-seq data and on processing through the TEKRABber software, with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to obtain more accurate and curated TE and KRAB ZNF expression data.

      Finally, there are some minor but important notes I want to share:

      The association of certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction, is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So these associations should be used with much more caution as an argument that changes in TE and KRAB ZNF networks are important for diseases like AD.

      There are a number of papers in which KRAB ZNF and TE expression are analysed in parallel in human brain tissues, so the novelty of that aspect of the presented study may be limited.

      Additional note after reviewing the revised version of the manuscript:

      After reviewing the revised version of the manuscript, my criticisms and concerns regarding this study remain just as serious and are unchanged. To clarify, the revised version does not differ in essence from the original version; unfortunately, it seems that no effort was made to address the concerns raised about the original version, and the Results and Discussion sections are virtually unchanged.

    3. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no evidence that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals that would severely affect the conclusions one can draw from the analysis. 

      Thank you very much for the insightful review of our manuscript. Since most of the comments on our revised version are not different from the comments on our first version, we repeated our previous answer, but wrote a new reply to the new concerns (please see the last two paragraphs). 

      We would also like to reiterate here that most of the critique of the reviewer concerns the performance of other tools and not TEKRABber presented in our manuscript. We consider it out of scope for this manuscript to improve other tools.

      My main concerns are provided below: 

      One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: while specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE-derived transcripts (LINEs, SVAs), the majority of other, older TE classes do not behave this way and are either neutral to the genome or may have some enhancer activity (as mapped in the program they refer to, 'TEffectR'). A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: they often become part of transcripts due to alternative splicing. As such, the presence of Alu-derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has big implications for how reads from TEs, as used in this study, should be interpreted: the TE expression used to correlate with KRAB ZNF expression simply does not represent the species-specific influences of TEs that the authors are after. 

      With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (which often happens for Alus) or because the TE is present in an inefficiently spliced-out intron (which happens a lot), leading to TE-derived reads that reflect the TE being part of that intron rather than being actively expressed. As a result, the data as analysed do not reliably indicate the expression of TEs (as the authors intend) and should be filtered for any reads that come from the above scenarios: these reads have nothing to do with KRAB ZNF control, do not represent actively expressed TEs, and should therefore be removed. In my lab's experience with brain (and other) tissues, the proportion of RNA-sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, so applying this filtering is expected to have a huge impact on the results and conclusions of this study. 

      We sincerely thank the reviewer for highlighting the potential issues of false positives and negatives in TE quantification. The reviewer provided valuable examples of how different TE classes, such as Alus, LTRs, LINEs, and SVAs, exhibit distinct behaviors in the genome. To our knowledge, specific tools like ERVmap (Tokuyama et al., 2018), which annotates ERVs, and LtrDetector (Joseph et al., 2019), which uses k-mer distributions to quantify LTRs, could indeed enhance precision by treating specific TE classes individually. We acknowledge that such approaches may yield more accurate results and appreciate the suggestion. 

      In our study, we used TEtranscripts (Jin et al., 2015) prior to TEKRABber. TEtranscripts applies the Expectation Maximization (EM) algorithm to assign ambiguous reads in the following steps. Uniquely mapped reads are first assigned to genes, and reads overlapping genes and TEs are assigned to TEs only if they do not uniquely match an annotated gene. The remaining ambiguous reads are distributed based on EM iterations. While this approach may not be as specialized as the latest tools for specific TE classes, it provides a general overview of TE activity. TEtranscripts outputs subfamily-level TE expression data, which we used as input for TEKRABber to perform downstream analyses such as differential expression and correlation studies.
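
      To make the idea of distributing ambiguous reads by EM concrete, here is a minimal Python sketch. It is a toy illustration of the general technique, not TEtranscripts' actual code: the function name `em_assign`, the input format (one tuple of candidate feature names per read) and the feature names are all assumptions made for this example, and the gene-versus-TE priority rule described above is not modelled.

```python
from collections import defaultdict


def em_assign(reads, n_iter=50):
    """Estimate per-feature read counts from unique and ambiguous reads.

    reads: iterable of tuples; each tuple lists the features (genes or TE
    subfamilies) that one read is compatible with. Hypothetical format.
    """
    unique = defaultdict(float)
    ambiguous = []
    for candidates in reads:
        candidates = tuple(sorted(set(candidates)))
        if len(candidates) == 1:
            unique[candidates[0]] += 1.0  # uniquely mapped: assign directly
        else:
            ambiguous.append(candidates)

    features = set(unique) | {f for c in ambiguous for f in c}

    # Initial guess: unique counts plus an even split of each ambiguous read.
    est = {f: unique.get(f, 0.0) for f in features}
    for c in ambiguous:
        for f in c:
            est[f] += 1.0 / len(c)

    for _ in range(n_iter):
        new = {f: unique.get(f, 0.0) for f in features}
        for c in ambiguous:
            # E-step: split the read in proportion to current abundances.
            total = sum(est[f] for f in c)
            for f in c:
                # M-step: accumulate the fractional assignments.
                new[f] += est[f] / total
        est = new
    return est


# Hypothetical toy input: 3 reads unique to a gene, 1 unique to a TE
# subfamily, and 4 reads compatible with both.
reads = [("GeneA",)] * 3 + [("AluY",)] * 1 + [("GeneA", "AluY")] * 4
for feature, count in sorted(em_assign(reads).items()):
    print(feature, round(count, 3))
```

      In this toy input the estimates settle near 6 reads for the gene and 2 for the TE subfamily: the ambiguous reads follow the 3:1 ratio set by the unique reads. This is also why reads from TEs embedded in highly expressed transcripts can dominate subfamily-level counts.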

      We understand the importance of adapting tools to specific research objectives, including focusing on particular TE classes. TEKRABber is designed not to refine TE quantification at the mapping stage but to flexibly handle outputs from various TE quantification tools. It accepts raw TE counts as input in the form of dataframes, enabling diverse analytical pipelines. We would also like to clarify that, since the input data is transcriptomic, our primary focus is on expressed TEs, rather than the effects of non-expressed TEs in the genome. In the revised version of our manuscript, we emphasize this distinction in the discussion and provide examples of how TEKRABber can integrate with other tools to enhance specificity and accuracy.

      Another potential problem that I don't see addressed is that, due to the high level of similarity among the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot simply be used to plot KZNF expression values without significant work and manual curation to safeguard proper cross-species ortholog annotation. The work of Thomas and Schneider (2011) has studied this in great detail, but genome assemblies of non-human primates tend to be highly inaccurate in assigning the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in the macaque genome. So, the expression of human-specific ZNF-B, which derived from the parental ZNF-A, is likely to be compared in their DESeq analysis to the expression of ZNF-A in macaque RNA-seq data. In other words, without a significant amount of manual curation, the DESeq analysis is prone to lead to false comparisons, which makes the strategy and the TEKRABber software approach described highly biased and unreliable. 

      There is no doubt that there are differences in the expression and activity of KRAB-ZNFs and TEs, respectively, that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analysis of RNA-seq data and on processing through the TEKRABber software, with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to obtain more accurate and curated TE and KRAB ZNF expression data. 

      We thank the reviewer for raising the important issue of accurately annotating the expanded repertoire of KRAB-ZNFs in primates, particularly the challenges of cross-species orthology and potential biases in RNA-seq data analysis. Indeed, we have also addressed this challenge in some of our previous papers (Nowick et al., 2010, Nowick et al., 2011 and Jovanovic et al., 2021).

      In the revised manuscript, we include more details about our two-step strategy to ensure accurate KRAB-ZNF ortholog assignments. First, we employed the Gene Order Conservation (GOC) score from Ensembl BioMart as a primary filter, selecting only one-to-one orthologs with a GOC score above 75% across primates. This threshold, recommended in Ensembl’s ortholog quality control guidelines, ensures high-confidence orthology relationships (http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html#goc).

      Second, we incorporated data from Jovanovic et al. (2021), which independently validated KRAB-ZNF orthologs across 27 primate genomes. This additional layer of validation allowed us to refine our dataset, resulting in the identification of 337 orthologous KRAB-ZNFs for differential expression analysis (Figure S2).

      We acknowledge that different annotation methods or criteria may for some genes yield variations in the identified orthologs. However, we believe that this combination provides a robust starting point for addressing the challenges raised, while we remain open to additional refinements in future analyses.

      Finally, there are some minor but important notes I want to share:

      The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

      We fully acknowledge the concern that, given the large number of KRAB-ZNFs and their inherent variability, some associations with AD or other neurological disorders could occur by chance. This highlights the importance of additional functional studies to validate the causal role of KRAB-ZNF and TE interactions in disease contexts. While previous studies have indeed analyzed KRAB-ZNF and TE expression in human brain tissues, our study seeks to expand on this foundation by incorporating interspecies comparisons across primates. This approach enabled us to identify TE:KRAB-ZNF pairs that are uniquely present in healthy human brains, which may provide insights into their potential evolutionary significance and relevance to diseases like AD.

      In addition to analyzing RNA-seq data (GSE127898 and syn5550404), we have cross-validated our findings using ChIP-exo data for 159 KRAB-ZNF proteins and their TE binding regions in humans (Imbeault et al., 2017). This allowed us to identify specific binding events between KRAB-ZNF and TE pairs, providing further support for the observed associations. We agree with the reviewer that additional experimental validations, such as functional studies, are critical to further establish the role of KRAB-ZNF and TE networks in AD. We hope that future research can build upon our findings to explore these associations in greater detail.

      There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

      We agree with the reviewer that many studies have examined the expression levels of KRAB-ZNFs and TEs in developing human brain tissues (Farmiloe et al., 2020; Turelli et al., 2020; Playfoot et al., 2021, among others). However, the novelty of our study lies in comparing KRAB-ZNF and TE expression across primate species, as well as in adult human brain tissues from both control individuals and those with Alzheimer’s disease. To our knowledge, no previous study has analyzed these data in this context. We therefore believe that our results will be of interest to evolutionary biologists and neurobiologists focusing on Alzheimer’s disease.

      Additional note after reviewing the revised version of the manuscript: 

      After reviewing the revised version of the manuscript, my criticism and concerns with this study are still equally high and unchanged. To clarify, the revised version did not differ in essence from the original version; it seems that unfortunately, no efforts were taken to address the concerns raised on the original version of the manuscript; the results section as well as the discussion section are virtually unchanged.

      We regret that this reviewer was not satisfied with our changes. In fact, many of the points raised by this reviewer are important, but concern weaknesses of other tools. In our opinion, validating other tools would be out of scope for this paper. We want to emphasize that TEKRABber is not a quantification tool for sequencing data, but software for comparative analysis across species. We provided a detailed answer to the reviewer and readers can refer to that answer in the public review above for further information.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no proof that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

      Thank you very much for the insightful review of our manuscript.

      My main concerns are provided below:

      (1) One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: While specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE derived transcripts (LINEs, SVAs), the majority of other older TE classes do not have such behavior and are either neutral to the genome or may have some enhancer activity (as mapped in the program they refer to, 'TEffectR'). A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: They often become part of transcripts due to alternative splicing. As such, the presence of Alu derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has a big implication for how reads from TEs as done in this study should be interpreted: The TE expression used to correlate the KRAB ZNF expression is simply not representing the species-specific influences of TEs that the authors are after.

      With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (often happens for Alus) or as a result of the TE being present in an inefficiently spliced out intron (happens a lot) which leads to TE-derived reads as a result of that TE being part of that intron, rather than that TE being actively expressed. As a result, the data as analysed is not reliably indicating the expression of TEs (as the authors intend to) and should be filtered for any reads that are coming from the above scenarios: These reads have nothing to do with KRAB ZNF control, and are not representing actively expressed TEs and therefore should be removed. Given that from my lab's experience in the brain (and other) tissues, the proportion of RNA sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, applying this filtering is expected to have a huge impact on the results and conclusions of this study.

      We sincerely thank the reviewer for highlighting the potential issues of false positives and negatives in TE quantification. The reviewer provided valuable examples of how different TE classes, such as Alus, LTRs, LINEs, and SVAs, exhibit distinct behaviors in the genome. To our knowledge, specific tools like ERVmap (Tokuyama et al., 2018), which annotates ERVs, and LtrDetector (Joseph et al., 2019), which uses k-mer distributions to quantify LTRs, could indeed enhance precision by treating specific TE classes individually. We acknowledge that such approaches may yield more accurate results and appreciate the suggestion. 

      In our study, we used TEtranscripts (Jin et al., 2015) prior to TEKRABber. TEtranscripts applies the Expectation Maximization (EM) algorithm to assign ambiguous reads in the following steps. Uniquely mapped reads are first assigned to genes, and reads overlapping genes and TEs are assigned to TEs only if they do not uniquely match an annotated gene. The remaining ambiguous reads are distributed based on EM iterations. While this approach may not be as specialized as the latest tools for specific TE classes, it provides a general overview of TE activity. TEtranscripts outputs subfamily-level TE expression data, which we used as input for TEKRABber to perform downstream analyses such as differential expression and correlation studies.
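
      To illustrate only the last step described above (distributing ambiguous reads by EM), a minimal sketch follows. It is not the TEtranscripts implementation; the function name `em_assign`, the pseudocount, and the read counts in the example are assumptions made for illustration.

```python
def em_assign(unique_counts, ambiguous_reads, n_iter=50, pseudo=1e-9):
    """Toy EM: split multi-mapped reads across the features they map to.

    unique_counts   : dict feature -> number of uniquely mapped reads
    ambiguous_reads : list of tuples; each tuple lists the features one read maps to
    Returns dict feature -> estimated read count (float).
    """
    features = set(unique_counts)
    for hits in ambiguous_reads:
        features.update(hits)

    # Initial estimate: uniquely mapped reads only.
    estimate = {f: float(unique_counts.get(f, 0)) for f in features}

    for _ in range(n_iter):
        # E-step: share each ambiguous read in proportion to the current estimates.
        # The pseudocount avoids division by zero when no candidate has unique reads.
        fractional = {f: 0.0 for f in features}
        for hits in ambiguous_reads:
            weights = [estimate[f] + pseudo for f in hits]
            total = sum(weights)
            for f, w in zip(hits, weights):
                fractional[f] += w / total
        # M-step: updated estimate = unique reads + fractional ambiguous reads.
        estimate = {f: unique_counts.get(f, 0) + fractional[f] for f in features}
    return estimate


# Hypothetical counts for two Alu subfamilies sharing 20 ambiguous reads.
est = em_assign({"AluY": 30, "AluSx": 10}, [("AluY", "AluSx")] * 20)
print(est)  # AluY is approximately 45 and AluSx approximately 15
```

      The total number of reads is preserved, and the subfamily with more unique support receives the larger share of the ambiguous reads.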

      We understand the importance of adapting tools to specific research objectives, including focusing on particular TE classes. TEKRABber is designed not to refine TE quantification at the mapping stage but to flexibly handle outputs from various TE quantification tools. It accepts raw TE counts as input in the form of dataframes, enabling diverse analytical pipelines. We would also like to clarify that, since the input data is transcriptomic, our primary focus is on expressed TEs, rather than the effects of non-expressed TEs in the genome. In the revised version of our manuscript, we emphasize this distinction in the discussion and provide examples of how TEKRABber can integrate with other tools to enhance specificity and accuracy.
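
      As a rough illustration of what a downstream correlation step on expression tables can look like, the sketch below computes a correlation coefficient for every TE:KRAB-ZNF pair. It is not the TEKRABber API (TEKRABber is an R package); the function names, the choice of Pearson correlation, and the expression values are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlate_pairs(znf_expr, te_expr):
    """Return {(te, znf): r} for every TE/KRAB-ZNF pair across shared samples."""
    return {
        (te, znf): pearson(te_values, znf_values)
        for znf, znf_values in znf_expr.items()
        for te, te_values in te_expr.items()
    }


# Hypothetical normalized expression across four samples.
znf_expr = {"ZNF_A": [1.0, 2.0, 3.0, 4.0]}
te_expr = {"TE_1": [8.0, 6.0, 4.0, 2.0], "TE_2": [1.0, 2.0, 3.0, 5.0]}
for pair, r in correlate_pairs(znf_expr, te_expr).items():
    print(pair, "negative" if r < 0 else "positive")
```

      In practice such coefficients would be accompanied by significance testing and multiple-testing correction before any pair is reported.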

      (2) Another potential problem that I don't see addressed is that due to the high level of similarity of the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot be simply used to plot KZNF expression values, without significant work and manual curation to safeguard proper cross species ortholog-annotation: The work of Thomas and Schneider (2011) has studied this in great detail but genome-assemblies of non-human primates tend to be highly inaccurate in appointing the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: RNA-sequencing reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in Macaque genome: So, the expression of human-specific ZNF-B, that derived from the parental ZNF-A, is likely to be compared in their DESeq to the expression of ZNF-A in Macaque RNA-seq data. In other words, without a significant amount of manual curation, the DE-seq analysis is prone to lead to false comparisons which make the strategy and KRABber software approach described highly biased and unreliable.

      There is no doubt that there are differences in expression and activity of KRAB-ZNFs and TEs respectively that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analyses of RNA-seq data and the processing through the TE-KRABber software with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to get more accurate and curated TE and KRAB ZNF expression data.

      We thank the reviewer for raising the important issue of accurately annotating the expanded repertoire of KRAB-ZNFs in primates, particularly the challenges of cross-species orthology and potential biases in RNA-seq data analysis. Indeed, we have also addressed this challenge in some of our previous papers (Nowick et al., 2010, Nowick et al., 2011 and Jovanovic et al., 2021).

      In the revised manuscript, we include more details about our two-step strategy to ensure accurate KRAB-ZNF ortholog assignments. First, we employed the Gene Order Conservation (GOC) score from Ensembl BioMart as a primary filter, selecting only one-to-one orthologs with a GOC score above 75% across primates. This threshold, recommended in Ensembl’s ortholog quality control guidelines, ensures high-confidence orthology relationships (http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html#goc).

      Second, we incorporated data from Jovanovic et al. (2021), which independently validated KRAB-ZNF orthologs across 27 primate genomes. This additional layer of validation allowed us to refine our dataset, resulting in the identification of 337 orthologous KRAB-ZNFs for differential expression analysis (Figure S2).
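
      The two-step filter described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the study: the record field names (`gene`, `species`, `type`, `goc`), the function name, and the example genes are hypothetical.

```python
def filter_orthologs(rows, species, validated, goc_min=75):
    """Two-step ortholog filter.

    Step 1: keep genes that are one-to-one orthologs with GOC > goc_min
            in every species listed in `species`.
    Step 2: keep only genes also present in an independently validated set.
    """
    passing = {}
    for row in rows:
        if row["type"] == "ortholog_one2one" and row["goc"] > goc_min:
            passing.setdefault(row["gene"], set()).add(row["species"])

    required = set(species)
    step1 = {gene for gene, seen in passing.items() if required <= seen}
    return sorted(step1 & set(validated))


# Hypothetical ortholog records for two non-human species.
rows = [
    {"gene": "ZNF_A", "species": "chimpanzee", "type": "ortholog_one2one", "goc": 100},
    {"gene": "ZNF_A", "species": "macaque", "type": "ortholog_one2one", "goc": 100},
    {"gene": "ZNF_B", "species": "chimpanzee", "type": "ortholog_one2one", "goc": 100},
    {"gene": "ZNF_B", "species": "macaque", "type": "ortholog_one2many", "goc": 50},
    {"gene": "ZNF_C", "species": "chimpanzee", "type": "ortholog_one2one", "goc": 75},
    {"gene": "ZNF_C", "species": "macaque", "type": "ortholog_one2one", "goc": 100},
    {"gene": "ZNF_D", "species": "chimpanzee", "type": "ortholog_one2one", "goc": 100},
    {"gene": "ZNF_D", "species": "macaque", "type": "ortholog_one2one", "goc": 100},
]
validated = {"ZNF_A", "ZNF_B", "ZNF_C"}
print(filter_orthologs(rows, ["chimpanzee", "macaque"], validated))  # ['ZNF_A']
```

      In this toy example, ZNF_B fails the one-to-one requirement in one species, ZNF_C does not exceed the GOC threshold, and ZNF_D is absent from the validated set.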

      We acknowledge that different annotation methods or criteria may for some genes yield variations in the identified orthologs. However, we believe that this combination provides a robust starting point for addressing the challenges raised, while we remain open to additional refinements in future analyses.

      (3) The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

      There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

      We fully acknowledge the concern that, given the large number of KRAB-ZNFs and their inherent variability, some associations with AD or other neurological disorders could occur by chance. This highlights the importance of additional functional studies to validate the causal role of KRAB-ZNF and TE interactions in disease contexts. While previous studies have indeed analyzed KRAB-ZNF and TE expression in human brain tissues, our study seeks to expand on this foundation by incorporating interspecies comparisons across primates. This approach enabled us to identify TE:KRAB-ZNF pairs that are uniquely present in healthy human brains, which may provide insights into their potential evolutionary significance and relevance to diseases like AD.

      In addition to analyzing RNA-seq data (GSE127898 and syn5550404), we have cross-validated our findings using ChIP-exo data for 159 KRAB-ZNF proteins and their TE binding regions in humans (Imbeault et al., 2017). This allowed us to identify specific binding events between KRAB-ZNF and TE pairs, providing further support for the observed associations. We agree with the reviewer that additional experimental validations, such as functional studies, are critical to further establish the role of KRAB-ZNF and TE networks in AD. We hope that future research can build upon our findings to explore these associations in greater detail.

      Reviewer #1 (Recommendations for the authors):

      It is essential before this work can be considered for publication, that the points above are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

      We sincerely appreciate the reviewer’s insightful recommendations and constructive feedback. Each specific point has been carefully addressed in detail in the public reviews section above.

      Reviewer #2 (Public review)

      Summary:

      The aim was to decipher the regulatory networks of KRAB-ZNFs and TEs that have changed during human brain evolution and in Alzheimer's disease.

      Strengths:

      This solid study presents a valuable analysis and successfully confirms previous assumptions, but also goes beyond the current state of the art.

      Weaknesses:

      The design of the analysis needs to be slightly modified and a more in-depth analysis of the positive correlation cases would be beneficial. Some of the conclusions need to be reinterpreted.

      We sincerely thank the reviewer for the thoughtful summary, positive evaluation of our study, and constructive feedback. We appreciate the recognition of the strengths in our analysis and the valuable suggestions for improving its design and interpretation. 

      We would like to briefly comment on the suggested modifications to the design here and will provide a detailed point-by-point review later with our revised manuscript. 

      The reviewer recommended considering a more recent timepoint, such as less than 25 million years ago (mya), to define the "evolutionary young group" of KRAB-ZNF genes and TEs when discussing the arms-race theory. This is indeed a valuable perspective, as the TE-repressing functions of KRAB-ZNF proteins may have evolved more recently than the split between Old World Monkeys (OWM) and New World Monkeys (NWM) at 44.2 mya we used.

      Our rationale for selecting 44.2 mya is based on certain primate-specific TEs such as the Alu subfamilies, which emerged after the rise of Simiiformes and have been used in phylogenetic studies (Xing et al., 2007 and Williams et al., 2010). This timeframe allowed us to investigate the potential co-evolution of KRAB-ZNFs and TEs in species that emerged after the OWM-NWM split (e.g., humans, chimpanzees, bonobos, and macaques used for this study). However, focusing only on KRAB-ZNFs and TEs younger than 25 million years would limit the analysis to just 9 KRAB-ZNFs and 92 TEs expressed in our datasets. While we will not conduct a reanalysis using this more recent timepoint, we will integrate the recommendation into the discussion section of the revised manuscript. 

      Furthermore, we greatly appreciate the reviewer's detailed insights and suggestions for refining specific descriptions and interpretations in our manuscript. We will address these points in the revised version to ensure the content is presented with greater precision and clarity.

      Once again, we thank both reviewers for their valuable feedback, which provides significant input for strengthening our study.

      Reviewer #2 (Recommendations for the authors):

      We thank the reviewer for the very insightful comments, which helped a lot in our interpretation and discussion of our results and in improving some of our statements.

      The present study seeks to uncover how the repression of transposable elements (TEs) by rapidly evolving KRAB-ZNF genes, which are known for their role in TE suppression, may influence human brain evolution and contribute to Alzheimer's disease (AD). Utilizing their previously developed tool, TEKRABber, the researchers analyze transcriptome datasets from the brains of four species of Old World Monkeys (OWM) alongside samples from healthy human individuals and AD patients.

      Through bipartite network analysis, they identify KRAB-ZNF/Alu-TE interactions as the most negatively correlated in the network, highlighting the repression of Alu elements by KRAB-ZNF proteins. In AD patient samples, they observe a reduction in a subnetwork comprising 21 interactions within an Alu TE module. These findings are consistent with earlier evidence that: (1) KRAB-ZNFs are involved in suppressing evolutionarily young Alu TEs; and (2) specific Alu elements have been reported to be deregulated in AD. The study also validates previous experimental ChIP-exo data on KRAB-ZNF proteins obtained in a different cell type (Imbeault et al., 2017).

      As a novelty, the study identifies a human-specific amino acid variation in ZNF528, which directly contacts DNA nucleotides, showing signs of positive selection in humans and several human-specific TE interactions.

      Interestingly, in addition to the negative links, the researchers observed predominantly positive connections with other TEs. While their approach is consistent with some previous observations, the authors conclude that it provides limited support for the 'genetic arms race' hypothesis.

      The reviewer is a specialist in TE and evolutionary research.

      Major issues:

      The study demonstrates the usefulness of the TEKRABber tool, which can support and successfully validate previous observations. However, there are several misconceptions and problems with the interpretation of the results.

      KRAB-ZNF proteins in repressing TEs in vertebrates

      In the Abstract: "In vertebrates, some KRAB-ZNF proteins repress TEs, offering genomic protection."

      Although some KRAB-ZNF proteins exist in vertebrates, their TE-suppression role is not as prominent or specialized as it is in mammals, where it serves as a key defense mechanism against the mobilization of TEs.

      We appreciate the reviewer’s clarification regarding the role of KRAB-ZNF proteins in vertebrates. To improve accuracy and precision, we have revised the wording to specify that this mechanism is primarily observed in mammals rather than vertebrates.

      The definition of young and old

      The study considers the evolutionary age of young (≤ 44.2 mya) and old (> 44.2 mya). This is the time of the Old World Monkey (OWM) and New World Monkey (NWM) split. Importantly, however, the KRAB-ZNF / KAP1 suppression system primarily suppresses evolutionarily younger TEs (< 25 MY old). These TEs are relatively new additions to the genome, i.e. they are specific to certain lineages (such as primates or hominins) and are more likely to be actively transcribed (and recognized as foreign by innate immunity) or have residual activity upon transposition. Examples include certain subfamilies of LINE-1, Alu (Y, S, less effective for J), SVA and younger human endogenous retroviruses (HERVs) such as HERV-K. The KRAB-ZNF / KAP1 system therefore focuses primarily on TEs that have evolved more recently in primates, in the last few million years (within the last 25 million years). Older TEs are controlled by broader epigenetic mechanisms such as DNA methylation, histone modifications, etc. Therefore, the age (≤ 44.2 mya) is not suitable to define it as young.

      In this context, the specific TEs of the Simiiformes cannot be considered as 'recently evolved' (in the Abstract). The Simiiformes contain both OWM and NWM. Notably, the study includes four species, all of which belong to the OWMs.

      The 'genetic arms race' theory

      Unfortunately, the problematic definition of young and old could also explain why the authors conclude that their data only weakly support the 'genetic arms race' hypothesis.

      The KRAB-ZNF proteins evolve rapidly, similar to TEs, which raises the 'genetic arms race' hypothesis. This hypothesis refers to the constant evolutionary struggle between organisms and TEs. TEs constantly evolve to overcome host defences, while host genomes develop mechanisms to suppress these potentially harmful elements. Indeed, in mammals, an important example is the KRAB-ZNF/TE interaction. The KRAB-ZNF proteins rapidly evolve to target specific TEs, creating a 'genetic arms race' in which each side - TEs and the KRAB-ZNF/KAP1 (alias TRIM28) repressor complex - drives the evolution of the other in response to adaptive pressure. Importantly, the 'genetic arms race' hypothesis describes the evolutionary process that occurs between TE and host when the TE is deleterious. Again, this includes the young TEs (< 25 MY old) with residual transposition activity or those that are actively transcribed and exacerbate cellular stress and inflammatory responses. Approximately 25 million years ago, the superfamilies Hominoidea (apes) and Cercopithecoidea (Old World monkeys, i.e. macaque) split.

      Just to clarify, our initial study aim was to examine whether TEs exhibit any evolutionary relationships with KRAB-ZNFs across the four studied species (human, chimpanzee, bonobo, and macaque). For investigating the arms-race hypothesis, we really appreciate the reviewer suggesting a more recent time point, such as less than 25 million years ago (mya), to define the "evolutionary young group" of TEs and KRAB-ZNF genes. This is indeed a valuable recommendation, as 25 mya marks the emergence of Hominoidea (Figure 2C in the manuscript), making it a meaningful reference point for studying recently evolved KRAB-ZNFs and TEs. However, restricting the analysis to elements younger than 25 mya would reduce the dataset to only 9 KRAB-ZNFs and 92 TEs. Nevertheless, we provide here our results for those elements in Table S7:

      We observed that among the correlations in the < 25 mya subset, negative correlations (7) outnumbered positive ones (2). However, these correlations were derived from only 3 out of 9 KRAB-ZNFs and 9 out of 92 TE subfamilies. Therefore, based on our data, while the < 25 mya group shows a higher proportion of negative correlations, the sample size is too limited to derive networks or draw robust conclusions in our analysis, especially when compared to our original evolutionary age threshold of 44.2 mya. For this reason, we chose not to reanalyze the data but rather to acknowledge that our current definition of “young” may not be optimal for testing the arms-race model in humans. While previous studies (Jacobs et al., 2014; Bruno et al., 2019; Zuo et al., 2023) have explored relevant KRAB-ZNF and TE interactions, our review of the KRAB-ZNFs and TEs highlighted in those works suggests that a specific focus on elements <25 mya has not been a primary emphasis. 
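
      For readers who want the proportions behind these counts, the short calculation below uses only the numbers reported in the paragraph above; the variable names are chosen for illustration.

```python
# Counts reported for the < 25 mya subset.
negative, positive = 7, 2
znf_used, znf_total = 3, 9
te_used, te_total = 9, 92

frac_negative = negative / (negative + positive)  # share of negative correlations
znf_coverage = znf_used / znf_total               # KRAB-ZNFs contributing any correlation
te_coverage = te_used / te_total                  # TE subfamilies contributing any correlation

print(f"negative correlations: {frac_negative:.1%}")  # 77.8%
print(f"KRAB-ZNFs involved:    {znf_coverage:.1%}")   # 33.3%
print(f"TE subfamilies involved: {te_coverage:.1%}")  # 9.8%
```

      The high share of negative correlations therefore rests on nine correlations drawn from a third of the young KRAB-ZNFs and roughly a tenth of the young TE subfamilies.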

      "our findings only weakly support the arms-race hypothesis. Firstly, we noted that young TEs exhibit lower expression levels than old TEs (Figure 2D and 5B), which might not be expected if they had recently escaped repression". - This is a misinterpretation. These old TEs are no longer harmful. This is not the case of the 'genetic arms race'.

      We sincerely appreciate the reviewer’s comments, which have helped us refine our interpretation to prevent potential misunderstandings. Our initial expectation, based on the arms-race hypothesis, was that young TEs would exhibit higher expression levels due to a recent escape from repression, while young KRAB-ZNFs would show increased expression as a counter-adaptive response. However, our findings indicate that both young TEs and young KRAB-ZNFs exhibit lower expression levels. This observation does not align with the classical arms-race model, which typically predicts an ongoing cycle of adaptive upregulation. We have rephrased the sentences in our discussion to make our idea clearer. In addition, we added the notion that older TEs might not be harmful anymore, which we agree with.

      "Additionally, some young TEs were also negatively correlated with old KRAB-ZNF genes, leading to weak assortativity regarding age inference, which would also not be in line with the arms-race idea."

      This is not a contradiction, as an old KRAB-ZNF gene could be 'reactivated' to protect against young TEs. (It might be cheaper for the host than developing a brand new KRAB-ZNF gene.)

      We agree with the reviewer's point that older KRAB-ZNFs may be reactivated to suppress young TEs, potentially as a more cost-effective evolutionary strategy than the emergence of entirely new KRAB-ZNFs. We have incorporated this perspective into the revised manuscript to provide a more detailed discussion of our findings.

      TEs remain active

      In the abstract: "Notably, KRAB-ZNF genes evolve rapidly and exhibit diverse expression patterns in primate brains, where TEs remain active."

      This is not precise. TEs do not generally remain active in the brain. It is only the autonomous LINE-1 (young) and non-autonomous Alu (young) and SVA (young) elements that can be mobilized by LINE-1. In addition, the evolutionary young HERV-K is recognized as foreign and alerts the innate immune system (DOI: 10.1172/jci.insight.131093 ) and is a target of the KRAB-ZNF/KAP1 suppression system.

      In the abstract: "Evidence indicates that transposable elements (TEs) can contribute to the evolution of new traits, despite often being considered deleterious."

      Oversimplification: The harmful and repurposed TEs are washed together.

      We appreciate the reviewer’s detailed suggestions for improving the precision of our abstract. While we previously mentioned LINE-1 and Alu elements in the introduction, we now explicitly specify in the abstract that only certain TE subfamilies, such as autonomous LINE-1 and non-autonomous Alu and SVA elements, remain active in the primate brain. Additionally, we have refined the phrasing regarding the role of TEs in evolution to clearly distinguish between their deleterious effects and their potential for functional repurposing. These clarifications have been incorporated into the revised abstract to ensure greater accuracy and nuance.

      Positive links

      "The high number of positive correlations might be surprising, given that KRAB-ZNFs are considered to repress TEs."

      Based on the above, it is not surprising that negative associations are only found with young (< 25 my) TEs. In fact, the relationship between old KRAB-ZNF proteins and old (non-damaging) TEs could be neutral/positive. The case of ZNF528 could be a valuable example of this.

      We thank the reviewer for providing this plausible interpretation and added it to the manuscript.

      "276 TE:KRAB-ZNF with positive correlations in humans were negatively correlated in bonobos"

      It would be important to characterise the positive correlations in more detail. Could it be that the old KRAB-ZNF proteins lost their ability to recruit KAP1/TRIM28? Demonstrate it.

      The strategy of developing sequence-specific DNA recognition domains that can specifically recognise TEs is expensive for the host. Recent studies suggest that when the TE is no longer harmful, these proteins/connections can be occasionally repurposed. The repurposed function would probably differ from the original suppressive function.

      In my opinion, the TEKRABber tool could be useful in identifying co-option events:

      We appreciate the reviewer’s suggestion regarding the characterization of positive correlations. While it is possible that some old KRAB-ZNF proteins have lost their ability to recruit KAP1/TRIM28, we cannot conclude this definitively for all cases. To address this, we examined ChIP-exo data from Imbeault et al. (2017) (Accession: GSE78099) and analyzed the overlap of binding sites between KRAB-ZNFs, KAP1/TRIM28, and RepeatMasker-annotated TEs. Our results indicate that some old KRAB-ZNFs still exhibit binding overlap with KAP1 at TE regions, suggesting that their repressive function may be at least partially retained (Author response image 1).

      Author response image 1. Overlap of KAP1, zinc finger proteins, and RepeatMasker annotation. Here we detect the overlap of ChIP-exo binding events using KAP1/TRIM28, with KRAB-ZNF genes (one at a time) and RepeatMasker annotation. (115 old and 58 young KRAB-ZNFs, Mann-Whitney, p<0.01).
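
      The overlap analysis described in this response can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the half-open `(chrom, start, end)` interval convention, and the toy coordinates are all hypothetical, and the sketch only asks whether a KRAB-ZNF peak touches both a KAP1 peak and a TE annotation, not whether all three share the same bases.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def triple_overlap_fraction(znf_peaks, kap1_peaks, te_annotations):
    """Fraction of one ZNF's peaks that overlap both a KAP1 peak and a TE annotation."""
    if not znf_peaks:
        return 0.0
    hits = sum(
        1
        for peak in znf_peaks
        if any(overlaps(peak, k) for k in kap1_peaks)
        and any(overlaps(peak, t) for t in te_annotations)
    )
    return hits / len(znf_peaks)


# Hypothetical toy intervals, for illustration only.
znf_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
kap1_peaks = [("chr1", 150, 250), ("chr2", 100, 120)]
te_annotations = [("chr1", 180, 300), ("chr1", 550, 650)]

fraction = triple_overlap_fraction(znf_peaks, kap1_peaks, te_annotations)
```

      Computing one such fraction per KRAB-ZNF would give the two groups of values (old and young) that a rank-based test such as Mann-Whitney could then compare.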

      Minor

      "Lead poisoning causes lead ions to compete with zinc ions in zinc finger proteins, affecting proteins such as DNMT1, which are related to the progression of AD (Ordemann and Austin 2016)."

      Not precise: While DNMT1 does contain zinc-binding domains, it is not categorized as a zinc finger protein.

      We appreciate the reviewer’s insight regarding the classification of DNMT1. After careful consideration, we have removed this sentence from the introduction to maintain focus on KRAB zinc finger proteins.

      Definition of TEs

      "There were 324 KRAB-ZNFs and 895 TEs expressed in Primate Brain Data." Define it more precisely. It is not clear, what the authors mean by TEs: Are these TE families, subfamilies? Provide information on copy numbers of each in the analysed four species.

      We appreciate the reviewer’s suggestion to clarify our definition of TEs. To improve precision, we have specified that the analysis was conducted at the subfamily level. Additionally, we have provided the copy numbers of TEs for the four analyzed species in Table S4.

      Occupancy of TEs in the genome

      "TEs comprise (i) one third to one half of the mammalian genome and are (ii) not randomly distributed..."

      (i) The most accepted number is 45%. However, some more recent reports estimate over 50%, thus the one third is an underestimation.

      (ii) Not randomly distributed among the mammalian species?

      (i) We thank the reviewer for pointing out that our statement about the abundance of TEs was outdated. We have updated the estimate to reflect that TEs can occupy more than half of the genome, based on recent publications.

      (ii) We acknowledge the reviewer’s concern regarding the distribution of TEs. Although TEs are interspersed throughout the genome, their insertion sites are not entirely random, as they tend to exhibit preferences for certain genomic regions. To clarify this, we have revised the wording in the paragraph accordingly.

      We would like to express our sincere gratitude to both reviewers for their insightful feedback, which has been instrumental in enhancing the quality of our study.

    1. eLife Assessment

      This study provides valuable insights into the evolutionary conservation of sex determination mechanisms in ants by identifying a candidate sex-determining region in a parthenogenetic species. The strength of evidence is solid, using well-executed genomic analyses to identify differences in heterozygosity between females and diploid males, though not yet functional validation of the candidate locus.

    2. Reviewer #1 (Public review):

      The authors have implemented several clarifications in the text and improved the connection between their findings and previous work. As stated in my initial review, I had no major criticisms of the previous version of the manuscript, and I continue to consider this a solid and well-written study. However, the revised manuscript still largely reiterates existing findings and does not offer novel conceptual or experimental advances. It supports previous conclusions suggesting a likely conserved sex determination locus in aculeate hymenopterans, but does so without functional validation (i.e., via experimental manipulation) of the candidate locus in O. biroi. I also wish to clarify that I did not intend to imply that functional assessments in the Pan et al. study were conducted in more than one focal species; my previous review explicitly states that the locus's functional role was validated in the Argentine ant.

    3. Reviewer #3 (Public review):

      The authors have made considerable efforts to conduct functional analyses to the fullest extent possible in this study; however, it is understandable that meaningful results have not yet been obtained. In the revised version, they have appropriately framed their claims within the limits of the current data and have adjusted their statements as needed in response to the reviewers' comments.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion, that the sex determination locus is conserved in ants, was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
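
      The two robustness checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the permutation statistic (homozygous fraction in males minus that in females) is a simple stand-in for the CSD index defined in the Methods, the split of samples into two colonies is invented, and the two extra p-values passed to the correction step are placeholders for other loci. Only the counts (19 heterozygous females, 11 of 16 diploid males homozygous) come from the text.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    extreme = (prob(k) for k in range(lo, hi + 1))
    return min(1.0, sum(p for p in extreme if p <= p_obs * (1 + 1e-9)))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def permutation_p_value(samples, n_perm=2000, seed=0):
    """samples: (colony, sex, is_homozygous) tuples with sex "M" or "F".
    Sex labels are shuffled within each colony to build the null distribution."""
    rng = random.Random(seed)
    sexes = [sex for _, sex, _ in samples]

    def statistic(labels):
        males = [h for (_, _, h), s in zip(samples, labels) if s == "M"]
        females = [h for (_, _, h), s in zip(samples, labels) if s == "F"]
        return sum(males) / len(males) - sum(females) / len(females)

    observed = statistic(sexes)
    by_colony = {}
    for idx, (colony, _, _) in enumerate(samples):
        by_colony.setdefault(colony, []).append(idx)

    hits = 0
    for _ in range(n_perm):
        shuffled = sexes[:]
        for idxs in by_colony.values():
            labels = [sexes[i] for i in idxs]
            rng.shuffle(labels)
            for i, label in zip(idxs, labels):
                shuffled[i] = label
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Hypothetical colony assignment; only the totals follow the text.
samples = (
    [("colony_1", "M", True)] * 6 + [("colony_1", "M", False)] * 2
    + [("colony_1", "F", False)] * 10
    + [("colony_2", "M", True)] * 5 + [("colony_2", "M", False)] * 3
    + [("colony_2", "F", False)] * 9
)
p_perm = permutation_p_value(samples)
p_fisher = fisher_exact_two_sided(11, 5, 0, 19)  # rows: males, females; cols: hom, het
adjusted = benjamini_hochberg([p_fisher, 0.20, 0.65])  # last two are placeholders
```

      Shuffling within colonies rather than across the whole dataset keeps any colony-level structure intact in the null distribution, which is the design choice the response describes.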

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.
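
      As a rough illustration of what a peak of nucleotide diversity means, the sketch below computes the mean number of pairwise differences per site in sliding windows. It assumes aligned haplotype sequences of equal length and the standard pairwise-difference definition of nucleotide diversity; the function names, window settings, and toy sequences are hypothetical, and the manuscript's actual calculation may differ.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site across aligned, equal-length sequences."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * n_sites)


def windowed_diversity(haplotypes, window, step):
    """Return (window start, diversity) for each sliding window along the alignment."""
    length = len(haplotypes[0])
    return [
        (start, nucleotide_diversity([h[start:start + window] for h in haplotypes]))
        for start in range(0, length - window + 1, step)
    ]


# Hypothetical toy alignment: variation is confined to positions 4-7.
haplotypes = ["AAAAACGTAAAA", "AAAATTTTAAAA", "AAAAGGCCAAAA"]
windows = windowed_diversity(haplotypes, window=4, step=4)
peak_start = max(windows, key=lambda w: w[1])[0]
```

      In this toy alignment the flanking windows have zero diversity and the middle window stands out, which is the kind of localized signal expected at a locus under balancing selection.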

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include (in what will be lines 123-126) the highlighted portion of the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that the existence of additional, undetected CSD loci is one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion (in what will be lines 372-374): “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results, in what will be lines 172-174: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the highlighted portion of the following sentence (in what will be lines 268-270) to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and then was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below (in what will be lines 287-295), with the additions highlighted:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses:

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      L307-308 should state homozygous for either allele in THE MAJORITY of diploid males.

      This will be fixed in the revised manuscript, in what will be line 321.

      Reviewer #3 (Recommendations for the authors):

      The association between heterozygosity in the CSD candidate region and female development in O. biroi, along with the high sequence homology of this region to CSD loci identified in two distantly related ant species, is not sufficient to fully address the evolution of the CSD locus and the mechanisms of sex determination.

      Given that functional genetic tools, such as genome editing, have already been established in O. biroi, I strongly recommend that the authors investigate the role of the lncRNA through knockout or knockdown experiments and assess its impact on the sex-specific splicing pattern of the downstream tra gene.

      Although knockout experiments of the lncRNA would be illuminating, the primary signal of complementary sex determination is heterozygosity. As is clearly stated in our manuscript and that of (Pan et al. 2024), it does not appear to be heterozygosity within the lncRNA that induces female development, but rather heterozygosity in non-transcribed regions linked to the lncRNA. Therefore, future mechanistic studies of sex determination in O. biroi, L. humile, and other ants should explore how homozygosity or heterozygosity of this region impacts the sex determination cascade, rather than focusing (exclusively) on the lncRNA.

      With this in mind, we developed three sets of guide RNAs that cut only one allele within the mapped CSD locus, with the goal of producing deletions within the highly variable region within the mapped locus. This would lead to functional hemizygosity or homozygosity within this region, depending on how the cuts were repaired. We also developed several sets of PCR primers to assess the heterozygosity of the resultant animals. After injecting 1,162 eggs over several weeks and genotyping the hundreds of resultant animals with PCR, we confirmed that we could induce hemizygosity or homozygosity within this region, at least in ~1/20 of the injected embryos. Although it is possible to assess the sex-specificity of the splice isoform of tra as a proxy for sex determination phenotypes (as done by (Pan et al. 2024)), the ideal experiment would assess male phenotypic development at the pupal stage. Therefore, over several more weeks, we injected hundreds more eggs with these reagents and reared the injected embryos to the pupal stage. However, substantial mortality was observed, with only 12 injected eggs developing to the pupal stage. All of these were female, and none of them had been successfully mutated.

      In conclusion, we agree with the reviewer that functional experiments would be useful, and we made extensive attempts to conduct such experiments. However, these experiments turned out to be extremely challenging with the currently available protocols. Ultimately, we therefore decided to abandon these attempts.  

      We opted not to include these experiments in the paper itself because we cannot meaningfully interpret their results. However, we are pleased that, in this response letter, we can include a brief description for readers interested in attempting similar experiments.

      Since O. biroi reproduces parthenogenetically and most offspring develop into females, observing a shift from female- to male-specific splicing of tra upon early embryonic knockout of the lncRNA would provide much stronger evidence that this lncRNA is essential for female development. Without such functional validation, the authors' claim (lines 36-38) seems to reiterate findings already proposed by Pan et al. (2024) and, as such, lacks sufficient novelty.

      We have responded to the issue of “lack of novelty” above. But again, the actual CSD locus in both O. biroi and L. humile appears to be distinct from (but genetically linked to) the lncRNA, and we have no experimental evidence that the putative lncRNA in O. biroi is involved in sex determination at all. Because of this, and given the experimental challenges described above, we do not currently intend to pursue functional studies of the lncRNA.

      References

      Hasselmann M, Gempe T, Schiøtt M, Nunes-Silva CG, Otte M, Beye M. 2008. Evidence for the evolutionary nascence of a novel sex determination pathway in honeybees. Nature 454:519–522.

      Koch V, Nissen I, Schmitt BD, Beye M. 2014. Independent Evolutionary Origin of fem Paralogous Genes and Complementary Sex Determination in Hymenopteran Insects. PLOS ONE 9:e91883.

      Matthey-Doret C, van der Kooi CJ, Jeffries DL, Bast J, Dennis AB, Vorburger C, Schwander T. 2019. Mapping of multiple complementary sex determination loci in a parasitoid wasp. Genome Biology and Evolution 11:2954–2962.

      Miyakawa MO, Mikheyev AS. 2015. QTL mapping of sex determination loci supports an ancient pathway in ants and honey bees. PLOS Genetics 11:e1005656.

      Pan Q, Darras H, Keller L. 2024. LncRNA gene ANTSR coordinates complementary sex determination in the Argentine ant. Science Advances 10:eadp1532.

      Privman E, Wurm Y, Keller L. 2013. Duplication and concerted evolution in a master sex determiner under balancing selection. Proceedings of the Royal Society B: Biological Sciences 280:20122968.

      Rabeling C, Kronauer DJC. 2012. Thelytokous parthenogenesis in eusocial Hymenoptera. Annual Review of Entomology 58:273–292.

      Schmieder S, Colinet D, Poirié M. 2012. Tracing back the nascence of a new sex-determination pathway to the ancestor of bees and ants. Nature Communications 3:1–7.

      Vorburger C. 2013. Thelytoky and Sex Determination in the Hymenoptera: Mutual Constraints. Sexual Development 8:50–58.

    1. eLife Assessment

      Axon growth is essential to the formation of neural connections. This useful manuscript presents a new method for assessing the adhesion strength of axons using a laser-induced shock wave. However, the evidence is incomplete, as critical controls for calibration and time course are lacking.

    2. Reviewer #1 (Public review):

      Summary:

      Axon growth is of course essential to the formation of neural connections. Adhesion is generally needed to anchor and rectify such motion, but whether the tenacity or forces of adhesion must be optimal for maximal axon extension is unknown. Measurements and contributing factors are generally lacking and are pursued here with a laser-induced shock wave approach near the axon growth cone. The authors claim to make measurements of the pressure required to detach axons from low to high matrix density. The results seem to support the authors' conclusions, and the work, with further support as noted below, is likely to impact the field of cell adhesion. In particular, the methods could be of some use to those studying adhesion and those interested in aspects of axon growth.

      Strengths:

      A potential ability to control the pressure simply via proximity of the laser spot is convenient and perhaps reasonable. The 0 to 1 scale for matrix density is a good and appropriate measure for comparing adhesion and other results. The attention to detachment speed, time, F-actin, and adhesion protein mutant provides key supporting evidence. Lastly, the final figure of traction force microscopy with matrix varied on a gel is reasonable and more physiological because neural tissue is soft (cite PMID: 16923388); an optimum in Fig.6 also perhaps aligns with axon length results in Fig.5.

      Weaknesses:

      The results seem incomplete and less than convincing. This is because the force calibration curve seems to be from a >10 yr old paper without any more recent checks or validating measurements. Secondly, the claimed effect of pressure on detachment of the growth cone does not consider other effects such as cavitation or temperature and certainly needs validation with additional methods that overcome such uncertainties. The authors need to check whether the laser perturbs the matrix, particularly local density. A relation between traction stresses of ~20-50 pN/µm² in Fig.6 and the adhesion pressure of 3-5 kPa of Fig.3 needs to be carefully explained; the former units equate to 0.02-0.05 kPa, and would perhaps suggest cells cannot detach themselves and move forward.

      The authors need to measure axon length on gels (Fig.6) as more physiological because neural tissue is soft. The studies are also limited to a rudimentary in vitro model without clear relevance to in vivo.

      Weaknesses concerning the laser method have been addressed, but alternative methods and relevance to in vivo remain lacking.

    3. Reviewer #3 (Public review):

      Summary:

      Yamada et al. build on classic and more recent studies (Chen et al., 2023; Lemmon et al., 1992; Nichol et al., 2016; Zheng et al., 1994; Schense and Hubbell, 2000) to better understand the relationship between substrate adhesion and neurite outgrowth.

      Strengths:

      The primary strength of the manuscript lies in developing a method for investigating the role of adhesion in axon outgrowth and traction force generation using a femtosecond laser technique. The most exciting finding is that both outgrowth and traction force generation have a biphasic relationship with laminin concentration.

      Weaknesses:

      The primary weaknesses, as written, are a lack of discussion of prior studies that have directly measured the strength of growth cone adhesions to the substrate (Zheng et al., 1994) and traction forces (Koch et al., 2012), the inverse correlation between retrograde flow rate and outgrowth (Nichol et al., 2016), and prior studies noting a biphasic effect of substrate concentration of neurite outgrowth (Schense and Hubbell, 2000).

      Overall, the claims and conclusions are well justified by the data. The main exception is that the data is more relevant to how the rate of neurite outgrowth is controlled rather than axonal guidance.

      This manuscript will help foster interest in the interrelationship between neurite outgrowth, traction forces, and substrate adhesion, and the use of a novel method to study this problem.

      The authors did an excellent job in addressing my original concerns in the revision.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Axon growth is of course essential to the formation of neural connections. Adhesion is generally needed to anchor and rectify such motion, but whether the tenacity or forces of adhesion must be optimal for maximal axon extension is unknown. Measurements and contributing factors are generally lacking and are pursued here with a laser-induced shock wave approach near the axon growth cone. The authors claim to make measurements of the pressure required to detach axons from low to high matrix density. The results seem to support the authors' conclusions, and the work, with further support, is likely to impact the field of cell adhesion. In particular, the methods could be of some use to those studying adhesion and those interested in aspects of axon growth.

      Strengths:

      A potential ability to control the pressure simply via proximity of the laser spot is convenient and perhaps reasonable. The 0 to 1 scale for matrix density is a good and appropriate measure for comparing adhesion and other results. The attention to detachment speed, time, F-actin, and adhesion protein mutant provides key supporting evidence. Lastly, the final figure of traction force microscopy with matrix varied on a gel is reasonable and more physiological because neural tissue is soft (cite PMID: 16923388); an optimum in Fig.6 also perhaps aligns with axon length results in Fig.5.

      We thank the reviewer for the many suggestions for improving the presentation and explanation of our experimental results. We have carefully reconsidered the problems raised and revised the manuscript as follows.

      Weaknesses:

      The results seem incomplete and less than convincing. This is because the force calibration curve seems to be from a >10 yr old paper without any more recent checks or validating measurements.

      Although the force calibration data were obtained with the experimental system more than 10 years ago, we have continued to use the same system under appropriate maintenance. The system's performance has been checked regularly and maintained. Therefore, the calibration data shown remain valid at present, and we see no problem with them.

      Secondly, the claimed effect of pressure on the detachment of the growth cone does not consider other effects such as cavitation or temperature, and certainly needs validation with additional methods that overcome such uncertainties.

      The authors need to check whether the laser perturbs the matrix, particularly local density. A relation between traction stresses of ~20-50 pN/µm² in Fig.6 and the adhesion pressure of 3-5 kPa of Fig.3 needs to be carefully explained; the former units equate to 0.02-0.05 kPa, and would perhaps suggest cells cannot detach themselves and move forward.

      We have previously reported that a single pulse from a Ti:sapphire femtosecond laser amplifier can effectively generate shock waves and stress waves with minimal thermal effects. Notably, during this process, the temperature elevation at the laser focal point is sufficiently suppressed, allowing efficient force generation without causing significant heating in the surrounding area. By applying this method, we have confirmed that cells show no damage after the force loading. Therefore, this approach enables cell detachment while minimizing thermal and cavitation-induced damage to the cell. This clarification has been incorporated into the revised results section (lines 119-120). We agree with the reviewer that the presented data were insufficient to support the proposed model. To this end, we have performed additional experiments and analyses, which are included in the revised version of the manuscript. To examine the impact of femtosecond laser irradiation on laminin, fluorescently labeled laminin was coated onto glass-bottom dishes, and the fluorescence intensity was analyzed before and after the impulsive force loading. The result indicates that the fluorescence intensity at the laser focal point remained unaffected by laser irradiation. This finding suggests that axon detachment results from the dissociation between L1 and laminin rather than the detachment of laminin from the substrate. These data have been incorporated into Supplementary Fig. 1 and page 5 (lines 113-120). In addition, an explanation of the relationship between the adhesion pressure and the traction stress has been added on page 8 (lines 253-258).
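
      The unit comparison raised in the comment above can be checked with a short calculation. The following is a minimal Python sketch, not part of the manuscript or the authors' analysis; the function and variable names are hypothetical. It relies only on the identity 1 pN/µm² = 10⁻¹² N / 10⁻¹² m² = 1 Pa, and takes the ranges quoted by the reviewer (20-50 pN/µm² for traction stress, 3-5 kPa for adhesion pressure) as inputs.

```python
# Illustrative unit check only; all names here are hypothetical and the
# input ranges are the values quoted in the reviewer's comment.

PA_PER_PN_PER_UM2 = 1.0  # 1 pN/um^2 = 1e-12 N / 1e-12 m^2 = 1 Pa


def pn_per_um2_to_kpa(stress_pn_per_um2: float) -> float:
    """Convert a stress in pN/um^2 to kPa."""
    pascals = stress_pn_per_um2 * PA_PER_PN_PER_UM2
    return pascals / 1000.0


# Traction stress range (Fig. 6) converted to kPa.
traction_kpa = [pn_per_um2_to_kpa(v) for v in (20.0, 50.0)]

# Adhesion pressure range (Fig. 3), already in kPa.
adhesion_kpa = (3.0, 5.0)

# Smallest and largest ratio of adhesion pressure to traction stress.
ratio_min = adhesion_kpa[0] / traction_kpa[1]
ratio_max = adhesion_kpa[1] / traction_kpa[0]

print(f"traction: {traction_kpa[0]:.2f}-{traction_kpa[1]:.2f} kPa")
print(f"adhesion/traction ratio: {ratio_min:.0f}x to {ratio_max:.0f}x")
```

      With these inputs, the traction stress corresponds to 0.02-0.05 kPa, so the quoted adhesion pressure exceeds it by roughly 60- to 250-fold, which is the gap the reviewer asks the authors to explain.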

      The authors need to measure axon length on gels (Fig.6) as more physiological because neural tissue is soft. The studies are also limited to a rudimentary in vitro model without clear relevance to in vivo.

      In response to the reviewer’s request, we measured the axon length on a polyacrylamide gel with stiffness comparable to brain tissue (0.3 kPa). The axon length was consistently shorter on the gel than on glass under our experimental conditions, in agreement with previous findings (Abe et al., 2021). Furthermore, a biphasic relationship between axon outgrowth and laminin concentration was observed. These results suggest that the biphasic behavior of axon outgrowth identified in this study is likely to occur in vivo. We have updated Fig. 6 and described the result (lines 224-225) in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The force calibration curve seems to be from a >10 yr old paper without any more recent checks or validating measurements - which are essential. Effects of cavitation and temperature must be checked, and validated with additional methods that overcome such uncertainties. The authors need to check whether the laser perturbs the matrix, particularly local density. A relation between traction stresses of ~20-50 pN/µm² in Fig.6 and the adhesion pressure of 3-5 kPa of Fig.3 needs to be carefully explained; the former units equate to 0.02-0.05 kPa, and would perhaps suggest cells cannot detach themselves and move forward. The authors need to measure axon length on gels (Fig.6) as more physiological because neural tissue is soft. The studies are also limited to a rudimentary in vitro model without clear relevance to in vivo.

      We thank the reviewer for these recommendations. We have addressed these points in our responses to the public review above.

      Reviewer #2 (Public Review):

      Summary:

      The authors measure axon outgrowth rate, laminin adhesion strength, and actin rearward flow rate. They find that the axon outgrowth rate has a biphasic dependence on adhesion strength. In interpreting the results, they suggest that the results "imply that adhesion modulation is key to the regulation of axon guidance"; however, they measure elongation rate, not guidance.

      Strengths:

      The measurements of adhesion strength by laser-induced shock waves are reasonable as is the measurement of actin flow rates by speckle microscopy.

      Weaknesses:

      They only measure the length of the axons after 3 days and have no measurements of the actual rate of growth cone movements when they are moving. They do not measure the rate of actin growth at the leading edge to know its contribution to the extension rate. This is inadequate.

      These studies are unlikely to have an impact on the field because the measurement of axon growth rate at short times is missing.

      We thank the reviewer for understanding the novelty of our study, and we agree with the reviewer’s comment. Following the comment, we performed time-lapse imaging of growth cone movements and quantified the migration rate. Consistent with the length of axons, the migration rate did not exhibit a monotonic increase with increased L1CAM-laminin binding but rather displayed biphasic behavior, where excessive L1CAM-laminin binding led to a reduction in the migration rate. Notably, the biphasic migration behavior was abolished in the L1CAM knockdown neurons. We believe these results provide further support for our proposed model. These results have been incorporated into the new Fig. 5 and page 7 (lines 209-218) of the revised manuscript. In addition, the experimental method has been added on page 13 (lines 385-391).

      Reviewer #2 (Recommendations For The Authors):

      This is a very weak paper because of the lack of relevant measurements to enable correlations between actual extension rate, traction force, and rates of speckle movement.

      We thank the reviewer for the critical comment on our model. We performed time-lapse imaging of growth cone movements and quantified the migration rate. From the comments of this reviewer and reviewer #3, we recognized the importance of prior studies on the measurement of adhesion strength in the growth cone, traction force, the correlation between retrograde flow and outgrowth, and the biphasic dependence of neurite outgrowth on substrate concentration (please also see our response to the recommendations from reviewer #3).

      Reviewer #3 (Public Review):

      Summary:

      Yamada et al. build on classic and more recent studies (Chen et al., 2023; Lemmon et al., 1992; Nichol et al., 2016; Zheng et al., 1994; Schense and Hubbell, 2000) to better understand the relationship between substrate adhesion and neurite outgrowth.

      Strengths:

      The primary strength of the manuscript lies in developing a method for investigating the role of adhesion in axon outgrowth and traction force generation using a femtosecond laser technique. The most exciting finding is that both outgrowth and traction force generation have a biphasic relationship with laminin concentration.

      Weaknesses:

      The primary weaknesses are a lack of discussion of prior studies that have directly measured the strength of growth cone adhesions to the substrate (Zheng et al., 1994) and traction forces (Koch et al., 2012), the inverse correlation between retrograde flow rate and outgrowth (Nichol et al., 2016), and prior studies noting a biphasic effect of substrate concentration of neurite outgrowth (Schense and Hubbell, 2000).

      Overall, the claims and conclusions are well justified by the data. The main exception is that the data is more relevant to how the rate of neurite outgrowth is controlled rather than axonal guidance.

      This manuscript will help foster interest in the interrelationship between neurite outgrowth, traction forces, and substrate adhesion, and the use of a novel method to study this problem.

      We thank the reviewer for the helpful comments and for recognizing the strengths of our manuscript. In light of these comments, we recognized the importance of prior studies on the measurement of adhesion strength in the growth cone, traction force, the correlation between retrograde flow and outgrowth, and the biphasic dependence of neurite outgrowth on substrate concentration. With these prior studies in mind, we revised the introduction (lines 38-44, 61-65) and discussion (lines 272-281) of the manuscript. The references suggested by the reviewer have been added (Ref. 17, 26, 27, 31, and 35) (see also the responses below).

      Reviewer #3 (Recommendations For The Authors):

      Overall, I found the experiments discussed in the manuscript to be excellent. My primary suggestion is to slightly expand the introduction and discussion to put this work in context better. Additionally, the writing is unclear in places and would be helped by a careful edit.

      We appreciate the reviewer’s constructive critiques and would like to thank him/her for the experimental suggestions, which we have taken into account in the revised version of the manuscript. We trust that the additional modification of the text will satisfactorily address the reviewer’s concerns.

      In more detail:

      The introduction is well-written but could be improved by discussing how these studies build on earlier work. Through the 1980s and 90s, an important question was whether growth cone guidance occurred as the result of chemical cues that altered the activity of signaling pathways or differences in the adhesion between growth cones and substrates. While there was some clear evidence that growth cones were steered to more adhesive substrates (Hammarback and Letourneau, 1986), there were also important exceptions. For example, (Calof and Lander, 1991) examined the biophysical relationship between neuronal migration and substrate adhesion and found that laminin, which tends to support rapid migration and neurite outgrowth, tended to decrease adhesion.

      Thank you for the critical comments on our manuscript. We have modified the introduction (lines 38-40, 42-44) of the revised manuscript to discuss our understanding of growth cone guidance, particularly the roles of neurite migration and substrate adhesion.

      To better understand the relationship between substrate adhesion and outgrowth, Heidemann's group (Zheng et al., 1994) was, to the best of my knowledge, the first paper to directly measure the force required to detach growth cones from substrates; including laminin and L1. For DRG neurons, this was ~ 1000 - 3000 µdynes (i.e., 10 to 30 nN) and they noted that traction force generation is 3 to 15 times less than the force needed to dislodge growth cones. Additionally, that manuscript goes on to suggest, "These data argue against the differential adhesion mechanism for growth cone guidance preferences in culture." With the rising development of powerful molecular genetic tools and a growing appreciation of the importance of signaling pathways in neurite outgrowth (Huber et al., 2003), the field as a whole has focused on the molecular aspects of growth cone guidance, leaving many aspects of the physical process of neurite outgrowth unanswered. The strength of this manuscript is that it develops a new method for measuring growth cone adhesion forces, which reassuringly generates similar results to classic studies. In turn, it combines this with molecular genetic analysis to determine the contribution L1-LN interaction makes to the overall adhesion strength.

      We will ensure that the manuscript explicitly acknowledges the significance of Zheng et al. (1994) in shaping the field and clarifies how our study expands upon these foundational findings. Following the reviewer’s suggestion, we have added Zheng et al. (1994) to the references and modified the discussion (lines 272-281, Ref. 17) in the revised manuscript.
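
      The force conversion quoted in the reviewer's comment above can be checked with a short sketch (illustrative only; the helper name is hypothetical and this is not part of the manuscript). It assumes the detachment forces are expressed in microdynes, which is the reading consistent with the stated 10 to 30 nN, since 1 dyne = 10⁻⁵ N and therefore 1 µdyne = 10⁻¹¹ N = 0.01 nN.

```python
# Illustrative unit check; the helper name is hypothetical.
# Assumption: the quoted detachment forces are in microdynes.

NEWTON_PER_DYNE = 1e-5  # CGS definition: 1 dyne = 1e-5 N


def microdyne_to_nanonewton(force_udyn: float) -> float:
    """Convert a force in microdynes to nanonewtons."""
    newtons = force_udyn * 1e-6 * NEWTON_PER_DYNE
    return newtons * 1e9


low_nn = microdyne_to_nanonewton(1000.0)
high_nn = microdyne_to_nanonewton(3000.0)
print(f"{low_nn:.1f} nN to {high_nn:.1f} nN")  # prints "10.0 nN to 30.0 nN"
```

      Read as whole dynes instead, the same values would correspond to 10 to 30 mN, six orders of magnitude larger than the range given in the comment.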

      There are also a couple of other papers directly relevant to this work. In particular, (Koch et al., 2012) measured the traction forces generated by hippocampal neurons on polyacrylamide gels. They estimated it to be ~ 5 to 10 Pa. While the overall results are similar, in this manuscript, it is reported that the forces generated by hippocampal neurons are significantly higher, in the range of 25-75 Pa. I don't have an issue with this difference, but please look at the Koch paper and see if there is some technical reason for the different estimates of traction forces. Along these lines, please note the Young's modulus of the gels used in the experiments.

      As you mentioned, the traction force measured in our experiments is more than 5 times stronger than that reported by Koch et al. While the exact reason remains unclear, differences in gel coating may have influenced the result. In the study by Koch et al., pre-coating was performed using Cell-Tak before laminin coating. In contrast, our study used poly-lysine for pre-coating. This methodological difference may have affected the measurement of traction force. Nevertheless, our experiments have consistently yielded reproducible results.

      (Nichol et al., 2016) nicely shows an inverse relationship between RF rate and LN density at low concentrations. While the results reported here are similar, a strength of this paper is that it extends the work to higher LN concentrations.

      Thank you for pointing out the relevance of Nichol et al., 2016 to our study. We agree that their study provides important insights into the relationship between RF rate and LN density at low concentrations. The novelty of our study lies not only in extending the analysis to higher LN concentrations, but also in performing analyses that include adhesion strength, traction force, and migration rate in the growth cone. We have included this discussion (lines 259-261, Ref. 26) in the revised manuscript.

      My understanding is that the biphasic effect of LN in neurite outgrowth was previously established. For example, Buettner and Pittman, 1991 note a biphasic effect of LN concentration on some parameters of neurite outgrowth, such as RMS, a measure of growth cone velocity, but not others, such as total neurite length. Likewise, (Schense and Hubbell, 2000) noted a biphasic effect of RGD peptides on outgrowth. In light of this, it would seem the main contribution of this paper is the finding that traction force generation has a bi-phasic relationship with LN concentration.

      Thank you for your thoughtful comment. We agree that the main contribution of this study is demonstrating that the biphasic behavior of axon migration arises from the biphasic dependence of the traction force on laminin concentration. We have included this discussion (line 272-281, Ref. 31) in the revised manuscript.

      Please appreciate that I'm not asking the authors to copy-paste the text above into the manuscript. Instead, the references provide a starting point for better explaining the novel contributions here. The interaction of adhesions, traction force generation, the rate of neurite outgrowth, and biophysics of growth cone guidance is a classic problem in neuronal mechanics but is far from solved. My hope is that this manuscript might inspire more interest in this problem.

      Thank you for your thoughtful feedback and for highlighting the importance of better contextualizing our novel contributions within the broader field of neuronal mechanics. We appreciate your emphasis on the classic yet unresolved nature of the interactions between adhesions, traction force generation, axon outgrowth rate, and the biophysics of growth cone guidance.

      We hope these revisions help strengthen the manuscript’s impact and inspire further investigation into this important problem. We appreciate your insightful comments and the opportunity to improve our work.

      The text would be improved with a careful copy edit, for example:

      The last sentence of the introduction currently reads, "We suggested mechanism of the axon outgrowth which depends on the density of laminin on the substrate, revealing L1CAM-laminin binding as a mechanism for the regulation of axon outgrowth." which is challenging to understand.

      We appreciate the reviewer’s comment pointing out the lack of clarity in the final sentence of the introduction. To improve readability and clarity, we have revised the sentence as follows:

      “In this study, we suggested mechanism of the axon outgrowth that depends on the density of laminin on the substrate, i.e. the L1CAM-laminin binding is key to the regulation of axon outgrowth.” We believe this revised version better conveys our main finding in a more concise and comprehensible manner.

      Line 224 needs to be F-actin and the next sentence is difficult to understand.

      Thank you for pointing this out. We have corrected "F-action" to "F-actin" to ensure accuracy (line 256). Additionally, we have revised the following sentence to improve clarity (line 256-258).

      Line 232 instead of "traction force slows", did you mean the rate of retrograde flow slows?

      Thank you for pointing this out. We meant to refer to the rate of retrograde flow, not the traction force itself. We have revised the wording accordingly to avoid confusion (line 266).

      Line 242, shear-stress instead of share-stress.

      We have corrected the typo to "shear-stress" (line 282).

      Lines 255, 267, and the abstract. The paper doesn't directly address axonal guidance. It would be more accurate to replace axonal guidance with neurite outgrowth.

      Thank you for your insightful comment. We agree that the term "neurite outgrowth" more accurately reflects the scope of our study, as we do not directly examine the mechanisms of axonal guidance. Accordingly, we have revised the text in Lines 273, 275, and the abstract to replace "axonal guidance" with "neurite outgrowth" to better align with the presented data and experimental focus.

      Line 362, perhaps reference (Minegishi et al., 2021) here as it provides a nice explanation of the technique.

      Thank you for the helpful suggestion. We have now added a reference to Minegishi et al., 2021 (line 416, Ref. 35) in the revised manuscript, as it indeed provides a clear explanation of the method.

    1. eLife Assessment

      Davies et al. present a valuable study proposing that Shot can act as a molecular linker between microtubules and actin during dendrite pruning, suggesting an intriguing role in non-centrosomal microtubule organization. However, the experimental evidence is incomplete and does not robustly support these claims, and the lack of a cohesive model connecting the findings weakens the overall impact. While the data suggest that Shot, actin, and microtubule nucleation contribute to dendritic pruning, their precise interplay remains unresolved.

    2. Reviewer #1 (Public review):

      Summary:

      The neuronal microtubule cytoskeleton is essential for long-range transport in axons and dendrites. The axon-specific plus-end out microtubule organization vs the dendritic-specific plus-end in organization allows for selective transport into each neurite, setting up neuronal polarity. In addition, the dendritic microtubule organization is thought to be important for dendritic pruning in Drosophila during metamorphosis. However, the precise mechanisms that organize microtubules in neurons are still incompletely understood.

      In the current manuscript, the authors describe the spectraplakin protein Shot as important in developmental dendritic pruning. They find that loss of Shot causes dendritic microtubule polarity defects, which, based on their rescues and previous work, are likely the reason for the pruning defect.

      Since Shot is a known actin-microtubule crosslinker, they also investigate the putative role of actin and find that actin is also important for dendritic pruning. Finally, they find that several factors that have been shown to function as a dendritic MTOC in C. elegans also show a defect in Drosophila upon depletion.

      Strengths:

      Overall, this work was technically well-performed, using advanced genetics and imaging. The authors report some interesting findings identifying new players for dendritic microtubule organization and pruning.

      Weaknesses:

      The evidence for Shot interacting with actin for its functioning is contradictory. The Shot lacking the actin interaction domain did not rescue the mutant; however, it also has a strong toxic effect upon overexpression in wildtype (Figure S3), so a potential rescue may be masked. Moreover, the C-terminus-only construct, which carries the GAS2-like domain, was sufficient to rescue the pruning. This actually suggests that MT bundling/stabilization is the main function of Shot (and no actin binding is needed). On the other hand, actin depolymerization leads to some microtubule defects and subtle changes in Shot localization in young neurons (not old ones). More importantly, it did not enhance the microtubule or pruning defects of the Shot domain, suggesting these act in the same pathway. Interesting to note is that Mical expression led to microtubule defects but not to pruning defects. This argues that MT organization effects alone are not enough to cause pruning defects. This may be good to discuss. For the actin depolymerization, the authors used overexpression of the actin-oxidizing Mical protein. However, Mical may have another target. It would be good to validate key findings with better-characterized actin-targeting tools.

      In analogy to C. elegans, where RAB-11 functions as a ncMTOC to set up microtubules in dendrites, the authors investigated the role of these in Drosophila. Interestingly, they find that rab-11 also colocalizes to gamma tubulin and its depletion leads to some microtubule defects. Furthermore, they find a genetic interaction between these components and Shot; however, this does not prove that these components act together (if at all, it would be the opposite). This should be made more clear. What would be needed to connect these is to address RAB-11 localization + gamma-tubulin upon shot depletion.

      All components studied in this manuscript lead to a partial reversal of microtubules in the dendrite. However, it is not clear from how the data are represented whether the microtubule defect is subtle in all animals or whether it is a stronger, partially penetrant effect (a few animals/neurons have a strong phenotype). This is relevant as this may suggest that other mechanisms are also required for this organization, and it would make it markedly different from C. elegans. This should be discussed and potentially represented differently.

    3. Reviewer #2 (Public review):

      Summary:

      In their manuscript, the authors reveal that the spectraplakin Shot, which can bind both microtubules and actin, is essential for the proper pruning of dendrites in a developing Drosophila model. A molecular basis for the coordination of these two cytoskeletons during neuronal development has been elusive, and the authors' data point to the role of Shot in regulating microtubule polarity and growth through one of its actin-binding domains. The authors also propose an intriguing new activity for a spectraplakin: functioning as part of a microtubule-organizing center (MTOC).

      Strengths:

      (1) A strength of the manuscript is the authors' data supporting the idea that Shot regulates dendrite pruning via its actin-binding CH1 domain and that this domain is also implicated in Shot's ability to regulate microtubule polarity and growth (although see comments below); these data are consistent with the authors' model that Shot acts through both the actin and microtubule cytoskeletons to regulate neuronal development.

      (2) Another strength of the manuscript is the data in support of Rab11 functioning as an MTOC in young larvae but not older larvae; this is an important finding that may resolve some debates in the literature. The finding that Rab11 and Msps coimmunoprecipitate is nice evidence in support of the idea that Rab11(+) endosomes serve as MTOCs.

      Weaknesses:

      (1) A significant, major concern is that most of the authors' main conclusions are not (well) supported, in particular, the model that Shot functions as part of an MTOC. The story has many interesting components, but lacks the experimental depth to support the authors' claims.

      (2) One of the authors' central claims is that Shot functions as part of a non-centrosomal MTOC, presumably a MTOC anchored on Rab11(+) endosomes. For example, in the Introduction, last paragraph, the authors summarize their model: "Shot localizes to dendrite tips in an actin-dependent manner where it recruits factors cooperating with an early-acting, Rab11-dependent MTOC." This statement is not supported. The authors do not show any data that Shot localizes with Rab11 or that Rab11 localization or its MTOC activity is affected by the loss of Shot (or otherwise manipulating Shot). A genetic interaction between Shot and Rab11 is not sufficient to support this claim, which relies on the proteins functioning together at a certain place and time. On a related note, the claim that Shot localization to dendrite tips is actin-dependent is not well supported: the authors show that the CH1 domain is needed to enrich Shot at dendrite tips, but they do not directly manipulate actin (it would be helpful if the authors showed the overexpression of Mical disrupted actin, as they predict).

      (3) The authors show an image that Shot colocalizes with the EB1-mScarlet3 comet initiation sites and use this representative image to generate a model that Shot functions as part of an MTOC. However, this conclusion needs additional support: the authors should quantify the frequency of EB1 comets that originate from Shot-GFP aggregates, report the orientation of EB1 comets that originate from Shot-GFP aggregates (e.g., do the Shot-GFP aggregates correlate with anterogradely or retrogradely moving EB1 comets), and characterize the developmental timing of these events. The genetic interaction tests revealing the ability of shot dsRNA to enhance the loss of microtubule-interacting proteins (Msps, Patronin, EB1) and Rab11 are consistent with the idea that Shot regulates microtubules, but they do not provide any spatial information on where Shot is interacting with these proteins, which is critical to the model that Shot is acting as part of a dendritic MTOC.

      (4) It is unclear whether the authors are proposing that dendrite pruning defects are due to an early function of Shot in regulating microtubule polarity in young neurons (during 1st instar larval stages) or whether Shot is acting in another way to affect dendrite pruning. It would be helpful for the authors to present and discuss a specific model regarding Shot's regulation of dendrite pruning in the Discussion.

      (5) The authors argue that a change in microtubule polarity contributes to dendrite pruning defects. For example, in the Introduction, last paragraph, the authors state: "Loss of Shot causes pruning defects caused by mixed orientation of dendritic microtubules." The authors show a correlative relationship, not a causal one. In Figure 4, C and E, the authors show that overexpression of Mical disrupts microtubule polarity but not dendrite pruning, raising the question of whether disrupting microtubule polarity is sufficient to cause dendrite pruning defects. The lack of an association between a disruption in microtubule polarity and dendrite pruning in neurons overexpressing Mical is an important finding.

      (6) The authors show that a truncated Shot construct with the microtubule-binding domain, but no actin-binding domain (Shot-C-term), can rescue dendrite pruning defects and Khc-lacZ localization, whereas the longer Shot construct that lacks just one actin-binding domain ("delta-CH1") cannot. Have the authors confirmed that both proteins are expressed at equivalent levels? Based on these results and their finding that over-expression of Shot-delta-CH1 disrupts dendrite pruning, it seems possible that Shot-delta-CH1 may function as a dominant-negative rather than a loss-of-function. Regardless, the authors should develop a model that takes into account their findings that Shot, without any actin-binding domains and only a microtubule-binding domain, shows robust rescue.

      (7) The authors state that: "The fact that Shot variants lacking the CH1 domain cannot rescue the pruning defects of shot[3] mutants suggested that dendrite tip localization of Shot was important for its function." (pages 10-11). This statement is not accurate: the Shot C-term construct, which lacks the CH1 domain (as well as other domains), is able to rescue dendrite pruning defects.

      (8) The authors state that: "In further support of non-functionality, overexpression of Shot[deltaCH1] caused strong pruning defects (Fig. S3)." (page 8). Presumably, these results indicate that Shot-delta-CH1 is functioning as a dominant-negative since a loss-of-function protein would have no effect. The authors should revise how they interpret these results. This comment is related to another comment about the ability of Shot constructs to rescue the shot[3] mutant.

    1. eLife Assessment

      In this useful study, the authors conducted a set of computational and experimental investigations of the mechanism of cholesterol transport in the smoothened (SMO) protein. The computational component integrated multiple state-of-the-art approaches such as adaptive sampling, free energy simulations, and Markov state modeling, providing support for the proposed mechanistic model, which is also consistent with the experimental mutagenesis data. However, substantial revisions are needed for the discussion of the computational results and interpretation of the literature to provide a more balanced and accurate perspective on cholesterol-mediated SMO regulation. In the current form, therefore, the strength of evidence of the study is considered incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript primarily uses simulation tools to probe the pathway of cholesterol transport within the smoothened (SMO) protein. The pathway to the protein and within SMO is clearly identified, and interactions deemed important are tested experimentally to validate the model predictions.

      Strengths:

      The authors have clearly demonstrated how cholesterol might go from the membrane through SMO for the inner and outer leaflets of a symmetrical membrane model. The free energy profiles, structural conformations, and cholesterol-residue interactions are clearly described.

      Weaknesses:

      (1) Membrane Model:

      The authors decided to use a rather simple symmetric membrane with just cholesterol, POPC, and PSM at the same concentration for the inner and outer leaflets. This is not representative of asymmetry known to exist in plasma membranes (SM only in the outer leaflet and more cholesterol in this leaflet). This may also be important to the free energy pathway into SMO. Moreover, PE and anionic lipids are present in the inner leaflet and are ignored. While I am not requesting new simulations, I would suggest that the authors should clearly state that their model does not consider lipid concentration leaflet asymmetry, which might play an important role.

      (2) Statistical comparison of barriers:

      The barriers for pathways 1 and 2 are compared in the text, suggesting that pathway 2 has a slightly higher barrier than pathway 1. However, are these statistically different? If so, the authors should state the p-value. If not, then the text in the manuscript should not state that one pathway is preferred over the other.

      (3) Barrier of cholesterol (reasoning):

      The authors on page 7 argue that there is an enthalpy barrier between the membrane and SMO due to the change in environment. However, cholesterol lies in the membrane with its hydroxyl interacting with the hydrophilic part of the membrane and the other parts in the hydrophobic part. How is the SMO surface any different? It has both characteristics and is likely balanced similarly to uptake cholesterol. Unless this can be better quantified, I would suggest that this logic be removed.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors applied a range of computational methods to probe the translocation of cholesterol through the Smoothened receptor. They test whether cholesterol is more likely to enter the receptor straight from the outer leaflet of the membrane or via a binding pathway in the inner leaflet first. Their data reveal that both pathways are plausible but that the free energy barriers of pathway 1 are lower, suggesting this route is preferable. They also probe the pathway of cholesterol transport from the transmembrane region to the cysteine-rich domain (CRD).

      Strengths:

      (1) A wide range of computational techniques is used, including potential of mean force calculations, adaptive sampling, dimensionality reduction using tICA, and MSM modelling. These are all applied in a rigorous manner, and the data are very convincing. The computational work is an exemplar of a well-carried out study.

      (2) The computational predictions are experimentally supported using mutagenesis, with an excellent agreement between their PMF and mRNA fold change data.

      (3) The data are described clearly and coherently, with excellent use of figures. They combine their findings into a mechanism for cholesterol transport, which on the whole seems sound.

      (4) The methods are described well, and many of their analysis methods have been made available via GitHub, which is an additional strength.

      Weaknesses:

      (1) Some of the data could be presented a little more clearly. In particular, Figure 7 needs additional annotation to be interpretable. Can the position of the cholesterol be shown on the graph so that we can see the diameter change more clearly?

      (2) In Figure 3C, it doesn't look like the Met is constricting the tunnel at all. What residue is constricting the tunnel here? Can we see the Ala and Met panels from the same angle to compare the landscapes? Or does the mutation significantly change the tunnel? Why not A283 to a bulkier residue? Finally, the legend says that the figure shows that cholesterol can still pass this residue, but it doesn't really show this. Perhaps if the HOLE graph was plotted, we could see the narrowest point of the tunnel and compare it to the size of cholesterol.

      (3) The PMF axis in 3b and d confused me for a bit. Looking at the Supplementary data, it's clear that, e.g., the F455I change increases the energy barrier for chol entering the receptor. But in 3d this is shown as a -ve change, i.e., favourable. This seems the wrong way around for me. Either switch the sign or make this clearer in the legend, please.

      (4) The impact of G280V is put down to a decrease in flexibility, but it could also be a steric hindrance. This should be discussed.

      (5) Are the reported energy barriers of the two pathways (5.8 ± 0.7 and 6.5 ± 0.8 kcal/mol) significantly and/or substantially different enough to favour one over the other? This could be discussed in the manuscript.

      (6) Are the energy barriers consistent with a passive diffusion-driven process? It feels like, without a source of free energy input (e.g., ion or ATP), these barriers would be difficult to overcome. This could be discussed.

      (7) Regarding the kinetics from MSM, it is stated that the values seen here are similar to MFS transporters, but this then references another MSM study. A comparison to experimental values would support this section a lot.
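
      The question raised in point (5) can be illustrated with a rough calculation. The sketch below is not the authors' analysis: it assumes that the reported ± values are independent standard errors of approximately normally distributed barrier estimates, which the reviews do not state, and the function name is hypothetical. Under those assumptions it applies a simple two-sided z-test to the difference between the two barriers.

```python
import math


def barrier_difference_z_test(mean_a, err_a, mean_b, err_b):
    """Two-sided z-test for the difference of two independent estimates.

    Assumption (not stated in the source): err_a and err_b are standard
    errors of normally distributed, independent barrier estimates.
    """
    diff = mean_b - mean_a
    # Independent errors add in quadrature.
    combined_err = math.sqrt(err_a ** 2 + err_b ** 2)
    z = diff / combined_err
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return diff, combined_err, z, p_two_sided


# Barriers quoted in point (5), in kcal/mol.
diff, combined_err, z, p = barrier_difference_z_test(5.8, 0.7, 6.5, 0.8)
print(f"difference = {diff:.2f} +/- {combined_err:.2f} kcal/mol")
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

      Under these assumptions, the 0.7 kcal/mol difference is smaller than its combined uncertainty (about 1.1 kcal/mol), so this simple test would not separate the two pathways. A different reading of the ± values (for example, standard deviations over a known number of replicas) would call for a different test.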

    4. Reviewer #3 (Public review):

      This manuscript presents a study combining molecular dynamics simulations and Hedgehog (Hh) pathway assays to investigate cholesterol translocation pathways to Smoothened (SMO), a G protein-coupled receptor central to Hedgehog signal transduction. The authors identify and characterize two putative cholesterol access routes to the transmembrane domain (TMD) of SMO and propose a model whereby cholesterol traverses through the TMD to the cysteine-rich domain (CRD), which is presented as the primary site of SMO activation.

      The MD simulations and biochemical experiments are carefully executed and provide useful data. However, the manuscript is significantly weakened by a narrow and selective interpretation of the literature, overstatement of certain conclusions, and a lack of appropriate engagement with alternative models that are well-supported by published data, including data from prior work by several of the coauthors of this manuscript. In its current form, the manuscript gives a biased impression of the field and overemphasizes the role of the CRD in cholesterol-mediated SMO activation. Below, I provide specific points where revisions are needed to ensure a more accurate and comprehensive treatment of the biology.

      Major Comments:

      (1) Overstatement of the CRD as the Orthosteric Site of SMO Activation

      The manuscript repeatedly implies or states that the CRD is the orthosteric site of SMO activation, without adequate acknowledgment of alternative models. To give just a few examples (of many in this manuscript):

      a) "PTCH is proposed to modulate the Hh signal by decreasing the ability of membrane cholesterol to access SMO's extracellular cysteine-rich domain (CRD)" (p. 3).

      b) "In recent years there has been a vigorous debate on the orthosteric site of SMO" (p. 3).

      c) "cholesterol must travel through the SMO TMD to reach the orthosteric site in the CRD" (p. 4).

      d) "we observe cholesterol moving along TM6 to the TMD-CRD interface (common pathway, Fig. 1d) to access the orthosteric binding site in the CRD" (p. 6).

      While the second quote in this list at least acknowledges a debate, the surrounding text suggests that this debate has been entirely resolved in favor of the CRD model. This is misleading and not reflective of the views of other investigators in the field (see, for example, a recent comprehensive review from Zhang and Beachy, Nature Reviews Molecular and Cell Biology 2023, which makes the point that both the CRD and 7TM sites are critical for cholesterol activation of SMO as well as PTCH-mediated regulation of SMO-cholesterol interactions).

      In contrast, a large body of literature supports a dual-site model in which both the CRD and the TMD are bona fide cholesterol-binding sites essential for SMO activation. Examples include:

      a) Byrne et al., Nature 2016: point mutation of the CRD cholesterol binding site impairs, but does not abolish, SMO activation by cholesterol (SMO D99A, Y134F, and combination mutants - Fig 3 of the 2016 study).

      b) Myers et al., Dev Cell 2013 and PNAS 2017: CRD deletion mutants retain responsiveness to PTCH regulation and cholesterol mimetics (similar Hh responsiveness of a CRD deletion mutant is also observed in Fig 4 Byrne et al, Nature 2016).

      c) Deshpande et al., Nature 2019: mutation of residues in the TMD cholesterol binding site blocks SMO activation entirely, strongly implicating the TMD as a required site, in contrast to the partial effects of mutating or deleting the CRD site.

      Qi et al., Nature 2019, and Deshpande et al., Nature 2019, both reported cholesterol binding at the TMD site based on high-resolution structural data. Oddly, Deshpande et al., Nature 2019, is not cited in the discussion of TMD binding on p. 3, despite being one of the first papers to describe cholesterol in the TMD site and its necessity for activation (the authors only cite it regarding activation of SMO by synthetic small molecules).

      Kinnebrew et al., Sci Adv 2022 report that CRD deletion abolished PTCH regulation, which is seemingly at odds with several studies above (e.g., Byrne et al., Nature 2016; Myers et al., Dev Cell 2013). This difference may, however, reflect the use of an N-terminal GFP fusion to SMO in Kinnebrew et al., 2022, which could alter SMO activation properties by sterically hindering activation at the TMD site by cholesterol (but not by synthetic SMO agonists like SAG). In contrast, the earlier work by Byrne et al. is not subject to this caveat because it used an untagged, unmodified form of SMO.

      Although overexpression of PTCH1 and SMO (wild-type or mutant) has been noted as a caveat in studies of CRD-independent SMO activation by cholesterol, this reviewer points out that several of the studies listed above include experiments with endogenous PTCH1 and low-level SMO expression, demonstrating that SMO can clearly undergo activation by cholesterol (as well as regulation by PTCH1) in a manner that does not require the CRD.

      Recommendation:

      The authors should revise the manuscript to provide a more balanced overview of the field and explicitly acknowledge that the CRD is not the sole activation site. Instead, a dual-site model is more consistent with available structural, mutational, and functional data. In addition, the authors should reframe their interpretation of their MD studies to reflect this broader and more accurate view of how cholesterol binds and activates SMO.

      (2) Bias in Presentation of Translocation Pathways

      The manuscript presents the model of cholesterol translocation through SMO to the CRD as the predominant (if not sole) mechanism of activation. Statements such as: "Cholesterol traverses SMO to ultimately reach the CRD binding site" (p. 6) suggest an exclusivity that is not supported by prior literature in the field. Indeed, the authors' own MD data presented here demonstrate more stable cholesterol binding at the TMD than at the CRD (p 17), and binding of cholesterol to the TMD site is essential for SMO activation. As such, it is appropriate to acknowledge that cholesterol may activate SMO by translocating through the TM5/6 tunnel, then binding to the TMD site, as this is a likely route of SMO activation in addition to the CRD translocation route they highlight in their discussion.

      The authors describe two possible translocation pathways (Pathway 1: TM2/3 entry to TMD; Pathway 2: TM5/6 entry and direct CRD transfer), but do not sufficiently acknowledge that their own empirical data support Pathway 2 as more relevant. Indeed, because their experimental data suggest Pathway 2 is more strongly linked to SMO activation, this pathway should be weighted more heavily in the authors' discussion. In addition, Pathway 2 is linked to cholesterol binding to both the TMD and CRD sites (the former because the TMD binding site is at the terminus of the hydrophobic tunnel, the latter via the translocation pathway described in the present manuscript), so it is appropriate that Pathway 2 figure more prominently than Pathway 1 in the authors' discussion.

      The authors also claim that "there is no experimental structure with cholesterol in the inner leaflet region of SMO TMD" (p 16). However, a structural study of apo-SMO from the Manglik and Cheng labs (Zhang et al., Nat Comm, 2022) identified a cholesterol molecule docked at the TM5/6 interface and also proposed a "squeezing" mechanism by which cholesterol could enter the TM5/6 pocket from the membrane. The authors do not take this SMO conformation into account in their models, nor do they discuss the possibility that conformational dynamics at the TM5/6 interface could facilitate cholesterol flipping and translocation into the hydrophobic conduit, even though both possibilities have precedent in the 2022 empirical cryoEM structural analysis.

      Recommendation:

      The authors should avoid oversimplification of the SMO cholesterol activation process, either by tempering these claims or broadening their discussion to better reflect the complexity and multiplicity of cholesterol access and activation routes for SMO, and consider the 2022 apo-SMO cryoEM structure in their analysis of the TM5/6 translocation pathway.

      (3) Alternative Possibility: Direct Membrane Access to CRD

      The possibility that the CRD extracts cholesterol directly from the membrane outer leaflet is not considered. While the crystal structures place the CRD in a stable pose above the membrane, multiple cryo-EM studies suggest that the CRD is dynamic and adopts a variety of conformations, raising the possibility that the stability of the CRD in the crystal structures is a result of crystal packing and that the CRD may be far more dynamic under more physiological conditions.

      Recommendation:

      The authors should explicitly acknowledge and evaluate this potential mechanism and, if feasible, assess its plausibility through MD simulations.

      (4) Inconsistent Framing of Study Scope and Limitations

      The discussion contains some contradictory and misleading language. For example, the authors state that "In this study we only focused on the cholesterol movement from the membrane to CRD binding site." and then several sentences later state that "We outline the entire translocation mechanism from a kinetic and thermodynamic perspective.". These statements are at odds. The former appropriately (albeit briefly) notes the limited scope of the modeling, while the latter overstates the generality of the findings.

      In addition, the authors' narrow focus on the CRD site constitutes a major caveat to the entire work. It should be acknowledged much earlier in the manuscript, preferably in the introduction, rather than mentioned as an aside in the penultimate paragraph of the conclusion.

      Recommendation:

      The authors should clarify the scope of the study and expand the discussion of its limitations. They should explicitly acknowledge that the study models one of several cholesterol access routes and that the findings do not rule out alternative pathways.

      Summary:

      This study has the potential to make a useful contribution to our understanding of cholesterol translocation and SMO activation. However, in its current form, the manuscript presents an overly narrow and, at times, misleading view of the literature and biological models; as such, it is not nearly as impactful as it could be. I strongly encourage the authors to revise the manuscript to include:

      (1) A more balanced discussion of the CRD vs. TMD binding sites.

      (2) Acknowledgment of alternative cholesterol access pathways.

      (3) More comprehensive citation of prior structural and functional studies.

      (4) Clarification of assumptions and scope.

      Of note, the above suggestions require little to no additional MD simulations or experimental studies, but would significantly enhance the rigor and impact of the work.

    1. eLife Assessment

      This study is valuable for understanding how dysfunctional mitochondria contribute to vascular diseases by investigating the influence of Miro1 on smooth muscle cell proliferation and neointima development. The solid findings collectively indicate that Miro1 regulates mitochondrial cristae architecture and the efficiency of the respiratory chain. Nevertheless, the analysis would benefit from a more thorough assessment of the relationship between Miro1-dependent mitochondrial defects and vascular smooth muscle cell proliferation.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima, and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied, and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      Weaknesses:

      (1) Figure 3:

      I appreciate the system used to assess mitochondrial distribution; however, I believe that time-lapse microscopy to evaluate mitochondrial movements in real time should be mandatory. The experimental timing is compatible with time-lapse imaging, and these experiments will provide a quantitative estimation of the distance travelled by mitochondria and the fraction of mitochondria that change position over time. I also suggest evaluating mitochondrial shape in control and MIRO1-/- VSMC to assess whether MIRO1 absence could impact mitochondrial morphology, altering fission/fusion machinery, since mitochondrial shape could differently influence the mobility.

      (2) Figure 6:

      The evidence of MIRO1 ablation on cristae remodeling is solid; however, considering that the mechanism proposed to explain the finding is the modulation of MICOS/MIB complex, as shown in Figure 6D, I suggest performing EM analysis in each condition. In my mind, Miro1 KK and Miro1 TM should lead to different cristae phenotypes according to the different impact on MICOS/MIB complex assembly. Especially, Miro1 TM should mimic Miro1 -/- condition, while Miro1 KK should drive a less severe phenotype. This would supply a good correlation between Miro1, MICOS/MIB complex formation and cristae folding.

      I also suggest performing supercomplex assembly and complex I activity with each plasmid to correlate MICOS/MIB complex assembly with the respiratory chain efficiency.

      (3) I noticed that none of the in vitro findings have been validated in an in vivo model. I believe this represents a significant gap that would be valuable to address. In your animal model, it should not be too complex to analyze mitochondria by electron microscopy to assess cristae morphology. Additionally, supercomplex assembly and complex I activity could be evaluated in tissue homogenates to corroborate the in vitro observations.

      (4) I find the results presented in Figure S7 somewhat unclear. The authors employ a pharmacological strategy to reduce Miro1 and validate the findings previously obtained with the genetic knockout model. They report increased mitophagy and a reduction in mitochondrial mass. However, in my opinion, these changes alone could significantly impact cellular metabolism. A lower number of mitochondria would naturally result in decreased ATP production and reduced mitochondrial respiration. This, in turn, weakens the proposed direct link between Miro1 deletion and impaired metabolic function or altered electron transport chain (ETC) activity. I believe this section would benefit from additional experiments and a more in-depth discussion.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture, and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach, assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo, and importantly in human cells. The study provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the availability of a small molecule targeting MIRO1.

      Weaknesses:

      (1) There is a consistent lack of reporting across figure legends, including group sizes, n numbers, how many independent experiments were performed, or whether the data is mean +/- SD or SEM, etc. This needs to be corrected.

      (2) The in vivo carotid injury experiments are in male mice fed a high-fat diet; this should be explicitly stated in the abstract, as it's unclear if there are any sex- or diet-dependent differences. Is VSMC proliferation/neointima formation different in chow-fed mice after carotid injury?

      (3) The main body of the methods section is thin, and it's unclear why the majority of the methods are in the supplemental file. The authors should consider moving these to the main article, especially in an online-only journal.

      (4) Certain metabolic analyses are suboptimal, including ATP concentration and Complex I activity measurements. Measuring ATP/ADP and ATP/AMP ratios to assess energy charge status (by luminometer or mass spectrometry) and using high-resolution respirometry (Oroboros) to determine mitochondrial complex I activity in permeabilized VSMCs would be more informative.

      (5) The statement that 'mitochondrial mobility is not required for optimal ATP production' is poorly supported. XF Seahorse analysis should be performed with nocodazole and also following MIRO1 reconstitution +/- EF hands.

      (6) The authors should consider moving MIRO1 small molecule data into the main figures. A lot of value would be added to the study if the authors could demonstrate that therapeutic targeting of MIRO1 could prevent neointima formation in vivo.

    4. Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are potentially useful for understanding the importance of mitochondrial positioning and function in this specific cell type within health and disease contexts, the evidence presented appears incomplete, with key bioenergetic and mechanistic claims lacking adequate support.

      Strengths:

      (1) The study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      (2) It explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a potentially significant area for both basic and translational biology.

      (3) The use of both in vivo and in vitro systems provides a potentially useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      (1) The central claim that MIRO1 loss impairs mitochondrial bioenergetics is not convincingly demonstrated, with only modest changes in respiratory parameters and no direct evidence of functional respiratory chain deficiency.

      (2) The proposed link between MIRO1 and respiratory supercomplex assembly or function is speculative, lacking mechanistic detail and supported by incomplete or inconsistent biochemical data.

      (3) Key mitochondrial assays are either insufficiently controlled or poorly interpreted, undermining the strength of the conclusions regarding oxidative phosphorylation.

      (4) The study does not adequately assess mitochondrial content or biogenesis, which could confound interpretations of changes in respiratory activity.

      (5) Overall, the evidence for a direct impact of MIRO1 on mitochondrial respiratory function in the experimental setting is weak, and the conclusions overreach the data.

    1. eLife Assessment

      This study reports a dynamic association/dissociation between malate dehydrogenase (MDH1) and citrate synthase (CIT1) in Saccharomyces cerevisiae under different metabolic conditions that control TCA pathway flux rate. The research question is timely, the use of the NanoBiT split-luciferase system to monitor protein-protein interactions is innovative, and the significance of the findings is valuable. However, the strength of evidence needed to support the conclusions was found to be incomplete based on a lack of critical control and mechanistic experiments.

    2. Reviewer #1 (Public review):

      Summary:

      The study by the Obata group characterizes the dynamics of the canonical malate dehydrogenase-citrate synthase metabolon in yeast.

      Strengths:

      The study is well-written and appears to give clear demonstrations of this phenomenon.

      Studies of the dynamics of metabolon formation are rare; if the authors can address the concern detailed below, then they have provided such for one of the canonical metabolons in nature.

      Weaknesses:

      There is a fundamental issue with the study, which is that the authors do not provide enough support or information concerning the split luciferase system that they use. Is the binding reversible or not? How the data is interpreted is massively influenced by this fact. What are the pros and cons of this method in comparison to, for example, FLIM-FRET? The authors state that the method is semi-quantitative - can they document this? All of the conclusions are based on the quality of this method. I know that it has been used by others, but at least some preliminary documentation to address these questions is required.

    3. Reviewer #2 (Public review):

      This study explores the dynamic association between malate dehydrogenase (MDH1) and citrate synthase (CIT1) in Saccharomyces cerevisiae, with the aim of linking this interaction to respiratory metabolism. Utilizing a NanoBiT split-luciferase system, the authors monitor protein-protein interactions in vivo under various metabolic conditions.

      Major Concerns:

      (1) NanoBiT Signal May Reflect Protein Abundance Rather Than Interaction Strength

      In Figure 1C, the authors report increased MDH1-CIT1 interaction under respiratory (acetate) conditions and decreased interaction during fermentation (glucose), as indicated by NanoBiT luminescence. However, this signal appears to correlate strongly with the expression levels of MDH1 and CIT1, raising the possibility that the observed luminescence reflects protein abundance rather than specific interaction dynamics. To resolve this, NanoBiT signals should be normalized to the expression levels of both proteins to distinguish between abundance-driven and interaction-driven changes.

      (2) Lack of Causal Evidence

      The study presents a series of metabolic perturbation experiments (e.g., arsenite, AOA, antimycin A, malonate) and correlates changes in metabolite levels with NanoBiT signals. However, these data are correlative and do not establish a functional role for the MDH1-CIT1 interaction in metabolic regulation. To demonstrate causality, the authors should implement approaches to specifically disrupt the MDH1-CIT1 interaction. One strategy could involve using a 15-residue peptide (Pept1) derived from the Pro354-Pro366 region of CIT1, previously shown to mediate the interaction, or introducing the cit1Δ3 (Arg362Glu) mutation, which perturbs binding. Metabolic flux analysis using ^13C-labeled glucose and mitochondrial respiration assays (e.g., Seahorse) could then assess functional consequences.

      (3) Absence of Protein Expression Controls Under Perturbation Conditions

      In experiments involving acetate, arsenite, AOA, antimycin A, and malonate, the authors infer changes in MDH1-CIT1 association based solely on NanoBiT signals. However, no accompanying data are provided on MDH1 and CIT1 protein levels under these conditions. This omission weakens the conclusions, as altered expression rather than interaction strength could underlie the observed luminescence changes. Immunoblotting or quantitative proteomics should be used to confirm constant protein expression across conditions.
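      The normalization requested in concerns (1) and (3) can be sketched as follows. This is a minimal illustration, assuming that luminescence is divided by the geometric mean of the two protein abundances; the choice of normalizer, the function name, and all numbers are assumptions made for the example and are not taken from the manuscript.

```python
from math import sqrt


def normalized_interaction(luminescence: float, mdh1: float, cit1: float) -> float:
    """Luminescence per unit of protein, using the geometric mean of both abundances."""
    if mdh1 <= 0 or cit1 <= 0:
        raise ValueError("protein abundances must be positive")
    return luminescence / sqrt(mdh1 * cit1)


# Hypothetical readings: raw NanoBiT signal and relative protein levels
# (e.g. from immunoblot densitometry), with glucose as the reference condition.
glucose = {"luminescence": 1000.0, "mdh1": 1.0, "cit1": 1.0}
acetate = {"luminescence": 4000.0, "mdh1": 4.0, "cit1": 4.0}

raw_fold = acetate["luminescence"] / glucose["luminescence"]
normalized_fold = normalized_interaction(**acetate) / normalized_interaction(**glucose)

print(f"raw fold change: {raw_fold:.1f}")
print(f"abundance-normalized fold change: {normalized_fold:.1f}")
```

      In this toy case the fourfold rise in raw signal disappears after normalization, which is the abundance-driven scenario the reviewer asks the authors to exclude.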

      Conclusion:

      Although the central question is compelling and the use of NanoBiT in live cells is a strength, the manuscript requires additional experimental rigor. Specifically, normalization of interaction signals, introduction of causative perturbations, and validation of protein expression are essential to substantiate the study's claims.

    4. Reviewer #3 (Public review):

      Summary:

      Metabolons are multisubunit complexes that promote the physical association of sequential enzymes within a metabolic pathway. Such complexes are proposed to increase metabolic flux and efficiency by channeling reaction intermediates between enzymes. The TCA cycle enzymes malate dehydrogenase (MDH1) and citrate synthase (CIT1) have been linked to metabolon formation, yet the conditions under which these enzymes interact, and whether such interactions are dynamic in response to metabolic cues, remain unclear, particularly in the native cellular context. This study uses a NanoBiT protein-protein interaction assay to map the dynamic behavior of the MDH1-CIT1 interaction in response to multiple metabolic stimuli and challenges in yeast. Beyond mapping these interactions in real time, the authors also performed GC-MS metabolomics to map whole-cell metabolite alterations across experimental conditions. Finally, the authors use microscale thermophoresis to determine components that alter the MDH1-CIT1 interaction in vitro. Collectively, the authors synthesize their collected data into a model in which the MDH1-CIT1 metabolon dissociates in conditions of low respiratory flux, and is stimulated during conditions of high respiratory flux. While their data largely support these models, some key exceptions are found that suggest this model is likely oversimplified and will require further work to understand the complexities associated with MDH1-CIT1 interaction dynamics. Nonetheless, the authors put forth an interesting and timely toolkit to begin to understand the interaction kinetics and dynamics of key metabolic enzymes that should serve as a platform to begin disentangling these important yet understudied aspects of metabolic regulation.

      Strengths:

      (1) The authors address an important question: how do metabolon-associated protein-protein interactions change across altered metabolic conditions?

      (2) The development and validation of the MDH1-CIT1 NanoBiT assay provides an important tool to allow the quantification of this protein-protein interaction in vivo. Importantly, the authors demonstrate that the assay allows kinetic and real-time assessment of these protein interactions, which reveal interesting and dynamic behavior across conditions.

      (3) The use of classic biochemical techniques to confirm that pH and various metabolites can alter the MDH1-CIT1 interaction in vitro is rigorous and supports the model put forth by the authors.

      Weaknesses:

      (1) Some of the data collected seem to be merely reported rather than synthesized and interpreted for the reader. This is particularly true for data that seem to reflect more complex trends, such as the GC-MS experiments that map metabolites across multiple experiments, or treatments that show somewhat counterintuitive results, such as the antimycin A treatment, which promotes rather than disrupts the MDH1-CIT1 interaction.

      (2) Some of the assertions put forth in the manuscript are not substantiated by the data presented, and the authors are at times overly reliant on previous findings from the literature to support their claims. This is particularly notable for claims about "TCA cycle flux"; the authors do not perform flux analysis anywhere in their study and should be cautious when insinuating correlations between their observations and "flux".

      (3) The manuscript presentation could be improved. For figures, at times, the axes do not have intuitive labels (for example, Figure 1A), data points and details about the number of samples analyzed are missing (bar graphs and box plots), and molecular weight markers are not reported on western blots. The authors refer to the figures out of order in the text, which makes the manuscript challenging to navigate as a reader.

    1. eLife Assessment

      This useful study analyzed 335 Mycobacterium tuberculosis Complex genomes and found that MTBC has a closed pangenome with few accessory genes. The research provides solid evidence for gene presence-absence patterns, which support the conclusions; however, the main criticism regarding the dominance of genome reduction remains.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 339 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a pangenome graph based on whole genomes in order to investigate structural variants in non-coding regions. The comparison of the two approaches is informative and shows that much is missed when focusing only on genes. The two main biological results of the study are that 1) the MTBC has a small pangenome with few accessory genes, and that 2) pangenome evolution is driven by genome reduction. The second result is still questionable because it relies on a method that disregards paralogs.

      Strengths:

      The authors put together the so-far largest data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, and covering a large geographic area. They sequenced and assembled genomes for strains of M. pinnipedii, L9, and La2, for which no high-quality assemblies were available previously. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes.

      Weaknesses:

      The main criticism regarding the dominance of genome reduction remains after two rounds of revisions. A method that systematically excludes paralogs is hardly suitable to draw conclusions about the relative importance of insertions/duplications and deletions in a clonal organism, where any insertion/duplication will result in a paralog. I understand that a re-analysis of the data might not be practical, and the authors have added a few sentences in the discussion that touch on this problem. However, the statements regarding the dominance of genome reduction remain too assertive given this basic flaw.

      Here is the more detailed argument from the previous review:

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage level as follows (l. 184): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 339 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a pangenome graph based on whole genomes in order to investigate structural variants in non-coding regions. The comparison of the two approaches is informative and shows that much is missed when focussing only on genes. The two main biological results of the study are that 1) the MTBC has a small pangenome with few accessory genes, and that 2) pangenome evolution is driven by genome reduction. In the revised article, the description of the data set and the methods is much improved, and the comparison of the two pangenome approaches is more consistent. I still think, however, that the discussion of genome reduction suffers from a basic flaw, namely the failure to distinguish clearly between orthologs and homologs/paralogs.

      Strengths:

      The authors put together the so-far largest data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, and covering a large geographic area. They sequenced and assembled genomes for strains of M. pinnipedii, L9, and La2, for which no high-quality assemblies were available previously. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes.

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a sanity check, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage level as follows (l. 184): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.
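      The distinction drawn above between homology-based and orthology-based presence, combined with the quoted 50% rule, can be illustrated with a toy example. This is a sketch under stated assumptions: the sub-lineage names, locus labels, and function names are hypothetical, and the code is not the authors' pipeline or part of Pangraph.

```python
def classify_region(n_present: int, n_total: int) -> str:
    """Apply the quoted rule: >50% presence means deletion, <50% means insertion/duplication."""
    if n_present == n_total:
        return "core"
    fraction = n_present / n_total
    if fraction > 0.5:
        return "deletion"
    if fraction < 0.5:
        return "insertion/duplication"
    return "unclassified"  # exactly 50% is not covered by the quoted rule


# Hypothetical data: loci at which each sub-lineage carries a copy of an IS element.
is_copies = {
    "sublineage_A": {"locus_1"},
    "sublineage_B": {"locus_1", "locus_2"},  # extra, sub-lineage-specific copy
    "sublineage_C": {"locus_1"},
}
n_total = len(is_copies)

# Homology-based presence: a copy anywhere in the genome counts.
present_anywhere = sum(1 for loci in is_copies.values() if loci)
homology_call = classify_region(present_anywhere, n_total)

# Orthology-based presence: only a copy at the same locus counts.
present_at_locus_2 = sum(1 for loci in is_copies.values() if "locus_2" in loci)
orthology_call = classify_region(present_at_locus_2, n_total)

print(f"homology-based call: {homology_call}")
print(f"orthology-based call at locus_2: {orthology_call}")
```

      Under homology-based presence the element looks core and the extra copy in sublineage_B goes unseen, whereas the locus-aware comparison reports it as an insertion/duplication.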

      Reviewer #2 (Public review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the size of the MTBC pangenome varied between previous reports. This study provides strong evidence that the MTBC pangenome is closed and that genome reduction is the main driver of this species' evolution.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions, as opposed to only coding regions, which were the focus of previous papers and analyses that investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole-genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. Lastly, there is ample statistical support, in the form of Heaps' law and genome fluidity calculations for each pangenome, to demonstrate that they are indeed closed.
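      The two statistics mentioned at the end of this paragraph can be sketched briefly. The code below assumes the commonly used formulations: genome fluidity as the mean, over genome pairs, of the genes unique to either genome divided by the sum of both gene counts, and a Heaps' law fit of the form n = κN^(-α) for the number of new genes found when the N-th genome is added, where α > 1 is usually read as a closed pangenome. The manuscript's exact implementation is not given in this text, and the gene sets and counts below are toy values.

```python
from itertools import combinations
from math import log


def genome_fluidity(genomes: dict) -> float:
    """Mean over genome pairs of (unique genes in either genome) / (sum of gene counts)."""
    pairs = list(combinations(genomes, 2))
    if not pairs:
        raise ValueError("at least two genomes are required")
    total = 0.0
    for a, b in pairs:
        unique_a = len(genomes[a] - genomes[b])
        unique_b = len(genomes[b] - genomes[a])
        total += (unique_a + unique_b) / (len(genomes[a]) + len(genomes[b]))
    return total / len(pairs)


def heaps_alpha(new_genes_per_genome: list) -> float:
    """Estimate alpha in n = kappa * N**(-alpha) by least squares on log-log values."""
    if len(new_genes_per_genome) < 2:
        raise ValueError("at least two data points are required")
    xs = [log(n) for n in range(1, len(new_genes_per_genome) + 1)]
    ys = [log(g) for g in new_genes_per_genome]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Toy gene sets and toy new-gene counts, for illustration only.
toy_genomes = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "d"},
    "g3": {"a", "b", "c"},
}
toy_new_genes = [100.0 * n ** -1.5 for n in range(1, 11)]

fluidity = genome_fluidity(toy_genomes)
alpha = heaps_alpha(toy_new_genes)
print(f"genome fluidity = {fluidity:.3f}, Heaps' law alpha = {alpha:.2f}")
```

      A fluidity near zero together with α above 1 points to a closed pangenome, which is the pattern the reviewer credits the authors with demonstrating.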

      Weaknesses:

      There are no major weaknesses in the revised version of this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      l. 27: "lineage-specific and -independent deletions": it is still not clear to me what a lineage-independent, or convergent, deletion is supposed to be. TBD1, for instance, is not lineage-specific, but it is also not convergent: it occurred once in the common ancestor of lineages 1, 2, and 3, while convergence implies multiple parallel occurrences.

      We have changed this and in other places to more evolutionary terms, such as divergent (single event) and convergent (multiple events), or explain exactly what is meant where needed.

      l. 118: "where relevant", what does that mean?

      This was superfluous to the description and so is now removed.

      l. 178ff.: It is not clear to me what issue is addressed by this correction of the pangenome graph. Also here there seems to be some confusion regarding orthologs and paralogs. A gene or IS copy can be present at one locus but absent at another, which is not a mistake of Pangraph that would require correction. It's rather the notion of "truly absent region" which is ambiguous.

      We have changed the text to be more specific on the utility of this step. Since it is known that Panaroo mislabels some genes as being absent due to over-splitting (see Ceres et al. 2022 and our reclassification earlier in the paper), we wanted to see if the same occurred in Pangraph. We have modified the methods text to be more specific (line 181) and in the results included the percentage of total genes/regions affected by this correction.

      In relation to copy number, Pangraph is not syntenic in its approach; if a region is present anywhere it is labelled as present in the genome. Pangraph will look for multiple copies of that region (e.g. an IS element) but indeed we did not look for specific syntenic changes across the genomes. This would be a great analysis and something we will consider in the future; we have indicated such in the discussion (line 454).

      l. 305: "mislabelled as absent": see above, is this really 'mislabelled'?

      See answer to question above

      l. 372: "using the approach": something missing here.

      This was superfluous to the description and so is now removed.

      l. 381: the "additional analysis of paralogous blocks" (l. 381) seems to suffer from the same confusion of ortho- and paralogy described above: no new sub-lineage-specific accessory regions are found presumably because the analysis did consider any copy rather than orthologous copies.

      Paralogous copies were looked for by Pangraph, and we did not find any sub-lineage where all members had additional copies compared to other sub-lineages. Indeed, single genomes could have these, and shorter timescales could see a lot of such insertions, but we looked at longer-scale (all genomes within a sub-lineage) patterns and did not find these. These limitations are already outlined in the discussion.

      l. 415: see above. There is no diagnosis of a problem that would motivate a "correction". That's different from the correction of the Panaroo results, where fragmented annotations have been shown to be a problem.

      Of interest, the refinement of regions re-labelled as core multiple regions that Pangraph had labelled as absent from some genomes, at about the same rate as the correction to Pangraph (2% of genes/regions). This indicates there is a stringency issue with Pangraph where blocks are mislabelled as absent. The underlying reason for this is not clear, but the correction is evidently required in this version of Pangraph.

      l. 430ff.: The issue of paralogy and that the "same" gene or region is defined in terms of homology rather than orthology should be addressed here. For me the given evidence does not support the claim that deletion is driving molecular evolution in the MTBC.

      As outlined above, paralogy may indeed be driving some elements of the overall evolutionary patterns; our analysis just did not find this. Panaroo without merged paralogs did not find paralogous genes as a main differentiating factor for any sub-lineage. Pangraph also did not find multiple copies of blocks present in all genomes in a sub-lineage. As outlined above, single genomes do show such patterns, but we did not include single-genome analyses here, and we outline that as a next step in the discussion. We have also linked to a recent pangenome paper that showed duplication is present in the pangenome of MTBC, although not related to any specific lineage (Discussion line 485).

      l. 443 ff: "lineage-independent deletions (convergent evolution)": see above, I still think this terminology is unclear

      This has now been made clearer to be specifically about convergent and divergent evolutionary patterns.

    1. eLife Assessment

      The authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in non-small cell lung cancer, proposing that resistance arises from signaling rewiring rather than additional mutations. While the study addresses a valuable clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation, meaning the strength of evidence is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models - including cell lines, PDXs, CDXs, and PDXOs - they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Rather than uncovering new resistance mechanisms, the study largely confirms known pathways. Several key conclusions are not supported by the data, and critical alternative explanations - such as additional mutations or increased KRAS expression - are not thoroughly investigated or ruled out. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance. The manuscript also has several errors, poor figure quality, and a lack of proper quantification. Additional experimental validation, data improvement, and text revisions are required.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on identifying the mechanisms involved in acquired resistance to Sotorasib in KRASG12C-mutant non-small cell lung cancer cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids, and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines, PDXs, and organoids), and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics, and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT, and MTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2 mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the current manuscript is very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Overall, the techniques and methodology seem to be performed in agreement with standard practice, and the results support most of the conclusions made by the authors. However, there are some points that, if addressed, would increase the value and relevance of the findings and further extend the impact of this work. Some of the recommendations for changes relate to the way things are explained and presented, which need some work. Other changes might require the performance of additional experiments or reanalysis of the existing data.

      Strengths:

      (1) One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs, and the fact that the authors have different clones for each, make this collection especially relevant, as they seem to show different mechanisms that the cells use to become resistant to Sotorasib. Although, logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones use additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      (2) Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      (3) The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      (4) The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      (5) The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      (1) The abstract is rather long and gives details that are not usually included in one. This makes it very complicated to identify the most relevant findings of the work. The use of acronyms PDX, PDXO, and CDX without defining them makes it complicated for the non-specialist to know what the models are. Rewriting and reorganisation of the abstract would benefit the manuscript.

      (2) Expression, presentation, and grammar should be reviewed in all sections of the manuscript.

      (3) In the different parts of the results section where the models shown in Figure 2 are described, the authors indicate "Whole-exome sequencing (WES) confirmed that XXX model retained the KRASG12C mutation with no additional KRAS mutations detected"; however, it is not indicated where these data are shown, and not all cases include an explanation of other possible modifications that might relate to mechanisms of resistance. This information should be included in the manuscript, and the WES data made publicly available.

      (4) The way the proteomics analysis of the TC303 and TC314 parental and resistant PDX is described in the text is confusing. The addition of an experimental layout figure would facilitate the understanding. As it is written, it is not obvious that the parental PDX were also analysed. For instance, the authors say, "The global and phosphoproteomic analyses identified over 8,000 and 4,000 gene protein products (GPPs), respectively". Is this comparing only resistant cells, or from the comparison of the parental and resistant pairs? And where are these numbers presented in the figures? Also, there is information that seems more adequate for the materials and methods sections, i.e., "Samples were analyzed using label-free nanoscale liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) on a Thermo Fusion Mass Spectrometer. The resulting data were processed and quantified using the Proteome Discoverer 2.5 interface with the Mascot search engine, referencing the NCBI RefSeq protein database (Saltzman, Ruprecht). Two-component analysis is better named principal component analysis."

      (5) While the presentation of the proteomics data could be done in different ways, the way the data is presented in Figure 3 does not allow the reader to get an idea of many of the findings from this experiment. Although it is indicated that a table with the data will be made available, this should be central to the way the data is presented and explained. A table (ie, Excel doc) where the raw data and all the analysis are presented should be included and referenced. Additionally, heat maps for the whole proteomes identified should be included. In the text, it is said, "Global proteomic heatmap analysis revealed unique protein profiles in TC303AR and TC314AR PDXs compared to their sensitive counterparts (Figure 3C)." However, this figure only shows the histogram of the differentially regulated cells. Inclusion of the histogram showing all the cells is necessary, and it might be informative to include the histogram comparing the two isogenic pairs, which could identify common mechanisms and differences between both sets. In Figure 3C, the protein names should be readable, or a reference to tables where the proteins are listed should be included.

      (6) In Figure 3, the pathway enrichment tool and GO used should be mentioned in the text. The tables with all significant results should also be provided. The proteomics data seems to convincingly identify mTOR as one of the pathways deregulated in resistant cells, but there is little explanation of what is considered a significant FDR value and if there are other pathways or networks that are also modified, which might not be common to both isogenic models. The MS-based phosphoproteome could help with the identification of differentially regulated pathways, but it is not really presented in the current manuscript. Most of the analysis of phospho-proteomics comes from the RPPA analysis, which is targeted proteomics. With the way the data is presented, the authors show evidence for a role of mTOR in the acquisition of resistance, but unfortunately, they do not discuss or allow the reader to explore if other pathways might also contribute to this change.

      (7) Where is the proteomics data going to be deposited, and will it be made public to comply with FAIR principles?

      (8) The authors claim that the resistance shown for H23AR and H358AR cells is due to reactivation of KRAS signalling. This is done by looking at phosphorylation of ERK as a surrogate, as they claim, "KRAS inhibition is commonly assessed by evaluating the inhibition of ERK phosphorylation (p-ERK)". While this might be true in many cases, the data presented does not demonstrate that the increase in p-ERK is due to reactivation of KRAS. To make this claim, the authors should measure activation of KRAS (and possibly H- and NRAS) using GST-pull down or an image-based method.

      (9) The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and H358 parental cells. Is the increase seen in resistant cells also present in parental cells, or is it exclusive to resistant cells (and therefore acquired)? Experiment 4B should include this control. What is clear is that there is an increase in the expression of AKT and PI3K.

      (10) The main point here is whether this is acquired resistance or the sensitivity to the drug is already there, and there was no need to do an omics experiment to find this. In some cases, it seems that the single treatment with PI3K inhibitors is as effective as Sotorasib treatment, promoting the death of the parental cells. This is in line with previous data in H23 and H358 that show sensitivity to PI3K inhibition (i.e., H358: 10.1016/j.jtcvs.2005.06.051; H23: 10.20892/j.issn.2095-3941.2018.0361). The data is clear, especially for copanlisib, but would it be the case that this treatment could be used for the treatment of NSCLC alone or directly in combination with Sotorasib and prevent resistance? The results shown in Figure 4C strongly support that a single treatment might be effective in cases that do not respond to Sotorasib. The data in figure 4D-F (please correct typo "inhibition" in labels) seem to support that PI3K treatment of parental cells is as effective as in the resistant cells.

      (11) The experiments presented in Figure 7 show synergy between Sotorasib and copanlisib treatment in some of the resistant cells. But in Figure 7G, the single treatment of H23AR is as effective as the combination. Did the authors check the effect of this drug on the parental cells? As they do not include this control, it is not possible to know if this is acquired sensitivity to PI3K inhibition or if the parental cells were already sensitive (as indicated by the Figure 4 results).

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models - including cell lines, PDXs, CDXs, and PDXOs - they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Rather than uncovering new resistance mechanisms, the study largely confirms known pathways. Several key conclusions are not supported by the data, and critical alternative explanations - such as additional mutations or increased KRAS expression - are not thoroughly investigated or ruled out. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance. The manuscript also has several errors, poor figure quality, and a lack of proper quantification. Additional experimental validation, data improvement, and text revisions are required.

      Acquired resistance to KRAS<sup>G12C</sup> inhibitors such as sotorasib or adagrasib remains a significant clinical challenge. Therefore, the identification of mechanisms of acquired resistance, along with the development of alternative therapeutic strategies, including combination therapies with KRAS inhibitors, represents an urgent unmet clinical need. The emergence of secondary KRAS mutations or new mutations in other oncogenic drivers has been observed as a primary cause of acquired resistance in a fraction of patients. No identifiable mutations were detected in more than half of the tumors from patients who developed acquired resistance after treatment with sotorasib or adagrasib.

      Using a discovery-based approach that integrated global proteomic and phosphoproteomic analyses in the TC303AR and TC314AR PDX models, we identified distinct protein signatures associated with KRAS reactivation, upregulation of mTORC1 signaling, and activation of the PI3K/AKT/mTOR pathway. These findings prompted further investigation into these mechanisms of resistance and evaluation of novel therapeutic combinations to overcome resistance. Notably, the combination of sotorasib with copanlisib (a PI3K inhibitor), or the combination of sotorasib with AZD8055 or sapanisertib (mTORC1/2 dual inhibitors) demonstrated strong potential for future clinical use. These regimens effectively restored sotorasib sensitivity in both in vitro and in vivo models and produced robust, synergistic antitumor effects across various acquired resistance models.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      Whole exome sequencing was performed on resistant cells or PDX models to confirm retention of the KRAS<sup>G12C</sup> mutation and to identify secondary KRAS mutations, none of which were found. We acknowledge that additional resistance mechanisms may be involved. These will be the focus of future investigations.

      The revised manuscript will feature improved figure quality, complete and clarified figure legends, and corrected textual errors to enhance overall clarity and presentation.  

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on the identification of the mechanisms involved in the acquired resistance to Sotorasib in non-small cell lung cancer KRASG12C-mutant cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids, and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines, PDXs, and organoids), and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics, and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT, and MTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2 mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the current manuscript is very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Overall, the techniques and methodology seem to be performed in agreement with standard practice, and the results support most of the conclusions made by the authors. However, there are some points that, if addressed, would increase the value and relevance of the findings and further extend the impact of this work. Some of the recommendations for changes relate to the way things are explained and presented, which need some work. Other changes might require the performance of additional experiments or reanalysis of the existing data.

      Strengths:

      (1) One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs, and the fact that the authors have different clones for each, made this collection especially relevant, as they seem to show different mechanisms that the cells used to become resistant to Sotorasib. Although logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones used additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      (2) Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      (3) The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      (4) The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      (5) The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      (1) The abstract is rather long and gives details that are not usually included in one. This makes it very complicated to identify the most relevant findings of the work. The use of acronyms PDX, PDXO, and CDX without defining them makes it complicated for the non-specialist to know what the models are. Rewriting and reorganisation of the abstract would benefit the manuscript.

      We will revise the abstract to ensure that the key findings and overall message are clearly communicated and easily understood by readers.

      (2) Expression, presentation, and grammar should be reviewed in all sections of the manuscript.

      This will be done in the revised version.

      (3) In the parts of the Results section that describe the models shown in Figure 2, the authors state that "Whole-exome sequencing (WES) confirmed that XXX model retained the KRASG12C mutation with no additional KRAS mutations detected"; however, it is not indicated where these data are shown, and not all cases include an explanation of other possible modifications that might relate to mechanisms of resistance. This information should be included in the manuscript, and the WES data made publicly available.

      WES was performed to verify retention of the KRAS<sup>G12C</sup> mutation in these AR models and to identify secondary KRAS mutations. WES data will be provided as supplements.

      (4) The way the proteomics analysis of the TC303 and TC314 parental and resistant PDX is described in the text is confusing. The addition of an experimental layout figure would facilitate the understanding. As it is written, it is not obvious that the parental PDX were also analysed. For instance, the authors say, "The global and phosphoproteomic analyses identified over 8,000 and 4,000 gene protein products (GPPs), respectively". Is this comparing only resistant cells, or from the comparison of the parental and resistant pairs? And where are these numbers presented in the figures? Also, there is information that seems more adequate for the materials and methods sections, i.e., "Samples were analyzed using label-free nanoscale liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) on a Thermo Fusion Mass Spectrometer. The resulting data were processed and quantified using the Proteome Discoverer 2.5 interface with the Mascot search engine, referencing the NCBI RefSeq protein database (Saltzman, Ruprecht). Two-component analysis is better named principal component analysis."

      The text will be revised accordingly.

      (5) While the presentation of the proteomics data could be done in different ways, the way the data is presented in Figure 3 does not allow the reader to get an idea of many of the findings from this experiment. Although it is indicated that a table with the data will be made available, this should be central to the way the data is presented and explained. A table (ie, Excel doc) where the raw data and all the analysis are presented should be included and referenced. Additionally, heat maps for the whole proteomes identified should be included. In the text, it is said, "Global proteomic heatmap analysis revealed unique protein profiles in TC303AR and TC314AR PDXs compared to their sensitive counterparts (Figure 3C)." However, this figure only shows the histogram of the differentially regulated cells. Inclusion of the histogram showing all the cells is necessary, and it might be informative to include the histogram comparing the two isogenic pairs, which could identify common mechanisms and differences between both sets. In Figure 3C, the protein names should be readable, or a reference to tables where the proteins are listed should be included.

      The raw data associated with the proteomics and global proteomics will be added as supplements.

      (6) In Figure 3, the pathway enrichment tool and GO used should be mentioned in the text. The tables with all significant results should also be provided. The proteomics data seems to convincingly identify mTOR as one of the pathways deregulated in resistant cells, but there is little explanation of what is considered a significant FDR value and if there are other pathways or networks that are also modified, which might not be common to both isogenic models. The MS-based phosphoproteome could help with the identification of differentially regulated pathways, but it is not really presented in the current manuscript. Most of the analysis of phospho-proteomics comes from the RPPA analysis, which is targeted proteomics. With the way the data is presented, the authors show evidence for a role of mTOR in the acquisition of resistance, but unfortunately, they do not discuss or allow the reader to explore if other pathways might also contribute to this change.

      The authors agree that other pathways may be involved, and this will be the subject of future studies. The raw data will be added as supplements.

      (7) Where is the proteomics data going to be deposited, and will it be made public to comply with FAIR principles?

      The proteomics data will be uploaded according to the journal guidelines.

      (8) The authors claim that the resistance shown for H23AR and H358AR cells is due to reactivation of KRAS signalling. This is done by looking at phosphorylation of ERK as a surrogate, as they claim, "KRAS inhibition is commonly assessed by evaluating the inhibition of ERK phosphorylation (p-ERK)". While this might be true in many cases, the data presented does not demonstrate that the increase in p-ERK is due to reactivation of KRAS. To make this claim, the authors should measure activation of KRAS (and possibly H- and NRAS) using GST-pull down or an image-based method.

      We agree that KRAS activation can be assessed through various methods. In this manuscript, which primarily focuses on mechanisms of resistance, pathway analysis revealed upregulation of KRAS signaling. This finding correlated with the incomplete inhibition of p-ERK by sotorasib in resistant cells. Notably, p-ERK status is widely recognized and routinely used as a surrogate marker for KRAS pathway activation.

      (9) The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and H358 parental cells. Is the increase seen in resistant cells also present in parental cells, or is it exclusive to resistant cells (and therefore acquired)? Experiment 4B should include this control. What is clear is that there is an increase in the expression of AKT and PI3K.

      H23 and H358 cells are highly sensitive to sotorasib, as demonstrated by the cell viability assays presented in Figure 2. As shown in Figure 3—figure supplement 3, sotorasib treatment led to complete inhibition of p-ERK in these parental cell lines. In contrast, p-ERK inhibition was incomplete in the resistant H23AR and H358AR cells. Moreover, these AR cells were continuously cultured under sotorasib pressure to maintain resistance.

      (10) The main point here is whether this is acquired resistance or the sensitivity to the drug is already there, and there was no need to do an omics experiment to find this. In some cases, it seems that the single treatment with PI3K inhibitors is as effective as Sotorasib treatment, promoting the death of the parental cells. This is in line with previous data in H23 and H358 that show sensitivity to PI3K inhibition (i.e., H358: 10.1016/j.jtcvs.2005.06.051; H23: 10.20892/j.issn.2095-3941.2018.0361). The data is clear, especially for copanlisib, but would it be the case that this treatment could be used for the treatment of NSCLC alone or directly in combination with Sotorasib and prevent resistance? The results shown in Figure 4C strongly support that a single treatment might be effective in cases that do not respond to Sotorasib. The data in figure 4D-F (please correct typo "inhibition" in labels) seem to support that PI3K treatment of parental cells is as effective as in the resistant cells.

      We agree. Based on our in vitro (Figure 4) and in vivo (Figure 7) data, copanlisib was able to overcome sotorasib resistance, demonstrating either synergistic or additive effects depending on the specific model. These findings support the potential of combining PI3K inhibition with KRAS<sup>G12C</sup> inhibition as a promising strategy to address acquired resistance.

      (11) The experiments presented in Figure 7 show synergy between Sotorasib and copanlisib treatment in some of the resistant cells. But in Figure 7G, the single treatment of H23AR is as effective as the combination. Did the authors check the effect of this drug on the parental cells? As they do not include this control, it is not possible to know if this is acquired sensitivity to PI3K inhibition or if the parental cells were already sensitive (as indicated by the Figure 4 results).

      Both H23 and H23AR cells showed high sensitivity to copanlisib, as shown in Figure 4. Combination index analysis for the copanlisib + sotorasib treatment (Figure 7A) revealed synergistic effects on cell viability at specific concentrations. However, in the in vivo experiment (Figure 7G), we did not observe a clear synergistic effect of the combination treatment against H23AR xenografts. This may be attributed to the dose of copanlisib used, which was potentially sufficient on its own to produce a strong antitumor response, thereby masking any additional benefit from the combination.
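      For readers unfamiliar with combination index (CI) analysis, a minimal sketch follows. The response does not state which CI method was used; the sketch assumes the widely used Chou-Talalay formulation, in which each drug's dose-effect curve follows the median-effect equation and CI < 1 is read as synergy, CI = 1 as additivity, and CI > 1 as antagonism. All function names and parameter values are hypothetical and are not taken from the manuscript.

```python
def dose_for_effect(fa, dm, m):
    """Dose of a single drug needed to reach fraction affected `fa`.

    Median-effect equation: fa / (1 - fa) = (D / Dm) ** m,
    where Dm is the median-effect dose (e.g. IC50) and m the slope.
    """
    if not 0.0 < fa < 1.0:
        raise ValueError("fa must lie strictly between 0 and 1")
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(d1, d2, fa, dm1, m1, dm2, m2):
    """Chou-Talalay combination index (assumed method, see lead-in).

    d1, d2: doses of drug 1 and drug 2 that, given together,
            produce fraction affected `fa`.
    dm1, m1, dm2, m2: median-effect parameters fitted to each
            drug's single-agent dose-response curve.
    """
    return (d1 / dose_for_effect(fa, dm1, m1)
            + d2 / dose_for_effect(fa, dm2, m2))


# Hypothetical numbers: the combination reaches 50% effect using a
# quarter of each drug's single-agent IC50, so CI is below 1.
ci = combination_index(d1=0.25, d2=0.5, fa=0.5,
                       dm1=1.0, m1=1.0, dm2=2.0, m2=1.0)
```

      Under this formulation, a single agent that is already near-maximally effective at the dose used leaves little room for the combination to lower the CI, which is consistent with the explanation offered above for Figure 7G.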

    1. eLife Assessment

      This important work substantially advances our understanding of how accessory olfactory bulb neurons respond to social odor cues across the estrous cycle, showing that responses vary with the strain and sex of the odor source but display no consistent differences between estrous and non-estrous states. It employs a unique electrophysiology preparation that activates the vomeronasal organ pump via electric stimulation, enabling precise recordings of accessory olfactory bulb cell responses to different chemosignals in anesthetized mice. Overall, the study presents convincing findings on the stability and variability of accessory olfactory bulb response patterns, indicating that while accessory olfactory bulb detects social signals, it does not appear to interpret them based on reproductive state. This work will be of interest to those studying olfaction, social behavior, reproductive cycles, and systems neuroscience more broadly.

    2. Reviewer #1 (Public review):

      Summary:

      In this detailed study, Cohen and Ben-Shaul characterized the AOB cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses vary with strains and sexes of the samples. Between estrous and non-estrous females, no clear or consistent difference in responses was found. The cell response patterns, as measured by the distance between pairs of stimuli, are largely stable. When some changes do occur, they are not consistent across strains or male status. The authors concluded that AOB detects the signals without interpreting them. Overall, this study will provide useful information for scientists in the field of olfaction.

      Strengths:

      The study uses electrophysiological recording to characterize the responses of AOB cells to various urines in female mice. AOB recording is not trivial as it requires activation of VNO pump. The team uses a unique preparation to activate the VNO pump with electric stimulation, allowing them to record AOB cell responses to urines in anesthetized animals. The study comprehensively described the AOB cell responses to social stimuli and how the responses vary (or not) with features of the urine source and the reproductive state of the recording females. The dataset could be a valuable resource for scientists in the field of olfaction.

      Weaknesses:

      The study will be significantly strengthened by understanding the "distance" of chemical composition in different urine. This could be an important future direction.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this detailed study, Cohen and Ben-Shaul characterized the AOB cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses vary with the strains and sexes of the samples. Between estrous and non-estrous females, no clear or consistent difference in responses was found. The cell response patterns, as measured by the distance between pairs of stimuli, are largely stable. When some changes do occur, they are not consistent across strains or male status. The authors concluded that AOB detects the signals without interpreting them. Overall, this study will provide useful information for scientists in the field of olfaction.

      Strengths:

      The study uses electrophysiological recording to characterize the responses of AOB cells to various urines in female mice. AOB recording is not trivial as it requires activation of VNO pump. The team uses a unique preparation to activate the VNO pump with electric stimulation, allowing them to record AOB cell responses to urines in anesthetized animals. The study comprehensively described the AOB cell responses to social stimuli and how the responses vary (or not) with features of the urine source and the reproductive state of the recording females. The dataset could be a valuable resource for scientists in the field of olfaction.

      Weaknesses:

      (1) The figures could be better labeled.

      We revised all figures (except the model figure, Fig. 8), and among other improvements (many of which were suggested by the reviewers in other comments), added more labelling and annotation within the figures.

      (2) For Figure 2E, please plot the error bar. Are there any statistics performed to compare the mean responses?

      We added error bars (standard errors of the mean). We had not originally performed statistical comparisons between the stimuli, but now we have. The analysis of response strength now appears in a new table (Table 1).

      (3) For Figure 2D, it will be more informative to plot the percentage of responsive units.

      Done.

      (4) Could the similarity in response be explained by the similarity in urine composition? The study will be significantly strengthened by understanding the "distance" of chemical composition in different urine.

      We agree. As we wrote in the Discussion: “Ultimately, lacking knowledge of the chemical space associated with each of the stimuli, this and all the other ideas developed here remain speculative.” We note however, that chemical distance (which in itself is hard to define) will provide only part of the picture. The other part is the “projection” of chemical space on the receptor array. This is an idea that we develop in the Discussion and in Figure 8. Specifically, that it is the combination of stimulus composition, and receptor tuning properties that will determine stimulus distances in neuronal space.

      That said, a better understanding of the chemical distance is an important aspect that we are working to include in our future studies. For this dataset unfortunately, we have no such data.

      (5) If it is not possible for the authors to obtain these data first-hand, published data on MUPs and chemicals found in these urines may provide some clues.

      This comment is directly related to the previous one. Measurements about some classes of molecules may be found for some of the stimuli that we used here, but not for all. We are not aware of any single dataset that contains this information for any type of molecule across the entire stimulus set that we have used and pooling results from different studies has limited validity because of the biological and technical variability across studies. In order to reliably interpret our current recordings, it would be necessary to measure the urinary content of the very same samples that were used for stimulation. Unfortunately, we are not able to conduct this analysis at this stage.

      (6) It is not very clear to me whether the female overrepresentation is because there are truly more AOB cells that respond to females than males or because there are only two female samples but 9 male samples.

      The definitive answer to this comment is given in our response to the next one.

      Nevertheless, we agree that this is an important point. It is true that the number of neurons fulfilling each of the patterns depends on the number of individual stimuli that define it (and on the frequency of neurons that respond to those stimuli). However, our measure of “over representation” was designed to overcome this bias, by using bootstrapping to reveal if the observed number of patterns is larger than expected by chance. The higher frequency of responses to female, as compared to male stimuli, is observed in other studies by others and by us, also when the number of male and female stimuli is matched (e.g., Bansal et al BMC Biol 2021, Ben-Shaul et al, PNAS 2010, Hendrickson et al, JNS, 2008). However, here, by overrepresentation, we do not refer to the higher frequency of female-responding neurons, but rather that given the number of responding neurons, the female pattern is more common than expected by chance.

      (7) If the authors only select two male samples, let's say ICR Naïve and ICR DOM, combine them with responses to two female samples, and do the same analysis as in Figure 3, will the female response still be overrepresented?

      Following this suggestion, we have performed this analysis, and we were glad to see that the result is the one we had anticipated. Below, we provide an image of the results, following the same approach that we applied before, and showed in Figure 3C. Here, we defined a female pattern (using the two female samples) and compared it to a male pattern (using the ICR naïve and ICR DOM as suggested). It is as if we had only four stimuli in our set. As in the article, we calculated the expected distribution with 100,000 shuffles. We denoted this pattern as F/M ICR. The results are shown below.

      Under the present conditions, the distribution of the number of female selective patterns is larger (i.e., shifted to the right; compare to the female category in Figure 3C). This is expected, since now the criterion is more permissive. Specifically, now to qualify as a “female pattern”, the two responses to female urine must be stronger only than the responses to the two male stimuli included in this analysis (and to all other responses). Notably, although the null distribution shifted to the right, the actual number of neurons fulfilling this pattern is also larger, so that the actual number remains significantly larger than expected by chance. This is also true for the reverse category (as is the case in the ~female category in Figure 3C). Thus, we conclude that overrepresentation of the female pattern is not a trivial consequence of the number of male and female stimuli.

      Author response image 1.
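
      The shuffle-based comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used in the study: the toy response values, the helper names (`fulfils_pattern`, `count_pattern`, `shuffle_test`), and the choice to permute each neuron's responses across the four stimuli are all assumptions made for the example, and 2,000 shuffles are used instead of 100,000 to keep it fast.

```python
import random


def fulfils_pattern(responses, preferred, others):
    """True if every preferred response exceeds every response in `others`."""
    return min(responses[i] for i in preferred) > max(responses[j] for j in others)


def count_pattern(population, preferred, others):
    """Number of neurons in the population that fulfil the pattern."""
    return sum(fulfils_pattern(r, preferred, others) for r in population)


def shuffle_test(population, preferred, others, n_shuffles, rng):
    """Compare the observed pattern count with a shuffled null distribution.

    Assumption: the null is built by permuting each neuron's responses
    across stimuli, which breaks any link between stimulus and response.
    """
    observed = count_pattern(population, preferred, others)
    null_counts = []
    for _ in range(n_shuffles):
        shuffled = [rng.sample(r, len(r)) for r in population]
        null_counts.append(count_pattern(shuffled, preferred, others))
    # One-sided p-value with the usual +1 correction for permutation tests.
    n_at_least = sum(c >= observed for c in null_counts)
    p_value = (n_at_least + 1) / (n_shuffles + 1)
    return observed, null_counts, p_value


rng = random.Random(1)
FEMALE, MALE = (0, 1), (2, 3)  # hypothetical column order: 2 female, 2 male stimuli

# Toy population: 30 neurons built to prefer female stimuli, 30 with no preference.
biased = [[rng.uniform(5, 10), rng.uniform(5, 10),
           rng.uniform(0, 4), rng.uniform(0, 4)] for _ in range(30)]
unbiased = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(30)]
population = biased + unbiased

observed, null_counts, p_value = shuffle_test(population, FEMALE, MALE, 2000, rng)
print("observed:", observed)
print("mean of null:", sum(null_counts) / len(null_counts))
print("p-value:", p_value)
```

      With four stimuli and no ties, a random permutation places two given stimuli above the other two with probability 1/6, so the null counts in this toy example centre near 10 of 60 neurons, while the constructed population yields at least 30.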

      (8) In Figure 4B and 4C, the pairwise distance during non-estrus is generally higher than that during estrus, although they are highly correlated. Does it mean that the cells respond to different urines more distinctively during diestrus than in estrus?

      This is an important observation (!) that we had originally overlooked. It is true that larger distances (as observed in non-estrus) imply more distinct population-level responses and hence better discrimination among stimuli. However, this is inconsistent with all our other analyses, which do not point to enhanced selectivity or discrimination in either state. If anything, we find somewhat higher sparseness in estrus. Yet, there may be technical explanations for the differences.

      For Euclidean distances, the explanation may be trivial. The distance depends on the number of dimensions (i.e., units), and since our sample contains more neurons recorded during non-estrus, the larger distance is expected.

      In fact, there is a similar dependence on sample size for the correlation distance. Smaller samples are associated with higher (spurious) correlations, and hence larger samples are associated with larger distances. To demonstrate this, we conducted a simple simulation, where we calculated the absolute correlation coefficients of random samples from standard normal distributions (using the MATLAB function randn), changing the size of the population. For each sample size, we conducted 1000 tests. We considered sample sizes from 10 to 100000, including 200 and 300 (which are similar to our sample sizes). The results are shown in the figure below. Note that the absolute value of the correlation coefficient decreases with sample size, while the p-value for the observed correlation is stable at ~0.5.

      While this is not a rigorous analysis of this issue, and while it does not exactly reflect the scenario in our data, where correlations are generally positive, it shows that the observed correlation (and hence correlation distance) is also affected by sample size.

      For these reasons, we focus on comparison of these distances, rather than the absolute values of the correlation distances.

      Author response image 2.
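
      A simulation of this kind can be reproduced with a short script. The sketch below is written in Python rather than MATLAB and is not the original code: `random.gauss` stands in for `randn`, the function names are hypothetical, only a few sample sizes are included, and 200 tests per sample size are run instead of 1000 to keep it quick.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_correlation(n, n_tests, rng):
    """Mean |r| between pairs of independent standard-normal samples of size n."""
    total = 0.0
    for _ in range(n_tests):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += abs(pearson_r(x, y))
    return total / n_tests


rng = random.Random(0)
sample_sizes = [10, 200, 300, 1000]  # 200 and 300 are close to the recorded sample sizes
mean_abs_r = {n: mean_abs_correlation(n, 200, rng) for n in sample_sizes}

for n in sample_sizes:
    # Correlation distance is 1 - r, so a smaller |r| means distances closer to 1.
    print(n, round(mean_abs_r[n], 3))
```

      Because the two samples are independent, any non-zero correlation is spurious, and its mean absolute value shrinks roughly in proportion to one over the square root of the sample size.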

      Following this comment, we now write in the manuscript:

      “We first note that distances are generally larger during non-estrus, suggesting enhanced discrimination during this stage. However, further analyses of sparseness and selectivity do not support this idea (see below). Furthermore, we note that both Euclidean and correlation distances generally depend on sample size. In both cases, distances are expected to increase as a function of sample size, which in our dataset, is larger for the non-estrus (n = 305) as compared to the estrus (n = 241) neurons. Because of this factor, we focus here on the similarity of the relative within-state distances across the states (and not on their absolute magnitudes). Specifically, we find a positive and significant correlation among pairwise population distances under the two states. Thus, at the population level, representational space remains broadly stable across the estrus cycle. Nevertheless, several points in Fig. 4D, E clearly diverge from a linear relationship, implying that representational space differs under the two states. We next examine such state-dependent changes in more detail.”

      (9) The correlation analysis is not entirely intuitive when just looking at the figures. Some sample heatmaps showing the response differences between estrous states will be helpful.

      If we understand correctly, the idea is to show the correlation matrices from which the values in 4B and 4C are taken. The relevant images are now included in Figure 4B, C and are referenced within the main text.

      Reviewer #2 (Public review):

      Summary:

      Many aspects of the study are carefully done, and in the grand scheme this is a solid contribution. I have no "big-picture" concerns about the approach or methodology. However, in numerous places the manuscript is unnecessarily vague, ambiguous, or confusing. Tightening up the presentation will magnify their impact.

      We have reviewed the text and made substantial editing changes. Along with our responses to other specific comments made by both reviewers, we hope that these changes improve the presentation.

      Strengths:

      (1) The study includes urine donors from males of three strains each with three social states, as well as females in two states. This diversity significantly enhances their ability to interpret their results.

      (2) Several distinct analyses are used to explore the question of whether AOB MCs are biased towards specific states or different between estrus and non-estrus females. The results of these different analyses are self-reinforcing about the main conclusions of the study.

      (3) The presentation maintains a neutral perspective throughout while touching on topics of widespread interest.

      Weaknesses:

      (1) Introduction:

      The discussion of the role of the VNS and preferences for different male stimuli should perhaps include Wysocki and Lepri 1991

      We assume that the reviewer is referring to “Consequences of removing the vomeronasal organ” by Wysocki CJ, Lepri JJ, a review article in J Steroid Biochem from 1991. We were not familiar with this specific article and have now read it. The article discusses various male behaviors, and some effects on female behavior and physiology (e.g., puberty acceleration, maternal behaviors, ovulation), but we could not find any mention of the preference of female mice in this article. We also expanded our search to all PubMed articles authored by Wysocki and Lepri and then all articles by Wysocki (with the keyword Vomeronasal). Despite our best intentions to give due credit, we found nothing that seems directly related to this statement. Please correct us if we have missed anything.

      (2) Results:

      a) Given the 20s gap between them, the distinction between sample application and sympathetic nerve trunk stimulation needs to be made crystal clear; in many places, "stimulus application" is used in places where this reviewer suspects they actually mean sympathetic nerve trunk stimulation.

      We realize that this is confusing, and we also agree that in at least one place, we have not been sufficiently clear about the distinction. To clarify, we distinguish between stimulus application (physical application of the stimulus to the nostril) and stimulation (which refers to SNT stimulation, which typically induces VNO suction). The general term stimulus presentation refers to the entire process. As explained in the text, in our analysis, we consider the entire window starting at application and ending 40s after stimulation. This is because we sometimes observe immediate responses following application. One such response is seen in Figure 2D, and this is directly related to a detailed comment made below (on Figure 1D, part c). Indeed, for this figure time 0 indicates stimulus application. This was indicated previously, but we have now rearranged the order of the panels to make the distinction between this response and the others clearer. We have also revised the figure caption and the text to clarify this issue.

      b) There appears to be a mismatch between the discussion of Figure 3 and its contents. Specifically, there is an example of an "adjusted" pattern in 3A, not 3B.

      True. We have revised the text to correctly refer to the figure. Thanks.

      c) The discussion of patterns neglects to mention whether it's possible for a neuron to belong to more than one pattern. For example, it would seem possible for a neuron to simultaneously fit the "ICR pattern" and the "dominant adjusted pattern" if, e.g., all ICR responses are stronger than all others, but if simultaneously within each strain the dominant male causes the largest response.

      This is true. In the legend to Figure 3B, we actually wrote: “A neuron may fulfill more than one pattern and thus may appear in more than one row.”, but we now also write in the main text:

      “We note that criteria for adjusted patterns are less stringent than for the standard patterns defined above. Furthermore, some patterns are not mutually exclusive, and thus, a neuron may fulfil more than a single pattern.”

      (3) Discussion:

      a) The discussion of chemical specificity in urine focuses on volatiles and MUPs (citation #47), but many important molecules for the VNS are small, nonvolatile ligands. For such molecules, the corresponding study is Fu et al 2015.

      Agreed. We now cite this work and several others that were not included before in the context of chemical and electrophysiological analyses.

      b) "Following our line of reasoning, this scarcity may represent an optimal allocation of resources to separate dominant from naïve males": 1 unit out of 215 is roughly consistent with a single receptor. Surely little would be lost if there could be more computational capacity devoted to this important axis than that? It seems more likely that dominance is computed from multiple neuronal types with mixed encoding.

      We fully agree, and we are not claiming that dominance, nor any other feature, is derived using dedicated feature selective neurons. Our discussion of resource allocation is inevitably speculative. Our main point in this context is that a lack of overrepresentation does not imply that a feature is not important. As a note, we do not think that there is good reason to suppose that AOB neurons reflect the activity of single receptors.

      To prevent this potential confusion, we have now added the following sentences in the Discussion subsection titled “Response patterns of AOB-MCs”:

      “We stress that we do not suggest that features such as physiological state are encoded by the activity of single neurons. In fact, we believe that most ethologically relevant features are encoded by the activity of multiple neurons. Nevertheless, such population level representations ultimately depend on the response properties of individual neurons, and we thus ask: what can we learn from our analysis of response pattern frequency?”

      (4) Methods:

      a) Male status, "were unambiguous in most cases": is it possible to put numerical estimates on this? 55% and 99% are both "most," yet they differ substantially in interpretive uncertainty.

      Upon reexamination, we realized that this sentence is incorrect. Ambiguous cases were not considered as dominant for urine collection. We only classified mice as dominant if they “won” the tube test and exhibited dominant behavior in the subsequent observation period in the cage. The phrasing has now been corrected in the manuscript (Methods section).

      b) Surgical procedures and electrode positioning: important details of probes are missing (electrode recording area, spacing, etc).

      This information has been added to the Methods subsection “Surgical procedures and electrode positioning”

      c) Stimulus presentation procedure: Are stimuli manually pipetted or delivered by apparatus with precise timing?

      They are delivered manually. This has now been clarified in the text.

      d) Data analysis, "we applied more permissive criteria involving response magnitude": it's not clear whether this is what's spelled out in the next paragraph, or whether that's left unspecified. In either case, the next paragraph appears to be about establishing a noise floor on pattern membership, not a "permissive criterion."

      True, the next paragraph is not the explanation for the more permissive criteria. The more permissive criteria involving response magnitude are actually those described in Figure 3A and 3B. The sentence that was quoted above merely states that before applying those criteria, we had also searched for patterns defined by binary designation of neurons as responsive, or not responsive, to each of the stimuli (this is directly related to the next comment below). Using those binary definitions, we obtained a very small number of neurons for each pattern and thus decided to apply the approach actually used and described in the manuscript.

      To resolve this confusion, we thoroughly revised the description in this paragraph, and the beginning of the next one, in the Methods section.

      e) Data analysis, method for assessing significance: there's a lot to like about the use of pooling to estimate the baseline and the use of an ANOVA-like test to assess unit responsiveness.

      But:

      i) for a specific stimulus, at 4 trials (the minimum specified in "Stimulus presentation procedure") kruskalwallis is questionable. They state that most trials use 5, however, and that should be okay.

      The exact values are now given in the text. The mean number of repeated presentations per stimulus was 5.1 ± 0.9 (mean ± sd). In 72% of the cases, stimuli were given 5 or more times. Otherwise, they were presented 4 times. In the context of the statistical test, we note that we are not comparing 5 (or 4) values with another set of 5 (or 4) values, but with a much larger sample (~44-55 baseline trials, given 11 stimuli and 4-5 repeats of each). Under this scenario, we think that the statistical approach is sound. However, the more important consideration, in our opinion, is given below.

      ii) the methods statement suggests they are running kruskalwallis individually for each neuron/stimulus, rather than once per neuron across all stimuli. With 11 stimuli, there is a substantial chance of a false-positive if they used p < 0.05 to assess significance. (The actual threshold was unstated.) Were there any multiple comparison corrections performed? Or did they run kruskalwallis on the neuron, and then if significant assess individual stimuli? (Which is a form of multiple-comparisons correction.)

      First, we indeed failed to mention that our criterion was 0.05. This has been corrected by adding the information to the Results and Methods sections. No, we did not apply any multiple comparison corrections. We consider each neuron-stimulus pair as an independent entity, and we are aware that this leads to a higher false positive rate. On the other hand, applying multiple comparison corrections would be problematic, as the number of stimuli used in different studies varies. Such corrections would thus lead to different response criteria across studies. This raises the almost philosophical question regarding the use of multiple comparisons (as well as one- and two-tailed tests), but practically, most, if not all, of our conclusions involve comparisons across conditions. For this purpose, we think that our procedure is valid. More generally, while selection of responses according to significance has some obvious advantages, the decision to use any particular criterion is entirely arbitrary. Therefore, we do not attach any special meaning to the significance threshold used here. Rather, we think of it as a simple criterion that allows us to exclude weakly responding or non-responsive neurons, and to compare frequencies of neurons that fulfill this criterion, under different conditions and contexts.
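
      To put rough numbers on the false-positive rate discussed here, the following Python sketch works through the arithmetic for 11 tests at a threshold of 0.05. It assumes that the 11 tests for a given neuron are statistically independent, which need not hold for real recordings, and the Bonferroni threshold is shown only for reference, since no such correction was applied to the response criterion.

```python
alpha = 0.05      # per-test significance threshold stated in the response
n_stimuli = 11    # number of stimuli, i.e. tests per neuron

# Chance that a completely non-responsive neuron passes at least one of the
# 11 tests, assuming the tests are independent (an assumption of this sketch).
familywise_rate = 1 - (1 - alpha) ** n_stimuli

# Expected number of false-positive neuron-stimulus pairs per such neuron.
expected_false_positives = alpha * n_stimuli

# Per-test threshold that a Bonferroni correction would impose (reference only).
bonferroni_alpha = alpha / n_stimuli

print(f"P(at least one false positive) = {familywise_rate:.3f}")
print(f"Expected false positives per neuron = {expected_false_positives:.2f}")
print(f"Bonferroni per-test threshold = {bonferroni_alpha:.5f}")
```

      Under these assumptions, roughly 43% of entirely non-responsive neurons would show at least one nominally significant response. Because the corrected threshold depends on the number of stimuli, it would also differ between studies with different stimulus sets.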

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Results:

      "are represented more than represented by chance" seems to have a misplaced word

      True. Thanks. Corrected.

      Figure 1D:

      a) Indicate the meaning of the number that appears in the top left for each unit (10, 5, 40, 5, 5) (I'm guessing it's the vertical scale for the PSTH, but best to spell it out explicitly.)

      This information has been added.

      b) "The red vertical line indicates stimulus application": is it the application of the chemical stimulus or SNT shock?

      Please see our answer to c

      c) "For unit 2, time 0 indicate stimulus application, as in this case, responses began after stimulus application, prior to stimulation." First, the meaning of time 0 for the other units is not clearly specified (we infer that unit 2 is an exception, but we don't know what most of them mean). Second, it seems as if the response (?) to ICR naive begins even before stimulus application.

      This issue was also mentioned above as the 2nd weakness raised by this reviewer. To explain the meaning of the red lines, and resolve this confusion, we revised the figure caption text to indicate that for all units (except the former unit 2) time 0 indicates SNT stimulation. We also changed the order of the unit examples, placing the former unit 2 in the rightmost position. It is true that for this unit, there is a firing rate change prior to stimulus application, which actually appears as rate attenuation following stimulus application. In this specific case, we consider this activity as “noise”, and note that this neuron-stimulus combination would not be classified as a response (since there is no consistent change across stimulus presentation).

      As a note, while reviewing this figure, we noted an error. We have previously written that the ITI was 10 s, whereas it was actually 18 s long. This has been corrected in the Figure and in the text.

      Figure 2B:

      "The mean error due to the reduced 2-D representation is 0.29 (arbitrary units)." This is unclear. MDS is often described in terms of % of variance explained, is that what this means? If so, the units are not arbitrary; otherwise, it's unclear whether specifying a value with arbitrary units adds any value.

      This is a very good point, and we thank the reviewer for identifying this mistake. The units are not arbitrary! They are units of correlation distance. We have now added a scale bar (a square) to panel 2B to indicate a distance of 0.1. Following this comment, we also calculated the mean error in the original data, and noted the ratio between the mean absolute error (due to considering only two dimensions) and the mean original distances. We also now report the value of the first two eigenvalues. Specifically, we now write:

      “Note that like all dimensionally reduced representations, the representation in Fig. 2B is an approximation. Here, the first two eigenvalues of account for 44.6% of the variance of the original distances (30.4% and 14.2%, respectively for the first and second dimension). Another way to evaluate the representation is via the mean error due to the reduced 2-D representation. Here, it is 0.29, whereas the mean of the original distances is 0.73.”

      Figure 3A:

      a) There is a truncated label (or something) above the panel letter.

      Thanks. Corrected. This was part of the “Figure” label

      b) The graphic for the "adjusted pattern" also fits the criterion of the "pattern": for example, in the top row the activity for ICR is still higher than for any other stimulus, thus fulfilling the criterion of a "pattern" and not just an "adjusted pattern."

      That was not our intention. An adjusted pattern does not necessarily fulfill the (non-adjusted) “pattern” (while the opposite is true). We have now revised the rightmost panel in Figure 3A, adding both “&s” to indicate that all three conditions must be fulfilled, and, in an attempt at a more intuitive representation, applied a different background denoting stimuli with irrelevant responses. We also changed the terms in the legend within the panel, making them more accurate (thus, “strong activity” was changed to “stronger responses”). In addition, we revised the text and figure legends in an attempt to better clarify these definitions.

      Figure 3B:

      I'm assuming that the columns of the heatmap correspond to different urine stimuli, and that the color is normalized firing rate. But readers should not have to guess.

      True, and agreed. We added legends to clarify this.

      Figure 4B:

      The caption should mention that the pairwise measures are between the stimulus columns of panel A.

      We revised the caption to indicate this. Note that we also added two additional panels to this figure.

      Figure 5A&B:

      Instead of a multiple-comparisons correction, it seems likely to be better to use a 2-way ANOVA. At a minimum, the nature of the multiple-comparisons correction needs to be specified (many are conservative, but they differ in the extent of how conservative they are).

      We now write in the text that we used a Bonferroni correction (this information previously appeared only in the caption). We also found an error in the caption. We previously wrote that we used a binomial exact test for both panels A and B. However, only the data in panel A was calculated with a binomial exact test. The data in panel B was calculated with a one-way ANOVA.

      We now also applied a 2-way ANOVA to response magnitudes (i.e., panel B). We find a main effect of stimulus, but not of state, and no effect of interaction between the two. This is consistent with our previous analyses. This analysis is now included in the text. We thank the reviewer for this suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. eLife Assessment

      The study presents a valuable resource of proline hydroxylation proteins for molecular biology studies in oxygen-sensing and cell signaling with the characterization of Repo-man proline hydroxylation site. The evidence supporting the claim of the authors is solid, although further clarification of the overall efficiency of the HILIC analysis, the specificity/sensitivity of immonium ion analysis, as well as quantification of proline hydroxylation identifications will be helpful. The work will be of interest to researchers studying post-translational modification, oxygen sensing, and cell signaling.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Hao Jiang et al described a systematic approach to identify proline hydroxylation proteins. The authors implemented a proteomic strategy with HILIC-chromatographic separation and reported an identification of 4993 sites from HEK293 cells (4 replicates) and 3247 sites from RCC4 cells (3 replicates) with 1412 sites overlapping between the two cell lines. From the analysis, the authors identified 225 sites and 184 sites respectively from 293 and RCC4 cells with HyPro diagnostic ion. The identifications were validated by analyzing a few synthetic peptides, with a specific focus on Repo-man (CDCA2) through comparing MS/MS spectra, retention time, and diagnostic ions. With SILAC analysis and recombinant enzyme assay, the study showed that Repo-man HyPro604 is a target of the PHD1 enzyme.

      Strengths:

      The study involved extensive LC-MS analysis and was carefully implemented. The identification of over 4000 confident proline hydroxylation sites would be a valuable resource for the community. The characterization of Repo-man proline hydroxylation is a novel finding.

      Weaknesses:

      However, as a study mainly focused on methodology, the findings from the experimental data did not convincingly demonstrate the sensitivity and specificity of the workflow for site-specific identification of proline hydroxylation in global studies.

      Major concerns:

      (1) The study applied HILIC-based chromatographic separation with a goal of enriching and separating hydroxyproline-containing peptides. However, as the authors mentioned, such an approach is not specific to proline hydroxylation. In addition, many other chromatography techniques can achieve deep proteome fractionation such as high pH reverse phase fractionation, strong-cation exchange etc. There was no data in this study to demonstrate that the strategy offered improved coverage of proline hydroxylation proteins, as the identifications of the HyPro sites could be achieved through deep fractionation and a highly sensitive LCMS setup. The data of Figure 2A and S1A were somewhat confusing without a clear explanation of the heat map representations.

      (2) The study reported that the HyPro immonium ion is a diagnostic ion for HyPro identification. However, the data showed that only around 5% of the identifications had such a diagnostic ion. In comparison, acetyllysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and the strategy offered a sensitivity of 70% with a specificity of 98%. In this study, the sensitivity of HyPro immonium ion was quite low. The authors also clearly demonstrated that the presence of immonium ion varied significantly due to MS settings, peptide sequence, and abundance. With further complications from L/I immonium ions, it became very challenging to implement this strategy in a global LC-MS analysis to either validate or invalidate HyPro identifications.

      (3) The study aimed to apply the HILIC-based proteomics workflow to identify HyPro proteins regulated by the PHD enzyme. However, the quantification strategy was not rigorous. The study just considered the HyPro proteins not identified by FG-4592 treatment as potential PHD targeted proteins. There are a few issues. First, such an analysis was not quantitative without reproducibility or statistical analysis. Second, it did not take into consideration that data-dependent LC-MS analysis was not comprehensive and some peptide ions may not be identified due to background interferences. Lastly, FG-4592 treatment for 24 hrs could lead to wide changes in gene expressions and protein abundances. Therefore, it is not informative to draw conclusions based on the data for bioinformatic analysis.

      (4) The authors performed an in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. However, Figure 9 did not show quantitatively PHD1-induced increase in Repo-man HyPro abundance and it is difficult to assess its reaction efficiency to compare with HIF1a HyPro.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Jiang et al. developed a robust workflow for identifying proline hydroxylation sites in proteins. They identified proline hydroxylation sites in HEK293 and RCC4 cells, respectively. The authors found that the more hydrophilic HILIC fractions were enriched in peptides containing hydroxylated proline residues. These peptides showed differences in charge and mass distribution compared to unmodified or oxidized peptides. The intensity of the diagnostic hydroxyproline immonium ion depended on parameters including MS collision energy, parent peptide concentration, and the sequence of amino acids adjacent to the modified proline residue. Additionally, they demonstrate that a combination of retention time in LC and optimized MS parameter settings reliably identifies proline hydroxylation sites in peptides, even when multiple proline residues are present.

      Strengths:

      Overall, the manuscript presents an advanced, standardized protocol for identifying proline hydroxylation. The experiments were well designed, and the developed protocol is straightforward, which may help resolve confusion in the field.

      Weaknesses:

      (1) The authors should provide a summary of the standard protocol for identifying proline hydroxylation sites in proteins that can easily be followed by others.

      (2) Cockman et al. proposed that HIF-α is the only physiologically relevant target for PHDs. Their approach is considered the gold standard for identifying PHD targets. Therefore, the authors should discuss the major progress they made in this manuscript that challenges Cockman's conclusion.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a new method for detecting and identifying proline hydroxylation sites within the proteome. This tool utilizes traditional LC-MS technology with optimized parameters, combined with HILIC-based separation techniques. The authors show that they pick up known hydroxy-proline sites and also validate a new site discovered through their pipeline.

      Strengths:

      The manuscript utilizes state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, which is an advance compared to other similar approaches before. The use of synthetic control peptides on the HILIC separation step clearly demonstrates the ability of the method to reliably distinguish hydroxy-proline from oxidized methionine - containing peptides. Using this method, they identify a site on CDCA2, which they go on to validate in vitro and also study its role in regulation of mitotic progression in an associated manuscript.

      Weaknesses:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, there is a good amount of potential false positives that also happen to get picked (owing to the limitations of MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a broader and more diverse cell-based validation approach would help substantiate the claims regarding enrichment of peptides in the described pathway analyses.

    5. Author response:

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewer recognising that our study has been carefully performed and provides a valuable resource for the community. The characterization of Repo-man proline hydroxylation is also recognised as a novel finding.

      With respect to Concerns raised by reviewer 1:

      (1) The study applied HILIC-based chromatographic separation with a goal of enriching and separating hydroxyproline-containing peptides. However, as the authors mentioned, such an approach is not specific to proline hydroxylation. In addition, many other chromatography techniques can achieve deep proteome fractionation such as high pH reverse phase fractionation, strong-cation exchange etc. There was no data in this study to demonstrate that the strategy offered improved coverage of proline hydroxylation proteins, as the identifications of the HyPro sites could be achieved through deep fractionation and a highly sensitive LCMS setup. The data of Figure 2A and S1A were somewhat confusing without a clear explanation of the heat map representations.

      We do not agree that the apparent concern raised here, i.e., that the method we present is not 100% specific for enriching only hydroxylated peptides, is a serious issue. We show specifically that our method indeed enriches samples for hydroxylated peptides, thereby increasing the chances of identifying proline hydroxylated peptides in a cell extract. We never claimed that it was mono-specific for enrichment of hydroxylated peptides. Further, we note that almost no chromatographic method we know of, including those commonly used to enrich for different types of post-translationally modified peptides (including phospho-peptides), is completely mono-specific for a single type of modified peptide. The reviewer comments that it could have been possible to use alternative methods to identify proline-hydroxylated peptides. This may be true, but we know of no published examples, or previous studies, where this has been demonstrated experimentally on a scale comparable to that we show here. Of course, there is always more than one way to approach technical challenges and it may be that future methods will be demonstrated that achieve equivalent, or even superior, results with respect to the detection of proline hydroxylated peptides. To the best of our knowledge, however, our current study provides a robust methodology that goes well beyond any previously published analysis of proline hydroxylation.

      (2) The study reported that the HyPro immonium ion is a diagnostic ion for HyPro identification. However, the data showed that only around 5% of the identifications had such a diagnostic ion. In comparison, acetyllysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and the strategy offered a sensitivity of 70% with a specificity of 98%. In this study, the sensitivity of HyPro immonium ion was quite low. The authors also clearly demonstrated that the presence of immonium ion varied significantly due to MS settings, peptide sequence, and abundance. With further complications from L/I immonium ions, it became very challenging to implement this strategy in a global LC-MS analysis to either validate or invalidate HyPro identifications.

      We feel that the reviewer’s initial comment is potentially misleading: it implies that we were proposing here that the ‘HyPro immonium ion is a diagnostic ion for HyPro identification’. In contrast, this concept was already widely held in the field before we started this project. Indeed, the fact that the diagnostic HyPro immonium ion is often difficult to detect has been used as one of the arguments by other researchers to support the view that HIF-α is the only physiologically relevant target for PHD enzymes, a controversy referenced explicitly by Reviewer 2 below. What we actually show here are novel data that help to explain why the diagnostic HyPro immonium ion is often difficult to detect when standard approaches and technical parameters for MS analysis are used. We believe that this observation, along with other data we present, is a useful contribution to the field that can help to resolve the previous controversies concerning the true prevalence and biological roles of PHD-catalysed proline hydroxylation on protein targets.

      (3) The study aimed to apply the HILIC-based proteomics workflow to identify HyPro proteins regulated by the PHD enzyme. However, the quantification strategy was not rigorous. The study just considered the HyPro proteins not identified by FG-4592 treatment as potential PHD targeted proteins. There are a few issues. First, such an analysis was not quantitative without reproducibility or statistical analysis. Second, it did not take into consideration that data-dependent LC-MS analysis was not comprehensive and some peptide ions may not be identified due to background interferences. Lastly, FG-4592 treatment for 24 hrs could lead to wide changes in gene expressions and protein abundances. Therefore, it is not informative to draw conclusions based on the data for bioinformatic analysis.

      We agree that this study is not quantifying or addressing the stoichiometry of proline hydroxylation across the very large number of new PHD target sites we identify. That was not claimed and was not the objective of our study. Nonetheless, we feel the comments of the referee do not adequately take into account the SILAC data we included (cf. Figure 8) or the full range of experimental data presented in this study. We would further refer the reviewer to the data presented in the companion paper by Druker et al., which we cross-referenced extensively in our study and have also made available previously on bioRxiv.

      (4) The authors performed an in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. However, Figure 9 did not show quantitatively PHD1-induced increase in Repo-man HyPro abundance and it is difficult to assess its reaction efficiency to compare with HIF1a HyPro.

      Here again we refer to the recent controversy referenced explicitly by Reviewer 2 below, concerning the view expressed by some researchers that only HIF-α is a physiological substrate for PHD enzymes in cells. We were challenged to show that any of the novel protein targets of PHDs we identified were indeed hydroxylated by PHD enzymes in vitro and that is what we demonstrated in Figure 9. This was not an experiment performed to quantify stoichiometry and indeed, it is not possible to draw any firm conclusions about efficiency or stoichiometry in vitro when using catalytic PHD subunits alone, given that we do not yet know whether PHDs may show different properties in cells, dependent on interactions with other factors and/or modifications.

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments that our manuscript presents an advanced, standardized protocol for identifying proline hydroxylation, with well designed experiments, which may help resolve confusion in the field.

      With respect to Concerns raised by reviewer 2:

      (1) The authors should provide a summary of the standard protocol for identifying proline hydroxylation sites in proteins that can easily be followed by others.

      We agree and plan to provide a clearly described, step by step guide to assist other researchers who wish to employ our methods for proline hydroxylation analysis in their own studies.

      (2) Cockman et al. proposed that HIF-α is the only physiologically relevant target for PHDs. Their approach is considered the gold standard for identifying PHD targets. Therefore, the authors should discuss the major progress they made in this manuscript that challenges Cockman's conclusion.

      We agree that our study provides valuable information germane to the recent controversy in the field and the views published by Cockman et al., to the effect that HIF-α is the only physiologically relevant target for PHDs. We will carefully review our statements when preparing a suitably revised version of record with the aim of providing a balanced and objective discussion of this issue.

      Reviewer #3 (Recommendations for the authors):

      We appreciate the reviewer’s comments that our study employs state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, along with their recognition that our study is ‘an advance compared to other similar approaches before’. We also appreciate their reference to our companion study by Druker et al., in which we characterise the mechanism and biological role in regulation of mitotic progression of the hydroxylation of P604 in the target protein RepoMan (CDCA2), which is identified in this study.

      With respect to the Concern raised by reviewer 3:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, there is a good number of potential false positives that also get picked up (owing to the limitations of the MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a broader and more diverse cell-based validation approach would help to substantiate the claims regarding enrichment of peptides in the described pathway analyses.

      We agree that this study, which has a focus on methodology and technical approaches for detecting sites of PHD-catalysed proline hydroxylation, cannot exhaustively validate the biological significance of all of the putative sites and targets identified. As the reviewer notes, we have performed a detailed functional characterisation of one such novel PHD-catalysed proline hydroxylation site, i.e. P604 in the protein RepoMan (CDCA2). This functional analysis is presented in the companion paper by Druker et al., which has also been reviewed by eLife and placed on bioRxiv (doi: https://doi.org/10.1101/2025.05.06.652400). We hope that publication of our identification of many new putative PHD target sites will encourage other researchers to pursue characterisation of their functional roles in different biological mechanisms and have tried here to provide some degree of guidance to focus attention on the identification of those sites for which we currently have highest confidence.

    1. eLife Assessment

      This valuable study advances our understanding of how bactofilin cytoskeletal proteins associate with cell membranes by identifying and characterizing a conserved membrane-targeting sequence. The evidence is solid, with a well-integrated combination of mutagenesis, biophysical analysis, molecular simulations, and bioinformatics supporting the mechanistic model. The work will be of particular interest to microbiologists and structural biologists studying bacterial cytoskeletons and membrane-protein interactions.

    2. Reviewer #2 (Public review):

      Summary:

      The authors of this study investigated the membrane-binding properties of bactofilin A from Caulobacter crescentus, a classic model organism for bacterial cell biology. BacA was the progenitor of a family of cytoskeletal proteins that have been identified as ubiquitous structural components in bacteria, performing a range of cell biological functions. Association with the cell membrane is a frequent property of the bactofilins studied and is thought to be important for functionality. However, almost all bactofilins lack a transmembrane domain. While membrane association has been attributed to the unstructured N-terminus, experimental evidence had yet to be provided. As a result, the mode of membrane association and the underlying molecular mechanics remained elusive.

      Liu et al. analyze the membrane binding properties of BacA in detail and scrutinize molecular interactions using in-vivo, in-vitro and in-silico techniques. They show that few N-terminal amino acids are important for membrane association or proper localization and suggest that membrane association promotes polymerization. Bioinformatic analyses revealed conserved lineage-specific N-terminal motifs indicating a conserved role in protein localization. Using HDX analysis they also identify a potential interaction site with PbpC, a morphogenic cell wall synthase implicated in Caulobacter stalk synthesis. In a complementary approach, they pinpoint the bactofilin-interacting region within the PbpC C-terminus, known to interact with bactofilin. They further show that BacA localization is independent of PbpC.

      Although the phenotypic effects of an abolished BacA-PbpC interaction are mild, these data significantly advance our understanding of bactofilin membrane binding, polymerization, and function at the molecular level. The major strength of the comprehensive study is the combination of complementary in vivo, in vitro and bioinformatic/simulation approaches, the results of which are consistent.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The investigators undertook detailed characterization of a previously proposed membrane targeting sequence (MTS), a short N-terminal peptide, of the bactofilin BacA in Caulobacter crescentus. Using light microscopy, single molecule tracking, liposome binding assays, and molecular dynamics simulations, they provide data to suggest that this sequence indeed does function in membrane targeting and further conclude that membrane targeting is required for polymerization. While the membrane association data are reasonably convincing, there are no direct assays to assess polymerization and some assays used lack proper controls as detailed below. Since the MTS isn't required for bactofilin polymerization in other bacterial homologues, showing that membrane binding facilitates polymerization would be a significant advance for the field.

      We agree that additional experiments were required to consolidate our results and conclusions. Please see below for a description of the new data included in the revised version of the manuscript.

      Major concerns

      (1) This work claims that the N-terminal MTS domain of BacA is required for polymerization, but they do not provide sufficient evidence that the ∆2-8 mutant or any of the other MTS variants actually do not polymerize (or form higher order structures). Bactofilins are known to form filaments, bundles of filaments, and lattice sheets in vitro and bundles of filaments have been observed in cells. Whether puncta or diffuse labeling represents different polymerized states or filaments vs. monomers has not been established. Microscopy shows mis-localization away from the stalk, but resolution is limited. Further experiments using higher resolution microscopy and TEM of purified protein would prove that the MTS is required for polymerization.

      We do not propose that the MTS is directly involved in the polymerization process and state this more clearly now in the Results and Discussion sections of the revised manuscript. To address this point, we performed transmission electron microscopy studies comparing the polymerization behavior of wild-type and mutant BacA variants. The results clearly show that the MTS-free BacA variant (∆2-8) forms polymers that are indistinguishable from those formed by the wild-type protein, when purified from an E. coli overproduction strain (new Figure 1–figure supplement 1). This finding is consistent with structural work showing that bactofilin polymerization is exclusively mediated by the conserved bactofilin domain (Deng et al, Nat Microbiol, 2019). However, at native expression levels, BacA only accumulates to ~200 molecules per cell (Kühn et al, EMBO J, 2006). Under these conditions, the MTS-mediated increase in the local concentration of BacA at the membrane surface and, potentially, steric constraints imposed by membrane curvature, may facilitate the polymerization process. This hypothesis has now been stated more clearly in the Results and Discussion sections.

      For polymer-forming proteins, defined localized signals are typically interpreted as slow-moving or stationary polymeric complexes. A diffuse localization, by contrast, suggests that a protein exists in a monomeric or, at most, (small) oligomeric state in which it diffuses rapidly within the cell and is thus no longer detected as distinct foci by widefield microscopy. Our single-molecule data show that BacA variants that are no longer able to interact with the membrane (as verified by cell fractionation studies and in vitro liposome binding assays) have a high diffusion rate, similar to that measured for the non-polymerizing and non-membrane-bound F130R variant. These results demonstrate that a defect in membrane binding strongly reduces the ability of BacA to form polymeric assemblies. To support this hypothesis, we have now repeated all single-particle tracking experiments and included mVenus as a freely diffusible reference protein. Our data confirm that the mobilities of the ∆2-8 and F130R variants are similar and approach those of free mVenus, supporting the idea that the deficiency to interact with the membrane prevents the formation of extended polymeric structures (which should show much lower mobilities). To underscore the relevance of membrane binding for BacA assembly, we have now included a new experiment, in which we used the PbpC membrane anchor (PbpC<sub>1-132</sub>-mcherry) to restore the recruitment of the ∆2-8 variant to the membrane (Figure 9 and Figure 9–figure supplement 1). The results obtained show that the ∆2-8 variant transitions from a diffuse localization to polar foci upon overproduction of PbpC<sub>1-132</sub>-mcherry. The polymerization-impaired F130R variant, by contrast, remains evenly distributed throughout the cytoplasm under all conditions. These findings further support the idea that polymerization and membrane-association are mutually interdependent processes.

      (2) Liposome binding data would be strengthened with TEM images to show BacA binding to liposomes. From this experiment, gross polymerization structures of MTS variants could also be characterized.

      We do not have the possibility to perform cryo-electron microscopy studies of liposomes bound to BacA. However, the results of the cell fractionation and liposome sedimentation assays clearly support a critical role of the MTS in membrane binding.

      (3) The use of the BacA F130R mutant throughout the study to probe the effect of polymerization on membrane binding is concerning as there is no evidence showing that this variant cannot polymerize. Looking through the papers the authors referenced, there was no evidence of an identical mutation in BacA that was shown to be depolymerized or any discussion in this study of how the F130R mutation might be analogous to polymerization-deficient variants in other bactofilins mentioned in these references.

      Residue F130 in the C-terminal polymerization interface of BacA is conserved among bactofilin homologs, although its absolute position in the protein sequence may vary, depending on the length of the N-terminal unstructured tail. The papers cited in our manuscript show that an exchange of this conserved phenylalanine residue abolishes polymer formation. Nevertheless, we agree that it is important to verify the polymerization defect of the F130R variant in the system under study. We have now included size-exclusion chromatography data showing that BacA-F130R forms a low-molecular-weight complex, whereas the wild-type protein largely elutes in the exclusion volume, indicating the formation of large, polymeric species (new Figure 1–figure supplement 1). In addition, we performed transmission electron microscopy analyses of BacA-F130R, which verified the absence of larger oligomers (new Figure 1–figure supplement 2).

      (4) Microscopy shows that a BacA variant lacking the native MTS regains the ability to form puncta, albeit mis-localized, in the cell when fused to a heterologous MTS from MreB. While this swap suggests a link between puncta formation and membrane binding the relationship between puncta and polymerization has not been established (see comment 1).

      We show that a BacA variant lacking the MTS (∆2-8) regains the ability to form membrane-associated foci when fused to the MTS of MreB. By contrast, a similar variant that additionally carries the F130R exchange (preventing its polymerization) shows a diffuse cytoplasmic localization. In addition, we show that the F130R exchange leads to a loss of membrane binding and to a considerable increase in the mobility of the variants carrying the MTS of E. coli MreB. As described above, we now provide additional data demonstrating that elevated levels of the PbpC membrane anchor can reinstate polar localization for the ∆2-8 variant, whereas it fails to do so for the polymerization-deficient F130R variant (Figure 9 and Figure 9–figure supplement 1). Together, these results support the hypothesis that membrane association and polymerization act synergistically to establish localized bactofilin assemblies at the stalked cell pole.

      (5) The authors provide no primary data for single molecule tracking. There is no tracking mapped onto microscopy images to show membrane localization or lack of localization in MTS deletion/variants. A known soluble protein (e.g. unfused mVenus) and a known membrane bound protein would serve as valuable controls to interpret the data presented. It also is unclear why the authors chose to report molecular dynamics as mean squared displacement rather than mean squared displacement per unit time, and the number of localizations is not indicated. Extrapolating from the graph in Figure 4D for example, it looks like WT BacA-mVenus would have a mobility of 0.5 (0.02/0.04) micrometers squared per second which is approaching diffusive behavior. Further justification and details of their analysis method are needed. It's also not clear how one should interpret the finding that several of the double point mutants show higher displacement than deleting the entire MTS. These experiments as they stand don't account for any other cause of molecular behavior change and assume that a decrease in movement is synonymous with membrane binding.

      We now provide additional information on the single-particle analysis. A new supplemental figure now shows a mapping of single-particle tracks onto the cells in which they were recorded for all proteins analyzed (Figure 2–figure supplement 1). Due to the small size of C. crescentus, it is difficult to clearly differentiate between membrane-associated and cytoplasmic protein species. However, overall, slow-diffusing particles tend to be localized to the cell periphery, supporting the idea that membrane-associated particles form larger assemblies (apart from diffusing more slowly due to their membrane association). In addition, we have included a movie that shows the single-particle diffusion dynamics of all proteins in representative cells (Figure 2-video 1). Finally, we have included a table that gives an overview of the number of cells and tracks analyzed for all proteins investigated (Supplementary file 1). Figure 2A and 4D show the mean squared displacement as a function of time, which makes it possible to assess whether the particles observed move by normal, Brownian diffusion (which is the case here). We repeated the entire single-particle tracking analysis to verify the data obtained previously and obtained very similar results. Among the different mutant proteins, only the K4E-K7E variant consistently shows a higher mobility than the MTS-free ∆2-8 variant, with MSD values similar to that of free mVenus. The underlying reason remains unclear. However, we believe that an in-depth analysis of this phenomenon is beyond the scope of this paper. We re-confirmed the integrity of the construct encoding the K4E/K7E variant by DNA sequencing and once again verified the size and stability of the fusion protein by Western blot analysis, excluding artifacts due to errors during cloning and strain construction.

      We agree that the single-molecule tracking data alone are certainly not sufficient to draw firm conclusions on the relationship between membrane binding and protein mobility. However, they are consistent with the results of our other in vivo and in vitro analyses, which together indicate a clear correlation between the mobility of BacA and its ability to interact with the membrane and polymerize (processes that promote each other synergistically).
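
      The distinction discussed under point (5), between the mean squared displacement (MSD) as a function of lag time and a single "MSD per unit time" value, can be illustrated with a minimal sketch. It is not the analysis pipeline used in the study. It assumes two-dimensional tracking, a hypothetical frame interval of 20 ms, and simulated Brownian tracks; the function names are illustrative only. For normal two-dimensional diffusion, MSD(t) = 4Dt, so a linear MSD curve (log-log slope close to 1) indicates Brownian motion and its slope gives the diffusion coefficient D.

```python
import numpy as np

def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one track (N x 2 array of positions in µm)."""
    track = np.asarray(track, dtype=float)
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        steps = track[lag:] - track[:-lag]           # displacements over this lag
        msd[lag - 1] = np.mean(np.sum(steps**2, axis=1))
    return msd

def fit_diffusion(msd, dt, dims=2):
    """Return (D, alpha): D from MSD = 2*dims*D*t fitted through the origin,
    alpha from the log-log slope (alpha close to 1 means normal diffusion)."""
    lags = dt * np.arange(1, len(msd) + 1)
    d_coeff = np.sum(lags * msd) / np.sum(lags**2) / (2 * dims)
    alpha = np.polyfit(np.log(lags), np.log(msd), 1)[0]
    return d_coeff, alpha

# Simulated 2D Brownian track (assumed values, not data from the study).
rng = np.random.default_rng(0)
dt = 0.02        # hypothetical frame interval in seconds
true_d = 0.125   # µm^2/s, chosen only for illustration
steps = rng.normal(scale=np.sqrt(2 * true_d * dt), size=(20000, 2))
track = np.cumsum(steps, axis=0)

msd = mean_squared_displacement(track, max_lag=5)
d_est, alpha = fit_diffusion(msd, dt)

# The reviewer's reading of the graph (MSD of 0.02 µm^2 at a lag of 0.04 s),
# converted to a 2D diffusion coefficient via D = MSD / (4 t).
d_from_graph = 0.02 / (4 * 0.04)
```

      Under these assumptions, the value of 0.5 µm² per second quoted by the reviewer is the ratio MSD/t, whereas the corresponding two-dimensional diffusion coefficient would be four times smaller. Plotting the full MSD curve additionally shows whether the motion is Brownian, which a single ratio cannot do.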

      (6) The experiments that map the interaction surface between the N-terminal unstructured region of PbpC and a specific part of the BacA bactofilin domain seem distinct from the main focus of the paper and the data somewhat preliminary. While the PbpC side has been probed by orthogonal approaches (mutation with localization in cells and affinity in vitro), the BacA region side has only been suggested by the deuterium exchange experiment and needs some kind of validation.

      The results of the HDX analysis per se are not preliminary and clearly show a change in the solvent accessibility of backbone amides in the C-terminal region of the bactofilin domain in the presence of the PbpC<sub>1-13</sub> peptide. However, we agree that further experiments would be required to precisely map and verify the PbpC binding site suggested by these data. As this is not the main focus of the paper, we would like to proceed without conducting further experiments in this area.

      We now provide additional data showing that elevated levels of the PbpC membrane anchor are able to recruit the MTS-free BacA variant (∆2-8) to the cytoplasmic membrane and stimulate its assembly at the stalked pole (Figure 9). These results now integrate Figure 8 more effectively into the overall theme of the paper.

      Reviewer #2 (Public review):

      Summary:

      The authors of this study investigated the membrane-binding properties of bactofilin A from Caulobacter crescentus, a classic model organism for bacterial cell biology. BacA was the progenitor of a family of cytoskeletal proteins that have been identified as ubiquitous structural components in bacteria, performing a range of cell biological functions. Association with the cell membrane is a common property of the bactofilins studied and is thought to be important for functionality. However, almost all bactofilins lack a transmembrane domain. While membrane association has been attributed to the unstructured N-terminus, experimental evidence had yet to be provided. As a result, the mode of membrane association and the underlying molecular mechanics remained elusive.

      Liu et al. analyze the membrane binding properties of BacA in detail and scrutinize molecular interactions using in-vivo, in-vitro and in-silico techniques. They show that few N-terminal amino acids are important for membrane association or proper localization and suggest that membrane association promotes polymerization. Bioinformatic analyses revealed conserved lineage-specific N-terminal motifs indicating a conserved role in protein localization. Using HDX analysis they also identify a potential interaction site with PbpC, a morphogenic cell wall synthase implicated in Caulobacter stalk synthesis. In a complementary approach, they pinpoint the bactofilin-interacting region within the PbpC C-terminus, known to interact with bactofilin. They further show that BacA localization is independent of PbpC.

      Strengths:

      These data significantly advance the understanding of the membrane binding determinants of bactofilins and thus their function at the molecular level. The major strength of the comprehensive study is the combination of complementary in vivo, in vitro and bioinformatic/simulation approaches, the results of which are consistent.

      Thank you for this positive feedback.

      Weaknesses:

      The results are limited to protein localization and interaction, as there is no data on phenotypic effects. Therefore, the cell biological significance remains somewhat underrepresented.

      We agree that it is interesting to investigate the phenotypic effects caused by the reduced membrane binding activity of BacA variants with defects in the MTS. We have now included phenotypic analyses that shed light on the role of region C1 in the localization of PbpC and its function in stalk elongation under phosphate-limiting conditions (see below).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the missing estimation of biological relevance, some additional experiments may be carried out.

      For example, given that BacA localizes PbpC by direct interaction, one might expect an effect on stalk formation if BacA is unable to bind the membrane or to polymerize. The same applies to PbpC variants lacking the C1 region. As the mutant strains are available, these data are not difficult to obtain but would help to compare the effect of the deletions with previous data (e.g. Kühn et al.) even if the differences are small.

      We have now analyzed the effect of the removal of region C1 on the ability of mVenus-PbpC to promote stalk elongation in C. crescentus under phosphate starvation. Interestingly, our results show that the lack of the BacA-interaction motif impairs the recruitment of the fusion protein to the stalked pole, but it does not interfere with its stimulatory effect on stalk biogenesis. Thus, the polar localization of PbpC does not appear to be critical for its function in localized peptidoglycan synthesis at the stalk base. These results are now shown in Figure 8–Figure supplement 4. The results obtained may be explained by residual transient interactions of mVenus-PbpC with proteins other than BacA at the stalked pole. Notably, PbpC has also been implicated in the attachment of the stalk-specific protein StpX to components of the outer membrane at the stalk base. The polar localization of PbpC may therefore be primarily required to ensure proper StpX localization, consistent with previous work by Hughes et al. (Mol Microbiol, 2013) showing that StpX is partially mislocalized in a strain producing an N-terminally truncated PbpC variant that no longer localizes to the stalk base.

      We have also attempted to investigate the ability of the Δ2-8 and F130R variants of BacA-mVenus to promote stalk elongation under phosphate starvation. However, the levels of the WT, Δ2-8 and F130R proteins and their stabilities were dramatically different after prolonged incubation of the cells in phosphate-limited medium, so that it was not possible to draw any firm conclusions from the results obtained (not shown).

      In addition, the M23-like endopeptidase LdpA is proposed to be a client protein of BacA (in C. crescentus, Billini et al. 2018, and H. neptunium or R. rubrum, Pöhl et al. 2024). In H. neptunium, it is suggested that the interaction is mediated by a cytoplasmic peptide of LmdC reminiscent of PbpC. This should at least be commented on. It would be interesting to see whether LdpA in C. crescentus is also delocalized; if so, this could identify another client protein of BacA.

      We agree that it would be interesting to study the role of BacA in LdpA function. However, we have not yet succeeded in generating a stable fluorescent protein fusion to LdpA, which currently makes it impossible to study the interplay between these two proteins in vivo. The focus of the present paper is on the mode of interaction between bactofilins and the cytoplasmic membrane and on the mutual interdependence of membrane binding and bactofilin polymerization. Given that PbpC is so far the only verified interaction partner of BacA in C. crescentus, we would like to limit our analysis to this client protein.

      Further comments:

      L105: analyze --> analyzed

      Done.

      L169: Is there any reason why the MTS of E. coli MreB was doubled?

      Previous work has shown that two tandem copies of the N-terminal amphiphilic helix of E. coli MreB were required to partially target a heterologous fusion partner protein (GFP) to the cytoplasmic membrane of E. coli cells (Salje et al, 2011).

      Fig. S3:

      a) Please decide which tag was used (mNG or mVenus) and adapt the figure or legend accordingly.

      b) In the legend for panel (C), please describe how the relative amounts were calculated, as the fractions arithmetically cannot add to > 100%. I guess each band was densitometrically rated and independently normalized to the whole-cell signal?

      The fluorescent tag used was mNeonGreen, as indicated in the figure. We have now corrected the legend accordingly. Thank you for making us aware of the wrong labeling of the y-axis. We have now corrected the figure and describe the method used to calculate the plotted values in the legend.

      Legend of Fig 1b: It is not clear to me, to which part of panel B the somewhat cryptic LY... strain names belong. I suggest putting them either next to the images, to delete them, or at least to unify the layout (compare, e.g. to Fig S7). (I would delete the LY numbers and stay with the genes/mutations throughout. This is just a suggestion).

      These names indicate the strains analyzed in panel B, and we have now clarified this in the legend. It is more straightforward to label the images according to the mutations carried by the different strains. Nevertheless, we would like to keep the strain names in the legend, so that the material used for the analysis can be clearly identified.

      Fig. 2a: As some of the colors are difficult to distinguish, I suggest sorting the names in the legend within the graph according to the slope of the curves (e.g. K4E K7E (?) on top and WT being at the bottom).

      Thank you for this suggestion. We have now rearranged the labels as proposed.

      In the legend (L924), correct typo "panel C" to "panel B".

      Done.

      Fig. 3: In the legend, I suggest deleting the abbreviations "S" and "P" as they do not show up in the image. In line 929, I suggest adding: average "relative" amount... or even more precisely: "average relative signal intensities obtained..."

      We have removed the abbreviations and now state that the bars indicate the “average relative signal intensities” obtained for the different fractions.

      Fig 4d: same suggestion as for Fig. 2a.

      Done.

      Fig 8: In the legend (L978), delete 1x "the"

      Done.

      L258 and Fig. S5: The expression "To account for biases in the coverage of bacterial species" seems somewhat unclear. I suggest rephrasing and adding information from the M+M section here (e.g. from L593, if this is meant).

      We now state that this step in the analysis pipeline was performed “To avoid biases arising from the over-representation of certain bacterial species in UniProt”.

      I appreciate the outline of the workflow in panel (a) of Fig. S5. It would be even more useful if some more details about the applied filtering criteria were provided (e.g. concerning what is meant by "detailed taxonomic information" or "filter out closely related sequences"). Does the latter mean that only one bactofilin sequence per species was used? (Quite a few bacteria have more than one bactofilin, and these are similar to each other.)

      We removed sequences from species with unclear phylogeny (e.g. candidate species whose precise taxonomic position has not yet been determined). For many pathogenic species, numerous strains have been sequenced. To account for this bias, only one sequence from clusters of highly similar bactofilin sequences (>90% identity) was retained per species. This information has now been included in the diagram. It is true that many bacteria have more than one bactofilin homolog. However, the sequences of these proteins are typically quite different. For instance, the BacA and BacB from C. crescentus only share 52% identity. Therefore, our analysis does not systematically eliminate bactofilin paralogs that coexist in the same species.
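      The redundancy filter described in this response can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and a simple greedy comparison within each species; the tool and clustering method actually used by the authors are not stated here. The function names, species labels, and toy sequences below are hypothetical and are not real bactofilin data.

```python
from collections import defaultdict


def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


def filter_redundant(records, threshold=90.0):
    """Greedy per-species filter (an assumed procedure, for illustration).

    A sequence is kept only if it is at most `threshold` percent identical
    to every sequence already kept for the same species, so near-identical
    strain variants collapse while divergent paralogs survive.
    """
    kept = defaultdict(list)
    for species, name, seq in records:
        if all(percent_identity(seq, other) <= threshold
               for _, other in kept[species]):
            kept[species].append((name, seq))
    return {sp: [name for name, _ in entries] for sp, entries in kept.items()}


# Made-up toy sequences (20 residues each), purely illustrative.
seq_a = "MSLFSKKPAATIIGAGTVIE"   # first strain of species_1
seq_a2 = "MSLFSKKPAATIIGAGTVIQ"  # second strain, 95% identical to seq_a
seq_b = "MALFGRKPSETLIGKDTRIE"   # divergent paralog, 55% identical to seq_a

records = [
    ("species_1", "strain1_bacA", seq_a),
    ("species_1", "strain2_bacA", seq_a2),
    ("species_1", "strain1_bacB", seq_b),
    ("species_2", "strain1_bacA", seq_a),
]

result = filter_redundant(records)
print(result)
```

      In this toy input, the second strain's copy is dropped because it exceeds the 90% threshold, whereas the divergent paralog from the same species is retained, in line with the statement that the analysis does not systematically eliminate coexisting paralogs.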

      L281: Although likely, I am not sure if membrane binding has ever been shown for a bactofilin from these phyla. (See also L 380.) Is there an example? Otherwise, membrane binding may not be a property of these bactofilins.

      To our knowledge, the ability of bactofilins from these clades to interact with membranes has not been investigated to date. We agree that the absence of an MTS-like motif may indicate that they lack membrane binding activity, and we have now stated this possibility in the Results and Discussion.

      L285: See comment above concerning the M23-like peptidase LdpA. Although not yet directly shown for C. crescentus, it seems likely that BacACc also localizes this peptidase in addition to PbpC. I suggest rephrasing, e.g. "known" --> "shown"

      We now use the word “reported”.

      L295 and Fig S8: PbpC is ubiquitous. Which criteria/filters have been applied to select the shown sequences?

      C. crescentus PbpC is different from E. coli Pbp1C. It is characterized by distinctive, conserved N- and C-terminal tails and only found in C. crescentus and close relatives. The C. crescentus homolog of E. coli PbpC is called PbpZ (Yakhnina et al, J Bacteriol, 2013; Strobel et al, J Bacteriol, 2014), whereas C. crescentus PbpC is related to E. coli PBP1A. We have now added this information to the text to avoid confusion.

      L311: may replace "assembly" by "polymerization"

      Done.

      L320: bactofilin --> bactofilin domain?

      Yes, this was supposed to read “bactofilin domain”. Thank you for spotting this issue.

      L324: The HDX analysis of BacA suggests that the exchange is slowed down in the presence of the PbpC peptide, which is indicative of a physical interaction between these two molecules. To corroborate the claim that BacA polymerization is critical for interaction with the peptide (resp. PbpC), this experiment should be carried out with the polymerization defective BacA version F130R.

      (Or tone this statement down, e.g. show --> suggest.)

      “suggest”

      L386: undergoes --> undergo

      Done.

      L391-400: This idea is tempting but the suggested mechanism then would be restricted to bactofilins of C. crescentus and close relatives. The bactofilin of Rhodomicrobium, for example, was shown to localize dynamically and not to stick to a positively curved membrane.

      In the vast majority of species investigated so far, bactofilins were found to associate with specifically curved membrane regions and to contribute to the establishment of membrane curvature. Unfortunately, the sequences of the three co-polymerizing bactofilin paralogs of R. vannielii DSM 166 studied by Richter et al (2023) have not been reported and the genome sequence of this strain is not publicly available. However, in related species with three bactofilin paralogs, only one paralog shows an MTS-like N-terminal peptide and another paralog typically contains an unusual cadherin-like domain of unknown function, as also reported for R. vannielii DSM 166. Therefore, the mechanism controlling the localization dynamics of bactofilins may be complex in the Rhodomicrobium lineage. Nevertheless, at native expression levels, the major bactofilin (BacA) of R. vannielii DSM 166 was shown to localize predominantly to the hyphal tips and the (incipient) bud necks, suggesting that regions of distinct membrane curvature could also play a role in its recruitment. We do not claim that all bactofilins recognize positive membrane curvature, which is clearly not the case. It rather appears as though the curvature preference of bactofilins varies depending on their specific function.

      L405-406: I agree that localization of BacA has been shown to be independent of PbpC. However, this does not generally preclude an effect on BacA localization by other "client" or interacting proteins. (See also comment above about the putative BacA interactor LdpA). I suggest either corroborating this statement or changing it from "client binding" to "PbpC binding".

      Thank you for pointing out the imprecision of this statement. We now conclude that “PbpC binding” is not critical for BacA assembly and positioning.

      Suppl. Fig. S11: In the legend, please correct the copy-paste mismatch (...VirB...).

      Done.

      L482: delete 1x "at"

      Done.

      L484: may be better "soluble and insoluble fractions"?

      We now describe the two fractions as “soluble and membrane-containing insoluble fractions” to make clear to all readers that membrane vesicles are found in the pellet after ultracentrifugation.

      L489-490: check spelling immunoglobulin – immuneglobulin

      Done.

      L500 and 504: º_C --> ºC

      Done.

      Suppl. file X (HDX data): please check the table headline, table should be included in Suppl. file 1

      We have now included a headline in this file (now Supplementary file 3).

    1. eLife Assessment

      This manuscript offers valuable structural and mechanistic insights into the structure and assembly of the Type II internal ribosome entry site (IRES) from encephalomyocarditis virus (EMCV) and the translation initiation complex, revealing a direct interaction between the IRES and the 40S ribosomal subunit. While a solid cryo-EM method was used, enhancing the overall resolution or adding complementary biochemical data would further improve the clarity and impact of this study. This manuscript will attract researchers in cap-independent translation, host-pathogen interactions, and virology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs - 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy supported by structural superposition with the 80S ribosome and sequence similarity between the I domain of the IRES and the h38 region of 28S rRNA (Fig. 4) is well-justified.

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      c) Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

    3. Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES, made without any supporting biochemical data, are not warranted by the data.

    4. Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      We thank Reviewer 1 for appreciating our efforts and for finding the structural insights into type 2 IRES-based initiation presented in this study novel.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs - 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy supported by structural superposition with the 80S ribosome and sequence similarity between the I domain of the IRES and the h38 region of 28S rRNA (Fig. 4) is well-justified.

      We agree that the low resolution of the map has compromised the data interpretation at the molecular level, and we thank the reviewer for appreciating our findings at this resolution. Due to the compromise in resolution, we have reported findings related to stretches or regions such as loops and stems, rather than individual nucleotides and interactions.  

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      We agree that this study lacks experiments other than cryo-EM that probe the importance of regions such as the GNRA and RAAA loops. However, we have cited earlier reports that demonstrate the importance of these regions for overall IRES activity. The essentiality of the RAAA loop for type 2 IRESs was demonstrated by López de Quinto and Martínez-Salas, 1997 (cited in the manuscript). Further, the conservation of this loop across the type 2 IRES family adds to its importance (Manuscript Figure 6B). This loop and its flanking G-C stem are similar to h38 of 28S rRNA, and it appears that the RAAA loop adopts a mimicry mechanism to interact with the 40S ribosomal protein uS19, thus highlighting its importance for interaction with 40S. Experiments destabilising the G-C stem also compromise IRES activity, as shown in the case of the FMDV IRES (Fernández et al 2011). Previous studies in which the GNRA or GCGA loop of the EMCV IRES was mutated have shown a deficiency in IRES activity (Roberts and Belsham, 1997; Robertson et al 1999), suggesting the importance of these regions in viral IRES biology, and these reports are cited in the manuscript. This is not limited to the EMCV IRES: mutation of the GUAA (representative of GNRA) loop of the FMDV IRES also caused a significant reduction in IRES activity (López de Quinto and Martínez-Salas, 1997). In our study, we observe that the GCGA loop interacts with tRNA<sub>i</sub> in the EMCV IRES-48S PIC, thus implicating the importance of this loop. Moreover, incubation of the FMDV IRES with 40S ribosomes has shown a decrease in SHAPE reactivity in the domain 3 apex (nucleotides 170-200) (Lozano et al 2018), which corresponds to the EMCV IRES domain I apex. Further, we will attempt to address the concern about the lack of experimental validation of the GNRA and RAAA loops by performing biochemical assays.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      We thank the reviewer for the suggestions, and we will employ the suggested procedures, which may help to further improve the quality of the maps. We will include an image of the rejected 2D classes in the revised manuscript. We acknowledge the Reviewer’s query about the large number of micrographs relative to the small number of particles used for the final reconstruction. The total number of micrographs is the sum of multiple datasets, prepared and collected at various times. Among these, around 8000 micrographs have extremely poor particle number and distribution. As a result, the number of particles per micrograph is heterogeneous in the compiled dataset. We obtained only 237054 ‘good particles’ after multiple rounds of 2D & 3D classifications, and the final reconstruction has 28439 particles (~12%). This class was obtained after masked classification for IRES and ternary complex density. Hence, only the particles that show the best density for both IRES and ternary complex are used for reconstructing this map. Another set of particles that have only a portion of IRES and tRNA but no density for eIF2 forms another map (26792 particles, 11.3%). Thus, we obtained a total of 55231 particles (23.3%) with IRES density.
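      The particle fractions quoted in this response can be checked with a short calculation. The sketch below uses only the counts stated above; the variable and function names are illustrative, and percentages are taken relative to the 237054 ‘good particles’.

```python
# Particle counts quoted in the author response.
good_particles = 237054      # retained after multiple rounds of 2D and 3D classification
ires_with_tc = 28439         # class with density for both IRES and ternary complex
ires_without_eif2 = 26792    # class with partial IRES and tRNA density but no eIF2


def percent_of(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Total number of particles showing IRES density.
total_ires = ires_with_tc + ires_without_eif2

print(total_ires)
print(percent_of(ires_with_tc, good_particles))
print(percent_of(ires_without_eif2, good_particles))
print(percent_of(total_ires, good_particles))
```

      Rounded to one decimal place, the three fractions come out at 12.0%, 11.3% and 23.3%, consistent with the values given in the response.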

      b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      We thank the reviewer for appreciating the modelling of the domain I apex in the cryo-EM density. We tried to predict the full tertiary structure of the IRES; however, inclusion of the full-length sequence (nucleotides 280-905) gave models of extremely low confidence, and a few domains did not conform to the secondary structure of the EMCV IRES reported in Duke et al 1992. Hence, we used individual domains of the EMCV IRES and predicted the tertiary structure of each independently of the other IRES domains. Furthermore, 3D models of FMDV IRES domains 2, 3, and 4 (corresponding to EMCV IRES domains H, I, and J-K) were predicted from SHAPE reactivity values with the RNAComposer server (Figure 3 in Lozano et al 2018). The predicted architecture of the domain 3 apex (FMDV IRES) coincides with our I domain apex model (EMCV IRES).

      c) Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

      We have discussed the possibility that other IRESs, such as type 1 and type 5 (Aichi virus), might use strategies similar to that of the EMCV IRES to assemble the 48S PIC, given the similarity in motif sequence and position across the viral IRESs. Like the EMCV IRES, the type 1 IRES (e.g. Poliovirus, Coxsackie virus) also harbours the GNRA loop, preceded by a C-rich loop, in its longest domain, which is known for long-range RNA-RNA interactions. The segment harbouring the GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The Aichi viral IRES (type 5) harbours a GNRA loop in its longest domain, which is domain J. Deletion of the GNRA loop compromised IRES activity; however, substitution mutations in this region either elevated IRES activity or left it unaltered (Yu et al 2011). We have hypothesized that these IRESs (type 1 and type 5) might use the GNRA motifs in their longest domain (domain IV in type 1, and domain J in type 5) in a manner similar to the EMCV IRES, where GNRA is present in the longest domain (I) and preceded by a C-rich loop. Thus, GNRA can potentially mediate long-range interactions with tRNA<sub>i</sub>, as all these IRESs require the eIF2-ternary complex for the formation of the 48S PIC. Likewise, as in the EMCV IRES, the GNRA motif-containing domain of type 1 and type 5 IRESs is placed before the eIF4G-binding domain (domain J-K in EMCV IRES, domain V in poliovirus, domain K in Aichi virus). Hence, we suggest that these IRESs may use a similar strategy to interact with tRNA<sub>i</sub> during the formation of the 48S PIC.

      Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      We thank Reviewer 2 for the positive and encouraging comments on our work and for appreciating our ‘innovative’ approach of using an IRES-associated factor to assemble and pull down an IRES-bound ribosomal complex.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      We understand the concerns raised by the reviewer related to the resolution of the EMCV IRES-48S PIC map. However, we would like to mention that we refrained from commenting on individual nucleotides or molecular interactions in the manuscript. Instead, we discuss loops, RNA stretches or motifs that could be inferred with more confidence, as shown in Manuscript Figure 4. The EMCV IRES can directly interact with the 40S ribosome using its domains H and I (Chamond et al 2014); however, the details of this interaction were unknown. Based on the placement of a portion of domain I in the cryo-EM map, we observe that the CAAA loop of the domain I apex interacts with the 40S ribosome. This is also reflected in the earlier reported SHAPE data (Supplementary figures 2 and 8 in Chamond et al 2014), where a decrease in reactivity is evident in the presence of the 40S ribosome. In addition, incubation of the EMCV IRES with rabbit reticulocyte lysate (RRL) offered protection to domain I apex regions, which included the CAAA loop (Figure 4b in Maloney and Joseph, 2024).

      Furthermore, this decrease in SHAPE reactivity pattern is also evident for FMDV IRES domain 3 apex (like domain I in EMCV IRES) in the presence of 40S ribosome (Lozano et al 2018).

      Thus, these studies are consistent with the placement of IRES model in the cryo-EM map.

      We aim to improve the resolution of the maps for better clarity and add biochemical experiments to justify the possible interactions.

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      We thank the reviewer for bringing up this point, which we failed to mention in the manuscript. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is much more intense than that at AUG-826. Further, AUG-834 lies in the Kozak context, whereas AUG-826 has a poor Kozak context. Furthermore, the synthesis of the polypeptide requires placement of AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates the recognition of the start codon, and we reasoned that it is more likely that AUG-834 is placed at the P site than AUG-826. We will mention this in the revised manuscript, as we did not mutate AUG-826.

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES, made without any supporting biochemical data, are not warranted by the data.

      We thank the reviewer for considering our major claims warranted, and we wish to make further improvements to support our assessment of small stretches and individual nucleotides.

      Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

      We thank Reviewer 3 for acknowledging the technical challenges in this study and finding our study biologically significant. We understand the concerns related to low resolution and the requirement of complementary biochemical validation for our reported observations and interpretations in the manuscript. We are attempting to improve the resolution and complement the interpretations with biochemical experiments.

    1. eLife Assessment

      This valuable investigation provides new and solid evidence for a specific cognitive deficit in cerebellar degeneration patients. The authors use three tasks that modulate complexity and violations of cognitive expectations. They show specific slowing of reaction times in the presence of violations but not with task complexity. While some alternative interpretations of the results are possible and are discussed, the work provides a new, invaluable data point in describing the cognitive contribution of cerebellar processing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test the hypothesis that the contribution of the cerebellum to cognitive tasks is similar to motor tasks, and is related to the processing of prediction errors (here: violation of expectations, VE). In three experiments, they find that cerebellar patients show differences compared to controls in measures of VE, but not task complexity. The findings show that cerebellar disease results in deficits in VE processing in cognitive tasks and make a valuable contribution to the field. The authors were able to test a large number of patients with a cerebellar disease that is known to primarily affect the cerebellum (i.e. SCA6).

      Strengths:

      A strength of the study is that it is hypothesis-driven and that the three experiments are very well thought out. Furthermore, a comparatively large group of patients with spinocerebellar ataxia type 6 (SCA6) was tested, a disease which affects primarily the cerebellum.

      Weaknesses:

      - Acquisition of brain MRI scans would have been useful to perform lesion-behaviour mapping. However, this does not limit the significance of the behavioural findings.
      - Exp. 1 and 2: Was the lack of difference in accuracy an unexpected finding? How meaningful are the paradigms used when accuracy was the same in cerebellar patients and controls?
      - Exp. 1 and 2: Cerebellar patients have motor dysfunction, which impacts reaction time. Can the authors exclude that this contributed at least in part to their findings? Are there any correlations with SARA score (upper limb function) or oculomotor dysfunction (e.g. presence of nystagmus)?
      - Data on the attention probes that were administered would be of interest. Were there any differences in attention between patients and controls, or any correlations with the findings?

      Comments on revisions:

      I am not sure I can follow the authors' interpretation that the cerebellum contributes to prediction errors but not to predictions. Are these two not tightly connected? It may rather be that in patients with slowly progressive chronic disease there is a lot of compensation. It is not so rare that in cognitive tasks cerebellar patients do not perform differently from controls, even though one would expect a difference (e.g. based on fMRI data in controls). Another factor that likely contributes is age: patients and controls are often middle-aged or elderly, which adds to variability and decreases the chance of seeing group differences.

    3. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Summary:

      In this study, Daniel et al. used three cognitive tasks to investigate behavioral signatures of cerebellar degeneration. In the first two tasks, the authors found that if an equation was incorrect, reaction times slowed significantly more for cerebellar patients than for healthy controls. In comparison, the slowing in the reaction times when the task required more operations was comparable to normal controls. In the third task, the authors show increased errors in cerebellar patients when they had to judge whether a letter string corresponded to an artificial grammar.

      Strengths:

      Overall, the work is methodologically sound and the manuscript well written. The data do show some evidence for specific cognitive deficits in cerebellar degeneration patients.

      Thank you for the thoughtful summary and constructive feedback. We are pleased that the methodological rigor and clarity of the manuscript were appreciated, and that the data were recognized as providing meaningful evidence regarding cognitive deficits in cerebellar degeneration.

      Weaknesses:

      The current version has some weaknesses in the visual presentation of results. Overall, the study lacks a more precise discussion on how the patterns of deficits relate to the hypothesized cerebellar function. The reviewers and the editor agreed that the data are interesting and point to a specific cognitive deficit in cerebellar patients. However, in the discussion, we were somewhat confused about the interpretation of the result: If the cerebellum (as proposed in the introduction) is involved in forming expectations in a cognitive task, should they not show problems both in the expected (1+3 =4) and unexpected (1+3=2) conditions? Without having formed the correct expectation, how can you correctly say "yes" in the expected condition? No increase in error rate is observed - just slowing in the unexpected condition. But this increase in error rate was not observed. If the patients make up for the lack of prediction by using some other strategy, why are they only slowing in the unexpected case? If the cerebellum is NOT involved in making the prediction, but only involved in detecting the mismatch between predicted and real outcome, why would the patients not show specifically more errors in the unexpected condition?

      Thank you for asking these important questions and initiating an interesting discussion. While decision errors and processing efficiency are not fully orthogonal and are likely related, they are not necessarily the same internal construct. The data from Experiments 1 and 2 suggest impaired processing efficiency rather than increased decision error. Reaction time slowing without increased error rates suggests that the CA group can form expectations but respond more slowly, possibly due to reduced processing efficiency. Thus, this analysis of our data suggests that the cerebellum is not essential for forming expectations, but it plays a critical role in processing their violations.

      Relatedly, a few important questions remain open in the literature concerning the cerebellum’s role in expectation-related processes. The first is whether the cerebellum contributes to the formation of expectations or the processing of their violations. In Experiments 1 and 2, the CA group did not show impairments in the complexity manipulation. Solving these problems requires the formation of expectations during the reasoning process. Given the intact performance of the CA group, these results suggest that they are not impaired in forming expectations. However, in both Experiments 1 and 2, patients exhibited selective impairments in solving incorrect problems compared to correct problems. Since expectation formation is required in both conditions, but only incorrect problems involve a VE, we hypothesize that the cerebellum is involved in VE processes. We suggest that the CA group can form expectations in familiar tasks, but are impaired in processing unexpected compared to expected outcomes. This supports the notion that the cerebellum contributes to VE, rather than to forming expectations.

      In Experiment 3, during training, the participant is learning a novel rule (grammar), forming new expectations on how strings of letters should be. Afterwards, during testing, the participant is requested to identify if a novel string is following the rule or not. We examined sensitivity to distinguish between grammatical and non-grammatical strings of letters, thus taking into account a baseline ability to identify expected strings. Additionally, both in the low-similarity and high-similarity conditions, there are expectations regarding whether the strings are following the rule or not. However, in the high-similarity condition, there is more uncertainty regarding which strings are following the grammatical rule, as demonstrated in a lower sensitivity (d prime). Given the group differences only in the low-similarity condition, these results suggest the CA group is impaired only when the rules are more certain. Given these results, we suggest that forming cognitive expectations is not necessarily dependent on the cerebellum. Rather, we propose that the cerebellum is critical for processing rule-based VE (detection or processing of detected errors) under conditions of more certainty. One remaining question for future studies is whether the cerebellum contributes to detection of a mismatch between the expectation and sensory evidence, or the processing of a detected VE.
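
      The sensitivity measure mentioned above can be sketched as follows. This is a minimal illustration of the standard signal-detection definition of d prime, not the authors' analysis code. The function name, the argument names, and the log-linear correction (adding 0.5 to each cell) are assumptions made for this example; here a "hit" means endorsing a grammatical string and a "false alarm" means endorsing a non-grammatical one.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index: d' = z(hit rate) - z(false-alarm rate)."""
    # Assumed log-linear correction: keeps both rates strictly between
    # 0 and 1 so that the inverse normal CDF is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

      With this definition, a lower d prime in the high-similarity condition corresponds to hit and false-alarm rates that lie closer together, which is one way to read the greater uncertainty described above.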

      We suggest that these key questions are relevant to both motor and non-motor domains and were not fully addressed even in the previous, well-studied motor domain. Importantly, while previous experimental manipulations[17,19,40,94–96] have provided important insights regarding the cerebellar role in these processes, some may have confounded these internal constructs due to task design limitations (e.g., lack of baseline conditions). Notably, some of these previous studies did not include control conditions, such as correct trials, where there was no VE. In addition, other studies did not include a control measure (e.g., complexity effect), which limits their ability to infer the specific cerebellar role in expectation manipulation.

      Thus, the current experimental design used in three different experiments provides a valuable novel experimental perspective, allowing us to distinguish between some, but not all, of the processes involved in the formation of expectations and their violations. For instance, to our knowledge, this is the first study to demonstrate a selective impairment in rule-based VE processing in cerebellar patients across both numerical reasoning and artificial grammar tasks. If feasible, we propose that future studies should disentangle different forms of VE by operationalizing them in experimental tasks in an orthogonal manner. This will allow us to achieve a more detailed and well-defined cerebellar motor and non-motor mechanistic account.

      Recommendations for the authors:

      Editors comments:

      The Figures are somewhat sub-standard and should be improved before the paper is made the VOR. Ensure consistent ordering of the group factor (CA, NT) and experimental factor across Figure 3,4, and 6 (panels A). Having the patient group as columns in Figure 4a and in rows in Figure 6a is very confusing.

      We have standardized the layout across Figures 2, 4, and 6 so that the group factor (CA, NT) and experimental conditions are consistently ordered. In all panels, the group factor now appears as a column.

      Subpanels should be numbered A,B,C... not A, B1, B2.

      Subpanel labels have been updated to follow the standard A, B, C format across all figures.

      Fonts should have a 100% aspect ratio - they should not be stretched (Figure 6B).

      We have corrected the font aspect ratios in all figures (e.g., Figure 6B) to ensure proper proportions and readability. 

      Colors should be more suitable for print - use a CMYK color scheme (i.e. avoid neon colors such as the neon green for the CA).

      The color scheme across all figures has been revised to be print-friendly using CMYK-compatible, colorblind-accessible palettes. Neon green for the CA group was replaced with a more muted, distinguishable color.

      Abstract: "The CA group exhibited a disproportionate cost when comparing expected problems compared to unexpected problems" - I recommend switching unexpected and expected, as the disproportionate cost is on the former.

      We have changed the wording of the sentence accordingly. 

      Upon re-reading, the details of the AGL task were not clear to us. Please do not rely on the reference (78) for the details - your paper should contain enough information for the reader to understand the experimental details. For you to appreciate the depth of our not-understanding, here is a simple question: The test strings either followed the grammar in Fig 5 or they did not. If they did not, how exactly was similarity to the grammar measured? If they did, what was the difference between the “Grammatical-high” and “Grammatical-low” trials? If the string was grammatical, there should not be a notion of similarity, no? Or were these trials arbitrarily split in half?

      We have clarified that 50% of the test strings followed the grammar of the training strings. We also elaborated on the calculation of chunk strength as a measure of similarity between the training and testing strings, similar to the previous papers. The differences between low and high similarity are explained in the paper. Specifically, for each test string, we calculated chunk strength by summing the frequencies of all relevant substrings (e.g., bigrams and trigrams) that appeared in the training set. The test strings whose chunk‐strength values fell above the median for grammatical items were classified as “high similarity,” while those falling below the median were classified as “low similarity.” Also, grammatical strings can be of both low and high similarity; this is precisely the beautiful aspect of this experimental manipulation, showing the importance of uncertainty. We have utilized a 2 × 2 fully orthogonal design (grammaticality × similarity).
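
      The chunk-strength calculation and median split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the training strings are hypothetical, chunks are taken to be all contiguous bigrams and trigrams, chunk strength is the plain sum of training frequencies as described in the response, and items exactly at the median are assigned to "low" (the response does not say how ties were handled).

```python
from collections import Counter
from statistics import median


def chunks(string, sizes=(2, 3)):
    """All contiguous substrings of the given lengths (bigrams, then trigrams)."""
    return [string[i:i + n] for n in sizes for i in range(len(string) - n + 1)]


def chunk_strength(test_string, training_counts):
    """Sum of the training-set frequencies of every chunk in the test string."""
    # Counter returns 0 for chunks that never occurred during training.
    return sum(training_counts[c] for c in chunks(test_string))


def split_by_similarity(strengths):
    """Median split: above the median is 'high', otherwise 'low' (tie rule assumed)."""
    cutoff = median(strengths.values())
    return {s: ("high" if v > cutoff else "low") for s, v in strengths.items()}


# Hypothetical training strings, for illustration only.
training = ["XVXJ", "XXVT", "VXJJ"]
training_counts = Counter(c for s in training for c in chunks(s))
```

      Because chunk strength depends only on substring overlap with the training set, it can be computed for grammatical and non-grammatical test strings alike, which is what makes the 2 × 2 design (grammaticality × similarity) possible.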

      Experimental details of the task should be added to the Method section. In the results you should only mention the experimental details that are necessary for understanding the experiments, but details such as the number of trials, etc, can be moved to the methods. 

      We have now moved the experimental task details to the Method sections.

      Reviewer #1 (Recommendations for the author):

      Studies have been done online and not in the lab. Could that have affected the results?

      We addressed this in the Methods section, referring to established protocols for online neuropsychological testing[9–12]. Our results align with similar in-lab findings in both the subtraction and AGL tasks, supporting the online approach's robustness. 

      Figure 2, B1; Figure 4, B1; Figure 6B: How many patients performed worse than the (worst-performing) controls? There appears to be quite some overlap between patients and controls. In the patients who performed worse, was there any difference from the other patients (e.g. disease severity as assessed by SARA score, repeat length, data of attention probes)?

      We appreciate the reviewer’s thoughtful comment. We considered conducting individual-level comparisons to identify patients who performed worse than the lowest-performing controls. However, defining "worse" based on the performance of the lowest control is only one possible criterion. Other definitions—such as a specific number (1/2/3?) of standard deviations below the control mean—are also commonly used in literature, and each may yield different conclusions. This variability highlights the lack of a standardized threshold for what constitutes “worse” or "impaired" performance at the individual level. Given this ambiguity, and in line with prior studies that focus on average group differences rather than “impairment” prevalence, we chose not to include these individual-level comparisons. We believe this approach better aligns with the goals and design of the current study. That said, we agree that examining individual variability is important and may be more appropriate in future studies with larger samples so that percentage is a more robust measure. However, given the rarity of the disease, this would also be a challenge for future studies.  
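
      The dependence on the chosen criterion can be made concrete with a short sketch. The function name is hypothetical, the code assumes that higher scores mean better performance, and it is meant only to show how the count of "impaired" patients changes with the cutoff definition; it does not use or reproduce any data from the study.

```python
from statistics import mean, stdev


def count_impaired(patient_scores, control_scores, n_sd=None):
    """Count patients scoring below a control-based cutoff.

    n_sd=None uses the worst-performing control as the cutoff;
    otherwise the cutoff is the control mean minus n_sd sample SDs.
    """
    if n_sd is None:
        cutoff = min(control_scores)
    else:
        cutoff = mean(control_scores) - n_sd * stdev(control_scores)
    return sum(score < cutoff for score in patient_scores)
```

      Calling this function on the same two groups with n_sd set to None, 1, or 2 can return three different counts, which is the ambiguity referred to above.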

      SARA ataxia scale does not include oculomotor function. In SCA6 oculomotor deficits are frequent, eg, downbeat nystagmus. Please include information on oculomotor dysfunction.

      We thank the reviewer for this important observation. While it is true that the SARA scale does not explicitly assess oculomotor function, our experimental design – in all three experiments – has control conditions that help account for general processing differences, including those that could arise from oculomotor deficits. These conditions, such as the correct trials and the complexity effects, allow us to isolate effects specifically related to the violation of expectation while minimizing the influence of broader performance factors, such as eye movement abnormalities. We also note that, while some patients can experience oculomotor symptoms such as downbeat nystagmus, none of our tasks required precise visual tracking or gaze shifts. In our experimental tasks, stimuli were centrally presented, and no visual tracking or saccadic responses were required. Moreover, the response time windows and stimulus durations (>2–5 s) were sufficient to mitigate the effects of delayed visual processing due to oculomotor impairment.

      Why was MoCA used and not the CCAS-Schmahmann scale to assess cognitive function?

      We selected the MoCA due to its broad clinical utility, time efficiency, and ability to detect mild cognitive impairment specifically in CA[101,102].  

      Were there any signs of depression in the patient group that could have affected the results?

      None of the patients had a clinical diagnosis of depression or were undergoing psychiatric treatment.  

      "Additionally, the interaction between group and expectancy was insignificant when RT was the depended vaibale .." = variable

      This has been corrected to "variable" in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The terms 'unexpected' and 'expected' conditions are confusing. [...] Terming this 'violation of expectation' seems unnecessarily complicated to me. 

      We thank the reviewer for raising this important concern. We recognize that the terms "expected" and "unexpected" can be ambiguous without clarification, and that "violation of expectation" (VE) may initially appear unnecessary. Our choice to use VE terminology is grounded in an established theoretical framework that distinguishes between mere stimulus correctness and prediction mechanisms. Specifically, VE captures the internal processing of mismatches between anticipated and observed outcomes, which we believe is central to the cerebellar function under investigation. While simpler, technical alternatives (e.g., "correct" vs. "incorrect") could describe the stimuli, we find that VE more accurately reflects the mental constructs under study and is consistent with previous literature in both motor and cognitive domains. 

      Both tasks provide an error (or violation of expectation) that is non-informative and therefore unlikely to be used to update a forward model. The authors draw on motor literature to formulate a cognitive task where the presence of an error would engage the cerebellum and lead to longer reaction times in cerebellar patients. But in the motor domain, mismatch of sensory feedback and expectations would lead to an updating of the internal forward model. It seems unlikely to me in the arithmetic and alphabetic addition tasks that patients would update their internal model of addition according to an error presented at the end of each trial. If the error processed in these tasks will not lead to the updating of the internal forward model, can the authors discuss to what extent the cerebellum will be engaged similarly in these tasks, and what exactly connects cerebellar processing in these motor and cognitive tasks.

      We thank the reviewer for this thoughtful and important comment. We fully agree that the current tasks do not directly probe learning-related updating of internal models. As stated in the paper, the goal of the present study was not to support or refute a specific claim regarding the cerebellum’s role in learning processes. Rather, our focus was on examining cerebellar involvement in the processing of VE. While we were inspired by models from the motor domain, our design was not intended to induce learning or adaptation per se, but to isolate the processing of unexpected outcomes. We agree that the tasks in their current form are unlikely to engage forward model updating in the same way as in sensorimotor adaptation paradigms. That said, we believe the current findings can serve as a basis for future research exploring the relationship between cerebellar prediction error processing and learning over time. As we also noted in the paper, this is a direction we propose and are actively pursuing in ongoing research.

      The colour scheme is difficult for anyone with colour blindness or red-green visual impairment. Please adjust.

      All figures have been revised to use CMYK-compatible, colorblind-safe palettes, and neon colors have been removed.

      The introduction is a bit difficult to understand, because the authors draw on a number of different theories about cerebellar functioning, without clearly delineating how these relate to each other. For example: a) In the paragraph beginning with 'notably': If the cerebellum is required for sequential operations, why does it show the impairment with the rotation of the letters?

      We understand the concern that if the cerebellum is involved in sequential operations, its involvement in mental letter rotation, which can be assumed as “continuous transformation,” may appear contradictory. We note that the boundary between continuous and stepwise, procedural operations is not always clear-cut and may vary depending on the participant's strategy or previous knowledge, which is not fully known to the researchers. Furthermore, to our knowledge, prior work on mental rotation has not directly investigated the impact of VE during this task. However, these are two debatable considerations. 

      More importantly, a careful reading of our paper suggests that our experiments were designed to examine VE within tasks that involve sequential processing. Notably, we are not claiming that the cerebellum is involved in sequential or procedural processing per se. Rather, our findings point to a more specific role for the cerebellum in processing VE that arises during the construction of multistep procedural tasks. In fact, the results indicate that while the cerebellum may not be directly involved in the procedural process itself, it is critical when expectations are violated within such a context. This distinction is made possible in our study by the inclusion of a control condition (the complexity effect), which allows for a unique dissociation in our experimental design—one that, to our knowledge, has not been sufficiently addressed in previous studies.

      Additionally, in the case of arithmetic problem solving (such as the tasks used in prior studies cited in our manuscript[21]), there is substantial evidence that these problems are typically solved through stepwise, procedural operations. Arithmetic reasoning, used in Experiments 1 and 2, has been robustly associated with procedural, multi-step strategies, which may be more clearly aligned with traditional views of cerebellar involvement in sequential operations. Thus, we propose that the role of the cerebellum in continuous transformations should be further examined.

      We suggest a more parsimonious theory: that the cerebellum contributes to VE, a topic that has been extensively examined before. To reconcile our findings with previous ones, we propose that the cerebellum’s contribution may not be limited to either continuous or stepwise operations per se, but rather to a domain-general process: the processing of VE. This theoretical framework can explain performance patterns across both mental rotation tasks and stepwise, procedural arithmetic.

      The authors mention generation prediction as a function of the cerebellum, processing of prediction errors (or violations of expectations), sequentially, and continuous transformations - but it is unclear whether the authors are trying to dissociate these from each other or whether ALL of these functions have informed task design.

      We propose that the cerebellum’s contribution may not be limited to either continuous transformations or stepwise, procedural operations per se, but rather to a domain-general process: the processing of VE. We would like to clarify that we do not claim the cerebellum contributes to continuous transformations only, as suggested in some earlier work[21]. Rather, it could be that the cerebellum may contribute to continuous transformations, but we propose that it also supports multi-step, procedural processes. Given that framework, in the current study, across three separate experiments, we demonstrated that the cerebellum can also contribute to procedural, multi-step reasoning tasks.  

      Minor Comments

      Typo under paragraph beginning with 'notably' - cerebellum role should be cerebellar role.

      Corrected as suggested.

      When mentioning sequences as a recruiting feature for the cerebellum in the introduction, Van Overwalle's extensive work in the social domain should be referenced for completeness.

      Thank you for the suggestion. We have now cited Van Overwalle’s work on cerebellar involvement in sequence processing within the social domain in the revised Introduction.

    1. eLife Assessment

      This study provides fundamental insights into eukaryotic phosphate homeostasis by demonstrating how yeast vacuoles dynamically regulate cytosolic phosphate levels. The conclusions are convincing, supported by an elegant combination of in vitro assays and in vivo measurements. This study will be of interest to cell biologists, particularly for those who are working in the field of phosphate metabolism.

    2. Reviewer #1 (Public review):

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyze polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool.

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including:

      (1) Figure 3: The authors use in their assays 1 mM ZnCl2 or 1 mM MgCl2. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca2+, as this ion is also highly concentrated in the lumen?

      (2) Regarding the concentration of 30 mM K-PI, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one in their assay. A titration is necessary here.

      (3) What are the consequences on vacuole morphology if the cells lack Pho91?

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system?

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes.

      Strengths:

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation.

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically.

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis-driven by ATP and the vacuolar proton gradient-highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis.

      Weaknesses:

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, and not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field.

    4. Reviewer #3 (Public review):

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization.

      They can recapitulate in their previously established system, using isolated yeast vacuoles, findings from their own and other groups. They integrate the available data and propose a working model of feedback loops to control the level of Pi on the cellular level.

      This is a solid study, in which the biological significance of their findings is not entirely clear. The data analysis and statistical significance need to be improved and included, respectively. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study.

    5. Author response:

      Reviewer #1 (Public review): 

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyse polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool. 

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including: 

      (1) Figure 3: The authors use in their assays 1 mM ZnCl2 or 1 mM MgCl2. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca2+, as this ion is also highly concentrated in the lumen?

      The concentrations inside vacuoles can reach those values. However, given that polyP is a potent chelator of divalent metal ions, what would matter are the concentrations of free Zn<sup>2+</sup> or Mg<sup>2+</sup> inside the organelle. These are not known. This is not critical since we use those two conditions only as a convenient tool to differentiate Ppn1 and Ppn2 activity in vitro. In our initial characterisation of Ppn2 (10.1242/jcs.201061), we had also tested Mn, Co, Ca, Ni, Cu. Only Zn and Co supported activity. Ca did not. Andreeva et al. (10.1016/j.biochi.2019.06.001) reached similar conclusions and extended our results.

      (2) Regarding the concentration of 30 mM K-Pi, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one concentration in their assay. A titration is necessary here.

      The concentration of 30 mM was not arbitrarily chosen. It is the luminal P<sub>i</sub> concentration that the vacuoles reached when they entered a plateau of luminal P<sub>i</sub>. We consider this an upper limit because polyP kept increasing while luminal P<sub>i</sub> did not. Thus, there is in principle no physiological motivation for trying higher values. But we will probably add a titration to the revised version.

      (3) What are the consequences on vacuole morphology if the cells lack Pho91? 

      We had not observed significant abnormalities during a screen of the genome-wide deletion collection of yeast (10.1371/journal.pone.0054160).

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly. 

      We will pick this up in the discussion. However, the situation is much more complex because major pools of counterions (up to hundreds of mM) are constituted by vacuolar lysine, arginine, polyamines, Mg, Zn etc. Their interplay with polyP is probably complex and worth treating in a dedicated project.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system? 

      Mammalian cells have their Pi exporter XPR1 mainly on a lysosome-like compartment (10.1016/j.celrep.2024.114316). Whether and how it functions there for Pi export from the cytosol is not entirely clear. We will address this situation in the revision.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes. 

      Strengths: 

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation. 

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically. 

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis, driven by ATP and the vacuolar proton gradient, highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis.

      Weaknesses: 

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, and not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field. 

      Yeast vacuoles show all major chemical features of acidocalcisomes. They are acidified, contain high concentrations of Ca, polyP (which make them electron-dense, too), other divalent ions, such as Mg, Zn, Mn etc, and high concentrations of basic amino acids. Thus, they clearly have an acidocalcisome-like character. In addition, they have hydrolytic, lysosome-like functions and, depending on the strain background, they can be larger than acidocalcisomes described e.g. in protists. We will elaborate this point, which is obvious to us but probably not to most readers, in the revised version.

      Reviewer #3 (Public review): 

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization. 

      They can recapitulate in their previously established system, using isolated yeast vacuoles, findings from their own and other groups. They integrate the available data and propose a working model of feedback loops to control the level of Pi on the cellular level. 

      This is a solid study, in which the biological significance of their findings is not entirely clear. The data analysis and statistical significance need to be improved and included, respectively. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study.

      It is not clear to us what the reviewer would see as a more rigorous test of the model.

    1. eLife Assessment

      This important study suggests that adolescent mice exhibit less accuracy than adult mice in a sound discrimination task when the sound frequencies are very similar. The evidence supporting this observation is solid and suggests that it arises from cognitive control differences between adolescent and adult mice. The adolescent period is largely understudied, despite its contribution to shaping the adult brain, which makes this study interesting for a broad range of neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure frequency tone Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across four auditory cortical regions to quantify the neuronal activity related to the behavior. At the single-cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed a higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were not only due to learning, but also to age, as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavioral results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice round out the research question in a convincing way.

      The analysis, including the systematic comparison of task performance across the two age groups, is most interesting and reveals differences in learning (or learning strategies?) that are compelling.

      Neuronal recording during both behavioral training and passive sound exposure is particularly powerful, and allows interesting conclusions.

      Weaknesses:

      The weaknesses listed by this reviewer were addressed by adequate revisions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how and how well adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      The weaknesses listed by this reviewer were addressed by adequate revisions.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure frequency tone Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across four auditory cortical regions to quantify the neuronal activity related to the behavior. At the single-cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed a higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were not only due to learning, but also to age, as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavioral results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice round out the research question in a convincing way.

      The analysis, including the systematic comparison of task performance across the two age groups, is most interesting, and reveals differences in learning (or learning strategies?) that are compelling.

      Neuronal recording during both behavioral training and passive sound exposure is particularly powerful, and allows interesting conclusions.

      Weaknesses:

      The presentation of the paper must be strengthened. Inconsistencies, missing information or confusing descriptions should be fixed.

      We have carefully re-read the manuscript and reviewed it for inconsistencies. We made several corrections in the figures. For example, we removed redundant lines from violin plots and statistics, applied consistent labels, matched y- and x-limits of graphics, and adjusted labels. We also clarified descriptions of some experiments by adding explanations to the text.

      The recording electrodes cover regions in the primary and secondary cortices. It is well known that these two regions process sounds quite differently (for example, one has tonotopy, the other not), and separating recordings from both regions is important to conclude anything about sound representations. The authors show that the conclusions are the same across regions for Figure 4, but is it also the case for the subsequent analysis? Compared to the original manuscript, the authors have now done the analysis for AUDp and AUDv separately, and say that the differences are similar in both regions. The data, however, show that this is not the case (Fig S7). And even if it were the case, how would it be compatible with the published literature?

      To address this and previous concerns about regional differences, the manuscript now includes 4 figures (4-1, 4-3, 6-2, 7-1) and 5 supplemental tables (3, 4, 5, 6, 8) that explicitly compare results across brain regions.

      Following the reviewer’s request for subsequent analysis, we now added a new supplemental figure (Fig. S6-2) and two new supplementary tables (Tables S5, S6). We show that similar to expert mice (supplementary Table 3, and supplementary Table 4), the firing properties of adolescent and adult novice mice differ across auditory subregions (supplementary Table 5). We also show that the different auditory subregions have different firing properties (supplementary Table 6). With respect to task engagement, we show that (similar to Fig. S4-2) the neuronal discriminability in different auditory subregions is similar in both novice and expert mice (Fig. S6-2).

      Following the comment on Fig. S7-1, we made three changes to the revised manuscript. First, we now highlight that the differences in firing properties between adolescent and adult neurons in AUDp and AUDv were distinct, but not significantly different within age-group comparisons. Second, we clearly state that the learning-related changes in the measured parameters are different between AUDp and AUDv. Note, however, that the greater changes in adult neurons after learning remain consistent between AUDp and AUDv. Third, we softened our original claim but still highlighted the stronger learning-induced plasticity in adults.

      Regarding the concern that different regions should show different patterns due to their known differences (e.g. tonotopy): we of course agree that different areas differ functionally (as shown in our own previous work and here as well). However, it is still plausible, and biologically reasonable, that developmental changes may proceed in a similar direction across different areas, even if their baseline coding properties differ.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how and how well adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We have carefully re-read the manuscript and reviewed it for analyses that lacked a clear rationale or conclusion. To address this, we have made several changes to clarify the reasoning and strengthen the interpretation of the results.

      Reviewer #1 (Recommendations for the authors):

      It would have helped if the authors had highlighted the changes they made to the manuscript compared to the original version - especially since many replies to the reviewers' comments were as vague as "...we fixed some of the wording so it adheres to the data shown", or "we refined our interpretation", without further details.

      The revised version has improved substantially, and the main claims have been discussed in a more objective way. Important new analyses have been added to allow for a refined interpretation of the results. However, the presentation of the data could still be strengthened significantly (in response to comment A from last review).

      We apologize for the lack of detail in some of our previous responses. Our intention was to keep the replies concise, assuming that the side-by-side version with tracked changes would make the edits sufficiently clear. However, we understand the need for greater transparency. Thus, below we provide the following five lists describing the major changes: (1) List of specific reviewer recommendations, (2) list of corrections in figures, (3) list of clarity issues, (4) list of fixed mistakes, (5) list of new figures. We hope this breakdown makes the revisions clearer and more accessible.

      List of specific reviewer recommendations:

      l.108 mentions a significant change in the vertical line of Fig 1F - Could this significance be indicated and quantified in the figure?

      We quantified and indicated the significance of the vertical line in Fig. 1f and Fig. 1i.

      Fig.1G - the thick and thin lines should be defined, as well as the grey and white dots (same values for adolescents, not for adults).

      (a) We removed the thin inner lines from the violin plot. We define the bar (thick line) of the violin plot in an additional sentence in the methods section under data analysis (LL820-823). (b) We adjusted the marker outlines in the adult data (Fig. 1G).

      the figure axis legends should be consistent (trials in Fig 1D vs # trials in Fig 1F)

      We adjusted the axis legend to # trials in Fig. 1D.

      l.110: is d' always calculated based on the 100 last trials of a session, or is it just for Figure 1F? -etc...

      d’ is always calculated based on the last 100 trials. To clarify this, we added a description in the methods section (L830).
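      For readers unfamiliar with the measures discussed here, the following is a minimal sketch of how d' and the criterion (the "lick bias") could be computed from the last 100 trials of a session. It is an illustration under stated assumptions, not the authors' code: the function name `sdt_measures`, the outcome labels, and the log-linear correction for extreme rates are all hypothetical choices that are not specified in the response.

```python
from statistics import NormalDist


def sdt_measures(outcomes, n_last=100):
    """Return (d_prime, criterion) computed on the last `n_last` trials.

    `outcomes` is a sequence of "Hit", "Miss", "FA" or "CR" labels
    (hypothetical encoding, chosen for this sketch only).
    """
    window = list(outcomes)[-n_last:]
    n_go = sum(o in ("Hit", "Miss") for o in window)
    n_nogo = sum(o in ("FA", "CR") for o in window)
    if n_go == 0 or n_nogo == 0:
        raise ValueError("need at least one Go and one No-go trial")

    # Assumption: a log-linear correction keeps both rates away from
    # 0 and 1, where the inverse normal CDF is undefined.
    hit_rate = (window.count("Hit") + 0.5) / (n_go + 1)
    fa_rate = (window.count("FA") + 0.5) / (n_nogo + 1)

    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative = tendency to lick
    return d_prime, criterion
```

      With this convention, a more negative criterion corresponds to a stronger tendency to lick regardless of the stimulus, which is the sense in which the response uses "higher lick bias".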

      List of corrections in the figures:

      (1) We removed the internal lines from violin plots throughout Fig. 1-7.

      (2) We removed the underline of the statistics throughout Fig. 1-7.

      (3) We consistently applied ‘adolescent’ and ‘adult’ figure labels and titles with lowercase letters throughout Fig. 1-7.

      (4) We applied consistent labelling of ‘time (ms)’ throughout Fig. 1-7.

      (5) We matched the size of dashed lines throughout Fig. 1-6.

      (6) We adjusted the x-label of Fig. 1d, Fig. S1-1a, Fig. 3c, Fig. 3h-i, Fig. 4d to ‘# trials’.

      (7) We removed the x-label of ‘Experimental Group’ from Fig. 1 to enhance consistency with other figures.

      (8) We removed misaligned dots from the violin plots in Fig. 1g, Fig. 2f, Fig. 3f,g.

      (9) We corrected the plot in Fig. S1-1b.

      (10) We adjusted the y-limits of Fig. S1-1c to be consistent with Fig. S1-1d,e.

      (11) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (12) We added the age of adolescent and adult mice to the schematic timeline in Fig. 2a.

      (13) We added a label of the reinforcement delay to the schematic trial structure in Fig. 3b.

      (14) We added within-group statistics to Fig. 3e and the figure legend.

      (15) We adjusted the x-label of Fig. 3d to ‘# sessions’.

      (16) We adjusted the x-label of Fig. 3d and Fig. S3-1b to ‘# licks’.

      (17) We changed the y-label in Fig. S3-1a, and Fig. S3-2d, e to ‘lick ratio’ to avoid confusion with the lick rate (Hz) that was calculated in Fig. 4 and Fig. 6.

      (18) We replaced the titles ‘CAMKII’ with ‘dTomato’ in Fig. S3-2 to correctly highlight that both the experimental and control injection were CAMKII injections.

      (19) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (20) We adjusted the y-label of Fig. S4-1c to ‘# neurons’.

      (21) We matched the x-ticks in Fig. 4e,f.

      (22) We matched the x-ticks in Fig. 6d-g.

      (23) We changed the x-label in Fig. 4g, S4-2 and S6-2 to ‘duration (ms)’ to match the figure label with the manuscript.

      (24) We consistently label ‘Hit’, ‘Miss’, ‘FA’ and ‘CR’ with capital letters in Fig. 4d-e.

      (25) We replaced the double figure label ‘C.’ in Fig. S4-2 with ‘D.’.

      (26) We adjusted the dot-size in Fig. 5 to be equal for all graphs.

      (27) We added ticks to the experimental timeline in Fig. 6a.

      (28) We corrected the y-label in Fig. 7c. Now it correctly reflects 5 attenuations from 72-32 dB SPL.

      (29) We matched the y-label of Fig. 7e-h and Fig. S7-1.

      List of clarity issues:

      (1) We replaced the term ‘lower response bias’ with ‘higher lick bias’ (L24) to accurately describe the more negative (lower) criterion-bias, which highlights a higher tendency to lick.

      (2) We replaced the term ‘response bias’ with ‘lick bias’ to consistently describe the calculated criterion-bias (L24, L149, L164, L455, L456, L468).

      (3) We clarify that the age-related differences were ‘more pronounced’ instead of simply ‘higher’ to accurately reflect not simply the increase in adolescent lick-bias, but also the decrease in adult lick-bias (L31).

      (4) We clarified that adolescent sound representations are not merely ’distinct’, but ‘not fully mature’ in L83.

      (5) We clarified in L180 that the impulsive responses we observed in adolescent mice could be related to being ‘less impacted by punishments’.

      (6) We clarified the differences in firing properties of auditory sub-regions analyzed in Supplementary Table 3 (L287-295).

      (7) We explained and clarified the reference to Fig. 3j (LL252-253).

      (8) We added statistics to Fig. S4-2 to support our claim that there are no differences in the onset-latency, duration of discriminability and maximal discriminability between different sub-regions within age-groups (LL314-315).

      (9) We expanded our explanation of the results in Table 3 (LL370-379).

      (10) We separated the reference to Fig. 6b and Fig. 6c to clarify their meaning (LL358-361).

      (11) We clarified the differences in basic firing properties during the FRA protocol in Fig. 7 (LL409-418).

      (12) We expanded our explanation of the differences of the learning related firing properties in AUDp and AUDv of Fig. S7-1 (LL426-433).

      (13) We changed the term ‘plasticity profiles’ to ‘learning related plasticity’ to further clarify our limitation that L5/6 and L2/3 may exhibit distinct learning related changes (L496).

      (14) We changed the term ‘sluggish’ (L481) to ‘delayed’ to more precisely explain differences between adolescent and adult tuning properties.

      (15) We clarified that the running d’ was calculated in bins of 25 trials, instead of ‘the last 25 trials’ (LL845-846).
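      The distinction drawn in item (15), a running d' computed per bin of 25 trials rather than over "the last 25 trials", can be sketched as follows. This is a hedged illustration only: the function name `running_dprime`, the boolean trial encoding, the use of non-overlapping bins, and the log-linear correction are assumptions made for the example and are not taken from the manuscript.

```python
from statistics import NormalDist


def running_dprime(is_go, licked, bin_size=25):
    """Return one d' value per consecutive, non-overlapping bin of trials.

    `is_go[i]` is True on Go trials; `licked[i]` is True if the animal
    licked on trial i (hypothetical encoding). A trailing bin shorter
    than `bin_size` is dropped.
    """
    z = NormalDist().inv_cdf
    values = []
    for start in range(0, len(is_go) - bin_size + 1, bin_size):
        go = is_go[start:start + bin_size]
        lick = licked[start:start + bin_size]
        n_go = sum(go)
        n_nogo = bin_size - n_go
        hits = sum(1 for g, l in zip(go, lick) if g and l)
        false_alarms = sum(1 for g, l in zip(go, lick) if not g and l)
        # Assumption: log-linear correction so that rates of exactly
        # 0 or 1 within a small bin do not produce infinite z-scores.
        hit_rate = (hits + 0.5) / (n_go + 1)
        fa_rate = (false_alarms + 0.5) / (n_nogo + 1)
        values.append(z(hit_rate) - z(fa_rate))
    return values
```

      Each bin yields an independent estimate, so the resulting curve tracks performance across a session instead of summarising only its final trials.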

      List of fixed mistakes:

      (1) We corrected and matched the age to more accurately reflect the age at which mice were recorded (P37-42 and P77-82).

      (2) We corrected the attenuation range from 72-42 to 72-32 dB SPL to correctly reflect the 5 attenuations used in the protocol.

      (3) We corrected the number of channels shown in the voltage trace from 10 to 11 (Fig. S4-1a).

      (4) We corrected the number of neurons recorded in novice adolescent mice in the legend of Fig. 6 from 140 to 130 (Fig. 6b).

      (5) We removed redundant, or double brackets, commas, dots, and semi-colons in the figure legends.

      (6) We corrected the LME statistics in Table 2.

      List of new figures and tables:

      (1) We added a new supplementary figure to accompany Figure 6. Specifically, Fig. S6-2 shows the interaction of the three measured discriminability properties (onset delay, duration of discriminability, and maximal discriminability) in novice compared to expert mice in the easy and hard task (Go compared to No Go). The figure compares the different auditory sub-regions (similar to Fig. S4-2). We show that the discriminability properties within different groups are not significantly different among the four different sub-regions.

      (2) Supplementary Table 5: We compared the firing properties in different auditory subregions in novice mice, and found (similar to expert mice) that the firing properties differ between adult and adolescent mice across the four different sub-regions.

      (3) Supplementary Table 6: We compared the firing properties between different subregions, separately for adolescent and adult novice mice. Similar to expert mice, we found that different auditory subregions differ in their auditory firing properties.

      Reviewer #2 (Recommendations for the authors):

      The authors largely addressed my suggestions.

      Comparing hit vs correct rejection trials in the population decoding analysis (L313-314): The authors acknowledge that comparing these two trial types conflates choice and stimulus decoding but I am not convinced that the changes to the manuscript text make this clear enough to the reader.

      Thank you for pointing this out. We have made additional revisions to clarify this and other issues more explicitly, as follows:

      (1) We have expanded the explanation of how our population decoding analysis conflates stimulus and choice, and we acknowledge the limitations of this approach in the Abstract (L28), the Results section (L324-326, LL367-370) and the Discussion (LL516-519).

      (2) We replaced the analysis of impulsivity on the head-fixed task. Instead of analyzing all ITIs, we focus only on ITIs following FA trials (Fig. S3-1c,d). This is more consistent with the analysis in the Educage (Fig. S2-1), where we show that adolescents exhibit increased impulsivity after FA trials. We found a similar result for ITIs following FA trials in the head-fixed task.

      (3) To provide complementary insight, we now further justify our use of the Fisher separation metric alongside decoding accuracy in Figure 5, with a clearer rationale provided in LL343-345.

      (4) We also clarified our reasoning for focusing on 62 dB SPL in the FRA-based analysis in LL400-403.

    1. eLife Assessment

      This study presents a valuable finding on the representational structure of task encoding in the prefrontal cortex. The evidence supporting the claims of the authors is solid, representing an impressive data collection effort and best-practice fMRI analyses. However, at least including visual regions as a control and controlling for behavioral differences in the task in representation analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists interested in the neural basis of cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      Bhandari and colleagues present tour-de-force analyses that compare the representational geometry in the lateral prefrontal cortex and primary auditory cortex between two complex cognitive control tasks, with one having a "flat" structure where subjects are asked to form rote memory of all the stimulus-action mappings in the task and one having a "hierarchical" task structure that allows clustering of task conditions and that renders certain stimulus dimensions irrelevant for choices. They discovered that the lPFC geometry is high-dimensional in nature in that it allows above-chance separation between different dichotomies of task conditions. The separability is significantly higher for task-relevant features than task-irrelevant ones. They also found task features that are represented in an "abstract" format (e.g., audio features), i.e., the neural representation generalizes across specific task conditions that share this variable. The neural patterns in lPFC are highly relevant for behaviors as they are correlated with subjects' reaction times and choices.

      Strengths:

      Typically, geometry in coding patterns is reflected in single-unit firings; this manuscript demonstrates that such geometry can be recovered using fMRI BOLD signals, which is both surprising and important. The tasks are well designed and powerful in revealing the differences in neural geometry, and analyses are all done in a rigorous way. I am thus very enthusiastic about this paper and identify no major issues.

      I am curious about the consequence of dimensionality collapse in lPFC. The authors propose a very interesting idea that separability is critical for cognitive control; indeed, separability is high for task-relevant information. What happens when task-relevant separation is low or task-irrelevant separation is high, and will this lead to behavioral errors? Maybe a difference score between the separability of task-relevant and task-irrelevant features is a signature of the strength of cognitive control?

      Weaknesses:

      The authors show a difference between flat and hierarchical tasks, but the two tasks are different in accuracy, with the flat task having more errors. Will this difference in task difficulty/errors contribute to the task differences in results reported?

    3. Reviewer #2 (Public review):

      Summary:

      The authors study the influence of tasks on the representational geometry of the lPFC and auditory cortex (AC). In particular, they use two context-dependent tasks: a task with a hierarchical structure and a task with a flat structure, in which each context/stimulus maps to a specific response. Their primary finding is that the representational geometry in the lPFC, in contrast to AC, aligns with the optimal organization of the task. They conclude that the geometry of representations adapts, or is tailored, to the task in the lPFC, therefore supporting control processes.

      Strengths:

      (1) Dataset:

      The dataset is impressive and well-sampled. Having data from both tasks collected in the same subjects is a great property. If it is publicly available, it will be a significant contribution to the community.

      (2) Choice of methods:

      The choice of analyses is largely well-suited to the questions at hand - cross-condition generalization, RSA + regression, in combination with ANOVAs, are well-suited to characterizing task representations.

      (3) I found some of their results, in particular, those presented in Figures 4 and 5, to be particularly compelling.

      (4) The correlation analysis with behavior is also a nice result.

      Weaknesses:

      (1) Choice of ROIs:

      A strength of fMRI is its spatial coverage of the whole brain. In this study, however, the authors focus on only two ROIs: the lPFC and auditory cortex. Though I understand the justification for choosing lPFC from decades of research, the choice of AC as a control feels somewhat arbitrary - AC is known to have worse SNR in fMRI data, and limiting a 'control' to a single region seems arbitrary. For example, why not also include visual regions, given that the task also involves two visual features?

      (2) Construction of ROIs:

      The choice and construction of the ROIs feel a bit arbitrary, as the lPFC region was constructed out of 10 parcels from Schaefer, while the AC was constructed from a different methodology (neurosynth). Did both parcels have the same number of voxels/vertices? It would be helpful to include a visualization of these masks as a figure.

      (3) Task dimensionality:

      In some ways, the main findings - that representation dimensionality is tailored to the task - seem to obviously follow from the choice of two tasks, particularly from a normative modeling perspective. For example, the flat task is effectively a memorization task, and is incompressible in the sense that there are no heuristics to solve it. In contrast, the hierarchical task can have several strategies, an uncompressed (memorized) strategy, and a compressed strategy. This is analogous to other studies evaluating representations during 'rich' vs. 'lazy'/kernel learning in ANNs. However, it seems unlikely (if not impossible) to form a 'rich' representation in the flat task. Posed another way, the flat task will always necessarily have a higher dimensionality than the hierarchical task. Thus, is their hypothesis - that representational geometry is tailored to the task - actually falsifiable? I understand the authors posit alternative hypotheses, e.g., "a fully compressed global axis with no separation among individual stimulus inputs could support responding [in the flat task]" (p. 36). But is this a realistic outcome, for example, in the space of all possible computational models performing this task? I understand that directly addressing this comment is challenging (without additional data collection or modeling work), but perhaps some additional discussion around this would be helpful.

      (4) Related to the above:

      The authors have a section on p. 27: "Local structure of lPFC representational geometry of the flat task shows high separability with no evidence for abstraction" - I understand a generalization analysis can be done in the feature space, but in practice, the fact that the flat task doubles as a memorization task implies that there are no useful abstractions, so it seems to trivially follow that there would be no abstract representations. In fact, the use of task abstractions in the stimulus space would be detrimental to task performance here. I could understand the use of this analysis as a control, but the phrasing of this section seems to indicate that this is a surprising result.

      (5) Statistical inferences:

      Throughout the manuscript, the authors appear to conflate failure to reject the null with acceptance of the null. For example, p. 24: "However, unlike left lPFC, paired t-tests showed no reliable difference in the separability of the task-relevant features vs the orthogonal, task-irrelevant features... Therefore, the overall separability of pAC representations is not shaped by either task-relevance of task structure."

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Bhandari, Keglovits, et al. explore the representational structure of task encoding in the lateral prefrontal cortex. Through an impressive fMRI data-collection effort, they compare and contrast neural representations across tasks with different high-level stimulus-response structures. They find that the lateral prefrontal cortex shows enhanced encoding of task-relevant information, but that most of these representations do not generalize across conditions (i.e., have low abstraction). This appears to be driven in part by the representation of task conditions being clustered by the higher-order task properties ('global' representations), with poor generalization across these clusters ('local' representations). Overall, this paper provides an interesting account of how task representations are encoded in the PFC.

      Strengths:

      (1) Impressive dataset, which may provide further opportunities for investigating prefrontal representations.

      (2) Clever task design, allowing the authors to confound several features within a complex paradigm.

      (3) Best-practice analysis for decoding, similarity analyses, and assessments of representational geometry.

      (4) Extensive analyses to quantify the structure of PFC task representations.

      Weaknesses:

      (1) The paper would benefit from improved presentational clarity: more scaffolding of design and analysis decisions, clearer grounding to understand the high-level interpretations of the analyses (e.g., context, cluster, abstraction), and better visualizations of the key findings.

      (2) The paper would benefit from stronger theoretical motivation for the experimental design, as well as a refined discussion on the implications of these findings for theories of cognitive control.

    5. Author response:

      We thank the reviewers and editors for their careful and constructive assessment of our manuscript. We have provided a provisional response to the eLife assessment and the reviewers’ public comments below, addressing their main concerns and outlining the planned revisions that we believe will substantially strengthen our paper.

      eLife Assessment

      This study presents a valuable finding on the representational structure of task encoding in the prefrontal cortex. The evidence supporting the claims of the authors is solid, representing an impressive data collection effort and best-practice fMRI analyses. However, at least including visual regions as a control and controlling for behavioral differences in the task in representation analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists interested in the neural basis of cognitive control.

      We plan to address both specific methodological weaknesses mentioned in the assessment in our forthcoming revision. First, the revision will include analyses of an early visual cortex ROI as an additional control region, allowing us to test whether the primary auditory cortex findings generalize to the sensory cortex across input modalities. Preliminary results indicate that the early visual cortex ROI exhibits a similar pattern of results, with evidence for coding both task-relevant and task-irrelevant visual dimensions across both tasks, as well as the context dimension specifically in the hierarchy task. Second, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks to mitigate concerns over performance-related confounds. In addition, we will include a set of control analyses that demonstrate that equating the amount of data for pattern analyses across the two tasks by subsampling from the hierarchy task, while reducing our overall power, does not appreciably alter our results. We note that our analyses of representational geometries relied only on neural data from correct trials and, in the first-level modelling of the fMRI data, already controlled for differences in trial-by-trial response times. Therefore, our analyses of decoding and representation similarity are not directly affected by differences in performance across the two tasks. Finally, we have provided clarifications regarding Reviewer 2’s questions about the size and construction of the regions of interest employed in the study, as well as about the language employed to discuss null results.  

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bhandari and colleagues present tour-de-force analyses that compare the representational geometry in the lateral prefrontal cortex and primary auditory cortex between two complex cognitive control tasks, with one having a "flat" structure where subjects are asked to form rote memory of all the stimulus-action mappings in the task and one having a "hierarchical" task structure that allows clustering of task conditions and that renders certain stimulus dimensions irrelevant for choices. They discovered that the lPFC geometry is high-dimensional in nature in that it allows above-chance separation between different dichotomies of task conditions. The separability is significantly higher for task-relevant features than task-irrelevant ones. They also found task features that are represented in an "abstract" format (e.g., audio features), i.e., the neural representation generalizes across specific task conditions that share this variable. The neural patterns in lPFC are highly relevant for behaviors as they are correlated with subjects' reaction times and choices.

      Strengths:

      Typically, geometry in coding patterns is reflected in single-unit firings; this manuscript demonstrates that such geometry can be recovered using fMRI BOLD signals, which is both surprising and important. The tasks are well designed and powerful in revealing the differences in neural geometry, and analyses are all done in a rigorous way. I am thus very enthusiastic about this paper and identify no major issues.

      I am curious about the consequence of dimensionality collapse in lPFC. The authors propose a very interesting idea that separability is critical for cognitive control; indeed, separability is high for task-relevant information. What happens when task-relevant separation is low or task-irrelevant separation is high, and will this lead to behavioral errors? Maybe a difference score between the separability of task-relevant and task-irrelevant features is a signature of the strength of cognitive control?

      We appreciate the reviewers’ positive evaluation of our paper.

      Weaknesses:

      The authors show a difference between flat and hierarchical tasks, but the two tasks are different in accuracy, with the flat task having more errors. Will this difference in task difficulty/errors contribute to the task differences in results reported?

      To address the Reviewer’s concern about the difference in behavioural performance between the two tasks influencing our results, we will take several approaches. First, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks. This should ensure that any differences we observe across tasks are over and above those that can be explained by the difference in behavioral performance. Second, we will include a set of decoding analyses that control for differences in performance across the tasks. We note that all our analyses of representational geometries relied on neural data from correct trials only. In addition, the first-level modelling of the fMRI data already controlled for trial-by-trial variability in response times. Therefore, our decoding and representation similarity analyses should not be directly affected by differences in performance across the two tasks. However, one possible issue with this approach is that the larger number of errors in the flat task means that less data was available for estimating multivoxel patterns in the flat task compared to the hierarchy task, resulting in differential power to detect decoding effects across the two tasks. We note that, on average, this difference was not substantial: 21.7 runs were available per participant for the flat task, while 23.8 runs per participant were available for the hierarchy task. Moreover, rerunning our analyses with the number of runs equated for each participant does not meaningfully alter the pattern of results. These additional analyses will be included in the supplement in the forthcoming revised manuscript.
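
      The run-equating control described above can be illustrated with a minimal sketch. It assumes that equating means randomly subsampling each participant's run list down to the smaller of the two task counts; the response does not describe the actual procedure, and the function name, seed, and run counts below are hypothetical.

```python
import random


def equate_runs(flat_runs, hier_runs, seed=0):
    """Randomly subsample both run lists to the size of the smaller one.

    Hypothetical helper: the real selection rule is not given in the response.
    """
    rng = random.Random(seed)
    n = min(len(flat_runs), len(hier_runs))
    # Sampling n items from a list of length n returns all of them,
    # so the task with fewer runs is left intact.
    flat_sub = sorted(rng.sample(flat_runs, n))
    hier_sub = sorted(rng.sample(hier_runs, n))
    return flat_sub, hier_sub


# Hypothetical participant: 21 usable flat-task runs, 24 usable hierarchy-task runs.
flat_sub, hier_sub = equate_runs(list(range(21)), list(range(24)))
print(len(flat_sub), len(hier_sub))
```

      In practice the subsampling would probably be repeated over several random draws and the decoding results averaged, so that the outcome does not hinge on one particular subset of runs.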

      Reviewer #2 (Public review):

      Summary:

      The authors study the influence of tasks on the representational geometry of the lPFC and auditory cortex (AC). In particular, they use two context-dependent tasks: a task with a hierarchical structure and a task with a flat structure, in which each context/stimulus maps to a specific response. Their primary finding is that the representational geometry in the lPFC, in contrast to AC, aligns with the optimal organization of the task. They conclude that the geometry of representations adapts, or is tailored, to the task in the lPFC, therefore supporting control processes.

      Strengths:

      (1) Dataset:

      The dataset is impressive and well-sampled. Having data from both tasks collected in the same subjects is a great property. If it is publicly available, it will be a significant contribution to the community.

      (2) Choice of methods:

      The choice of analyses is largely well-suited to the questions at hand - cross-condition generalization, RSA + regression, in combination with ANOVAs, are well-suited to characterizing task representations.

      (3) I found some of their results, in particular, those presented in Figures 4 and 5, to be particularly compelling.

      (4) The correlation analysis with behavior is also a nice result.

      We thank the reviewer for noting the strengths of the paper. We respond to the weaknesses noted below. 

      Weaknesses:

      (1) Choice of ROIs:

      A strength of fMRI is its spatial coverage of the whole brain. In this study, however, the authors focus on only two ROIs: the lPFC and auditory cortex. Though I understand the justification for choosing lPFC from decades of research, the choice of AC as a control feels somewhat arbitrary - AC is known to have worse SNR in fMRI data, and limiting a 'control' to a single region seems arbitrary. For example, why not also include visual regions, given that the task also involves two visual features?

      We agree with the reviewer that the whole-brain fMRI data certainly provide ample opportunities to explore the nature of these representations across the brain. Our focus in this paper is squarely on the principles of coding and flexibility in the lPFC. We believe that a whole-brain exploration addresses a separate question that would be out of the scope of this study. To clarify, we are not arguing that the lPFC is the only region in the brain that employs the coding principles that our study brings to light. Our contention is only that lPFC employs these principles, and it differs at least from the primary sensory cortex. The questions of whether these principles generalize beyond lPFC (quite likely) and, if so, how broadly, are distinct from the ones addressed in the manuscript. We intend to follow up with another manuscript that addresses these questions.

      Nevertheless, given the focus of this paper, we agree that a second control region, which allows one to test if the primary auditory cortex findings generalize to the sensory cortex more broadly, would strengthen our claims. We will include an early visual cortex ROI in our forthcoming revision. Preliminary results indicate that the early visual cortex ROI shows a similar set of findings – with evidence for coding of task-relevant and task-irrelevant visual dimensions across both tasks, but also specifically the context dimension in the hierarchy task. These results will be detailed in the forthcoming revision.

      (2) Construction of ROIs:

      The choice and construction of the ROIs feel a bit arbitrary, as the lPFC region was constructed out of 10 parcels from Schaefer, while the AC was constructed from a different methodology (neurosynth). Did both parcels have the same number of voxels/vertices? It would be helpful to include a visualization of these masks as a figure.

      We defined the lPFC ROIs by selecting Schaefer parcels in the frontal lobe that were previously mapped onto the Control A resting state network identified by Yeo et al. (2011). This network aligns with the multiple-demand network, which has also been identified in the macaque, where it includes the lPFC regions that abut the principal sulcus. Prior results from these regions in the monkey brain provide the scientific premise for our hypotheses. The two lPFC ROIs, one per hemisphere, were each constructed out of 5 Schaefer parcels. These parcels cluster into the same functional network and tend to behave similarly in univariate analyses. Given that our hypotheses do not distinguish between the different parcels, we elected to improve power by merging them into left and right dlPFC ROIs.

      On the other hand, the same approach could not be used to identify the primary auditory cortex. As Yeo et al. noted in their paper, the 17 resting state networks they identify did not adequately parcellate somatomotor and auditory cortices into distinct networks, likely due to their proximity (see Fig 14 and related text in Yeo et al. (2011)). We therefore relied on a different approach to define the primary auditory cortex, using an association test in Neurosynth to obtain a map of regions associated with the term “primary auditory”. In the revised manuscript, we will also include an early visual cortex ROI, defined again using a term-based association test in Neurosynth.

      Our lPFC ROIs and pAC ROIs are of similar size. In the left hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 128-thru-132) has, on average, 624.55 voxels. The left pAC ROI (defined with Neurosynth) has, on average, 628 voxels. In the right hemisphere, the lPFC ROI (constructed from merging Schaefer parcels 330-thru-334) has 470.8 voxels on average. The right pAC ROI has, on average, 568 voxels. A table reporting the size of our parcels and ROIs was included in the supplement. In our forthcoming revision, we will additionally include a supplementary figure visualizing the ROI masks.

      (3) Task dimensionality:

      In some ways, the main findings - that representation dimensionality is tailored to the task - seem to obviously follow from the choice of two tasks, particularly from a normative modeling perspective. For example, the flat task is effectively a memorization task, and is incompressible in the sense that there are no heuristics to solve it. In contrast, the hierarchical task can have several strategies, an uncompressed (memorized) strategy, and a compressed strategy. This is analogous to other studies evaluating representations during 'rich' vs. 'lazy'/kernel learning in ANNs. However, it seems unlikely (if not impossible) to form a 'rich' representation in the flat task. Posed another way, the flat task will always necessarily have a higher dimensionality than the hierarchical task. Thus, is their hypothesis - that representational geometry is tailored to the task - actually falsifiable? I understand the authors posit alternative hypotheses, e.g., "a fully compressed global axis with no separation among individual stimulus inputs could support responding [in the flat task]" (p. 36). But is this a realistic outcome, for example, in the space of all possible computational models performing this task? I understand that directly addressing this comment is challenging (without additional data collection or modeling work), but perhaps some additional discussion around this would be helpful.

      We thank the reviewer for this comment, which gives us a chance to clarify our argument.

      As noted by the reviewer, whether a network takes advantage of the compressibility of a task depends on its learning regime (i.e. rich vs lazy). One way to frame our question regarding the lPFC’s coding strategy, then, is to ask whether it operates in a rich or a lazy learning regime (which would predict, respectively, task-tailored vs task-agnostic representations). The reviewer’s concern is that the two task structures we employed are differentially compressible, so that observing tailored representations is inevitable and our hypotheses are therefore not falsifiable.

      First, it is important to clarify the theoretical premise behind our design and how it relates logically to our hypotheses. Under a lazy learning regime, a network would encode high-dimensional representations of both tasks, regardless of their compressibility. On the other hand, under a rich learning regime, representational dimensionality will likely be shaped by the tasks’ structure. If the two tasks differ in their compressibility, only in the rich learning regime would the network learn representations of different dimensionality. Therefore, observing representations with dimensionality tailored to the task structure rules out the possibility that the lPFC is operating in a lazy regime. Therefore, the hypotheses are certainly testable.

      The second point of clarification is that, contrary to the reviewer’s assertion, the flat task is, in fact, compressible – the task can be solved with a categorical representation of the response categories, with no sensitivity to the different specific stimuli within each category. Indeed, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer, demonstrating this compressibility. While we agree with the reviewer that, in the space of all possible architectures one might consider, the two tasks may differ in compressibility, particularly at the local levels, this does not imply, as we noted above, that our hypotheses are not testable.
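
      The compressibility argument can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' simulation: the number of trial types (8), their assignment to the two response categories, and the one-hot input coding are all hypothetical, and the weights are set by hand rather than trained, simply to show that a two-unit hidden layer suffices in this setting.

```python
import numpy as np

# Hypothetical flat task: 8 trial types, each arbitrarily assigned
# to one of 2 response categories (assignment invented for illustration).
category = np.array([0, 1, 1, 0, 1, 0, 0, 1])
n_types = len(category)

# Assumed input coding: one one-hot vector per trial type.
X = np.eye(n_types)

# Input -> hidden weights: hidden unit k pools all trial types in category k.
W1 = np.zeros((n_types, 2))
W1[np.arange(n_types), category] = 1.0

# Hidden -> output weights: each hidden unit drives one response unit.
W2 = np.eye(2)

hidden = np.maximum(X @ W1, 0.0)  # ReLU hidden layer with 2 units
output = hidden @ W2
predicted = output.argmax(axis=1)

accuracy = (predicted == category).mean()
# Number of distinct hidden patterns: trial types within a category collapse.
n_distinct_hidden = len(np.unique(hidden, axis=0))
print(accuracy, n_distinct_hidden)
```

      In this toy network every trial type within a response category produces the same hidden pattern, so the hidden layer carries no information about individual stimuli. That corresponds to the fully abstracted, low-separability solution that the response argues the lPFC could in principle have adopted for the flat task.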

      Finally, as a third point of clarification, our focus in this paper is on understanding the nature of coding in the lPFC in particular. Arguments based on a normative modelling perspective properly apply to the representations learned by an agent (such as an ANN or a human) as a whole. In a minimal feedforward ANN with a single hidden layer trained in a regime which encourages compression (i.e. a rich learning regime), it would indeed be the case that the representational dimensionality in that hidden layer would be higher for less compressible tasks. However, when applied to humans, such an argument applies to the brain as a whole rather than to an individual region of the brain like the lPFC. As such, it is less straightforward to predict how a single region might represent a task without additional information about the region’s inputs, outputs and broader position in a network. Even for a highly compressible task, a particular brain region may nevertheless be sensitive to all task dimensions. Conversely, even when a task is not compressible, a particular population within the brain may be invariant to some task features. For example, the primary auditory cortex is expected to be invariant to visual task dimensions.

      Therefore, how a task is represented in the lPFC in particular (as opposed to the whole brain) depends on its computational function and coding principles, which remain debated. For instance, as some accounts (such as the guided activation theory) posit, if the primary function of the lPFC is to encode ‘context’ and shape downstream processing based on context, we might only expect to see the abstract coding of the auditory context in the hierarchy task (and, perhaps, the response categories across both tasks as they encode the ‘context’ for the lower-level response decision), while being invariant to lower-level features of the input. In our paper, we specifically contrast two accounts of lPFC coding that have emerged in the literature – one positing that the lPFC learns a representation tailored to the structure of the task, and another that the lPFC encodes a high-dimensional representation that privileges sensitivity to many task features and their non-linear mixture at the cost of generalization. Regardless of the compressibility of the tasks in question, how the lPFC encodes the two tasks is an empirical question.

      In our forthcoming revision, we will clarify these points in the discussion. We will also include the results of neural network simulations alluded to above.

      (4) Related to the above:

      The authors have a section on p. 27: "Local structure of lPFC representational geometry of the flat task shows high separability with no evidence for abstraction" - I understand a generalization analysis can be done in the feature space, but in practice, the fact that the flat task doubles as a memorization task implies that there are no useful abstractions, so it seems to trivially follow that there would be no abstract representations. In fact, the use of task abstractions in the stimulus space would be detrimental to task performance here. I could understand the use of this analysis as a control, but the phrasing of this section seems to indicate that this is a surprising result.

      As explained above, there is no need for high local separability in the flat task. The lPFC could have completely abstracted over the individual trial-types that contributed to each response category, encoding only the response categories. Indeed, as also noted above, it is possible to train a simple, three-layer feedforward artificial neural network to perform the flat task perfectly with only 2 units in the hidden layer. The two hidden layer units code for each of the two response categories. 

      (5) Statistical inferences:

      Throughout the manuscript, the authors appear to conflate failure to reject the null with acceptance of the null. For example, p. 24: "However, unlike left lPFC, paired t-tests showed no reliable difference in the separability of the task-relevant features vs the orthogonal, task-irrelevant features... Therefore, the overall separability of pAC representations is not shaped by either task-relevance of task structure."

      We thank the reviewer for pointing these out. These sentences will be corrected in the revision. For instance, the sentence above will be modified to “Therefore, we find no evidence that the overall separability of pAC representations is shaped by either task-relevance or task structure.”

      Reviewer #3 (Public review):

      Summary:

      In this paper, Bhandari, Keglovits, et al. explore the representational structure of task encoding in the lateral prefrontal cortex. Through an impressive fMRI data-collection effort, they compare and contrast neural representations across tasks with different high-level stimulus-response structures. They find that the lateral prefrontal cortex shows enhanced encoding of task-relevant information, but that most of these representations do not generalize across conditions (i.e., have low abstraction). This appears to be driven in part by the representation of task conditions being clustered by the higher-order task properties ('global' representations), with poor generalization across these clusters ('local' representations). Overall, this paper provides an interesting account of how task representations are encoded in the PFC.

      Strengths:

      (1) Impressive dataset, which may provide further opportunities for investigating prefrontal representations.

      (2) Clever task design, allowing the authors to confound several features within a complex paradigm.

      (3) Best-practice analysis for decoding, similarity analyses, and assessments of representational geometry.

      (4) Extensive analyses to quantify the structure of PFC task representations.

      Weaknesses:

      (1) The paper would benefit from improved presentational clarity: more scaffolding of design and analysis decisions, clearer grounding to understand the high-level interpretations of the analyses (e.g., context, cluster, abstraction), and better visualizations of the key findings.

      (2) The paper would benefit from stronger theoretical motivation for the experimental design, as well as a refined discussion on the implications of these findings for theories of cognitive control.

      We thank the reviewer for highlighting the strengths of our paper and for their feedback on the writing. We have reviewed these helpful suggestions to decide which we can implement in our revision to improve clarity. Our forthcoming revision will 1) provide clearer scaffolding to aid the reader in understanding our design, our analyses, and our interpretation of the results; 2) incorporate the MDS-based visualization of the representational geometries, which is currently presented in the Supplement, as a figure panel in the main text; 3) provide a justification in the introduction for the particular task structures we picked; and 4) incorporate a new paragraph in the Discussion section to highlight the implications of our findings for cognitive control.

    1. eLife Assessment

      The study introduces new tools for measuring the intracellular calcium concentration close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. This approach yields important new information about the spatial and temporal profile of calcium concentrations near the site of entry at the plasma membrane. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in calcium domains. The conclusions are solid and well supported by the data.

    2. Reviewer #1 (Public Review):

      This paper describes technically impressive measurements of calcium signals near synaptic ribbons in zebrafish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. The experiments appear to be well-done and provide strong evidence for the main conclusions reached.

      Strengths

      The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements. Hence the results provide a unique window onto these events.

      The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results.

    3. Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Comments on revisions:

      Several concerns were raised about the kinetic analyses, and the authors have carefully acknowledged the critiques. The ideal outcome would have been a more complete kinetic readout and analyses (in particular a better readout of risetime would have improved the results). In the absence of a suitable readout of the risetime, the authors scaled back their claims and improved on the description of the falling phase of the signals. The authors have given a reasonable response under the circumstances.

      In addition, the authors provided more context to their results.

      I have no further concerns.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Strengths:

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging.

      Weaknesses:

      Peptides may not be entirely specific, and genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. The readers should be aware of this, when interpreting the results.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate. 

      Strengths 

      The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements. 

      The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results. 

      Thank you very much for this positive evaluation of our work.

      Weaknesses 

      Multiple key points of the paper lack a statistical test or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well. 

      Thank you for this fair and valuable feedback. Following the Editor’s suggestion as well, we have now removed the rise-time kinetic fitting results from the manuscript and retain only the bi-exponential decay time constant values. Further, we explicitly detail the issues with kinetic fitting, and state that precise quantitative conclusions should not be drawn from the differences in kinetic parameters (pages 7 and 27-28). 

      We have included the results of paired t-tests comparing the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & B, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons within 500-nm distances, as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we have included the results of an unpaired t-test. We have also included the t-test statistics in the respective figure legends in the revised version.
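
      The distinction between the two tests can be illustrated with a minimal sketch. It assumes hypothetical peak fluorescence values (not data from the study) and uses Welch's form for the unpaired case; the response does not state which unpaired variant was used, so that choice is an assumption. Only the t statistic and degrees of freedom are computed, using the Python standard library.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic and degrees of freedom for a paired t-test."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    # The paired test works on the per-ribbon differences only.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def welch_t(a, b):
    """t statistic and Welch-Satterthwaite df for an unpaired t-test."""
    # Squared standard errors of the two group means.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical peak dF/F values: one proximal/distal pair per ribbon.
proximal = [1.2, 2.0, 0.9, 1.6, 2.4]
distal = [0.9, 1.6, 0.7, 1.2, 1.9]

print("paired:  ", paired_t(proximal, distal))
print("unpaired:", welch_t(proximal, distal))
```

      With these illustrative numbers the paired statistic is much larger than the unpaired one, because pairing removes the ribbon-to-ribbon variability that the two locations share. This is the usual reason to prefer a paired test when both measurements come from the same ribbon.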

      In Figure 8, we show example fluorescence traces from two different cells at the bottom of panel A and example traces from different ribbons of RBC-a in panel D; the summary data are presented in panels B-C and E-F, with statistics provided in the figure legends.

      The rise time measurements in Figure 2 are very different for low and high affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different with the two indicators. That might suggest that the high affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements. 

      Yes, we do believe that the high-affinity indicator is partially saturated, and therefore, the measurement with the low-affinity indicator dye is a more accurate reflection of the measured Ca2+ signal. We now state this more explicitly in the text. Further, we note that the rise time values are no longer listed due to lack of statistical significance for such comparisons, as noted above.
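
      Why saturation compresses the signal of a high-affinity indicator can be seen from simple 1:1 equilibrium binding. The sketch below is an illustration only: the dissociation constants and Ca2+ concentrations are hypothetical placeholders, not values from the manuscript, and real indicators also have kinetic limits that are ignored here.

```python
def fraction_bound(ca_um, kd_um):
    """Equilibrium occupancy of a 1:1 Ca2+ indicator (single-site binding)."""
    return ca_um / (ca_um + kd_um)


# Hypothetical dissociation constants in micromolar, for illustration only.
KD_HIGH_AFFINITY = 0.5
KD_LOW_AFFINITY = 50.0

for ca in (1.0, 10.0, 20.0, 50.0):
    high = fraction_bound(ca, KD_HIGH_AFFINITY)
    low = fraction_bound(ca, KD_LOW_AFFINITY)
    print(f"[Ca2+] = {ca:5.1f} uM  high-affinity: {high:.3f}  low-affinity: {low:.3f}")
```

      Under these assumptions the high-affinity indicator is already more than 95% occupied at 10 uM, so doubling the concentration changes its occupancy by only a few percent, whereas the low-affinity indicator still responds over that range.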

      Reviewer #2 (Public review): 

      Summary: 

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal. 

      Strengths: 

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM. 

      Thank you very much for this positive evaluation of our work.

      Comments on revisions: 

      Specific minor comments: 

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand. 

      Thank you for pointing that out. We have updated the final sentence of the Abstract.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2-ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) to acknowledge that some are achieving nanoscopic resolution (50 to 100nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).  

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.  

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here rod Cav channels are shown to be expressed on both sides of the ribbon, at its base, and they are within nanometers of other AZ proteins. This agrees with the conclusions from your imaging work.

      Thank you for the valuable suggestion. We have now provided this information in the introduction and discussion.

      (4) In the Discussion, add a little more context to what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6-OnBCs vs KARs-OffBCs, vs. AMPAR-HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer JPhysio study, which you cite, I think of NMDAR vs AMPAR, and uncertainty in what type of postsynaptic cell was patched (GC or AC....). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices......to better understand how the presynapse shapes signaling......". 

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.

      Reviewer #3 (Public review): 

      Summary: 

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites. 

      Strengths: 

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging. 

      Thank you very much for this appreciation.

      Weaknesses: 

      Peptides may not be entirely specific, and genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Although the authors are aware of this and the peptide approach is generally used for ribbon synapses, the authors should be aware of this, when interpreting the results. 

      We acknowledge the reviewer’s point and believe the peptides and genetic approaches to measure local calcium signals have their merits, each with separate advantages and disadvantages.  

      Reviewer #1 (Recommendations for the authors): 

      The revisions helped with some concerns about the original paper, but some issues were not adequately addressed. I have left two primary concerns in my public review. To summarize those: 

      The difference in kinetics of proximal and distal locations is emphasized and quantified in the paper, but the quantification consists of a fit to the average responses. This does not give an idea of whether the difference observed is significant or not. Without an estimate of the error across measurements, the difference in kinetics quoted is not interpretable. 

      Thank you for this feedback. Since the kinetics information is a minor part of the manuscript, we have followed the Editor’s advice to significantly tone down the comparison of kinetic fit parameters (completely removing the rise-time comparisons), in order to put more focus on the better-documented conclusions. We also note that we did establish statistical significance of the differences in fluorescence signal amplitudes. 

      Somewhat relatedly, the difference in amplitude and kinetics of the calcium signals measured with low and high affinity indicators is quite concerning. The authors added one sentence stating that the high affinity indicator might be saturated. This is not adequate. Should we distrust the measurements using the high affinity indicator? The differences between the results using the low and high affinity indicators are in some cases large - e.g. larger than the differences cited as a key result between distal and proximal locations. This issue needs to be dealt with directly in the paper. 

      Thank you for this feedback. Yes, measurements with high-affinity (HA) indicators cannot report Ca2+ as accurately as those with low-affinity indicators. However, the value of HA indicators lies in their ability to detect low-amplitude signals that lower-affinity indicators may miss due to their lower signal-to-noise ratio. We added a sentence on page 12 to further stress this point.

      Related to the point about statistics, it is not clear how to relate the horizontal lines in Figure 8 to the actual measurements. It is critical for the evaluation of the conclusions from that figure to understand what is plotted and what the error bars are on the plotted data. 

      We apologize for the earlier ambiguity in Fig. 8. In this figure, we first compare proximal (panel B) and distal (panel C) calcium signals across several RBCs, labeled RBC-a through RBC-d. Each RBC contains multiple ribbons, and for each cell, we present the average calcium signals from multiple ribbons using box plots in panels B and C. In these box plots, the horizontal lines represent the average calcium signal for each cell, while the size of the error bars reflects the variability in proximal and distal calcium signals among the ribbons within that RBC.

      For example, RBC-a had five identifiable ribbons. In panels D–F, we use RBC-a to illustrate the variability in calcium signals across individual ribbons. Specifically, we distinguished proximal and distal calcium signals from five ribbons (ribbons 1–5) within RBC-a. When feasible, we acquired multiple x–t line scans at a single ribbon, shown now as individual data points, to assess variability in calcium signals recorded from the same ribbon.

      The box plots in panels E and F display the average calcium signal (horizontal lines) for each ribbon, based on multiple recordings. These plots demonstrate considerable variability between ribbons of RBC-a. Importantly, the absent or minimal error bars for repeated measurements at the same ribbon indicate that the proximal and distal calcium signals are consistent within a ribbon. These findings emphasize that the observed variability among ribbons and among cells reflects true biological heterogeneity in local calcium domains, rather than experimental noise.

    1. eLife Assessment

      This useful study presents a hierarchical computational model that integrates locomotion, navigation, and learning in Drosophila larvae. The evidence supporting the model is solid, as it qualitatively replicates empirical behavioral data, but the experimental data is incomplete. While some simplifications in neuromechanical representation and sensory-motor integration are limiting factors, the study could be of use to researchers interested in computational modeling of biological movement and adaptive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a three-layered hierarchical model for simulating Drosophila larva locomotion, navigation, and learning. The model consists of a basic locomotory layer that generates crawling and turning using a coupled oscillator framework, incorporating intermittency in movement through alternating runs and pauses. The intermediate layer enables navigation by allowing larvae to actively sense and respond to odor gradients, facilitating chemotaxis. The adaptive learning layer integrates a spiking neural network model of the Mushroom Body, simulating associative learning where larvae modify their behavior based on past experiences. The model is validated through simulations of free exploration, chemotaxis, and odor preference learning, demonstrating close agreement with empirical behavioral data. This modular framework provides a valuable advance for modeling larva behavior.

      Strengths:

      Every modeling paper requires certain assumptions and abstractions. The main strength of this paper lies in its modular and hierarchical approach to modeling behavior, making connections to influential theories of motor control in the brain. The authors also provide a convincing discussion of the experimental evidence supporting their layered behavioral architecture. This abstraction is valuable, offering researchers a useful conceptual framework and marking a significant step forward in the field. Connections to empirical larval movement are another major strength.

      Weaknesses:

      While the model represents a conceptual advance in the field, some of its assumptions and choices fall behind state-of-the-art approaches. One limitation is the paper's simplified representation of larval neuromechanics, in which the body is reduced to a two-segment structure with basic neural control. Another limitation is the absence of an explicit neuromuscular control system, which would better capture the role of segmental central pattern generators (CPGs) and neuronal circuits in regulating peristalsis and turning in Drosophila larvae. Many detailed neuromechanical models, as cited by the authors, have already been published. These abstractions overlook valuable experimental studies that detail segmental dynamics during crawling and the larval connectome.

      The strength of the model could also be its weakness. The model follows a subsumption architecture, where low-level behaviors operate autonomously while higher layers modulate them. However, this approach may underestimate the complexity of real neural circuits, which likely exhibit more intricate feedback mechanisms between sensory input and motor execution.

    3. Reviewer #2 (Public review):

      Summary:

      Sakagiannis et al. propose a hierarchically layered architecture for larval locomotion and foraging. They go from exploration to chemotaxis and an odour preference test after associative learning.

      Strengths:

      A new locomotion model based on two oscillators that also incorporates peristaltic strides.

      Weaknesses:

      • The model is not always clearly or sufficiently explained (chemotaxis and odour test).

      • Data analysis of the model movement is not very thorough.

      • Comparisons with locomotion of behaving animals are missing in chemotaxis and the odour preference test after associative learning.

      • Overall it is hard to judge the descriptive and predictive value of the model.

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents a framework for a multilevel agent-based model of the Drosophila larva, using a simplified larval body and locomotor equations coupled to oscillators and sensory input. The model itself is built upon significant existing literature, particularly Wystrach, Lagogiannis, and Webb 2016 and Jürgensen et al. 2024. The aim is to generate an easily configurable, well-documented platform for organism-scale behavioral simulation in specific experiments. The authors demonstrate qualitative similarity between in vivo behavioral experiments and calibrated models.

      Strengths:

      The goal is excellent - a system to rapidly run computational experiments that align naturally with behavioral experiments would be well-suited to develop intuitions and cut through hypotheses. The authors provide quantitative descriptions that show that the best-fit parameters in their models produce results that agree with several properties of larval locomotion.

      The description of model calibration in the appendix is clear and explains several aspects of the model better than the main text.

      In addition, the code is well-organized using contemporary Python tooling and the documentation is nicely in progress (although it remains incomplete). However, see notes for difficulties with installation.

      Weaknesses:

      (1) As presented here, the modeling itself is described in an unclear fashion and without a particular scientific question. The majority of the effort appears to be calibrating modest extensions of existing models and applying them to very simple experiments. This could be an effective first part of a paper on the software tool, but the paper needs to point to a scientific question or, if it is a tool paper, a gap in the current state of modeling tools needed to address scientific goals. While the manuscript has a good overview of larval behavioral papers, the discussion of modeling is more of an afterthought. However, the paper is a modeling paper and its contribution is to modeling; particularly given this work's minor adaptations of existing models, it is unclear what the principal contribution is intended to be.

      (2) While the models presented do qualitatively agree with experimental data in specific situations, there is no effort to challenge the model assumptions or compare them to alternative models. Simply because the data is consistent in a small number of simple experiments does not mean that the models are correct. Moreover, given the highly empirical nature of the modeling, I wonder what results are largely the model putting out what was put in, particularly with regard to kinematic results like frequency and body length or the effect of learning simply changing the sensory gain constant. It is difficult to imagine how, at this level of empirical modeling, one could integrate the type of cell-type-specific perturbation or functional observation that is common in larval experiments.

      (3) The central framing of a "layered control architecture" does not have a significant impact on the work presented here and the paper would do better with less emphasis on it. Given the limited empirical models, there are only so many parameters where different components can influence one another, and as best as I can tell from the paper there is only chemotaxis and modulation of a chemotactic gain constant that are incorporated so far. However, since these are empirical functions it says little about how the layers are actually controlled by the nervous system - indeed, the larval nervous system appears to have many levels of local and long-range modulation of circuits at both the sensory and motor layers. It is not clear how this aspect would contribute beyond the well-appreciated concept of a relatively finite set of behavioral primitives in an insect brain, particularly for the fly larva. What would be a contradictory model and how would the authors differentiate between that and the one they currently propose? If focusing only on olfactory learning and chemotaxis, how does the current framing add to the existing understanding?

      (4) The paper uses experimental data to calibrate the models, however, the experiments are not described at all in the text.

    5. Author response:

      We thank all three anonymous reviewers for their thoughtful evaluations of our manuscript and for recognizing the conceptual advance in combining agent-based behavioral simulations with systems neuroscience models. We are especially encouraged by the acknowledgement of the framework’s potential to support simulation of neural control of individual animal behavior in realistic sensory environments.

      Below, we respond to each reviewer’s public comments in turn. Throughout, we have aimed to clarify our rationale for modeling choices, acknowledge limitations, and outline concrete steps for improvement in the revised manuscript.

      Furthermore, the call for a better description of the model implementation, voiced by all three reviewers, together with additional requests from community members, has prompted us to write a separate, technically detailed description of the publicly available larvaworld software package and of its readily implemented models in the form of a preprint paper (Sakagiannis et al., 2025, bioRxiv, DOI: https://doi.org/10.1101/2025.06.15.659765).

      Reviewer #1:

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2:

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.
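
      For readers unfamiliar with group assays, a preference index of this kind is typically a normalized difference in animal counts between the two sides of the arena. The sketch below uses one commonly used form; the exact definition, the handling of animals in the middle zone, and all counts shown are assumptions made for illustration and are not taken from the manuscript.

```python
import statistics


def preference_index(n_odor_side, n_other_side, n_middle=0):
    """Normalized count difference, ranging from -1 to +1.

    Assumed form: (odor side - other side) / all counted animals,
    with animals in the middle zone included in the denominator.
    """
    total = n_odor_side + n_other_side + n_middle
    if total == 0:
        raise ValueError("no animals counted")
    return (n_odor_side - n_other_side) / total


# Hypothetical end-of-test counts for three simulated groups:
# (odor side, other side, middle zone)
groups = [(20, 10, 0), (18, 9, 3), (12, 12, 6)]
indices = [preference_index(*g) for g in groups]

print([round(pi, 3) for pi in indices])
print("median:", round(statistics.median(indices), 3))
```

      Because each group yields a single index, variability across groups (for example from differences in body length or initial orientation, as described above) shows up as the spread of these values.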

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3:

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading of the appendix. For the revised manuscript we thus aim at improving our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system—including our own—often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics—in this case, the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

    1. eLife Assessment

      NeuroSC is an accessible and interactive tool for streamlined observation of neuronal morphology, membrane contact, and synaptic connectivity across developmental stages in the nematode C. elegans. This important tool relies on solid electron microscopy datasets. This resource will be of high interest to C. elegans researchers interested in nervous system wiring and circuit function.

    2. Reviewer #2 (Public review):

      Summary

      The past several years have seen publication of both new (Witvliet et al., 2021) and newly analyzed (Cook et al., 2019; Moyle et al., 2021; Brittin et al., 2021) data for the C. elegans connectome. The increase in data availability for a single species allows researchers to examine variability due to both stochastic events and changes over development. The quantity of these data is huge. To help the community make these data more accessible, the authors present a new online tool that allows examination of 3D models for C. elegans neurons in the central neuropil across development. In addition to visualizing the overall structure of the neuronal processes and locations of synapses, the NeuroSC tool also allows users to probe into the C-PHATE visualization results, which this group previously pioneered to describe similarities in neuron adjacency (Moyle et al., 2021).

      Strengths

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented their circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings are incredibly useful, they were necessary simplifications for a 2D publication and lack details of the complex architecture seen within each EM image. Koonce et al. take advantage of their own and others' segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization is intended to allow users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source. This revision includes an option where hovering over an individual neuron, synapse, or contact will pull up a statistics panel. The addition of text to the video tutorials in the revision is very useful.

      Weaknesses

      There are several bugs with this tool, which make it a bit clunky to use and suggest a lack of rigorous testing. There are also issues with data availability. I was disappointed that my "recommendations for the authors", which focused on the user interface, were not addressed in the response to reviewers.

    3. Reviewer #3 (Public review):

      Summary:

      This work provides graphical tools for reconstructing the detailed anatomy of a nervous system from a series of sections imaged by electron microscopy. Contact between neuronal processes can direct outgrowth and is necessary for connectivity, thus function. A bioinformatic approach is used to group neurons according to shared features (e.g., contact, synapses) in a hierarchy of "relatedness" that can be interrogated at each step. In this work, Koonce et al. analyze vEM data sets for the C. elegans nerve ring (NR), a dense fascicle of processes from 181 neurons. In a bioinformatic approach, the clustering algorithm Diffusion Condensation (DC) groups neurons according to similar cell biological features in iterations that remove chunks of differences in feature data with each step, ultimately merging all NR neurons in one cluster. DC results are displayed with C-PHATE, a 3D visualization tool, to produce a trajectory that can be interrogated for cell identities and other features at each iterative step. In previous work by these authors, this approach was utilized to identify subgroups of neuronal processes or "strata" in the NR that can be grouped by physical contact and connectivity. Here they expand their analysis to include a series of available vEM data sets across C. elegans larval development. This approach suggests that strata initially established during embryonic development are largely preserved in the adult. Importantly, exceptions involving stage-specific reorganization of neuronal placement in specific strata were also detected. A case study featured in the paper demonstrates the utility of this approach for visualizing the integration of newly generated neurons into the existing NR anatomy. Visualization tools used in this work are publicly available at NeuroSCAN.

      Strengths:

      A web-based app, NeuroSCAN, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development.

      Weaknesses:

      minor revisions

      Comments on Revisions:

      The authors have satisfactorily addressed my critiques.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Comment 

      Koonce et al. have generated a web-based visualization tool for exploring C. elegans neuronal morphology, contact area between neurons, and synaptic connectivity data. Here, the authors integrate volumetric segmentation of neurons and visualization of contact area patterns of individual neurons generated from Diffusion Condensation and C-PHATE embedding based on previous work from adult volumetric electron microscopy (vEM) data, extended to available vEM data for earlier developmental stages, which effectively summarizes modularity within the collated C. elegans contactomes to date. Overall, NeuroSC's relative ease of use for generating visualizations, its ability to quickly toggle between developmental stages, and its integration of a concise visualization of individual neurons' contact patterns strengthen its utility.

      We thank the reviewer for this positive assessment of our work.

      Comment

      NeuroSC provides an accessible and convenient platform. However, many of the characteristics of NeuroSC overlap with that of an existing tool for visualizing connectomics data, Neuroglancer, which is a widely-used and shared platform with data from other organisms. The authors do not make clear their motivation for generating this new tool rather than building on a system that has already collated previous connectomics data. Although the field will benefit from any tool that collates connectomics data and makes it more accessible and user-friendly, such a tool is only useful if it is kept up-to-date, and if data formatting for submitting electron microscopy data to be added to the tool is made clear. It is unclear from this manuscript whether NeuroSC will be updated with recently published and future C. elegans connectomes, or how additional datasets can be submitted to be added in the future.

      We have added new language to more explicitly state the motivations for developing NeuroSC (Introduction, lines 98-111, and Discussion, lines 375-384). In a new discussion section, we also include comparisons of the features of NeuroSC with other existing tools, like Neuroglancer and Webknossos (lines 393-417).

      Briefly, the functional features of NeuroSC are substantially different from those of other web-based tools for navigating EM datasets, including NeuroGlancer, and do not exist in them. This is because the intended use of NeuroSC is substantially different from (and purposefully synergistic with) the intended use of, and tools available in, NeuroGlancer.

      NeuroGlancer is a versatile tool designed primarily for web-based visualizations and sharing of large EM datasets. NeuroSC was not designed to enable this type of access to the primary EM data (purposefully done because these features were already available through tools like NeuroGlancer). 

      Instead, the explicit goal of NeuroSC is to provide a platform specifically optimized for examining neuronal relationships across connectomic datasets. NeuroSC builds on the segmentations emerging from programs like NeuroGlancer, but the tools are tailored to explore relationships such as contact profiles in the context of neuronal morphologies and synaptic positions, and across datasets that represent different animals or different developmental stages. 

      To achieve this, all datasets in NeuroSC were optimized to facilitate comparisons across different connectomes of segmented neuronal features, including: 1) alignment of the neurons that are compared upon the display of the segmentations; 2) synchronization of the 3D windows; 3) implementation of a ‘universal color code’ across datasets for each neuron and relationship for easy visual comparisons; 4) use of the specific neuronal names to label instances of the same cells across all available datasets. The use of precise neuronal names among separate data sets allows integration of these objects with other catalogued datasets, including genomic and neuronal activity profiles.

      The formatting and display of the datasets used in NeuroSC was accompanied by the development of new tools including: 1) Rendering of the contact profiles of all neurons in the context of the morphology of the cell and the synapses and 2) C-PHATE diagrams to inspect multidimensional relationship hierarchies based on these contact profiles. In NeuroSC, C-PHATEs can be navigated and compared across multiple stages of development while visualizing neuronal reconstructions, allowing users to compare neuronal relationships across individual datasets.

      We agree with the reviewer that these tools are most useful when integrated. With that intention in mind, we designed NeuroSC as a series of modular, open-source tools that could be integrated into other programs, including Neuroglancer. In that sense our intent was not to produce another free-standing tool, but a set of tools that, if useful, could be integrated into other existing web-based connectomic resources to enhance the user experience of navigating complex EM datasets and to draw biological meaning from the relationships between the neurons. Additionally, we intentionally designed NeuroSC so that new methods of understanding neuron relationships can be integrated as they arise. We have dedicated a more detailed section of the discussion (lines 369-417) to better convey this intention and directly address the unique abilities of NeuroSC as a complementary tool to the powerful existing tools, including Neuroglancer.

      Comment

      The interface for visualizing contacts and synapses would be improved with better user access to the quantitative underlying data. When contact areas or synapses are added to the viewer, adding statistics on the magnitude of the contact area, the number of synapses, and the rank of these values among the neuron's top connections, would make the viewer more useful for hypothesis generation. Furthermore, synapses are currently listed individually, with names that are not very legible to the web user. Grouping them by pre- and postsynaptic neurons and linking these groups across developmental stages would also be an improvement.

      We thank the reviewer for this insightful comment and have implemented several improvements to address these suggestions. Specifically, we have added new features to enhance user access to quantitative data within the NeuroDevSCAN viewer:

      Cell, Patch, and Synapse Statistics: Users can now see a statistics panel when clicking on a rendered neuron, contact patch, or synapse. These panels provide the following information, respectively (highlighted in lines 303-315):

      Cell Stats: Click on a cell rendering to show cell stats which displays the total volume and surface area of the selected neuron within the defined neuropil area of our datasets (see Methods). 

      Contact Stats: Click on a patch rendering to show ‘contact stats’. This pop up displays quantifications of the selected contact relationship. Rank compares the summed surface area of contacts ("patches") between these two neurons relative to all other contact relationships for the primary neuron for the cell and the whole nerve ring. A rank of 1, for example, means this neuron pair shares the largest contact surface area of the examined relationship. “Total surface area” is displayed in nanometers, and is the summed surface area of all patches of this identity. Contact percentages are presented in two ways: (1) as the proportion of the primary cell's total surface area occupied by the contact in question, and (2) as the proportion of the total surface area of the nerve ring occupied by that same contact. (Showcased in figure S5). 

      Synapse Stats: A click on a synapse rendering now shows ‘synapse stats’, which displays the number of synapses of the selected identity within the primary neuron, including any polyadic synapse combinations involving the primary neurons. (Showcased in figure S7).

      (1) Grouping and Readability Improvements: While individual synapses are still visualized, their display has been improved for legibility. We have condensed the lengthy naming scheme to improve clarity and codified the synapse type by using superscript letters C, E, U to represent chemical, electrical and undefined synapses, respectively. This is explained and shown in figure S7. We also added arrows to indicate the directionality of presumed information flow at each synapse.

      (2) Developmental Linkage: We can link objects across datasets via cellular identity, but each synapse in the dataset does not yet have an identity attributed to its spatial coordinates. This prevents us from linking specific synapses across development beyond their connectivity (i.e., that a given synapse connects cell X to cell Y); this point is also addressed in R1.11.

      Together, these improvements substantially enhance the utility of the viewer for hypothesis generation by making key quantitative data readily accessible.
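
      The quantities described under Contact Stats above (summed patch area, rank, and the two percentages) can be sketched as follows. This is only an illustration under stated assumptions, not the NeuroSC implementation: the function name, the input layout (a dictionary of patch areas per partner of one primary neuron), and all numeric values are hypothetical, and only the rank within the primary neuron is computed, not the rank within the whole nerve ring.

```python
def contact_stats(areas_by_partner, partner, cell_area, nerve_ring_area):
    """Summarize one contact relationship of a primary neuron.

    areas_by_partner: dict mapping partner name -> list of patch surface areas
    partner:          the contact relationship to summarize
    cell_area:        total surface area of the primary neuron
    nerve_ring_area:  total surface area of the nerve ring
    """
    # Sum all patches that belong to the same contact relationship.
    totals = {name: sum(patches) for name, patches in areas_by_partner.items()}
    total = totals[partner]
    # Rank 1 means the largest summed contact area for this primary neuron.
    rank = 1 + sum(1 for other in totals.values() if other > total)
    return {
        "total_surface_area": total,
        "rank": rank,
        "pct_of_cell": 100.0 * total / cell_area,
        "pct_of_nerve_ring": 100.0 * total / nerve_ring_area,
    }


# Hypothetical patch areas for one primary neuron and three partners.
example_areas = {"AIB": [100.0, 50.0], "RIA": [300.0], "AVE": [20.0]}
stats = contact_stats(example_areas, "AIB", cell_area=1000.0, nerve_ring_area=100000.0)
```

      In this toy input the AIB relationship has a summed area of 150, which places it second behind RIA for the primary neuron.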

      Comment

      While the DC/C-PHATE visualizations are a useful tool for the user, it is difficult to understand when grouping or splitting of cell contact patterns is biologically significant. DC is a deterministic algorithm applied to a contactome from a single organism, and the authors do not provide quantitative metrics of distances between individual neurons or a number of DC iterations on the C-PHATE plot, nor is the selection process for the threshold for DC described in this manuscript. In the application of DC/C-PHATE to larval stage nerve ring strata organization shown by the authors, qualitative observations of C-PHATE plots colored based on adult data seem to be the only evidence shown for persistent strata during development (Figure 3) or changing architectural motifs across stages (Figure 4). Quantitation of differences in neuron position within the DC hierarchy, or differences in modularity across stages, is needed to support these conclusions. Furthermore, illustrating the quantitative differences in C-PHATE plots used to make these conclusions will provide a more instructive guide for users of NeuroSC in generating future hypotheses.

      There are several ways to visualize DC outputs, and one way to quantitatively compare DC clustering events of neurons is via Sankey diagrams. To make the inclusion of these resources more clear, we have highlighted them in lines 175-178 (Supplemental Tables 3-6). ‘DC outputs for each strata across animals can also be inspected using Sankey diagrams (Supplemental Tables 3-6). These spreadsheets detail the neuron members at each iteration of DC, allowing the user to derive quantitative comparisons of clustering events.’

      As the reviewer points out, DC is a deterministic algorithm that will iteratively cluster neurons based on the similarity of their contact profiles. To better explain the selection process for the threshold, the number of DC iterations and the quantitative metrics between the neurons, we have added new text in the Diffusion Condensation methods section.  Briefly:

      Number of DC iterations: During Diffusion Condensation (DC) we track the modularity of the resulting clusters at each iteration and select the iteration with the highest modularity to define the clusters that represent the strata (Moyle et al., 2021; Brugnone et al., 2019). Mathematically, modularity is calculated by comparing the actual number of edges within clusters to the expected number of such edges in a randomized network with the same degree distribution (Newman et al., 2006). A higher modularity value implies that nodes within the same cluster are more densely connected to each other than to nodes in other clusters. We now better explain this in lines 562-567.
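
      The modularity criterion described above can be illustrated with a small sketch. It assumes the standard weighted form of Newman's modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), applied to a symmetric adjacency matrix; the function names and the toy graph are hypothetical and this is not the code used for the published analysis.

```python
def modularity(adjacency, labels):
    """Newman modularity of a clustering on a weighted, undirected graph.

    adjacency: symmetric list-of-lists matrix of edge weights
    labels:    cluster label for each node
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]  # weighted degree k_i
    two_m = sum(degree)                       # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected by chance.
                q += adjacency[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def best_iteration(adjacency, labels_per_iteration):
    """Index of the iteration whose clustering has the highest modularity."""
    scores = [modularity(adjacency, labels) for labels in labels_per_iteration]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy graph: two disconnected pairs of nodes (0-1 and 2-3).
toy_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
# Cluster labels at three hypothetical condensation iterations.
toy_iterations = [[0, 1, 2, 3], [0, 0, 1, 1], [0, 0, 0, 0]]
chosen = best_iteration(toy_adjacency, toy_iterations)
```

      In the toy example the middle iteration, which groups each connected pair, scores highest; the all-singletons and all-merged clusterings score lower.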

      Threshold for merging points: The threshold (epsilon) used to merge data points in each iteration is set as a small fraction of the spatial extent of the data: for each coordinate dimension (x, y, z), we compute the range (maximum minus minimum), take the maximum of these three values, and divide it by 10,000. This process is performed iteratively for each round of clustering until all data points cluster into a single point. We have updated the manuscript to clarify this threshold selection and included this information in the revised algorithm description and pseudocode. We now better explain this in lines 556-559.
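
      The threshold rule described above is simple enough to write out directly. The sketch below follows the stated recipe (largest coordinate range divided by 10,000); the function name and the example coordinates are hypothetical.

```python
def merge_threshold(points, divisor=10_000):
    """Epsilon for merging points: largest per-axis range divided by `divisor`.

    points: iterable of coordinate tuples, e.g. (x, y, z)
    """
    # zip(*points) regroups the coordinates by axis: all x, all y, all z.
    ranges = [max(axis) - min(axis) for axis in zip(*points)]
    return max(ranges) / divisor


# Hypothetical coordinates; the y axis has the largest extent (20 units).
example_points = [(0.0, 0.0, 0.0), (5.0, 20.0, 1.0), (10.0, 3.0, 2.0)]
epsilon = merge_threshold(example_points)
```

      Tying epsilon to the spatial extent of the data keeps the merging criterion scale-free, so the same rule can be reused for datasets with different coordinate ranges.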

      Distances between neurons in DC C-PHATE: In our previous description in Box 1 algorithm 1, we had provided a general algorithm for DC for any high dimensional dataset. We have now revised the algorithm to indicate how we used DC for these EM datasets. 

      Distances between neurons are determined by the pixel overlap between their segmented shapes in the EM dataset. We use these distances to build a graph with weighted edges, in which the weight of the edge represents the pixel overlap (the adjacency in the actual EM segmentation). Affinities between neurons, which are a proxy for their distance in the graph, are then computed as now revised in Box 1, Algorithm 1. This process is done iteratively as neurons cluster. To better communicate this, we have changed the text in lines 533-538.  
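
      To make the graph construction concrete, the sketch below builds a symmetric weighted adjacency matrix from pairwise pixel overlaps. The row normalization into transition probabilities is an assumption about how affinities might be derived (a common step in diffusion-based methods); the revised Box 1, Algorithm 1 is the authoritative description. Function names, neuron names, and overlap values are hypothetical.

```python
def build_adjacency(neurons, overlaps):
    """Symmetric weighted adjacency matrix from pairwise pixel overlaps.

    neurons:  ordered list of neuron names
    overlaps: dict mapping (name_a, name_b) -> pixel overlap between the pair
    """
    index = {name: i for i, name in enumerate(neurons)}
    matrix = [[0.0] * len(neurons) for _ in neurons]
    for (a, b), pixels in overlaps.items():
        i, j = index[a], index[b]
        matrix[i][j] += pixels
        matrix[j][i] += pixels  # adjacency is undirected
    return matrix


def row_normalize(matrix):
    """Assumed affinity step: scale each row to sum to 1 (zero rows stay zero)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([w / total if total else 0.0 for w in row])
    return normalized


# Hypothetical overlaps between three neurons.
names = ["A", "B", "C"]
adjacency = build_adjacency(names, {("A", "B"): 30, ("B", "C"): 10})
affinity = row_normalize(adjacency)
```

      Under this assumption, neuron B spends three quarters of its affinity on A and one quarter on C, reflecting their relative contact sizes.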

      Comment

      R1.5. While the case studies presented by the authors help to highlight the utility of the different visualizations offered by the NeuroSC platform, the authors need to be more careful with the claims they make from these correlative observations. For example, in Figure 4, the authors use C-PHATE clustering patterns to make conclusions about changes in clustering patterns of individual neurons across development based on single animal datasets. In this and many other cases presented in this study with the limited existing datasets, it is difficult to differentiate between developmental changes and individual variability between the neurite positions, contacts, and synapse differences within these data. This caveat needs to be clearly addressed.

      We now better explain in the manuscript that the selected case study, of the AVF neuron outgrowth, is not one of just correlation based solely on an EM dataset. Instead, the case study represents the NeuroSC-driven exploration of a biologically significant event supported by several independent datasets, as now explained in lines 257-276.

      Briefly, we agree with the reviewer that examining differences across individual EM datasets is insufficient evidence to make conclusions about developmental changes. But the strength of NeuroSC is in its ability to combine and compare multiple datasets, bolstering observations that are not possible by looking at just one dataset, and providing new insights on the way to new hypotheses. We now better explain that we are not looking at single connectomes in isolation and then deriving conclusions, but instead using NeuroSC to compare across 9 EM datasets. We better explain how the tools in NeuroSC, including C-PHATE, enabled comparisons across these multiple connectomes to identify apparent differences in neuronal relationships. We then explain that by using NeuroSC, we could examine these variations in neuronal relationships at the level of individual, cell biological differences of neuronal morphologies between the developmental datasets. These variations could reflect, as pointed out by the reviewer, developmental changes or simply differences between individual animals. In the case of AVF, the features are absent in all early specimens, then arise and persist in all specimens after a certain time point, which led us to hypothesize that they result from a developmental event. Because the segmented objects in NeuroSC are linked to neuronal identities, we are also able to cross reference our observations from the EM datasets with information in other datasets and the literature. In the specific case of postembryonic development of AVF outgrowth, we can now tie the knowledge, from developmental lineage information and molecular profiles, that AVF is a postembryonically born neuron (Sulston et al. 1977, Sun et al 2022, Poole et al 2024, wormatlas.org) to the outgrowth dynamics of its neurites using the postembryonic EM datasets. Our findings using NeuroSC provide a proof of concept of the utility of the resource and extend our understanding of how the outgrowth of this neuron affects the relationships between the neural circuits in the nerve ring.

      Comment

      R1.6. Given that recent studies have also quantified contact area between neurons across multiple connectomes (Cook et al., Current Biology, 2023; Yim et al., Nature Communications, 2024), and that the authors use a slightly different approach to quantify contact area, a direct comparison between contact area values obtained in this study with prior studies seems appropriate.

      We acknowledge that there are multiple different approaches to calculate adjacencies. In the papers cited above, there are 3 different algorithms used:

      (1) Brittin 2019 (Python parsing of TrakEM2 data, boundary thresholds), used in Cook et al 2023, Moyle 2021, and this study.

      (2) Witvliet 2021 (Matlab 2D masks), used in Cook et al 2023.

      (3) Yim 2024 (3D masks), used in Yim et al 2024.

      To briefly describe the different approaches, and the methods we chose for this paper:

      Algorithm 1 (used in this study) defines adjacency based on distances between boundary points in TrakEM2 segmentations, allowing threshold tuning to accommodate differences in resolution and image quality across datasets—an important feature for consistent cross-dataset comparisons.

      Algorithm 2 infers contact via morphological dilation of VAST segmentations, identifying adjacency through overlapping expanded boundaries. 

      Algorithm 3 uses voxelwise contact detection with directional surface area measurements and normalization to account for dataset size differences. 

      In NeuroSC, we use algorithm 1, mostly because we had tested the rigor of this method in (Moyle et al. 2021), where we have shown that results were robust across a range of thresholds. This flexibility enables tailored application across datasets of varying quality and scale, critical for NeuroSC’s mission of curating data sets across differing methodologies to allow for direct relationship comparisons. We detail the methodology for defining thresholds for each dataset in methods section lines 492-521, defined in Supplementary table 1. Another difference between our analysis and the previously cited work is that for our analysis we also chose to include all individually resolved neurons, including post-embryonic cells, without collapsing them into left/right or dorsal/ventral symmetry classes. In this way our approach retains the full cellular resolution of the nervous system. 
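
      The boundary-distance idea behind Algorithm 1 can be illustrated with a toy example. This is not the published implementation: it only shows, under simplified assumptions, how a tunable distance threshold turns two sets of boundary points into an adjacency decision. The function name and the coordinates are hypothetical.

```python
import math


def are_adjacent(boundary_a, boundary_b, threshold):
    """True if any boundary point of A lies within `threshold` of a point of B."""
    return any(
        math.dist(p, q) <= threshold
        for p in boundary_a
        for q in boundary_b
    )


# Two hypothetical 2D boundaries whose closest points are 3 units apart.
cell_a = [(0.0, 0.0), (1.0, 0.0)]
cell_b = [(4.0, 0.0), (5.0, 0.0)]

loose = are_adjacent(cell_a, cell_b, threshold=3.0)   # threshold reaches the gap
strict = are_adjacent(cell_a, cell_b, threshold=2.5)  # threshold too small
```

      Raising or lowering the threshold changes which pairs count as adjacent, which is why a per-dataset threshold can compensate for differences in resolution and image quality.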

      Comment

      Neuroglancer is not mentioned at all in the manuscript, despite it being a very similar and widely accepted platform for vEM data visualization across model organisms. An explicit comparison of NeuroSC and Neuroglancer would be appropriate, given the similarity of the tools. Currently, published C. elegans data (Witvliet et al., 2021; Yim et al., 2024) use Neuroglancer-based viewers, and directly comparing NeuroSC and highlighting its strengths relative to Neuroglancer would strengthen the paper.

      In the original manuscript we had not mentioned tools like Neuroglancer because we envisioned them as distinct, in intended use and output, from NeuroSC. But, as explained in our response to comment R1.2, in the revised version we have included a section in the Introduction (lines 98-108) and in the Discussion (lines 369-417) that compares these types of web-based tools and highlights synergies.

      Comment

      Assigning shorthand names to strata, such as "shallow reflex circuit" (page 4, line 172), may oversimplify this group of neurons. Either more detailed support for shorthand names of C-PHATE modules should be included, or less speculative names for strata should be used.

      We appreciate this comment and understand that the original language used in the manuscript to describe strata categorizations may run the risk of oversimplification. We have now clarified the text to communicate that: 1) Strata are labeled by numbers (Strata 1, Strata 2, Strata 3 and Strata 4), rather than functional features of the neurons forming part of the strata, and that 2) the assignment of ‘strata’ is just one level of classification available via DC/CPHATE (as explained below). 

      To be sure, we have observed and published (Moyle et al., Nature 2021) that within a given stratum, many neurons share the functional identities that we have used as summary descriptors for the strata (e.g., shallow reflex circuits for Stratum 1; sensory and integrative circuits in Strata 3 and Strata 4; command interneurons in Strata 2, etc). However, those cell types are not the only members of the strata. We have adjusted the language in lines 197-204 to reflect this more clearly. “Stratum 1, which contains most neurons contributing to shallow reflex circuits that control aversive head movements in response to noxious stimuli, displayed the fewest changes among the developmental connectomes (Figure 3B–F; Supplementary Table 3). In contrast, C. elegans exhibit tractable behaviors that adapt to changing environmental conditions (Flavell et al., 2020). Strata 3 and 4 contain most neurons involved in circuits associated with such learned behaviors, including mechano- and thermo-sensation. This is reflected in Strata 3 and 4 showing the most change in neuronal relationships across postembryonic development.”

      Comment

      The authors state that NeuroSC can be applied to other model organisms. Since model organisms with greater neuron numbers include more individual neurons per cell class, the authors should support this by quantitatively demonstrating how DC/C-PHATE relationships correlate with shared functional roles among C. elegans neurons.

      We now clarify in the manuscript that, like in other organisms, C. elegans neurons are also grouped into functional classes with shared characteristics. In the context of the cylindrical nerve ring of the animal, these neuronal classes are sometimes bilaterally symmetric (forming left-right pairs), four-fold symmetric and six-fold symmetric. We now explain in the discussion that the DC/CPHATE analyses group these neuron classes and their relationships (lines 442-451). In the specific section mentioned by the reviewer, we now also add new text to contextualize this concept and how it might relate to the possible use of these tools in organisms with larger nervous systems: ‘However, our previous work has demonstrated that DC/CPHATE clustering of C. elegans neurons consistently pulls out clusters of shared neuron classes and shared functional roles Moyle et al. (2021). Building on this foundation, we envision applying similar clustering approaches to larger connectomes, aiming to identify classes and functionally related neuronal groups in more complex nervous systems. We suggest that contact profiles, along with neuron morphologies and synaptic partners, can act as ‘fingerprints’ for individual neurons and neuron classes. These ‘fingerprints’ can be aligned across animals of the same species to create identities for neurons. Frameworks for systematic connectomics analysis in tractable model systems such as C. elegans are critical in laying a foundation for future analyses in other organisms with up to a billion-fold increase in neurons (Toga et al., 2012).’

      Comment

      Lack of surface smoothing in NeuroSC leads to processes sometimes appearing to have gaps, which could be remedied by smoothing with a surface mesh. 

      We thank the reviewer for the suggestion, and understand the visibility of gaps in certain neuron processes can be distracting. But this was an intentional choice, with our main goal being to show the most accurate representation of the available data segmentation and avoid any rendering interpretations. In this way, we render the data with the highest fidelity we can and as close as possible to the ground truth of the EM segmentation. We have added language to describe this in the methods, lines 490-491, and in Figure legend 5b.

      Comment

      Toggling between time points while maintaining the same neurons and contact area in NeuroSC is a really valuable feature. The tool would be improved even more by extending this feature to synapses, specifically by allowing the user to add an entire group of synapses to the viewer at once (e.g. "all synapses between AIM and PVQ"), and to keep this synapse group invariant when toggling between developmental stages.

      We thank the reviewer for this suggestion. In response we have now implemented a new feature to ‘clone’ a rendered scene across time while preserving the original elements to ease comparisons. Once the user has rendered a scene, they can use the in-viewer developmental slider to clone the renderings and assigned colors, but display the renderings of the newly selected timepoint. These renderings populate a new window tab which can be dragged to align developmental stage windows side by side. We have added a sentence to account for this in lines 315-317 and to the legend of supplemental Figure S11. 

      Reviewer #2 (Public review)

      Comment

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented their circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings remain incredibly useful, they were also necessary simplifications for a 2D publication and they lack details of the complex architecture seen within each EM image. Koonce et al take advantage of segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization allows users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source.

      We thank the reviewer for this positive assessment of our work.

      Comment

      While it's impossible to create one tool that will satisfy all potential users, I found myself wanting to have numbers associated with the data. For example, knowing the number of connections or the total surface area of contacts between individual neurons wasn't possible through the viewer, which limits the utility of taking deep analytical dives. While connectivity data is readily accessible through other interfaces such as Nemanode and WormWiring, a more thorough integration may be helpful to some users.

      We thank the reviewer for this feedback and in response have now implemented displays with quantitative information in NeuroSC. Now, upon hovering over a contact patch or synapse, the user will see the quantitative data of the relationship. For contact patches, you will see the total area shared between two neurons in that dataset. On hovering over a synapse, you will see how many synapses there are in total with the same members throughout the dataset. We agree that this improves user analyses (see also R1.3 response).

      Comment

      There were several issues with the user interface that made it a bit clunky to use. For example, as I added additional neurons to the filter search box, the loading time got longer and longer. I ran an experiment uploading all of the amphid neurons, one pair at a time. Each additional neuron pair added an additional 5-10 seconds to the loading. By the time I got to the last pair, it took over a minute to load. Issues like these, some of which may be unavoidable given the size of the data, could be conveyed through better documentation. I did not find the tutorial very helpful and the supplementary movies lacked any voiceover, so it wasn't always clear what they were trying to show.

      We appreciate that some of the more complex models can take a while to load. One of our core goals is to keep the high resolution of our models to most accurately represent the EM data, so we had to compromise between resolution and loading times. But to address this concern we have now added a ‘loading’ prompt that reassures the user when there is a wait. We also added, as suggested, text guidance throughout all of the supplemental videos (Supplemental Videos 1-4).

      Reviewer #3 (Public review)

      Comment

      A web-based app, NeuroSC, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development. In the opinion of this reviewer, only minor revisions are required.

      We thank the reviewer for this positive assessment of our work.

      Comment

      Contact is defined by length, why not contact area? How are these normalized for changes in the overall dimensions of neurons during development?

      To clarify our methodology: the adjacency algorithm that we use generates a 2D adjacency profile by summing the number of adjacent boundary points per EM section, which are then summed across all EM z slices.

      Contact area can be derived by multiplying the adjacency length in each slice by pixel resolution and z-thickness. Prompted by the reviewer we have now also calculated and display contact surface areas, along with their ranks among all contact relationships for a given neuron. These can be inspected directly via the interface by clicking on a rendered cell or contact patch (Figure S5 and lines 308-312). We believe these additional surface area metrics enhance the interpretability and utility of the viewer.
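
      As a rough illustration of the conversion described above, the following sketch sums per-slice adjacency counts and scales them by the pixel size and section thickness. The function name, the example counts, and the pixel and section dimensions are hypothetical placeholders chosen for illustration; they are not values or code from the manuscript.

```python
def contact_area_nm2(adjacent_points_per_slice, pixel_size_nm, z_thickness_nm):
    """Approximate contact area between two neurons from per-slice adjacency counts.

    adjacent_points_per_slice: number of adjacent boundary points in each EM z slice
    pixel_size_nm: in-plane pixel resolution (assumed isotropic in x and y)
    z_thickness_nm: thickness of each EM section
    """
    # Summing across z slices gives the total count of adjacent boundary points.
    total_points = sum(adjacent_points_per_slice)
    # Adjacency length = points x pixel size; area = length x section thickness.
    return total_points * pixel_size_nm * z_thickness_nm


# Hypothetical adjacency counts for one neuron pair across five sections.
counts = [12, 15, 9, 0, 4]
area = contact_area_nm2(counts, pixel_size_nm=2.0, z_thickness_nm=50.0)
print(area)  # 40 points * 2.0 nm * 50.0 nm = 4000.0 nm^2
```

      Ranking these areas across all partners of a given neuron would then give the per-neuron contact ranks mentioned above.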

      We apply normalization at the level of the adjacency threshold to account for dataset-specific differences such as contrast, boundary definition, and age-related changes in neuropil packing density. This normalization is applied before running the adjacency algorithm. We do not normalize by individual neuron size, as the contact data are intended to reflect relational differences between neurons, rather than absolute morphological scaling. In fact, our addition of a scale-spheroid within each rendered model emphasizes the large increase in spatial scale that the nerve ring experiences during larval growth.  

      Comment

      Figure 1, C&D, explanation unclear for how the adjacency matrix is correlated with C-Phate schematic in D.

      We thank the reviewer for the comment and have clarified this section by adding greater detail to the explanation of how an adjacency matrix is computed (lines 149-155), as well as a description now in the figure legend 1C. Additionally, we revised Figure 1C and D to simplify neuron representations/colors and to simplify the adjacency heat map gradient. We also extended the area of contact between neurons on Figure 1C to better reflect what would be considered a “contact”. Lastly, in the figure, we changed the color and placement for the z plane arrow and label from black to white, to make it more visible, to highlight the method of computing adjacency for each z slice. 

      Comment

      Figure 4, panels F & G, unclear why AVF is shown in panel G (L3) but not panel F (L1). Explanation (see below) should be provided earlier, i.e., AVF is not generated until the end of the L1.

      We have now clarified this important point by adding labels to Figure 4 panels F and G, ‘Pre-AVF outgrowth’ and ‘Post-AVF outgrowth’ respectively. Briefly, the point is that AVF grows into the nerve ring after the L2 stage, and that is why it is absent in panel F (L1 stage, now with the label ‘Pre-AVF outgrowth’).  

      Comment

      Line 146 What is the justification for the statement: "By end of Larval Stage 1 (L1), neuronal differentiation has concluded...."? This statement is confusing since this sentence also states that "90% of neurons in the neuropil...have entered the nerve ring..." which would suggest that at least 10% additional NR neurons have NOT fully differentiated.

      We have fixed this sentence in the text. Now the sentence reads: ‘By Larval stage 1 (L1), 90% of the neurons in the neuropil (161 neurons out of the 181 neurons) have grown into the nerve ring and adopted characteristic morphologies and positions.’

      Lines 171-175 What is meant by the statement that "degree of these changes mapped onto...plasticity? What are examples of "behavioral plasticity?"

      We have added the following new lines of text (lines 200-204) and now additionally cite a review discussing C. elegans behaviors to clarify and give context to behavioral plasticity. ‘C. elegans exhibit tractable behaviors which can adapt due to changing environmental conditions (Flavell et al., Genetics 2020). Strata 3 and 4 contain most neurons belonging to circuits associated with such learned behaviors, including chemo-, mechano- and thermosensation. This is seemingly reflected by strata 3 and 4 harboring the most readily recognized set of changes in neuronal relationships across postembryonic development.’

      Comment

      Lines 189-190 The meaning of this sentence is unclear, "The logic in....merge events."

      This sentence has been deleted and we have instead refocused our descriptions of C-PHATE comparisons on neuronal clustering trajectories and cluster members (rather than iterations).

      Comment

      Lines 193-208 This section reports varying levels of convergence across larval development in C-Phate maps for the interneurons AIML and PVQL. Iterations leading to convergence varied: 16 (L1), 14 (L2), 22 (L3), 20 (L4), 14 (adult). The authors suggest that these differences are biologically significant and reflect the reorganization of AIML and PVQL contact relationships especially between the L4 and adult. Are these differences in iterations significant?

      We agree this could be confusing and instead of focusing on comparing the iteration at which each merging event occurs, we now focus on examining the differences in members of clusters, before and after the merge event. Cluster membership is easier to interpret than the differences in the number of DC iterations (lines 224-229).

      Lines 240-241 States that AVF neurons "terminally differentiate in the embryo" which is not correct. AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage which accounts for their outgrowth into the NR during the L2 stage. 

      We thank the reviewer for the correction and have edited the text to read: ‘AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage (Sulston et al., 1983; Sun and Hobert, 2023; Poole et al., 2024; Hall and Altun, 2008; Sulston and Horvitz, 1977). AVF neurons do not grow into the nerve ring until the L2 stage, and continue to grow until the Adult stage (lines 261-266).’

      Comment

      Lines 289-315. A detailed and highly technical description of website architecture would seem more appropriate for the Methods section.

      We agree and have moved this section to the methods as suggested (lines 663-690).

      Comment

      Line 307 "source data is" should be "source data are"

      Thank you; we have fixed this grammatical error.

      Comment

      Line 324 "circuits identities" should be "circuit identity".

      Thank you; we have fixed this grammatical error.

      Comment

      Trademark/copyright conflict with these sites? https://compumedicsneuroscan.com/about/ https://www.neuroscanai.com/

      We thank the reviewer for drawing our attention to this. To avoid potential conflicts, we have proactively altered the name to NeuroSC throughout the paper.

    1. eLife Assessment

      This valuable study reports convincing evidence about associations between 35 polygenic indices (PGIs) for social, behavioral, and psychological traits, along with some non-fatal health conditions (e.g., BMI) and all-cause mortality in data from Finnish population-based surveys and a twin cohort linked with administrative registers. PGIs for education, depression, alcohol use, smoking, BMI, and self-rated health showed the strongest associations with all-cause mortality, on the order of ~10% increment in risk per PGI standard deviation. Effect sizes from twin-difference analyses tended to be slightly larger than the effect sizes from population cohorts, opposite the pattern generally observed when testing PGI associations with their target phenotypes and supporting robustness of findings to confounding by population stratification.

    2. Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype. I have two main comments:

      (1) The authors should clarify whether the p-value reported in the text will remain significant after multiple testing adjustment. Some of the large effects might be significant; for example, Figure 2C (note that the small prediction accuracy of PGI in older age groups has been extensively studied, see Jiang, Holmes, and McVean, 2021, PLoS Genetics).

      (2) The authors might check if PGI+Phenotype has improved performance over Phenotype only. This is similar to Model 2 in Table 1, but slightly different.
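
      To make the multiple-testing point in comment (1) concrete, here is a minimal sketch of two common adjustments: a Bonferroni threshold for 35 tests (one per PGI) and the Benjamini-Hochberg step-up procedure. The function names and the p-values are hypothetical and are not taken from the manuscript; which correction is appropriate is for the authors to decide.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Threshold for 35 PGIs tested at alpha = 0.05.
print(bonferroni_threshold(0.05, 35))

# Hypothetical p-values for four associations.
print(benjamini_hochberg([0.001, 0.02, 0.04, 0.5]))  # [True, True, False, False]
```

      In this toy example, Bonferroni at 0.05/4 would retain only the first association, while Benjamini-Hochberg retains the first two.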

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Weaknesses:

      It is unclear whether the PGIs used for each trait represent the most current or optimal versions based on the latest GWAS data.

      If the Finnish data used in this study also contributed to the development of some of the PGIs, there is a risk of overestimating their associations with mortality due to overfitting or "double-dipping." Similar inflation of effect sizes has been observed in studies using the UK Biobank, which is widely used for PGI construction.

    1. eLife Assessment

      In this valuable study, the authors developed long-term imaging tools to simultaneously monitor the temporal and spatial dynamics of excitatory and inhibitory synapses and reported that excitatory and inhibitory synapses need to develop synergistically during synaptogenesis to maintain balance. While the analysis and quantification of the imaging data are incomplete, there is convincing evidence that the developed tools are feasible. If these tools can function stably in vivo, their applications will be much broader.

    2. Reviewer #1 (Public review):

      Summary:

      By imaging the dynamics of synaptic proteins in cultured neurons, this study presents significant findings regarding the dynamics of excitatory and inhibitory synaptic proteins during development. The evidence shows that the ratios of excitatory and inhibitory synaptic proteins are stable during synapse development. This discovery advances our understanding of the complex mechanisms governing synapse formation. The strength of the evidence is robust, as it is supported by a combination of biological assays and endogenous labeling.

      Strengths:

      This research sheds light on the dynamics of the excitatory and inhibitory synapses during development. It is crucial to understand that while excitatory synapses and inhibitory synapses are developed independently, the ratio of their number is relatively stable during development, maintaining a stable excitatory/inhibitory ratio.

      Important findings and implications in the research include:

      (1) Persistent Synapse Dynamics: Excitatory and inhibitory synapses remain highly dynamic even in mature neurons (DIV12-14), challenging the dogma that synaptic structures are stable after the synaptogenesis stage.

      (2) Maintained E/I Balance: Despite ongoing synapse turnover (formation/elimination) and presynaptic terminal reduction, the overall density and ratio of excitatory-to-inhibitory synapses remain relatively stable during circuit maturation (Figure 7).

      (3) Developmental Shifts: While presynaptic compartments decrease over time, postsynaptic sites increase, suggesting independent regulation of pre- and postsynaptic elements within a stable E/I framework.

      Weaknesses:

      This study focuses on specific synaptic proteins within synapses, which may not fully represent the dynamics of other synaptic machinery; also, whether similar observations exist in vivo is still unknown. Further research is needed to explore the implications of these findings in more complex neuronal environments.

    3. Reviewer #2 (Public review):

      Summary:

      Garbett et al. identified a critical need to begin to understand the interplay between the assembly, maturation, and elimination of excitatory and inhibitory synapses. They also detail the lack of reliable tools to address this gap in knowledge. Here, the authors developed synaptic reporters expressed by lentiviruses (mClover3-Homer1c, HaloTag-Syb2, and tdTomato-Gephyrin). They combined these reporters with resonance scanning confocal imaging to measure synapses over a 15-hour period during neuron development and in mature neurons in primary hippocampal cultures. Using these reporters in the same neuron, the authors compared the ratios of postsynaptic excitatory and inhibitory specializations that co-localize with presynaptic terminals during development and in mature neurons and found that they are stable across time points. Finally, the authors developed CRISPR/Cas9 tools (TKIT) to knock-in endogenous fluorescent tags (GFP/tdTomato-Gephyrin) or epitope tags (HA-Bassoon and HA-Homer1) to begin to study synapse dynamics using endogenous proteins. I believe this paper highlights an important gap in knowledge and begins to offer methodologies to determine the dynamic coordination between excitatory and inhibitory synapses.

      Strengths:

      (1) The experiments are well-designed and carefully controlled.

      (2) The authors carefully validated the reporter and TKIT constructs.

      (3) The authors provide strong proof-of-principle for the use of the reporter constructs to track synapse formation, maintenance, and elimination over a 15-hour period.

      (4) Ingenious use of technologies (reporters, TKIT, and resonance scanning confocal microscopy) to develop a platform for future studies of synapse dynamics.

      (5) Strong evidence supporting that the ratio of excitatory and inhibitory synapses (those that appose syb2) stays constant through development.

      Weaknesses:

      Overall, this is a well-executed study that develops tools to simultaneously image excitatory and inhibitory synapse dynamics and represents an important first step to address the fundamental question regarding the coordination between these two types of synapses.

      Minor weaknesses of the manuscript include:

      (1) The lack of a characterization of endogenous Homer1-positive excitatory synapses using TKIT.

      (2) Discussion about other approaches to study excitatory and inhibitory synapses using endogenous proteins (e.g., intrabodies - FingR or nanobodies) should be included.

      (3) The activity state of a neuron and/or a synapse might alter the dynamic properties (formation, maintenance, and/or elimination). A discussion on whether the overexpression of Homer1 and/or gephyrin might alter synapse/neuron activity would provide greater interpretability of the results. A discussion of the potential limitations and benefits of the reporter and TKIT approaches would be beneficial.

      (4) A description and interpretation of the computational approach to calculate particle tracking would be helpful. I found that particle tracking figures, while elegant, are difficult to interpret.

    4. Reviewer #3 (Public review):

      In the present study, the authors describe the development of new tools and imaging strategies to assess the concomitant development of excitatory and inhibitory synapses in dissociated neuron cultures. To this end, they generate fluorescently tagged constructs of excitatory and inhibitory synapse marker proteins using either conventional overexpression or CRISPR-based strategies. They then image these marker proteins over a timespan of 15 hours to assess synaptic dynamics at different developmental timepoints. Based on their data, they conclude that excitatory and inhibitory synapse development occur in concert to maintain a functional balance despite individual synapse turnover.

      Overall, this study addresses an interesting question, i.e., the interplay between the development of excitatory and inhibitory synapses, which has important implications, particularly for neurodevelopmental disorders in which the balance of excitation and inhibition is disrupted. The experiments are technically solid and well-executed, and the individual images are highly compelling.

      However, a number of aspects remain to be addressed in order for the study to support the claims made by the authors. First, the novelty aspect of the development of the fluorescently tagged synaptic proteins is unclear, since reporters of this nature are in routine use in many labs. Second, the analysis of the acquired images often seems incomplete, with only example images but no quantification shown, or the distinction between spatial and temporal dynamics appearing unclear. Third, given this incomplete analysis, the interpretations of the authors are not always convincingly supported by the data presented. In conclusion, substantial improvements are required to render the main messages of the study clear and compelling.

    1. eLife Assessment

      This paper presents valuable findings on the processing of sound mixtures in the auditory cortex of ferrets, a species widely used for studies of auditory processing. Using the convenient and relatively high-resolution method of functional ultrasound imaging, the authors provide convincing evidence that background noise invariance emerges across the auditory cortical processing hierarchy. They also draw informative comparisons with previously published fMRI data obtained in humans. This work will be of interest to researchers studying the auditory cortex and the neural mechanisms underlying auditory scene analysis and hearing in noise.

    2. Reviewer #1 (Public review):

      This is a very interesting paper addressing the hierarchical nature of the mammalian auditory system. The authors use an unconventional technique to assess brain responses -- functional ultrasound imaging (fUSI). This measures blood volume in cortex at a relatively high spatial resolution. They present dynamic and stationary sounds in isolation and together, and show that the effect of the stationary sounds (relative to the dynamic sounds) on blood volume measurements decreases as one ascends the auditory hierarchy. Since the dynamic/stationary nature of sounds is related to their perception as foreground/background sounds, this suggests that neurons in higher levels of the cortex may be increasingly invariant to background sounds.

      The study is interesting, well conducted and well written. In the revised manuscript, the authors have addressed all the points I raised in my review.

    3. Reviewer #2 (Public review):

      Summary:

      Noise invariance is an essential computation in sensory systems for stable perception across a wide range of contexts. In this paper, Landemard et al. perform functional ultrasound imaging across primary, secondary and tertiary auditory cortex in ferrets to uncover the mesoscale organization of background invariance in auditory cortex. Consistent with previous work, they find that background invariance increases throughout the cortical hierarchy. Importantly, they find that background invariance is largely explained by progressive changes in spectro-temporal tuning across cortical stations which are biased towards foreground sound features. To test if these results are broadly relevant, they then re-analyze human fMRI data and find that spectro-temporal tuning fails to explain background invariance in human auditory cortex.

      Strengths:

      (1) Novelty of approach: Though the authors have published on this technique previously, functional ultrasound imaging offers unprecedented temporal and spatial resolution in a species where large-scale calcium imaging is not possible and electrophysiological mapping would take weeks or months. Combining mesoscale imaging with a clever stimulus paradigm, they address a fundamental question in sensory coding.

      (2) Quantification and execution: the results are generally clear and well supported by statistical quantification.

      (3) Elegance of modeling: The spectrotemporal model presented here is explained clearly and most importantly, provides a compelling framework for understanding differences in background invariance across cortical areas.

      Comments on revised version:

      The authors have addressed all of my previous concerns, and their publicly shared data is easy to view. This is a nice contribution to the field.

    4. Reviewer #3 (Public review):

      This paper investigates invariance to natural background noise in the auditory cortex of ferrets and humans. The authors first replicate, in ferrets, a finding from human neuroimaging showing that invariance to background noise increases along the cortical hierarchy (i.e. from primary to non-primary auditory cortex). Next, the authors ask whether this pattern of invariance could be explained by differences in tuning to low-level acoustic features across primary and non-primary regions. The authors conclude that this tuning can explain the spatial organization of background invariance in ferrets, but not in humans. The conclusions of the paper are well supported by the data.

      The paper is very straightforwardly written, with a generally clear presentation including well-designed and visually appealing figures. Not only does this paper provide an important replication in a non-human animal model commonly used in auditory neuroscience, but also it extends the original findings in three ways. First, the authors reveal a more fine-grained gradient of background invariance by showing that background invariance increases across primary, secondary and tertiary cortical regions. Second, the authors address a potential mechanism that might underlie this pattern of invariance by considering whether differences in tuning to frequency and spectrotemporal modulations across regions could account for the observed pattern of invariance. The spectrotemporal modulation encoding model used here is a well-established approach in auditory neuroscience and seems appropriate for exploring potential mechanisms underlying invariance in auditory cortex, particularly in ferrets. Third, the authors provide a more complete picture of invariance by additionally analyzing foreground invariance, a complementary measure not explored in the original study.

      Comments on author revisions:

      The authors have thoroughly addressed the concerns raised in my initial review.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear; however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness. We have now added this point to the discussion: 

      “Second, we used blood volume as a proxy for local neuronal activity. Thus, our signal ignores any heterogeneity that might exist at the level of local neuronal populations. However, our main findings are related to the large-scale organization of cortical responses and how they relate to those of humans. For this purpose, the functional spatial resolution of our signal, driven by the spatial resolution of neurovascular coupling, should be adapted. In addition, using hemodynamic signals provides a much better comparison with human fMRI data, where the same limitations are present.”

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      We understand the reviewer’s concern about potential differences in response dynamics in stationary vs non-stationary sounds. It seems that the reviewer is concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. Nevertheless, we observed the contrary, with non-primary regions overrepresenting non-stationary (dynamic) sounds, over stationary ones. For this reason, we are inclined to think that this explanation cannot falsify our findings. 

      We understand the comment that temporal following rates might differ across regions in the auditory hierarchy and agree. In fact, we do show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. In this regard, we think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms. We have added a sentence to clarify this: “The model included a range of realistic temporal rates and this axis was the most informative to discriminate foregrounds from backgrounds.”
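
      The windowed averaging discussed above can be illustrated with a minimal sketch. It assumes a hypothetical model output sampled at 100 Hz and three toy time courses; none of these values come from the study. It only shows that a signed, fast response averages out over the 2.4-4.8 s window, whereas a rectified version of the same response does not.

```python
import numpy as np

def window_average(timecourse, fs, t_start, t_stop):
    """Average the last axis of `timecourse` between t_start and t_stop (seconds)."""
    i0 = int(round(t_start * fs))
    i1 = int(round(t_stop * fs))
    return timecourse[..., i0:i1].mean(axis=-1)

fs = 100                  # hypothetical sampling rate of the model output (Hz)
t = np.arange(480) / fs   # 4.8 s, the duration of one sound segment

sustained = np.ones_like(t)               # toy constant response
signed_fast = np.sin(2 * np.pi * 5 * t)   # toy excitation/inhibition at 5 Hz
rectified_fast = np.abs(signed_fast)      # same dynamics after rectification

features = np.stack([sustained, signed_fast, rectified_fast])
averaged = window_average(features, fs, 2.4, 4.8)  # one value per feature
```

      In this toy case the signed response averages to approximately zero while the rectified one stays well above zero, so how much fast dynamics survive averaging depends on the quantity being averaged.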

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature (McWalter and McDermott, 2018; McWalter and McDermott, 2019), including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. We have clarified and justified this choice in the beginning of the Results section:

      “We used three types of stimuli: foregrounds, backgrounds, and combinations of those. We use those terms to refer to sounds differing in their stationarity, under the assumption that stationary sounds carry less information than non-stationary sounds, and are thus typically ignored.”

      We have also added a paragraph in the discussion to emphasize the limits of this definition:

      “First, this study defined foregrounds and backgrounds solely based on their acoustic stationarity, rather than perceptual judgments. This choice allowed us to isolate the contribution of acoustic factors in a simplified setting. Within this controlled framework, we show that acoustic features of foreground and background sounds drive their separation in the brain and the hierarchical extraction of foreground sound features.”

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, we have emphasized this limitation in addition to the limitation of our definition of foregrounds and backgrounds in the discussion: 

      “In addition, most of the sounds included in our study likely have more relevance for humans compared to ferrets (see Table 1). Despite including ferret vocalizations and environmental sounds that are more ecologically relevant for ferrets, it is not clear whether ferrets would behaviorally categorize foregrounds and backgrounds as humans do. Examining how ferrets naturally orient or respond to foreground and background sounds under more ecologically valid conditions, potentially with free exploration or spontaneous listening paradigms, could help address this issue.”

      Reviewer #2 (Public review):

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, more caution should be exercised by the authors in framing their results, given that they are using an indirect measure of neural activity. This is the difference between stating "CBV in area MEG was less background invariant than in higher areas" vs. saying "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be better explored:

      a) Cortical vasculature is highly structured (e.g., Kirst et al. (2020) Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, even if fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al. (2011) Nat. Methods). This concern could be addressed by either analyzing spontaneous fluctuations in the CBV signal during silent periods or computing a signal-to-noise ratio of voxels across areas across all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewers that there could be differences in vasculature across subregions of the auditory cortex and note that this point would also be valid for the published human fMRI data. Nevertheless, even if small differences in vasculature were present, it is unlikely that they would affect our analyses and results, which are designed to be independent of local vascular density. First, we normalize the signal in each voxel using the silent periods, so that the absolute strength of the raw signal, or baseline blood volume in each voxel, is factored in our analysis. Second, we only focus on reliably responsive voxels in each region and do see comparable sound-evoked responses in all regions (Figure S2). Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Differences in noise, measured through test-retest reliability, can affect values of correlation, which is why we used a noise-correction procedure. After this procedure, invariance does not depend on test-retest, and differences across regions are still seen when matching for test-retest (new Figure S7). Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results. We added this point in the Methods section when discussing the noise-correction:

      “After this correction, the differences we observed between brain regions were present regardless of voxels' test-retest reliability, or noise level (Figure S7). Thus, potential differences in vasculature across regions are unlikely to affect our results.”
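
      The argument that baseline normalization and correlation across sounds are insensitive to local vascular gain can be sketched as follows. This is a toy illustration under stated assumptions: the percent-change normalization, the baselines, and the gains are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def percent_change(raw, baseline):
    """Express a raw signal as percent change relative to a silent-period baseline."""
    return 100.0 * (raw - baseline) / baseline

rng = np.random.default_rng(0)
n_sounds = 40
tuning = rng.normal(size=n_sounds)  # toy response profile across sounds

# Two hypothetical voxels with the same tuning but different baseline and gain
raw_a = 1000.0 * (1 + 0.05 * tuning)
raw_b = 250.0 * (1 + 0.20 * tuning)
resp_a = percent_change(raw_a, 1000.0)
resp_b = percent_change(raw_b, 250.0)

# A third response vector to correlate against (e.g., responses to mixtures)
other = tuning + rng.normal(size=n_sounds)
r_a = np.corrcoef(resp_a, other)[0, 1]
r_b = np.corrcoef(resp_b, other)[0, 1]
```

      Because the Pearson correlation removes the mean and scale of each vector, `r_a` and `r_b` are equal up to floating-point error even though the two voxels differ in baseline and gain. Differences in noise level are not covered by this sketch, which is why a separate noise correction is needed.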

      b) Figure 1 leaves the reader uncertain what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from previous sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features in the acoustic signal (e.g. the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with fUS waveform. The third possibility, which the authors argue, is that the magnitude of the fUS signal encodes the stimulus ID. A better understanding of the justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising that point as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, aiming at illustrating the dynamics for the three broad categories. In Figure 1E however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining two vectors (size: number of sounds) per lag, one per repeat. Then, we correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per voxel. We finally average these matrices across all voxels. The presence of red squares with high correlations demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground only matrix, 0-5 s vs 0-5 s), than two different sounds (0-5 s vs. 7-12 s). We modified the figure layout as well as the legend to improve clarity.
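
      The cross-validated cross-correlation described above can be sketched in a few lines. This is a minimal sketch with simulated data, assuming responses are stored as arrays of shape (voxels, sounds, lags) for each repeat; the array layout, sizes, and noise level are assumptions and not the authors' implementation.

```python
import numpy as np

def crossval_xcorr(rep1, rep2):
    """Cross-validated cross-correlation across sounds.

    rep1, rep2: arrays of shape (n_voxels, n_sounds, n_lags), one per repeat.
    Returns an (n_lags, n_lags) matrix averaged over voxels, where entry
    [i, j] is the Pearson correlation across sounds between repeat 1 at
    lag i and repeat 2 at lag j.
    """
    def zscore(x):
        x = x - x.mean(axis=1, keepdims=True)
        return x / x.std(axis=1, keepdims=True)

    z1, z2 = zscore(rep1), zscore(rep2)
    n_sounds = rep1.shape[1]
    per_voxel = np.einsum("vsi,vsj->vij", z1, z2) / n_sounds
    return per_voxel.mean(axis=0)

# Toy data: a sound-specific signal shared by both repeats, plus independent noise
rng = np.random.default_rng(0)
n_voxels, n_sounds, n_lags = 50, 30, 6
signal = rng.normal(size=(n_voxels, n_sounds, n_lags))
rep1 = signal + rng.normal(size=signal.shape)
rep2 = signal + rng.normal(size=signal.shape)

xcorr = crossval_xcorr(rep1, rep2)
```

      In this simulation the diagonal of `xcorr` is clearly positive and the off-diagonal entries are near zero, which is the pattern expected when the signal carries sound identity at matching lags.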

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis where they downsampled the Ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold with lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that due to scanner noise, sounds are presented cut into small pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, both species have brains of very different sizes, a factor that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs 2 mm). Downsampling to fMRI resolution would lead to having one voxel per region per slice, which is not feasible. We also summarize results with one value per region, which is a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al, 2021 eLife) that fUSI and fMRI data are comparable signals. We indeed could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds. We clarified these points in the discussion:

      “In addition, fMRI has a worse spatial resolution than fUSI (here, 2 vs. 0.1 mm voxels). However, this difference in resolution compensates for the difference in brain size between humans and ferrets. In our previous work, we showed that a large fraction of cortical responses to natural sounds could be predicted from one species to the other using these methods (Landemard et al., 2021).”

      Reviewer #3 (Public review):

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., subselecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets, but not in humans (possibly indeed because predictions are less good in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. In ferrets, we found a significantly better prediction accuracy in VP (p=0.001, permutation test) and no differences between MEG and dPEG (p=0.89). In humans, prediction accuracy was slightly higher in primary compared to non-primary auditory cortex, but this effect was not significant (p=0.076). In both species, when matching prediction accuracy between regions, the gradients in invariance were preserved. We have added these analyses to the manuscript (Figure S5).
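
      A permutation test of the kind referred to above can be sketched as follows. This is a generic two-sample version under stated assumptions: the test statistic (difference of means), the shuffling of region labels across voxels, and the simulated values are hypothetical and may differ from the procedure used in the manuscript.

```python
import numpy as np

def permutation_test(a, b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference of means between a and b."""
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # reassign group labels at random
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (count + 1) / (n_perm + 1)

# Toy prediction accuracies for two hypothetical regions
rng = np.random.default_rng(1)
region_a = rng.normal(0.6, 0.1, size=200)
region_b = rng.normal(0.4, 0.1, size=200)
diff, p_value = permutation_test(region_a, region_b)
```

      With 1000 permutations and the add-one correction, the smallest attainable p-value is 1/1001, so a value near 0.001 means that essentially no shuffle reached the observed difference.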

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion. We have run an additional prediction using only the sounds presented in isolation, which replicates our main results (new Figure S6). We have added this control to the manuscript:

      “Results were similar if the model was fit solely on isolated sounds, excluding mixtures from the training set (Figure S6).”
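
      The control described above, fitting on isolated sounds and evaluating on mixtures, can be sketched with a simple linear encoding model. This is a minimal sketch using closed-form ridge regression on simulated features; the regression method, the feature dimensions, and the regularization strength are assumptions and not the authors' fitting procedure.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns weights and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                        Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

rng = np.random.default_rng(0)
n_features = 10
true_w = rng.normal(size=n_features)  # toy voxel tuning

# Hypothetical model features for isolated sounds (training) and mixtures (held out)
X_isolated = rng.normal(size=(60, n_features))
X_mixture = rng.normal(size=(40, n_features))
y_isolated = X_isolated @ true_w + 0.1 * rng.normal(size=60)
y_mixture = X_mixture @ true_w + 0.1 * rng.normal(size=40)

# Fit on isolated sounds only, then predict responses to mixtures
w, b = fit_ridge(X_isolated, y_isolated)
pred_mixture = X_mixture @ w + b
accuracy = np.corrcoef(pred_mixture, y_mixture)[0, 1]
```

      Because the mixtures never enter the fit, any invariance seen in `pred_mixture` has to come from the model's tuning rather than from having been trained to reproduce it.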

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. While we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant and many voxels exhibit similar levels of background and foreground invariance (for example in Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs spikes), and the sound presentation paradigm. Our timescales are much slower and likely reflect responses post-adaptation, which might not be as true for Hamersky et al. We have added this point to the discussion, as well as a comment on the difference between ferrets and humans in foreground invariance in primary auditory cortex:

      “In ferrets, primary auditory cortex has been found to over-represent backgrounds in mixtures compared to foregrounds (Hamersky et al., 2025). In contrast, we found a slight, non-significant bias towards foregrounds in primary regions. This difference could be driven by a difference in timescales, as we looked at slower timescales in which adaptation might be more present, reducing the strength of background encoding. In humans, we found a much smaller gap between background and foreground invariance in primary auditory cortex, which was not predicted by the spectrotemporal model. Additional, more closely controlled experiments would be needed to confirm and understand this species difference.”

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, explain the relationship between background/foreground and stationarity/non-stationarity, and thus why stationary/nonstationary stimuli could be used to probe differences in background/foreground processing.

      We have added a sentence at the beginning of the results section to justify our choice (see public review).  

      (2) Avoid use of the background/foreground terminology in Results (and probably Methods).

      For consistency with previous literature, we decided to keep this terminology, though imperfect. We further justified our choice in the beginning of the Results section (see previous point).

      (3) In the Discussion, explain what the implications of the results are for background/foreground processing, and, importantly, highlight any caveats that result from stationarity not being a direct measure of background/foreground.

      We added a paragraph in the Discussion to highlight this point (see public review).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1: Showing a silent period in the examples would help in understanding the fUS signal.

      In Figure 1D, we show the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds. Thus, it would not be very informative to show an equivalent plot for a silent period, as it would look flat by definition. However, we updated the layout and legend of Figure 1 to make it clearer and avoid confusion.

      (2) "Responses were not homogenous" - would make more sense to say something like "responses were not spatially distributed".

      We removed these words which were indeed not necessary: “We found that reliable sound-evoked responses were confined to the central part of ventral gyrus of the auditory cortex.”

      (3) Figure 2D: The maps shown in Figure 2D are difficult to understand for the non-initiated in fUS. At a minimum, labels should be added to indicate A-P, M-L, D-V. I cannot see the white square in the primary figure. An additional graphic would be helpful here to understand the geometry of the measurement.

      We thank the reviewer for pointing out that reading these images is indeed an acquired skill. We added an annotated image of anatomy with indications of main features to guide the reader in Figure 1. We also added missing white squares. 

      (4) Figure 2F: Can the authors better justify why the summary statistic is shown for all three areas, but the individual data only compares primary vs. higher order?

      We now show individual data for all three areas.

      (5) More methods information is needed to understand how recordings were stitched across days. Was any statistical modeling used to factor out the influence of day on overall response levels?

      We simply concatenated voxels recorded across different sessions and days. The slices were sampled randomly to avoid any systematic effect. Because different slices were sampled in different sessions, any spatial structure spanning several slices is unlikely to be artefactual. For instance, the map of average responses in Figure 2A shows a high level of continuity of spatial patterns across slices. This indicates that this pattern reflects a true underlying organization rather than session-specific noise. It also shows that the overall response levels are not affected by the day or recording session. We added a section in the Methods (“Combining different recordings”) to clarify this point:

      “The whole dataset consisted of multiple slices, each recorded in a different recording session. Slices to image on a given day were chosen at random to avoid any systematic bias. Responses were consistent across neighboring slices recorded on different sessions, as shown by the maps of average responses (Figure 2A, Figure S2) where any spatial continuity across different slices must reflect a true underlying signal in the absence of common noise.”

      Reviewer #3 (Recommendations for the authors):

      (1) Figures:

      The figures are generally very well done and visually appealing. However, I have a few suggestions and questions.

      a)  In Figure 1G, the delta CBV ranges from 0.5 to 1.5, although in subsequent figures (e.g., Figure 2D), the range is much larger (-15 to 45). Is it possible that the first figure is a proportion rather than a percentage, or is there some other explanation for the massive difference in scale? Not being very familiar with this measure, it was confusing.

      The same scale is used in both figures, the major difference being that in Figure 1D, we take the average over all voxels and sounds (for each category), which includes many non-responsive voxels and, for responsive voxels, sounds to which they respond only weakly. On the other hand, Figure 2D shows the response of a single, responsive voxel. Thus, the values it reaches for its preferred sounds (45%) are an extreme that contributes little to the average in Figure 1D. We have changed the legend of Figure 1D to make this more explicit.

      b)  Similar to the first point, the strength of the correlations in the matrices of Figure 1E is very small (~ 0.05) compared to the test-retest reliabilities plotted in Figure 2B (~0.5). Again, I was confused by this large difference in scale.

      Two main factors explain the difference in values between Figure 1E and Figure 2B. First, in Figure 1E, each correlation is computed on the average activity in a window of 0.3 s, as opposed to 2.4 s in Figure 2B. More averaging leads to better SNR, which inevitably leads to higher test-retest correlations. Second, in Figure 1E, the cross-correlation matrices are averaged across all responsive voxels without any criterion for reliability. On the other hand, Figure 2B shows example voxels with good test-retest reliability.

      c)  In Figure 2D, the example voxels are supposed to be shown in white. It appears that this example voxel is only shown for the non-primary voxel. Please be sure to add these voxels throughout the other panels and figures as well. 

      We fixed this mistake and added the example voxel in all panels.

      d)  Why do the invariance results (e.g., Figure 2F) for individual animals combine across dPEG and VP, while the overall results (across all animals) split things across all three regions? The results in Table 2 do, in fact, provide this data. Upon further examination of the data in Table 2, it seems like there is only a significant difference between background invariance between dPEG and VP for one of the two animals, and that this might be what drives the effect when pooling across all animals. This seems important to both show visually in the figure and to potentially discuss. There is still very clearly a difference between primary and non-primary, but whether there is a real difference between dPEG and VP seems more unclear.

      We added the values for single animals in the plot and highlighted this limitation in the text:

      “While background invariance was overall highest in VP, the differences within non-primary areas were more variable across animals (see table 2).”

      e)  Again, as in Figure 2F, the cross symbols seem like a bad choice as markers since the vertical components of the cross are suggestive of the error of the measurement. However, no error is actually plotted in these figures. I recommend using a different marker and including some measure of error in the invariance plots.

      We replaced the crosses with circles to avoid confusion. The measure of error is provided by the representation of values for single animals.

      f) The caption for Figure 4C states that each line corresponds to one animal, but does not precisely state what this line represents. Is this the median or something?

      Each line indeed represents the median across voxels for one animal. We added this information to the legend.

      g)  In Figure 5, the captions for panels D and E are swapped.

      This has now been corrected.

      (2) Discussion:

      (a) In the paragraph on methodological differences, it mentions that the fMRI voxel size is around 2 mm. This may be true in general, but given the comparison to Kell & McDermott 2019, the voxel size should reflect that used in their study (1 mm).

      The reviewer might refer to this sentence from the methods of Kell et al., 2019: “T1-weighted anatomical images were collected in each participant (1-mm isotropic voxels) for alignment and cortical surface reconstruction.” However, this does not correspond to the resolution of the functional data, which is 2 mm, as mentioned a bit further in the Methods: “In-plane resolution was 2 × 2 mm (96 × 96 matrix), and slice thickness was 2.8 mm with a 10% gap, yielding an effective voxel size of 2 × 2 × 3.08 mm.”

      (b) In the next paragraph on the control of attention, it mentions that attentional differences could play a role. However, in Kell & McDermott 2019, they manipulated attention (attend visual versus attend auditory) and found that it did not substantially affect the observed pattern invariance. I suppose it could potentially affect the degree to which an encoding model could explain the invariance. This seems important, and given that the data was already collected, it could be worth it to analyze that data.

      As the reviewer points out, Kell et al. 2019 ran an additional experiment in which they manipulated auditory vs. visual attention. However, the auditory task was just based on loudness and ensured that the participants were awake and paying attention to the stimuli, but not specifically to the foreground or background. This type of attention did not lead to changes in the observed patterns of invariance, which might have been the case for selective attention to backgrounds or foregrounds in the mixture. Given that these manipulations were not done in the ferret experiments, we chose to not include the analysis of this dataset in the scope of this paper. However, future work investigating that topic further would indeed be of interest.

      (c) The mention of "a convolutional neural network trained to recognize digits in noise" should make more obvious that this is visual recognition rather than auditory recognition.

      We clarified this sentence to make clear that the recognition is visual and not auditory: “For instance, in a convolutional neural network trained to visually recognize digits in different types of noise, when local feedback is implemented, early layers encode noise properties, while later layers represent clean signal.”

      (d) Finally, one explanation of the results in the discussion is that "primary auditory areas could be recruited to maintain background representations, enabling downstream cortical regions to use these representations to specifically suppress background information and enhance foreground representations." This "background-related information" being used to "facilitate further extraction of foregrounds" is similar to what is argued in Hicks & McDermott PNAS 2024.

      We thank the reviewer for suggesting this relevant reference and added it in this paragraph of the discussion.

      (3) Methods:

      In the "Cross-correlation matrices" section, it mentions that time-averaged responses from 2.4 to 4.8 s were used. It would be helpful to provide an explanation of why this particular time window was used. Additionally, I wondered whether one could look at adaptation type effects (e.g., that of Khalighinejad et al., 2019) or whether fUSI does not offer this kind of temporal precision?

      The effects shown in Khalighinejad et al., 2019, are indeed likely too fast to be observed with our methods. However, there are still dynamics in the fUSI signal and in its invariance (Figure S1). Each individual combination of foreground and background is presented for 4.8 s (Figure 1B). Therefore, we chose the range 2.4-4.8 s as the biggest window we could use (to improve SNR) while minimizing contamination from the previous or next sound (indeed, blood volume typically lags neuronal activity by 1.5-2 s). We added this clarification to the Methods.

      In the "Human analyses" section, it is very unclear which set of data was used from Kell & McDermott 2019. For example, that paper contains 4 different experiments, none of which has 7 subjects. Upon closer reading, it seems that only 7 of the 11 participants from Experiment 1 also heard the background sounds in isolation (thus enabling the foreground invariance analyses). However, they stated that there were only 3 female participants in that experiment, while you state that you used data from 7 females. It would be helpful to double-check this and to more clearly state exactly which participants (i.e., from which experiment) were used and why (e.g., why not use data from Experiment 4 in the visual task/attention condition?).

      We added a sentence to clarify which datasets were used: “Specifically, we used data from Experiment 1 which provided the closest match to our experimental conditions, and only considered the last 7 subjects that heard both the foregrounds and the backgrounds in isolation, in addition to the mixtures.” 

      It was a mistake to mention that it was all female, as the original dataset has 3 females and 8 males, of which we used 7 without any indication of their sex. Thus, we removed this mention from the text.

      In the "Statistical testing" section, why were some tests done with 1000 permutations/shuffles while others were done with 2000?

      We homogenized and used 1000 permutations/shuffles for all statistical tests.

      (4) Miscellany:

      (a) The Hamersky et al. 2023 preprint has recently been published (referenced in the public review), and so you could consider updating the reference.

      This reference has now been updated.

      (b) There are a few borderline statistical tests that could use a bit more nuance. For example (on page 4), "In primary auditory cortex (MEG), there was no significant difference between values of foreground invariance and background invariance (p = 0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times)." This test is quite close to being significant, and this might be acknowledged.

      We emphasized the trend to nuance the interpretation of these results: “In primary auditory cortex (MEG), foreground invariance was slightly lower than background invariance, although this difference was not significant (p=0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times).”
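
      For readers unfamiliar with this kind of test, the sketch below shows a generic two-sided label-permutation test on two sets of values. It is only an illustration under simplifying assumptions: the function name is hypothetical, and the manuscript's actual procedure (permuting the sounds' foreground and background labels before recomputing invariance) may differ in its details.

```python
import numpy as np

def permutation_p_value(foreground, background, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Illustrative only: labels are shuffled across the pooled values,
    which is a simplification of permuting sound labels upstream of
    the invariance computation.
    """
    rng = np.random.default_rng(seed)
    foreground = np.asarray(foreground, dtype=float)
    background = np.asarray(background, dtype=float)
    observed = foreground.mean() - background.mean()
    pooled = np.concatenate([foreground, background])
    n_fg = foreground.size
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign foreground/background labels.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_fg].mean() - shuffled[n_fg:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)
```

      With 1000 permutations the smallest attainable p-value under this correction is 1/1001, so a value such as p = 0.063 is resolved only to roughly the third decimal place.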

      (5) Potential typos:

      (a)   Should the title be "natural sound mixtures" instead of "natural sounds mixtures"?

      (b) The caption for Figure 1 says "We imaged the whole auditory through successive slices across several days." I believe this should be "the whole auditory [cortex]."

      (c) In the first paragraph of the discussion, there is a sentence ending in "...are segregated in hemody-namic signal." I believe this should be "hemodynamic signal."

      These errors are now all corrected.

    1. eLife Assessment

      This study presents experiments suggesting intriguing mesoscale reorganization of functional connectivity across distributed cortical and subcortical circuits during learning. The approach is technically impressive and the results are potentially of valuable significance. However, in its current form, the strength of evidence is incomplete. More in-depth analyses and the acquisition of data from additional animals in the primary experiment could bolster these findings.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to address an important and timely question: how does the mesoscale architecture of cortical and subcortical circuits reorganize during sensorimotor learning? By using high-density, chronically implanted ultra-flexible electrode arrays, the authors track spiking activity across ten brain regions as mice learn a visual Go/No-Go task. The results indicate that learning leads to more sequential and temporally compressed patterns of activity during correct rejection trials, alongside changes in functional connectivity ranks that reflect shifts in the relative influence of visual, frontal, and motor areas throughout learning. The emergence of a more task-focused subnetwork is accompanied by broader and faster propagation of stimulus information across recorded regions.

      Strengths:

      A clear strength of this work is its recording approach. The combination of stable, high-throughput multi-region recordings over extended periods represents a significant advance for capturing learning-related network dynamics at the mesoscale. The conceptual framework is well motivated, building on prior evidence that decision-relevant signals are widely distributed across the brain. The analysis approach, combining functional connectivity rankings with information encoding metrics is well motivated but needs refinement. These results provide some valuable evidence of how learning can refine both the temporal precision and the structure of interregional communication, offering new insights into circuit reconfiguration during learning.

      Weaknesses:

      The technical approach is strong and the conceptual framing is compelling, but several aspects of the evidence remain incomplete. In particular, it is unclear whether the reported changes in connectivity truly capture causal influences, as the rank metrics remain correlational and show discrepancies with the manipulation results. The absolute response onset latencies also appear slow for sensory-guided behavior in mice, and it is not clear whether this reflects the method used to define onset timing or factors such as task structure or internal state. Furthermore, the small number of animals, combined with extensive repeated measures, raises questions about statistical independence and how multiple comparisons were controlled. The optogenetic experiments, while intended to test the functional relevance of rank-increasing regions, leave it unclear how effectively the targeted circuits were silenced. Without direct evidence of reliable local inhibition, the behavioral effects or lack thereof are difficult to interpret. Details on spike sorting are limited.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. record from 10 cortical and subcortical brain regions as mice learn a go/no-go visual discrimination task. They found that during learning, there is a reshaping of inter-areal connections, in which a visual-frontal subnetwork emerges as mice gain expertise. Also, decoding of visual stimuli became more widespread post-learning. They also perform silencing experiments and find that OFC and V2M are important for the learning process. The conclusion is that learning evoked a brain-wide dynamic interplay between different brain areas that together may promote learning.

      Strengths:

      The manuscript is written well and the logic is rather clear. I found the study interesting and of interest to the field. The recording method is innovative and requires exceptional skills to perform. The outcomes of the study are significant, highlighting that learning evokes a widespread and dynamic modulation between different brain areas, in which specific task-related subnetworks emerge.

      Weaknesses:

      I had several major concerns:

      (1) The number of mice was small for the ephys recordings. Although the authors start with 7 mice in Figure 1, they then reduce to 5 in panel F. And in their main analysis, they restrict their analysis to 6/7 sessions from 3 mice only. I couldn't find a rationale for this reduction, but in the methods they do mention that 2 mice were used for fruitless training, of which I found no mention in the results. Moreover, in the early case, all of the analysis is from 118 CR trials taken from 3 mice. In general, this is a rather low number of mice and trial numbers. I think it is quite essential to add more mice.

      (2) Movement analysis was not sufficient. Mice learning a go/no-go task establish a movement strategy that is developed throughout learning and is also biased towards Hit trials. There is an analysis of movement in Figure S4, but this is rather superficial. I was not even sure that the 3 mice in Figure S4 are the same 3 mice in the main figure. There should be also an analysis of movement as a function of time to see differences. Also for Hits and FAs. I give some more details below. In general, most of the results can be explained by the fact that as mice gain expertise, they move more (also in CR during specific times) which leads to more activation in frontal cortex and more coordination with visual areas. More needs to be done in terms of analysis, or at least a mention of this in the text.

      (3) Most of the figures are over-detailed, and it is hard to understand the take-home message. Although the text is written succinctly and rather short, the figures are mostly overwhelming, especially Figures 4-7. For example, Figure 4 presents 24 brain plots! For input rank and output rank during early and late stim and response periods, for early and expert and their difference. All in the same colormap. No significance shown at all. The Δrank maps for all cases look essentially identical across conditions. The division into early and late time periods is not properly justified. But the main take-home message is positive Δrank in OFC, V2M, V1 and negative Δrank in ThalMD and Str. In my opinion, one trio map is enough, and the rest could be bumped to the Supplementary section, if at all. In general, the figures in several cases do not convey the main take-home messages. See more details below.

      (4) The analysis is sometimes not intuitive enough. For example, the rank analysis of input and output rank seemed a bit overly complex. Figure 3 was hard to follow (although a lot of effort was made by the authors to make it clearer). Was there any difference between the output and input analysis? Also, the time period seems redundant sometimes. Also, there are other network analyses that can be done, which are a bit more intuitive. The use of rank within the 10 areas was not the most intuitive. Even a dimensionality reduction along with clustering can be used as an alternative. In my opinion, I don't think the authors should completely redo their analysis, but maybe mention the fact that other analyses exist.

    4. Reviewer #3 (Public review):

      Summary:

      In the manuscript "Dynamics of mesoscale brain network during decision-making learning revealed by chronic, large-scale single-unit recording", Wang et al. investigated mesoscale network reorganization during visual stimulus discrimination learning in mice using chronic, large-scale single-unit recordings across 10 cortical/subcortical regions. During learning, mice improved task performance mainly by suppressing licking on no-go trials. The authors found that learning induced restructuring of functional connectivity, with visual (V1, V2M) and frontal (OFC, M2) regions forming a task-relevant subnetwork during the acquisition of correct No-Go (CR) trials.

      Learning also compressed sequential neural activation and broadened stimulus encoding across regions. In addition, a region's network connectivity rank correlated with its timing of peak visual stimulus encoding.

      Optogenetic inhibition of orbitofrontal cortex (OFC) and higher-order visual cortex (V2M) impaired learning, validating their role in learning. The work highlights how mesoscale networks underwent dynamic structuring during learning.

      Strengths:

      The use of ultra-flexible microelectrode arrays (uFINE-M) for chronic, large-scale recordings across 10 cortical/subcortical regions in behaving mice represents a significant methodological advancement. The ability to track individual units over weeks across multiple brain areas will provide a rare opportunity to study mesoscale network plasticity.

      While limited in scope, optogenetic inhibition of OFC and V2M directly ties connectivity rank changes to behavioral performance, adding causal depth to correlational observations.

      Weaknesses:

      The weakness is also related to the strength provided by the method. It is demonstrated in the original method that this approach in principle can track individual units for four months (Luan et al, 2017). The authors have not shown chronically tracked neurons across learning. Without demonstrating that and taking advantage of analyzing chronically tracked neurons, this approach is not different from acute recording across multiple days during learning. Many studies have achieved acute recording across learning using similar tasks. These studies have recorded units from a few brain areas or even across brain-wide areas.

      Another weakness is that major results are based on analyses of functional connectivity that is calculated using the cross-correlation score of spiking activity (TSPE algorithm). Functional connection strength across areas is then ranked 1-10 based on relative strength. Without ground truth data, it is hard to judge the underlying caveats. I'd strongly advise the authors to use complementary methods to verify the functional connectivity and to evaluate the mesoscale change in subnetworks. Perhaps the authors can use one key piece of anatomical information, i.e. the cortex projects to the striatum, while the striatum does not directly affect other brain structures recorded in this manuscript.

    1. eLife Assessment

      This valuable study characterises receptors for calcitonin-related peptides from a deuterostomian animal, the echinoderm Apostichopus japonicus, by a combination of heterologous expression, pharmacological experiments, and the quantification of gene-expression levels. The authors provide solid evidence for a functional calcitonin-related peptide system in the sea cucumber, but further work will be needed to confirm the proposed phylogenetic relationships and physiological functions of the PDF receptor system in this species. This work should be of interest to scientists studying the signaling pathways, functions, and evolution of neuropeptides, and could be of relevance to improving the culture conditions of this economically key species.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript characterizes a functional peptidergic system in the echinoderm Apostichopus japonicus that is related to the widely conserved family of calcitonin/diuretic hormone 31 (CT/DH31) peptides in bilaterian animals. In vitro analysis of receptor-ligand interactions, using multiple receptor activation assays, identifies three cognate receptors for two CT-like peptides in the sea cucumber, which stimulate cAMP, calcium, and ERK signaling. Only one of these receptors clusters within the family of calcitonin and calcitonin-like receptors (CTR/CLR) in bilaterian animals, whereas two other receptors cluster with invertebrate pigment dispersing factor receptors (PDFRs). In addition, this study sheds light on the expression and in vivo functions of CT-like peptides in A. japonicus, by quantitative real-time PCR, immunohistochemistry, pharmacological experiments on body wall muscle and intestine preparations, and peptide injection and RNAi knockdown experiments. This reveals a conserved function of CT-like peptides as muscle relaxants and growth regulators in A. japonicus.

      Strengths:

      This work combines both in vitro and in vivo functional assays to identify a CT-like peptidergic system in an economically relevant echinoderm species, the sea cucumber A. japonicus. A major strength of the study is that it identifies three G protein-coupled receptors for AjCT-like peptides, one related to the CTR/CLR family and two related to the PDFR family. A similar finding was previously reported for the CT-related peptide DH31 in Drosophila melanogaster that activates both CT-type and PDF-type receptors. Here, the authors expand this observation to a deuterostomian animal, which suggests that receptor promiscuity is a more general feature of the CT/DH31 peptide family and that CT/DH31-like peptides may activate both CT-type and PDF-type receptors in other animals as well.

      Besides the identification of receptor-ligand pairs, the downstream signaling pathways of AjCT receptors have been characterized, revealing broad and in some cases receptor-specific effects on cAMP, calcium, and ERK signaling.

      Functional characterization of the CT-related peptide system in heterologous cells is complemented with ex vivo and in vivo experiments. First, peptide injection and RNAi knockdown experiments establish transcriptional regulation of all three identified receptors in response to changing AjCT peptide levels. Second, ex vivo experiments reveal a conserved role for the two CT-like peptides as muscle relaxants, which have differential effects on body wall muscle and intestine preparations. Finally, peptide injection and knockdown experiments uncover a growth-promoting role for one CT-like peptide (AjCT2). Injection of AjCT2 at high concentration, or long-term knockdown of the AjCT precursor, affects diverse growth-related parameters including weight gain rate, specific growth rate, and transcript levels of growth-regulating transcription factors. The authors also reveal a growth-promoting function for the PDFR-like receptor AjPDFR2, suggesting that this receptor mediates the effects of AjCT2 on growth.

      Weaknesses:

      The authors present a more detailed phylogenetic analysis in the revised version, including a larger number of species. But some clusters in the analysis are not well supported because they have only low bootstrap values. This makes it difficult to interpret the clustering in some parts of the tree.

      Expression of CT-like peptides was investigated both at transcript and protein level, but insight into the expression of the three peptide receptors is limited. This makes it difficult to understand the mechanism underlying the (different) functions of the two CT-like peptides in vivo. The authors identify differences in signal transduction cascades activated by each peptide, which might underpin distinct functions, but these differences were established only in heterologous cells.

      The authors show overlapping phenotypes for a long-term knockdown of the AjCT precursor and the AjPDFR2 receptor, suggesting that the growth-regulating functions of AjCT2 are mediated by this receptor pathway. However, it remains unclear whether this mechanism underpins the growth-regulating function of AjCT2, until further in vivo evidence for this ligand-receptor interaction is presented. For example, the authors could investigate whether knockdown of AjPDFR2 attenuates the effects of AjCT2 peptide injection. In addition, a functional PDF system in this species remains uncharacterized, and a potential role of PDF-like peptides in growth regulation has not yet been investigated in A. japonicus. Therefore, it also remains unclear whether the ability of CT-like peptides to activate PDFRs is an evolutionary ancient property of this peptide family or whether this is an example of convergent evolution in some protostomian (Drosophila) and deuterostomian (sea cucumber) species.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that A. japonicus calcitonins (AjCT1 and AjCT2) activate not only the calcitonin/calcitonin-like receptor, but they also activate the two "PDF receptors", ex vivo. They also explore secondary messenger pathways that are recruited following receptor activation. They determine the source of CT1 and CT2 using qPCR and in situ hybridization and finally test the effects of these peptides on tissue contractions, feeding and growth. This study provides solid evidence that CT1 and CT2 act as ligands for calcitonin receptors; however, evidence supporting cross-talk between CT peptides and "PDF receptors" is weak.

      Strengths:

      This is the first study to report pharmacological characterization of CT receptors in an echinoderm. Multiple lines of evidence in cell culture (receptor internalization and secondary messenger pathways) support this conclusion.

      Weaknesses:

      The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionarily ancient since a similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported. The authors perform phylogenetic analysis to show that the two "PDF" receptors form an independent clade. The bootstrap support is quite low in a lot of instances, especially for the deuterostomian and protostomian PDFR clades, where it is below 30. With such low support, it is unclear if the clade comprising deuterostomian "PDFR" is in fact PDFRs and not another receptor type whose endogenous ligand (besides CT) remains to be discovered.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Analysis of transcript expression is limited to the CT-peptide encoding gene, while no gene expression analysis was attempted for the three identified receptors. Differences in the activation of downstream signaling pathways between the three receptors are also questionable due to unclarities in the statistical analysis and variation in the control and experimental data in heterologous assays. Together, this makes it difficult to propose a mechanism underlying differences in the functions of the two CT-like peptides in muscle control and growth regulation.

      We appreciate the reviewer's rigorous critique. The manuscript has been comprehensively revised as follows:

      (1) For the expression analysis of the three identified receptors, the updated results are presented in Figure 5, with the detailed descriptions in Results section 2.4 (line 287-290) and Materials and Methods section 4.5 (line 767).

      (2) For the statistical tests and methodological clarity, statistical tests were indeed performed for all experiments. However, we acknowledge that the original labeling methods required enhanced methodological clarity, and we apologize for any confusion caused. All figures have been revised to improve the visibility of differences, and statistical test information has been added to both the figure legends and the Materials and methods section “4.10 Statistical Analysis” (line 900-910).

      (3) For the variation in the control and experimental data, the minor observed variations in control conditions across experiments primarily arise from two methodological factors: 1) Each experimental set used cells transfected with distinct receptor subtypes (e.g., AjPDFR1 vs. AjPDFR2), inherently introducing baseline variability due to differential receptor expression profiles. 2) Independent cell culture batches were employed for replicate experiments to ensure biological reproducibility. Importantly, these minor variations did not compromise the statistical significance of downstream signaling differences (p < 0.01 for all comparative analyses). Therefore, differences in the activation of downstream signaling pathways between the three receptors are reliable.

      (2) The authors also suggest a putative orexigenic role for the CT-like peptidergic system in feeding behavior. This effect is not well supported by the experimental data provided, as no detailed analysis of feeding behavior was carried out (only indirect measurements were performed that could be influenced by other peptidergic effects, such as on muscle relaxation) and no statistically significant differences were reported in these assays.

      Thank you for the reviewer’s valuable comments. Our revised manuscript now includes the following multidimensional analyses to strengthen evidence of the orexigenic role of AjCT2: Firstly, in sea cucumbers, the mass of remaining bait is a common indicator of feeding condition. After long-term AjCT2 injection, this value was significantly decreased in comparison with the control group during phase V (Figure 8A-figure supplement 1), which indicates that AjCT2 promotes feeding in A. japonicus. Correspondingly, in long-term loss-of-function experiments (newly added in the revised manuscript), the remaining bait in the siAjCTP1/2-1 group was significantly increased in comparison with the siNC group from phase II to IV (Figure 10B). The detailed descriptions of these supplementary experiments have been added to Results Section 2.6 (lines 390-396) and Materials and Methods Section 4.9 (lines 879-888).

      Secondly, after 24 days of continuous injections of siAjCTP1/2-1, we monitored the feeding behavior of these sea cucumbers over three consecutive days. Each day, we removed residual bait and feces, then repositioned fresh food at the tank center. We calculated the aggregation percentage (AP) of sea cucumbers around the food during the feeding peak (2:00-4:00) each day, which is the most reliable indicator of feeding behavior in this species. The results showed that the AP in the siAjCTP1/2-1 group was significantly lower than that in the control group. Post-dissection observations revealed reduced intestinal food content and significant intestinal degeneration in the siAjCTP1/2-1 group (the figure has been added below). These results indicate that long-term functional loss of AjCT2 reduces food intake and influences the feeding behavior of A. japonicus.

      In response to the comment regarding “No statistically significant differences were reported in these assays”, we have modified the figures to clearly visualize the differences and added statistical test details in both the figure legends and the Materials and methods section “4.10 Statistical analysis” (lines 900–910).

      Author response image 1.

      The feeding behavior of A. japonicus after long-term loss-of-function of AjCT2. (A) A record of feeding behavior. The red arrow refers to the food and the red box represents the feeding area. The numbers in the figure represent individuals entering into the feeding area. (B) The aggregation percentage (AP) of sea cucumbers around the food during the feeding peak (2:00-4:00) (n=3 days). (C) The degenerated intestine of sea cucumber after 24 days of siAjCTP1/2-1 injection. Data in the graph represent the mean ± standard deviation. *Significant differences between groups (p < 0.05). Control: siNC injection group; CT-SiRNA: siAjCTP1/2 injection group.

      (3) Overall, details regarding statistical analyses are not (clearly) specified in the manuscript, and there are several instances where statements are not supported by literature evidence.

      Thank you for the reviewer’s comments. Again, we sincerely apologize for the confusion caused. To clarify, statistical tests were performed for all experiments. However, the original labeling may have been somewhat messy. We have revised all figures to enhance the visibility of differences and provided detailed statistical test information in both the figure legends and the Materials and Methods section titled “4.10 Statistical Analysis” (lines 900–910). Additionally, we have supplemented the revised manuscript with further literature evidence to support our statements: (1) citations to Furuya et al. (2000), Johnson et al. (2005), Jékely (2013) and Mirabeau et al. (2013) have been added to clarify the foundational studies on DH31 and DH31 receptors in invertebrates (lines 73-74); (2) Conzelmann et al. (2013) and Furuya et al. (2000) were cited to validate the presence of two different types of CT-related peptides in protostomes: CT-type peptides (with an N-terminal disulphide bridge) and DH31-type peptides (lacking this feature) (lines 78-79); (3) Johnson et al. (2005) was referenced to support the dual ligand-receptor interactions of DH31 in Drosophila, specifically its binding to both CG17415 (a CTR/CLR-related protein) and CG13758 (the PDF receptor) (line 94); (4) Johnson et al. (2005) and Goda et al. (2019) were cited to reinforce the functional significance of dual DH31 receptor pathways in Drosophila, as extensively studied in prior research (lines 95-97).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionarily ancient since a similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported for several reasons. The authors perform phylogenetic analysis to show that the two "PDF" receptors form an independent clade. This clade is sister to the clade comprising CT receptors. This phylogenetic analysis suffers from several issues. Firstly, the phylogenies lack bootstrap support. Secondly, the resolution of the phylogeny is poor because representative members from diverse phyla have not been included. For instance, insect or other protostomian PDF receptors have not been included so how can the authors distinguish between "PDF" receptors or another group of CT receptors? Thirdly, no in vivo evidence has been presented to support that CT can activate "PDF" receptors in vivo.

      We thank the reviewers for their constructive comments. As suggested, we expanded our taxon sampling to include more representative members across diverse phyla and reanalyzed the phylogenetic relationships (including bootstrap tests) in Figure 1C. The revised analysis revealed two distinct clades: one containing CTR/CLR-type receptors and the other PDF-type receptors. Specifically, AjCTR clustered within the CTR/CLR-type receptor group, while AjPDFR1 and AjPDFR2 were placed in the PDF-type receptor clade. The full species names for all taxa are provided in Supplementary Table 2.

      To provide in vivo evidence supporting CT-mediated activation of "PDF" receptors, we conducted the following experiments: Firstly, we confirmed that AjPDFR1 and AjPDFR2 were the functional receptors of AjCT1 and AjCT2 (Figure 2, 3 and 4). Secondly, injection of AjCT2 and siAjCTP1/2-1 in vivo induced corresponding changes in AjPDFR1 and AjPDFR2 expression levels in the intestine (Figure 8C, 9A, 9B and 9C).

      (2) The source of CT which mediates the effects on longitudinal muscles and intestine is unclear. Is it autocrine or paracrine signaling by CT from the same tissue or is it long-range hormonal signaling?

      Thank you for this feedback. We have now analysed CT-type neuropeptide expression in A. japonicus using immunohistochemistry with the antiserum to the A. rubens CT-type peptide ArCT, which has previously been shown to cross-react with CT-type neuropeptides in other echinoderms (Aleotti et al., 2022). We have added related descriptions in the following sections: Results (section 2.4, lines 299-336), Discussion (section 3.3, lines 545-554) and Materials and methods (section 4.6, lines 785-817). Consistent with this previous finding, the ArCT antiserum labelled neuronal cells and fibers in the central and peripheral nervous system and in the digestive system of A. japonicus (Figure 6). The specificity of immunostaining was confirmed by performing pre-absorption tests with the ArCT antigen peptide (Figure 6-figure supplement 1). The detection of immunostaining in the innervation of the intestine is consistent with PCR results and the relaxing effect of AjCT2 on intestine preparations. Interestingly, no immunostaining was observed in longitudinal muscle, which is inconsistent with the detection of AjCT1/2 transcripts in this tissue. This may reflect differences in the sensitivity of the methods employed to detect transcripts (PCR) and mature peptide (immunohistochemistry). The absence of ArCT-like immunoreactivity in the longitudinal muscles suggests that AjCT1 and AjCT2 may exert relaxing effects on this tissue in vivo via hormonal signaling mechanisms. However, because AjCT1/2 expression in the longitudinal muscles may be below the detection threshold of the ArCT antibodies, we cannot rule out the possibility that AjCT1/2 are released within the longitudinal muscles physiologically.

      (3) Pharmacology experiments showing the effects of CT1 and CT2 on ACh-induced contractions were performed. Sample traces have been provided but no traces with ACh alone have been included. How long do ACh-induced contractions persist? These controls are necessary to differentiate between the eventual decay of ACh effects and relaxation induced by CT1 and CT2. The traces also do not reflect the results portrayed in dose-response curves. For instance, in Figure 6B, maximum relaxation is reported for 10-6M. Yet, the trace hardly shows any difference before and after the addition of 10-6M peptide. The maximum effect in the trace appears to be after the addition of 10-8M peptide.

      Thank you for the reviewer’s comments. As requested, we have included representative traces of ACh-induced contraction of longitudinal muscle and intestinal preparations (Figure 7-figure supplement 1B and 1C). Notably, the positive control (ACh) maintained contraction effects for at least 15 minutes, consistent with its known pharmacological properties. Regarding Figure 7B (previous Figure 6B), the trace illustrates the cumulative effects of successive neuropeptide treatments at increasing concentrations. A gradual reduction in response amplitude was observed at the highest peptide concentration, likely reflecting receptor desensitization, a phenomenon previously reported for neuropeptide Y and oxytocin (Tsurumaki et al., 2003; Arrowsmith and Wray, 2014). These results are now explicitly described in the Results Section 2.5 (lines 340-345 and 348-352) and discussed in Section 3.3 (lines 569-574). In response to the reviewer’s suggestion, we further tested the pharmacological effects of AjCT2 at 10⁻⁶ M. As shown in Figure 7-figure supplement 1A, this concentration induced maximal relaxation, confirming its dose-dependent efficacy.

      (4) I am unsure how differences in wet mass indicate feeding and growth differences since no justification has been provided. Couldn't wet mass also be influenced by differences in osmotic balance, a key function of calcitonin-like peptides in protostomian invertebrates? The statistical comparisons have not been included in Figure 7B.

      We appreciate the reviewer's insightful comments. We fully concur that wet mass constitutes an inadequate indicator for evaluating feeding and growth variations. Consequently, we reassessed A. japonicus growth parameters using two established metrics: weight gain rate (WGR) and specific growth rate (SGR), to delineate differences between experimental and control groups. Notably, the high-concentration AjCT2 injection group exhibited statistically significant increases in both WGR and SGR relative to controls (Figure 8A). This demonstrates a putative physiological role of AjCT2 signaling in enhancing feeding efficiency and growth performance in A. japonicus. Detailed methodologies are provided in the Materials and methods Section 4.8 (lines 847-851), with corresponding results presented in the Results Section 2.6 (lines 370-375). In addition, Cong et al. (2024) reported a holotocin-induced osmoregulatory function in A. japonicus, manifested by significant wet weight elevation and body bloating. However, our AjCT2 intervention showed no such phenotypic alterations, suggesting that AjCT2 likely does not participate in osmotic balance regulation, at least under these experimental conditions. Crucially, the observed WGR and SGR enhancements following AjCT2 administration were not caused by osmoregulatory effects.
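
      For readers unfamiliar with the two growth metrics named above, the following is a minimal sketch using the definitions commonly applied in aquaculture studies. These formulas are an assumption here: the authors' exact calculation is given in their Section 4.8 and may differ. The function names and the example masses are hypothetical.

```python
import math


def weight_gain_rate(initial_g: float, final_g: float) -> float:
    """WGR (%) = (W_final - W_initial) / W_initial * 100 (assumed definition)."""
    if initial_g <= 0:
        raise ValueError("initial mass must be positive")
    return (final_g - initial_g) / initial_g * 100.0


def specific_growth_rate(initial_g: float, final_g: float, days: float) -> float:
    """SGR (%/day) = (ln W_final - ln W_initial) / days * 100 (assumed definition)."""
    if initial_g <= 0 or final_g <= 0 or days <= 0:
        raise ValueError("masses and duration must be positive")
    return (math.log(final_g) - math.log(initial_g)) / days * 100.0


# Hypothetical animal: 20 g at the start, 25 g after 24 days.
wgr = weight_gain_rate(20.0, 25.0)
sgr = specific_growth_rate(20.0, 25.0, 24)
print(f"WGR = {wgr:.1f} %, SGR = {sgr:.2f} %/day")
```

      Because both metrics are normalised to the starting mass, they allow comparison between groups whose animals differ in initial size, which a raw wet-mass difference does not.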

      (5) While the authors succeeded in knocking down CT, the physiological effects of reduced CT signaling were not examined.

      Thank you for the reviewer’s comment. We have supplemented the experiments to investigate the physiological effects of long-term reduced CT signaling following the reviewer’s suggestions, including measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of three growth factors (AjMegf6, AjGDF-8 and AjIgf) to further assess AjCT2’s role in feeding and growth. The results demonstrated that weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (Figure 10A). Correspondingly, except in phase I, the siAjCTP1/2-1 group exhibited a significant increase in remaining bait and a decrease in excrement during phases II-VI (Figure 10B). Furthermore, the growth inhibitory factor AjGDF-8 was significantly up-regulated and the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C). These findings further support the potential physiological role of AjCT2 signaling in promoting feeding and growth in A. japonicus. The added results are presented in Figure 10, with related descriptions in Section 2.6 (Results, lines 390-396), Section 3.4 (Discussion, lines 597-603) and Section 4.9 (Materials and Methods, lines 879-888).

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract states that loss-of-function tests (RNAi knockdown) reveal a potential physiological role for AjCT2 signaling in promoting feeding and growth in A. japonicus. However, RNAi knockdown was only followed by analysis of transcript expression of CT-like receptors and not by the assessment of feeding or growth.

      Thank you for this helpful feedback. In the revised manuscript, we have supplemented the experiments to investigate the physiological effects of long-term reduced CT signaling, as suggested by the reviewer. These include measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of the three growth factors (AjMegf6, AjGDF-8 and AjIgf) to further assess the function of AjCT2 on feeding and growth in A. japonicus. The results are as follows:

      (1) The weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (Figure 10A).

      (2) Correspondingly, except in phase I, the siAjCTP1/2-1 group had significantly increased remaining bait and decreased excrement during phases II-VI (Figure 10B).

      (3) The growth inhibitory factor AjGDF-8 was significantly up-regulated, while the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C).

      These findings further support the potential physiological role of AjCT2 signaling in promoting feeding and growth in A. japonicus. We have incorporated these results into Figure 10 and added related descriptions in the following sections: Results (section 2.6, lines 390-396), Discussion (section 3.4, lines 597-603) and Materials and methods (section 4.9, lines 879-888).

      Regarding the original statement in the abstract, “Furthermore, in vivo pharmacological experiments and loss-of-function tests revealed a potential physiological role for AjCT2 signaling in promoting feeding and growth in A. japonicus”, we consider that this sentence effectively summarizes our findings. Therefore, we have retained it in the revised manuscript while supplementing the missing experimental details as requested.

      (2) Information on the statistical tests that were performed is lacking for most experiments. It is recommended to include this information in the figure legends, in addition to the methods section. Details on the phylogenetic analysis (parameters and statistics used) and calculation of half maximal effective concentrations (calculation methods and confidence intervals) also need to be included in the manuscript.

      Thank you for this constructive feedback. As the reviewer suggested, statistical test information has been incorporated into both the figure legends and the “4.10 Statistical Analysis” subsection of the Materials and methods (lines 900-910). Specifically:

      (1) Phylogenetic analysis details (parameters and statistical approaches) are now provided in the Materials and methods section 4.2 (lines 675-682);

      (2) Bootstrap test results supporting the phylogenetic trees have been added to Figure 1B and 1C;

      (3) Half-maximal effective concentration (EC₅₀) calculations, including methodologies and confidence intervals, are documented in both the Figure 2B legend and the “4.10 Statistical Analysis” section (lines 900-910).
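
      To illustrate what an EC₅₀ estimate from a dose-response series involves, here is a minimal sketch that fits a four-parameter logistic (Hill) curve by a coarse grid search over log₁₀(EC₅₀). It is not the authors' procedure, which is described in their Section 4.10. The function names, the fixed Hill slope of 1, the use of the observed minimum and maximum as plateaus, and the synthetic data are all assumptions made for the example, and the sketch does not compute confidence intervals.

```python
import math


def hill(conc, bottom, top, ec50, slope=1.0):
    """Four-parameter logistic response at a positive agonist concentration."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** slope)


def fit_ec50(concs, responses, slope=1.0, steps_per_decade=100):
    """Estimate EC50 by least squares on a log-spaced grid.

    Assumptions: the plateaus are fixed to the observed minimum and maximum
    responses, the Hill slope is fixed, and EC50 lies within the tested range.
    """
    bottom, top = min(responses), max(responses)
    lo = math.log10(min(concs))
    hi = math.log10(max(concs))
    n_steps = int(round((hi - lo) * steps_per_decade))
    best_sse, best_ec50 = None, None
    for i in range(n_steps + 1):
        candidate = 10.0 ** (lo + i / steps_per_decade)
        sse = sum(
            (hill(c, bottom, top, candidate, slope) - r) ** 2
            for c, r in zip(concs, responses)
        )
        if best_sse is None or sse < best_sse:
            best_sse, best_ec50 = sse, candidate
    return best_ec50


# Synthetic example: concentrations from 1e-11 M to 1e-5 M, true EC50 = 1e-8 M.
concs = [10.0 ** e for e in range(-11, -4)]
responses = [hill(c, 0.0, 100.0, 1e-8) for c in concs]
print(f"estimated EC50 = {fit_ec50(concs, responses):.2e} M")
```

      In practice a nonlinear regression routine that also fits the plateaus and slope, and reports confidence intervals, would be preferable; the grid search is used here only to keep the example dependency-free.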

      (3) In some figures (e.g. Figure 5A, 7A), the n number indicated does not match the number of data points shown in the figure panel. It is not clear what n represents here. In Figure 6B, an x-axis label is missing. In some figure legends (e.g. Figure 4 - Figure Supplement 1), the error bars and significance levels are not defined.

      We apologize for this error; we have corrected all errors related to "n" in the figure legends. In addition, the x-axis label has been added to Figure 7B (previous Figure 6B), and error bars and significance levels are now clearly defined in all figure legends.

      (4) It would be useful to explain what the difference is between the Cre and SRE luciferase assay and why these two assays were used to study receptor-activated signaling cascades. The source of the synthetic peptides is mentioned, but it is recommended to also state the purity of the synthetic peptides.

      Thank you for the valuable comments. As stated in the introduction (lines 66-69), “binding of CT to CTR in the absence of RAMPs can activate signaling via several downstream pathways, including cAMP accumulation, Ca²⁺ mobilization, and ERK activation.” Based on this established mechanism, we selected cAMP and Ca²⁺ signaling pathways as biomarkers for studying receptor-activated cascades, with the following experimental rationale: the CRE-Luc reporter system functions as a cAMP response element detector and the SRE-Luc reporter system serves as an intracellular Ca²⁺ level indicator. In CRE-Luc detection, when the receptor is activated by a ligand, it couples with Gαs protein to activate the cAMP/PKA signaling pathway. The accumulation of cAMP can lead to the phosphorylation of PKA, which then enhances the transcription of CRE-containing genes. Therefore, a significant increase in CRE-Luc activity directly correlates with cAMP accumulation. Similarly, SRE-Luc activity reflects dynamic changes in intracellular Ca²⁺ levels. We have added this explanation to the Materials and methods section 4.4 (lines 715-721). The purity of the synthetic peptides was >95%, and we have also added this information in section 4.4 (line 715) according to the reviewer’s suggestion.

      (5) In Figure 3B, it is difficult to see receptor internalization in response to the application of synthetic CT-like peptides, and a control condition (without peptide application) is lacking.

      Thank you for the reviewer’s comment. The control condition (without peptide application) was added in Figure 3-figure supplement 1, which shows the localization of pEGFP-N1/receptors in the cell membrane. Upon stimulation with synthetic CT-like peptides (Materials and methods section 2.3), the receptors exhibit clear internalization into the cytoplasm, as visualized in Figure 3B through comparative analysis.

      (6) Differences in the activation of downstream signaling cascades between the three receptors are questionable because there is substantial variation in the experimental data and control conditions in different experiments (for example, in Figures 3A and 4A). To better represent this variation, it is recommended to plot individual data points onto the bar graphs in all figures and to nuance the interpretation of putative differences in downstream signaling of different receptors. Differences in the physiological roles of CT-like peptides may be explained by various mechanisms, including differences in peptide/receptor expression or in the potency of peptides to activate different receptors in vivo. It would be useful to elaborate on these different explanations in the discussion.

      We appreciate the reviewer's critical assessment. The observed variations in control conditions across experiments (e.g., Figures 3A & 4A) primarily arise from two methodological factors: ① Each experimental set used cells transfected with distinct receptor subtypes (e.g., AjPDFR1 vs. AjPDFR2), inherently introducing baseline variability due to differential receptor expression profiles. ② Independent cell culture batches were employed for replicate experiments to ensure biological reproducibility. Importantly, these minor variations did not compromise the statistical significance of downstream signaling differences (p < 0.01 for all comparative analyses). Following the reviewer’s suggestion, we have plotted individual data points onto the bar graphs in all figures.

      Also following the reviewer’s suggestion, we have expanded the discussion on receptor-specific signaling cascades in Section 3.4 (lines 589-609). Key findings include the following. In vivo pharmacological assays demonstrated that only high concentrations of AjCT2 significantly enhanced feeding and growth rates in A. japonicus. In contrast, neither a low concentration of AjCT2 nor any concentration of AjCT1 (low or high) induced detectable effects. Furthermore, long-term knockdown of AjCTP1/2 further validated the essential role of AjCT2 in regulating feeding and growth in this species. To elucidate the receptor mediating AjCT2’s feeding- and growth-promoting effects, we selected AjPDFR2 based on its distinct activation profile: AjCT2 selectively activated AjPDFR2, inducing downstream ERK1/2 phosphorylation, whereas AjCT1 exhibited no activity toward this receptor. Given this receptor specificity, we performed AjPDFR2 knockdown experiments, which revealed phenotypic changes consistent with those in AjCTP1/2 knockdown animals, including significantly reduced WGR and SGR, alongside increased remaining bait accumulation and diminished excrement output compared to control. Collectively, these results support a model wherein AjCT2 promotes feeding and growth in A. japonicus via AjPDFR2-dependent activation of the cAMP/PKA/ERK1/2 and Gαq/Ca²⁺/PKC/ERK1/2 cascades. Considering the inherent complexity of neuropeptide signaling systems, which involve multiple GPCR subtypes coupled to diverse signaling cascades, ligands bound to the same receptor may activate distinct G protein subforms within a single cell (Møller et al., 2003; Mendel et al., 2020). Receptor activation modes may be modulated by structural polymorphisms or binding site diversity (Wong et al., 2000; Changeux, 2010), as well as by the differential efficacy of peptides in activating receptors in vivo.

      (7) For the peptide injection experiments, it is recommended to explain the different animal groups in the results section. In addition, injection in the control condition seems to have a small effect on the wet weight. Therefore, it would be useful to compare control-injected and peptide-injected groups after injection.

      Thank you for the reviewer’s comments. We have provided an expanded explanation of the animal group classifications in Section 2.6 (lines 367–375). We fully agree that a comparative analysis between the experimental and control groups post-injection is essential. However, since wet weight measurement is suboptimal for demonstrating feeding and growth variations, we re-evaluated the data using two validated metrics: weight gain rate (WGR) and specific growth rate (SGR) of A. japonicus. The results revealed that the high-concentration AjCT2 injection group exhibited significantly elevated weight gain rate and specific growth rate compared to the control group, suggesting a potential role of AjCT2 signaling in promoting feeding and growth in A. japonicus. These results are presented in Figure 8A, with detailed descriptions in Results Section 2.6 (lines 370–375) and methodology in Materials and Methods Section 4.8 (lines 847-851).

      (8) Regarding the RNAi knockdown experiments, it is not clear from the methods section what the siNC control exactly is, and how the interference rate is calculated.

      Thank you for this comment. The siNC control was an siRNA that does not target any gene in A. japonicus, and interference rates were quantified using the 2^(-ΔΔCT) method to assess siRNA inhibition efficiency. These methodological details have been incorporated into Materials and Methods Section 4.9 (lines 866–867 and 874-876) for enhanced clarity.
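
      As a worked illustration of the 2^(-ΔΔCT) calculation mentioned above, the sketch below computes relative expression from qRT-PCR threshold cycles and converts it to a knockdown percentage. The conversion "interference rate = (1 - relative expression) × 100" is an assumption about how such a rate is typically derived, not a statement of the authors' exact formula, and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative target expression (treated vs control) by the 2^(-ddCt) method."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCt in the siRNA group
    delta_control = ct_target_control - ct_ref_control   # dCt in the siNC group
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


def interference_rate(rel_expr):
    """Knockdown efficiency in percent (assumed definition: 1 - relative expression)."""
    return (1.0 - rel_expr) * 100.0


# Hypothetical Ct values: the target amplifies two cycles later after knockdown,
# while the reference gene is unchanged.
rel = relative_expression(26.0, 18.0, 24.0, 18.0)
print(f"relative expression = {rel:.2f}, interference = {interference_rate(rel):.0f} %")
```

      The calculation assumes roughly 100% amplification efficiency for both target and reference primers, which is the standard premise of this method.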

      Reviewer #2 (Recommendations for the authors):

      (1) Both the phylogenies are missing bootstrap tests. Please include this analysis. The phylogenetic analyses should also include other Family B ligands and receptors from both vertebrates and invertebrates because it is widely assumed that PDF is related to VIP given their shared roles in circadian clock and gut regulation. Therefore, this analysis needs to be more comprehensive than currently presented. Drosophila melanogaster receptors have also been excluded in spite of the Drosophila PDFR exhibiting ligand promiscuity. The legend should also include the full species names of the various taxa (or modify the figure to include full names) instead of referring to another table. The supplementary table was not available to this reviewer.

      Thank you for the reviewer’s constructive comments. According to the reviewer’s suggestion, we have incorporated the VIPRs and Drosophila melanogaster receptors into the comparative analysis and reanalyzed the phylogenies in Figure 1C, and both phylogenies included bootstrap tests (Figure 1B, 1C) in the revised manuscript. The full species names of the various taxa are listed in supplementary tables 1 and 2 in the revised manuscript.

      (2) Expression data indicate that AjCTP1/2 is expressed in both the longitudinal muscles and intestine. What are the cell types that express AjCTP1/2? Given that the authors show an effect of CT1 and CT2 on both of these tissues, it would be important to know whether this is local regulation (paracrine or autocrine) vs long-distance hormonal control by the nervous system. This can be addressed by performing in situ hybridization or immunohistochemistry of CT (using Asterias rubens CT antibody: https://doi.org/10.3389/fnins.2018.00382) on these tissues.

      Thank you for this feedback. We have now analysed CT-type neuropeptide expression in A. japonicus using immunohistochemistry with the antiserum to the A. rubens CT-type peptide ArCT, which has previously been shown to cross-react with CT-type neuropeptides in other echinoderms (Aleotti et al., 2022). We have added related descriptions in the following sections: Results (section 2.4, lines 299-336), Discussion (section 3.3, lines 545-554) and Materials and methods (section 4.6, lines 785-817). Consistent with this previous finding, the ArCT antiserum labelled neuronal cells and fibers in the central and peripheral nervous system and in the digestive system of A. japonicus (Figure 6). The specificity of immunostaining was confirmed by performing pre-absorption tests with the ArCT antigen peptide (Figure 6-figure supplement 1). The detection of immunostaining in the innervation of the intestine is consistent with PCR results and the relaxing effect of AjCT2 on intestine preparations. Interestingly, no immunostaining was observed in longitudinal muscle, which is inconsistent with the detection of AjCT1/2 transcripts in this tissue. This may reflect differences in the sensitivity of the methods employed to detect transcripts (PCR) and mature peptide (immunohistochemistry). The absence of ArCT-like immunoreactivity in the longitudinal muscles suggests that AjCT1 and AjCT2 may exert relaxing effects on this tissue in vivo via hormonal signaling mechanisms. However, because AjCT1/2 expression in the longitudinal muscles may be below the detection threshold of the ArCT antibodies, we cannot rule out the possibility that AjCT1/2 are released within the longitudinal muscles physiologically.

      (3) While Drosophila DH31 can activate both PDF and DH31 receptors, the EC50 values differ drastically. Importantly, there is an independent gene encoding PDF which is a more sensitive ligand for the PDF receptor. This is in stark contrast to the situation presented here where the authors have yet to identify the PDF gene in their system. Outside Drosophila this cross signaling between the two systems has not been observed in any species. Based on this, I would argue that the ability of CTs to activate PDFR is not an evolutionary ancient property but rather an example of convergent evolution if supported by more evidence.

      We sincerely appreciate the reviewer's insightful comments. We agree that we cannot rule out the possibility that the ability of CT-type peptides to activate PDF-type receptors in Drosophila and A. japonicus has arisen independently. Therefore, we have modified the text in the discussion accordingly so that this alternative explanation for the effects of CT-type peptides on PDF-type receptors is also presented: “Alternatively, the ability of CT-type neuropeptides to act as ligands for PDF-type receptors in D. melanogaster and A. japonicus may have evolved independently. Further studies on a wider variety of both protostome (e.g. molluscs, annelids) and deuterostome taxa (e.g. other echinoderms, hemichordates) are needed to address this issue.”

      (4) AjCT1 and CT2 can activate the two PDF receptors ex vivo. However, their EC50 values are larger and the responses are lower compared to those seen for the CT receptor. Similar cross-talk between closely related peptide families is often observed in ex vivo systems (see: https://doi.org/10.1016/j.bbrc.2010.11.089 , https://doi.org/10.1073/pnas.162276199 , https://doi.org/10.1093/molbev/mst269 and others). However, very few signaling systems exhibit this type of cross-talk in vivo. Without any in vivo evidence, I suspect that the more likely possibility is that the bona fide endogenous ligand for PDF receptors remains to be discovered. The authors could, however, perform peptide and receptor knockdown experiments and show overlap in phenotypes following CT knockdown and PDFR knockdown to support their claim.

      We sincerely appreciate the reviewer's insightful critique. According to the reviewer’s suggestion, we have supplemented CTP and AjPDFR2 knockdown experiments, measured the dry weight of remaining bait and excrement, and calculated the weight gain rate and specific growth rate to assess phenotypic changes. The results showed that weight gain rate and specific growth rate were significantly decreased in both experimental groups (Figures 10A and 11B). Correspondingly, except in phase I, the siAjCTP1/2-1 group had significantly more remaining bait and less excrement during phases II-VI (Figure 10B). In the siAjPDFR2-1 group, the remaining bait weight was significantly increased (except during phase I), while the weight of excrement was significantly decreased in phases V and VI (Figure 11C). Therefore, AjCT and AjPDFR2 knockdown experiments showed overlap in phenotypes, providing evidence that AjCT does act as an endogenous ligand for PDFR. These results were added in Figure 10 and Figure 11. The related descriptions were added in the Results section 2.6 (lines 390-396), section 2.7 (lines 427-439) and the Materials and methods section 4.9 (lines 879-898). We acknowledge, however, that other peptides, in addition to AjCT1 and AjCT2, may also act as ligands for AjPDFR1 and AjPDFR2 in vivo, and ongoing studies in the Chen (OUC) and Elphick (QMUL) labs are attempting to address this issue.

      (5) Why are receptor transcripts upregulated following peptide injection? Usually, increased ligand levels/signaling result in a compensatory decrease in receptor levels. These negative feedback loops maintain optimum signaling levels. Since the authors have successfully implemented RNAi for this CT precursor, what are the phenotypes on growth and feeding?

      We thank the reviewers for raising these critical points. Our responses are structured as follows. Firstly, our findings align with established mechanisms of neuropeptide-induced receptor modulation (see Tiptanavattana et al., 2022). Secondly, based on the reviewer’s suggestion, we have supplemented the experiments to assess the growth and feeding phenotypes under long-term reduced CT signaling, including measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of the three growth factors (AjMegf6, AjGDF-8 and AjIgf). The results showed that weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (Figure 10A). Correspondingly, except in phase I, the siAjCTP1/2-1 group had more remaining bait and less excrement during phases II-VI (Figure 10B). Furthermore, the growth inhibitory factor AjGDF-8 was significantly up-regulated and the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C). We have added these results in Figure 10, with detailed descriptions in the Results section 2.6 (lines 390-396) and in the Materials and methods section 4.9 (lines 879-888). In addition, after long-term continuous injections of siAjCTP1/2-1, we recorded the feeding behavior of these sea cucumbers for three consecutive days. The remaining bait and feces were cleaned and the food was re-placed in the middle of the tank each day. We calculated the aggregation percentage (AP) of sea cucumbers around the food during the peak feeding period (2:00-4:00) each day, which is the best indicator of sea cucumber feeding behavior. The results showed that the AP in the siAjCTP1/2-1 group was significantly lower than that in the control group. After dissection, we also found that the intestines of the siAjCTP1/2-1 group contained less food and were significantly degenerated (see author response image 1). All these results support the conclusion that long-term loss of AjCT2 function negatively influences the feeding and growth of A. japonicus.

      Other comments:

      (6) What criteria do the authors use to classify some proteins as "type", some as "like" and others as "related"? In my opinion, DH31 could be referred to as CT-like or CT-type. Please use one term for clarity unless there is a scientific explanation behind this terminology.

      Thank you for the reviewer’s comment. If you look at the paper by Cai et al. (2018) you will see in Figure 14 that CT-type peptides and DH31-type peptides are paralogous, probably due to a gene duplication in the common ancestor of the protostomes. The CT-related peptides in protostomes that have a disulphide bridge we would describe as CT-type because they have conserved a feature that is found in CT-type peptides in deuterostomes. Whereas the DH31 peptides we would describe as CT-like. But there is not a formal rule on this. It is possible the duplication event that gave rise to DH31 and CT-type peptides occurred in the common ancestor of the Bilateria but DH31-type signaling was lost in deuterostomes. On the other hand, if the gene duplication that gave rise to DH31-type peptides and CT-type peptides in protostomes did occur in a common ancestor of the protostomes, then DH31 and CT-type peptides in protostomes could be described as co-orthologs of CT-type peptides in deuterostomes. In this case, both CT peptides and DH31 peptides in protostomes could be described as CT-type. Here is a useful link for explanation of terms: https://omabrowser.org/oma/type/

      (7) Was genomic DNA removal step performed before cDNA synthesis for qRT-PCR?

      Thank you for the reviewer’s comment. The genomic DNA removal step was performed before cDNA synthesis for qRT-PCR and we have added the information in the section 4.5 (line 774-776).

      (8) Line 70: The presence of calcitonin-like peptides (DH31) and DH31 receptors in invertebrates was discovered long before the discoveries by Jekely 2013 and Mirabeau and Joly 2013. Please credit these original studies: https://pubmed.ncbi.nlm.nih.gov/10841553/ and https://pubmed.ncbi.nlm.nih.gov/15781884/.

      Thank you for the reviewer’s comment. We have credited these original studies in the revised manuscript.

      (9) Lines 72-74: Please cite https://pubmed.ncbi.nlm.nih.gov/24359412/.

      Thank you for the reviewer’s comment. We have cited it in the revised manuscript.

      (10) Line 87: Please cite https://pubmed.ncbi.nlm.nih.gov/15781884/.

      Thank you for the reviewer’s comment. We have cited it in the revised manuscript.

      (11) Lines 89-91: The functional significance of DH31 signalling to PDFR in Drosophila is known. See: https://pubmed.ncbi.nlm.nih.gov/15781884/ and https://pubmed.ncbi.nlm.nih.gov/30696873/. There are several studies that have shown the functions of DH31 signalling via DH31R.

      Thank you for the reviewer’s comment. We have corrected it and added all these studies in the revised manuscript.

      (12) Figure 1 Supplement 1: The tertiary models for CT1 and CT2 look completely different. This prediction is not in line with both ligands activating the same receptor.

      Thank you for the reviewer’s comment. We have deleted this supplementary figure.

      (13) Figure 1 Supplement 3 legend: Please add panel labels next to the corresponding receptor.

      Thank you for the reviewer’s comment. We have added panel labels next to the corresponding receptors as you suggested.

      (14) Figure 2: What does CO refer to?

      Thank you for the reviewer’s comment. CO (Control) refers to the stimulation of HEK293T transfected cells with serum-free DMEM, and we have added the detailed information in Figure 2 legend (line 251-252).

      (15) Figure 3: Due to the low magnification of the cells, it is difficult to see the localization of the receptor. It would also be more appropriate to use a membrane marker rather than DAPI which does not label the cytoplasm or membrane where the receptor can be found.

      We appreciate the reviewer's insightful comment regarding the experimental controls. The baseline receptor localization data under non-stimulated conditions are presented in Figure 3-figure supplement 1, demonstrating constitutive membrane distribution of pEGFP-N1-tagged receptors. Upon stimulation with synthetic CT-like peptides, qualitative imaging analysis revealed significant ligand-induced receptor internalization into the cytoplasm (Figure 3B).

      (16) Figure 9: Please include PDF precursor and receptor as separate columns. Also, Drosophila CT/DH31 receptors have been characterized.

      Thank you for the reviewer’s comment. We have added PDF precursor, predicted peptides and receptors as separate columns in Figure 12 of the revised manuscript. We have also corrected the erroneous summary of Drosophila CT/DH31 receptors according to your suggestions.

      (17) Table 1: It is not very clear why there are multiple columns for ERK1/2 with different outcomes.

      Thank you for the reviewer’s comment. Although cAMP/PKA or Gαq/Ca²⁺/PKC signaling is activated after ligand binding to receptors, the downstream ERK1/2 cascade is not necessarily activated. Therefore, in Table 1 we recorded separately the activation status of cAMP/PKA and its downstream ERK1/2 cascade, and of Gαq/Ca²⁺/PKC and its downstream cascade. We have optimized Table 1 to make it clearer in the revised manuscript.

    1. eLife Assessment

      This fundamental study examines infection of the liver and hepatocytes during tuberculosis infection. The authors convincingly demonstrate that aerosol infection of mice and guinea pigs leads to appreciable infection of the liver as well as the lung. A further strength of the study lies in clinical evaluation of the presence of tuberculosis bacteria in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure, which will prompt further study.

    2. Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples of TB patients and reported that chronic infection of Mtb causes immune-metabolic dysregulation. The authors showed that Mtb replicates in hepatocytes in a lipid-rich environment created by upregulating the transcription factor PPARγ. The authors also reported that Mtb protects itself from anti-TB drugs by inducing drug-metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infect macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patient. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. Authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The comparison of gene expression data between macrophages and hepatocytes by authors is important which indicates that Mtb modulates different pathways in different cell type as in macrophages it is related to immune response whereas, in hepatocytes it is related to metabolic pathways.

      Authors also reported that Mtb residing in hepatocytes showed drug tolerance phenotype due to up regulation of enzymes involved in drug metabolism and showed that cytochrome P450 monooxygenase that metabolize rifampicin and NAT2 gene responsible for N-acetylation of isoniazid were up regulated in Mtb infected cells.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients especially in immune-compromised patients, therefore finding granuloma in human liver biopsy samples is not surprising.

      Mtb-infected hepatic cells showed induced DME and NAT, and this could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells is exposed to a reduced drug concentration and shows higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to a drug is possible only when the DME of the Mtb inside is upregulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host intrinsic factors rather than Mtb efflux pumps. It may be better if the drug tolerant phenotype section is rewritten to clarify the facts.

      In the revised manuscript, the authors convincingly showed by immunostaining that hepatocytes are a favourable niche for replication of Mtb.

      The authors have rewritten the drug tolerant phenotype section, which now reads better.

      Overall, this paper has new and important information on how Mtb establishes a favourable niche for growth in hepatocytes and creates a drug tolerant environment.

    3. Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types including both liver cells and macrophage cells. It is also known that infection affect PPARγ expression and activity in hepatocytes. It is also known that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study is in similar line for M.tb infection. As liver is the main site for lipid regulation, the availability of lipid resources is greater and higher is the replication rate. In short, the observations from the study confirm the earlier studies with these additional cell types. It is known that higher the lipid content, greater are Lipid Droplet-positive Mtb and higher is the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

      Comments on revised version:

      The authors noted that even in experiments where mice were infected with lower CFUs, the presence of Mtb colonies could still be detected in the liver. It would be beneficial to include some experimental data related to this in the supplementary information, as it could provide valuable insights for the research field.

    4. Reviewer #3 (Public review):

      In this revised manuscript, the authors explore how Mtb can infect hepatocytes and create a favorable niche associated with upregulation of the transcription factor PPARγ which presumably allows the bacteria to scavenge lipids from lipid droplets in host cells and upregulate drug-metabolizing enzymes to protect against its elimination. In response to the review, the authors have performed some additional immunostaining of hepatocytes, added more detail to figure legends, added experiments somewhat showing improved colocalization and staining, clarified several points and paragraphs, and updated the referenced literature and discussion.

      The current manuscript provides evidence that human miliary TB patients have infection of hepatocytes with Mtb, with evidence that the bacteria survive at least partially through upregulation of PPARγ, which significantly changes the lipid milieu of the cells. There is also an examination of transcriptomics and lipid metabolism in response to Mtb infection, as well as drug tolerance of Mtb inside hepatocytes. The current manuscript is an improvement over the previous one.

      However, although the manuscript is improved, tissue immunophenotyping of the various cells in the liver remains weak and unconvincing. This is truly a missed opportunity and lessens the rigor of the central findings and conclusions. As pointed out by another reviewer, literature has described different fates of Mtb in the liver. Given the tissue available to the authors, carefully dissecting the various cells that the bacteria are in (esp. hepatocytes versus Kupffer cells) is critical. The authors use only 2 generic markers and do not distinguish among cell types within the tissue slices. A review of the literature shows a variety of both human and mouse antibody markers. In fact, a liver atlas based on immunophenotyping has been published. Likewise, the authors comment on liver granulomas, but this is not justified without immunophenotyping.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples of TB patients and reported that chronic infection of Mtb causes immune-metabolic dysregulation. The authors showed that Mtb replicates in hepatocytes in a lipid-rich environment created by upregulating the transcription factor PPARγ. The authors also reported that Mtb protects itself from anti-TB drugs by inducing drug-metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infect macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patient. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. Authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The comparison of gene expression data between macrophages and hepatocytes by authors is important which indicates that Mtb modulates different pathways in different cell type as in macrophages it is related to immune response whereas, in hepatocytes it is related to metabolic pathways.

      Authors also reported that Mtb residing in hepatocytes showed drug tolerance phenotype due to up regulation of enzymes involved in drug metabolism and showed that cytochrome P450 monooxygenase that metabolize rifampicin and NAT2 gene responsible for N-acetylation of isoniazid were up regulated in Mtb infected cells.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients especially in immune-compromised patients, therefore finding granuloma in human liver biopsy samples is not surprising.

      Mtb-infected hepatic cells showed induced DME and NAT, and this could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells is exposed to a reduced drug concentration and shows higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to a drug is possible only when the DME of the Mtb inside is upregulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host intrinsic factors rather than Mtb efflux pumps. It may be better if the drug tolerant phenotype section is rewritten to clarify the facts.

      In the revised manuscript, the authors convincingly showed by immunostaining that hepatocytes are a favourable niche for replication of Mtb.

      The authors have rewritten the drug tolerant phenotype section, which now reads better.

      Overall, this paper has new and important information on how Mtb establishes a favourable niche for growth in hepatocytes and creates a drug tolerant environment.

      We thank the reviewer for the thorough and insightful review.

      Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types including both liver cells and macrophage cells. It is also known that infection affect PPARγ expression and activity in hepatocytes. It is also known that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study is in similar line for M.tb infection. As liver is the main site for lipid regulation, the availability of lipid resources is greater and higher is the replication rate. In short, the observations from the study confirm the earlier studies with these additional cell types. It is known that higher the lipid content, greater are Lipid Droplet-positive Mtb and higher is the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

      Comments on revised version:

      The authors noted that even in experiments where mice were infected with lower CFUs, the presence of Mtb colonies could still be detected in the liver. It would be beneficial to include some experimental data related to this in the supplementary information, as it could provide valuable insights for the research field.

      We thank the reviewer for the in-depth evaluation of our manuscript. As suggested, we will include the data showing that Mtb was detected in the liver at low CFUs.

      Reviewer #3 (Public review):

      In this revised manuscript, the authors explore how Mtb can infect hepatocytes and create a favorable niche associated with upregulation of the transcription factor PPARγ which presumably allows the bacteria to scavenge lipids from lipid droplets in host cells and upregulate drug-metabolizing enzymes to protect against its elimination. In response to the review, the authors have performed some additional immunostaining of hepatocytes, added more detail to figure legends, added experiments somewhat showing improved colocalization and staining, clarified several points and paragraphs, and updated the referenced literature and discussion.

      The current manuscript provides evidence that human miliary TB patients have infection of hepatocytes with Mtb, with evidence that the bacteria survive at least partially through upregulation of PPARγ, which significantly changes the lipid milieu of the cells. There is also an examination of transcriptomics and lipid metabolism in response to Mtb infection, as well as drug tolerance of Mtb inside hepatocytes. The current manuscript is an improvement over the previous one.

      However, although the manuscript is improved, tissue immunophenotyping of the various cells in the liver remains weak and unconvincing. This is truly a missed opportunity and lessens the rigor of the central findings and conclusions. As pointed out by another reviewer, literature has described different fates of Mtb in the liver. Given the tissue available to the authors, carefully dissecting the various cells that the bacteria are in (esp. hepatocytes versus Kupffer cells) is critical. The authors use only 2 generic markers and do not distinguish among cell types within the tissue slices. A review of the literature shows a variety of both human and mouse antibody markers. In fact, a liver atlas based on immunophenotyping has been published. Likewise, the authors comment on liver granulomas, but this is not justified without immunophenotyping.

      We would like to thank the reviewer for the in-depth and detailed suggestions. We would like to clarify that the primary aim of our study was to determine the localization of Mtb within hepatocytes and the downstream biological consequences. To this end, we employed two well-established and widely validated markers (ASGPR1 and albumin) that are consistently used to identify hepatocytes in both human and murine liver tissue. While we acknowledge the broader potential of comprehensive immunophenotyping, our focused approach was designed to specifically address the question of hepatocyte involvement, which the selected markers effectively support, as also noted by Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In my opinion this paper contains important information and no further information is required for this manuscript.

      We thank the reviewer for the insightful comments.

      Reviewer #2 (Recommendations for the authors):

      The authors noted that even in experiments where mice were infected with lower CFUs, the presence of Mtb colonies could still be detected in the liver. It would be beneficial to include some experimental data related to this in the supplementary information, as it could provide valuable insights for the research field.

      As suggested, we will include the data with the low CFUs in the updated manuscript.

      Reviewer #3 (Recommendations for the authors):

      • Line 340, the fact that PPARγ inhibition decreases bacterial load should not be surprising, as the authors cite several papers where this is already shown.

      • Line 379, the increased tolerance of Mtb to drugs in hepatocytes is only significant at the lower 2 concentrations, not at 5 ug/mL.

      • Fig S4F-H, the y axis is inappropriately not set to zero on the lower limit.

      • Fig S9B, the Y-axis states "relative" CFU, but there is no indication what the bars are normalized to, and the numbers are much more typical of standard CFU values. Was the "Relative" part left in by mistake?

      • Double check the ending of the figure legend for Figure S10 and S11.

      • Line 352, phenomenom [sic] is misspelled.

      • On re-read, several sentences throughout this manuscript need improvement regarding structure and grammar. I suggest careful editorial review.

      We thank the reviewer for pointing out these issues, which will be carefully corrected in the next version.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples of TB patients and reported that chronic infection of Mtb causes immune-metabolic dysregulation. Authors showed that Mtb replicates in hepatocytes in a lipid rich environment created by up regulating transcription factor PPARγ. Authors also reported that Mtb protects itself from anti-TB drugs by inducing drug metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infects macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patients. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. The authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The comparison of gene expression data between macrophages and hepatocytes by authors is important which indicates that Mtb modulates different pathways in different cell type as in macrophages it is related to immune response whereas, in hepatocytes it is related to metabolic pathways.

      Authors also reported that Mtb residing in hepatocytes showed drug tolerance phenotype due to up regulation of enzymes involved in drug metabolism and showed that cytochrome P450 monooxygenase that metabolize rifampicin and NAT2 gene responsible for N-acetylation of isoniazid were up regulated in Mtb infected cells.

      We thank the reviewer for the positive feedback and for highlighting the strengths of our study.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients especially in immune-compromised patients, therefore finding granuloma in human liver biopsy samples is not surprising.

      Mtb-infected hepatic cells showed induced DME and NAT, and this could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells is exposed to a reduced drug concentration and shows higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to a drug is possible only when the DME of the Mtb inside is upregulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host intrinsic factors rather than Mtb efflux pumps. It may be better if the drug tolerant phenotype section is rewritten to clarify the facts.

      We agree that several case studies regarding liver infection in pulmonary TB patients have been reported in the literature; however, this report is the first comprehensive study to establish hepatocytes as a favourable niche for Mtb survival and growth.

      Drug tolerance is a phenomenon exhibited by the bacteria and, during host-pathogen interactions, can be influenced by both intrinsic (bacterial) and extrinsic (host-mediated) factors. Multiple examples of tolerance being attributed to host-driven factors can be found in the literature (PMID: 32546788, PMID: 28659799, PMID: 32846197). Our studies demonstrate that Mtb-infected hepatocytes create a drug tolerant environment by modulating the expression of drug modifying enzymes (DMEs) in the hepatocytes.

      As suggested by the reviewer we will rewrite the drug tolerant phenotype section.

      Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in the clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types including both liver cells and macrophage cells. It is also known that infection affect PPARγ expression and activity in hepatocytes. It is also known that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study is in a similar line for M.tb infection. As the liver is the main site for lipid regulation, the availability of lipid resources is greater and higher is the replication rate. In short, the observations from the study confirm the earlier studies with these additional cell types. It is known that higher the lipid content, the greater are Lipid Droplet-positive Mtb and higher is the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

      We thank the reviewer for emphasizing the strengths of our study and how it can lead to further investigations in the field.

      Reviewer #3 (Public review):

      This manuscript by Sarkar et al. examines the infection of the liver and hepatocytes during M. tuberculosis infection. They demonstrate that aerosol infection of mice and guinea pigs leads to appreciable infection of the liver as well as the lung. Transcriptomic analysis of HepG2 cells showed differential regulation of metabolic pathways including fatty acid metabolic processing. Hepatocyte infection is assisted by fatty acid synthesis in the liver and inhibiting this caused reduced Mtb growth. The nuclear receptor PPARg was upregulated by Mtb infection and inhibition or agonism of its activity caused a reduction or increase in Mtb growth, respectively, supporting data published elsewhere about the role of PPARg in lung macrophage Mtb infection. Finally, the authors show that Mtb infection of hepatocytes can cause upregulation of enzymes that metabolize antibiotics, resulting in increased tolerance of these drugs by Mtb in the liver.

      Overall, this is an interesting paper on an area of TB research where we lack understanding. However, some additions to the experiments and figures are needed to improve the rigor of the paper and further support the findings. Most importantly, although the authors show that Mtb can infect hepatocytes in vitro, they fail to describe how bacteria get from the lungs to the liver in an aerosolized infection. They also claim that "PPARg activation resulting in lipid droplets formation by Mtb might be a mechanism of prolonging survival within hepatocytes" but do not show a direct interaction between PPARg activation and lipid droplet formation and lipid metabolism, only that PPARg promotes Mtb growth. Thus, the correlations with PPARg appear to be there but causation, implied in the abstract and discussion, is not proven.

      The human photomicrographs are important and overall, well done (lung and liver from the same individuals is excellent). However, in lines 120-121, the authors comment on the absence of studies on the precise involvement of different cells in the liver. In this study there is no attempt to immunophenotype the nature of the cells harboring Mtb in these samples (esp. hepatocytes). Proving that hepatocytes specifically harbor the bacteria in these human samples would add significant rigor to the conclusions made.

      We thank the reviewer for nicely summarizing our manuscript.

      Our study establishes the involvement of liver and hepatocytes in pulmonary TB infection in mice. Understanding the mechanism of bacterial dissemination from the lung to the liver in aerosol infections demands a detailed separate study.

      Figures 6E and 6F show how the PPARγ agonist and antagonist modulate (increase and decrease, respectively) bacterial growth in hepatocytes (further supported by the CFU data in Supplementary Figure 9B). Likewise, the number of lipid droplets in hepatocytes increases and decreases upon treatment with the PPARγ agonist and antagonist, respectively, as shown in Figures 6G and 6H. Collectively, these studies provide strong evidence that PPARγ activation leads to more lipid droplets that support better Mtb growth.

      We thank the reviewer for finding our human photomicrographs convincing. In the manuscript, we provide evidence for the direct involvement of the hepatocytes (and liver) in Mtb infection. We have performed detailed immunophenotyping of hepatocytes in the mouse model with ASGPR1 (asialoglycoprotein receptor 1), and in the revised version of record we have further stained the infected hepatocytes with an anti-albumin antibody.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In my opinion drug tolerant phenotype section should be rewritten for better clarification. The manuscript contains important information about hepatic tuberculosis which are not reported yet.

      We have rewritten the drug tolerant phenotype section for better clarity.

      We appreciate the reviewer’s comments regarding the important information about hepatic tuberculosis.

      Reviewer #2 (Recommendations for the authors):

      The following are some observations and comments on the manuscript.

      (1) The study delves into the mechanisms related to hepatic TB/miliary TB; however, the introduction and discussion only describe and discuss the data in the context of pulmonary TB giving a sense that the mandate of the MS is the exploration of the role of liver cells in pulmonary TB. There appears a gap in the connection of findings from the Miliary TB to the pulmonary TB. A discussion of the conversion of pulmonary TB to extrapulmonary /hepatic TB in the light of the findings may be helpful.

      We have modified the discussion section to include possible mechanisms by which pulmonary TB converts to hepatic TB in the light of our findings. Briefly, pulmonary tuberculosis (TB) can lead to miliary TB, probably through hematogenous dissemination, in which Mtb spreads from the infected lungs into blood vessels from a primary lung focus, reactivated TB, or caseous necrosis. Once in the blood vessels, the bacteria seed multiple organs, forming tiny granulomas characteristic of miliary TB. Liver involvement could arise either through direct hematogenous spread or through extrusion from nearby infected lymph nodes, leading to hepatic TB, which presents with granulomas and liver dysfunction. This spread underscores the severity of untreated pulmonary TB and the need for early intervention. Our in vivo infection data clearly show that pulmonary infection with Mtb in mice and guinea pigs can steadily lead to significant infection of the liver and to metabolic abnormalities in the liver. The study further highlights the need for systematic studies of the route and mode of dissemination from the lungs to the liver, both to improve pathophysiological understanding of the disease and to identify new therapeutic targets.

      (2) The authors show the presence of Mtb in the liver autopsies of miliary tuberculosis patients. It is well known that Mtb disseminates during the late stages to several organs and liver is a major site (Sharma et al. 2005; 10.1016/S1473-3099(05)70163-8). Other clinical observations also point to the fact that although Mtb infects liver cells, it is cleared (Thandi et al., 2018, https://doi.org/10.4049/jimmunol.200.Supp.173.20). As the samples are from miliary TB, it is expected that the bacterial load must have been very high before spreading to blood. It is known that once in blood, M.tb is expected to spread to various organs, especially highly vascular ones. Were any other tissues (especially with high vasculature) stained and verified? If yes, add to the supplementary data or discuss.

      Other tissues were not collected and stained during this study. Studies are currently underway to understand whether other highly vascularized organs also harbour Mtb. In addition, several studies have shown that Mtb can infect a wide range of organs, such as the brain, kidney, and bone marrow (PMID: 33142108, PMID: 28046053, PMID: 34269789), during miliary conditions.

      (3) It is not evident from this paper if hepatic infiltration occurs in pulmonary TB patients? It may therefore be important to discuss the status of liver infections in the primary pulmonary infection.

      Based on the available data from human biopsied liver samples, there is an indication of liver involvement in systemic tuberculosis (TB). However, to gain a more comprehensive understanding of hepatic infiltration in pulmonary TB patients, it is essential to conduct well-organized clinical studies. These studies should specifically target pulmonary TB patients and explore the extent and nature of liver involvement in these individuals. As suggested by the reviewer, this point is addressed in the Discussion.

      (4) Similarly, in the mice model, M.tb was shown to localize to liver when aerosolic infection was given. Were any other tissues, such as kidney, bone marrow etc, checked? Is it because of the high dose of M.tb against the standard challenge dose of 50-100 CFU? Further, since the study in the mouse model is to mimic a miliary tuberculosis of liver, did the dissemination occur via bloodstream and if mycobacteremia could be observed in infected mice.

      Currently studies are underway to understand the involvement of other organs like kidney, brain, bone marrow, in aerosol infection mice model and how dissemination occurs in those distant organs.

      The focus of the current study was to understand the role of liver in systemic tuberculosis with emphasis on hepatocytes as a key cell type to be infected. We have also conducted the experiments with lower CFUs and could detect the presence of Mtb colonies in liver, so we do not think that the infection of liver is dependent on the dose of infection.

      (5) There are studies in mouse model which infer that liver carried the lowest bacterial burden, was cleared the fastest, and it is established that as compared to sites persistently seeded by M. tuberculosis, in the liver the bacteria rarely infect cell types other than professional phagocytes. As the observations in this study are contrasting, the discussion section should include a critical comparative analysis to justify why in the conditions used in the study, the hepatocytes and not Kupffer cells are infected. Other than the morphological description to indicate M.tb infection of hepatocytes in the liver section (fig 1E), it will be good to show localization of M.tb specifically to hepatocytes by using hepatocyte specific marker. Unlike as reported, why was a clearance of M.tb not observed even after 10 weeks (figure 2B).

      While some studies show that Mtb is cleared quickly from the liver, several other studies report that the liver harbours Mtb even after 10 weeks post-infection (PMID: 22359543, PMID: 21533158, PMID: 29242198). We have consistently observed Mtb infection of the liver beyond week 10 in our infection model.

      We have performed detailed immunophenotyping of hepatocytes in the mouse model with ASPGR1 (asialoglycoprotein receptor 1). In the revised version of record, we have further stained the isolated hepatocytes with an anti-albumin antibody (albumin is a robust marker of hepatocyte identity) and have shown the presence of Mtb in them. The data have been included in the revised manuscript (Fig 2J).

      (6) While the result section mentions that "individuals with miliary tuberculosis' (line 107), the legend of Figure 1 writes 'Presence of Mtb in human pulmonary tuberculosis patients'. This is confusing. Clarify

      We thank the reviewer for pointing this out. We have changed the figure legends to miliary tuberculosis, as most of the liver biopsy samples were obtained from miliary tuberculosis patients.

      (7) Supplementary Figure 2D: Corresponding control panel (uninfected) should be added, which will also verify the specificity of Ag85b. As it is known that Ag85B is secreted out from the bacteria and hence the detected signals may not confirm that Mtb is in hepatocytes. Ag85B per bacterium decreases by almost 10,000-fold at later stages of infection because of secretion (Ernst JD, Cornelius A, et al 2019 mBio). In Supl figure 2D, Ag85b signal seems to be present everywhere inside the cells. Hence, it is important that the control panel be added.

      We have included a control image below, which shows no Ag85B staining in the uninfected sample. While we acknowledge the reviewer’s comment, Ag85B has been consistently used as a marker for the presence of Mtb in multiple studies. Nargan et al. use Ag85B-based staining to characterize infection in both pulmonary and EPTB samples (PMID: 38880068). Jain et al. use Ag85B to characterize Mtb infection of mesenchymal stem cells in lung biopsy samples of pulmonary TB patients (PMID: 32546788).

      Author response image 1.

      Ag85B staining in uninfected mice shows no signals

      (8) The kinetics experiments in Figure 3D-3G should have used time-lapse microscopy of a few of the infected cells, or the data should be represented as CFU. If we consider that the doubling time of H37Rv is about 22 h to 24 h, the data showing that MFI increases dramatically from 5 HPI to 120 HPI give the impression that the bacterial number inside the cells increased faster than its doubling time would allow.

      We have added the modified plot. As suggested, the CFU of Mtb within HepG2, PHCs, THP-1, RAW 264.7 and BMDMs have been included in the revised version (Supplementary Figure 4 D-H)
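
      The reviewer's doubling-time argument in point (8) can be made concrete with a short calculation. The sketch below is only an illustration: it assumes uninterrupted exponential growth with no lag phase or killing, and that MFI scales linearly with bacterial number. The function name `max_fold_increase` is hypothetical. Under those assumptions, a 22 to 24 h doubling time permits roughly a 28- to 37-fold increase between 5 and 120 HPI, which gives an upper bound against which the observed MFI or CFU change can be compared.

```python
def max_fold_increase(elapsed_h: float, doubling_time_h: float) -> float:
    """Upper bound on fold increase under pure exponential growth.

    Assumes no lag phase and no bacterial death (an idealisation).
    """
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (elapsed_h / doubling_time_h)


# Interval discussed in the review: 5 HPI to 120 HPI.
elapsed_h = 120 - 5

# Doubling times quoted by the reviewer for H37Rv (hours).
for doubling_time_h in (22, 24):
    fold = max_fold_increase(elapsed_h, doubling_time_h)
    print(f"doubling time {doubling_time_h} h: at most ~{fold:.1f}-fold increase")
```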

      (9) What is the effect of C45 and T863 on Mtb growth in vitro? The effect of C45 and T863 on Mtb growth in vitro should be shown so that it can be ruled out. Is the representative image in Figure 5F from the DMSO-treated or the C45-treated panel? Please specify.

      As per the reviewer’s suggestion we have seen the effect of C45 (30 µM) and T863 (25 µM) on Mtb growth in vitro and did not find any difference in the growth kinetics. The representative image in Figure 5F is DMSO treated cells.

      Author response image 2.

      Growth kinetics of Mtb in 7H9 medium with DMSO, C75 and T863

      (10) Supplementary Figure 6B: Correct the Y-axis label from mRNA levels to Fold change (normalised to control). Please do similar changes wherever required.

      We have made the necessary changes as per the suggestion of the reviewer.

      (11) Figure 7B and 7C: How was the normalization performed? Is the data normalized to the number of bacteria that entered the specific cell type or was normalized at 48hrs with respect to DMSO? DMSO alone data should be shown.

      In the drug tolerance assays, we have calculated the ratio of the bacterial burden in hepatocytes treated with drugs compared to hepatocytes treated with DMSO. The infection was given for 48 hours post which the infected cells were treated with the mentioned concentrations of isoniazid and rifampicin for 24 hours. CFU enumeration was conducted after this 24 hour. Figure 7A gives a schematic of the experimental set up.

      % tolerant bacterial population = (A / B) × 100, where A is the CFU of Mtb from infected hepatocytes treated with drug and B is the CFU of Mtb from infected cells treated with DMSO. Thus the effect of MOI is negated.

      To lend further credence to the CFU data, we have also analysed these experiments by microscopy, where no cell death was observed under these conditions. Mouse BMDMs were used as a macrophage control. We calculated the % tolerance as the ratio of the mean fluorescence intensity (MFI) of GFP-Mtb per hepatocyte treated with drug to the MFI of GFP-Mtb per hepatocyte treated with DMSO (control). More than 20 fields, each containing more than 4 infected cells, were used for the analysis, providing additional evidence that anti-TB drugs kill Mtb less effectively in hepatocytes than in BMDMs. All these details are included in the manuscript.
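
      The tolerance ratio described in this response can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis code: the function name `percent_tolerant` and the CFU values are hypothetical, chosen only to show how dividing by the DMSO control removes the dependence on how many bacteria entered the cells.

```python
def percent_tolerant(cfu_drug: float, cfu_dmso: float) -> float:
    """Return the % tolerant population, (A / B) * 100.

    A = CFU from infected cells treated with drug.
    B = CFU from infected cells treated with DMSO (vehicle control).
    """
    if cfu_dmso <= 0:
        raise ValueError("DMSO control CFU must be positive")
    return cfu_drug / cfu_dmso * 100.0


# Hypothetical CFU counts, for illustration only (not data from the study).
tolerance = percent_tolerant(cfu_drug=2.0e4, cfu_dmso=8.0e4)
print(f"{tolerance:.1f}% of the DMSO-control burden survives drug treatment")
```

      The same function applies to the microscopy readout if the per-cell MFI values are passed in place of CFU.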

      (12) While authors have shown the changes in mRNA levels of CYP3A4, CYP3A43, NAT2, the protein or activities of some of these should be measured to verify the effect.

      Currently studies are underway to understand the activities of the key proteins involved in isoniazid and rifampicin metabolism and will be published as a separate manuscript.

      Reviewer #3 (Recommendations for the authors):

      Additional comments are:

      • Figure 2D, the 20X and 40X magnifications do not look appreciably different in size. Please double-check that the correct images were used.

      We thank the reviewer for pointing this out. We have corrected it in the revised version.

      • Lines 162-164: The authors state almost 100% purity. However, the contour plot in 2F appears to show 2 cell populations. Figure 2G is missing a legend of which colors correspond to which staining (and again there appears to be highly variable staining).

      We agree with the reviewer that two contours are observed in Figure 2F. Both contours are positive for ASPGR1 protein, but the level of ASPGR1 expression is variable. The corresponding confocal image (nucleus stained with DAPI; ASPGR1 stained with an ASPGR1 antibody and an Alexa Fluor 555-conjugated secondary antibody) also shows variable staining of isolated primary hepatocytes, with some cells giving a stronger signal than others, which visually supports our statement. Moreover, several studies show differential expression of ASPGR1 protein in hepatocyte-like cells (PMID: 27143754).

      To further clarify and be more specific with respect to the identity of the hepatocytes, we have stained primary hepatocytes from infected mouse livers with Albumin antibody (a stable marker for hepatocytes) and Ag85B (2J)

      Multiple figures throughout the manuscript, including this one, would benefit from the use of arrows to depict what is described in the legend and text more clearly, and the use of higher power insets to better define cell architecture. Finally, some images appear blurry to the eye. Improvements are needed throughout.

      As per the suggestion, we have modified the figures and figure legends for better clarity.

      • Lines 153-155. Albumin, AST and GGT appear to be significantly up at week 8, contradicting the statement that there is no change until week 10.

      We thank the reviewer for pointing this out and have made suitable changes in the write-up.

      • Lines 203-205: The authors state earlier that bacteria survive in macrophage phagosomes. Do the authors know the niche for bacteria in hepatocytes that enable them to continue to grow? Transcriptome data from HepG2 cells suggest perhaps a phagosomal pathway?

      We thank the reviewer for this insightful question. As rightly pointed out by the reviewer, the transcriptomic data indeed suggest changes in several important pathways, such as macroautophagy, Golgi vesicular transport and vacuolar transport, which can affect the subcellular localisation of Mtb within hepatocytes. High-resolution microscopy studies of the subcellular localisation of labelled Mtb within primary hepatocytes, HepG2 and THP-1 cells have been conducted, and the % colocalization within different intracellular compartments has been measured. An image of the colocalization of labelled Mtb within PHCs is shown below, along with the % colocalization within various compartments in PHCs, HepG2 and THP-1.

      Author response image 3.

      Colocalisation of Mtb-GFP with various intra-cellular markers within PHCs.

      Author response image 4.

      Percentage Colocalisation of Mtb-GFP with various intra-cellular markers within PHCs, HepG2 and THP-1.

      • Validation of some critical genes found in the HepG2 cells should be done by qRTPCR in primary hepatocytes.

      Some of the key genes identified in HepG2 cells have been validated by qRT-PCR in primary hepatocytes at 24 hours post infection. The majority of the genes show a similar trend.

      Author response image 5.

      Gene expression analysis of the mentioned genes in Mtb infected PHCs as compared to the uninfected control.

      • Lines 259-260: The authors state a high degree of co-localization. The photomicrograph of a single cell in Fig. 5D is not convincing. I'm not even sure that they are really in the same subcellular compartment. Co-localization stated in Fig. S8B is also not convincing as shown.

      The image currently shown in figure 3D is a maximum intensity projection image of multiple z-stacks encompassing the entire cell.

      We agree with the reviewer with respect to figure Fig S8B and will modify the text and the figure legend accordingly.

      Copywriting edits:

      • It is difficult to see individual gene names in Figures 4D and 4E. A higher resolution or larger font would be appreciated for the reader.

      An excel file with the top differentially regulated genes at both 0 hours post infection and 48 hours post infection has been added.

      • Figure 5A has a shadow on the top right image.

      We have changed the image in the revised manuscript

      • Figure 5E is difficult to read the labels on the axes; it would be better in general to make the labels separately instead of relying on the graphing software, since these labels can get stretched when the size of the graph is modified.

      We agree with the reviewer and have made necessary changes.

      • Line 163: should be "percent" and not "perfect."

      We thank the reviewer for pointing it out and have corrected it

      • Line 190: is missing a period at the end of the sentence "...for further experiments"

      We thank the reviewer for pointing it out and have corrected it

      • Line 332: should be "hepatocytes" instead of "hepatoctyte" [sic]

      We thank the reviewer for pointing it out and have corrected it

    1. eLife Assessment

      This study presents an important finding on the role of GATA4 in aging- and OA-associated cartilage pathology. The conclusions are well supported by compelling in vitro and in vivo evidence. This work will be of broad interest to both cell biologists and orthopedic clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the differences between young and aged chondrocytes. Through transcriptomic analysis and further assessments in chondrocytes, GATA4 was found to be increased in aged chondrocyte donors compared to young. Subsequent mechanistic analysis with lentiviral vectors, siRNAs, and a small molecule were used to study the role of GATA4 in young and old chondrocytes. Lastly, an in vivo study was used to assess the effect of GATA4 expression on osteoarthritis progression in a DMM mouse model.

      Strengths:

      This work linked the overexpression of GATA4 to NF-kB signaling pathway activation, alterations to the TGF-b signaling pathway, and found that GATA4 increased the progression of OA compared to the DMM control group. This indicates that GATA4 contributes to the onset and progression of OA in aged individuals.

      Comments on revised version:

      Great work! All my concerns have been well addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This study elucidated the impact of GATA4 on aging- and injury-induced cartilage degradation and osteoarthritis (OA) progression, based on the team's finding that GATA4 expression is positively correlated with aging in human chondrocytes. By integrating cell culture of human chondrocytes, gene manipulation tools (siRNA, lentivirus), biological/biochemical analyses and murine models of post-traumatic OA, the team found that increasing GATA4 levels reduced anabolism and increased catabolism of chondrocytes from young donors, likely through upregulation of the BMP pathway, and that this impact is not correlated with TGF-β stimulation. Conversely, silencing GATA4 by siRNA attenuated catabolism and elevated aggrecan/collagen II biosynthesis of chondrocytes from old donors. The physiological relevance of GATA4 was further validated by the accelerated OA progression observed in lentivirus-infected mice in the DMM model.

      Strengths:

      This is a highly significant and innovative study that provides new molecular insights into cartilage homeostasis and pathology in the context of aging and disease. The experiments were performed in a comprehensive and rigorous manner. The data were interpreted thoroughly in the context of the current literature.

      Weaknesses:

      The only aspect that would benefit from further clarification is a more detailed discussion of aging-associated ECM changes in the context of prior literature.

    4. Reviewer #3 (Public review):

      Summary:

      This is an exciting, comprehensive paper that demonstrates the role of GATA4 on OA-like changes in chondrocytes. The authors present elegant reverse translational experiments that justify this mechanism and demonstrate the sufficiency of GATA4 in a mouse model of osteoarthritis (DMM), where GATA4 drove cartilage degeneration and pain in a manner that was significantly worse than DMM alone. This could pave the way for new therapies for OA that account for both structural changes and pain.

      Strengths:

      (1) GATA4 was identified from human chondrocytes.

      (2) IHC and sequencing confirmed GATA4 presence.

      (3) Activation of SMADs is clearly shown in vitro with GATA4 overexpression.

      (4) The role of GATA4 was functionally assessed in vivo using the mouse DMM model, where the authors uncovered that GATA4 worsens OA structure and hyperalgesia in male mice.

      (5) It is interesting that GATA4 is largely known to be found in cardiac cells and to have a role in cardiac repair, metabolism, and inflammation, among other things listed by the authors in the discussion (in liver, lung, pancreas). What could this new knowledge of GATA4 mean for OA as a potentially systemically mediated disease, where cardiac disease and metabolic syndrome are often co-morbid?

      Weaknesses:

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed.

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes.

      (3) While there appear to be GATA4 small molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study.

      Comments on revised version:

      I do not have further comments. Thank you for addressing the previously mentioned concerns.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This manuscript assesses the differences between young and aged chondrocytes. Through transcriptomic analysis and further assessments in chondrocytes, GATA4 was found to be increased in aged chondrocyte donors compared to young donors. Subsequent mechanistic analysis with lentiviral vectors, siRNAs, and a small molecule was used to study the role of GATA4 in young and old chondrocytes. Lastly, an in vivo study was used to assess the effect of GATA4 expression on osteoarthritis progression in a DMM mouse model.

      Strengths:

      This work linked the overexpression of GATA4 to NF-kB signaling pathway activation, alterations to the TGF-b signaling pathway, and found that GATA4 increased the progression of OA compared to the DMM control group. This indicates that GATA4 contributes to the onset and progression of OA in aged individuals.

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      Weaknesses:

      (1) A couple of sentences should be added to the introduction, to emphasize the role GATA4 plays, such as the alterations to the TGF-b signaling pathway and the increased activation of the NF-kB pathway. 

      As suggested, we have expanded on these signaling pathways in the Introduction to highlight the known functions of GATA4. Importantly, no previous study has reported a role for GATA4 in regulating the TGF-β pathway.

      “Many growth factors contribute to the chondro-supportive environment in the knee joint. Particularly, transforming growth factor-b (TGF-b) plays a key role in maintaining chondrocytes and replenishing ECM loss. However, during OA, TGF-b can induce catabolic processes in chondrocytes, resulting in matrix stiffening, osteophytes, and chondrocyte hypertrophy.[10-12]” (Lines 80-84)

      “Mechanistically, upregulation of GATA4 was shown to increase nuclear factor-kB (NF-kB) pathway activation.[14,15]  NF-κB is thought to amplify and potentially propagate cellular senescence during the aging process through the senescence-associated secretory phenotype (SASP), which could contribute to a low-grade state of chronic inflammation.[16]” (Lines 99-102)

      “When GATA4 was over expressed, we found that there were alterations to the TGF-b signaling pathway and activation of the NF-kB signaling pathway.” (Lines 106-108)

      (2) Figure 1F, the GATA4 histology image should be bigger.

      We have now increased the size of the image in revised Figure 1F.

      (3) Further discussion should be conducted regarding the reasoning as to why GATA4 increases the phosphorylation of SMAD1/5. 

      Thank you. The underlying mechanism of GATA4 activating SMAD1/5 has not been previously investigated. We have now elaborated on this in the discussion and have added more relevant publications.

      “Our study indicated that there was an observed decrease in chondrogenesis and an increase in hypertrophy-related genes following GATA4 overexpression (Figure 2G).” (Lines 572-574)

      “These previous studies and literature review inspired us to explore the potential association between GATA4 levels and the activation of SMAD1/5.” (Lines 587-588)

      “In this study, it was shown that GATA4 was necessary for bone morphogenetic protein-6 (BMP-6) mediated IL-6 induction, in which there are multiple GATA binding domains on the IL-6 promoter. This work further showed that GATA4 interacts with SMAD2, 3 and 4.[55] Studies have suggested that BMP pathways and GATA4 work synergistically to regulate SMAD signaling.[56] This information indicates that the involvement of GATA4 in the TGF-b signaling pathway is complex and further studies should be conducted to better assess this relationship.” (Lines 594-599)

      (4) More information should be included to clarify why GATA4 is thought to be linked to DNA damage and the pathway that is associated with that. 

      We have now included further information in the discussion to clarify the association between DNA damage and GATA4 upregulation.

      “The study by Kang et al. demonstrated that the suppression of p62 following DNA damage leads to GATA4 accumulation due to the lack of autophagy.[13] DNA damage is known to increase with age.[71] Therefore, we believe that DNA damage due to aging is a key driver of the upregulation of GATA4 in old chondrocytes.” (Lines 642-646)

      (5) Please add further information regarding the limitations of the animal study conducted in this work and future plans to assess this. 

      We have included more limitations of the animal study that was conducted in this work and have expanded on the future plans to use inducible GATA4 expression in transgenic mouse lines to study the role of GATA4 overexpression in OA onset and progression.

      “Third, during our in vivo work, the intraarticular injection of GATA4 lentivirus was not chondrocyte-specific. Therefore, the injection also allowed for other cell types to overexpress GATA4. Future work should be conducted using transgenic mouse lines for cartilage-specific inducible overexpression or depletion of Gata4 to further investigate the role of GATA4 in chondrocytes.” (Lines 666-670)

      (6) In Figure 5, GATA4 should be changed to Gata4 in the graphed portions for consistency. 

      Thanks. We have made the necessary adjustments throughout the manuscript.

      Reviewer #2 (Public review):

      (1) While it is convincing that GATA4 expression is elevated in elderly individuals, and that it has a detrimental impact on cartilage health, the authors might want to add further discussion on the variability among individual human donors, especially given the finding that the elevation of GATA4 was not observed in chondrocytes from donor O1 (Figure 1G).

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      As suggested, we have included more discussion on the variability among donors.

      “Although we found that GATA4 was generally increased with aging, some young donors also exhibited increased levels of GATA4, which may be associated with increased DNA damage, as discussed above, or other stressors. Therefore, GATA4 should be used in conjunction with other aging biomarkers, such as the epigenetic clock [72], to precisely define chondrocyte aging. Future work should examine biological versus chronological aging and epigenetic clock-based assessments to explain the variability in GATA4 expression among donors.” (Lines 658-663)

      (2) It might also be worth adding additional discussion on the interplay between senescent chondrocytes and the dysfunctional ECM during aging. As noted by the authors, aging is associated with decreased sGAG content and likely degenerative changes in the collagen II network, so the microniche of chondrocytes, and thus cell-matrix crosstalk through the pericellular matrix, is also altered or impaired. 

      Thank you for this comment. We have included more discussion on the interplay of chondrocyte senescence and dysfunctional ECM during aging, with a specific focus on the microniche of chondrocytes.

      “Additionally, a common hallmark of chondrocyte aging is the alteration of ECM, including composition change [2] and stiffening.[57] ECM stiffness can directly affect chondrocyte phenotype and proliferation, and contribute to OA.[58] A recent study by Fu et al. associated matrix stiffening with the promotion of chondrocyte senescence.[59] Furthermore, matrix stiffening has been associated with modulating the TGF-b signaling pathway.[60-62] Future studies should investigate the potential of matrix stiffening and the effect of GATA4 on pericellular matrix proteins such as decorin[63,64], biglycan, collagen VI and XV, as these proteins assist with the regulation of biochemical interactions and assist with the maintenance of the chondrocyte microenvironment.[65] Herein, the TGF-b signaling pathway can further alter the extracellular microenvironment[62], which could promote cellular senescence and subsequently NF-kB pathway activation.” (Lines 600-610)

      (2) If applicable, please also add Y3 and O3 to Figure S1 for visual comparison across individual donors. 

      As suggested, we added Y3 and O3 to the revised Figure S1 for more visual comparisons across individual donors.

      (3) Figure 3C, the molecular weight labels are off. 

      Thanks. We corrected this mistake.

      (4) Line 438 - Please clarify in text that the highest efficiency of siRNA chosen was siRNA2. 

      As suggested, we added the reason for selecting siRNA2.

      “Several GATA4 siRNAs were tested, and the one with the highest efficiency was selected based on RT-qPCR results, which indicated that siRNA2 treatment induced the lowest expression of GATA4 (Supplementary Figure S6).” (Lines 448-450)

      (5) Did the authors test the timeline of sustained knockdown of GATA4 by siRNA?

      We used a 7-day timepoint of chondrogenesis, and RT-qPCR results demonstrated that there was a downregulation of GATA4 expression at this timepoint (Figure 4). In the current in vitro study, we did not examine the efficacy of GATA4 siRNA for longer than 7 days.

      Reviewer #3 (Public review):

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed. 

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      When we first saw the results, we did consider studying the role of HIF1a in aging because it was the most differentially expressed. When we reviewed the relevant literature, we found that HIF1a was commonly upregulated in aged individuals, which was thought to be linked to hypoxia and increased oxidative stress (PMID: 12470896, PMID: 12573436). Further investigation found studies that examined HIF1a in chondrocytes and used in vivo work to investigate its role in osteoarthritis (PMID: 32214220), indicating that HIF1a plays a protective role during OA by suppressing activation of the NF-kB pathway. Moreover, other work has assessed the stabilization of HIF1a by regulating mitophagy and the use of HIF1a as a potential therapeutic target for OA (PMID: 32587244). Since many studies have investigated the correlation between HIF1a expression and OA, we felt that it would be more innovative to look at other molecules, such as GATA4. Moreover, as we highlighted in the Introduction and Discussion, testing in cell types other than chondrocytes showed GATA4 to be associated with DNA damage and senescence, which are both aging hallmarks. Given that the roles of GATA4 in chondrocytes had not been previously studied, we chose GATA4 for this study.

      “Of note, Hypoxia-Inducible Factor 1a (HIF1a) was the most differentially expressed gene predicted to regulate chondrocyte aging. The connection between HIF1a and aging has been previously reported.[32] Furthermore, additional studies have investigated HIF1a in association with OA and assessed its use as a therapeutic target.[33,34] Therefore, we decided to focus on GATA4, which was less studied in chondrocytes but highly associated with cellular senescence, an aging hallmark. However, our selection did not dampen the importance of HIF1α and other molecules listed in Figure 1D in chondrocyte aging. They can be further studied in the future using the same strategy employed in the current work.” (Lines 526-533)

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes. 

      Thank you for your comment. Based on prior experience, mice in the sham group have an OARSI score ranging from 0 to 0.5. In the current study, we focused on the DMM control and DMM Gata4 virus groups, so we did not include a sham control group. We recognize that this is a limitation of the study.

      We measured the naive limbs for knee hyperalgesia before DMM surgery and found that the average threshold was 507 g. We have highlighted the threshold measurement in the figure legend.

      “507 g was the threshold baseline for non-surgery mice (dashed line).” (Lines 499-500)

      (3) While there appear to be GATA4 small-molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study. 

      We agree that the results are still preliminary, which is why we placed them in the supplementary materials. However, we feel that the result is informative: it supports the potential of GATA4 as a therapeutic target and may inspire the development of more specific inhibitors. Therefore, if the reviewer agrees, we would like to keep the results in the current study.

      “In particular, our in vitro study demonstrated the potential of using small-molecule GATA4 inhibitors to enhance the quality of cartilage created by old chondrocytes. We can validate the findings in vivo, as well as develop other GATA4 inhibitors.” (Lines 673-675)

      (4) Is GATA4 upregulated in chondrocytes in publicly available databases? 

      Thank you for this question. We have examined the public databases and have found that there is data showing the trend that GATA4 is upregulated in aged or OA chondrocytes in work conducted by Ungethuem et al (PMID: 20858714). In one study by Ramos et al. (PMID: 25054223), we noticed that GATA4 expression levels were the same in both young and old groups, which may be due to the relatively smaller sample size in the young group compared to old group (4 vs 26).

      Work Conducted by Grogan et al. (Unpublished https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39795)

      Author response image 1.

      Author response image 2.

      Work conducted by Ramos et al. (PMID: 25054223).

      Author response image 3.

      Work conducted by Ungethuem et al. (PMID: 20858714).

      (5) In many cases, the figure captions describe the experiment vs. the outcome. It may be more compelling to state the main finding in the figure title, and you might consider changing it from what is stated at present. For example, Figure 2: instead of the impact of overexpression, you may say GATA4 overexpression impairs cartilage formation (as stated in the results).

      Thanks for the suggestion. We have made the following changes to the figure captions as suggested.

      Figure 1: GATA4 is upregulated in aged chondrocytes (Line 373)

      Figure 2: Overexpressing GATA4 impairs the hyaline cartilage formation capacity of young chondrocytes (Lines 408-409)

      Figure 3: GATA4 overexpression activates SMAD1/5 (Line 436)

      Figure 4: Suppressing GATA4 in old chondrocytes promotes cartilage formation and lowers expression of proinflammatory cytokines (Line 467)

      Figure 5: Gata4 overexpression in the knee joints accelerates OA progression in mice. (Line 593)

      (6) It would be useful to provide a little more information about the human tissue donors, if that is available. 

      We have provided more information about the tissue donors in the revised Supplementary Table S1.

      (7) While aging-like changes were observed in young chondrocytes with GATA4 overexpression, it would be interesting to directly evaluate if there is a change in biological versus chronological age in these tissues. Companies like Zymo can provide this biological v chronological age epigenetic clock-based assessments if that is of interest, to say the young chondrocytes are looking "older". 

      Thank you for this information. We agree that it will be important to assess epigenetic changes in GATA-overexpressing cells. We are contacting the company to learn more about their technology. Meanwhile, we added this to the future work section of the manuscript.

      “Although we found that GATA4 was generally increased with aging, some young donors also exhibited increased levels of GATA4, which may be associated with increased DNA damage, as discussed above, or other stressors. Therefore, GATA4 should be used in conjunction with other aging biomarkers, such as the epigenetic clock [72], to precisely define chondrocyte aging. Future work should examine biological versus chronological aging and epigenetic clock-based assessments to explain the variability in GATA4 expression among donors.” (Lines 658-663)

      (8) It is not clear the age at which the mice received DMM in the methods, but it is shown in Figure 5. 

      We have added the age at which the mice received the DMM surgery to the methods section.

      “Intraarticular injections were administered to mice between 10-12 weeks of age under general anesthesia to safeguard the well-being of the animals and to minimize procedural discomfort.” (Line 300)

      “One week after viral vector injection, DMM surgery was performed to induce the OA model on mice 11-13 weeks of age.” (Lines 312-313)

      (9) It is not clear which factors were assayed using Luminex, and it would be great to add. 

      Thank you for this comment. We have added a comprehensive list of the proteins assessed using Luminex in a new Supplementary Table 6 (S6).

      (10) Also interesting, loss of GATA4 seems to prevent diet-induced obesity in mice and promote insulin sensitivity (potentially via GLP-1 secretion). I wonder if there may be a metabolic axis here too? PMID: 21177287. I may have missed parts of the discussion of the role of GATA4 in metabolism, but it might be an interesting addition to the discussion. 

      In the current study, we have not investigated the role of GATA4 in obesity. As suggested, we have included a discussion of GATA4 in metabolism.

      “Furthermore, GATA4 might be associated with metabolic regulation. A study conducted by Patankar et al. investigated how GATA4 regulates obesity. Specifically, they used intestine-specific Gata4 knockout mice to study diet-induced obesity, showing that the knockout mice were resistant to the high-fat diet, and that glucagon-like peptide-1 (GLP-1) release was increased. These findings indicated a decreased risk for the development of insulin resistance in knockout mice.[44] This work was taken a step further in a subsequent publication, in which the same team investigated the dietary lipid-dependent and independent effects on the development of steatosis and fibrosis in Gata4 knockout mice. The results from this work suggested that the knockdown of Gata4 increases GLP-1 release, in turn suppressing the development of hepatic steatosis and fibrosis, ultimately blocking hepatic de novo lipogenesis.[45] These studies are especially interesting with the rise of GLP-1 based therapy for the treatment of OA.[46,47] Thus, the coupling of GATA4-related metabolic dysfunction and OA should be further investigated.” (Lines 542-553)

      (11) Another potential citation: GATA4 regulates angiogenesis and persistence of inflammation in rheumatoid arthritis PMID: 29717129 - around the inflammatory axis potential in OA? since GATA4 was reported in FLS from OA- PMC11183113.

      Thank you. We have included this work/citation in the discussion section.

      “Further studies have shown that GATA4 regulates angiogenesis and inflammation in fibroblast-like synoviocytes in rheumatoid arthritis, indicating that GATA4 is required for the inflammation induced by IL-1b. This study also demonstrated that GATA4 binds to promoter regions on Vascular Endothelial Growth Factor (VEGF)-A and VEGFC to enhance transcription and regulate angiogenesis.[15]” (Lines 558-562)

    1. eLife Assessment

      This important study reports the conservation of sperm-egg envelope binding by demonstrating successful recognition of the micropyle in fish eggs by mouse sperm. The evidence supporting the conclusions drawn is convincing. This study will be of interest to reproductive biologists and clinicians studying the biology of fertilization and fertility.

    2. Reviewer #1 (Public review):

      Summary:

      The paper is well written and investigates the cross-species insemination of fish eggs with mouse sperm. I have a few major and minor comments.

      Strengths:

      The experiments are well executed and could provide valuable insights into the complex mechanisms of fertilization in both species. I found the information presented to be very interesting.

      Weaknesses:

      The rationale of some of the experiments, in particular those using CatSper KO sperm, is, in my view, not well defined.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper is well written and investigates the cross-species insemination of fish eggs with mouse sperm. I have a few major and minor comments.

      Strengths:

      The experiments are well executed and could provide valuable insights into the complex mechanisms of fertilization in both species. I found the information presented to be very interesting.

      Thank you.

      Weaknesses:

      The rationale of some of the experiments is not well defined.

      Thank you. In the revised manuscript, we have clarified and expanded the rationale behind each experiment to better highlight the specific questions being addressed and how each approach contributes to our overall investigation. These clarifications have been integrated throughout the Results and Discussion sections. We provide detailed rationale in our point-by-point responses to both reviewers, outlining how each experimental design was motivated by prior findings, hypotheses, or specific gaps in knowledge. We hope these revisions make the experimental logic and progression better defined and more compelling.

      Major Comments:

      (1) Figure 5

      I do not understand the rationale for performing experiments using CatSper-null sperm and CD9-null oocytes. It is well established that CatSper-null sperm are unable to penetrate the zona pellucida (ZP), so the relevance of this approach is unclear.

      We thank the reviewer for this comment. This experiment was conducted as the basis for evaluating the contributions of progressive and hyperactivated motility to the ability of mouse sperm to locate and traverse the zebrafish micropyle. In earlier experiments (Figures 1 and 3), we assessed whether sperm-micropyle interaction was robust by comparing it to binding to the mouse zona pellucida and testing whether both interactions persisted after washing, which is a standard approach to distinguish specific binding from non-specific adherence (Avella et al., 2014; Baibakov et al., 2012). We then extended this analysis to CatSper1<sup>Null</sup> sperm: CatSper1<sup>Null</sup> sperm were still capable of binding the zona pellucida comparably to heterozygous controls, though they were unable to cross the zona of Cd9<sup>Null</sup> eggs. These observations served as a validation step for the use of CatSper1<sup>Null</sup> sperm in downstream micropyle interaction assays. We therefore proceeded to test whether hyperactivated motility, absent in CatSper1<sup>Null</sup> sperm, is required for locating and crossing the micropyle.

      It is indeed well established that CatSper1<sup>Null</sup> sperm are unable to penetrate the zona pellucida, and previous studies have typically used the absence of fertilized eggs as a readout. However, failed fertilization may result from multiple factors, including impaired sperm motility, reduced capacity to bind the zona pellucida, or an inability to penetrate it. To our knowledge, no study has quantitatively assessed the number of CatSper-deficient sperm that successfully bind, cross the zona and reach the perivitelline space. To address this, we first used normal oocytes for sperm binding and then Cd9<sup>Null</sup> oocytes (Le Naour et al., 2000), which allow direct quantification of sperm accumulation in the perivitelline space. We have included a detailed explanation in the Results to clarify this point (lines 352-365 and 376-379).

      (2) Micropyle penetration and sperm motility

      CatSper-null sperm are reportedly unable to cross the micropyle, but this could be due to their reduced motility rather than a lack of hyperactivation per se. Were these experiments conducted using capacitated or non-capacitated spermatozoa? What was the observed motility of CatSper-null sperm during these assays? Clarifying these conditions is essential to avoid drawing incorrect conclusions from the results.

      Thank you for raising these points. Under our IVF conditions, qualitative observations confirmed that CatSper1<sup>Null</sup> sperm maintained sufficient progressive motility during the first hour post-insemination and exhibited zona binding efficiency comparable to that of CatSper1<sup>Het</sup> controls (Figure 5A and B). This is consistent with previous reports showing that within the first 90 minutes of sperm incubation in media, approximately 20% of CatSper1<sup>Null</sup> sperm preserve motility (Qi et al., 2007). Given previous studies indicating that 15–35% of sperm undergo hyperactivation within 90 minutes (Goodson et al., 2011), and considering that 100,000 progressively motile sperm were used for insemination, we estimate that approximately 3,000 hyperactivated CatSper1<sup>Null</sup> sperm were present in the cross-species insemination dish (mouse sperm x zebrafish eggs). Based on these numbers, we would have expected at least some sperm to locate the micropyle if hyperactivation were not required for its detection and entry. Nevertheless, no CatSper1<sup>Null</sup> sperm were detected in proximity to the micropyle canal, its opening, or within the inter-chorion space (ICS). These observations support the conclusion that the inability of CatSper1<sup>Null</sup> sperm to locate and enter the micropyle is attributable to their failure to hyperactivate. Also, all sperm used in these assays were exposed to identical capacitating conditions (HTF/HSA, 37 °C, 5% CO2). We now clarify this in the Methods (line 624), and we have added more rationale in the Results (lines 361-365) and in the Discussion (lines 470-483).
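
      The figure of roughly 3,000 sperm can be reproduced as the lower bound of a simple product. The sketch below assumes that the ~20% motile fraction (Qi et al., 2007) and the 15–35% hyperactivation fraction (Goodson et al., 2011) are multiplied against the 100,000 inseminated sperm; this reading of the derivation is an assumption, since the response does not spell out the calculation, and the variable and function names are illustrative only.

```python
# Back-of-envelope reconstruction of the ~3,000 estimate.
# ASSUMPTION: the two fractions are simply multiplied; the response
# does not state how the number was derived.

INSEMINATED = 100_000                 # sperm used for insemination
MOTILE_FRACTION = 0.20                # ~20% still motile within 90 min
HYPERACTIVATED_RANGE = (0.15, 0.35)   # 15-35% hyperactivate within 90 min


def expected_sperm(total: int, motile: float, hyperactivated: float) -> int:
    """Return the expected sperm count, rounded to the nearest integer."""
    return round(total * motile * hyperactivated)


low = expected_sperm(INSEMINATED, MOTILE_FRACTION, HYPERACTIVATED_RANGE[0])
high = expected_sperm(INSEMINATED, MOTILE_FRACTION, HYPERACTIVATED_RANGE[1])

print(f"Estimated range: {low} to {high} sperm")
```

      Under these assumptions the quoted 3,000 corresponds to the conservative end of the range, which is the bound relevant to the argument that at least some sperm should have been seen near the micropyle.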

      (3) Rheotaxis and micropyle navigation

      Previous studies have shown that CatSper-null sperm fail to undergo rheotaxis. Could this defect be related to their inability to locate and penetrate the micropyle? Exploring a potential shared mechanism could be informative.

      Thank you for raising this interesting point. Indeed, homozygous mutant mice lacking expression of a different component of the CatSper channel, CatSperz, show reduced rheotactic efficiency and severe subfertility (Chung et al., 2017). We cannot exclude that complete lack of CatSper as shown in CatSper1<sup>Null</sup> mice could lead to reduced rheotactic efficiency, hence we include this interpretation in the Discussion (lines 484-486).

      (4) Lines 61-74

      This paragraph omits important information regarding acrosomal exocytosis, which occurs prior to sperm-egg fusion. Including this detail would strengthen the discussion.

      Thank you. We have revised the text in the discussion to describe the process of acrosome exocytosis, and its relevance for fertilization (lines 504-518).

      Reviewer #2 (Public review):

      Summary:

      Garibova et al. investigated the conservation of sperm recognition and interaction with the egg envelope in two groups of distantly related animals: mammals (mouse) and fish (zebrafish). Previous work and key physiological differences between these two animal groups strongly suggest that mouse sperm would be incapable of interaction with the zebrafish egg envelope (chorion) and its constituent proteins, though homologous to the mammalian zona pellucida (ZP). Indeed, the authors showed that mouse sperm do not bind recombinant zebrafish ZP proteins nor the intact chorion. Surprisingly, however, mouse sperm are able to locate and bind to the zebrafish micropyle, a specialized canal within the chorion that serves as the egg's entry point for sperm. This study suggests that sperm attraction to the egg might be highly conserved from fish to mammals and depends on the presence of a still unknown glycosylated protein within the micropyle. The authors further demonstrate that mouse sperm are able to enter the micropyle and accumulate within the intrachorionic space, potentially through a CatSper-dependent mechanism.

      Strengths:

      The authors convincingly demonstrate that mouse sperm do not bind zebrafish ZP proteins or the chorion. Furthermore, they make the interesting observation that mouse sperm are able to locate and enter the zebrafish micropyle in an MP-dependent manner, which is quite unexpected given the large evolutionary distance between these species, the many physiological differences between mouse and zebrafish gametes, and the largely different modes of both fertilization and reproduction in these species. This may indicate that the sperm chemoattractant in the egg is conserved between mammals and fish; however, whether zebrafish sperm are attracted to mouse eggs was not tested.

      Thank you. We performed an additional experiment with fish sperm used to inseminate ovulated mouse eggs, and results are reported in lines 183-187 and in Supplementary Figure 2.

      Weaknesses:

      The key weakness of this study lies in the rationale behind the overall investigation. In mammals, the zona pellucida (ZP) has been implicated in binding sperm in a taxon-specific manner, such that human sperm are incapable of binding the mouse ZP. Indeed, work by the corresponding author showed that this specificity is mediated by the N-terminal region of the ZP protein ZP2 (Avella et al., 2014). The N-termini of human and mouse ZP2 share 48% identity, which is higher than the overall identity between mouse and zebrafish ZP2, with the latter ortholog entirely lacking the N-terminal domain that is essential for sperm binding to the ZP. Given this known specificity for mouse vs. human sperm-ZP binding, it does not follow that mouse sperm would bind ZP proteins from not only a species that is much more distantly related, but also one that is not even a mammal, the zebrafish. Furthermore, the fish chorion does not play a role in sperm binding at all, while the mammalian ZP can bind sperm at any location. On the contrary, the zebrafish chorion prevents polyspermy by limiting sperm entry to the single micropyle.

      We thank the reviewer for this detailed comment. In this study, our goal was precisely to validate the hypothesis that mouse sperm would not bind either recombinant fish ZP proteins or the chorion; in addition, we found it important to examine the observation that mouse sperm could detect the micropyle. We have further elaborated this rationale in the Introduction (lines 93-100).

      In addition, though able to provide some information regarding the broad conservation of sperm-egg interaction mechanisms, the biological relevance of these findings is difficult to describe. Fish and mammals are not only two very distinct and distantly related animal groups but also employ opposite modes of fertilization and reproduction (external vs. internal, oviparous vs viviparous). Fish gametes interact in a very different environment compared to mammals and lack many typically mammalian features of fertilization (e.g., sperm capacitation, presence of an acrosome, interaction with the female reproductive tract), making it difficult to make any physiologically relevant claims from this study. While this study may indicate conserved mechanisms of sperm attraction to the egg, the identity of the molecular players involved is not investigated. With this knowledge, the reader is forced to question the motivation behind much of the study.

      We thank the reviewer for their perspective, and we appreciate the opportunity to further elaborate on our rationale. As outlined in our Results and Discussion sections, a growing body of evidence supports the presence of conserved molecular players and signaling pathways involved in gamete interaction across species with diverse reproductive strategies. While zebrafish and mice do differ in their fertilization environments and modes of reproduction, these differences do not necessarily exclude the possibility of conserved molecular mechanisms underlying gamete interaction. For example, the CatSper calcium channel, which plays a key role in regulating sperm motility and hyperactivation, is conserved across a broad range of taxa, from echinoderms such as sea urchins (external fertilizers) (Seifert et al., 2015) to mammals, including mice and humans (internal fertilizers) (Lishko and Mannowetz, 2018). Moreover, sperm from some fish species possess acrosomes that undergo exocytosis prior to fertilization while sperm cross the micropyle (Psenicka et al., 2010). Also, in ovoviviparous species with internal fertilization, such as the black rockfish, sperm do undergo molecular changes while in the female reproductive tract (including immunomodulatory adaptations, glycocalyx remodeling, and interactions with ovarian cells), which give the sperm longer-term survival and a selective persistence that ensures only the fittest sperm can successfully fertilize eggs (Li et al., 2024). As for mammalian capacitation, it is broadly defined as the process during which sperm undergo hyperactivation (Yanagimachi, 1970) and acquire the ability to undergo acrosome exocytosis, making the sperm competent for gamete fusion and fertilization (Bhakta et al., 2019; Puga Molina et al., 2018; Yanagimachi, 1957; Yanagimachi et al., 2017). Of note, acrosome exocytosis and changes in sperm motility are not exclusive to internal fertilizers. For example, as we cite in our manuscript (and as stated above), acrosome exocytosis has been described to occur as sturgeon sperm cross the micropyle (Psenicka et al., 2010). As for changes in flagellar motility, investigations in the Pacific herring (Clupea sp.) demonstrated that sperm remain nearly immotile upon release into seawater and only initiate motility when approaching the micropyle region of the egg (Yanagimachi, 1957; Yanagimachi et al., 2017). In other fish, including bitterling and zebrafish, further enhancement in sperm motility is observed as sperm approach the micropyle area (Suzuki, 1958; Yanagimachi et al., 2017). These studies suggest that functional equivalents of capacitation may exist across taxa.

      We interpret the observation that mouse sperm can locate and enter the micropyle as suggesting that underlying guidance mechanisms may be more broadly conserved across distant species than previously recognized. We have now elaborated on these points in the revised Discussion (lines 531-552), and we hope the motivation behind our study is now more clearly articulated.

      During fertilization in fish, the sperm enters the micropyle and subsequently, the egg, as it is simultaneously activated by exposure to water. During egg activation, the chorion lifts as it separates from the egg and fills with water. This mechanism prevents supernumerary sperm from entering the egg after the successfully fertilizing sperm has bound and fused. In this study, the authors show that mouse sperm enter the micropyle and accumulate in the intrachorionic space. Whether any sperm successfully entered the egg is not addressed, and the status of egg activation is not reported.

      We appreciate the reviewer’s detailed comments and the opportunity to elaborate on this important aspect for our cross-insemination assay. We interpret the reviewer’s reference to “sperm entering the egg” as pertaining to sperm adhesion to the oocyte plasma membrane followed by fusion with the egg cell, two separate steps regulated by different molecular players for sperm-egg plasma membrane adhesion (Bianchi et al., 2014; Fujihara et al., 2021; Herberg et al., 2018; Inoue et al., 2005) and for fusion. It is important to note that proteins mediating gamete fusion are still unidentified in fish and mammals (Bianchi and Wright, 2020; Deneke and Pauli, 2021).

      In our cross-species insemination experiments, zebrafish oocytes were maintained in Hank’s solution to limit spontaneous activation; however, as the reviewer correctly notes, activation likely occurred upon exposure to HTF. While this model does not recapitulate full fertilization events, it serves as a platform to explore whether mammalian sperm can detect (within the scope of our study) and respond (future studies) to putative evolutionarily conserved signals, such as those guiding fish sperm toward the micropyle.

      While investigating cross-species sperm–oocyte fusion was not within the scope of this study and would require a distinct set of experimental approaches, we believe this question is an important one. However, we do not expect our platform to be informative for evaluating sperm adhesion to the fish oolemma or for enabling cross-species gamete fusion. In our assays focused on sperm-micropyle interaction, Hoechst staining of the nuclei of sperm carrying a transgenically tagged acrosome revealed no evidence of sperm adhesion to or fusion with the fish egg membrane (Figure 4D). Also, molecular incompatibilities may further prevent this interaction: in zebrafish, the Ly6/uPAR family protein Bouncer is expressed exclusively in the egg and is necessary for sperm–egg membrane adhesion (Herberg et al., 2018). Recent studies in zebrafish and mice have shown that a conserved trimeric complex composed of Izumo1, Spaca6, and Tmem81 on the sperm surface is required for mediating adhesion to the oocyte membrane by interacting with the mammalian oocyte receptor Izumo1R (also known as JUNO) or the zebrafish oocyte receptor Bouncer (Deneke et al., 2024). One would hypothesize that for mouse sperm to adhere to the zebrafish egg membrane, the mouse Izumo1-Spaca6-Tmem81 complex would need to establish binding with Bouncer. To explore this possibility, we performed AlphaFold2-Multimer structural predictions and docking analyses to mimic an interaction between mouse Izumo1-Spaca6-Tmem81 and zebrafish Bouncer, using mouse Izumo1-Spaca6-Tmem81 and Juno or zebrafish Izumo1-Spaca6-Tmem81 and Bouncer as positive controls. We observed low binding affinity between zebrafish Bouncer and the mouse trimeric complex (Izumo1, Spaca6, and Tmem81), as indicated by low ipTM scores and high predicted aligned error (PAE) values. These findings suggest that the mouse complex is unlikely to form an interaction with Bouncer (now shown in Suppl. Figure 7). These predictions were consistent with our observations that no sperm were found adhering to or fusing with the egg cell. We describe the methods and results in the supplementary files (Supporting Info, lines 53-66) and in the Results section (lines 335-339).

      In Supplementary Videos 3-4, the egg shown has been activated for some time, as evident by the separation of yolk and cytoplasm, yet the chorion is only partially expanded (likely due to mouse IVF conditions). How multiple sperm were able to enter the micropyle but presumably not the egg is not addressed, yet this suggests that the zebrafish mechanism of blocking polyspermy (fertilization by multiple sperm) is not effective for mouse sperm or is rendered ineffective due to mouse IVF conditions. The authors do not discuss these observations in the context of either species' physiological process of fertilization, highlighting the lack of biological context in interpreting the results.

      Thank you for raising this important point. One model for mammalian gamete recognition at the zona supports the notion that mouse sperm can penetrate extracellular matrices as long as sperm can bind to them, and binding is dependent on the cleavage status of ZP2. Zonae surrounding unfertilized mouse eggs present uncleaved ZP2, and these zonae support sperm binding. After gamete fusion, the cortical granules release ovastacin, which cleaves ZP2 at the N-terminus; consequently, zonae presenting cleaved ZP2 no longer support sperm binding. This mechanism acts as a block to zona binding and prevents further crossing (Bhakta et al., 2019). Indeed, fertilized mouse eggs or 2-cell embryos surrounded by a zona containing uncleaved ZP2 support de novo sperm binding, and supernumerary sperm cross the zona and accumulate in the perivitelline space, unable to fuse with the fertilized oocyte plasma membrane or blastomere cells (Baibakov et al., 2012, 2007; Burkart et al., 2012; Gahlay et al., 2010). Because mouse sperm could interact with the micropyle opening under our experimental conditions, we interpret these findings to suggest that once interaction occurs at the micropyle opening, mouse sperm are capable of crossing it, even under conditions where the micropyle may be detached from the oocyte due to oocyte activation. Therefore, our data indicate that mouse sperm may be able to bypass the mechanism by which zebrafish oocytes block multiple sperm from passing through the micropyle, even after oocyte activation. This point has now been incorporated into the revised Discussion (lines 425-441).

      The authors further show that the zebrafish micropyle does not trigger the acrosome reaction in mouse sperm. Whether the acrosome reacts is not correlated with a sperm's ability to cross the micropyle opening, as both acrosome-intact and acrosome-reacted sperm were observed within the intrachorionic space. While the acrosome reaction is a key event during mammalian fertilization and is required for sperm to fertilize the egg, zebrafish sperm do not contain an acrosome. Thus, these results are particularly difficult to interpret biologically, bringing into question whether this observation has biological relevance or is a byproduct of egg activation/chorion lifting that indirectly draws sperm into the chorion.

      We thank the reviewer for raising this point and we appreciate the opportunity to elaborate on the biological relevance of this experiment. Our motivation to assess acrosome status in mouse sperm following entry into the zebrafish micropyle stemmed from the following biological considerations. In fish species such as the sturgeon, sperm present an acrosome and undergo acrosome exocytosis while passing through the micropyle, before gamete fusion (Alavi et al., 2012; Psenicka et al., 2010). By contrast, zebrafish sperm lack an acrosome, raising the hypothesis that the zebrafish micropyle may not be able to trigger acrosome exocytosis. However, this possibility has not been experimentally tested. We therefore considered it important to investigate whether passage through the zebrafish micropyle induces acrosome exocytosis in mouse sperm. We have revised the Discussion to better clarify the rationale behind the experiment as well as the interpretation of the findings (lines 504-518). As for chorion lifting indirectly drawing sperm into the chorion, we have not observed this phenomenon.

      The final experiments regarding CatSper1's role in mediating mouse sperm entry into the micropyle/chorion are not convincing. As no molecular interactions are described or perturbed, the reader cannot be sure whether the sperm's failure to enter is due to signaling via CatSper1 or whether the overall failure to undergo hyperactivation limits sperm motility such that the mutant sperm can no longer find and enter the zebrafish micropyle. Indeed, in Figure 5E, no CatSper1 mutant sperm are visible near any part of the egg, suggesting that overall motility is impaired, and this is not a phenotype specific to interactions with the micropyle.

      We appreciate the comment and the opportunity to further elaborate on the rationale of this experiment. While our data demonstrate a lack of CatSper1<sup>Null</sup> sperm accumulation within the micropyle and ICS, we appreciate that this may be interpreted as the result of general motility defects, rather than a specific failure in undergoing hyperactivation and micropyle recognition. CatSper1<sup>Null</sup> sperm are known to lack hyperactivated motility and exhibit a progressive loss of forward motility over time. After 90 minutes, only ~20% of CatSper1<sup>Null</sup> sperm remain motile, compared to over 70% in fertile sperm (Qi et al., 2007). Of note, under our IVF conditions, CatSper1<sup>Null</sup> sperm retained sufficient progressive motility during the first hour post-insemination to bind the zona pellucida with comparable efficiency to CatSper1<sup>Het</sup> controls. Based on prior reports indicating that 15–35% of sperm exhibit hyperactivation by 90 minutes (Goodson et al., 2011), and considering that we inseminated with 100,000 progressively motile sperm, we estimate that approximately 3,000 hyperactivated CatSper1<sup>Null</sup> sperm were present in the dish. Yet, none were observed near the micropyle canal, its opening, or within the ICS. This led us to conclude that failure to hyperactivate underlies the inability of CatSper1<sup>Null</sup> sperm to reach and traverse the micropyle. Also, we appreciate that identifying the molecular components of the micropyle would allow direct testing of whether the CatSper channel is activated in response to micropyle-associated signals. Indeed, no targeted perturbation of the molecular interactions regulating micropyle recognition was performed in this study, as the molecular identity of the zebrafish micropyle guidance cue remains unknown. Efforts to identify and characterize this factor are ongoing in our lab and lie outside the scope of the current work. Therefore, throughout the manuscript, we have clarified that it is the failure to undergo hyperactivation, rather than the absence of CatSper per se, that limits the ability of sperm to locate and traverse the micropyle. The rationale for the experiment, the interpretation of our findings, and relevant future directions have been further elaborated in the revised Abstract, Impact Statement and Discussion (lines 40-41; 46-47; 343-365; 376-379; 389-399; 470-486).

      Reviewer #1 (Recommendations for the authors):

      Minor Comments

      (1) Figure numbering

      There appear to be inconsistencies in the figure references. For example, what is referred to as Figure 3F in the text is actually Figure 4F. Please review and correct all figure labels for accuracy.

      We thank the reviewer for pointing this out. We have carefully reviewed the manuscript and corrected all figure references throughout the text. Also, for better flow and coherence, we have moved the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes." Previously, the callout of panels in Figure 3 was out of order (3A, 3B, 3E, 3C, 3D), and this reorganization also helps maintain logical progression through the figure panels.

      (2) Figure 5 terminology:

      The term "normal" sperm should be replaced with "CatSper heterozygous (Het)" sperm to avoid confusion and improve precision.

      We thank the reviewer for this helpful suggestion. We have revised the terminology in Figure 5 and throughout the manuscript, replacing “normal” sperm with “CatSper1 heterozygous (Het)” sperm.

      Reviewer #2 (Recommendations for the authors):

      In addition to my comments in the public review, I would encourage the authors to consider the following suggestions:

      The authors show that mouse sperm can find and enter the fish micropyle, and that this depends on the presence of MP. To better assess sperm binding to the micropyle region, the number of sperm binding to the micropyle vs. non-micropyle chorion should be clearly quantified, as well as the percentage of sperm that enter the micropyle compared to the total used for insemination. The authors state several times throughout the text that a "subpopulation" of mouse sperm finds and enters the micropyle, but it would be more precise and informative to give a percentage.

      We thank the reviewer for this suggestion. We now also report the number of sperm bound to the other regions of the chorion (away; lines 231-233), as well as the percentage of sperm that entered the micropyle relative to the total number used for insemination (lines 276-279).

      To ensure that all sperm are inside the chorion, the egg should be removed from the insemination dish, washed thoroughly, and then the chorion should be torn open to definitively show that the sperm were indeed inside.

      We thank the reviewer for these excellent suggestions. To ensure that the sperm are inside the ICS (as now shown in Figures 4A, F, G, Supplementary Figure 6 and Supplementary Movies 3–5), the inseminated oocytes were thoroughly washed prior to imaging so that only sperm located inside the chorion were visualized (as described in the Methods, lines 646-648). In addition, to confirm the spatial localization of sperm within the ICS, we are now including additional TEM images showing sperm in the ICS (Figure 4G, right panel). Also, we generated orthogonal views using ZEN Lite software (Zeiss, Germany) from a z-stack encompassing the full volume of the chorion, ICS, and oocyte (added in the supplementary materials, as Supplementary Figure 6). These views display three focal planes: the surface of the WGA-stained chorion, the middle of the ICS, and the oocyte plasma membrane. Sperm nuclei stained with Hoechst are clearly visible below the chorion surface and above the oocyte plasma membrane, confirming their localization within the ICS. Additionally, in a separate set of experiments, as recommended by this reviewer, we mechanically disrupted the chorion and consistently detected sperm within the ICS. This procedure, however, was technically challenging: upon disruption, the chorion often collapsed onto the oocyte, and during the extraction process, sperm were sometimes displaced. As a result, it was not always possible to determine with complete confidence whether the sperm had originally been located inside or outside the chorion. However, we hope that the additional TEM and confocal images (Figure 4G and Supplementary Figure 6) offer further support for the localization of sperm within the ICS.

      I would further suggest that they examine the micropyle opening after the entry of multiple sperm, as well as the dynamics of egg activation during insemination with mouse sperm.

      Thank you. We now include one additional TEM image capturing the full structure of a micropyle that was traversed by multiple mouse sperm (shown in Figure 4G, left panel).

      At what point does the micropyle detach from the egg surface? Live imaging of this process with a confocal microscope would be very informative.

      During live imaging, the interval between placing the oocyte in the imaging dish, replacement of Hank’s solution with HTF and the addition of sperm, followed by the initiation of video acquisition, is approximately 2 to 3 min. By this time, the ICS is already apparent (Supplementary Video 2), although the micropyle appears to remain adherent to the egg cell. Partial detachment of the micropyle from the egg cell begins around 6–7 minutes after imaging starts and continues progressively over time. We provide time-lapse imaging frames to show the micropyle detachment under mouse IVF conditions (Supplementary Figure 5).

      Along the same lines, sperm should be doubly labeled with an acrosome-independent marker, i.e., a live DNA stain or MitoTracker. Then the authors could track if any sperm are actually able to enter the egg itself, which would be highly unlikely but an important detail to confirm.

      Thank you for pointing this out. In our assays designed to study sperm–micropyle interactions, Hoechst staining of nuclei in sperm with transgenically labeled acrosomes showed no indication of sperm adhesion to, or fusion with, the zebrafish egg cell (Figure 4D).

      Line 242, 282: The text should refer to Figure 4, not 3. Please make sure all figure references correspond to the correct figure and panel.

      Thank you for bringing this to our attention. We have carefully reviewed the manuscript and corrected the reference to Figure 4, along with all other figure and panel citations to ensure they accurately correspond to the correct content. Also, to improve the overall flow, we relocated the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes". This change also helped correct the sequence of figure panel references, which were previously cited out of order (i.e., 3A, 3B, 3E, 3C, 3D).

      Line 244: The authors quantify sperm that are "away" from the micropyle, but this is not clearly defined. This should be given as a set radius or distance from the center (e.g., in microns). If the sperm are still motile, can this be accurately measured?

      We thank the reviewer for this valuable suggestion. We have now defined “away from the micropyle” as a distance greater than 160 µm from the center of the micropyle. This measurement was determined using confocal z-stack projections of fixed samples. These details have been added to the revised Methods section (lines 670-674).
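
      The 160 µm criterion described above can be sketched as a simple distance test. This is only an illustrative sketch, not the authors' analysis code: it assumes sperm positions and the micropyle center are available as (x, y) coordinates in microns from a z-stack projection, and the function name `classify_sperm` and the label "near" for sperm that are not "away" are hypothetical.

```python
import math

# Threshold stated in the response: "away" means more than 160 µm
# from the center of the micropyle.
AWAY_THRESHOLD_UM = 160.0


def classify_sperm(positions_um, micropyle_center_um,
                   threshold_um=AWAY_THRESHOLD_UM):
    """Label each sperm as "away" or "near" (hypothetical label).

    positions_um: iterable of (x, y) positions in microns (assumed input).
    micropyle_center_um: (x, y) of the micropyle center in microns.
    """
    cx, cy = micropyle_center_um
    labels = []
    for x, y in positions_um:
        # Euclidean distance in the projection plane.
        distance = math.hypot(x - cx, y - cy)
        labels.append("away" if distance > threshold_um else "near")
    return labels
```

      Because the criterion is "greater than 160 µm", a sperm exactly 160 µm from the center is not counted as "away" in this sketch.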

      To strengthen the conclusion that the sperm chemoattractant is indeed conserved from fish to mammals, the authors could show that zebrafish sperm are also able to find/approach mouse eggs. Even more compelling would be to show the same is true for other species combinations. As it stands, the choice of comparing mouse and zebrafish does not seem scientifically motivated but rather due to their availability.

      We thank the reviewer for this important suggestion. To test whether zebrafish sperm are capable of binding to the mammalian zona pellucida, we conducted the suggested experiment: ovulated, cumulus-free mouse oocytes were placed in water and incubated with zebrafish sperm. We did not observe any zebrafish sperm bound to the mouse zona pellucida, consistent with the hypothesis that zebrafish sperm do not recognize or interact with mammalian zonae or ZP proteins. This has now been added in the Results (lines 183-187) and shown in Supplementary Figure 2. We interpret these findings in light of the fact that, in cross-species insemination assays, reciprocity in sperm-egg interaction is not always observed. For example, while human sperm bind only to human zonae and not to mouse zonae, mouse sperm are able to bind both mouse and human zonae (Avella et al., 2014; Baibakov et al., 2012; Bedford, 1977). This asymmetry may reflect species-specific adaptations in sperm-egg recognition. We have now added this point to the revised Discussion to clarify the rationale and context of our approach (lines 416-423).

      Regarding the choice of experimental models, while we agree that testing additional species combinations would broaden the scope of the findings, the choice to compare mouse and zebrafish was not solely based on availability. Rather, it was motivated by the opportunity to examine sperm guidance across two evolutionarily distant vertebrates. This contrast allows us to look for potential conservation of structural or molecular cues involved in gamete interaction. Additionally, both zebrafish and mouse offer extensive gene editing, blotting and imaging reagents, which are particularly valuable should future studies aim to identify and functionally disrupt genes encoding micropyle-associated proteins and their putative orthologs in mammals.

      For the CatSper experiment, I would suggest that the authors repeat this experiment with another mouse sperm mutant that is known to have reduced/altered motility. With the current data, I do not believe the failure to find/enter the micropyle is necessarily CatSper-specific. Because we do not know what the sperm interacts with in the micropyle or what the MP interacts with on the sperm, the signaling pathway cannot be tested, making other controls necessary for these results to be meaningful.

      Thank you for highlighting this important point. A wide range of mouse models with sperm motility defects exhibit subfertility or infertility due to structural abnormalities in the axoneme or midpiece rigidity (Miyata et al., 2024). These defects often result in impaired progressive motility, failure to reach the zona pellucida, or inability to bind or penetrate it. In contrast, we could test and validate that CatSper1<sup>Null</sup> sperm display preserved early progressive motility but fail to transition into hyperactivated motility, making them particularly well suited for specifically assessing the role of hyperactivation in sperm navigation toward and entry into the micropyle. Taken together, these points, along with those discussed in our response to the public review, led us to conclude that the CatSper1<sup>Null</sup> model provides the most biologically relevant context currently available to assess the role of hyperactivation in guiding sperm to the micropyle.

      The authors could greatly strengthen the discussion by addressing the key points I raised in the public review, particularly in terms of interpreting these results in the context of each species' physiological mode of fertilization.

      We thank the reviewer for this important recommendation. We have carefully revised the Discussion to address the key points raised in the public review, particularly by framing our findings within the context of the distinct physiological modes of fertilization in each species, as indicated in our answers to the public review. We hope these additions have strengthened the manuscript as suggested.

    1. eLife Assessment

      The article presents important findings on the impact of climate change on odonates, integrating phenological and range shifts to broaden our understanding of biodiversity change. The study leverages extensive natural history data, offering a convincing analysis of temporal trends in phenology and range limit and their potential drivers.

    2. Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other (or neither), and that geographic context and temperature variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The results also seem to support climate vulnerability assessments for species that rely on geographic range size and geospatial climate data layers rather than more detailed information (like demographic rates, abundances, or traits) that may not be so readily available. The methodology would be useful for other taxa and study regions with strong participatory ("citizen") science and extensive occurrence data.

      Strengths:

      This is an organized and well written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

    3. Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      Comments on revised version:

      The revision has substantially improved the paper.

    4. Reviewer #3 (Public review):

      Summary:

      In their article "Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon" the authors rigorously investigate the spatial and temporal trends in the occurrence of odonate species and their potential drivers. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to cope with changing conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited shifts to earlier emergence. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings and informs about the drivers of biodiversity loss.

      Weaknesses:

      Understanding whether species that shift both their ranges and phenology are more successful (or, as stated here, are 'clear winners'), and hence whether those that do neither are more vulnerable, would require integrating population trends alongside the discussed responses. The ~10% of species that have not shifted their distribution or phenology might not have declined in abundance if they have rapidly adapted to local changes in climatic conditions (i.e. they might show a plastic response). These species might be the real 'winners', while species that have recently shifted their ranges or phenology may eventually reach hard limits. The authors discuss this limitation but might want to adapt their wording, given the potential for misinterpretation. The finding that species with more northern ranges showed lesser northward shifts would speak to the fact that some species have already reached such a geographical range limit.

      Achievements and impact:

      The results support broad differences in the response of odonate species to climate change, and the prediction that range geography and temperature seasonality are more important predictors of these changes than functional traits. Simultaneously addressing range and phenological shifts highlights that most species exhibit coupled responses but also identifies a significant portion of species that do not respond in these ways and that are of critical conservation concern. These results are important for improving forecasts of species' responses to climate change and identifying species of particular conservation concern. Although not exhaustive regarding abundance trends, the study presents an important step towards a general framework for investigating the drivers of multifaceted species responses.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temperature variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur, let alone successfully reproduce, at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and reviewed our primary sources to clarify the trait classifications. We reclassified the species according to the expertise of this reviewer and performed our analysis again; please see details below.

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, as well as creating a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models, Joint Species Distribution Models, or various kinds of Occupancy Models. None of these approaches then leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested, although they have still been tested (for example, papers by Lee-Yaw demonstrate that it is rare for SDMs to predict things well; occupancy models often perform less well than SDMs and do not capture how things change over time - Briscoe et al. 2021, Global Change Biology). The result of employing such techniques would certainly be to make all conclusions speculative, rather than directly observable. 

      Rather than employing extrapolative models, we relied on transparent techniques that are used successfully in the core macroecology literature that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling difference. 100km quadrats make for a reasonable “middle-ground” in terms of the effects of sampling, and we added a reference to the methods section to clarify this (see details below).

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We replaced the terms specialist and generalist with specific predictions based on traits (see details below).

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We have reviewed the text and included it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers (see details below).

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of concern for conservation (ex. Duchenne et al. 2021, Ecology Letters). We revised the discussion to acknowledge potential differences in outcomes (please see details below).

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or uniquely to enable a hypothetical odonate to track its thermal niche over time. In a previous version of the figure, we had a second panel and we failed to remove the reference to that panel when we simplified the figure. We have updated the figure and figure caption (please see details below).

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, particularly the captions and results contain many inconsistencies and lack important detail. With the multitude of the relationships that were tested (the influence of traits) the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We revised the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them (see details below). We carefully reviewed all figures and captions, and made changes to improve the clarity of the text and the presentation of results (see details below).

      Reviewer #1 (Recommendations for the authors):

      Comment:

      (1) Following weakness #1 in the public review, the authors should review the habitat classifications, consult with an odonatologist, and reclassify many species from Both to Lentic and redo the analysis.

      Thank you for pointing out this disagreement among expert habitat classifications that we cited and other literature. We reclassified species’ habitat preferences based on classifications by Hof et al., a source that was consistent with your suggestions, and identified additional species as Lentic that our other references had identified as Both. We performed our analysis with this new dataset and, as you suspected, our results did not change qualitatively: species habitat preferences did not predict their range shifts.

      Hof, Christian, Martin Brändle, and Roland Brandl. "Lentic odonates have larger and more northern ranges than lotic species." Journal of Biogeography 33.1 (2006): 63-70.

      Comment:

      (2) Following weakness #2, would it be worthwhile or interesting to analyze a smaller ranging group (e.g. cut the quad size in half, 50 x 50 km) to bring in more species and potentially change the inference? Or is the paper too tightly constructed to allow this, even as a secondary piece?

      Thank you for this comment, as it highlights an important consideration for macroecological analyses, and the importance of balancing multiple factors when determining quadrat size. Issues exist with identifying drivers of range boundaries among species with narrow ranges when they are analyzed separately from wide-ranging species, and examining larger quadrats can actually help clarify drivers (Szabo, Algar, and Kerr 2009). The smaller the quadrats are, the higher the likelihood that the species is actually there but was never observed, or that the quadrat only covers unsuitable habitat and the species is absent from the entire (or almost entire) quadrat. Too many absences violate model assumptions and create noise that makes it difficult to identify drivers of species’ range and phenology shifts.

      Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling difference. 100km quadrats make for a reasonable “middle-ground”, and we have included a brief explanation of this in the text: “We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.” (Lines 170-172).
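
      The quadrat assignment quoted above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes occurrence records have already been projected to planar coordinates in metres (the projection is not specified here), and the names `quadrat_id`, `species_by_quadrat`, and `richness` are hypothetical.

```python
from collections import defaultdict

# 100 x 100 km quadrats, expressed in metres (coordinates are assumed
# to be in a projected, metre-based system; this is an assumption).
QUADRAT_SIZE_M = 100_000


def quadrat_id(easting_m, northing_m, size_m=QUADRAT_SIZE_M):
    """Return the (column, row) index of the quadrat containing a point."""
    # Floor division keeps negative coordinates in the correct cell.
    return (int(easting_m // size_m), int(northing_m // size_m))


def species_by_quadrat(records, size_m=QUADRAT_SIZE_M):
    """Collapse (species, easting_m, northing_m) records into presences.

    Repeated records of a species in one quadrat count as one presence.
    """
    presences = defaultdict(set)
    for species, easting_m, northing_m in records:
        presences[quadrat_id(easting_m, northing_m, size_m)].add(species)
    return dict(presences)


def richness(presences):
    """Number of distinct species recorded in each quadrat."""
    return {quadrat: len(species) for quadrat, species in presences.items()}
```

      Passing `size_m=200_000` to the same functions would give the coarser 200 × 200 km grid; halving the quadrat size, as the reviewer suggests, would spread the same records over four times as many cells.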

      Szabo, Nora D., Adam C. Algar, and Jeremy T. Kerr. "Reconciling topographic and climatic effects on widespread and range‐restricted species richness." Global Ecology and Biogeography 18.6 (2009): 735-744.

      Comment:

      (3) Following weakness #3, are specialists the ones that "failed to shift" (L18)? If so please specify. The prediction about generalists vs specialists needs to be removed or incorporated in other parts of the paper.

      Thank you for pointing this out. We intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (4) Following weakness #4, cite Pinkert et al at lines 70-73 and Rocha-Ortega et al at lines 73-77 along with https://doi.org/10.1098/rspb.2019.2645. Add Sandall et al https://doi.org/10.1111/jbi.14457 to L69 references.

      Thank you for the excellent reference suggestions, we have added them as suggested (Lines 80, 86, 77).

      Comment:

      Other comments/suggestions:

      (1) Title: consider adding temp variability 'Range geography and temperature variability, not functional traits,...'.

      Thank you for this suggestion, we have added temperature variability to the title: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”.

      Comment:

      (2) L125: is (northern) Mexico included in North America?

      Yes, we did include observations from Northern Mexico, and have specified this in the text: “We retained ~1,100,000 records from Canada, the United States, and Northern Mexico, comprising 76 species (Figure 2).” (Lines 174-176).

      Comment:

      (3) L128: I'd label this section 'Temperature variability' rather than 'Climate data'.

      Thank you, we agree that this is a more appropriate title for this section, and have replaced ‘Climate data’ with ‘Temperature variability’ (Line 185).

      Comment:

      (4) Table 2: why are there no estimates for the traits?

      We apologise, this information should have been included in the main body of the manuscript, but was only explained in the Table 2 caption. We have added the following explanation: “Non-significant variables, specifically all functional traits, were excluded from the final models.” (Lines 312-323).

      Comment:

      (5) Figure 2: need to identify the A-D panels.

      We apologise for this error and have clarified the differences between panels in the figure caption:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (6) L163-173: I am not familiar with this analysis but it sounds interesting and promising, I am not sure if this can be clarified further. Why the -25 to 25, and -30 to 30, doesn't the -35 to 35 cover these? And what is meant by "include only phenology shifts that could be biologically meaningful", that larger shifts would not be meaningful or tied to climate change?

      We used different cutoffs for phenology shifts to inspect for outliers that were likely to be errors, potentially due to insufficient sampling to calculate phenology. We clarified in the text as follows:

      “We retained emergence estimates between March 1st and September 1st, as well as species and quadrats that showed a difference in emergence phenology of -25 to 25 days, -30 to 30 days, or -35 to 35 days between both time periods, to include only phenology shifts that could be biologically meaningful to environmental climate change (i.e. exclude errors).” (Lines 169-173).
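
      A minimal sketch of this filter is given below, for illustration only. It assumes that both the emergence window and the shift cutoffs are inclusive and that shifts are measured in day-of-year units; the function names and example dates are hypothetical and do not come from the manuscript.

```python
from datetime import date

CUTOFFS_DAYS = (25, 30, 35)  # the three symmetric cutoffs quoted above


def within_emergence_window(d):
    """True when the date falls between March 1 and September 1 (inclusive, assumed)."""
    return date(d.year, 3, 1) <= d <= date(d.year, 9, 1)


def shift_days(historic, recent):
    """Difference in day of year; negative values mean earlier emergence."""
    return recent.timetuple().tm_yday - historic.timetuple().tm_yday


def retained(historic, recent, cutoff_days):
    """Keep a species-quadrat pair only if both dates and the shift are plausible."""
    if not (within_emergence_window(historic) and within_emergence_window(recent)):
        return False
    return abs(shift_days(historic, recent)) <= cutoff_days


# Hypothetical emergence estimates for one species in one quadrat
historic = date(1990, 6, 20)
recent = date(2012, 5, 23)
decisions = [retained(historic, recent, c) for c in CUTOFFS_DAYS]
```

      In this example the shift is 27 days earlier, so the pair is dropped under the ±25 day cutoff but kept under the ±30 and ±35 day cutoffs. Because the cutoffs are nested, comparing results across them shows how sensitive the analysis is to the most extreme shifts.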

      Comment:

      (7) L193-200: I agree but would make a distinction between ecological vs functional traits, as other studies view geographic traits as ecological manifestations of functional biology, e.g. https://doi.org/10.1016/j.biocon.2019.07.001 and https://doi.org/10.1016/j.biocon.2023.110098.

      Thank you for this suggestion, and for making us aware of the thinking around range geographies as ecological traits. We have specified throughout the manuscript that the ‘traits’ we are considering are ‘functional traits’, changed the methods subsection title to “Range geographies and functional traits” (Line 252), and added a brief discussion of ecological traits: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) L203: What's the rationale for egg-laying habitat as "biologically relevant to spatial and temporal responses to climate change"? That one's not as obvious as the others and needs a sentence more. Also, I am wondering why other traits were not considered here, like color lightness and voltinism. And why not wing size instead of body size, or better yet the two combined (wing loading) as a proxy for dispersal ability?

      We agree that our rationale for using this trait should be better explained, and we have included the following explanation: “Egg laying habitat was assigned according to whether species use exophytic egg-laying habitat (i.e. eggs laid in water or on land, relatively larger in number), or endophytic egg-laying habitat (i.e. eggs laid inside plants, usually fewer in number); species using exophytic habitats are associated with greater northward range limit shifts (Angert et al., 2011).” (Lines 271-275).

      We considered traits that have been found to be important for range and phenology shifts among odonates, as well as being key traits for expectations for species responses to climate change. Flight duration and body size are correlated with dispersal ability (Powney et al. 2015). Body size is also correlated with competitive ability (Powney et al. 2015), potentially making it an important predictor of a species’ ability to establish and maintain populations in expanding range areas. Traits correlated with range shifts also include breeding habitat type (Powney et al. 2015; Bowler et al. 2021) and egg laying habitat (Angert et al. 2011). Ideally, we would have used dispersal data from mark/release/recapture studies, but it was not available for many of the species included in this study. After finding that none of the functional traits we included were related to range shifts, there was no reason to believe that a further investigation of traits would be meaningful.

      Angert AL, Crozier LG, Rissler LJ, Gilman SE, Tewksbury JJ, Chunco AJ. 2011. Do species’ traits predict recent shifts at expanding range edges? Ecology Letters 14:677–689. doi:10.1111/j.1461-0248.2011.01620.x

      Bowler DE, Eichenberg D, Conze K-J, Suhling F, Baumann K, Benken T, Bönsel A, Bittner T, Drews A, Günther A, Isaac NJB, Petzold F, Seyring M, Spengler T, Trockur B, Willigalla C, Bruelheide H, Jansen F, Bonn A. 2021. Winners and losers over 35 years of dragonfly and damselfly distributional change in Germany. Diversity and Distributions 27:1353–1366. doi:10.1111/ddi.13274

      Powney GD, Cham SSA, Smallshire D, Isaac NJB. 2015. Trait correlates of distribution trends in the Odonata of Britain and Ireland. PeerJ 3:e1410. doi:10.7717/peerj.1410

      Comment:

      (9) L210: I count at least 5 migratory species in table S3, so although maybe not enough to analyze it's misleading to say "nearly all" were non-migratory, revise to "most" or "vast majority".

      Thank you for pointing this out, we have made the suggested correction (Line 277).

      Comment:

      (10) L252-254: save this for the Discussion and write a more generalized statement for results to avoid citations in the results.

      Thank you for this suggestion, we have moved this to the discussion (Lines 517-527).

      Comment:

      (11) Figures S5 & S6: these are pretty important, I'd consider elevating them to the main document as one figure with two panels.

      Thank you for this suggestion, we agree these figures should be elevated to the main text, and have made them into a panel figure (Figure 4).

      Comment:

      (12) L305-307: great point and recommendation!

      Thank you very much for this positive feedback!

      Comment:

      (13) L335-336: another place to cite https://doi.org/10.1098/rspb.2019.2645 which includes a thermal sensitivity index and would add an odonate citation behind the statement.

      Thank you for this excellent suggestion, we have added this citation (Rocha-Ortega et al. 2020; Line 480).

      Comment:

      (14) L352-353: again see also https://doi.org/10.1098/rspb.2019.2645.

      Thank you for highlighting this reference, we have added it to Line 505 as suggested.

      Comment:

      (15) L355: revise "populations that coexist" to "species that co-occur" (big difference between population and species levels and between coexistence and co-occurrence).

      Thank you very much for pointing this out, we have made the suggested change (Line 507).

      Comment:

      (16) L359-365: are the winners and losers depicted in Figures S5 & S6? If so reference the figure (which I suggest combining and promoting to the main text), if not create a table listing the analyzed species and their winner/loser status.

      We agree that this is an excellent place to bring up Figures S5 and S6 from the supplemental. We have moved them to the main document as one figure and referenced it at line 510.

      Reviewer #2 (Recommendations for the authors):

      Comment:

      (1) Line 53-55: The claim that "These relationships generalize poorly taxonomically and geographically" is valid, but the study only tests Odonata on two continents.

      Thank you for this comment – the word ‘generalize’ may imply that our study tries to find a general pattern across many groups. We have changed the language to: “However, these relationships are inconsistent across taxa and regions, and cross-continental tests have not been attempted (Angert et al., 2011; Buckley and Kingsolver, 2012; Estrada et al., 2016; MacLean and Beissinger, 2017).” (Lines 57-59).

      Comment:

      (2) Line 58-59: Is this statement only true for Odonata? It does not seem to hold for plants, for example.

      Thank you for this comment – this statement references a meta-analysis of multiple animal and plant taxa, but the evidence for the importance of range location comes from animal taxa. We have specified that we are referring to animal species to clarify (Line 60).

      Comment:

      (3) Line 87-91: This section is difficult to understand and needs clarification.

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121).

      Comment:

      (4) Line 99-100: Please define "generalist" and "specialist" more clearly here (e.g., based on climate niche?).

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (5) Line 122: Replace the English letter "X" in "100x100 km" with the correct mathematical symbol.

      We have made the suggested replacement throughout the manuscript.

      Comment:

      (6) Line 148: To address sampling effects, you could check the paper: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15524. Additionally, maximum and minimum values are sensitive to extreme data points, so using 95% percentiles might be more robust.

      Thank you for sharing this paper, as it offers a valuable perspective on the study of species’ ranges. While our dataset is substantially composed of observations from adult sampling protocols, unlike the suggested paper which compares adults and juveniles, this is an interesting alternative approach.

      For our purposes it is meaningful to include outliers, as otherwise we may have missed individuals at the leading edge of range expansions. Our intent here was to detect range limits, as opposed to finding the central tendency of species distributions. This approach is widely accepted in the macroecology literature (e.g. Devictor et al., 2012, 2008; Kerr et al. 2015).

      We have included the following discussion of our approach in the methods section:

      “We followed widely accepted methods to determine species range boundaries (Devictor et al., 2012, 2008; Kerr et al., 2015), although other methods exist that are appropriate for different data types and research questions i.e. (Ni and Vellend, 2021). We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.” (Lines 168-173).

      Kerr JT, Pindar A, Galpern P, Packer L, Potts SG, Roberts SM, Rasmont P, Schweiger O, Colla SR, Richardson LL, Wagner DL, Gall LF, Sikes DS, Pantoja A. 2015. Climate change impacts on bumblebees converge across continents. Science 349:177–180. doi:10.1126/science.aaa7031

      Soroye P, Newbold T, Kerr J. 2020. Climate change contributes to widespread declines among bumble bees across continents. Science 367:685–688. doi:10.1126/science.aax8591

      Devictor V, Julliard R, Couvet D, Jiguet F. 2008. Birds are tracking climate warming, but not fast enough. Proceedings of the Royal Society B: Biological Sciences 275:2743–2748. doi:10.1098/rspb.2008.0878

      Devictor V, van Swaay C, Brereton T, Brotons L, Chamberlain D, Heliölä J, Herrando S, Julliard R, Kuussaari M, Lindström Å, Reif J, Roy DB, Schweiger O, Settele J, Stefanescu C, Van Strien A, Van Turnhout C, Vermouzek Z, WallisDeVries M, Wynhoff I, Jiguet F. 2012. Differences in the climatic debts of birds and butterflies at a continental scale. Nature Clim Change 2:121–124. doi:10.1038/nclimate1347
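
      The trade-off discussed in this response, between using extreme records and using a 95th percentile to locate a range limit, can be illustrated with a short sketch. This is not the procedure of the cited studies or of the manuscript: the nearest-rank percentile, the function names, and the latitudes are assumptions chosen only to show why a percentile can miss a single leading-edge record.

```python
def northern_limit_max(latitudes):
    """Range limit taken as the most northerly record."""
    return max(latitudes)


def northern_limit_percentile(latitudes, percent=95):
    """Range limit taken as a nearest-rank percentile of record latitudes."""
    ordered = sorted(latitudes)
    # Integer ceiling of percent * n / 100, avoiding floating-point rounding
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical records: 19 in the core range and one at the leading edge
latitudes = [45.0] * 19 + [52.0]

limit_from_max = northern_limit_max(latitudes)
limit_from_p95 = northern_limit_percentile(latitudes)
```

      Here the maximum places the limit at 52.0 degrees, while the 95th percentile places it at 45.0 degrees and ignores the single leading-edge record. The percentile is more robust to erroneous records, but it cannot register a genuine expansion that is so far documented by only a few observations.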

      Comment:

      (7) Line 195: The species' climate niche should also be considered a product of evolution.

      Thank you for this suggestion. To address this comment and a comment from another reviewer, we changed the text to the following: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) Line 244: This speculative statement belongs in the Discussion section.

      Thank you for this suggestion, we have moved this statement to the discussion (Lines 451-453).

      Comment:

      (9) Line 252-254: The projection of Coenagrion mercuriale's range contraction is not part of your results and should be clarified or removed.

      Following this suggestion and a similar suggestion from another reviewer, we moved this text to the discussion (Line 517-527).

      Comment:

      (10) Line 314-316: If the species can tolerate warmer temperatures better, why would they migrate?

      We apologize for the confusion, and we have reworded the section as follows: “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (11) Line 334-335: Species' tolerance to temperature likely depends on their traits, which were not tested in this study. This should be noted.

      We agree, and we have removed the wording “rather than traits” from this sentence (Line 479).

      Reviewer #3 (Recommendations for the authors):

      Comment:

      (1) Title: The title is too general not specifying that your results are on odonates only, but also stressing the implicit role of climate change to a degree the tests do not support.

      Following this comment and a suggestion from another reviewer we changed the title to the following: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”. We wanted to emphasize our use of Odonates as a model species that we used to ask broad questions, while being more specific about the climatic variable that we examined (temperature variability).

      Comment:

      (2) L32: consider including Novella-Fernandez et al. 2023 (NatCommun) which addresses this topic in Odonates.

      Thank you for suggesting this very interesting paper, we have added it as a citation (Line 31-32).

      Comment:

      (3) L35: consider including Grewe et al. 2013 (GEB) and Engelhardt et al. 2022(GCB).

      Thank you for these excellent suggestions, we have added the citations (Line 35).

      Comment:

      (4) L47: rather write 'result from' instead of 'driven by'.

      We agree this is a better characterization and have corrected the wording (Line 48-49).

      Comment:

      (5) L49-52: There has been a recent study on this topic for birds (Neate-Clegg et al., 2024 NEE). However, specifying this to insects would not make it less relevant. This review for odonates might be helpful in this regard (Pinkert et al. 2022, Chapter: "Odonata as focal taxa for biological responses to climate change" in Córdoba-Aguilar et al. (2022) Dragonflies & Damselflies: Model Organisms for Ecological and Evolutionary Research).

      Thank you for again suggesting excellent references, we have added them to line 52-53, as well as adding the Pinkert citation to lines 61 and 82.

      Comment:

      (6) L53-66: Combine into one paragraph about drivers. With traits first and the environment second. The natural land cover perspective may be too complicated in this context. Consider focusing on generalities of the impact of changes within species' ranges.

      As suggested we have combined these into one paragraph about drivers (Line 59).

      Comment:

      (7) L67-69: The book from before would be a much stronger reference for this claim. Kalkman et al. (2018) do not address the emphasis of global change research in insects on bees and butterflies. Also, I would highlight that most of the current work is at a national scale, rather than cross-continental.

      Thank you for this suggestion, we have added the suggested reference and included that “…recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 75-77).

      Comment:

      (8) L68: consider rephrasing this part to '..provide a rare opportunity to investigate spatiotemporal biotic responses at larger taxonomic and spatial scales'

      We appreciate this suggestion and really like the wording. We have changed the phrase to read as follows: “While global change research on insects often emphasizes butterfly and bee taxa, recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 74-77).

      Comment:

      (9) L69: This characteristic is not unique to odonates and would hamper drawing general conclusions. Honestly, I think the detailed and comprehensive data on them is the selling point.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81).

      Comment:

      (10) L73: Indicator for what? The first part of the sentence would suggest lesser surrogacy for responses of other taxa. Reconsider this statement. They are well-established indicators for habitat intactness and freshwater biodiversity. Darwell et al. suggested their diversity can serve as a surrogate for the diversity of both terrestrial and aquatic taxa.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81).

      Comment:

      (11) L76: Fritz et al., is a study on mammals, not odonates.

      Thank you for pointing out this error, the reference has been removed (Line 84-85).

      Comment:

      (12) L84: Lotic habitats are generally better connected than lentic ones. Lentic species are considered to have a greater propensity for dispersal DUE to the lower inherent spatiotemporal stability (implying lower connectivity) compared to lotic habitats.

      Thank you for your comment, we have rewritten this section as follows: “For example, differences in habitat connectivity and dispersal ability may constrain range shifts for lentic species (those species that breed in slow-moving water like lakes or ponds) and lotic species (those living in fast-moving water) in different ways (Kalkman et al., 2018). More southerly lentic species may expand their range boundaries more than lotic species, as species accustomed to ephemeral lentic habitats are better dispersers (Grewe et al., 2013), yet lotic species have also been found to expand their ranges more often than lentic species, potentially due to the loss of lentic habitat in some areas (Bowler et al., 2021).” (Lines 88-95).

      Comment:

      (13) L90: I would be cautious with this interpretation. If only part of the range is considered (here a country in the northern Hemisphere) southern species are moving more of their range into and northern species more of their range out of the study area in response to warming (implying northward shifts).

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121)

      Comment:

      (14) L117: Odonata Central contains many county centroids as occurrence records. These could be an issue for your use case. I may have overlooked the steps you took to address this, but I think this requires at least more detail and possibly further removal/checks using for instance CoordinateCleaner. The functions implemented in this package allow you to filter records based on political units to avoid exactly this source of error.

      Thank you for this suggestion, we weren’t aware of this issue with Odonata Central. We used the CoordinateCleaner tool in R to filter all odonate records that we used in our analyses. Less than 1% of observations in our dataset were identified as having potential problems by the tool, so we would not expect this to affect our inferences. However, in future we will employ this tool when using similar datasets.

      Comment:

      (15) L119: Please add a brief explanation of why this was necessary. I am ok with something along the lines in the supplement.

      We moved this information from the supplemental to the main text as follows: “If a species was found on both continents, we only retained observations from the continent that was the most densely sampled. If we merged data for one species found on both continents, we could not perform a cross-continental comparison. However, if the same species on different continents was treated as different species, this would lead to uninterpretable outcomes (and the creation of pseudo-replication) in the context of phylogenetic analyses. In addition, species found on both continents did not have sufficient data to meet criteria for the phenology analysis.” (Lines 161-167).
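
      The retention rule described above can be sketched as follows, as an illustration rather than the authors' code. The sketch assumes that "most densely sampled" can be approximated by the raw number of records per continent, which the response does not specify; the species name and record counts are hypothetical.

```python
from collections import Counter


def best_sampled_continent(records):
    """For each species, pick the continent holding the most records."""
    counts = {}
    for species, continent in records:
        counts.setdefault(species, Counter())[continent] += 1
    # most_common(1) breaks ties by first appearance in the data
    return {sp: c.most_common(1)[0][0] for sp, c in counts.items()}


def retain_single_continent(records):
    """Drop records from the less-sampled continent for species found on both."""
    keep = best_sampled_continent(records)
    return [(sp, cont) for sp, cont in records if keep[sp] == cont]


# Hypothetical records: (species, continent)
records = (
    [("species A", "Europe")] * 5
    + [("species A", "North America")] * 2
    + [("species B", "North America")] * 3
)
filtered = retain_single_continent(records)
```

      Each species then contributes to exactly one continent, so it appears only once in the phylogenetic analyses and the cross-continental comparison is made between distinct sets of species.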

      Comment:

      (16) L132: The letters 'X' or 'x' are not multiplication symbols! Please change to the math symbol (×), everywhere.

      Thank you for pointing out this error, we have made the correction throughout the manuscript.

      Comment:

      (17) L133: add 'main' before 'flight period'

      Thank you for this suggestion, we have made the change. (Line 190)

      Comment:

      (18) L135: I suggest using the coefficient of variation, as it is controlled for the mean. Otherwise, what you see is partly the signature of temperature and not of its variation. For me, it's very difficult to understand what this variation of the variation means and at least needs more explanation.

      Thank you very much for this suggestion, we agree that using the coefficient of variation is a better fit for the question that we’re asking. We re-ran our analyses with the coefficient of variation as the measure of climate variability: all the results reported in the manuscript are now updated for that analysis (Line 377, Table 2), and we have also updated the methods section (Line 191). The results are qualitatively the same as in our previous analysis, but we agree that they are now easier to interpret.
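
      The reviewer's point, that the coefficient of variation controls for the mean whereas the standard deviation does not, can be illustrated with a short sketch. It assumes a sample standard deviation and temperatures expressed on a scale where the mean is positive; the function name and temperature values are hypothetical and are not data from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / abs(mean)


# Hypothetical flight-period temperatures for a warm and a cool quadrat
warm = [18.0, 20.0, 22.0]
cool = [8.0, 10.0, 12.0]

sd_warm, sd_cool = statistics.stdev(warm), statistics.stdev(cool)
cv_warm, cv_cool = coefficient_of_variation(warm), coefficient_of_variation(cool)
```

      Both quadrats have the same standard deviation (2.0), but the coefficient of variation is 0.1 for the warm quadrat and 0.2 for the cool one, so the same absolute variability counts as relatively larger where the mean temperature is lower.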

      Comment:

      (19) L155: Please adequately reference all R packages (state the name, and a reference for them including the authors' names, title, and version).

      Thank you for pointing out this omission, we have added reference information for the glm function in base R (Line 298) and ensured all other packages are properly referenced.

      Comment:

      (20) L207: Mention the literature sources here (again).

      We agree that they should be referenced here again, and we have done so (Lines 267-268).

      Comment:

      (21) L209: You could use the number of grid cells as a proxy for range size.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (Lines 375-378).
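
      As an illustration of this proxy, range size can be computed by counting the distinct quadrats a species occupied in the historic period (1980-2002, as given in the Figure 2 caption). The record layout, function name, and example values below are hypothetical.

```python
HISTORIC_PERIOD = (1980, 2002)  # historic period from the Figure 2 caption


def range_size(records, period=HISTORIC_PERIOD):
    """Count distinct quadrats occupied per species within the given period."""
    start, end = period
    quadrats = {}
    for species, year, quadrat in records:
        if start <= year <= end:
            quadrats.setdefault(species, set()).add(quadrat)
    return {species: len(cells) for species, cells in quadrats.items()}


# Hypothetical records: (species, year, quadrat index)
records = [
    ("species A", 1985, (0, 2)),
    ("species A", 1999, (0, 2)),  # same quadrat, counted once
    ("species A", 2001, (1, 2)),
    ("species A", 2015, (2, 2)),  # recent period, ignored
    ("species B", 1990, (5, 5)),
]
sizes = range_size(records)
```

      Repeated records in the same quadrat do not inflate the estimate, which makes the count less sensitive to uneven sampling effort than the raw number of observations.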

      Comment:

      (22) L218: It would be preferable to say 'species-level' instead of 'by-species'.

      Thank you for this suggestion, we agree that this is clearer and made the change (Line 298).

      Comment:

      (23) L219-220: this is unclear. Please rephrase.

      We have clarified as follows: “We used both species-level frequentist (GLM; glm function in R) and Bayesian (Markov Chain Monte Carlo generalized linear mixed model, MCMCglmm; Hadfield, 2010) models to improve the robustness of the results.” (Lines 298-300).

      Comment:

      (24) L224: At least for Europe there is a molecular phylogeny available, which you should preferably use (Pinkert et al. 2018, Ecography). Otherwise, I am ok with using what is available

      We apologize that the nature of the phylogeny that we used was not clear; the phylogeny that we used was built similarly to that in Pinkert et al. 2018, Ecography. It created a molecular phylogeny with a morphological/taxonomic tree as the backbone tree, so that species could only move within their named genera or families. We clarified this in the manuscript as follows:

      “We used the molecular phylogenetic tree published by the Odonate Phenotypic Database (Waller et al., 2019), which used a morphological and taxonomic phylogeny as the backbone tree, allowing species to move within their named genera or families according to molecular evidence (Waller and Svensson, 2017).” (Lines 302-305).

      Comment:

      (25) L233: You said so earlier (1st sentence of this paragraph).

      Thank you for pointing this out, we removed the repetitive sentence (Line 323).

      Comment:

      (26) L236-238: To me, it makes more sense to test this prior to fitting the phylogenetic models.

      MCMC-GLMM is considerably less familiar to most researchers than general linear models or their derivatives/descendants, such as PGLS. We report models both with and without phylogenetic relationships included for the sake of transparency, and we are happy to acknowledge that no interpretation here changes substantially relative to these decisions. However, failing to report models that included possible (if small) effects of phylogenetic relatedness might cause some readers to question what those models might have implied. For the moment, we are opting for the most transparent reporting approach here.

      Comment:

      (27) L241: Rather say directly XX of XX species in our data....

      (28) L245: Same here. Provide the actual numbers, please.

      Thank you for this suggestion, we made this change on Line 332 and Line 334.

      Comment:

      (29) L247-249: Then not necessary.

      This issue highlights a challenge in the global biology literature and around the issue of biodiversity monitoring for understanding global change impacts on species. Almost no studies have been able to report simultaneous range and phenology shifts, and the literature addresses these biotic responses to global change predominantly as distinct phenomena. Differences in the numbers of species for which these observations exist, even among the extremely widely-observed odonates, seem to us to be a meaningful issue to report on. If the reviewer prefers that we abbreviate or remove this sentence, we are happy to do so.

      Comment:

      (30) L251-261: That is discussion as you interpret your results.

      Following your suggestion and the suggestion of another reviewer, we moved the following lines to the discussion section: “Species that did not shift their ranges northwards or advance their phenology included Coenagrion mercuriale, a European species that is listed as near threatened by the IUCN Red List (IUCN, 2021), and is projected to lose 68% of its range by 2035 (Jaeschke et al., 2013).” (Lines 517-527).

      Comment:

      (31) L252: Good to mention, but why is the discussion limited to C. mercuriale?

      We feel that it is important to link the broad-scale results to the specific biological characteristics of individual species, and C. mercuriale is an IUCN threatened species. We are happy to expand links to natural history of this group and have added the following: “This group also includes Coenagrion resolutum, a common North American damselfly (Swaegers et al., 2014), for which we could not find evidence of decline. This may be due in part to the greater area of intact habitat available in North America compared to Europe, enabling C. resolutum to maintain larger populations that are less vulnerable to stochastic climate events. Still, this and other species failing to shift in range or phenology should be assessed for population health, as this species could be carrying an unobserved extinction debt.” (Lines 527-533).

      Comment:

      (32) L264: Insert 'being' before 'consistently'.

      Thank you for the suggestion, we made this change (Line 373).

      Comment:

      (33) L271: .'. However,'.

      Thank you for pointing out this grammatical error, we have corrected it (Line 382).

      Comment:

      (34) L273: 'affected' instead of 'predicted'

      Thank you for the suggestion, we made this change (Line 383).

      Comment:

      (35) L279: 'despite pronounced recent warming' sounds not relevant in this context.

      Thank you for this suggestion, we removed this portion of the sentence (Line 408).

      Comment:

      (36) L281: Rather 'the model performance did not improve....'

      Thank you for the suggestion, we made this change (Line 409).

      Comment:

      (37) L288: Add 'but' before 'not'.

      Thank you for the suggestion, we made this change (Line 416).

      Comment:

      (38) L311-316: Reconsider the causality here. Maybe rephrase to 'are associated' instead. Greater dispersal ability and developmental plasticity might well lead to higher growth rates, rather than the other way around.

      We agree that plasticity/evolution at range edges is important to consider and have included it as an alternative explanation: “Adaptive evolution and plasticity may enable higher population growth rates in newly-colonized areas (Angert et al., 2020; Usui et al., 2023), but this possibility can only be directly tested with long term population trend data.” (Line 449-451).  

      Comment:

      (39) L313-316: Maybe delete the second 'should be able to'.

      This phrase has been changed in response to other reviewer comments and now reads as follows:

      “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (40) L331: Limit this statement ending with 'in North American and European Odonata'.

      Thank you for this suggestion, we made this addition (Lines 475-476).

      Comment:

      (41) L346-347: There are too many of these more-research-is-needed statements in the discussion (at least three in the last paragraphs). Please consider finishing the paragraphs rather with a significance statement.

      Thank you for this suggestion, we have changed the final sentence here to the following: “The extent to which species’ traits actually determine rates of range and phenological shifts, rather than occasionally correlated with them, is worth considering further, but functional traits do not systematically drive patterns in these shifts among Odonates in North America and Europe.” (Lines 480-483).

      We also made additional changes, removing a ‘more-research is needed’ statement from the following paragraph (Line 443), as well as from line 499.

      Comment:

      (42) L349: See also Franke et al. (2022, Ecology and Evolution).

      Thank you for highlighting this excellent reference! We have added it to Line 501.

      Comment:

      (43) L363: Maybe a bit late in the text, but it is important to note that there is the third dimension 'abundance trends' or rather a common factor related to range and phenology shifts. I feel this fits better with the discussion of population growth.

      Thank you for this suggestion, we have addressed the importance of abundance trends in the following sentences: “Further mechanistic understanding of these processes requires abundance data.” (Lines 442-443); “It remains unclear if range and phenology shifts relate to trends in abundance, but our results suggest that there are clear ‘winners’ and ‘losers’ under climate change.” (Lines 509-510).

      Comment:

      (44) L375-377: This last sentence is very similar to L371-373. Please reduce the redundancy. Focus more on specifically stating the process instead of vaguely saying 'new insights into patterns' and 'suggesting processes'. Rather, deliver a strong concluding message here.

      Thank you for this suggestion, we feel that we now have a much stronger concluding message: “By considering both the seasonal and range dynamics of species, emergent and convergent climate change responses across continents become clear for this well-studied group of predatory insects.” (Lines 545-547).

      Comment:

      (45) Table 1: To me, the few estimates presented here do not justify a table. rather include them in the text. OR combine them with Table 2. Also, why not include the traits as predictors (from the range shift models) in these models as well?

      We have clarified in the text that the results displayed in Table 1 are from the analysis of the relationship between range and phenology shifts: “The effect of species’ range shifts on phenology range shifts was significant in our model investigating the relationship between these responses, indicating that species shifting their northern range limits to higher latitudes also showed stronger advances in their emergence phenology (Figure 3).” (Lines 341-344).

      As there were no significant effects in the model of phenology change drivers, we have not shown results of this model: “Emergence phenology shifts were not affected by species’ traits, range geography, nor climate variability; due to this, model results are not displayed here.” (Lines 383-384).

      Comment:

      (46) Table 2: L712-713: What does this mean? Are phenology shifts not used as a predictor of range shifts? (why then this comment?). Or do you want to say phenological shifts are not related to Southern range etc? Why do you present a phylosig here but not in Table 1? Why not include the traits as predictors (from the range shift models) in these models as well? Consider using the range size as a continuous predictor instead of 'Widespread'.

      We are glad the reviewer pointed this out to us. We did not emphasize this issue sufficiently. We DID evaluate traits as predictors both of geographical range and phenological shifts, and species-specific biological traits did not significantly affect models predicting either of those sets of responses. We state this on Lines 312-323, but we have also noted in the discussion (Lines 473-476) that the most commonly assessed traits, like body size, do not alter observed trends here. Instead, where species are found, rather than the characteristics of species, is the key determinant of their overall responses.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (Lines 375-378).
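
      The range-size predictor described above can be illustrated with a minimal sketch. The record format, the function name, and the use of projected kilometre coordinates are assumptions made for illustration only; the 100 km quadrat size and the 1980-2002 historical period are taken from the figure captions quoted in this response.

```python
from collections import defaultdict


def range_size_by_species(records, quadrat_km=100.0, period=(1980, 2002)):
    """Count the distinct quadrats each species occupies within a period.

    records: iterable of (species, x_km, y_km, year) tuples, where x_km and
    y_km are projected coordinates in kilometres (a hypothetical input format).
    """
    occupied = defaultdict(set)
    first_year, last_year = period
    for species, x_km, y_km, year in records:
        if first_year <= year <= last_year:
            # Map the observation to the grid cell (quadrat) that contains it.
            cell = (int(x_km // quadrat_km), int(y_km // quadrat_km))
            occupied[species].add(cell)
    # Range size = number of distinct occupied quadrats per species.
    return {species: len(cells) for species, cells in occupied.items()}
```

      Counting occupied cells, rather than summing observations, keeps the predictor from scaling directly with sampling effort inside a single quadrat.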

      Comment:

      (47) Figure 1: I don't see any grey points in the figure. Also, there is no A or B. If you are referring to the symbols then write cross and triangle instead and not use capital letters which usually refer to component plots of composite figures. Also, I highly recommend providing a similar figure based on your data (maybe each species as a dot for T1 and another symbol for T2). Given the small number of species, you could try to connect these points with arrows. For the set with only range shifts maybe play the T2-dots at the center of the 'Emergence' axis.

      Thank you for pointing out this error: a previous version of Figure 1 included grey points and multiple panels. We have removed this text from the figure caption to be consistent with the final version of the figure (Line 989).

      The graphical depictions of the conceptual and empirical discoveries in this paper were challenging to create. The reviewer might be suggesting effectively decomposing Figure 3 (change in range on the y axis vs change in phenology among all species) into two sets of points on the same graph, where each pair of points is a before and after value for each species. This would make for a very busy figure indeed. We have modified the conceptual Figure 1 to illustrate more clearly, we believe, that species can (in principle) remain within tolerable niche spaces by shifting their activity periods in time (phenology) or in space (geographical range) or both.

      Comment:

      (48) Figure 2: Please add a legend. Also black is a poor background color. The maps appear to be stretched. Please check aspect ratios. Now here are capital letters without an explanation in the caption. From the context I assume the upper panel maps are for the data used to calculate range shifts at the bottom panel maps are for data used to calculate the phenological shifts.

      We apologise for the error in the figure caption and have clarified the differences between panels in the text, as well as changing the map background colour and fixing the aspect ratio:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (49) Figure 3: Why this citation? Of terrestrial taxa? Please explain. Consider adding some stats here, such as the r-squared value for each of the relationships.

      We have better explained the citation in the figure caption, as well as adding r-squared values:

      “Figure 3: Relationship between range shifts and emergence phenology shifts among North American and European odonate species (N = 66; model R² = 17.08% for glm, 14.9% for MCMCglmm). For reference, the shaded area shows mean latitudinal range shifts of terrestrial taxa as reported by Lenoir et al. (2020; calculated as the yearly mean dispersal rate of 1.11 +/- 0.96 km per year over 38 years).” (Lines 679-682).
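
      As a worked check of the reference band in this caption, the short sketch below scales the quoted yearly rate and its spread over the 38-year span. It assumes that the band is obtained by simple multiplication of both values by the number of years, which the caption implies but does not state explicitly; the function and variable names are illustrative.

```python
# Values quoted in the Figure 3 caption.
rate_km_per_year = 1.11     # mean yearly rate
spread_km_per_year = 0.96   # +/- value reported with the rate
years = 38                  # time span used in the caption


def expected_shift(rate, spread, n_years):
    """Scale a yearly rate and its spread linearly over n_years (assumed)."""
    centre = rate * n_years
    half_width = spread * n_years
    return centre - half_width, centre, centre + half_width


low, centre, high = expected_shift(rate_km_per_year, spread_km_per_year, years)
print(f"expected shift: {centre:.2f} km (band {low:.2f} to {high:.2f} km)")
```

      Under this assumption the band is centred at roughly 42 km and spans about 6 to 79 km.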

      Comment:

      (50) L801: What are these underscored references?

      This was an issue with the reference software and has been resolved.

      Comment:

      (51) Table S1: L848: Consider starting with 'Samples of 76 North American and European odonate species from between ...'. Please use a horizontal line to separate the content from the table header. Add a horizontal line below the last row. Same for all tables.

      Thank you for this suggestion, we have edited the caption for Table S1 as suggested (Line 1124). We have also made the suggested line additions to Tables S1, S2, and S3.

      Comment:

      (52) Table S3: This is confusing. In Table 1 (main text) both 'southern range' and 'widespread' are used as predictors. Please explain.

      We originally included information on species range geography, including southern versus northern range, and widespread versus not, into one categorical variable. Following additional comments we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Now the methods section text (Lines 261-263) and Table 1 report results of that variable with distribution options northern, southern, or both. 

      Comment:

      (53) Figure S5 and S6: It would be more coherent if the colors refer to the continents and the suborders are indicated by shading. I would love to see a combination of the two figures with species ordered by the phylogenetic relationship and a dot matrix indicating the traits in the main text! This could really be a good starting point for a synthesis figure.

      The reviewer presents an interesting challenge for us. We have a choice, as we understand things, to present a figure showing phylogeny and traits (as requested here), or an ordered list of species relative to effect sizes in the two main responses to global change. The latter choice centers on the discoveries of the paper, while the former would be valuable for dragonfly biology but would depict information that proved to be biologically uninformative relative to our discovery. That is to say, there is no phylogenetic trend and biological traits among species did not affect results. We have gone some way toward illustrating that issue by retaining phylogeny in the MCMC-GLMM models, but we feel that a figure illustrating phylogeny and traits would (for most readers, at least) illustrate noise, rather than signal. For this reason, we have opted to take on the previous reviewer’s suggestion for a modified, main-text Figure 4, which we include below.

      Figure 4: Distribution of Northern range limit shifts (Panel A, kilometers) and emergence phenology shift (Panel B, Julian day) of 76 European and North American odonate species between a recent time period (2008 - 2018) and a historical time period (1980 - 2002). Anisoptera (dragonflies) are shown in pink, Zygoptera (damselflies) are shown in blue.


    1. eLife Assessment

      This study is a valuable contribution to the field of neuronal modeling by way of providing a method for rapidly obtaining neuronal physiology parameters from electrophysiological recordings. The method is solid as the generated models reproduce both ground-truth simulated data and empirical data, and there is now a quantitative comparison with other approaches.

    2. Reviewer #2 (Public review):

      Summary:

      Developing biophysically detailed computational models that accurately capture the characteristic physiological properties of neurons across diverse cell types is a key challenge in computational neuroscience. A major obstacle lies in determining the large number of model parameters, which are notoriously difficult to fit such that the model faithfully reproduces the empirically observed electrophysiological responses. Existing approaches require substantial computational resources to generate models for even a single neuron. Generating models for additional neurons typically requires starting from scratch, with no reuse of previous computations - making the process just as computationally expensive each time.

      Kim et al. introduce an innovative approach based on a Generative Adversarial Network (GAN) to overcome these limitations. Once trained, the network takes empirically observed electrophysiological responses as input and predicts the biophysical parameters with which a Hodgkin-Huxley model can reproduce these responses. The authors demonstrate this for nine non-spiking neurons in C. elegans. The resulting models generally provide a good fit to the empirical data. As the GAN has learned general relationships between biophysical parameters and the resulting electrophysiology, it can be used to generate models of diverse cell types without retraining - enabling model generation at low computational cost.

      Strengths:

      The authors address an important and technically challenging problem. A noteworthy strength of their approach is that, once trained, the GAN can generate models from new empirical data at low computational cost. The generated models reproduce the responses to current injections well.

      The authors have addressed all of my previous major concerns and have significantly improved their method:

      (1) Most importantly, the generated models reproduce both ground-truth simulated and empirical data well. Responses - including resting membrane potentials - are now well captured.

      (2) The comparison with other approaches has been extended to be more quantitative and rigorous.

      (3) The authors now convincingly demonstrate that the improved EP-GAN is relatively robust to data ablation.

      Weaknesses:

      Slow dynamics (e.g., slow ramps) are still not reliably captured. However, as the approach excels at other frontiers - the generation of models for diverse cell types at low computational cost - I consider this to be a relatively minor limitation.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The bad equilibria of the model still remain a concern, as well as other features like the transient overshoots that do not match with the data. I think they could achieve more accuracy here by assigning more weight to such specific features, through adding these as separate objectives for the generator explicitly. The traces contain a five-second current step, and one second before and one second after the training step. This means that in the RMSE, the current step amplitude will dominate as a feature, as this is simply the state for which the data trace contains most time-points. Note that this is further exacerbated by using the IV curve as an auxiliary objective. I believe a better exploration of specific response features, incorporated as independently weighted loss terms for the generator, could improve the fit. E.g. an auxiliary term could be the equilibrium before and after the current step, another term could penalise response traces that do not converge back to their initial equilibrium, etc.

      We thank the reviewer for the suggestion. We supplemented the membrane potential regression loss with errors computed for three intervals (the pre-, mid-, and post-stimulation time intervals), improving the accuracy of EP-GAN for baseline membrane potential responses (Figure 2, 3, Table S2, S3). We also changed the simulation protocol for generated parameters to allow a longer simulation time of 15 seconds, where stimulation is applied during t = [5, 10] seconds and no stimulation is applied during t = [0, 5) (pre-stimulation) and t = (10, 15] (post-stimulation). These time intervals are chosen to ensure sufficient stabilization periods before and after stimulation.
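
      A minimal sketch of an interval-wise error of this kind is given below, using the 15-second protocol described in this response. The function names, the use of RMSE per interval, and the equal default weights are assumptions for illustration; the actual EP-GAN loss may be defined differently.

```python
import numpy as np


def interval_rmse(t, v_pred, v_true, stim_start=5.0, stim_end=10.0):
    """RMSE computed separately for the pre-, mid- and post-stimulation intervals.

    t is a 1-D time grid in seconds; v_pred and v_true hold membrane
    potentials with time along the last axis.
    """
    t = np.asarray(t, dtype=float)
    v_pred = np.asarray(v_pred, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    masks = {
        "pre": t < stim_start,                       # t in [0, 5)
        "mid": (t >= stim_start) & (t <= stim_end),  # t in [5, 10]
        "post": t > stim_end,                        # t in (10, 15]
    }
    return {
        name: float(np.sqrt(np.mean((v_pred[..., m] - v_true[..., m]) ** 2)))
        for name, m in masks.items()
    }


def interval_loss(t, v_pred, v_true, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three interval errors (weights are hypothetical)."""
    errors = interval_rmse(t, v_pred, v_true)
    w_pre, w_mid, w_post = weights
    return w_pre * errors["pre"] + w_mid * errors["mid"] + w_post * errors["post"]
```

      Because each interval is averaged on its own, a baseline mismatch before or after the current step contributes a full term to the loss regardless of how many time points the stimulation period contains.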

      (2) The explanation of what the authors mean with 'inverse gradient operation' is clear now. However, this term is mathematically imprecise, as the inverse gradient does not exist because the gradient operator is not injective. The method is simply forward integration under the assumption that the derivative of the voltage is known at the grid time-points, and should be described as such.

      We thank the reviewer for the clarification on the inverse gradient operation terminology. In the Methods section, we changed the term to ‘forward integration’, which more accurately describes the process.
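
      The forward integration referred to here can be sketched as follows. The forward Euler scheme, the uniform time grid, and the function name are assumptions for illustration; the manuscript may use a different discretisation. The explicit initial value reflects the reviewer's point that the gradient alone does not determine the voltage trace.

```python
import numpy as np


def forward_integrate(v0, dv_dt, dt):
    """Rebuild V on a uniform grid from V(t0) and dV/dt at the grid points.

    Uses forward Euler: V[k + 1] = V[k] + dt * dV/dt[k]. The initial value v0
    must be supplied because traces that differ by a constant offset share
    the same derivative.
    """
    dv_dt = np.asarray(dv_dt, dtype=float)
    # The derivative at the final grid point is not needed to reach that point.
    increments = dt * np.cumsum(dv_dt[:-1])
    return np.concatenate(([v0], v0 + increments))
```

      For example, a constant derivative of 2 mV/s on a 0.5 s grid starting at -40 mV yields a trace that rises by 1 mV per step.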

      (3) I appreciate that the authors' method provides parameters of models at a minimal computational cost compared to running an evolutionary optimization for every new recording. I also believe that with some tweaking of the objective, the method could improve in accuracy. However, I share reviewer 2's concerns that the evolutionary baseline methods are not sufficiently explored, as these methods have been used to successfully fit considerably more complex response patterns. One way out of the dilemma is to show that the EP-GAN estimated parameters provide an initial guess that considerably narrows the search space for the evolutionary algorithm. In this context, the authors should also discuss the recent gradient based methods such as Deistler et al. (https://doi.org/10.1101/2024.08.21.608979) or Jones et al (https://doi.org/10.48550/arXiv.2407.04025).

      We supplemented the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to that of EP-GAN training data generation and the DEMO parameter selection process [16] (see Results section, page 6 for detail). We also expanded the testing scenarios by evaluating all methods with respect to both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials and initial conditions, with the channel parameters (n = 129) frozen to their default values in [41]. The large HH-model scenario estimates the 129 channel parameters in addition to the 47 parameters, allowing ±50% variations from their default values. For both small and large HH-model scenarios, we test total sample sizes of both 32k and 64k for all methods to evaluate their scalability with the number of simulated samples given during optimization. The results show that existing methods perform well for small HH-model scenarios and scale with sample size, consistent with the literature. EP-GAN, on the other hand, shows overall better performance in predicting membrane potential responses in both small and large HH-model scenarios.
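
      To make the ±50% search range concrete, the sketch below draws candidate parameter sets around a vector of default values. The uniform distribution, the function name, and the example defaults are assumptions for illustration; the response states only that variations of ±50% from the default values are considered.

```python
import numpy as np


def sample_parameters(defaults, fraction=0.5, n_samples=1, seed=None):
    """Draw parameter sets within +/- fraction of each default value.

    Sampling is uniform within the bounds, which is an assumption here.
    Returns an array of shape (n_samples, number of parameters).
    """
    rng = np.random.default_rng(seed)
    defaults = np.asarray(defaults, dtype=float)
    # Use the magnitude so that negative defaults also get valid bounds.
    span = fraction * np.abs(defaults)
    low, high = defaults - span, defaults + span
    return rng.uniform(low, high, size=(n_samples, defaults.size))


# Hypothetical default values for three channel parameters.
example_defaults = [1.0, -2.0, 10.0]
example_samples = sample_parameters(example_defaults, n_samples=4, seed=0)
```

      In the small HH-model scenario the same routine would simply not be applied to the 129 channel parameters, which stay frozen at their defaults.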

      Reviewer #2 (Public review):

      Major 1: Models do not faithfully capture empirical responses. While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are generally not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. The authors trained an additional GAN (EP-GAN Extended) to improve the fit to the resting membrane potential. Interestingly, for one neuron (AWB), this improved the response during stimulation, which now reproduced the slowly rising membrane potentials observed empirically; however, the neuron still does not reliably return to its resting membrane potential. For the other two neurons, the authors report a decrease in accuracy in comparison to EP-GAN. While such deviations may appear small in the Root Mean Square Error (RMSE), they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron. The authors added a second metric during the revision - percentages of predicted membrane potential trajectories within empirical range. I appreciate this additional analysis. As the empirical ranges across neurons are far larger than the magnitude of dynamical properties of the response ('slow ramps', etc.), this metric doesn't seem to be well suited to quantify to which degree these dynamical properties are captured by the models.

      We made improvements to the training data generation and architecture of EP-GAN to improve its overall accuracy with predicted membrane potential responses. In particular, we divided training data generation into three neuron types found in C. elegans non-spiking neurons: 1) Transient outward rectifier, 2) Outward rectifier and 3) Bistable [8, 16]. Each randomly generated training sample is categorized into one of 3 types by evaluating its steady-state currents with respect to experimental dI/dV bound constraints (See generating training data section under Methods for more detail). The process is then followed by imposing minimum-maximum constraints on simulated membrane potential responses. The setup allows generations of training samples that are of closer distribution to experimentally recorded neurons. This is further described in Section Methods page 15 in the revised manuscript.

      We also improved the EP-GAN training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocol (see Methods page 13 for more detail). For the training loss functions, we further supplemented the membrane potential regression loss with errors computed for 2 intervals: pre- and post-stimulation time intervals to improve EP-GAN prediction capabilities for baseline membrane potentials.
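
      The random input masking described here can be sketched as follows. Masking whole traces, filling them with zeros, and the function name are assumptions for illustration; the response does not specify how masked inputs are represented.

```python
import numpy as np


def mask_traces(traces, keep_fraction, seed=None, fill_value=0.0):
    """Randomly blank out whole voltage traces.

    traces has shape (n_traces, n_timepoints), one row per current-clamp
    level. Returns the masked copy and a boolean vector marking kept rows.
    """
    rng = np.random.default_rng(seed)
    traces = np.asarray(traces, dtype=float)
    n_traces = traces.shape[0]
    # Always keep at least one trace so the input is never empty.
    n_keep = max(1, int(round(keep_fraction * n_traces)))
    keep_idx = rng.choice(n_traces, size=n_keep, replace=False)
    keep = np.zeros(n_traces, dtype=bool)
    keep[keep_idx] = True
    masked = np.where(keep[:, None], traces, fill_value)
    return masked, keep
```

      Drawing a fresh mask for every training sample exposes the network to many different subsets of clamp levels, which is one plausible way to obtain tolerance to missing traces at inference time.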

      Taken together, these modifications improved EP-GAN’s overall ability to better capture empirical membrane potential responses and we show the results in Figure 2 – 5, Table S2, S3.

      Major 2: Comparison with other approaches is potentially misleading. Throughout the manuscript, the authors claim that their approach outperforms the other approaches tested. But compare the responses of the models in the present manuscript (neurons RIM, AFD, AIY) to the ones provided for the same neurons in Naudin et al. 2022 (https://doi.org/10.1371/journal.pone.0268380). Naudin et al. present models that seem to match empirical data far more accurately than any model presented in the current study. Naudin et al. achieved this using DEMO, an algorithm that in the present manuscript is consistently shown to be among the worst of all algorithms tested. I therefore strongly disagree with the authors' claim that a "Comparison of EP-GAN with existing estimation methods shows EP-GAN advantage in the accuracy of estimated parameters". This may be true in the context of the benchmark performed in the study (i.e., a condition of very limited compute resources - 18 generations with a population size of 600, compare that to 2000 generations recommended in Naudin et al.), but while EP-GAN wins under these specific conditions (and yes, here the authors convincingly show that their EP-GAN produces by far the best results!), other approaches seem to win with respect to the quality of the models they can ultimately generate.

      We thank the reviewer for the feedback regarding the comparison with existing methods. We have revised the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to that of EP-GAN training data generation and DEMO parameter selection process [16] (see Results section, page 6 for detail). Incorporating this process has improved the accuracy of existing methods especially for small HH-model scenarios where DEMO stood out with the best performance alongside NSGA2 (Figure 5, Table 1, 2).

      We also expanded the testing scenarios by evaluating all methods with respect to both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials and initial conditions, with the channel parameters (n = 129) frozen to their default values in [41]. The large HH-model scenario estimates the 129 channel parameters in addition to the 47 parameters, allowing ±50% variations from their default values. For both small and large HH-model scenarios, we test total sample sizes of both 32k and 64k for all methods to evaluate their scalability with the number of simulated samples given during optimization. The results show that existing methods perform well for small HH-model scenarios and scale with sample size. EP-GAN, on the other hand, shows overall better performance in predicting membrane potential responses in both small and large HH-model scenarios.

      In particular, with the extended membrane potential error including pre-, mid-, and post-activation periods, the mean membrane potential response error of EP-GAN (2.82mV; trained with 32k samples, large HH-model, 9 neurons) was lower than that of DEMO trained on an identical setup (12.2mV, 64k samples; Table 2) and of DEMO applied to the simpler HH-model in [16] (7.78mV, using 36,000k samples, 3 neurons). With respect to DEMO performance in [16], under an identical simulation protocol (i.e., no stimulation during (0, 5s), (10, 15s) and stimulation during (5, 10s)), the EP-GAN-predicted RIM (large HH-model) showed membrane potential accuracy on par with that of DEMO (simpler HH-model), and the EP-GAN-predicted AFD showed better accuracy for the post-activation membrane potential response, where DEMO-predicted membrane potentials overshoot above the baseline (not shown in the paper).

      Major 3: As long as the quality of the models generated by the EP-GAN cannot be significantly improved, I am doubtful that it indeed can contribute to the 'ElectroPhysiome', as it seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations. If the authors want to motivate their study based on this very ambitious goal, they should illustrate that single neuron model generation with their approach is robust enough to warrant well-constrained network dynamics. Based on the currently presented results, I find the framing of the manuscript far too bold.

      We thank the reviewer for the feedback regarding the paper's scope. With the revised methods, the overall quality of EP-GAN models is improved, with the most significant improvements in baseline membrane potential accuracy. While high-quality neuron models could be attained with existing methods given sufficient sample size, our results suggest EP-GAN can predict models of enhanced quality with a significantly smaller sample size and without a need for retraining, thus addressing the main drawback of evolutionary-based methods. While EP-GAN still has limitations (e.g., difficulty in predicting slow ramps) that need to be addressed in the future, we believe its overall performance, combined with fast inference speed and flexibility in its input data format (e.g., missing membrane potential traces), is a step forward in large-scale neuron modeling tasks that can contribute to network models.

      Major 4: The conclusion of the ablation study 'In addition the architecture of EP-GAN permits inference of parameters even when partial membrane potential and steady-state currents profile are given as inputs' does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off. While it may be that by their metric neurons in the 25% category are classified as 'preserving baseline accuracy', this doesn't seem justified given the voltage traces presented in the manuscript. It appears the metric is not strict enough.

      We improved EP-GAN’s training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocol.

      Such input masking during training has improved the results of the ablation studies: EP-GAN now retains its baseline membrane potential error (3.3mV, averaged across pre-, mid-, post-activation periods) with as little as 50% of membrane potential inputs remaining (3.5mV) and as little as 25% of steady-state currents remaining (3.5mV).

    1. eLife Assessment

      This valuable study investigates the implementation of an efference copy mechanism in the visual flight control system of Drosophila, a topic of broad interest to sensorimotor neuroscientists. Although the behavioral data and computational analyses are each individually solid, there is limited quantitative evaluation of how the model predictions compare to the experimental data.

    2. Reviewer #1 (Public review):

      This study provides an integrative model of visuomotor control in Drosophila melanogaster. The model is derived experimentally from visually evoked wingbeat pattern recordings for three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wingbeat behavioral results, and those recorded by others, for these specific stimuli when presented individually. Yet, the novelty of this study and its model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposite or enhancing effects on behavior. Testing three models that would allow interactions of these visual modalities, the authors show that using a visual efference copy signal allows visual streams to interact, replicating behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the prediction of this model in real flies using magnetically tethered flies, e.g., presenting moving bars with varying backgrounds. In conclusion, the manuscript represents a commendable effort in developing and demonstrating the validity of a mixture model that enables predictions of Drosophila behavior in natural visual environments.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings. The quantitative models and validating behavioral experiments make this a valuable contribution to the field. This study is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      The associated code base is well documented and readily produces all figures in the document.

    3. Reviewer #2 (Public review):

      Summary:

      The fly visual circuit and its behavioral response to simple visual stimuli have been well investigated, yet how they respond to more complex visual patterns is less understood. Canelo et al. first characterized a fly's steering to simple stimuli and examined how the combination of those stimuli impacts behavior. Combining behavioral experiments and simulation, the authors found that, for some combinations, a behavioral response can be explained by a linear summation of responses to individual stimuli. However, for looming and background motion combinations, the behavioral response to one was suppressed by the other. Furthermore, the effect was dependent on the onset timing of the pair of stimuli.

      Strength:

      The authors tested various visual stimulus patterns and time delays between combinations of visual stimuli and found novel interactions in behavior. Their findings support the idea that, depending on the visual context, additional mechanisms kick into the visual-motor circuit to coordinate steering behavior flexibly.

      Weakness:

      The manuscript does not provide conclusive evidence on the presence of an efference copy signal, though there appears to be an intention to associate it with the result. However, demonstrating it is likely to be beyond the main scope of the revised version.

      The goal of this manuscript is to understand how the fly's steering behavior is coordinated upon complex visual stimuli, and a number of experiments and simulations support their conclusion.

      The behavioral findings presented in this paper will be helpful in further dissecting the underlying neural mechanisms of contextual sensory processing and in understanding visual processing in other species.

    4. Reviewer #3 (Public review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask how flies orient to visual features and stabilize their gaze. In particular, the authors propose three models of visuomotor control, which lead to specific experimental predictions. With the goal of teasing out the suggested models, the authors design three flight experiments: 1) a bar-background experiment, 2) a looming-background experiment, and 3) a bar-background statistics experiment. The authors claim that: experiment 1 data favor the addition-only and graded EC model; experiment 2 data favor the all-or-none EC model; experiment 3 appears to suggest a graded EC model.

      While the study is interesting, there are major issues with the conceptual framework. In general, there is a major disconnect between model and animal data. The manuscript lacks a statistical framework to support or refute the proposed models. In the end, it is unclear what the main conclusions of the manuscript and its contributions to the field are.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The figures are overall clear and salient.

      Weaknesses:

      Comparison of model to fly data: In general, the manuscript suffers from a lack of quantitative comparisons between proposed models and fly data, which compromises the main findings of the work. While Figure 1-figure supplement 1 shows a direct comparison between experiment and model predictions, puzzlingly there is no such quantitative comparison in the main manuscript for the faster moving stimuli. Please overlay model predictions and experimental data and provide statistical comparisons throughout. The 3 proposed models are hypotheses, but there is no statistical framework to reject or support the models/hypotheses. Further, there is a disconnect between the new flight experiments and models. In fact, we do not see the model predictions for the set of experimental conditions tested in Figs. 5-7.

      Concerns about mechanical model: I have several concerns regarding the biomechanics block in Figure 2:

      (1) The inertia coefficient, derived from free flight studies, does not take into account the fact that the center of rotation and center of mass do not align in the magnetic tether (see Bender & Dickinson, 2006 for estimates). This must be corrected using the parallel axis theorem. As the authors compare the model prediction to experimental data in a magnetic tether, it is critical that they revise their analysis.

      (2) According to their chosen inertia and damping constants, they would estimate that the I/C time constant is ~1E-3 ms, which is much much smaller than what has been estimated for yaw turns in the magnetic tether (200 ms; Bender & Dickinson, 2006) or free flight saccades (~17 ms; see Cheng et al., 2010; 10.1242/jeb.038778). The bottom line is that the current model underestimates the influence of inertia in turn manoeuvres, i.e. the aerodynamic damping is cranked up too high relative to yaw inertia. This may explain the mismatch between data and model that the authors posit, "What causes the fly to undershoot the movement of the target object in the magnetically tethered assay? One hypothesis is that strong upward magnetic force or a blunt top end of the steel pin significantly dampens the flies' flight turns."
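
      Points (1) and (2) rest on two short formulas: the parallel axis theorem, I_tether = I_cm + m d^2, and the first-order yaw time constant, tau = I / C. The arithmetic can be sketched as follows. Every numeric value is a hypothetical placeholder chosen for illustration; none is taken from the manuscript, from Bender & Dickinson (2006), or from Cheng et al. (2010), and the function names are likewise illustrative.

```python
def tethered_inertia(inertia_cm, mass, offset):
    """Parallel axis theorem: I = I_cm + m * d**2."""
    return inertia_cm + mass * offset ** 2

def time_constant(inertia, damping):
    """First-order yaw time constant tau = I / C, in seconds."""
    return inertia / damping

# Hypothetical placeholder values (illustration only, not measured data).
I_CM = 5.0e-13     # yaw inertia about the center of mass, kg m^2
MASS = 1.0e-6      # body mass, kg
OFFSET = 3.0e-4    # distance from rotation axis to center of mass, m
DAMPING = 1.0e-11  # yaw damping coefficient, N m s

i_tether = tethered_inertia(I_CM, MASS, OFFSET)
tau_cm = time_constant(I_CM, DAMPING)
tau_tether = time_constant(i_tether, DAMPING)
print(f"tau about center of mass: {tau_cm * 1e3:.1f} ms")
print(f"tau about tether axis:    {tau_tether * 1e3:.1f} ms")
```

      Because the m d^2 term is never negative, the correction can only lengthen the time constant for a given damping coefficient, which is the direction of the discrepancy raised in point (2).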

      Loom response experiment: As nicely shown by 10.1242/jeb.02369, visual stimulation of looming stimuli in the magnetic tether evokes saccades. Is it the case as well in Fig. 6? Without showing individual trials, it is not possible to know whether this is the case. If indeed saccades are present, then the authors will have to reframe their results given the physiological evidence for saccade-related cancellation signals and the three proposed models.

      Minor comments:

      Missing Equation 13 for saccade model in Methods.

      For the discussion and results related to flight responses to the mismatch between expected and actual visual feedback, which is germane to the proposed models, the authors should integrate a discussion of a recent paper which directly tested this idea through an augmented reality system: 10.1016/j.cub.2023.11.045. In particular, the authors argue that the optomotor response is not particularly flexible because it may not rely on an internal model, as suggested by recent physiological evidence (Fenk et al.). How do these findings relate to the 3 proposed models within your work?

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Drosophila Visuomotor Integration: An Integrative Model and Behavioral Evidence of Visual Efference Copy" provides an integrative model of visuomotor control in Drosophila melanogaster. The model is derived experimentally from visually evoked wingbeat pattern recordings for three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wingbeat behavioral results, and those recorded by others, for these specific stimuli when presented individually. Yet, the novelty of this study and its model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposite or enhancing effects on behavior. Testing three models that would allow interactions of these visual modalities, the authors show that using a visual efference copy signal allows visual streams to interact, replicating behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the prediction of this model in real flies using magnetically tethered flies, e.g., presenting moving bars with varying backgrounds. In conclusion, the manuscript represents a commendable effort in developing and demonstrating the validity of a mixture model that allows predictions of the behavior of Drosophila in natural visual environments.

      Strengths:

      Overall, the manuscript is well-structured and clear in its presentation, and the modeling and experimental research are methodically conducted and illustrated in visually appealing and easy-to-understand figures and their captions.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings.

      The associated code base is well documented and readily produces all figures in the document.

      Suggestions:

      However, while the experiments provide evidence for the use of a visual efference copy, the manuscript would be even more impressive if it presented specific predictions for the neural implementation or even neurophysiological data to support this model. Or, at the very least, a thorough discussion. Nonetheless, these models and validating behavioral experiments make this a valuable contribution to the field; it is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      We appreciate the reviewer’s thoughtful comments on the strengths and weaknesses of our manuscript. We agree that a biophysically realistic model reflecting the structure of the neural circuits, as well as physiological data from them, would be invaluable. However, we are currently unable to provide physiological evidence for EC-based suppression, or a circuit architecture for efference copy-based suppression of the stability circuit, because the neural pathway underlying this behavior remains unidentified. Extensive recordings from the HS/VS system have revealed cell-type-specific motor-related inputs during both spontaneous and loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). These studies predicted suppression of the optomotor stability response during such turns, and our new experiments confirmed this suppression specifically during loom-evoked turns (Figures 5, 6). However, these neurons are primarily involved in the head optomotor response, not the body optomotor response. We hope to extend our current model in future studies to incorporate more cellular-level detail, as the feedforward circuits underlying stability behavior become more clearly defined.

      Here are a few points that should be addressed:

      (1) The biomechanics block (Figure 2) should be elaborated on, to explain its relevance to behavior and relation to the underlying neural mechanisms.

      We appreciate this suggestion. The mathematical representation of the biomechanics block has been developed by other groups in previous studies (Fry et al., 2003; Ristroph et al., 2010). We used exactly the same model, and its parameters were identical to those used in one of those studies (Fry et al., 2003; Ristroph et al., 2010), in which the parameters were estimated from the stabilizing response to magnetic “stumbling” pulses. In the previous version of the manuscript, we had a description of the biomechanics block in the Methods section (see Equation 4). In response to the reviewer’s comment, we have made a few changes in Figure 2A and expanded the associated description in the main text, as follows.

      (Line 160) “To test the orientation behavior of the model, we developed an expanded model, termed “virtual fly model” hereafter. In this model, we added a biomechanics block that transforms the torque response of the fly to the actual heading change according to kinematic parameters estimated previously (Michael H Dickinson, 2005; Ristroph et al., 2010) (Figure 2A, see Equation 4 in Methods and Movie S1). The virtual fly model, featuring position and velocity blocks that are conditioned on the type of the visual pattern, can now change its body orientation, simulating the visual orientation behavior of flies in the free flight condition.”

      (2) It is unclear how the three integrative models with different strategies were chosen or what relevance they have to neural implementation. This should be explained and/or addressed.

      Thank you for this valuable comment. We selected the three models based on previous studies investigating visuomotor integration across multiple species, under conditions where multiple sensory cues are presented simultaneously.

      The addition-only model represents the simplest hypothesis, analogous to the “additive model” proposed by Tom Collett in his 1980 study (Collett, 1980). We used this model as a baseline to illustrate behavior in the absence of any efference copy mechanism. Notably, some modeling studies have proposed linear (additive) integration for multimodal sensory cues at the behavioral level (Liu et al., 2023; Van der Stoep et al., 2021). However, experimental evidence demonstrating strictly linear integration—either behaviorally or physiologically—remains limited. In our study, new data (Figure 5) show that bar-evoked and background movement-evoked locomotor responses are combined linearly, supporting the addition-only model.

      The graded efference copy model has been most clearly demonstrated in the cerebellum-like circuit of Mormyrid fish during electrosensation (Bell, 1981; Kennedy et al., 2014). In this system, the efference copy signal forms a negative image of the predicted reafferent input and undergoes plastic changes as the environment changes—an idea that inspired our modifiable efference copy model (Figure 4–figure supplement 1). The all-or-none efference copy model is exemplified in the sensory systems of smaller organisms, such as the auditory neurons of crickets during stridulation (Poulet and Hedwig, 2006). Notably, in crickets, the motor-related input is referred to as corollary discharge rather than efference copy. Typically, “efference copy” refers to a graded, subtractive motor-related signal, while “corollary discharge” denotes an all-or-none signal, both counteracting the sensory consequences of self-generated actions. In this manuscript, we use the term efference copy more broadly, encompassing both types of motor-related feedback signals (Sommer and Wurtz, 2008).

      In response to this comment, we have made the following changes in the main text to enhance its accessibility to general readers.

      (Line#268) “This integration problem has been studied across animal sensory systems, typically by analyzing motor-related signals observed in sensory neurons (Bell, 1981; Collett, 1980; Kim et al., 2017; Poulet and Hedwig, 2006). Building on the results of these studies, we developed three integrative models. The first model, termed the “addition-only model”, assumes that the outputs of the object (bar) and the background (grating) response circuits are summed to control the flight orientation (Figure 4B, see Equation 14 in Methods).”

      (Line#272) “In the second and third models, an EC is used to set priorities between different visuomotor circuits (Figure 4C,D). In particular, the EC is derived from the object-induced motor command and sent to the object response system to nullify visual input associated with the object-evoked turn (Bell, 1981; Collett, 1980; Poulet and Hedwig, 2006). These motor-related inputs fully suppress sensory processing in some systems (Poulet and Hedwig, 2006), whereas in others they selectively counteract only the undesirable components of the sensory feedback (Bell, 1981; Kennedy et al., 2014).”
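
      The three integration schemes described above can be contrasted in a few lines. The following is a minimal sketch, not the authors' implementation (their formulation is given in the Methods and is not reproduced here). The function names, the scalar torque representation, the `gain` parameter, and the choice to apply the EC to the background (stability) pathway are illustrative assumptions.

```python
def addition_only(object_cmd, background_cmd):
    """Sum the object and background motor commands with no EC."""
    return object_cmd + background_cmd

def graded_ec(object_cmd, background_input, predicted_reafference, gain=1.0):
    """Subtract the predicted self-generated input from the background
    pathway, so that only the unpredicted remainder drives a response."""
    residual = background_input - predicted_reafference
    return object_cmd + gain * residual

def all_or_none_ec(object_cmd, background_cmd, turning):
    """Gate the background pathway off entirely during an object-evoked turn."""
    return object_cmd + (0.0 if turning else background_cmd)

# Hypothetical values: an object-evoked turn of +1.0 whose own rotation
# produces a background signal of -0.5 (arbitrary units).
summed = addition_only(1.0, -0.5)
graded = graded_ec(1.0, -0.5, predicted_reafference=-0.5)
gated = all_or_none_ec(1.0, -0.5, turning=True)
```

      In this toy case the addition-only scheme lets the self-generated background signal reduce the turn, whereas both EC schemes preserve it. The two EC schemes differ only when the actual background input departs from the prediction: the graded version responds to the mismatch and the all-or-none version does not.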

      (3) There should be a discussion of how the visual efference could be represented in the biological model and an evaluation of the plausibility and alternatives.

      Thank you for this helpful comment. We have now added the following discussion to share our perspective on the circuit-level implementation of the visual efference copy in Drosophila.

      (Line#481) “Efference copy in Drosophila vision

      Under natural conditions, various visual features in the environment may concurrently activate multiple motor programs. Because these may interfere with one another, it is crucial for the central brain to coordinate between the motor signals originating from different sensory circuits. Among such coordination mechanisms, the EC mechanisms were hypothesized to counteract so-called reafferent visual input, that is, input caused specifically by self-movement (Collett, 1980; von Holst and Mittelstaedt, 1950). Recent studies reported such EC-like signals in Drosophila visual neurons during spontaneous as well as loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). One type of EC-like signal was identified in a group of wide-field visual motion-sensing neurons that were shown to control neck movements for gaze stability (Kim et al., 2017). The EC-like signals in these cells were bidirectional depending on the direction of flight turns, and their amplitudes were quantitatively tuned to those of the expected visual input across cell types. Although amplitude varies among cell types, it remains inconclusive whether it also varies within a given cell type to match the amplitude of expected visual feedback, thereby implementing a graded EC signal. A more recent study examined EC-like signal amplitude in the same visual neurons for loom-evoked turns, across events (Fenk et al., 2021). Although the result showed a strong correlation between wing response and the EC-like inputs, the authors pointed out that this apparent correlation could stem from noisy measurement of all-or-none motor-related inputs.

      Thus, these studies did not completely disambiguate between graded and all-or-none EC signaling. Another type of EC-like signal, observed in the visual circuit tuned to a moving spot, exhibited characteristics consistent with an all-or-none EC. That is, it entirely suppressed visual signaling, irrespective of the direction of the self-generated turn (Kim et al., 2015; Turner et al., 2022).

      Efference-copy (EC)–like signals have been reported in several Drosophila visual circuits, yet their behavioral role remains unclear. Indirect evidence comes from a behavioral study showing that the dynamics of spontaneously generated flight turns were unaffected by unexpected background motion (Bender and Dickinson, 2006a). Likewise, our behavioral experiments showed that, during loom-evoked turns, responses to background motion are suppressed in an all-or-none manner (Figures 6 and 7). Consistent with this, motor-related inputs recorded in visual neurons exhibit nearly identical dynamics during spontaneous and loom-evoked turns (Fenk et al., 2021). Together, these behavioral and physiological parallels support the idea that a common efference-copy mechanism operates during both spontaneous and loom-evoked flight turns.

      Unlike loom-evoked turns, bar-evoked turn dynamics changed in the presence of moving backgrounds (Figure 5), a result compatible with both the addition-only and graded EC models. However, when the static background was updated just before a bar-evoked turn—thereby altering the amplitude of optic flow—the turn dynamics remained unaffected (Figures 5 and 7), clearly contradicting the addition-only model. Thus, the graded EC model is the only one consistent with both findings. If a graded EC mechanism were truly at work, however, an unexpected background change should have modified turn dynamics because of the mismatch between expected and actual visual feedback (Figure 4–figure supplement 1)—yet we detected no such effect at any time scale examined (Figure 7–figure supplement 1). This mismatch would be ignored only if the amplitude of the graded EC adapted to environmental changes almost instantaneously—a mechanism that seems improbable given the limited computational capacity of the Drosophila brain. In electric fish, for example, comparable adjustments take more than 10 minutes (Bell, 1981; Muller et al., 2019). Further investigation is needed to clarify how reorienting flies ignore optic flow generated by static backgrounds, potentially by engaging EC mechanisms not captured by the models tested in this study.

      Why would Drosophila rely on the all-or-none EC mechanism instead of the graded one for loom-evoked turns? A graded EC must be adjusted adaptively depending on the environment, as the amplitude of visual feedback varies with both the dynamics of self-generated movement and environmental conditions (e.g., empty vs. cluttered visual backgrounds) (Figure 4—figure supplement 1). Recent studies on electric fish have suggested that a large array of neurons in a multi-layer network is crucial for generating a modifiable efference copy signal matched to the current environment (Muller et al., 2019). Given their small-sized brain, flies might opt for a more economical design for suppressing unwanted visual inputs regardless of the visual environment. Circuits mediating such a type of EC were identified in the cricket auditory system during stridulation (Poulet and Hedwig, 2006), for example. Our study strongly suggests the existence of a similar circuit in the Drosophila visual system. 

      We tested the hypothesis that efference-copy (EC) signals guide action selection by suppressing specific visuomotor reflexes when multiple visual features compete. An alternative motif with a similar function is mutual inhibition between motor pathways (Edwards, 1991; Mysore and Kothari, 2020). In Drosophila, descending neurons form dense lateral connections (Braun et al., 2024), offering a substrate for such competitive interactions. Determining whether—and how—EC and mutual inhibition operate will require recordings from the neurons that ensure visual stability, which remain unidentified. Mapping these pathways and assessing how they are modulated by visual and behavioral context are important goals for future work.”

      Reviewer #2 (Public Review):

      It has been widely proposed that the neural circuit uses a copy of motor command, an efference copy, to cancel out self-generated sensory stimuli so that intended movement is not disturbed by the reafferent sensory inputs. However, how such an efference copy quantitatively suppresses sensory inputs is unknown. Here, Canelo et al. tried to demonstrate that an efference copy operates in an all-or-none manner and that its amplitude is independent of the amplitude of the sensory signal to be suppressed. Understanding the nature of such an efference copy is important because animals generally move during sensory processing, and the movement would severely distort that processing without proper correction. The manuscript is concise and written very clearly. However, the experiments do not directly demonstrate whether the animal indeed uses an efference copy in the presented visual paradigms and whether such a signal is indeed non-scaled. As it is, it is not clear whether the suppression of the behavioral response to the visual background is due to the action of an efference copy (a copy of motor command) or due to an alternative, more global inhibitory mechanism, such as feedforward inhibition at the sensory level or attentional modulation. To directly uncover the nature of an efference copy, physiological experiments are necessary. If that is technically challenging, it requires finding a behavioral signature that unambiguously reports a (copy of) motor command and quantifying the nature of that behavior.

      We thank the reviewer for this insightful and constructive comment. We agree that our current behavioral evidence does not directly identify the underlying circuit mechanism, and that direct recordings from visual neurons modulated by an efference copy would be critical for distinguishing between potential mechanisms.

      A prerequisite for such physiological investigations would be the identification of both (1) the feedforward neurons directly involved in the optomotor response, and (2) the neurons conveying motor-related signals to the optomotor circuit. Despite efforts by several research groups, the location of the feedforward circuit mediating the optomotor response remains elusive. This limitation has prevented us from obtaining direct cellular evidence of flight turn-associated suppression of optomotor signaling.

      In light of the reviewer’s suggestion, we expanded our investigation to strengthen the behavioral evidence for efference copy (EC) mechanisms. In addition to our earlier experiments involving unexpected changes in the static background, we examined how object-evoked flight turns influence the optomotor stability reflex and vice versa (Figures 5 and 6). To quantify the interaction between different visuomotor behaviors, we systematically varied the temporal relationship between two types of visual motion—loom versus moving background, or moving bar versus moving background—and measured the resulting behavioral responses.

      Our findings support pattern- and time-specific suppressive mechanisms acting between flight turns associated with the different visual patterns. Specifically:

      The responses to a moving bar and a moving background add linearly, even when presented in close temporal proximity.

      Loom-evoked turns and the optomotor stability reflex mutually suppress each other in a time-specific manner.

      For both loom- and moving bar-evoked flight turns, changes in the static background had no measurable effect on the dynamics of the object-evoked responses.

      These results provide a detailed behavioral characterization of a suppressive interaction between distinct visuomotor responses. This, in turn, offers correlative evidence supporting the involvement of an efference copy-like mechanism acting on the visual system. While similar efference copy mechanisms have been documented in other parts of the visual system, we acknowledge that our findings do not exclude alternative explanations. In particular, it is still possible that lateral inhibition within the central brain or ventral nerve cord contributes to the suppression we observed.

      Ultimately, definitive proof will require identifying the specific neurons that convey efference copy signals and demonstrating that silencing these neurons abolishes the behavioral suppression. Until such experiments are feasible, our behavioral approach provides an important contribution toward understanding the nature of sensorimotor integration in this system.

      Reviewer #3 (Public Review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask whether flies use an all-or-none EC model or a graded EC model (in which the turn amplitude is modulated by wide-field optic flow). Particularly, the authors focus on the bar-ground discrimination problem, which has received significant attention in flies over the last 50-60 years. First, they use a model by Poggio and Reichardt to model flight response to moving small-field bars and spots and wide-field gratings. They then simulate this model and compare simulation results to flight responses in a yaw-free tether and find generally good agreement. They then ask how flies may do bar-background discrimination (i.e. complex visual environment) and invoke different EC models and an additive model (balancing torque production due to background and bar movement). The behavioral experiments and simulation support the notion that flies use an all-or-none EC since flight turns are not influenced by the background optic flow. While the study is interesting, there are major issues with the conceptual framework.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The methods are well detailed and the data (and statistics) are presented clearly.

      The integration of behavioral experiments and mathematical modeling of flight behavior.

      The figures are overall very clear and salient.

      Weaknesses:

      Omission of saccades: While the authors ask a significant question related to the mechanism of bar-ground discrimination, they fail to integrate an essential component of the Drosophila visuomotor responses: saccades. Indeed, the Poggio and Reichardt model, which was developed almost 50 years ago, while appropriate to study body-fixed flight, has a severe limitation: it does not consider saccades. The authors identify this major issue in the Discussion by citing a recent switched, integrate-and-fire model (Mongeau & Frye, 2017). The authors admit that they "approximated" this model as a smooth pursuit movement. However, I disagree that it is an approximation; rather it is an omission of a motor program that is critical for volitional visuomotor behavior. Indeed, saccades are the main strategy by which Drosophila turn in free flight and prior to landing on an object (i.e. akin to a bar), as reported by the Dickinson group (Censi et al., van Breugel & Dickinson [not cited]). Flies appear to solve the bar-ground discrimination problem by switching between smooth movement and saccades (Mongeau & Frye, 2017; Mongeau et al., 2019 [not cited]). Thus, ignoring saccades is a major issue with the current study as it makes their model disconnected from flight behavior, which has been studied in a more natural context since the work of Poggio.

      Thank you for this helpful comment. We agree that including saccadic turns is essential and qualitatively improves the model. In the revised manuscript, we therefore expanded our bar-tracking model to incorporate an integrate-and-saccade strategy, now presented in Figure 2—figure supplement

      The manuscript now introduces this result as follows:

      (Line#190) “Finally, one important feature of the locomotion dynamics of a flying Drosophila tracking an object is a rapid orientation change, called a “saccade” (Breugel and Dickinson, 2012; Censi et al., 2013; Heisenberg and Wolf, 1979). For example, while tracking a slowly moving bar, flies perform relatively straight flights interspersed with saccadic flight turns (Collett and Land, 1975; Mongeau and Frye, 2017). During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2--figure supplement 2). The overall structure of the modified model is akin to the one proposed in a previous study (Mongeau and Frye, 2017), and the amplitude of a saccadic turn was determined by the sum of the position and velocity functions (Figure 2--figure supplement 2A; see Equation 13 in Methods). When simulated, our model successfully reproduced experimental observations of saccade dynamics across different object velocities (Figure 2--figure supplement 2B-D) (Mongeau and Frye, 2017). Together, our models faithfully recapitulated the results of previous behavioral observations in response to singly presented visual patterns (Collett, 1980; Götz, 1987; H. Kim et al., 2023; Maimon et al., 2008; Mongeau and Frye, 2017).”
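
      The integrate-and-saccade strategy described in this passage can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' Equation 13: the linear position and velocity terms, the parameter names `k_pos` and `k_vel`, the reset of the integrator after each saccade, and the instantaneous heading jump are all simplifications introduced here.

```python
def simulate_integrate_and_saccade(bar_angles, dt, threshold, k_pos, k_vel):
    """Integrate bar position error and saccade when it reaches a threshold.

    bar_angles: bar azimuth in world coordinates (deg), one value per step.
    Returns the heading at each step and the indices of steps with a saccade.
    """
    heading = 0.0
    integral = 0.0
    prev_error = None
    headings, saccade_steps = [], []
    for step, bar in enumerate(bar_angles):
        error = bar - heading  # bar position relative to the frontal midline
        error_rate = 0.0 if prev_error is None else (error - prev_error) / dt
        integral += error * dt
        if abs(integral) >= threshold:
            # Assumed amplitude rule: sum of a position and a velocity term.
            heading += k_pos * error + k_vel * error_rate
            integral = 0.0
            saccade_steps.append(step)
            error = bar - heading  # error remaining after the turn
        prev_error = error
        headings.append(heading)
    return headings, saccade_steps

# Hypothetical example: a static bar 10 deg to one side of the fly.
headings, saccade_steps = simulate_integrate_and_saccade(
    [10.0] * 20, dt=0.01, threshold=0.45, k_pos=1.0, k_vel=0.0
)
```

      With these placeholder values the heading stays fixed while the error accumulates, then jumps to the bar in a single saccade, reproducing the qualitative pattern of straight flight interspersed with saccadic turns.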

      Apart from Figures 1 and 2, most of our data, whether from simulations or behavioral experiments, use brief visual patterns lasting 200 ms or less. These stimuli trigger a single, rapid orientation change reminiscent of a saccadic flight turn. In this part of the paper, we have essentially examined how multiple visuomotor pathways interact to determine the direction of object-evoked turns when several visual patterns occur simultaneously.

      Critically, recent work showed that a group of columnar neurons (T3) appear specialized for saccadic bar tracking through integrate-and-fire computations, supporting the notion of parallel visual circuits for saccades and smooth movement (Frighetto & Frye, 2023 [not cited]).

      Thanks for bringing up this critical issue. We now cite this paper in the following parts of the manuscript.

      (Line#193) “During this behavior, it has been proposed that visual circuits compute an integrated error of the horizontal bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau and Frye, 2017).”

      (Line#462) “Visual systems extract features from the environment by calculating spatiotemporal relationships of neural activities within an array of photoreceptors. In Drosophila, these calculations occur initially on a local scale in the peripheral layers of the optic lobe (Frighetto and Frye, 2023; Gruntman et al., 2018; Ketkar et al., 2020).”

      A major theme of this work is bar fixation, yet recent work showed that in the presence of proprioceptive feedback, flies do not actually center a bar (Rimniceanu & Frye, 2023). Furthermore, the same study found that yaw-free flies do not smoothly track bars but instead generate saccades. Thus prior work is in direct conflict with the work here. This is a major issue that requires more engagement by the authors.

      Thank you for your thoughtful comments and for drawing our attention to this important paper. In our experiments, bar fixation on oscillating vertical objects emerges during the “alignment” phase of the magneto-tether protocol. The pattern movement dynamics were similar to those used by Rimniceanu & Frye (2023), yet the two studies differ in a key respect: Rimniceanu & Frye employed a motion-defined bar, whereas we presented a dark vertical bar against a uniform or random-dot background. The alignment success rate, defined as the proportion of trials in which the fly’s body angle is within ±25° of the target, was about 50% (data not shown). Our alignment pattern consisted of three vertical stripes spanning ~40° horizontally; when we replaced it with a single, narrower stripe, the success rate decreased (data not shown). These observations suggest that bar fixation in the magnetically tethered assay is less robust than in the rigid-tethered assay, although flies still orient toward highly salient vertical objects.
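
      The success-rate definition given above (proportion of trials with the body angle within ±25° of the target) can be written out as follows. This is a sketch under stated assumptions: the function names and the example angles are hypothetical, and angles are assumed to be in degrees and wrapped so that, for example, 350° counts as 10° away from a target at 0°.

```python
def wrap_to_180(angle_deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def alignment_success_rate(body_angles_deg, target_angle_deg=0.0, tolerance_deg=25.0):
    """Fraction of trials whose final body angle lies within the tolerance of the target."""
    if not body_angles_deg:
        raise ValueError("at least one trial is required")
    hits = sum(
        1 for angle in body_angles_deg
        if abs(wrap_to_180(angle - target_angle_deg)) <= tolerance_deg
    )
    return hits / len(body_angles_deg)


# Hypothetical per-trial body angles (deg), not data from the study.
example_angles = [10, -20, 40, 350, 180, -30]
print(alignment_success_rate(example_angles))
```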

      We also observed that bar-evoked turns were elicited more reliably when the bar moved rapidly (45° in 200 ms) in the magneto-tether assay, although the turn magnitude was significantly smaller than the actual bar displacement (Figure 3).

      In response to the reviewer’s comment, we have now added the following description of the bar fixation behavior to the paper, citing Rimniceanu & Frye (2023).

      (Line#239) “Another potential explanation arises from recent studies demonstrating that proprioceptive feedback provided during flight turns in a magnetically tethered assay strongly dampens the amplitude of wing and head responses (Cellini and Mongeau, 2022; Rimniceanu et al., 2023).”

      Relevance of the EC model: EC-related studies by the authors linked cancellation signals to saccades (Kim et al., 2014 & 2017). Puzzlingly, the authors applied an EC model to smooth movement, when the authors' own work showed that smooth course stabilizing flight turns do not receive cancellation signals (Fenk et al., 2021). Thus, in Fig. 4C, based on the state of the field, the efference copy signal should originate from the torque commands to initiate saccades, and not from torque to generate smooth movement. As this group previously showed, cancellation signals are quantitatively tuned to the expected visual input during saccades. Importantly, this tuning would be to the anticipated saccadic turn optic flow. Thus the authors' results supporting an all-or-none model appear in direct conflict with the authors' previous work. Further, the addition-only model is not particularly helpful as it has been already refuted by behavioral experiments (Rimniceanu & Frye, Mongeau & Frye).

      Thank you for this constructive comment. Efference copy is best established for brief, discrete actions like flight saccades. While motor-related modulation of visual processing has been reported across short- and long-duration behaviours (Chiappe et al., 2010; Fujiwara et al., 2017; Kim et al., 2015, 2017; Maimon et al., 2010; Turner et al., 2022), only flight saccade-associated signals exhibit the temporal profile appropriate to cancel reafferent input. However, von Holst & Mittelstaedt (1950) originally formulated efference copy to explain the smooth optomotor response of hoverflies. In HS/VS recordings from previous studies, we could not detect membrane-potential changes tied to baseline wing-beat amplitude (data not shown), although further work is needed.

      Note that visually evoked flight turns analyzed in this paper have relatively fast dynamics. Fenk et al. (2021) showed that HS cells carry EC-like motor signals during both loom-evoked turns and spontaneous saccades. Building on this, we tested whether object-evoked rapid turns modulate other visuomotor pathways. Although Fenk et al. also found that optomotor turns lack motor input to HS cells, the authors did not test whether the optomotor pathway suppresses other reflexes, such as loom-evoked turns. Our new behavioral data (Figure 6) show that optomotor turns indeed suppress loom-evoked turns, suggesting a potential EC signal arising from the optomotor pathway that inhibits loom-responsive visual neurons.

      In Kim et al. (2017), the authors argued that HS/VS neurons receive a “quantitatively tuned” efference copy that varies across cell types: yaw-sensitive LPTCs are strongly suppressed, roll-sensitive cells receive intermediate input, and pitch-sensitive cells receive little or none. We also showed that when the amplitude of ongoing visual drive changes, the amplitude of saccade-related potentials (SRPs) scales linearly. This proportionality does not imply a genuinely graded EC, however, because SRP amplitude could vary solely through changes in driving force (Vm – Vrest) with a fixed EC conductance. Crucially, SRPs do not fully suppress feed-forward visual signalling, arguing against an all-or-none EC mechanism.
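
      The driving-force argument above can be illustrated with a single-compartment steady-state calculation. This is a deliberately simplified sketch, not a model of HS/VS cells: the visual drive is treated as an injected current, the EC input as a fixed conductance whose reversal potential equals the resting potential, and all values are in arbitrary, hypothetical units.

```python
def srp_for_visual_current(visual_current, g_leak=1.0, g_ec=0.5, e_rest=-60.0):
    """Return (visual depolarization, SRP, residual depolarization) at steady state.

    Assumptions (illustrative only): a passive single compartment, visual input
    modelled as a current, and an EC conductance that reverses at e_rest.
    """
    # Membrane potential with visual drive only.
    v_before = e_rest + visual_current / g_leak
    # Membrane potential once the fixed EC conductance opens.
    v_during = (g_leak * e_rest + g_ec * e_rest + visual_current) / (g_leak + g_ec)
    visual_depolarization = v_before - e_rest   # driving force on the EC conductance
    srp = v_during - v_before                   # saccade-related potential
    residual = v_during - e_rest                # visual signal that survives the EC
    return visual_depolarization, srp, residual


for current in (2.0, 4.0, 8.0):
    drive, srp, residual = srp_for_visual_current(current)
    print(drive, srp, srp / drive, residual)
```

      In this toy case the SRP scales in proportion to the visual depolarization even though the EC conductance never changes, and the residual depolarization stays above zero, which is the pattern the response describes: linear scaling without a graded EC, and incomplete suppression.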

      How, then, can the cellular and behavioural data be reconciled? Silencing HS/VS neurons—or their primary inputs, the T4/T5 neurons—does not markedly diminish the optomotor response in flight (Fenk et al., 2014; Kim et al., 2017), indicating the presence of additional, as-yet-unidentified pathways.

      Physiological recordings from other visual neurons that drive the optomotor response in flying Drosophila are therefore needed to determine how strongly they are suppressed during loom-evoked turns.

      Behavioral evidence for all-or-none EC model: The authors state "unless the stability reflex is suppressed during the flies' object evoked turns, the turns should slow down more strongly with the dense background than the sparse one". This hypothesis is based on the fact that the optomotor response magnitude is larger with a denser background, as would be predicted by an EMD model (because there are more pixels projected onto the eye). However, based on the authors' previous work, the EC should be tuned to optic flow and thus the turning velocity (or amplitude). Thus the EC need not be directly tied to the background statistics, as they claim. For instance, I think it would be important to distinguish whether a mismatch in reafferent velocity (optic flow) links to distinct turn velocities (and thus position). This would require moving the background at different velocities (co- and anti-directionally) at the onset of bar motion. Overall, there are alternative hypotheses here that need to be discussed and more fully explored (as presented by Bender & Dickinson and in work by the Maimon group).

      We appreciate the reviewer’s important suggestion. In response, we performed the recommended experiment. In Figures 5 and 6 of the revised manuscript, we now present how bar- or loom-evoked flight turns affect the response to a moving background pattern. These experiments revealed that bar-evoked turns do not suppress the optic flow response, whereas loom-evoked turns strongly suppress it. Specifically, when background motion began 100 ms after the onset of loom expansion, the response to the background was significantly suppressed. Although weak residual responses to the background motion were observed in this case, this could be due to background motion occurring outside of the suppression interval, whose duration may correspond to that of the flight turns (Figure 6C,D).

      The lack of suppression of the optic flow response during and after bar-evoked turns appears to suggest that the responses are added linearly (Figure 5), seemingly contradicting the lack of dynamic change when the background dot density was altered (Figure 7, Figure 7–figure supplement 1). That is, the experimental result in Figure 5 supports either an addition-only or a graded efference copy (EC) model. However, the result in Figure 7 supports an all-or-none EC model. If a graded EC were used, the amplitude of the EC should be updated almost instantaneously when the static background changes.

      Another possibility is that the optic flow during self-generated turns in a static background is extremely weak compared to the optic flow input generated by physically moving the pattern, perhaps due to the rapid nature of head movements. Indeed, detailed kinematic analysis of head movement during spontaneous saccades in blow flies revealed that the head reaches the target angle before the body completes the orientation change, making the effective speed of reafferent optic flow higher than the speed of body rotation (Hateren and Schilstra, 1999). To test these hypotheses, further experiments will be needed for bar-evoked flight turns.

      The reviewers identified two key revisions that could improve the assessment of the paper:

      (1) Consideration of saccades within the model framework (outlined by reviewer 3).

      (2) Addition of physiology data to support the conclusions of the paper (outlined by reviewer 2). If this is not feasible within the timescale of revisions, the paper would need to be revised to clarify that the model leads to a hypothesis that would need to be tested with future physiology experiments.

      Thank you for these comments.

      Regarding revision point #1, we have added Figure 2–figure supplement 2, where we incorporated our position-velocity model (estimated in Figure 1) into the framework of the integrate-and-saccade model. A detailed description of this model is now provided in the main text (Lines 190–203).

      For revision point #2, obtaining electrophysiological evidence for efference copy remains challenging, as neither the visual neurons nor the efference-copy neuron has been identified for the wing optomotor response. As suggested by the reviewers, we have revised the title of the paper to reduce emphasis on efference copy and have noted electrophysiological recordings as a direction for future work.

      old title: A visual efference copy-based navigation algorithm in Drosophila for complex visual environments

      new title: Integrative models of visually guided steering in Drosophila

      Specific recommendations are detailed below.

      Reviewer #2 (Recommendations For The Authors):

      To directly demonstrate if an efference copy is non-scaled, the following experiments can be helpful: record from HS/VS cells and examine the relation between the amplitude of the saccade-suppression signal and saccade amplitude.

      Thanks for raising this important point. We previously carried out the suggested analysis for loom-evoked saccades in Fenk et al. (2021). There, significant correlations emerged between wing-response amplitude and saccade-related potentials (Figures 2F and 3C). However, we did not interpret the strong correlation (r ≈ 0.8) as evidence for a graded efference copy, because the amplitude of saccade-related potentials appeared to be bimodal. Upon presentation of the looming stimulus, flies either executed large evasive turns or showed minimal changes in wing-stroke amplitude. Large wing responses were accompanied by strong, saturated suppression of HS-cell membrane potential, whereas trials without wing responses produced only weak modulations—reflected in the bimodal distribution of saccade-related potential amplitudes (Figure 3C). 
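
      The point that a bimodal distribution can produce a strong correlation without any graded relationship can be shown with a toy example. The numbers below are hypothetical and are not taken from Fenk et al. (2021); they simply form two clusters ("no turn" and "large turn") with no relationship inside either cluster.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical wing-response and SRP amplitudes (arbitrary units).
no_turn_wing, no_turn_srp = [0, 1, 2, 1, 0, 2], [1, 0, 2, 2, 1, 0]
turn_wing, turn_srp = [10, 11, 12, 11, 10, 12], [11, 10, 12, 12, 11, 10]

print(pearson_r(no_turn_wing, no_turn_srp))   # within the "no turn" cluster
print(pearson_r(turn_wing, turn_srp))         # within the "large turn" cluster
print(pearson_r(no_turn_wing + turn_wing, no_turn_srp + turn_srp))  # pooled
```

      Pooling the two clusters yields a high coefficient even though neither cluster shows any trend on its own, so a strong pooled correlation is compatible with an all-or-none signal.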

      Importantly, in rigidly tethered preparations—where these potentials are typically measured—the absence of proprioceptive feedback can itself drive wingbeat amplitudes to saturation during saccades. We therefore reasoned that the lack of intermediate-sized flight saccades would naturally yield correspondingly saturated saccade-related potentials, even if a graded EC system is in play. 

      In Kim et al. (2017), we also performed a comprehensive analysis of spontaneous saccade-related potentials across all HS/VS cell types. When we later examined the relationship between saccade amplitude and the corresponding saccade-related potentials in each cell type, we could not find any statistically significant correlation (unpublished data).

      Measure how much a weak visual stimulus and a strong visual stimulus are suppressed by the suppression signal. If the signal is non-scaled, visual stimuli should always be suppressed independently of their intensities.

      Thank you for this important suggestion. As mentioned in our response to the previous comment, we believe it is not feasible to record from neurons responsible for the body optomotor response at this point, as their identity remains unknown. Regarding the HS/VS cells, our previous study showed that HS cells are not always fully suppressed. The changes in saccade-related potential amplitude can be described as a linear function of the pre-saccadic visually-evoked membrane potential (Figure 7 in Kim et al., 2017). 

      As suggested by Fenk et al. 2014 (doi: 10.1016/j.cub.2014.10.042), HS cells might also be responsive to a moving bar. If that is the case, and if you present a bar and background (either sparse or dense) in a closed-loop manner to a head-fixed fly, HS cells might be sensitive only to the bar but not to the background (independently of the density).

      Thanks for pointing out this important issue. HS cells indeed respond strongly to the horizontal movement of a vertical bar, as expected given that their receptive fields are formed by the integration of local optic flow vectors. In one of our previous studies (Supplemental Figure 1 in Kim et al., 2015), we showed that the response amplitude to a single vertical bar is roughly equivalent to that elicited by a vertical grating composed of 12 bars of the same size. Therefore, we believe that HS cells are likely to contribute to the head response to a moving vertical bar. In a body-fixed flight simulator, HS cells would respond only to the bar if the bar runs in a closed loop with a static background. In this scenario, HS cells are likely to play a role in the head optomotor response.

      Note also that the role of HS cells in the wing optomotor response remains unresolved. Unilateral activation of HS cells has been shown to elicit locomotor turns in walking Drosophila (Fujiwara et al., 2017), as well as in flying individuals (unpublished data from our lab). However, a previous study also showed that strong silencing of HS/VS cells significantly reduced the head optomotor response, but not the wing optomotor response (Kim et al., 2017).

      If neurophysiology is technically challenging, an alternative might be to examine head movements that exclusively follow the background (Fox et al., 2014 (doi: 10.1242/jeb.080192)). Because HS cells are thought to promote head rotation to background motion, a non-scaled suppression signal on HS cells would always suppress the head rotation independently of the background density.

      Thanks for this helpful comment. We have analyzed head movements during bar-evoked flight turns (Figure 7–figure supplement 1B) and found no significant changes across different background dot densities. We think that this might suggest that HS cells are unlikely to receive suppressive inputs during bar-evoked turns, akin to the lack of modulation during optomotor turns.

      Another way to separate a potential efference copy from other mechanisms (more global inhibition) is the directionality. A global inhibition would suppress the response to the background even if the background moves in the same direction as self-motion, but the efference copy would not.

      Thanks for this important point. In Heisenberg and Wolf, 1979, it was proposed that modulation might be bidirectional, with behavioral effects observed only for perturbations in the “unexpected” direction. In our new data on loom-evoked turns (Figure 6), the suppression appears equally strong for background motion in either direction, supporting an all-or-none suppression mechanism.

      Besides, in general, it is unclear if you think an efference copy operates both in smooth pursuits and saccades or if such a signal is only present during saccades. Your previous neurophysiological work supports the latter. Are your behavioral results consistent with the previous saccade suppression idea, or do you propose a new type of efference copy that also operates in smooth pursuits?

      Thanks for raising this important point. von Holst and Mittelstaedt (1950) originally introduced the concept of efference copy to explain the smooth optomotor response. We previously analyzed electrophysiological recordings from HS cells for membrane-potential changes associated with slow deviations in wing-steering angle but found none. However, this negative result does not entirely rule out modulation of visual processing during smooth flight turns, given the slow drift in membrane potential observed in most whole-cell recordings.

      In this study, we examined only the interactions among visuomotor pathways during rapid flight turns, as the dynamics of visually evoked turns are almost as rapid as those of spontaneous saccades. Our data reveal that interactions between distinct visuomotor reflexes are more diverse than previously appreciated.

      Minor comments:

      Line 108, 109: match the description between here and the labels in Fig. 1F.

      Thank you for pointing out this issue. We have defined the general equation to obtain the position and velocity components in the main text (lines 108-109), but due to a slight asymmetry in the data (Fig. 1E) we used the approach indicated in Fig. 1F and explained in lines 113-117.

      Fig.1 F: If the position-dependent component is due to fatigue, the tuning curve's shape is likely changed (shrunk or extended) depending on the stimulus speed. How can you generalize the tuning curve shown here? Does the result hold even if the stimulus speed/contrast/spatial frequency is changed?

      We appreciate this comment. We believe that fatigue may explain the significant decay in the wing response to the grating stimulus (Fig. 1E). As you mention, increasing the stimulus speed would increase the amplitude of the fly’s response up to a saturation point. We addressed this in our model by multiplying the derived value by the angular velocity of the grating.

      We did not test contrast or spatial frequency experimentally. Instead, we simulated our model under changing visual feedback (Fig. 4A, B), which can be seen as increasing or decreasing the contrast of a grating. An increase in contrast would increase the fly's response to the grating and thus contribute to dampening the response to the foreground object (Fig. 4C).

      Line 233-255: Here, the description sounds like you will consider several parallel objects (e.g., two stripes) in the visual field instead of the combination of the figure and background (which is referred to in the following paragraph).

      Thank you for pointing it out. Indeed it was slightly ambiguous. We have addressed this by explaining the specific situation of a combination of an object and the background in lines 231-233.

      Figure 6C: you kept the foreground visual field between sparse and dense random dot backgrounds to keep the bar's saliency. Are you sure that this does not influence the difference in the fly's response to these two backgrounds (in Figure 6B)?

      This is a good point that we have also discussed internally. We also carried out similar experiments with a fully covered background and found no significant differences (Figure 7–figure supplement 1).

      Reviewer #3 (Recommendations For The Authors):

      Identify and analyze flight saccade dynamics in the raw trajectories (e.g., Fig. 3B). There should be some since the bar is near the 'sweet spot' for triggering saccades (see Mongeau & Frye, 2017).

      Thank you for bringing up this interesting point. In previous work, it was reported that the fly fixated on a vertical bar through saccadic turns rather than smooth tracking (Mongeau & Frye, 2017). When the bar was thin (<15 deg), there was barely one saccade per second (Mongeau & Frye, 2017, Fig. 4). In our magneto-tether assay (Fig. 3A, B), the object width was 11.25 degrees and the object moved for only a short time window, so the fly generated only the saccade related to the onset of the object. Small turns of a few degrees, which are likely related to small perturbations, cannot be considered saccades when compared with those previously reported (Mongeau & Frye, 2017). Additionally, in our protocol (Fig. 3A), only a single object moved from the onset time (‘go’ mark), within an empty background, so in principle there is no trigger for a switch to smooth movement. We addressed this in lines x-x.

      Consider updating the Poggio model with flight saccades (switched, integrate-and-fire).

      We appreciate this suggestion. Following previous work (Mongeau et al., 2017), we expanded our model to include a saccade mechanism: the torque produced by the summed position- and velocity-dependent components is now replaced by an integrate-and-fire saccade (Figure 2—figure supplement 2). We optimized the saccade interval and amplitude so that both vary linearly with stimulus amplitude and faithfully reproduce the kinematic properties reported previously (Mongeau et al., 2017).  

      Please engage more with the literature, especially work that directly conflicts with your conclusions (see above). Also, highly relevant work by Bender & Dickinson was not sufficiently discussed. Spot results presented in Fig. 3 should be contextualized in light of the work of Mongeau et al., 2019, who performed similar experiments and identified a switch in saccade valence.

      We appreciate your pointing out the relevant previous work. We have added references to the following papers and tried to describe the relationship between our data and previous ones.

      Bender & Dickinson 2006

      (Line#162) “This simulation experiment is reminiscent of the magnetically tethered flight assay, where a flying fly remains fixed at a position but is free to rotate around its yaw axis (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023; Mongeau and Frye, 2017).”

      (Line#218) “We tested the predictions of our models with flies flying in an environment similar to that used in the simulation (Figure 3A). A fly was tethered to a short steel pin positioned vertically at the center of a vertically oriented magnetic field, allowing it to rotate around its yaw axis with minimal friction (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023).”

      (Line#238) “To determine if our assay imposes additional friction compared to other assays used in previous studies, we analyzed the dynamics of spontaneous saccades during the “freeze” phase (Figure 3–figure supplement 1A). We found their duration and amplitude to be within the range reported previously (Bender and Dickinson, 2006b; Mongeau and Frye, 2017) (Figure 3–figure supplement 1B-D).”

      Mongeau et al., 2019

      (Line#196) “During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and triggers a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2–figure supplement 2).”

      This paper shows that the dynamics of saccadic flight turns elicited by a rotating bar or spot determine whether flies display attraction or aversion. In that study, the visual stimulus—a bar or spot—rotated slowly at a constant 75 deg s⁻¹. By contrast, in our Figure 3 the object moves much faster, driving the neural “integrator” to saturation and triggering an almost immediate flight turn. In Mongeau et al. (2019), saccades occur at variable times and their amplitudes and directions are more stochastic, again reflecting the slower stimulus speed. Because these differences all arise from the disparity in object speed, we did not cite Mongeau et al. (2019) in Figure 3 or the associated text.

      In addition to the two papers cited above, we have incorporated several relevant studies on Drosophila visuomotor control identified through the reviewers’ insightful comments. Examples include:

      Frighetto G, Frye MA. 2023 (Line#195, 464)

      Rimniceanu et al., 2023 (Line#241)

      Cellini & Mongeau 2020 (Line#91)

      Cellini & Mongeau 2022 (Line#241)

      Cellini et al., 2022 (Line#91, 162, 218)

      Many citations are not in the proper format (e.g. using numbers rather than authors' last name).

      Thank you for letting us know. We have changed the remaining citations to the proper format.

    1. eLife Assessment

      This valuable study reports evidence that items maintained in working memory can bias attention in an oscillatory manner, with the attentional capture effect fluctuating at theta frequency. The study provides incomplete evidence that this dynamic attentional bias is associated with oscillatory neural mechanisms, particularly in the alpha and theta bands, as measured by EEG. The study will be relevant for researchers studying attention, working memory, and neural oscillations, particularly those interested in how memory and perception interact over time.

    2. Reviewer #1 (Public review):

      Summary:

      In the presented paper, Lu and colleagues focus on how items held in working memory bias someone's attention. In a series of three experiments, they utilized a similar paradigm in which subjects were asked to maintain two colored squares in memory for a short and variable time. After this delay, they either tested one of the memory items or asked subjects to perform a search task.

      In the search task, items could share colors with the memory items, and the authors were interested in how these would capture attention, using reaction time as a proxy. The behavioral data suggest that attention oscillates between the two items. At different maintenance intervals, the authors observed that items in memory captured different amounts of attention (attentional capture effect).

      This attentional bias fluctuates over time at approximately the theta frequency range of the EEG spectrum. This part of the study is a replication of Peters and colleagues (2020).

      Next, the authors used EEG recordings to better understand the neural mechanisms underlying this process. They present results suggesting that this attentional capture effect is positively correlated with the mean amplitude of alpha power. Furthermore, they show that the weighted phase lag index (wPLI) between the alpha and theta bands across different electrodes also fluctuates at the theta frequency.

      Strengths:

      The authors focus on an interesting and timely topic: how items in working memory can bias our attention. This line of research could improve our understanding of the neural mechanisms underlying working memory, specifically how we maintain multiple items and how these interact with attentional processes. This approach is intriguing because it can shed light on neuronal mechanisms not only through behavioral measures but also by incorporating brain recordings, which is definitely a strength.

      Subjects performed several blocks of experiments, ranging from 4 to 30, over a few days, depending on the experiment. This makes the results - especially those from behavioral experiments 2 and 3, which included the most repetitions - particularly robust.

      Weaknesses:

      One of the main EEG results is based on the weighted phase lag index (wPLI) between oscillations in the alpha and theta bands. In my opinion, this is problematic, as wPLI measures the locking of oscillations at the same frequency. It quantifies how reliably the phase difference stays the same over time. If these oscillations have different frequencies, the phase difference cannot remain consistent. Even worse, modeling data show that even very small fluctuations in frequency between signals make wPLI artificially small (Cohen, 2015).
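
      The concern about cross-frequency wPLI can be illustrated with idealized signals. The sketch below is a toy demonstration rather than the authors' analysis pipeline: it uses noise-free unit-amplitude oscillations, estimates wPLI over time samples instead of over trials, and the frequencies (6 and 10 Hz), sampling rate, and duration are hypothetical choices.

```python
import math


def phase_series(freq_hz, lag_rad=0.0, fs_hz=500.0, n_samples=1000):
    """Instantaneous phase (rad) of an ideal oscillation sampled at fs_hz."""
    return [2.0 * math.pi * freq_hz * n / fs_hz + lag_rad for n in range(n_samples)]


def wpli(phases_a, phases_b):
    """Weighted phase lag index for unit-amplitude signals.

    For unit amplitudes the imaginary part of the cross-spectrum is
    sin(phase_a - phase_b), so wPLI = |mean(Im)| / mean(|Im|).
    """
    imag_parts = [math.sin(a - b) for a, b in zip(phases_a, phases_b)]
    denominator = sum(abs(v) for v in imag_parts)
    if denominator == 0.0:
        return 0.0
    return abs(sum(imag_parts)) / denominator


theta = phase_series(6.0)
theta_lagged = phase_series(6.0, lag_rad=math.pi / 4)
alpha = phase_series(10.0)

print(wpli(theta, theta_lagged))  # same frequency, constant phase lag
print(wpli(theta, alpha))         # different frequencies, drifting phase difference
```

      With matched frequencies the phase difference is constant and the index is at its maximum, whereas between 6 and 10 Hz the phase difference rotates continuously and the imaginary parts average out.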

      Another result from the electrophysiology data shows that the attentional capture effect is positively correlated with the mean amplitude of alpha power. In the presented scatter plot, it seems that this result is driven by one outlier. Unfortunately, Pearson correlation is very sensitive to outliers, and the entire analysis can be driven by an extreme case. I extracted data from the plot and obtained a Pearson correlation of 0.4, similar to what the authors report. However, the Spearman correlation, which is robust against outliers, was only 0.13 (p = 0.57), indicating a non-significant relationship.
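
      The sensitivity of Pearson correlation to a single extreme point can be sketched as follows. The values below are synthetic and purely illustrative; they are not the data extracted from the manuscript's scatter plot. The helper names (`pearson`, `ranks`, `spearman`) are assumptions for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Synthetic values: eight unrelated points plus one extreme point (20, 20)
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [4, 7, 2, 8, 1, 6, 3, 5, 20]

r_all = pearson(x, y)                  # inflated by the extreme point
rho_all = spearman(x, y)               # rank-based, much less affected
r_trimmed = pearson(x[:-1], y[:-1])    # Pearson without the extreme point

print(f"Pearson: {r_all:.2f}, Spearman: {rho_all:.2f}, Pearson w/o outlier: {r_trimmed:.2f}")
```

      In this toy case, the single extreme point produces a large Pearson coefficient even though the remaining points are essentially uncorrelated, while the rank-based Spearman coefficient stays small.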

      The behavioral data are interesting, but in my opinion, they closely replicate Peters and colleagues (2020) using a different paradigm. In that study, participants memorized four spatial positions that formed the endpoints of two objects, and one object was cued. Similarly, reaction times fluctuated at theta frequency, and there was an anti-phase relationship between the two objects. The main novelty of the present study is that this bias can be transferred to an unrelated task. While the current study extends Peters and colleagues' findings to a different task context, the lack of a thorough, direct comparison with Peters et al. limits the clarity of the novel insights provided.

      Cohen, M. X. (2015). Effects of time lag and frequency matching on phase-based connectivity. Journal of Neuroscience Methods, 250, 137-146.

      Peters, B., Kaiser, J., Rahm, B., & Bledowski, C. (2020). Object-based attention prioritizes working memory contents at a theta rhythm. Journal of Experimental Psychology: General, 150(6), 1250-1256.

    3. Reviewer #2 (Public review):

      The information provided in the current version of the manuscript is not sufficient to assess the scientific significance of the study.

      (1) In many cases, the details of the experiments or behavioral tasks described in the main text are not consistent with those provided in the Materials and Methods section. Below, I list only a few of these discrepancies as examples:

      a) For Experiment 1, the Methods section states that the detection stimulus was presented for 2000 ms (lines 494 and 498), but Figure 1 in the main text indicates a duration of 1500 ms.

      b) For Experiment 2, not only is the range of SOAs mentioned in the Methods section inconsistent with that shown in the main text and the corresponding figure, but the task design also differs between sections.

      c) For Experiment 3, the main text indicates that EEG recordings were conducted, but in the Methods section, the EEG recording appears to have been part of Experiment 2 (lines 538-540).

      (2) The results described in the text often do not match what is shown in the corresponding figure. For example:

      a) In lines 171-178, the SOAs at which a significant difference was found between the two conditions do not appear to match those shown in Figure 2A.

      b) In Figure 4, the figure legend (lines 225-228) does not correspond to the content shown in the figure.

      c) In Figure 9, insufficient information is provided within the figure or in the text, making the figure difficult to understand. Consequently, the results described in the text cannot be clearly linked to the figure.

      (3) Insufficient information is provided regarding the data analysis procedures, particularly the permutation tests used for the data presented in Figures 2B, 4, and 10. The results shown in these figures are critical for the main conclusions drawn in the manuscript.

      Given these issues, it is not possible to provide a detailed review of the study, particularly regarding its scientific significance.

    1. eLife Assessment

      This study presents valuable computational findings on the neural basis of learning new motor memories and of savings, using recurrent neural networks. The evidence supporting the claims of the authors is solid, but it would benefit from more controls and from considering the role of explicit strategies and other brain regions. This work will be of interest to computational and experimental neuroscientists working in motor learning.

    2. Reviewer #1 (Public review):

      Summary:

      Shahbazi et al used a recurrent neural network model trained to control a musculoskeletal model of the arm to investigate how neural populations accommodate activity patterns underpinning savings. The paper draws upon the recent finding of a "uniform shift" in preparatory activity in monkey motor cortex associated with savings, and leverages full access to a computational model to establish causality.

      Strengths:

      The paper is well written, and the figures are clearly presented. The key finding that the uniform shift first reported based on neural recordings by Sun et al. emerges in artificial neural networks performing a similar task is interesting and well-backed by their analyses. Manipulating this uniform shift to show that it drives behavioural savings is an important causal confirmation of the proposal by Sun et al.

      Weaknesses / Comments:

      As mentioned earlier, the core results are well backed by the analyses. Most of my comments relate to adding more controls and additional questions that could be explored with the model to strengthen the paper.

      (1) Savings are quantified as more rapid relearning of the FF upon re-exposure (e.g., Figure 3). This finding is based on backpropagation through time, but would this hold when using a different optimiser, e.g., FORCE?

      (2) The authors should include a "null model" showing that training on a different reaching task following NF, as opposed to FF2, won't show something akin to a uniform shift during preparation due to the adoption of TDR and having similar targets.

      (3) The analyses of network activity during movement preparation (Figure 4) nicely replicate the key finding in Sun et al, but I think the authors could leverage the full access to their network and go further, e.g., by examining changes (or the lack thereof) during execution in FF2 with respect to FF (and perhaps in a future NF2 with respect to NF), including whether execution activity also lives in parallel hyperplanes, etc.

      (4) Related to the above, while the results are interesting and the paper is well done, I kept wishing that the authors had done "more" with their model. This could be one or two final sections on "predictions" that would nicely complement their "validation" of the uniform shift, and that, in my opinion, would greatly increase the impact of the paper. In particular:

      a) What would be the effect of learning more "tasks"? For example, is there a limit on how many fields can be learned? (You show something related by manipulating network size, but this is slightly different.)

      b) Figure 5 is a nice causal demonstration that the uniform shift is related to savings. However, and related to comment #3, it'd be interesting to see more details about how the behaviour and the network activity change as preparatory activity shifts along this axis, in particular regarding how moving the preparatory states affects the organisation and dynamics of upcoming execution activity; these are the kinds of intuitions that modelling studies like this one can provide.

      c) The authors focus on a task design that spans baseline, FF, NF, FF2 to replicate the original study by Sun et al. However, it would be interesting if they generated predictions for neural changes to other types of tasks that have been studied behaviourally. These could include, for example: (i) modelling a visuomotor rotation or a mirror reversal task; (ii) having to adapt to a FF in the opposite direction; (iii) investigating the role of adding an explicit context and having the networks learn multiple FF; and (iv) trying to learn FFs in opposite directions, perhaps restricted to specific targets. As the authors know, all these questions and more have been studied with similar behavioural paradigms, and it would be nice to see what neural predictions are generated by this model.

      (5) On the Discussion: When extrapolating from neural network results to animals, the fact that your networks can learn implicitly doesn't mean that animals do learn implicitly. Indeed, I think the consensus view is that different perturbations may lead to the expression of different types of savings (e.g., FF vs VR, which seems to be more explicit). Besides, these different mechanisms may be primarily implemented by brain regions less directly tied to motor control (e.g., cerebellum, parietal cortex?), which are not directly implemented in the authors' model.

      These aspects (limitations) should be discussed in the paper.

    3. Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Weaknesses:

      (1) To be transparent, savings in motor adaptation have been a primary focus of my own research. Some core findings presented in this paper are at odds with the ideas I and others have previously put forward. While I don't want to impose my agenda on the authors of this paper, I do think the authors should address these issues.

      a) The authors acknowledge the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies. For example, when people are asked to report their strategy, they recall a strategy that was useful during the first learning block (Morehead et al. 2015). Furthermore, savings are abolished under experimental manipulations designed to eliminate strategic contributions (e.g., Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). The authors briefly state that their findings support the hypothesis that a neural basis of memory retention underlying savings can be independent of cognitive or strategic learning components, and that savings can be characterized as implicit. While these statements may be true, it is not clear how this work substantiates these claims.

      b) Our research has also demonstrated that if implicit adaptation is completely washed out after the initial learning block, it not only fails to exhibit savings but is actually attenuated relative to the first learning block (Avraham et al., 2021). This phenomenon of attenuation upon relearning can also be seen in other studies of visuomotor adaptation (e.g., Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). More recently, we have shown that this attenuation is due to anterograde interference arising from the experience with the washout block (Avraham and Ivry, 2025). We illustrated that the implicit system is highly susceptible to interference; this interference doesn't require exposure to salient opposite errors and can occur even following prolonged exposure to veridical feedback. The central thesis of this paper, namely that implicit savings can emerge through RNNs, is at odds with these empirical results. The authors should address this discrepancy.

      (2) This brings me to the question about neural correlates: The results are linked to activity in the primary motor cortex. How does that align with the well-established role of the cerebellum in implicit motor adaptation? And with the studies showing that savings are due to explicit strategies, which are generally associated with prefrontal regions?

      (3) The analysis of the complexity of the neural network (i.e., the number of hidden units) and its relationship to savings is very interesting. It makes sense to me that more complex networks would show more savings. I'm not sure I follow the authors' explanation, but my understanding is that increased network complexity makes it more difficult to override the formed memory through interference (e.g., from the experience with NF2). Also, the results indicate that a network with 32 units led to a less-than-chance level of networks exhibiting savings (Figure 3b). What behavioral output does this configuration produce? Could this behavior manifest as attenuation upon relearning? Furthermore, if one were to examine an even smaller, simpler network (perhaps one more closely reflecting cerebellar circuits), would such a model predict attenuation rather than savings?

      (4) The authors emphasize that their network did not receive any explicit contextual signals related to the presence or absence of the force field (FF), thus operating in a 'context-free' manner. From my understanding, some existing models of context's role in motor memories (e.g., Oh and Schweighofer, 2019; Heald et al., 2021) propose that memory-related changes can be observed even without explicit contextual information, as contextual changes can be inferred from sudden or significant environmental shifts (e.g., the introduction or removal of perturbations). Given this, could the observed savings in the current simulation be explained by some form of contextual retrieval, inferred by the network from the re-presentation of the perturbation in FF2?

      (5) If there is residual hidden unit activity related to the FF at the end of the NF2 phase, how does the simulated movement revert back to baseline? Are there any differences in the movement trajectory, beyond just lateral deviation, between NF1 and NF2? The authors state that "changes in the preparatory hidden unit activity did not result in substantive changes in the motor commands (Figure 5b), which emphasizes that the uniform shift resides in the null space of motor output." However, Figure 5b appears to show visible changes in hidden unit activity. Don't these changes reflect a pattern of muscle activity that is the basis for behavior? These changes are indeed small, but it seems that so is the effect size for savings (Figure 3a). Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?

    1. eLife Assessment

      This useful study replicates a previous finding that information about peripherally presented visual stimuli is represented in the foveal visual cortex, and extends it by demonstrating that these representations are similar to those evoked by foveally presented stimuli. The authors' gaze-contingent fMRI design provides solid evidence for these findings. Some of the stronger theoretical claims, such as that the effects are due to predictive pre-saccadic remapping, are not fully supported by the current results.

    2. Reviewer #1 (Public review):

      Summary:

      The main contributions of this paper are: (1) a replication of the surprising prior finding that information about peripherally-presented stimuli can be decoded from foveal V1 (Williams et al 2008), (2) a new demonstration of cross-decoding between stimuli presented in the periphery and stimuli presented at the fovea, (3) a demonstration that the information present in the fovea is based on shape not semantic category, and (4) a demonstration that the strength of foveal information about peripheral targets is correlated with the univariate response in the same block in IPS.

      Strengths:

      The design and methods appear sound, and finding (2) above is new, and importantly constrains our understanding of this surprising phenomenon. The basic effect investigated here is so surprising that even though it has been replicated several times since it was first reported in 2008, it is useful to replicate it again.

      Weaknesses:

      (1) The paper, including in the title ("Feedback of peripheral saccade targets to early foveal cortex") seems to assume that the feedback to foveal cortex occurs in conjunction with saccade preparation. However, participants in the original Williams et al (2008) paper never made saccades to the peripheral stimuli. So, saccade preparation is not necessary for this effect to occur. Some acknowledgement and discussion of this prior evidence against the interpretation of the effect as due to saccade preparation would be useful. (e.g., one might argue that saccade preparation is automatic when attending to peripheral stimuli.)

      (2) The most important new finding from this paper is the cross-decodability between stimuli presented in the fovea and stimuli presented in the periphery. This finding should be related to the prior behavioral finding (Yu & Shim, 2016) that when a foveal foil stimulus identical to a peripheral target is presented 150 ms after the onset of the peripheral target, visual discrimination of the peripheral target is improved, and this congruency effect occurred even though participants did not consciously perceive the foveal stimulus (Yu, Q., & Shim, W. M. (2016). Modulating foveal representation can influence visual discrimination in the periphery. Journal of Vision, 16(3), 15).

      (3) The prior literature should be laid out more clearly. For example, most readers will not realize that the basic effect of decodability of peripherally-presented stimuli in the fovea was first reported in 2008, and that that original paper already showed that the effect cannot arise from spillover effects from peripheral retinotopic cortex because it was not present in a retinotopic location between the cortical locus corresponding to the peripheral target and the fovea. (For example, this claim on lines 56-57 is not correct: "it remains unknown 1) whether information is fed back all the way to early visual areas".) What is needed is a clear presentation of the prior findings in one place in the introduction to the paper, followed by an articulation and motivation of the new questions addressed in this paper. If I were writing the paper, I would focus on the cross-decodability between foveal and peripheral stimuli, as I think that is the most revealing finding.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigated whether the identity of a peripheral saccade target object is predictively fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.

      Strengths:

      This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.

      Weaknesses:

      The conclusions feel a bit over-reaching; some strong theoretical claims are not fully supported, and the framing of prior literature is currently too narrow. A critical weakness lies in the inability to test a distinction between these findings (claiming to demonstrate that "feedback during saccade preparation must underlie this effect") and foveal feedback previously found during passive fixation (Williams et al., 2008). Discussions (and perhaps control analysis/experiments) about how these findings are specific to the saccade target and the temporal constraints on these effects are lacking. The relationship between the concepts of foveal prediction, foveal feedback, and predictive remapping needs more thorough treatment. The choice to use only 4 stimuli is justified in the manuscript, but remains an important limitation. The IPS results are intriguing but could be strengthened by additional control analysis. Finally, the manuscript claims the study was pre-registered ("detailing the hypotheses, methodology, and planned analyses prior to data collection"), but on the OSF link provided, there is just a brief summary paragraph, and the website says "there have been no completed registrations of this project".

      Specifics:

      (1) In the eccentricity-dependent decoding results (Figure 2B), are there any statistical tests to support the results being a U-shaped curve? The dip isn't especially pronounced. Is 4 degrees lower than the further ones? Are there alternative methods of quantifying this (e.g., fitting it to a linear and quadratic function)?

      (2) In the parametric modulation analysis, the evidence for IPS being the only region showing stronger fovea vs peripheral beta values was weak, especially given the exploratory nature of this analysis. The raw beta value can reflect other things, such as global brain fluctuations or signal-to-noise ratio. I would also want to see the results of the same analysis performed on the control condition decoding results.

      (3) Many of the claims feel overstated. There is an emphasis throughout the manuscript (including claims in the abstract) that these findings demonstrate foveal prediction, specifically that "image-specific feedback during saccade preparation must underlie this effect." To my understanding, one of the key aspects of the foveal prediction phenomenon that ties it closely to trans-saccadic stability is its specificity to the saccade target but not to other objects in the environment. However, it is not clear to what degree the observed findings are specific to saccade preparation and the peripheral saccade target. Should the observers be asked to make a saccade to another fixation location, or simply maintain passive fixation, will foveal retinotopic cortex similarly contain the object's identity information? Without these control conditions, the results are consistent with foveal prediction, but do not definitively demonstrate that as the cause, so claims need to be toned down.

      (4) Another critical aspect is the temporal locus of the feedback signal. In the paradigm, the authors ensured that the saccade target object was never foveated via the gaze-contingent procedure and a conservative data exclusion criterion, thus enabling the test of feedback signals to foveal retinotopic cortex. However, due to the temporal sluggishness of fMRI BOLD signals, it is unclear when the feedback signal arrives at the foveal retinotopic cortex. In other words, it is possible that the feedback signal arrives after the eyes land at the saccade target location. This possibility is also bolstered by Chambers et al. (2013)'s TMS study, where they found that TMS to the foveal cortex at 350-400 ms SOA interrupts the peripheral discrimination task. The authors should qualify their claims of the results occurring "during saccade preparation" (e.g., pg 1 ln 22) throughout the manuscript, and discuss the importance of temporal dynamics of the effect in supporting stability across saccades.

      (5) Relatedly, the claims that the results in this paradigm reflect "activity exclusively related to predictive feedback" and "must originate from predictive rather than direct visual processes" (e.g., lines 60-65 and throughout) need to be toned down. The experimental design nicely rules out direct visual foveal stimulation, but predictive feedback is not the only alternative to that. The activation could also reflect mental imagery, visual working memory, attention, etc. Importantly, the experiment uses a block design, where the same exact image is presented multiple times over the block, and the activation is taken for the block as a whole. Thus, while at no point was the image presented at the fovea, there could still be more going on than temporally-specific and saccade-specific predictive feedback.

      (6) The authors should avoid using the terms foveal feedback and foveal prediction interchangeably. To me, foveal feedback refers to the findings of Williams et al. (2008), where participants maintained passive fixation and discriminated objects in the periphery (see also Fan et al., 2016), whereas foveal prediction refers to the neural mechanism hypothesized by Kroell & Rolfs (2022), which occurs before a saccade to the target object and contains task-irrelevant feature information.

      (7) More broadly, the treatment of how foveal prediction relates to saccadic remapping is overly simplistic. The authors seem to be taking the perspective that remapping is an attentional phenomenon marked by remapping of only attentional/spatial pointers, but this is not the classic or widely accepted definition of remapping. Within the field of saccadic remapping, it is an ongoing debate whether (/how/where/when) information about stimulus content is remapped alongside spatial location (and also whether the attentional pointer concept is even neurophysiologically viable). This relationship between saccadic remapping and foveal prediction needs clarification and deeper treatment, in both the introduction and discussion.

      (8) As part of this enhanced discussion, the findings should be better integrated with prior studies. E.g., there is some evidence for predictive remapping inducing integration of non-spatial features (some by the authors themselves; Harrison et al., 2013; Szinte et al., 2015). How do these findings relate to the observed results? Can the results simply be a special case of non-spatial feature integration between the currently attended and remapped location (fovea)? How are the results different from neurophysiological evidence for facilitation of the saccade target object's feature across the visual field (Burrow et al., 2014)? How might the results be reconciled with a prior fMRI study that failed to find decoding of stimulus content in remapped responses (Lescroart et al, 2016)? Might this reflect a difference between peripheral-to-peripheral vs peripheral-to-foveal remapping? A recent study by Chiu & Golomb (2025) provided supporting evidence for peripheral-to-fovea remapping (but not peripheral-to-peripheral remapping) of object-location binding (though in the post-saccadic time window), and suggested foveal prediction as the underlying mechanism.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors used fMRI to determine whether peripherally viewed objects could be decoded from the foveal cortex, even when the objects themselves were never viewed foveally. Specifically, they investigated whether pre-saccadic target attributes (shape, semantic category) could be decoded from the foveal cortex. They found that object shape, but not semantic category, could be decoded, providing evidence that foveal feedback relies on low-mid-level information. The authors claim that this provides evidence for a mechanism underlying visual stability and object recognition across saccades.

      Strengths:

      I think this is another nice demonstration that peripheral information can be decoded from / is processed in the foveal cortex - the methods seem appropriate, and the experiments and analyses are carefully conducted, and the main results seem convincing. The paper itself was very clear and well-written.

      Weaknesses:

      There are a couple of reasons why I think the main theoretical conclusions drawn from the study might not be supported, and why a more thorough investigation might be needed to draw these conclusions.

      (1) The authors used a blocked design, with each object being shown repeatedly in the same block. This meant that the stimulus was entirely predictable on each block, which weakens the authors' claims about this being a predictive mechanism that facilitates object recognition - if the stimulus is 100% predictable, there is no aspect of recognition or discrimination actually being tested. I think to strengthen these claims, an experiment would need to have unpredictable stimuli, and potentially combine behavioural reports with decoding to see whether this mechanism can be linked to facilitating object recognition across saccades.

      (2) Given that foveal feedback has been found in previous studies that don't incorporate saccades, how is this a mechanism that might specifically contribute to stability across saccades, rather than just being a general mechanism that aids the processing/discrimination of peripherally-viewed stimuli? I don't think this paper addresses this point, which would seem to be crucial to differentiate the results from those of previous studies.

    1. eLife Assessment

      This important study uses a combination of eye-tracking and computational models based on Active Inference to explain behavior in a gaze-contingent cued-reversal paradigm with 6- to 10-month-old infants. The study demonstrates solid evidence that the same rigorous computational modeling standards commonly applied in studies in adults can also be applied in studies of infants' learning, and a cluster analysis reveals that the parameters of the winning model provide better pattern separation between identified subgroups than behavior or questionnaire data alone. However, the evidence for some specific claims is incomplete, due to poor behavioral performance, unclear significance of the pupil data, and complexity of the model fitting; the claims regarding implications for psychiatry were also considered to be too strong and unsupported by evidence. This work will be of interest to developmental psychologists and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed a new gaze-based reversal task to study 6- to 10-month-old infants, in what would typically be a very challenging age group to study behavior related to learning, exploration, and perseveration. Here, the research question is excellently motivated by pointing out the limitation of past work that has typically studied adult clinical populations using similar approaches, which presents only the endpoint of the developmental process. Thus, there is important clinical and scientific value in studying much earlier stages in the developmental process. Here, the authors accomplish this with a new gaze-based paradigm that allows them to fit a variety of complex computational models to data from 41 infants. The main advantage of their winning model is that the parameters provide better pattern separation between two identified clusters of participants compared to behavioral variables alone.

      Strengths:

      Overall, the paper is well-written, and the models and analyses are applied in a principled and thorough fashion. The authors do an excellent job of both motivating their research question and addressing it through their task and set of computational models. The scope is also quite ambitious, modeling both choices and pupillary responses, while also using the models to generate behavior that is comparable to the experimental data and performing a cluster analysis to compare the suitability of the model parameters vs. other behavioral/questionnaire data in performing pattern separation between participants.

      Weaknesses:

      However, despite these strengths, I had a number of concerns that may limit the reliability of the findings.

      First, given that the rewards for the initial pre-reversal setting are defined by the first choice of the infants, it was unclear to me whether the behavioral patterns in Figure 2 really support the claim that there was, in fact, (prediction-error-based) learning in the task at all. The behavioral analyses proceed very briskly without really addressing this question, before rapidly jumping off the complexity cliff to present the models. However, even with the models, the winning model only had free parameters for preference (c) and a left-right dominance (epsilon), which don't really capture mechanisms related to learning. The epistemic and extrinsic components included in the model at the 2nd stage could potentially help shed light on this question, but (unless I've misunderstood) they seem to be all-or-nothing parts of the model, and thus don't reappear in later analyses (e.g., cluster analysis) because they are not individual-specific parameters. Thus, the main learning-relevant aspects of the model seem divorced from the ability to perform clustering or other clinically relevant diagnoses downstream. Thus, it was unclear to me whether the results really capture mechanisms related to cognitive flexibility that motivate the manuscript in the introduction.

      My other main concern was the complexity of the models and the way model comparison was performed using the three stages. First of all, the set of models is quite complex and risks alienating many developmental psychologists who would otherwise be very interested in these findings. Thus, I'm curious why the authors didn't consider including much simpler context-based RL models (e.g., Rescorla-Wagner/Q-learning models) that explicitly use prediction-error updates and whose simplicity might better match the simplicity of the behavior that 6- to 10-month-old infants are capable of displaying. Certainly, preference (as an inverse temperature parameter for a softmax policy) and left-right dominance (as a bias) could be implemented with these much simpler models. Second, while the three-stage model comparison seems somewhat principled, it left me questioning whether the 1st stage or 2nd stage results might be impacted by later stages. For instance, would the Simple-discard model still win in the first stage once omega and eta have been eliminated as free parameters? Of course, I understand that there may be feasibility issues with testing all combinatorial variants of the model. But it was unclear why this specific order was chosen and what consequences this sequential dependency in the model fitting may have for the conclusions. And while model identifiability is stated in the abstract as one of the strengths of this approach, there don't seem to be any clear analyses supporting this fact. I would have loved to see a model recovery analysis (see Wilson & Collins et al., eLife 2019) to support this statement.
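
      A minimal sketch of the kind of simpler model suggested above: a Rescorla-Wagner learner with a softmax choice rule (inverse temperature beta) and a left-right bias. It is illustrative only and is not the authors' model. The function names, the two-option coding ("left"/"right"), and all parameter values are assumptions; a cue-specific version would keep one pair of values per context cue.

```python
import math
import random


def softmax_choice_prob(q_left, q_right, beta, bias):
    """Probability of choosing 'left'.

    beta is the inverse temperature (preference strength);
    bias > 0 favours the left side regardless of learned values.
    """
    logit = beta * (q_left - q_right) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_rw(rewarded_sides, alpha, beta, bias, seed=0):
    """Simulate choices of a Rescorla-Wagner learner.

    rewarded_sides: sequence of 'left'/'right' giving the rewarded side per trial.
    alpha: learning rate applied to the prediction error.
    Returns the list of choices and the final value estimates.
    """
    rng = random.Random(seed)
    q = {"left": 0.0, "right": 0.0}
    choices = []
    for correct in rewarded_sides:
        p_left = softmax_choice_prob(q["left"], q["right"], beta, bias)
        choice = "left" if rng.random() < p_left else "right"
        reward = 1.0 if choice == correct else 0.0
        # Prediction-error update of the chosen option only.
        q[choice] += alpha * (reward - q[choice])
        choices.append(choice)
    return choices, q


# Hypothetical schedule: rewarded side reverses halfway through.
schedule = ["left"] * 9 + ["right"] * 9
choices, q = simulate_rw(schedule, alpha=0.3, beta=5.0, bias=0.0, seed=1)
```

      With alpha fixed at 0 the values never change and choices are driven by the bias alone, which gives a simple baseline for asking whether a learning rate is needed to explain the data.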

    3. Reviewer #2 (Public review):

      Summary:

      This paper examines infants' learning in a novel gaze-contingent cued reversal learning task. The study provides strong evidence that infants learn in the task, and the authors characterize individual differences in learning using computational modeling. The best-fitting model of the set compared reflects learning of mappings between context cues and outcomes that do not carry over across blocks. Infants are then clustered into two groups based on model parameter estimates capturing primacy bias and reward sensitivity. These groupings exhibited differences in infant temperament and other developmental measures. The modeling is rigorous, with model predictions accounting for substantial variance in infants' choices, and parameter estimates showing high recoverability. This study is important in that it demonstrates that such rigorous standards in computational modeling of behavior can be successfully deployed in infant studies.

      Strengths:

      The study provides evidence that infants exhibit cognitive flexibility within a reversal learning task and do not simply perseverate.

      The methods used within the novel gaze-contingent paradigm will be useful for other groups interested in studying learning and decision-making in infants.

      The study applies rigorous computational modeling approaches to infants' choices (inferred from gaze) and their physiological responses (i.e., pupil dilation) in the task, demonstrating that infants' reward learning is well-captured by an error-driven learning process.

      The authors conduct model comparison, posterior predictive checks, and parameter recoverability analyses and demonstrate that model parameters can be well estimated and that the model can recapitulate infant choice behavior.

      Physiological pupil dilation measures that correlate with prediction error signals from the model further validate the model as capturing the learning process.

      Weaknesses:

      It is not entirely clear that the individual differences in reversal learning identified between the two clusters of infants (ostensibly reflecting differences in cognitive flexibility) have construct validity or specificity for the associated developmental abilities that differ between groups (daily living, communication, motor function, and socialization).

      Similarly, it's not clear why the paper is framed as an advance for infant computational *psychiatry* rather than simply an advance in computational modeling of infant behavior. It seems to me that a more general framing is warranted. Basic cognitive development research can also benefit from cognitive hypothesis testing via computational model comparison and precise measurement of infants' behavior in reward learning tasks. Is there reason to believe that infants' behavior in this task might have construct validity for mental health problems related to cognitive flexibility later in development? Do the Vineland or IBQ-R-VSF prospectively predict clinical symptoms?

      A large proportion of the recruited infants (14 of 55) were excluded, but few details are provided on why and when they were excluded. Did the excluded infants differ on any of the non-task measures? This information would be helpful to understand limitations in the utility of the task or the generalizability of the findings.

      It is stated that: "The infants who completed at least three trials following the reversal were included in the analysis, as it is more likely that their expectations were violated in this interval." Are three trials post-reversal sufficient to obtain reliable estimates of model parameters? More details should be provided on the number of trials completed for all of the included/excluded infants.

    4. Reviewer #3 (Public review):

      This paper used computational modeling of infants' performance in a reversal learning paradigm to identify two subgroups of infants: one that initially learned a bit faster but then perseverated more and failed to switch after the reversal (yellow cluster), and one that sampled more before the switch but then perseverated less/switched better (magenta cluster - though see below for comments about infants' overall weak performance). The authors describe magenta babies as showing a profile of greater cognitive flexibility, which they note in adults is linked to better outcomes and a lower incidence of psychiatric disorder. Indeed, the yellow cluster scored less well on several scales of the Vineland and showed lower surgency on the IBQ than the magenta cluster. The authors argue that this paper paves the way for the field of "infant computational neuropsychiatry."

      In general, I think this is a fun and intriguing paper. That said, I have a number of concerns with how it is currently written.

      First, the role of pupil dilation in the models was really unclear -- I've read it through a few times and came away with different impressions each time. I am now pretty sure the models were only based on infants' behavioural responses (e.g., choice for the correct versus incorrect location) rather than differences in pupil size, but pupil size kept popping up throughout, and so I initially thought the clusters were based on that. The authors should clarify this so other readers are not confused. (One thing that might help is avoiding the word "behaviour" on its own, unless it is further specified as looking behaviour or not, as I assume that some would characterize pupil dilation as a behaviour as well.)

      If clusters were NOT based on pupil size (e.g., reaction to prediction error), why not? Was this attempted, and did no clusters emerge? Did the yellow and magenta group also differ in reaction to prediction error, or not? It seems like the argument that this work will be the basis of infant computational psychiatry would require that there not simply be a link between behaviour in an infant study and other measurements of their functioning - because many other papers to date have demonstrated such relationships, many longitudinally - but instead with the link to something where the neurobiology of the behaviour being studied is better understood. I assume this is why pupil dilation kept coming up, but again, it didn't actually seem to be part of the modelling unless I missed something. That is, although I think that this is a nice finding, currently I think the novelty of the finding, as well as the suggestion that it will start a whole new field, may be overblown. I certainly think the pupillometry data has promise, as does the LUMO data, which the authors alluded to being in the works. But perhaps the implications should be toned down a bit in this paper, until those data are further along.

      My final substantial comment (a few more minimal ones below) is that overall, babies did quite poorly at this task. Even after 9 post-switch trials, the magenta group was still responding at chance, and the yellow group seemed not to switch at all. Infants then all seemed to perform very well again during block 2, which makes it seem like they still had the original contingency in mind. That said, from what I could see, no data were provided about how many babies looked to the originally correct side first during Block 2. But based on the data, I assume they basically all went back to predicting on the first side, as otherwise their return to high levels of successful trials would not make sense, unless they somehow forgot the entire thing. It would be good to know for sure, and to have that data (specifically, how many babies looked to the original side again at the start of block 2) in the main paper. Given this overall lack of sensitive performance in the paradigm, even despite the cues signaling where the rewarding video would be changing completely (that is, the contingency between cue and outcome did not itself switch, the cues themselves did), it seems odd to discuss things like statistical or even skillful learning alongside these data.

    1. eLife Assessment

      This valuable study shows the impact of the metabolic state of bacteria on phage infection. The experimental results, based on various phages infecting E. coli, are solid and consistent with a two-step adsorption mathematical model, although the detailed evidence supporting this model is currently incomplete. This study should be of interest to the communities working on cell metabolism and on host-pathogen interactions.

    2. Reviewer #1 (Public review):

      In the wild, bacteria can be found in a wide range of metabolic states, including states in which they are resource-limited. Because phages heavily rely on the infected cell's molecular machinery to replicate, it is natural to wonder how phage-bacteria interactions depend on the metabolic state of the cell. In this work, Marantos et al. investigate specifically how the rate of infection of 5 different phages changes between cells grown in energy-rich conditions and cells grown in energy-depleted conditions. Their results clearly show that 4 out of the 5 phages studied display a significant reduction in infection rate in cells that are energetically depleted and provide a potential explanation for this observation by looking into the mechanisms that these phages use to irreversibly infect their host cells.

      The work also tries to explain the observation using a mathematical/mechanistic model that describes infection as the sequence of two steps, where a phage first needs to bind to a cell receptor, from which it can potentially unbind, and then irreversibly infects by injecting its genome. While the model is sensible from a mechanistic perspective, the experimental evidence that supports how each model's rate is affected by the cell metabolic state is weak, as only ratios of these rates can be inferred from the data.
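
      One way to see why only ratios are identifiable is a quasi-steady-state reduction of a generic two-step scheme. The sketch below is an assumption about the model's form, not the authors' stated equations: $k_{\mathrm{on}}$ is a hypothetical binding rate, $C$ is the reversibly bound phage-cell complex, $I$ is an irreversibly infected cell, and $k_{\mathrm{off}}$ and $k_{\mathrm{inj}}$ are the unbinding and injection rates.

```latex
\begin{align}
  P + B \;&\underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}}\; C
          \;\xrightarrow{\;k_{\mathrm{inj}}\;}\; I \\
  \frac{dC}{dt} &= k_{\mathrm{on}} P B - (k_{\mathrm{off}} + k_{\mathrm{inj}})\, C \approx 0
  \quad\Longrightarrow\quad
  C \approx \frac{k_{\mathrm{on}} P B}{k_{\mathrm{off}} + k_{\mathrm{inj}}} \\
  \frac{dP}{dt} &= -k_{\mathrm{on}} P B + k_{\mathrm{off}}\, C \approx -\eta\, B\, P,
  \qquad
  \eta = \frac{k_{\mathrm{on}}\, k_{\mathrm{inj}}}{k_{\mathrm{off}} + k_{\mathrm{inj}}}
       = \frac{k_{\mathrm{on}}}{1 + k_{\mathrm{off}}/k_{\mathrm{inj}}}
\end{align}
```

      Under these assumptions, a measured effective rate $\eta$ constrains only $k_{\mathrm{on}}$ and the ratio $k_{\mathrm{off}}/k_{\mathrm{inj}}$, so a drop in $\eta$ cannot by itself separate faster unbinding from slower injection.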

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate the dependence of phage adsorption rates on host metabolic state, using 5 coliphages that differ in their infection cycles and host receptors. They find that four of the 5 phages showed significantly reduced infection under low metabolic states, with phages that generally have weaker adsorption being more strongly affected by low metabolism. The authors complement their findings with a 2-step infection model where phages can disengage from their hosts after initial adsorption. The paper illustrates the power of standardized experimental protocols for quantitative trait comparisons and highlights the dependence of phage infection success on host physiology.

      Strengths:

      The paper is well written and clearly structured.

      The experiments are well-designed, and particularly commendable is the diligent use of control scenarios to allow for quantitative comparison between phages. This standardized protocol will be valuable for the entire phage community.

      The authors convincingly show the impact of host physiology on phage adsorption success. This dependence has so far mainly been considered for intracellular phage replication, and the paper shows that host physiology has to be taken into account at all steps of phage infection.

      Weaknesses:

      There are some concerns about the experimental setup and which conclusions can be drawn from it:

      Before phage infection, bacterial cultures are grown to exponential phase, washed, and then resuspended with glucose or arsenate-azide for 10 min. It is, however, questionable whether 10 minutes is enough to simulate high and low metabolic states realistically. 10 minutes seems to be quite short to go from exponential growth to a low metabolic state, given the transcriptional memory of previous environments. It seems more likely that the population will be quite heterogeneous, with cells in various states of transition towards low metabolic states.

      Given that arsenate and azide inhibit cellular metabolism, i.e., have antimicrobial effects, cells might not just downregulate metabolism but also activate the stress response, and this might cause some of the observed effects on phage adsorption. Therefore, the 'low metabolic state' of the cells in this paper could mean that cells are starved or that they are stressed or both.

      The abundance of receptors could change between the high and low metabolic media conditions and contribute to the observed differences in adsorption, while the authors seem to assume in their model that the initial adsorption rate always remains the same.

    4. Reviewer #3 (Public review):

      Summary:

      Marantos et al. showed that for some coliphages, the energetic state of the bacterial host cell has a strong impact on whether phage infection is initiated. The authors drew this conclusion from the observation that there are more free phages remaining in the medium after infection of arsenate-azide-treated cells as compared to after infection of untreated cells. These data were analyzed and reported both as ratios of the treated vs. untreated conditions and using a mass-action kinetic model of phage-cell collision in the infection mixture. The data supported the findings that for four phages infecting Escherichia coli bacteria, namely, phages λ, 𝜙80, M13, and T6, the phages are less likely to initiate infection if the host bacteria are energy-depleted. However, for phage T5, the authors found that its infection propensity is not impacted.

      Strengths:

      The data presented by the authors clearly supported the principal conclusion of the study ("Viral commitment to infection depends on host metabolism"). The five phages chosen by the authors represent different viral lifestyles and infection mechanisms, highlighting the potential applicability to other Escherichia coli phages. Finally, the authors successfully used a classic mass-action model of phage-cell collision to interpret their data. The simplicity of their experimental assay, combined with the use of this mathematical model, offers other investigators who study phage-bacterial interactions in other contexts a potentially useful toolkit to examine infection in general, and specifically, the dependence of phage infection on the host's metabolic state.

      Weaknesses:

      (1) The authors isolated and measured the numbers of free phages in the medium after infection of bacteria under different treatments. These measurements were analyzed in two different ways: (1) simply as ratios (corrected/normalized using different controls), and (2) fitted using a simple mathematical model. I have concerns regarding both analyses.

      1.1) For the first method, having different time points at which the sample of each phage is collected critically complicates data interpretation. As one incubates the phage-bacteria mixture for a longer time, more infection occurs, and the number of phages collected from the mixture decreases. Therefore, the different incubation times defeat the goal of "a systematic and quantitative comparison across different phages [...]" (line 81), as the authors themselves acknowledged. Conceivably, the authors could have used the shortest measurement time for all phages (i.e., 10 minutes, as for phage λ). Alternatively, the authors could have applied a systematic criterion such as half (or any other fraction) of the latent period of each phage, which would still "maximize the incubation period while ensuring that manipulations were completed before the first infection cycle concluded" (lines 126-127). In my view, the seemingly arbitrary measurement time for each phage renders the entire first analysis very challenging to interpret. It also goes against the authors' proposition that the protocol was "standardized" (line 92) or "consistent" (line 200). It is not clear what the readers are supposed to take away from this first analysis, or rather, which evidence, finding, or conclusion the manuscript would lose if the authors only presented the modeling-based analysis.

      1.2) The second method of analysis sought to remove the dependence of the measurements on time. I completely agree with this goal, and the findings extracted from this analysis significantly contributed to the merits of this manuscript. However, the authors achieved this goal using a single time point for each phage to calculate the infection rate (η). As shown in Figure S3, each of the phage depletion curves is anchored by only one data point (note that P(t)/P(0) = 1 at t = 0 is assumed, not measured). This goes against the typical way this collision model is used in the literature, where a time series is measured and used to fit the model (e.g., DOI 10.1007/978-1-60327-164-6_18, or more recently, PMID 39700139). This practice in the current manuscript reduces the robustness of the inferred η values. This problem is exacerbated by assumptions used by the authors in formulating this model. For instance, the authors used a constant value for the bacterial concentration, B, because "bacterial growth and lysis were negligible" (lines 135-136). However, considering that the bacteria were cultured at 37 °C in a very rich medium (first in YT broth, then in 2% glucose), the measurement times of 20, 30, and 55 minutes most likely span one or a few generations of bacterial growth and division.

      Related note: I suggest that one of the panels in Figure S3 should be moved to the main text, since it is critical to the second method of analysis.

      (2) The data were able to distinguish phages that successfully infected bacteria and those that remained free in the medium, and the authors appropriately interpreted the data as such throughout the Results section. However, in the Discussion (starting from the very first sentence, line 172), the authors used terms that include "adsorption" and "entry" more interchangeably (for example, see the three sentences in lines 310-313, for "viral entry efficiency is shaped by [...]", then "adsorption kinetics modeling"). I do not see how the authors' data could distinguish between adsorption (the phage particles attaching to the outside of the cell) and entry (the phage DNA being injected into the cell). Conceivably, any phage particles that irreversibly attach to a cell but do not yet inject their genome into the cell would still be removed from the medium and therefore not quantified. Another example: in lines 189-191, the authors interpreted that "[...] when the bacterium is in a low metabolic state, the phage does not bind irreversibly to the host", but how do the authors eliminate the case of no phage binding (i.e., the reversible step) to begin with? Similarly, in lines 283-293, how do the authors delineate whether energy depletion would increase the k_off term or decrease the k_inj term, because either would result in more free phages in the medium as observed in the data? I believe that the writing of the Discussion, as it stands now, is doing a disservice to the conclusions presented in the Results section.

      (3) The authors presented an argument that performing infection of all five phages in the same condition is an advantage, allowing for comparison across different phages. While this goal is a completely valid one, it is difficult to reconcile that with the fact that different phages require different optimal conditions for successful infection. For instance, phage T5 famously requires Ca2+ for successful infection into the host bacterium (and later successful replication); see PMID 13174489. However, all infections were performed in TMG, which lacks Ca2+. Perhaps the absence of T5 dependence on the host metabolism is because the infection condition used by the authors was not optimal for T5 to begin with? Similar arguments could be made for other phages.

      (4) Whereas the manuscript examined five coliphages, only phage T5 and phage λ were discussed extensively. I believe some discussion points for these two phages need clarification.

      4.1) Phage T5: The data obtained by the authors show that the infection rate of phage T5 is not impacted by the metabolic state of the host cell. Considering that the authors used the terms "infection", "adsorption", and "entry" interchangeably to refer to the irreversible commitment of a phage to a host cell (see point 2), this discussion regarding phage T5 lacks one critical literature context: DNA entry of phage T5 is known to occur in two phases (first-step transfer and second-step transfer). Critically, the second step can only occur if phage proteins encoded by the phage DNA transferred in the first step are expressed (see PMID 10577483 and the cited papers therein). In that context, metabolic poisoning of the host bacteria should have impeded T5 infection. The authors should comment on this point.

      4.2) Phage λ: The experiment using phage λ in this current study shares many resemblances to that in Brown et al. 2022. That feature alone is not a problem, but at many places in the text, the writing is ambiguous as to whether it is discussing the results in Brown et al. 2022 or in the current manuscript. I am giving three examples below, but this is not exhaustive: (i) Lines 67-69, there is no Brown et al. 2022 reference immediately after "a mutant phage variant (λh) could bypass this dependency [...]" (not just in the previous sentence); (ii) Line 228 should clearly say "Our previous findings suggested that phage λ is capable of [...]", since it concerns Brown et al., 2022, not the current study; and (iii) Lines 245-246, there is no Brown et al., 2022 reference immediately after "we observed that a mutant variant [...] even energy-depleted host" (without a reference, it reads like the authors "observed" that finding in this current manuscript).

      Also, regarding phage λ: The discussion between line 230 and line 249 is very interesting, but since it concerns the differences between λ PaPa and Ur-λ, the authors should consider mentioning and discussing a very relevant recent study, PMCID: PMC6312755.

      (5) Control experiments, or references to prior studies, are needed to support that the As/Az treatment at this concentration and duration (at least 10 minutes) is sufficient to deplete the metabolic state of the cell. For instance, this can be shown by impeded or null cell growth, arrested motility (using a standard swimming assay), or a fluorescent reporter for the energetic state of the cell.
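
      As an illustration of point (1.2) above, the following sketch contrasts a single-time-point estimate of η with a time-series fit, and shows how bacterial growth biases an estimate that assumes constant B. It assumes the usual mass-action form P(t)/P(0) = exp(-η B t); the function names and every numerical value are hypothetical and are not taken from the manuscript.

```python
import math


def eta_single_point(frac_free, B, t):
    """Estimate eta from one measurement of P(t)/P(0), assuming constant B."""
    return -math.log(frac_free) / (B * t)


def eta_time_series(times, fracs_free, B):
    """Least-squares slope through the origin of -ln(P/P0) against B*t."""
    xs = [B * t for t in times]
    ys = [-math.log(f) for f in fracs_free]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def frac_free_with_growth(eta, B0, t, doubling_time):
    """Free-phage fraction when bacteria grow as B(t) = B0 * 2**(t / doubling_time).

    Solves dP/dt = -eta * B(t) * P, i.e. P/P0 = exp(-eta * integral of B).
    """
    integral_B = B0 * doubling_time / math.log(2) * (2 ** (t / doubling_time) - 1)
    return math.exp(-eta * integral_B)


# Hypothetical values: eta in mL/min, B0 in cells/mL, times in minutes.
eta_true, B0 = 1e-10, 1e8

# With constant B, a noiseless time series recovers eta exactly.
times = [5, 10, 20]
fracs = [math.exp(-eta_true * B0 * t) for t in times]
eta_fit = eta_time_series(times, fracs, B0)

# If cells double once during a 30-minute incubation, a single-point
# estimate that assumes constant B overestimates eta.
frac_30 = frac_free_with_growth(eta_true, B0, t=30, doubling_time=30)
eta_biased = eta_single_point(frac_30, B0, t=30)
bias_factor = eta_biased / eta_true
```

      In this toy case the bias factor equals 1/ln 2 (about 1.44) when the incubation lasts exactly one doubling time, and it grows with longer incubations, so phages sampled at different times would be affected to different degrees.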

    1. eLife Assessment

      Zandvoort and colleagues describe respiration-brain coupling in the context of apnoea in human newborns. The authors have addressed an important question and supported their claims with solid data. The rigor of the findings could perhaps be further strengthened with some relatively minor changes to the analysis methodology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the extent to which phase-amplitude coupling (PAC) of respiratory and electrophysiological brain activity recordings was related to episodes of life-threatening apnoea in human newborns.

      Strengths:

      I want to commend the authors for acquiring unique and illuminating data; the difficulty in recording and handling these data has to be appreciated. As far as I can tell, Zandvoort and colleagues are the first to provide robust evidence for respiration-brain coupling in newborns. Their creative use of the phase-slope index for peripheral-central interactions is innovative and credible. If proven to be robust, the authors' findings have important implications well beyond the field of brain-body research.

      Weaknesses:

      While the analyses were overall competently conducted and well-justified, I was not entirely convinced by a few methodological choices, specifically i) the computation of PAC surrogates, ii) details of the linear mixed-effects model, and iii) the electrode selection for linking phase-amplitude coupling to apnoea frequency.
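
      For readers unfamiliar with the surrogate issue, a minimal sketch of one common approach is given below: PAC quantified by the mean vector length, with a null distribution built from circular time shifts of the amplitude envelope. This is not necessarily the authors' method nor the alternative the reviewer has in mind; the function names, the minimum-shift rule, and the number of surrogates are assumptions, and the phase and amplitude series are taken as already extracted.

```python
import numpy as np


def mean_vector_length(phase, amplitude):
    """PAC strength as |mean(A * exp(i * phi))|."""
    return np.abs(np.mean(amplitude * np.exp(1j * phase)))


def pac_zscore(phase, amplitude, n_surrogates=200, min_shift=None, seed=0):
    """Observed PAC and its z-score against circular-shift surrogates.

    phase: low-frequency (e.g. respiratory) phase in radians, one value per sample.
    amplitude: high-frequency amplitude envelope, same length as phase.
    min_shift: smallest allowed shift in samples (defaults to 10% of the record).
    """
    phase = np.asarray(phase, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    n = len(phase)
    if min_shift is None:
        min_shift = n // 10
    rng = np.random.default_rng(seed)
    observed = mean_vector_length(phase, amplitude)
    # Shifting the envelope keeps its autocorrelation but breaks its
    # temporal alignment with the phase series.
    shifts = rng.integers(min_shift, n - min_shift, size=n_surrogates)
    null = np.array(
        [mean_vector_length(phase, np.roll(amplitude, s)) for s in shifts]
    )
    return observed, (observed - null.mean()) / null.std()
```

      One caveat with this design: if the slow rhythm is almost perfectly periodic, a time shift only rotates the preferred phase and leaves the coupling magnitude largely intact, so the surrogates can be too conservative. That is one reason the choice of surrogate scheme matters for a regular signal such as breathing.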

    3. Reviewer #2 (Public review):

      Summary:

      The authors' central hypothesis was that the strength of cortico-respiratory coupling in infants is negatively associated with apnoea rate. To test this, they first investigated the existence of cortico-respiratory coupling in premature and term-born infants, the spatial localisation of the cortical activity and its relationship with the phase of the respiratory cycle, and the directionality of coupling.

      Strengths:

      The researchers used synchronised EEG and impedance pneumography to detect the phase amplitude coupling.

      They have studied a wide range of gestations, from 28 weeks to 42 weeks, including males and females. Their exclusion criteria ensured that healthy babies were studied and potential confounders of impaired respiratory activity were avoided. Their sequential approach in addressing the objectives was appropriate.

      Weaknesses:

      As a neonatal clinician and neuroscientist, I have commented based on my expertise. I have not commented on signal processing.

      I did not identify any major weaknesses in the study. Some minor weaknesses include:

      (1) Data relating to the cortical oscillations and the respiratory phase is given. However, whether this would lead to their hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate is unclear. What preceding data enabled the authors to link the strength of coupling to the rate of apnoea?

      (2) If we did not know of data showing the existence of cortico-respiratory coupling in newborn infants, then should it not be the first research question to examine?

      (3) What are the characteristics of the infants who contributed data to establish the cortico-respiratory coupling (Figures 2 and 3)?

      (4) Although it is the most plausible direction of the relationship, with neural activation driving respiratory muscle contraction, how can the authors prove this with their data? Given that they show coherence between signals, how do we know that the cortical signal precedes the respiratory muscle contraction?

      (5) Apgar score is an ordinal variable. The authors should summarise this as median (range).

    4. Reviewer #3 (Public review):

      Summary:

      This is a strong and important report that presents a framework for understanding cortical contributions to neonatal respiration. Overall, the authors successfully achieved their goal of linking cortical activity to respiratory drive. Despite the correlational nature of this study, it is a crucial step in establishing a foundation for future work to elucidate the interaction between cortical activity and breathing.

      Strengths:

      (1) The introduction and use of workflows that establish correlational relationships between breathing and brain activity.

      (2) The execution of these workflows in human neonates.

      Weaknesses:

      Interpretations related to causal inference, confounds of sleep and caffeine, and the spatial interpretation of EEG data need to be addressed to ensure that the data appropriately support the conclusions.

    5. Author response:

      We would like to thank the reviewers for their helpful comments and critique of our manuscript. We plan to make the following revisions, which will improve the clarity of our manuscript and the robustness of our findings.

      We will revise methodological details and interpretation throughout the manuscript. In particular, we will consider alternative methods for calculating surrogates. We intend to investigate the relationship between apnoea rate and phase-amplitude coupling at other electrodes as suggested by Reviewer 1, and we will revise the details of the linear mixed-effects models.

      In relation to the comments raised by both Reviewers 2 and 3, we will carefully address the wording throughout the manuscript, including addressing the order of hypotheses, our interpretation of the directionality of the relationship between cortical and respiratory activity, and the connection between cortical-respiratory coupling and apnoea. We will further clarify the limitations of our recording setup and approach, in particular the limited EEG montage, and add further details with regards to sleep state and caffeine.

    1. eLife Assessment

      This study presents valuable and compelling evidence that β-glucan-induced trained immunity can protect against intestinal inflammation by reprogramming innate immune cells toward a reparative phenotype. The authors employ a convincing combination of functional assays, adoptive transfers, and single-cell transcriptomics to uncover mechanistic insights and demonstrate the therapeutic potential of innate immune memory in IBD. While the work is robust, addressing the underlying epigenetic mechanisms and including additional controls would further reinforce the trained immunity-specific interpretation.