  1. Sep 2025
    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Recommendations for the authors):

      Line 364-370: This paragraph is not very clear to me.

      Thank you for pointing this out; we agree our point could have been made clearer. We have clarified as follows:

      “The geographic positions of species’ ranges determine the local pressures and environmental factors to which they are exposed (MacLean and Beissinger, 2017; Pacifici et al., 2020), potentially masking or confounding the effects of traits that evolved under conditions determined by range geography (Schuetz et al., 2019). This process could cause trait-related trends to differ across levels of biological organization (Srivastava et al., 2021), from local populations (where traits might be critical) to biogeographical extents (where traits might be unrelated to range or phenological shifts; Grewe et al., 2013; Gutiérrez and Wilson, 2021; Sunday et al., 2015; Zografou et al., 2021).” (Lines 370-377).

      Reviewer #3 (Recommendations for the authors):

      L313: '...higher population growth' compared to what? Does this mean that species shifting to earlier emergence really show higher population growth over time?

      Thank you for this suggestion, we have clarified as follows: “Earlier seasonal timing allows species to stay within their climatic limits and maintain population growth rates (Macgregor et al., 2019), although earlier emergence could expose individuals to early season weather extremes (McCauley et al., 2018).” (Lines 316-319).

      L336: Same here. Please refer to your comparative counterpart in such statements. Does 'plasticity may enable higher population growth' mean higher than for species shifting range or phenology or higher compared to the previous level for a given species. In many cases it seems you are referring to an overall baseline, so that the 'higher' means 'lesser decline'. Wouldn't plasticity maintain population growth at similar levels as before? The current wording suggests that plasticity results in species exceeding their previous population growth. Please rephrase.

      We agree it was confusing with no comparative counterpart, so we changed the sentence as follows: “Adaptive evolution and plasticity may enable high population growth rates in newly-colonized areas (Angert et al., 2020; Usui et al., 2023), but this possibility can only be directly tested with long term population trend data.” (Lines 341-343).

      L307: The term 'universal winners' appears too strong and not well justified given the lack of the crucial third dimension of response. In fact, changes in phenology are less indicative than abundance trends. Combined with range shifts they would tell a story of success or failing, while phenological shifts would rather help to understand how species adapted. I am not saying the insight cannot stand alone, but it is important to adapt the wording in this regard.

      Thank you for this comment, we have clarified the text as follows: “These results suggest that some species may have an advantage with respect to climate change: they demonstrate the flexibility to respond both temporally and spatially to the onset of rapid climate change.” (Lines 310-313).

      We also softened language around winners and losers on line 388: “It remains unclear if range and phenology shifts relate to trends in abundance, but our results suggest that there may be ‘winners’ and ‘losers’ under climate change (Figure 4).” (Lines 387-388).

      L326-240: I agree with line 330 that abundance trends are needed to clarify the situation of species shifting or not shifting ranges and phenology. However, this abstract should clarify that this is particularly important to understand whether non shifting species are really the 'losers'. If these species show adapted evolution or plasticity, we would expect they do not decline in abundance. Even without shifts in range or phenology they would be the 'ultimate winners' as you call it.

      Thank you for this comment, we agree that abundance trends are necessary to understand potential winners and losers. We have made this addition to the abstract as follows: “Species shifting in both space and time may be more resilient to extreme conditions, although further work integrating abundance data is needed.” (Lines 16-18).

    1. eLife assessment

      This paper provides valuable insights into the consequences of hydrogen sulfide (H2S) exposure on the behavior and physiology of the nematode C. elegans. While solid evidence supports most of the paper's findings, the evidence that H2S is detected by the nervous system to mediate behavioral avoidance is incomplete. The paper provides a wide range of intriguing observations that could serve as a foundation for future work to synthesize these disparate results or provide insight into the mechanisms of H2S detection in C. elegans.

    2. Reviewer #3 (Public review):

      Summary:

      The manuscript explores behavioral responses of C. elegans to hydrogen sulfide, which is known to exert remarkable effects on animal physiology in a range of contexts. The possibility of genetic and precise neuronal dissection of responses to H2S motivates the study of responses in C. elegans. The revised manuscript does not seem to have significantly addressed what was lacking in the initial version.

      The authors have added further characterization of possible ASJ sensing of H2S by calcium imaging, but ASJ does not appear to be directly involved. Genetic and parallel analysis of O2 and CO2 responsive pathways do not reveal further insights regarding potential mechanisms underlying H2S sensing. Gene expression analysis extends prior work. Finally, the authors have examined how H2S-evoked locomotory behavioral responses are affected in mutants with altered stress and detoxification response to H2S, most notably hif-1 and egl-9. These data, while examining locomotion, are more suggestive that observed effects on animal locomotion are secondary to altered organismal toxicity as opposed to specific behavioral responses.

      Overall, the manuscript provides a wide range of intriguing observations, but mechanistic insight or a synthesis of disparate data is lacking.

    3. Reviewer #4 (Public review):

      Summary:

      The authors establish a behavioral paradigm for avoidance of H2S and conduct a large candidate screen to identify genetic requirements. They follow up by genetically dissecting a large number of implicated pathways - insulin, TGF-beta, oxygen/HIF-1, and mitochondrial ROS, which have varied effects on H2S avoidance. They additionally assay whole-animal gene expression changes induced by varying concentrations and durations of H2S exposure.

      Strengths:

      The implicated pathways are tested extensively through mutants of multiple pathway molecules. The authors address previous reviewer concerns by directly testing the ability of ASJ to respond to H2S via calcium imaging. This allows the authors to revise their previous conclusion and determine that ASJ does not directly respond to H2S and likely does not initiate the behavioral response.

      Weaknesses:

      Despite the authors' focus on acute perception of H2S, I don't think the experiments tell us much about perception. I think they indicate pathways that modulate the behavior when disrupted, especially because most manipulations used broadly affect physiology on long timescales. For instance, genetic manipulation of ASJ signaling, oxygen sensing, HIF-1 signaling, mitochondrial function, as well as starvation are all expected to constitutively alter animal physiology, which could indirectly modulate responses to H2S. The authors rule out effects on general locomotion in some cases, but other physiological changes could relatively specifically modulate the H2S response without being involved in its perception.

      I am actually not convinced that H2S is directly perceived by the C. elegans nervous system at all. As far as I can tell, the avoidance behavior could be a response to H2S-induced tissue damage rather than the gas itself.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This paper sets out to achieve a deeper understanding of the effects of hydrogen sulfide on C. elegans behavior and physiology, with a focus on behavior, detection mechanism(s), physiological responses, and detoxification mechanisms.

      Strengths: 

      The paper takes full advantage of the experimental tractability of C. elegans, with thorough, well-designed genetic analyses.

      Some evidence suggests that H<SUB>2</SUB>S may be directly detected by the ASJ sensory neurons.  The paper provides interesting and convincing evidence for complex interactions between responses to different gaseous stimuli, particularly an antagonistic role between H<SUB>2</SUB>S and O2 detection/response.  Intriguing roles for mitochondria and iron homeostasis are identified, opening the door to future studies to better understand the roles of these components and processes. 

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      The claim that worms' behavioral responses to H<SUB>2</SUB>S are mediated by direct detection is incompletely supported. While a role for the chemosensory neuron ASJ is implicated, it remains unclear whether this reflects direct detection. Other possibilities, including indirect effects of ASJ and the guanylyl cyclase daf-11 on O2 responses, are also consistent with the authors' data. 

      We thank the reviewer for the insightful comment and agree that the role of ASJ neurons in H<SUB>2</SUB>S detection was not clear. We included new experiments and revised our text to make it clearer.

      Since our initial analyses suggest a role of ASJ neurons in H<SUB>2</SUB>S-evoked locomotory responses (Figure 2F and G), we thought that this would offer us a starting point to dissect the neuronal circuit involved in H<SUB>2</SUB>S responses. Expression of the tetanus toxin catalytic domain in ASJ, which blocks neurosecretion, inhibited H<SUB>2</SUB>S-evoked locomotory speed responses (Figure 2H), suggesting that neurosecretion from ASJ promotes H<SUB>2</SUB>S-evoked responses (Lines 162–165). We then performed calcium imaging of ASJ neurons in response to H<SUB>2</SUB>S exposure. However, while we observed CO<SUB>2</SUB>-evoked calcium transients in ASJ using GCaMP6s, we did not detect any calcium response to H<SUB>2</SUB>S under several conditions, including animals on food, off food, and with different H<SUB>2</SUB>S concentrations and exposure times (Figure 2—Figure supplement 2E and F) (Lines 166–170). Since signaling from ASJ neurons regulates developmental programs that modify sensory functions in C. elegans (Murakami et al., 2001), the involvement of ASJ neurons is not specific to H<SUB>2</SUB>S and ASJ neurons are unlikely to serve as the primary H<SUB>2</SUB>S sensor (discussed in Lines 449–458). Therefore, the exact sensory neuron, circuit and molecular triggers mediating acute H<SUB>2</SUB>S avoidance remain to be elucidated.

      Our subsequent investigation of mitochondrial components suggests that a burst of mitochondrial ROS production may be the trigger for H<SUB>2</SUB>S avoidance, as transient exposure to rotenone substantially increases baseline locomotory speed (Figure 7E) (Lines 391–396). However, to initiate avoidance behavior to H<SUB>2</SUB>S, mitochondrial ROS could potentially target multiple neurons and cellular machineries, making it challenging to pinpoint specific sites of action. Nevertheless, we agree that further dissection of the neural circuits and mitochondrial signaling in H<SUB>2</SUB>S avoidance will be important and should be explored in future studies.

      The role of H<SUB>2</SUB>S-mediated damage in behavioral responses, particularly when detoxification pathways are disrupted, remains unclear. 

      We thank the reviewer for the insightful comment and fully agree with the concern raised. The same issue was also noted by the other reviewers. We agree that decreased locomotory responses in H<SUB>2</SUB>S-sensitized animals can arise from distinct causes, either systemic toxicity or behavioral adaptation, and distinguishing between these is critical. We have included new experiments and revised the text to clarify this issue.

      Our data suggest that increased initial omega turns and a rapid loss of locomotion in hif-1 and detoxification-defective mutants including sqrd-1 and ethe-1 likely reflect an enhanced sensitivity to H<SUB>2</SUB>S toxicity due to their failure to induce appropriate adaptive responses (Figure 5D–F, Figure 5J–L, Figure 5—Figure supplement 1F–P). Supporting this, hif-1 mutants become less responsive to unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I).

      In contrast, egl-9 and SOD-deficient animals show reduced initial omega-turn and reduced speed responses (Figure 5B, Figure 7G, Figure 5—Figure supplement 1A and B, and Figure 7—Figure supplement 1F and G), although both egl-9 and sod mutants respond normally to the other stimuli before or after H<SUB>2</SUB>S exposure (Figure 5I, Figure 5—Figure supplement 1C, and Figure 7—Figure supplement 1H). Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance of otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Taken together, these findings support the view that reduced locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct mechanisms: early systemic toxicity in hif-1 and detoxification-defective mutants, versus enhanced cellular adaptation in egl-9 and SOD mutants. We have integrated the relevant information across the Results section and discussed this in Lines 485-536.

      The findings of the paper are somewhat disjointed, such that a clear picture of the relationships between H<SUB>2</SUB>S detection, detoxification mechanisms, mitochondria, and iron does not emerge from these studies. Most importantly, the relative roles of H<SUB>2</SUB>S detection and integration, vs. general and acute mitochondrial crisis, in generating behavioral responses are not convincingly resolved.  

      We thank the reviewer for this comment and agree that our presentation did not fully connect different findings into a cohesive picture. To address this, we have acquired new data, and revised the abstract, results and discussion sections to clarify two phases of H<SUB>2</SUB>S-evoked responses: an initial avoidance behavior upon H<SUB>2</SUB>S exposure, followed by a later phase of adaptation and detoxification when the escape is not successful.

      In brief, we began with the basic characterization of the H<SUB>2</SUB>S-induced locomotory speed response, followed by a candidate gene screen to identify key molecules and pathways involved in the initial speed response to H<SUB>2</SUB>S. Subsequently, we focused on three major intersecting pathways that contributed to the acute behavioral response to H<SUB>2</SUB>S. These include cGMP signaling, which led to the identification of ASJ neurons; nutrient-sensitive pathways that modulate behavioral responses to both H<SUB>2</SUB>S and CO2; and O2-sensing signaling, whose activation inhibits responses to H<SUB>2</SUB>S. However, the molecules and neurons in these pathways, including ASJ, likely play modulatory roles and are unlikely to serve as the primary H<SUB>2</SUB>S sensors. Our subsequent analysis suggests that mitochondria play a critical role in triggering avoidance behavior upon H<SUB>2</SUB>S exposure. Brief treatment with rotenone, a potent inducer of ROS, led to a marked increase in locomotory speed (Figure 7E). This suggests the possibility that a burst of ROS production triggered by toxic levels of H<SUB>2</SUB>S (Jia et al., 2020) may initiate the avoidance behavior.

      When the initial avoidance fails, H<SUB>2</SUB>S detoxification programs are induced as a long-term survival strategy. The induction of detoxification programs appears to enhance tolerance to H<SUB>2</SUB>S exposure and contributes to the gradual decrease of locomotory speed in H<SUB>2</SUB>S. We now provide a clearer picture of how different pathways modulate H<SUB>2</SUB>S detoxification and adaptation (see our responses to other comments). Briefly, mutants defective in detoxification, such as hif-1 and other detoxification-defective mutants, showed a stronger initial omega-turn response and a rapid loss of locomotion. This loss of locomotion is likely caused by early cellular toxicity, as the mutants failed to respond to other unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I). Likewise, smf-3 mutants and BP-treated animals were hypersensitive to H<SUB>2</SUB>S (Figure 6D and E, and Figure 6—Figure supplement 1G and I), likely due to impaired H<SUB>2</SUB>S detoxification under low iron conditions, as iron is a co-factor required for the activity of the H<SUB>2</SUB>S detoxification enzyme ETHE-1 (Figure 5K and Figure 5—Figure supplement 1E).

      In contrast, reduced locomotion and response in other contexts, such as in egl-9 mutants and SOD-deficient animals, reflect an H<SUB>2</SUB>S-induced adaptive mechanism rather than toxicity, as these animals remain responsive to the other stimuli after H<SUB>2</SUB>S exposure. Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance of otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Therefore, different mutants reduce their locomotory speed in response to H<SUB>2</SUB>S through distinct mechanisms. We have integrated the relevant information across the Results section and discussed this in Lines 485-536.

      Reviewer #2 (Public Review): 

      Summary: 

      H<SUB>2</SUB>S is a gas that is toxic to many animals and causes avoidance in animals such as C. elegans. The authors show that H<SUB>2</SUB>S increases the frequency of turning and the speed of locomotion. The response was shown to be modulated by a number of neurons and signaling pathways as well as by ambient oxygen concentrations. The long-term adaptation involved gene expression changes that may be related to iron homeostasis as well as the homeostasis of mitochondria. 

      Strengths: 

      Overall, the authors provide many pieces that will be important for solving how H<SUB>2</SUB>S signals through neuronal circuits to change gene expression and physiological programs. The experiments rely mostly on a behavioral assay that measures the increase of locomotion speed upon exposure to H<SUB>2</SUB>S. This assay is then combined with manipulations of environmental factors, different wild-type strains, and mutants. The mutants analyzed were obtained as candidates from the literature and from transcriptional profiling that the authors carried out in worms that were exposed to H<SUB>2</SUB>S. These studies imply several genetic signaling pathways, some neurons, and metabolism-related factors in the response to H<SUB>2</SUB>S. Hence the data provided should be useful for the field.  

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      On the other hand, many important aspects of the underlying mechanisms remain unsolved and the reader is left with many loose ends. For example, it is not clear how H<SUB>2</SUB>S is actually sensed, how sensory neurons are activated and signal to downstream circuits, and what the role of ciliated and RMG neurons is in this circuit. It remains unclear how signals lead to gene expression and physiological changes such as metabolic rewiring. Solving all this would clearly be beyond the scope of a single manuscript. Yet, the manuscript also does not focus on understanding one of these central aspects and rather is all over the place, which makes it harder to understand for readers who are not in this core field. Multiple additional methods and approaches exist to dig deeper into these mechanisms in the future, such as neuronal calcium imaging, optogenetics, and metabolic analysis. To generate a story that will be interesting to a broad readership, substantial additional experimentation would be required. Further, in the current manuscript, it is often difficult to understand the rationales of the experiments, why they were carried out, and how to place them into a context. This could be improved in terms of documentation, narration/explanation, and visualization.

      We thank the reviewer for the comment, which has also been raised by the other reviewers. We agree that our initial submission was poorly presented. We also acknowledge the fact that some aspects, such as detailed neural circuit and sensory transduction, still remain unresolved. We have now included additional experiments and revised the manuscript to clarify the logic of our experiments, provided better context for our findings, and improved both the narrative flow and data visualization to make the manuscript more accessible to readers. We now provide a clearer picture of how different pathways interact to modulate the initial avoidance response, and the H<SUB>2</SUB>S detoxification and behavioral habituation during prolonged H<SUB>2</SUB>S exposure. The following response is similar to the one for reviewer #1.

      In brief, we began with the basic characterization of the H<SUB>2</SUB>S-induced locomotory speed response, followed by a candidate gene screen to identify key molecules and pathways involved in the initial speed response to H<SUB>2</SUB>S. Subsequently, we focused on three major intersecting pathways that contributed to the acute behavioral response to H<SUB>2</SUB>S. These include cGMP signaling, which led to the identification of ASJ neurons; nutrient-sensitive pathways that modulate behavioral responses to both H<SUB>2</SUB>S and CO2; and O2-sensing signaling, whose activation inhibits responses to H<SUB>2</SUB>S. However, the molecules and neurons in these pathways, including ASJ, likely play modulatory roles and are unlikely to serve as the primary H<SUB>2</SUB>S sensors. Our subsequent analysis suggests that mitochondria play a critical role in triggering avoidance behavior upon H<SUB>2</SUB>S exposure. Brief treatment with rotenone, a potent inducer of ROS, led to a marked increase in locomotory speed (Figure 7E). This suggests the possibility that a burst of ROS production triggered by toxic levels of H<SUB>2</SUB>S (Jia et al., 2020) may initiate the avoidance behavior.

      When the initial avoidance fails, H<SUB>2</SUB>S detoxification programs are induced as a long-term survival strategy. The induction of detoxification programs appears to enhance tolerance to H<SUB>2</SUB>S exposure and contributes to the gradual decrease of locomotory speed in H<SUB>2</SUB>S. We now provide a clearer picture of how different pathways modulate H<SUB>2</SUB>S detoxification and adaptation (see our responses to other comments). Briefly, mutants defective in detoxification, such as hif-1 and other detoxification-defective mutants, showed a stronger initial omega-turn response and a rapid loss of locomotion. This loss of locomotion is likely caused by early cellular toxicity, as the mutants failed to respond to other unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I). Likewise, smf-3 mutants and BP-treated animals were hypersensitive to H<SUB>2</SUB>S (Figure 6D and E, and Figure 6—Figure supplement 1G and I), likely due to impaired H<SUB>2</SUB>S detoxification under low iron conditions, as iron is a co-factor required for the activity of the H<SUB>2</SUB>S detoxification enzyme ETHE-1 (Figure 5K and Figure 5—Figure supplement 1E).

      In contrast, reduced locomotion and response in other contexts, such as in egl-9 mutants and SOD-deficient animals, reflect an H<SUB>2</SUB>S-induced adaptive mechanism rather than toxicity, as these animals remain responsive to the other stimuli after H<SUB>2</SUB>S exposure. Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance of otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Therefore, different mutants reduce their locomotory speed in response to H<SUB>2</SUB>S through distinct mechanisms. We have integrated the relevant information across the Results section and discussed this in Lines 485-536.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript explores the behavioral responses of C. elegans to hydrogen sulfide, which is known to exert remarkable effects on animal physiology in a range of contexts. The possibility of genetic and precise neuronal dissection of responses to H<SUB>2</SUB>S motivates the study of responses in C. elegans. The manuscript is well-written in communicating the complex physiology around C. elegans behavioral responses to H<SUB>2</SUB>S and in appropriately citing prior and related relevant work. 

      There are three parts to the manuscript.

      In the first, an immediate behavioral response, an increased locomotory rate, upon exposure to H<SUB>2</SUB>S is characterized. The experimental conditions are critical, and data are obtained from exposure of animals to 150ppm H<SUB>2</SUB>S at 7% O2. The authors provide evidence that this is a chemosensory response to H<SUB>2</SUB>S, showing a requirement for genes encoding components of the cilia apparatus and implicating a role for tax-4 and daf-11. Neuron-specific rescue in the ASJ neurons suggests the ASJ neurons contribute to the response to H<SUB>2</SUB>S. One caveat is that previous work has shown that the dauer-constitutive phenotype of daf-11 mutants can be suppressed by ASJ ablation, suggesting that there may be pervasive changes in animal nervous system signaling that are ASJ-dependent in daf-11 mutants, which may indirectly alter chemosensory responses to H<SUB>2</SUB>S. More direct methods to assess whether ASJ senses H<SUB>2</SUB>S, e.g. using calcium imaging, would better assess a direct role for the ASJ neurons in a behavioral response to H<SUB>2</SUB>S. The authors also point out interesting parallels between the response to H<SUB>2</SUB>S and CO2, though they provide some genetic data separating the two responses. Importantly, the authors note that when aerotaxis (O2-sensing and movement) in the presence of bacterial food is intact, as in npr-1 215F animals, the response to H<SUB>2</SUB>S is abrogated. Mutation in gcy-35 in the npr-1 215F background restores the H<SUB>2</SUB>S chemosensory response.

      There is a second part of the paper that conducts transcriptional profiling of the response to H<SUB>2</SUB>S that corroborates and extends prior work in this area. 

      The final part of the paper is the most intriguing, but for me, also the most problematic. The authors examine how H<SUB>2</SUB>S-evoked locomotory behavioral responses are affected in mutants defective in the stress and detoxification response to H<SUB>2</SUB>S, most notably hif-1. Prior genetic studies have established the pathways leading to HIF-1 activation/stabilization, as well as potential downstream mechanisms. The authors conduct logical genetic analysis to complement studies of the hif-1 mutant and in part motivated by their transcriptional profiling studies, examine the role of iron sequestration/free iron in the locomotory response to H<SUB>2</SUB>S, and further speculate on how the behavior of mutants defective in mitochondrial function might be affected by exposure to H<SUB>2</SUB>S. 

      In some regard, this part of the manuscript is interesting because the analysis begins to connect how the behavior of an animal to a toxic compound is affected by mutations that affect sensitivity to the toxic compound. However, what is unclear is what is being studied at this point. Given that H<SUB>2</SUB>S at 150ppm is known to be lethal, its addition to mutants clearly sensitized to its effects would be anticipated to have pervasive effects on animal physiology and nervous system function. The authors note that the continued increased locomotion of wild-type animals upon H<SUB>2</SUB>S exposure might be due to the byproducts of detoxification or the detrimental effects of H<SUB>2</SUB>S. The latter explanation seems much more likely, in which case what one may be observing is the effects of general animal sickness, or even a bit more specifically, neuronal dysfunction in the presence of a toxic compound, on locomotion. As such, what is unclear is what conclusions can be taken away from this part of the work.

      Strengths: 

      (1) Characterization of a motor behavior response to H<SUB>2</SUB>S 

      (2) Transcriptional profiling of the response to H<SUB>2</SUB>S corroborating prior work.  

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      Unclear significance and experimental challenges regarding the study of locomotory responses of animals sensitized to the toxic effects of H<SUB>2</SUB>S under exposure to H<SUB>2</SUB>S.

      We thank the reviewer for the comment, which has also been raised by the other reviewers. We agree that our initial submission left several important questions open, and we acknowledge the fact that some aspects, such as detailed neural circuit and sensory transduction, still remain unresolved. Nevertheless, we acquired new data and revised our text, aiming to clarify the distinct mechanisms underlying the reduced locomotion in different mutants during prolonged H<SUB>2</SUB>S exposure.

      Our data suggest that increased initial omega turns and a rapid loss of locomotion in hif-1 and detoxification-defective mutants including sqrd-1 and ethe-1 likely reflect an enhanced sensitivity to H<SUB>2</SUB>S toxicity due to their failure to induce appropriate adaptive responses (Figure 5D–F, Figure 5J–L, Figure 5—Figure supplement 1F–P). Supporting this, hif-1 mutants become less responsive to unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I).

      In contrast, egl-9 and SOD-deficient animals show reduced initial reorientation and reduced speed responses (Figure 5B, Figure 7G, Figure 5—Figure supplement 1A and B, and Figure 7—Figure supplement 1F and G), although both egl-9 and sod mutants respond normally to the other stimuli before or after H<SUB>2</SUB>S exposure (Figure 5I, Figure 5—Figure supplement 1C, and Figure 7—Figure supplement 1H). Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance of otherwise toxic H<SUB>2</SUB>S levels. Similarly, constantly high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Taken together, these findings support the view that reduced locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct mechanisms: early systemic toxicity in hif-1 and detoxification-defective mutants, versus enhanced cellular adaptation in egl-9 and SOD mutants. We have integrated the relevant information across the Results section and discussed this in Lines 485-536.

      Reviewer #1 (Recommendations For The Authors): 

      To better substantiate a role for H<SUB>2</SUB>S detection, it would be useful for the authors to image Ca responses to H<SUB>2</SUB>S in ASJ in WT and unc-13, and to rule out the possibility that the requirement for daf-11 in ASJ reflects a role in O2 rather than H<SUB>2</SUB>S detection. 

      We thank the reviewer for this comment. As suggested, we performed calcium imaging of ASJ neurons using GCaMP6s. As previously described, 3% CO<SUB>2</SUB> evoked a calcium transient in ASJ (Figure 2—figure supplement 2F). To investigate whether H<SUB>2</SUB>S evoked a calcium transient in ASJ neurons, we tested several conditions, including animals on food or off food, different H<SUB>2</SUB>S concentrations (~75 or ~150 ppm), and different exposure times (4 or 8 min). However, we did not detect a calcium response to H<SUB>2</SUB>S in ASJ under any of the conditions tested (Figure 2—figure supplement 2E) (Lines 166–168). Given that neuron-specific rescue of daf-11 or tax-4 mutants pointed to a role of ASJ neurons in promoting H<SUB>2</SUB>S responses, we sought to determine how ASJ neurons were involved. Expression of the tetanus toxin catalytic domain in ASJ neurons, which blocks neurosecretion, inhibited H<SUB>2</SUB>S-evoked locomotory speed responses (Figure 2H), similar to the phenotypes observed in daf-11 and daf-7 mutants (Figure 2C and D) (Lines 162–165). These results confirm that ASJ activity and neurosecretion contribute to the H<SUB>2</SUB>S responses, although ASJ is unlikely to serve as the primary H<SUB>2</SUB>S sensor. One potential explanation is that DAF-7 released by ASJ controls the starvation program, which in turn modulates the animal’s response to H<SUB>2</SUB>S. We also discussed this in Lines 449–458.

      The paper would be significantly strengthened by testing the possibility (as the authors acknowledge in lines 348-52) that disruption of detoxification mechanisms reduces sustained behavioral responses to H<SUB>2</SUB>S because of physiological damage. Authors use acute exposure to high O2 for this purpose earlier in the paper, but not to probe the consequences of loss of hif-1 and detoxification factors.  

      We thank the reviewer for the valuable suggestion. As the reviewer highlighted, we attributed the brief locomotory speed responses to H<SUB>2</SUB>S observed in hif-1 mutants to the lack of a detoxification response, leading to the rapid intoxication of the animals. Several lines of evidence support this conclusion. First, we observed that hif-1 and the detoxification mutants displayed a stronger initial reorientation response (omega turns) and a more rapid decline in speed and reversals compared to wild type (Figure 5D–F). Second, to test if hif-1 mutants were indeed more susceptible to H<SUB>2</SUB>S toxicity, we exposed WT and hif-1 animals to H<SUB>2</SUB>S for 30 min and subsequently tested their ability to respond to near-UV light. Unlike in WT animals, the speed response to near-UV light was inhibited in hif-1 mutants (Figure 5I), suggesting that exposure to H<SUB>2</SUB>S for 30 min causes stronger toxicity in animals deficient in HIF-1 signaling. Third, hif-1 and detoxification mutants displayed a sustained high speed in response to 1% O<SUB>2</SUB>, suggesting a specific impairment of the H<SUB>2</SUB>S response. The data were presented in Lines 318–347, and were further discussed in Lines 485–508.

      To better understand whether mitochondrial damage has a role in H<SUB>2</SUB>S-evoked behavior, it might be useful for the authors to determine whether general ROS response pathways are important for H<SUB>2</SUB>S behavioral responses.

      We thank the reviewer for this insightful comment. As suggested, we investigated whether ROS detoxification pathways contribute to H<SUB>2</SUB>S-evoked locomotory speed responses by analyzing mutants in the superoxide dismutase (SOD) family. These experiments, together with other observations, suggest that mitochondrial ROS play a dual role in H<SUB>2</SUB>S-evoked locomotion. The relevant results were presented in Lines 401–425, and were further discussed in Lines 509–536.

      First, we found that increased mitochondrial ROS formation, either induced pharmacologically by rotenone or genetically in mitochondrial electron transport chain (ETC) mutants (Ishii et al., 2013; Ochi et al., 2016; Ramsay & Singer, 1992; Yang & Hekimi, 2010; Zorov, Juhaszova, & Sollott, 2014), suppressed the behavioral response to toxic H<SUB>2</SUB>S (Figure 7A–E). This indicates that mitochondrial ROS plays a significant role in H<SUB>2</SUB>S-evoked responses. One likely explanation is that high ROS formation may dampen the H<SUB>2</SUB>S-triggered ROS spike, or may impair other H<SUB>2</SUB>S signaling processes required to initiate avoidance. Second, consistent with previous reports (Onukwufor et al., 2022), we observed that short-term rotenone exposure (<1 hour) significantly increased baseline locomotory speed. Given that toxic H<SUB>2</SUB>S levels promote ROS formation (Jia et al., 2020), our findings suggest that acute mitochondrial ROS production by toxic levels of H<SUB>2</SUB>S exposure may serve as a trigger for the avoidance response.

      In contrast, animals with sustained mitochondrial ROS production do not have an increased baseline locomotory speed. This effect was observed after 2 hours of rotenone exposure, in mitochondrial ETC mutants, and in animals lacking all SOD enzymes (Figure 7A–K). A likely explanation for the reduced basal locomotory speed during sustained mitochondrial ROS production is the activation of ROS-responsive signaling pathways including HIF-1, NRF2/SKN-1, and DAF-16/FOXO (Lennicke & Cocheme, 2021; Patten, Germain, Kelly, & Slack, 2010), which may promote adaptation to prolonged oxidative stress (Figure 7H). Notably, unlike hif-1 mutants, SOD-deficient animals remained as responsive as WT to other stimuli after 30 minutes of H<SUB>2</SUB>S exposure (Figure 7—figure supplement 1H), indicating that elevated ROS levels do not compromise overall viability or the ability to detoxify H<SUB>2</SUB>S.

      Taken together, these results support a model in which mitochondrial ROS exerts a biphasic effect on H<SUB>2</SUB>S-induced avoidance. It enhances detection and avoidance under acute stress but contributes to locomotory suppression when ROS levels remain elevated chronically.

      Reviewer #2 (Recommendations For The Authors):

      The way the manuscript is presented could be improved without much effort by rewriting/editing. For the reader, it is hard at present to understand the rationales of the experiments, why they were carried out, and how to place them into a context. This could be improved on three levels:

      (1) Documentation 

      (2) Narration/Explanation 

      (3) Visualization 

      (1) Documentation

      Not all of the results in the text are well documented. The results should be described with more details in the written text and improved documentation and quantification of the results. Example: 

      Turning behavior is mentioned as an important aspect of the response to H<SUB>2</SUB>S. There is no citation given but this effect is not well documented. The authors image the animals and could provide video footage of the effect, could quantify eg turning/pirouettes, and provide the data. At the moment the manuscript largely relies on measuring the increase in speed, but the reader is left wondering what other behavioral effects occur and how this is altered in all of the mutant and other conditions tested. Just quantifying speed reduces the readout and seems like an oversimplification to characterize the behavioral response.  

      We are grateful for this comment. We now provide video footage of the H<SUB>2</SUB>S effects (Figure 1—Video 1). As suggested, we analyzed the recordings to extract reorientation (omega-turns) and reversals. These analyses are now included in Supplementary File 1, with representative panels displayed in Figure 5 and in the supplements to Figures 2, 3, 5, 6 and 7. Even though the mutant effects on omega-turns were often subtle, and reversal responses showed considerable variability (likely due to differences in population density, food availability, or animals’ physiological state prior to the assay), this analysis has proven valuable for distinguishing mutants that exhibit adaptation from those that display hypersensitivity to H<SUB>2</SUB>S toxicity. For instance, although both SOD-deficient and BP-treated animals failed to increase their locomotory speed in H<SUB>2</SUB>S (Figure 6E and Figure 7G), they exhibited distinct omega-turn responses (Figure 6—figure supplement 1I and Figure 7—figure supplement 1F), suggesting that different mechanisms likely underlie the locomotory defects of these two groups of animals. We have integrated the omega-turn and reversal data into the text and discussed them in the relevant contexts.

      (2) Narration/Description.

      Generally, the description of the results part is very brief and it is often not clear why a certain experiment was carried out and how. Surely it is possible to check the methods but this interrupts the flow of reading and it would be easier for the reader to be guided through the results with more information what the initial motivation for an experiment is, what the general experimental outline is, and what specific experiments are carried out. 

      We apologize for the lack of clarity and logical structure in the initial submission. In the revised manuscript, we have thoroughly revised the text to improve its organization and readability.

      Examples: 

      Line 97ff: The authors performed a candidate screen yet it is not described why which genes were chosen. Are there also pathways that were tested that turned out to not be involved? 

      We thank the reviewer for the suggestion. To address this, we have added a new section explaining the rationale for selecting genes and pathways in our candidate screen. Briefly, we focused on genes known or predicted to be involved in sensory responses to gaseous stimuli in C. elegans and mammals, including globins and guanylate cyclases (21% O<SUB>2</SUB> sensing), potassium channels (acute hypoxia), and nutrient-sensitive pathways (CO<SUB>2</SUB> responses). We also included mutants defective in sensory signal transduction and neurotransmission. In addition, mitochondrial mutants were analyzed because mitochondria play a central role in H<SUB>2</SUB>S detoxification. The pathways that contributed to the acute H<SUB>2</SUB>S response included cGMP, insulin, and TGF-β signaling, as well as mitochondrial components. In contrast, globins, potassium channels, and biogenic amine signaling did not appear to play significant roles under our assay conditions. The results of the candidate screen are described in Lines 106–138 and summarized in Supplementary File 1.

      line 262ff: the paragraph starts with explaining ferritin genes that are important for iron control but the reader does not yet know why. Then it is explained that a ferritin gene is DE in the H<SUB>2</SUB>S transcriptomes. then a motivation to look into the labile iron pool is described. Why not first explain what genes are strongly regulated and why they are selected based on their DE? Then explain what is known about these genes and pathways, and then motivate a set of experiments. 

      We agree with the reviewer that our initial description could have been more logically organized. We reframed this section to first present the RNA-seq data, followed by an explanation of their known biological functions and the motivation for the subsequent experiments (Lines 350–357).

      nhr-49 appears suddenly in the results part and it is not clear why it was tested and how the result links. Is nhr-49 a key transcription factor that is activated by H<SUB>2</SUB>S sensory or physiological response, and does it control the signaling or protective changes induced by H<SUB>2</SUB>S?  

      We thank the reviewer for the comment. As suggested, we revised the text to present the information more clearly. In our candidate gene screen, a set of mutants exhibiting reduced speed responses to H<SUB>2</SUB>S had previously been shown to be defective in their response to CO<SUB>2</SUB> stimulation (Hallem & Sternberg, 2008). These included animals deficient in nutrient-sensitive pathways, including insulin, TGF-beta, and NHR-49, which were reported by Sternberg’s lab to exhibit dampened responses to CO<SUB>2</SUB> (Hallem & Sternberg, 2008) (Lines 173–179). We also included a simple cartoon to further illustrate this (Figure 3C).

      The nuclear hormone receptor NHR-49 has been implicated in a variety of stress responses, including responses to starvation (Van Gilst, Hadjivassiliou, & Yamamoto, 2005), bacterial pathogens (Naim et al., 2021; Wani et al., 2021), and hypoxia (Doering et al., 2022). The nhr-49 mutants exhibited a rapid decline in locomotory speed during H<SUB>2</SUB>S exposure, implicating NHR-49 in sustaining high speed in the presence of H<SUB>2</SUB>S. Furthermore, we observed that fmo-2, a well-characterized target gene of NHR-49, was significantly upregulated after 1 hour of exposure to 50 and 150 ppm H<SUB>2</SUB>S (Supplementary File 2), suggesting that NHR-49 signaling is rapidly activated by H<SUB>2</SUB>S exposure. Exactly how NHR-49 contributes to the H<SUB>2</SUB>S response requires further investigation.

      (3) Visualization 

      Adding a model/cartoon summary that describes the pathways tested and their interaction would be helpful in some of the figures for the reader to keep an overview of the pathways that were tested. Also, a final summary cartoon that integrates all the puzzle pieces into one larger picture would be helpful. Such a final cartoon overview could also point to the key open questions of the underlying mechanisms. 

      We thank the reviewer for this suggestion. We have added a series of models/cartoons to illustrate the different pathways and their interactions. These include starvation regulatory mechanisms (Figure 3C), 21% O<SUB>2</SUB> sensing mechanisms (Figure 3G), HIF-1 signaling and detoxification (Figure 5—figure supplement 1E), HIF-1 signaling and the regulation of labile iron (Figure 6H), as well as ROS signaling and regulation (Figure 7L). To help interpretation and to elaborate on these models, we have also included explanatory sentences in the corresponding figure legends.

      Other comments: 

      Introduction and line 93: The authors mention that 50 ppm H<SUB>2</SUB>S has beneficial effects on lifespan yet does not have a detectable phenotype." Are there any concentrations of H<SUB>2</SUB>S that cause attraction of C. elegans and what is the preferred range if it exists? Could this be measured in an H<SUB>2</SUB>S gradient? 

      We thank the reviewer for the insightful comment. We performed an H<SUB>2</SUB>S gradient assay, which suggests that wild type animals are attracted toward low concentrations of H<SUB>2</SUB>S around 40 ppm (Figure 1G and H) (Lines 95–104). These results suggest that H<SUB>2</SUB>S acts as a strong repellent for C. elegans at high concentrations but as an attractant at low levels. This dual role may be ecologically relevant, as wild C. elegans lives in complex and dynamic environments where H<SUB>2</SUB>S levels likely fluctuate over short distances (Adams, Farwell, Pack, & Bamesberger, 1979; Budde & Roth, 2011; Morra & Dick, 1991; Patange, Breen, Arsuffi, & Ruvkun, 2025; Rodriguez-Kabana, Jordan, & Hollis, 1965; Romanelli-Cedrez, Vairoletti, & Salinas, 2024).

      Line 146: "Local H<SUB>2</SUB>S concentrations could also be significantly higher in decomposing substances where wild C. elegans thrives" please provide a citation.

      As suggested, we included a set of references from early field studies that described H<SUB>2</SUB>S enrichment in the natural environment (Adams et al., 1979; Morra & Dick, 1991; Rodriguez-Kabana et al., 1965), as well as references that have discussed and implied this in C. elegans studies (Budde & Roth, 2011; Patange et al., 2025; Romanelli-Cedrez et al., 2024). They can be found in the Introduction (Lines 59–62) and in the Results (Lines 197–199).

      Line 311 "Wild C. elegans isolates thrive in the decomposing matters, where the local concentrations of O2 are low while the levels of CO2 and H<SUB>2</SUB>S could be high. These animals have adapted their behavior in such an environment, displaying increased sensitivity to high O2 exposure but dampened responses to CO2." Please provide citations for these statements.  

      As suggested, in Lines 197–199 we cited the relevant articles and books describing the variation of O<SUB>2</SUB> and CO<SUB>2</SUB> levels in decomposing matter, including several C. elegans papers that mentioned this (Bretscher, Busch, & de Bono, 2008; Gea, Barrena, Artola, & Sanchez, 2004; Hallem & Sternberg, 2008; Oshins, Michel, Louis, Richard, & Rynk, 2022), as well as the above-mentioned articles for H<SUB>2</SUB>S (Adams et al., 1979; Budde & Roth, 2011; Morra & Dick, 1991; Patange et al., 2025; Rodriguez-Kabana et al., 1965; Romanelli-Cedrez et al., 2024).

      For C. elegans’ sensitivity to O2 and CO2, these articles were cited in Lines 201–203 (Beets et al., 2020; Bretscher et al., 2008; Carrillo, Guillermin, Rengarajan, Okubo, & Hallem, 2013; Hallem & Sternberg, 2008; Kodama-Namba et al., 2013; McGrath et al., 2009).

      Reviewer #3 (Recommendations For The Authors): 

      More work could be conducted establishing the neuronal circuitry involved in the initial, tractable response to H<SUB>2</SUB>S. 

      We thank the reviewer for the insightful comment. Since our initial analyses suggest a role of ASJ neurons in H<SUB>2</SUB>S-evoked locomotory responses (Figure 2F and G), we thought that this would offer us an entry point to dissect the neuronal circuit involved in H<SUB>2</SUB>S responses. Expression of the tetanus toxin catalytic domain in ASJ, which blocks neurosecretion, inhibited H<SUB>2</SUB>S-evoked locomotory responses (Figure 2H), suggesting that neurosecretion from ASJ promotes the speed response to H<SUB>2</SUB>S (Lines 162–165). We then performed calcium imaging of ASJ neurons in response to H<SUB>2</SUB>S exposure. However, while we observed CO<SUB>2</SUB>-evoked calcium transients in ASJ using GCaMP6s, we did not detect any calcium response to H<SUB>2</SUB>S under several conditions, including animals on food or off food, and with different H<SUB>2</SUB>S concentrations and exposure times (Figure 2—figure supplement 2E and 2F) (Lines 166–168). Since signaling from ASJ neurons regulates developmental programs that modify sensory functions in C. elegans, including CO<SUB>2</SUB> and O<SUB>2</SUB> responses (Murakami, Koga, & Ohshima, 2001), the involvement of ASJ neurons is not specific to H<SUB>2</SUB>S responses and ASJ neurons are unlikely to serve as a primary H<SUB>2</SUB>S sensor (discussed in Lines 449–458). Therefore, the exact sensory neuron, circuit and molecular triggers mediating acute H<SUB>2</SUB>S avoidance behavior remain to be elucidated.

      Our subsequent investigation of mitochondrial components suggests that a burst of mitochondrial ROS production may be the trigger for H<SUB>2</SUB>S avoidance, as transient exposure to rotenone substantially increases baseline locomotory activity (Figure 7E) (Lines 391–396). However, mitochondrial ROS could potentially target multiple neurons and cellular machineries to initiate avoidance behavior to H<SUB>2</SUB>S, making it challenging to pinpoint specific sites of action. Nevertheless, we agree that further dissection of the neural circuits and mitochondrial signaling in H<SUB>2</SUB>S avoidance will be important and should be explored in future studies. We discussed this in Lines 509–536.

      I am not sure how to overcome the challenges involved in reaching conclusions from the decreased locomotory responses of animals that are sensitized to the effects of H<SUB>2</SUB>S. Perhaps this conundrum could be discussed in more detail in the text. 

      We thank the reviewer for this important comment. We agree that decreased locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct causes, either systemic toxicity or adaptation, and distinguishing between these is critical. We have included new experiments and revised the text to clarify this issue.

      Our data suggest that increased initial omega turns and a rapid loss of locomotion in hif-1 and detoxification-defective mutants including sqrd-1 and ethe-1 likely reflect an enhanced sensitivity to H<SUB>2</SUB>S toxicity due to their failure to induce appropriate adaptive responses (Figure 5D–F, Figure 5J–L, Figure 5—figure supplement 1F–P). Supporting this, hif-1 mutants become less responsive to unrelated stimuli (near-UV light) after 30 minutes of H<SUB>2</SUB>S exposure (Figure 5I).

      In contrast, egl-9 and SOD-deficient animals show reduced initial reorientation and reduced speed responses (Figure 5B, Figure 7G, Figure 5—figure supplement 1A and B, and Figure 7—figure supplement 1F and G), although both egl-9 and sod mutants respond normally to other stimuli before or after H<SUB>2</SUB>S exposure (Figure 5I, Figure 5—figure supplement 1C, and Figure 7—figure supplement 1H). Since disrupting egl-9 stabilizes HIF-1 and upregulates the expression of numerous genes involved in cellular defense against H<SUB>2</SUB>S toxicity, the enhanced detoxification capacity in egl-9 mutants likely increases animals’ tolerance to H<SUB>2</SUB>S, thereby reducing avoidance of otherwise toxic H<SUB>2</SUB>S levels. Similarly, persistently high ROS in SOD-deficient animals activates a variety of stress-responsive signaling pathways, including HIF-1, NRF2/SKN-1 and DAF-16/FOXO signaling (Lennicke & Cocheme, 2021; Patten et al., 2010), facilitating cellular adaptation to redox stress and reducing animals’ responsiveness to toxic H<SUB>2</SUB>S levels. Taken together, these findings support the view that reduced locomotory speed during H<SUB>2</SUB>S exposure can arise from distinct mechanisms: early systemic toxicity in hif-1 and detoxification-defective mutants, versus enhanced cellular adaptation in egl-9 and SOD mutants. We have integrated the relevant information across the Results section and discussed this in Lines 485–536.

      References

      Adams, D. F., Farwell, S. O., Pack, M. R., & Bamesberger, W. L. (1979). Preliminary Measurements of Biogenic Sulfur-Containing Gas Emissions from Soils. Journal of the Air Pollution Control Association, 29(4), 380-383. doi:10.1080/00022470.1979.10470805

      Beets, I., Zhang, G., Fenk, L. A., Chen, C., Nelson, G. M., Felix, M. A., & de Bono, M. (2020). Natural Variation in a Dendritic Scaffold Protein Remodels Experience-Dependent Plasticity by Altering Neuropeptide Expression. Neuron, 105(1), 106-121 e110. doi:10.1016/j.neuron.2019.10.001

      Bretscher, A. J., Busch, K. E., & de Bono, M. (2008). A carbon dioxide avoidance behavior is integrated with responses to ambient oxygen and food in Caenorhabditis elegans. Proc Natl Acad Sci U S A, 105(23), 8044-8049. doi:10.1073/pnas.0707607105

      Budde, M. W., & Roth, M. B. (2011). The response of Caenorhabditis elegans to hydrogen sulfide and hydrogen cyanide. Genetics, 189(2), 521-532. doi:10.1534/genetics.111.129841

      Carrillo, M. A., Guillermin, M. L., Rengarajan, S., Okubo, R. P., & Hallem, E. A. (2013). O2-Sensing Neurons Control CO2 Response in C. elegans. Journal of Neuroscience, 33(23), 9675-9683. doi:10.1523/Jneurosci.4541-12.2013

      Doering, K. R. S., Cheng, X., Milburn, L., Ratnappan, R., Ghazi, A., Miller, D. L., & Taubert, S. (2022). Nuclear hormone receptor NHR-49 acts in parallel with HIF-1 to promote hypoxia adaptation in Caenorhabditis elegans. Elife, 11. doi:10.7554/eLife.67911

      Gea, T., Barrena, R., Artola, A., & Sanchez, A. (2004). Monitoring the biological activity of the composting process: Oxygen uptake rate (OUR), respirometric index (RI), and respiratory quotient (RQ). Biotechnol Bioeng, 88(4), 520-527. doi:10.1002/bit.20281

      Hallem, E. A., & Sternberg, P. W. (2008). Acute carbon dioxide avoidance in Caenorhabditis elegans. Proc Natl Acad Sci U S A, 105(23), 8038-8043. doi:10.1073/pnas.0707469105

      Ishii, T., Miyazawa, M., Onouchi, H., Yasuda, K., Hartman, P. S., & Ishii, N. (2013). Model animals for the study of oxidative stress from complex II. Biochim Biophys Acta, 1827(5), 588-597. doi:10.1016/j.bbabio.2012.10.016

      Jia, J., Wang, Z., Zhang, M., Huang, C., Song, Y., Xu, F., . . . Cheng, J. (2020). SQR mediates therapeutic effects of H(2)S by targeting mitochondrial electron transport to induce mitochondrial uncoupling. Sci Adv, 6(35), eaaz5752. doi:10.1126/sciadv.aaz5752  

      Kodama-Namba, E., Fenk, L. A., Bretscher, A. J., Gross, E., Busch, K. E., & de Bono, M. (2013). Cross-modulation of homeostatic responses to temperature, oxygen and carbon dioxide in C. elegans. PLoS Genet, 9(12), e1004011. doi:10.1371/journal.pgen.1004011

      Lennicke, C., & Cocheme, H. M. (2021). Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell, 81(18), 3691-3707. doi:10.1016/j.molcel.2021.08.018

      McGrath, P. T., Rockman, M. V., Zimmer, M., Jang, H., Macosko, E. Z., Kruglyak, L., & Bargmann, C. I. (2009). Quantitative mapping of a digenic behavioral trait implicates globin variation in C. elegans sensory behaviors. Neuron, 61(5), 692-699. doi:10.1016/j.neuron.2009.02.012

      Morra, M. J., & Dick, W. A. (1991). Mechanisms of H(2)S production from cysteine and cystine by microorganisms isolated from soil by selective enrichment. Appl Environ Microbiol, 57(5), 1413-1417. doi:10.1128/aem.57.5.1413-1417.1991

      Murakami, M., Koga, M., & Ohshima, Y. (2001). DAF-7/TGF-beta expression required for the normal larval development in C. elegans is controlled by a presumed guanylyl cyclase DAF-11. Mechanisms of Development, 109(1), 27-35. doi:10.1016/S0925-4773(01)00507-X

      Naim, N., Amrit, F. R. G., Ratnappan, R., DelBuono, N., Loose, J. A., & Ghazi, A. (2021). Cell nonautonomous roles of NHR-49 in promoting longevity and innate immunity. Aging Cell, 20(7), e13413. doi:10.1111/acel.13413

      Ochi, R., Dhagia, V., Lakhkar, A., Patel, D., Wolin, M. S., & Gupte, S. A. (2016). Rotenone-stimulated superoxide release from mitochondrial complex I acutely augments L-type Ca2+ current in A7r5 aortic smooth muscle cells. Am J Physiol Heart Circ Physiol, 310(9), H1118-1128. doi:10.1152/ajpheart.00889.2015  

      Onukwufor, J. O., Farooqi, M. A., Vodickova, A., Koren, S. A., Baldzizhar, A., Berry, B. J., . . . Wojtovich, A. P. (2022). A reversible mitochondrial complex I thiol switch mediates hypoxic avoidance behavior in C. elegans. Nat Commun, 13(1), 2403. doi:10.1038/s41467-022-30169-y

      Oshins, C., Michel, F., Louis, P., Richard, T. L., & Rynk, R. (2022). Chapter 3 - The composting process. In R. Rynk (Ed.), The Composting Handbook (pp. 51-101): Academic Press.  

      Patange, O., Breen, P., Arsuffi, G., & Ruvkun, G. (2025). Hydrogen sulfide mediates the interaction between C. elegans and Actinobacteria from its natural microbial environment. Cell Reports, 44(1), 115170. doi:10.1016/j.celrep.2024.115170

      Patten, D. A., Germain, M., Kelly, M. A., & Slack, R. S. (2010). Reactive oxygen species: stuck in the middle of neurodegeneration. J Alzheimers Dis, 20 Suppl 2, S357-367. doi:10.3233/JAD-2010-100498

      Ramsay, R. R., & Singer, T. P. (1992). Relation of superoxide generation and lipid peroxidation to the inhibition of NADH-Q oxidoreductase by rotenone, piericidin A, and MPP+. Biochem Biophys Res Commun, 189(1), 47-52. doi:10.1016/0006-291x(92)91523-s

      Rodriguez-Kabana, R., Jordan, J. W., & Hollis, J. P. (1965). Nematodes: Biological Control in Rice Fields: Role of Hydrogen Sulfide. Science, 148(3669), 524-526. doi:10.1126/science.148.3669.524

      Romanelli-Cedrez, L., Vairoletti, F., & Salinas, G. (2024). Rhodoquinone-dependent electron transport chain is essential for Caenorhabditis elegans survival in hydrogen sulfide environments. J Biol Chem, 300(9), 107708. doi:10.1016/j.jbc.2024.107708

      Van Gilst, M. R., Hadjivassiliou, H., & Yamamoto, K. R. (2005). A Caenorhabditis elegans nutrient response system partially dependent on nuclear receptor NHR-49. Proc Natl Acad Sci U S A, 102(38), 13496-13501. doi:10.1073/pnas.0506234102

      Wani, K. A., Goswamy, D., Taubert, S., Ratnappan, R., Ghazi, A., & Irazoqui, J. E. (2021). NHR-49/PPAR-α and HLH-30/TFEB cooperate for host defense via a flavin-containing monooxygenase. Elife, 10, e62775. doi:10.7554/eLife.62775

      Yang, W., & Hekimi, S. (2010). A mitochondrial superoxide signal triggers increased longevity in Caenorhabditis elegans. PLoS Biol, 8(12), e1000556. doi:10.1371/journal.pbio.1000556

      Zorov, D. B., Juhaszova, M., & Sollott, S. J. (2014). Mitochondrial reactive oxygen species (ROS) and ROS-induced ROS release. Physiol Rev, 94(3), 909-950. doi:10.1152/physrev.00026.2013

    1. eLife Assessment

      This study presents significant and novel insights into the roles of zinc in mammalian meiosis/fertilization events. These findings are useful to our understanding of these processes. The evidence presented is solid, with experiments being well-designed, carefully described, and interpreted with appropriate rigor. The main limitation of lack of mechanistic insight needs to be acknowledged.

    2. Reviewer #1 (Public review):

      The revised manuscript addresses several reviewer concerns, and the study continues to provide useful insights into how ZIP10 regulates zinc homeostasis and zinc sparks during fertilization in mice. The authors have improved the clarity of the figures, shifted emphasis in the abstract more clearly to ZIP10, and added brief discussion of ZIP6/ZIP10 interactions and ZIP10's role in zinc spark-calcium oscillation decoupling. However, some critical issues remain only partially addressed.

      (1) Oocyte health confound: The use of Gdf9-Cre deletes ZIP10 during oocyte growth, meaning observed defects could result from earlier disruptions in zinc signaling rather than solely from the absence of zinc sparks at fertilization. The authors acknowledge this and propose transcriptome profiling as a future direction. However, since mRNA levels often do not accurately reflect protein levels and activity in oocytes, transcriptomics may not be particularly informative in this context. Proteomic approaches that directly assess the molecular effects of ZIP10 loss seem more promising. Although current sensitivity limitations make proteomics from small oocyte samples challenging, ongoing improvements in this area may soon allow for more detailed mechanistic insights.

      (2) ZIP6 context and focus: The authors clarified the abstract to emphasize ZIP10, enhancing narrative clarity. This revision is appropriate and appreciated.

      (3) Follicular development effects: The biological consequences of ZIP6 and ZIP10 knockout during folliculogenesis are still unknown. The authors now say these effects will be studied in the future, but this still leaves a major mechanistic gap unaddressed in the current version.

      (4) Zinc spark imaging and probe limitations: The addition of calcium imaging enhances the clarity of Figure 3. However, zinc fluorescence remains inadequate, and the authors depend solely on FluoZin-3AM, a dye known for artifacts and limited ability to detect subcellular labile zinc. The suggestion that C57BL/6J mice may differ from CD1 in vesicle appearance is plausible but does not fully address concerns about probe specificity and resolution. As the authors acknowledge, future studies with more selective probes would increase confidence in both the spatial and quantitative analysis of zinc dynamics.

      (5) Mechanistic insight remains limited: The revised discussion now recognizes the lack of detailed mechanistic understanding but does not significantly expand on potential signaling pathways or downstream targets of ZIP10. The descriptive data are useful, but the inability to pinpoint how ZIP10 mediates zinc spark regulation remains a key limitation. Again, proteomic profiling would probably be more informative than transcriptomic analysis for identifying ZIP10-dependent pathways once technical barriers to low-input proteomics are overcome.

      Overall, the authors have reasonably revised and clarified key points raised by reviewers, and the manuscript now reads more clearly. However, the main limitations, the lack of mechanistic insight and the inability to distinguish between developmental and fertilization-stage roles of ZIP10, remain unresolved. These should be explicitly acknowledged when framing the conclusions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The authors investigated the role of the zinc transporter ZIP10 in regulating zinc sparks during fertilization in mice. By utilizing oocyte-specific Zip6 and Zip10 conditional knockout mice, the authors effectively demonstrate the importance of ZIP10 in zinc homeostasis, zinc spark generation, and early embryonic development. The study is overall useful as it identifies ZIP10 as an important component of oocyte processes that support embryo development, thus opening the door for further investigations. While the study provides solid evidence for the requirement of ZIP10 in the regulation of zinc sparks and zinc homeostasis, it falls short of revealing the underlying mechanism of how ZIP10 exerts this important function.

      This report is the first to clarify the role of the zinc transporter ZIP10 expressed in oocytes, which was previously unknown, and does not focus on the detailed mechanism. As you pointed out, we believe that the mechanism will also be important information in the field of fertilization and embryogenesis research, and we believe that it is necessary to consider this issue in the future.

      (1) The zinc transporters the authors are knocking out are expressed in mouse oocytes through follicular development, and the Gdf9-cre driver used means these oocytes were grown in the absence of appropriate zinc signaling. Thus, it would be difficult to assert that the lack of fertilization-associated zinc sparks is solely responsible for the failure of embryo development. Spindle morphology and other meiotic parameters do not necessarily report oocyte health, so normalcy of these features may not be a strong argument when it comes to metabolic issues.

      As you rightly observe, the results of this study do not entirely exclude the possibility that oocyte health is compromised in the absence of adequate zinc homeostasis during oocyte growth. However, evidence has been presented demonstrating that spindle formation does occur in Zip10<sup>d/d</sup> mouse oocytes (Fig. 2C), that fertilization occurs despite the absence of zinc sparks (Fig. 3 and Fig. 4A), and that some embryos develop to blastocysts (Fig. 4B). We believe that future studies should evaluate the transcriptome profile of Zip10<sup>d/d</sup> mouse oocytes.

      (2) While comparing ZIP6 and ZIP10 in the abstract provides context, focusing more on ZIP10 would improve reader comprehension, as ZIP10 is the primary focus of the study. Emphasizing the specific role of ZIP10 will help the reader grasp the core findings more clearly.

      Thank you for your valuable input. We have revised the summary to focus more on ZIP10 by removing the section in the summary that mentions ZIP6 (P.1-2 Line 34-52).

      (3) Zinc transporters ZIP6 and ZIP10 are expressed during follicular development, but the biological significance of the observation is not clearly addressed. The authors should investigate whether the ZIP6 and ZIP10 knockout affects follicular development and discuss the potential implications.

      Thank you for your valuable input. As you mentioned, we have not been able to clarify the effects of ZIP6 and ZIP10 knockout on follicle formation. However, this report clarifies the role of ZIP-mediated zinc ions in their inclusion. The effect of ZIP knockout on follicle formation will be discussed in the future.

      (4) In Figure 3, the zinc fluorescence images are unclear, making it difficult for readers to interpret the data. Including snapshot images of calcium and zinc spikes as part of the main figure would improve clarity. Moreover, adding more comparative statements and a deeper explanation of why Zip10 KO mice exhibit normal calcium oscillations but lack zinc sparks would strengthen the manuscript.

      Thank you for your suggestion. We have also added images of calcium elevation after fertilization to Fig. 3 and Fig. S3. In addition, the figure legends have been changed (P.29 Line 937-939, P.34 Line 1104-1106). As to why Zip10 KO mice show normal calcium oscillations but lack zinc sparks, as mentioned in Discussion (P. 10 Line 299-300), we speculate that the zinc ions present in Zip10d/d mouse oocytes induce Ca2+ release without compromising IP3R1 sensitivity. We also assume that the lack of zinc sparks is due to low accumulation of zinc ions in the oocytes via ZIP10, as described in Discussion (P.10 Line 300-302).

      (5) While the study identifies the role of ZIP10 in zinc spark generation, it lacks a clear mechanistic insight. The topic itself is interesting, but without providing a more detailed explanation of the underlying mechanisms, the study leaves an important gap. Further discussion on the signaling pathways potentially involved in zinc spark regulation would add depth to the findings.

      Thank you for your input. This report is the first to clarify the role of the zinc transporters ZIP6 and ZIP10 expressed in oocytes, which was previously unknown, and does not focus on the detailed mechanism. As you pointed out, we believe that the mechanism and signaling pathways will also be important information, and we believe that it is necessary to research this issue in the future.

      Reviewer #2 (Public review):

      Summary:

      In this important study, the authors examine the role of two zinc uptake transporters, Zip6 and Zip10, which are important during the maturation of oocytes, and are critical for both successful fertilization and early embryogenesis.

      Strengths:

      The authors report that oocytes from Zip10 knockout mice exhibit lower labile zinc content during oocyte maturation, decreased zinc exocytosis during fertilization, and an altered rate of blastocyst generation in fertilized eggs relative to a control strain. They do not observe these changes in their Zip6 knockout animals. The authors present clear and well-documented results from a broad range of experimental modalities in support of their conclusions.

      Thank you for your positive comments.

      Weaknesses:

      (1) The authors state that Zip10 is not expressed in the oocyte nuclei (line 252): "Furthermore, in that study, ZIP10 was detected in the nuclear/nucleolar positions of oocytes of all follicular stages (Chen et al., 2023), which we did not observe." This is not supported by Figure 1, where some Zip10 signal is apparent in the primordial, primary, and secondary follicle oocytes. This statement should be corrected.

      Thank you for pointing this out. Our results of ISH staining (Fig. 1A) and immunofluorescence staining (Fig. 1B) showed that ZIP10 was not detected at the nuclear/nucleolar location. In other words, it could not be detected there at either the mRNA or the protein level. Based on the results of ISH staining and immunofluorescence staining, we conclude that it is expressed in the plasma membrane.

      (2) Based on the FluoZin-3AM data, there appears to be less labile zinc in the Zip10d/d oocyte, eggs, and embryos; however, FluoZin-3AM has a number of well-known artifacts and does not accurately capture the localization of labile zinc pools. The patterns do not correspond to the well-documented zinc-containing cortical vesicles. Another zinc probe, such as ZinPyr-4 or ZincBY-1 should be used to visualize the zinc vesicles and confirm that there is less labile zinc in these locations as well.

      Thank you for your suggestion. Previous studies (Lisle et al., 2013, Reproduction) and our report (Kageyama et al., 2022, Animal Science Journal) have shown that it is possible to examine the presence of labile zinc ions in oocytes and embryos. In addition, mouse oocytes (embryos) reported in previous studies are from CD1 (ICR) mice, whereas our study was conducted using C57BL/6J mice. In our report (Kageyama et al., 2024, Journal of Reproduction and Development), we reported that the appearance of zinc vesicles in the oocytes observed by FluoZin-3AM staining in CD1 and C57BL/6J mice is different, and we believe that this appearance of cortical vesicles in C57BL/6J mice is not a problem. As you say, we have not used other zinc probes and will consider this in the future.

      (3) Line 268: "The results indicate that ZIP10 is mostly responsible for the uptake of zinc ions in mouse oocytes." The situation seems a bit more complicated given that the differences in labile zinc content between oocytes from the WT and Zip10d/d animals are small (only 20-30%) and that the zinc spark is diminished but still apparent at a low level in the Zip10d/d oocytes. Clearly, other factors are involved in zinc uptake at these stages. A variety of studies have suggested that Zip6 and Zip10 work together, perhaps even functioning as a heterodimer in some systems. The double KO would address this more clearly, but if it is not available, it might be more prudent to state that Zip10 plays some role in uptake of zinc in mouse oocytes while the role of Zip6 remains uncertain.

      We would like to express our gratitude for the comments received. The phenotype of double knockout mice for ZIP6 and ZIP10 will be discussed at a future date. We have also added to the text that the role of ZIP6 remains uncertain (P. 11 Line 353-354).

      (4) Zip6d/d oocytes did not have changes in labile zinc, nor did the lack of Zip6 have an impact on the zinc spark. However, Figure S1 does show a small amount of detectable Zip6 in the western blot. It is possible that this small amount could compensate for the complete lack of Zip6. Can ZIP6 be found in immunofluorescence of GV oocytes or MII eggs from the Zip6d/d animals? Additionally, it is possible that Zip6's role is only supplementary to that of Zip10. The authors should discuss this possibility. It would also be interesting to see if the Zip6/Zip10 double knockout displays greater defects compared to the Zip10 knockout when considering previous studies.

      Thank you for your input. The mice are deficient in the gene so that ZIP6 is not functional. It is our notion that the results of WB analysis are not indicative of protein structural functionality, even in cases where the ZIP6 antibody detects a small amount of protein. Since the role of ZIP6 was not elucidated in this study, we added a statement to that effect in the text (P. 11 Line 353-354). In addition, studies using ZIP6/Zip10 double knockout mice will be discussed in the future.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      We have revised the text based on the reviewer’s suggestions.

      Reviewer #1 (Recommendations for the authors):

      (1) In lines 133-136, it seems that the authors would like to aim to emphasize the lack of research on oocytes compared to other tissues and cells. However, the inclusion of unrelated contexts, such as the role of ZIP10 in cancer and skin, appears unnecessary and detracts from the focus on oocyte-specific mechanisms. Removing these unrelated sentences would help maintain clarity and relevance in the introduction.

      As you indicated, we removed the sentence that is not related to oocytes (P.4 Line 120-125): “Further, they reported that targeted disruption using Zip6- and Zip10-specific morpholino injection or antibody incubation induced alteration of the intracellular labile zinc content, spontaneous resumption of meiosis from the PI arrest and premature arrest at a telophase I-like state (Kong et al., 2014). It is clear from these reports that ZIP6 and ZIP10 are involved in zinc transport in oocytes, but the function is not elucidated.”

      (2) Ensure that all video files are properly labeled to enhance understanding.

      We have improved the video labels for clarity (Movies 1-8, Movies S1-S4).

      (3) Correct mislabeling issues, such as the one in line 209.

      Corrected as follows: Zip10<sup>d/d</sup> mouse oocytes can be fertilized but were unlikely to develop to blastocysts (P. 6-7 Line 196-197).

      (4) In Figure 4D, the amount of ZP2 appears to increase relative to actin. Including quantification would make the data more robust. Similarly, in Figure 4F, JUNO levels appear increased in Zip10 KO. Please provide quantification.

      The WB band images in Fig. 4D were quantified and the resulting graphs were added to the lower part of Fig. 4D. Furthermore, the JUNO immunofluorescence images in Figure 4F were quantified and the resulting graphs were added as Fig. S4. Figure legends and text were corrected and added.

      P. 30 Line 975-979: Expression level of β-actin serves as a protein loading control and quantified the expression level of ZP2. Molecular mass is indicated at the left. Statistical differences were calculated according to the one-way ANOVA. Different letters represent significant differences (p < 0.05).

      P. 35 Line: Fig. S4 Comparison of JUNO expression in Zip10<sup>f/f</sup> and Zip10<sup>d/d</sup> mouse MII oocytes. To measure JUNO-immunofluorescence intensity, oocytes images were selected as regions of interest (ROIs) and measured using ImageJ. Statistical differences were calculated according to Student’s t-test (p > 0.05; no significant difference).

      P.7 Line 206-209: As for the expression of JUNO, it had the same expression than between null and control oocytes (Fig. S4) and the temporal dynamics of its disappearance from the cortex after fertilization was similar for both Zip10<sup>f/f</sup> and Zip10<sup>d/d</sup> groups (Fig. 4F).

      (5) Some of the sentences lack proper references.

      The entire text was reviewed and references inserted where necessary.

      P.7 Line 221, P.7 Line 222-223, P.8 Line 253-254, P.12 Line 358-360 and P.24 Line 698-699.

      Reviewer #2 (Recommendations for the authors):

      Revisions are warranted in order to address the issues noted in the Weaknesses section of the Public Review. 

      Thank you for your comments, we have individually addressed the areas you pointed out in the Weaknesses section. The following text has also been corrected and edited.

      (1) Line 247: "In primordial follicles, the ooplasmic staining of ZIP10 we anticipate corresponds to ooplasmic vesicular sites."

      The text of P. 8 Line 230-232 was revised as follows.

      "In primordial follicles, the ooplasm staining of ZIP10 we anticipate corresponds to ooplasmic vesicular sites."

      (2) Line 926: "ZP2 was not stained in primordial follicle, but primary, secondary, and antral follicles stained. FOXL2 was observed in granulosa cells of all stage follicles. The scale bar represents 20 μm of primordial-secondary follicle and 150 μm of antral follicle." All three sentences have grammar issues that should be fixed.

      The text of P. 28 Line 908-911 was revised as follows.

      It was observed that ZP2 was not present in the primordial follicle; however, it was present in the primary, secondary and antral follicles. Furthermore, FOXL2 was observed at granulosa cells of all stage follicles. Scale bar: 20 µm (primordial, primary and secondary follicle); 150 µm (antral follicle).

    1. eLife Assessment

      This study in the Drosophila antennal lobe, which contains multiple non-equivalent sensory channels, provides valuable new insight into how early-life sensory experience can produce lasting, cell-type-specific changes in neural circuit function. The work convincingly demonstrates that glial-mediated pruning during a defined developmental window leads to persistent suppression of odor responses in one olfactory neuron type, while sparing another. The evidence is solid and supported by multiple complementary approaches, although some mechanistic interpretations remain speculative and would benefit from additional functional testing.

    2. Reviewer #1 (Public review):

      Summary:

      This study builds on earlier work showing that early-life odor exposure can trigger glial-mediated pruning of specific olfactory neuron terminals in Drosophila. Moving from indirect to direct functional imaging, the authors show that pruning during a narrow developmental window leads to long-lasting suppression of odor responses in one neuron type (Or42a) but not another (Or43b). The combination of calcium and voltage imaging with connectomic analysis is a strength, though the voltage imaging results are less straightforward to interpret and may not reflect synaptic output changes alone.

      Strengths:

      Biologically, one of the main strengths of this work is the direct comparison between two odor-responsive OSN types that differ in their long-term adaptation to early-life odor exposure. While Or42a OSNs undergo pruning and remain persistently suppressed into late adulthood, Or43b OSNs, which also respond to the same odor, show little lasting change. This contrast not only underscores the cell-type specificity of critical-period plasticity but also points to a potential role of inhibitory network architecture in determining susceptibility. The persistence of the Or42a suppression well beyond the developmental window provides compelling evidence that early glia-mediated pruning can imprint a stable, life-long functional state on selected sensory channels. By situating these functional outcomes within the context of detailed connectomic data, the study offers a framework for linking structural connectivity to long-term sensory coding stability or vulnerability.

      Weaknesses:

      The narrative begins with the absence of changes in PN dendrites and axons. While this establishes specificity, it is a relatively weak starting point compared to the novel OSN functional results. Calcium imaging with GCaMP, though widely used, is an indirect measure of synaptic function, and reduced signals could reflect changes in non-synaptic calcium influx as well as release probability. The interpretation of the voltage imaging results is also unclear: if suppression were solely due to impaired synaptic release, one might expect action potential-evoked voltage signals to remain unchanged. The reported changes raise the possibility of deficits in action potential initiation or propagation, which would shift the mechanistic explanation.

      The difference between Or42a and Or43b OSNs is attributed to varying inhibitory input densities from connectome data, but this remains speculative without functional tests such as manipulating GABA receptor expression in OSNs. In Or43b, there is essentially no strong phenotype, making it premature to ascribe the absence of suppression solely to inhibitory connectivity. Finally, the study does not connect circuit-level changes to behavioral outcomes; assays of odor-guided attraction or discrimination could place the findings in an organismal context. Some introduction material overlaps with the authors' 2024 paper, and the novelty of the present study could be signposted more clearly.

    3. Reviewer #2 (Public review):

      Recent work from the authors identified the synaptic changes and glial reaction that occur during continued exposure of a Drosophila odorant receptor neuron population to a stimulating odorant. This work markedly advanced our understanding of cellular responses during critical periods. This current Advance manuscript carries that work forward and examines the non-autonomous responses to constant odorant exposure. The authors discover that the changes to ORN populations are not accompanied by changes to either PN dendrite or PN axon volume, nor are they concurrent with changes in postsynaptic PN structures. These changes are, however, notably accompanied by changes in Ca2+ and voltage responses in ORNs. Importantly, this set of responses is specific to the Or42a ORNs (that are highly sensitive to the odorant in question, ethyl butyrate) and not the Or43b ORNs (which respond to ethyl butyrate, but not as drastically). Finally, the authors include connectomics analyses showing that Or43b and Or42a ORNs differ in their synaptic input/output relationships.

      This is an excellent use of the Advance mechanism for the journal, as these are important follow-up findings for the parent story. The non-autonomous effects (or lack thereof) on PNs are an important part of the story, as is the functional response of Or42a ORNs and the differing response of similarly (but not identically) sensitive Or43b ORNs. The experiments are well-conceived, controlled, and conducted. Where the story falters a bit, though, is with the connectomics analysis. The authors show distinct differences between Or43b and Or42a ORN input-output relationships, and suggest that those differences may underlie the differences observed in their response to ethyl butyrate exposure during the critical period. This is certainly a possibility, but as it stands now, it is too disconnected to offer significant proof. There would have to be additional experiments to address this. Right now, the inclusion of the connectomics work feels like a distraction at best, and a complete non sequitur at worst. To be clear, the connectomics work is well done and I have no issues with its validity, but it is not helpful to the central thesis of the work. I would suggest the authors either remove it entirely or strongly rethink how it fits into the paper.

      Major Concerns:

      (1) The examination of PN axon terminals in the MB and LH is interesting, but it is only one possibility. Oftentimes, the volume of neurons remains constant with perturbation, while the synapse number is affected. Figure 1C and E would be greatly helped by examining synapse number (via Brp or Brp-Short) in the PN axons.

      (2) The use of dlg1[4K] is a strong use of a new tool, but the result is surprising. The presynaptic ORN synapse number onto the PNs is notably changed, but that is not reflected in a postsynaptic PSD-95 change. That suggests a compensatory mechanism that the authors might explore. A good proportion of PN puncta should be postsynaptic to those ORNs, so why aren't they adjusted?

    1. eLife Assessment

      The authors used three genetically diverse mouse models to investigate the impact of genome diversity on metabolic disease outcomes, such as obesity and glucose tolerance. This study is important because it integrates comprehensive metabolic analyses and multi-tissue phenotyping across sexes to reveal pathways relevant to obesity and its complications; the data are convincing and uncover several pathways that advance understanding of disease etiology while suggesting potential therapeutic avenues to prevent obesity-related health risks. There are limitations: only a limited number of mouse strains were used, the 9-week feeding regime may be too short to capture full metabolic remodeling, and the mechanisms by which the immune-adipose axis impacts the broader phenotype are not fully described. Overall, the study is compelling, but the manuscript could be improved by justifying the strain selection, addressing the concern about the feeding duration, and providing stronger mechanistic support or discussion.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed an in-depth analysis of three mouse strains with different levels of susceptibility to metabolic disease. Transcriptomics analyses of relevant deep tissues revealed many strain-specific differences in response to diet. They used gene set enrichment analysis to highlight possible biological pathways that may be involved in obesity and its metabolic consequences. These results were then confirmed using public data in both mice and humans.

      Strengths:

      Overall, this is an interesting study into the biological basis of differing phenotypic outcomes in response to metabolic challenges. The findings uncover several pathways that may shed light on the etiology of obesity and the associated health risks, as well as offer potential therapeutic avenues to prevent them.

      Weaknesses:

      While the experimental design and analysis are mostly good, some aspects of the present paper could be improved.

      (1) Most results are insufficiently described. P-values are almost entirely absent in the main text. Sometimes the significance is indicated in the figures, and other times it is missing. For example, strains are sometimes described as having a higher XYZ, something that is never shown in the plots, and no p-value is ever given.

      (2) While the biological methods are meticulously described, statistical methods are barely mentioned in the methods section. For example, line 578, "multiple comparisons (...) were performed using the glht function of the multcomp package". What is this? What method does it use? And how was mediation analysis done? Line 575 mentions that models were compared, with no description of how this was done. Mentioning the package (or even function) is not sufficient. The package and function are an implementation; they are not the method. The actual method needs to be clearly mentioned and (at least minimally) described, in addition to having the reference for methods that are not ubiquitous (e.g., the Benjamini-Hochberg method is well enough established to forgo this).

      (3) The methods should also be briefly introduced in the results section before describing the results of those methods.

      (4) The role of immune signaling pathways and associated phenotypes (e.g., monocyte fraction) is over-interpreted. While the differences shown are convincing, they do not convincingly show a role in either obesity or disease. The parsimonious explanation is that such changes happen as a consequence of dyslipidemia rather than a cause. It is possible that these pathways play a more direct role in this, but the authors do not present compelling evidence of this, and, failing this, the language in the text needs to be toned down.

    3. Reviewer #2 (Public review):

      This study investigated changes in metabolic health across three genetically diverse mouse strains (NZO/HlLtJ, C57BL/6J, CAST/EiJ) that were fed either control or high-fat high-sucrose diets. The strengths of this study are the depth of metabolic phenotyping, the use of both male and female mice, and the multi-tissue metabolic analysis, including metabolic and gene expression analysis in pancreatic islets, kidney, muscle, heart, liver, and adipose tissue.

      Weaknesses include that only three mouse strains were included in this comparison, particularly given that similar comparisons have been published in the past and that the Jax lab has access to a wide range of mouse strains with diverse genetic backgrounds. Why were CAST mice included over (for example) BALB/c mice that are more commonly used in metabolic studies and are well known to show protection against diet-induced metabolic disease? Furthermore, the feeding regime was limited to 9 weeks, which may not be sufficient to evoke pronounced metabolic remodelling.

      NZO mice are well known to develop obesity. However, only approximately 50% develop type 2 diabetes and beta-cell dysfunction. How were these mice selected in the study? The results state 'Most of the male NZO mice and a few female mice displayed overt diabetes', suggesting that all mice were included irrespective of their diabetic phenotype. More information on the rationale for this is required.

      The transcriptomics data are presented in a convoluted way. As a reader, the main interest would be to determine the differences in diet-induced adaptations within each strain (e.g., why are CAST mice resistant to diet-induced metabolic defects?). However, the way Figure 4 is currently presented does not allow for this. Instead, the data are 'compressed' by looking at general changes in metabolic pathways between tissues in all three mouse strains. In addition, Figure 4E does not show the directionality of the responses within each pathway. For example, are the metabolism and inflammation pathways suppressed or activated? While more data is shown for adipose tissue, this is not sufficient.

      Currently, the metabolic cage data are separated by diet within the main figures. However, given that the diet effect is the major comparison, this needs to be rearranged, and strain differences within each diet could be shown within the supplement.

      The graphs lack labelling throughout to specify which lines/bars represent which strains and diets. This is particularly the case in the metabolic cage analysis.

    4. Reviewer #3 (Public review):

      Summary:

      Using three strains of mice that are founders of the Diversity Outbred Population of mice, this paper attempts to identify key genetic drivers of obesity and metabolic dysfunction. Through a series of in-depth phenotyping experiments, the authors describe substantial differences in the propensity of these strains to develop obesity and complications associated with obesity. The key here was the careful selection of these strains, as they mostly spanned the spectrum of minor susceptibility (C57BL/6J), major susceptibility (NZO/HlLtJ), and complete resistance to diet-induced obesity (CAST/EiJ). This was done in the setting of both a normal diet and a high-fat diet. These studies identified that one of the most transcriptionally activated tissues in this setting across the strains was adipose tissue. Furthermore, a critical pathway in adipose tissue that conferred protection against obesity in the CAST strain was related to immune infiltration. Subsequently, the authors extended their studies into this phenotype using their existing access to the vast array of genetic information from the DO datasets. From this analysis, it was identified that a key region on Chr19 had a significant influence on this phenotype, and subsequent work investigated the potentially causal genes. Overall, this study encompasses an impressive amount of in vivo and genetic work and identifies some new gene regulators associated with obesity complications, which warrant further investigation.

      Strengths:

      This study engages multiple mouse lines with diet intervention, as well as powerful genetic mapping tools to isolate genetic drivers of various obesity-related phenotypes. The animal studies are thorough and well performed, and they also include detailed omics analysis of several tissues. Subsequent genetic mapping uses some of the world's most powerful preclinical genetic approaches, and findings identify some novel genes associated with obesity.

      Weaknesses:

      These mouse lines and hybrid genetic screens in this paper have been used for some years now to map similar phenotypes, so in that sense, the approach is not overly novel. Moreover, the most compelling and exciting part of the study, in this reviewer's opinion, is the DO mapping of the immune phenotype in adipose tissue. In some ways, the authors could have potentially come to this same conclusion without the need to perform the mouse studies in the three different strains (other than the nice storytelling of finding the phenotype initially in CAST). Likewise, with this being the most novel aspect of the study, it was a shame that the genes identified at Chr19 were not investigated in more detail in the manuscript, other than just some associative outcomes in mice and humans. It would have been pleasing to see some attempt to validate one of these genes in a mouse model, linking it to either obesity or immune phenotypes in WAT.

    1. eLife Assessment

      This work provides an important resource identifying 72 proteins as novel candidates for plasma membrane and/or cell wall damage repair in budding yeast, and describes the temporal coordination of exocytosis and endocytosis during the repair process. The data are convincing; however, additional experimental validation will better support the claim that repair proteins shuttle between the bud tip and the damage site.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yamazaki et al. conducted multiple microscopy-based GFP localization screens, from which they identified proteins that are associated with the PM/cell wall damage stress response. Specifically, the authors identified that bud-localized TMD-containing proteins and endocytic proteins are associated with PM damage stress. The authors further demonstrated that polarized exocytosis and CME are temporally coupled in response to PM damage, and CME is required for polarized exocytosis and the targeting of TMD-containing proteins to the damage site. From these results, the authors proposed a model in which CME delivers TMD-containing repair proteins between the bud tip and the damage site.

      Strengths:

      Overall, this is a well-written manuscript, and the experiments are well-conducted. The authors identified many repair proteins and revealed the temporal coordination of different categories of repair proteins. Furthermore, the authors demonstrated that CME is required for targeting of repair proteins to the damage site, as well as cellular survival in response to stress related to PM/cell wall damage. Although the roles of CME and bud-localized proteins in damage repair are not completely new to the field, this work does have conceptual advances by identifying novel repair proteins and proposing the intriguing model that the repairing cargoes are shuttled between the bud tip and the damaged site through coupled exocytosis and endocytosis.

      Weaknesses:

      While the results presented in this manuscript are convincing, they might not be sufficient to support some of the authors' claims. In particular, in the last two Results sections, the authors claimed that CME delivers TMD-containing repair proteins from the bud tip to the damage site. The model is highly plausible based on the data, but caveats still exist. For example, the repair proteins might not be transported from one location to another, but instead be degraded and resynthesized. Although the Gal-induced expression system supports the model to some extent, I think more direct verification (such as FLIP or photo-convertible fluorescence tags to distinguish between pre-existing and newly synthesized proteins) would significantly improve the strength of evidence.

      Major experiment suggestions:

      (1) The authors may want to provide more direct evidence for "protein shuttling" and for excluding the possibility that proteins at the bud are degraded and synthesized de novo near the damage site. For example, if the authors could use FLIP to bleach bud-localized fluorescent proteins, and the damaged site does not show fluorescent proteins upon laser damage, this will strongly support the authors' model. Alternatively, the authors could use photo-convertible tags (e.g., Dendra) to differentiate between pre-existing repair proteins and newly synthesized proteins.

      (2) In line with point 1, the authors used Gal-inducible expression, which supported their model. However, the authors may need to show protein abundance in galactose, glucose, and upon PM damage. Western blot would be ideal to show the level of full-length proteins, or whole-cell fluorescence quantification can also roughly indicate the protein abundance. Otherwise, we cannot assume that the tagged proteins are only expressed when the cells are growing in galactose-containing media.

      (3) Similarly, for Myo2 and Exo70 localization in CME mutants (Figure 4), it might be worth doing a western or whole-cell fluorescence quantification to exclude the caveat that CME deficiency might affect protein abundance or synthesis.

      (4) From the authors' model in Figure 7, it looks like the repair proteins contribute to bud growth. Does laser damage to the mother cell prevent bud growth due to the reduction of TMD-containing repair proteins at the bud? If the authors could provide evidence for that, it would further support the model.

      (5) Is the PM repair cell-cycle-dependent? For example, would the recruitment of repair proteins to the damage site be impaired when the cells are under alpha-factor arrest?

    3. Reviewer #2 (Public review):

      This paper reports the identification of plasma membrane repair proteins and reveals spatiotemporal cellular responses to plasma membrane damage. The study combines sodium dodecyl sulfate (SDS) and laser damage to identify and characterize proteins involved in plasma membrane (PM) repair in Saccharomyces cerevisiae. Of the 80 PM repair proteins identified, 72 were novel. The use of both proteomic and microscopy approaches revealed the spatiotemporal coordination of exocytosis and clathrin-mediated endocytosis (CME) during repair. Interestingly, the authors were able to demonstrate that exocytosis dominates early and CME later, with CME also playing an essential role in trafficking transmembrane-domain (TMD)-containing repair proteins between the bud tip and the damage site.

      Weaknesses/limitations:

      (1) Why are the authors saying that Pkc1 is the best characterized repair protein? What is the evidence?

      (2) It is unclear why the authors continued with the laser damage assay using exclusively the C-terminal GFP-tagged library. This could have missed N-terminal tag-dependent localizations and functions and may have excluded functionally important repair proteins.

      (3) The use of SDS and laser damage may bias toward proteins responsive to these specific stresses, potentially missing proteins involved in other forms of plasma membrane injury (mechanical, osmotic, etc.). SDS stress is known to indirectly induce oxidative stress and heat-shock responses.

      (4) It is unclear what the scale bars of Figures 3, 5, and 6 are. These should be included in the figure legend.

      (5) Figure 4 should be organized to compare WT vs. mutant, which would emphasize the magnitude of impairment.

      (6) It would be interesting to expand on possible mechanisms for CME-mediated sorting and retargeting of TMD proteins, including a speculative model.

    4. Reviewer #3 (Public review):

      Summary:

      This work aims to understand how cells repair damage to the plasma membrane (PM). This is important, as failure to do so will result in cell lysis and death. Therefore, this is an important fundamental question with broad implications for all eukaryotic cells. Despite this importance, there are relatively few proteins known to contribute to this repair process. This study expands the number of experimentally validated PM repair proteins from 8 to 80. Further, they use precise laser-induced damage of the PM/cell wall and use live-cell imaging to track the recruitment of repair proteins to these damage sites. They focus on repair proteins that are involved in either exocytosis or clathrin-mediated endocytosis (CME) to understand how these membrane remodeling processes contribute to PM repair. Through these experiments, they find that while exocytosis and CME both occur at the sites of PM damage, exocytosis predominates in the early stages of repair, while CME predominates in the later stages of repair. Lastly, they propose that CME is responsible for diverting repair proteins localized to the growing bud cell to the site of PM damage.

      Strengths:

      The manuscript is very well written, and the experiments presented flow logically. The use of laser-induced damage and live-cell imaging to validate the proteome-wide screen using SDS-induced damage strengthens the role of the identified candidates in PM/cell wall repair.

      Weaknesses:

      (1) Could the authors estimate the fraction of their candidates that are associated with cell wall repair versus plasma membrane repair? Understanding how many of these proteins may be associated with the repair of the cell wall or PM may be useful for thinking about how these results are relevant to systems that do or do not have a cell wall. Perhaps this is already in their GO analysis, but I don't see it mentioned in the manuscript.

      (2) Do the authors identify actin cable-associated proteins or formin regulators associated with sites of PM damage? Prior work from the senior author (reference 26) shows that the formin Bnr1 relocalizes to sites of PM damage, so it would be interesting if Bnr1 and its regulators (e.g., Bud14, Smy1, etc) are recruited to these sites as well. These may play a role in directing PM repair proteins (see more below).

      (3) Do the authors suspect that actin cables play a role in the relocalization of material from the bud tip to PM damage sites? They mention that TMD proteins are secretory vesicle cargo (lines 134-143) and that Myo2 localizes to damage sites. Together, this suggests a possible role for cable-based transport of repair proteins. While this may be the focus of future work, some additional discussion of the role of cables would strengthen their proposed mechanism (steps 3 and 4 in Figure 7).

      (4) Lines 248-249: I find the rationale for using an inducible Gal promoter here unclear. Some clarification is needed.

    1. eLife Assessment

      This important manuscript presents a novel application of the SANDI (Soma and Neurite Density Imaging) model to study microstructural alterations in the basal ganglia of individuals with Huntington's disease (HD). The compelling methods, which are to our understanding the first application of SANDI to neurodegenerative diseases, provide strong evidence that HD-related neurodegeneration in the striatum accounts significantly for striatal atrophy and correlates with motor impairments. The novel diffusion acquisition and modelling methods and the multimodal behavioural data are both of high value in their own right, and their integration creates a framework for future studies.

    2. Reviewer #1 (Public review):

      (1) In this study, the authors aimed at characterizing Huntington's Disease (HD) - related microstructural abnormalities in the basal ganglia and thalami as revealed using Soma and Neurite Density Imaging (SANDI) indices (apparent soma density, apparent soma size, extracellular water signal fraction, extracellular diffusivity, apparent neurite density, fractional anisotropy and mean diffusivity).

      (2) The study implements a novel biophysical diffusion model that extends up-to-date methodologies and presents significant potential for quantifying neurodegenerative processes of the grey matter of the human brain in vivo. The authors comment on the usefulness of this technique in other pathologies, but they exemplify it only with multiple sclerosis. Further evidence supporting this broader applicability should be provided.

      (3) The study found that HD-related neurodegeneration in the striatum accounted significantly for striatal atrophy and correlated with motor impairments. HD was associated with reduced soma density, increased apparent soma size, and extracellular signal fraction in the basal ganglia, but not in the thalami. Additionally, these effects were larger at the manifest stage.

      (4) The results of this work demonstrate the impact of HD on the basal ganglia and thalami, which can be further explored as a non-invasive biomarker of disease progression. Additionally, the study shows that SANDI can be used to explore grey matter microstructure in a variety of neurological conditions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate whether advanced microstructural diffusion MRI modeling using the SANDI framework could reveal clinically relevant tissue alterations in the subcortical structures of individuals with Huntington's disease (HD). Specifically, they sought to determine if SANDI-derived parameters (such as soma density, soma size, and extracellular diffusivity) could detect abnormalities in both manifest and premanifest HD stages, complement standard MRI biomarkers (e.g., volume, MD), and correlate with disease burden and motor impairment. Through this, they hoped to demonstrate the feasibility and added biological specificity of SANDI for early detection and characterization of HD pathology.

      Strengths:

      (1) Novelty and relevance:

      This is, to the best of my knowledge, the first clinical deployment of SANDI in HD, offering more biophysically interpretable and specific imaging biomarkers than standard DTI or volumetric features.

      (2) More specific microstructural insight: Traditional approaches have used volumetric features (e.g., striatal volume loss) or DTI metrics (like FA and MD), which are indirect and non-specific markers. They can indicate something is "wrong" but not what is wrong.

      (3) SANDI parameters permit establishing clearer links with microstructure:

      o Apparent soma density (fis): proxy for neuronal/glial cell body density.

      o Apparent soma size (rs): reflects possible glial hypertrophy or neuronal shrinkage.

      o Neurite density (fin): linked to dendritic/axonal integrity.

      o Extracellular fraction and diffusivity: sensitive to edema, gliosis, and tissue loss.

      In this way, a decrease in soma density can be related to neural loss (e.g., medium spiny neurons), and an increase in soma size and extracellular fraction could be related to glial reactivity (astrocytes, microglia). This enables differentiating between atrophy due to neuron loss vs reactive gliosis, which volumetrics or DTI cannot do.

      (4) Integration of modalities: The inclusion of motor impairment (Q-Motor), HD-ISS staging, and multi-compartment diffusion modeling is a methodological strength.

      (5) Early detection potential: SANDI metrics showed abnormalities in premanifest HD, sometimes even when volume loss was mild or absent. This suggests the potential for earlier, more sensitive biomarkers of disease progression.

      (6) Predictive power: Regression models showed that SANDI metrics explained up to 63% of the variance in striatal volumes in HD. And this correlated strongly with motor impairment and disease burden (CAP100). This shows they are not just redundant with volume or DTI, but they are complementary and potentially more mechanistically meaningful.

      Weaknesses:

      Certain aspects of the study would benefit from clarification:

      (1) Scanner and acquisition consistency: While HD data are from the WAND study, it is not clear whether controls were scanned on the same scanner or protocol. Given the use of model-derived metrics (especially SANDI), differences in scanner or acquisition could introduce confounds. Also, although SANDI offers novel and biologically informative markers, widespread clinical translation still faces hurdles. For instance, the study used a 3T Connectom scanner (300 mT/m gradients), which is not widely available. Reproduction of these results on standard 3T clinical scanners would be a great addition, in scenarios with lower resolution, less precise parameter recovery, and longer scans if SNR needs to be maintained.

      (2) HD-ISS staging and group comparisons:

      a) Only 26-27 out of 56 gene-positive participants could be assigned HD-ISS stages, and none were classified into stages 0 or 4.

      b) Visual overlap between stages 1 and 2 in behavioral and imaging features suggests that staging-based group separation may not be robust.

      c) As a result, claims based on progression across HD-ISS stages may be overinterpreted or underpowered.

      (3) Regression modeling choices:

      a) SANDI metrics included in the models differ between HC and HD groups, reducing comparability.

      b) The potential impact of multicollinearity (e.g., between fis and rs) is not discussed.

      c) Beta coefficients could reflect model instability or parameter degeneracy rather than true biological effects.

      These issues do not undermine the study's main conclusions, which effectively demonstrate the feasibility and initial clinical relevance of applying SANDI to HD. Nonetheless, addressing them more thoroughly would enhance the clarity and interpretability of the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Ioakeimidis and colleagues studied microstructural abnormalities in N=56 Huntington's disease (HD) patients compared to N=57 normative controls. The authors used a powerful MRI Connectom scanner and applied the SANDI model to estimate the soma size, neurite size, soma density, and extracellular fraction in key subcortical nuclei related to HD. In the striatum, they found decreased soma density and increased soma size, which also seemed to become more pronounced in advanced HD individuals in the final exploratory analyses. The authors conducted important analyses to find whether the SANDI measures correlate with clinical scores (i.e., QMotor) and whether the variance of the striatal volume is explained by the SANDI measures. They found a relationship between SANDI measures for both.

      Strengths:

      The study is both innovative and of high interest for the HD community. The authors provide a rich pool of statistical analyses and results that anticipate the questions that may emerge in the HD research community. Statistics are carefully chosen and image processing is done with state-of-the-art methods and tools. The sample size gives sufficient credibility to the findings. Altogether, I think this study sets a milestone in the attempts of the HD community to understand neuropathological processes with non-invasive methods, and extends the current knowledge of microstructural anomalies identified in HD with diffusion MRI. More importantly, the newly identified anomalies in soma size and soma density open new avenues for studying these biological effects further and perhaps developing these biomarkers for use in clinical trials.

      Weaknesses:

      (1) An important question is whether the SANDI measures, which require an expensive scanner and elaborate processing, are better biomarkers than the more traditional DTI measures. Can the authors compare the effect size of FA/MD with SANDI measures? In some of the plots and tables, FA/MD seem to have comparable, if not higher, correlations with QMotor or CAP scores. In the same vein, it is unclear whether DTI measures were included in the hierarchical stepwise regression. I wonder if the stepwise models would have picked up FA/MD instead of SANDI measures had they been given a chance. Overall, I hope the authors can also discuss their findings in light of the cost vs. benefit of adopting SANDI in future studies, which is an important topic for clinical trials.

      (2) Similar to the above point, it is very important to consider how strong the biomarking signal is from SANDI measures compared to the good old striatal volume. Some plots seem to indicate that volumes still have the highest correlation with QMotor and the highest effect size in group comparisons. It would be helpful for the community to know where the new SANDI measures stand compared to the most typically used volumes in terms of effect size.

      (3) The diffusion measures are inevitably correlated to some degree. Please provide a correlation matrix in the supplementary material, including all DWI measures, to enable readers to better understand how similar SANDI measures are to each other or vs. other DTI measures. Perhaps adding volumes to this correlation matrix may also be a good future reference.

      (4) ISS stages:

      a) The online ISS calculator requires cut-offs derived from the longitudinal Freesurfer pipeline, while the authors do not have longitudinal data. Thus, the ISS classification might be inaccurate to some degree if the authors used the FS cross-sectional pipeline. Please review this issue and see if updated cut-offs should be used to classify participants.

      b) Were there really no participants with ISS 0 among the 56 HD individuals? Please clarify in the manuscript.

      (5) A note on terminology that might be confusing to some readers. According to the creators of ISS, the ISS stages are created for research only; they are not used or applied in the clinic. On the other hand, the terms "premanifest" and "manifest" have a clinical meaning, typically based on the diagnostic confidence level. The assignment of ISS0-1 to premanifest and ISS2-3 to manifest may create some non-trivial confusion, if not opposition, in some segments of the HD community. The authors can keep their current terminology, but will need to at least clarify to the reader that this assignment is speculative, does not fully match the clinically-based categories, and should not be confused with similarly named groups in the previous literature.

    5. Author response:

      Response to Reviewer 1:

      Ad (2) Clinical applications of SANDI have primarily focused on Multiple Sclerosis. However, since the preparation of the manuscript, one study has been published reporting reductions in apparent soma density and white and grey matter differences in apparent soma size in amyotrophic lateral sclerosis (ALS) (https://doi.org/10.1016/j.ejrad.2025.111981). We will include this paper in our revised manuscript.

      Responses to Reviewer 2:

      Strength:

      Ad (3) SANDI cannot directly differentiate between neuronal and glial cells, but the pattern of differences in the SANDI parameters we observed in Huntington’s disease (HD) is consistent with the known pathology in HD.

      Weaknesses:

      Ad (1) With regards to the question about scanner and acquisition consistency, we can confirm that all diffusion data of individuals with HD and healthy controls from the WAND study were acquired with the same multi-shell High Angular Resolution Diffusion Imaging (HARDI) protocol on the 3T Connectom scanner at CUBRIC. Thus, all diffusion data analysed and reported in this manuscript were acquired with the same protocol on the same strong gradient MRI system for harmonization and consistency purposes.

      We agree that for clinical adoption it is important to demonstrate that HD-related SANDI differences do not require ultra-strong gradient imaging and can be detected on standard clinical MRI systems. While we have not collected such data in people with HD, we and others have demonstrated the feasibility of modelling SANDI metrics from multi-shell diffusion-weighted imaging data acquired with a maximum b-value of 3,000 s/mm² on clinical 3T MRI systems in typical adults and people with MS or ALS (https://doi.org/10.1002/hbm.26416, https://doi.org/10.1038/s41598-024-60497-6, https://doi.org/10.1016/j.ejrad.2025.111981). These studies have demonstrated that it is feasible to characterise brain microstructural differences with SANDI on clinical scanners and that comparable patterns of results can be observed across different MRI systems. It should also be noted that there is presently a move towards stronger gradient implementation in clinical systems, as demonstrated by the release of the Siemens Cima.X system, which will allow higher b-value diffusion scanning on clinical systems.

      Ad (2) We agree that, due to the small number of HD participants with HD-ISS staging, the exploratory comparisons between ISS stages need to be interpreted with caution. We hope to gain access to some of the missing ISS information and plan to include this in the revised paper.

      Ad (3) With regards to the queries about the regression modelling choices:

      (1) As SANDI metrics differed between HC and HD groups, and hence may not be directly comparable, separate regression models for HC and HD data were conducted without formal comparisons between slopes. Only descriptive exploratory comparisons of the observed pattern were included.

      (2) We will provide cross-correlational analyses between all SANDI parameters in the supplements of the revised version of the paper to check for multicollinearity.

      (3) All model-based approaches, including SANDI, may be prone to model instability or parameter degeneracy, and we will acknowledge and discuss this in the revised version.

      Responses to Reviewer 3:

      Weaknesses: 

      Ad (1) and (2) The effect sizes (ES) of group differences in SANDI, DTI, and volume measures in the caudate and putamen (Tables 3 and 4) were broadly comparable: apparent soma radius rs (rrb = 0.45-0.53), apparent soma density fis (rrb = 0.32-0.45), FA (rrb = 0.38-0.55), MD (rrb = 0.51-0.61), and volumes (rrb = 0.49-0.55). Similar ES were observed between fis and FA, and between rs and volumes. MD showed the largest ES, likely due to striatal atrophy-related CSF partial volume contamination. Cost-benefit analyses of imaging marker choices in clinical trials depend on the aim of the study. DTI provides sensitive but unspecific indices that are influenced by biological and geometrical tissue properties and capture a multitude of microstructural properties. Similarly, volumetric measurements do not inform about the underpinning neurodegenerative processes.

      With the advancement of disease-modifying therapies for HD, it has become important to identify non-invasive imaging markers that can inform about the mechanistic effects of novel therapies. While DTI and volume metrics are sensitive to brain changes, they do not provide specific information about the underpinning tissue properties. Such information, however, may turn out to be important for the evaluation of mechanistic effects of novel therapeutics in clinical trials. Advanced microstructural models such as SANDI may help provide such information. We found that SANDI indices had statistically similar power to the gold standard measures of volumes, but with the added value of information about the underpinning microstructure. We and others have also shown that SANDI can be applied to multi-shell diffusion data acquired in a clinically feasible time (~10 min) on standard 3T MRI systems (please refer to our response above).

      To summarise, DTI and volumes are sensitive to brain changes but will need to be complemented by more advanced microstructural measurements such as SANDI to gain a better understanding of the underlying tissue changes and effects of disease-modifying therapies.
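      As an illustration of the kind of effect size quoted above, and assuming that rrb denotes the rank-biserial correlation for a two-group comparison, the following minimal Python sketch shows one common way such a value can be computed. The function name and the sample values are hypothetical and are not taken from the study.

```python
from itertools import product


def rank_biserial(x, y):
    """Rank-biserial correlation between two independent samples.

    Computed as (pairs with x > y minus pairs with x < y) / all pairs,
    which equals 2*U/(n1*n2) - 1 for the Mann-Whitney U statistic of x.
    Tied pairs count for neither side.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))


# Made-up values for illustration only (not data from the study).
hd_values = [0.30, 0.35, 0.40, 0.45]
hc_values = [0.38, 0.50, 0.55, 0.60]
effect_size = rank_biserial(hd_values, hc_values)
print(f"rank-biserial r = {effect_size:.2f}")
```

      The sign depends on the order of the two groups, so comparisons of magnitude across measures (as in the ranges quoted above) would use the absolute value.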

      Ad (3) We will provide a correlation matrix of all DWI measures in the supplementary material to allow a better understanding of how similar SANDI measures are to each other and to DTI measures.
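      A correlation matrix of this kind could be assembled along the following lines. This is a minimal sketch only: it assumes Pearson correlation and one value per participant for each measure, and the measure names, the example values, and the 0.8 flagging threshold are hypothetical choices made for illustration, not the authors' analysis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(measures):
    """Return a nested dict with the pairwise correlation of every measure."""
    names = list(measures)
    return {a: {b: pearson(measures[a], measures[b]) for b in names} for a in names}


def highly_correlated_pairs(matrix, threshold=0.8):
    """List measure pairs whose absolute correlation exceeds the threshold."""
    names = list(matrix)
    return [
        (a, b, matrix[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) > threshold
    ]


# Made-up per-participant values for illustration only.
example = {
    "fis": [0.40, 0.42, 0.35, 0.30, 0.28],
    "rs": [10.1, 10.0, 10.6, 11.0, 11.2],
    "MD": [0.80, 0.95, 0.82, 0.97, 0.85],
}
matrix = correlation_matrix(example)
for a, b, r in highly_correlated_pairs(matrix):
    print(f"{a} vs {b}: r = {r:.2f}")
```

      Pairs flagged in this way would be candidates for a closer look at multicollinearity before they are entered together in a regression model.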

      Ad (4) Most of the people with HD who have taken part in our study were participants in the Enroll-HD study. We will use HD-ISS information from Enroll-HD as much as possible. As we do not have longitudinal imaging data for all individuals classified as ISS <2, we will compare our cross-sectional striatal volumes with those from age- and sex-matched individuals from WAND to determine whether people fall into the ISS 0 or 1 category. This approach will hopefully allow us to increase the total HD-ISS sample size and to determine whether there were participants with ISS 0 in our sample.

      Ad (5) We will explain in the revised manuscript that ISS stages are created for research purposes only and are not used or applied in the clinic, while “premanifest” and “manifest” are helpful concepts in the clinical context. We will clarify that we refer to individuals without motor symptoms as assessed with Total Motor Score (TMS) as premanifest and to those with motor symptoms as manifest. This roughly corresponds to individuals at ISS 0/1 without signs of motor symptoms compared to individuals at ISS 2-3 with signs of motor symptoms.

    1. eLife Assessment

      In this important manuscript, Cassell and colleagues set out on a mechanistic and pharmacological exploration of an engineered chimeric small conductance calcium-activated potassium channel 2 (SK2). They show compelling evidence that the SK2 channel possesses a unique extracellular structure that modulates the conductivity of the selectivity filter, and that this structure is the target for the SK2 inhibitor apamin. The interpretations are sound and the writing is clear, and the manuscript was strengthened during review by providing more detailed information for the electrophysiological experiments and the structural analyses attempted, in addition to relating dilation of the filter to mechanisms of inactivation in other potassium channels. This high-quality study will be of interest to membrane protein structural biologists, ion channel biophysicists, and chemical biologists, and will help to inform future drug development targeting SK channels.

    2. Reviewer #3 (Public review):

      This is a fundamentally important study presenting cryo-EM structures of a human small conductance calcium-activated potassium (SK2) channel in the absence and presence of calcium, or with interesting pharmacological probes bound, including the bee toxin apamin, a small molecule inhibitor, and a small molecule activator. As efforts to solve structures of the wild-type hSK2 channel were unsuccessful, the authors engineered a chimera containing the intracellular domain of the SK4 channel, the subtype of SK channel that was successfully solved in a previous study (reference 13). The authors present many new and exciting findings, including opening of an internal gate (similar to SK4), for the first time resolving the S3-S4 linker sitting atop the outer vestibule of the pore and unanticipated plasticity of the ion selectivity filter, and the binding sites for apamin, one new small molecule inhibitor and another small molecule activator. Appropriate functional data are provided to frame interpretations arising from the structures of the chimeric protein; the data are compelling, the interpretations are sound, and the writing is clear. This high-quality study will be of interest to membrane protein structural biologists, ion channel biophysicists, and chemical biologists, and will be valuable for future drug development targeting SK channels.

      Comments on revisions:

      The authors have done a nice job of revising the manuscript to address the issues raised in the first round of review and I have no further suggestions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The small conductance calcium-activated potassium channel 2 (SK2) is an important drug target for treating neurological and cardiovascular diseases. However, structural information on this subtype of SK channels has been lacking, and it has been difficult to draw conclusions about activator and inhibitor binding and action in the absence of structural information.

      Here the authors set out to (1) determine the structure of the transmembrane regions of a mammalian SK2 channel, (2) determine the binding site of apamin, a historically important SK2 inhibitor whose mode of action is unclear, and (3) use the structural information to generate a novel set of activators/inhibitors that selectively target SK2.

      The authors largely achieved all the proposed goals, and they present their data clearly.

      Unable to solve the structure of the human SK2 due to excessive heterogeneity in its cytoplasmic regions, the authors create a chimeric construct using SK4, whose structure was previously solved, and use it for structural studies. The data reveal a unique extracellular structure formed by the S2-S3 loop, which appears to directly interact with the selectivity filter and modulate its conductivity. Structures of SK2 in the absence and presence of the activating Ca2+ ions both possess non-K+-selective/conductive selectivity filters, where only sites 3 and 4 are preserved. The S6 gates are captured in closed and open states, respectively. Apamin binds to the S2-S3 loop and, unexpectedly, induces a K+-selective/conductive conformation of the selectivity filter while closing the S6 gate.

      Through high-throughput screening of small compound libraries and compound optimization, the group identified a reasonably selective inhibitor and a related compound that acts as an activator. The characterization shows that these compounds bind in a novel binding site. Interestingly, the inhibitor, despite binding in a site different from that of apamin, also induces a K+ selective/conductive conformation of the selectivity filter while the activator induces a non-K+ selective/conductive conformation and an open S6 gate.

      The data suggest that the selectivity filter and the S6 gate are rarely open at the same time, and the authors hypothesize that this might be the underlying reason for the small conductance of SK2. The data will be valuable for understanding the mechanism of SK2 channel (and other SK subtypes).

      Overall, the data is of good quality and supports the claims made by the authors. However, a deeper analysis of the cryo-EM data sets might yield some important insights, i.e., about the relationship between the conformation of the selectivity filter and the opening of the S6 gate.

      We attempted focused 3D classification to identify subsets of particles with the S6 open and the SF in a conductive state but were not able to isolate such a particle class. This indicates that either none or a very small percentage of particles exists in a fully conductive state. This sentence was included in the results section: 

      “Focused 3D classification of the S3-S4 linker was unsuccessful in identifying particle subsets with a dilated extracellular constriction, suggesting that either none or a very small percentage of Ca<sup>2+</sup>-bound SK2-4 is in a conductive state”

      Some insight and discussion about the allosteric networks between the SF and the S6 gate would also be a valuable addition.

      The extracellular constriction is in the same non-conductive conformation in the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free SK2-4 structures, suggesting that the conformation of the S3-S4 linker/SF and the S6 are not allosterically coupled. We predict that Ca<sup>2+</sup> opens the intracellular gate and another physiological factor (not yet identified) promotes extracellular gate opening. These sentences were added to the results and discussion: “This along with the similar conformation of the S3-S4 linker in the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free states of SK2-4 suggest that Ca<sup>2+</sup>-dependent intracellular gate dynamics are not coupled to the conformation of the S3-S4 linker. Other yet to be identified physiological factors may be required to dilate the extracellular constriction.”

      “Alternatively, other physiological factors, such as PIP2[46,47] or protein-protein interactions[48-50], may exist in live cells that modulate the interaction between S3-S4 linker and the selectivity filter.”

      Reviewer #2 (Public review):

      Summary:

      The authors have used single-particle cryoEM imaging to determine how small-molecule regulators of the SK channel interact with it and modulate their function.

      Strengths:

      The reconstructions are of high quality, and the structural details are well described.

      Weaknesses:

      The electrophysiological data are poorly described. Several details of the structural observations require a mechanistic context, perhaps better relating them to what is known about SK channels or other K channel gating dynamics.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification.  

      The most pressing point I have to make, which could help improve the manuscript, relates to the selectivity filter (SF) conformation. Whether the two ion-bound state of SK2-4 (Figure 4A) represents a non-selective, conductive SF occluded by F243 or represents a C-type inactivated SF, further occluded by F243, is unclear. It would be important to discuss this. Reconstructions of Kv1.3 channels also feature a similar configuration, which has been correlated to its accelerated C-type inactivation.

      Structural overlays of Ca<sup>2+</sup>-bound SK2-4, HCN, and C-type inactivated Kv1.3 selectivity filters demonstrate that each has conformational differences, and it is difficult to definitively determine if the SK2-4 selectivity filter is in a non-selective conformation like HCN or a C-type inactivated conformation like Kv1.3. Based on the number of ions observed in the filter and the position of Tyr361, we believe the selectivity filter most closely resembles that of HCN. Importantly, the selectivity filter conformation observed in the SK2-4 Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free structures is ultimately nonconductive due to the Phe243 extracellular constriction blocking K<sup>+</sup> efflux.

      A comparison of the SK2-4 selectivity filter to HCN and C-type inactivated Kv1.3 was included in Figure 4 and this sentence was included in the results section:

      “The selectivity filter of SK2-4 resembles that of HCN in both the position of Tyr361 and the number of K<sup>+</sup> coordination sites (Fig 4E,F,G,H)”

      Furthermore, binding of a toxin derivative to Kv1.3 restores the SF into a conductive form, though occluded by the toxin. It appears that apamin binding to SK2-4 might be doing something similar. Although I am not sure whether SK channels undergo C-type inactivation like gating, classical MTS accessibility studies have suggested that dynamics of the SF might play a role in the gating of SK channels. It would be really useful (if not essential) to discuss the SF dynamics observed in the study and relate them better to aspects of gating reported in the literature.

      Extracellular toxin binding to SK2-4 and K<sub>v</sub>1.3 induces a conformational change in the selectivity filter to produce a canonical K<sup>+</sup> selective structure with four coordination sites. However, the mechanism by which the toxins produce the conformational change is different. For SK2-4, apamin interacts primarily with S3-S4 linker residues and induces a shift in the S3-S4 linker away from the pore axis. This in turn prevents the hydrogen bonds between Arg240 and Tyr245 of the S3-S4 linker and Asp363 at the C-terminus of the selectivity filter to produce a selectivity filter conformation with four K<sup>+</sup> coordination sites. For K<sub>v</sub>1.3, the sea anemone toxin ShK binds directly to the C-terminus of the selectivity filter disrupting interactions required for the C-type inactivated structure and thereby inducing the conformational change. These sentences were added to the results:

      “Toxin induced selectivity filter conformational change has also been reported for K<sub>v</sub>1.3 with the sea anemone toxin ShK. However, unlike apamin binding to SK2-4, ShK binds directly to the K<sub>v</sub>1.3 selectivity filter to convert a C-type inactivated conformation to a canonical K<sup>+</sup> selective structure with four coordination sites [39,40]. The change in selectivity filter conformation in apamin-bound SK2-4 seems to be driven instead by the weakening of interactions between the selectivity filter and the S3-S4 linker.”

      The SF of K channels, in conductive states, are usually stabilized by an H-bond network involving water molecules bridged to residues behind the SF (D363 in the down-flipped conformation and Y361). Considering the high quality of the reconstructions, I would suspect that the authors might observe speckles of density (possibly in their sharpened map) at these sites, which overlap with water molecules identified in high-resolution X-ray structures of KcsA, MthK, NaK, NaK2K, etc. It could be useful to inspect this region of the density map.

      We did not observe strong density near Y361 or D363 that could be confidently modeled as water. However, in the structures of SK2-4 bound to apamin and compound 1, Tyr361 in the selectivity filter rotates 180° and forms a hydrogen bond with Thr355 in the pore helix. The homologous hydrogen bond is also observed in SK4 and the conductive/K<sup>+</sup> selective selectivity filter conformation of Kv1.3. The rotation of Tyr361 to form a hydrogen bond with Thr355, reorientation of Asp363 and Trp350 into hydrogen bonding position, and the presence of four K<sup>+</sup> coordination sites upon binding of apamin and compound 1 strongly suggest that the selectivity filter is in a K<sup>+</sup> selective/conductive conformation. The Tyr361/Thr355 hydrogen bond is now described in the paper and shown in Figures 4D, 5D, and S6F.

      Reviewer #3 (Public review):

      This is a fundamentally important study presenting cryo-EM structures of a human small conductance calcium-activated potassium (SK2) channel in the absence and presence of calcium, or with interesting pharmacological probes bound, including the bee toxin apamin, a small molecule inhibitor, and a small molecule activator. As efforts to solve structures of the wild-type hSK2 channel were unsuccessful, the authors engineered a chimera containing the intracellular domain of the SK4 channel, the subtype of SK channel that was successfully solved in a previous study (reference 13). The authors present many new and exciting findings, including opening of an internal gate (similar to SK4), for the first time resolving the S3-S4 linker sitting atop the outer vestibule of the pore and unanticipated plasticity of the ion selectivity filter, and the binding sites for apamin, one new small molecule inhibitor and another small molecule activator. Appropriate functional data are provided to frame interpretations arising from the structures of the chimeric protein; the data are compelling, the interpretations are sound, and the writing is clear. This high-quality study will be of interest to membrane protein structural biologists, ion channel biophysicists, and chemical biologists, and will be valuable for future drug development targeting SK channels.

      The following are suggestions for strengthening an already very strong and solid manuscript:

      (1) It would be good to include some information in the text of the results section about the method and configuration used to obtain electrophysiological data and the limitations. It is not until later in the text that the Qube instrument is mentioned in the results section, and it is not until the methods section that the reader learns it was used to obtain all the electrophysiological data. Even there, it is not explicitly mentioned that a series of different internal solutions were used in each cell where the free calcium concentration was varied to obtain the data in Figure 1C. Also, please state the concentration of free calcium for the data in Figure 1B.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification.  

      (2) The authors do a nice job of discussing the conformations of the selectivity filter they observed here in SK as they relate to previous work on NaK and HCN, but from my perspective the authors are missing an opportunity to point out even more striking relationships with slow C-type inactivation of the selectivity filter in Shaker and Kv1 channels. C-type inactivation of the filter in Shaker was seen in 150 mM K using the W434F mutant (PMC8932672) or in 4 mM K for the WT channel (PMC8932672), and similar results have been reported for Kv1.2 (PMC9032944; PMC11825129) and for Kv1.3 (PMC9253088; PMC8812516) channels. For Kv1.3, C-type inactivation occurs even in 150 mM K (PMC9253088; PMC8812516). Not unlike what is seen here with apamin, binding of the sea anemone toxin (ShK) with a Fab attached (or the related dalazatide) inserts a Lys into the selectivity filter and stabilizes the conducting conformation of Kv1.3 even though the Lys depletes occupancy of S1 by potassium (PMC9253088; PMC8812516). Or might the conformation of the filter be controlled by regulatory processes in SK2 channels? I think connecting the dots here would enhance the impact of this study, even if it remains relatively speculative.

      Please see the response to reviewer 2’s comments for a comparison of the selectivity filter structure between SK2-4 and C-type inactivated K<sub>v</sub>1.3 and a discussion of toxin induced selectivity filter conformational change.

      What is known about how the functional properties of SK2 channels (where the filter changes conformation) differ from SK4, where the filter remains conducting (reference 13)? Is there any evidence that SK2 channels inactivate?

      Compared with SK4, SK2 has some unique properties such as lower conductance and the ability to switch between low- and high-open probability states. Mutation of Phe243 suggests that the S3-S4 linker conformation contributes to the low conductance. This is included in the discussion.

      “Such a mechanism may explain some properties of SK2 that are not observed in SK4, which lacks an S3-S4 linker, such as its low conductance (~10 pS) and the ability to switch between low- and high-open probability states[3,4]. Indeed, mutation of Phe243 in rat SK2 produced a 2-fold increase in channel conductance[5].”

      Or might the conformation of the filter be controlled by regulatory processes in SK2 channels? I think connecting the dots here would enhance the impact of this study, even if it remains relatively speculative.

      Please see the response to reviewer 1’s comments for a discussion of the potential physiological role of the S3-S4 linker/extracellular constriction and its mechanism for opening.

      Reviewer #1 (Recommendations for the authors):

      I enjoyed reading your paper and am intrigued by your findings on the selectivity filter of SK2. I've got a few recommendations for data analysis and a couple of questions that might contribute to the discussion.

      In your Ca2+-bound dataset, have you tried to parse out any alternative conformations (e.g., by using 3D classification, or 3D variability)? Do you think there might be a small(er) population of particles that adopt a fully open conformation? If you haven't done this already, I would recommend doing so. You have a rather large number of particles in your final 3D reconstruction (~660k), so there might be some hidden conformations that could contribute to our understanding of the system.

      I would recommend doing the same for your compound 4-bound data set.

      Please see above for response to this recommendation.

      Do you think apamin works solely as a pore blocker, or does its binding perhaps also affect the S6 gate via allosteric networks (perhaps the same ones that induce the formation of the K+ conductive SF through binding of compound 1 above the S6 gate?)?

      Apamin binding does not change the conformation of the pore helices (S5 or S6) and thus we believe it acts primarily as a pore blocker. The following was added to the results section:

      “Overall, the apamin-bound SK2-4/CaM structure resembles Ca<sup>2+</sup>-bound SK2-4. The N-terminal lobe of CaM engages with the S<sub>45</sub>A helix, the S5 and S6 helices adopt a similar conformation, and the intracellular gate Val390 is open with a radius of 3.5 Å (Fig 2D). The most significant conformational change is in the position of the S3-S4 linker, which shifts ~2 Å away from the pore axis to accommodate apamin binding.”

      Is there a mechanistic explanation for why it might be difficult/energetically costly for the SF to be conductive and the S6 gate to be open at the same time?

      Not to our knowledge.

      I also have these minor recommendations:

      -In all figures showing density, include the threshold/sigma value at which density is shown.

      -For all ligands and ions, include half-map data.

      Sigma values were added to all figure legends displaying cryoEM density. The displayed maps are the sharpened full maps.

      Reviewer #2 (Recommendations for the authors):

      Is it possible to provide a structure-sequence guided explanation for the different affinity of compound 1 for SK2 vs SK4?

      Yes. The following is now included in the results section and a panel was added to Figure S6D.

      “However, for SK4 Thr212 replaces SK2 Ser318 and Trp216 (homologous to SK2 Trp322) is conserved but adopts a different rotamer conformation (Fig S6D). Both changes occlude the compound 1 binding site in SK4 and would likely reduce compound 1 potency on SK4 as observed in the functional data.”

      Is it possible to propose a model of modulation by compound 1/4 where the authors can comment on the conformational dependence of compound binding? That is, do they bind exclusively to the identified conformational states of the channel, or are they able to bind to both closed and open channels, but bias one state over the other?

      The clash between compound 1 and Thr386 in the open conformation of the S6 helices suggests that compound 1 would preferentially bind to closed state of SK2. Similarly, the clash between compound 4 and Ile380 in the closed conformation of the S6 helices suggests that compound 4 would preferentially bind to the open state of SK2. This was included in the discussion:

      “This proposed mechanism of modulation suggests that compound 1 may bind preferentially to the closed conformation of the S6 helices and compound 4 may bind preferentially to the open conformation of the S6 helices.” 

      Please provide the calcium concentration used to generate the data in Figure 1B.

      The calcium concentration is now stated in the legend for Fig 1B:

      “Intracellular solution contains 2 µM Ca<sup>2+</sup> based on calculation using Maxchelator (see methods)”

      Essential and critically important descriptions of experiments in Figure 7A are lacking. It would be essential to describe properly, with care, what the currents and the conditions of measurements are. If these currents are obtained by subtracting leak currents by adding other drugs, it would be good to comment on whether the latter compete with compounds 1/4.

      As recommended, additional details for electrophysiological data were added to the results, methods, and figure legends for clarification. SK currents were obtained by subtracting leak currents by adding UCL1684 only at the end of experiments. UCL1684 is not expected to interfere with the effect of compound 1 or 4 given different binding sites and mechanisms.

      If Compound 1 changes the structure of the SF (Figure 6F), would it also promote apamin binding? Given that both these agents produce a similar change in the SF, could each favor the binding of the other?

      Since apamin binds to the S3-S4 linker, it is unlikely that the selectivity filter conformational change observed in the compound 1 bound structure would affect apamin binding.

    1. eLife Assessment

      This manuscript presents useful insights into the molecular basis underlying the positive cooperativity between the co-transported substrates (galactoside sugar and sodium ion) in the melibiose transporter MelB. Building on years of previous studies, this work improves on the resolution of previously published structures and reports the presence of a water molecule in the sugar binding site that would appear to be key for its recognition, introduces further structures bound to different substrates, and utilizes HDX-MS to further understand the positive cooperativity between sugar and the co-transported sodium cation. Although the experimental work is solid, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion, as well as a clearer description of the new insight that is obtained in relation to previous studies. The work will be of interest to biologists and biochemists working on cation-coupled symporters, which mediate the transport of a wide range of solutes across cell membranes.

    2. Reviewer #1 (Public review):

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al set out to address this with further crystallographic studies complemented with ITC and hydrogen-deuterium exchange (HDX) mass spectrometry. They first report 4 different crystal structures of galactose derivatives to explore molecular recognition, showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      The results from the crystallography appear sensible, though the resolution of the data is low, with only the structure with NPG better than 3Å. However, it is a bit difficult to understand what novel information is being brought out here and what is known about the ligands. For instance, are these molecules transported by the protein or do they just bind? They measure the affinity by ITC, but draw very few conclusions about how the affinity correlates with the binding modes. Can the protein transport the trisaccharide raffinose?

      The HDX also appears to be well done; however, in the manuscript as written, it is difficult to understand how this relates to the overall mechanism of the protein and the conformational changes that the protein undergoes.

    3. Reviewer #2 (Public review):

      This manuscript from Hariharan, Shi, Viner, and Guan presents x-ray crystallographic structures of membrane protein MelB and HDX-MS analysis of ligand-induced dynamics. This work improves on the resolution of previously published structures, introduces further sugar-bound structures, and utilises HDX to explore in further depth the previously observed positive cooperativity with the cotransported cation Na+. The work presented here builds on years of previous study and adds substantial new details into how Na+ binding facilitates melibiose binding and deepens the fundamental understanding of the molecular basis underlying the symport mechanism of cation-coupled transporters. However, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion.

      Comments on Crystallography and biochemical work:

      (1) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      (2) It is slightly unclear what the ITC measurements add to this current manuscript. The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but it is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend. Additionally, the D59C mutant utilised here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na+. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium.

      Comments on HDX-MS work:

      While the use of HDX-MS to deepen the understanding of ligand allostery is an elegant use of the technique, this reviewer advises the authors to refer to the Masson et al. (2019) recommendations for the HDX-MS article (https://doi.org/10.1038/s41592-019-0459-y) on how to best present this data. For example:

      (1) The Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilised protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      (2) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      (3) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      (4) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, this reviewer understands that working with dynamic transporters can lead to increased data variation; a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      (5) Line 316 states a significant difference is seen in dynamics; how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement in solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity? The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there is no difference between the dynamics of each site.

      (6) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      (7) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      (8) Have the authors considered utilising Li+ to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      (9) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences instead between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      (10) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na+-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate, this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

    4. Reviewer #3 (Public review):

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelBSt) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na⁺, H⁺, or Li⁺, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na+ by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na+ and melibiose binding sites. To verify the sugar-recognition specific determinants, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, which, positioned at a distance of 4 Å from the phenyl group of Tyr26, forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. Next, the authors analyzed by hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) the changes in structural dynamics of the transporter induced by melibiose, Na+, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      Weaknesses:

      (1) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373.

      (2) Site-directed mutagenesis could help strengthen the conclusions of the authors. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr 121, Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the claims of the authors regarding the allosteric communication between the two substrate-binding sites.

      (3) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter received less comment: could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations? The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). For data presentation, it might be worthwhile to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na+ on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na+-bound transporter.

      (4) For non-specialists, it would be beneficial to better introduce and explain the choice of using D59C for the structural analyses.

      (5) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions the peptides correspond to. Only one peptide is indicated (213-226); I would recommend indicating more of them in areas where deuterium changes are substantial.

      (6) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

    5. Author response:

      Reviewer #1:

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al set out to address this with further crystallographic studies complemented with ITC and hydrogen-deuterium exchange (HDX) mass spectrometry.

      They first report 4 different crystal structures of galactose derivatives to explore molecular recognition, showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      We appreciate this reviewer's understanding of the work presented in this manuscript.

      The results from the crystallography appear sensible, though the resolution of the data is low, with only the structure with NPG better than 3Å. However, it is a bit difficult to understand what novel information is being brought out here and what is known about the ligands. For instance, are these molecules transported by the protein or do they just bind? They measure the affinity by ITC, but draw very few conclusions about how the affinity correlates with the binding modes. Can the protein transport the trisaccharide raffinose?

      The four structures, each with a bound sugar of a different size, were intended to identify the binding motif on both the primary substrate (sugar) and the transporter (MelB<sub>St</sub>). Although the resolutions of the structures complexed with melibiose, raffinose, or α-MG are relatively low, the size and shape of the densities in each structure are consistent with the corresponding sugar molecules, which provides valuable data for determining the pose of the bound sugar. Additionally, there is another α-NPG-bound structure at a higher resolution of 2.7 Å. Therefore, our new data support the published binding site with the galactosyl moiety as the main interacting group. The water-1 identified in this study further confirms the orientation of C4-OH. Notably, this transporter does not recognize or transport glucosides, in which the orientation of C4-OH at the glucopyranosyl ring is opposite. We will provide stronger data to support water-1.

      Regarding the raffinose question, we should have clearly introduced the historical background. Bacterial disaccharide transporters have broad specificity, allowing them to work on a group of sugars with shared structural elements; for example, one sugar molecule can be transported by several transporters. As reported in the literature, the galactosides melibiose, lactose, and raffinose can be transported by both LacY and MelB of E. coli. We did not test whether MelB<sub>St</sub> can transport α-NPG and raffinose. To address this issue and strengthen our conclusions, we plan to conduct additional experiments to gather evidence of the translocation of these sugars by MelB<sub>St</sub>.

      The HDX also appears to be well done; however, in the manuscript as written, it is difficult to understand how this relates to the overall mechanism of the protein and the conformational changes that the protein undergoes.

      Previously, we used HDX-MS to examine the conformational transition between inward- and outward-facing conformations using a conformation-specific nanobody to trap MelB<sub>St</sub> in an inward-facing state, as structurally resolved by cryoEM single-particle analysis and published in eLife 2024. That study identified dynamic regions that may be involved in the conformational transitions; however, there was no sugar present. We also solved and published the crystal structure of the apo D59C MelB<sub>St</sub>. The sugar-bound and apo states are virtually identical. To address the positive cooperativity of binding between the sugar and co-transport cations observed in biophysical analysis, in this study, we utilize HDX-MS to analyze the structural dynamics induced by melibiose, Na<sup>+</sup>, or both, focusing on the binding residues at the sugar-binding and cation-binding pockets. The results suggest that the coupling cation stabilizes sugar-binding residues at helices I and V, contributing to affinity but not specificity.

      Since MelB<sub>St</sub> favors the outward-facing conformation, and simulations on the free-energy landscape suggest that the highest affinity of the sugar-bound state is also at an outward-facing state, MelB<sub>St</sub> in both the apo and bound states tends to remain in the outward-facing conformation. We will include a section comparing these differences. We thank this reviewer for the critical insight.

      Reviewer #2:

      This manuscript from Hariharan, Shi, Viner, and Guan presents X-ray crystallographic structures of membrane protein MelB and HDX-MS analysis of ligand-induced dynamics. This work improves on the resolution of previously published structures, introduces further sugar-bound structures, and utilises HDX to explore in further depth the previously observed positive cooperativity with the cotransported cation Na<sup>+</sup>. The work presented here builds on years of previous study and adds substantial new details into how Na<sup>+</sup> binding facilitates melibiose binding and deepens the fundamental understanding of the molecular basis underlying the symport mechanism of cation-coupled transporters. However, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion.

      We thank this reviewer for taking the time to read our previous articles related to this manuscript.

      Comments on Crystallography and biochemical work:

      (1) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      This figure shows a stereo view of a density map created in cross-eye style to demonstrate its quality. We will update this figure with a higher-resolution map in which the density for Wat-1 is clearly visible. This also addresses Reviewer #3's comment regarding the map resolution.

      (2) It is slightly unclear what the ITC measurements add to this current manuscript. The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but it is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend. Additionally, the D59C mutant utilised here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium.

      While raffinose and α-MG have been reported as substrates of MelB in E. coli, binding data are unavailable; additionally, for MelB<sub>St</sub>, we lack data on the binding of two of the four sugars or sugar analogs. We performed a label-free binding assay using ITC to address this concern with the WT MelB<sub>St</sub>. We will also perform the binding assay with the D59C MelB<sub>St</sub>, since sugar binding has been structurally analyzed with this mutant, as pointed out by this reviewer. Along with other new functional results, we will prepare a new Figure 1 on functional analysis, which will also address the comment regarding extra bulk at the non-galactosyl moiety with poor affinity.

      This D59C uniport mutant exhibits increased thermostability, making it a valuable tool for crystal structure determination, especially since the wild type (WT) is difficult to crystallize at high quality. Asp59 is the only site that responds to the binding of all coupling cations: Na<sup>+</sup>, Li<sup>+</sup>, or H<sup>+</sup>. Notably, this mutant selectively abolishes cation binding and cotransport. However, it still maintains intact sugar binding with slightly higher affinity and preserves the conformational transition, as demonstrated by an electroneutral transport reaction, the melibiose exchange, and fermentation assays with intact cells. Therefore, the structural data derived from this mutant are significant and offer important mechanistic insights into sugar transport. We will provide additional details during the revision.

      Comments on HDX-MS work:

      While the use of HDX-MS to deepen the understanding of ligand allostery is an elegant use of the technique, this reviewer advises the authors to refer to the Masson et al. (2019) recommendations for the HDX-MS article (https://doi.org/10.1038/s41592-019-0459-y) on how to best present this data. For example:

      All authors appreciate this reviewer’s comments and suggestions, which will be incorporated into the revision.

      (1) The Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilised protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      Yes, a lipid/detergent removal step was applied in this study and in previous studies, and this information is described in the Methods.

      (2) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We will update Table S2. Thank you.

      (3) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We will include the plots in the supplementary information.

      (4) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, this reviewer understands that working with dynamic transporters can lead to increased data variation; a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We appreciate this comment and will cite this article on the hybrid significance method. We will include volcano plots for each dataset. We fully acknowledge that using a cutoff of P < 0.05 can increase the likelihood of false-positive identifications. However, given the complexity of the samples analyzed in this study, we believe that some important changes may have been excluded due to higher variability within the dataset. By applying multiple levels of statistical testing, we determined that P < 0.05 represents a suitable threshold for this study. The threshold values were marked in the residual plots and explained in the text. For Figure 6, we have revised it by showing the P value directly.

      (5) Line 316 states that a significant difference is seen in dynamics; how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement of solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity. The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there is no difference between the dynamics of each site.

      Table S4 was created to provide an overall view of the dynamic regions. If we understand correctly, this reviewer asked us to comment on the effect of solvent accessibility or hydrophobic regions on the overall dynamics outside the binding residues of the peptides that carry binding residues. The HDX rate is influenced by two linked factors, solvent accessibility and hydrogen-bonding interactions that reflect structural dynamics, so poor solvent accessibility in buried regions results in low deuterium uptake. The peptides in our dataset that include the Na<sup>+</sup>-binding site showed low HDX, likely due to poor solvent accessibility and structural stability. It is unclear what this reviewer meant by "increased dynamics at peptides covering the Na binding site on overall more dynamic helices." We do not observe increased dynamics in peptides covering Na<sup>+</sup>-binding sites.

      (6) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Thanks for this suggestion. The previous datasets were collected in the presence of Na<sup>+</sup>. In the current study, we also have a Na<sup>+</sup>-containing dataset. Both showed similar results: the multiple overlapping peptides covering the sugar-binding residues on helices I and V have higher HDX rates than those covering the Na<sup>+</sup>-binding residues, even when Na<sup>+</sup> is present in both datasets.

      (7) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Thanks for this suggestion. Conducting the HDX-MS comparison between the WT and the D59C mutant is certainly interesting, especially given the growing amount of structural and biochemical/biophysical data available for this mutant. However, due to limited resources, we might consider doing it later.

      (8) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      Thanks for this suggestion. We have demonstrated that Li<sup>+</sup> also shows positive cooperativity with melibiose through ITC binding measurements. Li<sup>+</sup> binds to MelB<sub>St</sub> with higher affinity than Na<sup>+</sup> but causes many different effects on MelB. It is worth investigating this thoroughly and individually. To address the second question, H<sup>+</sup> is a poor coupling cation with minimal impact on melibiose binding. Since its pKa is around 6.5, only a small subpopulation of MelB<sub>St</sub> is protonated at pH 7.5. The order of sugar-binding cooperativity is the highest with Na<sup>+</sup>, followed by Li<sup>+</sup> and H<sup>+</sup>.
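
      The size of the protonated subpopulation mentioned above can be estimated with the Henderson-Hasselbalch relation. This is a rough sketch only, assuming a single titratable site with the pKa of about 6.5 quoted above and ideal behaviour; the symbol f_H for the protonated fraction is introduced here purely for illustration:

      ```latex
      f_{\mathrm{H}} = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
                     = \frac{1}{1 + 10^{\,7.5 - 6.5}}
                     = \frac{1}{11} \approx 0.09
      ```

      Under these assumptions, roughly 9% of the transporter would be protonated at pH 7.5, which is in line with the description of a small subpopulation.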

      (9) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences instead between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      The sugar translocation free-energy landscape simulations showed that both helix bundles move relative to the membrane plane. That analysis aimed to clarify a hypothesis in the field—that the MFS transporter can use an asymmetric mode to transition between inward- and outward-facing states. In the case of MelB, we clearly demonstrated that both domains move and each helix bundle moves as a unit, so the labeling changes were identified only in some extramembrane loops and a few highly flexible helices. Thanks for the suggestion about comparing with XylE. We will include a discussion on it.

      (10) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate, this mechanically less stable state of MelB is predominant." It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      We thank this reviewer for reading the single-molecule force spectroscopy (SMFS) study on MelB<sub>St</sub>. The C3 loop mentioned in this SMFS article is partially covered in the melibiose vs. apo and melibiose plus Na<sup>+</sup> vs. apo datasets, with more coverage in the Na<sup>+</sup> vs. apo dataset. No deprotection was detected under any of these conditions. Two possible reasons the HDX data did not reflect the deprotection are: 1) the changes were too subtle and did not pass the statistical tests, and 2) the longest labeling time point was still insufficient to detect the changes; much longer labeling times should be considered in future studies.

      Reviewer #3:

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelB<sub>St</sub>) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na⁺, H⁺, or Li⁺, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na<sup>+</sup> by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na<sup>+</sup> and melibiose binding sites. To verify the specific determinants of sugar recognition, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, which is positioned at a distance of 4 Å from the phenyl group of Tyr26 and forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. Next, the authors used hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) to analyze the changes in structural dynamics of the transporter induced by melibiose, Na<sup>+</sup>, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      We thank this reviewer for the positive comments.

      Weaknesses:

      (1) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373.

      Thanks for your comments on the resolution. We will improve the density for Water 1.

      (2) Site-directed mutagenesis could help strengthen the conclusions of the authors. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr 121, Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the claims of the authors regarding the allosteric communication between the two substrate-binding sites.

      The authors thank this reviewer for the thoughtful suggestions. MelB<sub>St</sub> has been subjected to Cys-scanning mutagenesis (https://doi.org/10.1016/j.jbc.2021.101090). Placing a Cys residue on the hydrogen bond-donor Q372 significantly decreased the transport initial rate, accumulation, and melibiose fermentation, with little effect on protein expression, as shown in Figure 2 of this JBC paper. Although no binding data are available, the poor initial rate of transport with a similar amount of protein expressed suggested that the binding affinity is apparently decreased, supporting the role of water-1 in the binding pocket for better binding. The T373C mutant retained most activities of the WT. We will discuss the functional characterizations of these two mutants. Thanks.

      (3) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter received less comment: could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations?

      Thanks for this important question. We will discuss the deprotection data in the context of the conformational transition between inward-facing and outward-facing states. The two regions, loop8-9 and loop1-2, are located in the gate area on both sides of the membrane and showed increased deuterium uptake upon binding of melibiose plus Na<sup>+</sup>. They are likely involved in this process.

      The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). It might be worth it for data presentation to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      We will analyze the data as suggested by this reviewer.

      (4) For non-specialists, it would be beneficial to better introduce and explain the choice of using D59C for the structural analyses.

      As stated in our response to the reviewer #1 at page 3, “Asp59 is the only site that responds to the binding of all coupling cations: Na<sup>+</sup>, Li<sup>+</sup>, or H<sup>+</sup>. Notably, this mutant selectively abolishes cation binding and cotransport. However, it still maintains intact sugar binding with slightly higher affinity and preserves the conformational transition, as demonstrated by an electroneutral transport reaction, the melibiose exchange, and fermentation assays with intact cells. Therefore, the structural data derived from this mutant are significant and offer important mechanistic insights into sugar transport. We will provide additional details during the revision.”

      (5) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions the peptides correspond to. Only one peptide is indicated (213-226); I would recommend indicating more of them in areas where deuterium changes are substantial.

      We appreciate this comment, which will make the plots more meaningful. In the previous article published in eLife (2024), we drew boxes to mark the transmembrane regions; however, this generated much confusion, such as why some helices are very short. The revised figure will label the full length of covered positions.

      (6) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      This is an intriguing mechanistic question. Based on current data, we believe that the bound melibiose physically prevents the release of Na<sup>+</sup> or Li<sup>+</sup> from the cation-binding pocket. The cation-binding pocket and surrounding regions, including the sugar-binding residue Asp124, show low HDX, supporting this idea. Since we lack a structure with both substrates bound, figuring out the details structurally is challenging. However, we have a hypothesis about the intracellular Na<sup>+</sup> release as proposed in the 2024 JBC paper (https://doi.org/10.1016/j.jbc.2024.107427). After sugar release, the rotamer change of Asp55 will help Na<sup>+</sup> exit the cation pocket to the sugar pocket, and the negative membrane potential will facilitate the further movement from MelB to the cytosol. We will discuss this during the revision.

    1. eLife Assessment

      This important study significantly advances our understanding of the skeleton of cartilaginous fishes by using a range of state-of-the-art and complementary approaches to compare the skeleton amongst three cartilaginous fishes (catshark, little skate and ratfish). The evidence presented is compelling and likely to impact several fields of study.

    2. Reviewer #2 (Public review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Comments on previous revisions:

      The manuscript has been revised and improved and can be published. A very nice manuscript, indeed. My only recommendation (point of discussion, not a requirement) would still be to think about the claim of paedomorphosis in a holocephalan.

      Within the chondrichthyes, how distant holocephali are in relation to elasmobranchii remains uncertain, holocephali are quite a specialised group. Holocephali are also older than Batoidea and Selachii. As paedomorphosis is a derived character, I imagine it is difficult to establish that development in an extant holocephalan is derived compared to development in elasmobranchii. If this type of development would have been typical for the "older" holocephali it would not be paedomorphic. Also, the uncertainty how distant holocephali are from elasmobranchii makes it difficult to identify paedomorphosis with reference to chondrichthyes.

      [Editors' note: the authors have made further revisions in response to the previous reviews.]

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given the fact that later you describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along the same order for all sections so that it's very clear for comparative purposes of new data and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data.)

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1).  Also, it was significantly higher than stage 33 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra.  While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more data summarized in results sub-heading in the abstract as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies and I don't think your list is exhaustive. You need to expand this list and history which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      many thanks for the kind words

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different Micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimera synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods. (lines 486-488)

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters, all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward.  Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session.  Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate.  Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, who actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characters of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades.  In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision.  We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled.  However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding.  We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development, that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group.  In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis.  In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In times of Gadow & Abbott (1895) science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species) on the other hand, have bone but do not have SCPP genes (Lui et al. 206). Other calcium binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor, at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally.  The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)!  In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439).  We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications.  No changes were made.

      You allude to the fossil record and that is great. That said historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above.  Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question.  But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at more taxa than you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7.  Thanks for the de Beer REFs.  While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data and you jump directly into using an argument of those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about which other elasmobranchs are mineralizing their neural arches, etc. for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph describes why you are using a particular method and comparison because it shows me (the reader) where you're sampling from. Something else: maybe as part of the first figure, rather than having just the graphs, add a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would relate back, then, to clarifying other figures as well.

      done (also adding a phylogenetic tree).

      Second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results.  We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format, that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7 so it's hard to describe any state specifically as ancestral or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place!  We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species.  For the Fig7 branching and catshark inclusion, please see above. 

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends.  We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends.  That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively.  For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”.  We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”. 

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of those both to the cat shark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here.. Put into methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue.  We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603). 

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      no change; see above

      L53: down tune language, remove "severely" and "major"

      done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      no change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      no change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      all regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also only one chimera is analysed, not several species.

      sentence removed

      L302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      added Arratia and Kölliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      references to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred.  Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term "non-tesseral trabecular mineralization". Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      no change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologists have not been "careful" enough?

      apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      no action; see above

      L436: remove paragraph

      no action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study.  (lines 440-453)

    1. eLife Assessment

      This study presents valuable findings on the ability of a state-of-the-art method, temporally delayed linear modelling (TDLM), to detect the replay of sequences in human memory. The investigation provides convincing evidence that TDLM has limitations in its sensitivity to detect replay when being applied to extended (minutes-long) rest periods, though a more thorough treatment of the relationship to prior positive findings would make the demonstration even stronger. The work will be of particular interest to researchers investigating memory reactivation in humans, especially using iEEG, MEG, and EEG.

    2. Reviewer #1 (Public review):

      Summary:

      Participants learned a graph-based representation, but, contrary to the hypotheses, failed to show neural replay shortly after. This prompted a critical inquiry into temporally delayed linear modeling (TDLM)--the algorithm used to find replay. First, it was found that TDLM detects replay only at implausible numbers of replay events per second. Second, it detects replay-to-cognition correlations only at implausible densities. Third, there are concerning baseline shifts in sequenceness across participants. Fourth, spurious sequences arise in control conditions without a ground truth signal. Fifth, when reframing simulations previously published, similar evidence is apparent.

      Strengths:

      (1) This work is meticulous and meets a high standard of transparency and open science, with preregistration, code and data sharing, external resources such as a GUI with the task and material for the public.

      (2) The writing is clear, balanced, and matter-of-fact.

      (3) By injecting visually evoked empirical data into the simulation, many surface-level problems are avoided, such as biological plausibility and questions of signal-to-noise ratio.

      (4) The investigation of sequenceness-to-cognition correlations is an especially useful add-on because much of the previous work uses this to make key claims about replay as a mechanism.

      Weaknesses:

      Many of the weaknesses are not so much flaws in the analyses, but shortcomings when it comes to interpretation and a lack of making these findings as useful as they could be.

      (1) I found the bigger picture analysis to be lacking. Let us take stock: in other work, during active cognition, including at least one study from the Authors, TDLM shows significant sequenceness. But the evidence provided here suggests that even very strong localizer patterns injected into the data cannot be detected as replay except at implausible speeds. How can both of these things be true? Assuming these analyses are cogent, do these findings not imply something more destructive about all studies that found positive results with TDLM?

      (2) All things considered, TDLM seems like a fairly 'vanilla' and low-assumption algorithm for finding event sequences. It is hard to see intuitively what the breaking factor might be; why do the authors think ground truth patterns cannot be detected by this GLM-based framework at reasonable densities?

      (3) Can the authors sketch any directions for alternative methods? It seems we need an algorithm that outperforms TDLM, but not many clues or speculations are given as to what that might look like. Relatedly, no technical or "internal" critique is provided. What is it about TDLM that causes it to be so weak?

      Addressing these points would make this manuscript more useful, workable, and constructive, even if they would not necessarily increase its scientific breadth or strength of evidence.

    3. Reviewer #2 (Public review):

      Summary:

      Kern et al. investigated whether temporally delayed linear modeling (TDLM) can uncover sequential memory replay from a graph-learning task in human MEG during an 8-minute post-learning rest period. After failing to detect replay events, they conduct a simulation study in which they insert synthetic replay events, derived from each participant's localizer data, into a control rest period prior to learning. The simulations suggest that TDLM only reveals sequences when replay occurs at very high densities (> 80 per minute) and that individual differences in baseline sequenceness may lead to spurious and/or lackluster correlations between replay strength and behavior.

      Strengths:

      The approach is extremely well documented and rigorous. The authors have done an excellent job re-creating the TDLM methodology that is most commonly used, reporting the different approaches and parameters that they used, and reporting their preregistrations. The hybrid simulation study is creative and provides a new way to assess the efficacy of replay decoding methods. The authors remain measured in the scope/applicability of their conclusions, constructive in their discussion, and end with a useful set of recommendations for how to best apply TDLM in future studies. I also want to commend this work for not only presenting a null result but thoroughly exploring the conditions under which such a null result is expected. I think this paper is interesting and will be generally quite useful for the field, but I believe it also has a number of weaknesses that, if addressed, could improve it further.

      Weaknesses:

      The sample size is small (n=21, after exclusions), even for TDLM studies (which typically have somewhere between 25-40 participants). The authors address this somewhat through a power analysis of the relationship between replay and behavioral performance in their simulations, but this is very dependent on the assumptions of the simulation. Further, according to their own power analysis, the replay-behavior correlations are seriously underpowered (~10% power according to Figure 7C), and so if this is to be taken at face value, their own null findings on this point (Figure 3C) could therefore just reflect undersampling as opposed to methodological failure. I think this point needs to be made more clearly earlier in the manuscript. Relatedly, it would be very useful if one of the recommendations that come out of the simulations in this paper was a power analysis for detecting sequenceness in general, as I suspect that the small sample size impacts this as well, given that sequenceness effects reported in other work are often small with larger sample sizes. Further, I believe that the authors' simulations of basic sequenceness effects would themselves still suffer from having a small number of subjects, thereby impacting statistical power. Perhaps the authors can perform a similar sort of bootstrapping analysis as they perform for the correlation between replay and performance, but over sequenceness itself?
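
      The bootstrapping analysis over sequenceness suggested above could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes one sequenceness value per participant at a lag of interest and a fixed significance threshold (for instance, one derived from state permutations). The function name `bootstrap_power`, its parameters, and the example values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_power(subject_values, threshold, n_subjects, n_boot=2000, seed=0):
    """Estimate how often a resampled group mean exceeds a fixed threshold.

    subject_values: per-participant sequenceness values (hypothetical input).
    threshold: significance threshold, held fixed here as a simplification.
    n_subjects: size of each resampled group.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        # Resample participants with replacement to mimic a new cohort.
        sample = rng.choices(subject_values, k=n_subjects)
        if mean(sample) > threshold:
            hits += 1
    return hits / n_boot


# Made-up per-participant values, for illustration only.
example_values = [0.03, -0.01, 0.02, 0.00, 0.04, -0.02, 0.01]
for n in (21, 40, 80):
    print(n, bootstrap_power(example_values, threshold=0.015, n_subjects=n))
```

      Holding the threshold fixed is a simplification, since a permutation-based threshold would itself change with the resampled group. Resampling to group sizes above the original n also adds no new information about between-subject variability.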

      The task paradigm may introduce issues in detecting replay that are separate from TDLM. First, the localizer task involves a match/mismatch judgment and a button press during the stimulus presentation, which could add noise to classifier training separate from the semantic/visual processing of the stimulus. This localizer is similar to others that have been used in TDLM studies, but notably in other studies (e.g., Liu, Mattar et al., 2021), the stimulus is presented prior to the match/mismatch judgment. A discussion of variations in different localizers and what seems to work best for decoding would be useful to include in the recommendations section of the discussion. Second, and more seriously, I believe that the task design for training participants about the expected sequences may complicate sequence decoding. Specifically, this is because two images (a "tuple") are shown together and used for prediction, which may encourage participants to develop a single bound representation of the tuple that then predicts a third image (AB -> C rather than A -> B, B -> C). This would obviously make it difficult to i) use a classifier trained on individual images to detect sequences and ii) find evidence for the intended transition matrix using TDLM. Can the authors rule out this possibility?

      Participants only modestly improved (from 76-82% accuracy) following the rest period (which the authors refer to as a consolidation period). If the authors assume that replay leads to improved performance, then this suggests there is little reason to see much task-related replay during rest in the first place. This limitation is touched on (lines 228-229), but I think it makes the lack of replay finding here less surprising. However, note that in the supplement, it is shown that the amount of forward sequenceness is marginally related to the performance difference between the last block of training and retrieval, and this is the effect I would probably predict would be most likely to appear. Obviously, my sample size concerns still hold, and this is not a significant effect based on the null hypothesis testing framework the authors employ, but I think this set of results should at least be reported in the main text. I was also wondering whether the authors could clarify how the criterion over six blocks was 80% but then the performance baseline they use from the last block is 76%? Is it just that participants must reach 80% within the six blocks *at some point* during training, but that they could dip below that again later?

      Because most of the conclusions come from the simulation study, there are a few decisions about the simulations that I would like the authors to expand upon before I can fully support their interpretations. First, the authors use a state-to-state lag of 80ms and do not appear to vary this throughout the simulations - can the authors provide context for this choice? Does varying this lag matter at all for the results (i.e., does the noise structure of the data interact with this lag in any way?) Second, it seems that the approach to scaling simulated replays with performance is rather coarse. I think a more sensitive measure would be to scale sequence replays based on the participants' responses to *that* specific sequence rather than altering the frequency of all replays by overall memory performance. I think this would help to deliver on the authors' goal of simulating an "increase of replay for less stable memories" (line 246). On the other hand, I was also wondering whether it is actually necessary to use the real memory performance for each participant in these simulations - couldn't similar goals (with a better/more full sampling of the space of performance) be achieved with simulated memory performance as well, taking only the MEG data from the participant? Finally, Figure 7D shows that 70ms was used on the y-axis. Why was this the case, or is this a typo?

      Because this is a re-analysis of a previous dataset combined with a new simulation study on that data aimed at making recommendations about how to best employ TDLM, I think the usefulness of the paper to the field could be improved in a few places. Specifically, in the discussion/recommendation section, the authors state that "yet unknown confounders" (line 295) lead to non-random fluctuations in the simulated correlations between replay detection and performance at different time lags. Because it is a particularly strong claim that there is the potential to detect sequenceness in the baseline condition where there are no ground-truth sequences, the manuscript could benefit from a more thorough exploration of the cause(s) of this bias in addition to the speculation provided in the current version. In addition, to really prove that a realistic simulation is necessary (one of the primary conclusions of the paper), it would be useful to provide a comparison to a fully synthetic simulation performed on this exact task and transition structure (in addition to the recreation of the original simulation code from the TDLM methods paper). Finally, I think the authors could do further work to determine whether some of their recommendations for improving the sensitivity of TDLM pan out in the current data - for example, they could report focusing not just on the peak decoding timepoint but incorporating other moments into classifier training.

      Lastly, I would like the authors to address a point that was raised in a separate public forum by an author of the TDLM method, which is that when replays "happen during rest, they are not uniform or close". Because the simulations in this work assume regularly occurring replay events, I agree that this is an important limitation that should be incorporated into alternative simulations to ensure the lack of findings is not because of this assumption.
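
      The contrast between regularly spaced and irregular replay timing raised here can be illustrated with a small sketch. It only generates event onset times and does not insert anything into MEG data. The Poisson process is one assumed model of irregular timing among many (bursty or clustered timing would need a different generator), and the function names and rates are hypothetical.

```python
import random


def regular_onsets(n_events, duration_s):
    """Evenly spaced onsets, as in a simulation with regularly occurring events."""
    step = duration_s / n_events
    return [i * step for i in range(n_events)]


def poisson_onsets(rate_per_s, duration_s, seed=0):
    """Irregular onsets from a homogeneous Poisson process (an assumed model)."""
    rng = random.Random(seed)
    t, onsets = 0.0, []
    while True:
        # Inter-event intervals are exponentially distributed.
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return onsets
        onsets.append(t)


# Same average density (80 events per minute), different temporal structure.
regular = regular_onsets(n_events=80, duration_s=60)
irregular = poisson_onsets(rate_per_s=80 / 60, duration_s=60)
print(len(regular), len(irregular))
```

      Feeding both kinds of onset lists into the same insertion procedure would show whether detection sensitivity depends on the regularity assumption.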

    4. Reviewer #3 (Public review):

      Summary:

      Kern et al. critically assess the sensitivity of temporally delayed linear modelling (TDLM), a relatively new method used to detect memory replay in humans via MEG. While TDLM has recently gained traction and been used to report many exciting links between replay and behavior in humans, Kern et al. were unable to detect replay during a post-learning rest period. To determine whether this null result reflected an actual absence of replay or sensitivity of the method, the authors ran a simulation: synthetic replay events were inserted into a control dataset, and TDLM was used to decode them, varying both replay density and its correlation with behavior. The results revealed that TDLM could only reliably detect replay at unrealistically high (non-physiological) replay densities, and the authors were unable to induce strong behavior correlations. These findings highlight important limitations of TDLM, particularly for detecting replay over extended, minutes-long time periods.

      Strengths:

      Overall, I think this is an extremely important paper, given the growing use of TDLM to report exciting relationships between replay and behavior in humans. I found the text clear, the results compelling, and the critique of TDLM quite fair: it is not that this method can never be applied, but just that it has limits in its sensitivity to detect replay during minutes-long periods. Further, I greatly appreciated the authors' efforts to describe ways to improve TDLM: developing better decoders and applying them to smaller time windows.

      The power of this paper comes from the simulation, whereby the authors inserted replay events and attempted to detect them using TDLM. Regarding their first study, there are many alternative explanations or possible analysis strategies that the authors do not discuss; however, none of these are relevant if, under conditions where it is synthetically inserted, replay cannot be detected.

      Additionally, the authors are relatively clear about which parameters they chose, why they chose them, and how well they match previous literature (they seem well matched).

      Finally, I found the application of TDLM to a baseline period particularly important, as it demonstrated that there are fluctuations in sequenceness in control conditions (where no replay would be expected); it is important to contrast/calculate the difference between control (pre-resting state) and target (post-resting state) sequenceness values.

      Weaknesses:

      While I found this paper compelling, I was left with a series of questions.

      (1) I am still left wondering why other studies were able to detect replay using this method. My takeaway from this paper is that large time windows lead to high significance thresholds/required replay density, making it extremely challenging to detect replay at physiological levels during resting periods. While it is true that some previous studies applying TDLM used smaller time windows (e.g., Kern's previous paper detected replay in 1500ms windows), others, including Liu et al. (2019), successfully detected replay during a 5-minute resting period. Why do the authors believe others have nevertheless been able to detect replay during multi-minute time windows?

      For example, some studies using TDLM report evidence of sequenceness as a contrast between evidence of forwards (f) versus backwards (b) sequenceness; sequenceness was defined as Z_f(Δt) - Z_b(Δt) (where Z refers to the sequence alignment coefficient for a transition matrix at a specific time lag Δt). This use case is not discussed in the present paper, despite its prevalence in the literature. If the same logic were applied to the data in this study, would significant sequenceness have been uncovered? Whether it would or not, I believe this point is important for understanding methodological differences between this paper and others.
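
      The forward-minus-backward contrast described above amounts to a per-lag subtraction, which can be sketched as follows. The sketch assumes the forward and backward coefficients have already been estimated at each time lag. The function name and all numbers are made up for illustration and are not results from any study.

```python
def sequenceness_contrast(z_forward, z_backward):
    """Return forward-minus-backward sequenceness at each time lag."""
    if len(z_forward) != len(z_backward):
        raise ValueError("forward and backward series must cover the same lags")
    return [f - b for f, b in zip(z_forward, z_backward)]


# Hypothetical coefficients at four time lags (values are illustrative only).
lags_ms = [10, 20, 30, 40]
z_f = [0.02, 0.05, 0.01, -0.01]
z_b = [0.01, 0.01, 0.02, 0.00]

contrast = sequenceness_contrast(z_f, z_b)
# Lag with the largest absolute contrast; significance testing is not shown.
peak_lag = max(zip(lags_ms, contrast), key=lambda pair: abs(pair[1]))[0]
print(peak_lag)
```

      The contrast removes components shared by both directions, so its significance threshold would need to be derived for the difference itself rather than reused from the separate forward and backward analyses.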

      (2) Relatedly, while the authors note that smaller time windows are necessary for TDLM to succeed, a more precise description of the appropriate window size would greatly improve the utility of this paper. As it stands, the discussion feels incomplete without this information, as providing explicit guidance on optimal window sizes would help future researchers apply TDLM effectively. Under what window size range can physiological levels of replay actually be detected using TDLM? Or, is there some scaling factor that should be considered, in terms of window size and significance threshold/replay density? If the authors are unable to provide a concrete recommendation, they could add information about time windows used in previous studies (perhaps, is 1500ms as used in their previous paper a good recommendation?).

      (3) In their simulation, the authors define a replay event as a single transition from one item to another (example: A to B). However, in rodents, replay often traverses more than a single transition (example: A to B to C, even to D and E). Observing multistep sequences increases confidence that true replay is present. How does sequence length impact the authors' conclusions? Similarly, can the authors comment on how the length of the inserted events impacts TDLM sensitivity, if at all?

      For example, regarding sequence length, is it possible that TDLM would detect multiple parts of a longer sequence independently, meaning that the high density needed to detect replay is actually not quite so dense? (example: if 20 four-step sequences (A to B to C to D to E) were sampled by TDLM such that it recorded each transition separately, that would lead to a density of 80 events/min).
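
      The arithmetic in this example can be written out explicitly. The sketch below assumes that every step of a multistep sequence is counted as a separate pairwise transition; the function name is hypothetical.

```python
def transitions_per_minute(sequences_per_minute, steps_per_sequence):
    """Pairwise transitions per minute if each step is counted separately."""
    return sequences_per_minute * steps_per_sequence


# The reviewer's example: 20 four-step sequences (A to B to C to D to E) per minute.
density = transitions_per_minute(20, 4)
print(density)  # 80
```

      Under that counting assumption, a density threshold of 80 transitions per minute would correspond to only 20 full sequences per minute.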

    1. eLife Assessment

      This important study fills a gap in our knowledge of the evolution of GPCRs in holozoans, as well as the phylogeny of associated signaling pathway components such as G proteins, GRKs, and RIC8 proteins. The evidence supporting the conclusions is compelling, with the analysis of extensive new genomic data from choanoflagellates and other non-animal holozoans. Overall, the study is thorough and well-executed. It will be a resource for researchers interested in both the comparative genomics of multicellularity and GPCR biology more broadly, especially given the importance of GPCRs as highly druggable targets.

    2. Reviewer #1 (Public review):

      Summary:

      The authors strived for an inventory of GPCRs and GPCR pathway component genes within the genomes of 23 choanoflagellates and other close relatives of metazoans.

      Strengths:

      The authors generated a solid phylogenetic overview of the GPCR superfamily in these species. Intriguingly, they discover novel GPCR families, novel assortments of domain combinations, and novel insights into the evolution of those groups within the Opisthokonta clade. A particular focus is laid on adhesion GPCRs, for which the authors discover many hitherto unknown subfamilies based on Hidden Markov Models of the 7TM domain sequences, which were also reflected by combinations of extracellular domains of the homologs. In addition, the authors provide bioinformatic evidence that aGPCRs of choanoflagellates also contain a GAIN domain, which is self-cleavable, thereby reflecting the most remarkable biochemical feat of aGPCRs.

      Weaknesses:

      The chosen classification scheme for aGPCRs may require reassessment and amendment by the authors in order to prevent confusion with previously issued classification attempts of this family.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to characterise the GPCR family in choanoflagellates (and other unicellular holozoans). GPCRs are the most abundant gene family in many animal genomes, playing crucial roles in a wide range of physiological processes. Although they are known to evolve rapidly, GPCRs are an ancient feature of eukaryotic biology. Identifying conserved elements across the animal-protist boundary is therefore a valuable goal, and the increasing availability of genomes from non-animal holozoans provides new opportunities to explore evolutionary patterns that were previously obscured by limited taxon sampling. This study presents a comprehensive re-examination of GPCRs in choanoflagellates, uncovering examples of differential gene retention and revealing the dynamic nature of the GPCR repertoire in this group. As GPCRs are typically involved in environmental sensing, understanding how these systems evolved may shed light on how our unicellular ancestors adapted their signalling networks in the transition to complex multicellularity.

      Strengths:

      The paper combines a broad taxonomic scope with the use of both established and recently developed tools (e.g. Foldseek, AlphaFold), enabling a deep and systematic exploration of GPCR diversity. Each family is carefully described, and the manuscript also functions as an up-to-date review of GPCR classification and evolution. Although similar attempts at understanding GPCR evolution have been made over the last decade, the authors build on this foundation by identifying new families and applying improved computational methods to better predict structure and function. Notably, the presence of Rhodopsin-like GPCRs in some choanoflagellates and ichthyosporeans is intriguing, even though they do not fall within known animal subfamilies. The computational framework presented here is broadly applicable, offering a blueprint for surveying GPCR diversity in other non-model eukaryotes (and even in animal lineages), potentially revealing novel families relevant to drug discovery or helping revise our understanding of GPCR evolution beyond model systems.

      Weaknesses:

      While the study contributes several interesting observations, it does not radically revise the evolutionary history of the GPCR family. However, in an era increasingly concerned with the reproducibility of scientific findings, this is arguably a strength rather than a weakness. It is encouraging to see that previously established patterns largely hold, and that with expanded sampling and improved methods, new insights can be gained, especially at the level of specific GPCR subfamilies. In addition, no functional follow-ups are provided in the model system Salpingoeca rosetta, but I am sure functional work on GPCRs in choanoflagellates is set to reveal very interesting molecular adaptations in the future.

      Comments on the latest version:

      The authors have done a good job answering my questions and suggestions.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1: 

      “I am sorry to dwell on the point of naming the newly identified families of adhesion GPCRs in choanoflagellates. I commented: "Can the authors suggest another scheme (mind to avoid the subfamily I-IX or the alternative ADGRA-G,L,V subfamily schemes of metazoan aGPCRs) and adapt their numbering throughout the text and all figures/supplementary figures/supplementary files." Now the authors have changed the Roman numeral numbering (previously used by the adhesion GPCR field to denominate metazoan receptor families) to the other option that I explicitly said should be obsolete, the numbering by capital letters (which is in use since its introduction in 2015 in Hamann et al., Pharmacol Rev, 2015). The authors write: "Phylogenetic analysis of the 7TM domains of choanoflagellates uncovered at least 19 subfamilies of aGPCRs (subfamilies A-S ...". I am thus afraid this has not addressed my point at all. For example, in the revised numbering scheme for Choanoflagellates aGPCR subfamilies of the authors the now used "A" descriptor, which are predicted to contain a HYR domain, can be mistaken for ADGRA homologs (abbreviated as "A" receptors, previously termed subfamily III aGPCRs) of metazoan aGPCRs, which contain HRM and LRR domains. Likewise, choanoflagellate "E" receptors are predicted to harbour LRR repeats, but metazoan ADGRE (abbreviated as "E" too) are characterised by their EGF domains. This clearly underlines the need to devise a numbering scheme for the newly described choanoflagellate aGPCR homologs so they cannot be confused with the receptors from other kingdoms, for which identical naming conventions exist. Please change this, e.g. by numbering/denominating the choanoflagellate subfamilies by greek letters (or your pick of any other ordering system that does not lend itself to be mistaken with the previous and existing aGPCR classifications) and change the manuscript and figures accordingly.”

      We have now re-labeled the choanoflagellate aGPCR subfamilies, previously numbered from A to S, using Greek alphabetical enumeration (from α to τ). Changes have been made throughout the main text, in Figure 5, and in Supplementary Figures S6 and S7.

    1. eLife Assessment

      This important study convincingly shows that Vibrio bacteria act as predators, both in the lab and in their natural habitat, of ecologically significant algae that contribute to harmful blooms. While the data strongly suggest that starvation may induce predation, further work is needed to fully establish this link. Similarly, the evidence for a social component in the predation process remains incomplete. This study will be very impactful to those interested in the diversity of microbial predator-prey interactions and in controlling toxic algal blooms, but the paper could be strengthened by more clearly showing the degree of replication, by better defining the terms used to describe the observed behaviour, and by providing better support for starvation and collective behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      Rolland and colleagues investigated the interaction between Vibrio bacteria and Alexandrium algae. The authors found a correlation between the abundance of the two in the Thau Lagoon and observed in the laboratory that Vibrio grows to higher numbers in the presence of the algae than in monoculture. Time-lapse imaging of Alexandrium in coculture with Vibrio enabled the authors to observe Vibrio bacteria in proximity to the algae and subsequent algae death. The authors further determine the mechanism of the interaction between the two and point out similarities between the observed phenotypes and predator-prey behaviours across organisms.

      Strengths:

      The study combines field work with mechanistic studies in the laboratory and uses a wide array of techniques ranging from co-cultivation experiments to genetic engineering, microscopy and proteomics. Further, the authors test multiple Vibrio and Alexandrium species and claim that the observed phenotypes are widespread.

      Weaknesses:

      In my view, the presentation of the data is in some cases not ideal. The phrasing of some conclusions (e.g., group-attacks and wolf-pack-hunting by the bacteria) is in my opinion too strong based on the herein provided data.

    3. Reviewer #2 (Public review):

      Goal summary:

      The authors sought to (i) demonstrate correlations between the dynamics of the dinoflagellate Alexandrium pacificum and the bacterium Vibrio atlanticus in natural populations, (ii) demonstrate the occurrence of predation in laboratory experiments, (iii) claim coordinated action by the predators in the predation process, (iv) demonstrate that predation is induced by predator starvation, and (v) test for effects of quorum sensing and iron-uptake genes on the predation process.

      Strengths include:

      (1) Data indicating correlated dynamics in a natural environment that increase the motivation for the study of in vitro interactions.

      (2) Experimental design allowing clear inference of predation based on population counts of both prey and predators in addition to microscopy-based evidence.

      (3) Supplementation of population-level data with molecular approaches to test hypotheses regarding possible involvement of quorum sensing and iron uptake in predation.

      Weaknesses include:

      (1) A lack of early, clear definitions for several important terms used in the paper, including 'predation', 'coordination' and 'coordinated action', 'group attack', and 'wolf-pack hunting', along with a corresponding lack of criteria for what evidence would warrant use of some of these labels. (For example, does mere simultaneity of attacks of an A. pacificum cell by many V. atlanticus cells constitute "coordination"? Or, as it seems to us, does coordination require some form of signalling between predator cells?)

      (2) Absence of controls for cell density in the test for starvation effects on predatory behavior; unclear how the length of incubation affects the density of V. atlanticus cells.

      (3) Lack of clarity in some of the methodological descriptions.

      Appraisal:

      The authors convincingly achieve their aim of demonstrating that V. atlanticus can prey on A. pacificum, provide strongly suggestive evidence that such predation is induced by starvation, and clearly demonstrate that both iron availability and, correspondingly, the presence of genes involved in iron uptake, strongly influence the efficacy of predation. However, the evidence for starvation-induction of predation can be strengthened with cell-density controls; evidence for a social component to predation - positive interactions between attacking predators - is lacking.

      Discussion of impact:

      This paper will interest those interested in how microbial behaviour responds to environmental fluctuations, in particular predatory behaviour, but will do so more strongly if the evidence of starvation-induction of predation is strengthened. It will also interest those investigating bacteria-algae interactions and potential ecological controls of algal blooms. It has the potential to interest researchers of microbial cooperation, should the authors be able to provide any evidence of coordination between predator cells.

    1. eLife Assessment

      How secretion is regulated during cell division and how membrane trafficking factors cooperate with the cytoskeleton during cell division remain poorly understood. In this work, the authors find protein-protein interactions and localization dependencies between the polymeric septin cytoskeleton and the exocyst complex, using fission yeast as a model organism and AlphaFold 3-based structural predictions. The work provides a valuable body of new information that will be of great interest to the cell biology community. The evidence is solid and provides the authors and the community with a framework to test in future whether the identified interfaces reflect bona fide interaction sites in vivo and in vitro.

    2. Reviewer #1 (Public review):

      Summary

      In this manuscript, Singh, Wu and colleagues explore functional links between septins and the exocyst complex. The exocyst is a conserved octameric complex that mediates the tethering of secretory vesicles for exocytosis in eukaryotes. In fission yeast cells, the exocyst is necessary for cell division, where it localizes mostly at the rim of the division plane, but septins, which localize in a similar manner, are non-essential. The main findings of the work are that septins are required for the specific localization of the exocyst to the rim of the division plane, and the likely consequent localization of the glucanase Eng1 at this same location, where it is known to promote cell separation. In the absence of septins, the exocyst still localizes to the division plane, but is not restricted to the rim. They also show some defect in the localization of secretory vesicles and glucan synthase cargo. They further show interactions between septins and exocyst subunits through coIP experiments.

      Strengths

      The septin, exocyst and Eng1 localization data are well supported, showing that the septin rim recruits the exocyst and (likely consequently) the Eng1 glucanase at this location. One important finding of the manuscript is that of a physical interaction between septins and exocyst subunits in co-immunoprecipitation experiments.

      Weaknesses

      While interactions are supported by coIP experiments, the AlphaFold-predicted septin-exocyst interactions are not very convincing and the predicted binding interfaces are not supported by mutation analysis. A further open question is whether septins interact with the intact exocyst complex or whether the interactions occur only with individual subunits. The two-hybrid and coIP data only show weak interactions with individual subunits, and some coIPs (for instance Sec3 and Exo70 with Spn1 and Spn4) are negative, suggesting that the exocyst complex may not remain intact in these experiments.

    3. Reviewer #2 (Public review):

      Summary:

      This interesting study implicates the direct interaction between two multi-subunit complexes, known as the exocyst and septin complexes, in the function of both complexes during cytokinesis in fission yeast. While previous work from several labs had implicated roles for the exocyst and septin complexes in cytokinesis and cell separation, this study describes the importance of protein:protein interaction between these complexes in mediating the functions of these complexes in cytokinesis. Previous studies in neurons had suggested interactions between septins and exocyst complexes occur but the functional importance of such interactions was not known. Moreover, in baker's yeast where both of these complexes have been extensively studied - no evidence of such an interaction has been uncovered despite numerous studies which should have detected it. Therefore while exocyst:septin interactions appear to be conserved in several systems, it appears likely that budding yeast are the exception--having lost this conserved interaction.

      Strengths:

      The strengths of this work include the rigorous analysis of the interaction using multiple methods including Co-IP of tagged but endogenously expressed proteins, 2 hybrid interaction, and Alphafold Multimer. Careful quantitative analysis of the effects of loss of function in each complex and the effects on localization and dynamics of each complex was also a strength. Taken together this work convincingly describes that these two complexes do interact and that this interaction plays an important role in post Golgi vesicle targeting during cytokinesis.

      Comments on revisions:

      The authors have added substantial work to the revised manuscript, and it is much improved. In particular, the figures portraying the AlphaFold Multimer model of the exocyst:septin interactions are much clearer. I also appreciate the effort that went into modeling the fission yeast exocyst complex based on the yeast CryoEM structure in order to determine if the predicted interfaces with septins were likely to be surface accessible in the intact exocyst complex.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Singh, Wu and colleagues explore functional links between septins and the exocyst complex. The exocyst is a conserved octameric complex that mediates the tethering of secretory vesicles for exocytosis in eukaryotes. In fission yeast cells, the exocyst is necessary for cell division, where it localizes mostly at the rim of the division plane, but septins, which localize in a similar manner, are non-essential. The main findings of the work are that septins are required for the specific localization of the exocyst to the rim of the division plane, and the likely consequent localization of the glucanase Eng1 at this same location, where it is known to promote cell separation. In the absence of septins, the exocyst still localizes to the division plane but is not restricted to the rim. They also show some defects in the localization of secretory vesicles and glucan synthase cargo. They further propose that interactions between septins and exocysts are direct, as shown through Alphafold2 predictions (of unclear strength) and clean coIP experiments.

      Strengths: 

      The septin, exocyst and Eng1 localization data are well supported, showing that the septin rim recruits the exocyst and (likely consequently) the Eng1 glucanase at this location. One major finding of the manuscript is that of a physical interaction between septins and exocyst subunits. Indeed, many of the coIPs supporting this discovery are very clear. 

      Weaknesses: 

      I am less convinced by the strength of the physical interaction of septins with the exocyst complex. Notably, one important open question is whether septins interact with the intact exocyst complex, as claimed in the text, or whether the interactions occur only with individual subunits. The two-hybrid and coIP data only show weak interactions with individual subunits, and some coIPs (for instance Sec3 and Exo70 with Spn1 and Spn4) are negative, suggesting that the exocyst complex does not remain intact in these experiments.

      Given the known structure of the full exocyst complex and septin filaments (at least in S. cerevisiae), the Alphafold2 predicted structure could be used to probe whether the proposed interaction sites are compatible with full complex formation.  

      We thank the reviewer for these important and insightful comments. We agree that our current data, particularly the data from yeast two-hybrid and co-immunoprecipitation (coIP) assays, primarily reveal interactions between individual septin and exocyst subunits, and do not conclusively demonstrate binding of septins to the fully assembled exocyst complex. We recognize this as a key limitation and have revised the manuscript text accordingly to clarify this point.

      We also appreciate the reviewer’s suggestion to use structural prediction to further assess the plausibility of the interactions. We have now used the full Saccharomyces cerevisiae exocyst complex structure (4.4 Å resolution) published by the Guo group (Mei et al., 2018) to examine the septin-exocyst interaction interfaces, assuming that the S. pombe exocyst has a similar structure. We checked all the interacting residues on the exocyst complex and septins from our AlphaFold modeling to determine whether the predicted interactions are structurally compatible. Our analysis reveals that the majority of subunit interactions are sterically feasible, while a few would likely require partial disassembly or flexible conformations. These new insights have been added to the revised Results and Discussion sections (Figure Supplements S4 and S5, and Videos 4-7).

      While we cannot fully resolve whether septins engage with the whole exocyst complex versus selected subunits, our combined data support a model that septins scaffold or spatially regulate the exocyst localization at the division site, potentially through dynamic and multivalent interactions. We now explicitly state this more cautious interpretation in the revised manuscript.

      Mei, K., Li, Y., Wang, S., Shao, G., Wang, J., Ding, Y., Luo, G., Yue, P., Liu, J.-J., Wang, X., Dong, M.-Q., Wang, H.-W. and Guo, W., 2018. Cryo-EM structure of the exocyst complex. Nature Structural & Molecular Biology, 25(2), pp.139-146.

      The effect of spn1∆ on Eng1 localization is very clear, but the effect on secretory vesicles (Ypt3, Syb1) and glucan synthase Bgs1 is less convincing. The effect is small, and it is not clear how the cells are matched for the stage of cytokinesis. 

      For localizations and quantifications of Eng1, Ypt3, Syb1, and Bgs1 shown in Figures 6 and 7, cells with a closed septum (at or after the end of contractile-ring constriction) were quantified or highlighted. To quantify their fluorescence intensity at the division site by line scan, we used a line width of 3 pixels. For Syb1 (Figure 6D), we quantified cells at the end of ring constriction (when Rlc1-tdTomato had constricted to a dot) in the middle focal plane. The exact same lines were drawn in both the Rlc1 and Syb1 channels. The center of the line scan was defined as the pixel with the brightest Rlc1 value. All data were aligned by the center and plotted. For Bgs1 (Figure 7A), we quantified cells in which the Rlc1 signal had disappeared from the division site. The line was drawn in the Bgs1 channel in the middle focal plane. The center of the line scan was defined as the pixel with the brightest Bgs1 value. All data were aligned by the center and plotted. These details were added to the Materials and Methods.
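
      The center-alignment step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the figures: the function names, the `half_width` parameter, and the NaN padding for scans that run past the image edge are all introduced here for illustration, and NumPy is assumed to be available. For Syb1 the reference channel would be Rlc1; for Bgs1 the same scan would serve as both reference and signal.

```python
import numpy as np

def align_line_scans(reference_scans, signal_scans, half_width):
    """Align each signal scan on the brightest pixel of its reference scan.

    reference_scans / signal_scans: sequences of 1-D intensity profiles
    drawn along the same line (hypothetical inputs for this sketch).
    half_width: number of pixels kept on each side of the center.
    """
    aligned = []
    for ref, sig in zip(reference_scans, signal_scans):
        ref = np.asarray(ref, dtype=float)
        sig = np.asarray(sig, dtype=float)
        center = int(np.argmax(ref))  # brightest reference pixel
        # Pad with NaN where the window extends beyond the scan.
        window = np.full(2 * half_width + 1, np.nan)
        for offset in range(-half_width, half_width + 1):
            idx = center + offset
            if 0 <= idx < sig.size:
                window[offset + half_width] = sig[idx]
        aligned.append(window)
    return np.vstack(aligned)

def mean_profile(aligned):
    """Average the aligned scans position by position, ignoring NaN padding."""
    return np.nanmean(aligned, axis=0)
```

      Padding with NaN rather than truncating keeps every scan the same length, so scans whose brightest pixel lies near the end of the line still contribute to the positions they cover.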

      Reviewer #2 (Public Review): 

      Summary: 

      This interesting study implicates the direct interaction between two multi-subunit complexes, known as the exocyst and septin complexes, in the function of both complexes during cytokinesis in fission yeast. While previous work from several labs had implicated roles for the exocyst and septin complexes in cytokinesis and cell separation, this study describes the importance of protein:protein interaction between these complexes in mediating the functions of these complexes in cytokinesis. Previous studies in neurons had suggested interactions between septins and exocyst complexes occur but the functional importance of such interactions was not known. Moreover, in baker's yeast where both of these complexes have been extensively studied - no evidence of such an interaction has been uncovered despite numerous studies which should have detected it. Therefore while exocyst:septin interactions appear to be conserved in several systems, it appears likely that budding yeast are the exception--having lost this conserved interaction. 

      Strengths: 

      The strengths of this work include the rigorous analysis of the interaction using multiple methods including Co-IP of tagged but endogenously expressed proteins, 2 hybrid interaction, and Alphafold Multimer. Careful quantitative analysis of the effects of loss of function in each complex and the effects on localization and dynamics of each complex was also a strength. Taken together this work convincingly describes that these two complexes do interact and that this interaction plays an important role in post Golgi vesicle targeting during cytokinesis. 

      Weaknesses: 

      The authors used Alphafold Multimer to predict (largely successfully) which subunits were most likely to be involved in direct interactions between the complexes. It would be very interesting to compare this to a parallel analysis on the budding yeast septin and exocyst complexes where it is quite clear that detectable interactions between the exocyst and septins (using the same methods) do not exist. Presumably the resulting pLDDT scores will be significantly lower. These are in silico experiments and should not be difficult to carry out. 

      We thank the reviewer for this insightful suggestion. To assess the specificity of the predicted interactions between septins and the exocyst complex in S. pombe, we performed a comparative AlphaFold2 analysis using some of the homologous subunits from Saccharomyces cerevisiae. We modeled two pairs, Cdc10-Sec5 and Cdc10-Sec15 (Cdc10 is the Spn2 homolog), using the same pipeline and parameters, at the time when we did the modeling for S. pombe. We did not find interactions between them using the criteria applied to the fission yeast proteins in this study. These results support the notion that the predicted septin–exocyst interactions in S. pombe are not generalizable to budding yeast. Unfortunately, we did not test all other combinations at that time, and the AlphaFold2 platform is no longer available to us (it returned system error messages when we tried recently). We thank the reviewer again for this helpful suggestion, which should strengthen the evolutionary interpretation of the septin-exocyst interactions once the analysis can be carried out systematically.

      Reviewer #3 (Public Review): 

      Septins in several systems are thought to guide the location of exocytosis, and they have been found to interact with the exocyst vesicle-tethering complex in some cells. However, it is not known whether such interactions are direct or indirect. Moreover, septin-exocyst physical associations were not detected in several other systems, including yeasts, making it unclear whether such interactions reflect a conserved septin-exocytosis link or whether they may be missed if they depend on septin polymerization or association into higher-order structures. Singh et al. set out to define whether and how septins influence the exocyst during S. pombe cytokinesis. Based on three lines of evidence, the authors conclude that septins directly bind to exocyst subunits to regulate localization of the exocyst and vesicle secretion during cytokinesis. The conclusions are consistent with the data presented, but some interpretations need to be clarified and extended:

      (1) The first line of evidence examines septin and exocyst localization during cytokinesis in wild-type and septin-mutant or exocyst-mutant yeast. Quantitative imaging convincingly shows that the detailed localization of the exocyst at the division site is perturbed in septin mutants, and that this is accompanied by modest accumulation of vesicles and vesicle cargos. Whether that is sufficient to explain the increased thickness of the division septum in septin mutants remains unclear.

      The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane (less likely at the rim) without septins. Due to the lack of the glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum. We have added these points to the Discussion.

      (2) The second line of evidence involves a comprehensive Alphafold2 analysis of potential pair-wise interactions between septin and exocyst subunits. This identifies several putative interactions in silico, but it is unclear whether the identified interaction surfaces would be available in the full septin or exocyst complexes.  

      We thank the reviewer for raising this important point. We fully agree that a key limitation of pairwise AlphaFold predictions is that they do not account for the higher-order structural context of multimeric protein complexes, such as septin hetero-oligomers or the assembled exocyst complex. As a result, some of the predicted interfaces could indeed be conformationally restricted in the native state.

      To address this concern, we predicted the S. pombe exocyst and septin structures using AlphaFold3 and mapped the predicted contact residues onto the predicted structures. Most predicted interfaces (86% for the exocyst and 86-96% for septins) appear to be located on accessible surfaces in the assembled complexes (Figure Supplements S4 and S5, Videos 4-7), suggesting that these interactions are sterically plausible. We have added this important caveat to the revised manuscript, highlighting interface accessibility within the assembled complexes. We appreciate the reviewer’s insight, which helped us strengthen the interpretation and limitations of the AlphaFold-based analysis.

      (3) The third line of evidence uses co-immunoprecipitation and yeast two hybrid assays to show that several physical interactions predicted by Alphafold2 can be detected, leading the authors to conclude that they have identified direct interactions. However, both methods leave open the possibility that the interactions are indirect and mediated by other proteins in the fission yeast extract (co-IP) or budding yeast cell (two-hybrid). 

      We thank the reviewer for this important clarification. We agree that co-immunoprecipitation (co-IP) and yeast two-hybrid (Y2H) assays cannot conclusively distinguish between direct and indirect interactions. As the reviewer points out, co-IPs may reflect associations mediated by bridging proteins within the fission yeast extract, and Y2H readouts can be influenced by fusion context or endogenous host proteins. We have now revised the relevant statements in the Results and Discussion sections to clarify that the observed associations are consistent with the direct interactions predicted by AlphaFold2, but cannot alone establish direct binding. We have also tempered our terminology, substituting phrases such as “direct interaction” with “physical association consistent with direct binding,” where appropriate.

      (4) Based on prior studies it would be expected that the large majority of both septins and exocyst subunits are present in cells and extracts as stoichiometric complexes. Thus, one would expect any septin-exocyst interaction to yield associations detectable with multiple subunits, yet co-IPs were not detected in some combinations. It is therefore unclear whether the interactions reflect associations between fully-formed functional complexes or perhaps between transient folding intermediates. 

      We thank the reviewer for this thoughtful observation. We agree that both septins and exocyst subunits are generally understood to exist in cells as stable, stoichiometric complexes, and that interactions between fully assembled complexes might be expected to yield co-immunoprecipitation signals involving multiple subunits from each complex. However, it was also found that >50% of septins Spn1 and Spn4 are in the cytoplasm even during cytokinesis when the septin double rings are formed (Table 1 of Wu and Pollard, Science 2005, PMID: 16224022). Thus, it is possible that there are pools of free septin and exocyst subunits in the cytoplasm, which were detected in our Co-IP assays. 

      In our experiments, we observed selective co-IP signals between certain septin and exocyst subunits, while other combinations did not yield detectable interactions. We believe these findings could reflect several other possibilities besides the possible interactions among the free subunits in the cytoplasm:

      (1) Some interactions may only be strong enough between specific subunits at exposed interfaces under the Co-IP conditions, rather than through whole complex–complex interactions;

      (2) The detergent and/or salt conditions used in our co-IPs may disrupt labile complex interfaces or partially dissociate multimeric assemblies.

      To address this concern, we now include in the Discussion a paragraph highlighting the possibility that some of the observed interactions may not reflect binding between fully assembled, functional complexes. Notably, most detected interaction pairs are consistent with the AlphaFold predictions, which suggest that specific subunit interfaces may be responsible for mediating contact. While we cannot fully resolve whether septins engage with the whole exocyst complex versus selected subunits, our combined data support a model in which septins scaffold or spatially regulate exocyst localization at the division site, potentially through dynamic and multivalent interactions. We now explicitly state this more cautious interpretation in the revised manuscript. Future biochemical studies using native complex purifications, cross-linking mass spectrometry, in vitro reconstitution with fully assembled septin and exocyst complexes, or in vivo FRET assays will be essential to clarify whether the interactions we observe occur between intact assemblies or intermediate forms.

      Reviewer #1 (Recommendations for the Authors): 

      A major finding from the manuscript is the description of physical interaction of septin subunits with exocyst subunits. The analysis starts from Alphafold2 predictions, shown in Figures 3 and S3. However, some of the most useful metrics of Alphafold, the PAE plot and the pTM and ipTM values, are not provided. It is thus very difficult to estimate the value of the predicted structures (which are also obscured by all side chains). The power of a predicted structure is that it suggests binding interfaces, which is not explored here. At the very least, it would not be difficult to examine whether the proposed binding interfaces are free in the septin filaments and octameric exocyst complex. 

      Please also see response to reviewer #1 (Public Review).

      We thank the reviewer for these very helpful suggestions. We agree that inclusion of AlphaFold2 model confidence metrics—specifically the Predicted Aligned Error (PAE) plots, as well as pTM and ipTM values—is essential for evaluating the reliability of the predicted septin–exocyst interfaces.

      In the revised manuscript, we have now included the PAE plots (Figure 3 and Supplementary S3) and summarized the pTM scores for each predicted septin–exocyst subunit pair. We also provide a short description of these metrics in the figure legend to help guide interpretation. The older AlphaFold2 version that we used (alphafold2advanced) does not report ipTM scores, so they are not included. However, in our methodology we only counted interacting residues with pLDDT scores >50, so the resulting ipTM scores are unlikely to be very low.
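
      The residue-level confidence filter mentioned above can be illustrated with a short sketch. It is only one possible reading of the criterion: the function name, the data layout (contact pairs plus per-residue pLDDT dictionaries), and the requirement that both residues in a pair exceed the threshold are assumptions made here, since the response states only that counted residues had pLDDT scores above 50.

```python
def filter_contacts_by_plddt(contacts, plddt_a, plddt_b, threshold=50.0):
    """Keep contact pairs whose residues both exceed a pLDDT threshold.

    contacts: iterable of (residue_in_chain_a, residue_in_chain_b) pairs.
    plddt_a / plddt_b: dicts mapping residue number to pLDDT (0-100 scale).
    All names and the both-residues rule are hypothetical for this sketch.
    """
    kept = []
    for res_a, res_b in contacts:
        # Strictly greater than the threshold, matching ">50" in the text.
        if plddt_a[res_a] > threshold and plddt_b[res_b] > threshold:
            kept.append((res_a, res_b))
    return kept
```

      A residue sitting exactly at the threshold is excluded under this reading, which is the stricter of the two possible interpretations.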

      In addition, we have updated Figures 3 and S3 to show simplified ribbon diagrams of the interface regions, with side chains hidden by default and selectively displayed only at predicted interaction hotspots. This improves structural clarity and makes the interface regions easier to interpret. We mentioned in the Discussion that the preliminary studies show that the predicted interacting interfaces of Sec15 and Sec5 with septin subunits are accessible for interaction in the whole exocyst complex. The new Figure Supplement S4 and S5 and Videos 4-7 now show the interface residues of both the exocyst and septins that are involved in the interactions.

      Two further points on the interaction: 

      The 2H interaction data is not very convincing. The insets showing beta-gal assays do not look very different from the negative control (compare for instance in panel 4E the Sec15BD alone, last column, with the Sec15-BD in combination with Spn4-AD, third column: roughly same color), which suggests it is mostly driven by autoactivation of Sec15-BD. Providing growth information in addition to beta-gal may be helpful. 

      We appreciate the reviewer’s close evaluation of the yeast two-hybrid (Y2H) assay data, and we agree that the signal observed for the Spn4–Sec15 combination is indeed weak. Unfortunately, we did not perform growth assays. However, we would like to clarify that this is consistent with the nature of the interactions that we are investigating. The interaction between individual septin and exocyst subunits is weak and/or transient, as supported by the weak interactions in the Co-IP experiments. Given that the exocyst only tethers/docks vesicles on the plasma membrane for tens of seconds before vesicle fusion, the multivalent interactions between septins and the exocyst should be very dynamic and not too strong.

      As evidenced by our Co-IP experiments and the multivalent interactions predicted by Alphafold2, the interaction between Spn4 and Sec15 is detectable but weak, suggesting that this may be a low-affinity or transient interaction. Given that Y2H assays have known limitations in detecting such low-affinity interactions (especially those that depend on conformational context or are not optimal in the yeast nucleus), it is perhaps not surprising that the X-gal color development is subtle. These limitations of the Y2H system have been well documented (e.g., Braun et al., 2009; Vidal & Fields, 2014), particularly for interactions with affinities in the micromolar range or those requiring conformational specificity. Therefore, the weak signal observed is in line with expectations for a low-affinity, transient interaction such as that between Spn4 and Sec15.

      Vidal, M. and Fields, S., 2014. The yeast two-hybrid assay: still finding connections after 25 years. Nature methods, 11(12), pp.1203-1206.

      Braun, P., Tasan, M., Dreze, M., Barrios-Rodiles, M., Lemmens, I., Yu, H., Sahalie, J.M., Murray, R.R., Roncari, L., De Smet, A.S. and Venkatesan, K., 2009. An experimentally derived confidence score for binary protein-protein interactions. Nature methods, 6(1), pp.91-97.

      In the coIP experiments, I am confused by the presence of tubulin signal in some of the IPs. For instance, in Fig 4B, but not 4D, where the same Sec15-GFP is immunoprecipitated. There is also a signal in 4C but not 4A. This needs to be clarified. 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used as a loading control but not as a marker for interaction in our study, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins, and we have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37.

      Regarding the localization of Ypt3 and Syb1 in WT and spn1∆ in Figure 6C-D and Bgs1 in Figure 7A, it would help to add a contractile ring marker to be able to match the timing of cytokinesis between WT and mutants and ensure that cells of same stage are compared (and add some quantification for Ypt3). In fact, in Figure 7A, next to the cells being pointed at, there are very similar localizations of Bgs1 in WT and spn1∆ at the rim of the ingressing septum, which makes me wonder how the quantified cells were chosen. 

      For localizations and quantifications of Eng1, Ypt3, Syb1, and Bgs1 shown in Figures 6 and 7, cells with a closed septum (at or after the end of contractile-ring constriction) were quantified or highlighted. To quantify their fluorescence intensity at the division site by line scan, we used a line width of 3 pixels. For Syb1 (Figure 6D), we quantified cells at the end of ring constriction (when Rlc1-tdTomato had constricted to a dot) in the middle focal plane. The exact same lines were drawn in both the Rlc1 and Syb1 channels. The center of the line scan was defined as the pixel with the brightest Rlc1 value. All data were aligned by the center and plotted. For Bgs1 (Figure 7A), we quantified cells in which the Rlc1 signal had disappeared from the division site. The line was drawn in the Bgs1 channel in the middle focal plane. The center of the line scan was defined as the pixel with the brightest Bgs1 value. All data were aligned by the center and plotted. These details were added to the Materials and Methods.

      Finally, the manuscript would benefit from some figure reorganization/compaction. Unless work on the binding interfaces is added, Figure 3 and S3 could be removed and summarized by providing the pTM and ipTM values of the predicted interactions. Figure 5 could be combined with Figure 2, as it is essentially a repeat with additional exocyst subunits. 

      Because the binding interfaces are added, we keep the original Figures 3 and S3. The experiments in Figure 5 could not be performed before the interaction tests between septins and the exocyst. Thus, to aid the flow of the story, we keep Figures 2 and 5 separated.

      Minor comments: 

      The last sentence of the first paragraph of the results does not make much sense at this point of the paper. After the first paragraph, there is no evidence that colocalization would be required for proper function.  

      We agree that the sentence in question may have overstated the functional implications of colocalization too early in the Results section, before presenting supporting evidence. Our intention was to introduce the hypothesis that spatial proximity between septins and exocyst subunits may be relevant for their coordination during cytokinesis, which we examine in later figures. We have revised the sentence to more accurately reflect the observational nature of the data at this stage in the manuscript as below:

      "These observations suggest the spatial proximity between septins and the exocyst during certain stage of cytokinesis, raising the possibility of their functional coordination, which we would further investigate below."

      What is the indicated n in Figure 6B? Number of cells? 

      The n in Figure 6B refers to the number of electron microscopy thin sections quantified in the analysis. We have now updated the figure legend to explicitly state this for clarity.

      The causal inference made between the alteration of Exocyst localization in septin mutants and the thicker septum is possible, but by no means certain. It should be phrased more cautiously. 

      We agree that our original phrasing may have overstated the causal relationship between altered exocyst localization in septin mutants and septum thickening. Our data support a correlation between these phenotypes, but additional experiments would be required to establish direct causality.

      To reflect this, we have revised the relevant sentence in the Discussion to read:

      “The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane without septins. Due to the lack of the glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum.”

      Reviewer #2 (Recommendations for the Authors): 

      (1) In the display of the AlphaFold Model for the interactions (Figure 3 and Supplemental Figure 3) it is difficult to identify which subunits are where. Residue numbers and subunits should be labeled and only side chains important for the interactions should be present in the model. 

      We appreciate this valuable suggestion. We agree that clearer visual labeling is essential for interpreting the predicted interactions and have revised Figures 3 and S3 accordingly to improve readability and emphasize key structural features.

      Specifically, we have:

      • Labeled each subunit with its name and color-coded consistently across panels.

      •  Annotated key interface residues with residue numbers directly in the figure.

      • Removed non-interacting side chains to declutter the model and highlight only those involved in predicted interactions as well as expanded the figure legend for explanation.

      (2) In Table 1 the column label "Genetic Interaction at 25C" is confusing when synthetic growth defects are shown with a "plus". Rather this column could be labeled "Growth of double mutants at 25C" and then designate the relative growth rate observed at 25C as in Table 2. Designating a negative effect on growth with a plus is confusing. 

      Thanks for the thoughtful suggestions. We have made the suggested changes by deleting the last column so that Tables 1 and 2 are consistent.

      (3) In Figure 4, why is tubulin being co-immunoprecipitated in two of the four anti-GFP IPs? Are the IPs dirty and if so why does it vary between the four experiments? If they are dirty can the non-specific tubulin be removed by additional washes with IP buffer or conversely is it necessary to do minimal washes in order to detect the exocyst-septin interaction by coIP? A comment on this would be helpful. 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used in our study as a loading control, not as a marker for interaction, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins. We have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37. 

      In response to the second part of the reviewer’s comment, we washed the pulldown product five times, each time with 1 ml of IP buffer at 4ºC. We used this standard protocol for all the co-IP experiments to detect the interactions between different septin and exocyst subunits. We are therefore not sure whether, or how, more washes or more stringent buffer conditions would interfere with detection of the interactions.

      Reviewer #3 (Recommendations for the Authors): 

      In addition to the issues noted in the public review, there were some confusing findings and references to previous literature that merit further consideration or discussion: 

      • The current gold standard for validating Alphafold predictions involves making targeted mutants suggested by the structural predictions. The absence of any such validation weakens the conclusions significantly. 

      We agree that the targeted mutagenesis based on AlphaFold2-predicted interaction interfaces represents a powerful approach to experimentally validate the in silico models. While we did not pursue structure-guided mutagenesis in this study, our goal was to identify putative interactions between septin and exocyst subunits as a foundation for future functional work. Our current conclusions are intentionally limited to proposing putative interfaces, supported by co-immunoprecipitation and genetic interaction data.

      We recognize that direct validation of specific contact residues would significantly strengthen the model. Accordingly, we have revised the Discussion to explicitly state this limitation and to note that structure-based mutagenesis will be an important next step to test the functional relevance of predicted interactions. We have added the following statement:

      “Future studies are needed to refine the residues involved in the interactions because the predicted interacting residues from AlphaFold are too numerous. However, it is encouraging that most of the predicted interacting residues are clustered in several surface patches. Experimental validation through targeted mutagenesis is an important next step.”

      • Much of the writing appears to imply that differences in mutant phenotypes indicate differences in septin (or exocyst) subunit behaviors/functions. However, my reading of the work in budding yeast is that such differences reflect the partial functionality that can be conferred by aberrant partial septin complexes that assemble and may polymerize in mutants lacking different subunits. In this view, which is supported by data showing that essentially all septins are in stoichiometric octameric complexes in cells, the wild-type functions are all mediated by the full complex. Similarly, the separate exocyst subunit localizations based on tagged Sec3 (Finger et al) were not supported by later work from the Brennwald lab with untagged Sec3, and the idea that different exocyst subunits may function separately from the full complex has very limited support in yeast. I would suggest that the text be edited to better reflect the literature, or that different views be better justified. 

      Thanks for the suggestions. We have revised the text accordingly.

      • The comprehensive set of Alphafold2 predictions is a major strength of the paper, but it is unclear to this reader whether the multiple predicted interactions truly reflect multivalent multimode interactions or whether many (most?) predictions would not be consistent with interactions between full complexes and may not indicate physiological interactions. Better discussion of these issues is needed to interpret the findings. 

      We appreciate the reviewer’s suggestion to use structural prediction to further assess interaction plausibility. We have now used the full Saccharomyces cerevisiae exocyst complex structure (4.4 Å resolution) published by the Guo group to examine the interfaces of the septin-exocyst interactions, assuming that the S. pombe exocyst has a similar structure. We mapped the predicted contact residues onto the predicted structure. Most predicted interfaces (86% for the exocyst and 86-96% for septins) appear to be located on accessible surfaces in the assembled complexes (Figure supplement S4, S5, videos 4-7), suggesting that these interactions are sterically plausible. We have added this important caveat to the text of the revised manuscript, highlighting the interface accessibility within the assembled complexes. We appreciate the reviewer’s insight, which helped us strengthen the interpretation and clarify the limitations of the AlphaFold-based analysis.

      • Some but not all co-IP blots appear to show tubulin (negative control) coming down with the GFP pull-downs. Why is that, and what does it imply for the reliability of the co-IP protocol? 

      The presence of tubulin in some immunoprecipitates is not unexpected, particularly in experiments involving cytoskeleton-associated proteins such as septins and exocyst subunits. The occasional presence of tubulin in our co-IP samples is consistent with well-documented reports showing tubulin as a frequent non-specific co-purifying protein, particularly under native lysis conditions used to preserve large complexes (Vega and Hsu, 2003; Gavin et al., 2006; Mellacheruvu et al., 2013; Hein et al., 2015). The CRAPome database and quantitative interactomics studies highlight tubulin as one of the most common background proteins in affinity-based workflows. Importantly, tubulin was used in our study as a loading control, not as a marker for interaction, and its variable presence does not reflect a specific interaction with Sec15-GFP or other bait proteins. We have clarified this point in the revised figure legend.

      Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dümpelfeld, B. and Edelmann, A., 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084), pp.631-636.

      Mellacheruvu, D., Wright, Z., Couzens, A.L., Lambert, J.P., St-Denis, N.A., Li, T., Miteva, Y.V., Hauri, S., Sardiu, M.E., Low, T.Y. and Halim, V.A., 2013. The CRAPome: a contaminant repository for affinity purification–mass spectrometry data. Nature methods, 10(8), pp.730-736.

      Hein, M.Y., Hubner, N.C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I.A., Weisswange, I., Mansfeld, J., Buchholz, F. and Hyman, A.A., 2015. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell, 163(3), pp.712-723.

      Vega, I.E., Hsu, S.C. 2003. The septin protein Nedd5 associates with both the exocyst complex and microtubules and disruption of its GTPase activity promotes aberrant neurite sprouting in PC12 cells. Neuroreport, 14, pp.31-37.

      • Why were two different protocols used for different yeast-two-hybrid analyses? 

      The purpose of using two protocols was to test which protocol is more reliable and sensitive.

      • The different genetic interactions between septin and exocyst mutants when combined with TRAPP-II mutants merits further discussion: might the difference reflect relocation of exocyst from rim to center in septin mutants versus inactivation of exocyst in exocyst mutants? 

      We appreciate this insightful comment and agree that this distinction is likely meaningful. The reviewer correctly notes that septin mutants may not abolish exocyst function but rather cause its spatial mislocalization: from the rim to the center of the division site, whereas the exocyst mutants likely result in partial or complete loss of vesicle tethering activity at the plasma membrane.

      To address this important nuance, we have expanded the Discussion as follows:

      “The genetic interactions between mutations in the exocyst and septins when combined with TRAPP-II mutants may reflect fundamentally different consequences for compromising the exocyst function (Tables 1 and 2). In septin mutants, the exocyst complex still localizes to the division site but is mispositioned from the rim to the center of the division plane. This mislocalization allows partial retention of exocyst function, leading to very mild synthetic or additive defects when combined with compromised TRAPP-II trafficking and tethering. In contrast, in exocyst subunit mutants, the exocyst becomes partially or completely non-functional, resulting in a more severe loss of exocyst activity. These differing consequences could explain the qualitative differences in genetic interactions observed with TRAPP-II mutants (Tables 1 and 2). Thus, septins and the exocyst also work in different genetic pathways for certain functions in fission yeast cytokinesis.”

      • The vesicle accumulation in septin mutants was quite modest. Does that imply that most vesicles are still fusing in the septum? Further discussion would be beneficial to understand what the authors think this means. 

      We thank the reviewer for this important point. We agree that the modest vesicle accumulation observed in septin mutants suggests that a significant proportion of vesicles continue to successfully fuse at the division site, even in the absence of fully functional septin structures.

      We now discuss this in greater detail in the revised manuscript:

      “The relatively modest vesicle accumulation in septin mutants suggests that septins are not absolutely required for vesicle tethering or fusion per se at the division site. Instead, septins primarily function to spatially organize the targeting sites of exocyst-directed vesicles by stabilizing the localization of the exocyst at the rim of the cleavage furrow. In septin mutants, mislocalization of the exocyst reduces the spatial precision of membrane insertion but still permits vesicle tethering and fusion, albeit in a less controlled manner. Thus, septins likely play a modulatory rather than essential role in exocytic vesicle delivery during cytokinesis. This interpretation aligns with our localization and genetic interaction data, which indicates that septins act as scaffolds to optimize secretion geometry, rather than as core components of the fusion machinery.”

      • It was unclear to this reader why relocation of some exocyst complexes from the rim to the center of the septal region would lead to dramatic thickening of the septum. Further discussion would be beneficial to understand what the authors think this means. 

      The modest accumulation of vesicles and vesicle cargos at the division site is one of the reasons for the increased thickness of the division septum in septin mutants. It is more likely that the misplaced exocyst can still tether vesicles along the division plane without septins. Because of the lack of glucanase Eng1 at the rim of the division plane in septin mutants, daughter-cell separation is delayed and then cells continue to thicken the septum. We have added these points to the Discussion.

    1. eLife Assessment

      This work presents important findings suggesting that a combination of transcranial stimulation approaches applied for a short period could improve memory performance. Solid methods and evidence, in line with current standards for non-invasive stimulation and recording, are included to broadly support the main findings. The results potentially have implications for non-invasive enhancement of cognitive functions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors make a bold claim that a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) causes slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

    4. Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they find that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increases gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments. It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors make a bold claim that a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) causes slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

      Weaknesses:

      (1) The title refers to the "precuneus-hippocampus" network. A clear definition of what is meant by this terminology is lacking. More importantly, mechanistic evidence that the precuneus and the hippocampus are involved in the potential effects of stimulation remains unconvincing.

      Thank you for the observation. We believe that the evidence collected supports our statement regarding the stimulation of the precuneus and the involvement of the hippocampus. In particular, given the existing evidence on TMS methodology and precuneus non-invasive stimulation (see Koch et al., Brain, 2022, Koch et al., Alzheimer's research & therapy, 2025), the computation of the biophysical model with the E-field we produced (see Biophysical modeling and E-field calculation section in the supplementary information), together with the individual identification of the precuneus through MRI (see iTBS+γtACS neuromodulation protocol and MRI data acquisition in the main text), we can reasonably assume that the individually identified PC was stimulated.

      As we acknowledged in the Limitations section, we cannot entirely rule out the possibility that our results might also reflect stimulation of more superficial parietal regions adjacent to the precuneus. Nor do we provide direct evidence of microscopic changes in the precuneus following stimulation. However, the results we provide in terms of changes in precuneus oscillatory activity and precuneus-hippocampi connectivity support both our thesis of precuneus stimulation and that of hippocampal involvement in the stimulation effects.

      Despite this consideration, we agree that a clear definition of what is meant by the term “precuneus-hippocampus network” is lacking. Moreover, our data and previous evidence support the notion of PC stimulation, whereas this study does not produce direct evidence of stimulation of the hippocampi, but only of the effect of the neuromodulation protocol on their connection with the precuneus. We have therefore softened the claim in the title by removing the mention of the precuneus-hippocampus network. The modified title is as follows: “Dual transcranial electromagnetic stimulation of the precuneus boosts human long-term memory.”

      (2) The question of the extent to which the stimulation approach and the stimulation parameters used in these experiments causes specific and functionally relevant neural effects remains open. Invasive recordings that could address this question remain out of the scope of this non-invasive study. The authors conducted scalp EEG experiments in an attempt to address this question using non-invasive methods. However, the results shown in Fig. 3 are unclear. The results are inconsistently reported in units of microvolts squared in some panels (3A, 3B) and in units of microvolts in other panels (3C). Also, there is insufficient consideration of potential contamination by signal components reflecting eye movements, other muscle artifacts, or another volume-conducted signal reflecting aggregate activity inside the brain.

      As you correctly noted, Figure 3 presents results obtained from the TMS–EEG recordings. However, there is no inconsistency regarding the measurement units, as we are referring to two distinct indices: one in the frequency domain (oscillatory power, shown in Figures 3A and 3B and expressed in microvolts squared, μV²) and one in the time domain (the TMS-evoked potential, shown in Figure 3C and expressed in microvolts, μV).
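      The distinction between the two units can be illustrated with a minimal sketch on a synthetic signal. This is not the analysis pipeline used in the study: the sampling rate, the 40 Hz test sine, its 2 μV amplitude, and the helper name `power_spectrum` are all assumptions chosen for illustration. The time-domain trace stays in μV, while squaring the spectral amplitude yields power in μV².

```python
import numpy as np

def power_spectrum(signal_uv, fs):
    """Return (frequencies in Hz, power in uV^2) for a real signal given in uV.

    Hypothetical helper for illustration; assumes an even number of samples.
    """
    n = len(signal_uv)
    amp = np.abs(np.fft.rfft(signal_uv)) / n   # spectral amplitude, still in uV
    amp[1:-1] *= 2                             # fold negative frequencies (not DC/Nyquist)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amp ** 2                     # squaring converts uV to uV^2

fs = 1000                                      # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)                # one second of data
evoked_uv = 2.0 * np.sin(2 * np.pi * 40 * t)   # time-domain signal in uV

freqs, power_uv2 = power_spectrum(evoked_uv, fs)
peak_freq = freqs[np.argmax(power_uv2)]        # frequency of the largest component
peak_power = power_uv2.max()                   # amplitude squared, in uV^2
```

      In this toy case the peak sits at the frequency of the sine and its power equals the squared amplitude, which is why the same recording is naturally reported in μV in one panel and in μV² in another.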

      Regarding the concern about artifacts, this is an important issue on which our group has strong expertise, having published well-established, highly cited procedures for recording and cleaning TMS-EEG signals (e.g., Casula et al., Clinical Neurophysiology, 2017; Rocchi et al., Brain Stimulation, 2021). In the current study, we adopted a well-established and rigorous approach for both data acquisition and preprocessing. This ensured that the recorded TMS–EEG signals were not contaminated by physiological or electrical artifacts.

      As regards the recording procedure, all participants were instructed to fixate on a black cross to minimize eye movements. To avoid auditory-related components caused by the TMS click, we adopted an ad-hoc procedure optimized for TMS-EEG recordings (Rocchi et al., Brain Stimulation, 2021). First, participants were given earphones that continuously played an ad-hoc masking noise composed of white noise mixed with specific time-varying frequencies of the TMS click (Rocchi et al., Brain Stimulation, 2021). The masking noise volume was adjusted to ensure that participants could not detect the TMS click, or as much as tolerated (always below 90 dB). To further reduce the impact of the TMS click on the EEG signal, we placed ear defenders (SNR=30) on top of the earphones. Please see TMS–EEG data acquisition section in the main text.

      As regards the offline cleaning process, we applied Independent Component Analysis (INFOMAX-ICA) to the EEG data to identify and remove components associated with muscle activity, eye movements, blinking, and residual TMS-related artifacts, in line with the most recent guidelines on TMS–EEG preprocessing (Hernandez-Pavon et al., Brain Stimulation, 2023). Specifically, for TMS-related muscle artifacts, we strictly followed the criteria based on their scalp topography, spectral content, timing, and amplitude, which we published in a paper focused on this topic (Casula et al., Clinical Neurophysiology, 2017). We have added this detail to the TMS–EEG preprocessing and analysis section in the supplementary information (lines 119-120).

      (3) Figure 3 indicates "Precuneus oscillatory activity ...", but evidence that the activity presented reflects precuneus activity is lacking. The maps shown at the bottom of Figure 3C suggest that the EEG signals recorded with scalp EEG reflect activity generated across a wide spatial range, with a peak encompassing at least tens of centimeters. Thus, evidence that effects specifically reflect precuneus activity, as the paper's title and text throughout the manuscript suggest, is lacking.

      We believe there may have been a misunderstanding. As indicated in the figure caption, panels A and B represent oscillatory activity, whereas panel C displays the TMS-evoked potentials (TEPs). Therefore, the topographical maps mentioned (i.e., those in panel C) did not refer to oscillatory activity, but to differences in TEP amplitude. Specifically, the topographies shown in Figure 3C illustrate statistically significant differences in TEP amplitudes between post-stimulation time points (T1—immediately after stimulation, and T2—20 minutes after stimulation) and the pre-stimulation baseline (T0).

      In this figure, we focused our analysis on a cluster of electrodes overlying the individually identified precuneus, capturing EEG responses to single TMS pulses delivered to that target. This approach, widely used in previous literature (e.g., Koch et al., NeuroImage, 2018; Casula et al., Annals of Neurology, 2022; Koch et al., Brain, 2022; Maiella et al., Clinical Neurophysiology, 2024; Koch et al., Alzheimer’s Research & Therapy, 2025), supports the interpretation that the observed responses reflect precuneus-related activity. Furthermore, the spatially widespread change you mention proved to be statistically significant only when TMS-EEG was conducted over the precuneus (i.e., when the single TMS pulse was delivered over the precuneus) and not when it was performed over the left parietal cortex. We modified the discussion section in the main text to make this clearer (lines 196-199).

      “Moreover, we observed specific cortical changes in the posteromedial parietal areas, as evidenced by the whole-brain analysis conducted on TMS-EEG data when performed over the precuneus and the absence of effect when TMS-EEG was performed on the lateral posterior parietal cortex used as a control condition.”

      That said, we do not state that the observed effects specifically reflect precuneus activity; indeed, we think the effect of the stimulation is broader, as discussed in the Discussion section. Rather, in line with the literature (Koch et al., Neuroimage 2018; Koch et al., Brain, 2022; Koch et al., Alzheimer's research & therapy, 2025), we maintain that the observed effects are a consequence of stimulating the precuneus with the dual stimulation protocol.

      (4) The paper as currently presented (e.g., Figure 3) also lacks rigorous evidence of relevant oscillatory activity. Prior to filtering EEG signals in a particular frequency band, clear evidence of oscillations in the frequency band of interest should be shown (e.g., demonstration of a clear peak that emerges naturally in the frequency range of interest when spectral analysis is applied to "raw" signals). The authors claim that gamma oscillations change because of the stimulation, but a clear peak in the gamma range prior to stimulation is not apparent in the data as currently presented. Thus, the extent to which spectral measurements during stimulation reflect physiological gamma oscillations remains unclear.

      If we understand correctly, your concern relates to the lack of a clear gamma peak before neuromodulation, which may suggest uncertainty about the observed changes in gamma oscillatory activity. Is that correct?

      First, it is important to underline that the natural frequency typically observed in the precuneus falls within the beta range, not the gamma range (see Rosanova et al., Journal of Neuroscience, 2009; Casula et al., Annals of Neurology, 2022). This explains why a prominent gamma peak is not expected at baseline (T0).

      In contrast, our neuromodulatory protocol was specifically aimed at boosting gamma oscillatory activity, given its well-established role in learning and memory processes (Griffiths & Jensen, Trends in Neurosciences, 2023). Thus, to assess the effect of the neuromodulatory protocol, we compared the oscillatory activity before (T0) and after stimulation (T1 and T2), which showed a clear increase in the gamma band. This effect is visible in the raw oscillatory power plot and is most clearly represented in Figure 3B, where the gamma band emerged as the only frequency range showing significant changes across time points.

      (5) Concerns remain regarding the rigor of statistical analyses in the revised manuscript (see also point 8 below). Figure 3B shows an undefined statistical test with p<0.05. The statistical test that was used is not explained. Also, a description of how corrections for multiple comparisons were made is missing. Figures 3A and 3C are not accompanied by statistics, making the results difficult to interpret. For Figure 4C, a claim was made based on a significant p-value for one statistical test and a non-significant p-value in another test. This is a common statistical mistake (see Figure 1 and accompanying discussion in Makin and Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175).

      All statistical tests are described in the Statistical Analysis section of the main text. Specifically, to assess cortical oscillation changes in Experiment 3, we conducted repeated-measures ANOVAs with stimulation condition (iTBS+γtACS vs. iTBS+sham-tACS) and time (ΔT1 = T1–T0; ΔT2 = T2–T0) as within-subject factors, for each frequency band. To further explore the effects of stimulation at each time point, we performed paired t-tests with Bonferroni correction for multiple comparisons. A one-tailed hypothesis was adopted, based on our a priori prediction of gamma-band increase derived from previous work (Maiella et al., 2022).
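      The post hoc step described above (one-tailed paired t-tests with Bonferroni correction) can be sketched as follows. This is a minimal illustration on simulated values, not the study's data or code: the sample size, effect sizes, labels `dT1` and `dT2`, and the helper name `bonferroni_paired_tests` are assumptions, and the repeated-measures ANOVA itself is not shown. It relies on `scipy.stats.ttest_rel` with its `alternative` argument.

```python
import numpy as np
from scipy import stats

def bonferroni_paired_tests(real, sham, alpha=0.05):
    """One-tailed paired t-tests (real > sham) with Bonferroni correction.

    real, sham: dicts mapping a time-point label to per-participant values.
    Returns {label: (t statistic, corrected p-value, significant?)}.
    Hypothetical helper for illustration only.
    """
    n_tests = len(real)
    results = {}
    for label in real:
        t_stat, p = stats.ttest_rel(real[label], sham[label], alternative="greater")
        p_corrected = min(p * n_tests, 1.0)    # Bonferroni: multiply by number of tests
        results[label] = (t_stat, p_corrected, p_corrected < alpha)
    return results

# Simulated gamma-power changes (arbitrary units) for 14 hypothetical participants.
rng = np.random.default_rng(0)
n = 14
sham = {"dT1": rng.normal(0.0, 1.0, n), "dT2": rng.normal(0.0, 1.0, n)}
real = {
    "dT1": sham["dT1"] + 1.0 + rng.normal(0.0, 0.3, n),  # built-in increase
    "dT2": sham["dT2"] + rng.normal(0.0, 0.3, n),        # no built-in increase
}

results = bonferroni_paired_tests(real, sham)
for label, (t_stat, p_corr, sig) in results.items():
    print(f"{label}: t = {t_stat:.2f}, corrected p = {p_corr:.4f}, significant = {sig}")
```

      Each comparison tests the difference between conditions directly on the paired values, so significance is never inferred from one condition reaching threshold while the other does not.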

      Please note that Figures 3A and 3C are purely descriptive and are therefore not accompanied by statistical tests. Figure 3A shows the full spectral profile across frequencies and conditions, while statistical significance for these data is reported in Figure 3B. Similarly, the upper part of Figure 3C displays the TMS-evoked potential (TEP) in the precuneus, while the statistical comparison of TEP amplitudes across time points is shown in the lower part of Figure 3C.

      Regarding Figure 4C and the article you cited, are you referring to the error described as “Interpreting comparisons between two effects without directly comparing them”? If we understand correctly, this refers to the mistake of inferring an effect by observing that a significant result occurs in one condition or group, while the corresponding result in another condition or group is not significant, without directly testing the difference between them.

      In the case of Experiment 4, which investigates fMRI effects and is illustrated in Figure 4, we employed a general linear model that explicitly modeled both conditions and time points, allowing for a direct statistical comparison. Therefore, the connectivity effect reported does not fall into the category of the error you mentioned.

      Importantly, Figure 4C does not depict the effect of the neuromodulatory protocol itself. Rather, its purpose is to show that, within the real stimulation condition, there is a correlation between the observed effect and the integrity of the bilateral Middle Longitudinal Fasciculus. No conclusions or assumptions were made based on the absence of a significant correlation in the sham condition. However, since it was an exploratory analysis, we decided to soften our claims relative to the neural mechanism in the discussion section of the main text (lines 241-246).

      (6) In the second question posed in the original review, I highlighted that it was unclear how such stimulation would produce memory enhancement. The authors replied that, in the absence of mechanisms, there are many other studies that suffer from the same problem. This raises the question of placebo effects. The paper does not sufficiently address or discuss the possibility that any potential stimulation effects may reflect placebo effects.

      We agree with the reviewer on the potential role of a placebo effect in our study. For this reason, our experimental design included several stimulation conditions, among them a placebo condition (sham iTBS + sham tACS), which did not produce any effect.

      (7) The third major concern in the original review was the lack of evidence for a mechanism that is specific to the precuneus. Evidence for specific involvement of the precuneus remains lacking in the revised manuscript. The authors state: "the non-invasive stimulation protocol was applied to an individually identified precuneus for each participant". However, the meaning of this statement is unclear. Specifically, it is unclear how the authors know that they are specifically targeting the precuneus. Without directly recording from the precuneus and directly demonstrating effects, which is outside of the scope of the study, specific involvement of the precuneus seems speculative. Also, it does not seem as though a figure was included in the paper to show how the stimulation protocol specifically targets the precuneus. In their response to the original reviews, the authors state that posterior medial parietal areas are the only regions that show significant differences following the stimulation, but they did not cite a specific figure, or statistics reported in the text, that show this. In any event, posterior medial parietal areas encompass a wide area of the brain, so this would still not provide evidence for an effect specifically involving the precuneus.

      We respectfully disagree with the claim that targeting the precuneus in our study is speculative. The statement that “without directly recording from the precuneus and directly demonstrating effects, which is outside the scope of the study, specific involvement of the precuneus seems speculative” would, by that logic, implicitly call into question a large body of cognitive neuroscience research employing non-invasive techniques such as EEG and fMRI.

      Our methodological approach—combining MRI-guided stimulation, biophysical modeling, and TMS–EEG—is well established and widely used for targeting and studying the role of specific cortical regions, including the precuneus (e.g., Wang et al., Science, 2014; Koch et al., NeuroImage, 2018; Casula et al., Annals of Neurology, 2022, 2023; Koch et al., Brain, 2022; Maiella et al., Clinical Neurophysiology, 2024; Koch et al., Alzheimer’s Research & Therapy, 2025).

      In line with previously published protocols (Santarnecchi et al., Human Brain Mapping, 2018; Özdemir et al., PNAS, 2020; Mantovani et al., Journal of Psychiatric Research, 2021), we identified individual targets (i.e., the precuneus) for each participant based on structural and resting-state functional MRI data (see MRI Data Acquisition and Preprocessing section in the main text). This target was then accurately localized using MRI-guided stereotaxic neuronavigation, ensuring reproducible and anatomically precise stimulation across subjects.

      Finally, concerning the last comment about the lack of figures/statistics showing how the stimulation protocol targets the precuneus and the specificity of the observed effect, we would like to draw attention to:

      Figure 3 in the main text, where we show the results of the TMS-EEG over the posterior medial parietal areas;

      Figure S1 in the supplementary information, which shows, using the e-field simulation, how the stimulation protocol targets the brain;

      the Precuneus iTBS+γtACS increases gamma oscillatory activity section in the main text results, where we report the results of the statistical analysis of the TMS-EEG conducted over the precuneus and the left posterior parietal cortex, used as a control condition to test for the specificity of the neuromodulation protocol.

      (8) Regarding chance levels, it is unfortunate that the authors cannot quantify what chance levels are in the immediate and delayed recall conditions. This makes interpretation of the results challenging. In the immediate and delayed conditions, the authors state that the chance level is 33%. It would be useful to mark this in the figures. If I understand correctly, chance is 33% in Fig. 2A. If this is the case and if I am interpreting the figure correctly:

      Gray bars for the sham condition appear to be below chance (~20-25%). Why is this condition associated with an accuracy level that is lower than chance?

      Cyan bars and red bars do not appear to be significantly different from chance (i.e., 33%), with red slightly higher than cyan. What statistic was performed to obtain the level of significance indicated in the figure? The highest average value for the red condition appears to be around 35%. More details are needed to fully explain this figure and to support the claims associated with this figure.

      The immediate and delayed recall conditions you mention correspond to a free recall task. In this case, the notion of a fixed "chance level" is not as straightforward as it would be in recognition or forced-choice paradigms, which is why we did not quantify it at first. We explain this in detail below.

      Unlike multiple-choice tasks, where participants select the answer from a limited set of alternatives and the probability of a correct response by chance can be precisely quantified (e.g., 33% in a 3-alternative forced choice), free recall involves the spontaneous retrieval of items from memory without external cues or predefined options. As such, the response range in free recall is essentially unconstrained, encompassing the entire vocabulary of the participant.

      Because of this open-ended nature, the probability of correctly recalling a studied item purely by chance is exceedingly low and can be approximated as zero. Moreover, in our task, participants had to correctly recollect both the name and the occupation, which further enlarges the space of possible answers.

      This assumption is further supported by the fact that random guesses in free recall are unlikely to match any of the studied items, given the vast number of possible alternatives. As a result, performance above zero can be reasonably interpreted as reflecting genuine memory retrieval, rather than random guessing.
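
      The contrast between the two kinds of chance level can be made concrete with a small calculation. The Python sketch below assumes, purely for illustration, that a guesser picks uniformly and independently from a pool of candidate names and a pool of candidate occupations; the pool sizes and function names are hypothetical and do not come from the study.

```python
def forced_choice_chance(n_alternatives):
    """Chance of a correct answer when picking one of n listed alternatives."""
    return 1.0 / n_alternatives


def free_recall_chance(n_candidate_names, n_candidate_occupations):
    """Chance of guessing both name and occupation correctly.

    Assumption: uniform, independent guesses from each (hypothetical) pool.
    """
    return 1.0 / (n_candidate_names * n_candidate_occupations)


# 3-alternative forced choice, as in the recognition example above.
print(forced_choice_chance(3))

# Hypothetical pools: 1000 plausible names and 500 plausible occupations.
print(free_recall_chance(1000, 500))
```

      Under these assumptions the free-recall value is several orders of magnitude below the forced-choice value, which is why it is treated as effectively zero.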

      As regards statistics, we performed repeated-measures ANOVAs with stimulation condition as a within-subject factor (i.e., iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS) for each dependent variable (see statistical analysis section in main text).

      (9) In the revised version of the paper, the authors did not address concerns associated with the block design (please see question 4d in the original review).

      We are sorry for the misunderstanding. We did not address your concerns related to block design since it does not apply to our study. As reported in the paper you mentioned in the original review, block design involves data collection performed in response to different stimuli of a given class presented in succession. If this is the case, it does not correspond to our experimental design since both TMS-EEG and fMRI were conducted in the resting state (i.e., without the presentation of stimuli) on different days according to the different randomized stimulation conditions.  

      In sum, this study presents an admirable aspirational goal, the notion that a non-invasive stimulation protocol could modulate activity in specific brain regions to enhance memory. However, the evidence presented at the behavioral level and at the mechanistic level (e.g. the putative involvement of specific brain regions) remains unconvincing.

      We hope our response will be carefully considered, fostering a constructive exchange and leading to a reassessment of your evaluation.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Comments on Revision:

      I thank the authors for their thoughtful responses to my first review and their inclusion of more detailed methodological discussion of their rationale for the stimulation protocol conditions and timing. Regarding the apparent difference in connectivity at baseline between conditions, the explanation that this is due to intrinsic dynamics, state, or noise implies the baseline is reflecting transient changes in dynamics rather than a true or stable baseline. Based on this, it looks like iTBS solely is significantly greater than the baseline before the iTBS and γtACS condition but maybe not that much lower than post-stimulation period for iTBS and γtACS. A longer baseline period should be used to ensure transient states are not driving baseline levels such that these endogenous fluctuations would average out. This also raises questions about whether the effect of iTBS and γtACS or iTBS alone are dependent on the intrinsic state at the time when stimulation begins. Their additional clarification of memory scoring is helpful but also reveals that the effect of dual iTBS+γtACS specifically on the association between faces and names is just significant. This modest increase in associative memory should be taken into consideration when interpreting these findings.

      We thank the reviewer for the feedback. We fully agree that considering baseline dynamics is critical when assessing the neurophysiological and connectivity effects of stimulation protocols.

      In Experiments 3 and 4, baseline measurements were specifically included in our design to account for the possibility that intrinsic dynamics, state, or noise could influence the observed effects of neuromodulation. Indeed, if we had compared only post-stimulation connectivity between the real and sham conditions, the effects might have appeared larger. The inclusion of baseline measurements allows us to contextualize and better isolate the neuromodulatory impact by controlling for such endogenous fluctuations. Importantly, the fMRI connectivity measurements, including the baseline, are derived from 10-minute BOLD signal acquisitions, which help mitigate the influence of transient fluctuations and provide a reasonably stable estimate of intrinsic connectivity.

      Moreover, regarding the possibility that stimulation effects may depend on the intrinsic state at stimulation onset, we hypothesize that gamma-frequency entrainment induced by tACS could reduce the variability of intrinsic dynamics, promoting a more stable neural state that is favorable for the induction of long-term plasticity.

      As regards the memory scoring, we would like to clarify that the significant improvement observed in the dual iTBS+γtACS condition does not pertain solely to the face–name association. Rather, it concerns the more demanding task of recalling the association between face, name, and occupation. While we agree that the observed effect could be considered modest, it is worth noting that it follows from only 3 minutes of stimulation.

      Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they find that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increases gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments. It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. In the revised manuscript, the authors provide post-hoc sensitivity analyses that help contextualize the strength of the findings.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but readers should keep in mind that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      We thank the reviewer for the feedback.

      Reviewer #1 (Recommendations for the authors):

      I suggest:

      (1) Removing all mechanistic claims about the precuneus and hippocampus.

      We have softened our claims about the precuneus-hippocampus network.

      (2) Repeating and focusing on the behavioral experiments with a much larger number of images and stronger statistical power to try to demonstrate a compelling behavioral correlate of the proposed stimulation protocol.

      We clarified the misunderstanding relative to the chance level of the behavioral experiments raised by the reviewer.

      Reviewer #2 (Recommendations for the authors):

      Use longer baseline to establish stable gamma level for comparisons in Figure 3

      If we understand correctly, you propose to increase the baseline to establish the gamma oscillatory activity as expressed in Figure 3 (showing the results of experiment 3). Is that right? In the figure, you see a baseline of -100 to 0 ms, which we use purely for graphical reasons, since no activity is usually observable before the TMS pulse. However, to establish the level of gamma, we used a longer baseline correction window ranging from -700 ms to -300 ms (i.e., 400 ms). We added this important information in the cortical oscillation section of the supplementary information (lines 134-135).
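
      A minimal Python sketch of baseline correction over a pre-stimulus window follows. It assumes a subtractive correction, in which the mean of the -700 ms to -300 ms window is subtracted from every sample; the response does not state whether the correction was subtractive or relative, and the sample values and function name are hypothetical.

```python
def baseline_correct(times_ms, values, start_ms=-700, end_ms=-300):
    """Subtract the mean of the baseline window from every sample.

    Assumption: subtractive correction; other schemes (e.g. relative
    change) would divide by the baseline mean instead.
    """
    baseline = [v for t, v in zip(times_ms, values) if start_ms <= t <= end_ms]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")
    offset = sum(baseline) / len(baseline)
    return [v - offset for v in values]


# Invented gamma-power samples around a TMS pulse delivered at 0 ms.
times_ms = [-800, -700, -500, -300, 0, 100]
power = [5, 2, 2, 2, 4, 6]

corrected = baseline_correct(times_ms, power)
print(corrected)
```

      Only the samples at -700, -500 and -300 ms contribute to the baseline mean here; the sample at -800 ms and the post-pulse samples are corrected but do not influence the offset.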

      Reviewer #3 (Recommendations for the authors):

      I think that the authors did a great job responding to the concerns raised by the reviewers. All of my own comments have been satisfactorily addressed. I will update my public review to be more concise, so that it only includes the overall assessment of the manuscript, including the strengths and weaknesses, but without the requests for clarification. Strengths and weaknesses remain largely the same, as the authors did not conduct additional experiments.

      Thank you.

    1. eLife Assessment

      This study presents a valuable finding that KDM5 inhibitors may enable a wide therapeutic window as compared to STING agonists or Type I Interferons. The evidence supporting the claims of the authors is convincing. The work will be of broad interest to scientists working in the field of breast cancer research.

    2. Reviewer #1 (Public review):

      In this manuscript, Lau et al reported that KDM5 inhibition in luminal breast cancer cells results in R-loop-mediated DNA damage, reduced cell fitness and an increase in ISG and AP signatures as well as cell surface Major Histocompatibility Complex (MHC) class I, mediated by RNA:DNA hybrid activation of the CGAS/STING pathway.

      Their studies have shown that KDM5 inhibition/loss mediates a viral mimicry and DNA damage response through the generation of R-loops in genomic repeats. This is a different mechanism from the more well studied double-stranded RNA-induced "viral mimicry" response.

      More importantly, they have shown that KDM5 inhibition does not result in DNA damage or activation of the CGAS/STING pathway in normal breast epithelial cells, suggesting that KDM5 inhibitors may enable a wide therapeutic window in this setting, as compared to STING agonists or Type I Interferons.

      Their findings provide new insights into the interplay between epigenetic regulation of genomic repeats, R-loop formation, innate immunity, and cell fitness in the context of cancer evolution and therapeutic vulnerability.

      Comments on revised version:

      The authors have satisfactorily addressed my comments and revised the manuscript accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated how the type-I interferon response (ISG) and antigen presentation (AP) pathways are repressed in luminal breast cancer cells and how this repression can be overcome. They found that a STING agonist can reactivate these pathways in breast cancer cells, but it also does so in normal cells, suggesting that this is not a good way to create a therapeutic window. Depletion of ADAR and inhibition of KDM5 also activate ISG and AP genes. The activation of ISG and AP genes is dependent on cGAS/STING and the JAK kinase. Interestingly, although both ADAR depletion and KDM5 inhibition activate ISG and AP genes, their effects on cell fitness are different. Furthermore, KDM5 inhibitor selectively activates ISG and AP genes in tumor cells but not normal cells, arguing that it may create a larger therapeutic window than the STING agonist. These results also suggest that KDM5 inhibition may activate ISG and AP genes in a way different from ADAR loss, and this process may affect tumor cell fitness independently of the activation of ISG and AP genes.

      The authors further showed that KDM5 inhibition increases R-loops and DNA damage in tumor cells, and XPF, a nuclease that cuts R-loops, is required for the activation of ISG and AP genes. Using H3K4me3 CUT&RUN, they found that KDM5 inhibition results in increased H3K4me3 not only at genes, but also at repetitive elements including SINE, LINE, LTR, telomeres, and centromeres. Using S9.6 CUT&TAG, they confirmed that R-loops are increased at SINE, LINE, and LTR repeats with increased H3K4me3. Together, the results of this study suggest that KDM5 inhibition leads to H3K4me3 and R-loop accumulation in repetitive elements, which induces DNA damage and cGAS/STING activation and subsequently activates AP genes. This provides an exciting approach to stimulate anti-tumor immunity against breast tumors.

      KDM5 inhibition activates interferon and antigen presentation genes through R-loops.

      Strengths:

      A new approach to make breast tumors "hot" for anti-tumor immunity.

      Weaknesses:

      Future in vivo studies are needed to show the effects of KDM5 inhibitors on the immunotherapy responses of breast tumors.

      Comments on revised version:

      The authors have adequately addressed my comments.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their careful and positive assessment of our manuscript. Our findings are perhaps best summarized in the model below, showing that KDM5 inhibition/loss mediates a viral mimicry and DNA damage response through the generation of R-loops in genomic repeats. This is a different mechanism from the more well studied double-stranded RNA-induced “viral mimicry” response. Our studies also suggest that KDM5 inhibition may have a larger therapeutic window than STING agonists, since KDM5 inhibition seemingly does not induce “viral mimicry” in normal breast epithelial cells.

      Author response image 1.

      Model of viral mimicry activation. De-repression of repetitive elements may trigger dsRNA formation, which activates the RIG-1/MDA5 pathway, as well as PKR. Alternatively, derepression of these elements may induce transcription replication conflicts (TRCs), resulting in R-loop formation. R-loops can lead to DNA damage, and/or activate the cGAS/STING pathway. Both the MAVS pathway and the cGAS/STING pathway converge to activate type I interferon (IFN) responses, resulting in decreased cell fitness and/or increased immunogenicity.

      We do agree with the assessment that the study would be strengthened by in vivo studies. However, there are 4 different isoforms of KDM5 (3 in females), and existing KDM5-specific inhibitors do not have adequate PK/PD properties for in vivo studies. We would also like to note that most mouse studies have not been proven to accurately predict immunotherapy responses in patients. Future studies in ex vivo tumor models would strengthen the clinical relevance of these studies. In the interim, we have added some normal macrophage studies in Figure S5 and an example of studies in normal T-cells below. Such studies will also be important to ensure that future KDM5 inhibitors do not have adverse effects on the immune system. Here, we observe that KDM5 inhibition appears to have a neutral or slightly negative effect on T-cell viability (Author response image 2a). However, KDM5 inhibition also results in increased CD107a expression in T-cells, indicative of a more cytotoxic phenotype (Author response image 2b). These studies suggest that KDM5 inhibitors do not have significant adverse effects on T-cells or macrophages (Figure S5) in the normal immune environment.

      Author response image 2.

      KDM5 inhibition does not have significant adverse effects on T-cells. a) Fold change in proliferation of T-cells from 2 different human donors (left and right panels on graph) activated with 0.25 μg/ml CD3 and treated with the indicated concentrations of C48 or a positive control (CBLB) compared to vehicle controls. b) FACS plots and histograms of CD107a surface expression (x-axis) versus forward scatter (FSC, y-axis) of T-cells from 2 different human donors activated with 0.25 μg/ml or 0.5 μg/ml CD3 and treated with the indicated concentrations of C48.

      Specific comments and answers to Reviewer #1:

      We have added some additional analysis of data from other breast cancer cell lines to strengthen our points (Figure S2f, Figure S3e, Figure S4g-h, k). We have also uploaded all the data to GEO with the following accession numbers:

      GSE296387: H3K4me3 CUT-and-Tag data

      GSE296584: S9.6 CUT-and-Tag data

      GSE296974: RNA-sequencing data

      Responses to Reviewer #1 (Recommendations for the authors):

      (1) We have not conducted genomic studies comparing KDM5 expression to retroelement activation status in the tumor data sets but recognize that this is important for future studies. Again, there are several KDM5 isoforms and looking at repeat expression in these larger data sets is complex. We have added some data correlating KDM5 expression with ISG signatures in Figure S3j-l as well as in the graph below (Author response image 3). The correlation with ISG and AP signatures is modest, but strongest for KDM5B and C in breast cancer data sets, consistent with our disruption data for these 2 isoforms. As mentioned above, we do agree that future studies of KDM5s along with a broader analysis of other epigenetic modifying enzymes over repeats in various cancer types will shed light on the role of histone modifying enzymes in suppressing “viral mimicry” in tumors.

      Author response image 3.

      Correlation between gene expression and IFN gene set GSVA scores in breast cancer cell lines. a) Pearson correlation score between gene expression and IFN signature (ISG) gene set variation analysis (GSVA) scores in breast cancer cell lines as reported in DepMap. Higher ranks indicate an inverse correlation between expression of the individual gene and the expression of the ISG gene set. Correlation ranks for KDM5A, B and C are highlighted. b) as in a), but comparing gene expression to antigen presentation (AP) GSVA scores.
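
      The ranking described in this legend can be sketched in a few lines of Python. The example below is a hedged illustration, not the authors' pipeline: the gene labels, expression values, and signature scores are invented, the function names are hypothetical, and placing the most negative correlation first is one assumed reading of how the ranks are ordered.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_by_inverse_correlation(expression_by_gene, signature_scores):
    """Order genes from most negative to most positive correlation."""
    r_by_gene = {
        gene: pearson_r(values, signature_scores)
        for gene, values in expression_by_gene.items()
    }
    return sorted(r_by_gene, key=r_by_gene.get)


# Invented data: expression of three placeholder genes across five cell lines,
# and an invented per-cell-line signature score.
expression_by_gene = {
    "GENE_A": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_C": [2.0, 3.0, 2.0, 3.0, 2.0],
}
signature_scores = [0.1, 0.2, 0.3, 0.4, 0.5]

print(rank_by_inverse_correlation(expression_by_gene, signature_scores))
```

      In this toy input the placeholder gene whose expression falls as the signature score rises is listed first.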

      (2) We apologize for the mislabeling in Figure 2B; this has been corrected in the revised version.

      (3) We agree that blocking the cGAS/STING pathway only partially rescues the ISRE-GFP and HLA-A, B, C phenotype in HCC1428 cells. We have added data (Figure S2f) showing that this rescue is stronger in MCF7 cells. It is possible that the MDA5/MAVS pathway may also contribute to activation of the Type I interferon response. However, we have data that MAVS plays a minor (if any) role in this context, as MAVS KO minimally decreases C48-induced ISRE-GFP activity and HLA-A, B, C surface expression in HCC1428 cells (added Figure S2g).

      Furthermore, there is no significant increase in dsRNA observed (using J2 antibody as a readout in immunofluorescence experiments) with C48 treatment as compared to 5’-azacytidine treatment or ADAR K/O (data not included). However, we have not performed MAVS/PKR K/O experiments to completely rule out the involvement of the dsRNA sensing pathways.

      (4) These experiments were performed in the Operetta imaging system, rather than by confocal imaging, and therefore we do not have such images. Quantification of RNaseH1-GFP in the whole cell is reported in the figure, as RNaseH1-GFP signal is increased in both the nucleus and the cytoplasm with C48 treatment. This is not unexpected, as our data suggest that R-loop formation occurs in repetitive regions of the genome that are de-repressed by KDM5 inhibition in the nucleus, and the RNA/DNA hybrids generated from R-loops may activate the cGAS/STING pathway in the cytoplasm.

      (5) Disruption of XPF and XPG is relatively toxic in itself. Complete knockouts in breast cancer cells were not viable, so we partially knocked down XPF using siRNA instead. We do agree that these kinds of rescue studies need to be expanded upon in future studies, but they served as further proof of the conclusions presented here.

      (6) We have provided all the data in GEO, so alternative representations can be made.

      (7) Unfortunately, CUT-and-Tag experiments were not performed in cells expressing siXPF and therefore we cannot provide this data. However, XPF has been previously shown to be responsible for excising R-loops from the genome, rendering them detectable by cGAS/STING in the cytoplasm (Crossley et al, 2022, referenced in the current MS). Therefore, while we demonstrate that XPF knockdown attenuates type I IFN pathway activation upon KDM5 inhibition, it may not necessarily reduce R-loop formation in retroelements; it may just prevent their excision and downstream cGAS/STING activation. We do agree that CUT-and-Tag experiments in cells treated with siXPF versus siControl will have to be performed in the future to test this hypothesis.

      Responses to Reviewer #2 (Recommendations for the authors):

      (1) We have modified the text as well as the figure legend to state that this is a simplistic representation of the pathway in normal cells. As stated in the introduction, these pathways can be modified in tumors. The data presented suggest that the dsRNA pathway can be activated in all breast cancer cell lines tested, whereas more variation is observed in the activation of the STING pathway.  

      (2) The ADAR guides target ADAR p110 and p150 but not ADAR2. This has been clarified in the text.

      (3) The guides have been renamed in the figure as the reviewer suggests.  

      (4) It has been shown by others that KDM5 can occupy the STING promoter (https://pubmed.ncbi.nlm.nih.gov/30080846/), which supports the reviewer’s suggestion that STING upregulation in HMECs may be due to increased H3K4me3 at the STING gene. However, we argue that STING upregulation is not sufficient to activate “viral mimicry” due to the absence of “tumor-specific R-loops” (due to an increase in TRC in tumor cells) in normal cells. It is interesting to note that the S9.6 signal in subtelomeric regions is increased in HMECs similar to what is observed in tumor cells. However, the S9.6 signal over other repeats is not (Author response image 4), suggesting that C48-induced increases over non-telomeric repeats are tumor specific. This suggests that the tumor-specific increases in R-loop formation, which lead to “viral mimicry” activation, are not driven by those formed in subtelomeric regions. Future studies will have to expand on these findings.

      Author response image 4.

      Percent of S9.6 reads that align to repetitive genome in HMEC cells. (a) % of total aligned S9.6 reads that map to subtelomeric region in HMEC cells treated with DMSO or 2.5 μM C48. (b) % of total aligned S9.6 reads that map to repetitive elements in general in HMEC cells treated as in a).

      (5) Clarity on R-loop quantification has been added to the figure legend as well as in the Materials and Methods section. Mean fluorescence intensity in the whole cell (this includes both nuclear and cytoplasmic signals) was quantified together and normalized to the number of DAPI-stained nuclei per well. As mentioned above, all quantification was performed on the Operetta imaging system.

      (6) We have added some data showing that increases in H3K4me3 are observed in and around ISGs upon KDM5 inhibition (Figure S4f). However, without time course experiments it is difficult to assess whether these are direct effects of the KDM5 inhibitor or indirect effects from activation of Type I IFN (similarly to what has previously been reported with 5’-azacytidine induction of “viral mimicry”, https://pubmed.ncbi.nlm.nih.gov/26317465/).

      (7) We have previously included data showing that S9.6 reads in repeats that do not display C48-mediated increases in H3K4me3 also do not increase with C48 treatment (this is now Figure S4o). In addition, we have added some data showing that repeats with increased H3K4me3 and repeats with increased transcription upon C48 treatment also have increased S9.6 reads. Repeats that display both increases in H3K4me3 and mRNA expression have even greater increases in S9.6 signal compared to repeats that have increases in either one (Figure S4m-n). Taken together, these data suggest that KDM5 inhibition increases H3K4me3 in repeats, thereby allowing for their transcription, which can increase the probability of transcription-replication conflicts (TRCs) and R-loop formation at such loci.

      (8) As mentioned earlier in this response, while we observe increased S9.6 reads in subtelomeric regions of HCC1428 cells upon KDM5 inhibition, we also observe this in normal HMEC cells. Since KDM5 inhibition does not induce viral mimicry in HMEC cells, this suggests that R-loops formed in subtelomeric regions do not dictate the response observed with C48 treatment in breast cancer cells.

      We hope that these answers to the reviewers’ comments, as well as the additional data provided, strengthen our findings.

    1. eLife Assessment

      The study showcases a significant and important enhancement of the MAGIC transgenesis method by extending it genome-wide to all chromosomes. The authors convincingly demonstrate that MAGIC mosaic clones can be generated for genes on all chromosomes, including the 4th. With this toolkit extension, the method is now most likely set to strongly rival the classical FRT/Flp recombination system for gene manipulation in flies.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shen et al. have improved upon the mitotic clone analysis tool MAGIC that their lab previously developed. MAGIC uses CRISPR/Cas9-mediated double-stranded breaks to induce mitotic recombination. The authors have replaced the sgRNA scaffold with a more effective scaffold to increase clone frequency. They also introduced modifications to positive and negative clonal markers to improve signal-to-noise and mark the cytoplasm of the cells instead of the nuclei. The changes result in increased clonal frequencies and marker brightness. The authors also generated the MAGIC transgenics to target all chromosome arms and tested the clone induction efficacy.

      Strengths:

      MAGIC is a mitotic clone generation tool that works without prior recombination to special chromosomes (e.g., FRT). It can also generate mutant clones for genes for which the existing FRT lines could not be used (e.g., the genes that are between the FRT transgene and the centromere).

      This manuscript does a thorough job in describing the method and provides compelling data that support improvement over the existing method.

      Weaknesses:

      It would be beneficial to have a greater variety of clonal markers for nMAGIC. Currently, the only marker is BFP, which may clash with other genetic tools (e.g., some FRET probes) depending on the application. It would be nice to have far-red clonal markers.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors present the latest improvement of their previously published methods, pMAGIC and nMAGIC, which can be used to engineer mosaic gene expression in wild-type animals and in a tissue-specific manner. They address the main limitation of MAGIC, the lack of gRNA-marker transgenes, which has hampered the broader adoption of MAGIC in the fly community. To do so, they create an entire toolkit of gRNA markers for every Drosophila chromosome and test them across a range of different tissues and in the context of making Drosophila species hybrid mosaic animals. The study provides a significant and broadly useful improvement compared to earlier versions, as it broadens the use-cases for transgenic manipulation with MAGIC to virtually any subfield of Drosophila cell biology.

      Strengths:

      Major improvements to MAGIC were made in terms of clone induction efficiency and usability across the Drosophila model system, including wild-type genotypes and the use in non-melanogaster species.

      Notably, mosaic mutants can now be created for genes residing on the 4th chromosome, which is exciting and possibly long-awaited by 4th chromosome gene enthusiasts.

      Selection of the standard set of gRNA markers was done thoughtfully, using non-repetitive conserved and unique sequences.

      The authors demonstrate that MAGIC can be used easily in the context of interspecific hybrids. I believe this is a great advancement for the Drosophila community, especially for evolutionary biologists, because this may allow for easy access to mechanistic, tissue-specific insight into the process of a range of hybrid incompatibilities, an important speciation process that is normally difficult to study at the level of molecular and cell biology.

      In the same way, because it is not limited to usage in any particular genetic background, genome-wide MAGIC can be potentially used in wild-type genotypes relatively easily. This is exciting, especially because natural genetic diversity is rarely investigated more mechanistically and at the scale/resolution of cells or specific tissues. Now, one can ask how a particular naturally occurring allele influences cell physiology compared to another (control) while keeping the global physiological context of the particular genetic background largely intact.

      Weaknesses:

      It is not entirely clear how functionally non-critical regions were evaluated, besides that they are selected based on conservation of sequence between species. It may be useful to directly test the difference in viability or other functionally relevant phenotype for flies carrying different markers. Similarly, the frequency of off-targets could be investigated or documented in a bit more detail, especially if one of the major use-cases is meant for naturally derived, diverse genetic backgrounds. It is, at the moment, unclear how consistently the clones are induced for each new gRNA marker across different WT genetic backgrounds, for example, a set of DGRP genotypes, which could be highly useful information for future users.

    4. Reviewer #3 (Public review):

      Summary:

      In the manuscript by Shen, Yeung, and colleagues, the authors generate an improved and expanded Mosaic analysis by gRNA-induced crossing-over (MAGIC) toolkit for use in making mosaic clones in Drosophila. This is a clever method by which mitotic clones can be induced in dividing cells by using CRISPR/Cas9 to generate double-strand breaks at specific locations that induce crossing over at those locations. This is conceptually similar to previous mosaic methods in flies that utilized FRT sites that had been inserted near centromeres along with heat-shock inducible FLPase. The advantage of the MAGIC system is that it can be used along with chromosomes lacking FRT sites already introduced, such as those found in many deficiency collections or in EMS mutant lines. It may also be simpler to implement than FRT-based mosaic systems. There are two flavors of the MAGIC system: nMAGIC and pMAGIC. In nMAGIC, the main constituents are a transgene insertion that contains gRNAs that target DNA near the centromere, along with a fluorescent marker. In pMAGIC, the main constituents are a transgenic insertion that contains gRNAs that target DNA near the centromere, along with ubiquitous expression of GAL80. As such, nMAGIC can be used to generate clones that are not labelled, whereas pMAGIC (along with a GAL4 line and UAS-marker) can be used much like MARCM to positively label a clone of cells. This manuscript introduces MAGIC transgenic reagents that allow all 4 chromosomes to be targeted. They demonstrate its use in a variety of tissues, including with mutants not compatible with current FLP/FRT methods, and also show it works well in tissues that prove challenging for FLP/FRT mosaic analyses (such as motor neurons). They further demonstrate that it can be used to generate mosaic clones in non-melanogaster hybrid tissues. 
      Overall, this work represents a valuable improvement to the MAGIC method that should promote even more widespread adoption of this powerful genetic technique.

      Strengths:

      (1) Improves the design of the gRNA-marker by updating the gRNA backbone and also the markers used. GAL80 now includes a DE region that reduces the perdurance of the protein and thus better labeling of pMAGIC clones. The data presented to demonstrate these improvements is rigorous and of high quality.

      (2) Introduces a toolkit that now covers all chromosome arms in Drosophila. In addition, the efficiency of 3 different target sites is characterized for each chromosome arm (e.g., 3 different gRNA-Marker combinations), which demonstrates differences in efficiency. This could be useful to titrate how many clones an experimenter might want (e.g., lower efficiency combinations might prove advantageous).

      (3) The manuscript is well written and easy to follow. The authors achieved their aims of creating and demonstrating MAGIC reagents suitable for mosaic analysis of any Drosophila chromosome arm.

      (4) The MAGIC method is a valuable addition to the Drosophila genetics toolkit, and the new reagents described in this manuscript should allow it to become more widely adopted.

      Weaknesses:

      (1) The MAGIC method might not be well known to most readers, and the manuscript could have benefited from schematics introducing the technique.

      (2) Traditional mosaic analyses using the FLP/FRT system have strongly utilized heat-shock FLPase for inducible temporal control over mitotic clones, as well as a way to titrate how many clones are induced (e.g., shorter heat shocks will induce fewer clones). This has proven highly valuable, especially for developmental studies. A heat-shock Cas9 is available, and it would have been beneficial to determine the efficiency of inducing MAGIC clones using this Cas9 source.

    5. Author response:

      Reviewing Editor Comments:

      The following are some consolidated review remarks after discussions amongst all three reviewers:

      The reviewers feel the evidence level could be raised from 'convincing' to 'compelling' if the following key (and partially shared) suggestions by the reviewers are followed adequately:

      (1) Expand labeling options for nMAGIC, which is currently just a BFP marker. This would increase the utility of the method. A far-red marker would be very helpful. Could the authors just do this for one chromosome arm and make the reagent available for others to generate other chromosome arms?

      This is a great suggestion. We will make an nMAGIC vector containing a far-red fluorescent marker and generate a 40D2 version of this nMAGIC gRNA-marker to demonstrate its utility. This vector will be available for others to make additional nMAGIC gRNA-markers.

      (2) Verify that destabilized GAL80 is potent enough to suppress GAL4. Repeat Figure 1C-E with tub-GAL80-DE-SV40.

      We will use a tub-GAL80-DE-SV40 gRNA-marker to test suppression of pxn-Gal4.

      (3) Concern about the health of the induced mitotic clones. This is an important consideration, but the reviewers were not sure what the necessary experiments would be. To gauge twin-spot clone sizes? Please address.

      We will assess the health of induced mitotic clones in wing imaginal discs. We will do this by generating twin spots with an nMAGIC gRNA-marker in wing discs and comparing the sizes of the two cell populations (BFP<sup>+/+</sup> and BFP<sup>-/-</sup>) in twin spots.

      (4) Include a schematic of the MAGIC method as Figure 1 or add it to Figure 1. Many may not be familiar with the method, so to promote its adoption, the authors should clearly introduce the MAGIC method in this paper (and not rely on readers to go to previous publications). For this paper to become a MAGIC reference paper, it should be self-contained.

      We will add a diagram of the MAGIC method in the revised manuscript.

      (5) Determine the utility of using a hs-Cas9 line for temporal induction of MAGIC clones. This is a traditional method for mitotic clone induction (with hsFLP/FRTs), and its use with the MAGIC system (especially pMAGIC) could also make it more attractive, especially to label small populations of neurons born at known times. To this point, the authors could generate pMAGIC clones using hs-Cas9 for commonly used adult target neurons, such as projection neurons, central complex neurons, or mushroom body neurons. The method to label small numbers of these adult neurons is well worked out with known GAL4 lines, and demonstrating that pMAGIC could have similar results would capture the attention of many not familiar with the pMAGIC method.

      We thank the reviewers for this suggestion. We will test hs-Cas9 in inducing pMAGIC clones in one of the neuronal populations in the adult brain, as suggested by the reviewers.

      In addition, we will address all other minor concerns of the reviewers.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Plasmodium vivax can persist in the liver of infected individuals in the form of dormant hypnozoites, which cause malaria relapses and are resistant to most current antimalarial drugs. This highlights the need to develop new drugs active against hypnozoites that could be used for radical cure. Here, the authors capitalize on an in vitro culture system based on primary human hepatocytes infected with P. vivax sporozoites to screen libraries of repurposed molecules and compounds acting on epigenetic pathways. They identified a number of hits, including hydrazinophthalazine analogs. They propose that some of these compounds may act on epigenetic pathways potentially involved in parasite quiescence. To provide some support to this hypothesis, they document DNA methylation of parasite DNA based on 5-methylcytosine immunostaining, mass spectrometry, and bisulfite sequencing.

      Strengths:

      -The drug screen itself represents a huge amount of work and, given the complexity of the experimental model, is a tour de force.

      -The screening was performed in two different laboratories, with a third laboratory being involved in the confirmation of some of the hits, providing strong support that the results were reproducible.

      -The screening of repurposing libraries is highly relevant to accelerate the development of new radical cure strategies.

      We thank the reviewer for pointing out the strengths of our report.

      Weaknesses:

      The manuscript is composed of two main parts, the drug screening itself and the description of DNA methylation in Plasmodium pre-erythrocytic stages. Unfortunately, these two parts are loosely connected. First, there is no evidence that the identified hits kill hypnozoites via epigenetic mechanisms. The hit compounds almost all act on schizonts in addition to hypnozoites, therefore it is unlikely that they target quiescence-specific pathways. At least one compound, colforsin, seems to selectively act on hypnozoites, but this observation still requires confirmation. Second, while the description of DNA methylation is per se interesting, its role in quiescence is not directly addressed here. Again, this is clearly not a specific feature of hypnozoites as it is also observed in P. vivax and P. cynomolgi hepatic schizonts and in P. falciparum blood stages. Therefore, the link between DNA methylation and hypnozoite formation is unclear. In addition, DNA methylation in sporozoites may not reflect epigenetic regulation occurring in the subsequent liver stages.

      We agree our report lacks direct evidence that hydrazinophthalazines are interacting with parasite epigenetic mechanisms. We spent significant resources attempting several novel approaches to establish a direct connection, but technological advances are needed to enable such studies, which we mention in the introduction and discussion. We disagree that schizonticidal activity automatically excludes the possibility a hypnozonticidal hit is acting on quiescence-specific pathways because both hypnozoites and schizonts are under epigenetic control and these pathways are likely performing different functions in different stages. Also important is the use of the word ‘specific’ as this term could be used to indicate parasite versus host (a drug that clears a parasite infection with a safety margin), parasite-directed effect versus host-directed effect (a drug acting via an agonistic or antagonistic effect on parasite or host pathway(s), but leading to parasite death in either case), hypnozoite versus schizont, or P. vivax versus other Plasmodium species. We were careful to indicate the usage of ‘specific’ throughout the text. Given the almost-nonexistent hit rate when screening diverse small molecule libraries against P. vivax hypnozoites, and the remarkable increase in hits when screening epigenetic inhibitors as described in this report, our data suggest epigenetic pathways are important to the regulation of hypnozoite dormancy in addition to regulation of other parasite stages, but those effects are outside the scope of this report.

      -The mode of action of the hit compounds remains unknown. In particular, it is not clear whether the drugs act on the parasite or on the host cell. Merely counting host cell nuclei to evaluate the toxicity of the compounds is probably acceptable for the screen but may not be sufficient to rule out an effect on the host cell. A more thorough characterization of the toxicity of the selected hit compounds is required.

      We agree, and mention in the results and discussion, that the effect could be mediated through host pathways. This is not unlike the 8-aminoquinolines, which are activated by host cytochromes and kill via ROS, which is a nonspecific mechanism (that is, the compound is not directly interacting with a parasite target) leading to a parasite-specific effect (the parasite cannot tolerate the ROS produced, but the host can). During screening, hits with direct effects on the target organism are generally more desirable, so hits are counterscreened for general cytotoxicity. In this report, we show an effect on the parasite in direct comparison to the effect on host primary hepatocytes in the P. vivax assay itself, and follow up on hits with general counterscreens in two mammalian cell lines using CellTiter-Glo, which does not rely on nuclei counts. Some compounds did show general cytotoxic effects, but with selectivity (more potency) against P. vivax liver stages, while other hits like the hydrazinophthalazines did not show an effect against primary hepatocytes and show only weak toxicity against mammalian cells at the highest dose tested. Further studies are needed to determine if the effect is indeed host- or parasite-directed and, if hydrazinophthalazines are to be developed into marketed antimalarials, extensive safety testing would be part of the development process.

      -There is no convincing explanation for the differences observed between P. vivax and P. cynomolgi. The authors question the relevance of the simian model but the discrepancy could also be due to the P. vivax in vitro platform they used.

      Fully characterizing the chemo-sensitivity of P. vivax and P. cynomolgi liver stages is outside the scope of this report. Rather, we report tool compounds which could be used in future studies to further characterize these sister species. We also make the point that P. cynomolgi is the gold standard for in vivo antirelapse activity, but it is still a model species, not a target species, and so few experimental hypnozonticidal compounds have been reported that the predictive value of P. cynomolgi is not fully understood. We found that several of our hits were species-specific using our in vitro platforms, thus future studies are needed to ensure this predictive value.

      -Many experiments were performed only once, not only during the screen (where most compounds were apparently tested in a single well) but also in other experiments. The quality of the data would be increased with more replication.

      Due to their size, compound library screens are typically performed once, with confirmation in dose-response assays, which were repeated several times. The rhesus PK study was performed once on three animals, which is typical. All other studies were performed at least twice and most were performed three times or more. We provide a data table showing readers the source material for all replication as well as other source data tables showing the raw data for dose-response and other assays.

      -While the extended assay (12 days versus 8 days) represents an improvement of the screen, the relevance of adding inhibitors of core cytochrome activity is less clear, as under these conditions the culture system deviates from physiological conditions.

      We agree that cytochrome inhibitors render the platform less physiologically relevant, but the goal of screening is to detect hits which could be improved upon using medicinal chemistry, including metabolic stability. Metabolic stability is better assessed using standard assays such as liver microsomes, thus our goal was to characterize the effects of test compounds on the parasite without the confounding effect of hepatic metabolism.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, inhibitors of the P. vivax liver stages are identified from the Repurposing, Focused Rescue, and Accelerated Medchem (ReFRAME) library as well as a 773-member collection of epigenetic inhibitors. This study led to the discovery that epigenetics pathway inhibitors are selectively active against P. vivax and P. cynomolgi hypnozoites. Several inhibitors of histone post-translational modifications were found among the hits and genomic DNA methylation mapping revealed the modification on most genes. Experiments were completed to show that the level of methylation upstream of the gene (promoter or first exon) may impact gene expression. With the limited number of small molecules that act against hypnozoites, this work is critically important for future drug leads. Additionally, the authors gleaned biological insights from their molecules to advance the current understanding of essential molecular processes during this elusive parasite stage.

      Strengths:

      -This is a tremendously impactful study that assesses molecules for the ability to inhibit Plasmodium hypnozoites. The comparison of various species is especially relevant for probing biological processes and advancing drug leads.

      -The SI is wonderfully organized and includes relevant data/details. These results will inspire numerous studies beyond the current work.

      We thank the reviewer for pointing out the strengths of our report.

      Reviewer #3 (Public Review):

      Although this work represents a massive screening effort to find new drugs targeting P. vivax hypnozoites, the authors should balance their statement that they identified targetable epigenetic pathways in hypnozoites.

      -They should emphasize the potential role of the host cell in the presentation of the results and the discussion, as it is known that other pathogens modify the epigenome of the host cell (e.g., Toxoplasma, HIV) to prevent cell division. Also, hydrazinophthalazines target multiple pathways (notably modulation of calcium flux) and have been shown to inhibit DNA methyltransferase 1, which is lacking in Plasmodium.

      -In a drug repurposing approach, the parasite target might also be different than the human target.

      -The authors state that host-cell apoptotic pathways are downregulated in P. vivax infected cells (p. 5 line 162). Maybe the HDAC inhibitors and DNA-methyltransferase inhibitors are reactivating these pathways, leading to parasite death, rather than targeting parasites directly.

      We agree caution must be taken as we did not directly confirm the mechanism of our hits. Many follow-up studies will be needed to do so. We do point out in the discussion that the mechanism of hits could be host-directed. We agree with the notion that some of these hits could be affecting parasitized host cell pathways, which leads to death of the parasitized cell, with the parasite being collateral damage, yet such a mechanism could lead to a safe and effective novel antimalarial.

      It would make the interpretation of the results easier if the authors used EC50 in µM rather than pEC50 in tables and main text. It is easy to calculate when it is a single-digit number but more complicated with multiple digits.

      We apologize for the atypical presentation of potency data. However, there is growing concern in drug discovery when standard deviation is applied to potency data, because standard deviation is a linear calculation and potency is a log effect, making the math incompatible. We understand thousands of papers are published every year using this mathematically incorrect method, making our presentation of these data less familiar. However, we define pEC50 where it is used in the text and table legends and hope to increase its use in the broader scientific community.
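
The conversion the reviewer asks about, and the reason averaging on the log scale differs from averaging in µM, can be sketched in a few lines. This is an illustrative sketch only, not the authors' analysis: the function names and the three replicate values are hypothetical, and it assumes the usual definition pEC50 = -log10(EC50 in mol/L).

```python
import math
import statistics


def pec50_from_ec50_um(ec50_um):
    """Convert an EC50 given in micromolar to pEC50 (-log10 of molar EC50)."""
    return -math.log10(ec50_um * 1e-6)


def ec50_um_from_pec50(pec50):
    """Convert a pEC50 back to an EC50 in micromolar."""
    return 10 ** (-pec50) * 1e6


# Hypothetical replicate EC50 values (µM), chosen only for illustration.
replicates_um = [0.1, 1.0, 10.0]

# Summarise on the log scale, where potency estimates are roughly symmetric.
pec50_values = [pec50_from_ec50_um(x) for x in replicates_um]
mean_pec50 = statistics.mean(pec50_values)
sd_pec50 = statistics.stdev(pec50_values)

# Back-transforming the mean pEC50 gives the geometric mean EC50.
geometric_mean_um = ec50_um_from_pec50(mean_pec50)

# The arithmetic mean in µM is pulled upward by the weakest replicate.
arithmetic_mean_um = statistics.mean(replicates_um)

print(f"mean pEC50 = {mean_pec50:.2f} (SD {sd_pec50:.2f})")
print(f"geometric mean EC50 = {geometric_mean_um:.2f} µM")
print(f"arithmetic mean EC50 = {arithmetic_mean_um:.2f} µM")
```

A reader who prefers µM values can apply the second function to any tabulated pEC50; the spread, however, is only symmetric on the log scale.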

      Authors mention hypnozoite-specific effects but in most cases, compounds are as potent on hypnozoites as on schizonts. They should rather use "liver stage specific" to refer to increased activity against hypnozoites and schizonts compared to the host cell. The same comment applies to line 351 when referring to MMV019721. Following the same idea, it is a bit far-fetched to call MMV019721 "specific" when the highest concentration tested for cytotoxicity is less than twice the EC50 obtained against hypnozoites and schizonts.

      We have reviewed and revised statements in the manuscript to ensure the effect we are describing is accurate in terms of parasite versus parasite form.

      Page 5 lines 187-189, the authors state "...hydrazinophthalazines were inactive when tested against P. berghei liver schizonts and P. falciparum asexual blood stages, suggesting that hypnozoite quiescence may be biologically distinct from developing schizonts". The data provided in Figure 1B show that these hydrazinophthalazines are as potent in P. vivax schizonts as in P. vivax hypnozoites, so the distinct activity seems to be Plasmodium species specific and/or host-cell specific (primary human hepatocytes rather than cell lines for P. berghei) rather than hypnozoite vs schizont specific.

      We agree the effect of hydrazinophthalazines could be more species specific than stage specific, but the context of our comment has to do with current methods in antimalarial discovery and development. Given the biological uniqueness of the various Plasmodium species and stages, any hypnozonticidal hit may or may not have pan-species or pan-stage activity; our goal was to characterize this. Regardless of the mechanism, we found it interesting that the hydrazinophthalazines kill P. vivax hypnozoites, but not P. cynomolgi hypnozoites nor other species and stages used in antimalarial drug development. This result makes the point that hypnozoite-focused assays may be required to detect and develop hypnozonticidal hits, regardless of what other species or stages they may or may not act on.

      Why choose to focus on cadralazine if abandoned due to side effects? Also, why test the pharmacokinetics in monkeys? As it was a marketed drug, were no data available in humans?

      Cadralazine was found to be more potent than hydralazine and PK data were available from humans, thus dose prediction calculations showed an efficacious dose was more achievable with cadralazine than hydralazine. Side effects are often dependent on dose and regimen, which are very likely to be much different for treating malaria versus hypertension. Thus, the potential side effects of cadralazine if it were to be used as an antimalarial are simply unknown and are not disqualifying at this step. The PK study was done in Rhesus macaques so we could calculate the dose needed to achieve coverage of EC90 during a planned follow-up in a Rhesus-P. cynomolgi relapse model. However, this planned in vivo efficacy study was not justified once we concurrently discovered cadralazine was inactive on P. cynomolgi in vitro.

      In the counterscreen mentioned on page 6, the authors should mention that the activity of poziotinib in P. berghei and P. cynomolgi is equivalent to cell toxicity, so likely not due to parasite specificity.

      Poziotinib shows activity against mammalian cell lines but not against the primary hepatocyte cultures supporting dose-response assays against P. vivax liver forms, which do not replicate. Thus, poziotinib appears selective in the liver stage assay but also may have a much more potent effect in continuously replicating cell lines.

      To improve the clarity and flow of the manuscript, could the authors make a recapitulative table/figure for all the data obtained for poziotinib and hydrazinophthalazines in the different assays (8-days vs 12-days) and laboratory settings rather than separate tables in main and supplementary figures. Maybe also reorder the results section notably moving the 12-day assay before the DNA methylation part.

      We apologize for the large amount of data presented but believe we are presenting it in the clearest way possible. All raw data is available if readers wish to re-analyze or re-organize our findings.

      The isobologram plot shows an additive effect rather than a synergistic effect between cadralazine and 5-azacytidine, please modify the paragraph title accordingly. Please put the same axis scale for both fractional EC50 in the isobologram graph (Figure 2A).

      The isobologram shows the effect approaching synergy at some combinations. The isobologram was rendered using standard methods. The raw data is available if readers wish to re-analyze it.
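
For readers unfamiliar with how an isobologram point is read as additive or synergistic, the arithmetic behind a fractional EC50 can be sketched as below. This is a generic illustration, not the authors' rendering method: the function names and EC50 values are hypothetical, and the cutoffs (sum ≤ 0.5 for synergy, > 4 for antagonism) are one commonly used convention that is assumed here rather than taken from the manuscript.

```python
def fractional_ec50(ec50_in_combination, ec50_alone):
    """Fraction of a drug's stand-alone EC50 needed when combined with a partner."""
    return ec50_in_combination / ec50_alone


def sum_of_fractions(ec50_a_combo, ec50_a_alone, ec50_b_combo, ec50_b_alone):
    """Sum of fractional EC50s; a value of 1.0 lies on the line of additivity."""
    return (fractional_ec50(ec50_a_combo, ec50_a_alone)
            + fractional_ec50(ec50_b_combo, ec50_b_alone))


def classify(total, synergy_cutoff=0.5, antagonism_cutoff=4.0):
    """Label a combination using assumed, conventional cutoffs."""
    if total <= synergy_cutoff:
        return "synergy"
    if total > antagonism_cutoff:
        return "antagonism"
    return "additive/indifferent"


# Hypothetical EC50 values (µM): each drug needs half its solo EC50 in combination.
total = sum_of_fractions(ec50_a_combo=1.0, ec50_a_alone=2.0,
                         ec50_b_combo=2.0, ec50_b_alone=4.0)
print(total, classify(total))
```

Points falling below the diagonal of the isobologram have a sum under 1.0; how far below they must fall to be called synergistic depends on the cutoff chosen.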

      Concerning the immunofluorescence detection of 5mC and 5hmC, the authors should be careful with their conclusions. The Hoechst signal of the parasites is indistinguishable because of the high signal given by the hepatocyte nuclei. The signal obtained with the anti-5hmC in hepatocyte nuclei is higher than with the anti-5mC, thus if a low signal is obtained in hypnozoites and schizonts, it might be difficult to dissociate from the background. In blood stages (Figure S18), the best way to obtain a good signal is to lyse the red blood cell using saponin, before fixation and HCl treatment.

      We spent many hours using high resolution imaging of hundreds of parasites trying to detect clear 5hmC signal in both hypnozoites and schizonts but never saw a clearly positive signal. Indeed, the host signal can be confounding, thus we felt the most clear and unbiased way to quantify and present these data was using HCI. We appreciate the suggestion to lyse cells first for detecting in the blood stage.

      To conclude that 5mC marks are the predominate DNA methylation mark in both P. falciparum and P. vivax, authors should also mention that they compare different stages of the life cycle, that might have different methylation levels.

      We do mention at the start of this section our reasoning that quantifying marks in sporozoites was technically achievable, but not in a mixed culture of parasites and hepatocytes. We agree they could have different marks at these different stages.

      Also, the authors conclude that "[...] 5mC is present at low level in P. vivax and P. cynomolgi sporozoites and could control liver stage development and hypnozoite quiescence". Based on the data shown here, nothing, except the presence of 5mC marks, supports that DNA methylation could be implicated in liver stage development or hypnozoite quiescence.

      We clearly show sporozoite and liver stage DNA is methylated, which implicates this fundamental cell function exists in P. vivax liver stages, and that compounds with characterized activity against DNMT are active on liver stages. We acknowledge we were unable to show a direct effect and use the qualifier ‘could’ for this very reason.

      How many DNA-methyltransferase inhibitors were present in the epigenetic library? Out of those, none were identified as hits, maybe the hydrazinophtalazines effect is not linked to DNMT inhibition but another target pathway of these molecules like calcium transport?

      We supply the complete list of inhibitors in the epigenetic library as a supplemental file; the library contained 773 compounds. Hydrazinophtalazines were not included in the library, but several other DNA methyltransferase inhibitors were inactive. It is possible that hydrazinophtalazine activity is linked to other mechanisms, but the inactivity of other DNMT inhibitors does not preclude the possibility that hydrazinophtalazines are acting through DNMT.

      The authors state (line 344): "These results corroborate our hypothesis that epigenetic pathways regulate hypnozoites". This conclusion should be changed to "[...] that epigenetic pathways are involved in P. vivax liver stage survival" because:

      -The epigenetic inhibitors described here are as active on hypnozoites as on liver schizonts.

      -Again, we cannot rule out that the host cell plays a role in this effect and that the compound may not act directly on the parasite.

      The same comment applies to the quote in lines 394 to 396. There is no proof in the results presented here that DNA methylation plays any role in the effect of hydrazinophtalazines in the anti-plasmodial activity obtained in the assay.

      We maintain that we use words throughout the text that express uncertainty about the mechanisms involved. It is important to point out that, prior to this paper, the number of hypnozonticidal hits was incredibly low and this field is just emerging. The fundamental role of epigenetic mechanisms is regulation of gene expression. Finding several hypnozonticidal hits when screening epigenetic libraries implies epigenetic pathways are important for hypnozoite survival. We intentionally do not specify exact mechanisms or if they are host or parasite pathways. Host-parasite interactions in the liver stage are incredibly difficult to resolve and are outside the scope of this report. Furthermore, this statement is not exclusive to schizonts, but since screens of diversity sets against schizonts result in a much higher hit rate, the focus of this comment is unearthing rare hypnozonticidal hits.

    1. eLife Assessment

      This study provides valuable insights into human valve development by integrating snRNA-seq and spatial transcriptomics to characterize cell populations and regulatory programs in the embryonic and fetal outflow tract. The methods, data, and analyses are solid overall, but with some weaknesses that can be strengthened. The findings will be of interest to those who work in the field of heart development and congenital heart disease.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Bobola et al reports single-nucleus expression analysis with some supporting spatial expression data of human embryonic and fetal cardiac outflow tracts compared to adult aortic valves. The transcription factor GATA6 is identified as a top regulator of one of the mesenchymal subpopulations, and potential interacting factors and downstream target genes are identified bioinformatically. Additional bioinformatic tools are used to describe cell lineage relationships and trajectories for developmental and adult cardiac cell types.

      Strengths:

      The studies of human tissue and extensive gene expression data will be valuable to the field.

      Weaknesses:

      (1) The expression data are largely confirmatory of previous studies in humans and mice. Thus, it is not clear what novel biological insights are being reported. While there is some novelty and impact in using human tissue, there are extensive existing publications and data sets in this area.

      (2) Major conclusions regarding spatial localization, differential gene expression, or cell lineage relationships based on bioinformatic data are not validated in the context of intact tissues.

      (3) The conclusions regarding lineage relationships are based on common gene expression in the current study and may not reflect cellular origins or lineage relationships that have previously been reported in genetic mouse models.

      (4) An additional limitation is the exclusive examination of adult aortic valve leaflets that represent only a subset of outflow tract derivatives in the mature heart. The conclusion, as stated in the title regarding adult derivatives of the outflow tract, is not accurate based on the limited adult tissue evaluated, exclusive bioinformatic approach, and lack of experimental lineage analysis of cell origins.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Leshem et al. presents a transcriptomic analysis of the developing human outflow tract (OFT) at embryonic and fetal stages using snRNAseq and spatial transcriptomics. Additionally, the authors analyze transcriptomic data from the adult aortic valve to compare embryonic and adult cell populations, aiming to identify persistent embryonic transcriptional signatures in adult cells. A total of 15 clusters were identified from the embryonic and fetal OFT samples, including three mesenchymal and four endothelial clusters. Using SCENIC analysis on the embryonic snRNAseq data, the authors identified GATA6 as a key regulator of valve precursor cells. Spatial transcriptomic analysis of four fetal OFT sections further revealed the spatial distribution of mesenchymal nuclei, smooth muscle cells, and valvular interstitial cells. Trajectory analysis identified two distinct developmental origins of fetal mesenchymal cells: the neural crest and the second heart field. Finally, the authors used snRNAseq data from the adult aortic valve to propose that embryonic transcriptional signatures persist in a subset of adult cells.

      Strengths:

      (1) The study offers a rich and detailed dataset, combining snRNA-seq and spatial transcriptomics in human embryonic and fetal OFT, which are challenging to obtain.

      (2) The use of SCENIC and trajectory analysis adds mechanistic insight into cell lineage and regulatory programs during valve development.

      (3) This study confirms GATA6 as a key regulator of valve precursor cells.

      (4) Comparison between embryonic/fetal and adult datasets represents a novel attempt to trace persistence of developmental transcriptional programs.

      Weaknesses:

      (1) A major limitation is the lack of experimental validation to support key conclusions, particularly the claim of persistent embryonic transcriptional signatures in adult cells.

      (2) The manuscript would benefit from a clearer discussion of how these results advance beyond previous studies in human heart and valve development.

      (3) The comparison between embryonic and adult data is interesting, but would be more convincing with additional evidence supporting the proposed persistence of embryonic transcriptional signatures in adult cells.

    4. Reviewer #3 (Public review):

      Leshem et al have generated a transcriptional cell atlas of the human outflow tract at two developmental timepoints and its adult valvular derivatives. This carefully performed study provides a useful resource for the study of known genes implicated in outflow tract defects and potentially also for discovering new disease genes. The authors reveal neural crest and mesodermal contributions to different outflow tract components and show that GATA6, known to play a role in arterial valve development, controls a set of genes expressed in endocardium-derived cells during valve development. Interestingly, the results suggest lineage persistence of expression of certain genes through to the adult timepoint, a main new finding of this study.

      The following points should be addressed to reinforce the conclusions and emphasize the novel features of this study.

      (1) It would be helpful to clarify how these new findings confirm or diverge from what is known from analysis of neural crest and mesodermal lineage contributions to different cell populations in the mouse heart. Did the authors identify any human-specific populations of cells, such as the LGR5 population reported by Sahara et al?

      (2) The authors should clarify in the introduction and results that they consider the endocardium to be on the SHF trajectory as indicated in Figure S4C. Please add a reference for this point.

      (3) The GATA6 results are interesting and support this experimental approach. The paper would be reinforced if the authors could provide any functional validation (in addition to their GATA6 genomic occupancy data) that the designated target genes are regulated by GATA6. This might involve looking at mutant mouse embryos or cultured cells. Do the authors consider that GATA6 may regulate the endocardial to mesenchymal transition during the early stages of valve development? Or the valve interstitial cell versus fibroblast fate choice?

      (4) Do the new findings reveal whether human valves have a direct SHF to VIC trajectory (ie, without transiting through endocardium) as has been recently shown in the murine non-coronary valve leaflet? Relevant to this point, Figure 5E appears to show contributions to a single adult aortic valve leaflet - this should be explained, or corrected.

    5. Author response:

      We thank the editors and reviewers for the time and effort they have invested in evaluating our manuscript. We appreciate the constructive feedback, which highlights both the strengths of the work and areas for improvement. We will carefully consider all comments and, in the coming months, revise the manuscript to incorporate additional data, address the concerns regarding limited referencing, and provide further clarification on the points raised.

    1. eLife Assessment

      This manuscript reports important findings that have theoretical or practical implications beyond a single subfield. However, despite the combination of numerous analytical tools established and applied in the study, the work has substantial experimental limitations leading to incomplete evidence, indicating that the conclusions may be an over-interpretation of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      In the study by Wang et al. entitled "Dissecting organoid-bacteria interaction highlights decreased contractile force as a key factor for heart infection", a simple cardiac organoid (CO) model was established, by combining a heterologous mixture of patient-specific human induced pluripotent stem cells (hiPSC)-derived cardiomyocytes (CMs) in combination with primary HUVECs (Human Umbilical Vein Endothelial Cells) and human mesenchymal stem cells (MSCs, representing stromal cells). This model was applied for investigating the interplay of COs' bacterial infections in vitro, aiming at revealing pathological mechanisms of bacterial infections of the heart in vivo, which may induce myocarditis and consequently heart failure in affected patients.

      Strengths:

      The paper is systematic, well written, and easy to follow.

      Based on their results, the authors state that: "In this study, by developing quantitative tools for analyzing bacterial-cardiac organoid interactions in a 3D, dynamic, clinically relevant setting, we discovered the significant role of cardiac contractility in preventing bacterial infection."

      In principle, the idea of establishing a simple yet functionally and physiologically relevant in vitro model and relevant analytical tools for enabling the study of complex pathological mechanisms of cardiovascular diseases is intriguing.

      Weaknesses:

      However, despite the combination of numerous analytical tools established and applied in the study, the work has substantial experimental limitations, indicating that the bold conclusions may represent a misinterpretation or overinterpretation of the findings.

      Key limitations and questions:

      (1) It seems that iPSCs from only one patient ("dilated cardiomyopathy (DCM) cells were derived from a 47-year-old Asian male with an LMNA gene mutation") were used in the study. Moreover, it seems that only one iPSC-line/clone from that DCM patient was used and compared to a single control iPSC line from a "healthy donor". Therefore, despite the different assays and experimental controls used in the study, there is a high risk that the observed phenomena reflect iPSC-line-/ clone-dependent effects, rather than revealing general pathophysiologic mechanisms. Thus, key experiments must be shown by cardiomyocytes/ cardiac organoids derived from additional independent iPSC-lines representing different patients and other non-diseased control lines as well. Moreover, it is established good experimental practice in the iPS cell field to generate and include isogenic iPSC controls i.e. iPSC lines of the same genetic background but with corrections of the hypothesised gene mutation underlying the respective e.g., cardiovascular disease.

      (2) In Figure 1 (A) immunohistochemical staining for cardiomyocytes for the cardiac marker Troponin is shown, apparently indicating successful cardiomyogenic differentiation of the applied hiPSC lines. In supplemental Figure S1, a flow cytometry analysis specific to cTnT is shown to reveal the CMs content resulting from the monolayer differentiation of respective iPSC lines. Already, the exemplified plots indicate that the CMs' content/ purity for DCM-CMs was notably lower compared to healthy cardiomyocytes (CM; control). This is an important issue, since the non-CMs ("contaminating bystander cells") may have a substantial effect on the functional (including contractile) properties of the COs.

      Interestingly, based on the method description, it seems that COs were generated from cryopreserved iPSC-CMs and iPSC-DCMs, including intermediate seeding and culture on Matrigel before COs formation. However, it remains unclear whether the CMs FACS analysis, which is apparently: "Representative FACS plots for analysis of the cell types in DCM monolayer culture after 33 days of differentiation" shows a CMs purity relevant to CO formation, or something different.

      The lineage phenotype of non-CMs in respective differentiations should also be clarified. Moreover, it should be noted in the results that the CMs content in COs is lower than the 6:2:2 (CM:ECs:MSC) ratio indicated by the authors, since the CMs purity is not 100%, and is particularly reduced in the iPSC-DCMs.

      Finally, to investigate the important latter questions of the "real CMs content" in COs, systematic technologies should be applied to quantify the lineage composition in COs (e.g. by IF staining for the 3 lineages plus DAPI, followed by CO clearing, confocal microscopy "3D stacks" and automated, ImageJ-based quantitative cell counts for total cell number definition per CO (see e.g. doi: 10.1038/s41596-024-00976-2), and quantification of respective lineage content as well).

      These questions are of key importance since the presence of non-CMs and their phenotype has profound consequences on the cardiac organoid model, its contractile/ biophysical properties, and, in general, on models' sensitivity to bacterial infections as well.

      (3) Figure 2: (F) Why is this figure (Confocal Observations) showing only healthy cardiac organoids (HCOs) but not DCM-COs?

      The overall quality of these pictures is poor and not informative regarding the structural identity and tissue composition of the COs, which actually is an important topic in the frame of the paper, as the 3D structure and tissue composition - and differences between HCOs and DCM-COs - are of key importance to their contractile properties.

      Moreover, the expected overlay of the cardiac markers alpha-actinin and MHC is not obvious from Figure 2F (see also comments on Figure 7, below).

      In Figure 2E: COs at later stages/days should be shown, in particular at that stage, which was used for the functional assays i.e., bacteria infections and contraction pattern monitoring.

      (4) Figure 7 (A) (B) - In the IF sections, it seems that there is no overlay between the expression of the cardiac marker MHC (seems to be expressed in the centre of COs only) and the cardiac markers alpha-actinin (which seems to be unexpectedly expressed in all cells on the sections) and Troponin (which seems to be focally expressed on the outside, excluding the area of MHC expression).

      (F) Quantification of the mean area of gene expression, e.g., for MHC indicates a larger area after MHC expression; this seems to entirely contradict the IF pictures (in Figures 7 A-D) of MHC expression before and after infection. This contradiction is deemed very critical to this reviewer as it may indicate that the IF staining, data analysis, and/or data interpretation in this part of the manuscript is poor, misleading, or simply wrong.

      (5) Overall, from the perspective of this reviewer, the CO-derived results do not reflect in a meaningful way the contractile and hydrodynamic conditions in the mouse heart or the human heart. Thus, it seems that the conclusions may rather represent a hypothesised outcome bias.

    3. Reviewer #2 (Public review):

      Summary:

      The authors tried deconvoluting, for the first time, the effect of various components of heart contraction on initial bacterial adhesion, which increases the risk of infective endocarditis. The proposed organoid platform might be used to develop and test novel therapeutic agents for infective endocarditis.

      Strengths:

      (1) Use of a broad range of methods: finite element methods, -omics, particle tracking, animal experiments to investigate the connections between contractility and infective endocarditis.

      (2) Detailed procedure and supportive information, which will allow other groups to replicate the results and extend the application of the proposed organoid platform.

      (3) Despite the complexity of the work reported, the manuscript is rather readable and understandable by non-specialists.

      Weaknesses:

      There is a minor issue with some of the vocabulary (e.g., magnificent amount of bacteria).

    1. eLife Assessment

      This fundamental study provides new insights into the plasticity mechanisms underlying the formation of spatial maps in the hippocampus. Supported by a large and comprehensive dataset, the evidence is convincing. This study will be of interest to neuroscientists focusing on spatial navigation, learning, and memory.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the cellular mechanisms underlying place field formation (PFF) in hippocampal CA1 pyramidal cells by performing in vivo two-photon calcium imaging in head-restrained mice navigating a virtual environment. Specifically, they sought to determine whether BTSP-like (behavioral time scale synaptic plasticity) events, characterized by large calcium transients, are the primary mechanism driving PFFs or if other mechanisms also play a significant role. Through their extensive imaging dataset, the authors found that while BTSP-like events are prevalent, a substantial fraction of new place fields are formed via non-BTSP-like mechanisms. They further observed that large calcium transients, often associated with BTSP-like events, are not sufficient to induce new place fields, indicating the presence of additional regulatory factors (possibly local dendritic spikes).

      Strengths

      The study makes use of a robust and extensive dataset collected from 163 imaging sessions across 45 mice, providing a comprehensive examination of CA1 place cell activity during navigation in both familiar and novel virtual environments. The use of two-photon calcium imaging allows the authors to observe the detailed dynamics of neuronal activity and calcium transients, offering insights into the differences between BTSP-like and non-BTSP-like PFF events. The study's ability to distinguish between these two mechanisms and analyze their prevalence under different conditions is a key strength, as it provides a nuanced understanding of how place fields are formed and maintained. The paper supports the idea that BTSP is not the only driving force behind PFF, and other mechanisms are likely sufficient to drive PFF, and BTSP events may also be insufficient to drive PFF in some cases. The longer-than-usual virtual track used in the experiment allowed place cells to express multiple place fields, adding a valuable dimension to the dataset that is typically lacking in similar studies. Additionally, the authors took a conservative approach in classifying PFF events, ensuring that their findings were not confounded by noise or ambiguous activity.

      Weaknesses

      The standout weakness of the paper is the lack of direct measures of BTSP events. Without direct confirmation that large calcium transients correspond to actual BTSP events (including associated complex spikes and calcium plateau potentials), concluding that BTSP is not necessary or sufficient for PFF formation is speculative (although I do believe it).

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this manuscript aim to investigate the formation of place fields (PFs) in hippocampal CA1 pyramidal cells. They focus on the role of behavioral time scale synaptic plasticity (BTSP), a mechanism proposed to be crucial for the formation of new PFs. Using in vivo two-photon calcium imaging in head-restrained mice navigating virtual environments, the authors employed a classification method based on calcium activity to categorize the formation of place cells' place fields into BTSP and non-BTSP-like events, and investigated their properties.

      Strengths:

      This work shows that place field formation could be induced by both BTSP and non-BTSP events, and it also provides the field with a new and solid method to classify BTSP and non-BTSP place field formation using calcium imaging. This work offers novel knowledge, new methods, and factual evidence for other researchers in the field.

      The method enabled the authors to reveal that while many PFs are formed by BTSP-like events, a significant number of PFs emerge with calcium dynamics that do not match BTSP characteristics, suggesting a diversity of mechanisms underlying PF formation. The characteristics of place fields under the first two categories are comprehensively described, including aspects such as formation timing, quantity, and width.

      Weaknesses:

      The authors have addressed the weaknesses in the revised version.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Sumegi et al. use calcium imaging in head-fixed mice to test whether new place fields tend to emerge due to events that resemble behavioral time scale plasticity (BTSP) or other mechanisms. An impressive dataset was amassed (163 sessions from 45 mice with 500-1000 neurons per sample) to study spontaneous emergence of new place fields in area CA1 that had the signature of BTSP. The authors observed that place fields could emerge due to BTSP and non-BTSP-like mechanisms. Interestingly, when non-BTSP mechanisms seemed to generate a place field, this tended to occur on a trial with a spontaneous reset in neural coding (a remapping event). Novelty seemed to upregulate non-BTSP events relative to BTSP events. Finally, large calcium transients (presumed plateau potentials) were not sufficient to generate a place field.

      Strengths:

      I found this manuscript to be exceptionally well written, well powered, and timely given the outstanding debate and confusion surrounding whether all place fields must arise from BTSP events. Working at the same institute, Albert Lee (e.g. Epsztein et al., 2011 - which should be cited) and Jeff Magee (e.g. Bittner et al., 2017) showed contradictory results for how place fields arise. These accounts have not fully been put toe-to-toe and reconciled in the literature. This manuscript addresses this gap and shows that both accounts are correct - place fields can emerge due to a pre-existing map and due to BTSP.

      Weaknesses:

      I find only three significant areas for improvement in the present study:

      First, can it be concluded that non-BTSP events occur exclusively due to a global remapping event, as stated in the manuscript "these PFF surges included a high fraction of both non-BTSP- and BTSP-like PFF events, and were associated with global remapping of the CA1 representation"? Global remapping has a precise definition that involves quantifying the stability of all place fields recorded. Without a color scale bar in Figure 3D (which should be added), we cannot know whether the overall representations were independent before and after the spontaneous reset. It would be good to know if some neurons are able to maintain place coding (more often than expected by chance), suggestive of a partial-remapping phenomenon.

      Second, BTSP has a flip side that involves weakening of existing place fields when a novel field emerges. Was this observed in the present study? Presumably place fields can disappear due to this bidirectional-BTSP or due to global remapping. For a full comparison of the two phenomena, the disappearance of place fields must also be assessed.

      Finally, it would be good to know if place fields differ according to how they are born. For example, are there differences in reliability, width, peak rate, out-of-field firing, etc. for those that arise due to BTSP vs non-BTSP?

      Comments on revisions:

      The authors have mostly addressed my feedback. Compelling evidence for a fundamental observation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the cellular mechanisms underlying place field formation (PFF) in hippocampal CA1 pyramidal cells by performing in vivo two-photon calcium imaging in head-restrained mice navigating a virtual environment. Specifically, they sought to determine whether BTSP-like (behavioral time scale synaptic plasticity) events, characterized by large calcium transients, are the primary mechanism driving PFFs or if other mechanisms also play a significant role. Through their extensive imaging dataset, the authors found that while BTSP-like events are prevalent, a substantial fraction of new place fields are formed via non-BTSP-like mechanisms. They further observed that large calcium transients, often associated with BTSP-like events, are not sufficient to induce new place fields, indicating the presence of additional regulatory factors (possibly local dendritic spikes).

      Strengths

      The study makes use of a robust and extensive dataset collected from 163 imaging sessions across 45 mice, providing a comprehensive examination of CA1 place-cell activity during navigation in both familiar and novel virtual environments. The use of two-photon calcium imaging allows the authors to observe the detailed dynamics of neuronal activity and calcium transients, offering insights into the differences between BTSP-like and non-BTSP-like PFF events. The study's ability to distinguish between these two mechanisms and analyze their prevalence under different conditions is a key strength, as it provides a nuanced understanding of how place fields are formed and maintained. The paper supports the idea that BTSP is not the only driving force behind PFF, and other mechanisms are likely sufficient to drive PFF, and BTSP events may also be insufficient to drive PFF in some cases. The longer-than-usual virtual track used in the experiment allowed place cells to express multiple place fields, adding a valuable dimension to the dataset that is typically lacking in similar studies. Additionally, the authors took a conservative approach in classifying PFF events, ensuring that their findings were not confounded by noise or ambiguous activity.

      Weaknesses

      Despite the impressive dataset, there are several methodological and interpretational concerns that limit the impact of the findings. Firstly, the virtual environment appears to be poorly enriched, relying mainly on wall patterns for visual cues, which raises questions about the generalizability of the results to more enriched environments. Prior studies have shown that environmental enrichment can significantly influence spatial coding, and it would be important to determine how a more immersive VR environment might alter the observed PFF dynamics. Secondly, the study relies on deconvolution methods in some cases to infer spiking activity from calcium signals without in vivo ground truth validation. This introduces potential inaccuracies, as deconvolution is an estimate rather than a direct measure of spiking, and any conclusions drawn from these inferred signals should be interpreted with caution. Thirdly, the figures would benefit from clearer statistical annotations and visual enhancements. For example, several plots lack indicators of statistical significance, making it difficult for readers to assess the robustness of the findings. Furthermore, the use of bar plots without displaying underlying data distributions obscures variability, which could be better visualized with violin plots or individual data points. The manuscript would also benefit from a more explicit breakdown of the proportion of place fields categorized as BTSP-like versus non-BTSP-like, along with clearer references to figures throughout the results section. Lastly, the authors' interpretation of their data, particularly regarding the sufficiency of large calcium transients for PFF induction, needs to be more cautious. Without direct confirmation that these transients correspond to actual BTSP events (including associated complex spikes and calcium plateau potentials), concluding that BTSP is not necessary or sufficient for PFF formation is speculative.

      Reviewer #2 (Public review):

      Summary:

      The authors of this manuscript aim to investigate the formation of place fields (PFs) in hippocampal CA1 pyramidal cells. They focus on the role of behavioral time scale synaptic plasticity (BTSP), a mechanism proposed to be crucial for the formation of new PFs. Using in vivo two-photon calcium imaging in head-restrained mice navigating virtual environments, the authors employed a classification method based on calcium activity to categorize the formation of place cells' place fields into BTSP and non-BTSP-like events, and investigated their properties.

      Strengths:

      A new method to use calcium imaging to separate BTSP and non-BTSP place field formation. This work offers new methods and factual evidence for other researchers in the field.

      The method enabled the authors to reveal that while many PFs are formed by BTSP-like events, a significant number of PFs emerge with calcium dynamics that do not match BTSP characteristics, suggesting a diversity of mechanisms underlying PF formation. The characteristics of place fields under the first two categories are comprehensively described, including aspects such as formation timing, quantity, and width.

      Weaknesses:

      There are some issues about data and statistics that need to be addressed before these research findings can be considered as rigorous conclusions.

      While the authors mentioned 3 features of PF generated by BTSP during calcium imaging in the Introduction, the classification method used features 1 and 2. The confirmation by feature 3 in its current form is important but not strong enough.

      Some key data are missing, such as the excluded PFs, the BTSP/non-BTSP breakdown for each animal, etc.

      Impact:

      This work is likely to provide the field with a new method to classify BTSP and non-BTSP place field formation using calcium imaging.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Sumegi et al. use calcium imaging in head-fixed mice to test whether new place fields tend to emerge due to events that resemble behavioral time scale plasticity (BTSP) or other mechanisms. An impressive dataset was amassed (163 sessions from 45 mice with 500-1000 neurons per sample) to study the spontaneous emergence of new place fields in area CA1 that had the signature of BTSP. The authors observed that place fields could emerge due to BTSP and non-BTSP-like mechanisms. Interestingly, when non-BTSP mechanisms seemed to generate a place field, this tended to occur on a trial with a spontaneous reset in neural coding (a remapping event). Novelty seemed to upregulate non-BTSP events relative to BTSP events. Finally, large calcium transients (presumed plateau potentials) were not sufficient to generate a place field.

      Strengths:

      I found this manuscript to be exceptionally well-written, well-powered, and timely given the outstanding debate and confusion surrounding whether all place fields must arise from BTSP events. Working at the same institute, Albert Lee (e.g. Epsztein et al., 2011 - which should be cited) and Jeff Magee (e.g. Bittner et al., 2017) showed contradictory results for how place fields arise. These accounts have not fully been put toe-to-toe and reconciled in the literature. This manuscript addresses this gap and shows that both accounts are correct - place fields can emerge due to a pre-existing map and due to BTSP.

      We thank the Reviewer for his/her appreciation of the importance of our study. We have included the additional reference.

      Weaknesses:

      I find only three significant areas for improvement in the present study:

      First, can it be concluded that non-BTSP events occur exclusively due to a global remapping event, as stated in the manuscript "these PFF surges included a high fraction of both non-BTSP- and BTSP-like PFF events, and were associated with global remapping of the CA1 representation"? Global remapping has a precise definition that involves quantifying the stability of all place fields recorded. Without a color scale bar in Figure 3D (which should be added), we cannot know whether the overall representations were independent before and after the spontaneous reset. It would be good to know if some neurons are able to maintain place coding (more often than expected by chance), suggestive of a partial-remapping phenomenon.

      We have performed the analysis suggested by the Reviewer and determined what fraction of CA1PCs retained their original tuning properties after the representation switch. We found that the remapping was essentially global, as only a small fraction (5.4%) of CA1PCs retained their pre-switch tuning curve after the switch. This is now described in the Results.

      We now state in the figure legend for the former Figure 3D (now Figure 3F) that the color scale applies to all subpanels.

      We would like to note that we do not conclude that non-BTSP events occur exclusively during global remapping – we have found a sizable fraction of PFF by non-BTSP mechanism also in the familiar environment with no signs of change in the population representation. We agree nonetheless that PFF is dominated by BTSP under these conditions, whereas the contribution of non-BTSP is larger during global remapping events.

      Second, BTSP has a flip side that involves the weakening of existing place fields when a novel field emerges. Was this observed in the present study? Presumably place fields can disappear due to this bidirectional BTSP or due to global remapping. For a full comparison of the two phenomena, the disappearance of place fields must also be assessed.

      In this study we focused on the birth of new PFs – yet, PFs not only form but also disappear constantly. The factors driving PF weakening are even less explored and understood than those driving PF birth. In fact, we observed (as illustrated by several examples in our MS) that many PFs weaken or disappear completely during the course of an imaging session. These effects are sometimes accompanied by a new PFF event elsewhere (e.g. Figure 2 – figure supplement 2E bottom), whereas in other cases they are not (e.g. Figure 5A, middle). Similarly, some BTSP events seem to coincide with the disappearance of another PF, but others do not (e.g. Figure 2A bottom, first PF along the track; Figure 3 – figure supplement 1A left, first PF). The picture is further complicated in the case of global remapping events (i.e. representation switches, Figure 3 – figure supplement 2B) that, by definition, include both new PFF and PF disappearance. We feel that exploration of the complex mechanisms at play in PF disappearance is outside the scope of the current study, but could be the subject of an interesting future investigation.

      Finally, it would be good to know if place fields differ according to how they are born. For example, are there differences in reliability, width, peak rate, out-of-field firing, etc for those that arise due to BTSP vs non-BTSP.

      We have analyzed several properties of the PFs and found no significant difference in their width (BTSP: 46.4 ± 24.4 cm; non-BTSP: 50.4 ± 32.5 cm, p = 0.28), peak rates (BTSP: 19.0 ± 14.7 a.u./s; non-BTSP: 21.4 ± 16.8 a.u./s, p = 0.27), or out-of-field firing rates (BTSP: 0.64 ± 0.68 a.u./s; non-BTSP: 0.83 ± 1.25 a.u./s, p = 0.09; all unpaired t-tests). We have included these data in the Results section.

      Reviewer #1 (Recommendations for the authors):

      Consider adding additional visual cues or environmental elements to the virtual reality (VR) setup to create a more enriched and immersive environment. Collect data from a couple of mice in the enriched environment and compare the PFF dynamics to the original environment. This would help determine whether the findings on PFF dynamics hold in a setting where spatial coding may be more robust. Including floor cues, distal visual markers, or varying textures might provide a more comprehensive understanding of the factors influencing BTSP-like and non-BTSP-like events.

      We thank the Reviewer for her/his suggestion of analyzing data obtained from a more enriched VR environment compared to the one we used in our study. We have now included data obtained in a profoundly different VR environment, which did not have sparse dominant visual landmarks, but the entire wall was covered with a rich pattern with different shapes of different colors. Our data from 11 imaging sessions from 4 mice revealed BTSP- and non-BTSP-like PFF events with approximately the same ratio as that found in our regular maze. These results are described in the Results section and are presented in a new supplementary figure (Figure 2 – figure supplement 2).

      Wherever deconvolved spikes were used for analysis, provide a comparison of results obtained directly from the GCaMP ΔF/F signals versus those derived from the deconvolved spiking data. This could illustrate any differences and help readers understand the limitations and reliability of the inference method.

      We have adopted a method that is currently widely accepted in the field to infer spikes from fluorescence traces using the Suite2p software package. All of our analyses were then performed on the inferred spikes. To address the concerns of the Reviewer, we analyzed the relationship between the peak [Ca<sup>2+</sup>] transients and inferred spike activity (new Figure 3 – figure supplement 1C-E). Our results clearly demonstrate a robust, highly significant correlation between these measures at the level of individual cells (new Figure 3 – figure supplement 1D) and the Spearman correlation coefficients show a distribution that is very different from random distributions (new Figure 3 – figure supplement 1E). From these, we conclude that directly using the fluorescence data would have resulted in largely similar PF detection and identification.

      Improve the visual clarity of figures by enlarging key elements such as arrows that indicate BTSP-like events. Consider using colors that stand out more clearly to guide readers' attention. Include annotations of statistical significance directly on the figures (e.g., adding NS or * indicators) to make it clear which comparisons are statistically significant. This will help readers quickly interpret the data without needing to refer back to the text.

      Based on the suggestion of the Reviewer, we have enlarged the arrows. We have also indicated statistical results on the figures. Because some of the results of factorial ANOVA tests are difficult to indicate comprehensively on our plots, we kept the description of the statistical results in the legends as well. We hope that these alterations will make data interpretation easier.

      Replace or supplement bar plots with violin plots or scatter plots that show the distribution of individual data points. This change would offer a clearer picture of data variability and underlying trends, aiding readers in assessing the robustness of the results.

      We have changed the plots and now present all data points.

      Add more detailed quantification in the results section, specifying the total number of newly formed place fields, the proportion that are categorized as BTSP-like versus non-BTSP-like, and how many events did not fit these categories. Explicitly state what fraction of the total recorded place field formations are represented by the 59 non-BTSP-like events mentioned, as this is currently difficult to discern.

      The numbers of BTSP- and non-BTSP-like PFF events are given in the MS. As described in the Methods, after identifying BTSP- and non-BTSP-like PFF events using the shift and gain criteria, we manually checked each of these ROIs and the spatial footprint of every new PFF event for these cells and excluded ROIs with non-soma-like shapes and activities with spurious footprints suggesting contamination, creating a ‘cleaned’ dataset. We did not perform such visual inspection and manual curation of every ROI’s spatial footprints that belong to the two additional categories (no gain with shift, gain without shift, 872 events). Since these classes are also overestimated without curation, we cannot provide a precise fraction of the BTSP- and non-BTSP-like PFF events from the total recorded PFF population. However, assuming that factors leading to exclusion affect all groups equally, we can provide their fractions by comparing the numbers of newly born PFs in all categories before the visual inspections. In the normal maze, we found 806 candidate BTSP-like (52%) and 164 non-BTSP-like (10%) PFFs, and an additional 593 PFs (38%) could not be included in these two groups [40 PFs (3%) with formation lap gain and backward shift but significant backward drift; 238 PFs (15%) with formation lap gain but without backward shift; 315 PFs (20%) with no formation lap gain but with backward shift]. These data have been included in the Methods.
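
      As an illustrative check of the arithmetic quoted above (not the authors' code), the short sketch below recomputes the percentages from the pre-curation counts given in the response. The category labels and the `percent` helper are names chosen here for illustration only.

```python
# Pre-curation counts of candidate place-field-formation (PFF) events in the
# normal maze, copied from the response text. Labels are illustrative.
counts = {
    "BTSP-like (gain and backward shift)": 806,
    "non-BTSP-like (no gain, no backward shift)": 164,
    "gain and shift, but significant backward drift": 40,
    "gain without backward shift": 238,
    "backward shift without gain": 315,
}

total = sum(counts.values())


def percent(n, total):
    """Return n as a percentage of total, rounded to the nearest integer."""
    return round(100 * n / total)


percentages = {label: percent(n, total) for label, n in counts.items()}

# Events that fall outside the two main groups (the last three categories).
unclassified = total - 806 - 164

for label, n in counts.items():
    print(f"{label}: {n} ({percentages[label]}%)")
print(f"total: {total}; outside the two main groups: "
      f"{unclassified} ({percent(unclassified, total)}%)")
```

      The rounded values agree with the percentages reported in the response, under its stated assumption that exclusion during curation would affect all groups equally.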

      Ensure that all statements describing specific findings are consistently linked to the appropriate figures and panels. There are instances in the text where results are discussed without clear references, which can make it challenging for readers to verify the data. For example, the section on population remapping in a novel environment should point directly to the relevant figure panels to guide readers.

      We regret that our text was not linked properly to the appropriate figures. We corrected this during the revision.

      Given that BTSP-like events are inferred rather than directly confirmed, it would be prudent to frame conclusions about their sufficiency in more tentative terms, acknowledging the limitations of the current data. Consider adding a discussion of potential future experiments that could confirm whether these large transients truly represent BTSP events, including evidence for complex spikes or calcium plateau potentials.

      The Reviewer is correct that we do not have direct evidence that all large somatic Ca<sup>2+</sup> events represent dendritic plateau potentials. Now we discuss this and other limitations in the MS (Discussion section).

      Reviewer #2 (Recommendations for the authors):

      The authors outlined three characteristics of place fields (PFs) generated by behavioral time scale synaptic plasticity (BTSP) during calcium imaging in the Introduction section, as follows: 'First, the prolonged CSB results in large [Ca<sup>2+</sup>] transient during the initial PFF event, typically followed by weaker Ca<sup>2+</sup> signals on consecutive traversals through the PF. Second, due to the long and asymmetric temporal kernel of the plasticity (favoring potentiation of inputs active 1-2 seconds before the CSB) a substantial backward shift in the spatial position of the PF center can be observed on linear tracks after the formation lap. Third, the width of the new PF is generally proportional to the running speed of the animal during the PFF event.' Figure 3B, which displays the third feature for the classified BTSP and non-BTSP data, serves as an important confirmation of the classification results obtained using the first two features. Even though the Spearman correlation indicated a significant difference, the raw data distributions of BTSP and non-BTSP appear similar, suggesting that a bootstrap distribution and a more stringent confirmation are needed to be convincing.

      As described in the MS, because of the difference in the number of events in the two groups, we randomly subsampled the BTSP-like events to the sample size of the non-BTSP-like PFF events 10000 times and performed regression analysis. This bootstrapping revealed that both the r and p values of the fit to the non-BTSP data fell outside the 95% confidence interval of the bootstrapped BTSP values, indicating that the difference between the groups was robust.
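
      The subsampling logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the data are synthetic, subsampling is done without replacement, only the correlation coefficient r is tracked (the response also compares p values), and the function names `pearson_r` and `subsample_r_interval` are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def subsample_r_interval(xs, ys, size, n_iter=10000, alpha=0.05, seed=0):
    """Repeatedly subsample the larger group down to `size` points and
    return the central (1 - alpha) percentile interval of r."""
    rng = random.Random(seed)
    indices = range(len(xs))
    rs = []
    for _ in range(n_iter):
        picked = rng.sample(indices, size)  # subsample without replacement
        rs.append(pearson_r([xs[i] for i in picked],
                            [ys[i] for i in picked]))
    rs.sort()
    lo = rs[int((alpha / 2) * n_iter)]
    hi = rs[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi


# Synthetic stand-ins (assumption): a large group in which width scales with
# speed, and a small group in which width is unrelated to speed.
rng = random.Random(1)
speed_large = [rng.uniform(10, 40) for _ in range(310)]
width_large = [1.2 * s + rng.gauss(0, 8) for s in speed_large]
speed_small = [rng.uniform(10, 40) for _ in range(59)]
width_small = [45 + rng.gauss(0, 12) for _ in speed_small]

lo, hi = subsample_r_interval(speed_large, width_large, size=len(speed_small))
r_small = pearson_r(speed_small, width_small)
outside = not (lo <= r_small <= hi)
print(f"95% interval of subsampled r: [{lo:.2f}, {hi:.2f}]")
print(f"small-group r = {r_small:.2f}; outside interval: {outside}")
```

      Matching the subsample size to the smaller group keeps the sampling variability of r comparable between the two groups, which is the point of the procedure described in the response.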

      In further analysis during the revision, we found that the PF width variance explained by distance from landmarks is substantially larger than the variance explained by the running speed during the formation lap. We performed a cross-validated analysis by these two factors (Figure 3D), which highlights that speed explains some of the PF width variance of BTSP-like PFFs, but none of the non-BTSP PFFs.

      The proportions of the three types should be provided. page 6: ' Using a conservative approach, we categorized a new PF to be formed by a BTSP-like mechanism if it had both positive gain and negative shift values (Figure 2A; n = 310 new PFs), whereas new PFs exhibiting neither positive gain nor negative shift were considered as non-BTSP-like events (Figure 2B; n = 59). All other newly formed PFs (no-gain with backward shift and gain without backward shift) were excluded from further analysis.' The number of excluded newly formed PFs should be disclosed, as well as the distribution ratio of these three types in each animal.

      The numbers of BTSP- and non-BTSP-like PFF events are given in the MS. As described in the Methods, after identifying BTSP- and non-BTSP-like PFF events using the shift and gain criteria, we manually checked each of these ROIs and the spatial footprint of every new PFF event for these cells and excluded ROIs with non-soma-like shapes or spurious activities, creating a ‘cleaned’ dataset. We did not perform such visual inspection and manual curation of every ROI’s spatial footprints that belonged to the two additional categories (no gain with shift, gain without shift, 872 events). Since these classes are also overestimated without curation, we cannot provide a precise fraction of the BTSP- and non-BTSP-like PFF events from the total recorded PFF population. However, assuming that factors leading to exclusion affect all groups equally, we can provide their fractions by comparing the numbers of newly born PFs in all categories before the visual inspections. In the normal maze, we found 806 candidate BTSP-like (52%) and 164 non-BTSP-like (10%) PFFs, and an additional 593 PFs (38%) could not be included in these two groups [40 PFs (3%) with formation lap gain and backward shift but significant backward drift; 238 PFs (15%) with formation lap gain but without backward shift; 315 PFs (20%) with no formation lap gain but with backward shift]. These data have been included in the Methods.

      Figure 2C, while showing an overall decrease in amplitude from the formation lap to the next lap, could benefit from a pairwise analysis of the corresponding formation lap and the following lap of each session to provide more convincing and detailed results.

      We now present all data with connected lines across consecutive laps to illustrate the changes in each ROI. Our statistical analysis included the pairwise comparison of amplitudes.

      The experiment's time range is broad (11-99 days); it is worth investigating whether different training intervals might influence the results.

      Based on the suggestion of the Reviewer, we have analyzed the elapsed time and the number of sessions from the first training to the recording, and we demonstrate that there is no correlation of these parameters with the number of new PFFs. These data are now presented in Figure 2 – figure supplement 1C.

      It is unclear whether the formation of place fields also generates characteristic features of dendritic properties.

      It is not clear to us which ‘characteristic dendritic features of dendritic properties’ generated by PFF the Reviewer refers to. Since we did not image dendrites of individual CA1PCs, we have no information about dendritic properties of the neurons.

      It may be necessary to add a clearer figure to illustrate the correlation between width and speed following the downsampling of non-BTSP-like events (refer to Figure 3B).

      We have performed extensive additional analysis on the relationship of PF width with various behavioral factors, including the speed of the animal in the formation lap. Inspection of the PF width distributions along the track revealed a close association of PF width with the distance of the animal from the nearest visual landmark in the corridor, so that PFs close to landmarks were narrower than PFs between landmarks. We found that the PF width variance explained by distance from landmarks is substantially larger than the variance explained by the running speed during the formation lap. Nevertheless, there is a clear difference between BTSP-like and non-BTSP-like PFFs: running speed explains some variance in the case of BTSP-like PFFs, but none for non-BTSP-like PFFs.

      We have included these findings in the Results section and created two new panels in Figure 3 (C, D) and Figure 3 – figure supplement 1 (A, B).

      It is recommended that statistical results be labeled in the figures with n.s. or stars for better readability.

      Based on the suggestion of the Reviewer, we have indicated statistical results on the figures. Because some of the results of factorial ANOVA tests are difficult to indicate comprehensively on our plots, we kept the description of the statistical results in the legends as well. We hope that these alterations will make data interpretation easier.

    1. eLife Assessment

      This useful study describes an interesting infection phenotype that differs between adult male and female zebrafish. The authors argue that male-biased expression of Cyp17a2 is implicated in mediating infection levels through STING and USP8 activity regulation. Thus, this study highlights an unexpected factor involved in antiviral immunity that could open new avenues of investigation for infection, metabolism, and other contexts. Although the manuscript presents some evidence supporting its main claims, the evidence for the main argument made in the study on sex dimorphism remains incomplete at this stage.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Lu & Cui et al. observe that adult male zebrafish are more resistant to infection and disease following exposure to Spring Viremia of Carp Virus (SVCV) than female fish. The authors then attempt to identify some of the molecular underpinnings of this apparent sexual dimorphism and focus their investigations on a gene called cytochrome P450, family 17, subfamily A, polypeptide 2 (cyp17a2) because it was among the genes that they found to be more highly expressed in kidney tissue from males than in females. Their investigations lead them to propose a direct connection between cyp17a2 and modulation of interferon signaling as the key underlying driver of the difference between male and female susceptibility to SVCV.

      Strengths:

      Strengths of this study include the interesting observation of a substantial difference between adult male and female zebrafish in their susceptibility to SVCV, and also the breadth of experiments that were performed linking cyp17a2 to infection phenotypes and molecularly to the stability of host and virus proteins in cell lines. The authors place the infection phenotype in an interesting and complex context of many other sexual dimorphisms in infection phenotypes in vertebrates. This study succeeds in highlighting an unexpected factor involved in antiviral immunity that will be an important subject for future investigations of infection, metabolism, and other contexts.

      Weaknesses:

      Weaknesses of this study include an indirect connection between the majority of experiments and the proposed mechanism underlying the sexual dimorphism phenotype, widespread reliance on over-expression when investigating protein-protein interaction and localization, and an insufficient amount of description of the data presented in the figures. Specific examples of areas for clarification or improvement include:

      (1) Figure 10 outlines a mechanistic link between cyp17a2 and the sexual dimorphism the authors report for SVCV infection outcomes. The data presented on increased susceptibility of cyp17a2-/- mutant male zebrafish support this diagram, but this conclusion is fairly weak without additional experimentation in both males and females. The authors justify their decision to focus on males by stating that they wanted to avoid potential androgen-mediated phenotypes in the cyp17a2 mutant background (lines 152-156), but this appears to be speculation. It also doesn't preclude the possibility of testing the effects of increased cyp17a2 expression on viral infection in both males and females. This is of critical importance if the authors intend to focus the study on sexual dimorphism, which is how the introduction and discussion are currently structured.

      (2) The authors present data indicating an unexpected link between cyp17a2 and ubiquitination pathways. It is unclear how a CYP450 family member would carry out such activities, and this warrants much more attention. One brief paragraph in the discussion (starting at line 448) mentions previous implications of CYP450 proteins in antiviral immunity, but given that most of the data presented in the paper attempt to characterize cyp17a2 as a direct interactor of ubiquitination factors, more discussion in the text should be devoted to this topic. For example, are there any known domains in this protein that make sense in this context? Discussion of this interface is more relevant to the study than the general overview of sexual dimorphism that is currently highlighted in the discussion and throughout the text.

      (3) Figures 2-9 contain information that could be streamlined to highlight the main points the authors hope to make through a combination of editing, removal, and movement to supplemental materials. There is a consistent lack of clarity in these figures that could be improved by supplementing them with more text to accompany the supplemental figures. Using Figure 2 as an example, panel (A) could be removed as unnecessary; panel (B) could be exchanged for a volcano plot with examples highlighting why cyp17a2 was selected for further study, and the full dataset could be shared in a supplemental table; panel (C) could be modified to indicate why that particular subset was chosen for plotting, along with an explanation of the scaling; panel (D) could be moved to supplemental because the point is redundant with panels (A) and (C); panel (E) could be presented as a heatmap; in panels (G) and (H), data from EPC cells could be moved to supplemental because it is not central to the phenotype under investigation; and panels (J) to (L) and (N) to (P) could be moved to supplemental because they are redundant with the main points made in panels (M) and (Q). Similar considerations could be made with Figures 3-9.

      (4) The data in Figure 3 (A)-(C) do not seem to match the description in the text. That is, the authors state that cyp17a2 overexpression increases interferon signaling activity in cells, but the figure shows higher increases in vector controls. Additionally, the data in panel (H) are not described. What genes were selected and why, and where are the data on the rest of the genes from this analysis? This should be shared in a supplemental table.

      (5) Some of the reagents described in the methods do not have cited support for the applications used in the study. For example, the antibody for TRIM11 (line 624, data in Figures 6 & 7) was generated for targeting the human protein. Validation for use of this reagent in zebrafish should be presented or cited. Furthermore, the accepted zebrafish nomenclature for this gene would be preferred throughout the text, which is bloodthirsty-related gene family, member 32.

    3. Reviewer #2 (Public review):

      The manuscript identifies Cyp17a2 as a master regulator of male-biased antiviral immunity in a sex chromosome-free model (zebrafish), challenging established immunological paradigms.

      Strengths:

      (1) The bifunctional role of Cyp17a2 (host-directed STING stabilization and virus-directed P degradation) represents a significant conceptual advance.

      (2) First demonstration of K33 chains as a critical regulatory switch for both host defense proteins and viral substrates.

      (3) Comprehensive validation across biological scales: organismal (survival, histopathology), cellular (transcriptomics, Co-IPs), and molecular (ubiquitination assays, site-directed mutagenesis).

      (4) Functional conservation in cyprinids (zebrafish and gibel carp) strengthens biological significance.

      Weaknesses:

      (1) Colocalization analyses (Figures 4G, 6I, 9D) require quantitative metrics (e.g., Pearson's coefficients) rather than representative images alone.

      (2) Figure 1 survival curves need annotated statistical tests (e.g., "Log-rank test, p=X.XX").

      (3) Figure 2P GSEA should report exact FDR-adjusted *p*-values (not just "*p*<0.05").

      (4) Section 2 overextends on teleost sex-determination diversity; condensing it to emphasize relevance to immune dimorphism would strengthen narrative cohesion.

      (5) Limited discussion on whether this mechanism extends beyond Cyprinidae and its implications for teleost adaptation.
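
      To make point (1) above concrete, the sketch below shows one common way to compute a Pearson colocalization coefficient from two image channels. It is an assumption-laden illustration: the images are tiny made-up intensity grids, no background subtraction or thresholding is applied, and the function name `pearson_colocalization` is hypothetical rather than taken from any imaging package.

```python
def pearson_colocalization(channel_a, channel_b):
    """Pearson's coefficient between two same-sized 2D intensity images,
    each given as a list of rows of pixel intensities."""
    a = [v for row in channel_a for v in row]
    b = [v for row in channel_b for v in row]
    if len(a) != len(b):
        raise ValueError("channels must have the same number of pixels")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Made-up 2 x 3 intensity grids (assumption, for illustration only).
green = [[0, 1, 2], [3, 4, 5]]
red_same = [[0, 2, 4], [6, 8, 10]]   # scaled copy of the green channel
red_inverse = [[5, 4, 3], [2, 1, 0]]  # intensity pattern reversed

r_same = pearson_colocalization(green, red_same)
r_inverse = pearson_colocalization(green, red_inverse)
print(r_same, r_inverse)
```

      Reporting such a coefficient per cell, together with the number of cells analyzed, would give readers a quantitative complement to the representative images.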

    1. eLife Assessment

      The study by Reed et al. provides fundamental findings and convincing evidence defining the topological changes that occur during tumorigenesis. The findings enhance the understanding of stable long-range connections among genes that reprogram cancer-related functions. Nevertheless, performing additional experiments is recommended.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Metz Reed and colleagues present an exceptionally thorough analysis of three-dimensional genome reorganization during breast cancer progression using the well-characterized MCF10 model system. The integration of high-resolution Micro-C contact maps with multi-omics profiling provides compelling insights into stage-specific dynamics of chromatin compartments, TAD boundaries, and looping events. The discovery that stable chromatin loops enable epigenetic reprogramming of cancer genes, while structural changes selectively drive metastasis-associated pathways, represents a significant conceptual advance. This work substantially deepens our understanding of genome topology in malignancy. To further enhance this impactful study, we offer the following constructive suggestions.

      Strengths:

      This work sets a benchmark for integrative 3D genomics in oncology. Its methodological sophistication and conceptual advances establish a new paradigm for studying nuclear architecture in disease.

      Weaknesses:

      Major Issues

      (1) Functional tests would strengthen the observed links between structure and gene changes. For example, the COL12A1 gene loop formation correlates with its increased expression. Disrupting this loop using CRISPR-dCas9 at chr6 position 75280 kb could prove whether the loop causes COL12A1 activation. Such experiments would turn strong correlations into clear mechanisms.

      (2) The H3K27ac looping idea needs deeper validation. Data suggests H3K27ac loss weakens loops without affecting CTCF. Testing how cohesin proteins interact with H3K27ac-modified sites would clarify this process. Degron systems could rapidly remove H3K27ac to observe real-time effects. Also, the AP-1 motifs found at dynamic loop sites deserve functional tests. Knocking down AP-1 factors might show if they control loop formation.

      (3) Connecting findings to patient data would boost clinical relevance. The MCF10 model is excellent for controlled studies. Checking if TAD boundary weakening occurs in actual patient metastases would show real-world importance. Comparing primary and metastatic tumor samples from the same patients could reveal new structural biomarkers. If tissue is scarce, testing cancer cells with added stroma cells might mimic tumor environment effects.

      Minor Issues

      Adding a clear definition for static loops would help readers. For example, state that static loops show less than 10 percent contact change across replicates. In the ABC model analysis, removing promoter regions from the enhancer list would focus results on true long-range interactions. Briefly noting why this study sees TAD weakening while other cancer types show different patterns would provide useful context.

    3. Reviewer #2 (Public review):

      Employing the MCF10 breast-cancer progression series, the authors integrate high-resolution Micro-C chromatin-conformation capture with RNA-seq and ChIP-seq to delineate the sequential reorganization of compartments, topologically associated domains (TADs), and long-range loops across benign, pre-neoplastic, and metastatic states, and couple these 3D alterations to gene expression and enhancer activity. Four principal findings emerge: (i) largely static chromatin frameworks still gate differential gene output, with up-regulated loci most affected; (ii) enhancer-promoter contact strength covaries with transcriptional amplitude; (iii) 127 genes gain expression concomitant with increased chromatin contacts; and (iv) progression-associated genes acquire altered histone marks at distal enhancers that remain tethered by stable loops. While the conclusions are broadly supported, methodological and analytical refinements are required.

      (1) Model representativeness. The long-term culture-adapted MCF10 genome harbours extensive aneuploidies and translocations. Validation of key COL12A1/WNT5A loop dynamics in an independent breast-cancer line (e.g., MDA-MB-231, T47D) or in patient-derived organoids/PDX models would strengthen generalizability.

      (2) The study remains purely correlative; no perturbation experiments are conducted to demonstrate causal roles of chromatin loops on gene expression. CRISPR interference (dCas9-KRAB/HDAC) or enhancer deletion/inversion should be applied to 3-5 pivotal loops (e.g., COL12A1, WNT5A) to test their impact on target-gene expression and cellular phenotypes (e.g., proliferation, migration).

      (3) The manuscript lacks integration with clinical datasets. Integrate TCGA-BRCA data to assess whether elevated COL12A1/WNT5A expression associates with overall survival (OS) or distant metastasis-free survival (DMFS).

    4. Reviewer #3 (Public review):

      Summary:

      The authors tackle an important problem: defining the topological changes that occur during tumorigenesis. To study this, they use an established stepwise cell model of breast cancer. A strength of their study is a careful, robust differential analysis of topological features across each cell state, which is presented clearly and rigorously. They define changes in compartmentalization, TAD structure, and chromatin looping. Intriguingly, when the authors integrate differential gene expression with chromatin looping, they see that most differentially regulated genes are not involved in loop changes, suggesting that changes in promoter or enhancer chromatin marks may play a bigger role in regulating transcription than differential loops. The differential topology analysis and its integration with transcription is very well done, one of the best versions of this I have read in the 3D genome field! However, the paper is framed largely as a cancer biology study, and it teaches us much less about this. I am worried that some of the trends for each topological feature are not going to be consistent across the pre-malignant-malignant-metastatic spectrum and would like the authors to soften some of their claims a bit regarding how this clarifies our understanding of cancer evolution.

      Weaknesses:

      Major Concerns:

      (1) The integration of gene expression and chromatin loops is intriguing. The authors' differential analysis, however, omits consideration of genes that are on and simply further upregulated versus genes that transition on/off or off/on. It would be nice to see the authors break out looping patterns for these two different patterns of regulation, as it may be instructive regarding the rules for how EP loops govern transcription.

      (2) Given the paucity of differential loops at the majority of genes whose expression changes, the authors should examine chromatin subcompartments, as these may associate more with differential transcription.

      (3) The authors could push their TAD analysis further by integrating it with transcription. Can they look at genes and their enhancers that span these altered boundaries to see if these shifts impact transcription?

      (4) The progression of cancer critically goes from a benign -> pre-malignant -> malignant -> metastatic series of steps. The AT1 line is described as 'premalignant' and thus the authors' series omits a malignant line. While I think adding such a sample is an unreasonable request at this point (as it would have had to have been studied in 'batch' with these other samples), the authors should acknowledge that they omit this step and spend some time discussing the genetic, morphologic, and phenotypic features for their 3 conditions. The images in Figure 1S aren't particularly useful: they don't tell the reader that these cells are malignant/benign. The karyotypic data are intriguing but not fully analyzed, so it is hard to know what true phenotype these cells represent. For example, malignant means DCIS/invasive carcinoma, so then what does this pre-malignant cell model represent? The described alteration in the AT1 line is a Ras oncogene, so in some sense, the transition to this line really is just +/- Ras. The authors could spend some time thinking about the effects of Ras specifically on the 3D genome.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The question of how central nervous system (CNS) lamination defects affect functional integrity is an interesting topic, though it remains a subject of debate. The authors focused on the retina, which is a relatively simple yet well-laminated tissue, to investigate the impact of afadin, a key component of adherens junctions, on retinal structure and function. Their findings show that the loss of afadin leads to significant disruptions in outer retinal lamination, affecting the morphology and localization of photoreceptors and their synapses, as illustrated by high-quality images. Despite these severe changes, the study found that some functions of the retinal circuits, such as the ability to process light stimuli, could still be partially preserved. This research offers new insights into the relationship between retinal lamination and neural circuit function, suggesting that altered retinal morphology does not completely eliminate the capacity for visual information processing.

      Strengths:

      The retina serves as an excellent model for investigating lamination defects and functional integrity due to its relatively simple yet well-organized structure, along with the ease of analyzing visual function. The images depicting outer retinal lamination, as well as the morphology and localization of photoreceptors and their synapses, are clear and well-described. The paper is logically organized, progressing from structural defects to functional analysis. Additionally, the manuscript includes a comprehensive discussion of the findings and their implications.

      Weaknesses:

      While this work presents a wealth of descriptive data, it lacks quantification, which would help readers fully understand the findings and compare results with those from other studies. Furthermore, the molecular mechanisms underlying the defects caused by afadin deletion were not explored, leaving the role of afadin and its intracellular signaling pathways in retinal cells unclear. Finally, the study relied solely on electrophysiological recordings to demonstrate RGC function, which may not be robust enough to support the conclusions. Incorporating additional experiments, such as visual behavior tests, would strengthen the overall conclusions. 

      We would like to thank the reviewer for the thoughtful and valuable comments that helped us to further improve the manuscript. We have revised the manuscript to address the following three points in response to the reviewer's comments.

      While this work presents a wealth of descriptive data, it lacks quantification, which would help readers fully understand the findings and compare results with those from other studies.

      In response, we quantified the position of each retinal cell type and measured retinal thickness in the cHet and cKO mice at 1M, as presented in Figures 2F–M. To reflect these additions, we have included explanatory text in the revised manuscript (see lines 507–533).

      Furthermore, the molecular mechanisms underlying the defects caused by afadin deletion were not explored, leaving the role of afadin and its intracellular signaling pathways in retinal cells unclear.

      As AJ components, such as catenin and cadherin, are known to be associated with several signaling pathways, including Notch and Wnt signals (PMID: 37255594), we speculated that these pathways might be disrupted in the afadin cKO retina. Since these pathways are involved in cell proliferation, we examined the number of progenitor cells in the afadin cKO retina at developmental stages P1, P3, and P5 (new Figure S6C, see lines 868-870). No significant differences were observed at any of these stages. We also quantified the number of each retinal cell type at P14 when differentiation is complete. In the cKO retina, the number of BCs significantly increased, whereas the number of photoreceptors significantly reduced (new Figure S4C, see lines 620-622). To our knowledge, activation or inactivation of any AJ-associated signaling pathway does not reproduce the cell fate alterations observed in the afadin cKO retina. These findings suggest that the above pathways related to AJ may be unchanged in the cKO retina. However, we cannot exclude the possibility that multiple signaling pathways may be affected simultaneously or other pathways affected in the cKO retina.

      Finally, the study relied solely on electrophysiological recordings to demonstrate RGC function, which may not be robust enough to support the conclusions. Incorporating additional experiments, such as visual behavior tests, would strengthen the overall conclusions.

      We appreciate the reviewer’s insightful suggestion. To more robustly evaluate visual function in the cKO mice, we performed optomotor response (OMR) and visual cliff tests using cHet, cKO, and optic nerve crush (ONC) mice with Aki Hashio, Yuki Emori, and Mao Hiratsuka. We have added their names as co-authors of the revised manuscript. In the OMR test, cKO mice exhibited fewer responses to visual stimuli than cHet mice but significantly more than ONC mice. Furthermore, although no significant difference was detected between cKO and ONC mice in the visual cliff test, some cKO mice displayed cautious behavior suggestive of depth perception. These results indicate that cKO mice retain partial visual function, which is consistent with the MEA analysis. We have included these data as the new Figure 8 and incorporated the findings into the revised manuscript in the Introduction (lines 130-131 and 133-134), Methods (lines 378-406), Results (lines 775-816), and Discussion sections (lines 1026-1035).

      Reviewer #2 (Public review):

      Summary:

      Ueno et al. described substantial changes in the afadin knockout retina. These changes include decreased numbers of rods and cones, an increased number of bipolar cells, and disrupted somatic and synaptic organization of the outer limiting membrane, outer nuclear layer, and outer plexiform layer. In contrast, the number and organization of amacrine cells and retinal ganglion cells remain relatively intact. They also observed changes in ERG responses and RGC receptive fields and functions using MEA recordings.

      Strengths:

      The morphological characterization of retinal cell types and laminations is detailed and relatively comprehensive.

      Weaknesses:

      (1) The major weakness of this study, perhaps, is that its findings are predominantly descriptive and lack any mechanistic explanation. As afadin is a key component of adherens junctions, its role in mediating retinal lamination has been reported previously (see PMCID: PMC6284407). Thus, a more detailed dissection of afadin's role in processes such as progenitor generation, cell migration, or the formation of retinal lamination would provide greater insight into the defects caused by knocking out afadin.

      Thank you for the valuable comments. We agree with the reviewer's point that the findings are predominantly descriptive and lack a mechanistic explanation. However, we would like to clarify that the study cited in the comment (PMCID: PMC6284407) analyzed the role of afadin in dendritic stratification of direction-selective RGCs within the IPL, where “lamination” refers to the layering of RGC dendrites in the IPL. Here, we analyzed the function of afadin in the laminar construction of the overall retina.

      In response to the reviewer’s comment, we have added new analyses addressing retinal lamination, as well as the number and spatial distribution of progenitor cells, during development in the cKO retina. These new results are shown in Figures 4E, 9C–F, S5A–C, and S6C of the revised manuscript, and corresponding explanations have been added to the revised text (lines 643–662 and 855–870).

      (2) The authors observed striking changes in the numbers of rods, cones, and BCs, but not in ACs or RGCs. The causes of these distinct changes in specific cell classes remain unclear. Detailed characterizations, such as the expression of afadin in early developing retina, tracing cell numbers across various early developmental time points, and staining of apoptotic markers in developing retinal cells, could help to distinguish between defects in cell generation and survival, providing a better understanding of the underlying causes of these phenotypes.

      Thank you for the insightful comment. Following the reviewer’s suggestion, we quantified the number of retinal cell types at P14 when cell differentiation is complete (new Figure S4C). At P14, the numbers of photoreceptors and BCs were significantly reduced in the cKO retina, while Müller glia, which were significantly reduced at 1M, showed no difference. We further examined the number of rods and BCs at P1, P3, and P5 (new Figures S4E, F). No significant differences were detected at P1 or P3; however, at P5, rod marker expression was significantly decreased, while the number of BCs was significantly increased. These results suggest that the defects in cell fate determination of BCs and rods begin to emerge between P3 and P5, a period during which rods and BCs actively differentiate. We speculate that cells originally destined to become rods may instead differentiate into BCs in the cKO retina. In addition, we found a significant increase in apoptotic cells at P1, P3, P5, and P14 (new Figure S6B). Furthermore, Müller glia and rod photoreceptors showed significantly greater reduction at 1M compared to P14, suggesting that the reduction in Müller glia observed at 1M may be due to post-differentiation cell death. These results are presented in Figures S4C, S4E–F, and S6B, and described in the revised manuscript (lines 620-635 and 827-838).

      (3) Although the total number of ACs or RGCs remains unchanged, their localizations are somewhat altered (Figures 2E and 4E). Again, the cause of the altered somatic localization in ACs and RGCs is unclear.

      Thank you for the valuable question. In response to the reviewer’s comment, we analyzed the position of RGCs and ACs in the developing cKO retina. In the cKO retina at P1, retinal cells were organized into distinct multicellular compartments with clear boundaries, and acellular regions extending to the outer retinal surface were observed at these boundaries. These acellular regions contained dendritic processes of RGCs and ACs, which are components of the IPL, indicating that elements of the IPL extended vertically across the retina. As development progressed, the compartment boundaries gradually shifted toward the inner retina. At P14, the IPL was mainly located on the inner retina, as in the normal retina. However, some IPL structures remained in the outer retina and may correspond to the acellular patches. We have included the above data in the revised manuscript as Figures S5A and S5B and revised the manuscript to include this point (lines 643-660).

      (4) One conclusion that the authors emphasise is that the function of RGCs remains detectable despite a severely disrupted outer plexiform layer. However, the organization of the inner plexiform layer remains largely intact, and the axonal innervation of BCs remains unchanged. This could explain the functional integrity of RGCs. In addition, the resolution of detecting RGCs by MEA is low, as they only detected 5 clusters in heterozygous animals. This represents an incomplete clustering of RGC functional types and does not provide a full picture of how functional RGC types are altered in the afadin knockout.

      We appreciate the reviewer’s insightful comments. Although our clustering of RGC subtypes in afadin cHet retinas resulted in only five clusters, the key finding of our study is the preservation of RGC receptive fields in afadin cKO retinas, despite severe photoreceptor loss (reduced to about one-third of normal) and disruption of photoreceptor-bipolar cell synapses in the OPL. This suggests that even with severe damage to the OPL, the primary photoreceptor-bipolar-RGC pathway can still function as long as the IPL remains intact. Moreover, the presence of rod-driven responses in RGCs indicates that the AII amacrine cell-mediated rod pathway may also continue to function. We agree that our functional clustering in afadin cHet retinas was incomplete. However, we suspect that the absence of RGCs with fast temporal responses in afadin cKO retinas may not simply be due to the loss of specific RGC subtypes but rather to disrupted synaptic connections between photoreceptors and fast-responding BCs. Furthermore, the structural abnormalities in retinal lamination in afadin cKO retinas may alter RGC response properties, making strict functional classification less meaningful. We would like to emphasize the finding that disruption of the retinal lamination in afadin cKO retinas leads to the absence of RGCs with fast temporal response properties, rather than focusing solely on the classification of RGC subtypes.

      Minor Comments:

      (1) Line 56-67: "Overall, these findings provide the first evidence that retinal circuit function can be partially preserved even when there are significant disruptions in retinal lamination and photoreceptor synapses" There is existing evidence showing substantial adaptation in retinal function when retinal lamination or photoreceptor synapses are disrupted, such as PMCID: PMC10133175.

      Thank you for your comment. We agree that the original sentence was ambiguous in its wording, and we have revised it to clarify our intended meaning (lines 48-50):

      "Overall, these findings provide the first evidence that retinal circuit function can be partially preserved even when there are significant disruptions in both retinal lamination and photoreceptor synapses."

      The paper you mentioned is crucial for discussing and interpreting the results of our study. We have cited this study and added the following sentence to the Discussion section of the revised manuscript (lines 910-915):

      “Furthermore, RFs of RGCs are also detected in several mouse models of retinitis pigmentosa, in which rod photoreceptors are degenerated and surviving cone photoreceptors lose their OS discs and pedicles, instead forming abnormal processes resembling synaptic dendrites (Barhoum et al., 2008; Ellis et al., 2023; Scalabrino et al., 2022).”

      (2) Line 114-115: "we focused on afadin, which is a scaffolding protein for nectin and has no ortholog in mice." The term "Ortholog" is misused here, as the mouse has an afadin gene. Should the intended meaning be that afadin has no other isoforms in mouse?

      Thank you for pointing this out. We had mistakenly written "ortholog" where "paralog" was intended, and we have revised the sentence accordingly (line 108).

      Recommendations for the authors:

      (1) The introduction to afadin is insufficient. Please provide more background information about this protein.

      Following the reviewer’s recommendations, we expanded the Introduction in the revised manuscript to provide a more detailed background on afadin, as follows (lines 108-119):

      “Afadin regulates the localization of nectin, which initiates cell–cell adhesion and promotes AJ formation by recruiting the cadherin–catenin complex (Ohama et al., 2018; Takai and Nakanishi, 2003). In addition, afadin interacts with various cell adhesion and signaling molecules, as well as the actin cytoskeleton, and contributes to the accumulation of β-catenin, αE-catenin, and E-cadherin at AJs (Sakakibara et al., 2018; Sato et al., 2006). Afadin KO mice exhibit severe disruption of AJs in the ectoderm, along with other developmental defects, leading to embryonic lethality (Ikeda et al., 1999; Zhadanov et al., 1999). Conditional deletion of afadin in RGCs leads to disruption of dendrites in ON-OFF direction-selective RGCs (Duan et al., 2018). However, the effect of afadin loss on retinal lamination, circuit formation, and function is poorly understood.”

      (2) In Figure 1A (Bottom), regarding the peptide+ image, what does the green signal represent?

      The green signal observed in the peptide+ image represents background and non-specific staining. We have added this explanation to the legend of Figure 1A in the revised manuscript (lines 1067-1068).

      (3) In the RESULTS section on page 17, the statement "Nectin-1, unlike nectin-2 and nectin-3, was partially co-localized with afadin at the OPL and IPL, in addition to the OLM" suggests that nectin-2 is also expressed at the IPL, as shown in Figure S1A. Providing high-power images, similar to those in Figure S1B, could help readers clearly recognize the staining signals.

      Following your suggestion, we added higher-magnification images of Nectin-2 signals in the IPL to Figure S1A and included the following clarification in the Figure legend (lines 1356-1358):

      “Nectin-2 and nectin-3 were localized in the OLM. The Nectin-2 signal in the IPL was insufficient for reliable assessment of its localization and colocalization.”

      (4) Figure S2A requires an uncropped scan of the membrane after Western blotting to demonstrate that there are no non-specific bands when using this afadin antibody, which was also utilized for IHC.

      We revised the new Figure S2C to include the uncropped membrane scan. Faint non-specific bands were observed in the Western blot, consistent with detecting non-specific signals in immunostaining using the anti-afadin antibody pre-absorbed with its antigen peptide.

      (5) IHC staining is necessary to demonstrate the knockout of afadin in retinal cells, as the paper does not show Cre expression in the retinal cells of the Dkk3-Cre mouse line. This would also help verify the specificity of the afadin antibody.

      In the cKO retina, the laminar structure was disrupted, and the background signal was generally high, making it difficult to reliably assess whether afadin expression was lost using immunostaining with the anti-afadin antibody. Therefore, in addition to the Western blot analysis already presented, we evaluated Cre activity in the Dkk3-Cre mouse line by crossing it with the R26-H2B-EGFP reporter line. Cre-mediated recombination was observed in all retinal cells at P0 and 1M. We have added these results to a revised Figure S2A and B and included explanatory text in the revised manuscript (lines 455–458).

      (6) Why is the outer nuclear layer (ONL) severely impaired in the cKO mice when afadin is not expressed in this layer? Additionally, given that afadin is highly expressed in the inner plexiform layer (IPL), why does the cKO not affect its structure?

      We speculate that the AJ defect in the outer retina during development may cause severe disruption of the ONL in afadin cKO mice. As shown in new Figure 9, ectopic AJs and aberrant position of mitotic cells were observed in the P0 cKO retina. These defects caused abnormal cell migration and position, resulting in the ONL disruption. On the other hand, in the IPL, afadin and other cell adhesion molecules may function redundantly, and thus, the IPL structure would be kept intact in the afadin cKO retina. We have added this interpretation to the Discussion section of the revised manuscript (lines 998–1005).

      (7) In the RESULTS section on page 20, the authors state, "We further investigated adherens junctions (AJs) in the cKO retina by immunostaining with OLM adherens junction markers β-catenin, N-cadherin, and nectin-1. We found that these signals were dispersed in the cKO retina (Figure S2C)." It appears that β-catenin, N-cadherin, and nectin-1 can still be detected in the cKO retina.

      We agree with the reviewer that β-catenin, N-cadherin, and nectin-1 can still be detected in the cKO retina. We used the term 'dispersed' to indicate that the signal was “scattered” rather than “disappeared”. To avoid confusion, we have revised the wording in the revised manuscript (line 499).

      (8) In Figure 3, please indicate where the zoomed-in images were captured from the low-power images. Additionally, point out the locations of zoomed-in images in other figures as well.

      Following the reviewer’s suggestion, we updated Figures 2D, 3A-C, 4A, S2D, S3A, S3D, S3E, and S5D. The related Figure legends have also been revised.

      (9) The authors should include individual data points in all statistical graphics to provide a clearer presentation of the data.

      As suggested by the reviewers, we have revised all statistical graphs to display individual data points. Furthermore, the statistical analysis of synapse counts in Figures 3E, 3F, and S3C has been changed to linear mixed models (LMM) or generalized LMM to account for the variability in the number of synapses within individual mice.

      (10) In the RESULTS section on page 23, the statement "These data indicate that the rosette-like structure in the cKO may be an ectopic IPL, termed 'acellular patches'". What is the mechanism that may cause the rosette-like structure to translocate from the IPL to the outer region of the retina?

      Thank you for raising a valuable question. To clarify the mechanism of acellular patch formation in the cKO mice, we analyzed the position of RGCs and ACs in the developing cKO retina. In the cKO retina at P1, retinal cells were organized into distinct multicellular compartments with clear boundaries, and acellular regions extending to the outer retinal surface were observed at these boundaries. These acellular regions contained dendritic processes of RGCs and ACs, which are components of the IPL, indicating that elements of the IPL extended vertically across the retina. As development progressed, the compartment boundaries gradually shifted toward the inner retina. At P14, the IPL was mainly located on the inner retina, as in the normal retina. However, some IPL structures remained in the outer retina and may correspond to the acellular patches. We have included these findings in the revised manuscript as Figures S5A and S5B and added the corresponding description to the text (lines 643–665).

      (11) Is the blood vessel structure normal in the cKO retina? Could this impact the survival of retinal cells?

      Thank you for your valuable comment. We performed immunostaining with an anti-CD31 antibody, a marker for blood vessels, as shown in the new Figure S2G. No apparent differences were observed in the cKO retina. We have added the following description to the revised manuscript (lines 539–543):

      “It has been reported that defects in the distal processes of Müller glia are associated with abnormal retinal vasculature (Shen et al., 2012). Thus, we immunostained the cKO retina with anti-CD31, a blood vessel marker, but no apparent vascular abnormalities were detected (Figure S2G).”

      (12) In the RESULTS section on pages 26-29, there is a lot of statistical information included in parentheses. It would be more concise to place this information in the figure legends, if possible.

      Following the reviewer's suggestion, we have moved the statistical information from the main text (pages 26–29) to the corresponding Figure legends.

      (13) In the RESULTS section on page 28, the authors state, "On the other hand, the inner retina was apparently normal, and both the inner nuclear layer (INL) and IPL could be recognized." However, in Fig 7A, it appears that the INL is mixed with the ONL and cannot be clearly identified.

      We agree with the reviewer that the INL is mixed with the ONL and cannot be clearly identified. Accordingly, we have revised the description in the text (lines 740–742) as follows:

      “On the other hand, the inner retina was apparently normal, and both the IPL and the proximal part of the INL could be recognized.”

      (14) It is mentioned in the manuscript that "The receptive field (RF) area in the cKO retinas was significantly smaller than that in the cHet retinas." Is there an impairment in the dendritic fields of RGCs in the cKO retina that could lead to a smaller RF?

      Thank you for asking an interesting question. The dendritic field reflects the region where presynaptic cells can form synaptic contacts, whereas the receptive field is dynamically shaped by spatiotemporal excitatory and inhibitory inputs, gap junctions, and membrane properties of the dendrites. Consequently, the size of the dendritic field does not necessarily correspond to that of the receptive field. Moreover, the disruption of the retinal lamination in the afadin cKO retina may alter the morphology of RGC dendritic fields, even when RNA expression levels are identical, which makes it difficult to compare the morphology of the same RGC subtype exactly between afadin cHet and afadin cKO retinas. Additionally, due to the presence of over 40 RGC subtypes and the rosette-like structures in the afadin cKO retina, it is challenging to trace the complete dendritic arborization of individual RGCs. For these reasons, we are hesitant to compare the dendritic field size with the receptive field size.

      (15) Figure 7H was not cited in the corresponding section of the main text.

      Thank you for pointing it out. We have added a citation of Figure 7H in the revised manuscript (line 759).

      (16) In Figure 8C, is there a difference in the number of pHH3+ mitotic cells between the cHet and cKO mice?

      We quantified the number of pHH3-positive cells in the cKO retina at P0, as shown in the new Figure 9B. The number of mitotic cells was significantly increased in the cKO retina (see lines 853-855). In contrast, the number of BrdU-labeled progenitor cells at P1, P3, and P5 was not significantly different between cHet and cKO retinas, as presented in the new Figure S6C. These results suggest that although the total number of progenitor cells remains unchanged in cKO retinas, the M phase may be prolonged.

      (17) The results related to Figure 8 should be moved to a location before Figure 5, as Figure 8 is also related to the lamination defects.

      In the original manuscript, Figures 2–7 presented the phenotypes observed in the cKO retina, while Figure 8 addressed the possible cause of the lamination defects. Since the revised Figure 8 presents behavioral tests evaluating visual function, the phenotypic analyses are presented in the revised Figures 2–8. In response to the reviewers’ comments, we further analyzed the distribution of mitotic and progenitor cells during development and included these results as revised Figure 9.

      (18) In the DISCUSSION section on page 32, the authors state, "A few photoreceptor-bipolar cell-retinal ganglion cell (BC-RGC) pathways (vertical pathways of the retina) are inferred to be maintained in the cKO retina." The authors could verify this using retrograde transsynaptic tracing with a pseudorabies virus injected into the superior colliculus.

      Thank you for your interesting suggestion. This is an important point, and the recommended experiment is an excellent idea. We attempted this analysis; however, the virus injected into the superior colliculus successfully labeled RGCs but failed to reach BCs and photoreceptors in normal mice. Nevertheless, we consider that the RGC firing evoked by light stimulation shows that the photoreceptor-bipolar cell-retinal ganglion cell (BC-RGC) pathways function.

    2. Reviewer #2 (Public review):

      Summary:

      Ueno et al. described substantial changes in the Afadin knockout retina. These changes include decreased numbers of rods and cones, an increased number of bipolar cells, and disrupted somatic and synaptic organization of the outer limiting membrane, outer nuclear layer, and outer plexiform layer. In contrast, the number and organization of amacrine cells and retinal ganglion cells remain relatively intact. They also observed changes in ERG responses, RGC receptive fields and functions, and visual behaviors. The morphological and functional characterization of retinal cell types and laminations is detailed and relatively comprehensive.

    3. Reviewer #1 (Public review):

      Summary:

      The question of how central nervous system lamination defects affect functional integrity is an interesting yet debated topic. The authors investigated the role of afadin, a key adherens junction scaffolding protein, in retinal lamination and function using a retina-specific conditional knockout mouse model. Their findings show that the loss of Afadin caused severe outer retinal lamination defects, disrupting photoreceptor morphology, synapse numbers, and cell positioning, as demonstrated by histological analysis. Despite these structural impairments, retinal function was partially preserved: mERG detected small a- and b-waves, retinal ganglion cells responded to light, and behavioral tests confirmed residual visual function. This research offers new insights into the relationship between retinal lamination and neural circuit function, suggesting that altered retinal morphology does not completely eliminate the capacity for visual information processing.

      Strengths:

      The study effectively employs the well-organized laminar structure of the retina as an accessible model for investigating afadin's role in lamination within the central nervous system. High-quality histological, immunostaining, and electron microscopy images clearly reveal structural defects in the conditional knockout mice. The revised manuscript significantly enhances the findings by incorporating robust quantitative analyses of cell positioning, retinal thickness, and cell numbers, as well as new assessments of developmental defects. Additionally, new behavioral tests, including the optomotor response and visual cliff tests, have been introduced. Together with electrophysiological recordings, these additions compellingly demonstrate the partial preservation of visual function despite severe structural disruptions.

      Weaknesses:

      Overall, the study of the mechanisms remains weak. While the authors addressed concerns about molecular mechanisms by examining cell proliferation potentially related to Notch and Wnt signaling (Figure S6C, lines 868-870), the findings are largely negative (no significant changes in progenitor cell numbers), and the discussion of alternative pathways remains speculative.

    4. eLife Assessment

      This study demonstrates that conditional knockout of afadin disrupts retinal laminar organization and reduces the number of photoreceptors, while preserving certain aspects of retinal ganglion cell structure and light responsiveness. The work is valuable and well-supported by revised figures and comprehensive data on retinal cell types, lamination patterns, and visual function. The findings are solid and intriguing, and the study provides insights into the relationship between retinal lamination and neural circuit function.

    1. eLife Assessment

      This valuable study employs a formalized computational model of learning to assess memory deficits in Alzheimer's Disease with the goal of developing an early diagnosis tool. Using an established mouse model of the disease, the authors studied multiple behavioral tasks and ages with the goal of showing similarities in behavioral deficits across tasks. Using the model, the authors indicate specific deficits in memory (overgeneralization and over differentiation) in mice with the transgene for the disease. The evidence presented is solid, yet certain concerns remain regarding the interpretation of the results of the modeling.

    2. Reviewer #1 (Public review):

      I applaud the authors for providing a thorough response to my comments from the first round of review. The authors have addressed the points I raised on the interpretation of the behavioral results as well as the validation of the model (fit to the data) by conducting new analyses, acknowledging the limitations where required and providing important counterpoints. As a result of this process, the manuscript has considerably improved. I have no further comments and recommend this manuscript for publication.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript proposes that the use of a latent cause model for assessment of memory-based tasks may provide improved early detection in Alzheimer's Disease as well as more differentiated mapping of behavior to underlying causes. To test the validity of this model, the authors use a previously described knock-in mouse model of AD and subject the mice to several behaviors to determine whether the latent cause model may provide informative predictions regarding changes in the observed behaviors. They include a well-established fear learning paradigm in which distinct memories are believed to compete for control of behavior. More specifically, it's been observed that animals undergoing fear learning and subsequent fear extinction develop two separate memories for the acquisition phase and the extinction phase, such that the extinction does not simply 'erase' the previously acquired memory. Many models of learning require the addition of a separate context or state to be added during the extinction phase and are typically modeled by assuming the existence of a new state at the time of extinction. The Niv research group, Gershman et al. 2017, have shown that the use of a latent cause model applied to this behavior can elegantly predict the formation of latent states based on a Bayesian approach, and that these latent states can facilitate the persistence of the acquisition and extinction memory independently. The authors of this manuscript leverage this approach to test whether deficits in production of the internal states, or the inference and learning of those states, may be disrupted in knock-in mice that show both a build-up of amyloid-beta plaques and a deterioration in memory as the mice age.

      Strengths:

      I think the authors' proposal to leverage the latent cause model and test whether it can lead to improved assessments in an animal model of AD is a promising approach for bridging the gap between clinical and basic research. The authors use a promising mouse model and apply this to a paradigm in which the behavior and neurobiology are relatively well understood - an ideal situation for assessing how a disease state may impact both the neurobiology and behavior. The latent cause model has the potential to better connect observed behavior to underlying causes and may pave a road for improved mapping of changes in behavior to neurobiological mechanisms in diseases such as AD.

      The authors also compare the latent cause model to the Rescorla-Wagner model and a latent state model allowing for better assessment of the latent cause model as a strong model for assessing reinstatement.

      Weaknesses:

      I have several substantial concerns which I've detailed below. These include important details on how the behavior was analyzed, how the model was used to assess the behavior, and the interpretations that have been made based on the model.

      (1) There is substantial data to suggest that during fear learning in mice separate memories develop for the acquisition and extinction phases, with the acquisition memory becoming more strongly retrieved during spontaneous recovery and reinstatement. The Gershman paper, cited by the authors, shows how the latent causal model can predict this shift in latent causes by allowing for the priors to decay over time, thereby increasing the posterior of the acquisition memory at the time of spontaneous recovery. In this manuscript, the authors suggest a similar mechanism of action for reinstatement, yet the model does not appear to return to the acquisition memory after reinstatement, at least based on the simulation and examples shown in figures 1 and 3. More specifically, in figure 1, the authors indicate that the posterior probability of the latent cause, z<sub>A</sub> (the putative acquisition memory), increases, partially leading to reinstatement. This does not appear to be the case as test 3 (day 36) appears to have similar posterior probabilities for z<sub>A</sub> as well as similar weights for the CS as compared to the last days of extinction. Rather, the model appears to mainly modify the weights in the most recent latent cause, z<sub>B</sub> - the putative 'extinction state', during reinstatement. The authors suggest that previous experimental data have indicated that spontaneous recovery or reinstatement effects are due to an interaction of the acquisition and extinction memory.
These studies have shown that conditioned responding at a later time point after extinction is likely due to a balance between the acquisition memory and the extinction memory, and that this balance can shift towards the acquisition memory naturally during spontaneous recovery, or through artificial activation of the acquisition memory or inhibition of the extinction memory (see Lacagnina et al. for example). Here the authors show that the same latent cause learned during extinction, z<sub>B</sub>, appears to dominate during the learning phase of reinstatement, with rapid learning to the context - the weight for the context goes up substantially on day 35 - in z<sub>B</sub>. This latent cause, z<sub>B</sub>, dominates at the reinstatement test, and due to the increased associative strength between the context and shock, there is a strong CR. For the simulation shown in figure 1, it's not clear why a latent cause model is necessary for this behavior. This leads to the next point.

      (2) The authors compared the latent cause model to the Rescorla-Wagner model. This is very commendable, particularly since the latent cause model builds upon the RW model, so it can serve as an ideal test for whether a more simplified model can adequately predict the behavior. The authors show that the RW model cannot successfully predict the increased CR during reinstatement (Appendix figure 1). Yet there are some issues with the way the authors have implemented this comparison:

      (2A) The RW model is a simplified version of the latent cause model and so should be treated as a nested model when testing, or at a minimum, the number of parameters should be taken into account when comparing the models using a method such as the Bayesian Information Criterion, BIC.

      (2B) The RW model provides the associative strength between stimuli and does not necessarily require a linear relationship between V and the CR. This is the case in the original RW model as well as in the LCM. To allow for better comparison between the models, the authors should be modeling the CR in the same manner (using the same probit function) in both models. In fact, there are many instances in which a sigmoid has been applied to RW associative strengths to predict CRs. I would recommend modeling CRs in the RW as if there is just one latent cause. Or perhaps run the analysis for the LCM with just one latent cause - this would effectively reduce the LCM to RW and keep any other assumptions identical across the models.

      (2C) In the paper, the model fits for the alphas in the RW model are the same across the groups. Were the alphas for the two models kept as free variables? This is an important question as it gets back to the first point raised. Because the modeling of the reinstatement behavior with the LCM appears to be mainly driven by latent cause z<sub>B</sub>, the extinction memory, it may be possible to replicate the pattern of results without requiring a latent cause model.
For example, the 12-month-old App NL-G-F mice behavior may have a deficit in learning about the context. Within the RW model, if the alpha for context is set to zero for those mice, but kept higher for the other groups, say alpha_context = 0.8, the authors could potentially observe the same pattern of discrimination indices in figure 2G and 2H at test. Because the authors don't explicitly state which parameters might be driving the change in the DI, the authors should show in some way that their results cannot simply be due to poor contextual learning in the 12-month-old App NL-G-F mice, as this can presumably be predicted by the RW model. The authors' model fits using RW don't show this, but this is because they don't consider the possibility that the alpha for context might be disrupted in the 12-month-old App NL-G-F mice. Of course, using the RW model with these alphas won't lead to fits of the behavior across acquisition, extinction, and reinstatement that are as good as those of the authors' LCM, since the number of parameters is substantially reduced in the RW model. Yet the important pattern of the DI would be replicated with the RW model (if I'm not mistaken), which is the important test for assessment of reinstatement.

      (3) As stated by the authors in the introduction, the advantage of the fear learning approach is that the memory is modified across the acquisition-extinction-reinstatement phases. Although perhaps not explicitly stated by the authors, the post-reinstatement test (test 3) is the crucial test for whether there is reactivation of a previously stored memory, with the general argument being that the reinvigorated response to the CS can't simply be explained by relearning the CS-US pairing, because re-exposure to the US alone leads to an increased response to the CS at test. Of course, there are several explanations for why this may occur, particularly when also considering the context as a stimulus. This is what I understood to be the justification for the use of a model, such as the latent cause model, that may better capture and compare these possibilities within a single framework. As such, it is critical to look at the level of responding to both the context alone and to the CS. It appears that the authors only look at the percent freezing during the CS, and it is not clear whether this is due to the contextual-US learning during the US re-exposure or to increased responding to the CS - presumably caused by reactivation of the acquisition memory. The authors do perform a comparison between the preCS and CS period, but it is not clear whether this is taken into account in the LCM. For example, the instance of the model shown in figure 1 indicates that the 'extinction cause', or cause z6, develops a strong weight for the context during the reinstatement phase of presenting the shock alone. This state then leads to increased freezing during the final CS probe test as shown in the figure. If they haven't already, I think the authors must somehow incorporate these different phases (CS vs ITI) into their model, particularly since this type of memory retrieval that depends on assessing latent states is specifically why the authors justified using the latent causal model.
In more precise terms, it's not clear whether the authors incorporate a preCS/ITI period each day the cue is presented as a vector of just the context in addition to the CS period in which the vector contains both the context and the CS. Based on the description, it seemed to me that they only model the CRs during the CS period on days when the CS is presented, and thereby the context is only ever modeled on its own (as just the context by itself in the vector) on extinction days when the CS is not presented. If they are modeling both timepoints each day that the CS is presented, then I would recommend explicitly stating this in the methods section.

      (4) The authors fit the model using all data points across acquisition and learning. As one of the other reviewers has highlighted, it appears that there is a high chance for overfitting the data with the LCM. Of course, this would result in much better fits than models with substantially fewer free parameters, such as the RW model. As mentioned above, the authors should use a method that takes into account the number of parameters, such as the BIC.

      (5) The authors have stated that they do not think the Barnes maze task can be modeled with the LCM. Whether or not this is the case, if the authors do not model this data with the LCM, the Barnes maze data doesn't appear valuable to the main hypothesis. The authors suggest that more sophisticated models such as the LCM may be beneficial for early detection of diseases such as Alzheimer's, so the Barnes maze data is not valuable for providing evidence of this hypothesis. Rather, the authors make an argument that the memory deficits in the Barnes maze mimic the reinstatement effects providing support that memory is disrupted similarly in these mice. Although, the authors state that the deficits in memory retrieval are similar across the two tasks, the authors are not explicit as to the precise deficits in memory retrieval in the reinstatement task - it's a combination of overgeneralizing latent causes during acquisition, poor learning rate, over differentiation of the stimuli.

    4. Reviewer #3 (Public review):

      Summary:

      This paper seeks to identify underlying mechanisms contributing to memory deficits observed in Alzheimer's disease (AD) mouse models. By understanding these mechanisms, they hope to uncover insights into subtle cognitive changes early in AD to inform interventions for early-stage decline.

      Strengths:

      The paper provides a comprehensive exploration of memory deficits in an AD mouse model, covering early and late stages of the disease. The experimental design was robust, confirming age-dependent increases in Aβ plaque accumulation in the AD model mice and using multiple behavior tasks that collectively highlighted difficulties in maintaining multiple competing memory cues, with deficits most pronounced in older mice.

      In the fear acquisition, extinction, and reinstatement task, AD model mice exhibited a significantly higher fear response after acquisition compared to controls, as well as a greater drop in fear response during reinstatement. These findings suggest that AD mice struggle to retain the fear memory associated with the conditioned stimulus, with the group differences being more pronounced in the older mice.

      In the reversal Barnes maze task, the AD model mice displayed a tendency to explore the maze perimeter rather than the two potential target holes, indicating a failure to integrate multiple memory cues into their strategy. This contrasted with the control mice, which used the more confirmatory strategy of focusing on the two target holes. Despite this, the AD mice were quicker to reach the target hole, suggesting that their impairments were specific to memory retrieval rather than basic task performance.

      The authors strengthened their findings by analyzing their data with a leading computational model, which describes how animals balance competing memories. They found that AD mice showed somewhat of a contradiction: a tendency to both treat trials as more alike than they are (lower α) and similar stimuli as more distinct than they are (lower σx) compared to controls.

      Weaknesses:

      While conceptually solid, the model struggles to fit the data and to support the key hypothesis about AD mice's inability to retain competing memories. These issues are evident in Figure 3:

      (1) The model misses trends in the data, including the gradual learning of fear in all groups during acquisition, the absence of a fear response at the start of the experiment, and the faster return of fear during reinstatement compared to the gradual learning of fear during acquisition. It also underestimates the increase in fear at the start of day 2 of extinction, particularly in controls.

      (2) The model explains the higher fear response in controls during reinstatement largely through a stronger association to the context formed during the unsignaled shock phase, rather than to any memory of the conditioned stimulus from acquisition (as seen in Figure 3C). In the experiment, however, this memory does seem to be important for explaining the higher fear response in controls during reinstatement (as seen in Author Response Figure 3). The model does show a necessary condition for memory retrieval, which is that controls rely more on the latent causes from acquisition. But this alone is not sufficient, since the associations within that cause may have been overwritten during extinction. The Rescorla-Wagner model illustrates this point: it too uses the latent cause from acquisition (as it only ever uses a single cause across phases) but does not retain the original stimulus-shock memory, updating and overwriting it continuously. Similarly, the latent cause model may reuse a cause from acquisition without preserving its original stimulus-shock association.

      These issues lead to potential overinterpretation of the model parameters. The differences in α and σx are being used to make claims about cognitive processes (e.g., overgeneralization vs. over differentiation), but the model itself does not appear to capture these processes accurately.

      The authors could benefit from a model that better matches the data and captures the retention and retrieval of fear memories across phases. While they explored alternatives, including the Rescorla-Wagner model and a latent state model, these showed no meaningful improvement in fit. This highlights a broader issue: these models are well-motivated but may not fully capture observed behavior.

      Conclusion:

      Overall, the data support the authors' hypothesis that AD model mice struggle to retain competing memories, with the effect becoming more pronounced with age. While I believe the right computational model could highlight these differences, the current models fall short in doing so.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      I applaud the authors for providing a thorough response to my comments from the first round of review. The authors have addressed the points I raised on the interpretation of the behavioral results as well as the validation of the model (fit to the data) by conducting new analyses, acknowledging the limitations where required and providing important counterpoints. As a result of this process, the manuscript has considerably improved. I have no further comments and recommend this manuscript for publication.

      We are pleased that our revisions have addressed all the concerns raised by Reviewer #1.

      Reviewer #2 (Public review):

      Summary:

      This manuscript proposes that the use of a latent cause model for assessment of memory-based tasks may provide improved early detection in Alzheimer's Disease as well as more differentiated mapping of behavior to underlying causes. To test the validity of this model, the authors use a previously described knock-in mouse model of AD and subject the mice to several behaviors to determine whether the latent cause model may provide informative predictions regarding changes in the observed behaviors. They include a well-established fear learning paradigm in which distinct memories are believed to compete for control of behavior. More specifically, it's been observed that animals undergoing fear learning and subsequent fear extinction develop two separate memories for the acquisition phase and the extinction phase, such that the extinction does not simply 'erase' the previously acquired memory. Many models of learning require the addition of a separate context or state to be added during the extinction phase and are typically modeled by assuming the existence of a new state at the time of extinction. The Niv research group, Gershman et al. 2017, have shown that the use of a latent cause model applied to this behavior can elegantly predict the formation of latent states based on a Bayesian approach, and that these latent states can facilitate the persistence of the acquisition and extinction memory independently. The authors of this manuscript leverage this approach to test whether deficits in production of the internal states, or the inference and learning of those states, may be disrupted in knock-in mice that show both a build-up of amyloid-beta plaques and a deterioration in memory as the mice age.

      Strengths:

      I think the authors' proposal to leverage the latent cause model and test whether it can lead to improved assessments in an animal model of AD is a promising approach for bridging the gap between clinical and basic research. The authors use a promising mouse model and apply this to a paradigm in which the behavior and neurobiology are relatively well understood - an ideal situation for assessing how a disease state may impact both the neurobiology and behavior. The latent cause model has the potential to better connect observed behavior to underlying causes and may pave a road for improved mapping of changes in behavior to neurobiological mechanisms in diseases such as AD.

      The authors also compare the latent cause model to the Rescorla-Wagner model and a latent state model allowing for better assessment of the latent cause model as a strong model for assessing reinstatement.

      Weaknesses:

      I have several substantial concerns which I've detailed below. These include important details on how the behavior was analyzed, how the model was used to assess the behavior, and the interpretations that have been made based on the model.

      (1) There is substantial data to suggest that during fear learning in mice separate memories develop for the acquisition and extinction phases, with the acquisition memory becoming more strongly retrieved during spontaneous recovery and reinstatement. The Gershman paper, cited by the authors, shows how the latent causal model can predict this shift in latent causes by allowing for the priors to decay over time, thereby increasing the posterior of the acquisition memory at the time of spontaneous recovery. In this manuscript, the authors suggest a similar mechanism of action for reinstatement, yet the model does not appear to return to the acquisition memory after reinstatement, at least based on the simulation and examples shown in figures 1 and 3. More specifically, in figure 1, the authors indicate that the posterior probability of the latent cause, z<sub>A</sub> (the putative acquisition memory), increases, partially leading to reinstatement. This does not appear to be the case as test 3 (day 36) appears to have similar posterior probabilities for z<sub>A</sub> as well as similar weights for the CS as compared to the last days of extinction. Rather, the model appears to mainly modify the weights in the most recent latent cause, z<sub>B</sub> - the putative 'extinction state', during reinstatement. The authors suggest that previous experimental data have indicated that spontaneous recovery or reinstatement effects are due to an interaction of the acquisition and extinction memory. These studies have shown that conditioned responding at a later time point after extinction is likely due to a balance between the acquisition memory and the extinction memory, and that this balance can shift towards the acquisition memory naturally during spontaneous recovery, or through artificial activation of the acquisition memory or inhibition of the extinction memory (see Lacagnina et al. for example).
Here the authors show that the same latent cause learned during extinction, z<sub>B</sub>, appears to dominate during the learning phase of reinstatement, with rapid learning to the context - the weight for the context goes up substantially on day 35 - in z<sub>B</sub>. This latent cause, z<sub>B</sub>, dominates at the reinstatement test, and due to the increased associative strength between the context and shock, there is a strong CR. For the simulation shown in figure 1, it's not clear why a latent cause model is necessary for this behavior. This leads to the next point.

      We would like to first clarify that our behavioral paradigm did not last for 36 days, as the reviewer's comment suggests. Our reinstatement paradigm contained 7 phases and 36 trials in total: acquisition (3 trials), test 1 (1 trial), extinction 1 (19 trials), extinction 2 (10 trials), test 2 (1 trial), unsignaled shock (1 trial), and test 3 (1 trial). The day is labeled under each phase in Figure 2A.
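
      The trial numbering used in the replies that follow (test 2 = trial 34, unsignaled shock = trial 35, test 3 = trial 36) can be checked against the phase lengths listed above. The following is a minimal sketch for illustration only; the names `PHASES` and `trial_ranges` are hypothetical and do not come from the manuscript.

```python
# Trials per phase, in the order listed in the response.
PHASES = [
    ("acquisition", 3),
    ("test 1", 1),
    ("extinction 1", 19),
    ("extinction 2", 10),
    ("test 2", 1),
    ("unsignaled shock", 1),
    ("test 3", 1),
]


def trial_ranges(phases):
    """Return {phase: (first_trial, last_trial)} with 1-based trial numbers."""
    ranges = {}
    start = 1
    for name, n_trials in phases:
        ranges[name] = (start, start + n_trials - 1)
        start += n_trials
    return ranges


RANGES = trial_ranges(PHASES)
TOTAL_TRIALS = sum(n for _, n in PHASES)

for name, (first, last) in RANGES.items():
    print(f"{name}: trials {first}-{last}")
```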

      We explained how the latent cause model accounts for reinstatement in the first round of the review. Briefly, both the acquisition and extinction latent causes contribute to the reinstatement (test 3). The former retains the acquisition fear memory, and the latter has the updated w<sub>context</sub> from the unsignaled shock. Although the reviewer is correct that z<sub>B</sub> in Figure 1D contributes substantially during the reinstatement, we would like to argue that the elevated CR from test 2 (trial 34) to test 3 (trial 36) is the result of the interaction between z<sub>A</sub> and z<sub>B</sub>.

      We provided Author response image 1 using the same data in Figure 1D and 1E to further clarify this point. The posterior probability of z<sub>A</sub> increased after an unsignaled shock (trial 35), which may be attributed to the return of acquisition fear memory. The posterior probability of z<sub>A</sub> then decreased again after test 3 (trial 36) because there was no shock in this trial. Along with the weight change, the expected shock changed substantially in these three trials, resulting in reinstatement. Note that the mapping of expected shock to CR in the latent cause model is controlled by the parameters θ and λ. Once the expected shock exceeds the threshold θ, the CR will increase rapidly if λ is smaller.
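
      The role of θ and λ can be illustrated with a short sketch. It assumes that the mapping is a cumulative normal in the expected shock, centered at θ with scale λ, which is one common reading of the probit mapping mentioned by the reviewer; the exact form used in the manuscript is not restated here, and the function names and example values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditioned_response(expected_shock, theta, lam):
    """Assumed mapping: CR = Phi((V - theta) / lam).

    theta acts as a threshold on the expected shock V,
    lam sets how sharply the CR rises around that threshold.
    """
    return normal_cdf((expected_shock - theta) / lam)


def cr_curve(theta, lam, values=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """CR for a grid of expected-shock values (illustrative grid)."""
    return [conditioned_response(v, theta, lam) for v in values]


# Same threshold, two slopes: a smaller lam gives a steeper transition.
steep = cr_curve(theta=0.2, lam=0.02)
shallow = cr_curve(theta=0.2, lam=0.2)

for v, s, sh in zip((0.0, 0.1, 0.2, 0.3, 0.4, 0.5), steep, shallow):
    print(f"V={v:.1f}  CR(lam=0.02)={s:.3f}  CR(lam=0.2)={sh:.3f}")
```

      Under this assumed form, the CR equals 0.5 when the expected shock equals θ, and a smaller λ moves the CR from near 0 to near 1 over a narrower range of expected shock.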

      Lastly, accepting the idea that separate memories are responsible for acquisition and extinction in the memory modification paradigm, the latent cause model (LCM) is a rational candidate for modeling this idea. Please see the following reply on why a simple model like the Rescorla-Wagner (RW) model is not sufficient to fully explain the behaviors observed in this study.

      Author response image 1.

      The summed posterior probability (A), the summed associative weight of the CS (B), and the summed associative weight of the context (C) for the acquisition and extinction latent causes in Figure 1D and 1E.

      (2) The authors compared the latent cause model to the Rescorla-Wagner model. This is very commendable, particularly since the latent cause model builds upon the RW model, so it can serve as an ideal test for whether a more simplified model can adequately predict the behavior. The authors show that the RW model cannot successfully predict the increased CR during reinstatement (Appendix figure 1). Yet there are some issues with the way the authors have implemented this comparison:

      (2A) The RW model is a simplified version of the latent cause model and so should be treated as a nested model when testing, or at a minimum, the number of parameters should be taken into account when comparing the models using a method such as the Bayesian Information Criterion, BIC.

      We acknowledge that the number of parameters was not taken into consideration when we compared the models. We thank the reviewer for the suggestion to use the Bayesian Information Criterion (BIC). However, we did not use BIC in this study for the following reasons. We wanted a model that can explain fear conditioning, extinction, and reinstatement, so our first priority was to fit the test phases. Models that simulate CRs well in non-test phases can yield lower BIC values even if they fail to capture reinstatement. When we calculated the BIC using the half-normal distribution (μ = 0, σ = 0.3) as the likelihood for the prediction error in each trial, the BIC of the 12-month-old control was -37.21 for the RW model (Appendix 1–figure 1C) and -11.60 for the LCM (Figure 3C). Based on this result, the RW model would be preferred: the LCM was penalized for its larger number of parameters, even though it fit better in trial 36. Because this did not align with our purpose of modeling reinstatement, we chose to rely on practical criteria to determine whether an estimated parameter set was accepted (see Materials and Methods). The number of accepted samples can thus roughly be seen as the model's ability to explain the data in this study. These exclusion criteria then created imbalances in accepted samples across models (Appendix 1–figure 2). In the RW model, only one or two samples met the criteria, preventing meaningful statistical comparisons of BIC within each group. Overall, although we agree that BIC is a reasonable metric for model comparison, we do not think it aligns with our purpose in this study.
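
      For readers who want to see how such a BIC could be computed, the following is a minimal sketch under stated assumptions: the per-trial likelihood is a half-normal density (μ = 0, σ = 0.3) evaluated at the absolute prediction error, and BIC = k ln(n) - 2 ln(L). The error values and parameter counts below are hypothetical placeholders, not the data or the model sizes from the manuscript.

```python
import math


def half_normal_logpdf(x, sigma):
    """Log density of a half-normal distribution (mu = 0) at x >= 0."""
    if x < 0:
        raise ValueError("half-normal density is defined for x >= 0")
    return (0.5 * math.log(2.0 / math.pi)
            - math.log(sigma)
            - x * x / (2.0 * sigma * sigma))


def bic(abs_errors, n_params, sigma=0.3):
    """BIC = k * ln(n) - 2 * ln(L), with one likelihood term per trial."""
    log_lik = sum(half_normal_logpdf(e, sigma) for e in abs_errors)
    n = len(abs_errors)
    return n_params * math.log(n) - 2.0 * log_lik


# Hypothetical absolute prediction errors for 36 trials (placeholders only).
errors = [0.05 * (i % 5) for i in range(36)]

# Hypothetical parameter counts for a small and a large model.
bic_small = bic(errors, n_params=3)
bic_large = bic(errors, n_params=10)
print(bic_small, bic_large)
```

      Because the half-normal density exceeds 1 near zero when σ = 0.3, the log-likelihood can be positive and the resulting BIC negative, as in the values quoted above. With identical errors, the larger model is penalized by (k₂ - k₁) ln(n), which shows how a model can lose on BIC despite fitting a single critical trial better.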

      (2B) The RW model provides the associative strength between stimuli and does not necessarily require a linear relationship between V and the CR. This is the case in the original RW model as well as in the LCM. To allow for better comparison between the models, the authors should be modeling the CR in the same manner (using the same probit function) in both models. In fact, there are many instances in which a sigmoid has been applied to RW associative strengths to predict CRs. I would recommend modeling CRs in the RW as if there is just one latent cause. Or perhaps run the analysis for the LCM with just one latent cause - this would effectively reduce the LCM to RW and keep any other assumptions identical across the models.

      Regarding the suggestion to run the analysis using the LCM with one latent cause, we agree that this method is almost identical to the RW model, which is also mentioned in the original paper (Gershman et al., 2017). Importantly, it would also eliminate the RW model’s advantage of assigning distinct learning rates to different stimuli, highlighted in the next comment (2C).

      We thank the reviewer for suggesting that we apply the same transformation from associative strength (V) to CR as in the LCM. We examined this possibility by heuristically selecting parameter values to test how such a transformation would influence the RW model (Author response image 2A). Specifically, we set α<sub>CS</sub> = 0.5, α<sub>context</sub> = 1, β = 1, and introduced the additional parameters θ and λ, as in the LCM. This parameter set was chosen heuristically to address the reviewer’s concern about a higher learning rate for the context. The dark blue line is the plain associative strength. The remaining lines are CR curves under different combinations of θ and λ.

      Consistent with the reviewer’s comment, under certain parameter settings (θ = 0.01, λ = 0.01), the extended RW model can reproduce higher CRs at test 3, thereby approximating the discrimination index observed in the 12-month-old control group. However, this modification changes the characteristics of CRs in other phases relative to the plain RW model. In the acquisition phase, the CRs rise more sharply. In the extinction phase, the CRs remain high when θ is small. Although changing λ can modulate the steepness, the CR curve is flat on the second day of the extinction phase, which does not reproduce the pattern in the observed data (Figure 2B). These trade-offs suggest that the RW model with the sigmoid transformation does not improve fit quality and, in fact, sacrifices features that were well captured by the simpler RW simulations (Appendix 1–figure 1A to 1D). To further evaluate this extended RW model (RW*), we applied the same parameter estimation method used in the LCM for individual data (see Materials and Methods). For each animal, α<sub>CS</sub>, α<sub>context</sub>, β, θ, and λ were estimated with their lower and upper bounds set as previously described (see Appendix 1, Materials and Methods). The results showed that the number of accepted samples slightly increased compared to the RW model without the sigmoidal transformation of CR (RW* vs. RW in Author response image 2B, 2C). However, this improvement did not surpass the LCM (RW* vs. LCM in Author response image 2B, Author response image 1C). Overall, these results suggest that, even when the same method is used to map the expected shock to CR, the RW model does not outperform the LCM. In practice, further extensions, such as adding novel terms, might improve the fit. We would like to note that any such extension should be carefully validated as a reasonable and necessary component of an internal model, which is beyond the scope of this study. We hope this addresses the reviewer's concerns about the implementation of the RW model.
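      As an illustration of the extended RW simulation described above, the following is a minimal sketch rather than the code used in the study. It assumes the standard RW update Δw<sub>i</sub> = α<sub>i</sub>β(US − V)x<sub>i</sub> and a probit mapping CR = Φ((V − θ)/λ); the exact form of the mapping, the function names, and any trial schedule passed in are assumptions. The stimulus inputs (CS = 1, context = 0.2) and the default parameter values follow the text.

```python
import math


def probit_cr(v, theta, lam):
    """Map associative strength v to a CR through a Gaussian CDF (assumed form)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return 0.5 * (1.0 + math.erf((v - theta) / (lam * math.sqrt(2.0))))


def simulate_rw(trials, alpha_cs=0.5, alpha_ctx=1.0, beta=1.0, theta=0.01, lam=0.01):
    """Simulate a two-stimulus Rescorla-Wagner model with a probit CR mapping.

    trials: list of (cs_present, us) pairs, e.g. (True, 1.0) for a CS-shock
    pairing, (True, 0.0) for CS alone, (False, 1.0) for an unsignaled shock.
    Returns a list of (V, CR) tuples, computed before each trial's update.
    """
    w_cs, w_ctx = 0.0, 0.0
    x_ctx = 0.2  # context input, always present
    out = []
    for cs_present, us in trials:
        x_cs = 1.0 if cs_present else 0.0
        v = w_cs * x_cs + w_ctx * x_ctx
        out.append((v, probit_cr(v, theta, lam)))
        delta = us - v  # prediction error shared by both stimuli
        w_cs += alpha_cs * beta * delta * x_cs
        w_ctx += alpha_ctx * beta * delta * x_ctx
    return out
```

      Because the model carries a single set of weights, extinction trials overwrite what was learned in acquisition, so any increase in CR at the final test has to come from the context weight updated by the unsignaled shock.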

      Author response image 2.

      Simulation (A) and parameter estimation (B and C) in the extended Rescorla-Wagner model.

      (2C) In the paper, the model fits for the alphas in the RW model are the same across the groups. Were the alphas for the two models kept as free variables? This is an important question as it gets back to the first point raised. Because the modeling of the reinstatement behavior with the LCM appears to be mainly driven by latent cause z<sub>B</sub>, the extinction memory, it may be possible to replicate the pattern of results without requiring a latent cause model. For example, the 12-month-old App NL-G-F mice behavior may have a deficit in learning about the context. Within the RW model, if the alpha for context is set to zero for those mice, but kept higher for the other groups, say alpha_context = 0.8, the authors could potentially observe the same pattern of discrimination indices in figure 2G and 2H at test. Because the authors don't explicitly state which parameters might be driving the change in the DI, the authors should show in some way that their results cannot simply be due to poor contextual learning in the 12 month old App NL-G-F mice, as this can presumably be predicted by the RW model. The authors' model fits using RW don't show this, but this is because they don't consider this possibility that the alpha for context might be disrupted in the 12-month-old App NL-G-F mice. Of course, using the RW model with these alphas won't lead to as nice of fits of the behavior across acquisition, extinction, and reinstatement as the authors' LCM, the number of parameters are substantially reduced in the RW model. Yet the important pattern of the DI would be replicated with the RW model (if I'm not mistaken), which is the important test for assessment of reinstatement.

      We would like to clarify that we estimated three parameters in the RW model for each individual: α<sub>CS</sub>, α<sub>context</sub>, and β. Even so, many samples did not satisfy our criteria (Appendix 1–figure 2). Please refer to “Evaluation of model fit” in Appendix 1 and the legend of Appendix 1–figure 1A to 1D, where we report the estimated parameter values.

      We do not agree that disabling contextual learning by setting α<sub>context</sub> to 0 in the RW model can explain the CR curve of 12-month-old AD mice well. Specifically, the RW model cannot capture the between-day extinction dynamics (i.e., the increase in CR at the beginning of day 2 extinction) or the higher CR at test 3 relative to test 2 (i.e., a DI between test 3 and test 2 greater than 0.5). In addition, because the context input (= 0.2) was lower than the CS input (= 1), and there is only a single unsignaled shock trial, even setting α<sub>context</sub> = 1 results in only a limited increase in CR (Appendix 1–figure 1A to 1D; see also Author response image 2). Thus, the RW model cannot replicate the reinstatement effect or the critical pattern of the discrimination index, even under conditions of stronger contextual learning.

      (3) As stated by the authors in the introduction, the advantage of the fear learning approach is that the memory is modified across the acquisition-extinction-reinstatement phases. Although perhaps not explicitly stated by the authors, the post-reinstatement test (test 3) is the crucial test for whether there is reactivation of a previously stored memory, with the general argument being that the reinvigorated response to the CS can't simply be explained by relearning the CS-US pairing, because re-exposure to the US alone leads to an increased response to the CS at test. Of course there are several explanations for why this may occur, particularly when also considering the context as a stimulus. This is what I understood to be the justification for the use of a model, such as the latent cause model, that may better capture and compare these possibilities within a single framework. As such, it is critical to look at the level of responding to both the context alone and to the CS. It appears that the authors only look at the percent freezing during the CS, and it is not clear whether this is due to the contextual-US learning during the US re-exposure or to increased responding to the CS - presumably caused by reactivation of the acquisition memory. The authors do perform a comparison between the preCS and CS period, but it is not clear whether this is taken into account in the LCM. For example, the instance of the model shown in figure 1 indicates that the 'extinction cause', or cause z6, develops a strong weight for the context during the reinstatement phase of presenting the shock alone. This state then leads to increased freezing during the final CS probe test as shown in the figure. If they haven't already, I think the authors must somehow incorporate these different phases (CS vs ITI) into their model, particularly since this type of memory retrieval that depends on assessing latent states is specifically why the authors justified using the latent cause model.
In more precise terms, it's not clear whether the authors incorporate a preCS/ITI period each day the cue is presented as a vector of just the context in addition to the CS period in which the vector contains both the context and the CS. Based on the description, it seemed to me that they only model the CRs during the CS period on days when the CS is presented, and thereby the context is only ever modeled on its own (as just the context by itself in the vector) on extinction days when the CS is not presented. If they are modeling both timepoints each day that the CS is presented, then I would recommend explicitly stating this in the methods section.

      In this study, we did not model the preCS freezing rate, and we thank the reviewer for the suggestion to model preCS periods as separate context-only trials. In our view, however, this approach is not consistent with the assumptions of the LCM. Our rationale is that the available periods of context and the CS are different. We assume that observation of the context lasts from preCS to CS. If we simulate both preCS (context) and CS (context and tone), the weight of context would be updated twice. Instead, we follow the same method as described in the original code from Gershman et al. (2017) to consider the context effect. We agree that explicitly modeling preCS could provide additional insights, but we believe it would require modifying or extending the LCM. We consider this an important direction for future research, but it is outside the scope of this study.

      (4) The authors fit the model using all data points across acquisition and learning. As one of the other reviewers has highlighted, it appears that there is a high chance for overfitting the data with the LCM. Of course, this would result in much better fits than models with substantially fewer free parameters, such as the RW model. As mentioned above, the authors should use a method that takes into account the number of parameters, such as the BIC.

      Please refer to the reply to public review (2A) for the reason we did not take the suggestion to use BIC. In addition, we feel that we have adequately addressed the concern of overfitting in the first round of the review. 

      (5) The authors have stated that they do not think the Barnes maze task can be modeled with the LCM. Whether or not this is the case, if the authors do not model this data with the LCM, the Barnes maze data doesn't appear valuable to the main hypothesis. The authors suggest that more sophisticated models such as the LCM may be beneficial for early detection of diseases such as Alzheimer's, so the Barnes maze data is not valuable for providing evidence of this hypothesis. Rather, the authors make an argument that the memory deficits in the Barnes maze mimic the reinstatement effects providing support that memory is disrupted similarly in these mice. Although, the authors state that the deficits in memory retrieval are similar across the two tasks, the authors are not explicit as to the precise deficits in memory retrieval in the reinstatement task - it's a combination of overgeneralizing latent causes during acquisition, poor learning rate, over differentiation of the stimuli.

      We would like to clarify that we value the latent cause model not solely because it is more sophisticated and fits more data points, but because it is an internal model that speaks to the underlying cognitive process. Please also see the reply to the recommendations to authors (3) for the reason why we did not take the suggestion to remove these data.

      Reviewer #3 (Public review):

      Summary:

      This paper seeks to identify underlying mechanisms contributing to memory deficits observed in Alzheimer's disease (AD) mouse models. By understanding these mechanisms, they hope to uncover insights into subtle cognitive changes early in AD to inform interventions for early-stage decline.

      Strengths:

      The paper provides a comprehensive exploration of memory deficits in an AD mouse model, covering early and late stages of the disease. The experimental design was robust, confirming age-dependent increases in Aβ plaque accumulation in the AD model mice and using multiple behavior tasks that collectively highlighted difficulties in maintaining multiple competing memory cues, with deficits most pronounced in older mice.

      In the fear acquisition, extinction, and reinstatement task, AD model mice exhibited a significantly higher fear response after acquisition compared to controls, as well as a greater drop in fear response during reinstatement. These findings suggest that AD mice struggle to retain the fear memory associated with the conditioned stimulus, with the group differences being more pronounced in the older mice.

      In the reversal Barnes maze task, the AD model mice displayed a tendency to explore the maze perimeter rather than the two potential target holes, indicating a failure to integrate multiple memory cues into their strategy. This contrasted with the control mice, which used the more confirmatory strategy of focusing on the two target holes. Despite this, the AD mice were quicker to reach the target hole, suggesting that their impairments were specific to memory retrieval rather than basic task performance.

      The authors strengthened their findings by analyzing their data with a leading computational model, which describes how animals balance competing memories. They found that AD mice showed somewhat of a contradiction: a tendency to both treat trials as more alike than they are (lower α) and similar stimuli as more distinct than they are (lower σx) compared to controls.

      Weaknesses:

      While conceptually solid, the model struggles to fit the data and to support the key hypothesis about AD mice's inability to retain competing memories. These issues are evident in Figure 3:

      (1) The model misses trends in the data, including the gradual learning of fear in all groups during acquisition, the absence of a fear response at the start of the experiment, and the faster return of fear during reinstatement compared to the gradual learning of fear during acquisition. It also underestimates the increase in fear at the start of day 2 of extinction, particularly in controls.

      (2) The model explains the higher fear response in controls during reinstatement largely through a stronger association to the context formed during the unsignaled shock phase, rather than to any memory of the conditioned stimulus from acquisition (as seen in Figure 3C). In the experiment, however, this memory does seem to be important for explaining the higher fear response in controls during reinstatement (as seen in Author Response Figure 3). The model does show a necessary condition for memory retrieval, which is that controls rely more on the latent causes from acquisition. But this alone is not sufficient, since the associations within that cause may have been overwritten during extinction. The Rescorla-Wagner model illustrates this point: it too uses the latent cause from acquisition (as it only ever uses a single cause across phases) but does not retain the original stimulus-shock memory, updating and overwriting it continuously. Similarly, the latent cause model may reuse a cause from acquisition without preserving its original stimulus-shock association.

      These issues lead to potential overinterpretation of the model parameters. The differences in α and σx are being used to make claims about cognitive processes (e.g., overgeneralization vs. over differentiation), but the model itself does not appear to capture these processes accurately.

      The authors could benefit from a model that better matches the data and captures the retention and retrieval of fear memories across phases. While they explored alternatives, including the Rescorla-Wagner model and a latent state model, these showed no meaningful improvement in fit. This highlights a broader issue: these models are well-motivated but may not fully capture observed behavior.

      Conclusion:

      Overall, the data support the authors' hypothesis that AD model mice struggle to retain competing memories, with the effect becoming more pronounced with age. While I believe the right computational model could highlight these differences, the current models fall short in doing so.

      We thank the reviewer for the insightful comments. For the comments (1) and (2), please refer to our previous author response to comments #26 and #27. We recognize that the models tested in this study have limitations and, as noted, do not fully capture all aspects of the observed behavioral data. We see this as an important direction for future research and value the reviewer’s suggestions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I have maintained some of the main concerns included in the first round of reviews as I think they remain concerns with the new draft, even though the authors have included substantially more analysis of their data, which is appreciated. I particularly found the inclusion of the comparative modeling valuable, although I think the analysis comparing the models should be improved.

      (1) This relates to point 1 in the public assessment or #16 in the response to reviewers from the authors. The authors raise the point that even a low posterior can drive behavioral expression (lines 361-365 in the response to authors), and so the acquisition latent cause may partially drive reinstatement. Yet in the simulation shown in figure 1D, this does not seem to be the case. As I mentioned in the public response, in figure 1, the posteriors for z<sub>A</sub> are similar on day 34 and day 36, yet only on day 36 is there a strong CR. At least in this example, it does not appear that z<sub>A</sub> contributes to the increased responding from day 34 (test 2) to day 36 (test 3). There may be a slight increase in z1 in figure 3C, but the dominant change from day 34 to day 36 appears to be the increase in the posterior of z3 and the substantial increase in w3. The authors then cite several papers which have shown the shift in balance between what is the putative acquisition memory and extinction memory (i.e. Lacagnina et al.). Yet I do not see how this modeling fits with most of the previous findings. For example, in the Lacagnina et al. paper, activation of the acquisition ensemble or inhibition of the extinction ensemble drives freezing, whereas the opposite pattern reduces freezing. What appears to be the pattern in the modeling in this paper is primarily learning of context in the extinction latent cause to predict the shock. As I mention in point 2C of the public review, it's not clear why this pattern of results would require a latent cause model. Would a high alpha for context and not the CS not give a similar pattern of results in the RW model? At least for giving similar results of the DIs in figure 2?

      First, we would like to clarify that the x-axis in Figure 1D is labeled “Trial,” not “Day.” Please refer to the reply to public review (1), where we clarified the posterior probability of the latent cause from trials 34 to 36. Second, although we did not have direct neural circuit evidence in this study, we discussed the similarities between previous findings and the modeling in the first review. Briefly, our main point focuses on the interaction between acquisition and extinction memory. In other words, responses at different times arise from distinct internal states made up of competing memories. We assume that the reviewer expects a modeling result showing nearly full recovery of acquisition memory, which aligns with previous findings where optogenetic activation of the acquisition engram can partially mimic reinstatement (Zaki et al., 2022; see also the response to comment #12 in the first round of review). We acknowledge that such a modeling result cannot be achieved with the latent cause model and see it as a potential future direction for model improvement.

      Please also refer to the reply to public review (2) about how a high alpha for context in the RW model cannot explain the pattern we observed in the reinstatement paradigm.

      (2) This is related to point 3 in the public comments and #13 in the response to reviewers. I raised the question of comparing the preCS/ITI period with the CS period, but my main point was why not include these periods in the LCM itself as mentioned in more detail in point 3 in the current public review. The inclusion of the comparisons the authors performed helped, but my main point was that the authors could have a better measure of wcontext if they included the preCS period as a stimulus each day (when only the context is included in the stimulus). This would provide better estimates of wcontext. As stated in the public review, perhaps the authors did this, but from my understanding of the methods this was not the case; rather, it seems the authors only included the CS period for CRs within the model (at least on days when the CS was present).

      Please refer to the reply to public review (3) about the reason why we did not model the preCS freezing rate.

      (3) This relates to point 4 in the public review and #15 and #24 in the response to authors. The authors have several points for why the two experiments are similar and how results may be extrapolated - lines 725-733. The first point is that associative learning is fundamental in spatial learning. I'm not sure that this broad connection between the two studies is particularly insightful for why one supports the other as associative learning is putatively involved in most behavioral tasks. In the second point about reversals, why not then use a reversal paradigm that would be easier to model with LCM? This data is certainly valuable and interesting, yet I don't think it's helpful for this paper to state qualitatively the similarities in the potential ways a latent cause framework might predict behavior on the Barnes maze. I would recommend that the authors either model the behavior with LCM, remove the experiment from the paper, or change the framing of the paper that LCM might be an ideal approach for early detection of dementia or Alzheimer's disease.

      We would like to clarify that our aim was not to present the LCM as an ideal tool for early detection of AD symptoms. Rather, our focus is on the broader idea of utilizing internal models and estimating individual internal states in early-stage AD. Regarding the use of a reversal paradigm that would be easier to model with the LCM, the most straightforward approach would be to use another type of fear conditioning paradigm and then examine the extent to which similar behavioral characteristics are observed between paradigms within subjects. However, re-exposing the same mice to such paradigms is constrained by strong carry-over effects, limiting the feasibility of this experiment. Other behavioral tasks relevant to AD that avoid shock generally involve action selection for subsequent observation (Webster et al., 2014), which falls outside the structure of the LCM. Our rationale for including the Barnes maze task is that spatial memory deficit is implicated in the early stage of AD, making it relevant for translational research. While we acknowledge that exact modeling of Barnes maze behavior would require a more sophisticated model (as discussed in the first round of review), our intention in using the reversal Barnes maze paradigm was to probe presumed memory-modification learning in a non-fear-conditioning paradigm. We also discussed whether similar deficits in memory modification could be observed across the two behavioral tasks.

      (4) Reviewer # mentioned that the change in pattern of behavior only shows up in the older mice questioning the clinical relevance of early detection. I do think this is a valid point and maybe should be addressed. There does seem to be a bit of a bump in the controls on day 23 that doesn't appear in the 6-month group. Perhaps this was initially a spontaneous recovery test indicated by the dotted vertical line? This vertical line does not appear to be defined in the figure 1 legend, nor in figures 2 and 3.

      We would like to emphasize that the App<sup>NL-G-F</sup> knock-in mouse is widely considered a model of early-stage AD, characterized by Aβ accumulation with little to no neurofibrillary tangle pathology or neuronal loss (see Introduction). By examining different ages, we can assess the contribution of both the amount and duration of Aβ accumulation as well as age-related factors. By modeling the deficit in the memory modification process in the older App<sup>NL-G-F</sup> knock-in mice, we suggest a divergent internal state in early-stage AD at older age; this does not diminish the relevance of the model for studying early cognitive changes in AD.

      We would also like to clarify again that the x-axis in the figure is “Trial,” not “Day.” The vertical dashed lines in these figures indicate phase boundaries, and they were defined in the figure legend: in Figure 1C, “The vertical dashed lines separate the phases.”; in Figure 2B, “The dashed vertical line separates the extinction 1 and extinction 2 phases.”; in Figure 3, “The vertical dashed lines indicate the boundaries of phases.”

      (5) Are the examples in figure 3 good examples? The example for the 12-month-old control shows a substantial increase in weights for the context during test 3, but not for the CS. Yet in the bar plots in Figure 4 G and H, this pattern seems to be different. The weights for the context appear to substantially drop in the "after extinction" period as compared to the "extinction" period. It's hard to tell the change from "extinction" to "after extinction" for the CS weights (the authors change the y-axis for the CS weights but not for the context weights from panels G to H).

      We would like to clarify that in Figure 3C, the increase in weights for the context does not occur during test 3 (trial 36), as noted by the reviewer; rather, it occurs in the unsignaled shock phase (trial 35).

      We suspect that the reviewer may have read the labels on the left in Figure 4 (“Acquisition”, “Extinction”, and “After extinction”) as indicating time points. However, the data shown in Figure 4C to 4H are all from the same time point: test 3 (trial 36). The grouping reflects the classification of latent causes based on the trial in which they were inferred. In addition, for Figures 4G and 4H, the y‐axis limits were not set identically because the data range for “Sum of w<sub>CS</sub>” varied. This was done to ensure the visibility of all data points. In Figure 4, each dot represents one animal. Take Figure 3D as an example: the point in Figure 4G is the sum of w3 and w4 in trial 36, and the point in Figure 4H is w5 in trial 36, where the subscript numerals indicate the latent cause index. We hope this addresses the reviewer’s question about the difference between the two figures.


      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors show certain memory deficits in a mouse knock-in model of Alzheimer's Disease (AD). They show that the observed memory deficits can be explained by a computational model, the latent cause model of associative memory. The memory tasks used include the fear memory task (CFC) and the 'reverse' Barnes maze. Research on AD is important given its known huge societal burden. Likewise, better characterization of the behavioral phenotypes of genetic mouse models of AD is also imperative to advance our understanding of the disease using these models. In this light, I applaud the authors' efforts.

      Strengths:

      (1) Combining computational modelling with animal behavior in genetic knock-in mouse lines is a promising approach, which will be beneficial to the field and potentially explain any discrepancies in results across studies as well as provide new predictions for future work.

      (2) The authors' usage of multiple tasks and multiple ages is also important to ensure generalization across memory tasks and 'modelling' of the progression of the disease.

      Weaknesses:

      [#1] (1) I have some concerns regarding the interpretation of the behavioral results. Since the computational model then rests on the authors' interpretation of the behavioral results, it, in turn, makes judging the model's explanatory power difficult as well. For the CFC data, why do knock-in mice have stronger memory in test 1 (Figure 2C)? Does this mean the knock-in mice have better memory at this time point? Is this explained by the latent cause model? Are there some compensatory changes in these mice leading to better memory? The authors use a discrimination index across tests to infer a deficit in re-instatement, but this indicates a relative deficit in re-instatement from memory strength in test 1. The interpretation of these differential DIs is not straightforward. This is evident when test 1 is compared with test 2, i.e., the time point after extinction, which also shows a significant difference across groups, Figure 2F, in the same direction as the re-instatement. A clarification of all these points will help strengthen the authors' case.

      We thank the reviewer for the critical comments. According to the latent cause framework, the strength of the memory is influenced by at least two quantities: the associative weight between CS and US given a latent cause, and the posterior probability of that latent cause. The modeling results showed that a higher posterior probability of the acquisition latent cause, but not a higher associative weight, drove the higher test 1 CR in App<sup>NL-G-F</sup> mice (Results and Discussion; Figure 4 – figure supplement 3B, 3C). In terms of the posterior, we agree that App<sup>NL-G-F</sup> mice have a strong fear memory. On the other hand, this suggests that App<sup>NL-G-F</sup> mice exhibited a tendency toward overgeneralization, favoring modification of old memories, which adversely affected the ability to retain competing memories. The strong memory in test 1 would be a compensatory effect of overgeneralization.
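      To make the distinction between the two quantities concrete, the following is a minimal sketch. It assumes the commonly used form in which the expected US is the posterior-weighted sum of each latent cause's prediction, V = Σ<sub>k</sub> P(z<sub>k</sub>)(w<sub>k</sub> · x). The function name and the example values are hypothetical and are not estimates from our data.

```python
def expected_us(posteriors, weights, stimulus):
    """Posterior-weighted prediction: V = sum_k P(z_k) * (w_k . x).

    posteriors: probability of each latent cause on the current trial.
    weights:    one weight vector per latent cause, aligned with stimulus.
    stimulus:   stimulus intensities, e.g. [CS, context].
    """
    if len(posteriors) != len(weights):
        raise ValueError("need one weight vector per latent cause")
    return sum(
        p * sum(w_i * x_i for w_i, x_i in zip(w, stimulus))
        for p, w in zip(posteriors, weights)
    )


# Hypothetical example: identical weights, different posteriors.
stimulus = [1.0, 0.2]                 # CS and context inputs
weights = [[0.8, 0.1], [0.0, 0.0]]    # acquisition cause, extinction cause
v_high_posterior = expected_us([0.7, 0.3], weights, stimulus)
v_low_posterior = expected_us([0.3, 0.7], weights, stimulus)
```

      In this sketch the weights are held fixed, so the larger prediction in the first case comes only from the higher posterior of the acquisition cause, which is the pattern described above for test 1.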

      To estimate the magnitude of reinstatement, one would have to compare, at a minimum, CRs between test 2 (extinction) and test 3 (reinstatement), as well as those between test 1 (acquisition) and test 3. These comparisons represent the extent to which the memory at reinstatement is far from that at extinction and close to that at acquisition. Since the discrimination index (DI) has been widely used as a normalized measure of the extent to which a system can distinguish between two conditions, we applied DI consistently to the behavioral and simulated data in the reinstatement experiment and to the behavioral data in the reversal Barnes maze experiment, allowing us to evaluate the discriminability of an agent in these experiments. In addition, we used DI to examine its correlation with the estimated parameters, enabling us to explore how individual discriminability may relate to the internal state. We have already discussed the differences in DI between test 3 and test 1, as well as the CR in test 1 between control and App<sup>NL-G-F</sup> mice, in the manuscript, and we have further elaborated on this point in Lines 232 and 745-748.
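      As a brief illustration, a DI of this kind can be sketched as a ratio in which 0.5 means no discrimination. The form DI = a / (a + b) is an assumption here, chosen because it is consistent with the statement that a DI greater than 0.5 between test 3 and test 2 indicates a higher CR at test 3; the exact definition is given in Materials and Methods. The function name and the handling of a zero denominator are illustrative.

```python
def discrimination_index(cr_a, cr_b):
    """Return cr_a / (cr_a + cr_b); 0.5 means the two CRs are equal.

    Assumed form of the DI. Returns None when both CRs are zero, since the
    ratio is undefined in that case.
    """
    total = cr_a + cr_b
    if total == 0:
        return None
    return cr_a / total
```

      For example, `discrimination_index(cr_test3, cr_test2)` exceeds 0.5 whenever freezing at test 3 is higher than at test 2, regardless of the overall freezing level of the animal.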

      [#2] (2) I have some concerns regarding the interpretation of the Barnes maze data as well, where there already seems to be a deficit in the memory at probe test 1 (Figure 6C). Given that there is already a deficit in memory, would not a more parsimonious explanation of the data be that general memory function in this task is impacted in these mice, rather than the authors' preferred interpretation? How does this memory weakening fit with the CFC data showing stronger memories at test 1? While I applaud the authors for using multiple memory tasks, I am left wondering if the authors tried fitting the latent cause model to the Barnes maze data as well.

      While we agree that the deficits shown in probe test 1 may imply impaired memory function in App<sup>NL-G-F</sup> mice in this task, it would be difficult to explain this solely in terms of impairments in general memory function. The learning curve and the daily strategy changes suggested that App<sup>NL-G-F</sup> mice would have virtually intact learning ability in the initial training phase (Figure 6B, 6F, Figure 6 – figure supplement 1 and 3). For the correspondence relationship between the reinstatement and the reversal Barnes maze learning from the aspect of memory modification process, please also see our reply to comment #24. We have explained why we did not fit the latent cause model to the Barnes maze data in the provisional response.

      [#3] (3) Since the authors use the behavioral data for each animal to fit the model, it is important to validate that the fits for the control vs. experimental groups are similar to the model (i.e., no significant differences in residuals). If that is the case, one can compare the differences in model results across groups (Figures 4 and 5). Some further estimates of the performance of the model across groups would help.

      We have added the residual (i.e., observed CR minus simulated CR) in Figure 3 – figure supplement 1D and 1E. The fit was similar between control and App<sup>NL-G-F</sup> mice groups in the test trials, except test 3 in the 12-month-old group. The residual was significantly higher in the 12-month-old control mice than App<sup>NL-G-F</sup> mice, suggesting the model underestimated the reinstatement in the control, yet the DI calculated from the simulated CR replicates the behavioral data (Figure 3 – figure supplement 1A to 1C). These results suggest that the latent cause model fits our data with little systematic bias such as an overestimation of CR for the control group in the reinstatement, supporting the validity of the comparisons in estimated parameters between groups. These results and discussion have been added in the manuscript Line 269-276.
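      The residual definition above (observed CR minus simulated CR) can be sketched as follows. This is a minimal illustration only: the function name and the freezing rates are hypothetical and are not taken from the study's data or code.

```python
from statistics import mean


def residuals(observed, simulated):
    """Per-trial residual: observed CR minus simulated CR."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    return [o - s for o, s in zip(observed, simulated)]


# Hypothetical freezing rates (fractions in [0, 1]) for three test trials.
observed_cr = [0.60, 0.20, 0.50]
simulated_cr = [0.55, 0.25, 0.35]

res = residuals(observed_cr, simulated_cr)

# A positive residual means the model underestimated the observed CR;
# a negative residual means it overestimated the observed CR.
mean_residual = mean(res)
```

      Under this sign convention, a group with a larger positive residual in test 3 is one whose reinstatement the model underestimated.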

      One may notice that the latent cause model overestimated the CR in acquisition trials in all groups in Figure 3 – figure supplement 1D and 1E. We have discussed this point in the reply to comment #26, 34 questioned by reviewer 3.

      [#4] (4) Is there an alternative model the authors considered, which was outweighed in terms of prediction by this model? 

      Yes, we have further evaluated two alternative models: the Rescorla-Wagner (RW; Rescorla & Wagner, 1972) model and the latent state model (LSM; Cochran & Cisler, 2019). The RW model serves as a baseline, given its known limitations in explaining fear return after extinction. The LSM is another contemporary model that shares several concepts with the latent cause model (LCM) such as building upon the RW model, assuming a latent variable inferred by Bayes’ rule, and involving a ruminative update for memory modification. We evaluated the three models in terms of the prediction accuracy and reproducibility of key behavioral features. Please refer to the Appendix 1 for detailed methods and results for these two models.

      As expected, the RW model fit the data well until the end of extinction but failed to reproduce reinstatement (Appendix 1 – figure 1A to 1D). Due to a large prediction error in test 3, few samples met the acceptance criteria we set (Appendix 1 – figure 2 and 3A). Conversely, the LSM reproduced reinstatement, as well as gradual learning in acquisition and extinction phases, particularly in the 12-month-old control (Appendix 1 – figure 1G). The number of accepted samples in the LSM was higher than in the RW model but generally lower than in the LCM (Appendix 1 – figure 2). The sum of prediction errors over all trials in the LSM was comparable to that in the LCM in the 6-month-old group (Appendix 1 – figure 4A), but it was significantly lower in the 12-month-old group (Appendix 1 – figure 4B). In particular, the LSM generated smaller prediction errors during the acquisition trials than the LCM, suggesting that the LSM might be better at explaining the behaviors of acquisition (Appendix 1 – figure 4A and 4B; but see the reply to comment #34). While the LSM generated smaller prediction errors than the LCM in test 2 of the control group, it failed to replicate the observed DIs, a critical behavioral phenotype difference between control and App<sup>NL-G-F</sup> mice (Appendix 1 – figure 6A to 6C; cf. Figure 2F to 2H, Figure 3 – figure supplement 1A to 1C).
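      To illustrate why a Rescorla-Wagner learner struggles with reinstatement, the sketch below simulates a deliberately simplified, single-cue version of the model in which only the CS carries associative strength. This is an assumption made for illustration and is not the implementation described in Appendix 1; the learning rate and trial counts are hypothetical.

```python
def rescorla_wagner(trials, learning_rate=0.3):
    """Simulate a single-cue Rescorla-Wagner learner.

    trials: list of (cs_present, us) pairs, each value 0 or 1.
    Returns the US prediction made on each trial, i.e. the CS
    associative strength before that trial's update (0 if no CS).
    """
    v = 0.0
    predictions = []
    for cs_present, us in trials:
        prediction = v * cs_present
        predictions.append(prediction)
        # The CS weight changes only on trials where the CS is present.
        v += learning_rate * cs_present * (us - prediction)
    return predictions


# Hypothetical schedule: trial counts are illustrative only.
acquisition = [(1, 1)] * 3       # CS paired with US
extinction = [(1, 0)] * 30       # CS alone
unsignaled_shock = [(0, 1)]      # US alone, no CS
test = [(1, 0)]                  # CS alone after the unsignaled shock

preds = rescorla_wagner(acquisition + extinction + unsignaled_shock + test)
last_extinction = preds[len(acquisition) + len(extinction) - 1]
post_shock_test = preds[-1]
```

      In this simplified learner the unsignaled shock leaves the CS weight untouched, so the prediction at the final test is no higher than at the end of extinction, which is the qualitative failure described above.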

      Thus, although each model could capture different aspects of reinstatement, relying on the LCM to explain reinstatement better aligns with our purpose. It should also be noted that we did not explore the full parameter space of the LSM; hence, we cannot rule out the possibility that alternative parameter sets could provide a better fit and explain the memory modification process well. A more comprehensive parameter search in the LSM may be a valuable direction for future research.

      [#5] One concern here is also parameter overfitting. Did the authors try leaving out some data (trials/mice) and predicting their responses based on the fit derived from the training data?

      Following the reviewer’s suggestion, we examined whether overfitting occurred when all trials were used to estimate parameters. Estimating parameters while actually leaving out trials would disrupt the time lapse across trials, and thereby the prior of latent causes in each trial. Instead, we removed the constraint of prediction error by setting the error threshold to 1 for certain trials to virtually leave these trials out. We treated these trials as a virtual “test” dataset, while the rest of the trials served as a “training” dataset. For the median CR data of each group (Figure 3), we estimated parameters under 6 conditions with unique training and test trials, then evaluated the prediction error for the training and test trials. Note that training and test trials were arbitrarily decided. Also, the error threshold for the acquisition trials was set to 1 as described in Materials and Methods (we discuss the reason in the reply to comment #34), so we treated acquisition trials separately from the test trials. We expect that the contribution of the data from the acquisition and test trials to parameter estimation would be discounted compared to that from the training trials with the constraint, and that, if overfitting occurred, the prediction error in the test trials would be worse than that in the training trials.
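      The idea of virtually leaving trials out by relaxing their error threshold can be sketched as below. The sketch assumes that a parameter sample is accepted only when the absolute prediction error on every trial is within that trial's threshold; the exact acceptance rule is defined in Materials and Methods, and the function names, the base threshold of 0.3, and the data here are hypothetical.

```python
def accept(observed, simulated, thresholds):
    """Accept a parameter sample if every trial's absolute prediction
    error is within that trial's threshold (assumed acceptance rule)."""
    return all(
        abs(o - s) <= t
        for o, s, t in zip(observed, simulated, thresholds)
    )


def leave_out(thresholds, held_out_trials):
    """Return a copy of thresholds with held-out trials set to 1.0.

    CR is a rate in [0, 1], so an absolute error can never exceed 1;
    a threshold of 1.0 therefore places no constraint on that trial.
    """
    held_out = set(held_out_trials)
    return [1.0 if i in held_out else t for i, t in enumerate(thresholds)]


# Hypothetical CRs for three trials; the last one is predicted poorly.
observed = [0.6, 0.2, 0.5]
simulated = [0.55, 0.25, 0.05]
base_thresholds = [0.3, 0.3, 0.3]  # hypothetical value

strict = accept(observed, simulated, base_thresholds)
relaxed = accept(observed, simulated, leave_out(base_thresholds, [2]))
```

      Here `strict` is false because the last trial violates its threshold, whereas `relaxed` is true because that trial no longer constrains the estimate. Keeping the trial in the sequence preserves the timing between trials, which is why thresholds are relaxed rather than trials deleted.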

      Author response image 1A to 1F showed the simulated and observed CR under each condition, where acquisition trials were in light-shaded areas, test trials were in dark-shaded areas, and the rest of the trials were training trials. Author response image 1G showed mean squared prediction error across the acquisition, training and test trials under each condition. The dashed gray line showed the mean squared prediction error of training trials in Figure 3 as a baseline.

      In conditions i and ii, where two or four trials in the extinction were used for training (Author response image 1A and 1B), the prediction error was generally higher in test trials than in training trials. In conditions iii and iv, where ten trials in the extinction were used for training (Author response image 1C and 1D), the difference in prediction error between test and training trials became smaller. These results suggest that providing more extinction trial data would reduce overfitting. In condition v (Author response image 1E), the results showed that using trials until extinction can predict reinstatement in control mice but not in App<sup>NL-G-F</sup> mice. Similarly, in condition vi (Author response image 1F), where test phase trials were left out, the prediction error differences were greater in App<sup>NL-G-F</sup> mice. These results suggest that the test trials should be used for the parameter estimation to minimize prediction error for all groups. Overall, this analysis suggests that using all trials would reduce prediction error with little overfitting.

      Author response image 1.

      Leaving trials out in parameter estimation in the latent cause model. (A – F) The observed CR (colored line) is the median freezing rate during the CS presentation over the mice within each group, which is the same as that in Figure 3. The colors indicate different groups: orange represents 6-month-old control, light blue represents 6-month-old App<sup>NL-G-F</sup> mice, pink represents 12-month-old control, and dark blue represents 12-month-old App<sup>NL-G-F</sup> mice. Under six different leave-out conditions (i – vi), parameters were estimated and used for generating simulated CR (gray line). In each condition, trials were categorized as acquisition (light-shaded area), training data (white area), and test data (dark-shaded area) based on the error threshold during parameter estimation. Only the error threshold of the test data trials was different from the original method (see Materials and Methods) and set to 1. In conditions i to iv, the number of test data trials is 27, 25, 19, and 19, respectively, in the extinction phases. In condition v, the number of test data trials is 2 (trials 35 and 36). In condition vi, test data trials were the 3 test phases (trials 4, 34, and 36). (G) Each subplot shows the mean squared prediction error for the test data trials (gray circles), training data trials (white squares), and acquisition trials (gray triangles) in each group. The left y-axis corresponds to data from test and training trials, and the right y-axis corresponds to data from acquisition trials. The dashed line indicates the results calculated from Figure 3 as a baseline.

      Reviewer #1 (Recommendations for the authors):

      Minor:

      [#6] (1) I would like the authors to further clarify why 'explaining' the reinstatement deficit in the AD mouse model is important in working towards the understanding of AD i.e., which aspect of AD this could explain etc.

      In this study, we utilized the reinstatement paradigm with the latent cause model as an internal model to illustrate how estimating internal states can improve understanding of cognitive alteration associated with extensive Aβ accumulation in the brain. Our findings suggest that misclassification in the memory modification process, manifesting as overgeneralization and overdifferentiation, underlies the memory deficit in the App<sup>NL-G-F</sup> knock-in model mice. 

      The parameters in the internal model associated with AD pathology (e.g., α and σ<sub>x</sub><sup>2</sup> in this study) can be viewed as computational phenotypes, filling the explanatory gap between neurobiological abnormalities and cognitive dysfunction in AD. This would advance the understanding of cognitive symptoms in the early stages of AD beyond conventional behavioral endpoints alone.

      We further propose that altered internal states in App<sup>NL-G-F</sup> knock-in mice may underlie a wide range of memory-related symptoms in AD as we observed that App<sup>NL-G-F</sup> knock-in mice failed to retain competing memories in the reversal Barnes maze task. We speculate on how overgeneralization and overdifferentiation may explain some AD symptoms in the manuscript:

      - Line 565-569: overgeneralization may explain deficits in discriminating highly similar visual stimuli reported in early-stage AD patients as they misclassify the lure as previously learned object

      - Line 576-579: overdifferentiation may explain impaired ability to transfer previously learned association rules in early-stage AD patients as they misclassify them as separated knowledge. 

      - Line 579-582: overdifferentiation may explain delusions in AD patients as an extended latent cause model could simulate the emergence of delusional thinking

      We provide one more example here: overgeneralization may explain why early-stage AD patients are more susceptible to proactive interference than cognitively normal elders in semantic memory tests (Curiel Cid et al., 2024; Loewenstein et al., 2015, 2016; Valles-Salgado et al., 2024), as they are more likely to infer previously learned material. Lastly, we expect that explaining memory-related symptoms within a unified framework may facilitate future hypothesis generation and contribute to the development of strategies for detecting the earliest cognitive alteration in AD.

      [#7] (2) The authors state in the abstract/introduction that such computational modelling could be most beneficial for the early detection of memory disorders. The deficits observed here are pronounced in the older animals. It will help to further clarify if these older animals model the early stages of the disease. Do the authors expect severe deficits in this mouse model at even later time points?

      The early stage of the disease is marked by abnormal biomarkers associated with Aβ accumulation and neuroinflammation, while cognitive symptoms are mild or absent. This stage can persist for several years, during which the level of Aβ may reach a plateau. As the disease progresses, tau pathology and neurodegeneration emerge and drive the transition into the late stage and the onset of dementia. The App<sup>NL-G-F</sup> knock-in mice recapitulate the features present in the early stage (Saito et al., 2014), where extensive Aβ accumulation and neuroinflammation worsen with age (Figure 2 – figure supplement 1). Since App<sup>NL-G-F</sup> knock-in mice are centered on Aβ pathology without tauopathy and neurodegeneration, it should be noted that they do not represent the full spectrum of the disease even at advanced ages. Therefore, older animals still model the early stages of the disease and are suitable for studying the long-term effect of Aβ accumulation and neuroinflammation.

      The age tested in previous reports using App<sup>NL-G-F</sup> mice spanned a wide range from 2 months old to 24 months old. Different behavioral tasks have varied sensitivity but overall suggest the dysfunction worsens with aging (Bellio et al., 2024; Mehla et al., 2019; Sakakibara et al., 2018). We have tested the reinstatement experiment with 17-month-old App<sup>NL-G-F</sup> mice before (Author response image 2). They showed more advanced deficits with the same trends observed in 12-month-old App<sup>NL-G-F</sup> mice, but their freezing rates were overall at a lower level. There is a concern that possible hearing loss may affect the results and interpretation, therefore we decided to focus on 12-month-old data.

      Author response image 2.

      Freezing rate across reinstatement paradigm in the 17-month-old App<sup>NL-G-F</sup> mice. Dashed and solid lines indicate the median freezing rate over 34 mice before (preCS) and during (CS) tone presentation, respectively. Red, blue, and yellow backgrounds represent acquisition, extinction, and unsignaled shock in Figure 2A. The dashed vertical line separates the extinction 1 and extinction 2 phases.

      [#8] (3) There are quite a few 'marginal' p-values in the paper at p>0.05 but near it. Should we accept them all as statistically significant? The authors need to clarify if all the experimental groups are sufficiently powered.

      For our study, we decided a priori that p < 0.05 would be considered statistically significant, as described in the Materials and Methods. Therefore, in our Results, we did not consider these marginal values as statistically significant but reported the trend, as they may indicate substantive significance.

      We described our power analysis method in the manuscript Line 897-898 and have provided the results in Tables S21 and S22.

      [#9] (4) The authors emphasize here that such computational modelling enables us to study the underlying 'reasoning' of the patient (in the abstract and introduction), I do not see how this is the case. The model states that there is a latent i.e. another underlying variable that was not previously considered.

      Our use of the term “reasoning” was to distinguish the internal model, which describes how an agent makes sense of the world, from other generative models implemented for biomarker and disease progression prediction. However, we agree that using “reasoning” may be misleading and imprecise, so to reduce ambiguity we have removed this word in our manuscript Line 27: Nonetheless, internal models of the patient remain underexplored in AD; Line 85: However, previous approaches did not suppose an internal model of the world to predict future from current observation given prior knowledge.   

      [#10] (5) The authors combine knock-in mice with controls to compute correlations of parameters of the model with behavior of animals (e.g. Figure 4B and Figure 5B). They run the risk of spurious correlations due to differences across groups, which they have indeed shown to exist (Figure 4A and 5A). It would help to show within-group correlations between DI and parameter fit, at least for the control group (which has a large spread of data).

      We agree that genotype (control, App<sup>NL-G-F</sup>) could be a confounder between the estimated parameters and DI, thereby generating spurious correlations. To address this concern, we have provided within-group correlations in Figure 4 – figure supplement 2 for the 12-month-old group and Figure 5 – figure supplement 2 for the 6-month-old group.

      In the 12-month-old group, the significant positive correlation between σ<sub>x</sub><sup>2</sup> and DI remained in both control and App<sup>NL-G-F</sup> mice even if we adjusted the genotype effect, suggesting that it is very unlikely that the correlations in Figure 4B are due to the genotype-related confounding. On the other hand, the positive correlation between α and DI was found to be significant in the control mice but not in the App<sup>NL-G-F</sup> mice. Most α values were distributed around the lower bound in App<sup>NL-G-F</sup> mice, which possibly reduced the variance and correlation coefficient. These results support our original conclusion that α and σ<sub>x</sub><sup>2</sup> are parameters associated with a lower magnitude of reinstatement in aged App<sup>NL-G-F</sup> mice.

      In the 6-month-old group, the correlations shown in Figure 5B were not preserved within subgroups, suggesting genotype would be a confounder for α, σ<sub>x</sub><sup>2</sup>, and DI. We recognized that significant correlations in Figure 5B may arise from group differences, increased sample size, or greater variance after combining control and App<sup>NL-G-F</sup> mice. 
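      The way a pooled correlation can arise purely from a group difference is illustrated by the toy example below. All values are invented for illustration, in arbitrary units, and are not the study's parameter estimates or DIs; the helper `pearson` is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data: two groups that differ in mean on both variables but show
# no relationship between the variables within either group.
param_a, di_a = [1, 2, 3, 4], [1, 2, 2, 1]
param_b, di_b = [11, 12, 13, 14], [11, 12, 12, 11]

r_within_a = pearson(param_a, di_a)
r_within_b = pearson(param_b, di_b)
r_pooled = pearson(param_a + param_b, di_a + di_b)
```

      In this toy case both within-group correlations are zero while the pooled correlation is strongly positive, so checking the correlation inside each genotype is what separates a genuine parameter-behavior relationship from a group-level offset.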

      Therefore, we concluded that α and σ<sub>x</sub><sup>2</sup> are associated with the magnitude of reinstatement but modulated by the genotype effect depending on the age. 

      We have added interpretations of within-group correlation in the manuscript Line 307-308, 375-378.

      [#11] (6) It is unclear to me why overgeneralization of internal states will lead to the animals having trouble recalling a memory. Would this not lead to overgeneralization of memory recall instead?

      We assume that the reviewer is referring to “overgeneralization of internal states,” a case in which the animal’s internal state remained the same regardless of the observation, thereby leading to “overgeneralization of memory recall.” We agree that this could be one possible situation and appears less problematic than the case in which this memory is no longer retrievable. 

      However, in our manuscript, we did not deal with the case of “overgeneralization of internal states”. Rather, our findings illustrated how the memory modification process falls into overgeneralization or overdifferentiation and how it adversely affects the retention of competing memories, thereby causing App<sup>NL-G-F</sup> mice to have trouble recalling the same memory as the control mice. 

      According to the latent cause model, retrieval failure is explained by a mismatch of internal states: when an agent perceives that the current cue does not match a previously experienced one, the old latent cause is less likely to be inferred due to its low likelihood (Gershman et al., 2017). For example, if a mouse exhibited higher CR in test 2, it would be interpreted as a successful fear memory retrieval due to overgeneralization of the fear memory. However, it reflects a failure of extinction memory retrieval due to the mismatch between the internal states at extinction and test 2. This is an example in which overgeneralization of memory induces the failure of memory retrieval.

      On the other hand, App<sup>NL-G-F</sup> mice exhibited higher CR in test 1, which is conventionally interpreted as a successful fear memory retrieval. When estimating their internal states, they would infer that their observation in test 1 matches well those under the acquisition latent causes; that is, the fear memory is overgeneralized, as shown by a higher posterior probability of acquisition latent causes in test 1 (Figure 4 – figure supplement 3). This is an example in which overgeneralization of memory does not always induce retrieval failure, as we explained in the reply to comment #1.

      Reviewer #2 (Public review):

      Summary:

      This manuscript proposes that the use of a latent cause model for the assessment of memory-based tasks may provide improved early detection of Alzheimer's Disease as well as more differentiated mapping of behavior to underlying causes. To test the validity of this model, the authors use a previously described knock-in mouse model of AD and subject the mice to several behaviors to determine whether the latent cause model may provide informative predictions regarding changes in the observed behaviors. They include a well-established fear learning paradigm in which distinct memories are believed to compete for control of behavior. More specifically, it's been observed that animals undergoing fear learning and subsequent fear extinction develop two separate memories for the acquisition phase and the extinction phase, such that the extinction does not simply 'erase' the previously acquired memory. Many models of learning require the addition of a separate context or state to be added during the extinction phase and are typically modeled by assuming the existence of a new state at the time of extinction. The Niv research group, Gershman et al. 2017, have shown that the use of a latent cause model applied to this behavior can elegantly predict the formation of latent states based on a Bayesian approach, and that these latent states can facilitate the persistence of the acquisition and extinction memory independently. The authors of this manuscript leverage this approach to test whether deficits in the production of the internal states, or the inference and learning of those states, may be disrupted in knock-in mice that show both a build-up of amyloid-beta plaques and a deterioration in memory as the mice age.

      Strengths:

      I think the authors' proposal to leverage the latent cause model and test whether it can lead to improved assessments in an animal model of AD is a promising approach for bridging the gap between clinical and basic research. The authors use a promising mouse model and apply this to a paradigm in which the behavior and neurobiology are relatively well understood - an ideal situation for assessing how a disease state may impact both the neurobiology and behavior. The latent cause model has the potential to better connect observed behavior to underlying causes and may pave a road for improved mapping of changes in behavior to neurobiological mechanisms in diseases such as AD.

      Weaknesses:

      I have several substantial concerns which I've detailed below. These include important details on how the behavior was analyzed, how the model was used to assess the behavior, and the interpretations that have been made based on the model.

      [#12] (1) There is substantial data to suggest that during fear learning in mice separate memories develop for the acquisition and extinction phases, with the acquisition memory becoming more strongly retrieved during spontaneous recovery and reinstatement. The Gershman paper, cited by the authors, shows how the latent causal model can predict this shift in latent states by allowing for the priors to decay over time, thereby increasing the posterior of the acquisition memory at the time of spontaneous recovery. In this manuscript, the authors suggest a similar mechanism of action for reinstatement, yet the model does not appear to return to the acquisition memory state after reinstatement, at least based on the examples shown in Figures 1 and 3. Rather, the model appears to mainly modify the weights in the most recent state, putatively the 'extinction state', during reinstatement. Of course, the authors must rely on how the model fits the data, but this seems problematic based on prior research indicating that reinstatement is most likely due to the reactivation of the acquisition memory. This may call into question whether the model is successfully modeling the underlying processes or states that lead to behavior and whether this is a valid approach for AD.

      We thank the reviewer for insightful comments. 

      We agree that, as demonstrated in Gershman et al. (2017), the latent cause model accounts for spontaneous recovery via the inference of new latent causes during extinction and the temporal compression property provided by the prior. Moreover, it was also demonstrated that even a relatively low posterior can drive behavioral expression if the weight in the acquisition latent cause is preserved. For example, when the interval between retrieval and extinction was long enough that acquisition latent cause was not dominant during extinction, spontaneous recovery was observed despite the posterior probability of acquisition latent cause (C1) remaining below 0.1 in Figure 11D of Gershman et al. (2017). 

      In our study, a high response in test 3 (reinstatement) is explained by both the acquisition and extinction latent causes. The former preserves the associative weight of the initial fear memory, while the latter has w<sub>context</sub> learned in the unsignaled shock phase. These positive w were weighted by their posterior probability and together contributed to increased expected shock in test 3. Though the posterior probability of the acquisition latent cause was lower than that of the extinction latent cause in test 3 due to the passage of time, this would parallel the instance mentioned above. To clarify their contributions to reinstatement, we have conducted additional simulations and provide further discussion in reply to the reviewer’s next comment (see the reply to comment #13).
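      The posterior weighting described above can be made concrete with a small sketch. It assumes that the expected shock is the sum, over latent causes, of each cause's posterior probability multiplied by that cause's weighted stimulus prediction. The posteriors, weights, and the function name `expected_us` are hypothetical and chosen only for illustration, not estimated from the data.

```python
def expected_us(posteriors, weights, stimulus):
    """Expected US: sum over latent causes of
    posterior * (weight vector . stimulus vector)."""
    return sum(
        q * sum(w_i * x_i for w_i, x_i in zip(w, stimulus))
        for q, w in zip(posteriors, weights)
    )


# Stimulus vector at test 3: [CS, context], both present.
x = [1.0, 1.0]

# Hypothetical values for two latent causes.
posteriors = [0.1, 0.9]   # [acquisition cause, extinction cause]
weights = [
    [0.8, 0.1],           # acquisition cause: preserved CS weight
    [0.0, 0.3],           # extinction cause: context weight from unsignaled shock
]

total = expected_us(posteriors, weights, x)
from_acquisition = expected_us([posteriors[0]], [weights[0]], x)
from_extinction = expected_us([posteriors[1]], [weights[1]], x)
```

      With these illustrative numbers the acquisition cause still contributes to the expected shock despite its low posterior, because its preserved weight is large, while the extinction cause contributes through its context weight.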

      We recognize that our results might appear to deviate from the notion that reinstatement results from the strong reactivation of acquisition memory, where one would expect a high posterior probability of the acquisition latent cause. However, we would like to emphasize that the return of fear emerges from the interplay of competing memories. Previous studies have shown that contextual or cued fear reinstatement involves a neural activity switch back to fear state in the medial prefrontal cortex (mPFC), including the prelimbic cortex and infralimbic cortex, and the amygdala, including ventral intercalated amygdala neurons (ITCv), medial subdivision of central nucleus of the amygdala (CeM), and the basolateral amygdala (BLA) (Giustino et al., 2019; Hitora-Imamura et al., 2015; Zaki et al., 2022). We speculate that such transition is parallel to the internal states change in the latent cause model in terms of posterior probability and associative weight change.

      Optogenetic manipulation experiments have further revealed how fear and extinction engrams contribute to extinction retrieval and reinstatement. For instance, Gu et al. (2022) used a cued fear conditioning paradigm and found that inhibition of extinction engrams in the BLA, ventral hippocampus (vHPC), and mPFC after extinction learning artificially increased freezing to the tone cue. Similar results were observed in contextual fear conditioning, where silencing extinction engrams in the hippocampal dentate gyrus (DG) impaired extinction retrieval (Lacagnina et al., 2019). These results suggest that weakening of the extinction memory can induce a return of the fear response even without a reminder shock. On the other hand, Zaki et al. (2022) showed that inhibition of fear engrams in the BLA, DG, or hippocampal CA1 attenuated contextual fear reinstatement. However, they also reported that stimulation of these fear engrams was not sufficient to induce reinstatement, suggesting that these fear engrams only partially account for reinstatement.

      In summary, reinstatement likely results from bidirectional changes in the fear and extinction circuits, supporting our interpretation that both acquisition and extinction latent causes contribute to the reinstatement. Although it remains unclear whether these memory engrams represent latent causes, one possible interpretation is that w<sub>context</sub> update in extinction latent causes during unsignaled shock indicates weakening of the extinction memory, while preservation of w in acquisition latent causes and their posterior probability suggests reactivation of previous fear memory. 

      [#13] (2) As stated by the authors in the introduction, the advantage of the fear learning approach is that the memory is modified across the acquisition-extinction-reinstatement phases. Although perhaps not explicitly stated by the authors, the post-reinstatement test (test 3) is the crucial test for whether there is reactivation of a previously stored memory, with the general argument being that the reinvigorated response to the CS can't simply be explained by relearning the CS-US pairing, because re-exposure the US alone leads to increase response to the CS at test. Of course there are several explanations for why this may occur, particularly when also considering the context as a stimulus. This is what I understood to be the justification for the use of a model, such as the latent cause model, that may better capture and compare these possibilities within a single framework. As such, it is critical to look at the level of responding to both the context alone and to the CS. It appears that the authors only look at the percent freezing during the CS, and it is not clear whether this is due to the contextual US learning during the US re-exposure or to increased response to the CS - presumably caused by reactivation of the acquisition memory. For example, the instance of the model shown in Figure 1 indicates that the 'extinction state', or state z6, develops a strong weight for the context during the reinstatement phase of presenting the shock alone. This state then leads to increased freezing during the final CS probe test as shown in the figure. By not comparing the difference in the evoked freezing CR at the test (ITI vs CS period), the purpose of the reinstatement test is lost in the sense of whether a previous memory was reactivated - was the response to the CS restored above and beyond the freezing to the context? 
I think the authors must somehow incorporate these different phases (CS vs ITI) into their model, particularly since this type of memory retrieval that depends on assessing latent states is specifically why the authors justified using the latent causal model.

      To clarify the contribution of context, we have provided the preCS freezing rate across trials in Figure 2 – figure supplement 2. As the reviewer pointed out, the preCS freezing rate did not remain at the same level across trials, especially within the 12-month-old control and App<sup>NL-G-F</sup> groups (Figure 2 – figure supplement 2A and 2B), suggesting an effect of context. Paired-samples t-tests comparing preCS freezing (Figure 2 – figure supplement 2E) and CS freezing (Figure 2E) in test 3 revealed significant differences in all groups: 6-month-old control, t(23) = -6.344, p < 0.001, d = -1.295; 6-month-old App<sup>NL-G-F</sup>, t(24) = -4.679, p < 0.001, d = -0.936; 12-month-old control, t(23) = -4.512, p < 0.001, d = -0.921; 12-month-old App<sup>NL-G-F</sup>, t(24) = -2.408, p = 0.024, d = -0.482. These results indicate that the response to the CS was above and beyond the response to context only. We also compared the change in freezing rate (CS freezing rate minus preCS freezing rate) in test 2 and test 3 to examine the net response to the tone. A significant difference was found in the control groups, but not in the App<sup>NL-G-F</sup> groups (Author response image 3). The increased net response to the tone in the control groups suggests that the reinstatement was partially driven by reactivation of acquisition memory, not solely by the contextual US learning during the unsignaled shock phase. We have added these results and discussion in the manuscript Line 220-231.
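
The effect sizes listed above can be cross-checked against the t statistics. The sketch below assumes the paired-samples convention d = t / sqrt(n) with n = df + 1 (often written d_z); the response does not state which convention was used, so this is an assumption, and the function name is hypothetical. Under that assumption the computed magnitudes match the reported ones.

```python
import math

def paired_cohens_d(t_stat: float, df: int) -> float:
    """Cohen's d for a paired design (d_z), recovered as t / sqrt(n), n = df + 1.

    Assumption: the reported d follows this convention.
    """
    n = df + 1
    return t_stat / math.sqrt(n)

# (group label, degrees of freedom, t statistic) as reported for test 3
reported = [
    ("6-month-old control", 23, -6.344),
    ("6-month-old App NL-G-F", 24, -4.679),
    ("12-month-old control", 23, -4.512),
    ("12-month-old App NL-G-F", 24, -2.408),
]

effect_sizes = {label: round(paired_cohens_d(t, df), 3) for label, df, t in reported}
for label, d in effect_sizes.items():
    print(f"{label}: d = {d}")
```

Because d_z carries the sign of t, a negative t statistic should go with a negative d under this convention.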

      Author response image 3.

      Net freezing rate in test 2 and test 3. Net freezing rate is defined as the CS freezing rate (i.e., freezing rate during 1 min CS presentation) minus the preCS freezing rate (i.e., 1 min before CS presentation). The dashed horizontal line indicates no freezing rate change from the preCS period to the CS presentation. *p < 0.05 by paired-sample Student’s t-test, and the alternative hypothesis specifies that test 2 freezing rate change is less than test 3. Colors indicate different groups: orange represents 6-month-old control (n = 24), light blue represents 6-month-old App<sup>NL-G-F</sup> mice (n = 25), pink represents 12-month-old control (n = 24), and dark blue represents 12-month-old App<sup>NL-G-F</sup> mice (n = 25). Each black dot represents one animal. Statistical results were as follows: t(23) = -1.927, p = 0.033, Cohen’s d = -0.393 in 6-month-old control; t(24) = -1.534, p = 0.069, Cohen’s d = -0.307 in 6-month-old App<sup>NL-G-F</sup>; t(23) = -1.775, p = 0.045, Cohen’s d = -0.362 in 12-month-old control; t(24) = 0.86, p = 0.801, Cohen’s d = 0.172 in 12-month-old App<sup>NL-G-F</sup>.

      According to the latent cause model, if the reinstatement is merely induced by an association between the context and the US in the unsignaled shock phase, the CR given context only and that given context and CS in test 3 should be equal. However, the simulation conducted for each mouse using their estimated parameters confirmed that this was not the case in this study. The results showed that simulated CR was significantly higher in the context+CS condition than in the context only condition (Author response image 4). This trend is consistent with the behavioral results we mentioned above.
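
The logic of this comparison can be illustrated with a minimal sketch, assuming the expected US is a posterior-weighted sum of each latent cause's weighted cue input. The function name and all numbers are hypothetical and are not estimates from the study. For simplicity the posterior is held fixed across the two cue conditions, which the full model would not do.

```python
def expected_us(posteriors, weights, cues):
    """Posterior-weighted prediction: sum_k P(z_k) * (w_k . x)."""
    return sum(
        p * sum(w_i * x_i for w_i, x_i in zip(w, cues))
        for p, w in zip(posteriors, weights)
    )

# Hypothetical values for illustration only. Cue order: [CS, context].
posteriors = [0.6, 0.4]             # an acquisition-like and an extinction-like cause
weights = [[0.5, 0.2], [0.0, 0.3]]  # per-cause associative weights [w_CS, w_context]

context_only = expected_us(posteriors, weights, [0, 1])     # CS absent
context_plus_cs = expected_us(posteriors, weights, [1, 1])  # CS present

print(context_only, context_plus_cs)
```

If reinstatement were carried only by the context weights, every w_CS would be zero and the two predictions would coincide. Any nonzero w_CS in a cause with nonzero posterior separates them.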

      Author response image 4.

      Simulation of context effect in test 3. Estimated parameter sets of each sample were used to run the simulation that only context or context with CS was present in test 3 (trial 36). The data are shown as median with interquartile range, where white bars with colored lines represent CR for context only and colored bars represent CR for context with CS. Colors indicate different groups: orange represents 6-month-old control (n = 15), light blue represents 6-month-old App<sup>NL-G-F</sup> mice (n = 12), pink represents 12-month-old control (n = 20), and dark blue represents 12-month-old App<sup>NL-G-F</sup> mice (n = 18). Each black dot represents one animal. **p < 0.01, and ***p < 0.001 by Wilcoxon signed-rank test comparing context only and context + CS in each group, and the alternative hypothesis specifies that CR in context is not equal to CR in context with CS. Statistical results were as follows: W = 15, p = 0.008, effect size r = -0.66 in 6-month-old control; W = 0, p < 0.001, effect size r = -0.88 in 6-month-old App<sup>NL-G-F</sup>; W = 25, p = 0.002, effect size r = -0.67 in 12-month-old control; W = 9, p = 0.002, effect size r = -0.75 in 12-month-old App<sup>NL-G-F</sup>.
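
One common way to obtain an effect size r from a Wilcoxon signed-rank statistic is r = Z / sqrt(n), with Z taken from the normal approximation. The sketch below assumes that approach with no tie or continuity correction and no zero differences. The legend does not say how r was computed, so this is an assumption, and statistical software that applies such corrections can give somewhat different values. The function name is hypothetical.

```python
import math

def wilcoxon_r(w_stat: float, n: int) -> float:
    """Effect size r = Z / sqrt(n) from a signed-rank statistic W.

    Assumption: plain normal approximation, no tie or continuity correction.
    """
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_stat - mean_w) / sd_w
    return z / math.sqrt(n)

# W and n taken from the legend above
r_6mo_control = wilcoxon_r(15, 15)
r_6mo_app = wilcoxon_r(0, 12)
r_12mo_control = wilcoxon_r(25, 20)

print(round(r_6mo_control, 2), round(r_6mo_app, 2), round(r_12mo_control, 2))
```

Under these assumptions the three groups shown come out close to the values given in the legend.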

      [#14] (3) This is related to the second point above. If the question is about the memory processes underlying memory retrieval at the test following reinstatement, then I would argue that the model parameters that are not involved in testing this hypothesis be fixed prior to the test. Unlike the Gershman paper that the authors cited, the authors fit all parameters for each animal. Perhaps the authors should fit certain parameters on the acquisition and extinction phase, and then leave those parameters fixed for the reinstatement phase. To give a more concrete example, if the hypothesis is that AD mice have deficits in differentiating or retrieving latent states during reinstatement which results in the low response to the CS following reinstatement, then perhaps parameters such as the learning rate should be fixed at this point. The authors state that the 12-month-old AD mice have substantially lower learning rate measures (almost a 20-fold reduction!), which can be clearly seen in the very low weights attributed to the AD mouse in Figure 3D. Based on the example in Figure 3D, it seems that the reduced learning rate in these mice is most likely caused by the failure to respond at test. This is based on comparing the behavior in Figures 3C to 3D. The acquisition and extinction curves appear extremely similar across the two groups. It seems that this lower learning rate may indirectly be causing most of the other effects that the authors highlight, such as the low σx, and the changes to the parameters for the CR. It may even explain the extremely high K. Because the weights are so low, this would presumably lead to extremely low likelihoods in the posterior estimation, which I guess would lead to more latent states being considered as the posterior would be more influenced by the prior.

      We thank the reviewer for the suggestion about fitting and fixing certain parameters in different phases.

      However, this strategy may not be optimal for our study for the following scientific reasons.

      Our primary purpose is to explore internal states in the memory modification process that are associated with the deficit found in App<sup>NL-G-F</sup> mice in the reinstatement paradigm. We did not restrict the question to memory retrieval, nor did we have a particular hypothesis such that only a few parameters of interest account for the impaired associative learning or structure learning in App<sup>NL-G-F</sup> mice while all other parameters are comparable between groups. We are concerned that restricting questions to memory retrieval at the test is too parsimonious and might lead to misinterpretation of the results. As we explain in reply to comment #5, removing trials in extinction during parameter estimation reduces the model fit performance and runs the risk of overfitting within the individual. Therefore, we estimated all parameters for each animal, with the assumption that the estimated parameter set represents individual internal state (i.e., learning and memory characteristics) and should be fixed within the animal across all trials.  

      Figure 3 shows the parameter estimation and simulation results obtained by treating the median data of each group as an individual. The estimated parameter values represent one possible case in that group and demonstrate how a typical learning curve fits the latent cause model. The “20-fold reduction in learning rate” that the reviewer mentioned is a comparison of two data points, not an actual comparison between groups. The comparison between control and App<sup>NL-G-F</sup> mice in the 12-month-old group for all parameters was provided in Table S7. The Mann-Whitney U test did not reveal a significant difference in learning rate (η): 12-month-old control (Mdn = 0.09, IQR = 0.23) vs. 12-month-old App<sup>NL-G-F</sup> (Mdn = 0.12, IQR = 0.23), U = 199, p = 0.587.

      We agree that a lower learning rate could bias learning toward inferring a new latent cause. However, this tendency may depend on the values of other parameters and may vary across phases of the reinstatement paradigm. Here, we used ⍺ as an example and demonstrate the interaction between ⍺ and η in Appendix 2 – table 2 with relatively extreme values: ⍺ = {1, 3} and η = {0.01, 0.5}, while the rest of the parameters were fixed at their initial guess values.

      When ⍺ = 1, the number of latent causes across phases (K<sub>acq</sub>, K<sub>ext</sub>, K<sub>rem</sub>) remained unchanged and their posterior probabilities in test 3 were comparable even if η increased from 0.01 to 0.5. This is an example in which a lower η does not lead to inferring new latent causes because ⍺ is low. The effect of a low learning rate manifests in the test 3 CR due to low w<sub>context, acq</sub> and w<sub>context, ext</sub>.

      When ⍺ = 3, the number of acquisition latent causes (K<sub>acq</sub>) was higher in the case of η = 0.01 than that of η = 0.5, showing the effect mentioned by the reviewer. However, test 1 CR is much lower when η = 0.01, indicating unsuccessful learning even after inferring a new latent cause. This pattern was not observed in any case in this study. During extinction phases, the effect of η is surpassed by the effect of high ⍺, where the number of extinction latent causes (K<sub>ext</sub>) is high and not affected by η. After the extinction phases, the effect of K kicks in as the total number of latent causes reaches its value (K = 33 in this example), especially in the case of η = 0.01. A new latent cause is inferred after extinction in the condition of η = 0.5, but the test 3 CR is still high as the w<sub>context, acq</sub> and w<sub>context, ext</sub> are high. This is an example in which a new latent cause is inferred in spite of a higher η.

      Overall, the learning rate would not have a prominent effect alone throughout the reinstatement paradigm, and it has a joint effect with other parameters. Note that the example here did not cover our estimated results, as the estimated learning rate was not significantly different between control and App<sup>NL-G-F</sup> mice (see above). Please refer to the reply to comment #31 for more discussion about the interaction among parameters when the learning rate is fixed. We hope this clarifies the reviewer’s concern.

      [#15] (4) Why didn't the authors use the latent causal model on the Barnes maze task? The authors mention in the discussion that different cognitive processes may be at play across the two tasks, yet reversal tasks have been suggested to be solved using latent states to be able to flip between the two different task states. In this way, it seems very fitting to use the latent cause model. Indeed, it may even be a better way to assess changes in σx as there are presumably 12 observable stimuli/locations.

      Please refer to our provisional response about the application of the latent cause model to the reversal Barnes maze task. Briefly, it would be difficult to directly apply the latent cause model to the Barnes maze data because this task involves operant learning, and thereby almost all conditions in the latent cause model are not satisfied. Please also see our reply to comment #24 for the discussion of the link between the latent cause model and Barnes maze task. 

      Reviewer #2 (Recommendations for the authors):

      [#16] (1) I had a bit of difficulty finding all the details of the model. First, I had to mainly rely on the Gershman 2017 paper to understand the model. Even then, there were certain aspects of the model that were not clear. For instance, it's not quite clear to me when the new internal states are created and how the maximum number of states is determined. After reading the authors' methods and the Gershman paper, it seems that a new internal state is generated at each time point, aka zt, and that the prior for that state decays onwards from alpha. Yet because most 'new' internal states don't ever take on much of a portion of the posterior, most of these states can be ignored. Is that a correct understanding? To state this another way, I interpret the equation on line 129 to indicate that the prior is determined by the power law for all existing internal states and that each new state starts with a value of alpha, yet I don't see the rule for creating a new state, or for iterating k other than that k iterates at each timestep. Yet this seems to not be consistent with the fact that the max number of states K is also a parameter fit. Please clarify this, or point me to where this is better defined.

      I find this to be an important question for the current paper as it is unclear to me when the states were created. Most notably, in Figure 3, it's important to understand why there's an increase in the posterior of z<sub>5</sub> in the AD 12-month mice at test. Is state z<sub>5</sub> generated at trial 5? If so, the prior would be extremely small by trial 36, making it even more perplexing why z<sub>5</sub> has such a high posterior. If its weights are similar to z<sub>3</sub> and z<sub>4</sub>, and they have been much more active recently, why would z<sub>5</sub> come into play?

      We assume that the “new internal state" the reviewer is referring to is the “new latent cause." We would like to clarify that “internal state" in our study refers to all the latent causes at a given time point and observation. As this manuscript is submitted as a Research Advance article in eLife, we did not rephrase all the model details. Here, we explain when a new latent cause is created (i.e., the prior probability of a new latent cause is greater than 0) with the example of the 12-month-old group (Figure 3C and 3D). 

      Suppose that before the start of each trial, an agent inferred the most likely latent cause with maximum posterior, and it inferred k latent causes so far. A new latent cause can be inferred at the computation of the prior of latent causes at the beginning of each trial.  

      In the latent cause model, the prior follows a distance-dependent Chinese Restaurant Process (CRP; Blei and Frazier, 2011). The prior of each old latent cause is its posterior probability, which is the final count of the EM update before the current trial. In addition, the prior of old latent causes is sensitive to the passage of time, so that it exponentially decreases according to a forgetting function modulated by g (see Figure 2 in Gershman et al., 2017). Simultaneously, the prior of a new cause is assigned ⍺. The new latent cause is inferred at this moment. Hence, the prior of latent causes is jointly determined by ⍺, g and its posterior probability. The maximum number of latent causes K is set a priori and does not affect the prior while k < K (see also reply to comment #30 for the discussion of boundary set for K and comment #31 for the discussion of the interaction between ⍺ and K). Note that only one new latent cause can be inferred in each trial, and the (k+1)<sup>th</sup> latent cause, which has never been inferred so far, is chosen as the new latent cause.
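
The rule described above can be sketched as follows. This is a simplified illustration, not the model's actual implementation: the time-discounted evidence for each old cause is assumed to have been computed already (the forgetting function itself is not reproduced here), and the function and argument names are hypothetical.

```python
def latent_cause_prior(old_scores, alpha, max_causes):
    """Normalized prior over max_causes slots.

    old_scores: time-discounted evidence for each of the k causes inferred so far
                (assumed to be already decayed by the forgetting function).
    alpha:      unnormalized prior mass given to the single candidate new cause.
    max_causes: the upper bound K on the number of latent causes.
    """
    k = len(old_scores)
    if k > max_causes:
        raise ValueError("more inferred causes than the maximum K")
    unnormalized = list(old_scores)
    if k < max_causes:
        # Only the (k+1)-th cause can be newly inferred on this trial.
        unnormalized.append(alpha)
    # Slots beyond the candidate new cause keep zero prior.
    unnormalized += [0.0] * (max_causes - len(unnormalized))
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical example: two old causes, one candidate new cause, K = 5.
prior = latent_cause_prior([2.0, 1.0], alpha=1.0, max_causes=5)
print(prior)
```

In this sketch K only matters once k reaches it: at that point no slot is left for a new cause and the prior is shared among the existing causes.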

      In our manuscript, the subscript number in z<sub>k</sub> denotes the order in which they were inferred, not the trial number. In Figures 3C and 3D, z<sub>3</sub> and z<sub>4</sub> were inferred in trials 5 and 6 during extinction; z<sub>5</sub> is a new latent cause inferred in trial 36. Therefore, the prior of z<sub>5</sub> is not extremely small compared to z<sub>4</sub> and z<sub>3</sub>.

      In both control and App<sup>NL-G-F</sup> mice in the 12-month-old group (Figures 3C and 3D), z<sub>3</sub> is dominant until trial 35. The unsignaled shock at trial 35 generates a large prediction error as only the context is presented and followed by the US. This prediction error reduces the posterior of z<sub>3</sub>, while increasing the posterior of z<sub>4</sub> and w<sub>context</sub> in z<sub>3</sub> and z<sub>4</sub>. This decrease in the posterior of z<sub>3</sub> is more obvious in the App<sup>NL-G-F</sup> group than in the control group, prompting them to infer a new latent cause z<sub>5</sub> (Figure 3C and 3D). Although Figure 3C and 3D are illustrative examples as we explained in the reply to comment #14, this interpretation would be plausible as the App<sup>NL-G-F</sup> group inferred a significantly larger number of latent causes after the extinction with slightly higher posteriors of them than those in the control group (Figure 4E).

      [#17] (2) Related to the above, Are the states z<sub>A</sub> and z<sub>B</sub> defined by the authors to help the reader group the states into acquisition and extinction states, or are they somehow grouped by the model? If the latter is true, I don't understand how this would occur based on the model. If the former, could the authors state that these states were grouped together by the author?

      We used z<sub>A</sub> and z<sub>B</sub> annotations to assist with the explanation, so this is not grouped by the model. We have stated this in the manuscript Line 181-182.

      [#18] (3) This expands on the third point above. In Figure 3D, internal states z<sub>3</sub>, z<sub>4</sub>, and z<sub>5</sub> appear to be pretty much identical in weights in the App group. It's not clear to me why then the posterior of z<sub>5</sub> would all of a sudden jump up. If I understand correctly, the posterior is the likelihood of the observations given the internal state (presumably this should be similar across z<sub>3</sub>, z<sub>4</sub>, and z<sub>5</sub>), multiplied by the prior of the state. z<sub>3</sub> and z<sub>4</sub> are the dominant inferred states up to state 36. Why would z<sub>5</sub> become more likely if there doesn't appear to be any error? I'm inferring no error because there are little or no changes in weights on trial 36, most prominently no changes in z<sub>3</sub> which is the dominant internal state in step 36. If there's little change in weights, or no errors, shouldn't the prior dominate the calculation of the posterior which would lead to z<sub>3</sub> and z<sub>4</sub> being most prominent at trial 36?

      We have explained how z<sub>5</sub> of the 12-month-old App<sup>NL-G-F</sup> was inferred in the reply to comment #16. Here, we explain the process underlying the rapid changes of the posterior of z<sub>3</sub>, z<sub>4</sub>, and z<sub>5</sub> from trial 35 to 36.

      During the extinction, the mice inferred z<sub>3</sub> given the CS and the context in the absence of the US. In trial 35, they observed the context and the unsignaled shock in the absence of the CS. This reduced the likelihood for the CS under z<sub>3</sub> and thereby the posterior of z<sub>3</sub>, while relatively increasing the posterior of z<sub>4</sub>. The associative weight between the context and the US, w<sub>context</sub>, indeed increased in both z<sub>3</sub> and z<sub>4</sub>, but w<sub>context</sub> of z<sub>4</sub> was updated more than that of z<sub>3</sub> due to its higher posterior probability. At the beginning of trial 36, a new latent cause z<sub>5</sub> was inferred with a certain prior (see also the reply to comment #16), and w<sub>5</sub> = w<sub>0</sub>, where w<sub>0</sub> is the initial value of weight. After normalizing the prior over latent causes, the emergence of z<sub>5</sub> reduced the prior probability of other latent causes compared to the case where the prior of z<sub>5</sub> is 0. Since the CS was presented while the US was absent in trial 36, the likelihood of the CS and that of the US under z<sub>3</sub>, and especially z<sub>4</sub>, given the cues and w became lower than the case in which z<sub>5</sub> has not been inferred yet. Consequently, the posterior of z<sub>5</sub> became salient (Figure 3D).
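
The final step of this account is Bayes' rule over latent causes: each posterior is proportional to prior times likelihood. The sketch below uses made-up numbers purely to show how a newly inferred cause with a modest prior can still dominate the posterior when the observation fits it much better than the old causes. It does not reproduce the model's likelihood computation, and the function name is hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' rule over latent causes: posterior_k is proportional to prior_k * likelihood_k."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical numbers for illustration only (order: z3, z4, z5).
prior_without_z5 = [0.7, 0.3]        # before a new cause is available
prior_with_z5 = [0.56, 0.24, 0.20]   # same z3:z4 ratio, renormalized with z5 added
likelihood = [0.05, 0.02, 0.40]      # the trial-36 observation fits z5 best

post_without = posterior(prior_without_z5, likelihood[:2])
post_with = posterior(prior_with_z5, likelihood)

print(post_without, post_with)
```

With only z<sub>3</sub> and z<sub>4</sub> available, z<sub>3</sub> keeps most of the posterior. Once z<sub>5</sub> enters with a higher likelihood, it takes the largest share even though its prior is the smallest.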

      To maintain consistency across panels, we used a uniform y-axis range. However, we acknowledge that this may make it harder to notice the changes of associative weights in Figure 3D. We have provided the subpanel in Figure 3D with a smaller y-axis limit to reveal the weight changes at trial 35 in Author response image 5.

      Author response image 5.

      Magnified view of w<sub>context</sub> and w<sub>CS</sub> in the last 3 trials in Figure 3D. The graph format is the same as in Figure 3D. The weight for CS (w<sub>CS</sub>) and that for context (w<sub>context</sub>) in each latent cause across trial 34 (test 2), 35 (unsignaled shock), and 36 (test 3) in 12-month-old App<sup>NL-G-F</sup> in Figure 3D was magnified in the upper and lower magenta box, respectively.

      [#19] (8) In Figure 4B - The figure legend didn't appear to indicate at which time points the DIs are plotted.

      We have amended the figure legend to indicate that DI between test 3 and test 1 is plotted.

      [#20] (9) Lines 301-303 state that the posterior probabilities of the acquisition internal states in the 12-month AD mice were much higher at test 1 and that this resulted in different levels of CR across the control and 12-month App group. This is shown in the Figure 4A supplement, but this is not apparent in Figure 3 panels C and D. Is the example shown in panel D not representative of the group? The CRs across the two examples in Figure 3 C and D look extremely similar at test 1. Furthermore, the posteriors of the internal states look pretty similar across the two groups for the first 4 trials. Both the App and control have substantial posterior probabilities for the acquisition period, I don't see any additional states at test 1. The pattern of states during acquisition looks strikingly similar across the two groups, whereas the weights of the stimuli are considerably different. I think it would help the authors to use an example that better represents what the authors are referring to, or provide data to illustrate the difference. Figure 4C partly shows this, but it's not very clear how strong the posteriors are for the 3rd state in the controls.

      Figure 3 serves as an example to explain the internal states in each group (see also the third paragraph in the reply to comment #14). Figure 4C to H showed the results from each sample for between-group comparison in selected features. Therefore, the results of direct comparisons of the parameter values and internal states between genotypes in Figure 3 are not necessarily the same as those in Figure 4. Both examples in Figure 3C and 3D inferred 2 latent causes during the acquisition. In terms of posterior till test 1 (trial 4), the two could be the same. However, such examples were not rare, as the proportion of the mice that inferred 2 latent causes during the acquisition was slightly lower than 50% in the control, and around 90% in the App<sup>NL-G-F</sup> mice (Figure 4C). The posterior probability of acquisition latent cause in test 1 showed a similar pattern (Figure 4 – figure supplement 3), with values near 1 in around 50% of the control mice and around 90% of the App<sup>NL-G-F</sup> mice.  

      [#21] (10) Line 320: This is a confusing sentence. I think the authors are saying that because the App group inferred a new state during test 3, this would protect the weights of the 'extinction' state as compared to the controls since the strength of the weight updates depends on the probability of the posterior.

      In order to address this, we have revised this sentence to “Such internal states in App<sup>NL-G-F</sup> mice would diverge the associative weight update from those in the control mice after extinction.” in the manuscript Line 349-351.

      [#22] (11) In lines 517-519 the authors address the difference in generalizing the occurrence of stimuli across the App and control groups. It states that App mice with lower alpha generalized observations to an old cause rather than attributing it as a new state. Going back to statement 3 above, I think it's important to show that the model fit of a reduction in alpha does not go hand-in-hand with a reduction in the learning rates and hence the weights. Again, if the likelihoods are diminished due to the low weights, then the fit of alpha might be reduced as well. To reiterate my point above, if the observations in changes in generalization and differentiation occur because of a reduction in the learning rate, the modeling may not be providing a particularly insightful understanding of AD, other than that poor learning leads to ineffectual generalization and differentiation. Do these findings hold up if the learning rates are more comparable across the control and App group?

      These findings were explained on the basis of comparable learning rates between control and App<sup>NL-G-F</sup> mice in the 12-month-old group (see the reply to comment #14). In addition, we have conducted simulations for different ⍺ and σ<sub>x</sub><sup>2</sup> values under the condition of a fixed learning rate, where overgeneralization and overdifferentiation still occurred (see the reply to comment #26).

      [#23] (12) Lines 391 - 393. This is a confusing sentence. "These results suggest that App NL-G-F mice could successfully form a spatial memory of the target hole, while the memory was less likely to be retrieved by a novel observation such as the absence of the escape box under the target hole at the probe test 1." The App mice show improved behavior across days of approaching the correct hole. Is this statement suggesting that once they've approached the target hole, the lack of the escape box leads to a reduction in the retention of that memory?

      We speculated that when the mice observed the absence of the escape box, a certain prediction error would be generated, which may have driven the memory modification. In App<sup>NL-G-F</sup> mice, such modification, either overgeneralization or overdifferentiation, could render the memory of the target hole vulnerable; if overgeneralization occurred, the memory would be quickly overwritten as the goal no longer exists in this position in this maze, while if overdifferentiation occurred, a novel memory such that the goal does not exist in the maze different from previous one would be formed. In either case of misclassification, the probability of retrieving the goal position would be reduced. To reduce ambiguity in this sentence, we have revised the description in the manuscript Line 432-434 as follows: “These results suggest that App<sup>NL-G-F</sup> mice could successfully form a spatial memory of the target hole, while they did not retrieve the spatial memory of the target hole as strongly as control mice when they observed the absence of the escape box during the probe test.”

      [#24] (13) The connection between the results of Barnes maze and the fear learning paradigm is weak. How can changes in overgeneralization due to a reduction in the creation of inferred states and differentiation due to a reduced σx lead to the observations in the Barnes maze experiment?

      We extrapolated our interpretation of the reinstatement modeling to behaviors in a different behavioral task, to explore the explanatory power of the latent cause framework formalizing mechanisms of associative learning and memory modification. Here, we explain the results of the reversal Barnes maze paradigm in terms of the latent cause model, while referring to the reinstatement paradigm.

      Whilst we acknowledge that fear conditioning and spatial learning are not fully comparable, the reversal Barnes maze paradigm used in our study shares several key learning components with the reinstatement paradigm. 

      First, associative learning is fundamental in spatial learning (Leising & Blaisdell, 2009; Pearce, 2009). Although we did not make any specific assumptions of what kind of associations were learned in the Barnes maze, performance improvements in learning phases likely reflect trial-and-error updates of these associations involving sensory preconditioning or secondary conditioning. Second, the reversal training phases could resemble the extinction phase in the reinstatement paradigm, in that both challenge previously established memory. In terms of the latent cause model, both the reversal learning phase in the reversal Barnes maze paradigm and the extinction phase in the reinstatement paradigm induce a mismatch of the internal state. This process likely introduces large prediction errors, triggering memory modification to reconcile competing memories.

      Under the latent cause framework, we posit that the mice would either infer new memories or modify existing memories for the unexpected observations in the Barnes maze (e.g., changed location or absence of escape box) as in the reinstatement paradigm, but learn a larger number of association rules between stimuli in the maze compared to those in the reinstatement. In the reversal Barnes maze paradigm, the animals would infer that a latent cause generates the stimuli in the maze at certain associative weights in each trial, and would adjust behavior by retaining competing memories.

      Both overgeneralization and overdifferentiation could explain the lower exploration time of the target hole in the App<sup>NL-G-F</sup> mice in probe test 1. In the case of overgeneralization, the mice would overwrite the existing spatial memory of the target hole with a memory that the escape box is absent. In the case of overdifferentiation, the mice would infer a new memory such that the goal does not exist in the novel field, in addition to the old memory where the goal exists in the previous field. In both cases, the App<sup>NL-G-F</sup> mice would not infer that the location of the goal is fixed at a particular point and would fail to retain competing spatial memories of the goal, leading them to rely on a less precise, non-spatial strategy to solve the task.

      Since there is no established way to formalize the Barnes maze learning in the latent cause model, we did not directly apply the latent cause model to the Barnes maze data. Instead, we used the view above to explore common processes in memory modification between the reinstatement and the Barnes maze paradigm. 

      The above description was added to the manuscript on page 13 (Line 410-414) and page 19-20 (Line 600-602, 626-639).

      [#25] (14) In the fear conditioning task, it may be valuable to separate responding to the context and the cue at the time of the final test. The mice can learn about the context during the reinstatement, but there must be an inference to the cue as it's not present during the reinstatement phase. This would provide an opportunity for the model to perhaps access a prior state that was formed during acquisition. This would be more in line with the original proposal by Gershman et al. 2017 with spontaneous recovery.

      Please refer to the reply to comment #13 regarding separating the response to context in test 3.  

      Reviewer #3 (Public review):

      Summary:

      This paper seeks to identify underlying mechanisms contributing to memory deficits observed in Alzheimer's disease (AD) mouse models. By understanding these mechanisms, they hope to uncover insights into subtle cognitive changes early in AD to inform interventions for early-stage decline.

      Strengths:

      The paper provides a comprehensive exploration of memory deficits in an AD mouse model, covering the early and late stages of the disease. The experimental design was robust, confirming age-dependent increases in Aβ plaque accumulation in the AD model mice and using multiple behavior tasks that collectively highlighted difficulties in maintaining multiple competing memory cues, with deficits most pronounced in older mice.

      In the fear acquisition, extinction, and reinstatement task, AD model mice exhibited a significantly higher fear response after acquisition compared to controls, as well as a greater drop in fear response during reinstatement. These findings suggest that AD mice struggle to retain the fear memory associated with the conditioned stimulus, with the group differences being more pronounced in the older mice.

      In the reversal Barnes maze task, the AD model mice displayed a tendency to explore the maze perimeter rather than the two potential target holes, indicating a failure to integrate multiple memory cues into their strategy. This contrasted with the control mice, which used the more confirmatory strategy of focusing on the two target holes. Despite this, the AD mice were quicker to reach the target hole, suggesting that their impairments were specific to memory retrieval rather than basic task performance.

      The authors strengthened their findings by analyzing their data with a leading computational model, which describes how animals balance competing memories. They found that AD mice showed somewhat of a contradiction: a tendency to both treat trials as more alike than they are (lower α) and similar stimuli as more distinct than they are (lower σx) compared to controls.

      Weaknesses:

      While conceptually solid, the model struggles to fit the data and to support the key hypothesis about AD mice's ability to retain competing memories. These issues are evident in Figure 3:

      [#26] (1) The model misses key trends in the data, including the gradual learning of fear in all groups during acquisition, the absence of a fear response at the start of the experiment, the increase in fear at the start of day 2 of extinction (especially in controls), and the more rapid reinstatement of fear observed in older controls compared to acquisition.

      We acknowledge these limitations and explain below why they arise in the latent cause model.

      a. Absence of a fear response at the start of the experiment and the gradual learning of fear during acquisition 

      In the latent cause model, the CR is derived from a sigmoidal transformation of the predicted outcome, under the assumption that its mapping to behavioral response may be nonlinear (see Equation 10 and section “Conditioned responding” in Gershman et al., 2017). 

      The magnitude of the unconditioned response (trial 1) is determined by w<sub>0</sub>, θ, and λ. An example is given in Appendix 2 – table 3. In general, a higher w<sub>0</sub> and a lower θ produce a higher trial 1 CR when other parameters are fixed. During the acquisition phase, once the expected shock exceeds θ, the CR rapidly approaches 1, and further increases in expected shock produce little change in CR. This rapid increase was also evident in the spontaneous recovery simulation (Figure 11) in Gershman et al. (2017). The steepness of this increase is modulated by λ, such that a higher value produces a shallower slope. This is a characteristic of the latent cause model when the CR is assumed to follow a sigmoid function of expected shock; as Gershman et al. (2017) mentioned, the ordinal relationship among CRs is maintained with or without the sigmoid function. If one assumes that the CR should be proportional to the expected shock, the model can reproduce the gradual response as a linear combination of w and the posteriors of latent causes while omitting the sigmoid transformation (Figure 3). 
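
      The saturation described above can be illustrated with a minimal sketch. It is not the fitting code used in this study. It assumes that the sigmoid takes the form of a Gaussian cumulative distribution function with threshold θ, and it treats λ as a scale (standard deviation) parameter; the function name and the example values are hypothetical.

```python
import math


def conditioned_response(expected_shock, theta, lam):
    """Map an expected shock onto a CR in [0, 1].

    Assumed form: CR = 1 - Phi(theta; mean=expected_shock, scale=lam),
    where Phi is the Gaussian cumulative distribution function.
    `lam` is treated here as a standard deviation (an assumption).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    z = (theta - expected_shock) / lam
    gaussian_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - gaussian_cdf


if __name__ == "__main__":
    theta = 0.3  # hypothetical threshold
    for lam in (0.05, 0.5):  # hypothetical scale values
        curve = [conditioned_response(v / 10, theta, lam) for v in range(0, 11)]
        print(f"lam={lam}:", [round(cr, 3) for cr in curve])
```

      With a small λ the simulated CR jumps from near 0 to near 1 as soon as the expected shock crosses θ, whereas a larger λ spreads the same change over a wider range of expected shock.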

      b. Increase in fear at the start of day 2 extinction

      This point is partially reproduced by the latent cause model. As shown in Figure 3, trial 24 (the first trial of day 2 extinction) showed an increase in both the posterior probability of the latent cause retaining fear memory and the simulated CRs in all groups except the 6-month-old control group, though the increase in CR was small due to the sigmoid transformation (see above). The latent cause model explains this as follows: the 24 h interval between extinction 1 and 2 decreases the prior of the previously inferred latent cause, leading to an increase in the priors of other latent causes. 

      Unlike the other groups, the 6-month-old control group exhibited an increase in observed CR not at trial 24 but at trial 25 (Figure 3A). The latent cause model failed to reproduce this, as there was no increase in posterior probability at trial 24 (Figure 3A). This could be partially explained by the low value of g, which counteracts the effect of the time interval between days: a lower g keeps the priors of the latent causes at the same level as in the previous trial. Despite some failures in capturing this effect, our fitting policy was set to optimize prediction among the test trials, given our primary purpose of explaining reinstatement.

      c. More rapid reinstatement of fear observed in older controls compared to acquisition

      We would like to point out that this was replicated by the latent cause model as shown in Figure 3 – figure supplement 1C. The DI between test 3 and test 1 calculated from the simulated CR was significantly higher in 12-month-old control than in App<sup>NL-G-F</sup> mice (cf. Figure 2C to E).  

      [#27] (2) The model attributes the higher fear response in controls during reinstatement to a stronger association with the context from the unsignaled shock phase, rather than to any memory of the conditioned stimulus from acquisition. These issues lead to potential overinterpretation of the model parameters. The differences in α and σx are being used to make claims about cognitive processes (e.g., overgeneralization vs. overdifferentiation), but the model itself does not appear to capture these processes accurately. The authors could benefit from a model that better matches the data and that can capture the retention and recollection of a fear memory across phases.

      First, we would like to clarify that the latent cause model explains the reinstatement not only by the extinction latent cause with increased w<sub>context</sub> but also by the acquisition latent cause with preserved w<sub>CS</sub> and w<sub>context</sub> (see also reply to comment #13). Second, the latent cause model primarily attributes the higher fear reinstatement in control mice to a lower number of latent causes inferred after extinction (Figure 4E) and a higher w<sub>context</sub> in the extinction latent cause (Figure 4G). We noted that there was a trend toward significance in the posterior probability of latent causes inferred after extinction (Figure 4E), which in turn influences those of acquisition latent causes. Although the posterior probability of the acquisition latent cause appeared trivial and no significant difference was detected between control and App<sup>NL-G-F</sup> mice (Figure 4C), it was suppressed by new latent causes in App<sup>NL-G-F</sup> mice (Author response image 6).

      This indicates that App<sup>NL-G-F</sup> mice retrieved acquisition memory less strongly than control mice. Therefore, we argue that the latent cause model attributed a higher fear response in control during reinstatement not solely to the stronger association with the context but also to CS fear memory from acquisition. Although we tested whether additional models fit the reinstatement data in individual mice, these models did not satisfy our fitting criteria for many mice compared to the latent cause model (see also reply to comment #4 and #28).

      Author response image 6.

      Posterior probability of acquisition, extinction, and after extinction latent causes in test 3. The values within each bar indicate the mean posterior probability of acquisition latent cause (darkest shade), extinction latent cause (medium shade), and latent causes inferred after extinction (lightest shade) in test 3 over mice within genotype. Source data are the same as those used in Figure 4C–E (posterior of z).

      Conclusion:

      Overall, the data support the authors' hypothesis that AD model mice struggle to retain competing memories, with the effect becoming more pronounced with age. While I believe the right computational model could highlight these differences, the current model falls short in doing so.

      Reviewer #3 (Recommendations for the authors):

      [#28] Other computational models may better capture the data. Ideally, I'd look for a model that can capture the gradual learning during acquisition, and, in some mice, the inferring of a new latent cause during extinction, allowing the fear memory to be retained and referenced at the start of day 2 extinction and during later tests.

      We have further evaluated another computational model, the latent state model, and compared it with the latent cause model. The simulation of reinstatement and parameter estimation method of the latent state model were described in the Appendix.

      The latent state model proposed by Cochran and Cisler (2019) shares several concepts with the latent cause model and replicates empirical data well under certain conditions. We expect that it can also explain the reinstatement. 

      Following the same analysis flow as for the latent cause model, we estimated the parameters and simulated reinstatement in the latent state model from individual CRs and from their median. In the median freezing rate data of the 12-month-old control mice, the simulated CR replicated the observed CR well and exhibited the features that the reviewer looked for: gradual learning during acquisition and increased fear at the start of the second-day extinction (Appendix 1 – figure 1G). However, many samples were not fit well by the latent state model. The number of anomalies was generally higher than that in the latent cause model (Appendix 1 – figure 2). Within the accepted samples, the sum of squared prediction error across all trials was significantly lower in the latent state model, which resulted from lower prediction error in the acquisition trials (Appendix 1 – figure 4A and 4B). In the three test trials, the squared prediction error was comparable between the latent state model and the latent cause model except for the test 2 trials in the control group (Appendix 1 – figure 4A and 4B, rightmost panel). On the other hand, almost all accepted samples continued to infer the acquisition latent states during extinction without inferring new states (Appendix 1 – figure 5B and 5E, left panel), which differed from the ideal internal states the reviewer expected. While the fit performance of the latent state model seems to be better than that of the latent cause model, the accepted samples cannot reproduce the lower DI between test 3 and test 1 in aged App<sup>NL-G-F</sup> mice (Appendix 1 – figure 6C). These results make the latent state model less suitable for our purpose, and we therefore decided to stay with the latent cause model. It should also be noted that we did not explore the full parameter space of the latent state model; hence, we cannot rule out the possibility that alternative parameter sets could provide a better fit and explain the memory modification process well. A more comprehensive parameter search in the latent state model may be a valuable direction for future research.

      If you decide not to go with a new model, my preference would be to drop the current modeling. However, if you wish to stay with the current model, I'd like to see justification or acknowledgment of the following:

      [#29] (1) Lower bound on alpha of 1: This forces the model to infer new latent causes, but it seems that some mice, especially younger AD mice, might rely more on classical associative learning (e.g., Rescorla-Wagner) rather than inferring new causes.

      We acknowledge that the default value set in Gershman et al. (2017) is 0.1, and the constraint we set is a much higher value. However, ⍺ = 1 does not always force the model to infer new latent causes.

      In the standard form of the Chinese restaurant process (CRP), the prior that the n<sup>th</sup> observation is assigned to a new cluster is given by α / (n - 1 + α) (Blei & Gershman, 2012). When α = 1, the prior of the new cluster for the 2nd observation is 0.5; when α = 3, this prior increases to 0.75. Thus, when α > 1, the prior of the new cluster is above chance early in the sequence, which may relate to the reviewer’s concern. However, this effect diminishes as the number of observations increases. For instance, the prior of the new cluster drops to 0.1 and 0.25 for the 10th observation when α = 1 and 3, respectively. Furthermore, the prior in the latent cause model is governed not only by α but also by g, a scaling parameter for the temporal difference between successive observations (see Results in the manuscript) following the “distance-dependent” CRP, and is then normalized over all latent causes including a new latent cause. Thus, α greater than 1 does not necessarily force agents to infer a new latent cause. As shown in Appendix 2 – table 4, the number of latent causes does not inflate in each trial when α = 1. On the other hand, the high number of latent causes due to α = 2 can be suppressed when g = 0.01. More importantly, the driving force is the prediction error generated in each trial (see also comment #31 about the interaction between α and σ<sub>x</sub><sup>2</sup>). Raising the value of α per se can be viewed as increasing the probability of inferring a new latent cause, not as forcing the model to do so by a higher α alone. 
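
      The numbers quoted above follow directly from the standard CRP formula and can be checked with a short sketch. It covers only the standard CRP prior α / (n - 1 + α), not the distance-dependent variant with g that the latent cause model uses; the function name is hypothetical.

```python
def crp_new_cluster_prior(n, alpha):
    """Prior probability that the n-th observation opens a new cluster
    under the standard Chinese restaurant process: alpha / (n - 1 + alpha)."""
    if n < 1:
        raise ValueError("n must be a positive observation index")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha / (n - 1 + alpha)


if __name__ == "__main__":
    for alpha in (1, 3):
        for n in (2, 10):
            prior = crp_new_cluster_prior(n, alpha)
            print(f"alpha={alpha}, observation {n}: prior of new cluster = {prior:.2f}")
```

      The prior depends on α only through its size relative to the number of preceding observations, which is why the influence of a larger α fades as the sequence grows.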

      During parameter exploration using the median behavioral data under a wider range of α with a lower boundary at 0.1, the estimated value eventually exceeded 1. Therefore, we set the lower bound of α to 1 to reduce inefficient sampling. 

      [#30] (2) Number of latent causes: Some mice infer nearly as many latent causes as trials, which seems unrealistic.

      We set the upper boundary for the maximum number of latent causes (K) to be 36 to align with the infinite features of CRP. This allowed some mice to infer more than 20 latent causes in total. When we checked the learning curves in these mice, we found that they largely fluctuated or did not show clear decreases during the extinction (Author response image 7, colored lines). The simulated learning curves were almost flat in these trials (Author response image 7, gray lines). It might be difficult to estimate the internal states of such atypical mice if the sampling process tried to fit them by increasing the number of latent causes. Nevertheless, most of the samples have a reasonable total number of latent causes: 12-month-old control mice, Mdn = 5, IQR = 4; 12-month-old App<sup>NL-G-F</sup> mice, Mdn = 5, IQR = 1.75; 6-month-old control mice, Mdn = 7, IQR = 12.5; 6-month-old App<sup>NL-G-F</sup> mice, Mdn = 5, IQR = 5.25. These data were provided in Tables S9 and S12.  

      Author response image 7.

      Samples with a high number of latent causes. Observed CR (colored line) and simulated CR (gray line) for individual samples with a total number of inferred latent causes exceeding 20. 

      [#31] (3) Parameter estimation: With 10 parameters fitting one-dimensional curves, many parameters (e.g., α and σx) are likely highly correlated and poorly identified. Consider presenting scatter plots of the parameters (e.g., α vs σx) in the Supplement.

      We have provided the scatter plots with a correlation matrix in Figure 4 – figure supplement 1 for the 12-month-old group and Figure 5 – figure supplement 1 for the 6-month-old group. As pointed out by the reviewer, there are significant rank correlations between parameters including ⍺ and σ<sub>x</sub><sup>2</sup> in both the 6 and 12-month-old groups. However, we also noted that there are no obvious linear relationships between the parameters.

      The correlation above raises a potential problem of non-identifiability among parameters. First, we computed the variance inflation factor (VIF) for all parameters to examine the risk of multicollinearity, though we did not consider a linear regression between parameters and DI in this study. All VIF values were below the conventional threshold of 10 (Appendix 2 – table 5), suggesting that severe multicollinearity is unlikely to bias our conclusions. Second, we conducted simulations with different combinations of α, σ<sub>x</sub><sup>2</sup>, and K to clarify their contribution to the overgeneralization and overdifferentiation observed in the 12-month-old group. 
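
      For readers unfamiliar with the VIF, one common way to compute it is sketched below. This is an illustration only and not necessarily the procedure used for Appendix 2 – table 5. It relies on the identity that the VIF of each variable equals the corresponding diagonal element of the inverse Pearson correlation matrix; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def variance_inflation_factors(samples):
    """Return one VIF per column of `samples` (rows = samples, columns = parameters).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on
    all other columns. This equals the j-th diagonal element of the inverse
    of the Pearson correlation matrix.
    """
    samples = np.asarray(samples, dtype=float)
    corr = np.corrcoef(samples, rowvar=False)
    return np.diag(np.linalg.inv(corr))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, purely illustrative "parameter estimates" for 200 samples.
    a = rng.normal(size=200)
    b = rng.normal(size=200)
    c = a + b + 0.1 * rng.normal(size=200)  # nearly a linear combination of a and b
    print("independent columns:", variance_inflation_factors(np.column_stack([a, b])))
    print("with collinear column:", variance_inflation_factors(np.column_stack([a, b, c])))
```

      Because the VIF is built on linear regression, it detects linear redundancy among parameters; monotonic but nonlinear dependencies, such as those suggested by rank correlations, may not be fully reflected in it.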

      In Appendix 2 – table 6, the values of α and σ<sub>x</sub><sup>2</sup> were either their upper or lower boundary set in parameter estimation, while the value of K was selected heuristically to demonstrate its effect. Given the observed positive correlation between α and σ<sub>x</sub><sup>2</sup>, and their negative correlation with K (Figure 4 – figure supplement 1), we consider the product of K = {4, 35}, α = {1, 3} and σ<sub>x</sub><sup>2</sup> = {0.01, 3}. Among these combinations, the representative condition for the control group is α = 3, σ<sub>x</sub><sup>2</sup> = 3, and that for the App<sup>NL-G-F</sup> group is α = 1, σ<sub>x</sub><sup>2</sup> = 0.01. In the latter condition, overgeneralization and overdifferentiation, characterized by a higher test 1 CR, a lower number of acquisition latent causes (K<sub>acq</sub>), a lower test 3 CR, a lower DI between test 3 and test 1, and a higher number of latent causes after extinction (K<sub>rem</sub>), were strongly induced. 

      We found that conditions falling outside the empirical correlation, such as α = 3, σ<sub>x</sub><sup>2</sup> = 0.01, also reproduced overgeneralization and overdifferentiation. Similarly, the combination α = 1, σ<sub>x</sub><sup>2</sup> = 3 exhibited control-like behavior when K = 4 but shifted toward App<sup>NL-G-F</sup>-like behavior when K = 36. The effect of K was also evident when α = 3 and σ<sub>x</sub><sup>2</sup> = 3, where K = 36 led to overdifferentiation. We note that these conditions were artificially set and are likely not biologically plausible. These results underscore the non-identifiability concern raised by the reviewer. Therefore, we acknowledge that merely attributing overgeneralization to lower α or overdifferentiation to lower σ<sub>x</sub><sup>2</sup> may be overly reductive. Instead, these patterns likely arise from the joint effect of α, σ<sub>x</sub><sup>2</sup>, and K. We have revised the manuscript accordingly in Results and Discussion (page 11-13, 18-19).

      [#32] (4) Data normalization: Normalizing the data between 0 and 1 removes the interpretability of % freezing, making mice with large changes in freezing seem similar to mice with small changes.

      As we describe in our reply to comment #26, the conditioned response in the latent cause model was scaled between 0 and 1, and we assume 0 and 1 mean the minimal and maximal CR within each mouse, respectively. Furthermore, although we initially tried to fit simulated CRs to raw CRs, we found that the fitting level was low due to the individual difference in the degree of behavioral expression: some mice exhibited a larger range of CR, while others showed a narrower one. Thus, we decided to normalize the data. We agree that this processing will make the mice with high changes in freezing% indistinguishable from those with low changes. However, the freezing% changes within the mouse were preserved and did not affect the discrimination index.
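
      The trade-off described above can be illustrated with a minimal sketch of within-mouse min-max scaling, in which 0 and 1 correspond to the minimal and maximal CR of each mouse. The function name and the freezing values are hypothetical, and the normalization in the actual analysis may differ in detail.

```python
def normalize_within_mouse(freezing):
    """Rescale one mouse's freezing rates so that its minimum maps to 0
    and its maximum maps to 1 (min-max scaling within the mouse)."""
    lowest, highest = min(freezing), max(freezing)
    if highest == lowest:
        raise ValueError("freezing rates are constant; scaling is undefined")
    return [(value - lowest) / (highest - lowest) for value in freezing]


if __name__ == "__main__":
    # Hypothetical % freezing across three trials for two mice.
    wide_range_mouse = [10.0, 50.0, 90.0]
    narrow_range_mouse = [40.0, 50.0, 60.0]
    print(normalize_within_mouse(wide_range_mouse))
    print(normalize_within_mouse(narrow_range_mouse))
```

      In this example both mice map onto the same normalized curve: the ordering and relative spacing of trials within each mouse are kept, while the difference in absolute range between mice is lost.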

      [#33] (5) Overlooking parameter differences: Differences in parameters, like w<sub>0</sub>, that didn't fit the hypothesis may have been ignored.

      Our initial hypothesis was that internal states were altered in App<sup>NL-G-F</sup> mice, and we did not have a specific hypothesis on which parameter would contribute to such a state. We mainly focus on the parameters (1) that are significantly different between control and App<sup>NL-G-F</sup> mice and (2) that are significantly correlated with the empirical behavioral data, DI between test 3 and test 1. 

      In the 12-month-old group, besides α and σ<sub>x</sub><sup>2</sup>, w<sub>0</sub> and K showed marginal p-values in the Mann-Whitney U test (Table S7) and moderate correlation with the DI (Table S8). While differences in K were already discussed in the manuscript, in the previous manuscript we missed the point that w<sub>0</sub> could contribute to the differences in w between control and App<sup>NL-G-F</sup> mice (Figure 4G). We explain the contribution of w<sub>0</sub> to the reinstatement results here. When other parameters are fixed, a higher w<sub>0</sub> would lead to a higher CR in test 3, because a higher w<sub>0</sub> would allow w<sub>context</sub> to increase through the unsignaled shock, leading to reinstatement (Appendix 2 – table 7). It is likely that a higher w<sub>0</sub> was sampled through the parameter estimation in the 12-month-old control group but not in the App<sup>NL-G-F</sup> group. On the other hand, the number of latent causes is not sensitive to w<sub>0</sub> when other parameters are fixed at the initial guess value (Appendix 2 – table 1), suggesting that w<sub>0</sub> makes a small contribution to the memory modification process. 

      Thus, we speculate that although the difference in w<sub>0</sub> between control and App<sup>NL-G-F</sup> mice may arise from the sampling process, resulting in a positive correlation with DI between test 3 and test 1, its contribution to diverged internal states would be smaller relative to α or σ<sub>x</sub><sup>2</sup> as a wide range of w<sub>0</sub> has no effect on the number of latent causes (Appendix 2 – table 7). We have added the discussion of differences in w<sub>0</sub> in the 12-month-old group in manuscript Line 357-359.

      In the 6-month-old group, besides α and σ<sub>x</sub><sup>2</sup>, θ is significantly higher in the AD mice group (Table S10) but not correlated with the DI (Table S11). We have already discussed this point in the manuscript.  

      [#34] (6) Initial response: Higher initial responses in the model at the start of the experiment may reflect poor model fit.

      Please refer to our reply to comment #26 for our explanation of what contributes to high initial responses in the latent cause model.

      In addition, achieving a good fit for the acquisition CRs was not our primary purpose, as the response measured in the acquisition phase includes not only a conditioned response to the CS and context but also an unconditioned response to the novel stimuli (CS and US). This mixed response presumably increased the variance of the measured freezing rate over individuals, therefore we did not cover the results in the discussion.

      Rather, we favor models that at least replicate the establishment of conditioning, extinction, and reinstatement of fear memory in order to explain the memory modification process. As we mentioned in the reply to comment #4, the alternative models, the latent state model and the Rescorla-Wagner model, failed to replicate the observations (cf. Figure 3 – figure supplement 1A-1C). Thus, we chose to retain the latent cause model, as it aligns better with the purpose of this study. 

      [#35] In addition, please be transparent if data is excluded, either during the fitting procedure or when performing one-way ANCOVA. Avoid discarding data when possible, but if necessary, provide clarity on the nature of excluded data (e.g., how many, why were they excluded, which group, etc?).

      We clarify the information of excluded data as follows. We had 25 mice for the 6-month-old control group, 26 mice for the 6-month-old App<sup>NL-G-F</sup> group, 29 mice for the 12-month-old control group, and 26 mice for the 12-month-old App<sup>NL-G-F</sup> group (Table S1). 

      Our first exclusion procedure was applied to the freezing rate data in the test phase. If a mouse had a freezing rate outside of the 1.5 IQR in any of the test phases, it was regarded as an outlier and removed from the analysis (see Statistical analysis in Materials and Methods). One mouse in the 6-month-old control group, one mouse in the 6-month-old App<sup>NL-G-F</sup> group, five mice in the 12-month-old control group, and two mice in the 12-month-old App<sup>NL-G-F</sup> group were excluded.
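
      As an illustration of this kind of rule, the sketch below flags a mouse when its freezing rate in any test phase falls outside the conventional Tukey fences (Q1 - 1.5 × IQR, Q3 + 1.5 × IQR) computed within its group. Reading “outside of the 1.5 IQR” as Tukey fences is an assumption, as are the function name, the quartile method (NumPy's default linear interpolation), and the example values.

```python
import numpy as np


def flag_outlier_mice(freezing, whisker=1.5):
    """Flag mice whose freezing rate is outside the IQR fences in any test.

    `freezing` has shape (n_mice, n_tests) for a single group.
    Fences are computed per test phase (column) as
    [Q1 - whisker * IQR, Q3 + whisker * IQR].
    Returns a boolean array of length n_mice (True = outlier).
    """
    freezing = np.asarray(freezing, dtype=float)
    q1, q3 = np.percentile(freezing, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    outside = (freezing < lower) | (freezing > upper)
    return outside.any(axis=1)


if __name__ == "__main__":
    # Hypothetical % freezing for 7 mice (rows) in tests 1-3 (columns).
    group = [
        [10, 20, 30],
        [12, 22, 32],
        [11, 21, 31],
        [13, 23, 33],
        [12, 22, 32],
        [11, 21, 31],
        [12, 90, 32],  # extreme value in test 2
    ]
    print(flag_outlier_mice(group))
```

      Flagging on any single test phase is the stricter choice: a mouse that behaves typically in two tests is still removed if it is extreme in the third.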

      Our second exclusion procedure was applied during the fitting and parameter estimation (see parameter estimation in Materials and Methods). We have provided the number of anomaly samples during parameter estimation in Appendix 1 – figure 2.   

      Lastly, we would like to state that all the sample sizes written in the figure legends do not include outliers detected through the exclusion procedure mentioned above.

      [#36] Finally, since several statistical tests were used and the differences are small, I suggest noting that multiple comparisons were not controlled for, so p-values should be interpreted cautiously.

      We have provided power analyses in Tables S21 and S22 with methods described in the manuscript (Line 897-898) and added a note that not all of the multiple comparisons were corrected for in the manuscript (Line 898-899).

      References cited in the response letter only 

      Bellio, T. A., Laguna-Torres, J. Y., Campion, M. S., Chou, J., Yee, S., Blusztajn, J. K., & Mellott, T. J. (2024). Perinatal choline supplementation prevents learning and memory deficits and reduces brain amyloid Aβ42 deposition in App<sup>NL-G-F</sup> Alzheimer’s disease model mice. PLOS ONE, 19(2), e0297289. https://doi.org/10.1371/journal.pone.0297289

      Blei, D. M., & Frazier, P. I. (2011). Distance Dependent Chinese Restaurant Processes. Journal of Machine Learning Research, 12(74), 2461–2488.

      Cochran, A. L., & Cisler, J. M. (2019). A flexible and generalizable model of online latent-state learning. PLOS Computational Biology, 15(9), e1007331. https://doi.org/10.1371/journal.pcbi.1007331

      Curiel Cid, R. E., Crocco, E. A., Duara, R., Vaillancourt, D., Asken, B., Armstrong, M. J., Adjouadi, M., Georgiou, M., Marsiske, M., Wang, W., Rosselli, M., Barker, W. W., Ortega, A., Hincapie, D., Gallardo, L., Alkharboush, F., DeKosky, S., Smith, G., & Loewenstein, D. A. (2024). Different aspects of failing to recover from proactive semantic interference predicts rate of progression from amnestic mild cognitive impairment to dementia. Frontiers in Aging Neuroscience, 16. https://doi.org/10.3389/fnagi.2024.1336008

      Giustino, T. F., Fitzgerald, P. J., Ressler, R. L., & Maren, S. (2019). Locus coeruleus toggles reciprocal prefrontal firing to reinstate fear. Proceedings of the National Academy of Sciences, 116(17), 8570–8575. https://doi.org/10.1073/pnas.1814278116

      Gu, X., Wu, Y.-J., Zhang, Z., Zhu, J.-J., Wu, X.-R., Wang, Q., Yi, X., Lin, Z.-J., Jiao, Z.-H., Xu, M., Jiang, Q., Li, Y., Xu, N.-J., Zhu, M. X., Wang, L.-Y., Jiang, F., Xu, T.-L., & Li, W.-G. (2022). Dynamic tripartite construct of interregional engram circuits underlies forgetting of extinction memory. Molecular Psychiatry, 27(10), 4077–4091. https://doi.org/10.1038/s41380-022-01684-7

      Lacagnina, A. F., Brockway, E. T., Crovetti, C. R., Shue, F., McCarty, M. J., Sattler, K. P., Lim, S. C., Santos, S. L., Denny, C. A., & Drew, M. R. (2019). Distinct hippocampal engrams control extinction and relapse of fear memory. Nature Neuroscience, 22(5), 753–761. https://doi.org/10.1038/s41593-019-0361-z

      Loewenstein, D. A., Curiel, R. E., Greig, M. T., Bauer, R. M., Rosado, M., Bowers, D., Wicklund, M., Crocco, E., Pontecorvo, M., Joshi, A. D., Rodriguez, R., Barker, W. W., Hidalgo, J., & Duara, R. (2016). A Novel Cognitive Stress Test for the Detection of Preclinical Alzheimer’s Disease: Discriminative Properties and Relation to Amyloid Load. The American Journal of Geriatric Psychiatry: Official Journal of the American Association for Geriatric Psychiatry, 24(10), 804–813. https://doi.org/10.1016/j.jagp.2016.02.056

      Loewenstein, D. A., Greig, M. T., Curiel, R., Rodriguez, R., Wicklund, M., Barker, W. W., Hidalgo, J., Rosado, M., & Duara, R. (2015). Proactive Semantic Interference Is Associated With Total and Regional Abnormal Amyloid Load in Non-Demented Community-Dwelling Elders: A Preliminary Study. The American Journal of Geriatric Psychiatry: Official Journal of the American Association for Geriatric Psychiatry, 23(12), 1276–1279. https://doi.org/10.1016/j.jagp.2015.07.009

      Valles-Salgado, M., Gil-Moreno, M. J., Curiel Cid, R. E., Delgado-Álvarez, A., Ortega-Madueño, I., Delgado-Alonso, C., Palacios-Sarmiento, M., López-Carbonero, J. I., Cárdenas, M. C., Matías-Guiu, J., Díez-Cirarda, M., Loewenstein, D. A., & Matias-Guiu, J. A. (2024). Detection of cerebrospinal fluid biomarkers changes of Alzheimer’s disease using a cognitive stress test in persons with subjective cognitive decline and mild cognitive impairment. Frontiers in Psychology, 15. https://doi.org/10.3389/fpsyg.2024.1373541

      Zaki, Y., Mau, W., Cincotta, C., Monasterio, A., Odom, E., Doucette, E., Grella, S. L., Merfeld, E., Shpokayte, M., & Ramirez, S. (2022). Hippocampus and amygdala fear memory engrams reemerge after contextual fear relapse. Neuropsychopharmacology, 47(11), 1992–2001. https://doi.org/10.1038/s41386-022-01407-0

    1. eLife Assessment

      This manuscript describes an AI-automated microscopy-based approach to characterize both bacterial and host cell responses associated with Shigella infection of epithelial cells. The methodology is compelling and should be helpful for investigators studying a variety of intracellular pathogens. The authors have acquired important findings regarding host and bacterial responses in the context of infection, which should be followed up with further mechanistic-based studies.

    2. Reviewer #2 (Public review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacterial motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

    3. Reviewer #4 (Public review):

      Summary

      In this study, López-Jiménez and colleagues demonstrate the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration - using cell biology-based approaches- that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are made with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608).

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration - using cell biology-based approaches- that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm on technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are made with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell lines (eg. epithelial cells of colonic origin, macrophages, primary cells). 

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer’s concern about the lack of follow up work on observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024) considering their similar results on protein translation arrest, and this reference has been more fully discussed in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella and different bacteria species (eg. Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. They perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts)- that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis, including the innovative analytical workflows, is very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and excitement for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer’s comments, we avoid overselling our data in the revised version of the manuscript.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and therefore some of our experimental observations would benefit from follow up mechanistic studies in the future. We highlight our vision for broader applicability in response to weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescently labelled histones) would make our conclusions more robust.
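
      The difference between a SUM-projection and a max projection can be sketched as follows. This is a minimal illustration, assuming a z-stack stored as nested lists; the toy intensities and function names are hypothetical and are not the analysis code used in the study.

```python
def sum_projection(stack):
    """Collapse a z-stack (list of 2D slices) by summing intensities along z."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[sum(z[r][c] for z in stack) for c in range(cols)] for r in range(rows)]


def max_projection(stack):
    """Collapse a z-stack by keeping the brightest value along z."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(z[r][c] for z in stack) for c in range(cols)] for r in range(rows)]


# Hypothetical 3-slice stack of 2x2 pixels (arbitrary intensity units).
toy_stack = [
    [[1, 2], [0, 4]],
    [[3, 2], [5, 4]],
    [[1, 2], [0, 4]],
]

sum_img = sum_projection(toy_stack)  # total signal through the full depth
max_img = max_projection(toy_stack)  # brightest slice only, depth is ignored

# Integrated intensity over the (hypothetical) nuclear region, here all pixels.
integrated_intensity = sum(sum(row) for row in sum_img)
```

      Because the SUM-projection accumulates signal from every slice, its integrated intensity scales with the total amount of dye in the region, whereas the max projection discards contributions from all but the brightest slice.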

      Regarding the different orientation of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in depth characterization of volumetric parameters.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To conclude that cell volume is indeed increased, the investigators should consider staining the cells with markers that demarcate cell boundaries and/or are confined to the cytosol, i.e., a cell tracker dye.

      Staining using our SEPT7 antibody enables us to define cell boundaries for cellular area measurements (Novel Figure 1 - figure supplement 1A). However, we agree with the Reviewer that staining cells with additional markers (such as a cell tracker dye) would be required to conclude that cell volume is increased. We therefore adjust our claims in the main text (lines 107-115 and 235-246).

      (2) Line 27: I understand what is meant by "recruited to actively pathogenic bacteria with increased T3SS activation." However, one could argue that intracytosolic bacteria play many different roles in pathogenesis, not just actively secreting effectors.

      T3SS secretion by cytosolic bacteria is tightly regulated and both T3SS states (active, inactive) likely contribute to the pathogenic lifestyle of S. flexneri. In agreement with this, we removed this statement from the manuscript (lines 27, 225 and 274).

      (3) Line 88: Please clarify in the text that HeLa cells are being studied.

      We explicitly mention that the epithelial cell line we study is HeLa in the main text (line 93), in addition to the Materials and methods (line 328).

      (4) Line 97: is it possible to quantify the average distance of the nuclei from the cell perimeter? This would help provide some context as to what it means to be a certain distance from the nucleus, i.e., is there another way to point out that distance from nuclei correlates with movement inward post-invasion at the periphery?

      To provide more context to the inward movement of bacteria to the cell centre, we provide calculations based on measurements in Figure 1G, I. If we approximate the shape of both the cell and the nucleus as a circle, the median radius of a HeLa cell is 31.1 µm (uninfected cell) and 36.3 µm (infected cell). Similarly, the median radius of the nucleus is 22.2 µm (uninfected cell) and 24.57 µm (infected cell).

      However, we note that Figure 1F shows distance of bacteria to the centroid of the cell, which is the geometric centre of the cell, and which does not necessarily coincide with the geometric centre of the nucleus. We also note that nuclear area increases with infection (in a bacterial dose dependent manner). Finally, we note that these measurements are performed on max projections of 3D Z-stacks. In this case we cannot fully appreciate distance to the nucleus for bacteria located above it.
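
      The circle approximation used in the response to point (4) amounts to r = sqrt(A/π) for a measured area A. A minimal sketch follows; the function name and the example area are hypothetical and are not measurements from Figure 1.

```python
import math


def equivalent_radius(area_um2):
    """Radius (µm) of the circle whose area equals area_um2 (µm²)."""
    if area_um2 < 0:
        raise ValueError("area must be non-negative")
    return math.sqrt(area_um2 / math.pi)


# Hypothetical segmented area, chosen only to illustrate the conversion.
example_area_um2 = 1000.0
example_radius_um = equivalent_radius(example_area_um2)
```

      The resulting radius carries units of length (µm), which is what allows it to be compared with distances of bacteria from the cell centroid.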

      (5) Lines 212-213 - there is no Figure 9A, B - I think this should be Figure 7A, B.

      Text has been updated (lines 216-217).

      Reviewer #2 (Recommendations For The Authors):

      Testing the analysis pipeline as a proof-of-concept question such as the comparison of caging around the laboratory strain as compared to one or a few clinical isolates or mutants of interest would help stress the relevance of this new, remarkable tool.

      We thank the Reviewer for their enthusiasm.

      Future research in the Mostowy lab will capitalise on the high-content tools generated here to explore the frequency and heterogeneity of septin cage entrapment for a wide variety of S. flexneri mutants and Shigella clinical isolates.

      The sentence in line 215 ends with "in agreement with" followed by a reference.

      Text has been updated (line 219).

      The sentence in line 217 on the correlation between caging and T3SS is not very clear.

      Text has been clarified (lines 221-223).

      There is a typo in line 219 : "protrusSions"

      Text has been updated (line 223).

      Reviewer #3 (Recommendations For The Authors):

      Major points

      The quantitative analysis approach in Figure 1 has multiple issues. Some examples:

      (1) How was the cell area estimated? Normally, a marker for the whole cell (CellMask or similar) or cells expressing GFP would be good indicators. Here it is not clear to me what was done.

      The cell area was estimated using SEPT7 antibody staining which is enriched under the cell cortex. CellProfiler was used to segment cells based on SEPT7 staining, using a propagation method from the identified nucleus based on Otsu thresholding. To provide more clarity on how this was performed, we now include a new figure (Figure 1- figure supplement 1A) showing a representative image of HeLa cells stained with SEPT7 and the corresponding cell segmentation performed with CellProfiler software, together with an updated figure legend explaining the procedure (lines 784–787).
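
      For readers unfamiliar with Otsu thresholding, the idea is to pick the intensity cut-off that maximises the variance between the background and foreground classes. The pure-Python sketch below illustrates only that step. It does not reproduce CellProfiler's implementation or the propagation from the nucleus, and the function name and toy intensities are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    Pixels with intensity <= t are treated as background, the rest as foreground.
    Expects integer intensities in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_bg = 0    # number of background pixels at the current threshold
    sum_bg = 0  # summed intensity of those background pixels
    for t in range(levels):
        n_bg += hist[t]
        sum_bg += t * hist[t]
        if n_bg == 0:
            continue  # no background yet, variance undefined
        n_fg = total - n_bg
        if n_fg == 0:
            break  # everything is background from here on
        mean_bg = sum_bg / n_bg
        mean_fg = (total_sum - sum_bg) / n_fg
        # Between-class variance, up to a constant factor of 1/total**2.
        var_between = n_bg * n_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Hypothetical flattened image: four dim background pixels, four bright stained pixels.
toy_pixels = [10, 12, 11, 13, 200, 205, 198, 202]
threshold = otsu_threshold(toy_pixels)
foreground_mask = [p > threshold for p in toy_pixels]
```

      In a real workflow the mask would be computed on the 2D SEPT7 image and then used, together with the nuclear seeds, to delineate individual cells.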

      (2) The authors use Hoechst and integrated z-projections (Figure 1 S1) as a proxy to estimate nuclear volume. Hoechst staining depends on the organization of the DNA within the nucleus and I find that the authors need to do better controls to estimate nuclear size - this would be possible with cells expressing fluorescently labeled histones, or even better with a fluorescently tagged nuclear pore/envelope marker. The current quantification approach is misleading.

      We understand Reviewer #3’s concerns about using Hoechst staining as a proxy of nuclear volume, due to potential differences in DNA organisation within the nucleus.

      Following the recommendation of Reviewer #3 in the following point 3, text has been updated (lines 107–115 and 235-246).

      (3) Was cell density assessed for the measurements? If cells are confluent, bacteria could spread between cells within 3 hrs, if cells are less dense, this does not occur. When epithelial cells are infected for some hours, they have the tendency to round up a bit (and to appear thicker in z), but a bit smaller in xy. My suggestion to the authors (as they use these findings to follow up with experiments on the underlying processes) would be to tone down their statements - eg, Hoechst staining could be simply indicated as altered, but not put in a context of size (this would require substantial control experiments).

      Local cell density was not directly measured, but the experiment was set up to infect at roughly 80% confluency (cells were seeded at 10<sup>4</sup> cells/well 2 days prior to infection in a 96-well microplate, as described in the Materials and methods section) and to ensure bacterial spread between cells.

      In agreement with Reviewer #3 we tone down statements in the main text (see response to point 2 above).

      In addition, I found Figure 1 (and parts of Figure 2) disconnected from the rest of the manuscript, and it may even be an idea to take it out of the manuscript (that could also help to deal with my feedback relating to Figure 1). I would suggest starting the manuscript with the current Figure 3 and building the biological story with a stronger focus on SEPT7 (and its links with T3 secretion and actively pathogenic bacteria) from there on. As it stands, the two parts of the manuscript are not well connected.

      We carefully considered this comment but following revisions we have not reorganised the manuscript. We believe that high-content characterisation of S. flexneri infection in Figures 1 and 2 provides insightful information about changes in host cells in response to infection. Following this, we move on to characterising intracellular bacteria (and in particular those entrapped in septin cages) in the second part of the manuscript (Figures 3-7). Similar methods were used to analyse both host and bacterial cells, and the results obtained offer complementary views on host-pathogen interactions.

      My major reservation with the experimental work of the current version of the manuscript relates to Figure 5: The analysis of the septin phenotypes in Figure 5 seems to be problematic - to me, it appears that analysis and training were done on projected image stacks. As bacteria are rod-shaped their orientation in space has an enormous impact on how the septin signal appears in a projection - this can lead to wrong interpretation of the phenotypes. The authors need to do some quantitative controls analyzing their data in 3D. To be more clear: the example "tight" (second row) shows a bacterium that appears short. It may be that it's actually longer if one looks in 3D, and the septin signal could possibly fall in the category "rings" or even "two poles".

      The deep learning training and subsequent analysis of septin-cage entrapment is done on projected Z-stacks, which presents limitations. Future work in the Mostowy lab will exploit this first study and dive deeper into 3D aspects of the data.

      To address Reviewer #3’s concern, we include a sentence explaining that this analysis was performed using 2D max projections (lines 708 and 724), as well as acknowledging its limitations in the main text (lines 259-262).

      Minor points

      The scale bar in Fig 1 is very thin.

      We corrected the scale bar in Fig. 1 to make it more visible.

      Could it be that Figure 1F is swapped with Figure1E in the description?

      Descriptions for Figure 1E and F are correct.

      Line 27: what does "actively pathogenic bacteria" mean? I propose to change the term.

      We agree with Reviewer #3 that “actively pathogenic bacteria” should be removed from the text. This update is also in agreement with Reviewer #1 (see Reviewer #1 point 2).

      Line 28: "dynamics" can be confusing as it relates to dynamic events imaged by time-lapse.

      Although we are making a snapshot of the infection process at 3 hpi, we capture asynchronous processes in both host and bacterial cells (eg. host cells infected with different bacterial loads, bacterial cells undergoing actin polymerisation or septin cage entrapment). We agree that we are not following dynamics of full events over time. However, our high content approach enables us to capture different stages of dynamic processes. To avoid confusion, we replace “dynamics” by “diverse interactions” (line 28), and we discuss the importance of follow-up studies studying microscopy timelapses (line 274).

      Paragraph 59 following: the concept of heterogeneity was investigated in some detail for viral infection by the Pelkmans group (PMID: 19710653) using advanced image analysis tools. Advanced machine-learning-based analysis was then performed on Salmonella invasion by Voznica and colleagues (PMID: 29084895). It would be great to include these somewhat "old" works here as they really paved the way for high-content imaging, and the way analyses were performed then should be also discussed in light of how analyses can be performed now with the approaches developed by the authors.

      We agree. These landmark studies have now been included in the main text (lines 71-74).

      Line 181: I do not know what "morphological conformations" means, perhaps the authors can change the wording or clarify.

      We substituted the phrase “morphological conformations” by “morphological patterns” to improve clarity in the main text (line 185).

      The authors claim (eg in the abstract) that they are measuring the dynamic infection process. To me, it appears that they look at one time-point, so no dynamic information can be extracted. I suggest that the authors tone down their claims.

      Please note our response above (Minor points, Line 28) which also refers to this question.

    1. eLife Assessment

      This study presents a computational-experimental workflow for optimizing RNA aptamers targeting SARS-CoV-2 RBD. While the integrated approach combining docking, molecular dynamics, and experimental validation shows some promise, the useful findings are limited by the extremely weak binding affinities (>100 µM KD) and restriction to a single target system. The evidence is incomplete, with experimental design issues in the antibody competition assays and a lack of specificity testing undermining confidence in the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors attempt to devise general rules for aptamer design based on structure and sequence features. The main system they are testing is an aptamer targeting a viral sequence.

      Strengths:

      The method combines a series of well-established protocols, including docking, MD, and a lot of system-specific knowledge, to design several new versions of the Ta aptamer with improved binding affinity.

      Weaknesses:

      The approach requires a lot of existing knowledge and, importantly, an already known aptamer, which presumably was found with Selex. In addition, although the aptamer may have a stronger binding affinity, it is not clear if any of it has any additional useful properties such as stability, etc.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript proposes a workflow for discovering and optimizing RNA aptamers, with application in the optimization of a SARS-CoV-2 RBD. The authors took a previously identified RNA aptamer, computationally docked it into one specific RBD structure, and searched for variants with higher predicted affinity. The variants were subsequently tested for RBD binding using gel retardation assays and competition with antibodies, and one was found to be a stronger binder by about three-fold than the founding aptamer.

      Overall, this would be an interesting study if it were performed with truly high-affinity aptamers, and specificity was shown for RBD or several RBD variants.

      Strengths:

      The computational workflow appears to mostly correctly find stronger binders, though not de novo binders.

      Weaknesses:

      (1) Antibody competition assays are reported with RBD at 40 µM, aptamer at 5 µM, and a titration of antibody between 0 and 1.2 µg. This approach does not make sense. The antibody concentration should be reported in µM. An estimation of the concentration is 0-8 pmol (from 0-1.2 µg), but that's not a concentration, so it is unknown whether enough antibody molecules were present to saturate all RBD molecules, let alone whether they could have displaced all aptamers.

      (2) These are not by any means high-affinity aptamers. The starting sequence has an estimated (not measured, since the titration is incomplete) KD of 110 µM. That's really the same as non-specific binding for an interaction between an RNA and a protein. This makes the title of the manuscript misleading. No high-affinity aptamer is presented in this study. If the docking truly presented a bound conformation of an aptamer to a protein, a sub-micromolar Kd would be expected, based on the number of interactions that they make.

      (3) The binding energies estimated from calculations and those obtained from the gel-shift experiments are vastly different, as calculated from the Kd measurements, making them useless for comparison, except for estimating relative affinities.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors attempt to devise general rules for aptamer design based on structure and sequence features. The main system they are testing is an aptamer targeting a viral sequence.

      Strengths:

      The method combines a series of well-established protocols, including docking, MD, and a lot of system-specific knowledge, to design several new versions of the Ta aptamer with improved binding affinity.

      We thank the reviewer for this accurate summary and for recognizing the strength of our integrated computational–experimental workflow in improving aptamer affinity. We will emphasize this contribution more clearly in the revised Introduction.

      Weaknesses:

      The approach requires a lot of existing knowledge and, importantly, an already known aptamer, which presumably was found with SELEX. In addition, although the aptamer may have a stronger binding affinity, it is not clear if any of it has any additional useful properties such as stability, etc.

      Thanks for these critical comments.

      (1) On the reliance on a known aptamer: We agree that our CAAMO framework is designed as a post-SELEX optimization platform rather than a tool for de novo discovery. Its primary utility lies in rationally enhancing the affinity of existing aptamers that may not yet be sequence-optimal, thereby complementing experimental technologies such as SELEX. In the revised manuscript, we plan to clarify this point more explicitly in both the Introduction and Discussion sections, emphasizing that the proposed CAAMO framework is intended to serve as a complementary strategy that accelerates the iterative optimization of lead aptamers.

      (2) On stability and developability: We also appreciate the reviewer’s important reminder that affinity alone is not sufficient for therapeutic development. We acknowledge that the present study has focused mainly on affinity optimization, and properties such as nuclease resistance, structural stability, and overall developability were not evaluated. In the revised manuscript, we will add a dedicated section highlighting the critical importance of these characteristics and outlining them as key priorities for our future research efforts.

      Reviewer #2 (Public review):

      Summary:

      This manuscript proposes a workflow for discovering and optimizing RNA aptamers, with application in the optimization of a SARS-CoV-2 RBD. The authors took a previously identified RNA aptamer, computationally docked it into one specific RBD structure, and searched for variants with higher predicted affinity. The variants were subsequently tested for RBD binding using gel retardation assays and competition with antibodies, and one was found to be a stronger binder by about three-fold than the founding aptamer. Overall, this would be an interesting study if it were performed with truly high-affinity aptamers, and specificity was shown for RBD or several RBD variants.

      Strengths:

      The computational workflow appears to mostly correctly find stronger binders, though not de novo binders.

      We thank the reviewer for the clear summary and for acknowledging that our workflow effectively prioritizes stronger binders.

      Weaknesses:

      (1) Antibody competition assays are reported with RBD at 40 µM, aptamer at 5 µM, and a titration of antibody between 0 and 1.2 µg. This approach does not make sense. The antibody concentration should be reported in µM. An estimation of the concentration is 0-8 pmol (from 0-1.2 µg), but that's not a concentration, so it is unknown whether enough antibody molecules were present to saturate all RBD molecules, let alone whether they could have displaced all aptamers.

      Thanks for your insightful comment. We have calculated that 0–1.2 µg antibody corresponds to a final concentration range of 0–1.6 µM (see Author response image 1). In practice, 1.2 µg was the maximum amount of commercial antibody that could be added under the conditions of our assay. In the revised manuscript, we plan to report all antibody quantities in molar concentrations in the Materials and Methods section for clarity and rigor.

      Author response image 1. Estimation of antibody concentration. Assuming a molecular weight of 150 kDa, dissolving 1.2 µg of antibody in a 5 µL reaction volume results in a final concentration of 1.6 µM.
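
      The unit conversion behind this estimate can be written out explicitly. The sketch below uses only the values stated above (1.2 µg, an assumed molecular weight of 150 kDa, a 5 µL reaction) and the 40 µM RBD concentration quoted by the Reviewer; the function names are illustrative.

```python
def amount_pmol(mass_ug, mw_kda):
    """Amount of substance in pmol from mass (µg) and molecular weight (kDa).

    µg / (kg/mol) = nmol, so multiply by 1000 to obtain pmol.
    """
    return mass_ug / mw_kda * 1000.0


def concentration_um(pmol, volume_ul):
    """Concentration in µM; 1 pmol per µL equals 1 µM."""
    return pmol / volume_ul


antibody_pmol = amount_pmol(mass_ug=1.2, mw_kda=150.0)  # 150 kDa is an assumed IgG mass
antibody_um = concentration_um(antibody_pmol, volume_ul=5.0)

rbd_um = 40.0  # RBD concentration in the competition assay, as quoted in the review
antibody_to_rbd_ratio = antibody_um / rbd_um
```

      Under these assumptions the maximum antibody amount corresponds to 8 pmol, matching the Reviewer's estimate, and to a molar ratio of antibody to RBD well below one.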

      As shown in Figure 5D of the main text, the purpose of the antibody–aptamer competition assay was not to achieve full saturation but rather to compare the relative competitive binding of the optimized aptamer (Ta<sup>G34C</sup>) versus the parental aptamer (Ta). Molecular interactions at this scale represent a dynamic equilibrium of binding and dissociation. While the antibody concentration may not have been sufficient to saturate all available RBD molecules, the experimental results clearly reveal the competitive binding behavior that distinguishes the two aptamers. Specifically, two consistent trends emerged:

      (1) Across all antibody concentrations, the free RNA band for Ta was stronger than that of Ta<sup>G34C</sup>, while the RBD–RNA complex band of the latter was significantly stronger, indicating that Ta<sup>G34C</sup> bound more strongly to RBD.

      (2) For Ta, increasing antibody concentration progressively reduced the RBD–RNA complex band, consistent with antibody displacing the aptamer. In contrast, for Ta<sup>G34C</sup>, the RBD–RNA complex band remained largely unchanged across all tested antibody concentrations, suggesting that the antibody was insufficient to displace Ta<sup>G34C</sup> from the complex.

      Together, these observations support the conclusion that Ta<sup>G34C</sup> exhibits markedly stronger binding to RBD than the parental Ta aptamer, in line with the predictions and objectives of our CAAMO optimization framework.

      (2) These are not by any means high-affinity aptamers. The starting sequence has an estimated (not measured, since the titration is incomplete) KD of 110 µM. That's really the same as non-specific binding for an interaction between an RNA and a protein. This makes the title of the manuscript misleading. No high-affinity aptamer is presented in this study. If the docking truly presented a bound conformation of an aptamer to a protein, a sub-micromolar Kd would be expected, based on the number of interactions that they make.

      In fact, our starting sequence (Ta) is a high-affinity aptamer, so the optimized sequences with enhanced affinity (such as Ta<sup>G34C</sup>) are undoubtedly high-affinity aptamers as well. We explain this in the points below:

      (1) Origin and prior characterization of Ta. The starting aptamer Ta (referred to as RBD-PB6-Ta in the original publication by Valero et al., PNAS 2021, doi:10.1073/pnas.2112942118) was selected through multiple positive rounds of SELEX against SARS-CoV-2 RBD, together with counter-selection steps to eliminate non-specific binders. In that study, Ta was reported to bind RBD with an IC₅₀ of ~200 nM as measured by biolayer interferometry (BLI), supporting its high affinity and specificity.

      (2) Methodological differences between EMSA and BLI measurements. We acknowledge that the discrepancy between our obtained binding affinity (K<sub>d</sub> = 110 µM) and the previously reported one (IC₅₀ ~ 200 nM) for the same Ta sequence arises primarily from methodological and experimental differences between EMSA and BLI. Namely, different experimental measurement methods can yield varied binding affinity values. While EMSA may have relatively low measurement precision, its relatively simple procedures were the primary reason for its selection in this study. Particularly, our framework (CAAMO) is designed not as a tool for absolute affinity determination, but as a post-SELEX optimization platform that prioritizes relative changes in binding affinity under a consistent experimental setup. Thus, the central aim of our work is to demonstrate that CAAMO can reliably identify variants, such as Ta<sup>G34C</sup>, that bind more strongly than the parental sequence under identical assay conditions.

      (3) Evidence of specific binding in our assays. We emphasize that the binding observed in our EMSA experiments reflects genuine aptamer–protein interactions. As shown in Figure 2G of the main text, a control RNA (Tc) exhibited no detectable binding to RBD, whereas Ta produced a clear binding curve, confirming that the interaction is specific rather than non-specific.

      (3) The binding energies estimated from calculations and those obtained from the gel-shift experiments are vastly different, as calculated from the Kd measurements, making them useless for comparison, except for estimating relative affinities.

      We thank the reviewer for raising this important point. CAAMO was developed as a post-SELEX optimization tool with the explicit goal of predicting relative affinity changes (ΔΔG) rather than absolute binding free energies (ΔG). Empirically, CAAMO correctly predicted the direction of affinity change for 5 out of 6 designed variants (e.g., ΔΔG < 0 indicates enhanced binding free energy relative to WT); such predictive power for relative ranking is highly valuable for prioritizing candidates for experimental testing. Our prior work on RNA–protein interactions likewise supports the reliability of relative affinity predictions (see: Nat Commun 2023, doi:10.1038/s41467-023-39410-8). In the revised manuscript we will explicitly state that the primary utility of CAAMO is to accurately predict affinity trends and to rank variants for follow-up, and we will moderate any statements that could be interpreted as claims about precise absolute ΔΔG values.
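The distinction between absolute and relative binding free energies can be made concrete with the standard relation ΔG = RT ln(K<sub>d</sub>/1 M). The sketch below is illustrative only: the temperature (298.15 K), the 1 M standard state, and the variant K<sub>d</sub> are assumptions introduced here, not values from the manuscript. Only the parental K<sub>d</sub> of 110 µM comes from the text above. The reported IC₅₀ is deliberately not converted, because an IC₅₀ is not a K<sub>d</sub>.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in M.

    Assumes a 1 M standard state, so dG = RT * ln(Kd).
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_delta_g(kd_variant, kd_parent, temp_k=298.15):
    """Relative free energy change (kcal/mol); negative means tighter binding."""
    return R_KCAL * temp_k * math.log(kd_variant / kd_parent)


if __name__ == "__main__":
    kd_parent = 110e-6           # parental Ta, EMSA estimate quoted in the text
    kd_variant_example = 30e-6   # HYPOTHETICAL variant Kd, for illustration only
    print(f"dG(parent)  = {delta_g(kd_parent):.2f} kcal/mol")
    print(f"ddG(example) = {delta_delta_g(kd_variant_example, kd_parent):.2f} kcal/mol")
```

Because ΔΔG depends only on the ratio of the two K<sub>d</sub> values, a systematic offset shared by both measurements within the same assay cancels out. That cancellation is the usual argument for comparing variants by relative rather than absolute affinities.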

    1. eLife Assessment

      This study is a valuable contribution to the evidence base. However, the evidence provided is incomplete as the study results only partially support the study conclusions. Addressing the methodological and reporting issues raised by the peer reviewers and properly aligning the claim made for providing a tool for early warning with the study analysis/results would improve the study quality and usefulness of its findings.

    2. Reviewer #1 (Public review):

      This is my first review of this manuscript. The authors included previous reviews for a different journal with a length of 90 and 39 pages; I did not review this reply in my assessment of the paper itself. Influenza prediction is not my area of expertise.

      A major concern is that the model is trained in the midst of the COVID-19 pandemic and its associated restrictions and validated on 2023 data. The situation before, during, and after COVID is fluid, and one may not be representative of the other. The situation in 2023 may also not have been normal and reflective of 2024 onward, both in terms of the amount of testing (and positives) and measures taken to prevent the spread of these types of infections. A further worry is that the retrospective prospective split occurred in October 2020, right in the first year of COVID, so it will be impossible to compare both cohorts to assess whether grouping them is sensible.

      The outcome of interest is the number of confirmed influenza cases. This is not only a function of weather, but also of the amount of testing. The amount of testing is also a function of historical patterns. This poses the real risk that the model confirms historical opinions through increased testing in those higher-risk periods. Of course, the models could also be run to see how meteorological factors affect testing and the percentage of positive tests. The results only deal with the number of positive tests (only the overall number of tests is noted briefly), which means there is no way to assess how reasonable and/or variable these other measures are. This is especially concerning as there was massive testing for respiratory viruses during COVID in many places, possibly including China.

      (1) Although the authors note a correlation between influenza and the weather factors, they do not discuss some of the high correlations between weather factors (e.g., solar radiation and UV index). Because of the many weather factors, those plots are hard to parse.

      (2) The authors do not actually compare the results of both methods and what the LSTM adds.

      Minor comments:

      (3) The methods are long and meandering. They could be cleaned up and shortened. E.g., there is no need for 30 lines on PCR testing; the study area should come before the study design. The authors discuss similar elements in multiple places; this whole section can be shortened considerably without affecting the content.

      (4) How reliable is the "Our World in Data" website for subnational coverage of restrictions? Some of the authors are from Putian and should be able to confirm the accuracy for both studied areas.

      (5) Figure 2A is hard to parse; it would make more sense to plot these as line plots (y=count, x=month).

    3. Reviewer #2 (Public review):

      Summary:

      The study's aim, to assess the associations between meteorological drivers and influenza, is important although not new. The authors used only 6 years of surveillance data and deep learning models, combining distributed lag non-linear models (DLNM) with Bayesian-optimized LSTM neural networks for predictive modeling. The key interest in this area is to explore subtropical locations, where influenza is less common and circulates year-round. The authors further claimed that such an association could provide an early warning in the community. In this respect, the current manuscript has several areas for improvement, and some claims need clarification, as I list here.

      Strengths:

      Study design based on a prospective cohort to analyse the data for retrospective outcomes.

      Weaknesses:

      (1) The rationale of the study is not clearly stated.

      (2) Several issues with methodological and data integration should be clarified.

      (3) Validation of the models is not presented clearly.

      (4) The claim for providing tools for 'early warning' was not validated by analysis and results.

    4. Author response:

      Reviewer #1 (Public review):

      A major concern is that the model is trained in the midst of the COVID-19 pandemic and its associated restrictions and validated on 2023 data. The situation before, during, and after COVID is fluid, and one may not be representative of the other. The situation in 2023 may also not have been normal and reflective of 2024 onward, both in terms of the amount of testing (and positives) and measures taken to prevent the spread of these types of infections. A further worry is that the retrospective prospective split occurred in October 2020, right in the first year of COVID, so it will be impossible to compare both cohorts to assess whether grouping them is sensible.

      We fully concur with the reviewer that the COVID-19 pandemic represents a profound confounding factor that fundamentally impacts the interpretation and generalizability of our model. This is a critical point that deserves a more thorough treatment. In the revised manuscript, we will add a dedicated subsection in the Discussion to explicitly analyze the pandemic’s impact. We will reframe our model’s contribution not as a universally generalizable tool for a hypothetical “normal” future, but as a robust framework demonstrated to capture complex epidemiological dynamics under the extreme, non-stationary conditions of a real-world public health crisis. We will argue that its strong performance on the 2023 validation data, a unique post-NPI “rebound” year, specifically showcases its utility in modeling volatile periods.

      The outcome of interest is the number of confirmed influenza cases. This is not only a function of weather, but also of the amount of testing. The amount of testing is also a function of historical patterns. This poses the real risk that the model confirms historical opinions through increased testing in those higher-risk periods. Of course, the models could also be run to see how meteorological factors affect testing and the percentage of positive tests. The results only deal with the number of positive tests (only the overall number of tests is noted briefly), which means there is no way to assess how reasonable and/or variable these other measures are. This is especially concerning as there was massive testing for respiratory viruses during COVID in many places, possibly including China.

      The reviewer raises a crucial point regarding surveillance bias, which is inherent in studies using reported case data. We acknowledge this limitation and will address it more transparently.

      (1) Clarification of Available Data: Our manuscript states that over the six-year period, a total of 20,488 ILI samples were tested, yielding 3,155 positive cases (line 471; Figure 1). We will make this denominator more prominent in the Methods section. However, the reviewer is correct that our models for Putian and the external validation for Sanming utilize the daily positive case counts as the outcome. The reality of our surveillance data source is that while we have the aggregate total of tests over six years, obtaining a reliable daily denominator of all respiratory virus tests conducted (not just for ILI patients as per the surveillance protocol) is not feasible. This is a common constraint in real-world public health surveillance systems.

      (2) Justification and Discussion: We will add a detailed paragraph to the Limitations section to address this. We will justify our use of case counts as it is the most direct metric for assessing public health burden and planning resource allocation (e.g., hospital beds, antivirals). We will also explain that modeling the positivity rate presents its own challenges, as the ILI denominator is also subject to biases (e.g., shifts in healthcare-seeking behavior, co-circulation of other pathogens causing similar symptoms). We will thus frame our work as forecasting the direct surveillance signal that public health officials monitor daily.

      Although the authors note a correlation between influenza and the weather factors, they do not discuss some of the high correlations between weather factors (e.g., solar radiation and UV index). Because of the many weather factors, those plots are hard to parse.

      This is an excellent point. Our preliminary analysis (Supplementary Figure S2) indeed confirms a strong positive correlation between solar radiation and the UV index. Perhaps the reviewer overlooked the contents of the supplementary information document. We have included the figure for their review. Our original discussion did explicitly address this multicollinearity, summarized as follows: We acknowledge the high correlation between certain meteorological variables. We then explain that our two-stage modeling approach is designed to mitigate this issue. In the first stage, the DLNM models assess the impact of each variable individually, thus isolating their non-linear and lagged effects without being confounded by interactions. In the second stage, the LSTM network, by its nature, is a powerful non-linear function approximator that is robust to multicollinearity and can learn the complex, interactive relationships between all input features, including correlated ones.

      Figure S2. Scatterplot matrix illustrating correlations between Influenza cases and meteorological factors. This comprehensive scatterplot matrix visualizes the relationships between influenza-like illness (ILI) cases, influenza A and B cases, and multiple meteorological variables, including average temperature, humidity, precipitation, wind speed, wind direction, solar radiation, and ultraviolet (UV) index. The figure is composed of three distinct sections that collectively provide an in-depth analysis of these relationships:

      (1) Upper-right triangle: This section presents a Pearson correlation coefficient matrix, with color intensity reflecting the strength of correlations between the variables. Red cells represent positive correlations, while green cells represent negative correlations. The closer the coefficient is to 1 or -1, the darker the cell and the stronger the correlation, with statistically significant correlations marked by asterisks. This matrix allows for a rapid identification of notable relationships between influenza cases and meteorological factors.

      (2) Lower-left triangle: This section contains scatterplots of pairwise comparisons between variables. These scatterplots facilitate the visual identification of potential linear or non-linear relationships, as well as any outliers or anomalies. This visualization is essential for evaluating the nature of interactions between meteorological factors and influenza cases.

      (3) Diagonal: The diagonal displays the density distribution curves for each individual variable. These curves provide an overview of the distribution characteristics of each variable, revealing central tendencies, variance, and any skewness present in the data.
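The kind of pairwise screening that the correlation matrix in Figure S2 supports can be sketched in a few lines. This is an illustrative sketch only: the data values are synthetic, the variable names and the 0.8 threshold are assumptions made here, and nothing below reproduces the authors' analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# SYNTHETIC demo values, chosen only to illustrate the screening step.
demo = {
    "solar_radiation": [10, 12, 15, 18, 20, 22],
    "uv_index": [3, 4, 5, 6, 7, 8],
    "humidity": [80, 60, 75, 65, 85, 70],
}

if __name__ == "__main__":
    for a, b, r in correlated_pairs(demo):
        print(f"{a} vs {b}: r = {r:.3f}")
```

In this toy data set only the solar radiation and UV index pair exceeds the threshold. A screen like this identifies which predictors carry overlapping information; it does not show whether a downstream model handles that overlap well.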

      The authors do not actually compare the results of both methods and what the LSTM adds.

      We thank the reviewer for this comment and realize we may not have signposted the comparison clearly enough. Our manuscript does present a direct comparison between the LSTM and ARIMA models in the Results section (lines 737-745) and Table 2, where performance metrics (MAE, RMSE, MAPE, SMAPE) for both models on the 2023 validation set are detailed, showing LSTM’s superior performance, particularly for Influenza A. Furthermore, Figure 6 (panels A and B) visualizes the LSTM’s predictions against observed values, and Supplementary Figure S3 does the same for the ARIMA model, allowing for a visual comparison of their fit.

      To address the reviewer’s concern, in the revised manuscript, we will:

      (1) Add a more explicit comparative statement in the Results section, directly contrasting the key metrics and highlighting the LSTM’s advantages in capturing peak activities.

      (2) Consider combining the visualizations from Figure 6 and Supplementary Figure S3 into a single, more powerful comparative figure that shows the observed data, the LSTM predictions, and the ARIMA predictions on the same plot.
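The four error metrics named in the response (MAE, RMSE, MAPE, SMAPE) can be computed side by side for any two sets of forecasts. The sketch below uses common textbook definitions. The manuscript's exact formulas are not given here, so the handling of zero counts in MAPE and SMAPE is an assumption, and the example numbers are made up.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error (%).

    ASSUMPTION: days with zero observed cases are skipped, because the
    percentage error is undefined there.
    """
    terms = [abs(p - o) / abs(o) for o, p in zip(obs, pred) if o != 0]
    return 100.0 * sum(terms) / len(terms)


def smape(obs, pred):
    """Symmetric MAPE (%), using 2|p - o| / (|o| + |p|).

    ASSUMPTION: a day where both values are zero contributes zero error.
    """
    terms = []
    for o, p in zip(obs, pred):
        denom = abs(o) + abs(p)
        terms.append(0.0 if denom == 0 else 2.0 * abs(p - o) / denom)
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    observed = [10, 20, 30]    # made-up daily case counts
    model_a = [12, 18, 33]     # made-up forecasts from one model
    for name, fn in [("MAE", mae), ("RMSE", rmse), ("MAPE", mape), ("SMAPE", smape)]:
        print(f"{name}: {fn(observed, model_a):.3f}")
```

Running the same four functions on each model's hold-out forecasts gives a like-for-like comparison. The choice of zero-handling rule matters for sparse daily counts and should be reported alongside the numbers.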

      Meandering methods; reliability of “Our World in Data”; Figure 2A is hard to parse.

      We will address these points comprehensively.

      (3) Methods: We will significantly streamline and restructure the Methods section. We also wish to provide context that the manuscript’s current structure reflects an effort to incorporate feedback from multiple rounds of peer review across different journals, which may have led to some repetition. We will perform a thorough edit to improve its conciseness and logical flow.

      (4) Data Reliability: The reviewer raises a crucial and highly insightful question regarding the validity of using a national-level index to represent local public health interventions. This is a critical aspect of our model’s construction, and we are grateful for the opportunity to provide a more thorough justification.

      We acknowledge that the ideal variable would be a daily, quantitative, city-level index of non-pharmaceutical interventions (NPIs). However, the practical reality of the data landscape in China is that such granular, publicly accessible databases for subnational regions do not exist. Given this constraint, our choice of the Our World in Data (OWID) national stringency index was the result of a careful consideration process, and we believe it serves as the best available proxy for our study context.

      In the revised manuscript, we will significantly expand the Methods section to articulate our rationale, which is threefold:

      National Policy Coherence: During the COVID-19 pandemic in mainland China, core NPIs, particularly mandatory face-covering policies in shared public spaces, were implemented with a high degree of national uniformity. While local governments had some autonomy, they operated within a centrally defined framework, ensuring a baseline level of policy consistency across the country.

      Local Context Alignment: A key factor supporting the use of this national proxy is the specific epidemiological context of Putian during the study period. For the vast majority of the pandemic, Putian was classified as a low-risk area with only sporadic COVID-19 cases. Consequently, the city’s public health measures consistently aligned with the standard national guidelines. It did not experience prolonged or exceptionally strict local lockdowns that would cause a significant deviation from the national-level policy trends captured by the OWID index.

      Validation by Local Public Health Experts: Most critically, and to directly address your suggestion, our co-authors from the Putian Center for Disease Control and Prevention have meticulously reviewed the OWID stringency index against their on-the-ground, institutional knowledge of the mandates that were in effect. They have confirmed that the categorical levels (0-4) and the temporal trends of the OWID index provide a faithful representation of the public health restrictions concerning face coverings as experienced by the population of Putian.

      Therefore, we will revise our manuscript to make it clear that the use of the OWID index was not a choice of convenience, but a necessary and well-vetted decision. Given the unavailability of official local data, the OWID index, cross-validated by our local experts, represents the most rigorous and appropriate variable available to account for the profound impact of NPIs on influenza transmission in our model.

      (5) Figure 2A: We agree completely and will replace the heatmap with a multi-line plot or a stacked area chart to better visualize the temporal dynamics of influenza subtypes.

      We have preliminarily completed the redrawing of Figure 3A. The new and old versions are presented for your review to determine which figure is more suitable for this manuscript in terms of scientific accuracy and visual impact.

      Reviewer #2 (Public review):

      Weakness (1):

      The rationale of the study is not clearly stated.

      We appreciate the reviewer’s critique and acknowledge that the unique contribution of our study needs to be articulated more forcefully. Our introduction (lines 105-140) attempted to outline the limitations of existing studies, but we will revise it to be much sharper. The revised introduction will state unequivocally that our study’s rationale is to address a confluence of specific, unresolved gaps in the literature: 1) The persistent challenge of forecasting influenza in subtropical regions with their erratic seasonality; 2) The lack of studies that build subtype-specific models for Influenza A and B, which we show have distinct meteorological drivers; 3) The methodological gap in integrating the explanatory power of DLNM with the predictive power of a rigorously Bayesian-optimized LSTM network; and 4) The unique opportunity to develop and test a model on data that encompasses the unprecedented disruption of the COVID-19 pandemic, a critical test of model robustness.

      Weakness (2):

      Several issues with methodological and data integration should be clarified.

      We interpret this as a general statement, with the specific issues detailed in the reviewer’s subsequent points and the “Recommendations for the authors” section. We will meticulously address each of these specific points in our revision. For instance, as a demonstration of our commitment to clarification, we will provide a much more detailed justification for our choice of benchmark model (ARIMA), as detailed in our response to Recommendation #11.

      Reviewer #2 (Recommendations for the authors):

      The authors should justify why the LSTM model was compared only with ARIMA as the baseline. How sensitive are the outcomes to other commonly used machine learning methods, such as Random Forest or XGBoost, as benchmarks for performance?

      The reviewer raises a highly pertinent question regarding the selection of our benchmark model. A robust comparison is indeed essential for contextualizing the performance of our proposed LSTM network. Our choice to benchmark against the ARIMA model was a deliberate and principled decision, grounded in the specific literature of influenza forecasting at the intersection of climatology and epidemiology.

      In the revised manuscript, we will expand our justification within the Methods section and reinforce it in the Discussion. Our rationale is as follows:

      (1) ARIMA as the Established Standard: As we briefly noted in our original introduction (lines 110-113), the ARIMA model is arguably the most widely established and frequently cited statistical method for time-series forecasting of influenza incidence, including studies investigating meteorological drivers. It serves as the conventional benchmark against which novel methods in this specific domain are often evaluated. Therefore, demonstrating superiority over ARIMA is the most direct and scientifically relevant way to validate the incremental value of our deep learning approach.

      (2) A Focused Scientific Hypothesis: Our primary hypothesis was that the LSTM network, with its inherent ability to capture complex non-linearities and long-term dependencies, could overcome the documented limitations of linear autoregressive models like ARIMA in the context of climate-influenza dynamics. Our study was designed specifically to test this hypothesis.

      (3) Avoiding a “Bake-off” without a Clear Rationale: While other machine learning models like Random Forest or XGBoost are powerful, they are not established as the standard baseline in this particular niche of literature. Including them would shift the focus from a targeted comparison against the conventional standard to a broader, less focused “bake-off” of various algorithms. Such an exercise, while potentially interesting, would risk diluting the core message of our paper and would be undertaken without a clear, literature-driven hypothesis for why one of these specific tree-based models should be the next logical benchmark.

      Therefore, we will argue in the revised manuscript that our focused comparison with ARIMA provides the clearest and most meaningful assessment of our model’s contribution to the existing body of work on climate-informed influenza forecasting. We will, however, explicitly acknowledge in the Discussion that future work could indeed benefit from a broader comparative analysis as the field continues to evolve and adopt a wider array of machine learning techniques.

      Similarly, for some of the reviewer’s recommendations that do not require significant time and effort to implement, such as recommendation 7, we have also redrawn Figure 3 based on your feedback. It is provided for your review.

      Figure 3 presents the time series of the cases. I wonder whether the data for these factors and outcomes are daily or aggregated by week/month? I suggest representing it in 9x1 format with a single x-axis to compare, instead of 3x3 format. Authors can refer to a similar plot in Figure 1 of https://doi.org/10.1371/journal.pcbi.1012311.

      We are deeply grateful for the reviewer’s valuable suggestion and thoughtful provision of reference illustrations. Based on their input, we have redrawn Figure 3 and have included it for their review.

      Weakness (3):

      Validation of the models is not presented clearly.

      We were concerned by this comment and conducted a thorough self-assessment of our manuscript. We believe we have performed a multi-faceted validation, but we have evidently failed to present it with sufficient clarity and structure. Our validation strategy, detailed across the Methods and Results sections, includes:

      • Internal Out-of-Time Validation: Using 2023 data as a hold-out set to test the model trained on 2018-2022 data (lines 695-696, 705-710; Figure 6A, B).

      • External Validation: Testing the trained model on an independent dataset from a different city, Sanming (lines 730-736; Figure 6I, J).

      • Benchmark Model Comparison: Quantitatively comparing the LSTM’s performance against the standard ARIMA model using multiple error metrics (lines 737-745; Table 2).

      • Interpretability Validation (Sanity Check): Using SHAP analysis to ensure the model’s predictions are driven by epidemiologically plausible factors (lines 746-755; Figure 6E-H).
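The out-of-time design in the first bullet (train on 2018-2022, hold out 2023) amounts to splitting the series by calendar year rather than at random. A minimal sketch follows, assuming each record is a hypothetical `(date, case_count)` tuple. The function name and the toy records are illustrative and not taken from the manuscript.

```python
from datetime import date


def out_of_time_split(records, holdout_year=2023):
    """Split (date, value) records into training and hold-out sets by year.

    Records dated before the hold-out year go to training; records in the
    hold-out year go to the test set. Later records are ignored.
    """
    train = [r for r in records if r[0].year < holdout_year]
    test = [r for r in records if r[0].year == holdout_year]
    return train, test


# Toy records for illustration only; the counts are made up.
toy_records = [
    (date(2018, 1, 15), 4),
    (date(2020, 6, 1), 0),
    (date(2022, 12, 31), 7),
    (date(2023, 1, 1), 9),
    (date(2023, 3, 10), 12),
]

if __name__ == "__main__":
    train, test = out_of_time_split(toy_records)
    print(len(train), "training records;", len(test), "hold-out records")
```

Splitting by time keeps every hold-out observation strictly later than the training data, so the evaluation mimics genuine forecasting and avoids leaking future information into the fit.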

      To address the reviewer’s valid critique of our presentation, we will significantly restructure the relevant parts of the Results section. We will create explicit subheadings such as “Internal Validation,” “External Validation,” and “Comparative Performance against ARIMA Benchmark” to make our comprehensive validation process unambiguous and easy to follow.

      Weakness (4):

      The claim for providing tools for 'early warning' was not validated by analysis and results.

      We agree with this assessment entirely. This aligns with the eLife Assessment and comments from Reviewer #1. Our primary revision will be to systematically recalibrate the manuscript's language. We will replace all instances of “early warning tool” with more accurate and modest phrasing, such as “high-performance forecasting framework” or “a foundational model for future warning systems.” We will ensure that our revised title, abstract, and conclusions precisely reflect what our study has delivered: a robust predictive model, not a field-ready public health intervention tool.

    1. eLife Assessment

      This landmark manuscript comprehensively examines the roles of nine structural proteins in herpes simplex virus 1 (HSV-1) assembly and nuclear egress. By integrating cryo-light microscopy and soft X-ray tomography, the study presents an innovative approach to investigating viral assembly within cells. The research is thoroughly executed, yielding exceptional data that explain previously unknown functions expected to bear widespread influence. This work is of broad interest to virologists, cellular biologists, and structural biologists, offering a robust, contextually rich methodology for studying large protein complex assembly within the cellular environment, serving as an excellent starting point for high-resolution techniques.

    2. Reviewer #1 (Public review):

      Summary:

      Nahas et al. investigated the roles of herpes simplex virus 1 (HSV-1) structural proteins using correlative cryo-light microscopy and soft X-ray tomography. The authors generated nine viral variants with deletions or mutations in genes encoding structural proteins. They employed a chemical fixation-free approach to study native-like events during viral assembly, enabling observation of a wider field of view compared to cryo-ET. The study effectively combined virology, cell biology, and structural biology to investigate the roles of viral proteins in virus assembly and budding.

      Strengths:

      (1) The study presented a novel approach to studying viral assembly in cellulo.

      (2) The authors generated nine mutant viruses to investigate the roles of essential proteins in nuclear egress and cytoplasmic envelopment.

      (3) The use of correlative imaging with cryoSIM and cryoSXT allowed for the study of viral assembly in a near-native state and in 3D.

      (4) The study identified the roles of VP16, pUL16, pUL21, pUL34, and pUS3 in nuclear egress.

      (5) The authors demonstrated that deletion of VP16, pUL11, gE, pUL51, or gK inhibits cytoplasmic envelopment.

      (6) The manuscript is well-written, clearly describing findings, methods, and experimental design.

      (7) The figures and data presentation are of good quality.

      (8) The study effectively correlated light microscopy and X-ray tomography to follow virus assembly, providing a valuable approach for studying other viruses and cellular events.

      (9) The research is a valuable starting point for investigating viral assembly using more sophisticated methods like cryo-ET with FIB-milling.

      (10) The study proposes a detailed assembly mechanism and tracks the contributions of studied proteins to the assembly process.

      (11) The study includes all necessary controls and tests for the influence of fluorescent proteins.

      Weaknesses:

      Overall, the manuscript does not have any major weaknesses, just a few minor comments, which were mostly solved in the revised version of the manuscript.

      Comments on the latest version:

      I reviewed the responses and the updated manuscript, and I am very pleased with how the authors have revised it. The manuscript was already strong, but with the addition of the summary table and the separated images, it is now excellent.

    3. Reviewer #2 (Public review):

      Summary:

      For centuries, humans have been developing methods to see ever smaller objects, such as cells and their contents. This has included studies of viruses and their interactions with host cells during processes extending from virion structure to the complex interactions between viruses and their host cells: virion entry, virus replication and virion assembly, and release of newly constructed virions. Recent developments have enabled simultaneous application of fluorescence-based detection and intracellular localization of molecules of interest in the context of sub-micron resolution imaging of cellular structures by electron microscopy.

      The submission by Nahas et al. extends the state-of-the-art for visualization of important aspects of herpesvirus (HSV-1 in this instance) virion morphogenesis, a complex process that involves virus genome replication, and capsid assembly and filling in the nucleus, transport of the nascent nucleocapsid and some associated tegument proteins through the inner and outer nuclear membranes to the cytoplasm, orderly association of several thousand mostly viral proteins with the capsid to form the virion's tegument, envelopment of the tegumented capsid at a virus-tweaked secretory vesicle or at the plasma membrane, and release of mature virions at the plasma membrane.

      In this groundbreaking study, cells infected with HSV-1 mutants that express fluorescently tagged versions of capsid (eYFP-VP26) and tegument (gM-mCherry) proteins were visualized with 3D correlative structured illumination microscopy and X-ray tomography. The maturation and egress pathways thus illuminated were studied further in infections with fluorescently tagged viruses lacking one of nine viral proteins.

      Strengths:

      This outstanding paper meets the journal's definitions of Landmark, Fundamental, Important, Valuable, and Useful. The work is also Exceptional, Compelling, Convincing, and Solid. The work is a tour de force of classical and state-of-the-art molecular and cellular virology. Beautiful images accompanied by appropriate statistical analyses and excellent figures. The numerous complex issues addressed are explained in a clear and coordinated manner; the sum of what was learned is greater than the sum of the parts. Impacts go well beyond cytomegalovirus and the rest of the herpesviruses, to other viruses and cell biology in general.

      Comments on the latest version:

      This is a very nice paper. The authors responded affirmatively to the suggestions and questions of the reviewers.

    4. Reviewer #3 (Public review):

      Summary:

      Kamal L. Nahas et al. demonstrated that pUL16, pUL21, pUL34, VP16, and pUS3 are involved in the egress of the capsids from the nucleus, since mutant viruses ΔpUL16, ΔpUL21, ΔUL34, ΔVP16, and ΔUS3 HSV-1 show nuclear egress attenuation determined by measuring the nuclear:cytoplasmic ratio of the capsids, the dfParental, or the mutants. Then, they showed that gM-mCherry+ endomembrane association and capsid clustering were different in pUL11, pUL51, gE, gK, and VP16 mutants. Furthermore, the 3D view of cytoplasmic budding events suggests an envelopment mechanism where capsid budding into spherical/ellipsoidal vesicles drives the envelopment.

      Strengths:

      The authors employed both structured illumination microscopy and cellular ultrastructure analysis to examine the same infected cells, using cryo-soft-X-ray tomography to capture images. This combination, used here for the first time, enabled the authors to obtain holistic data on a biological process such as viral assembly. Using this approach, the researchers studied various stages of HSV-1 assembly. For this, they constructed a dual-fluorescently labelled recombinant virus, consisting of eYFP-tagged capsids and mCherry-tagged envelopes, allowing for the independent identification of both unenveloped and enveloped particles. They then constructed nine mutants, each targeting a single viral protein known to be involved in nuclear egress and envelopment in the cytoplasm, using this dual-fluorescent virus as the parental strain. The experimental setting, both the microscopic and the virological, is robust and well-controlled. The manuscript is well-written, and the data generated is robust and consistent with previous observations made in the field.

      I congratulate the authors. The work is robust, and I personally highlight the way they managed to include others' results merged among their own, providing a complete view of the story.

      Comments on the latest version:

      I reviewed the responses and the updated manuscript, and I agree with Reviewer #1's words: "The manuscript was already strong, but with the addition of the summary table and the separated images, it is now excellent."

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Nahas et al. investigated the roles of herpes simplex virus 1 (HSV-1) structural proteins using correlative cryo-light microscopy and soft X-ray tomography. The authors generated nine viral variants with deletions or mutations in genes encoding structural proteins. They employed a chemical fixation-free approach to study native-like events during viral assembly, enabling observation of a wider field of view compared to cryo-ET. The study effectively combined virology, cell biology, and structural biology to investigate the roles of viral proteins in virus assembly and budding.

      Strengths:

      (1) The study presented a novel approach to studying viral assembly in cellulo.

      (2) The authors generated nine mutant viruses to investigate the roles of essential proteins in nuclear egress and cytoplasmic envelopment.

      (3) The use of correlative imaging with cryoSIM and cryoSXT allowed for the study of viral assembly in a near-native state and in 3D.

      (4) The study identified the roles of VP16, pUL16, pUL21, pUL34, and pUS3 in nuclear egress.

      (5) The authors demonstrated that deletion of VP16, pUL11, gE, pUL51, or gK inhibits cytoplasmic envelopment.

      (6) The manuscript is well-written, clearly describing findings, methods, and experimental design.

      (7) The figures and data presentation are of good quality.

      (8) The study effectively correlated light microscopy and X-ray tomography to follow virus assembly, providing a valuable approach for studying other viruses and cellular events.

      (9) The research is a valuable starting point for investigating viral assembly using more sophisticated methods like cryo-ET with FIB-milling.

      (10) The study proposes a detailed assembly mechanism and tracks the contributions of studied proteins to the assembly process.

      (11) The study includes all necessary controls and tests for the influence of fluorescent proteins.

      Weaknesses:

      Overall, the manuscript does not have any major weaknesses, just a few minor comments:

      (1) The gel quality in Figure 1 is inconsistent for different samples, with some bands not well resolved (e.g., for pUL11, GAPDH, or pUL20).

      We thank the reviewer for their suggestion. We tried to resolve the bands several times, but unfortunately this was the best outcome we could achieve.

      (2) The manuscript would benefit from a summary figure or table to concisely present the findings for each protein. It is a large body of work, and a summary figure showing the discovered functions would be great.

      We thank the reviewer for their suggestion. We have created a summary table (Table 2).

      (3) Figure 2 lacks clarity on the type of error bars used (range, standard error, or standard deviation). The legend says range, however; I am just checking whether this is what the authors meant.

      We thank the reviewer for double-checking, but it is meant to be range, as reported in the legend. We used range because there are only two data points for each time point, which are insufficient to calculate standard deviation or standard error.

      (4) The manuscript could be improved by including details on how the plasma membrane boundary was estimated from the saturated gM-mCherry signal. An additional supplementary figure with the data showing the saturation used for the boundary definition would be helpful.

      We appreciate the suggestion and have included an example of how saturated gM-mCherry signal was used to delineate the cytoplasm in Supp. Fig. 4A.

      (5) Additional information or supplementary figures on the mask used to filter the YFP signal for Figure 4 would be helpful.

      Thanks, we have adapted the text in the results section to clarify: “eYFP-VP26 signal was manually inspected to determine threshold values that filtered out background and included pixels containing individual or clustered puncta that represent capsids.”

      (6) The figure legends could include information about which samples are used for comparison for significance calculations. As the colour of the brackets is different from the compared values (dUL34), it would be great to have this information in the figure legend.

      Thanks, we have adapted Fig. 4B to make the colour of the brackets match the colour used for the ΔUL34 mutant, and we have included labels next to the brackets for clarity. We have applied similar adjustments to Fig. 5D & E and Supp. Fig. 4C.

      (7) In Figure 5B, the association between YFP and mCherry signals is difficult to assess due to the abundance of mCherry signal; single-channel and combined images might improve visualization.

      Thanks, we have provided split and combined channel views in Supp. Fig. 4B to improve visualization.

      (8) In Figure 6D, staining for tubulin could help identify the cytoskeleton structures involved in the observed virus arrays.

      We thank the reviewer for their suggestion, which we think would be interesting future work to build on the current study. Given the competitive nature of access to cryoSIM and cryoSXT (CLXT), including staining for tubulin was outside the scope of additional experiments we were able to conduct at this time.

      (9) It is unclear in Figure 6D if the microtubule-associated capsids are with the gM envelope or not, as the signal from mCherry is quite weak. It could be made clearer with the split signals to assess the presence of both viral components.

      We have provided split channels to the figure to aid with visualization.

      (10) The representation of voxel intensity in Figure 8 is somewhat confusing. Reversing the voxel intensity representation to align brighter values with higher absorption would simplify interpretation.

      We thank the reviewer for this suggestion. In contrast to fluorescence microscopy where high intensities reflect signal, low intensities represent signal (absorbance of X-rays) in cryoSXT. We respectfully decided not to reverse the values, as we believe that could cause more confusion. We have instead added a black-to-white gradient bar to illustrate that low voxel intensities correspond to dark signal in Fig 8.

      (11) The visualization in panel I of Figure 8 might benefit from a more divergent colormap to better show the variation in X-ray absorbance.

      We thank the reviewer for their suggestion. We experimented with a few different colour schemes but concluded that the current one produced the clearest results and was most accessible for color-blind viewers.

      (12) Figure 9 would be enhanced by images showing the different virus sizes measured for the comparative study, which would help assess the size differences between different assembly stages.

      We thank the reviewer for their suggestion and have included images to accompany the graph.

      Overall, this is an excellent manuscript and an enjoyable read. It would be interesting to see this approach applied to the study of other viruses, providing valuable insights before progressing to high-resolution methods.

      Reviewer #2 (Public review):

      Summary:

      For centuries, humans have been developing methods to see ever smaller objects, such as cells and their contents. This has included studies of viruses and their interactions with host cells during processes extending from virion structure to the complex interactions between viruses and their host cells: virion entry, virus replication and virion assembly, and release of newly constructed virions. Recent developments have enabled simultaneous application of fluorescence-based detection and intracellular localization of molecules of interest in the context of sub-micron resolution imaging of cellular structures by electron microscopy.

      The submission by Nahas et al. extends the state-of-the-art for visualization of important aspects of herpesvirus (HSV-1 in this instance) virion morphogenesis, a complex process that involves virus genome replication, and capsid assembly and filling in the nucleus, transport of the nascent nucleocapsid and some associated tegument proteins through the inner and outer nuclear membranes to the cytoplasm, orderly association of several thousand mostly viral proteins with the capsid to form the virion's tegument, envelopment of the tegumented capsid at a virus-tweaked secretory vesicle or at the plasma membrane, and release of mature virions at the plasma membrane.

      In this groundbreaking study, cells infected with HSV-1 mutants that express fluorescently tagged versions of capsid (eYFP-VP26) and tegument (gM-mCherry) proteins were visualized with 3D correlative structured illumination microscopy and X-ray tomography. The maturation and egress pathways thus illuminated were studied further in infections with fluorescently tagged viruses lacking one of nine viral proteins.

      Strengths:

      This outstanding paper meets the journal's definitions of Landmark, Fundamental, Important, Valuable, and Useful. The work is also Exceptional, Compelling, Convincing, and Solid. The work is a tour de force of classical and state-of-the-art molecular and cellular virology. Beautiful images accompanied by appropriate statistical analyses and excellent figures. The numerous complex issues addressed are explained in a clear and coordinated manner; the sum of what was learned is greater than the sum of the parts. Impacts go well beyond cytomegalovirus and the rest of the herpesviruses, to other viruses and cell biology in general.

      Reviewer #3 (Public review):

      Summary:

      Kamal L. Nahas et al. demonstrated that pUL16, pUL21, pUL34, VP16, and pUS3 are involved in the egress of the capsids from the nucleus, since mutant viruses ΔpUL16, ΔpUL21, ΔUL34, ΔVP16, and ΔUS3 HSV-1 show nuclear egress attenuation determined by measuring the nuclear:cytoplasmic ratio of the capsids, the dfParental, or the mutants. Then, they showed that gM-mCherry+ endomembrane association and capsid clustering were different in pUL11, pUL51, gE, gK, and VP16 mutants. Furthermore, the 3D view of cytoplasmic budding events suggests an envelopment mechanism where capsid budding into spherical/ellipsoidal vesicles drives the envelopment.

      Strengths:

      The authors employed both structured illumination microscopy and cellular ultrastructure analysis to examine the same infected cells, using cryo-soft-X-ray tomography to capture images. This combination, used here for the first time, enabled the authors to obtain holistic data on a biological process such as viral assembly. Using this approach, the researchers studied various stages of HSV-1 assembly. For this, they constructed a dual-fluorescently labelled recombinant virus, consisting of eYFP-tagged capsids and mCherry-tagged envelopes, allowing for the independent identification of both unenveloped and enveloped particles. They then constructed nine mutants, each targeting a single viral protein known to be involved in nuclear egress and envelopment in the cytoplasm, using this dual-fluorescent virus as the parental strain. The experimental setting, both the microscopic and the virological, is robust and well-controlled. The manuscript is well-written, and the data generated is robust and consistent with previous observations made in the field.

      Weaknesses:

      It would be helpful to find out what role the targeted proteins play in nuclear egress or envelopment acquisition in a different orthoherpesvirus, like HSV-2. This would confirm the suitability of the technical approach set and would also act as a way to validate their mechanism at least in one additional herpesvirus beyond HSV-1. So, using the current manuscript as a starting point and for future studies, it would be advisable to focus on the protein functions of other viruses and compare them.

      We appreciate the suggestion and agree that this would be a great starting point for future studies. At present, we do not have a panel of mutant viruses in HSV-2 or another orthoherpesvirus, and it would be significant work to generate them, so we consider this outside the scope of the current study.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) There are enough uncommon abbreviations in the text to justify the inclusion of an abbreviation list.

      We thank the reviewer for the suggestion, but we define all uncommon abbreviations at first mention and an abbreviations list is not part of eLife’s house style.

      (2) The complex paragraph on p. 7 would be much easier to digest if broken into smaller chunks. Consider similar treatment for other lengthy landmark-free blocks of text, e.g., the one that begins on p. 14. Subheadings would help.

      We thank the reviewer for this suggestion. We have divided large paragraphs into more easily digestible chunks throughout the manuscript, for example in the discussion where the previous monolithic 3rd paragraph has been divided into five shorter, focussed paragraphs.

      (3) Table 1 needs units.

      We thank the reviewer for noticing our omission and apologise for the oversight - the table has been updated accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) Toward the end of the manuscript, I missed some lines attempting to speculate on the origin/nature of the spherical/ellipsoidal vesicles providing the envelopment. Would it be possible to incorporate this in the Discussion section?

      Thank you for noticing that omission. We have now included a few lines speculating that they may represent recycling endosomes, trans-Golgi network vesicles, or a hybrid compartment.

      (2) I congratulate the authors. The work is robust, and I personally highlight the way they managed to include others' results merged with their own, providing a complete view of the story.

      We thank the reviewer for their kind words.

      Note to editors

      In addition to these responses to the reviewer’s comments, we have also now included in the methods section details of the Tracking of Indels by Decomposition (TIDE) analysis we performed (data in Supplementary Figure 3) that was omitted by mistake from the original submission.

    1. eLife Assessment

      The ratio of nuclei to cell volume is a well-controlled parameter in eukaryotic cells. This study now reports important findings that expand our understanding of the regulatory relationship between cell size and number of nuclei. The evidence supporting the conclusions is convincing, obtained by applying appropriate and validated methodology in line with the current state of the art. The paper will be of broad interest for cell biologists and fungal biotechnologists seeking to understand mechanisms determining cell size and number of nuclei and why this knowledge might also be of importance for the production of enzymes and thus production strains not only of Aspergillus oryzae but also other industrially used fungi.

    2. Reviewer #1 (Public review):

      Filamentous fungi are established work horses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it with different production capacities. They used microfluidic devices to directly correlate the production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase of ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them to the capability of increasing their nuclear numbers.

      The methods used in the paper range from high quality cell biology, Raman spectroscopy to atomic force and electron microscopy and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology.

      Comments on revised version:

      The authors addressed all suggestions satisfactorily.

    3. Reviewer #2 (Public review):

      Summary:

      In the study presented by Itani and colleagues it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data involvement of glycosyltransferases, calcium channels and the tor regulatory cascade in regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology.

      Strengths:

      The study is very comprehensive and involves application of diverse state-of-the-art cell biological, biochemical and genetic methods. Overall, the data are properly controlled and analyzed, and the figures and movies are of excellent quality. The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and the number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      In the revision the authors addressed all my comments and as a result produced an even stronger study.

    4. Reviewer #3 (Public review):

      Summary:

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis.

      Strengths:

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences.

      Weaknesses:

      The authors addressed all suggestions satisfactorily.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors): 

      The authors addressed all suggestions satisfactorily. 

      Reviewer #2 (Recommendations for the authors):

      The authors have adequately dealt with the comments. 

      Reviewer #3 (Recommendations for the authors):

      (1) Line 157. Although the authors have added a statement acknowledging that addition of YE increased hyphal width and secretion in A. nidulans without increasing nuclear number, they have not indicated how this result might impact their model. It might just boil down to variation between the different Aspergilli, but it merits attention. 

      (2) Line 341. To extend the argument, you might consider adding this citation (https://elifesciences.org/articles/76075), which provides evidence that nuclear size might scale with osmotic pressure based on the density of macromolecules in the nucleus vs. cytoplasm.

      Thanks for the suggestion.

      L341 This is likely related to the phenomenon in which a decrease in cell size is accompanied by a reduction in nuclear size (66).

      (3) Line 343. Neurospora crassa hyphal cells can exceed 100 nuclei...

      Changed.

    1. eLife Assessment

      This study presents a valuable finding regarding the role of Arp2/3 and the actin nucleators N-WASP and WAVE complexes in myoblast fusion. The data presented is convincing, and the work will be of interest to biologists studying skeletal muscle stem cell biology in the context of skeletal muscle regeneration.

    2. Reviewer #1 (Public review):

      Overall, the manuscript reveals the role for actin polymerization to drive fusion of myoblasts during adult muscle regeneration. This pathway regulates fusion in many contexts, but whether it was conserved in adult muscle regeneration remained unknown. Robust genetic tools and histological analyses were used to convincingly support the claims.

    3. Reviewer #2 (Public review):

      To fuse, differentiated muscle cells must rearrange their cytoskeleton and assemble actin-enriched cytoskeletal structures. These actin foci are proposed to generate mechanical forces necessary to drive close membrane apposition and the fusion pore formation. While the study of these actin-rich structures has been conducted mainly in Drosophila and in vertebrate embryonic development, the present manuscript presents clear evidence that this mechanism is necessary for fusion of adult muscle stem cells in vivo, in mice. The data presented here clearly demonstrate that ARP2/3 and SCAR/WAVE complexes are required for the fusion of differentiating satellite cells into multinucleated myotubes during skeletal muscle regeneration.

    4. Reviewer #3 (Public review):

      The authors have satisfactorily addressed my inquiries. However, I had to look quite hard to find where they responded to my final comment regarding the potential role of Arpc2 post-fusion during myofiber growth and/or maintenance, which I eventually located on page 7. I would appreciate it if the authors could state this point more explicitly, perhaps by adding a sentence such as "However, we cannot rule out the possibility that Arpc2 may also play a role in....." to improve clarity of communication.

      While I understood from the original version that this issue falls beyond the immediate scope of the study, I believe it is important to adopt a more cautious and rigorous interpretative framework, especially given the widespread use of this experimental approach. In particular, when a gene could potentially have additional roles in myofibers, it may be helpful to explicitly acknowledge that possibility. Even if Arpc2 may not necessarily be one of them, such roles cannot be fully excluded without direct testing.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Overall, the manuscript reveals the role of actin polymerization to drive the fusion of myoblasts during adult muscle regeneration. This pathway regulates fusion in many contexts, but whether it was conserved in adult muscle regeneration remained unknown. Robust genetic tools and histological analyses were used to support the claims convincingly. 

      We very much appreciate the positive comments from this Reviewer.

      There are a few interpretations that could be adjusted. 

      The beginning of the results about macrophages traversing ghost fibers after regeneration was a surprise given the context in the abstract and introduction. These results also lead to new questions about this biology that would need to be answered to substantiate the claims in this section. Also, the precise new information learned here is unclear, because it seems obvious that macrophages would need to extravasate the basement membrane to enter ghost fibers, and macrophages are known to have this ability. Moreover, the model in Figure 4D has macrophages and BM, but there is no mention of this in the legend. The authors may wish to consider removing this topic from the manuscript.

      We appreciate this comment and acknowledge that the precise behavior of macrophages when they infiltrate and/or exit the ghost fibers during muscle regeneration is not the major focus of this study. However, we think that visualizing macrophages squeezing through tiny openings on the basement membrane to infiltrate and/or exit from the ghost fibers is valuable. Thus, we have moved the data from the original main Figure 2 to the new Figure S1. 

      Regarding the model in Figure 4D, we have removed the macrophages because the depicted model represents a stage after the macrophages’ exit from the ghost fiber. 

      Which Pax7CreER line was used? In the methods, the Jax number provided is the Gaka line but in the results, Lepper et al 2009 are cited, which is not the citation for the Gaka line. 

      The Pax7<sup>CreER</sup> line used in this study is the one generated in Lepper et al. 2009. We corrected this information in “Material and Methods” of the revised manuscript. 

      Did the authors assess regeneration in the floxed mice that do not contain Cre as a control? Or is it known these alleles do not perturb the function of the targeted gene? 

      We examined muscle regeneration in the floxed mice without Cre. As shown in Figure 1 below, none of the homozygous ArpC2<sup>fl/fl</sup>, N-WASP<sup>fl/fl</sup>, CYFIP1<sup>fl/fl</sup> or N-WASP<sup>fl/fl</sup>;CYFIP1<sup>fl/fl</sup> alleles affected muscle regeneration, indicating that these alleles do not perturb the function of the targeted gene.

      Author response image 1.

      Muscle regeneration was normal in mice with only floxed target gene(s). Cross sections of TA muscles were stained with anti-Dystrophin and DAPI at dpi 14. n = 3 mice of each genotype, and > 80 ghost fibers in each mouse were examined. Mean ± s.d. values are shown in the dot-bar plot, and significance was determined by two-tailed Student’s t-test. ns: not significant. Scale bar: 100 μm.

      The authors comment: 'Interestingly, expression of the fusogenic proteins, MymK and MymX, was up-regulated in the TA muscle of these mice (Figure S4F), suggesting that fusogen overexpression is not able to rescue the SCM fusion defect resulted from defective branched actin polymerization.' It is unclear if fusogens are truly overexpressed because the analysis is performed at dpi 4 when the expression of fusogens may be decreased in control mice because they have already fused. Also, only two animals were analyzed and it is unclear if MymX is definitively increased. The authors should consider adjusting the interpretation to SCM fusion defect resulting from defective branched actin polymerization is unlikely to be caused by a lack of fusogen expression. 

      We agree with the Reviewer that fusogen expression may simply persist till later time points in fusion mutants without being up-regulated. We have modified our interpretation according to the Reviewer’s suggestion. 

      Regarding the western blots in the original Figure S4F, we now show one experiment from each genotype, and include the quantification of MymK and MymX protein levels from 3 animals in the revised manuscript (new Figure S5F-S5H). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The ArpC2 cKO data could be presented in a clearer fashion. In the text, ArpC2 is discussed but in the figure, there are many other KOs presented and ArpC2 is the fourth one shown in the figure. The other KOs are discussed later. It may be worthwhile for the authors to rearrange the figures to make it easier for readers. 

      Thank you for this suggestion. We have rearranged the genotypes in the figures accordingly and placed ArpC2 cKO first. 

      The authors comment: 'Since SCM fusion is mostly completed at dpi 4.5 (Figure 1B) (Collins et al. 2024)'. This is not an accurate statement of the cited paper. While myofibers are formed by dpi 4.5 with centralized nuclei, there are additional fusion events through at least 21dpi. The authors should adjust their statement to better reflect the data in Collins et al 2024, which could include mentioning that primary fusions could be completed at dpi 4.5 and this is the process they are studying. 

      We have adjusted our statement accordingly in the revised manuscript.

      The authors comment: 'Consistent with this, the frequency distribution of SCM number per ghost fiber displayed a dramatic shift toward higher numbers in the ArpC2<sup>cKO</sup> mice (Figure S5C). These results indicate that the actin cytoskeleton plays an essential role in SCM fusion as the fusogenic proteins.' Should it read 'These results indicate that the actin cytoskeleton plays AS an essential role in SCM fusion as the fusogenic proteins'? 

      Yes, and we adjusted this statement accordingly in the revised manuscript. 

      Minor comments 

      (1) In the results the authors state 'To induce genetic deletion of ArpC2 in satellites....'; 'satellites' is a term not typically used for satellite cells. 

      Thanks for catching this. We changed “satellites” to “satellite cells”.

      (2) In the next sentence, the satellite should be capitalized. 

      Done.

      (3) The cross-section area should be a 'cross-sectional area'. 

      Changed.

      Reviewer #2 (Public review):

      To fuse, differentiated muscle cells must rearrange their cytoskeleton and assemble actin-enriched cytoskeletal structures. These actin foci are proposed to generate mechanical forces necessary to drive close membrane apposition and fusion pore formation. 

      While the study of these actin-rich structures has been conducted mainly in Drosophila, the present manuscript presents clear evidence this mechanism is necessary for the fusion of adult muscle stem cells in vivo, in mice. 

      We thank this Reviewer for the positive comment.

      However, the authors need to tone down their interpretation of their findings and remember that genetic proof for cytoskeletal actin remodeling to allow muscle fusion in mice has already been provided by different labs (Vasyutina E, et al. 2009 PMID: 19443691; Gruenbaum-Cohen Y, et al., 2012 PMID: 22736793; Hamoud et al., 2014 PMID: 24567399). In the same line of thought, the authors write they "demonstrated a critical function of branched actin-propelled invasive protrusions in skeletal muscle regeneration". I believe this is not a premiere, since Randrianarison-Huetz V, et al., previously reported the existence of finger-like actin-based protrusions at fusion sites in mice myoblasts (PMID: 2926942) and Eigler T, et al., live-recorded said "fusogenic synapse" in mice myoblasts (PMID: 34932950). Hence, while the data presented here clearly demonstrate that ARP2/3 and SCAR/WAVE complexes are required for differentiating satellite cell fusion into multinucleated myotubes, this is an incremental story, and the authors should put their results in the context of previous literature. 

      In this study, we focused on elucidating the mechanisms of myoblast fusion during skeletal muscle regeneration, which remained largely unknown. Thus, we respectfully disagree with this Reviewer that “this is an incremental story” for the following reasons – 

      First, while we agree with this Reviewer that “genetic proof for cytoskeletal actin remodeling to allow muscle fusion in mice has already been provided by different labs”, most of the previous genetic studies, including ours (Lu et al. 2024), characterizing the roles of actin regulators (Elmo, Dock180, Rac, Cdc42, WASP, WIP, WAVE, Arp2/3) in mouse myoblast fusion were conducted during embryogenesis (Laurin et al. 2008; Vasyutina et al. 2009; Gruenbaum-Cohen et al. 2012; Tran et al. 2022; Lu et al. 2024), instead of during adult muscle regeneration, the latter of which is the focus of this study. 

      Second, prior to this study, several groups tested the roles of SRF, CaMKII theta and gamma, Myo10, and Elmo, which affect actin cytoskeletal dynamics, in muscle regeneration. These studies have shown that knocking out SRF, CaMKII, Myo10, or Elmo caused defects in mouse muscle regeneration, based on measuring the cross-sectional diameters of regenerated myofibers only (Randrianarison-Huetz et al. 2018; Eigler et al. 2021; Hammers et al. 2021; Tran et al. 2022). However, none of these studies visualized myoblast fusion at the cellular and subcellular levels during muscle regeneration in vivo. For this reason, it remained unclear whether the muscle regeneration defects in these mutants were indeed due to defects in myoblast fusion, in particular, defects in the formation of invasive protrusions at the fusogenic synapse. Thus, the previous studies did not demonstrate a direct role for the actin cytoskeleton, as well as the underlying mechanisms, in myoblast fusion during muscle regeneration in vivo.

      Third, regarding actin-propelled invasive protrusions at the fusogenic synapse, our previous study (Lu et al. 2024) revealed these structures by fluorescent live cell imaging and electron microscopy (EM) in cultured muscle cells, as well as EM studies in mouse embryonic limb muscle, firmly establishing a direct role for invasive protrusions in mouse myoblast fusion in cultured muscle cells and during embryonic development. Randrianarison-Huetz et al. (2018) reported the existence of finger-like actin-based protrusions at cell contact sites of cultured mouse myoblasts. It was unclear from their study, however, if these protrusions were at the actual fusion sites and if they were invasive (Randrianarison-Huetz et al. 2018). Eigler et al. (2021) reported protrusions at the fusogenic synapse in cultured mouse myoblasts. It was unclear from their study, however, if the protrusions were actin-based and if they were invasive (Eigler et al. 2021). Neither Randrianarison-Huetz et al. (2018) nor Eigler et al. (2021) characterized protrusions in developing mouse embryos or regenerating adult muscle. 

      Taken together, to our knowledge, this is the first study to characterize myoblast fusion at the cellular and subcellular level during mouse muscle regeneration. We demonstrate that branched actin polymerization promotes invasive protrusion formation and myoblast fusion during the regeneration process. We believe that this work has laid the foundation for additional mechanistic studies of myoblast fusion during skeletal muscle regeneration.

      The citations in the original manuscript were primarily focused on previous in vivo studies of Arp2/3 and the actin nucleation-promoting factors (NPFs), N-WASP and WAVE (Richardson et al. 2007; Gruenbaum-Cohen et al. 2012), and of invasive protrusions mediating myoblast fusion in intact animals (Drosophila, zebrafish and mice) (Sens et al. 2010; Luo et al. 2022; Lu et al. 2024). We agree with this reviewer, however, that it would be beneficial to the readers if we provide a more comprehensive summary of previous literature, including studies of both intact animals and cultured cells, as well as studies of additional actin regulators upstream of the NPFs, such as small GTPases and their GEFs. Thus, we have significantly expanded our Introduction to include these studies and cited the corresponding literature in the revised manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I am concerned that the authors did not evaluate the efficiency of the target allele deletion efficiency following Pax7-CreER activation. The majority, if not all, of the published work focusing on this genetic strategy presents the knock-down efficiency using either genotyping PCR, immunolocalization, western-blot; etc... 

      (2) Can the authors provide evidence that the N-WASP, CYFIP1, and ARPC2 proteins are depleted in TAM-treated tissue? Alternatively, can the author perform RT-qPCR on freshly isolated MuSCs to validate the absence of N-WASP, CYFIP1, and ARPC2 mRNA expression?

      Thank you for these comments. We have assessed the target allele deletion efficiency with isolated satellite cells from TAM-injected mice in which Pax7-CreER is activated. Western blot analyses showed that the protein levels of N-WASP, CYFIP1, and ArpC2 significantly decreased in the satellite cells of knockout mice. Please see the new Figure S2.

      Reviewer #3 (Public review): 

      The manuscript by Lu et al. explores the role of the Arp2/3 complex and the actin nucleators N-WASP and WAVE in myoblast fusion during muscle regeneration. The results are clear and compelling, effectively supporting the main claims of the study. However, the manuscript could benefit from a more detailed molecular and cellular analysis of the fusion synapse. Additionally, while the description of macrophage extravasation from ghost fibers is intriguing, it seems somewhat disconnected from the primary focus of the work. 

      Despite this, the data are robust, and the major conclusions are well supported. Understanding muscle fusion mechanism is still a widely unexplored topic in the field and the authors make important progress in this domain. 

      We appreciate the positive comments from this Reviewer.

      We agree with this Reviewer and Reviewer #1 that the macrophage study is not the primary focus of the work. However, we think that visualizing macrophages squeezing through tiny openings on the basement membrane to infiltrate and/or exit from the ghost fibers is valuable. Thus, we have moved the data from the original main Figure 2 to the new Figure S1. 

      I have a few suggestions that might strengthen the manuscript as outlined below.  

      (1) Could the authors provide more detail on how they defined cells with "invasive protrusions" in Figure 4C? Membrane blebs are commonly observed in contacting cells, so it would be important to clarify the criteria used for counting this specific event. 

      Thanks for this suggestion. We define invasive protrusions as finger-like protrusions projected by a cell into its fusion partner. Based on our previous studies (Sens et al. 2010; Luo et al. 2022; Lu et al. 2024), these invasive protrusions are narrow (with 100-250 nm diameters) and propelled by mechanically stiff actin bundles. In contrast, membrane blebs are spherical protrusions formed by the detachment of the plasma membrane from the underlying actin cytoskeleton. In general, the blebs are not as mechanically stiff as invasive protrusions and would not be able to project into neighboring cells. Thus, we do not think that the protrusions in Figure 4B are membrane blebs. We clarified the criteria in the text and figure legends of the revised manuscript.

      (2) Along the same line, please clarify what each individual dot represents in Figure 4C. The authors mention quantifying approximately 83 SCMs from 20 fibers. I assume each dot corresponds to data from individual fibers, but if that's the case, does this imply that only around four SCMs were quantified per fiber? A more detailed explanation would be helpful. 

      To quantitatively assess invasive protrusions in Ctrl and mutant mice, we analyzed 20 randomly selected ghost fibers per genotype. Within each ghost fiber, we examined randomly selected SCMs in a single cross section (a total of 83, 147 and 93 SCMs in Ctrl, ArpC2<sup>cKO</sup> and MymX<sup>cKO</sup> mice were examined, respectively). 

      In Figure 4C, each dot was intended to represent the percentage of SCMs with invasive protrusions in a single cross section of a ghost fiber. However, we mistakenly inserted a wrong graph in the original Figure 4C. We sincerely apologize for this error and have replaced it with the correct graph in the new Figure 4C.
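The per-fiber quantification described above can be sketched as follows. This is only a minimal illustration, assuming one record per examined SCM; the function name, the record layout, and the example values are hypothetical and are not data or code from the study.

```python
from collections import defaultdict


def percent_with_protrusions(records):
    """Return {fiber_id: percentage of SCMs with invasive protrusions}.

    records: iterable of (fiber_id, has_protrusion) pairs, one pair per SCM
    scored in a single cross section of a ghost fiber (assumed layout).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for fiber_id, has_protrusion in records:
        totals[fiber_id] += 1
        if has_protrusion:
            positives[fiber_id] += 1
    # One value per ghost fiber, i.e. one dot in a dot plot.
    return {fiber: 100.0 * positives[fiber] / totals[fiber] for fiber in totals}


# Hypothetical scores for illustration only; not measurements from the study.
example_records = [
    ("fiber_1", True), ("fiber_1", False), ("fiber_1", True), ("fiber_1", True),
    ("fiber_2", False), ("fiber_2", True),
]
per_fiber = percent_with_protrusions(example_records)
```

Computing one percentage per ghost fiber, rather than pooling all SCMs of a genotype, keeps each plotted dot tied to a single cross section.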

      (3) Localizing ArpC2 at the invasive protrusions would be a strong addition to this study. Furthermore, have the authors examined the localization of Myomaker and Myomixer in ArpC2 mutant cells? This could provide insights into potential disruptions in the fusion machinery.

      We have examined the localization of the Arp2/3 complex on the invasive protrusions in cultured SCMs and included the data in Figure 4A of the original manuscript. Specifically, we showed enrichment of mNeongreen-tagged Arp2, a subunit of the Arp2/3 complex, on the invasive protrusions at the fusogenic synapse of cultured SCMs (see the enlarged panels on the right; also see supplemental video 4). The small size of the invasive protrusions on SCMs prevented a detailed analysis of the precise Arp2 localization along the protrusions.  Please see our recently published paper (Lu et al. 2024) for the detailed localization and function of the Arp2/3 complex during invasive protrusion formation in cultured C2C12 cells. 

      We have also attempted to localize the Arp2/3 complex in the regenerating muscle in vivo using an anti-ArpC2 antibody (Millipore, 07-227-I), which was used in many studies to visualize the Arp2/3 complex in cultured cells. Unfortunately, the antibody detected non-specific signals in the regenerating TA muscle of the ArpC2<sup>cKO</sup> animals. Thus, it cannot be used to detect specific ArpC2 signals in muscle tissues. Besides the specificity issue of the antibody, it is technically challenging to visualize invasive protrusions with an F-actin probe at the fusogenic synapses of regenerating muscle by light microscopy, due to the high background F-actin signal within the muscle cells. 

      Regarding the fusogens, we show that both are present in the TA muscle of the ArpC2<sup>cKO</sup> animals by western blot (Figure S5F-S5H). Thus, the fusion defect in these animals is not due to the lack of fusogen expression. Since the focus of this study is on the role of the actin cytoskeleton in muscle regeneration, the subcellular localization of the fusogens was not investigated in the current study. 

      (4) As a minor curiosity, can ArpC2 WT and mutant cells fuse with each other?

      Our previous work in Drosophila embryos showed that Arp2/3-mediated branched actin polymerization is required in both the invading and receiving fusion partners (Sens et al. 2010). To address this question in mouse muscle cells, we co-cultured GFP<sup>+</sup> WT cells with mScarleti<sup>+</sup> WT cells (or mScarleti<sup>+</sup> ArpC2<sup>cKO</sup> cells) in vitro and assessed their ability to fuse with one another. We found that ArpC2<sup>cKO</sup> cells could barely fuse with WT cells (new Figure 3F and 3G), indicating that Arp2/3-mediated branched actin polymerization is required in both fusion partners. This result is consistent with our findings in Drosophila embryos. 

      (5) The authors report a strong reduction in CSA at 14 dpi and 28 dpi, attributing this defect primarily to failed myoblast fusion. Although this claim is supported by observations at early time points, I wonder whether the Arp2/3 complex might also play roles in myofibers after fusion. For instance, Arp2/3 could be required for the growth or maintenance of healthy myofibers, which could also contribute to the reduced CSA observed, since regenerated myofibers inherit the ArpC2 knockout from the stem cells. Could the authors address or exclude this possibility? This is rather a broader criticism of how things are being interpreted in general beyond this paper. 

      This is an interesting question. It is possible that Arp2/3 may play a role in the growth or maintenance of healthy myofibers. However, the muscle injury and regeneration process may not be the best system to address this question because of the indispensable early step of myoblast fusion. Ideally, one may want to knock out Arp2/3 in myofibers of young healthy mice, observe fiber growth in the absence of muscle injury, and compare it to that of wild-type littermates. Since these experiments are beyond the scope of this study, we revised our conclusion to state that the fusion defect in ArpC2<sup>cKO</sup> mice should account, at least in part, for the strong reduction in CSA at 14 dpi and 28 dpi, without excluding additional possibilities such as Arp2/3’s potential role in the growth or maintenance of healthy myofibers.

      References:

      Eigler T, Zarfati G, Amzallag E, Sinha S, Segev N, Zabary Y, Zaritsky A, Shakked A, Umansky KB, Schejter ED et al. 2021. ERK1/2 inhibition promotes robust myotube growth via CaMKII activation resulting in myoblast-to-myotube fusion. Dev Cell 56: 3349-3363 e3346.

      Gruenbaum-Cohen Y, Harel I, Umansky KB, Tzahor E, Snapper SB, Shilo BZ, Schejter ED. 2012. The actin regulator N-WASp is required for muscle-cell fusion in mice. Proc Natl Acad Sci U S A 109: 11211-11216.

      Hammers DW, Hart CC, Matheny MK, Heimsath EG, Lee YI, Hammer JA, 3rd, Cheney RE, Sweeney HL. 2021. Filopodia powered by class x myosin promote fusion of mammalian myoblasts. Elife 10.

      Laurin M, Fradet N, Blangy A, Hall A, Vuori K, Cote JF. 2008. The atypical Rac activator Dock180 (Dock1) regulates myoblast fusion in vivo. Proc Natl Acad Sci U S A 105: 15446-15451.

      Lu Y, Walji T, Ravaux B, Pandey P, Yang C, Li B, Luvsanjav D, Lam KH, Zhang R, Luo Z et al. 2024. Spatiotemporal coordination of actin regulators generates invasive protrusions in cell-cell fusion. Nat Cell Biol 26: 1860-1877.

      Luo Z, Shi J, Pandey P, Ruan ZR, Sevdali M, Bu Y, Lu Y, Du S, Chen EH. 2022. The cellular architecture and molecular determinants of the zebrafish fusogenic synapse. Dev Cell 57: 1582-1597 e1586.

      Randrianarison-Huetz V, Papaefthymiou A, Herledan G, Noviello C, Faradova U, Collard L, Pincini A, Schol E, Decaux JF, Maire P et al. 2018. Srf controls satellite cell fusion through the maintenance of actin architecture. J Cell Biol 217: 685-700.

      Richardson BE, Beckett K, Nowak SJ, Baylies MK. 2007. SCAR/WAVE and Arp2/3 are crucial for cytoskeletal remodeling at the site of myoblast fusion. Development 134: 4357-4367.

      Sens KL, Zhang S, Jin P, Duan R, Zhang G, Luo F, Parachini L, Chen EH. 2010. An invasive podosome-like structure promotes fusion pore formation during myoblast fusion. J Cell Biol 191: 1013-1027.

      Tran V, Nahle S, Robert A, Desanlis I, Killoran R, Ehresmann S, Thibault MP, Barford D, Ravichandran KS, Sauvageau M et al. 2022. Biasing the conformation of ELMO2 reveals that myoblast fusion can be exploited to improve muscle regeneration. Nat Commun 13: 7077.

      Vasyutina E, Martarelli B, Brakebusch C, Wende H, Birchmeier C. 2009. The small G-proteins Rac1 and Cdc42 are essential for myoblast fusion in the mouse. Proc Natl Acad Sci U S A 106: 8935-8940.

    1. eLife Assessment

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The revised manuscript presents convincing evidence that the location of synapses on dendritic branches, as well as synaptic plasticity of excitatory and inhibitory synapses, influences the ability of a neuron to discriminate combinations of sensory stimuli. The ideas in this work are very interesting, presenting an important direction in the computational neuroscience field about how to harness the computational power of "active dendrites" for solving learning tasks.

    2. Reviewer #1 (Public review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." In the absence of inhibitory plasticity, the proposed mechanisms result in good performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Interestingly, adding inhibitory plasticity improves classification performance even when input features are randomly distributed.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation.

    3. Reviewer #2 (Public review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." In the absence of inhibitory plasticity, the proposed mechanisms result in good performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Interestingly, adding inhibitory plasticity improves classification performance even when input features are randomly distributed.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation.

      We greatly appreciate your recognition of the study’s integrative scope and the challenges of linking detailed biophysics to high-level computation. We acknowledge that the model’s complexity can obscure the contribution of individual components. However, as stated in the introduction, the principles have already been shown in simplified theoretical models, for instance in Tran-Van-Minh et al. 2015. Our aim here was to extend those ideas into a more biologically detailed setting to test whether the same principles still hold under realistic constraints. While simplification can aid intuition, we believe that demonstrating these effects in a biophysically grounded model strengthens the overall conclusion. We agree that further comparisons with reduced models would be valuable for isolating the contribution of specific components and plan to explore that in future work.

      Reviewer #2 (Public review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Thank you for highlighting the biological plausibility of our calcium- and dopamine-dependent learning rule and its ability to exploit dendritic nonlinearities. Your positive assessment reinforces our commitment to refining the rule and exploring its implications in larger, more diverse settings.

      Reviewer #1 (Recommendations for the authors):

      Major recommendations:

      P9: When introducing the excitatory learning rule, the reader is referred to the Methods. I suggest moving Figure 7A-D, "Excitatory plasticity" to be more prominently presented in the main body of the paper where the reader needs to understand it. There are errors in the current Figure 7, and wrong/confusing acronyms. The abbreviations "LTP-K" and "MP-K" are not intuitive. In A, I would spell out "LTP kernel" and "Theta_LTP adaptation".  In B, I would spell out "LTD kernel" and "Theta_LTD adaptation".

      We have clarified the terminology in Figure 7 by replacing “LTP-K” with “LTP kernel” and “MP-K” with “metaplasticity kernel”.  While we kept Figure 7 in the Methods section to maintain the flow of the main text, we agree that an earlier introduction of the learning rule improves clarity. To that end, we added a simplified schematic to Figure 3 in the Results section, which provides readers with an accessible overview of the excitatory plasticity mechanism at the point where it is first introduced.

      In C, for simplicity and clarity, I would only show the initial and updated LTP kernel and Calcium and remove the Theta_LTP adaptation curve, it's too busy and not necessary. Similarly in D, I would show only the initial and updated LTD kernel and Calcium and remove the Theta_LTD adaptation curve. In the current version of the Figure, panel B, right incorrectly labels "Theta_LTD" as "Theta_LTP". Panel D incorrectly labels "LTD kernel" as "LTP/MP-K" in the subheading and "MP/LTP-K" in the graph.

      To avoid confusion and better illustrate the interactions between calcium signals, kernels, and thresholds, we have added a movie showing how these components evolve during learning. The figure panels remain as originally designed, since the LTP kernel governs both potentiation and depression through metaplastic threshold adaptation, while the LTD kernel remains fixed.

      P17: Again, instead of pointing the reader to the Methods, I would move Figure 7E, "Inhibitory plasticity" to the main body of the paper where the reader needs to understand it. For clarity, I would label "C_TL" as "Theta_Inh,low" and "C_TH" as "Theta_Inh,high". The right panel could be better labeled "Inhibitory plasticity kernel". The left panel could be better labeled "Theta_Inh adaptation", with again replacing the acronyms "C_TL" and "C_TH". The same applies to Fig. 5D on P19.

      We have updated the labeling in Figures 5D and 7E for clarity, including replacing "C_TL" and "C_TH" with "Theta_Inh,low" and "Theta_Inh,high". In addition, we added a simplified schematic of the inhibitory plasticity rule to Figure 5 to assist the reader’s understanding when presenting the results. Figure 7E remains in the Methods section to preserve the flow of the main text.

      P12: I would suggest simplifying Fig. 3 panels and acronyms as well. Remove "MP-K" from C and D. Relabel "LTP-K" as "LTP kernel". The same applies to Fig. 5E on P19 and Fig. 3 - supplement 1 on P46 and Fig 6 - supplement 1 on P49.

      We have simplified the labeling across all relevant figures by replacing “MP-K” with “metaplasticity kernel” and “LTP-K” with “LTP kernel.” To maintain clarity, we retained these terms in only one panel as a reference.

      Minor recommendations:

      P4: "Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights." This sentence is jarring. BCM and metaplasticity has been discussed in hundreds of theory papers! Cite some. This sentence would more accurately read, "Our study corroborates prior theory work (citations) demonstrating that metaplasticity helps to achieve stable and physiologically realistic synaptic weights."

      We have followed the reviewer's suggestion and updated the sentence to: "Previous theoretical studies (Bienenstock et al., 1982; Fusi et al., 2005; Clopath et al., 2010; Benna & Fusi, 2016; Zenke & Gerstner, 2017) demonstrate the essential role of metaplasticity in maintaining stability in synaptic weight distributions." (page 2, lines 49-51; page 3, line 1)

      P9: Grammar. "The neuron model was during training activated..." should read "During training, the neuron model was activated..."

      Corrected

      P17: Lovett-Barron et al., 2012 is appropriately cited here. Milstein et al., Neuron, 2015 also showed dendritic inhibition regulates plateau potentials in CA1 pyramidal cells in vitro, and Grienberger et al., Nat. Neurosci., 2017 showed it in vivo.

      P19 vs P16 vs P21. Fig. 4B, Fig. 5B, and Fig. 6B choose different strategies to show variance across seeds. Please choose one strategy and apply to all comparable plots.

      We thank the reviewer for these helpful points.

      We have added the suggested citations (Milstein et al., 2015; Grienberger et al., 2017) alongside Lovett-Barron et al., 2012. 

      Variance across seeds is now displayed uniformly (mean as a solid line, standard deviation as a shaded area) in Figures 4B, 5B, and 6B.
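      The plotting convention described above (solid line for the mean, shaded area for the standard deviation) comes down to three curves per panel. The following is a minimal sketch, not the authors' plotting code: the traces are made up, the name `mean_and_band` is hypothetical, and whether the standard deviation is computed as a population or a sample statistic is an assumption.

```python
from statistics import mean, pstdev

# Made-up performance traces: one list per random seed, one value per epoch.
runs_by_seed = [
    [0.50, 0.60, 0.70],
    [0.54, 0.66, 0.74],
    [0.46, 0.54, 0.66],
]


def mean_and_band(runs):
    """Return per-epoch mean, lower edge and upper edge (mean -/+ 1 SD)."""
    centre, lower, upper = [], [], []
    for values in zip(*runs):  # one tuple per epoch, collected across seeds
        m = mean(values)
        sd = pstdev(values)  # population SD; the sample SD is an equally valid choice
        centre.append(m)
        lower.append(m - sd)
        upper.append(m + sd)
    return centre, lower, upper


centre, lower, upper = mean_and_band(runs_by_seed)
```

      Here `centre` would be drawn as the solid line, and the region between `lower` and `upper` would be filled as the shaded area.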

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      (1)  Quality of Scientific Writing:

      i. Mathematical and Implementation Details:

      I appreciate the authors' efforts in clarifying the mathematical details and providing pseudocode for the learning rule, significantly improving readability and reproducibility. The reference to existing models via GitHub and ModelDB repositories is acceptable. However, I suggest enhancing the presentation quality of equations within the Methods section; currently, they are low-resolution images. Please consider rewriting these equations using LaTeX or replacing them with high-resolution images to further improve clarity.

      We appreciate the reviewer’s comment regarding clarity and reproducibility. In response, we have rewritten all equations in LaTeX to improve their readability and presentation quality in the Methods section.

      ii. Figure quality.

      I acknowledge the authors' effort to improve figure clarity and consistency throughout the manuscript. However, I notice that the x-axis label "[Ca]_v (μm)" in Fig. 7E still appears compressed and unclear. Additionally, given the complexity and abundance of hyperparameters or artificial settings involved in your experimental design and learning rule (such as kernel parameters, metaplasticity kernels, and unspecific features), the current arrangement of subfigures (particularly Fig. 3C, D and Fig. 5D, E) still poses readability challenges. I recommend reordering subfigures to present primary results (e.g., performance outcomes) prominently upfront, while relegating visualizations of detailed hyperparameter manipulations or feature weight variations to later sections or the discussion, thus enhancing clarity for readers.

      We thank the reviewer for pointing out the readability issue. We have corrected the x-axis label in Figure 7D. We hope that the new layout, with a simplified rule in Figures 3 and 5, presents the key findings clearly while retaining full mechanistic detail, making the model behavior easier to understand.

      iii. Writing clarity.

      The authors have streamlined the "Metaplasticity" section and reduced references to dopamine, which is a positive step. However, the broader issue remains: the manuscript still appears overly detailed and more like a technical report of a novel learning rule, rather than a clearly structured scientific paper. I strongly recommend that the authors further distill the manuscript by clearly focusing on one or two central scientific questions or hypotheses, for instance, emphasizing core insights such as "inhibitory inputs facilitate nonlinear dendritic computations" or "distal dendritic inputs significantly contribute to nonlinear integration." Clarifying and highlighting these primary scientific questions early and consistently throughout the manuscript would substantially enhance readability and impact.

      We appreciate the reviewer’s guidance on improving the manuscript’s clarity and focus. In response, we now highlight two central questions at the end of the Introduction and have retitled the main Results subsections to follow this thread, thereby sharpening the manuscript’s focus while retaining necessary technical detail (page 3, lines 20-28). We have also removed redundant passages and simplified technical details to improve overall readability.

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The authors acknowledge that their simulated [Ca²⁺] levels exceed typical biological measurements but claim that the learning rule remains robust across variations in calcium concentrations. However, robustness to calcium variations was not explicitly demonstrated in the main figures. To convincingly address this concern, I recommend the authors explicitly test and present whether adopting biologically realistic calcium concentrations (~1 μM) impacts the learning outcomes or synaptic weight dynamics. Clarifying this point with a supplemental analysis or an additional figure panel would significantly strengthen their argument regarding the model's biological plausibility and robustness.

      We thank the reviewer for the comment. The elevated [Ca<sup>²⁺</sup>]<sub>NMDA</sub> values reflect localized transients in spine heads with narrow necks and high NMDA conductance. These values are not problematic for our model: the plasticity rule depends on relative calcium differences rather than absolute levels, because the metaplasticity kernel adjusts accordingly. In future versions of our detailed neuron model, we will likely decrease the axial resistance of the spine neck.

    1. eLife Assessment

      This important computational study investigates homeostatic plasticity mechanisms that neurons may employ to achieve and maintain stable target activity patterns. The work extends previous analyses of calcium-dependent homeostatic mechanisms based on ion channel density by considering activity-dependent shifts in channel activation and inactivation properties that operate on faster and potentially variable timescales. The model simulations convincingly demonstrate the potential functional importance of these mechanisms.

    2. Reviewer #1 (Public review):

      This revision of the computational study by Mondal et al addresses several issues that I raised in the previous round of reviews and, as such, is greatly improved. The manuscript is more readable, its findings are more clearly described, and both the introduction and the discussion sections are tighter and more to the point. And thank you for addressing the three timescales of half activation/inactivation parameters. It makes the mechanism clearer.

      Some issues remain that I bring up below.

      Comment:

      I still have a bone to pick with the claim that "activity-dependent changes in channel voltage-dependence alone are insufficient to attain bursting". As I mentioned in my previous comment, this is also the case for the gmax values (channel density). If you choose the gmax's to be in a reasonable range, then the statement above simply cannot be true. And if, in contrast, you choose the activation/inactivation parameters to be unreasonable, then no set of gmax's can produce proper activity. So I remain baffled what exactly is the point that the authors are trying to make.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, a few important questions are left unanswered.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The main question not addressed in the paper is the relative efficiency and/or participation of voltage-dependence regulation compared to channel conductance in achieving the expected pattern of activity. Is voltage-dependence contributing 50% or 10%? Although this is a difficult question to answer (and it might even be difficult to provide a number), it is important to determine whether channel conductance regulation remains the main parameter allowing the achievement of a precise pattern of activity (or its recovery after perturbation).

      (2) Another related question is whether the speed of recovery is significantly modified by implementing voltage-dependence regulation (it seems to be the case looking at Figure 3). More generally, I believe it would be important to give insights into the overall benefit of implementing voltage-dependence regulation, beyond its rather obvious biological relevance.

      (3) Along the same line, the conclusion about how voltage-dependence regulation and channel conductance regulation interact to provide the neuron with the expected activity pattern (summarized and illustrated in Figure 6) is rather qualitative. Consistent with my previous comments, one would expect some quantitative answers to this question, rather than an illustration that approximately places a solution in parameter space.

    4. Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement changes in ion channel conductance to support homeostatic plasticity. While it is well established that the voltage-dependent properties of ion channels influence neuronal excitability, their potential role in homeostatic regulation, alongside conductance changes, has remained largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage dependence can interact with conductance plasticity to enable neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. Notably, the timescale of these voltage-dependent shifts influences the final steady-state configuration of the model, shaping both channel parameters and activity features such as burst period and duration. A major conclusion of the study is that altering this timescale can seamlessly modulate a neuron's intrinsic properties, which the authors suggest may be a mechanism for adaptation to perturbations.

      While this conclusion is largely well-supported, additional analyses could help clarify its scope. For instance, the effects of timescale alterations are clearly demonstrated when the model transitions from an initial state that does not meet the target activity pattern to a new stable state. However, Fig. 6 and the accompanying discussion appear to suggest that changing the timescale alone is sufficient to shift neuronal activity more generally. It would be helpful to clarify that this effect primarily applies during periods of adaptation, such as neurodevelopment or in response to perturbations, and not necessarily once the system has reached a stable, steady state. As currently presented, the simulations do not test whether modifying the timescale can influence activity after the model has stabilized. In such conditions, changes in timescale are unlikely to affect network dynamics unless they somehow alter the stability of the solution, which is not shown here. That said, it seems plausible that real neurons experience ongoing small perturbations which, in conjunction with changes in timescale, could allow gradual shifts toward new solutions. This possibility is not discussed but could be a fruitful direction for future work.

      Editor's note: The authors have adequately addressed the concerns raised in the public reviews above, as well as the previous recommendations, and revised the manuscript where necessary.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I still have a bone to pick with the claim that "activity-dependent changes in channel voltage-dependence alone are insufficient to attain bursting". As I mentioned in my previous comment, this is also the case for the gmax values (channel density). If you choose the gmax's to be in a reasonable range, then the statement above simply cannot be true. And if, in contrast, you choose the activation/inactivation parameters to be unreasonable, then no set of gmax's can produce proper activity. So I remain baffled what exactly is the point that the authors are trying to make.

      We thank the reviewer for this clarification. We did not intend to imply that voltage-dependence modulation is universally incapable of supporting bursting or that conductance changes alone are universally sufficient. To avoid any overstatement, we now write:

      “…activity-dependent changes in channel voltage-dependence alone did not assemble bursting from these low-conductance initial states (cf. Figure 1B)”.

      Reviewer #2 (Public review):

      (1) The main question not addressed in the paper is the relative efficiency and/or participation of voltage-dependence regulation compared to channel conductance in achieving the expected pattern of activity. Is voltage-dependence contributing 50% or 10%? Although this is a difficult question to answer (and it might even be difficult to provide a number), it is important to determine whether channel conductance regulation remains the main parameter allowing the achievement of a precise pattern of activity (or its recovery after perturbation).

      We appreciate the reviewer’s interest in a quantitative partitioning of the contributions from voltage-dependence regulation versus conductance regulation. We agree that this would be an important analysis in principle. In practice, obtaining this would be difficult.

      Our goal here was to establish the principle: that half-(in)activation shifts can meaningfully influence recovery. This is not an obvious result, given that these two processes can act on vastly different timescales.

      That said, our current dataset does provide partial quantitative insight. Eight of the twenty models required some form of voltage-dependence modulation to recover; among these, two only recovered under fast modulation and two only under slow modulation. This demonstrates that voltage-dependence regulation is essential for recovery in some neurons, and its timescale critically shapes the outcome.

      (2) Another related question is whether the speed of recovery is significantly modified by implementing voltage-dependence regulation (it seems to be the case looking at Figure 3). More generally, I believe it would be important to give insights into the overall benefit of implementing voltage-dependence regulation, beyond its rather obvious biological relevance.

      Our current results suggest that voltage-dependence regulation can indeed accelerate recovery, as illustrated in Figure 3 and supported by additional simulations (not shown). However, a fully quantitative comparison (e.g., time-to-recovery distributions or survival analysis) would require a much larger ensemble of degenerate models to achieve sufficient statistical power across all four conditions. Generating and simulating this expanded model set is computationally intensive, requiring stochastic searches in a high-dimensional parameter space, full time-course simulations, and a subsequent selection process that may succeed or fail.

      The principal aim of the present study is conceptual: to demonstrate that this multi-timescale homeostatic model—built here for the first time—can capture interactions between conductance regulation and voltage-dependence modulation during assembly (“neurodevelopment”) and perturbation. Establishing the conceptual framework and exploring its qualitative behavior were the necessary first steps before pursuing a large-scale quantitative study.

      (3) Along the same line, the conclusion about how voltage-dependence regulation and channel conductance regulation interact to provide the neuron with the expected activity pattern (summarized and illustrated in Figure 6) is rather qualitative. Consistent with my previous comments, one would expect some quantitative answers to this question, rather than an illustration that approximately places a solution in parameter space.

      We appreciate the reviewer’s interest in a more quantitative characterization of the interaction between voltage-dependence and conductance regulation (Fig. 6). As noted in our responses to Comments 1 and 2, some of the facets of this interaction—such as the ability to recover from perturbations and the speed of assembly—can be measured.

      However, fully quantifying the landscape sketched in Figure 6 would require systematically mapping the regions of high-dimensional parameter space where stable solutions exist. In our model, this space spans 18 dimensions (maximal conductances and half‑(in)activations). Even a coarse grid with three samples per dimension would entail over 100 million simulations, which is computationally prohibitive and would still collapse to a schematic representation for visualization.

      For this reason, we chose to present Figure 6 as a conceptual summary, illustrating the qualitative organization of solutions and the role of multi-timescale regulation, rather than attempting an exhaustive mapping. We view this figure as a necessary first step toward guiding future, more quantitative analyses.
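      The grid-size estimate given in the response above can be checked with a few lines of arithmetic. This sketch assumes a full factorial grid, in which every combination of sampled values across the 18 dimensions is simulated once; the variable names are illustrative and do not come from the authors' code.

```python
# Size of a full factorial parameter sweep.
n_dimensions = 18          # maximal conductances and half-(in)activations, as stated in the response
samples_per_dimension = 3  # the coarse grid considered in the response

# Every dimension is sampled independently, so the counts multiply.
n_simulations = samples_per_dimension ** n_dimensions

print(f"{n_simulations:,}")  # prints 387,420,489
```

      The result, roughly 387 million simulations, is consistent with the stated bound of over 100 million.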

      Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement changes in ion channel conductance to support homeostatic plasticity. While it is well established that the voltage-dependent properties of ion channels influence neuronal excitability, their potential role in homeostatic regulation, alongside conductance changes, has remained largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage dependence can interact with conductance plasticity to enable neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. Notably, the timescale of these voltage-dependent shifts influences the final steady-state configuration of the model, shaping both channel parameters and activity features such as burst period and duration. A major conclusion of the study is that altering this timescale can seamlessly modulate a neuron's intrinsic properties, which the authors suggest may be a mechanism for adaptation to perturbations.

      While this conclusion is largely well-supported, additional analyses could help clarify its scope. For instance, the effects of timescale alterations are clearly demonstrated when the model transitions from an initial state that does not meet the target activity pattern to a new stable state. However, Fig. 6 and the accompanying discussion appear to suggest that changing the timescale alone is sufficient to shift neuronal activity more generally. It would be helpful to clarify that this effect primarily applies during periods of adaptation, such as neurodevelopment or in response to perturbations, and not necessarily once the system has reached a stable, steady state. As currently presented, the simulations do not test whether modifying the timescale can influence activity after the model has stabilized. In such conditions, changes in timescale are unlikely to affect network dynamics unless they somehow alter the stability of the solution, which is not shown here. That said, it seems plausible that real neurons experience ongoing small perturbations which, in conjunction with changes in timescale, could allow gradual shifts toward new solutions. This possibility is not discussed but could be a fruitful direction for future work.

      We thank the reviewer for this thoughtful comment and for highlighting an important point about the scope of our conclusions regarding timescale effects. The reviewer is correct that our simulations demonstrate the influence of voltage-dependence timescale primarily during periods of adaptation—when the neuron is moving from an initial, target-mismatched state toward a final target-satisfying state. Once the system has reached a stable solution, simply changing the timescale of voltage-dependent modulation does not by itself shift the neuron’s activity, unless a new perturbation occurs that re-engages the homeostatic mechanism. We have clarified this point in the revised Discussion.

      The confusion likely arose from imprecise phrasing in the original text describing Figure 6. Previously, we wrote:

      “When channel gating properties are altered quickly in response to deviations from the target activity, the resulting electrical patterns are shown in Figure 6 as the orange bubble labeled 𝝉<sub>𝒉𝒂𝒍𝒇</sub> = 6 s”. 

      We have revised this sentence to emphasize that the orange bubble represents the eventual stable state, rather than implying that timescale changes alone drive activity shifts:

      “When channel gating properties are altered quickly in response to deviations from the target activity, the neuron ultimately settles into a stable activity pattern. The resulting electrical patterns are shown in Figure 6 as the orange bubble labeled 𝝉<sub>𝒉𝒂𝒍𝒇</sub> = 6 s”.

      Reviewer #1 (Recommendations for the authors):

      Unless I am missing something, Figure 2 should be a supplement to Figure 1. I would prefer to see panel B in Figure 1 to indicate that the findings of that figure are general. Panel A really is not showing anything useful to the reader.

      We appreciate the suggestion to combine Figure 2 with Figure 1, but we believe keeping Figure 2 separate better preserves the manuscript’s flow. Figure 1 illustrates the mechanism in a single model, while Figure 2 presents the population-level summary that generalizes the phenomenon across all models.

      Also, I find Figure 6 unnecessary and its description in the Discussion more detracting than useful. Even with the descriptions, I find nothing in the figure itself that clarifies the concept.

      We appreciate the reviewer’s feedback on Figure 6. The purpose of this figure is to conceptually illustrate that multiple degenerate solutions can satisfy the calcium target and that the timescale of voltage‑dependence modulation can influence which region of this solution space is accessed during the acquisition of the activity target. Reviewer 3 noted some confusion about this point. We made a small clarifying edit.

      At the risk of being really picky, I also don't see the purpose of Figure 7. And I find it strange to plot -Vm just because that's the argument of findpeaks.

      We appreciate the reviewer’s comment on Figure 7. The purpose of this figure is to illustrate exactly what the findpeaks function is detecting, as indicated by the red arrows on the traces. For readers unfamiliar with findpeaks, it may not be obvious how the algorithm interprets the waveform. Showing the peaks directly ensures that the measurements used in our analysis align with what one would intuitively expect.
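      As an illustration of why a trace might be negated before peak detection: a routine that finds local maxima, applied to -Vm, returns the troughs of Vm. The sketch below is a simplified stand-in written for this purpose. The helper `local_maxima` is hypothetical and much cruder than MATLAB's findpeaks, the voltage samples are made up, and nothing here is taken from the authors' analysis code.

```python
def local_maxima(values):
    """Return indices of samples strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Made-up membrane potential samples (mV), for illustration only.
vm = [-60.0, -45.0, -20.0, -48.0, -65.0, -50.0, -15.0, -52.0, -70.0, -55.0]

peak_indices = local_maxima(vm)                   # maxima of Vm
trough_indices = local_maxima([-v for v in vm])   # maxima of -Vm, i.e. minima of Vm

print(peak_indices)    # [2, 6]
print(trough_indices)  # [4, 8]
```

      Marking the detected indices on the trace, as the red arrows in Figure 7 do, lets a reader confirm that the detector picks out the intended features.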

      Reviewer #2 (Recommendations for the authors):

      The writing of the article has been much improved since the last version. It is much clearer, and the discussion has been improved and better addresses the biological foundations and relevance of the study. However, conclusions are rather qualitative, while one would expect some quantitative answers to be provided by the modeling approach.

      We appreciate the reviewer’s concern regarding quantification and share this perspective. As noted above, our study is primarily conceptual. Many aspects of the model, such as calcium handling and channel regulation, are parameterized based on incomplete biological data. These uncertainties make robust quantitative predictions difficult, so we focus on qualitative outcomes that are likely to hold independently of specific parameter choices.

    1. eLife Assessment

      This study presents a valuable investigation into cell-specific microstructural development in the neonatal rat brain using diffusion-weighted magnetic resonance spectroscopy. The evidence supporting the core claims is solid, with innovative in vivo data acquisition and modeling, noting residual caveats with regard to the limitations of diffusion-weighted magnetic resonance spectroscopy for strict validation of cell-type-specific metabolite compartmentation. In addition, the study provides community resources that will benefit researchers in this field. The work will be of interest to researchers studying brain development and biophysical imaging methods.

    2. Reviewer #1 (Public review):

      In this work, Ligneul and coauthors implemented diffusion-weighted MRS in young rats to follow longitudinally and in vivo the microstructural changes occurring during brain development. Diffusion-weighted MRS is here instrumental in assessing microstructure in a cell-specific manner, as opposed to the claimed gold-standard (manganese-enhanced MRI) that can only probe changes in brain volume. Differential microstructure and complexification of the cerebellum and the thalamus during rat brain development were observed non-invasively. In particular, lower metabolite ADCs with increasing age were measured in both brain regions, reflecting increasing cellular restriction with brain maturation. Higher sphere (representing cell bodies) fractions for neuronal metabolites (total NAA, glutamate) and total creatine and taurine in the cerebellum compared to the thalamus were estimated, reflecting the unique structure of the cerebellar granular layer with a high density of cell bodies. Decreasing sphere fraction with age was observed in the cerebellum, reflecting the development of the dendritic tree of Purkinje cells and Bergmann glia. From morphometric analyses, the authors could probe non-monotonic branching evolution in the cerebellum, matching 3D representations of Purkinje cell expansion and complexification with age. Finally, the authors highlighted taurine as a potential new marker of cerebellar development.

      From a technical standpoint, this work clearly demonstrates the potential of diffusion-weighted MRS at probing microstructure changes of the developing brain non-invasively, paving the way for its application in pathological cases. Ligneul and coauthors also show that diffusion-weighted MRS acquisitions in neonates are feasible, despite the known technical challenges of such measurements, even in adult rats. They also provide all necessary resources to reproduce and build upon their work, which is highly valuable for the community.

      From a biological standpoint, claims are well supported by the microstructure parameters derived from advanced biophysical modelling of the diffusion MRS data.

      Specific strengths:

      (1) The interpretation of dMRS data in terms of cell-specific microstructure through advanced biophysical modelling (e.g. the sphere fraction, modelling the fraction of cell bodies versus neuronal or astrocytic processes) is a strong asset of the study, going beyond the more commonly used signal representation metrics such as the apparent diffusion coefficient, which lacks specificity to biological phenomena.

      (2) The fairly good data quality despite the complexity of the experimental framework should be praised: diffusion-weighted MRS was acquired in two brain regions (although not in the same animals) and longitudinally, in neonates, including data at high b-values and multiple diffusion times, which altogether constitutes a large-scale dataset of high value for the diffusion-weighted MRS community.

      (3) The authors have shared publicly data and codes used for processing and fitting, which will allow one to reproduce or extend the scope of this work to disease populations, and which goes in line with the current effort of the MR(S) community for data sharing.

      Specific weaknesses:

      Ligneul and coauthors have convincingly addressed and included my comments from the first and second round in their revised manuscript.

      I believe the following conceptual concerns, which are inherent to the nature of the study and do not require further adjustments of the manuscript, remain:

      (1) Metabolite compartmentation in one cell type or the other has often been challenged and is currently impossible to validate in vivo. Here, Ligneul and coauthors did not use this assumption a priori and also supported their claims with non-MR literature (e.g., for taurine), but the interpretation of results in that direction should be made with care.

      (2) Longitudinal MR studies of the developing brain make it difficult to extract parameters with an "absolute" meaning. Indirect assumptions used to derive such parameters may change with age and become confounding factors (brain structure, cell distribution, concentrations normalizing metabolites (here macromolecules), relaxation times...). While findings of the manuscript are convincing and supported with literature, the true underlying nature of such changes might be difficult to access.

      (3) Diffusion MRI in addition to diffusion MRS would have been complementary and beneficial to validate some of the signal contributions, but was unfeasible in the time constraints of experiments on young animals.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We thank the reviewers once again for their careful evaluation of the revised manuscript and for their constructive suggestions. In response to the remaining recommendations, we have made minor amendments to the manuscript. The main changes are as follows:

      • Metabolite Concentrations: we now report them more conventionally, i.e. normalised by water content. The original normalisation by the absolute MM content has been retained in the supplementary information, as MMs are an endogenous tissue probe (i.e., not dependent on cerebrospinal fluid). The fact that both water and MM normalisation provide similar trends supports the robustness of our conclusions. We have also updated Figure S2 to include the absolute MM concentrations, raw water content, and the MM-to-water ratios for each time point.

      • Taurine Interpretation: We have revised the wording related to the interpretation of taurine findings to clarify that we present a set of converging observations suggesting taurine may serve as a marker of early cerebellar neurodevelopment, rather than asserting it as a definitive conclusion.

      Comments to the editor & reviewers:

      We sincerely thank the reviewers and the editor for their valuable feedback, which has significantly improved the manuscript since its initial submission.

      Please note a correction in Figure S2 (added during the previous revision round): the reported evolution of metabolite/water concentrations has changed due to an earlier error in calculating the water peak integral, which has now been corrected.

      While we recognise that a study and manuscript can always be improved, we prefer not to make further changes at this stage. We cannot conduct new experiments, and redesigning the model falls outside the scope of this work. Additionally, we believe that further altering the manuscript’s structure could lead to unnecessary confusion rather than clarity.

    1. eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the work is of very high quality and that the methodological approach is praiseworthy. Although the experimental data support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, a concern remains that the central finding of paired-pulse depression at very short intervals could be due to a mechanism that does not depend on exocytosis, such as Ca²⁺ channel inactivation, rather than vesicle pool depletion. Overall, this is a solid study although the results still warrant consideration of alternative interpretations.

    2. Reviewer #3 (Public review):

      To summarize: The authors' overfilling hypothesis depends crucially on the premise that the very-quickly reverting paired-pulse depression seen after unusually short rest intervals of << 50 ms is caused by depletion of release sites whereas Dobrunz and Stevens (1997) concluded that the cause was some other mechanism that does not involve depletion. The authors now include experiments where switching extracellular Ca2+ from 1.2 to 2.5 mM increases synaptic strength on average, but not by as much as at other synapse types. They contend that the result supports the depletion hypothesis. I didn't agree because the model used to generate the hypothesis had no room for any increase at all, and because a more granular analysis revealed a mixed population with a subset where: (a) synaptic strength increased by as much as at standard synapses; and yet (b) the quickly reverting depression for the subset was the same as the overall population.

      The authors raise the possibility of additional experiments, and I do think this could clarify things if they pre-treat with EGTA as I recommended initially. They've already shown they can do this routinely, and it would allow them to elegantly distinguish between pv and pocc explanations for both the increases in synaptic strength and the decreases in the paired pulse ratio upon switching Ca2+ to 2.5 mM. Plus/minus EGTA pre-treatment trials could be interleaved and done blind with minimal additional effort.

      Showing reversibility would be a great addition too, because, in our experience, this does not always happen in whole-cell recordings in ex-vivo tissue even when electrical properties do not change. If the goal is to show that L2/3 synapses are less sensitive to changes in Ca2+ compared to other synapse types - which is interesting but a bit off point - then I would additionally include a positive control, done by the same person with the same equipment, at one of those other synapse types using the same kind of presynaptic stimulation (i.e. ChRs).

      Specific points (quotations are from the Authors' rebuttal)

      (1) Regarding the Author response image 1, I was instead suggesting a plot of PPR in 1.2 mM Ca2+ versus the relative increase in synaptic strength in 2.5 versus in 1.2 mM. This continues to seem relevant.

      (2) "Could you explain in detail why two-fold increase implies pv < 0.2?"

      a. start with power((2.5/(1 + (2.5/K1) + 1/2.97)),4) = 2*power((1.3/(1 + (1.3/K1) + 1/2.97)),4);

      b. solve for K1 (this turns out to be 0.48);

      c. then implement the premise that pv -> 1.0 when Ca2+ is high by calculating Max = power((C/(1 + (C/K1) + 1/2.97)),4) where C is [Ca] -> infinity.

      d. pv when [Ca] = 1.3 mM must then be power((1.3/(1 + (1.3/K1) + 1/2.97)),4)/Max, which is <0.2.

      Note that modern updates of Dodge and Rahamimoff typically include a parameter that prevents pv from approaching 1.0; this is the gamma parameter in the versions from Neher group.
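
      Steps a-d can be checked numerically. The following is a minimal sketch, not code from the review or the manuscript: the function names, the bisection bracket, and the label given to the constant 1/2.97 are assumptions made for illustration, while the values 1.3, 2.5, and 2.97 and the fourth-power form are taken from the expressions above.

```python
# Sketch of steps a-d. Function and variable names are hypothetical;
# the constants 1.3, 2.5 and 2.97 come from the expressions in the text.

FIXED_TERM = 1 / 2.97  # the constant "1/2.97" term in the expressions


def dr_term(ca, k1):
    """Fourth-power expression: (C / (1 + C/K1 + 1/2.97))^4."""
    return (ca / (1 + ca / k1 + FIXED_TERM)) ** 4


def solve_k1(ca_low=1.3, ca_high=2.5, fold=2.0, lo=0.01, hi=10.0):
    """Steps a-b: find K1 such that dr_term(ca_high) = fold * dr_term(ca_low).

    The ratio rises monotonically with K1, so plain bisection is enough.
    The bracket [lo, hi] is an assumption chosen to contain the root.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dr_term(ca_high, mid) / dr_term(ca_low, mid) < fold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pv(ca, k1):
    """Steps c-d: normalise by the limit as [Ca] -> infinity, which is K1^4."""
    return dr_term(ca, k1) / k1 ** 4


k1 = solve_k1()
pv_low = pv(1.3, k1)
pv_high = pv(2.5, k1)
print(f"K1 = {k1:.2f}, pv(1.3 mM) = {pv_low:.3f}, pv(2.5 mM) = {pv_high:.3f}")
```

      Under these assumptions K1 comes out near 0.48, as in step b, and pv at 1.3 mM falls just under 0.2, as in step d. The result depends on the premise in step c that pv approaches 1.0 at saturating Ca2+.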

      (3) "If so, we can not understand why depletion-dependent PPD should lead to PPF."

      When PPD is caused by depletion and pv < 0.2, the number of occupied release sites should not be decreased by more than one-fifth at the second stimulus so, without facilitation, PPR should be > 0.8. The EGTA results then indicate there should be strong facilitation, driving PPR to something like 1.2 with conservative assumptions. And yet, a value of < 0.4 is measured, which is a large miss.
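
      The bound in point (3) follows from simple bookkeeping. The sketch below assumes a deliberately minimal model that is not taken from the manuscript: all release sites are occupied before the first pulse, no refilling occurs within the interval, and facilitation (if any) multiplies pv on the second pulse. The function name and the facilitation factor of 1.5 are hypothetical, chosen only to illustrate how a PPR near 1.2 could arise.

```python
def ppr_depletion_only(pv, facilitation=1.0):
    """Paired-pulse ratio when depression is caused only by depletion.

    Pulse 1 releases a fraction pv of the occupied sites.
    Pulse 2 can only draw on the remaining fraction (1 - pv); its release
    probability is pv * facilitation, capped at 1.
    """
    if not 0.0 < pv <= 1.0:
        raise ValueError("pv must be in (0, 1]")
    first = pv
    remaining = 1.0 - pv
    second = remaining * min(1.0, pv * facilitation)
    return second / first


# Without facilitation the ratio reduces to 1 - pv.
print(ppr_depletion_only(0.2))
# Hypothetical facilitation factor of 1.5 on the second pulse.
print(ppr_depletion_only(0.2, facilitation=1.5))
# A pv close to 1, as in the authors' model, gives strong depression.
print(ppr_depletion_only(0.99))
```

      In this sketch, pv = 0.2 without facilitation gives PPR = 1 - pv = 0.8, so any pv below 0.2 gives PPR above 0.8, which is the bound used in the argument above.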

      (4) Despite the authors' suggestion to the contrary, I continue to think there is a substantial chance that Ca2+-channel inactivation is the mechanism underlying the very quickly reverting paired-pulse depression. However, this is only one example of a non-depletion mechanism among many, with the main point being that any non-depletion mechanism would undercut the reasoning for overfilling. And, this is what Dobrunz and Stevens claimed to show; that the mechanism - whatever it is - does not involve depletion. The most effective way to address this would be affirmative experiments showing that the quickly reverting depression is caused by depletion after all. Attempting to prove that Ca2+-channel inactivation does not occur does not seem like a worthwhile strategy because it would not address the many other possibilities.

      (5) True that Kusick et al. observed morphological re-docking, but then vesicles would have to re-prime and Mahfooz et al. (2016) showed that re-priming would have to be slower than 110 ms (at least during heavy use at calyx of Held).

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #3 (Public review):

      The central issue for evaluating the overfilling hypothesis is the identity of the mechanism that causes the very potent (>80% when inter pulse is 20 ms), but very quickly reverting (< 50 ms) paired pulse depression (Fig 1G, I). To summarize: the logic for overfilling at local cortical L2/3 synapses depends critically on the premise that probability of release (pv) for docked and fully primed vesicles is already close to 100%. If so, the reasoning goes, the only way to account for the potent short-term enhancement seen when stimulation is extended beyond 2 pulses would be by concluding that the readily releasable pool overfills. However, the conclusion that pv is close to 100% depends on the premise that the quickly reverting depression is caused by exocytosis dependent depletion of release sites, and the evidence for this is not strong in my opinion. Caution is especially reasonable given that similarly quickly reverting depression at Schaffer collateral synapses, which are morphologically similar, was previously shown to NOT depend on exocytosis (Dobrunz and Stevens 1997). Note that the authors of the 1997 study speculated that Ca2+-channel inactivation might be the cause, but did not rule out a wide variety of other types of mechanisms that have been discovered since, including the transient vesicle undocking/re-docking (and subsequent re-priming) reported by Kusick et al (2020), which seems to have the correct timing.

      Thank you for your comments on an alternative possibility besides Ca<sup>2+</sup> channel inactivation. Kusick et al. (2020) showed that the transient destabilization of the docked vesicle pool recovers within 14 ms after stimulation. This rapid recovery implies that post-stimulation undocking events might be largely resolved before the 20 ms inter-stimulus interval (ISI) used in our paired-pulse ratio (PPR) experiments, arguing against the possibility that post-AP undocking/re-docking events significantly influence PPR measured at 20 ms ISI. Furthermore, Vevea et al. (2021) showed that post-stimulus undocking is facilitated in synaptotagmin-7 (Syt7) knockout synapses. In our study, Syt7 knockdown did not affect PPR at 20 ms ISI, suggesting that the undocking process described by Kusick et al. may not be a major contributor to the paired-pulse depression observed at the 20 ms interval in our study. Therefore, it is unlikely that transient vesicle undocking primarily underlies the strong PPD at 20 ms ISI in our experiments. Taken together, the undocking/re-docking dynamics reported by Kusick et al. are too rapid to affect PPR at 20 ms ISI, and our Syt7 knockdown data further argue against a significant role of this process in the PPD observed at the 20 ms interval.

      In an earlier round of review, I suggested raising extracellular Ca<sup>2+</sup>, to see if this would increase synaptic strength. This is a strong test of the authors' model because there is essentially no room for an increase in synaptic strength. The authors have now done experiments along these lines, but the result is not clear cut. On one hand, the new results suggest an increase in synaptic strength that is not compatible with the authors' model; technically the increase does not reach statistical significance, but, likely, this is only because the data set is small and the variation between experiments is large. Moreover, a more granular analysis of the individual experiments seems to raise more serious problems, even supporting the depletion-independent counter hypothesis to some extent. On the other hand, the increase in synaptic strength that is seen in the newly added experiments does seem to be less at local L2/3 cortical synapses compared to other types of synapses, measured by other groups, which goes in the general direction of supporting the critical premise that pv is unusually high at L2/3 cortical synapses. Overall, I am left wishing that the new data set were larger, and that reversal experiments had been included as explained in the specific points below.

      Specific Points:

      (1) One of the standard methods for distinguishing between depletion-dependent and depletion-independent depression mechanisms is by analyzing failures during paired pulses of minimal stimulation. The current study includes experiments along these lines showing that pv would have to be extremely close to 1 when Ca<sup>2+</sup> is 1.25 mM to preserve the authors' model (Section "High double failure rate ..."). Lower values for pv are not compatible with their model because the k<sub>1</sub> parameter already had to be pushed a bit beyond boundaries established by other types of experiments.

      It should be noted that we did not arbitrarily push the k<sub>1</sub> parameter beyond the boundaries, but estimated the range of k<sub>1</sub> from the fast time constant for recovery from paired-pulse depression, as shown in Fig. 3-S2-Ab.

      The authors now report a mean increase in synaptic strength of 23% after raising Ca to 2.5 mM. The mean increase is not quite statistically significant, but this is likely because of the small sample size. I extracted a 95% confidence interval of [-4%, +60%] from their numbers, with a 92% probability that the mean value of the increase in the full population is > 5%. I used the 5% value as the greatest increase that the model could bear because 5% implies pv < 0.9 using the equation from Dodge and Rahamimoff referenced in the rebuttal. My conclusion from this is that the mean result, rather than supporting the model, actually undermines it to some extent. It would have likely taken 1 or 2 more experiments to get above the 95% confidence threshold for statistical significance, but this is ultimately an arbitrary cut off.

      Our key claim in Fig. 3-S3 is not the statistical non-significance of the EPSC changes, but the small magnitude of the change (1.23-fold). This small increase is far less than the 3.24-fold increase predicted by the fourth-power relationship (D&R equation; Dodge & Rahamimoff, 1967), which would be valid under conditions in which the fusion probability of docked vesicles (p<sub>v</sub>) is not saturated. We do not believe that adding new experiments would raise the magnitude of the EPSC change to the level predicted by the Dodge & Rahamimoff equation, even if a larger number of experiments (n) yielded statistical significance. In other words, even a small but statistically significant EPSC change would still contradict what we expect from low-p<sub>v</sub> synapses. It should be noted that our main point is the extent of the EPSC increase induced by high external [Ca<sup>2+</sup>], not a p-value. In this regard, it is hard for us to accept the Reviewer’s request for a larger sample size in the expectation of a lower p-value.

      Although we agree with the Reviewer’s assertion that our data may indicate a 92% probability that high Ca<sup>2+</sup> increases EPSCs by more than 5%, we do not agree with the Reviewer’s interpretation that the EPSC increase necessarily implies an increase in p<sub>v</sub>. We are sorry that we could not clearly understand the Reviewer’s inference that a 5% increase in EPSCs implies p<sub>v</sub> < 0.9. Please note that release probability (p<sub>r</sub>) is the product of p<sub>v</sub> and the occupancy of docked vesicles in an active zone (p<sub>occ</sub>). We imagine that this inference rests on the premise that p<sub>occ</sub> is constant irrespective of external [Ca<sup>2+</sup>]. Contrary to this premise, Figure 2c in Kusick et al. (2020) showed that the number of docked SVs increased by ca. 20% upon increasing external [Ca<sup>2+</sup>] to 2 mM. Moreover, Figure 7F in Lin et al. (2025) demonstrated that the number of TS vesicles, equivalent to p<sub>occ</sub>, increased by 23% at high external [Ca<sup>2+</sup>]. These increases in p<sub>occ</sub> are similar in magnitude to the increase in EPSC that we observed at high external Ca<sup>2+</sup> (1.23-fold). Of course, it is possible that increases in both p<sub>occ</sub> and p<sub>v</sub> contributed to the high [Ca<sup>2+</sup>]<sub>o</sub>-induced increase in EPSC. The low PPR and the failure rate analysis, however, suggest that p<sub>v</sub> is already saturated under baseline conditions of 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub>, and thus it is more likely that an increase in p<sub>occ</sub> is primarily responsible for the 1.23-fold increase. Moreover, the 1.23-fold increase does not match the prediction of the D&R equation, which would be valid at synapses with low p<sub>v</sub>. Therefore, interpreting our observation (1.23-fold increase) as a slight increase in p<sub>occ</sub> is consistent with recent papers (Kusick et al., 2020; Lin et al., 2025) as well as with our other results supporting the baseline saturation of p<sub>v</sub>, as shown in Figure 2 and the associated supplementary figures (Fig. 2-S1 and Fig. 2-S2).

      (2) The variation between experiments seems to be even more problematic, at least as currently reported. The plot in Figure 3-figure supplement 3 (left) suggests that the variation reflects true variation between synapses, not measurement error.

      Note that previous studies also reported substantial variance in the number of docked or TS vesicles at baseline and in its fold change at high external Ca<sup>2+</sup> (Lin et al., 2025; Kusick et al., 2020). Our study did not focus on this heterogeneity but on the mean dynamics of short-term plasticity at L2/3 recurrent synapses. With this caveat, the short-term plasticity of these synapses could be best explained by assuming that the vesicular fusion probability (p<sub>v</sub>) is close to unity and that release probability is regulated by p<sub>occ</sub>. In other words, even though p<sub>v</sub> is close to unity, synaptic strength can increase at high external [Ca<sup>2+</sup>] if the baseline occupancy of release sites (p<sub>occ</sub>) is low and p<sub>occ</sub> is increased by high [Ca<sup>2+</sup>]. Lin et al. (2025) showed that high external [Ca<sup>2+</sup>] increases the number of TS vesicles (equivalent to p<sub>occ</sub>) by 23% at calyx synapses. Unlike at our synapses, the baseline p<sub>v</sub> (denoted p<sub>fusion</sub> in Lin et al., 2025) of the calyx synapse is not saturated (= 0.22) at 1.5 mM external [Ca<sup>2+</sup>], and thus the calyx synapses displayed a 2.36-fold increase in EPSC at 2 mM external [Ca<sup>2+</sup>], to which increases in p<sub>occ</sub> as well as in p<sub>v</sub> (from 0.22 to 0.42) contributed. Therefore, the small increase in EPSC (23%) supports the view that p<sub>v</sub> is already saturated at L2/3 recurrent synapses.
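
      The decomposition used in this argument, release probability as the product of p<sub>v</sub> and p<sub>occ</sub>, can be illustrated with the numbers quoted above. This is a minimal sketch: the function name is hypothetical, the calyx values (0.22, 0.42, 1.23-fold) are those attributed to Lin et al. (2025) in the text, and treating p<sub>v</sub> as exactly 1 at L2/3 synapses is the authors' premise, not an established value.

```python
def epsc_fold_change(pv_before, pv_after, pocc_fold):
    """Fold change in EPSC if release probability = pv * pocc.

    pocc_fold is the fold change in release-site occupancy; the number of
    release sites and the quantal size are assumed to stay constant.
    """
    return (pv_after / pv_before) * pocc_fold


# Calyx values quoted in the text from Lin et al. (2025).
calyx_fold = epsc_fold_change(pv_before=0.22, pv_after=0.42, pocc_fold=1.23)

# Authors' premise for L2/3 synapses: pv saturated, only pocc changes.
l23_fold = epsc_fold_change(pv_before=1.0, pv_after=1.0, pocc_fold=1.23)

print(f"calyx: {calyx_fold:.2f}-fold, L2/3 (saturated pv): {l23_fold:.2f}-fold")
```

      With these inputs the product for the calyx is close to the 2.36-fold increase cited above, whereas a saturated p<sub>v</sub> leaves only the 1.23-fold contribution from p<sub>occ</sub>. The sketch shows that the authors' reading is internally consistent; it does not by itself exclude the Reviewer's alternative that p<sub>v</sub> is well below 1.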

      And yet, synaptic strength increased almost 2-fold in 2 of the 8 experiments, which back extrapolates to pv < 0.2.

      We are sorry that we could not understand the first comment in this paragraph. Could you explain in detail why two-fold increase implies pv < 0.2?

      If all of the depression is caused by depletion as assumed, these individuals would exhibit paired pulse facilitation, not depression. And yet, from what I can tell, the individuals depressed, possibly as much as the synapses with low sensitivity to Ca<sup>2+</sup>, arguing against the critical premise that depression equals depletion, and even arguing - to some extent - for the counter hypothesis that a component of the depression is caused by a mechanism that is independent of depletion.

      For the first statement in this paragraph, we imagine that ‘the depression’ means paired pulse depression (PPD). If so, we can not understand why depletion-dependent PPD should lead to PPF. If the paired pulse interval is too short for docked vesicles to be replenished, the first pulse-induced vesicle depletion would result in PPD. We are very sorry that we could not understand Reviewer’s subsequent inference, because we could not understand the first statement.

      I would strongly recommend adding an additional plot that documents the relationship between the amount of increase in synaptic strength after increasing extracellular Ca<sup>2+</sup> and the paired pulse ratio as this seems central.

      We found no clear correlation of EPSC<sub>1</sub> with PPR changes (ΔPPR) as shown in the figure below.

      Author response image 1.

      Plot of PPR changes as a function of EPSC1.

      (3) Decrease in PPR. The authors recognize that the decrease in the paired-pulse ratio after increasing Ca<sup>2+</sup> seems problematic for the overfilling hypothesis by stating: "Although a reduction in PPR is often interpreted as an increase in pv, under conditions where pv is already high, it more likely reflects a slight increase in p<sub>occ</sub> or in the number of TS vesicles, consistent with the previous estimates (Lin et al., 2025)."

      We admit that there is a logical jump in our statement you mentioned here. We appreciate your comment. We re-wrote that part in the revised manuscript (line 285) as follows:

      “Recent morphological and functional studies revealed that elevation of [Ca<sup>2+</sup>]<sub>o</sub> induces an increase in the number of TS or docked vesicles to a similar extent as our observation (Kusick et al., 2020; Lin et al., 2025), raising a possibility that an increase in p<sub>occ</sub> is responsible for the 1.23-fold increase in EPSC at high [Ca<sup>2+</sup>]<sub>o</sub>. A slight but significant reduction in PPR was observed under high [Ca<sup>2+</sup>]<sub>o</sub> too. An increase in p<sub>occ</sub> is thought to be associated with that in the baseline vesicle refilling rate. While PPR is always reduced by an increase in p<sub>v</sub>, the effects of refilling rate to PPR is complicated. For example, PPR can be reduced by both a decrease (Figure 2—figure supplement 1) and an increase (Lin et al., 2025) in the refilling rate induced by EGTA-AM and PDBu, respectively. Thus, the slight reduction in PPR is not contradictory to the possible contribution of p<sub>occ</sub> to the high [Ca<sup>2+</sup>]<sub>o</sub> effects.”

      I looked quickly, but did not immediately find an explanation in Lin et al 2025 involving an increase in pocc or number of TS vesicles, much less a reason to prefer this over the standard explanation that reduced PPR indicates an increase in pv.

      Fig. 7F of Lin et al. (2025) shows a 1.23-fold increase in the number of TS vesicles at high external [Ca<sup>2+</sup>]. Panel E of the same figure (Fig. 7E) also shows a two-fold increase in p<sub>fusion</sub> (equivalent to p<sub>v</sub> in our study) at high external [Ca<sup>2+</sup>] (from 0.22 to 0.42). Because p<sub>occ</sub> is the occupancy of TS vesicles in a limited number of slots in an active zone, the fold change in the number of TS vesicles should be similar to that of p<sub>occ</sub>.

      The authors should explain why the most straightforward interpretation is not the correct one in this particular case to avoid the appearance of cherry picking explanations to fit the hypothesis.

      The results of Lin et al. (2025) indicate that high external [Ca<sup>2+</sup>] induces a milder increase in p<sub>occ</sub> (23%) than in p<sub>v</sub> (from 0.22 to 0.42, i.e. to about 190% of baseline) at the calyx synapses. Because the extent of the p<sub>occ</sub> increase is much smaller than that of p<sub>v</sub>, and multiple lines of evidence in our study support that the baseline p<sub>v</sub> is already saturated, we raised the possibility that an increase in p<sub>occ</sub> would primarily contribute to the unexpectedly low increase in EPSC at 2.5 mM [Ca<sup>2+</sup>]<sub>o</sub>. As mentioned above, our interpretation is also consistent with the EM study of Kusick et al. (2020). Nevertheless, the reduction of PPR at 2.5 mM Ca<sup>2+</sup> seems to support an increase in p<sub>v</sub>, arguing against this possibility. On the other hand, because p<sub>occ</sub> = k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the simple vesicle refilling model (Fig. 3-S2Aa), a change in p<sub>occ</sub> should be associated with changes in k<sub>1</sub> and/or b<sub>1</sub>. While PPR is always reduced by an increase in p<sub>v</sub>, the effect of the refilling rate on PPR is complicated. For example, although EGTA-AM would not increase p<sub>v</sub>, it reduced PPR, probably by reducing the refilling rate (Fig. 2-S1). In contrast, PDBu is thought to increase k<sub>1</sub> because it induces a two-fold increase in p<sub>occ</sub> (Fig. 7L of Lin et al., 2025). Such a marked increase in p<sub>occ</sub>, rather than in p<sub>v</sub>, seems to be responsible for the marked PDBu-induced reduction of PPR (Fig. 7I of Lin et al., 2025), because PDBu induced only a slight increase in p<sub>v</sub> (Fig. 7K of Lin et al., 2025). Therefore, the slight reduction of PPR is not contradictory to our interpretation that an increase in p<sub>occ</sub> might be responsible for the slight increase in EPSC induced by high [Ca<sup>2+</sup>]<sub>o</sub>.
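
      The relation p<sub>occ</sub> = k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) is the steady state of a two-state occupancy scheme. The sketch below assumes the simplest such scheme, in which empty sites fill at rate k<sub>1</sub> and occupied sites empty at rate b<sub>1</sub>; the rate values, function names, and the Euler integration are illustrative assumptions and are not parameters from the manuscript.

```python
def pocc_steady_state(k1, b1):
    """Steady-state occupancy of release sites: k1 / (k1 + b1)."""
    return k1 / (k1 + b1)


def simulate_pocc(k1, b1, p0=0.0, dt=1e-4, t_end=1.0):
    """Forward-Euler integration of dp/dt = k1 * (1 - p) - b1 * p.

    p is the fraction of occupied sites; rates are in 1/s, times in s.
    """
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * (k1 * (1.0 - p) - b1 * p)
    return p


# Hypothetical rates (1/s), chosen only to illustrate the relation.
K1_HYPOTHETICAL = 50.0
B1_HYPOTHETICAL = 10.0

analytic = pocc_steady_state(K1_HYPOTHETICAL, B1_HYPOTHETICAL)
simulated = simulate_pocc(K1_HYPOTHETICAL, B1_HYPOTHETICAL)
doubled_k1 = pocc_steady_state(2 * K1_HYPOTHETICAL, B1_HYPOTHETICAL)
print(f"analytic {analytic:.3f}, simulated {simulated:.3f}, doubled k1 {doubled_k1:.3f}")
```

      In this scheme occupancy relaxes toward its steady state with time constant 1/(k<sub>1</sub>+b<sub>1</sub>), so raising k<sub>1</sub> both increases p<sub>occ</sub> and speeds recovery, which is why a change in p<sub>occ</sub> is tied to a change in refilling kinetics.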

      (4) The authors concede in the rebuttal that mean pv must be < 0.7, but I couldn't find any mention of this within the manuscript itself, nor any explanation for how the new estimate could be compatible with the value of > 0.99 in the section about failures.

      We have never stated, in the rebuttal or elsewhere, that the mean p<sub>v</sub> must be < 0.7. On the contrary, both our manuscript and our previous rebuttals have consistently argued that the baseline p<sub>v</sub> is already saturated, based on our observations including low PPR, tight coupling, a high double-failure rate, and the minimal effect of external Ca<sup>2+</sup> elevation.

      (5) Although not the main point, comparisons to synapses in other brain regions reported in other studies might not be accurate without directly matching experiments.

      Please understand that it is not trivial to establish optimal experimental settings for studying other synapses using the same methods employed in this study. We think that this should be done in a separate study. Furthermore, we have already shown in the manuscript that action potentials (APs) evoked by oChIEF activation occur in a physiologically natural manner, and the STP induced by these oChIEF-evoked APs is indistinguishable from the STP elicited by APs evoked by dual-patch electrical stimulation. Therefore, we believe that our use of optogenetic stimulation did not introduce any artificial bias in measuring STP.

      As it is, 2 of 8 synapses got weaker instead of stronger, hinting at possible rundown, but this cannot be assessed because reversibility was not evaluated. In addition, comparing axons with and without channel rhodopsins might be problematic because the channel rhodopsins might widen action potentials.

      We continuously monitored series resistance and baseline EPSC amplitude throughout the experiments. The figure below shows the mean time course of EPSCs at two different [Ca<sup>2+</sup>]<sub>o</sub>. As it shows, we observed no tendency for run-down of EPSCs during the experiments; any recordings that did show run-down were discarded from the analysis. In addition, please understand that there is substantial variance in the number of docked vesicles at both baseline and high external Ca<sup>2+</sup> (Lin et al., 2025; Kusick et al., 2020), as well as in the short-term dynamics of EPSCs at our synapses.

      Author response image 2.

      Time course of normalized amplitudes of the first EPSCs during paired-pulse stimulation at 20 ms ISI in control and in the elevated external Ca<sup>2+</sup> (n = 8).

      (6) Perhaps authors could double check with Schotten et al about whether PDBu does/does not decrease the latency between osmotic shock and transmitter release. This might be an interesting discrepancy, but my understanding is that Schotten et al didn't acquire information about latency because of how the experiments were designed.

      Schotten et al. (2015) directly compared experimental and simulation data for hypertonicity-induced vesicle release. They showed a pronounced shortening of the latency as the tonicity increases (Fig. 2-S2), but this tonicity-dependent shortening was not reproduced by reducing the activation energy barrier for fusion (ΔEa) in their simulations (Fig. 2-S1). Thus, the authors mentioned that an unknown compensatory mechanism counteracting the osmotic perturbation might be responsible for the tonicity-dependent changes in the latency. Importantly, their modeling demonstrated that reducing ΔEa, which would correspond to increasing p<sub>v</sub>, results in larger peak amplitudes and a shorter time-to-peak, but does not shorten the latency. Therefore, there is currently no direct explanation for the notion that PDBu or similar manipulations shorten latency via an increase in p<sub>v</sub>.

      (7) The authors state: "These data are difficult to reconcile with a model in which facilitation is mediated by Ca2+-dependent increases in pv." However, I believe that discarding the premise that depression is always caused by depletion would open up wide range of viable possibilities.

      We hope that the Reviewer understands the reasons why we reached the conclusion that the baseline p<sub>v</sub> is saturated at our synapses. First, strong paired-pulse depression (PPD) cannot be attributed to Ca<sup>2+</sup> channel inactivation, because Ca<sup>2+</sup> influx at the axon terminal remained constant during 40 Hz train stimulation (Fig. 2-S2). Moreover, even if Ca<sup>2+</sup> channel inactivation were responsible for the strong PPD, this view cannot explain the delayed facilitation that emerges at subsequent pulses (the third EPSC onward) in the 40 Hz train stimulation (Fig. 1-4), because Ca<sup>2+</sup> channel inactivation gradually accumulates during train stimulation, as directly shown by Wykes et al. (2007) in chromaffin cells. Second, the strong PPD and the very fast recovery from PPD indicate a very fast refilling rate constant (k<sub>1</sub>). Under this high k<sub>1</sub>, the failure rates were best explained by p<sub>v</sub> close to unity. Third, the increase in EPSC induced by high external Ca<sup>2+</sup> was much smaller than at other synapses, such as calyx synapses, at which p<sub>v</sub> is not saturated (Lin et al., 2025), and was instead similar to the increases in p<sub>occ</sub> estimated at calyx synapses or in the EM study (Kusick et al., 2020; Lin et al., 2025).

      Reference

      Wykes et al. (2007). Differential regulation of endogenous N- and P/Q-type Ca<sup>2+</sup> channel inactivation by Ca<sup>2+</sup>/calmodulin impacts on their ability to support exocytosis in chromaffin cells. Journal of Neuroscience, 27(19), 5236-5248.

      Reviewer #3 (Recommendations for the authors):

      I continue to think that measuring changes in synaptic strength when raising extracellular Ca<sup>2+</sup> is a good experiment for evaluating the overfilling hypothesis. Future experiments would be better if the authors would include reversibility criteria to rule out rundown, etc. Also, comparisons to other types of synapses would be stronger if the same experimenter did the experiments at both types of synapses.

      We observed no systematic tendency for run-down of EPSCs during these experiments (Author response image 2). Furthermore, the observed variability is well within the expected range of variance in the number of docked vesicles at both baseline and high external Ca²⁺ (Lin et al., 2025; Kusick et al., 2020) and reflects biological variability rather than experimental artifact. Therefore, we believe that additional reversibility experiments are not warranted. However, we are open to further discussion if the Reviewer has specific methodological concerns not resolved by our present data.

      On the second issue, as mentioned above, we think that other synapse types should be studied in a separate study.

    1. eLife Assessment

      This study tested the specific hypothesis that age-related changes to hearing involve a partial loss of synapse connections between sensory cells in the ear and the nerve fibers that carry information about sounds to the brain, and that this interferes with the ability to discriminate rapid temporal fluctuations in sounds. Physiological, behavioral, and histological analyses provide a powerful combination to test this hypothesis in gerbils. Contrary to previous suggestions, it was found that chemically-induced isolated synaptopathy (at similar levels as observed in aged gerbils) did not result in worse performance on a behavioral task measuring sensitivity to temporal fine-structure, nor did it produce degradations in auditory-nerve fiber encoding of fine structure. Aged gerbils showed degraded behavior and stronger than normal envelope responses, but temporal fine-structure coding was not affected; interpreted by the authors as suggesting central processing contributions to aging effects on discrimination. These findings are important for advancing our knowledge of the mechanistic bases for age-related changes to hearing, and the evidence provided is solid with the results largely supporting the claims made and minor limitations related to possible confounds discussed in reasonable depth.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the effects of aging on auditory system performance in understanding temporal fine structure (TFS), using both behavioral assessments and physiological recordings from the auditory periphery, specifically at the level of the auditory nerve. This dual approach aims to enhance understanding of the mechanisms underlying observed behavioral outcomes. The results indicate that aged animals exhibit deficits in behavioral tasks for distinguishing between harmonic and inharmonic sounds, which is a standard test for TFS coding. However, neural responses at the auditory nerve level do not show significant differences when compared to those in young, normal-hearing animals. The authors suggest that these behavioral deficits in aged animals are likely attributable to dysfunctions in the central auditory system, potentially as a consequence of aging. To further investigate this hypothesis, the study includes an animal group with selective synaptic loss between inner hair cells and auditory nerve fibers, a condition known as cochlear synaptopathy (CS). CS is a pathology associated with aging and is thought to be an early indicator of hearing impairment. Interestingly, animals with selective CS showed physiological and behavioral TFS coding similar to that of the young normal-hearing group, contrasting with the aged group's deficits. Despite histological evidence of significant synaptic loss in the CS group, the study concludes that CS does not appear to affect TFS coding, either behaviorally or physiologically.

      Strengths:

      This study addresses a critical health concern, enhancing our understanding of mechanisms underlying age-related difficulties in speech intelligibility, even when audiometric thresholds are within normal limits. A major strength of this work is the comprehensive approach, integrating behavioral assessments, auditory nerve (AN) physiology, and histology within the same animal subjects. This approach enhances understanding of the mechanisms underlying the behavioral outcomes and provides confidence in the actual occurrence of synapse loss and its effects. The study carefully manages controlled conditions by including five distinct groups: young normal-hearing animals, aged animals, animals with CS induced through low and high doses, and a sham surgery group. This careful setup strengthens the study's reliability and allows for meaningful comparisons across conditions. Overall, the manuscript is well-structured, with clear and accessible writing that facilitates comprehension of complex concepts.

      Weakness:

      The stimulus and task employed in this study are very helpful for behavioral research, and using the same stimulus setup for physiology is advantageous for mechanistic comparisons. However, I have some concerns about the limitations in auditory nerve (AN) physiology. Due to practical constraints, it is not feasible to record from a large enough population of fibers that covers a full range of best frequencies (BFs) and spontaneous rates (SRs) within each animal. This raises questions about how representative the physiological data are for understanding the mechanism in behavioral data. I am curious about the authors' interpretation of how this stimulus setup might influence results compared to methods used by Kale and Heinz (2010), who adjusted harmonic frequencies based on the characteristic frequency (CF) of recorded units. In contrast, the harmonic frequencies in this study are fixed across all CFs, meaning that many AN fibers may not be tuned closely to the stimulus frequencies. If units are not responsive to the stimulus, further clarification on detecting mistuning and phase locking to TFS effects within this setup would be valuable. Given the limited number of units per condition (sometimes as few as three for certain conditions), I wonder if CF-dependent variability might impact the results of the AN data in this study; discussing this factor would help readers better understand the results. While the use of the same stimuli for both behavioral and physiological recordings is understandable, a discussion on how this choice affects interpretation would be beneficial. In addition, a 60 dB stimulus could saturate high spontaneous rate (HSR) AN fibers, influencing neural coding and phase-locking to TFS. Separating SR groups could help address these issues and improve interpretive clarity.

      A deeper discussion on the role of fiber spontaneous rate could also enhance the study. How might considering SR groups affect AN results related to TFS coding? While some statistical measures are included in the supplement, a more detailed discussion in the main text could help in interpretation.

      Although Figure S2 indicates no change in median SR, the high-dose treatment group lacks LSR fibers, suggesting a different distribution based on SR for different animal groups, as seen in similar studies on other species. A histogram of these results would be informative, as LSR fiber loss with CS, whether induced by ouabain in gerbils or noise in other animals, is well documented (e.g., Furman et al., 2013).

      Although ouabain effects on gerbils have been explored in previous studies, these data already seem to have been recorded for the animals in this study, so a brief description of changes in auditory brainstem response (ABR) thresholds, wave 1 amplitudes, and tuning curves for animals with cochlear synaptopathy (CS) would be beneficial. This would confirm that ouabain selectively affects synapses without impacting outer hair cells (OHCs). For aged animals, since ABR measurements were taken, comparing hearing differences between normal and aged groups could provide insights into the pathologies besides CS in aged animals. Additionally, examining subject variability in treatment effects on hearing and how this correlates with behavior and physiology would yield valuable insights. If space is limited, a brief clarification or inclusion in the supplementary material may suffice.

      Another suggestion is to discuss the potential role of the MOC efferent system and the effect of anesthesia in reducing efferent effects in AN recordings. This is particularly relevant for aged animals, as CS might affect LSR fibers, potentially disrupting the medial olivocochlear (MOC) efferent pathway. Anesthesia could lessen MOC activity in both young and aged animals, potentially masking efferent effects that might be present in behavioral tasks. Young gerbils with functional efferent systems might perform better behaviorally, while aged gerbils with impaired MOC function due to CS might lack this advantage. A brief discussion of this aspect could enhance mechanistic insights.

      Lastly, although synapse counts did not differ between the low-dose treatment and NH I sham groups, separating these groups rather than combining them with the sham might reveal differences in behavior or AN results, particularly regarding the significance of differences between aged/treatment groups and the young normal-hearing group.

    3. Reviewer #2 (Public review):

      Summary:

      Using a gerbil model, the authors tested the hypothesis that loss of synapses between sensory hair cells and auditory nerve fibers (which may occur due to noise exposure or aging) affects behavioral discrimination of the rapid temporal fluctuations of sounds. In contrast to previous suggestions in the literature, their results do not support this hypothesis; young animals treated with a compound that reduces the number of synapses did not show impaired discrimination compared to controls. Additionally, their results from older animals showing impaired discrimination suggest that age-related changes aside from synaptopathy are responsible for the age-related decline in discrimination.

      Strengths:

      (1) The rationale and hypothesis are well-motivated and clearly presented.

      (2) The study was well conducted with strong methodology for the most part, and good experimental control. The combination of physiological and behavioral techniques is powerful and informative. Reducing synapse counts fairly directly using ouabain is a cleaner design than using noise exposure or age (as in other studies), since these latter modifiers have additional effects on auditory function.

      (3) The study may have a considerable impact on the field. The findings could have important implications for our understanding of cochlear synaptopathy, one of the most highly researched and potentially impactful developments in hearing science in the past fifteen years.

      Weaknesses:

      (1) I have concerns that the gerbils may not have been performing the behavioral task using temporal fine structure information.

      Human studies using the same task employed a filter center frequency that was (at least) 11 times the fundamental frequency (Marmel et al., 2015; Moore and Sek, 2009). Moore and Sek wrote: "the default (recommended) value of the centre frequency is 11F0." Here, the center frequency was only 4 or 8 times the fundamental frequency (4F0 or 8F0). Hence, relative to harmonic frequency, the harmonic spacing was considerably greater in the present study. However, gerbil auditory filters are thought to be broader than those in human. In the revised version of the manuscript, the authors provide modelling results suggesting that the excitation patterns were discriminable for the 4F0 conditions, but may not have been for the 8F0 conditions. These results provide some reassurance that the 8F0 discriminations were dependent on temporal cues, but the description of the model lacks detail. Also, the authors state that "thus, for these two conditions with harmonic number N of 8 the gerbils cannot rely on differences in the excitation patterns but must solve the task by comparing the temporal fine structure." This is too strong. Pulsed tone intensity difference limens (the reference used for establishing whether or not the excitation pattern cues were usable) may not be directly comparable to profile-analysis-like conditions, and it has been argued that frequency discrimination may be more sensitive to excitation pattern cues than predicted from a simple comparison to intensity difference limens (Micheyl et al. 2013, https://doi.org/10.1371/journal.pcbi.1003336).

      I'm also somewhat concerned that the masking noise used in the present study was too low in level to mask cochlear distortion products. Based on their excitation pattern modelling, the authors state (without citation) that "since the level of excitation produced by the pink noise is less than 30 dB below that produced by the complex tones, distortion products will be masked." The basis for this claim is not clear. In human, distortion products may be only ~20 dB below the levels of the primaries (referenced to an external sound masker / canceller, which is appropriate, assuming that the modelling reported in the present paper did not include middle-ear effects; see Norman-Haignere and McDermott, 2016, doi: 10.1016/j.neuroimage.2016.01.050). Oxenham et al. (2009, doi: 10.1121/1.3089220) provide further cautionary evidence on the potential use of distortion product cues when the background noise level is too low (in their case the relative level of the noise in the compromised condition was only a little below that used in the present study). The masking level used in the present study may have been sufficient, but it would be useful to have some further reassurance on this point.

      (2) The synapse reductions in the high ouabain and old groups were relatively small (mean of 19 synapses per hair cell compared to 23 in the young untreated group). In contrast, in some mouse models of the effects of noise exposure or age, a 50% reduction in synapses is observed, and in the human temporal bone study of Wu et al. (2021, https://doi.org/10.1523/JNEUROSCI.3238-20.2021) the age-related reduction in auditory nerve fibres was ~50% or greater for the highest age group across cochlear location. It could be simply that the synapse loss in the present study was too small to produce significant behavioral effects. Hence, although the authors provide evidence that in the gerbil model the age-related behavioral effects are not due to synaptopathy, this may not translate to other species (including human).
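
      To put the group means quoted in this comment on the same scale as the ~50% figures cited for mouse and human tissue, the relative loss can be worked out directly. The sketch below is only illustrative arithmetic on the two means stated above (23 and 19 synapses per hair cell); the function name `percent_reduction` is a hypothetical helper, not something taken from the manuscript.

```python
def percent_reduction(reference: float, observed: float) -> float:
    """Return how much `observed` falls below `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * (reference - observed) / reference


# Group means quoted in the review (synapses per hair cell).
young_untreated = 23.0
high_ouabain_or_old = 19.0

loss = percent_reduction(young_untreated, high_ouabain_or_old)
print(f"Synapse loss relative to young untreated animals: {loss:.1f}%")
```

      On these means the loss is a little over 17%, roughly a third of the ~50% reductions mentioned for the other models, which is the gap the reviewer's caution about generalisation rests on.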

      (3) The study was not pre-registered, and there was no a priori power calculation, so there is less confidence in replicability than could have been the case. Only three old animals were used in the behavioral study, which raises concerns about the reliability of comparisons involving this group. Statistical analyses on very small samples can be unreliable due to problems of power, generalisability, and susceptibility to outliers.

    4. Reviewer #3 (Public review):

      This study is a part of the ongoing series of rigorous work from this group exploring neural coding deficits in the auditory nerve, and dissociating the effects of cochlear synaptopathy from other age-related deficits. They have previously shown no evidence of phase-locking deficits in the remaining auditory nerve fibers in quiet-aged gerbils. Here, they study the effects of aging on the perception and neural coding of temporal fine structure cues in the same Mongolian gerbil model.

      They measure TFS coding in the auditory nerve using the TFS1 task which uses a combination of harmonic and tone-shifted inharmonic tones which differ primarily in their TFS cues (and not the envelope). They then follow this up with a behavioral paradigm using the TFS1 task in these gerbils. They test young normal hearing gerbils, aged gerbils, and young gerbils with cochlear synaptopathy induced using the neurotoxin ouabain to mimic synapse losses seen with age.

      In the behavioral paradigm, they find that aging is associated with decreased performance compared to the young gerbils, whereas young gerbils with similar levels of synapse loss do not show these deficits. When looking at the auditory nerve responses, they find no differences in neural coding of TFS cues across any of the groups. However, aged gerbils show an increase in the representation of periodicity envelope cues (around f0) compared to young gerbils or those with induced synapse loss. The authors hence conclude that synapse loss by itself doesn't seem to be important for distinguishing TFS cues, and rather the behavioral deficits with age are likely having to do with the misrepresented envelope cues instead.

      The manuscript is well written, and the data presented are robust. Some of the points below will need to be considered while interpreting the results of the study, in its current form. These considerations are addressable if deemed necessary, with some additional analysis in future versions of the manuscript.

      Spontaneous rates - Figure S2 shows no differences in median spontaneous rates across groups. But taking the median glosses over some of the nuances there. Ouabain (in the Bourien study) famously affects low spont rates first, and at a higher degree than median or high spont rates. It seems to be the case (qualitatively) in figure S2 as well, with almost no units in the low spont region in the ouabain group, compared to the other groups. Looking at distributions within each spont rate category and comparing differences across the groups might reveal some of the underlying causes for these changes. Given that overall, the study reports that low-SR fibers had a higher ENV/TFS log-z-ratio, the distribution of these fibers across groups may reveal specific effects of TFS coding by group.

      [Update: The revised manuscript has addressed these issues]

      Threshold shifts - It is unclear from the current version if the older gerbils have changes in hearing thresholds, and whether those changes may be affecting behavioral thresholds. The behavioral stimuli appear to have been presented at a fixed sound level for both young and aged gerbils, similar to the single unit recordings. Hence, age-related differences in behavior may have been due to changes in relative sensation level. Approaches such as using hearing thresholds as covariates in the analysis will help explore if older gerbils still show behavioral deficits.

      [Update: The issue of threshold shifts with aging gerbils is still unresolved in my opinion. From the revised manuscript, it appears that aged gerbils have a 36 dB shift in thresholds. While the revised manuscript provides convincing evidence that these threshold shifts do not affect the auditory nerve tuning properties, the behavioral paradigm was still presented at the same sound level for young and aged animals. But a potential 36 dB change in sensation level may affect behavioral results. The authors may consider adding thresholds as covariates in analyses or present any evidence that behavioral thresholds are plateaued along that 30 dB range.]

      Task learning in aged gerbils - It is unclear if the aged gerbils really learn the task well in two of the three TFS1 test conditions. The d' of 1, which is usually used as the criterion for learning, was not reached by the aged gerbils, even in the easiest case, in all but one condition (Fig. 5H), and in that condition there do not seem to be any age-related deficits in behavioral performance (Fig. 6B). Hence, dissociating the inability to learn the task from the inability to perceive TFS1 cues in those animals becomes challenging.

      [Update: The revised manuscript sufficiently addresses these issues, with the caveat of hearing threshold changes affecting behavioral thresholds mentioned above].
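
      For readers unfamiliar with the d' criterion referred to in this comment, the following is a minimal sketch of the textbook signal-detection definition, d' = z(hit rate) - z(false-alarm rate). It is not taken from the manuscript: the function name `d_prime`, the log-linear correction (adding 0.5 to each cell so that rates of exactly 0 or 1 stay finite), and the trial counts are all assumptions made for illustration, and the authors' actual procedure may differ.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int,
            false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (+0.5 per cell) is applied so that
    perfect or empty cells do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one animal in one condition (not real data).
example = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
criterion = 1.0  # the learning criterion mentioned in the review
print(f"d' = {example:.2f}; criterion reached: {example >= criterion}")
```

      A d' of 0 corresponds to chance performance (hit rate equal to false-alarm rate), which is why a fixed criterion such as 1 is commonly used to decide whether a task has been learned.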

      Increased representation of periodicity envelope in the AN - the mechanisms for increased representation of periodicity envelope cues are unclear. The authors point to some potential central mechanisms, but given that these are recordings from the auditory nerve, what central mechanisms these may be is unclear. If the authors are suggesting some form of efferent modulation only at the f0 frequency, no evidence for this is presented. It appears more likely that the enhancement may be due to outer hair cell dysfunction (widened tuning, distorted tonotopy). Given this increased envelope coding, the potential change in sensation level for the behavior (from the comment above), and no change in neural coding of TFS cues across any of the groups, a simpler interpretation may be that TFS coding is not affected in remaining auditory nerve fibers after age-related or ouabain-induced synapse loss, but behavioral performance is affected by outer hair cell dysfunction with age.

      [Update: The revised manuscript has addressed these issues]

      Emerging evidence seems to suggest that cochlear synaptopathy and/or TFS encoding abilities might be reflected in listening effort rather than behavioral performance. Measuring some proxy of listening effort in these gerbils (like reaction time) to see if that has changed with synapse loss, especially in the young animals with induced synaptopathy, would make an interesting addition to explore perceptual deficits of TFS coding with synapse loss.

      [Update: The revised manuscript has addressed these issues]

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:  

      The authors investigate the effects of aging on auditory system performance in understanding temporal fine structure (TFS), using both behavioral assessments and physiological recordings from the auditory periphery, specifically at the level of the auditory nerve. This dual approach aims to enhance understanding of the mechanisms underlying observed behavioral outcomes. The results indicate that aged animals exhibit deficits in behavioral tasks for distinguishing between harmonic and inharmonic sounds, which is a standard test for TFS coding. However, neural responses at the auditory nerve level do not show significant differences when compared to those in young, normal-hearing animals. The authors suggest that these behavioral deficits in aged animals are likely attributable to dysfunctions in the central auditory system, potentially as a consequence of aging. To further investigate this hypothesis, the study includes an animal group with selective synaptic loss between inner hair cells and auditory nerve fibers, a condition known as cochlear synaptopathy (CS). CS is a pathology associated with aging and is thought to be an early indicator of hearing impairment. Interestingly, animals with selective CS showed physiological and behavioral TFS coding similar to that of the young normal-hearing group, contrasting with the aged group's deficits. Despite histological evidence of significant synaptic loss in the CS group, the study concludes that CS does not appear to affect TFS coding, either behaviorally or physiologically.

      We agree with the reviewer’s summary.

      Strengths:  

      This study addresses a critical health concern, enhancing our understanding of mechanisms underlying age-related difficulties in speech intelligibility, even when audiometric thresholds are within normal limits. A major strength of this work is the comprehensive approach, integrating behavioral assessments, auditory nerve (AN) physiology, and histology within the same animal subjects. This approach enhances understanding of the mechanisms underlying the behavioral outcomes and provides confidence in the actual occurrence of synapse loss and its effects. The study carefully manages controlled conditions by including five distinct groups: young normal-hearing animals, aged animals, animals with CS induced through low and high doses, and a sham surgery group. This careful setup strengthens the study's reliability and allows for meaningful comparisons across conditions. Overall, the manuscript is well-structured, with clear and accessible writing that facilitates comprehension of complex concepts.

      Weaknesses:

      The stimulus and task employed in this study are very helpful for behavioral research, and using the same stimulus setup for physiology is advantageous for mechanistic comparisons. However, I have some concerns about the limitations in auditory nerve (AN) physiology. Due to practical constraints, it is not feasible to record from a large enough population of fibers that covers a full range of best frequencies (BFs) and spontaneous rates (SRs) within each animal. This raises questions about how representative the physiological data are for understanding the mechanism in behavioral data. I am curious about the authors' interpretation of how this stimulus setup might influence results compared to methods used by Kale and Heinz (2010), who adjusted harmonic frequencies based on the characteristic frequency (CF) of recorded units. In contrast, the harmonic frequencies in this study are fixed across all CFs, meaning that many AN fibers may not be tuned closely to the stimulus frequencies. If units are not responsive to the stimulus, further clarification on detecting mistuning and phase locking to TFS effects within this setup would be valuable. Since the harmonic frequencies in this study are fixed across all CFs, many AN fibers may not be tuned closely to the stimulus frequencies, adding sampling variability to the results.

      We chose the stimuli for the AN recordings to be identical to the stimuli used in the behavioral evaluation of the perceptual sensitivity. Only with this approach can we directly compare the response of the population of AN fibers with perception measured in behavior.

      The stimuli are complex, i.e., they comprise many frequency components, and were presented at 68 dB SPL. Thus, the stimuli excite a given fiber within a large portion of the fiber’s receptive field. Furthermore, during recordings, we assured ourselves by audiovisual control that fibers responded to the stimuli. Otherwise, valuable recording time would have been spent on a nonresponsive AN fiber.

      Given the limited number of units per condition (sometimes as few as three for certain conditions), I wonder if CF-dependent variability might impact the results of the AN data in this study; discussing this factor would help readers better understand the results. While the use of the same stimuli for both behavioral and physiological recordings is understandable, a discussion on how this choice affects interpretation would be beneficial. In addition, a 60 dB stimulus could saturate high spontaneous rate (HSR) AN fibers, influencing neural coding and phase-locking to TFS. Separating SR groups could help address these issues and improve interpretive clarity.

      A deeper discussion on the role of fiber spontaneous rate could also enhance the study. How might considering SR groups affect AN results related to TFS coding? While some statistical measures are included in the supplement, a more detailed discussion in the main text could help in interpretation.  We do not think that it will be necessary to conduct any statistical analysis in addition to that already reported in the supplement.  

      We considered moving some supplementary information back into the main manuscript but decided against it. Our single-unit sample was not sufficient, i.e. not all subpopulations of auditory-nerve fibers were sufficiently sampled for all animal treatment groups, to conclusively resolve every aspect that may be interesting to explore. The power of our approach lies in the direct linkage of several levels of investigation – cochlear synaptic morphology, single-unit representation and behavioral performance – and, in the main manuscript, we focus on the core question of synaptopathy and its relation to temporal fine structure perception. This is now spelled out clearly in lines 197 - 203 of the main manuscript.  

      Although Figure S2 indicates no change in median SR, the high-dose treatment group lacks LSR fibers, suggesting a different distribution based on SR for different animal groups, as seen in similar studies on other species. A histogram of these results would be informative, as LSR fiber loss with CS, whether induced by ouabain in gerbils or noise in other animals, is well documented (e.g., Furman et al., 2013).

      Figure S2 was revised to avoid overlap of data points and show the distributions more clearly. Furthermore, the sample sizes for LSR and HSR fibers are now provided separately.

      Although ouabain effects on gerbils have been explored in previous studies, these data already seem to have been recorded for the animals in this study, so a brief description of changes in auditory brainstem response (ABR) thresholds, wave 1 amplitudes, and tuning curves for animals with cochlear synaptopathy (CS) would be beneficial. This would confirm that ouabain selectively affects synapses without impacting outer hair cells (OHCs). For aged animals, since ABR measurements were taken, comparing hearing differences between normal and aged groups could provide insights into the pathologies besides CS in aged animals. Additionally, examining subject variability in treatment effects on hearing and how this correlates with behavior and physiology would yield valuable insights. If space is limited, a brief clarification or inclusion in the supplementary material may suffice.

      We thank the reviewer for this constructive suggestion. The requested data were added in a new section of the Results, entitled “Threshold sensitivity and frequency tuning were not affected by the synapse loss.” (lines 150 – 174). Our young-adult, ouabain-treated gerbils showed no significant elevations of CAP thresholds and their neural tuning was normal. Old gerbils showed the typical threshold losses for individuals of comparable age, and normal neural tuning, confirming previous reports. Thus, there was no evidence for relevant OHC impairments in any of our animal groups.   

      Another suggestion is to discuss the potential role of the MOC efferent system and the effect of anesthesia in reducing efferent effects in AN recordings. This is particularly relevant for aged animals, as CS might affect LSR fibers, potentially disrupting the medial olivocochlear (MOC) efferent pathway. Anesthesia could lessen MOC activity in both young and aged animals, potentially masking efferent effects that might be present in behavioral tasks. Young gerbils with functional efferent systems might perform better behaviorally, while aged gerbils with impaired MOC function due to CS might lack this advantage. A brief discussion of this aspect could enhance mechanistic insights.

      Thank you for this suggestion. The potential role of olivocochlear efferents is now discussed in lines 597 - 613.

      Lastly, although synapse counts did not differ between the low-dose treatment and NH I sham groups, separating these groups rather than combining them with the sham might reveal differences in behavior or AN results, particularly regarding the significance of differences between aged/treatment groups and the young normal-hearing group.  

      For maximizing statistical power, we combined those groups in the statistical analysis. These two groups did not differ in synapse number, threshold sensitivity or neural tuning bandwidths.

      Reviewer #2 (Public review):

      Summary:  

      Using a gerbil model, the authors tested the hypothesis that loss of synapses between sensory hair cells and auditory nerve fibers (which may occur due to noise exposure or aging) affects behavioral discrimination of the rapid temporal fluctuations of sounds. In contrast to previous suggestions in the literature, their results do not support this hypothesis; young animals treated with a compound that reduces the number of synapses did not show impaired discrimination compared to controls. Additionally, their results from older animals showing impaired discrimination suggest that age-related changes aside from synaptopathy are responsible for the age-related decline in discrimination.

      We agree with the reviewer’s summary.

      Strengths: 

      (1) The rationale and hypothesis are well-motivated and clearly presented. 

      (2) The study was well conducted with strong methodology for the most part, and good experimental control. The combination of physiological and behavioral techniques is powerful and informative. Reducing synapse counts fairly directly using ouabain is a cleaner design than using noise exposure or age (as in other studies), since these latter modifiers have additional effects on auditory function. 

      (3) The study may have a considerable impact on the field. The findings could have important implications for our understanding of cochlear synaptopathy, one of the most highly researched and potentially impactful developments in hearing science in the past fifteen years.  

      Weaknesses: 

      (1) My main concern is that the stimuli may not have been appropriate for assessing neural temporal coding behaviorally. Human studies using the same task employed a filter center frequency that was (at least) 11 times the fundamental frequency (Marmel et al., 2015; Moore and Sek, 2009). Moore and Sek wrote: "the default (recommended) value of the centre frequency is 11F0." Here, the center frequency was only 4 or 8 times the fundamental frequency (4F0 or 8F0). Hence, relative to harmonic frequency, the harmonic spacing was considerably greater in the present study. By my calculations, the masking noise used in the present study was also considerably lower in level relative to the harmonic complex than that used in the human studies. These factors may have allowed the animals to perform the task using cues based on the pattern of activity across the neural array (excitation pattern cues), rather than cues related to temporal neural coding. The authors show that mean neural driven rate did not change with frequency shift, but I don't understand the relevance of this. It is the change in response of individual fibers with characteristic frequencies near the lowest audible harmonic that is important here.  

      The auditory filter bandwidth of the gerbil is about double that of human subjects. Because of this, the overall level of the masking noise within an auditory filter is larger than in the human studies, which prevents the use of distortion products. The larger auditory filter bandwidth also prevents the gerbils from using excitation patterns, especially in the condition with a center frequency of 1600 Hz and a fundamental of 200 Hz and in the condition with a center frequency of 3200 Hz and a fundamental of 400 Hz. In the condition with a center frequency of 1600 Hz and a fundamental of 400 Hz, it is possible that excitation patterns are exploited. We have now added modeling of the excitation patterns, and a new figure showing their change at the gerbils’ perception threshold, to the discussion of the revised version (lines 440 - 446 and Fig. 8).
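
      As a reading aid for the three conditions named in this response, the sketch below computes the harmonic rank N (center frequency divided by F0) and lists the component frequencies of a harmonic complex and of a complex in which every component is shifted by the same amount, which is the general construction of the TFS1 stimulus. The helper names, the range of component ranks (6 to 10) and the 50 Hz shift are hypothetical values chosen for illustration; the actual passband and shift values are not given in this text.

```python
def harmonic_rank(center_hz: float, f0_hz: float) -> float:
    """Rank N of the component at the filter center, i.e. center = N * F0."""
    return center_hz / f0_hz


def complex_components(f0_hz: float, shift_hz: float,
                       lowest_rank: int, highest_rank: int) -> list[float]:
    """Component frequencies n * F0 + shift for n in [lowest_rank, highest_rank]."""
    return [n * f0_hz + shift_hz for n in range(lowest_rank, highest_rank + 1)]


# (center frequency, F0) pairs named in the author response.
conditions = [(1600.0, 200.0), (3200.0, 400.0), (1600.0, 400.0)]
ranks = [harmonic_rank(center, f0) for center, f0 in conditions]
for (center, f0), rank in zip(conditions, ranks):
    print(f"center {center:.0f} Hz, F0 {f0:.0f} Hz -> N = {rank:.0f}")

# Hypothetical example: ranks 6..10 around 1600 Hz, with a 50 Hz shift.
harmonic = complex_components(200.0, 0.0, 6, 10)
shifted = complex_components(200.0, 50.0, 6, 10)
spacing = [b - a for a, b in zip(shifted, shifted[1:])]
print("harmonic:", harmonic)
print("shifted: ", shifted)
print("spacing of shifted components:", spacing)
```

      The component spacing stays equal to F0 after the shift, so the envelope repetition rate is unchanged while the components are no longer integer multiples of F0. This is why the task is taken to probe fine-structure cues, provided the individual components are not resolved in the excitation pattern.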

      The case against excitation pattern cues needs to be better made in the Discussion. It could be that gerbil frequency selectivity is broad enough for this not to be an issue, but more detail needs to be provided to make this argument. The authors should consider what is the lowest audible harmonic in each case for their stimuli, given the level of each harmonic and the level of the pink noise. Even for the 8F0 center frequency, the lowest audible harmonic may be as low as the 4th (possibly even the 3rd). In human, harmonics are thought to be resolvable by the cochlea up to at least the 8th.  

      This issue is now covered in the discussion, see response to the previous point.

      (2) The synapse reductions in the high ouabain and old groups were relatively small (mean of 19 synapses per hair cell compared to 23 in the young untreated group). In contrast, in some mouse models of the effects of noise exposure or age, a 50% reduction in synapses is observed, and in the human temporal bone study of Wu et al. (2021, https://doi.org/10.1523/JNEUROSCI.3238-20.2021) the age-related reduction in auditory nerve fibres was ~50% or greater for the highest age group across cochlear location. It could be simply that the synapse loss in the present study was too small to produce significant behavioral effects. Hence, although the authors provide evidence that in the gerbil model the age-related behavioral effects are not due to synaptopathy, this may not translate to other species (including human). This should be discussed in the manuscript. 

      We agree that our results apply to moderate synaptopathy, which predominantly characterizes early stages of hearing loss or aged individuals without confounding noise-induced cochlear damage. This is now discussed in lines 486 – 498.

      It would be informative to provide synapse counts separately for the animals who were tested behaviorally, to confirm that the pattern of loss across the group was the same as for the larger sample.  

      Yes, the pattern was the same for the subgroup of behaviorally tested animals. We have added this information to the revised version of the manuscript (lines 137 – 141).

      (3) The study was not pre-registered, and there was no a priori power calculation, so there is less confidence in replicability than could have been the case. Only three old animals were used in the behavioral study, which raises concerns about the reliability of comparisons involving this group.  

      The results for the three old subjects differed significantly from those of young subjects and young ouabain-treated subjects. This indicates sufficient statistical power, since otherwise no significant differences would have been observed.

      Reviewer #3 (Public review):

      This study is a part of the ongoing series of rigorous work from this group exploring neural coding deficits in the auditory nerve, and dissociating the effects of cochlear synaptopathy from other age-related deficits. They have previously shown no evidence of phase-locking deficits in the remaining auditory nerve fibers in quiet-aged gerbils. Here, they study the effects of aging on the perception and neural coding of temporal fine structure cues in the same Mongolian gerbil model.

      They measure TFS coding in the auditory nerve using the TFS1 task which uses a combination of harmonic and tone-shifted inharmonic tones which differ primarily in their TFS cues (and not the envelope). They then follow this up with a behavioral paradigm using the TFS1 task in these gerbils. They test young normal hearing gerbils, aged gerbils, and young gerbils with cochlear synaptopathy induced using the neurotoxin ouabain to mimic synapse losses seen with age. 

      In the behavioral paradigm, they find that aging is associated with decreased performance compared to the young gerbils, whereas young gerbils with similar levels of synapse loss do not show these deficits. When looking at the auditory nerve responses, they find no differences in neural coding of TFS cues across any of the groups. However, aged gerbils show an increase in the representation of periodicity envelope cues (around f0) compared to young gerbils or those with induced synapse loss. The authors hence conclude that synapse loss by itself doesn't seem to be important for distinguishing TFS cues, and rather the behavioral deficits with age are likely having to do with the misrepresented envelope cues instead.  

      We agree with the reviewer’s summary.

      The manuscript is well written, and the data presented are robust. Some of the points below will need to be considered while interpreting the results of the study, in its current form. These considerations are addressable if deemed necessary, with some additional analysis in future versions of the manuscript. 

      Spontaneous rates - Figure S2 shows no differences in median spontaneous rates across groups. But taking the median glosses over some of the nuances there. Ouabain (in the Bourien study) famously affects low spont rates first, and at a higher degree than median or high spont rates. It seems to be the case (qualitatively) in Figure S2 as well, with almost no units in the low spont region in the ouabain group, compared to the other groups. Looking at distributions within each spont rate category and comparing differences across the groups might reveal some of the underlying causes for these changes. Given that overall, the study reports that low-SR fibers had a higher ENV/TFS log-zratio, the distribution of these fibers across groups may reveal specific effects of TFS coding by group.  

      As the reviewer points out, our sample from the group treated with a high concentration of ouabain showed very few low-spontaneous-rate auditory-nerve fibers, as expected from previous work. However, this was also true, e.g., for our sample from sham-operated animals, and may thus well reflect a sampling bias. We are therefore reluctant to attach much significance to these data distributions. We now point out more clearly the limitations of our auditory-nerve sample for the exploration of  interesting questions beyond our core research aim (see also response to Reviewer 1 above).  

      Threshold shifts - It is unclear from the current version if the older gerbils have changes in hearing thresholds, and whether those changes may be affecting behavioral thresholds. The behavioral stimuli appear to have been presented at a fixed sound level for both young and aged gerbils, similar to the single unit recordings. Hence, age-related differences in behavior may have been due to changes in relative sensation level. Approaches such as using hearing thresholds as covariates in the analysis will help explore if older gerbils still show behavioral deficits.  

      Unfortunately, we did not obtain behavioral thresholds that could be used here. We want to point out that the TFS1 stimuli had an overall level of 68 dB SPL, and the pink noise masker would have increased the threshold more than expected from the moderate, age-related hearing loss in quiet. Thus, the masked thresholds for all gerbil groups are likely similar and should have no effect on the behavioral results.

      Task learning in aged gerbils - It is unclear if the aged gerbils really learn the task well in two of the three TFS1 test conditions. The d' of 1, which is usually used as the criterion for learning, was not reached by the aged gerbils even in the easiest case in all but one condition (Fig. 5H), and in that condition there does not seem to be any age-related deficit in behavioral performance (Fig. 6B). Hence, dissociating the inability to learn the task from the inability to perceive TFS1 cues in those animals becomes challenging.

      Even in the group of gerbils with the lowest sensitivity, the animals achieved an average d’ above 1 in the 400/1600 condition. Furthermore, stimuli were well above threshold and audible, even when no discrimination could be observed. Finally, as explained in the methods, different stimulus conditions were interleaved in each session, so that stimuli that were easy to discriminate were presented together with those that were difficult to discriminate. This approach ensures that the gerbils were under stimulus control, meaning that they were properly trained to perform the task. Thus, an inability to discriminate does not indicate a lack of proper training.
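
      The d’ criterion discussed in this exchange can be illustrated with the standard yes/no signal-detection definition, d’ = z(hit rate) - z(false-alarm rate). The sketch below is only an illustration under that assumption: the function name and the example rates are hypothetical, and the manuscript's actual procedure for computing d’ is not described in this response.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at exactly 0 or 1.
    """
    for rate in (hit_rate, false_alarm_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical example rates (not taken from the study).
EXAMPLE_D_PRIME = d_prime(0.8413, 0.5)
```

      With these hypothetical rates the result is close to 1, the value that Reviewer #3 names as the usual learning criterion.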

      Increased representation of periodicity envelope in the AN - the mechanisms for increased representation of periodicity envelope cues are unclear. The authors point to some potential central mechanisms, but given that these are recordings from the auditory nerve, it is unclear what central mechanisms these may be. If the authors are suggesting some form of efferent modulation only at the f0 frequency, no evidence for this is presented. It appears more likely that the enhancement may be due to outer hair cell dysfunction (widened tuning, distorted tonotopy). Given this increased envelope coding, the potential change in sensation level for the behavior (from the comment above), and no change in neural coding of TFS cues across any of the groups, a simpler interpretation may be: TFS coding is not affected in remaining auditory nerve fibers after age-related or ouabain-induced synapse loss, but behavioral performance is affected by outer hair cell dysfunction with age.

      A similar point was made by Reviewer #1. As indicated above, new data on threshold sensitivity and neural tuning were added in a new section of the Results which indirectly suggest that significant OHC pathologies were not a concern, neither in our young-adult, synaptopathic gerbils nor in the old gerbils.  

      Emerging evidence seems to suggest that cochlear synaptopathy and/or TFS encoding abilities might be reflected in listening effort rather than behavioral performance. Measuring some proxy of listening effort in these gerbils (like reaction time) to see if that has changed with synapse loss, especially in the young animals with induced synaptopathy, would make an interesting addition to explore perceptual deficits of TFS coding with synapse loss.  

      This is an interesting suggestion that we now explore in the revision of the manuscript. Reaction times can be used as a proxy for listening effort and were recorded for all responses. The new analysis, now reported in lines 378 - 396, compared young-adult control gerbils with young-adult gerbils that had been treated with the high concentration of ouabain. No difference in response latencies was found, indicating that listening effort did not change with synapse loss.

      Reviewer #1 (Recommendations for the authors): 

      Figure 2: The y-axis labeled as "Frequency" is potentially misleading since there are additional frequency values on the right side of the panels. It would be helpful to clarify more in the caption what these right-side frequency values represent. Additionally, the legend could be positioned more effectively for clarity.

      Thank you for your suggestion. The axis label was rephrased.

      Figure 7: This figure is a bit unclear, as it appears to show two sets of gerbil data at 1500 Hz, yet the difference between them is not explained.  

      We added the following text to the figure legend: “The higher and lower thresholds shown for the gerbil data reflect thresholds at fc of 1600 Hz for fundamentals f0 of 200 Hz and 400 Hz, respectively.”

      Maybe a short description of fmax that is used in Figure 4 could help or at least point to supplementary for finding the definition.  

      We thank the reviewer for pointing out this typo/inaccuracy. The correct terminology in line with the remainder of the manuscript is “fmaxpeak”. We corrected the caption of figure 5 (previously figure 4) and added the reference pointing to figure 11 (previously figure 9), which explains the terms.

      I couldn't find information about the possible availability of data. 

      The auditory-nerve recordings reported in this paper are part of a larger study of single-unit auditory-nerve responses in gerbils, formally described and published by Heeringa (2024) Single-unit data for sensory neuroscience: Responses from the auditory nerve of young-adult and aging gerbils. Scientific Data 11:411, https://doi.org/10.1038/s41597-024-03259-3. As soon as the Version of Record is submitted, the raw single-unit data can be accessed directly through the following link: https://doi.org/10.5061/dryad.qv9s4mwn4. The data that are presented in the figures of the present manuscript and were statistically analyzed are uploaded to the Zenodo repository (https://doi.org/10.5281/zenodo.15546625).

      Reviewer #2 (Recommendations for the authors): 

      L22. The term "hidden hearing loss" is used in many different ways in the literature, from being synonymous with cochlear synaptopathy, to being a description of any listening difficulties that are not accounted for by the audiogram (for which there are many other / older terms). The original usage was much more narrow than your definition here. It is not correct that Schaette and McAlpine defined HHL in the broad sense, as you imply. I suggest you avoid the term to prevent further confusion.  

      We eliminated the term hidden hearing loss.

      L43. SNHL is undefined.

      Thank you for catching that. The term is now spelled out.

      L64. "whether" -> "that"  

      We corrected this issue.

      L102. It would be informative to see the synapse counts (across groups) for the animals tested in the behavioral part of the study. Did these vary between groups in the same way?  

      Yes, the pattern was the same for the subgroup of behaviorally tested animals. We have added this information to the revised version of the manuscript (lines 137 – 141).

      L108. How many tests were considered in the Bonferroni correction? Did this cover all reported tests in the paper?  

      The comparisons of synapse numbers between treatment groups were done with full Bonferroni correction, as in the other tests involving posthoc pair-wise comparisons after an ANOVA.

      Figure 1 and 6 captions. Explain meaning of * and ** (criteria values).  

      The information was added to the figure legends of now Figs. 1 and 7. 

      L139. I don't follow the argument - the mean driven rate is not important. It is the rate at individual CFs and how that changes with frequency shift that provides the cue.

      L142. I don't follow - individual driven rates might have been a cue (some going up, some down, as frequency was shifted).  

      Yes, theoretically it is possible that the spectral pattern of driven rates (i.e., the excitation pattern) can be specifically used for profile analysis and subsequently as a strong cue for discriminating the TFS1 stimuli. In order to shed some light on this question with regard to the actual stimuli used in this study, we added a comprehensive figure showing simulated excitation patterns (figure 8). The excitation patterns were generated with a gammatone filter bank and auditory filter bandwidths appropriate for gerbils (Kittel et al. 2002). The simulated excitation patterns allow at least semi-quantitative conclusions to be drawn about the possibility of profile analysis: 1. In the 200/1600 Hz and 400/3200 Hz conditions (i.e., harmonic number of fc is 8), the difference between all inharmonic excitation patterns and the harmonic reference excitation pattern is far below the threshold for intensity discrimination (Sinnott et al. 1992). 2. In the same conditions, the statistics of the pink noise make excitation pattern differences at or beyond the filter slopes (at both the high- and low-frequency limits) useless for frequency shift discrimination. 3. In the 400/1600 Hz condition (i.e., harmonic number of fc is 4), there is a non-negligible possibility that excitation pattern differences were a main cue for discrimination. All of these conclusions are compatible with the results of our study.
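
      The harmonic numbers quoted for the three conditions follow directly from the ratio of the center frequency fc to the fundamental F0. A minimal sketch of that arithmetic is given below; the function and variable names are hypothetical, and the condition labels use the F0/fc notation of the response.

```python
def harmonic_rank(center_hz: float, f0_hz: float) -> float:
    """Harmonic number of the filter center frequency, i.e. fc / F0."""
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    return center_hz / f0_hz


# The three stimulus conditions named in the response, as (F0, fc) in Hz.
CONDITIONS = [(200, 1600), (400, 3200), (400, 1600)]

# Map each "F0/fc" label to the harmonic number of fc.
RANKS = {f"{f0}/{fc}": harmonic_rank(fc, f0) for f0, fc in CONDITIONS}
```

      For comparison, the default recommended by Moore and Sek, as quoted by Reviewer #2, corresponds to a rank of 11.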

      L193. Is this p-value Bonferroni corrected across the whole study? If not, the finding could well be spurious given the number of tests reported.  

      Yes, it is Bonferroni corrected.

      L330. TFS is already defined.  

      L346. AN is already defined.  

      L408. "temporal fine structure" -> "TFS"  

      It was a deliberate decision to define these terms again in the Discussion, for readers who prefer to skip most of the detailed Results. 

      L364-366. This argument is somewhat misleading. Cochlear resolvability largely depends on the harmonic spacing (i.e., F0) relative to harmonic frequency (in other words, on harmonic rank). Marmel et al. (2015) and Moore and Sek (2009) used a center frequency (at least) 11 times F0. Here, the center frequency was only 4 or 8 times F0. In human, this would not be sufficient to eliminate excitation pattern cues.  

      We have now included results from modeling the excitation patterns in the discussion, with a new figure demonstrating that at a center frequency of 8 times F0, excitation patterns provide no useful cue, while this is a possibility at a center frequency of 4 times F0 (Fig. 8, lines 440 - 446).

      L541. Was that a spectrum level of 20 dB SPL (level per 1-Hz wide band) at 1 kHz? Need to clarify.  

      The power spectral density of the pink noise at 1 kHz (i.e., the level in a 1 Hz wide band centered at 1 kHz) was 13.3 dB SPL. The total level of the pink noise (including edge filters at 100 Hz and 11 kHz) was 50 dB SPL.
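
      The two noise levels stated above can be checked against each other. The sketch below assumes an ideal 1/f power spectrum and brick-wall band edges at 100 Hz and 11 kHz; both are simplifying assumptions, since the actual edge filters are not specified here, and the function name is hypothetical. Under these assumptions the band power is the spectral density at the reference frequency multiplied by f_ref * ln(f_high / f_low).

```python
import math


def pink_noise_total_level(spectrum_level_db: float, ref_hz: float,
                           low_hz: float, high_hz: float) -> float:
    """Total level (dB) of ideal 1/f noise between low_hz and high_hz.

    With power spectral density S(f) = S(ref_hz) * ref_hz / f, integrating
    from low_hz to high_hz gives S(ref_hz) * ref_hz * ln(high_hz / low_hz).
    """
    if not 0 < low_hz < high_hz:
        raise ValueError("require 0 < low_hz < high_hz")
    bandwidth_term = ref_hz * math.log(high_hz / low_hz)
    return spectrum_level_db + 10.0 * math.log10(bandwidth_term)


# Values stated in the response: 13.3 dB SPL per 1-Hz band at 1 kHz,
# band limits of 100 Hz and 11 kHz.
TOTAL_DB = pink_noise_total_level(13.3, 1000.0, 100.0, 11000.0)
```

      Under these idealizations the computed total is within a fraction of a decibel of the stated 50 dB SPL, so the two figures are mutually consistent.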

      L919. So was the correction applied across only the tests within each ANOVA? Don't you need to control the study-wise error rate (across all primary tests) to avoid spurious findings?  

      We added information about the family-wise error rate (lines 1077 - 1078). Since the ANOVAs tested different specific research questions, we do not think that we need to control the study-wise error rate.
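
      The distinction at issue here, correcting within a family of tests versus across the whole study, can be made concrete with a short sketch. The p-values and the number of tests below are hypothetical, both function names are illustrative, and the error-rate formula assumes independent tests, which may not hold for the analyses in the manuscript.

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values within one family: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests uncorrected
    tests at level alpha, ASSUMING the tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical post-hoc p-values from a single ANOVA family.
ADJUSTED = bonferroni([0.01, 0.04, 0.5])

# Hypothetical study with five uncorrected primary tests at alpha = 0.05.
STUDY_WISE_RISK = familywise_error_rate(0.05, 5)
```

      In this hypothetical case, five uncorrected primary tests would carry a study-wise false-positive risk of roughly 23%, which is the concern Reviewer #2 raises.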

      Reviewer #3 (Recommendations for the authors): 

      There was no difference in TFS sensitivity in the AN fiber activity across all the groups. Potential deficits with age were only found in the behavioral paradigm. Given that, it might make it clearer to specify that the deficits or lack thereof are in behavior, in multiple instances in the manuscript where it says synaptopathy showed no decline in TFS sensitivity (for example, lines 342-344).

      We carefully went through the entire text and clarified a couple more instances.

      L353 - this statement is a bit too strong. It implies causality when there is only a co-occurrence of increased f0 representation and age-related behavioral deficits in TFS1 task.  

      The statement was rephrased as “Thus, cue representation may be associated with the perceptual deficits, but not reduced synapse numbers, as originally proposed.”

      L465-467 - while this may be true, I think it is hard to say this with the current dataset where only AN fibers are being recorded from. I don't think we can say anything about afferent central mechanisms with this data set.  

      We agree. However, we refer here to published data on central inhibition to provide a possible explanation. 

      Hearing thresholds with ABRs are mentioned in the methods, but that data is not presented anywhere. Would be nice to see hearing thresholds across the various groups to account or discount outer hair cell dysfunction. 

      This important point was made repeatedly and we thank the Reviewers for it. As indicated above, new data on threshold sensitivity and neural tuning were added in a new section of the Results which indirectly suggest that significant OHC pathologies were not a concern, neither in our young-adult, synaptopathic gerbils nor in the old gerbils.

    1. eLife Assessment

      This valuable study introduces a non-perturbative pulse-labeling strategy for yeast nuclear pore complexes (NPCs), employing a nanobody-based approach in order to selectively capture Nup84-containing complexes for imaging and biochemical analysis. The data convincingly demonstrate that a short induction period (20 minutes to 1 hour) yields a strong and sustained signal, enabling affinity purification that faithfully recapitulates the endogenous Nup84 interactome. This tool offers a powerful framework for investigating NPC dynamics and associated interactomes through both imaging and biochemical assays.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a nanobody-based pulse-labeling system to track yeast NPCs. Transient expression of a nanobody targeting Nup84 (fused to NeonGreen or an affinity tag) permits selective visualization and biochemical capture of NPCs. Short induction effectively labels NPCs, and the resulting purifications match those from conventional Nup84 tagging. Crucially, when induction is repressed, dilution of the labeled pool through successive cell cycles allows the visualization of "old" NPCs (and potentially individual NPCs), providing a powerful view of NPC lifespan and turnover without permanently modifying a core scaffold protein.

      Strengths:

      (1) A brief expression pulse labels NPCs, and subsequent repression allows dilution-based tracking of older (and possibly single) NPCs over multiple cell cycles.

      (2) The affinity-purified complexes closely match known Nup84-associated proteins, indicating specificity and supporting utility for proteomics.

      Weaknesses:

      (1) Reliance on GAL induction introduces metabolic shifts (raffinose → galactose → glucose) that could subtly alter cell physiology or the kinetics of NPC assembly. Alternative induction systems (e.g., β-estradiol-responsive GAL4-ER-VP16) could be discussed as a way to avoid carbon-source changes.

      (2) While proteomics is solid, a comprehensive supplementary table listing all identified proteins (with enrichment and statistics) would enhance transparency.

      (3) Importantly, the authors note that the method is particularly useful "in conditions where direct tagging of Nup84 interferes with its function, while sub-stoichiometric nanobody binding does not." After this sentence, it would be valuable to add concrete examples, such as experiments examining NPC integrity in aging or stress conditions where epitope tags can exacerbate phenotypes. These examples will help readers identify situations in which this approach offers clear advantages.

    3. Reviewer #2 (Public review):

      Summary:

      This preprint describes a practical and useful approach for labeling and tracking NPCs in situ. While useful applications including timelapse imaging, affinity purification, or proximity labeling are envisioned, addressing some outstanding technical questions would give a clearer picture of the sensitivity and temporal resolution of this approach.

      Strengths:

      Clever use of a fluorescently conjugated nanobody that binds directly to the core scaffold nucleoporin Nup84 with nanomolar affinity.

      Weaknesses:

      The decrease in nanobody labeling over 8 hours of chase period is interpreted to indicate that NPCs turn over during this time. However, it is also possible that the nanobody:Nup84 association is disrupted during mitosis by phosphorylation, other PTMs, or structural remodeling.

    4. Reviewer #3 (Public review):

      Summary:

      Submitted to the Tools and Resources series, this study reports on the use of a single-domain antibody targeting the nucleoporin Nup84 to probe and track NPCs in budding yeast. The authors demonstrate their ability to rapidly label or pull down NPCs by inducing the expression of a tagged version of the nanobody (Figure 1).

      Strengths:

      This tool's main strength is its versatility as an inexpensive, easy-to-set-up alternative to metabolic labelling or optical switching. This same rationale could, in principle, be applied to the study of other multiprotein complexes using similar strategies, provided that single-chain antibodies are available.

      Weaknesses:

      This approach has no inherent weaknesses, but it would be useful for the authors to verify that their pulse labelling strategy can also be used to detect assembly intermediates, structural variants, or damaged NPCs.

      Overall, the data clearly show that Nup84 nanobodies are a valuable tool for imaging NPC dynamics and investigating their interactomes through affinity purification.

    1. eLife Assessment

      The authors examined the frequency of alternative splicing across prokaryotes and eukaryotes and found that the rate of alternative splicing varies with taxonomic groups and genome coding content. This solid work, based on nearly 1,500 high-quality genome assemblies, relies on a novel genome-scale metric that enables cross-species comparisons and that quantifies the extent to which coding sequences generate multiple mRNA transcripts via alternative splicing. This timely study provides an important basis for improving our general understanding of genome architecture and the evolution of life forms.

    2. Reviewer #2 (Public review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree, and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF.

      The revised manuscript addresses the problems identified in the first round of reviews and now serves as a guide to understand how alternative splicing has evolved within different phyla, as opposed to making unsubstantiated claims about overall trends.

    3. Reviewer #3 (Public review):

      Summary:

      In "Alternative Splicing Across the Tree of Life: A Comparative Study," the authors use rich annotation features from nearly 1,500 high-quality NCBI genome assemblies to develop a novel genome-scale metric, the Alternative Splicing Ratio, that quantifies the extent to which coding sequences generate multiple mRNA transcripts via alternative splicing (AS). This standardized metric enables cross-species comparisons and reveals clear phylogenetic patterns: minimal AS in prokaryotes and unicellular eukaryotes, moderate AS in plants, and high AS in mammals and birds. The study finds a strong negative correlation between AS and coding content, with genomes containing approximately 50% intergenic DNA exhibiting the highest AS activity. By integrating diverse lines of prior evidence, the study offers a cohesive evolutionary framework for understanding how alternative splicing varies and evolves across the tree of life.

      Strengths:

      By studying alternative splicing patterns across the tree of life, the authors systematically address an important yet historically understudied driver of functional diversity, complexity, and evolutionary innovation. This manuscript makes a valuable contribution by leveraging standardized, publicly available genome annotations to perform a global survey of transcriptional diversity, revealing lineage-specific patterns and evolutionary correlates. The authors have done an admirable job in this revised version, thoroughly addressing prior reviewer comments. The updated manuscript includes more rigorous statistical analyses, careful consideration of potential methodological biases, expanded discussion of regulatory mechanisms, and acknowledgment of non-adaptive alternatives. Overall, the work presents an intriguing view of how alternative splicing may serve as a flexible evolutionary strategy, particularly in lineages with limited capacity for coding expansion (e.g., via gene duplication). Notably, the identification of genome size and genic coding fraction thresholds (~20 Mb and ~50%, respectively) as tipping points for increased splicing activity adds conceptual depth and potential generalizability.

      Weaknesses:

      While the manuscript offers a broad comparative view of alternative splicing, its central message becomes diffuse in the revised version. The focus of the study is unclear, and the manuscript comes across as largely descriptive without a well-articulated hypothesis or explanatory evolutionary model. Although the discussion gestures toward adaptive and non-adaptive mechanisms, these interpretations are not developed early or prominently enough to anchor the reader. The negative correlation between alternative splicing and coding content is compelling, but the biological significance of this pattern remains ambiguous: it is unclear whether it reflects functional constraint, genome organization, or annotation bias. This uncertainty weakens the manuscript's broader evolutionary inferences.

      Sections of the Introduction, particularly lines 72-90, lack cohesion and logical flow, shifting abruptly between topics without a clear structure. A more effective approach may involve separating discussions of coding and non-coding sequence evolution to clarify their distinct contributions to splicing complexity. Furthermore, some interpretive claims lack nuance. For example, the assertion that splicing in plants "evolved independently" seems overstated given the available evidence, and the citation regarding slower evolution of highly expressed genes overlooks counterexamples from the immunity and reproductive gene literature.

      Presentation of the results is occasionally vague. For instance, stating "we conducted comparisons of mean values" (line 146) without specifying the metric undercuts interpretability. The authors should clarify whether these comparisons refer to the Alternative Splicing Ratio or another measure. Additionally, the lack of correlation between splicing and coding region fraction in prokaryotes may reflect a statistical power issue, particularly given their limited number of annotated isoforms, rather than a biological absence of pattern.

      Finally, the assessment of annotation-related bias warrants greater methodological clarity. The authors note that annotations with stronger experimental support yield higher splicing estimates, yet the normalization strategy for variation in transcriptomic sampling (e.g., tissue breadth vs sequencing depth) is insufficiently described. As these factors can significantly influence splicing estimates, a more rigorous treatment is essential. While the authors rightly acknowledge that splicing represents only one layer of regulatory complexity, the manuscript would benefit from a more integrated consideration of additional dimensions, such as 3D genome architecture, e.g., the potential role of topologically associating domains in constraining splicing variation.

    4. Reviewer #4 (Public review):

      The manuscript reports on a large-scale study correlating genomic architecture with splicing complexity over almost 1,500 species. We still know relatively little about alternative splicing functional consequences and evolution, and thus, the study is relevant and timely. The methodology relies on annotations from NCBI for high-quality genomes and a main metric proposed by the authors and named Alternative Splicing Ratio (ASR). It quantifies the level of redundancy of each coding nucleotide in the annotated isoforms.

      According to the authors' response to the first reviewers' comments, the present version of the manuscript seems to be a profoundly revised version compared to the original submission. I did not have access to the reviewers' comments.

      Although the study addresses an important question and the authors have visibly made an important effort to make their claims more statistically robust, I have a number of major concerns regarding the methodology and its presentation.

      (1) A large part of the manuscript is speculative and vague. For instance, the Discussion is very long (almost longer than the Results section) and the items discussed are sometimes not directly connected with the present work. I would suggest merging the last 2 paragraphs, for instance, since the penultimate paragraph is essentially a review of the literature without direct connection to the present work.

      (2) The Methods section lacks clarity and precision. A large part is devoted to explaining the biases in the data without any reference or quantification. The definition of ASR is very confusing. It is first defined in equation 2, with a different name, and then again in the next subsection from a different perspective on lines 512-518. Why build matrices of co-occurrences if these are, in practice, never used? It seems the authors exploit only the trace. A major revision, if I understood correctly, was the correction/normalisation of the ASR metric. This normalisation is not explained. The authors argue that they will write another paper about it; I do not think this is acceptable for the publication of the present manuscript. Furthermore, there is no information about the technical details of the implementation: which packages did the authors use?

      (3) Could the authors motivate why they do not directly focus on the MC permutation test? They motivate the use of permutations because the data contain extreme outliers and are non-normal in most cases. Hence, it seems that Welch's ANOVA is not appropriate. "To further validate our findings, we also conducted a Monte Carlo permutation test, which supported the conclusions (see Methods)." Where is the comparison shown? I did not see any report of the results for the non-permuted version of the Welch's ANOVA.

      (4) What are the assumptions for the Phylogenetic Generalized Least Squares? Which evolution model was chosen and why? What is the impact of changing the model? Could the authors define more precisely (e.g. with equations) what is lambda? Is it estimated or fixed?

      (5) I think the authors could improve their account of recent literature on the topic. For instance, the paper https://doi.org/10.7554/eLife.93629.3, published in the same journal last year, should be discussed. It perfectly fits in the scope of the subsection "Evidence for the adaptive role of alternative splicing". Methods and findings reported in https://doi.org/10.1186/s13059-021-02441-9 and https://www.genome.org/cgi/doi/10.1101/gr.274696.120 directly concern the assessment of AS evolutionary conservation across long evolutionary times and/or across many species. These aspects are mentioned in the introduction on p.3. but without pointing to such works. Can we really qualify a work published in 2011 as "recent" (line 348-350)?

      The generated data and codes are available on Zenodo, which is a good point for reproducibility and knowledge sharing with the community.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Methodological biases in annotation and sequencing methods

      We acknowledge the reviewer’s concern regarding methodological heterogeneity in genome annotations, particularly regarding the use of CDS annotations derived from public databases. In response, we have properly addressed the potential sources of bias in estimating alternative splicing (AS) across such a broad taxonomic range.

      Given the methodological challenges encountered in this study, we have undertaken an in-depth analysis of the biases associated with genome annotations and their impact on large-scale estimates of alternative splicing. This effort has resulted in the development of a comprehensive framework for quantifying, modeling, and correcting such biases, which we believe will be of interest to the broader genomics community. We are currently preparing a separate manuscript dedicated to this methodological aspect, which we intend to submit for publication in the near future.

      To account for these biases, we performed a statistical evaluation of annotation quality by examining the relationship between ASR values and multiple features of the NCBI annotation pipeline, including both technical and biological variables. Specifically, we analyzed a set of metadata descriptors related to: (i) genome assembly quality (e.g., Contig N50, Scaffold N50, number of gaps, gap length, contig/scaffold count), (ii) the amount and diversity of experimental evidence used in annotation (e.g., number of RNA-Seq reads, number of tissues, number of experimental runs, number of proteins and transcripts, including those derived from Homo sapiens), and (iii) the nature of the annotated coding sequences (e.g., total number of CDSs, percentage of CDSs supported by experimental evidence, proportion of known CDSs, percentage of CDSs derived from ab initio predictions).

      This comprehensive analysis revealed that the strongest bias affecting ASR values is associated with the proportion of fully supported CDSs, which showed a strong positive correlation with observed splicing levels. In contrast, the percentage of CDSs relying on ab initio models showed a negative correlation, indicating that computational predictions tend to underestimate splicing complexity. Based on these findings, we implemented a polynomial normalization model using the percentage of fully supported CDSs as the main predictor of annotation bias. The resulting normalized metric, ASR*, corrects for annotation-related variability while preserving biologically meaningful variation.

      We further verified the robustness of this correction by comparing the main results of our study using both the raw ASR and the normalized ASR* across all analyses. The qualitative and quantitative consistency of results obtained with both metrics demonstrates that our findings are not an artifact of methodological bias and validates the reliability of our approach.

      Conceptual and Statistical Framework

      Our aim was not to investigate specific regulatory mechanisms of alternative splicing, but rather to explore large-scale statistical patterns across the tree of life using a newly defined metric—the Alternative Splicing Ratio (ASR)—that enables genome-wide comparisons of splicing complexity across species. To clarify the conceptual framework, we have revised the manuscript to explicitly state our assumptions, objectives, and the scope of our conclusions. The ASR metric is now briefly introduced in the Results section, with a more detailed mathematical formulation included in the Methods section.

      From a methodological standpoint, we have expanded the manuscript to better support the comparative framework through additional statistical analyses. In particular, we now include:

      • Monte Carlo permutation tests to assess pairwise differences in splicing and genomic variables across taxonomic groups, which are robust to non-normality and heteroscedasticity in the data.

      • Welch’s ANOVA with Bonferroni correction, which accounts for unequal variances when comparing group means.

      • Phylogenetic Generalized Least Squares (PGLS) regression, which explicitly models phylogenetic non-independence between species and allows us to infer lineage-specific associations between genomic composition and alternative splicing.

      • Coefficient of variation analysis, used to evaluate the relative variability of splicing and genomic traits across groups in a scale-independent manner.

      • Variability ratio metrics, designed to compare the dispersion of splicing values relative to genomic features, thereby quantifying trends in regulatory plasticity versus structural constraints.

      All methods are thoroughly described in the revised Methods section, and their application is presented in the Results section.
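
      As an illustration of the first test listed above, the following is a minimal Python sketch of a two-sided Monte Carlo permutation test for a difference in group means. It is not the authors' implementation: the function name `permutation_test`, the choice of the mean as the test statistic, the default number of permutations, and the example values are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Hypothetical helper for illustration; not taken from the manuscript.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up splicing values for two taxonomic groups (illustrative only).
    group_1 = [1.02, 1.10, 1.05, 1.31, 1.08, 1.04]
    group_2 = [1.45, 1.60, 1.38, 2.90, 1.52, 1.49]
    print(permutation_test(group_1, group_2))
```

      Because the null distribution is built by reshuffling the observed values, the test makes no assumption of normality or equal variances, which is the property the list above relies on. For several pairwise comparisons, the resulting p-values would still need a multiple-testing correction.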

      Functional vs. non-functional nature of AS events

      We have included a new discussion paragraph addressing the ongoing debate regarding the functionality of alternative splicing and a possible non-adaptive explanation for the patterns observed. While many previous studies suggest that a considerable fraction of AS events might represent splicing noise or non-functional isoforms, our intention is not to adopt this view uncritically. Instead, we cite recent literature to provide a more nuanced interpretation, recognizing both the potential adaptive value and the uncertainty surrounding the functional relevance of many AS events. Thus, rather than assuming that all observed alternative splicing events are adaptive or biologically meaningful, we now emphasize that many patterns may emerge from other processes, such as those associated with genomic constraints.

      Terminology and Result Interpretation

      The manuscript has been thoroughly revised to improve both the scientific language and the conceptual framing. We have removed inappropriate terminology such as “higher/lower organisms” and “highly evolved”. Also, we have reinterpreted the results. As part of this process, the manuscript has been substantially rewritten to focus on the most meaningful findings. Ultimately, we have retained only those results that specifically concern broad-scale patterns of alternative splicing across taxa, which are now presented with greater clarity and methodological rigor.

      Reviewer #2

      Gene Regulatory Complexity Beyond Splicing Mechanisms

      While alternative splicing represents a prominent mechanism of transcriptomic diversification, we agree with the reviewer that it constitutes only one component of the broader landscape of gene regulation. Structural and behavioral complexity in organisms arises from a combination of regulatory processes, and our study focuses specifically on alternative splicing as a measurable proxy within this multifactorial system. To clarify this point, we have added a paragraph in the Discussion section, where we explicitly contextualize alternative splicing within the wider regulatory architecture. In that paragraph, we discuss additional mechanisms that contribute to phenotypic complexity—such as transcriptional control, chromatin remodeling, epigenetic modifications, and RNA editing—citing key literature.

      Alternative Splicing Measure and Methodology

      While we agree that alternative splicing is not a definitive measure of organismal complexity, we argue that it remains a meaningful proxy for transcriptomic and regulatory diversification, especially when analyzed at a large phylogenetic scale. In this version of the manuscript, our goal was not to equate alternative splicing with biological complexity, but rather to quantify its patterns across lineages and evaluate its relationship with genome structure. This point is now explicitly stated in both the Introduction and Discussion.

      We also recognize the limitations associated with the use of coding sequence (CDS) annotations from public databases such as NCBI RefSeq. To address this concern, we have conducted a detailed analysis of the potential biases introduced by heterogeneous annotation quality, sequencing depth, and computational prediction, as previously addressed in our response to Reviewer #1.

      In response to concerns about unsupported statements, we have completely rewritten the manuscript to ensure that all claims are now explicitly supported by data and grounded in up-to-date scientific literature. We have reformulated speculative statements, removed inappropriate generalizations, and improved the logical flow of the arguments throughout the text. In summary, we have strengthened both the conceptual framework and the methodological foundation of the study, while maintaining a cautious interpretation of the results.

      Trends of Alternative Splicing

      To address the reviewer’s concern, we have revised the interpretation of trends as used in our analysis. In this study, we define a trend not as a strict directional progression or a linear trajectory across all species, but rather as a broad statistical pattern observable in the relative distribution and variability of alternative splicing across major taxonomic groups. We do not claim that this pattern reflects a universal adaptive pathway. Instead, we interpret it as a signal of differences in regulatory strategies associated with genome architecture. To avoid misinterpretation, we have rephrased several sentences in the manuscript and explicitly emphasized the variability within groups and the lack of significant correlations in certain clades.

      Inconsistent statistics

      The discrepancies pointed out were due to differences between mean and median-based analyses. These have been clarified and consistently reported in the revised manuscript. Error bars, p-values, and a supplementary table summarizing all tests are now included. Furthermore, we have not removed any species from our dataset.

    1. eLife Assessment

      This important study examines the evolution of virulence and antibiotic resistance in Staphylococcus aureus under multiple selection pressures. The evidence presented is convincing, with rigorous data that characterizes the outcomes of the evolution experiments. However, the manuscript's primary weakness is in its presentation, as claims about the causal relationship between genotypes and phenotypes are based on correlational evidence. The manuscript needs to be revised to address these limitations, clarify the implications of the experimental design, and adjust the overall narrative to better reflect the nature of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate how methicillin-resistant (MRSA) and sensitive (MSSA) Staphylococcus aureus adapt to a new host (C. elegans) in the presence or absence of a low dose of the antibiotic oxacillin. Using an "Evolve and Resequence" design with 48 independently evolving populations, they track changes in virulence, antibiotic resistance, and other fitness-related traits over 12 passages. Their key finding is that selection from both the host and the antibiotic together, rather than either pressure alone, results in the evolution of the most virulent pathogens. Genomically, they find that this adaptation repeatedly involves mutations in a small number of key regulatory genes, most notably codY, agr, and saeRS.

      Strengths:

      The main advantage of the research lies in its strong and thoroughly replicated experimental framework, enabling significant conclusions to be drawn based on the concept of parallel evolution. The study successfully integrates various phenotypic assays (virulence, growth, hemolysis, biofilm formation) with whole-genome sequencing, offering an extensive perspective on the adaptive landscape. The identification of certain regulatory genes as common targets of selection across distinct lineages is an important result that indicates a level of predictability in how pathogens adapt.

      Weaknesses:

      (1) The main limitation of the paper is that its findings on the function of specific genes are based on correlation, not cause-and-effect evidence. While the parallel evolution evidence is strong, the authors have not yet performed the definitive tests (i.e., reconstruction of ancestral genes) to ensure that the mutations identified in isolation are enough to account for the virulence or resistance changes observed. This makes the conclusions more like firm hypotheses, not confirmed facts.

      (2) In some instances, the claims in the text are not fully supported by the visual data from the figures or are reported with vagueness. For example, the display of phenotypic clusters in the PCA (Figure 6A) and the sweeping generalization about the effect of antibiotics on the mutation rates (Figure S5) can be more precise and nuanced. Such small deviations dilute the overall argument somewhat and must be corrected.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes the results of an evolution experiment where Staphylococcus aureus was experimentally evolved via sequential exposure to an antibiotic followed by passaging through C. elegans hosts. Because infecting C. elegans via ingestion results in lysis of gut cells and an immune response upon infection, the S. aureus were exposed separately across generations to antibiotic stress and host immune stress. Interestingly, the dual selection pressure of antibiotic exposure and adaptation to a nematode host resulted in increased virulence of S. aureus towards C. elegans.

      Strengths:

      The data presented provide strong evidence that in S. aureus, traits involved in adaptation to a novel host and those involved in antibiotic resistance evolution are not traded off. On the contrary, they seem to be correlated, with strains adapted to antibiotics having higher virulence towards the novel host. As increased virulence is also associated with higher rates of haemolysis, these virulence increases are likely to reflect virulence levels in vertebrate hosts.

      Weaknesses:

      Right now, the results are presented in the context of human infections being treated with antibiotics, which, in my opinion, is inappropriate. This is because:

      (1) exposure to the host and antibiotics was sequential, not simultaneous, and thus does not reflect the treatment of infection, and

      (2) the site of infection is different in C. elegans and human hosts.

      Nevertheless, the results are of interest; I just think the interpretation and framing should be adjusted.

    4. Reviewer #3 (Public review):

      Summary:

      Su et al. sought to understand how the opportunistic pathogen Staphylococcus aureus responds to multiple selection pressures during infection. Specifically, the authors were interested in how the host environment and antibiotic exposure impact the evolution of both virulence and antibiotic resistance in S. aureus. To accomplish this, the authors performed an evolution experiment where S. aureus was fed to Caenorhabditis elegans as a model system to study the host environment and then either subjected to the antibiotic oxacillin or not. Additionally, the authors investigated the difference in evolution between an antibiotic-resistant strain, MRSA, and an isogenic susceptible strain, MSSA. They found that MRSA strains evolved in both antibiotic and host conditions became more virulent, and that strains evolved outside these conditions lost virulence. Looking at the strains evolved in just antibiotic conditions, the authors found that S. aureus maintained its ability to lyse blood cells. Mutations in codY, gdpP, and pbpA were found to be associated with increased virulence. Additionally, these mutations identified in these experiments were found in S. aureus strains isolated from human infections.

      Strengths:

      The data are well-presented, thorough, and are an important addition to the understanding of how certain pathogens might adapt to different selective pressures in complex environments.

      Weaknesses:

      There are a few clarifications that could be made to better understand and contextualize the results. Primarily, when comparing the number of mutations and selection across conditions in an evolution experiment, information about population sizes is important to be able to calculate the mutation supply and number of generations throughout the experiment. These calculations can be difficult in vivo, but since several steps in the methodology require plating and regrowth, those population sizes could be determined. There was also no mention of how the authors controlled the inoculation density of bacteria introduced to each host. This would need to be known to calculate the generation time within the host. These caveats should be addressed in the manuscript.

      Another concern is the number of generations the populations of S. aureus spent either with relaxed selection in rich media or under antibiotic pressure in between the host exposure periods. It is probable then that the majority of mutations were selected for in these intervening periods between host infection. Again, a more detailed understanding of population sizes would contribute to the understanding of which phase of the experiment contributed to the mutation profile observed.
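
      The generation and mutation-supply calculations referred to above can be sketched as follows. This is a simplified illustration, not a method from the manuscript: it assumes growth by binary fission with no cell death, and the function names, population sizes, and per-genome mutation rate are hypothetical placeholders.

```python
import math


def generations(n_start, n_end):
    """Number of doublings needed to grow from n_start to n_end cells.

    Assumes binary fission without cell death (a simplification).
    """
    return math.log2(n_end / n_start)


def mutation_supply(n_start, n_end, mu_per_genome):
    """Expected number of new mutations arising during one growth phase.

    Each division adds one cell, so divisions = n_end - n_start.
    mu_per_genome is the mutation rate per genome per division.
    """
    divisions = n_end - n_start
    return divisions * mu_per_genome


if __name__ == "__main__":
    # Hypothetical values for a single regrowth step between host exposures.
    inoculum = 1e6        # cells plated (placeholder)
    final_size = 1e9      # cells after regrowth (placeholder)
    mu = 1e-3             # mutations per genome per division (placeholder)

    print(f"generations: {generations(inoculum, final_size):.2f}")
    print(f"mutation supply: {mutation_supply(inoculum, final_size, mu):.3g}")
```

      Applying the same two functions to each phase (host exposure, relaxed selection in rich media, antibiotic exposure) would indicate which phase contributes most of the generations and mutation supply, provided the population sizes at the start and end of each phase are measured.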

    1. eLife Assessment

      This study reports on the development and characterization of chickens with genetic deficiencies in type I or type III interferon receptors, which is an important contribution to the field of avian immunology. The data reflecting the development of the new interferon-receptor-deficient chickens is compelling. However, the characterization of IFN biology and infection responses in these knockout chickens is somewhat incomplete and could be improved by addressing the noted weaknesses.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents an extensive body of work and an outstanding contribution to our understanding of the IFN type I and III system in chickens. The research started with the innovative approach of generating KO chickens that lack the receptor for IFNα/β (IFNAR1) or IFN-λ (IFNLR1). The successful deletion and functional loss of these receptors was clearly and comprehensively demonstrated in comparison to the WT. Moreover, the homozygous KO lines (IFNAR1-/- or IFNLR1-/- ) were found to have similar body weights, and normal egg production and fertility compared to their WT counterparts. These lines are a major contribution to the toolbox for the study of avian/chicken immunology.

      The significance of this contribution is further demonstrated by the use of these lines by the authors to gain insight into the roles of IFN type I and IFN-type III in chickens, by conducting in ovo and in vivo studies examining basic aspects of immune system development and function, as well as the responses to viral challenges conducted in ovo and in vivo.

      Solid, state-of-the-art methods and convincing evidence from studies comparing various immune system related functions in the IFNAR1-/- or IFNLR1-/- lines to the WT revealed that the deletion of IFNAR1 and/or IFNLR1 resulted in:

      (1) impaired IFN signaling and induction of anti-viral state;

      (2) modulation of immune cell profiles in the peripheral blood circulation and spleen;

      (3) modulation of the cecum microbiome;

      (4) reduced concentrations of IgM and IgY in the blood plasma before and following immunization with model antigen KLH, whereby also line differences in the time-course of the antibody production were observed;

      (5) decrease in MHCII+ macrophages and B cells in the spleen of IFNAR1 KO chickens, although the MHCII-expression per cell was not affected in this line; and

      (6) reduction in the response of αβ1 TCR+ T cells of IFNAR1 KO chickens as suggested by clonal repertoire analyses.

      These studies were then followed by examination of the role of type I and type III IFN in virus infection, using different avian influenza A virus strains as well as an avian gamma corona virus (IBV) in in ovo challenge experiments. These studies revealed: viral titers that reflect virus-species and strain-specific IFN responses; no differences in the secretion of IFN-α/β in both KO compared to the WT lines; a predominant role of type I IFN in inducing the interferon-stimulated gene (ISG) Mx; and that an excessive and unbalanced type I IFN response can harm host fitness (survival rate, length of survival) and contribute to immunopathology.

      Based on guidance from the in ovo studies, comprehensive in vivo studies were conducted on host-pathogen interactions in hens from the three lines (WT, IFNAR1 KO, or IFNLR1 KO). These studies revealed the early appearance of symptoms and poor survival of hens from the IFNAR1 KO line challenged with H3N1 avian influenza A virus; efficient H3N1 virus replication in IFNAR1 KO hens; increased plasma concentrations of IFNα/β and mRNA expression of IFN-λ in spleens of the IFNAR1 KO hens; a pro-inflammatory role of IFN-λ in the oviduct of hens infected with H3N1 virus; increased proinflammatory cytokine expression in spleens of IFNAR1 KO hens; and impairment of negative feedback mechanisms regulating IFN-α/β secretion in IFNAR1 KO hens and a significant decrease in this group's antiviral state. Additionally, it was demonstrated that IFN-α/β can compensate for IFN-λ to induce an adequate antiviral state in the spleen during H3N1 infection, but IFN-λ cannot compensate for IFN-α/β signaling in the spleen.

      Strengths:

      (1) Both the methods and results from the comprehensive, well-designed, and well-executed experiments are considered excellent. The results are well and correctly described in the result narrative and well presented in both the manuscript and supplement Tables and Figures. Excellent discussion/interpretation of results.

      (2) The successful generation of the type I and type III IFN KO lines offers unprecedented insight and opens multiple new venues for exploring the IFN system in chickens. The new knowledge reported here is direct evidence of the high impact of this model system on effectively addressing a critical knowledge gap in avian immunology.

      (3) The thoughtful selection of highly relevant viruses to poultry and human health for the in ovo and in vivo challenge studies to examine and assess host-pathogen interactions in the IFNR KO and WT lines.

      (4) Making use of the unique opportunities in the chicken model to examine and evaluate the host's IFN system responses to various viral challenges in ovo, before conducting challenge studies in hens.

      (5) The new knowledge gained from the IFNAR1 and IFNLR1 KO lines will find much-needed application in developing more effective strategies to prevent health challenges like avian influenza and its devastating effects on poultry, humans, and other mammals.

      (6) The excellent cooperation and contributions of the co-authors and institutions.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      This study attempts to dissect the contributions of type I and type III IFNs to the antiviral response in chickens. The first part of the study characterises the generation of IFNAR and IFNLR KO chicken strains and describes basic differences. Four different viruses are then tested in chicken embryos, while the subsequent analysis of the antiviral response in vivo is performed with one influenza H3N1 strain.

      Strengths:

      Having these two KO chicken strains as a tool is a great achievement. The initial analysis is solid. Clear effect of IFNAR deficiency in in vivo infection, less so for IFNLR deficiency.

      Weaknesses:

      (1) The antibody induction by KLH immunisation: No data indicated whether or not this vaccination induces IFN responses in wt mice, so the effects observed may be due to steady-state differences or to differential effects of IFN induced during the vaccination phase. No pre-immune results are shown. The differences are relatively small and often found at only one plasma dilution - the whole of Figure 4 could be condensed into one or two panels by proper calculation of Ab titers - would these titres be significantly different? This, as all of the other in vivo experiments, has not been repeated, if I understand the methods section correctly.

      (2) The basic conundrum here and in later figures is never addressed by the authors: Situations where IFN type 1 and 3 signalling deficiency each have an independent effect (i.e., Figure 4d) suggest that they act by separate, unrelated mechanisms. However, all the literature about these IFN families suggests that they show almost identical signalling and gene induction downstream of their respective receptors. How can the same signalling, clearly active here downstream of the receptors for IFN type 1 or type 3, be non-redundant, i.e., why does the unaffected IFN family not stand in? This is a major difference from the mouse studies, which showed a rather subtle phenotype when only one of the two IFN systems was missing, but a massive reduction in virus control in double KO mice (the correct primary paper should be quoted here, not only the review by McNab). Reasons could be a direct effect of IFNab on B cells and an indirect effect of IFNL through non-B cells, timing issues, and many other scenarios can be envisaged. The authors do not address this question, which limits the depth of analysis.

      (3) In the one in vivo experiment performed with chickens, only one virus was tested; more influenza strains should be included, as well as non-influenza viruses.

      (4) The basic conundrum of point 2 applies equally to Figure 6a; both KOs have a phenotype. Again in 6d, both IFNs appear to be separately required for Mx induction. An explanation is needed.

      (5) Line 308, where are the viral titers you refer to in the text? The statement that the results demonstrate that excessive IFNab has a negative impact is overstretched, as no IFN measurements of the infected embryos are shown here.

      (6) The in vivo infection is the most interesting experiment, and the key outcome here is that IFN type 1 is crucial for anti-H3N1 protection in chickens, while type 3 is less impactful. However, this experiment suffers from the different time points when chickens were culled, so many parameters are impossible to compare (e.g., weight loss, histopathology, IFN measurements, and more). Many of these phenomena are highly dynamic in acute virus infections, so disparate time points do not allow a meaningful comparison between different genotypes. What are the stats in 7b? Is the median rather than the mean indicated by the line? Otherwise, the lines appear in surprising places. SD must be shown, and I find it difficult to believe that there is a significant difference in weight, for e.g., IFNAR KO, unless maybe with a paired t test. What is the statistical test?

      (7) Figures 7e,f: these comparisons are very difficult to interpret as the virus loads at these time points already differ significantly, so any difference could be secondary to virus load differences.

    1. eLife Assessment

      Non-essential amino acids such as glutamine are known to be required for general T cell activation by sustaining basic biosynthetic processes, including nucleotide biosynthesis, ATP generation, and protein synthesis. In this important study, the authors found that extracellular asparagine (Asn) is required not only for T cells to fuel metabolic reprogramming in general, but also to produce helper T cell lineage-specific cytokines, for instance, IL17. In particular, the importance of Asn in IL17 production was convincingly demonstrated in the mouse experimental autoimmune encephalomyelitis (EAE) model, which mimics human multiple sclerosis.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors reveal that the availability of extracellular asparagine (Asn) represents a metabolic vulnerability for the activation and differentiation of naive CD4+ T cells. To deplete extracellular Asn, they employed two orthogonal approaches: activating naive CD4+ T cells in either PEGylated asparaginase (PEG-AsnASE)-treated medium or custom-formulated RPMI medium specifically lacking Asn. Importantly, they demonstrate that Asn depletion not only impaired metabolic reprogramming associated with CD4+ T cell activation but also reduced CD4+ helper T cell lineage-specific cytokine production, thereby ameliorating the severity of experimental autoimmune encephalomyelitis.

      Strengths:

      The experiments presented here are comprehensive and well-designed, providing compelling evidence for the conclusions. The conclusions will be important to the field.

      Weaknesses:

      (1) EAE is the prototypic T cell-mediated autoimmune disease model, and both Th1 and Th17 cells are implicated in its pathogenesis. In contrast, Th2 and Treg cells and their associated cytokines (such as IL-4 and IL-10) have been shown to play a role in the resolution of EAE, and potentially in the modulation of disease progression. Thus, it will be important to determine whether Asn depletion affects the differentiation of naive CD4+ T cells into corresponding subsets under Th2 and Treg polarization conditions, as well as the expression of lineage-specific transcription factors and cytokine production.

      (2) EAE is characterized by inflammation and demyelination in the central nervous system (CNS), leading to neurological deficits. Myelin destruction is directly correlated with the severity of the disease. For Figure 6, did the authors perform spinal cord histological analysis by hematoxylin and eosin (H&E) or Luxol fast blue (LFB) staining? This is important to rigorously examine pathological EAE symptoms.

    3. Reviewer #2 (Public review):

      While the importance of asparagine in the differentiation and activation of CD8 T cells has been previously reported, its role in CD4 T cells remained unclear. Using culture media containing specific amino acids, the authors demonstrated that extracellular asparagine promotes CD4 T cell proliferation. Consistent with this, depletion of extracellular asparagine using PEG-AsnASE suppressed CD4 T cell activation. Proteomic analysis focusing on asparagine content revealed that, during the early phase of T cell activation, most asparagine incorporated into proteins is derived from extracellular sources. The authors further confirmed the importance of extracellular asparagine in vivo, demonstrating improved EAE pathology.

      While the data are well organized and convincing, the mechanism by which asparagine deficiency leads to altered T cell differentiation remains unclear. It is also necessary to investigate the transporters involved in asparagine uptake. In particular, elucidating whether different T cell subsets utilize the same or distinct transport mechanisms would provide important insight into the immunoregulatory role of asparagine.

      (1) The finding that asparagine supplementation promotes T cell proliferation under various amino acid conditions is highly significant. However, the concentration at which this effect occurs remains unclear. A titration analysis would be necessary to determine the dose-dependency of asparagine.

      (2) The effects of asparagine deficiency occur during the early phase of T cell activation. Thus, it is likely that the transporters responsible for asparagine uptake are either rapidly induced upon activation or already expressed in the resting state. Since this is central to the focus of the manuscript, it is interesting to identify the transporter responsible for asparagine uptake during early T cell activation. A recent paper (DOI: 10.1126/sciadv.ads350) reported that macrophages utilize Slc6a14 to use extracellular asparagine. Is this also true for CD4+ T cells?

      (3) Given that depletion of extracellular asparagine impairs differentiation of Th1 and Th17 cells, it is possible that TCR signaling is compromised under these conditions. This point should be investigated by targeting downstream signaling molecules such as Lck, ZAP70, or mTOR. Also, does it affect the protein stability of master transcription factors such as T-bet and RORgt?

      (4) Is extracellular asparagine also important for the differentiation of helper T cell subsets other than Th1 and Th17, such as Th2, Th9, and iTreg?

      (5) Asparagine taken up from outside the cell has been shown to be used for de novo protein synthesis (Figure 3E), but are there any proteins that are particularly susceptible to asparagine deficiency? This can be verified by performing proteome analysis, and the effects on Th1/17 subset differentiation mentioned above should also be examined.

      (6) While the importance of extracellular asparagine is emphasized, Asns expression is markedly induced during early T cell activation. Nevertheless, the majority of asparagine incorporated into proteins appears to be derived from extracellular sources. Does genetic deletion of Asns have any impact on early CD4+ T cell activation? The authors indicated that newly synthesized Asns have little impact on CD8+ T cells in the Discussion section, but is this also true for CD4+ T cells? This could be verified through experiments using CRISPR-mediated Asns gene targeting or pharmacological inhibition.

    1. eLife Assessment

      This study illustrates a valuable application of BID-seq to bacterial RNA, allowing transcriptome-wide mapping of pseudouridine modifications across various bacterial species. The evidence presented includes a mix of solid and incomplete data and analyses, and would benefit from more rigorous approaches. The work will interest a specialized audience involved in RNA biology.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Xu et al. reported base-resolution mapping of RNA pseudouridylation in five bacterial species, utilizing recently developed BID-seq. They detected pseudouridine (Ψ) in bacterial rRNA, tRNA, and mRNA, and found growth phase-dependent Ψ changes in tRNA and mRNA. They then focused on mRNA and conducted a comparative analysis of Ψ profiles across different bacterial species. Finally, they developed a deep learning model to predict Ψ sites based on RNA sequence and structure.

      Strengths:

      This is the first comprehensive Ψ map across multiple bacterial species, and systematically reveals Ψ profiles in rRNA, tRNA, and mRNA under exponential and stationary growth conditions. It provides a valuable resource for future functional studies of Ψ in bacteria.

      Weaknesses:

      Ψ is highly abundant on non-coding RNA such as rRNA and tRNA, while its level on mRNA is very low. The manuscript focuses primarily on mRNA, which raises questions about the data quality and the rigor of the analysis. Many conclusions in the manuscript are speculative, based solely on the sequencing data but not supported by additional experiments.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Xu et al. present a transcriptome-wide, single-base resolution map of RNA pseudouridine modifications across evolutionarily diverse bacterial species using an adapted form of BID-Seq. By optimizing the method for bacterial RNA, the authors successfully mapped modifications in rRNA, tRNA, and, importantly, mRNA across both exponential and stationary growth phases. They uncover evolutionarily conserved Ψ motifs, dynamic Ψ regulation tied to bacterial growth state, and propose functional links between pseudouridylation and bacterial transcript stability, translation, and RNA-protein interactions. To extend these findings, they develop a deep learning model that predicts pseudouridine sites from local sequence and structural features.

      Strengths:

      The authors provide a valuable resource: a comprehensive Ψ atlas for bacterial systems, spanning hundreds of mRNAs and multiple species. The work addresses a gap in the field, namely our limited understanding of bacterial epitranscriptomics, by establishing both the method and datasets for exploring post-transcriptional modifications.

      Weaknesses:

      The main limitation of the study is that most functional claims (i.e., translation efficiency, mRNA stability, and RNA-binding protein interactions) are based on correlative evidence. While suggestive, these inferences would be significantly strengthened by targeted perturbation of specific Ψ synthases or direct biochemical validation of proposed RNA-protein interactions (e.g., with Hfq). Additionally, the GNN prediction model is a notable advance, but methodological details are insufficient to reproduce or assess its robustness.

    4. Reviewer #3 (Public review):

      Summary:

      This study aimed to investigate pseudouridylation across various RNA species in multiple bacterial strains using an optimized BID-seq approach. It examined both conserved and divergent modification patterns, the potential functional roles of pseudouridylation, and its dynamic regulation across different growth conditions.

      Strengths:

      The authors optimized the BID-seq method and applied this important technique to bacterial systems, identifying multiple pseudouridylation sites across different species. They investigated the distribution of these modifications, associated sequence motifs, their dynamics across growth phases, and potential functional roles. These data are of great interest to researchers focused on understanding the significance of RNA modifications, particularly mRNA modifications, in bacteria.

      Weaknesses:

      (1) The reliability of BID-seq data is questionable due to a lack of experimental validations.

      (2) The manuscript is not well-written, and the presented work shows a major lack of scientific rigor, as several key pieces of information are missing.

      (3) The manuscript's organization requires significant improvement, and numerous instances of missing or inconsistent information make it difficult to understand the key objectives and conclusions of the study.

      (4) The rationale for selecting specific bacterial species is not clearly explained, and the manuscript lacks a systematic comparison of pseudouridylation among these species.

    1. eLife Assessment

      This study presents valuable data suggesting that ATP-induced modulation of alveolar macrophage (AM) functions is associated with NLRP3 inflammasome activation and enhanced phagocytic capacity. While the in vivo and in vitro data reveal an interesting phenotype, the evidence provided is incomplete and does not fully support the paper's conclusions. Additional investigations would be of value in complementing the data and strengthening the interpretation of the results. This study should be of interest to immunologists and the mucosal immunity community.

    2. Reviewer #1 (Public review):

      Summary:

      Alveolar macrophages (AMs) are key sentinel cells in the lungs, representing the first line of defense against infections. There is growing interest within the scientific community in the metabolic and epigenetic reprogramming of innate immune cells following an initial stress, which alters their response upon exposure to a heterologous challenge. In this study, the authors show that exposure to extracellular ATP can shape AM functions by activating the P2X7 receptor. This activation triggers the relocation of the potassium channel TWIK2 to the cell surface, placing macrophages in a heightened state of responsiveness. This leads to the activation of the NLRP3 inflammasome and, upon bacterial internalization, to the translocation of TWIK2 to the phagosomal membrane, enhancing bacterial killing through pH modulation. Through these findings, the authors propose a mechanism by which ATP acts as a danger signal to boost the antimicrobial capacity of AMs.

      Strengths:

      This is a fundamental study in a field of great interest to the scientific community. A growing body of evidence has highlighted the importance of metabolic and epigenetic reprogramming in innate immune cells, which can have long-term effects on their responses to various inflammatory contexts. Exploring the role of ATP in this process represents an important and timely question in basic research. The study combines both in vitro and in vivo investigations and proposes a mechanistic hypothesis to explain the observed phenotype.

      Weaknesses:

      First, the concept of training or trained immunity refers to long-term epigenetic reprogramming in innate immune cells, resulting in a modified response upon exposure to a heterologous challenge. The investigations presented demonstrate phenotypic alterations in AMs seven days after ATP exposure; however, they do not assess whether persistent epigenetic remodeling occurs with lasting functional consequences. Therefore, a more cautious and semantically precise interpretation of the findings would be appropriate.

      Furthermore, the in vivo data should be strengthened by additional analyses to support the authors' conclusions. The authors claim that susceptibility to Pseudomonas aeruginosa infection differs depending on the ATP-induced training effect. Statistical analyses should be provided for the survival curves, as well as additional weight curves or clinical assessments. Moreover, it would be appropriate to complement this clinical characterization with additional measurements, such as immune cell infiltration analysis (by flow cytometry), and quantification of pro-inflammatory cytokines in bronchoalveolar lavage fluid and/or lung homogenates.

      Moreover, the authors attribute the differences in resistance to P. aeruginosa infection to the ATP-induced training effect on AMs, based on a correlation between in vivo survival curves and differences in bacterial killing capacity measured in vitro. These are correlative findings that do not establish a causal role for AMs in the in vivo phenotype. ATP-mediated effects on other (i.e., non-AM) cell populations are omitted, and the possibility that other cells could be affected should be, at least, discussed. Adoptive transfer experiments using AMs would be a suitable approach to directly address this question.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Thompson et al. investigate the impact of prior ATP exposure on later macrophage functions as a mechanism of immune training. They describe that ATP training enhances bactericidal functions, which they connect to the P2x7 ATP receptor, Nlrp3 inflammasome activation, and TWIK2 K+ movement at the cell surface and subsequently at phagosomes during bacterial engulfment. With stronger methodology, these findings could provide useful insight into how ATP can modulate macrophage immune responses, though they are generally an incremental addition to existing literature. The evidence supporting their conclusions is currently inadequate. Gaps in explaining methodology are substantial enough to undermine trust in much of the data presented. Some assays may not be designed rigorously enough for interpretation.

      Strengths:

      The authors demonstrate two novel findings that have sufficient rigor to assess:

      (1) TWIK2 persists at the macrophage plasma membrane for a prolonged period following ATP exposure and can translocate to the phagosome during particle engulfment, which builds upon their prior report of ATP-driven 'training' of macrophages.

      (2) Administering intranasal ATP to mice 'trains' the lungs and protects the mice from otherwise fatal bacterial infection.

      Weaknesses:

      (1) Missing details from methods/reported data: Substantial sections of key methods have not been disclosed (including anything about animal infection models, RNA-sequencing, and western blotting), and the statistical methods, as written, only address two-way comparisons, which would mean analysis was improperly performed. In addition, there is a general lack of transparency - the methods state that only representative data is included in the manuscript, and individual data points are not shown for assays.

      (2) Poor experimental design including missing controls: Particularly problematic are the Seahorse assay data (requires normalization to cell numbers to interpret this bulk assay - differences in cell growth/loss between conditions would confound data interpretation) and bacterial killing assays (as written, this method would be heavily biased by bacterial initial binding/phagocytosis which would confound assessment of killing). Controls need to be included for subcellular fractionating to confirm pure fractions and for dye microscopy to show a negative background. Conclusions from these assays may be incorrect, and in some cases, the whole experiment may be uninterpretable.

      (3) The conclusions overstate what was tested in the experiments: Conceptually, there are multiple places where the authors draw conclusions or frame arguments in ways that do not match the experiments used. Particularly:

      a) The authors discuss their findings in the context of importance for AM biology during respiratory infection but in vitro work uses cells that are well-established to be poor mimics of resident AMs (BMDM, RAW), particularly in terms of glycolytic metabolism.

      b) In vivo work does not address whether immune cell recruitment is triggered during training.

      c) Figure 3 is used to draw conclusions about K+ in response to bacterial engulfment, but actually assesses fungal zymosan particles.

      d) Figure 5 is framed in bacterial susceptibility post-viral infection, but the model used is bacterial post-bacterial.

      e) In their discussion, the authors propose to have shown TWIK2-mediated inflammasome activation. They link these separately to ATP, but their studies do not test if loss of TWIK2 prevents inflammasome activation in response to ATP (Figure 4E does not use TWIK2 KO).

      In summary, this work contains some useful data showing how ATP can 'train' macrophages. However, it largely lacks the expected level of rigor. For this work to be valuable to the field, it is likely to need substantial improvement in methods reporting and inclusion of missing assay controls. It may also require repeating key experiments that were run with insufficient methodology (or providing details and supplemental data to prove that the methodology was sufficient), and the authors should either add experiments that properly test their experimental question or rewrite their conclusions.

    1. eLife Assessment

      This convincing study, which is based on a survey of researchers, finds that women are less likely than men to submit articles to elite journals. It also finds that there is no relation between gender and reported desk rejection. The study is an important contribution to work on gender bias in the scientific literature.

    2. Joint Public Review:

      Summary from an earlier round of review:

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).

      - Women are more likely to cite the demands of co-authors as a reason why they didn’t submit to highly influential journals.

      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists’ careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Editor’s note: This is the third version of this article.

      Comments made during the peer review of the second version, along with author’s responses to these comments, are available below. Revisions made in response to these comments include changing the colour scheme used for the figures to make the figures more accessible for readers with certain forms of colour blindness.

      Comments made during the peer review of the first version, along with author’s responses to these comments, are available with previous versions of the article.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).

      Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be valid reasons - even when women are not intrinsically better at research than men - why a greater fraction of female-authored submissions are accepted relative to male-authored submissions (or vice versa). For example, if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men’s papers are intrinsically better than women’s papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      I would take out the final sentence in the abstract. In my opinion, your survey evidence isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major - or even minor - contribution of your paper. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!) While it's fine to briefly discuss them at the end of your paper - as you currently do - I wouldn't highlight that in the abstract as being an important contribution of your paper.

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel like stating this in the abstract is coherent.

      Minor comments

      What is the rationale for conditioning on academic rank and does this have explanatory power on its own - i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      Thank you for this thoughtful question. We conditioned on academic rank in all regression analyses to account for structural differences in career stage that may potentially influence submission behaviors. Academic rank (e.g., assistant, associate, full professor) is a key determinant of publishing capacity and strategic considerations, such as perceived likelihood of success at elite journals, tolerance for risk, and institutional expectations for publication venues.

      Importantly, academic rank is also correlated with gender due to cumulative career disadvantages that contribute to underrepresentation of women at more senior levels. Failing to adjust for rank would conflate gender effects with differences attributable to career stage. By including rank as a covariate, we aim to isolate gender-associated patterns in submission behavior within comparable career stages, thereby producing a more precise estimate of the gender effect.

      Regarding explanatory power, academic rank does indeed contribute significantly to model fit across our analyses, indicating that it captures meaningful variation in submission behavior. However, even after adjusting for rank, we continue to observe significant gender differences in submission patterns in several disciplines. This suggests that while academic rank explains part of the variation, it does not fully account for the gender gap—highlighting the importance of examining other structural and behavioral factors that shape the publication trajectory.
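
      The confounding argument above can be illustrated with a small sketch. All counts below are hypothetical and deliberately extreme; they are not taken from the survey. The example assumes two ranks and identical submission rates for women and men within each rank, and shows that a pooled comparison can still display a gap purely because the two groups are distributed differently across ranks.

```python
# Hypothetical counts, NOT survey data:
# (rank, gender) -> (respondents who submitted, respondents surveyed)
counts = {
    ("assistant", "women"): (60, 300),
    ("assistant", "men"): (40, 200),
    ("full", "women"): (40, 100),
    ("full", "men"): (120, 300),
}
ranks = ("assistant", "full")
genders = ("women", "men")


def rate(pairs):
    """Submission rate over a list of (submitted, surveyed) pairs."""
    submitted = sum(s for s, _ in pairs)
    surveyed = sum(n for _, n in pairs)
    return submitted / surveyed


# Pooled rates ignore rank entirely.
pooled = {
    g: rate([v for (_, gender), v in counts.items() if gender == g])
    for g in genders
}
pooled_gap = pooled["men"] - pooled["women"]

# Stratified rates compare women and men within the same rank.
rank_gaps = {
    r: rate([counts[(r, "men")]]) - rate([counts[(r, "women")]])
    for r in ranks
}

print(f"pooled gap (men - women): {pooled_gap:.3f}")
for r in ranks:
    print(f"gap within rank '{r}': {rank_gaps[r]:.3f}")
```

      In this constructed case the pooled gap comes entirely from rank composition. A gap that persists within ranks, as the response reports for several disciplines, cannot be attributed to career stage alone.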

      Reviewer #2 (Public review):

      Basson et al. present compelling evidence supporting a gender disparity in article submission to "elite" journals. Most notably, they found that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. Overall, this work is an important addition to the study of gender disparities in the publishing process.

      I thank the authors for addressing my concerns.

      Reviewer #4 (Public review):

      Main strengths

      The topic of the MS is very relevant given that across the sciences/academia, genders are unevenly represented, which has a range of potential negative consequences. To change this, we need to have the evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and the impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with a high impact factor. While previous work has detected this gap and identified some potential mechanisms, the current MS provides strong evidence that this gap might be due to a lower submission rate of women compared to men, rather than the rejection rates. These results are based on a survey of close to 5000 authors. The survey seems to be conducted well (though I am not an expert in surveys), and data analysis is appropriate to address the main research aims. It was impossible to check the original data because of the privacy concerns.

      Interestingly, the results show no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking and are advised not to submit to prestigious journals, indicating that both intrinsic and extrinsic factors shape women's submission behaviour.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, but also to inform assessment reform at a larger scale.

      I do not find any major weaknesses in the revised manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Colour schemes of the Figures are not adjusted for colour-blindness (red-green is a big NO), some suggestions can be found here https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf

      We appreciate the suggestion. We’ve adjusted the colors in the manuscript to be color-blind friendly using one of the colorblind safe palettes suggested by the reviewer.

      (2) I do not think that the authors have fully addressed the comment about APCs and the decision to submit, given that PNAS has publication charges that amount to double of someone's monthly salary. I would add a sentence or two to explain that publication charges should not be a factor for Nature and Science, but might be for PNAS.

      While APCs are definitely a factor affecting researchers’ submission behavior, they mostly do so for lower prestige journals rather than for the three elite journals analyzed here. As mentioned in the previous round of revisions, Nature and Science have subscription options. And PNAS authors without funding have access to waivers: https://www.pnas.org/author-center/publication-charges

      (3) Line 268, the first suggestion here is not something that would likely work. Thus, I would not put it as the first suggestion.

      We made the suggested change.

      (4) Data availability - remove AND in 'Aggregated and de-identified data' because it sounds like both are shared. Suggest writing: 'Aggregated, de-identified data..'. I still suggest sharing data/code in a trusted repository (e.g. Dryad, ZENODO...) rather than on GitHub, as per the current recommendation on the best practices for data sharing.

      Thank you for your comment regarding data availability. Due to IRB restrictions and the conditions of our ethics approval, we are not permitted to share the survey data used in this study. However, to support transparency and reproducibility, we have made all analysis code available on Zenodo at https://doi.org/10.5281/zenodo.16327580. In addition, we have included a synthetic dataset with the same structure as the original survey data but containing randomly generated values. This allows others to understand the data structure and replicate our analysis pipeline without compromising participant confidentiality.

    1. eLife Assessment

      This valuable study introduces a modern and accessible PyTorch reimplementation of the widely used SpliceAI model for splice site prediction. The authors provide convincing evidence that their OpenSpliceAI implementation matches the performance of the original while improving usability and enabling flexible retraining across species. These advances are likely to be of broad interest to the computational genomics community.

    2. Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. Their comparisons to the original SpliceAI models are convincing on the grounds of model performance and their evaluation of how well the new models match the original's understanding of non-local mutation effects. However, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of the limitations of calibration.

      Strengths

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance and mutation effect estimation capabilities of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      Weaknesses

      (1) Their discussion of their package's calibration functionality does not adequately acknowledge the limitations of model calibration. This is problematic because the package is intended for general use, and users who are not experienced in modeling broadly, or in the subfield of model calibration specifically, may not already understand these limitations. This could lead to serious errors and misunderstandings down the road. A model is not calibrated or uncalibrated in and of itself, only with respect to a specific dataset. In this case they calibrated with respect to the training dataset, a set of canonical transcript annotations. This is a perfectly valid and reasonable dataset to calibrate against. However, this is unlikely to be the dataset the model is applied to in any downstream use case, and this calibration is not guaranteed or expected to hold for any shift in the dataset distribution. For example, in the next section they use ISM-based approaches to evaluate which sequence elements the model is sensitive to, and their calibration would not be expected to hold for this set of predictions. This issue is particularly worrying in the case of their model because annotation of canonical transcript splice sites is a task that their model is unlikely to be applied to after training. Much more likely tasks include predicting the effects of mutations, identifying splice sites that may be used across isoforms beyond just the canonical one, identifying regulatory sequences through ISM, or evaluating human-created sequences for design or evaluation purposes (such as in the context of an MPSA or designing a gene to splice a particular way). We would not expect their calibration to hold in any of these contexts. To resolve this issue, the authors should clarify and discuss this limitation in their paper (and in the relevant sections of the package documentation) to avoid confusing downstream users.

      (2) The clarity of their analysis of mutation effects could be improved with some minor adjustments. While they report median ISM importance correlation it would be helpful to see a histogram of the correlations they observed. Instead of displaying (and calculating correlations using) importance scores of only the reference sequence, showing the importance scores for each nucleotide at each position provides a more informative representation. This would also likely make the plots in 6B clearer.

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species pre-training on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is not even any comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      Update for the revised version:

      The update mostly includes clarifications for technical questions/comments raised by the other two reviewers. There are no additional analyses or results that change our initial assessment above of this paper's contribution.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      Following is the revised paragraph in the manuscript’s Results section:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity between the resultant patterns of SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The Pearson correlation between SpliceAI and OSAI<sub>MANE</sub> yielded an overall median correlation of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub> and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation was 0.857 for all splice sites. Ten additional representative zoomed-in DNA logo comparisons of splice sites are provided in Supplementary Figure S23.
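
      The ISM concordance procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the two `toy_model_*` functions are hypothetical stand-ins for SpliceAI and OSAI<sub>MANE</sub>, the 11 nt sequence stands in for the 5,001 nt window, and taking the Δ-score as reference minus mutant probability is an assumption about the sign convention.

```python
import math

BASES = "ACGT"


def ism_profile(seq, predict):
    """Per-position ISM importance: mean change in the predicted
    probability over the three possible substitutions.
    Assumption: delta = reference probability - mutant probability."""
    ref_prob = predict(seq)
    profile = []
    for i, ref_base in enumerate(seq):
        deltas = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            deltas.append(ref_prob - predict(mutant))
        profile.append(sum(deltas) / len(deltas))
    return profile


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy "models": each scores a GT dinucleotide at the window centre.
def toy_model_a(seq):
    c = len(seq) // 2
    return 0.1 + 0.4 * (seq[c] == "G") + 0.4 * (seq[c + 1] == "T")


def toy_model_b(seq):
    c = len(seq) // 2
    return 0.05 + 0.5 * (seq[c] == "G") + 0.3 * (seq[c + 1] == "T")


window = "ACCAGGTAAGT"  # toy window; the response uses 5,001 nt
profile_a = ism_profile(window, toy_model_a)
profile_b = ism_profile(window, toy_model_b)
r = pearson(profile_a, profile_b)
print(f"Pearson r between toy ISM profiles: {r:.3f}")
```

      In the analysis described in the response, one such correlation would be computed per splice site and the median taken over the 200 sites.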

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e. a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we did not measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
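
      As a rough illustration of post-hoc temperature scaling as described above, the sketch below fits a single temperature T by minimising the negative log-likelihood on held-out logits and checks that the predicted class per position is unchanged. It is not the OpenSpliceAI implementation: the grid search, the toy logits, and the class order (non-splice, acceptor, donor) are assumptions chosen for a self-contained example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a single temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, lo=0.05, hi=20.0, steps=400):
    """Pick T on a log-spaced grid that minimises the NLL.
    (Grid search is an illustrative choice; gradient-based fitting is also common.)"""
    grid = [lo * (hi / lo) ** (k / steps) for k in range(steps + 1)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


# Toy held-out set: an over-confident model that is wrong on one of six positions.
# Assumed class order: 0 = non-splice, 1 = acceptor, 2 = donor.
logit_rows = [[8, 0, 0], [0, 8, 0], [0, 0, 8], [8, 0, 0], [0, 8, 0], [0, 0, 8]]
labels = [0, 1, 2, 1, 1, 2]

temperature = fit_temperature(logit_rows, labels)
print(f"fitted temperature: {temperature:.2f}")


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Dividing logits by a positive constant never changes the predicted class.
same_prediction = all(
    argmax(row) == argmax(softmax(row, temperature)) for row in logit_rows
)
```

      A fitted temperature above 1 softens over-confident probabilities, while a value near 1 (as the authors report for their models) leaves them essentially unchanged. The fitted value only reflects the dataset it was fitted on, which is the limitation the reviewer raises.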

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. We have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, designed to analyze cross-species performance gains, and supported by a thorough benchmark and the release of several pretrained models to clearly position our contribution.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result indicates that our architecture seems to capture all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.
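
      The second filtering step described above (removing sequences with > 80 % length overlap and > 80 % sequence identity to training sequences) could look roughly like the sketch below. The `Hit` record and its fields are hypothetical: it assumes each validation or test sequence has already been aligned against the training set by some external tool, and it takes overlap as aligned length over query length and identity as matches over aligned length, which the response does not specify.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best alignment of one validation/test sequence against the training set.
    Hypothetical record; field names are illustrative only."""
    query_id: str
    query_len: int    # length of the validation/test sequence
    aligned_len: int  # number of query bases covered by the alignment
    matches: int      # identical bases within the aligned region


def is_leaky(hit, min_overlap=0.8, min_identity=0.8):
    """True if the hit exceeds BOTH thresholds, as described in the response."""
    if hit.aligned_len == 0:
        return False
    overlap = hit.aligned_len / hit.query_len
    identity = hit.matches / hit.aligned_len
    return overlap > min_overlap and identity > min_identity


def filter_sequences(hits):
    """Return the IDs of sequences that are kept after filtering."""
    return [hit.query_id for hit in hits if not is_leaky(hit)]


hits = [
    Hit("gene_1", 1000, 900, 850),  # high overlap and identity: removed
    Hit("gene_2", 1000, 500, 490),  # low overlap: kept
    Hit("gene_3", 1000, 900, 600),  # low identity: kept
]
kept = filter_sequences(hits)
print(kept)
```

      Applying the same function to both the validation and the test hits mirrors the "identical filtering pipeline" mentioned in the response.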

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with: conda install openspliceai

      The conda package homepage is at: https://anaconda.org/khchao/openspliceai

      We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceaimane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. eLife Assessment

      This fundamental work advances our understanding of how SP5 and SP8 promote neuromesodermal competent progenitors in murine embryos. Generally the evidence is compelling, with strong developmental genetics, transcriptomic, and genomic transcription binding surveys contributing to the strength of the data. Some of the language could be softened to avoid overinterpretation of the data, and figures and diagrams could be improved.

    2. Reviewer #1 (Public review):

      This is an important, interesting, and in-depth study examining the role of Sp5/8 transcription factors in maintaining the neuromesodermal progenitor (NMP) niche. The authors first used Sp5/8 double conditional KO mouse embryos to establish that these factors function in the NMP niche to promote trunk elongation. They then conducted extensive single-cell analyses on embryos of various genetic mutant backgrounds to unravel the complex and intricate interactions between Wnt signaling and Sp5/8. The key conclusion from these experiments is that Sp5/8 function within an autoregulatory loop crucial for maintaining the NMP niche. The authors went on to identify and characterize a novel enhancer element downstream of the Wnt3a coding sequence, which mediates the effects of Sp5/8 on Wnt3a expression. Overall, the data presented are compelling and of high quality, and the study offers a prime example of how a relatively small set of signaling pathways and transcription factors can function in concert to impart robustness to developmental processes.

    3. Reviewer #2 (Public review):

      Chalamalasetty et al. investigate the regulatory circuit of signaling molecules and transcription factors that drive the fate of neuromesodermal competent progenitors (NMCs). NMCs contribute to Sox2-positive spinal cord and Tbxt/Bra-expressing somitic mesoderm, and this choice is governed by the interplay between Wnt3a and Fgf signaling. The authors discovered that the transcription factors SP5 and SP8 participate in this process. Mouse genetics, in vivo development, and transcription factor profiling point to a model where SP5 and SP8 directly regulate Wnt3a expression to foster Tbxt-marked mesoderm formation at the expense of Sox2-marked neural ectoderm. Mechanistically, SP5/8 bind to an enhancer which the authors characterize: its activity depends on the presence of SP5, CDX2, TCF7, and TBXT binding sites, and it is activated only in primitive streak cells at E7.5, in NMP, and in caudal and somitic mesoderm, underscoring the tissue- and stage-specific nature of this Wnt3a enhancer.

      Moreover, the authors find that SP5/8 likely regulate the TCF7 association with the chromatin and compete for its binding to the TLE repressor.

      The study is extensive, compelling, and well written. The combination of in vivo evidence with single-cell transcriptomics, transcription factor profiling, and in vitro regulatory element characterization is notable and builds a convincing picture of the action of SP5/SP8.

      Here, I provide a series of comments and questions that, if addressed and clarified, could, in my opinion, improve the study.

      (1) While Sp5 and Sp8 are both present in NMCs, their expression does not fully overlap. Sp5 is also detected in caudal and presomitic mesoderm, notochord and gut, while Sp8 overlaps with Sox2 in neural progenitors of the spinal cord and brain (Fig. 1D). Accordingly, Sp8 expression is also activated by the neural-promoting RA+Fgf. It is not easy for me to reconcile this non-fully overlapping expression pattern - and in particular the overlap of Sp8 and Sox2 - with the presumed redundancy (or similarity of function) described later. Sp5/8 dko NMCs show reduced Tbxt and expanded Sox2, indicating that SP8 also represses Sox2 or neural fate, an observation confirmed by Sp8 overexpression (Figure 4c). What is the explanation for this, and is the function of SP8 in Sox2-positive neural progenitors different from its Wnt3a-sustaining role in NMCs? Or what am I missing?

      (2) I suggest that the authors show relevant ChIP-seq peaks in Figure 3 to lend credibility to the complicated overlapping Venn diagrams. I consider visual inspection of peak tracks as primary quality control of this type of experiment. A good choice could be the cis-regulatory elements at Sp5, Sp8, Tbxt, Cdx1, 2, 4 bound by TBXT and either CDX2, SP5, or SP8 (now referring to the Venn diagrams and the annotated peak table). On ChIP-seq visualization, in reference to Figures 5 and 7, I also suggest that the authors show the tracks of a negative control (IgG, non-related antibody, or better anti-flag in Sp5/8 dko). While I do not doubt the validity of these experiments, there are peaks in these figures bound by all factors tested that could be suspicious (even though, admittedly, they look like genuinely good TF peaks). A negative track would clearly show beyond any doubt that these are not suspect regions of positive unspecific signal caused by open chromatin, excessive cross-linking, or antibody cross-reaction.

      (3) SP5 here is found as a direct inducer of Wnt3a expression, and accordingly positive regulator of Tbxt and mesoderm, caudal development. I find this in partial contradiction with a finding by the Willert group (PMID: 29044119). They show that "genes with an associated SP5 peak, such as SP5 itself, AXIN2, AMOTL2, GPR37, GSC, MIXL1, NODAL, and T, show significant upregulation in expression upon Wnt3a treatment in SP5 mutant cells". There, essentially, SP5 inhibits Wnt target genes. While the authors are aware of this and cite Huggins et al., I find that this deserves a better discussion addressing how opposite functions could be sustained in different contexts, if these really are different cellular contexts in the first place, or if this could result from different methodologies.

      (4) The gastruloid experiment is nice, but I wonder whether there is any marker that the authors can use to show that other features of the gastruloids respond accordingly. For example, is the Sox2 expression domain expanded? And is there any unaffected marker to emphasize the specificity of the decreased Tbxt and Cdx2?

      (5) SP5/8 seems to enhance the TCF7 occupancy at WRE. And then, SP5/8 appears to counteract the presence of TLE repressor associated with TCF7. While these two mechanisms are interesting, they are not necessarily interconnected. According to the still-established view, TCF7 should be associated with WRE even in the absence of the Wnt signal, when TLEs are also present on the locus. One could expect that SP5 competes with TLE, to decrease its presence on TCF7-bound loci, leaving the abundance of TCF7 binding unchanged. Yet, the authors also observe that the TCF7 association changes. What is the mechanism implied? Do they perhaps consider a TCF7L1 > TCF7 switch, and if so, what evidence exists for this?

      (6) Along the same line as above, I wonder whether beta-catenin binding is also enhanced at these sites? Any TCF/LEF would require beta-catenin for gene upregulation.

      (7) The authors write that "Small Tle peaks were identified at these WREs in WT cells, demonstrating that both repressive Tle and activating Tcf7 could be detected at active genes". However, ChIP-seq is a population assay, and it is possible - more plausible, in fact - that cells displaying TLE binding are not expressing the target genes.

    4. Reviewer #3 (Public review):

      Summary:

      This is a well-done study. It shows, in a comprehensive manner, that Sp5 and Sp8 play essential roles in maintaining the complicated positive feedback circuitry needed for specification of neuromesodermal competent progenitors (NMCs) in caudal mesodermal development in murine embryos.

      Strengths:

      The developmental genetics, transcriptomic, and genomic survey of TF binding are all satisfactory and make a compelling story. The CRISPR deletion of the Wnt3a downstream enhancer clearly demonstrates that it plays an important role in the positive feedback circuit.

      Weaknesses:

      My only concerns are some of the language surrounding the mechanistic interpretation of the Wnt3a downstream enhancer and the relationship between TCF and TLE binding.

    1. eLife Assessment

      This work presents important information on rhythmicity of overlapping target and distractor processing and how this affects behaviour. The methods are, in general, clearly laid out and defensible, with several supplementary analyses leading to a solid base of evidence for their claims.

    2. Reviewer #1 (Public review):

      Summary:

      Using a combination of EEG and behavioural measurements, the authors investigate the degree to which processing of spatially-overlapping targets (coherent motion) and distractors (affective images) are sampled rhythmically and how this affects behaviour. They found that both target processing (via measurement of amplitude modulations of SSVEP amplitude to target frequency) and distractor processing (via MVPA decoding accuracy of bandpassed EEG relative to distractor SSVEP frequency) displayed a pronounced rhythm at ~1Hz, time-locked to stimulus onset. Furthermore, the relative phase of this target/distractor sampling predicted accuracy of coherent motion detection across participants.

      Strengths:

      - The authors are addressing a very interesting question with respect to sampling of targets and distractors, using neurophysiological measurements to their advantage in order to parse out target and distractor processing.
      - The general EEG analysis pipeline is sensible and well-described.
      - The main result of rhythmic sampling of targets and distractors is striking and very clear even on a participant-level.
      - The authors have gone to quite a lot of effort to ensure the validity of their analyses, especially in the Supplementary Material.
      - It is incredibly striking how the phase of both target and distractor processing are so aligned across trials for a given participant. I would have thought that any endogenous fluctuation in attention or stimulus processing like that would not be so phase aligned. I know there is literature on phase resetting in this context, the results seem very strong here and it is worth noting. The authors have performed many analyses to rule out signal processing artifacts, e.g. the sideband and beating frequency analyses.

      Weaknesses:

      - In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is going to most likely be related to the contrast of the dots, as opposed to representing coherent motion energy which is the actual target. These may well be linked (e.g. greater attention to the coherent motion task might increase SSVEP amplitude) but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted. Overall, this limitation remains and has been noted in the Limitations section.
      - Then comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and reflect probably different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)? Again, this has been noted in the Limitations section, but changing the data to z-scores doesn't really take care of the conceptual issue, i.e. that on-screen contrast changes would necessarily be distracting during emotional category decision-making.

    3. Reviewer #2 (Public review):

      In this study, Xiong et al. investigate whether rhythmic sampling - a process typically observed in the attended processing of visual stimuli - extends to task-irrelevant distractors. By using EEG with frequency tagging and multivariate pattern analysis (MVPA), they aimed to characterize the temporal dynamics of both target and distractor processing and examine whether these processes oscillate in time. The central hypothesis is that target and distractor processing occur rhythmically, and the phase relationship between these rhythms correlates with behavioral performance.

      Major Strengths
      (1) The extension of rhythmic attentional sampling to include distractors is a novel and interesting question.
      (2) The decoding of emotional distractor content using MVPA from SSVEP signals is an elegant solution to the problem of assessing distractor engagement in the absence of direct behavioral measures.
      (3) The finding that relative phase (between 1 Hz target and distractor processes) predicts behavioral performance is compelling.

      Major Weaknesses and Limitations
      (1) The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.
      (2) The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.
      (3) The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.
      (4) Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.
      (5) Phase analysis is performed between different types of signals (time-resolved SSVEP amplitude and time-resolved decoding accuracy), hindering their interpretability.

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Additional Considerations
      • The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.
      • In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      Comments on revisions:

      The authors have addressed my previous points, and the manuscript is substantially improved. The key methodological clarifications have been incorporated, and the interpretation of findings has been appropriately moderated. I have no further major concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is most likely going to be related to the contrast of the dots, as opposed to representing coherent motion energy, which is the actual target. These may well be linked (e.g., greater attention to the coherent motion task might increase SSVEP amplitude), but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted.

      We agree with the reviewer. The SSVEP amplitude of the target at the whole trial level indeed reflected the combined effect of the stimulus parameters (e.g., contrast of the moving dots) as well as attention. However, the time course of the target SSVEP amplitude within a trial, derived from the moving window analysis, reflected the temporal fluctuations of target processing, since the stimulus parameters remained the same during the trial. We now make this clearer in the revised manuscript.

      (2) Comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and probably reflect different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)?

      Traditionally, the SSVEP amplitude at the distractor frequency is used to quantify distractor processing. Given that the target SSVEP amplitude is stronger than that of the distractor, it is possible that the distractor SSVEP amplitude is contaminated by the target SSVEP amplitude due to spectral power leakage; see Figure S4 for a demonstration of this. Because of this issue, we introduced decoding accuracy as an index of distractor processing. The lack of correlation between the distractor SSVEP amplitude and the distractor decoding accuracy, although it amounts to comparing apples with oranges as pointed out by the reviewer, serves the purpose of showing that these two measures are not co-varying, and that decoding accuracy is free from the influence of the distractor SSVEP amplitude, which is itself influenced by the target SSVEP amplitude. Also, to address the apples-vs-oranges issue, the correlation was computed on normalized time series, in which a z-score time series replaced the original time series so that the correlated variables are dimensionless. Regarding the question of assessing the relation between behavior and different levels of processing, we do not have means to address it, given that we are not able to empirically separate the effects of stimulus parameters versus attention.

      Reviewer 2:

      (1) Incomplete Evidence for Rhythmicity at 1 Hz: The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      We appreciate the reviewer’s insightful suggestion. In response, we tested different windowing parameters, e.g., a 0.1s sliding window with a 0.05s step size. Figure S5 demonstrates that the strength of both target and distractor processing fluctuates at ~1 Hz, both at the individual and group levels. Additionally, Figures S6(A) and S6(B) show that the relative phase between target and distractor processing time series exhibits a uniform distribution across subjects. In terms of the relation between relative phase and behavior, Figure S6(C) illustrates two representative cases: a high-performing subject with 84.34% task accuracy exhibited a relative phase of 0.9483π (close to π), while a low-performing subject with 30.95% accuracy showed a phase of 0.29π (close to 0). At the group level, a significant positive correlation between relative phase and task performance was found (r = 0.6343, p = 0.0004), as shown in Figure S6(D). All these results, aligning closely with our original findings (0.5s window length and 0.25s step size), suggest that the conclusions are not dependent on windowing parameters. We discuss these results in the revised manuscript.
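
      The window-dependence check described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the 5 Hz tagging frequency, the 200 Hz sampling rate, the synthetic signal, and the function names `amplitude_at`, `sliding_amplitude`, and `peak_modulation_freq` are all assumptions made for illustration. The sketch estimates the amplitude at the tagged frequency in overlapping windows, then asks which candidate frequency dominates the resulting amplitude time course, once for each of the two window settings mentioned in the response.

```python
import cmath
import math


def amplitude_at(segment, fs, freq):
    """Amplitude of one frequency component via a direct Fourier projection."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(segment))
    return 2 * abs(acc) / len(segment)


def sliding_amplitude(signal, fs, freq, win_s, step_s):
    """Amplitude time course at `freq` from overlapping windows."""
    win, step = round(win_s * fs), round(step_s * fs)
    return [amplitude_at(signal[i:i + win], fs, freq)
            for i in range(0, len(signal) - win + 1, step)]


def peak_modulation_freq(envelope, env_fs, candidates):
    """Candidate frequency with the largest amplitude in the mean-removed envelope."""
    mean = sum(envelope) / len(envelope)
    centered = [v - mean for v in envelope]
    return max(candidates, key=lambda f: amplitude_at(centered, env_fs, f))


# Synthetic data (all values hypothetical): a 5 Hz carrier whose amplitude
# is modulated at 1 Hz, sampled at 200 Hz for 10 s.
fs, tag_freq = 200, 5.0
signal = [(1 + 0.5 * math.cos(2 * math.pi * k / fs))
          * math.sin(2 * math.pi * tag_freq * k / fs) for k in range(10 * fs)]
candidates = [0.25 * k for k in range(1, 9)]  # 0.25 Hz to 2.0 Hz

peaks = {}
for win_s, step_s in [(0.5, 0.25), (0.1, 0.05)]:
    envelope = sliding_amplitude(signal, fs, tag_freq, win_s, step_s)
    # The envelope is sampled once per step, so its sampling rate is 1 / step.
    peaks[(win_s, step_s)] = peak_modulation_freq(envelope, 1 / step_s, candidates)
print(peaks)
```

      In this synthetic case both window settings recover the same modulation frequency, which is the kind of agreement the response reports. With real EEG the candidate grid and the significance of the peak would need separate justification.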

      To further validate our findings, we also employed the Hilbert transform to extract amplitude envelopes of the target and distractor signals on a time-point-by-time-point basis, providing a window-free estimate of signal strength (Figures R3 and R4). The results remain consistent with both the original findings and the new sliding window analyses (Figure S6). Specifically, Figure S7 reveals ~1 Hz fluctuations in target and distractor processing at both individual and group levels. Figures S8(A) and S8(B) confirm a uniform distribution of the relative phase across subjects. In Figure S8(C), the relative phase was 0.9567π for a high-performing subject (84.34% accuracy) and 0.2247π for a low-performing subject (28.57% accuracy). At the group level, a significant positive correlation was again observed between relative phase and task performance (r = 0.4020, p = 0.0376), as shown in Figure S8(D).

      (2) No-Distractor Control Condition: The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      The lack of a no-distractor control condition is certainly a limitation and will be acknowledged as such in the revised manuscript. However, given that our decoding results are between two different classes of distractors, we are confident that they reflect distractor processing.

      (3) Decoding Near Chance Levels: The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      This is an important point. To test robustness, we have implemented a random permutation procedure in which trial labels were randomly shuffled to construct a null-hypothesis distribution for decoding accuracy. We then compared the decoding accuracy from the actual data to this distribution. Figure S9 shows the results based on 1,000 permutations. For each of the three pairwise classifications (pleasant vs. neutral, unpleasant vs. neutral, and pleasant vs. unpleasant) as well as the three-way classification, the actual decoding accuracies fall far outside the null-hypothesis distribution (p < 0.001), and the effect size in all four cases is extremely large. These findings indicate that the observed decoding accuracies are statistically significant and robust in terms of both statistical inference and effect size.
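
      A label-permutation test of this kind can be sketched as follows. The sketch is illustrative only: the one-dimensional synthetic features, the leave-one-out nearest-centroid classifier, the 200 permutations, and the names `loo_accuracy` and `permutation_test` are assumptions, not the classifier or settings used in the study.

```python
import random


def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier on 1-D features."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(features, labels)):
            if j != i:  # hold out trial i when estimating class centroids
                sums[yj] = sums.get(yj, 0.0) + xj
                counts[yj] = counts.get(yj, 0) + 1
        predicted = min(sums, key=lambda c: abs(x - sums[c] / counts[c]))
        correct += predicted == y
    return correct / len(features)


def permutation_test(features, labels, n_perm, rng):
    """Compare the observed accuracy with accuracies obtained on shuffled labels."""
    observed = loo_accuracy(features, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(loo_accuracy(features, shuffled))
    # Add-one correction so the p-value is never exactly zero.
    p_value = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    return observed, null, p_value


# Synthetic two-class data (hypothetical): 30 trials per class, well separated.
rng = random.Random(0)
labels = [0] * 30 + [1] * 30
features = [rng.gauss(4.0 * y, 1.0) for y in labels]
observed, null, p_value = permutation_test(features, labels, n_perm=200, rng=rng)
print(observed, p_value)
```

      The smallest p-value such a test can return is 1 / (n_perm + 1), so a report of p < 0.001 requires at least 1,000 permutations, consistent with the number stated in the response.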

      (4) No Clear Correlation Between SSVEP and Behavior: Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.

      We feel that what the reviewer pointed out is actually the main point of our study: it is not the target or distractor strength over the whole trial that matters for behavior, but their temporal relationship within the trial. This reveals a novel neuroscience principle that has not been reported in the past. We have stressed this point further in the revised manuscript.

      (5) Phase-analysis: phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).

      The time-resolved SSVEP amplitude is used to index the temporal dynamics of target processing whereas the time-resolved decoding accuracy is used to index the temporal dynamics of distractor processing. As such, they can be compared, using relative phase for example, to examine how temporal relations between the two types of processes impact behavior. This said, we do recognize the reviewer’s concern that these two processes are indexed by two different types of signals. We thus normalized each time course using z-scoring, making them dimensionless, and then computed the temporal relations between them.
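
      One way to compute a relative phase between two z-scored time courses is sketched below. The details are assumptions for illustration: the phase is taken from a single Fourier coefficient at 1 Hz, the difference is folded into the range 0 to π, and the two synthetic series (with a built-in 0.9π offset) and the names `zscore`, `phase_at`, and `relative_phase` are hypothetical. The response does not state how the authors defined or folded the phase.

```python
import cmath
import math
import statistics


def zscore(series):
    """Remove the mean and divide by the standard deviation (dimensionless)."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(v - mean) / sd for v in series]


def phase_at(series, fs, freq):
    """Phase (radians) of the Fourier coefficient of `series` at `freq`."""
    coeff = sum(v * cmath.exp(-2j * math.pi * freq * k / fs)
                for k, v in enumerate(series))
    return cmath.phase(coeff)


def relative_phase(a, b, fs, freq):
    """Absolute phase difference between two series, folded into [0, pi]."""
    diff = phase_at(zscore(a), fs, freq) - phase_at(zscore(b), fs, freq)
    return abs(cmath.phase(cmath.exp(1j * diff)))


# Synthetic time courses (hypothetical units and values), sampled every 0.25 s
# for 10 s. The second series lags the first by 0.9 * pi at 1 Hz.
fs = 4.0
t = [k / fs for k in range(40)]
target_amp = [5.0 + 2.0 * math.cos(2 * math.pi * x) for x in t]
distractor_acc = [0.55 + 0.03 * math.cos(2 * math.pi * x - 0.9 * math.pi) for x in t]

rel = relative_phase(target_amp, distractor_acc, fs, freq=1.0)
print(rel / math.pi)  # close to 0.9 for this synthetic input
```

      Z-scoring changes the offset and scale of each series but not its phase at a given nonzero frequency, so the normalization addresses the difference in units without altering the quantity being compared.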

      Appraisal of Aims and Conclusions:

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      Impact and Utility to the Field:

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Thanks for these comments and positive assessment of our work’s potential implications and impact. As indicated above, in the revision process, we have carried out a number of additional analyses, some suggested by the reviewers, and the results of the additional analyses, now included in the Supplementary Materials, served to further validate the main findings and strengthen our conclusions.

      Additional Context and Considerations:

      (1) The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.

      Indeed, leveraging fMRI data in EEG studies would be very beneficial, as has been demonstrated in our previous work. However, given that this study concerns the temporal relationship between target and distractor processing, it is felt that fMRI data, which is known to possess low temporal resolution, has limited potential to contribute. We will be exploring this rich dataset in other ways in the future, where we will be integrating the two modalities for more insights that are not possible with either modality used alone.

      Author response image 1.

      Applying moving window analysis (0.02s window duration and 0.01s step size) to a different EEG-fMRI dataset. (A) The amplitude time series of the 4.29 Hz component and the Fourier spectrum. (B) The group level Fourier spectrum. At both individual and group level, no 1 Hz modulation is observed, suggesting that the 1 Hz modulation observed in our data is not introduced by the artifact removal procedure.

      (2) In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      We have done extensive work in the area of simultaneous EEG-fMRI and have not encountered artifacts with a 1 Hz rhythmicity. Our scanner artifact removal procedure is very standardized. As such, it stands to reason that if the 1 Hz rhythmicity observed here results from the artifact removal process, it should also be present in other datasets where the same preprocessing steps were implemented. We tested this using another EEG-fMRI dataset (Rajan et al., 2019). Author response image 1 shows that the EEG power time series of the new dataset doesn't have 1 Hz rhythmicity, whether at the individual level or at the group level, suggesting that the 1 Hz rhythmicity reported in the manuscript is not coming from the removal of the scanner artifacts, but instead reflects true rhythmic sampling of stimulus information. Also, the fact that the temporal relations between target processing and distractor processing at 1 Hz impact behavior is another indication that the 1 Hz rhythmicity is a neuroscientific effect, not an artifact.

      References

      Rajan, A., Siegel, S. N., Liu, Y., Bengson, J., Mangun, G. R., & Ding, M. (2019). Theta Oscillations Index Frontal Decision-Making and Mediate Reciprocal Frontal–Parietal Interactions in Willed Attention. Cerebral Cortex, 29(7), 2832–2843. https://doi.org/10.1093/cercor/bhy149

    1. eLife Assessment

      This fundamental work significantly advances our understanding of gravity sensing and orientation behavior in the ctenophore, an animal of major importance in understanding the evolution of nervous systems. Through comprehensive reconstruction with volumetric electron microscopy, and time-lapse imaging of cilia motion, the authors provide compelling evidence that the aboral nerve net coordinates the activity of balancer cilia. The resemblance to the ciliomotor circuit in marine annelids provides a fascinating example of how neural circuits may convergently evolve to solve common sensorimotor challenges.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons that exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum, where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function, which ultimately allows the animal to correct its orientation. It represents an example of systems neuroscience explaining how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective, showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear, and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims, and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

      Weaknesses:

      The evidence supporting the claim that the neural circuitry presented here controls the cilia beating is more correlational because it only relies on the fact that the location of the two types of ANN neurons coincides with the quadrants that are affected in the behavioral recordings. Discussing ways by which causality could be established might be helpful.

      The explanation of the relevance of this work could be improved. The conclusion that the work hints at coordination instead of feedforward sensory-motor control is explained over only a few lines. The authors could provide a more detailed explanation of how the two models compete (coordination vs feedforward sensory-motor control), and why choosing one option over the other could provide advantages in this context.

      Since the fact that the ANN neurons form a syncytium is an important finding of this study, it would be useful to have additional illustrations of it. For instance, pictures showing anastomosing membranes could typically be added in Figure 2.

      Also, to better establish the importance of the study, it could be useful to explain why the balancers' cilia spontaneously beat in the first place (instead of being static and just acting as stretch sensors).

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ's balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day-old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because of it apparently being the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day-old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons. This also gives some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

    1. eLife Assessment

      This valuable work presents a novel computational framework for modeling macroscopic traveling waves in the mouse cortex by integrating open-source connectomic and transcriptomic data into a spiking network model. This approach allows the computational model to assign excitatory/inhibitory connections based on neurotransmitter profiles and extends simulations to the 3D domain. The authors present results that demonstrate how spatiotemporal dynamics such as slow oscillations (0.5-4 Hz) emerge and self-organize at the whole-brain scale. This study provides convincing initial insights into the structural basis of traveling waves at the whole-brain scale in the mouse.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Realistic coupling enables flexible macroscopic traveling waves in the mouse cortex" by Sun, Forger, and colleagues presents a novel computational framework for studying macroscopic traveling waves in the mouse cortex by integrating realistic brain connectivity data with large-scale neural simulations.

      The key contributions include:
      (1) developing an algorithm that combines spatial transcriptomic data (providing detailed neuron positions and molecular properties) with voxelized connectivity data from the Allen Brain Atlas to construct neuron-to-neuron connections across ~300,000 cortical neurons;
      (2) building a GPU-accelerated simulation platform capable of modeling this large-scale network with both excitatory and inhibitory Hodgkin-Huxley neurons;
      (3) extending phase-based analysis methods from 2D to 3D to quantify traveling wave activity in the realistic brain geometry; and
      (4) demonstrating that realistic Allen connectivity generates significantly higher levels of macroscopic traveling waves compared to simplified local or uniform connectivity patterns.

      The study reveals that wave activity depends non-monotonically on coupling strength and that slow oscillations (0.5-4 Hz) are particularly conducive to large-scale wave propagation, providing new insights into how anatomical connectivity enables flexible spatiotemporal dynamics across the cortex.

      Strengths:

      The authors leverage two existing dense datasets of spatial transcriptomic data and connection strength between pairwise voxels in the mouse cortex in a novel way, allowing for the computational model to capture molecular and functional properties of neurons as determined by their neurotransmitter profiles, rather than making arbitrary assignments of excitatory/inhibitory roles. Additionally, the authors' expansion of 2D phase dynamics to 3D phase gradient analysis methods is important and can be widely applied to calcium imaging, LFP recordings, and likely other electrophysiological recordings.

      Weaknesses:

      Despite these important computational advancements, a few aspects of this model, particularly the inability to validate the model with experimental neural data, diminish my enthusiasm for this paper:

      (1) The model's Allen connectivity approach overlooks critical aspects of real cortical dynamics. Most importantly, it excludes subcortical structures, especially the thalamus, which drives cortical traveling waves through thalamocortical interactions. The authors' method of electrically stimulating all layer 4 neurons simultaneously to initiate waves is artificially crude and bears little resemblance to natural wave generation mechanisms.

      (2) The model handles voxel-to-voxel connections crudely when neurons have mixed excitatory/inhibitory properties and varying synaptic strengths. Real connectivity differs dramatically between neuron types (pyramidal cells vs. interneurons, across cortical layers), but the model only distinguishes excitatory and inhibitory neurons. Additionally, uniform synaptic weights ignore natural variations in connection strength based on neuron type, distance, and functional role. Integrating the updated thalamocortical dataset mentioned by the authors, even at regional resolution, would substantially improve the model.

      (3) While the authors bridge microscopic (single neuron) and mesoscopic (regional connectivity) data to study macroscopic (whole-cortex) waves, they don't integrate the distinct mechanisms operating at each scale. The framework demonstrates that realistic connectivity enables macroscopic waves but fails to connect how wave dynamics emerge and interact across spatial scales systematically.

      (4) Claims that Allen connectivity produces higher phase gradient directionality (PGD) than local connectivity appear limited to delta oscillations at very specific coupling strengths and applied currents. Few parameter combinations show significantly higher PGD for Allen connectivity, and these are generally low PGD values overall.

      (5) Broadly, it's unclear how this computational framework can study memory, learning, sleep, sensory processing, or disease states, given the disconnect between simulated intracellular voltages and the local field potentials or other electrophysiological measurements typically used to study cortical traveling waves. While computationally impressive, the practical research applications remain vague.

      (6) The paper needs a clearer explanation for why medium coupling (100%) eliminates waves in Allen connectivity (Figure 6) while stronger coupling (150%) restores them.

      (7) Does using a single connectivity parameter (ρ = 300) across all regions miss important regional differences in cortical connectivity density?
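
      To make the quantity discussed in point (4) concrete, the following is a minimal sketch of phase gradient directionality under its commonly used definition: the norm of the mean phase-gradient vector divided by the mean of the gradient norms. Whether the manuscript uses exactly this form is an assumption; the function name and the input layout (one gradient vector per voxel) are hypothetical and chosen only for illustration.

```python
import math


def phase_gradient_directionality(gradients):
    """Return PGD for a list of phase-gradient vectors (one per voxel).

    Assumed definition: ||mean(grad)|| / mean(||grad||), which is 1 when all
    gradients point the same way and near 0 when directions cancel out.
    """
    n = len(gradients)
    if n == 0:
        raise ValueError("need at least one gradient vector")
    dims = len(gradients[0])
    # Component-wise mean of the gradient vectors.
    mean_vec = [sum(g[d] for g in gradients) / n for d in range(dims)]
    norm_of_mean = math.sqrt(sum(c * c for c in mean_vec))
    # Mean of the individual gradient magnitudes.
    mean_of_norms = sum(math.sqrt(sum(c * c for c in g)) for g in gradients) / n
    if mean_of_norms == 0.0:
        return 0.0  # no phase gradient anywhere, so no directionality
    return norm_of_mean / mean_of_norms
```

      Because the ratio is bounded between 0 and 1, a statistically significant difference between two connectivity schemes can still correspond to low absolute directionality, which is the concern raised in point (4).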

    3. Reviewer #2 (Public review):

      Summary:

      This work presents a spiking network model of traveling waves at the whole-brain scale in the mouse neocortex. The authors use data from the Allen Institute to reconstruct connectivity between different neocortical sites. They then quantify macroscopic traveling waves following stimulation of all layer 4 neurons in the neocortex.

      Strengths:

      Overall, the results are interesting and shed new light on the dynamic organization of activity across the neocortex of the mouse. The paper uses realistic neuron models specifically fit to intracellular recordings, demonstrating that traveling waves occur in the mouse neocortex with both realistic connectivity and realistic single-neuron dynamics. The paper is also well-written in general. For these reasons, the authors have generally achieved their aims in this work.

      Weaknesses:

      (1) Description of Algorithm 1: While the Methods section clearly explains the density parameter ρ, the statement on line 358 concerning the "ideal" average number of connections is a little unclear. The authors should explicitly clarify that ρ is a free parameter that can be adjusted to balance computational feasibility (for a given set of computational resources) and biological fidelity.

      (2) Lines 102-103: The ρ parameter used here results in approximately 300 connections per neuron on average. The authors should state clearly that the number of connections per cell is the key determinant of computational feasibility (cf. Morrison et al., Neural Computation, 2005). The authors should also review neuronal density and synaptic connectivity in the mouse neocortex and clearly reference density and connectivity in their model to the biological scales found in the mouse.

      (3) Line 131: From the plots in Figure 2, it is not clear that the stimulus response is necessarily a rhythmic oscillation, in the sense of a single narrowband frequency.

      (4) Line 217: The authors should clarify how these findings relate to the results from Mohajerani et al. (Nature Neuroscience, 2013) or differ from them.

      (5) Line 230: Because higher temporal frequency activity also tends to be more spatially localized, a correlation between PGD and temporal frequency could be an inherent consequence of this relationship, rather than a meaningful result.

      (6) Lines 247-248: It is not clear that the algorithm for generating connections between neurons presented here really relates to those for community detection. For example, in the case of the Allen Institute data, the communities are essentially in the data already.

      (7) Lines 284-285: The relationship between conduction delay and anatomical distance is more direct than this sentence suggests. Conduction delay is fundamentally determined by the time required for action potentials to propagate along axons, making it intrinsically linked to anatomical distance.

      (8) Lines 287-288: The authors suggest at this point that they do not have enough information to estimate time delays due to axonal conduction along white matter fibers. However, experimental data from white matter connections typically includes information about fiber length, which does enable estimating conduction delays. These estimations have been previously implemented for Allen Institute connectome data in the mouse (Choi and Mihalas, PLoS Comput Biology, 2019) and human connectome data (Budzinski et al., Physical Review Research, 2023).

      (9) Lines 294-295: Several methods do exist for detecting and characterizing wave dynamics in three-dimensional data (Budzinski et al., Physical Review Research, 2023).
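
      The estimate referred to in point (8) reduces to dividing fiber length by an assumed conduction velocity. The sketch below illustrates that arithmetic only; it is not the procedure of the cited studies, and the function name, the units, and the example velocity of 3.5 m/s are hypothetical values chosen for illustration.

```python
def conduction_delay_ms(fiber_length_mm, velocity_m_per_s):
    """Estimate axonal conduction delay in milliseconds.

    Since 1 m/s equals 1 mm/ms, dividing a length in millimetres by a
    velocity in metres per second yields a delay in milliseconds.
    """
    if fiber_length_mm < 0:
        raise ValueError("fiber length must be non-negative")
    if velocity_m_per_s <= 0:
        raise ValueError("conduction velocity must be positive")
    return fiber_length_mm / velocity_m_per_s


# Hypothetical example: a 7 mm fiber with an assumed velocity of 3.5 m/s.
example_delay = conduction_delay_ms(7.0, 3.5)  # 2.0 ms
```

      Any such estimate is only as good as the assumed velocity, which in practice varies with axon diameter and myelination.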

    1. eLife Assessment

      This important study utilizes behavioral data and computational modeling to show that spatial properties of visual attention affect human planning. The methodology and statistical analyses are solid, though the way attention is conceptualized and modeled could be refined. The findings of this study will interest cognitive scientists studying attention, perception, and decision-making.

    2. Reviewer #1 (Public review):

      Summary: This study investigated how visuospatial attention influences the way people build simplified mental representations to support planning and decision-making. Using computational modeling and virtual maze navigation, the authors examined whether spatial proximity and the spatial arrangement of obstacles determine which elements are included in participants' internal models of a task. The study developed and tested an extension of the value-guided construal (VGC) model that incorporates features of spatial attention for selecting simpler mental representations of the task.

      Strengths:

      (1) Original Perspective: The study introduces an explicit attentional component to established models of planning, offering an approach that bridges perception, attention, and decision-making.

      (2) Methodological Approach: The combination of computational modeling, behavioral data, and eye-tracking provides converging measures to assess the relationship between attention and planning representations.

      (3) Cross-validated data: The study relies on the analysis of three separate datasets, two already published and an additional novel one. This allows for cross-validation of the findings and enhances the robustness of the evidence.

      (4) Focus on Individual Differences: Reports of how individual variability in attentional "spillover" correlates with the sparsity of task representations and spatial proximity add depth to the analysis.

      Weaknesses:

      (1) Clarity of the VGC model and behavioral task: The exposition of the VGC model lacks sufficient detail for non-expert readers. It is not clear how this model infers which maze obstacles are relevant or irrelevant for planning, nor how the maze tasks specifically operationalize "planning" versus other cognitive processes.

      The method for classifying obstacles as relevant or irrelevant to the task and connecting metacognitive awareness (i.e., participants' reports of noticing obstacles) to attentional capture is not well justified. The rationale for why awareness serves as a valid attention proxy, as opposed to behavioral or neurophysiological markers, should be clearer.

      (2) Attention framework: The account of attention is largely limited to the "spotlight" model. When solving a maze, participants trace the correct trail, following it mentally with their overt or covert attention. In this perspective, relevant concepts are also rooted in attention literature pertaining to object-based attention using tasks like curve tracing (e.g., Pooresmaeili & Roelfsema, 2014) and to mental maze solving (e.g., Wong & Scholl, 2024), which may be highly relevant and add nuance to the current work. This view of attention may be more pertinent to the task than models of simultaneously tracking multiple objects cited here. Prior work (notably from the Roelfsema group) indicates that attentional engagement in curve-tracing tasks may be a continuous, bottom-up process that progressively spreads along a trajectory, in time and space, rather than a "spotlight" that simply travels along the path. The spread of attention depends on the spatial proximity to distractors - a point that could also be pertinent to the findings here.

      Moreover, the tracing of a "solution" trail in a maze may be spontaneous and not only a top-down voluntary operation (Wong & Scholl, 2024), a finding that requires a more careful framing of the link to conscious perception discussed in the manuscript.

      Conceptualizing attention as a spatial spotlight may therefore oversimplify its role in navigation and planning. Perhaps the observed attentional modulation reflects a perceptual stage of building the trail in the maze rather than a filter for a later representation for more efficient decision making and planning. A fuller discussion of whether the current model and data can distinguish between these frameworks would benefit readers.

      (3) Lateralization of attention: The analysis considers whether relevant information is distributed bilaterally or unilaterally across the visual display, but does not sufficiently address evidence for attentional asymmetries across the left and right visual fields due to hemispheric specialization (e.g., Bartolomeo & Seidel Malkinson, 2019). Whether effects differ for left versus right hemifield arrangements is not made explicit in the presented findings.

      (4) Individual differences: Individual differences in attentional modulation are a strength of the work, but similar analyses exploring individual variation in lateralization effects could provide further insight, and the lack of such analyses may mask important effects.

      (5) Distinction between overt and covert attention: The current report at times equates eye movement patterns with the locus of attention. However, attention can be covertly shifted without corresponding gaze changes (see, for example, Pooresmaeili & Roelfsema, 2014).

      The implications for interpreting the relationship between eye movement, memory, and attention in this setting are not fully addressed. The potential dynamics of attention along a maze trajectory and their impact on lateralization analysis would benefit from further clarification.

      Appraisal of Aims and Results:

      The study sets out to determine how spatial attention shapes the construction of task representations in planning contexts. The authors provide evidence that spatial proximity and arrangement influence which environmental features are incorporated into internal models used for navigation, and that accounting for these effects improves model predictions. There is clear documentation of individual variation, with some participants showing greater attentional spillover and more sparse awareness profiles.

      However, some conceptual and methodological aspects would be clearer with greater engagement with the broader literature on attention dynamics, a more explicit justification of operational choices, and more targeted lateralization analyses.

    3. Reviewer #2 (Public review):

      Summary:

      Castanheira et al. investigate the role of spatial attention for planning during three maze navigation experiments (one new experiment and two existing datasets). Effective planning in complex situations requires the construction of simplified representations of the task at hand. The authors find that these mental representations (as assessed by conscious awareness) of a given stimulus are influenced by (spatially) surrounding stimuli. Individual participants varied in the degree to which attention influenced their task representations, and this attentional effect correlated with the sparsity of representations (as measured by the range of awareness reports across all stimuli). Spatially grouping task-relevant information on either the left or right side of the maze led to mental representations more similar to optimal representations predicted by the value-guided construal (VGC) model - a normative model describing a theoretical approach to simplifying complex task information. Finally, the authors propose an update to this model, incorporating an attentional spotlight component; the revised descriptive model predicts empirical task representations better than the original (normative) VGC model.

      Strengths:

      The novelty of this study lies in the proposal and investigation of a cognitive mechanism through which a normative model like value-guided construal can enable human planning. After proposing attention as this mechanism, the authors make concrete hypotheses about mismatches between the VGC predictions and real human behavior, which are experimentally validated. Thus, not only does this study describe a possible mechanism for simplification of task information for planning, but the authors also propose a descriptive model, revising VGC to incorporate this attentional component.

      A strength of this paper is the variety of investigative approaches: analysis of existing data, novel experiment, and a computational approach to predict experimental findings from a theoretical model. Analyzing pre-existing datasets increases the size of the participant cohort and strengthens the authors' conclusions. Meanwhile, comparing the predictions of the existing normative model and the authors' own refined model is a clever approach to substantiate their claims. In addition, the authors describe several crucial controls, which are key to the interpretability of their results. In particular, the eye tracking results were critical.

      In summary, this paper constitutes an important step toward a more complete understanding of the human ability to plan.

      Weaknesses:

      (1) There is a critical conceptual gap in the study and its interpretation, mainly due to the reliance on a self-report metric of awareness (rather than an objective measure of behavioral performance).

      a. Awareness is tested by a 9-point self-report scale. It is currently unclear why awareness of task-irrelevant obstacles in this task would necessarily compromise optimal planning. There is no indication of whether self-reported awareness affects performance (e.g., navigation path distance, time to complete the maze, number of errors). Such behavioral evidence of planning would be more compelling.

      b. Relatedly, it would have been more convincing to have an objective measure of awareness, for instance, how the presence or absence of a "task-irrelevant" obstacle affects performance (e.g., change navigation path distance or time to complete the maze), or whether participants can accurately recall the location of obstacles.

      c. Consequently, I'm not sure that we can conclude that the spatial context does impact participants' ability to plan spatial navigation or to "incorporate task-relevant information into their construal". We know that the spatial context affects subjective (self-reported) awareness, but the authors do not present evidence that spatial context affects behavioral performance.

      d. Another concern that may complicate interpretation is the following: Figure 3c shows improved VGC model predictions (steeper slope) for mazes with greater lateralization. However, there are notable outliers in these plots, where a high lateralization index does not correspond to good model performance. There is currently no discussion/explanation of these cases.

      (2) I noticed an issue with clarity regarding task-relevance. It is currently not fully clear which obstacles are "task irrelevant". Also, the term is used inconsistently, sometimes conflating with "awareness". For example, in the "Attentional spotlight model of task representations" section, the authors state that "task-relevant information becomes less relevant when surrounded by task-irrelevant information". But they really mean that participants become less aware of those task-relevant obstacles. I assume task-relevance is an objective characteristic related to maze organization, not to a participant's construal. Indeed, the following paragraph provides evidence of model predictions of awareness.

      (3) The behavioral paradigm has some distinct disadvantages, and the validity of the task is not backed up by behavioral data.

      a. I understand the need for central fixation, but it also makes the task less naturalistic.

      b. The task with its top-down grid view does not seem to mimic real human navigation. Though this grid may be similar to mental maps we form for navigation, the sensory stimuli corresponding to possible paths and to spatial context during real-life navigation are very different.

      c. Behavioral performance is not reported, so it is unknown whether participants are able to properly complete the task. The task seems pretty difficult to navigate, especially when the obstacles disappear, and in combination with the central fixation.

      d. There is no discussion of whether/how this navigation task generalizes to other forms of planning.

    4. Reviewer #3 (Public review):

      Summary:

      The authors build on a recent computational model of planning, the "value-guided construal" framework by Ho et al. (2022), which proposes that people plan by constructing simple models of a task, such as by attending to a subset of obstacles in a maze. They analyze both published experimental data and new experimental data from a task in which participants report attention to objects in mazes. The authors find that attention to objects is affected by spatial proximity to other objects (i.e., attentional overspill) as well as whether relevant objects are lateralized to the same hemifield. To account for these results, the authors propose a "spotlight-VGC" model, in which, after calculating attention scores based on the original VGC model, attention to objects is enhanced based on distance. They find that this model better explains participant responses when objects are lateralized to different hemifields. These results demonstrate complex interactions between filtering of task-relevant information and more classical signatures of attentional selection.
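
      The two-stage structure described above (VGC scores first, spatial spillover second) can be sketched as follows. This is one hypothetical form of the spillover step, not the authors' implementation: the function name, the use of Euclidean distance between obstacle centres, and the unweighted averaging are all assumptions; only the default width of 3 squares is taken from the review.

```python
import math


def spotlight_scores(positions, vgc_scores, width=3.0):
    """Blend each obstacle's VGC score with those of nearby obstacles.

    positions  : list of (row, col) obstacle centres on the maze grid
    vgc_scores : list of VGC-predicted attention scores, same order
    width      : assumed spotlight radius in grid squares
    """
    if len(positions) != len(vgc_scores):
        raise ValueError("positions and vgc_scores must have equal length")
    blended = []
    for r_i, c_i in positions:
        # Collect scores of all obstacles inside the spotlight, including
        # the obstacle itself (distance 0), so the list is never empty.
        nearby = [
            score
            for (r_j, c_j), score in zip(positions, vgc_scores)
            if math.hypot(r_i - r_j, c_i - c_j) <= width
        ]
        blended.append(sum(nearby) / len(nearby))
    return blended
```

      In this sketch a relevant obstacle next to an irrelevant one ends up with a lower score and its neighbour with a higher one, while an isolated obstacle keeps its VGC score unchanged.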

      Strengths:

      (1) The paper builds on existing modeling work in a novel manner and integrates classic results on attention into the computational framework.

      (2) The authors report new and extensive analyses of existing data that shed light on additional sources of systematic variability in responses related to attentional spillover effects.

      (3) They collect new data using new stimuli in the original paradigm that directly test predictions related to the lateralization of task-relevant information, including eye tracking data that allows them to control for possible confounds.

      (4) The extended model (spotlight-VGC) provides a formal account of these new results.

      Weaknesses:

      (1) The spotlight-VGC model has a free parameter - the "width" of the attentional spotlight. This seems to have been fixed to be 3 squares. It would be good if the authors could describe a more principled procedure for selecting the width so that others can use the model in other contexts.

      (2) Have the authors considered other ways in which factors such as attentional spillover and lateralization could be incorporated into the model? The spotlight-VGC model, as presented, involves first computing VGC predictions and only afterwards computing spillover. This seems psychologically implausible, since it supposes that the "optimal" representation is first formed and then it gets corrupted. Is there a way to integrate these biases directly into the VGC framework, perhaps as a prior on construals? The authors gesture towards this when they talk about "inductive biases", but this is not formalized.

      (3) Can the authors rule out that the lateralization effects are the result of memory biases since the main measure used is a self-report of attention?

    1. eLife Assessment

      This study presents a valuable and rigorous molecular resource, offering subtype-specific insight into the composition of ribosome-associated protein complexes in the developing cerebral cortex. The evidence is compelling in terms of data quality and is strongly supported by the results, given the rigorous technical execution. However, the findings remain primarily descriptive, as the study lacks functional validation to support mechanistic conclusions.

    2. Reviewer #1 (Public review):

      This work provides a valuable toolkit for endogenous isolation of projection neuron subtypes. With further validation, it could present a solid method for low-input ribosome affinity purification using a ribosomal RNA (rRNA) antibody. The experimental evidence for the distinct ribosomal complexes is limited to this method and indirect support from complementary analyses of pre-existing data. However, with additional experimental data to support the specificity of ribosomal complex pulldown and confirmation of the putative ribosomal complex proteins of interest, the study would provide compelling evidence for translation regulation of neuronal development through compositional ribosome heterogeneity. This work would be of interest to neuroscientists, developmental biologists, and those studying translational networks underlying gene regulation.

      Strengths

      (1) This in vivo labeling of specific projection neurons and ribosomal rRNA affinity purification method accommodates a low input of <100K somata per replicate, which is useful for the study of neuronal subtypes with limited input. In principle, this set of techniques could work across different cell types with limited input, depending on the molecule used for cell type labeling.

      (2) The authors are also able to isolate endogenous neurons with minimal perturbation up to the point of collection, preserving the native state for the neuron in vivo as long as possible prior to processing.

      (3) This study identified over a dozen potential non-ribosomal proteins associated with SCPN ribosomal complexes, as well as a ribosomal protein enriched in CPN.

      Limitations

      (1) In this study, the authors address the advantages of their ribosomal complex isolation method in SCPN and CPN against RPL22-HA affinity purification. While this does show more pull-down of the ribosomal RNA by the Y10B rRNA antibody, the authors claim this method identifies cell-type-specific ribosomal complex proteins without demonstrating a positive control for the method's specificity. There are very few experiments that truly delineate how "specific" this method is and whether there could be contamination from other complexes bound by the antibody. I see this as the major limitation that should be addressed. To boost their claims of capturing cell-type-specific ribosomal complexes, the authors could consider applying their rRNA affinity purification pipeline to compare cell types with well-characterized ribosome-associated proteins, like mouse embryonic stem cells and HeLa cells. The reviewer can completely appreciate the elegance in the neural characterization here, but it seems there needs to be a solid foothold on the specificity of the method, perhaps facilitated by cell types that can be more readily scaled up and tested.

      (2) The authors followed up on their differentially enriched ribosomal complex proteins by analyzing the ribosome association of these proteins in external datasets. While this analysis supports the ribosome-association of these proteins, there is limited experimental validation of physical association with the ribosome, much less any functional characterization. The reciprocal pulldown of PRKCE is promising; however, I would recommend orthogonal validation of several putative ribosomal complex proteins to increase confidence. Specifically, the authors could use sucrose gradient fractionation of SCPN and CPN, followed by a western blot to identify the putative interaction with the 80S monosome or polysomes. This would also provide evidence towards the pulldown capturing association with mature ribosome species, which is currently unclear. This experiment would provide substantial evidence for the direct association of these non-ribosomal proteins with subtype-specific ribosomal complexes.

      (3) The authors state interest in learning more about the differences underlying translational regulation of projection neuron development. This method only captures neuronal somata, which will only capture ribosomes in the main cell body. There are also ribosomes regulating local translation in the axons, which may also play a critical role in axonal circuit establishment and activity. These ribosomal complex interactions may also be rather transient and difficult to capture at only one developmental stage. Therefore, this method is currently limited to a single developmental snapshot of ribosomal complexes at P3 within the main cell body. It would be exciting to see the extended utility of this method to sample neurites and additional developmental stages to gain further resolution on the developmental translation regulation of these projection neurons.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      The authors introduce a unique pipeline of techniques to identify cell-type-specific ribosomal complex compositions. With more validation, there is certainly potential for those studying neuronal translation to leverage this method in limited primary cells as an alternative to existing methods that do not rely on ribosomal protein tagging, such as ARC-MS (Bartsch et al., 2023), RAPIDASH (Susanto and Hung et al., 2024), and RAPPL (Nature Communications, 2025).

    3. Reviewer #2 (Public review):

      Summary:

      This study presents a sophisticated molecular dissection of ribosome-associated complexes (RCs) in two well-defined cortical projection neuron subtypes (ScPN and CPN) during early postnatal development. The authors develop and optimize an rRNA immunoprecipitation-mass spectrometry (rRNA IP-MS) workflow to recover RCs from FACS-purified, retrogradely labeled neurons, achieving remarkable subtype specificity and biochemical resolution. Through proteomic profiling, they reveal both shared and distinct ribosome-associated proteins between ScPN and CPN, with a focus on non-core RC components and their potential functional relevance. The work advances our understanding of cell-type-specific translation regulation, moving beyond the transcriptome to explore the proteome-level complexity in neuronal subtypes.

      Strengths:

      This work stands out for its technical sophistication and innovation. The authors combine retrograde labeling, FACS purification, and an optimized rRNA IP-MS approach (low input) to isolate ribosome-associated complexes from highly specific neuronal subtypes in vivo, a challenging task that they execute with impressive rigor. The methodological pipeline is both elegant and well-controlled, yielding high-quality, reproducible data. The depth of proteomic coverage is remarkable, with nearly all known cytoplasmic ribosomal proteins identified, along with hundreds of ribosome-associated proteins (RAPs), including translation factors, chaperones, and RNA-binding proteins. The analysis not only reveals shared components between ScPN and CPN RCs but also uncovers subtype-specific differences in associated proteins.

      Particularly notable is the integration of this new proteomic dataset with previously published transcriptomic and ribosome footprinting data, which helps to validate the specificity and relevance of the findings. Overall, the clarity of the writing, the robustness of the data, and the transparency of the methods make this a strong and compelling contribution.

      Weaknesses:

      Despite the depth and high quality of the dataset, the study remains descriptive. While the identification of subtype-specific RC components is intriguing, the current version of the manuscript does not explore their functional roles or the biological consequences of their alterations. There is no perturbation, causal testing, in vitro or in vivo manipulation to demonstrate whether these proteins are necessary for ScPN or CPN identity, specific axonal targeting, metabolism, or synaptic function.

      One important point highlighted by the authors in the discussion - and critical for establishing the subtype specificity of the identified proteins - is that some ribosomal complexes may be specialized for specific developmental stages, rather than exclusively for the subtype-specific needs of projection neuron development. The work presented here provides a valuable starting point for further investigation into such RC specialization. However, it will be essential to determine to what extent these RCs exhibit true subtype specificity, independently of their temporal maturation context.

      As a result, key mechanistic insights remain a bit speculative. Although several of the identified proteins have known roles in processes like synaptogenesis or metabolism, their relevance to the specific neuronal subtypes under study is not experimentally addressed. That said, given its rich content and the comprehensive early postnatal dataset, the manuscript represents an extremely valuable resource for the community. While primarily exploratory, it lays a strong foundation for future functional studies aimed at uncovering the biological impact of the identified ribosomal complexes.

    1. eLife Assessment

      This valuable model-based study seeks to mimic bat echolocation behavior and flight under conditions of high interference, such as when large numbers of bats leave their roost together. Although some of the assumptions made in the model may be questioned, the simulations convincingly suggest that the problem of acoustic jamming in these situations may be less severe than previously thought. This finding will be of broad interest to scientists working in the fields of bat biology and collective behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      * The paper builds on a biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest.

      * The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      * The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      * The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      * Authors have not yet provided convincing justification for the use of different echolocation phases during emergence and in cave behaviour. In the previous modelling paper cited for the details - here the bat-agents are performing a foraging task, and so the switch in echolocation phases is understandable. While flying with conspecifics, the lab's previous paper has shown what they call a 'clutter response' - but this is not necessarily the same as going into a 'buzz'-type call behaviour. As pointed out by another reviewer - the results of the simulations may hinge on the fact that bats are showing this echolocation phase-switching, and thus improving their echo-detection. This is not necessarily a major flaw - but something for readers to consider in light of the sparse experimental evidence at hand currently.

      * The decision to model direction-of-arrival with such high angular resolution (1-2 degrees) is not entirely justifiable - and the authors may wish to do simulation runs with lower angular resolution. Past experimental paradigms haven't really separated out target-strength as a confounding factor for angular resolution (e.g. see the cited Simmons et al. 1983 paper). Moreover, to this reviewer's reading of the cited paper - it is not entirely clear how this experiment provides source-data to support the DoA-SNR parametrisation in this manuscript. The cited paper has two array-configurations, both of which are measured to have similar received levels upon ensonification. A relationship between angular resolution and signal-to-noise ratio is understandable perhaps - and one can formulate such a relationship, but here the reviewer asks that the origin/justification be made clear. On an independent line, also see the recent contrasting results of Geberl, Kugler, Wiegrebe 2019 (Curr. Biol.) - who suggest even poorer angular resolution in echolocation.

    3. Reviewer #2 (Public review):

      This manuscript describes a detailed model for bats flying together through a fixed geometry. The model considers elements which are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      The work relies on a thoughtful and detailed model which faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      With respect to the first version of the manuscript, the authors have remedied all my outstanding questions or concerns in the current version. The new supplementary figure 5 is especially helpful in understanding the geometry.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 58-64).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      • Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      • Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      • Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 84-85, 119-120). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      • Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      • Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion. We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections – see lines 346-349, 372-375.
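
      The inter-pulse intervals quoted in the call-rate bullet can be converted to call rates with a one-line calculation. The following is a minimal sketch for checking that arithmetic; the function name and the phase labels are illustrative and are not taken from the authors' code.

```python
# Convert inter-pulse intervals (IPI) to call rates.
# IPI values are the ones quoted in the response; the dictionary keys are
# shorthand labels chosen for this sketch.
IPI_MS = {
    "search": 100.0,
    "end of approach": 35.0,
    "final buzz": 5.0,
}


def call_rate_hz(ipi_ms: float) -> float:
    """Return calls per second for an inter-pulse interval given in milliseconds."""
    return 1000.0 / ipi_ms


for phase, ipi in IPI_MS.items():
    print(f"{phase}: IPI {ipi:g} ms -> {call_rate_hz(ipi):.1f} calls/s")
```

      This gives 10 calls per second for the search phase, about 28.6 at the end of the approach phase, and 200 during the final buzz.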

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me. 

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight. 

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 543-545. 
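
      To illustrate why higher emission frequencies give a narrower beam under a piston model, the sketch below evaluates the textbook circular-piston directivity, |2 J1(ka sin θ) / (ka sin θ)|. This is not the authors' implementation: the emitter radius (4 mm), the speed of sound (343 m/s), and the example frequencies are assumptions chosen only for illustration, and the Bessel function is approximated by its power series, which is adequate for the small arguments used here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)


def bessel_j1(x: float, terms: int = 30) -> float:
    """First-order Bessel function J1 via its power series (fine for small |x|)."""
    total = 0.0
    for m in range(terms):
        coeff = (-1) ** m / (math.factorial(m) * math.factorial(m + 1))
        total += coeff * (x / 2) ** (2 * m + 1)
    return total


def piston_directivity(freq_hz: float, theta_rad: float, radius_m: float) -> float:
    """Relative pressure amplitude of a circular piston at off-axis angle theta."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    x = k * radius_m * math.sin(theta_rad)
    if abs(x) < 1e-12:
        return 1.0  # on-axis limit of 2*J1(x)/x
    return abs(2 * bessel_j1(x) / x)


# Hypothetical emitter radius and frequencies, for illustration only.
radius = 0.004
angle = math.radians(20)
low = piston_directivity(30e3, angle, radius)
high = piston_directivity(60e3, angle, radius)
print(f"30 kHz: {low:.2f}, 60 kHz: {high:.2f}")
```

      With these assumed values, the off-axis amplitude at 20 degrees is lower at 60 kHz than at 30 kHz, which is the sense in which higher-frequency calls are more directional.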

      If so, what is the difference between phi_target and phi_tx in the model equations? 

      𝝓<sub>𝒕𝒂𝒓𝒈𝒆𝒕</sub> represents the angle between the bat and the reflected object (target).

      𝝓<sub>𝑻𝒙</sub> represents the angle [rad] between the masking bat and the target, from the transmitter’s perspective.

      𝝓<sub>𝑻𝒙𝑹𝒙</sub> refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      𝝓<sub>𝑹𝒙𝑻𝒙</sub> represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 525-530). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      What is a bat's response to colliding with a conspecific (rather than a wall)? 

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldshtein et al., 2025). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance; accordingly, we do not model post-collision dynamics. See lines 479-484.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both? 

      The number of repetitions for each scenario is detailed in Table 1, but we included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 110-111):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials. 

      We clarified this in the revised text (Lines 627-628 in Statistical Analysis).

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model often simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning recurs in several of the responses below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m2), Pipistrellus kuhli and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation? 

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 499-508).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect? 

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase most of the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma), we also have empirical recordings of individuals flying under similar conditions (Goldshtein et al., 2025). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities. See lines 500-508.

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filterbank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in errors of 2-4 cm in localization (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see lines 548-581).
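
      As a rough order-of-magnitude check on the ~1 kHz figure, the sketch below applies the standard one-way Doppler formula for a moving source and a moving receiver. The carrier frequency (40 kHz), the flight speeds (4 m/s each, head-on), and the speed of sound are assumptions picked for illustration; they are not parameters reported in the response.

```python
SPEED_OF_SOUND = 343.0  # m/s in air (assumed value)


def doppler_shift_hz(freq_hz: float, source_speed: float, receiver_speed: float) -> float:
    """One-way Doppler shift of a call heard by a conspecific.

    Speeds are in m/s along the line joining the two bats and are positive
    when the bat moves toward the other one.
    """
    received = freq_hz * (SPEED_OF_SOUND + receiver_speed) / (SPEED_OF_SOUND - source_speed)
    return received - freq_hz


# Hypothetical head-on encounter: both bats fly at 4 m/s, call carrier at 40 kHz.
shift = doppler_shift_hz(40e3, source_speed=4.0, receiver_speed=4.0)
print(f"Doppler shift: {shift:.0f} Hz")
```

      Under these assumptions the shift is a little under 1 kHz. Bats flying apart, or at oblique angles, would produce smaller or negative shifts, which matches the mixed effect described in the response.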

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation. 

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming. 

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.  

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
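
      The "3 dB per doubling" rule follows from the idealized case in which the noise is independent from call to call, so that averaging N observations raises the SNR by a factor of N in power. The sketch below encodes only that idealization; it is not the integration algorithm described in the Methods, and the function name is illustrative.

```python
import math


def integration_gain_db(n_calls: int) -> float:
    """Idealized SNR gain from averaging n_calls observations with independent noise."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return 10.0 * math.log10(n_calls)


for n in (1, 2, 4, 5, 10):
    print(f"{n:2d} calls -> {integration_gain_db(n):.2f} dB")
```

      Under this idealization, integrating 5 to 10 calls corresponds to a gain of roughly 7 to 10 dB. Correlated interference or missed detections would reduce the gain in practice.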

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem? 

      As described in the Methods section, the bat’s collision avoidance response does not solely rely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal, and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 600-616 in the revised version.
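
      The egocentric-to-allocentric step described above can be sketched as a plain 2D coordinate transform. This is a minimal illustration under assumed conventions (heading measured counter-clockwise from the x-axis, bearing measured relative to the heading); the function and argument names are hypothetical and do not come from the authors' code.

```python
import math


def echo_to_allocentric(bat_x, bat_y, heading_rad, echo_range_m, echo_bearing_rad):
    """Map a detection given as (range, bearing relative to heading) to global x-y."""
    angle = heading_rad + echo_bearing_rad
    return (
        bat_x + echo_range_m * math.cos(angle),
        bat_y + echo_range_m * math.sin(angle),
    )


# The same wall point seen from two poses lands on the same global coordinates,
# which is what makes integration across calls possible while the bat moves.
first = echo_to_allocentric(0.0, 0.0, 0.0, 3.0, 0.0)
second = echo_to_allocentric(1.0, 0.0, 0.0, 2.0, 0.0)
print(first, second)
```

      Because detections from successive calls are stored in the same global frame, reflections from a static wall accumulate at a stable position, whereas reflections from moving conspecifics scatter and can be treated as outliers.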

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach.

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      • Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m (Fujioka et al., 2021), as observed in Myotis grisescens (Sabol and Hudson, 1995) and Tadarida brasiliensis (Theriault et al., no date; Betke et al., 2008; Gillam et al., 2010).

      • Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods lines 450-455).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem. 

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler, Bioscience and 2001, no date; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study; see Results Figure 1: The impact of confusion on performance, and lines 399-404 in the Discussion.

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines 411-420 in the manuscript for further discussion. 

      We actually used logarithmically frequency modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 509-512 in Methods).
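
      For readers without MATLAB, the sketch below generates a logarithmic FM sweep in plain Python using the standard closed-form phase of a logarithmic chirp, in which the instantaneous frequency is f0 * (f1/f0)^(t/t1). It is a stand-in for illustration, not the authors' code, and the sweep parameters (5 ms, 80 kHz down to 40 kHz, 500 kHz sampling) are hypothetical rather than the values in the manuscript's tables.

```python
import math


def instantaneous_freq_hz(t_s, duration_s, f_start_hz, f_end_hz):
    """Instantaneous frequency of a logarithmic sweep at time t_s."""
    return f_start_hz * (f_end_hz / f_start_hz) ** (t_s / duration_s)


def log_chirp(duration_s, f_start_hz, f_end_hz, sample_rate_hz):
    """Return samples of cos(phase) for a logarithmic FM sweep.

    Requires f_start_hz != f_end_hz, otherwise the log of the ratio is zero.
    """
    ratio = f_end_hz / f_start_hz
    scale = 2 * math.pi * f_start_hz * duration_s / math.log(ratio)
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Phase is the time integral of the instantaneous frequency.
        phase = scale * (ratio ** (t / duration_s) - 1.0)
        samples.append(math.cos(phase))
    return samples


# Hypothetical downward sweep, for illustration only.
call = log_chirp(duration_s=0.005, f_start_hz=80e3, f_end_hz=40e3, sample_rate_hz=500e3)
print(len(call), instantaneous_freq_hz(0.0025, 0.005, 80e3, 40e3))
```

      In a logarithmic sweep the frequency changes by a constant ratio per unit time, so the midpoint frequency is the geometric mean of the start and end frequencies rather than their arithmetic mean.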

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to support stable and realistic flight trajectories while maintaining a reasonable collision rate. These values reflect a trade-off between maneuverability and behavioral coherence under crowding. To address this point, we added a sensitivity analysis to the revised manuscript. Specifically, we tested the effect of varying the conspecific avoidance distance from 0.2 to 1.6 meters at bat densities of 2 to 40 bats/3m². The only statistically significant impact was at the highest density (40 bats/3m²), where exit probability increased slightly from 82% to 88% (p = 0.024, t = 2.25, DF = 958). No significant changes were observed in exit time, collision rate, or jamming probability across other densities or conditions (GLM, see revised Methods). These results suggest that the selected avoidance distances are robust and not a major driver of model performance, see lines 469-47.

      The 15-second exit limit was determined as described in the text (Lines 489-491): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such cases would lead to excessive simulation times, making it difficult to generate repetitions, while teaching us little: they usually resulted from the bat slightly missing the opening (see video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?  

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions?

      Does it include masking, no masking, or which species? 

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss and Surlykke, 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window; we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking. We have revised the text to clarify these details; see lines 489-491.

      Reviewer #1 (Recommendations for the authors):

      (1) Data Availability:

      As it stands now, this reviewer cannot vouch for the uploaded code as it wasn't accessible according to F.A.I.R principles. The link to the code/data points to a private company's file-hosting account that requires logging in or account creation to see its contents, and thus cannot be accessed.

      This reviewer urges the authors to consider uploading the code onto an academic data repository from the many on offer (e.g. Dryad, Zenodo, OSF). Some repositories offer an option to share a private link (e.g. Zenodo) to the folder that can then be shared only with reviewers so it is not completely public.

      This is a computational paper, and the credibility of the results is based on the code used to generate them.

      The code is available at GitHub as required:

      https://github.com/omermazar/Colony-Exit-Bat-Simulation

      (2) Abstract:

      Line 22: 'To explore whether..' - replace 'whether' with 'how'?

      The sentence was rephrased as suggested by the reviewer.

      (3) Main text:

      Line 43: '...which may share...' - correct to '...which share...', as elegantly framed in the authors' previous work - jamming avoidance is unavoidable because all FM bats of a species still share >90% of spectral bandwidth despite a few kHz shift here and there.

      The sentence was rephrased as suggested by the reviewer.

      Line 49: The authors may wish to additionally cite the work of Fawcett et al. 2015 (J. Comp. Phys A & Biology Open)

      Thank you for the suggestion. We have included a citation to the work of Fawcett et al. (2015) in the revised manuscript.

      Line 61: This statement does not match the recent state of the literature. While the previous models may have assumed that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from the potential inability to track all neighbours, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Jhawar et al. 2020 Nature Physics.

      We have added citations to the important studies suggested by the reviewer, as detailed in the Public Review above.

      Line 89: '..took all interference signals into account...' - what is meant by 'interference signals' - are the authors referring to reflections, unclear.

      We have revised the sentence and detailed the acoustic signals involved in the process: self-generated echoes, calls from conspecifics, and echoes from cave walls and other bats evoked by those calls, see lines 99-106.

      Figure 1A: The colour scheme with overlapping points makes the figure very hard to understand what is happening. The legend has colours from subfigures B-D, adding to the confusion.

      What does the yellow colour represent? This is not clear. Also, in general, the color schemes in the simulation trajectories and the legend are not the same, creating some amount of confusion for the reader. It would be good to make the colour schemes consistent and visually separable (e.g. consp. call direct is very similar to consp. echo from consp. call), and perhaps also if possible add a higher resolution simulation visualisation. Maybe it is best to separate out the colour legends for each sub-figure.

      The updated figure now includes clearer, more visually separable colors, and consistent color coding across all sub-panels. The yellow trajectory representing the focal bat’s flight path is now explicitly labeled, and we adjusted the color mapping of acoustic signals (e.g., conspecific calls vs. echoes) to improve distinction. We also revised the figure caption accordingly and ensured that the legend is aligned with the updated visuals. These modifications aim to enhance interpretability and reduce ambiguity for the reader.

      Figure 1 C3: What is 'FB Channel', this is not explained in the legend.

      ‘FB Channel’ stands for ‘Filter Bank Channel’. This clarification has been added to the caption of Figure 1.

      Figure 3: Visually noticing that the colour legend is placed only on sub-figure A is tricky and readers may be left searching for the colour legend. Maybe lay out the legend horizontally on top of the entire figure, so it stands out?

      We have adjusted the placement of the color legend in Figure 3 to improve visibility and consistency.

      Line 141: '..the probability of exiting..' - how is this probability calculated - not clear.

      We have clarified in the revised text that the probability of exiting the cave within 15 seconds is defined as the number of bats that exited the cave within that time divided by the total number of bats in each scenario, see lines 159-160.
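
      The definition above amounts to a simple ratio. The sketch below illustrates it under stated assumptions: the function name `exit_probability`, the use of `None` for bats that never exited, and the inclusive comparison at exactly 15 s are all hypothetical choices, not details from the manuscript.

```python
def exit_probability(exit_times_s, time_limit_s=15.0):
    """Fraction of bats that exited within the time limit.

    exit_times_s holds one entry per simulated bat: the exit time in
    seconds, or None if the bat never exited (hypothetical encoding).
    """
    if not exit_times_s:
        raise ValueError("at least one bat is required")
    exited = sum(
        1 for t in exit_times_s if t is not None and t <= time_limit_s
    )
    return exited / len(exit_times_s)

# Four hypothetical bats: two exit in time, one never exits, one is late
print(exit_probability([4.2, 9.8, None, 16.0]))
```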

      Line 142: What are the sample sizes here - i.e. how many simulation replicates were performed?

      We have clarified the number of repetitions in each scenario in the revised text, as detailed in the Public Review above.

      Line 151: 'The jamming probability,...number of jammed echoes divided by the total number of reflected echoes' - it seems like these are referring to 'own' echoes or first-order reflections, it is important to clarify this.

      The reviewer is right. We have clarified it in the revised text, see lines 173-175.

      Line 153: '..with a maximum difference of ...' - how is this difference calculated? What two quantities are being compared - not clear.

      We have revised the text to clarify that the 14.3% value reflects the maximum difference in jamming probability between the RM and PK models, which occurred at a density of 10 bats. The values at each density are shown in Figure 2D, see lines 175-177.

      Line 221: '..temporal aggregation helps..' - I'm assuming the authors meant temporal integration? However, I would caution against using the exact term 'temporal integration' as it is used in the field of audition to mean something different. Perhaps something like 'sensory integration' , or 'multi-call integration'

      To avoid ambiguity and better reflect the process modeled in our work, we have replaced the term "temporal aggregation" with "multi-call integration" throughout the revised manuscript. This term more accurately conveys the idea of combining information from multiple echolocation calls without conflicting with existing terminology.

      (4) Discussion

      Lines 302: 'Our model suggests...increasing the call-rate..' - not clear where this is explicitly tested or referred to in this manuscript. Can't see what was done to measure/quantify the effect of this variable in the Methods or anywhere else.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 346-349.

      Line 319: 'spatial interference' - unclear what this means. This reviewer would strongly caution against creating new terms unless there is an absolute need for it. What is meant by 'interference' in this paper is hard to assess given that the word seems to be used as a synonym for jamming and also for actual physical wave-based interference.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 119-120 and 366-367.

      Line 323: '..no benefit beyond a certain level...' - also not clear where this is explicitly tested. It seems like there was a set of simulations run for a variety of parameters but this is not written anywhere explicitly. What type of parameter search was done, was it all possible parameter combinations - or only a subset? This is not clear.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 372-375.

      Line 324: '..ca. 110 dB-SPL.' - what reference distance?

      All call levels were simulated and reported in dB-SPL, referenced at 0.1 meters from the emitting bat. We have clarified it in the revised text in the relevant contexts and specifically in line 529.
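
      Because the reference distance matters when comparing call levels across studies, a short worked conversion may help. The sketch below assumes ideal spherical spreading (20·log10 of the distance ratio) and ignores atmospheric absorption; the function name `convert_reference_distance` is hypothetical.

```python
import math

def convert_reference_distance(level_db, from_m, to_m):
    """Re-express a source level at a different reference distance.

    Assumes spherical spreading only (no atmospheric absorption).
    """
    if from_m <= 0 or to_m <= 0:
        raise ValueError("distances must be positive")
    return level_db - 20.0 * math.log10(to_m / from_m)

# A call of 110 dB-SPL referenced at 0.1 m, re-expressed at 1 m
level_at_1m = convert_reference_distance(110.0, 0.1, 1.0)
```

      Under these assumptions, a level of 110 dB-SPL at 0.1 m corresponds to about 90 dB-SPL at 1 m, which is why the reference distance must be stated alongside any source level.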

      (5) Methods

      Line 389 : '...over a 2 x 1.5 m2 area..' It took a while to understand this statement and put it in context. Since there is no previous description of the entire L-arena, the reviewer took it to mean the simulations happened over the space of a 2 x 1.5 m2 area. Include a top-down description of the simulation's spatial setup and rephrase this sentence.

      To address the confusion, we revised the text to clarify that the full simulation environment represents a corridor-shaped cave measuring 14.5 × 2.5 meters, with a right-angle turn located 5.5 meters before the exit, as shown in Figure 1A. The 2 × 1.5 m area refers specifically to the small zone at the far end of the cave where bats begin their flight. The revised description now includes a clearer spatial overview to prevent ambiguity, see lines 456-460.

      Line 398: Replace 'High proximity' with 'Close proximity'

      Replaced.

      Line 427: 'uniform target strength of -23 dB' - at what distance is this target strength defined? Given the reference distance can vary by echolocation convention (0.1 or 1 m), one can't assess if this is a reasonable value or not.

      The reference distance for the reported target strength is 1 meter, in line with standard acoustic conventions. We have revised the text to clarify this explicitly (line 531).

      Also, independent of the reference distance, particularly with reference to bats, the target strength is geometry-dependent, based on whether the wings are open or not. Using the entire wingspan of a bat to parametrise the target strength is an overestimate of the available reflective area. The effective reflective area is likely to be somewhere closer to the surface area of the body and a fraction of the wingspan together. This is important to note and/or mention explicitly since the value is not experimentally parametrised.

      For comparison, experimentally based measurements used in Goetze et al. 2016 are -40 dB (presumably at 1 m since the source level is also defined at 1 m?), and Beleyur & Goerlitz 2019 show a range between -43 to -34 dB at 1 m.

      We agree with the reviewer that target strength in bats is strongly influenced by their geometry, particularly wing posture during flight. In our model, we simplified this aspect by using a constant target strength, as the detailed temporal variation in body and wing geometry is pseudo-random and not explicitly modeled. We acknowledge that this is a simplification, and have now stated this limitation clearly in the revised manuscript. We chose a fixed value of –23 dB at 1 meter to reflect a plausible mid-range estimate, informed by anatomical data and consistent with values reported for similarly sized species (Beleyur and Goerlitz, 2019). To support this, we directly measured the target strength of a 3D-printed RM bat model, obtaining –32 dB.

      Moreover, a sensitivity analysis across a wide range (–49 to –23 dB) confirmed that performance metrics remain largely stable, indicating that our conclusions are not sensitive to this parameter, and suggesting that our results hold for different-sized bats. See lines 384-390, 533-538, and Supplementary Figures 3 and 4 in the revised article. 

      Line 434: 'To model the bat's cochlea...'. Bats have two cochleas. This model only describes one, while the agents are also endowed with the ability to detect sound direction - which requires two ears/cochleas.... There is missing information about the steps in between that needs to be provided.

      We appreciate the reviewer’s observation. Indeed, our model is monaural, and simulates detection using a single cochlear-like filter bank receiver. We have clarified this in the revised text to avoid confusion. This paragraph specifically describes the detection stage of the auditory processing pipeline. The localization process, which builds on detection and includes directional estimation, is described in the following paragraph (see line 583 onward), as discussed in the next comment and response.

      Line 457: 'After detection, the bat estimates the range and Direction of Arrival...' This paragraph describes the overall idea, but not the implementation. What were the inputs and outputs for the range and DOA calculation performed by the agent? Or was this information 'fed' in by the simulation framework? If there was no explicit DOA step that the agent performed, but it was assumed that agents can detect DOA, then this needs to be stated.

      In the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. Instead, based on experimental studies (Simmons et al., 1983; Popper and Fay, 1995), we assumed that bats can estimate the direction of an echo with an angular error that depends on the signal-to-noise ratio (SNR). Accordingly, the inputs to the DOA estimation were the peak level of the desired echo, noise level, and the level of acoustic interference. The output was an estimated direction of arrival that included a random angular error, drawn from a normal distribution whose standard deviation varied with the SNR. We have revised the relevant paragraph (Lines 583-592) to clarify this implementation.
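
      The input/output structure described above can be illustrated with a minimal sketch. Everything numerical here is an assumption made for illustration: the response does not give the mapping from SNR to angular error, so `angular_error_std_deg` and its constants are hypothetical, as is the choice to combine noise and interference as a power sum.

```python
import math
import random

def power_sum_db(levels_db):
    """Combine incoherent levels (dB) into one level (dB)."""
    return 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in levels_db))

def angular_error_std_deg(snr_db, std_at_0db_deg=15.0,
                          min_std_deg=1.5, db_per_halving=6.0):
    """Hypothetical mapping: the error std halves every 6 dB of SNR."""
    std = std_at_0db_deg * 0.5 ** (snr_db / db_per_halving)
    return max(min_std_deg, std)

def estimate_doa_deg(true_doa_deg, echo_peak_db, noise_db,
                     interference_db, rng):
    """Return (estimated DOA, SNR in dB, error std in degrees)."""
    snr_db = echo_peak_db - power_sum_db([noise_db, interference_db])
    sigma = angular_error_std_deg(snr_db)
    # Random angular error drawn from a zero-mean normal distribution
    return true_doa_deg + rng.gauss(0.0, sigma), snr_db, sigma

rng = random.Random(0)  # seeded for repeatability
estimate, snr_db, sigma = estimate_doa_deg(20.0, 30.0, 0.0, 10.0, rng)
```

      The point of the sketch is only the structure: three levels in, one noisy direction out, with an error spread that shrinks as the SNR improves.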

      Line 464: 'To evaluate the impact of the assumption...' - the 'self' and 'non-self' echoes can be distinguished perhaps using pragmatic time-delay cues, but also using spectro-temporal differences in individual calls/echoes. Do the agents have individual call structures, or do all the agents have the same call 'shape'? The echolocation parameters for the two modelled species are given, but whether there is call parameter variation implemented in the agents is not mentioned.

      In our relatively simple model, all individuals emit the same type of chirp call, with parameters adapted only based on the distance to the nearest detected object. However, individual variation is introduced by assigning each bat a terminal frequency drawn from a normal distribution with a standard deviation of 1 kHz, as described in the revised version (lines 519-520). This small variation is not used explicitly as a spectro-temporal cue for echo discrimination.

      In our model, all spectro-temporal variations—whether due to call structure or variations resulting from overlapping echoes from nearby reflectors—are processed through the filter bank, which compares the received echoes to the transmitted call during the detection stage. As such, the detection process itself can act as a discriminative filter, to some extent, based on similarity to the emitted call.

      We acknowledge that real bats likely rely on a variety of spectro-temporal features for distinguishing self from non-self-echoes—such as call duration, received level, multi-harmonic structure, or amplitude modulation. In our simulation, we focus on comparing two limiting conditions: full recognition of self-generated echoes versus full confusion. Implementing a more nuanced self-recognition mechanism based on temporal or spectral cues would be a valuable extension for future work.

      (6) References

      Reference 22: Formatting error - and extra '4' in the reference.

      The error has been fixed.

      (7) Thoughts/comments

      Even without 'recognition' of walls & conspecifics, bats may be able to avoid obstacles - this is a neat result. Also, using their framework the authors show that successful 'blind' object-agnostic obstacle avoidance can occur only when supported by some sort of memory. In some sense, this is a nice intermediate step showing the role of memory in bat navigation. We know that bats have good long-term and long-spatial scale memory, and here the authors show that short-term spatial memory is important in situations where immediate sensory information is unreliable or unavailable.

      We appreciate the reviewer’s thoughtful summary. Indeed, one of the main takeaways of our study is that successful obstacle avoidance can occur even without explicit recognition of walls or conspecifics—provided that a clustered multi-call integration is in place. Our model shows that when immediate sensory information is unreliable, integrating detections over time becomes essential for effective navigation. This supports the broader view that memory, even on short timescales, plays an important role in bat behavior.

      (8) Reporting GLM results

      The p-value, t-statistic, and degrees of freedom are reported consistently across multiple GLM results. However, the most important part which is the effect size is not consistently reported - and this needs to be included in all results, and even in the table. The effect size provides an indicator of the parameter's magnitude, and thus scientific context.

      We agree that the effect size provides essential scientific context. In fact, we already include the effect size explicitly in Table 1, as shown in the “Effect Size” column for each tested parameter. These values describe the magnitude of each parameter’s effect on exit probability, jamming probability, and collision rate. In the main text, effect sizes are presented as concrete changes in performance metrics (e.g., “exit probability increased from 20% to 87%,” or “with a decrease of 3.5%±8% to 5.5%±5% (mean ± s.e.)”), which we believe improves interpretability and scientific relevance.  

      To further clarify this in the main text, we have reviewed the reported results and ensured that effect sizes are mentioned more consistently wherever GLM outcomes are discussed. Additionally, we have added a brief note in the table caption to emphasize that effect sizes are provided for all tested parameters.

      The 'tStat' appears multiple times and seems to be the output of the MATLAB GLM function. This acronym is specific to the MATLAB implementation and needs to be replaced with a conventionally used acronym such as 't', or the full form 't-statistic' too. This step is to keep the results independent of the programming language used.

      We have replaced all instances of tStat with the more conventional term ‘t’ throughout the manuscript to maintain consistency with standard reporting practices.

      Reviewer #2 (Recommendations for the authors):

      In addition to my public review, I had a few minor points that the authors may want to consider when revising their paper.

      (1) Figures 2, 3, and 4 may benefit from using different marker styles, in addition to different colors, to show the different cases.

      Thank you for the suggestion. In Figures 2–4, the markers represent means with standard error bars. To maintain clarity and consistency across all conditions, we have chosen to keep a standardized marker style – and we clarify this in the legend. We found that varying only the colors is sufficient for distinguishing between conditions without introducing visual clutter.

      (2) The text "PK" in the inset for Figure 2A is very difficult to read. I would suggest using grey as with "RM" in the other inset.

      We have updated the inset in Figure 2A to improve legibility.

      (3) Are the error bars in Figure 3 very small? I wasn't able to see them. If that is the case, the authors may want to mention this in the caption.

      You are correct—the error bars are present in all plots but appear very small due to the large number of simulation repetitions and low variability. We have revised the caption to explicitly mention this.

      (4) The species name of PK is spelled inconsistently (kuhli, khulli, and kuhlii).

      We have corrected the species name throughout the manuscript.

      (5) Table 1 is a great condensation of all the results, but the time to exit is missing. It may be helpful if summary statistics on that were here as well.

      We have added time-to-exit to the effect size column in Table 1, alongside the other performance metrics, to provide a more complete summary of the simulation results.

      (6) I may have missed it, but why are there two values for the exit probability when nominal flight speed is varied?

      The exit probability was not monotonic with flight speed, but rather showed a parabolic trend with a clear optimum. Therefore, we reported two values representing the effect before and after the peak. We have clarified this in the revised table and updated the caption accordingly.

      (7) Table 2 has an extra header after the page break on page 18.

      The extra header in Table 2 after the page break has been removed in the revised manuscript.

      (8) The G functions have 2 arguments in their definitions and Equation 1, but only one argument in Equations 2 and 3. I wasn't able to see why.

      Thank you for pointing this out. You are correct—this was a typographical error. We have corrected the argument notation in Equations 2 and 3 and explicitly included the frequency dependence of the gain (G) functions in both equations.

      (9) D_txrx was not defined but it was used in Equation 2.

      The variable D_txrx is defined in the equation notation section as: D_txrx, the distance [m] between the transmitting conspecific and the receiving focal bat, from the transmitter’s perspective. We have now ensured that this definition is clearly linked to Equation 2 in the revised text. Moreover, we have added a supplementary figure that illustrates the geometric configuration defined by the equations to further support clarity, as described in the Public Review above.

      (10) It was hard for me to understand what was meant by phi_rx and phi_tx. These were described as angles between the rx or tx bats and the target, but I couldn't tell what the point defining the angle was. Perhaps a diagram would help, or more precise definitions.

      We have revised the caption to provide clearer and more precise definitions. Additionally, we have included a geometric diagram as a supplementary figure, as noted in the Public Review above, to visually clarify the spatial relationships and angle definitions used in the equations, see lines 498-499.

      (11) Was the hearing threshold the same for both species?

      Yes. We have clarified it in the revised version.

      (12) Collision avoidance is described as turning to the "opposite direction" in the supplemental figure explaining the model. Is this 90 degrees or 180 degrees? If 90 degrees, how do these turns decide between right and left?

      In our model, the bat does not perform a fixed 90° or 180° turn. Instead, the avoidance behavior is implemented by setting the maximum angular velocity in the direction opposite to the detected echo. For example, if the obstacle or conspecific is detected on the bat’s right side, the bat begins turning left, and vice versa.

      This turning direction is re-evaluated at each decision step, which occurs after every echolocation pulse. The bat continues turning in the same direction if the obstacle remains in front, otherwise it resumes regular pathfinding. We have clarified this behavior in the updated figure caption and model description, see lines 478-493.
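
      The per-pulse avoidance rule described above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the sign convention (positive bearing means the obstacle is on the right, positive rate means turning right), the tie-break for an obstacle dead ahead, and the function name `decide_turn_rate` are all hypothetical.

```python
import math

def decide_turn_rate(obstacle_bearing_deg, previous_rate_deg_s,
                     max_rate_deg_s, pathfinding_rate_deg_s):
    """Choose the angular velocity for one decision step.

    obstacle_bearing_deg is None when no obstacle is detected ahead.
    """
    if obstacle_bearing_deg is None:
        # Nothing in front: resume regular pathfinding
        return pathfinding_rate_deg_s
    if obstacle_bearing_deg > 0:
        return -max_rate_deg_s  # obstacle on the right, turn left
    if obstacle_bearing_deg < 0:
        return max_rate_deg_s   # obstacle on the left, turn right
    # Dead ahead: keep the current turning direction if there is one
    # (assumed tie-break; defaults to a left turn otherwise)
    if previous_rate_deg_s != 0:
        return math.copysign(max_rate_deg_s, previous_rate_deg_s)
    return -max_rate_deg_s
```

      Calling this once after every echolocation pulse reproduces the described behavior of re-evaluating the turning direction at each decision step rather than committing to a fixed 90° or 180° turn.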

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 27-31: These sentences mischaracterize the results. This claim appears to equate "the model works" with "this is what bats actually do." Also, the model does not indicate that bats' echolocation strategies are robust enough to mitigate the effects of jamming - this is self-evident from the fact that bats navigate successfully via echolocation in dense groups.

      Thank you for the comment. Our aim was not to claim that the model confirms actual bat behavior, but rather to demonstrate that simple and biologically plausible strategies—such as signal redundancy and basic pathfinding—are sufficient to explain how bats might cope with acoustic interference in dense settings. We have revised the wording to better reflect this goal and to avoid overinterpreting the model's implications.

      See abstract in the revised version.  

      (2) Line 37: This number underestimates the number of bats that form some of the largest aggregations of individuals worldwide - the free-tailed bats can form aggregations exceeding several million bats.

      We have revised the text to reflect that some bat species, such as free-tailed bats, are known to form colonies of several million individuals, which exceed the typical range. The updated sentence accounts for these extreme cases, see lines 36-37.

      (3) The flight densities explained in the introduction and chosen references are not representative of the literature - without providing additional justification for the chosen species, it can be interpreted that the selection of the species for the simulation is somewhat arbitrary. If the goal is to model dense emergence flight, why not use a species that has been studied in terms of acoustic and flight behavior during dense emergence flights---such as Tadarida brasiliensis?

      Our goal was to develop a general model applicable to a broad class of FM-echolocating bat species. The two species we selected, Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM), span a wide range of signal characteristics, from wideband (PK) to narrowband (RM), providing a representative contrast in call structure.

      Although we did not include Tadarida brasiliensis (TB) specifically, its echolocation calls are acoustically similar to RM in terminal frequency and fall between PK and RM in bandwidth. Therefore, we believe our findings are likely to generalize to TB and other FM-bats.

      Moreover, as noted in a previous response, the average inter-bat distance in our highest-density simulations (0.27 m) is still smaller than those reported for Tadarida brasiliensis during dense emergences—further supporting the relevance of our model to such scenarios.

      To support broader applicability, we also provide a supplementary graphical user interface (GUI) that allows users to modify key echolocation parameters and explore their impact on behavior—making the framework adaptable to additional species, including TB.

      (4) Line 78: It is not clear how (or even if) the simulated bats estimate the direction of obstacles. The explanation given in lines 457-463 is quite confusing. What is the acoustic/neurological mechanism that enables this direction estimation? If there is some mechanism (such as binaural processing), how does this extrapolate to 3D?

      This comment echoes a similar concern raised by a previous reviewer. As explained earlier, in the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. The complete explanation is detailed in our response to Reviewer #1, Line 457. This implementation is now clarified in the revised text, and a detailed description of the localization process is also provided in the Methods section (lines 583-592).

      (5) The authors propose they are modeling the dynamic echolocation of bats in the simulation (line 79), but it appears (whether this is due to a lack of information in the manuscript or true lack in the simulation) that the authors only modeled a flight response. How did the authors account for bats dynamically changing their echolocation? This is unclear and from what I can tell may just mean that the bats can switch between foraging phase call types depending on the distance to a detected obstacle. Can the authors elaborate more on this?

      The echolocation behavior of the bats, including dynamic call adjustments, was implemented in the simulation and is described in detail in the Methods section (lines 498-520 and Table 2). To avoid redundancy, the Results chapter originally referred to this section, but we have now added a brief explanation in the Results to clarify that the bats’ call parameters (IPI, duration, and frequency range) adapt based on the distance to detected objects, following empirically documented echolocation phases ("search," "approach," "buzz"). These dynamics are consistent with established bat behavior during navigation in cluttered environments such as caves.

      (6) Figure 1 C3: "Detection threshold": what is this and how was it derived?

      The caption also mentions yellow arrows, but they are absent from the figure. C4: Each threshold excursion is marked with an asterisk, but there are many more excursions than asterisks. Why are only some marked? Unclear.

      C3: The detection threshold is determined dynamically. It is set to the greater of either 7 dB above the noise level (0 dB-SPL) (Kick, 1982; Saillant et al., 1993; Sanderson et al., 2003; Boonman et al., 2013) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB. This clarification has been added to the Methods section. The yellow arrow has been added.

      C4: Thank you for this important observation. Only peaks marked with asterisks represent successful detections, that is, those identified in both the interference-free and full detection conditions, as explained in the Methods. Other visible peaks result from masking signals or overlapping echoes from nearby reflectors, but they do not meet the detection criteria. To keep the figure caption concise, we have elaborated on this process more clearly in the revised Methods section. We have also added this information to the legend.

      (7) Figure 2: A line indicating RM, No Masking is absent

      Thank you for pointing this out. The missing line for RM, No Masking has now been added in the revised version of Figure 2.

      (8) Line 121: "reflected off conspecifics". Does this mean echoes due to conspecifics?

      The phrase "reflected off conspecifics" refers to echoes originating from the bat’s own call and reflected off the bodies of nearby conspecifics. We have clarified the wording in the revised text to avoid confusion.

      (9) Line 125: Why are low-frequency channels stimulated by higher frequencies? This needs further clarification.

      The cochlear filter bank in our model is implemented using gammatone filters, each modeled as an 8th-order Butterworth filter. Due to the non-ideal filter response and relatively broad bandwidths, especially in the lower-frequency channels, strong energy from the beginning of the downward FM chirp (at higher frequencies) can still produce residual activation in lower-frequency channels. While these stimulations are usually below the detection threshold, they may still be visible as early sub-threshold responses. Because this is a technical property of the filter implementation that does not influence the detection outcomes, we have chosen not to elaborate on it in the figure caption or Methods.

      (10) Lines 146-150: This is an interesting finding. Is there a theoretical justification for it?

      This outcome arises directly from the simulation results. As noted in the Discussion (lines 359-365), although Pipistrellus kuhlii (PK) shows a modest advantage in jamming resistance due to its broader bandwidth, the redundancy in sensory information across calls—enabled by frequent echolocation—appears to compensate for these signal differences. As a result, the small variations in echo quality between species do not translate into significant differences in performance. We speculate that if the difference in jamming probability had been larger, performance disparities would likely have emerged.

      (11) Line 151: The authors define a jammed echo as an echo entirely missed due to masking. Is this appropriate? Doesn't echo mis-assignment also constitute jamming?

      We agree that echo mis-assignment can also degrade performance; however, in our model, we distinguish between two outcomes: (1) complete masking (echo not detected), and (2) detection with a localization error. As explained in the Methods (lines 500–507), we run the detection analysis twice—once with only desired echoes (“interference-free detection”) and once including masking signals (“full detection”). If a previously detected echo is no longer detected, it is classified as a jammed echo. If the echo is still detected but the delay shifts by more than 100 µs compared to the interference-free condition, it is also considered jammed. If the delay shift is smaller, it is treated as a detection with localization error rather than full jamming. We have clarified this distinction in the revised Methods section.
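
      The two-pass classification described above can be sketched as follows. This is a minimal illustration rather than the simulation's actual code: the function name `classify_echo` and the use of `None` for a missed echo are assumptions made here, while the 100 µs criterion is taken from the response.

```python
from typing import Optional

# Delay-shift criterion from the response: shifts above 100 µs count as jamming.
JAM_DELAY_SHIFT_S = 100e-6


def classify_echo(clean_delay_s: float, full_delay_s: Optional[float]) -> str:
    """Classify one echo that was detected in the interference-free pass.

    clean_delay_s: echo delay found with only the desired echoes present.
    full_delay_s:  delay found when masking signals are included, or None
                   if the echo was no longer detected (hypothetical encoding).
    """
    if full_delay_s is None:
        return "jammed"  # complete masking: the echo was missed
    if abs(full_delay_s - clean_delay_s) > JAM_DELAY_SHIFT_S:
        return "jammed"  # detected, but the delay moved too far
    return "detected_with_localization_error"


print(classify_echo(0.006, None))     # echo lost under masking
print(classify_echo(0.006, 0.00605))  # 50 µs shift
print(classify_echo(0.006, 0.0063))   # 300 µs shift
```

      Jamming probability under this scheme would then be the fraction of interference-free detections classified as jammed.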

      (12) Figure 2-E: Detection probability statistics are of limited usefulness without accompanying false alarm rate (FAR) statistics. Do the authors have FAR numbers?

      We understand FAR to refer to instances where masking signals or other acoustic phenomena are mistakenly interpreted as real echoes from physical objects. As explained in the manuscript, we implemented two model versions: one without confusion, and one with full confusion.

      Figure 2E reports detection performance under the non-confusion model, in which only echoes from actual physical reflectors are used, and no false detections occur—hence, the false alarm rate is effectively zero in this condition. In the full-confusion model, all detected echoes—including those originating from masking signals or conspecific calls—are treated as valid detections, which may include false alarms. However, we did not explicitly quantify the false alarm rate as a separate metric in this simulation.

      We agree that tracking FAR could be informative and will consider incorporating it into future versions of the model.

      (13) Line 161: RM bats suffered from a significantly higher probability of the "desired conspecific's echoes" being jammed. What does "desired conspecific's echoes" mean? This is unclear.

      The term “desired conspecific's echoes” refers to echoes originating from the bat’s own call, reflected off nearby conspecifics, which are treated as relevant reflectors for collision avoidance. We have revised the wording in the text for clarity.

      (14) Line 188: Why didn't the size of the integration window affect jamming probability? I couldn't find this explained in the discussion.

      The jamming probability in our analysis is computed at the individual-echo level, prior to any temporal integration. Since the integration window is applied after the detection step, it does not influence whether a specific echo is masked (i.e., jammed) or not. Therefore, as expected, we did not observe a significant effect of integration window size on jamming probability.

      (15) Line 217-218: Why do the authors think this would be?

      Thank you for the thoughtful question. We agree that, in theory, increasing call intensity should raise the levels of both desired echoes and masking signals proportionally. However, in our model, the environmental noise floor and detection threshold remain constant, meaning that higher call intensities increase the signal-to-noise ratio (SNR) more effectively for weaker echoes, especially those at longer distances or with low reflectivity. This could lead to a higher likelihood of those echoes crossing the detection threshold, resulting in a small but measurable reduction in jamming probability.

      Additionally, the non-linear behavior of the filter-bank receiver, including thresholding at multiple stages, can introduce asymmetries in how increased signal levels affect the detection of target versus masking signals.

      That said, the effect size was small, and the improvement in jamming probability did not translate into any significant gain in behavioral performance (e.g., exit probability or collision rate), as shown in Figure 3C.

      (16) Line 233: I'm not sure I understand how a slightly improved aggregation model that clustered detected reflectors over one-second periods is different. Doesn't this just lead to on average more calls integrated into memory?

      While increasing the memory duration does lead to more detections being available, the enhanced aggregation model (which we now refer to as multi-call clustering) differs fundamentally from the simpler one. As detailed in the Methods, it includes additional processing steps: clustering spatially close detections, removing outliers, and estimating wall directions based on the spatial structure of clustered echoes. In contrast, the simpler model treats each detection as an isolated point without estimating obstacle orientation. These additional steps allow for more robust environmental interpretation and significantly improve performance under high-confusion conditions. We have clarified this in the revised text (lines 606-616) and added a Supplementary Figure 2B.
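
      The three steps named above (clustering, outlier removal, wall-direction estimation) can be illustrated with a short sketch. It is not the model's implementation: single-linkage grouping, dropping small clusters as outliers, the principal-axis fit, and the parameters `max_gap_m` and `min_size` are all assumptions chosen here for illustration.

```python
import math


def cluster_detections(points, max_gap_m=0.3, min_size=3):
    """Group 2D detections that lie within max_gap_m of a cluster member.

    Clusters with fewer than min_size points are discarded as outliers
    (an assumed outlier rule, not necessarily the one used in the model).
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        members, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_gap_m]
            for j in near:
                unvisited.remove(j)
            members.extend(near)
            frontier.extend(near)
        if len(members) >= min_size:
            clusters.append([points[k] for k in sorted(members)])
    return clusters


def wall_direction_deg(cluster):
    """Orientation of the cluster's principal axis, in degrees within [0, 180)."""
    n = len(cluster)
    mx = sum(x for x, _ in cluster) / n
    my = sum(y for _, y in cluster) / n
    sxx = sum((x - mx) ** 2 for x, _ in cluster)
    syy = sum((y - my) ** 2 for _, y in cluster)
    sxy = sum((x - mx) * (y - my) for x, y in cluster)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)) % 180.0


detections = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (5.0, 5.0)]
for cluster in cluster_detections(detections):
    print(len(cluster), wall_direction_deg(cluster))
```

      In this toy input the isolated point at (5, 5) is dropped, and the four collinear detections are read as a wall segment parallel to the x-axis.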

      (17) Table 1: What about conspecific target strength?

      We have now added the conspecific target strength as a tested parameter in Table 1, along with its tested range, default value, and measured effect sizes. A detailed sensitivity analysis is also presented in Supplementary Figure 4, demonstrating that variations in conspecific target strength had relatively minor effects on performance metrics.  

      (18) Figure 3-A: The x-axis is the number of calls in the integration window. But the leftmost sample on each curve is at 0 calls. Shouldn't this be 1?

      “0 calls” refers to the case where only the most recent call is used for pathfinding—without integrating any information from prior calls. The x-axis reflects the number of previous calls stored in memory, so a value of 0 still includes the current call. We’ve clarified this terminology in the figure caption.
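
      The convention can be made concrete with a small sketch of the memory buffer. The class `CallMemory` and its methods are hypothetical names used only for illustration; the point is that a window of N previous calls holds N + 1 calls in total.

```python
from collections import deque


class CallMemory:
    """Holds detections from the current call plus n_previous_calls earlier calls."""

    def __init__(self, n_previous_calls):
        # maxlen makes the deque drop the oldest call automatically.
        self._calls = deque(maxlen=n_previous_calls + 1)

    def add_call(self, detections):
        self._calls.append(list(detections))

    def reflectors(self):
        """All detections currently available for pathfinding."""
        return [d for call in self._calls for d in call]


memory = CallMemory(n_previous_calls=0)
memory.add_call([(1.0, 2.0)])
memory.add_call([(3.0, 4.0)])
print(memory.reflectors())  # only the most recent call remains
```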

      (19) Lines 282-283: This statement needs to be clarified that it is with the constraints of using a 2D simulation with at most 33 bats/m^2. It also should be clarified that it is assumed the bat can reliably distinguish between its own echoes and conspecific echoes, which is a very important caveat.

      We have revised the text to clarify that the results are based on a 2D simulation with a maximum tested density of 33 bats/m². We also now explicitly state that the model assumes bats can distinguish between their own echoes and those generated by conspecifics, an assumption we recognize as a simplification. These clarifications help place the results within the scope and constraints of the simulation. Moreover, as described in the text (and noted in a previous response), the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m.

      (20) Line 294: What is this sentence referring to?

      The sentence refers to the finding that, even under high bat densities, a substantial portion of the echoes, particularly those reflected from nearby obstacles (e.g., 1 m away), were jammed due to masking. Nevertheless, the bats in the simulation were still able to navigate successfully using partial sensory input. We have clarified the sentence in the revised text to make this point more explicit; see lines 333-336.

      (21) Line 302: Was jamming less likely when IPI was higher or lower? I could not find this demonstrated anywhere in the manuscript.

      We agree that the original text was not sufficiently clear on this point. While we did not explicitly test fixed IPI values as a parameter, the model does simulate the natural behavior of decreasing IPI as bats approach obstacles. This behavior is supported by empirical observations and is incorporated into the echolocation dynamics of the simulation. We have clarified this point in the revised text (see Lines 346-351) and explained that while lower IPI introduces more acoustic overlap, it also increases redundancy and improves detection through temporal integration.

      (22) Lines 313-314: This is an interesting assumption, but it is not evident that is substantiated by the references.

      The claim is based on well-established principles in signal processing and bioacoustics. Wideband signals, such as those emitted by PK bats, distribute their energy over a broader frequency range, which makes them inherently more resistant to narrowband interference and masking. This concept is commonly applied in both biological and artificial sonar systems and is supported by empirical studies in bats and theory in acoustic sensing.

      For example, Beleyur & Goerlitz (2019) demonstrate that broader bandwidth calls improve detection in cluttered and jamming-prone environments. Similarly, Ulanovsky et al. (2004) and Schnitzler & Kalko (2001) discuss how FM bats' wideband calls enhance temporal and spatial resolution, helping to reduce the impact of overlapping signals from conspecifics. These findings align with communication theory, where spread-spectrum techniques improve robustness in noisy environments.

      We agree with the reviewer that this is an important point, and we have updated the manuscript to clarify this rationale and cite the relevant literature accordingly (lines 631-363).

      (23) Lines 318-319: What is the justification for "probably"? Isn't this just a supposition?

      We agree with the reviewer’s point and have rephrased the sentence.

      (24) Line 320: How does this 63% performance match the sentence in line 295?

      The sentence in Line 295 refers to the overall ability of the bats to navigate successfully despite high jamming levels, highlighting the robustness of the strategy under challenging conditions. The figure in Line 320 (63%) quantifies this performance under the most extreme simulated scenario (100 bats / 3 m²), where both spatial and acoustic interferences are maximal. We have rephrased the text in the revised version (lines 324-327).

      (25) Lines 341-345: It seems like this is more likely to be the main takeaway of the paper.

      As noted in the Public Review above, there is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from those of conspecifics (e.g., Schnitzler, Bioscience, 2001; Kazial et al., 2001, 2008; Burnett & Masters, 2002; Chiu et al., 2009; Yovel et al., 2009; Beetz & Hechavarría, 2022). Therefore, we consider our assumption of self-recognition to be well-supported, at least under typical conditions. That said, we agree that the impact of echo confusion on performance is significant and highlights a critical challenge in dense environments.

      To our knowledge, this is the first computational model to explicitly simulate both self-recognition and full echo confusion under high-density conditions. We believe that the combination of modeled constraints and the demonstrated robustness of simple sensorimotor strategies, even under worst-case assumptions, is what makes this contribution both novel and meaningful.

      (26) Lines 349-350: What is the aggregation model? What is meant by "integration"?

      We have revised the text to clarify that the “aggregation model” refers to a multi-call clustering process that includes clustering of detections, removal of outliers, and estimation of wall orientation, as described in detail in the revised Methods and Results sections.

      (27) Line 354: Again, why isn't this the assumption we're working under?

      As addressed in our response to Comment 25, our primary model assumes that bats can recognize their own echoes—an assumption supported by substantial empirical evidence. The alternative "full confusion" model was included to explore a worst-case scenario and highlight the behavioral consequences of failing to distinguish self from conspecific echoes. We assume that real bats may experience some degree of echo misidentification; however, our assumption of full confusion represents a worst-case scenario.

      (28) Line 382: "Under the assumption that..." I agree that bats probably can, but if we assume they can differentiate them all, where's the jamming problem?

      The assumption that bats can theoretically distinguish between different signal sources applies after successful detection. However, the jamming problem arises during the detection and localization stages, where acoustic interference can prevent echoes from crossing the detection threshold or distort their timing.

      (29) Lines 386-387: The paper referenced focused on JAR in the context of foraging. What changes were made to the simulation to switch to obstacle avoidance?

      While the simulation framework in Mazar & Yovel (2020) was developed to study jamming avoidance during foraging, the core components—such as the acoustic calculations, receiver model, and echolocation behavior—remain applicable. For the current study, we adapted the simulation extensively to address colony-exit behavior. These modifications include modeling cave walls as acoustic reflectors, implementing a pathfinding algorithm, integrating obstacle-avoidance maneuvers, and adapting the integration window and integration processes. These updates are detailed throughout the Methods section.

      (30) Line 400-402: Something doesn't add up with the statement: each decision relies on an integration window that records estimated locations of detected reflectors from the last five echolocation calls, with the parameter being tested between 1 and 10 calls. Can the authors reword this to make it less confusing?

      We have reworded the sentence to clarify that the default integration window includes five calls, while we systematically tested the effect of using 1 to 10 calls, see lines 486-487.

      (31) Line 393: "30 deg/sec" why was this value chosen?

      The turning rate of 30 deg/sec was manually selected to approximate the curvature of natural foraging flight paths observed in Rhinopoma microphyllum using on-board tags. Moreover, in Mazar & Yovel (2020), we showed that the flight dynamics of simulated bats in a closed room closely matched those of Pipistrellus kuhlii flying in a room of similar dimensions. However, in the current simulation, bats rarely follow a random-walk trajectory due to the structured environment and frequent obstacle detection. As a result, this parameter has no meaningful impact on the simulation outcomes.

      (32) Line 412: "Harmony" --- do you mean harmonic? And what is the empirical evidence that RM bats use the 2nd harmonic compared to the 1st?

      Perhaps showing a spectrogram of a real RM signal would be helpful.

      The typo has been corrected. For reference, see Goldshtein et al. (2025).

      (33) Table 2: Something is incorrect with the table. The first row on the next page is the wrong species name. Also, where are the citations for these parameter values?

      The table header has been corrected in the revised version. The parameter values for flight and echolocation behavior were derived from existing literature and empirical data: Pipistrellus kuhlii parameters were based on Kalko (1995), and Rhinopoma microphyllum parameters were extracted from our own recordings using on-board tags, as described in Goldshtein et al. (2025). We have added the appropriate citations to Table 2.

      (34) Line 442: How was the threshold level chosen?

      The detection threshold in each level is set to the greater of either 7 dB above the noise level (0 dB-SPL) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB.
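
      The rule can be written compactly as a maximum of two terms. The sketch below is illustrative only; the function and parameter names are assumptions introduced here, while the 7 dB margin, 0 dB-SPL noise level, and 70 dB dynamic range come from the response.

```python
def detection_threshold_db(received_levels_db, noise_level_db=0.0,
                           margin_db=7.0, dynamic_range_db=70.0):
    """Return the dynamic detection threshold in dB SPL.

    The threshold is the greater of (noise level + margin) and
    (strongest received level - dynamic range).
    """
    noise_based = noise_level_db + margin_db
    if len(received_levels_db) == 0:
        return noise_based  # nothing received: only the noise term applies
    range_based = max(received_levels_db) - dynamic_range_db
    return max(noise_based, range_based)


print(detection_threshold_db([30.0, 50.0]))  # noise term dominates
print(detection_threshold_db([90.0, 20.0]))  # dynamic-range term dominates
```

      With a strongest received level of 90 dB SPL, the threshold rises to 20 dB SPL, so an echo at 15 dB SPL that would otherwise clear the 7 dB noise criterion is not detected.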

      (35) Line 445: 100 micros: This is about 3cm. The resolution of PK is about 1cm. For RM it's about 10cm. So, this window is generous for PK, but too strict for RM.

      To keep the model simple and avoid introducing species-specific detection thresholds, we selected a biologically plausible compromise that could reasonably apply to both species. This simplification ensures consistency across simulations while remaining within the known behavioral range.
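
      For reference, the conversion between delay and distance can be checked with a few lines. The speed of sound of 343 m/s and the helper names are assumptions made here. A 100 µs shift corresponds to about 3.4 cm of one-way acoustic path, which is the reviewer's figure, or about 1.7 cm of target range if the delay is treated as a two-way pulse-echo delay.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 °C


def path_length_m(delay_s, speed_m_s=SPEED_OF_SOUND_M_S):
    """One-way distance travelled by sound during delay_s."""
    return speed_m_s * delay_s


def target_range_m(delay_s, speed_m_s=SPEED_OF_SOUND_M_S):
    """Range to a reflector when delay_s is a two-way (pulse-echo) delay."""
    return speed_m_s * delay_s / 2.0


print(path_length_m(100e-6))   # about 0.034 m
print(target_range_m(100e-6))  # about 0.017 m
```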

      (36) Line 448: What is the spectrum of the Gaussian noise, and did it change between PK and RM?

      We used the same white Gaussian noise with a flat spectrum across the relevant frequency range (10–80 kHz) for both species. We have clarified this in the revised text in lines 570-572.

      (37) Line 451: 4 milliseconds is 1.3m. Is this appropriate?

      The 4-millisecond window was selected based on established auditory masking thresholds described in Mazar & Yovel (2020) and supported by Popper and Fay (1995, ch. 2.4.5), Blauert (1997, ch. 3.1), and Mohl and Surlykke (1989). These values provide conservative lower bounds on bats’ ability to cope with masking (Beleyur and Goerlitz, 2019). For simplicity, we used constant thresholds within each window; see lines 574-576.

      (38) Line 452: Citation for the forward and backward masking durations?

      See the response to the previous comment.

      (39) Lines 460-461: This is unclear. How does the bat get directional information? The authors claim to be able to measure direction-of-arrival for each detection, but it is not clear how this is done

      As noted in our response to Reviewer 1 (Comment on Line 457), directional information is not computed via an explicit binaural model. Instead, we assume the bat estimates the direction of arrival with an angular error that depends on the SNR, based on established studies (e.g., Simmons et al., 1983; Popper & Fay, 1995). We have clarified this in the revised text in lines 583-592.
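
      A sketch of what such an SNR-dependent error model could look like is given below. The specific functional form (a Gaussian error whose standard deviation scales inversely with amplitude SNR) and the reference values `ref_std_deg` and `ref_snr_db` are purely illustrative assumptions, not the relation used in the manuscript.

```python
import random


def doa_error_std_deg(snr_db, ref_std_deg=1.5, ref_snr_db=30.0):
    """Assumed angular-error standard deviation as a function of SNR.

    Hypothetical scaling: the error shrinks in proportion to amplitude SNR,
    anchored at ref_std_deg when the SNR equals ref_snr_db.
    """
    return ref_std_deg * 10 ** (-(snr_db - ref_snr_db) / 20.0)


def estimate_doa_deg(true_angle_deg, snr_db, rng):
    """Draw a noisy direction-of-arrival estimate around the true angle."""
    return rng.gauss(true_angle_deg, doa_error_std_deg(snr_db))


rng = random.Random(0)  # seeded so the sketch is repeatable
for snr_db in (10.0, 30.0, 50.0):
    print(snr_db, doa_error_std_deg(snr_db), estimate_doa_deg(20.0, snr_db, rng))
```

      Under this assumed form, weak echoes close to the detection threshold yield widely scattered direction estimates, while strong echoes are localized tightly.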

      (40) Line 467: It seems like the authors are modeling pulse-echo ambiguity, at least in this one alternative model, which is good! However the alternative model doesn't get much attention in the paper. Is there a reason for this?

      We would like to clarify that we did not model pulse-echo ambiguity. In our confusion model, all echoes received within the IPI are attributed to the bat’s most recent call. This includes echoes that may in fact originate from conspecific calls, but the model does not assign self-echoes to earlier pulses or span multiple IPIs. Therefore, while the model captures echo confusion, it does not include true pulse-echo ambiguity. We have clarified this point in the revised text in lines 551-553.
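
      The attribution rule can be sketched as follows. The function name and the tuple layout are assumptions made for illustration; the rule itself (every arrival within the IPI is referenced to the most recent call, regardless of its true source) follows the description above.

```python
def attribute_echoes(call_time_s, ipi_s, arrivals):
    """Assign every arrival inside the current IPI to the most recent call.

    arrivals: list of (arrival_time_s, source) tuples, where source is a
              label such as "self" or "conspecific" kept only for inspection.
    Returns a list of (apparent_delay_s, source); arrivals outside the
    window are ignored, so no echo is assigned to an earlier pulse.
    """
    window_end_s = call_time_s + ipi_s
    return [(t - call_time_s, source)
            for t, source in arrivals
            if call_time_s <= t < window_end_s]


arrivals = [(0.01, "self"), (0.05, "conspecific"), (0.2, "self")]
print(attribute_echoes(call_time_s=0.0, ipi_s=0.1, arrivals=arrivals))
```

      In this example the conspecific signal arriving at 50 ms is treated as an echo of the bat's own call, while the arrival at 200 ms falls outside the IPI and is never linked to the earlier pulse.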

      (41) Line 41: "continuous" is more appropriate than "constant".

      Thank you, we have rephrased the text accordingly.

      (42) Line 69: "band width" should be one word.

      Thank you, we have corrected it to “bandwidth”.

      (43) Line 79: "bats" should be in the possessive.

      Thank you, the text has been rephrased.

      (44) Line 128: "convoluted" don't you mean "convolved"?

      We have replaced “convoluted” with the correct term “convolved” in the revised text.

      (45) Please check your references, as there are some incomplete citations and typos.

      Thank you, we have reviewed and corrected all references for completeness and consistency.

      References

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/FNCIR.2022.899370.

      Beleyur, T. and Goerlitz, H.R. (2019) ‘Modeling active sensing reveals echo detection even in large groups of bats’, Proceedings of the National Academy of Sciences of the United States of America, 116(52), pp. 26662–26668. Available at: https://doi.org/10.1073/pnas.1821722116.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Blauert, J. (1997) ‘Spatial Hearing: The Psychophysics of Human Sound Localization (rev. ed.)’.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/JEB.204255.

      Boonman, A. et al. (2013) ‘It’s not black or white – on the range of vision and echolocation in echolocating bats’, Frontiers in Physiology, 4, pp. 1–12. Available at: https://doi.org/10.3389/fphys.2013.00248.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Chiu, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldshtein, A. et al. (2025) ‘Onboard recordings reveal how bats maneuver under severe acoustic interference’, Proceedings of the National Academy of Sciences, 122(14), p. e2407810122. Available at: https://doi.org/10.1073/PNAS.2407810122.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3–4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4, pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, Proceedings of the National Academy of Sciences of the United States of America, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kick, S.A. (1982) ‘Target-detection by the echolocating bat, Eptesicus fuscus’, Journal of Comparative Physiology A, 145(4), pp. 431–435. Available at: https://doi.org/10.1007/BF00612808.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Mohl, B. and Surlykke, A. (1989) ‘Detection of sonar signals in the presence of pulses of masking noise by the echolocating bat, Eptesicus fuscus’, pp. 119–124.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Popper, A.N. and Fay, R.R. (1995) Hearing by Bats. Springer-Verlag.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/PNAS.2011719117.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), pp. 557–569. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat, Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. et al. (1983) ‘Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus’.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’, cs-web.bu.edu [Preprint]. Available at: https://cs-web.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.