  1. Apr 2025
    1. eLife Assessment

      This useful study provides the first assessment of potentially interactive effects of seasonality and blood source on mosquito fitness, together in one study. During revision, the manuscript has been substantively improved, providing additional solid data to support the robustness of observations. Overall, this interesting study will advance our current understanding of mosquito biology.

    2. Reviewer #1 (Public review):

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

    3. Reviewer #2 (Public review):

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates.

    4. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you for your valuable comments and suggestions, which have greatly contributed to improving our manuscript.

      We have carefully addressed all the reviewers' suggestions, and detailed responses for each Reviewer are provided at the end of this letter. In summary:

      • The Introduction has been revised to provide a more focused discussion on results, toning down the speculative discussion on seasonal host shifts.

      • The methodology section has been clarified, particularly the power analysis, which now includes a clearer explanation. The random effects in the models have been better described to ensure transparency.

      • The Results section was reorganized to highlight the key findings more effectively.

      • The Discussion has been restructured for clarity and conciseness, ensuring the interpretation of the results is clearer and better aligned with the study objectives.

      • Minor edits throughout the manuscript were made to improve readability and accuracy.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision:

      Overall, the manuscript is much improved. However, the introduction and parts of the discussion that talk about addressing the question of seasonal shift in host use pattern of Cx. quin are still way too strong and must be toned down. There is no strong evidence to show this host shift in Argentinian mosquito populations. Therefore, it is just misleading. I suggest removing all this and sticking to discussing only the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quin.

      The Introduction and Discussion have been modified and toned down, and now stick to discussing the results, as suggested.

      Reviewer #1 (Recommendations for the authors):

      Some more minor comments are mentioned below.

      Line 51: Because 'of' this,

      Changed as suggested.

      Line 56: specialists 'or' generalists

      Changed as suggested.

      Line 56: primarily

      Changed as suggested.

      Line 98: Because 'of' this,

      Changed as suggested.

      Reviewer #2 (Public review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates. Several minor concerns remain, especially regarding unclear statements in the discussion.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) The methods would be improved by some additional details. For example, clarifying the number of generations for which mosquitoes were maintained in colony (which was changed from 20 to several) and whether replicates were conducted at different time points.

      Changed as suggested.

      (2) The statistical analysis requires some additional explanation. For example, you suggest that the power analysis was conducted a priori, but this was not mentioned in your first two drafts, so I wonder if it was actually conducted after the first replicate. It would be helpful to include further detail, such as how the parameters were estimated. Also, it would be helpful to clarify why replicate was included as a random effect for fecundity and fertility but as a fixed effect for hatchability. This might explain why there were no significant differences for hatchability given that you were estimating for more parameters.

      The power analysis was conducted a posteriori, as you correctly inferred. While I did not indicate that it was performed a priori, you are right in noting that this was not explicitly mentioned. As you suggested, the methodology for the power analysis has been revised to clarify any potential doubts.

      Regarding the model for hatchability, a model without a random effect variable was used, as all attempts to fit models with random effects resulted in poor validation. These points have now been clarified and explained in the corresponding section.

      (3) A number of statements in the discussion are not clear. For example, what do you mean by a mixed perspective in the first paragraph? Also, why is the expectation mentioned in the second paragraph different from the hypothesis you described in your introduction?

      Changed as suggested.

      (4) According to eLife policy, data must be made freely available (not just upon request).

      Data and code will be publicly available. The corresponding section was modified.

      Reviewer #2 (Recommendations for the authors):

      Your manuscript is much improved by the inclusion of two additional replicates! The results are much more robust when we can see that the trends that you report are replicable across 3 iterations of the experiment. Congratulations on a greatly improved study and paper! I have several minor concerns and suggestions, listed below:

      38-39: I think it is clearer to say "no statistically significant effect of season on hatchability of eggs" ... or specify if you are referring to blood or the interaction of blood and season. It isn't clear which treatment you are referring to here.

      Changed as suggested.

      54-57: This could be stated more succinctly. Instead of citing papers that deal with specific examples of patterns, I would suggest citing a review paper that defines these terms.

      Changed as suggested.

      83-84: What if another migratory bird is the preferred host in Argentina? I would state this more cautiously (e.g. "may not be applicable...").

      Changed as suggested.

      95-96: I don't understand what you mean by this. These hypotheses are specifically meant to understand mosquitoes that DO have a distinct seasonal phenology, so I'm not sure why this caveat is relevant. And naturally this hypothesis is host dependent, since it is based on specific host reproductive investments. I think that the strongest caveat to this hypothesis is simply that it hasn't been proven.

      Changed as suggested.

      97-115: This is a great paragraph! Very clear and compelling.

      Thanks for your words!

      118: Do you have an exact or estimated number of rafts collected?

      Sorry, I do not have the exact number of rafts, but it was more than 20-30.

      135: "over twenty" was changed to "several"; several would imply about 3 generations, so this is misleading. If the colony was actually maintained for over twenty generations, then you should keep that wording.

      Changed as suggested.

      163-164: Can you please clarify whether the replicates were conducted at separate time points?

      Changed as suggested.

      Note: the track changes did not capture all of the changes made; e.g. 163-164 should show as new text but does not.

      You are absolutely right; when I uploaded the last version, I unfortunately deleted all tracked changes and cannot recover them. In this new version, I will ensure that all minimal changes are included as tracked changes.

      186 - 189: the terms should be "fixed effect" and "random effect"

      Changed as suggested.

      191: Edit: linear

      Changed as suggested.

      194: why was replicate not included as a random effect here when it was above? Also, can you please clarify "interaction effects"? Which interactions did you include?

      Changed as suggested. Explained above and in the methodology. Hatchability models with a random-effect variable were poorly fitted and validated. The hatchability model included a four-way interaction (season, blood source, cycle, and replicate).

      207-208: I'm not sure what you mean by "aimed to achieve"? Weren't you doing this after you conducted the experiments, so wouldn't this be determining the power of your model (post-hoc power analysis)? Also, I think you should provide the parameter estimates that were used (e.g. effect size - did you use the effect size you estimated across the 3 replicates?).

      Changed as suggested.

      214-215: this should be reworded to acknowledge that this is estimated for the given effect size; for example, something like "This sample size was sufficient to detect the observed effect with a statistical power of 0.8" or something along those lines (unless I am misunderstanding how you conducted this test).

      Changed as suggested.

      246. Abbreviate Culex

      Changed as suggested.

      253-255: This sentence isn't clear. What do you mean by mixed? Also, the season really seemed to mainly impact the fitness of mosquitoes fed on mouse blood and here the way it is phrased seems to indicate that season has an impact on the fitness of those fed with chicken blood.

      Changed as suggested.

      258-260: You stated your hypothesis as the relative fitness shifting between seasons, but this statement about the expectation is different from your hypothesis stated earlier. Please clarify.

      You are right. Thank you for noting this. It was changed as suggested.  

      263-266: I also don't understand this sentence; what does the first half of the sentence have to do with the second?

      Changed as suggested.

      269-270: This doesn't align with your observation exactly; you say first AND second are generally most productive, but you observed a drop in the second. Please clarify this.

      Changed as suggested.

      280: I suggest removing "as same as other studies"; your caveats are distinct because your experimental design was unique

      Changed as suggested.

      287: you shouldn't be looking for a "desired" effect; I suggest removing this word

      Changed as suggested.

      288: It wasn't really a priori though, since you conducted it after your first replicate (unless you didn't use the results from the first replicate you reported in the original drafts?)

      It was a posteriori. Changed as suggested.

      290: Why is 290 written here?

      It was a mistype. Deleted as suggested.

      291-298: The meaning of this section of your paragraph is not clear.

      Improved as suggested.

      304-313: This list of 3 explanations are directed at different underlying questions. Explanations 1 and 2 are alternative explanations for why host switching occurs if not due to differences in fitness. This isn't really an explanation of your results so much as alternative explanations for a previously reported phenomenon. And the third is an explanation for why you may not have observed the expected effect. I suggest restructuring this to include the fact that Argentinian quinqs may not host switch as part of your previous list of caveats. Then you can include your two alternative explanations for host switching as a possible future direction (although I would say that it is really just one explanation because "vector biology" is too broad of a statement to be testable). Also, you haven't discussed possible explanations for your actual result, which showed that mosquito fitness decreased when feeding on mouse blood in autumn conditions and in the second gonotrophic, while those that fed on chicken did not experience these changes. Why might that be?

      The discussion was restructured to include all these suggested changes. Additionally, some possible explanations for our results are now discussed.

      315-317: This statement is vague without a direct explanation of how this will provide insight. I suggest removing or providing an explanation of how this provides insight to transmission and forecasting.

      Changed as suggested.

      319-320: According to eLife policy, all data should be publicly available. From guidelines: "Media Policy FAQs Data Availability Purpose and General Principles To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all data associated with an article to be made freely and widely available. These must be in the most useful formats and according to the relevant reporting standards, unless there are compelling legal or ethical reasons to restrict access. The provision of data should comply with FAIR principles (Findable, Accessible, Interoperable, Reusable). Specifically, authors must make all original data used to support the claims of the paper, or that is required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). This must include all variables, treatment conditions, and observations described in the manuscript. The authors must also provide a full account of the materials and procedures used to collect, pre-process, clean, generate and analyze the data that would enable it to be independently reproduced by other researchers."

      - so you need to make your data available online; I also understand the last sentence to indicate that code should be made available.  

      Data and code will be publicly available.

      Table 1: it is notable that in replicate 2, the autumn:mouse:gonotrophic cycle II fecundity and fertility are actually higher than in the summer, which is the opposite of reps 1 and 3 and the overall effect you reported from the model. This might be worth mentioning in the discussion.

      Mentioned in the discussion as suggested.

      Tables 1 and 2: shouldn't this just be 8 treatments? You included replicate as a random effect, so it isn't really a separate set of treatments.

      This table reflects the output of the whole experiment, which is why all 24 experiments are presented.

      Figure 3: Can you please clarify if this is showing raw data?

      Changed as suggested.

      Note: grammatical copy editing would be beneficial throughout

      Grammar was improved as suggested.

    1. eLife Assessment

      This important work advances our understanding of how the SARS-CoV-2 Nsp16 protein is regulated by host E3 ligases to promote viral mRNA capping. Support for the overall claims in the revised manuscript is convincing. This work will be of interest to those working in host-viral interactions and the role of the ubiquitin-proteasome system in viral replication.

    2. Reviewer #1 (Public review):

      In this study, Tian et al. explore the role of ubiquitination of non-structural protein 16 (nsp16) in the SARS-CoV-2 life cycle. nsp16, in conjunction with nsp10, performs the final step of viral mRNA capping through its 2'-O-methylase activity. This modification allows the virus to evade host immune responses and protects its mRNA from degradation. The authors demonstrate that nsp16 undergoes ubiquitination and subsequent degradation by the host E3 ubiquitin ligases UBR5 and MARCHF7 via the ubiquitin-proteasome system (UPS). Specifically, UBR5 and MARCHF7 mediate nsp16 degradation through K48- and K27-linked ubiquitination, respectively. Notably, degradation of nsp16 by either UBR5 or MARCHF7 operates independently, with both mechanisms effectively inhibiting SARS-CoV-2 replication in vitro and in vivo. Furthermore, UBR5 and MARCHF7 exhibit broad-spectrum antiviral activity by targeting nsp16 variants from various SARS-CoV-2 strains. This research advances our understanding of how nsp16 ubiquitination impacts viral replication and highlights potential targets for developing broadly effective antiviral therapies.

      Strengths:

      The proposed study is of significant interest to the virology community because it aims to elucidate the biological role of ubiquitination in coronavirus proteins and its impact on the viral life cycle. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our overall knowledge of ubiquitination's diverse functions in cell biology. Employing in vivo studies is a strength.

      Weaknesses:

      Minor comments:

      Figure 5A- The authors should ensure that the figure is properly labeled to clearly distinguish between the IP (Immunoprecipitation) panel and the input panel.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript "SARS-CoV-2 nsp16 is regulated by host E3 ubiquitin ligases, UBR5 and MARCHF7" is an interesting work by Tian et al. describing the degradation/stability of NSP16 of SARS-CoV-2 via K48- and K27-linked ubiquitination and proteasomal degradation. The authors have demonstrated that UBR5 and MARCHF7, both E3 ubiquitin ligases, bring about the ubiquitination of NSP16. The concept and the experimental approach used to test the hypothesis look OK. The in vivo data look OK with the controls. Overall, the manuscript is good.

      Strengths:

      The study identified important E3 ligases (MARCHF7 and UBR5) that can ubiquitinate NSP16, an important viral factor.

      Comments on revisions:

      I have gone through the revised manuscript thoroughly. The authors have addressed all of my concerns. To me, the experimental approach convincingly shows that the host E3 ubiquitin ligases (UBR5 and MARCHF7) ubiquitinate NSP16 and mark it for proteasomal degradation via K48- and K27-linked ubiquitination. The authors have presented the final figure (Fig. 8) in a convincing manner, opening a new window to explore the mechanism of vRNA capping by NSP16.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Tian et al. explore the role of ubiquitination of non-structural protein 16 (nsp16) in the SARS-CoV-2 life cycle. nsp16, in conjunction with nsp10, performs the final step of viral mRNA capping through its 2'-O-methylase activity. This modification allows the virus to evade host immune responses and protects its mRNA from degradation. The authors demonstrate that nsp16 undergoes ubiquitination and subsequent degradation by the host E3 ubiquitin ligases UBR5 and MARCHF7 via the ubiquitin-proteasome system (UPS). Specifically, UBR5 and MARCHF7 mediate nsp16 degradation through K48- and K27-linked ubiquitination, respectively. Notably, degradation of nsp16 by either UBR5 or MARCHF7 operates independently, with both mechanisms effectively inhibiting SARS-CoV-2 replication in vitro and in vivo. Furthermore, UBR5 and MARCHF7 exhibit broad-spectrum antiviral activity by targeting nsp16 variants from various SARS-CoV-2 strains. This research advances our understanding of how nsp16 ubiquitination impacts viral replication and highlights potential targets for developing broadly effective antiviral therapies.

      Strengths:

      The proposed study is of significant interest to the virology community because it aims to elucidate the biological role of ubiquitination in coronavirus proteins and its impact on the viral life cycle. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our overall knowledge of ubiquitination's diverse functions in cell biology. Employing in vivo studies is a strength.

      Weaknesses:

      Minor comments:

      Figure 5A- The authors should ensure that the figure is properly labeled to clearly distinguish between the IP (Immunoprecipitation) panel and the input panel.

      Thank you for your suggestion. We have exchanged Figure 5 in this version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "SARS-CoV-2 nsp16 is regulated by host E3 ubiquitin ligases, UBR5 and MARCHF7" is an interesting work by Tian et al. describing the degradation/stability of NSP16 of SARS-CoV-2 via K48- and K27-linked ubiquitination and proteasomal degradation. The authors have demonstrated that UBR5 and MARCHF7, both E3 ubiquitin ligases, bring about the ubiquitination of NSP16. The concept and the experimental approach used to test the hypothesis look OK. The in vivo data look OK with the controls. Overall, the manuscript is good.

      Strengths:

      The study identified important E3 ligases (MARCHF7 and UBR5) that can ubiquitinate NSP16, an important viral factor.

      Comments on revisions:

      I have gone through the revised manuscript thoroughly. The authors have addressed all of my concerns. To me, the experimental approach convincingly shows that the host E3 ubiquitin ligases (UBR5 and MARCHF7) ubiquitinate NSP16 and mark it for proteasomal degradation via K48- and K27-linked ubiquitination. The authors have presented the final figure (Fig. 8) in a convincing manner, opening a new window to explore the mechanism of vRNA capping by NSP16.

      Thank you for your recognition.

    1. eLife Assessment

      This manuscript shows that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important for two reasons. First, this approach can be used as a mouse model for Parkinson's Disease without the need for the infusion of toxins (e.g. 6-OHDA or MPTP); this mouse model also has the advantage of showing axon-first degeneration over a time course (2–4 weeks) that is suitable for experimental investigation. Second, the finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration, alters motor behavior, and alters mRNA expression is convincing. This is an exciting paper that will have an impact on the Parkinson's Disease field.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. They also presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      Weaknesses:

      Although not fully supported by data, the authors provided a well-explained rationale and proposed possible mechanisms for dopamine neuron degeneration due to chronic activation in their results and discussion.

      Comments on revised version:

      The authors have adequately addressed most of my comments, and I have no further concerns.

    3. Reviewer #2 (Public review):

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important for two reasons: 1. This approach can be used as a mouse model for Parkinson's Disease without the need for the infusion of toxins (e.g. 6-OHDA or MPTP). This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). 2. The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration, alters motor behavior, and alters mRNA expression is convincing. This is an exciting and important paper and will have an impact on the Parkinson's Disease field.

      Strengths:

      This is an exciting and important paper and will have an impact on the Parkinson's Disease field.

      It presents a new highly useful mouse model of PD.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      Weaknesses:

      The authors have addressed all my concerns. This is an interesting, important, and carefully-controlled study.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons. In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript. The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease.

      Comments on revisions:

      The authors have done a good job at revising the manuscript. The revised manuscript better frames the results in the context of previous literature.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a substantial mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have executed several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We believe that these additions significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete, as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      We thank the reviewer for this important recommendation. Although the initial version showed that CNO does not produce degeneration of DA neuron terminals, it did not exclude a contribution to the behavioral changes. To address this, we now include a cohort of DREADD-free, non-injected mice treated with either vehicle or CNO (Figure S1C). We found that, on its own, CNO did not significantly impact either light-cycle or dark-cycle running. Together with the lack of degeneration observed with CNO treatment in non-DREADD mice (Figure 2D), these results support the conclusion that our behavioral and histological findings are the result of dopamine neuron activation.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and have completed these experiments in the revision (Figure 1, Figure S2). We now show that in vivo treatment with CNO causes some of the same physiological changes in VTA dopamine neurons as we found in SNc dopamine neurons, including an increased spontaneous firing rate, and a similar decrease in responsiveness to CNO in the slice recordings. Together these observations support the conclusion that SNc axons are intrinsically more vulnerable to increased activity than VTA dopamine axons. 

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We have clarified which mice had access to a running wheel in the methods of our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps prevented mice from having access to a running wheel in their home cage. Mice used for non-responder and non-hM3Dq (CNO alone) experiments also had access to a running wheel during their treatment. Mice used for the isradipine experiment did not have access to a running wheel, as the number of mice was too large and, while unilateral hM3Dq expression allows for within-animal controls, it does not lend itself to clear interpretation of running wheel data.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not detect an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than VTA DA neurons, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel. In addition, we are not aware of prior studies that have chronically activated DREADDs over several weeks to produce neurodegeneration.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      Thank you for this comment. As discussed in greater detail in the “comments on results section” below, our data suggest that this is not a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and have expanded on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking, and the little data that exists is difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). In addition to the human and rodent data already discussed in the manuscript, additional support for increased activity in PD models includes:

      • Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      • Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      • Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We have included citation of these important examples in our revision. In our model, we have found that chronic hyperactivity causes a substantial loss of nigral DA terminals while mesolimbic terminals are relatively spared (Figure 2), and that striatal DA levels are markedly decreased (Figure S6), phenomena that are hallmarks of Parkinson’s disease.

      There are additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models a form of increased intrinsic activity, and interpretation of our results will be facilitated as we learn more about how the activity of DA neurons changes in humans in PD. Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether dopamine neuron loss can also be an initial driver of activity. Importantly, while significant changes to burst firing were not seen until almost complete loss of dopamine neurons, these recordings were made in anesthetized rats, which may not be representative of neural activity in awake animals. We adjusted the text so that this is no longer referred to as ‘partial’ loss. At the same time, we point out that the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al., Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al., Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al., J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we also don’t know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al., Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).

      It would be good for the introduction to refer to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this, and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. Accordingly, we have expanded on our citation of this literature in both the introduction and discussion sections. However, we believe that the novelty of our study lies in: 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Figure S1C (now Figure S2A, S2B), which was unchanged in CNO-treated animals compared to controls. We did not previously report the resting membrane potential because many of the DA neurons were spontaneously firing. In the revision, we now report the initial membrane potential on first breaking into the cell for the whole cell recordings, which did not vary between groups (Figure S2). This is still influenced by action potential activity, but is the timepoint in the recording least impacted by dialyzing the neuron with the internal solution, which might alter the intracellular concentrations of ions. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D), thus at least under these conditions these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole cell recordings (e.g. Figure S2). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S4B). This finding is also consistent with increased activity of the DA neurons. We have added discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, coexpressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD, and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Our control animals give us an indicator of injection variability, which is likely substantial and prevents us from detecting more subtle changes. Nonetheless, we believe that it conveys useful complementary data. We discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. A "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals, such that only robust effects would be detected. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. Given this small effect size, we would indeed need much larger groups to better discern these changes. Stereology is an intensive technique, and we have therefore elected to focus on terminal loss. We have also replaced panel 2G with a more representative CNO image.
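
      The point about group size can be illustrated with a rough calculation. The sketch below uses the standard normal-approximation formula for a two-sided, two-group comparison; the standardized effect sizes passed to it are purely hypothetical placeholders and are not estimates from the stereology data, and `mice_per_group` is an illustrative name rather than anything used in the study.

```python
from math import ceil
from statistics import NormalDist


def mice_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate animals needed per group for a two-sided, two-group test.

    Uses the normal approximation n = 2 * (z_(1-alpha/2) + z_power)^2 / d^2,
    where d is a standardized effect size (difference in means / pooled SD).
    An exact t-test calculation gives slightly larger numbers.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


# Hypothetical effect sizes, chosen only to show how n scales with d.
for d in (1.0, 0.5):
    print(f"d = {d}: about {mice_per_group(d)} mice per group")
```

      Under these assumptions the required group size grows with the inverse square of the effect size, so halving the effect roughly quadruples the number of animals needed.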

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We have included a caveat of this in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic corrected traces for the same recording. In panel E, the traces follow time in ascending order. We have also included frequency and amplitude data for these recordings (Figure S4A), along with discussion of the significance of these findings.
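
      For readers unfamiliar with the distinction between raw and isosbestic-corrected traces, the following is a minimal sketch of one common correction scheme: the isosbestic channel is fitted to the calcium-dependent channel by least squares, and the fitted trace is used as the reference for a dF/F calculation. This is an assumption about a generic workflow, not a description of the pipeline used for Figure 3; the function names and the toy numbers are illustrative only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def isosbestic_corrected(signal, isosbestic):
    """Return dF/F of the signal channel relative to the fitted isosbestic channel."""
    slope, intercept = fit_line(isosbestic, signal)
    fitted = [slope * v + intercept for v in isosbestic]
    return [(s - f) / f for s, f in zip(signal, fitted)]


# Toy traces (hypothetical values): a slowly decaying reference channel and a
# signal channel that tracks it, plus one calcium transient at sample 3.
reference = [1.00, 0.95, 0.90, 0.85, 0.80, 0.75]
signal = [2 * v + 1 for v in reference]
signal[3] += 0.5
print(isosbestic_corrected(signal, reference))
```

      Because the fit includes an intercept, a correction of this kind removes slow shifts in overall fluorescence by construction, which is one reason a comparison of absolute baselines has to be made on uncorrected traces.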

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      While levels of DARPP32 mRNA were unchanged, our additional HPLC data show strong decreases in striatal dopamine in hyperactivated mice. We do not see strong changes in classic activity-related genes (data not shown); however, these genes may behave differently in the context of chronic hyperactivity and ongoing degeneration. Instead, we employed NEUROeSTIMator (Bahl et al., Nature Comm. 2024, PMID: 38278804), a deep learning method to predict neural activation based on transcriptomic data. We found that predicted activity scores were significantly higher in GqCNO dopaminergic regions compared to controls (Figure X). Indeed, some of the genes used within the model to predict activity are immediate early genes, e.g., c-fos.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing our mouse model to early PD samples when there is more limited SNc DA neuron loss (see the proportion of DA neurons within the areas of human tissues we selected for sampling in Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration to those in patients where degeneration is ongoing.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV). Control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by activating Gq pathways that are classically expected to increase intracellular calcium to increase neuronal excitability. Indeed, in slices from mice that were not treated with CNO, acute CNO application caused depolarizations (Figure 1E) that can both result from and cause increases in intracellular calcium. Additionally, our results show increased calcium by fiber photometry and changes to calcium-related genes, suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point. Indeed, a small preliminary experiment with chronic isradipine failed to show protection, although it lacked power to detect a partial effect. We have acknowledged this in the text, and also briefly consider other mechanisms such as increased dopamine levels that could also mediate the toxicity.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we sampled SN DA neurons in early PD (see Author response image 1), and in our view there is great value for such comparisons.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis, therefore we have included additional electrophysiology experiments and have added discussion of this important consideration.  

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150), while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We have amended our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature which has focused on shorter timescales. Thus, while we have revised the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the data that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that as with all preclinical models of PD, we cannot draw definitive conclusions about PD with this data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The temporal design of the experiments is quite confusing. For instance, Figures 1 and 3 illustrate the daily changes of the mice and suggest some critical time points within 2 weeks of CNO administration, whereas Figure 2 presents data at 2 and 4 weeks, which are much later than the proposed critical time points. Furthermore, Figure 4 includes only 1 week data, and lacks subsequent data from 2 and 4 weeks, at which significant changes such as calcium levels and neuronal/axonal degeneration are observed.

      While interesting behavior and calcium phenotypes were detected within 2 and 4 weeks of CNO administration (Figures 1 and 3), we only collected tissues for histology at the 2 and 4 week time points (Figure 2). Observing degeneration of DA neuron axons but not cell bodies at 2 weeks served as a rationale to extend to the 4 week time point to determine whether degeneration was progressive. At the same time, our primary focus is on identifying early changes that may drive or contribute to the degeneration. As such, we recorded calcium changes over a 2-week treatment period, capturing the period during which almost all of the dopamine axons are lost. Similarly, we had the capacity to perform spatial transcriptomics at only one time point, and the 1 week time point was selected to capture transcriptomic changes that precede and potentially contribute to the mild and severe degeneration that occurs at 2 and 4 weeks, respectively. We have added text clarifying the rationale for the time points chosen.

      (2) The authors showed the changes in neuronal firing in dopamine neurons by the administration of CNO. However, one of the most important features of dopaminergic neuronal activity is dopamine release at its axon terminals in the striatum. Thus, the claims raised in this paper would be better supported if the authors further show any alterations in dopamine release (by FSCV or fluorescent dopamine sensors) at some critical time points during or after CNO application.

      While we are confident that DA release is altered due to the significant changes in behavior when hM3Dq DREADDs are activated specifically in DA neurons, the current manuscript does not quantify this, or distinguish between axonal and somatodendritic DA release. Interestingly, we did find significantly decreased striatal dopamine by HPLC after chronic activation (Figure S6). We believe that resolving these questions is beyond the scope of this manuscript, but have added text indicating the importance of these experiments.

      (3) The authors used 2% sucrose as a vehicle via drinking water. Please explain the rationale behind this choice.

      We used 2% sucrose as the vehicle because it is also added to the CNO water to counteract the bitterness of CNO (Kumar et al., J Neurotrauma 2024, PMID: 37905504). We have clarified this in the manuscript.

      (4) As we know, mRNA levels of some genes do not always predict their protein levels; there is sometimes a huge discrepancy between mRNA and protein abundance. In this paper, the mechanistic interpretation of the results by the authors heavily relies on the spatial transcriptomics of the midbrain and striatum. Thus, the authors need to provide additional data proving that the gene expression of some genes in the CNO group is also changed at the level of protein.

      We agree that validating hits at the protein level is valuable; however, we were limited in our ability to assess these changes for the revision. We have nonetheless performed additional transcriptomics with the high-resolution Xenium platform to increase confidence in a subset of hits of interest for follow-up in future work, and we have included data on genes related to DA metabolism and markers of DA neurons.

      (5) The authors provided spatial transcriptomics data only for mice with one week of chronic activation. However, other data also indicate significant differences when the activation period extends beyond 10 to 12 days (Figure 1C, Figure 3D-F). While a 7-day chronic activation time point might be crucial, additional transcriptomics data from later time points would be beneficial to confirm the persistence of these changes in gene expression. Furthermore, differential gene expression (DEG) analysis at these later time points could identify novel pathways or genes influenced by the chronic activation of dopamine neurons.

      This is an interesting point, and such data would show how chronic activity influences gene expression; however, additional transcriptomics at later time points is beyond the scope of this paper. In future studies we will assess the changes observed in this manuscript at other time points.

      (6) Figure 1D, Figure S1C:

      The authors should present the sample recording traces to demonstrate that the electrophysiological recordings were appropriately made.

      These data have been provided in Figure S2.

      (7) Figure S1C:

      AP thresholds in SNc dopamine neurons from both groups look quite high. In addition, considering the data from the previous reports, AP peak amplitudes in SNc dopamine neurons from both groups seem to be very low. Are these values correct? 

      The thresholds and peaks are correct, including the AP (threshold to peak), which is typical in our (Dr. Margolis’s) experience. AP thresholds are measured from an average of at least 10 APs, as the voltage at which the derivative of the trace first exceeds 10 V/s. As mentioned in the methods section, junction potentials were not corrected, which can result in values that are a bit depolarized from ground truth. This junction potential would be consistent across all recordings and thus would not impede detection of a difference in AP thresholds between groups of animals.
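      The threshold criterion described above (the voltage at which dV/dt first exceeds 10 V/s) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the 10 kHz sampling rate, and the choice to compute one threshold per AP and then average are all assumptions made for illustration.

```python
# Illustrative sketch only; names and defaults are hypothetical, not from the manuscript.

def ap_threshold_mv(voltage_mv, sample_rate_hz=10_000.0, dvdt_cutoff_v_per_s=10.0):
    """Return the voltage (mV) at the first sample where dV/dt exceeds the cutoff.

    Returns None if the derivative never exceeds the cutoff.
    """
    dt_ms = 1000.0 / sample_rate_hz  # sampling interval in milliseconds
    for i in range(len(voltage_mv) - 1):
        # Forward difference in mV/ms, which is numerically equal to V/s.
        dvdt = (voltage_mv[i + 1] - voltage_mv[i]) / dt_ms
        if dvdt > dvdt_cutoff_v_per_s:
            return voltage_mv[i]
    return None


def mean_threshold_mv(sweeps, **kwargs):
    """Average the per-AP thresholds across sweeps, skipping sweeps with no crossing."""
    values = [ap_threshold_mv(sweep, **kwargs) for sweep in sweeps]
    values = [v for v in values if v is not None]
    if not values:
        return None
    return sum(values) / len(values)
```

      Because junction potentials are not corrected, any value returned by such a procedure would carry the same constant offset in every recording, which is why a between-group comparison of thresholds is unaffected.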

      (8) Figure 1E:

      It would be better if the statistical significance is depicted in the graph.

      We don’t perform repeated measures statistics across data like these, as the data are continuous, collected at 10 kHz. For ease of displaying the data, the data for each neuron is binned and then these traces are averaged together. We display SEM to give a sense of the variance across neurons. We have provided sample traces of individual neurons to better demonstrate the variability and significance of this data (Figure S2).

      (9) Figure 2C:

      The representative staining images appear to be taken from coronal slices at anatomically different positions along the rostral-to-caudal axis. Although the total numbers of TH+ cells are comparable between vehicle and CNO groups in the graph, the sample images do not reflect this result. The authors should replace the current images with the better ones.

      We have replaced this image in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor concerns:

      (1) The authors claim that their transcriptomics experiments are conducted 'before any degeneration has occurred'. And they do not see significant differences in the TH expression in the striatum. However, the n for these mice at 1 week is lower than the n use at 2 weeks (n=5 vs n=8-9) and the images used to show 'no degeneration' really look like there is some degeneration going on. Also, throughout the paper, there is a stronger effect when degeneration is measured with mCherry compared to when it is measured with TH. The 'no change' claim is made only with the TH comparison. It seems possible (and almost likely) that there would be significant axonal degeneration at one week with either a higher sample size or using the mCherry comparison. The authors should simply claim that their transcriptomics data is collected before any 'somatic' degeneration occurs.

      Thank you, we have included data that shows partial terminal loss after one week of activation (Figure S3B, Figure S5A) and have corrected this language in the manuscript to reflect transcriptomics occurring before somatic degeneration.

      (2) While selective degeneration is one of the most interesting findings in the paper, that finding is not emphasized and why it would be interesting to compare the VTA vs SNc is not discussed in the introduction.

      Emphasis for comparing the VTA vs the SNc has been added to the introduction, along with additional electrophysiology data in VTA dopamine neurons in Figure 1 and Figure S2.

      (3) In a similar direction, the vulnerability of dopaminergic neurons has been shown to be differential even within the SNc, with the ventral tier neurons degenerating more severely and the dorsal tier neurons remaining resilient. Is there any evidence for a ventral-dorsal degeneration gradient in the SNc in these experiments?

      This is a really interesting point, and changes to dopamine neuron subtypes along the ventral-dorsal axis may be occurring in this model, particularly as there is more selective loss of SNc neurons. However, the cell type involved would be difficult to determine at this stage, since single-cell transcriptomic resolution is necessary across the entire SNc to identify cell subtypes. Transcriptomic identification is further complicated given that transcriptome change has recently been shown with genetic manipulation (Gaertner et al., bioRxiv 2024, PMID: 38895448), and we would think it could similarly change with increased activity. Assessing these issues is beyond the scope of this paper.

      (4) The running data is very interesting and the circadian rhythm alterations are compelling. However, it is unclear whether the CNO mice run more total compared with the vehicle mice. The authors should show the combined total running data to evaluate this.

      We now show total running data in Figure 1C.

      (5) The finding that acute CNO has no effect on the membrane potential of SNc neurons after chronic CNO exposure is very peculiar! Especially because the fiber photometry data suggests that CNO continues to have an effect in vivo. Is there any explanation for this?

      While there is no acute electrophysiological response to CNO detected in this group, there may be intracellular pathways activated by the DREADD that do not acutely impact membrane potential in current clamp (I = 0 pA) mode.

      (6) The terminology of chronic CNO is sometimes confusing as it refers to both 2-week and 4-week administration. Using additional terminology such as 'early' and 'late' might help with clarity.

      We have decreased usage of ‘chronic,’ and increased usage of more specific treatment times in order to increase clarity throughout the manuscript.

      (7) In Figure 2C, the SNc image looks binarized.

      This image has been updated.

      (8) Also in Figure 2, why are TH and mCherry measured for the 4-week time point, but only TH measured for the 2-week time point?

      mCherry quantification was performed to further support the finding of DA neuron death, and was therefore not assessed at 2 weeks given that there was no change in the TH stereology.

      (9) Additional scale bars and labeling are needed in Figure 3. In addition, there is such a strong reduction in noise after chronic CNO in the fiber photometry recordings, and the noise does not return upon CNO washout. What is the explanation for this?

      Additional scale bars were added to Figure 3. Traces are not getting less noisy with chronic CNO treatment; rather, there is less bursting activity in the dopamine cells. Our interpretation is that the baseline activity is rescued during washout but this bursting activity is not.

      (10) While not necessary to support the claims in this paper, it would be very interesting to see if chronic inhibition of dopaminergic neurons had a similar or different effect, as too little dopaminergic activity may also cause degeneration in some cases.

      We agree that assessing chronic inhibition is valuable, and this is an important area for future research.

      Reviewer #3 (Recommendations For The Authors):

      Not all of the mice used in the study are listed in the methods section. For example, the GCaMP6f floxed mice discussed in the results section are not listed in the methods. Also, the breeding scheme used for the different mouse lines needs to be described. For example, did the DAT-Cre mice carry one or two alleles?

      Both the DAT<sup>IRES</sup>Cre and GCaMP6f floxed (Ai148) Jax mouse line numbers and RRIDs are included in the methods. DAT<sup>IRES</sup>Cre mice carried two alleles.

      In the methods section, the amount of virus injected needs to be mentioned.

      This information has been added to the methods section.

      In all result graphs, please include the individual data points so that the readers can see the distribution of the data and quickly see the sample size.

      Graphs have been updated to include all individual data points. For line graphs, the distribution is communicated by the error bars, while the n is in the legends.

      The authors provide running wheel data in supplementary figure 1A to validate that chemogenetic activation of dopamine neurons leads to increased locomotor activity. The results shown in the figure appear to be qualitative as no average data is presented. The authors should provide average data from all mice tested.

      Average IP response data for all mice assessed for running wheel activity has been included in Figure S1.

    1. eLife assessment

      In this manuscript, Rademacher and colleagues used a chemogenetic approach to examine the effect of chronically stimulating dopamine neurons on the integrity of the dopamine system in mice. These findings are important: (1) This approach led to an axon-first degeneration over an experimentally useful time course (2-4 weeks); (2) The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on dopaminergic neuron selective vulnerability mechanisms. Overall, the strength of the evidence is solid, but the behavior experiments that do not include a CNO control provide incomplete support for the findings.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

    3. Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons. In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript. The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD, and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with a floxed fluorescent reporter mouse line, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine neuron number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. An "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis. Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results.

      The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

    1. eLife Assessment

      This valuable study reveals that the structural protein vimentin promotes the epithelial-mesenchymal transition in breast cancer cells. Utilizing robust and validated methodologies, the data collected provide a solid foundation for further investigation into metastasis models. This work will be of significant interest to researchers in the field of breast cancer.

    2. Reviewer #2 (Public review):

      The aim of the investigation was to find out more about the mechanism(s) by which the structural protein vimentin can facilitate the epithelial-mesenchymal transition in breast cancer cells.

      The authors focused on a key amino acid of vimentin, C238, its role in the interaction between vimentin and actin microfilaments, and the downstream molecular and cellular consequences. They model the binding between vimentin and actin in silico to demonstrate the potential involvement of C238, due to its location in a rod domain known to bind beta-actin. The phenotype of a non-metastatic breast cancer cell line MCF7, which doesn't express vimentin, could be changed to a metastatic phenotype when mutant C238S vimentin, but not wild-type vimentin, was expressed in the cells. Expression of vimentin was confirmed at the level of mRNA, protein and microscopically. Patterns of expression of vimentin and actin reflected the distinct morphology of the two cell lines. Phenotypic changes were assessed through assay of cell adhesion, proliferation, migration and morphology and were consistent with greater metastatic potential in the C238S MCF7 cells. Changes in the transcriptome of MCF7 cells expressing wild-type and C238S vimentins were compared and expression of Xist long ncRNA was found to be the transcript most markedly increased in the metastatic cells expressing C238S vimentin. Moreover, changes in expression of many other genes in the C238S cells are consistent with an epithelial-mesenchymal transition. Tumourigenic potential of MCF7 cells carrying C238S, but not wild-type, vimentin was confirmed by inoculation of cells into nude mice. This assay is a measure of stem-cell quality of the cells and not a measure of metastasis. It does demonstrate phenotypic changes that could be linked to metastasis.

      shRNA was used to down-regulate vimentin or Xist in the MCF7 C238S cells. The description of the data is limited in parts and data sets require careful scrutiny to understand the full picture. Down-regulation of vimentin reversed the morphological changes to some degree, but down-regulation of Xist didn't. Conversely, down-regulation of Xist inhibited cell growth, a sign of reversing metastatic potential, but down-regulation of vimentin had no effect on growth. Down-regulation of either did inhibit cell migration, another sign of metastatic reversal. Most of these findings are consistent with previous work based on ectopic expression of wild-type vimentin in MCF7 cells, but the mechanism of inhibition of cell migration by down-regulation of Xist remains speculative. More complete knockdown of vimentin or Xist by CRISPR technology may be helpful.

      Overall the study describes an intriguing model of metastasis that is worthy of further investigation, especially at the molecular level to unravel the connection between vimentin and metastasis. The identification of a potential role for Xist in metastasis, beyond its normal role in female cells to inactivate one of the X chromosomes, corroborates the work of others demonstrating increased levels in a variety of tumours in women and even in some tumours in men. It would be of great interest to see where in metastatic cells Xist is expressed and what it binds to.

      Comments on revisions:

      The revised manuscript incorporates changes in presentation of the data modelling interaction between the region of vimentin including C238 and F-actin. There is also inclusion of an extra citation supporting the role for Xist in cancer stem cell differentiation.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary, and Strengths:

      The authors and their team have investigated the role of Vimentin Cysteine 328 in epithelial-mesenchymal transition (EMT) and tumorigenesis. Vimentin is a type III intermediate filament, and cysteine 328 is a crucial site for interactions between vimentin and actin. These interactions can significantly influence cell movement, proliferation, and invasion. The team has specifically examined how Vimentin Cysteine 328 affects cancer cell proliferation, the acquisition of stemness markers, and the upregulation of the non-coding RNA XIST. Additionally, functional assays were conducted using both wild-type (WT) and Vimentin Cysteine 328 mutant cells to demonstrate their effects on invasion, EMT, and cancer progression. Overall, the data supports the essential role of Vimentin Cysteine 328 in regulating EMT, cancer stemness, and tumor progression. Overall, the data and its interpretation are on point and support the hypothesis. I believe the manuscript has great potential.

      The authors are thankful to the reviewers for carefully reading the manuscript, evaluating the data, making positive comments, and supporting our conclusions.

      Weaknesses:

      Minor issues relate to the visibility and data representation in Figures 2E and 3A-F.

      We have revised the figures (Figure 2E and Figure 3A-F) to increase the data visibility.

      Reviewer #2 (Public review):

      The aim of the investigation was to find out more about the mechanism(s) by which the structural protein vimentin can facilitate the epithelial-mesenchymal transition in breast cancer cells.

      The authors focussed on a key amino acid of vimentin, C238, its role in the interaction between vimentin and actin microfilaments, and the downstream molecular and cellular consequences. They model the binding between vimentin and actin in silico to demonstrate the potential involvement of C238, but the outcome is described vaguely.

      We have expanded the discussion of these results in the manuscript to more explicitly describe the critical role of C238 in the vimentin-actin interaction. Specifically, we highlight that C238 lies within a region of the vimentin rod domain known to mediate key protein-protein interactions. Our modeling shows that the thiol group of C238 enables specific hydrogen bonding and potential disulfide-mediated interactions with actin, which are disrupted upon mutation to serine. These findings provide mechanistic insight into the functional importance of this residue.

      The phenotype of a non-metastatic breast cancer cell line MCF7, which doesn't express vimentin, could be changed to a metastatic phenotype when mutant C238S vimentin, but not wild-type vimentin, was expressed in the cells. Expression of vimentin was confirmed at the level of mRNA, protein, and microscopically. Patterns of expression of vimentin and actin reflected the distinct morphology of the two cell lines. Phenotypic changes were assessed through assay of cell adhesion, proliferation, migration, and morphology and were consistent with greater metastatic potential in the C238S MCF7 cells. Changes in the transcriptome of MCF7 cells expressing wild-type and C238S vimentins were compared and expression of Xist long ncRNA was found to be the transcript most markedly increased in the metastatic cells expressing C238S vimentin. Moreover, changes in expression of many other genes in the C238S cells are consistent with an epithelial-mesenchymal transition. Tumourigenic potential of MCF7 cells carrying C238S, but not wild-type, vimentin was confirmed by inoculation of cells into nude mice. This assay is a measure of the stem-cell quality of the cells and not a measure of metastasis. It does demonstrate phenotypic changes that could be linked to metastasis.

      shRNA was used to down-regulate vimentin or Xist in the MCF7 C238S cells. The description of the data is limited in parts and data sets require careful scrutiny to understand the full picture. Down-regulation of vimentin reversed the morphological changes to some degree, but down-regulation of Xist didn't.

      This is understandable given the fact that vimentin interacts with actin which is known to determine cell shape. XIST being a non-coding RNA will not have the same effect.

      Conversely, down-regulation of XIST inhibited cell growth, a sign of reversing metastatic potential, but down-regulation of vimentin had no effect on growth.

      XIST is known to be induced in a number of cancers (see Figure 3E), which is consistent with our observation that its downregulation inhibits cell growth. However, downregulation of vimentin had no effect on growth, which is consistent with our previously published observation that ectopic expression of wild-type vimentin in MCF-7 cells did not influence cell growth (Usman et al., Cells 2022, 11(24), 4035; https://doi.org/10.3390/cells11244035).

      Down-regulation of either did inhibit cell migration, another sign of metastatic reversal.

      We have previously shown that ectopic expression of wild-type vimentin in MCF-7 cells stimulates cell migration due to downregulation of CDH5 (endothelial cadherin) (Usman et al., Cells 2022, 11(24), 4035). Therefore, downregulation of vimentin is expected to inhibit cell migration, which is what we observed in this study. Why downregulation of XIST inhibited cell migration is not clear. It is conceivable that XIST downregulation affects Lamin expression which may suppress intercellular interactions to increase cell migration. This hypothesis is supported by the fact that vimentin expression in MCF-7 affects Lamin expression (Usman et al., Cells 2022, 11(24), 4035).

      The interpretation of this type of experiment is handicapped when full reversal of expression is not achieved, as was the case in this study.

      Full reversal of any biological effect is almost impossible to achieve because shRNAs are by nature not 100% effective. This can, however, be tested using CRISPR-Cas9 gene editing to completely knock out a protein (this can't be used for XIST as it is a non-coding RNA). In that case, one has to assume that the editing will have no off-target effects.

      Overall the study describes an intriguing model of metastasis that is worthy of further investigation, especially at the molecular level to unravel the connection between vimentin and metastasis. The identification of a potential role for Xist in metastasis, beyond its normal role in female cells to inactivate one of the X chromosomes, corroborates the work of others demonstrating increased levels in a variety of tumours in women and even in some tumours in men. It would be of great interest to see where in metastatic cells Xist is expressed and what it binds to.

      The authors fully agree that it is an interesting model of metastasis/oncogenesis that requires further investigation.

    1. eLife Assessment

      This study provides compelling evidence that SLC7A11 may serve as a potential therapeutic target for trastuzumab-resistant HER2-positive breast cancer. While the findings are well-supported by robust data, the study could have been further strengthened by incorporating additional cell line experiments and providing more detailed clarification on patient sample selection. Nevertheless, this valuable work represents a significant contribution and will be of considerable interest to researchers in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab resistant cells utilize cysteine metabolism.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Comments on revisions:

      While the authors acknowledge the limitation of using only a single trastuzumab resistant/sensitive pair, simply stating that additional cell lines will be tested in future work is inadequate. The biological heterogeneity of HER2-positive breast cancer demands validation in at least one independent resistant model (e.g., HCC1954 or BT 474R) alongside its parental counterpart. Without demonstrating that SLC7A11 upregulation, cysteine dependency, and sensitivity to Erastin plus trastuzumab extend beyond the original cell line pair, the generalizability and translational relevance of the findings remain uncertain. The authors need to perform and report key functional results (cell viability, apoptosis, and SLC7A11 expression) in an additional resistant and sensitive HER2-positive cell line before this manuscript can be considered robust.

    3. Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab resistant HER2 positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was insufficiently supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines: JIMT1 as a representative trastuzumab-resistant cell line, and SKBR3 as a representative trastuzumab-sensitive cell line. As such, these findings could be cell-line specific and unrelated to trastuzumab sensitivity. Furthermore, the authors' claim of ferroptosis induction is primarily based on lipid peroxidation assays (Figure 3). The rescuing effects of ferroptosis inhibitors on cell viability were missing. The xenograft experiments were also suspicious (Figure 4). Systemic cysteine starvation is known to cause adverse effects, including liver necrosis, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, the authors focus on epigenetic regulations (Figures 5 & 6) without first investigating well-established transcription factors, such as NRF2 and ATF4, which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab sensitive and primary resistant HER2 positive breast cancer patients.

      Comments on revisions:

      (1) Figure 3: The unit of concentration should be "μM". "μm" means micrometer.

      (2) Figure S5: Ferroptosis inhibitors should be used in cell viability assays to exclude the off-target effect of RSL3 and erastin. Note that erastin also targets VDAC, while RSL3 may inhibit other selenoproteins at high concentrations. Cell viability assays are critical for demonstrating ferroptosis and should be included in the main figure rather than relegated to the supplemental materials.

      (3) Figure 4B & 4C: the data of "H" group and "Erastin" group are inconsistent. In panel B, the tumor size in the "H" group appears smaller than in the "Erastin" group, while in panel C, the opposite trend is observed.

      (4) The catalog numbers for the cystine/cysteine-deficient DMEM (from BIOTREE) and diet (from Xietong Bio) should be provided. This information is essential for readers to identify and verify the specific products used in the study.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Thank you for your valuable comments. We appreciate your efforts, and your constructive suggestions will help improve our research.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      Thank you for your valuable comments. In the transcriptomic analysis, we included data from HER2-positive breast cancer patients who received trastuzumab in the I-SPY2 trial (GSE181574). Tumor tissues were used in this dataset. We highlighted the use of “pre-treatment breast cancer tumors” in Line 309 and included an overview of the I-SPY2 transcriptomic data analysis in Figure S1F.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      Thank you for your valuable comments. We agree that exploring multiple cell lines would make our research findings more comprehensive. This is a limitation of our study, and we will continue to improve our design and methods in future experiments.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Thank you for your valuable comments. We agree with your suggestions. Exploring metastatic disease would make our research more meaningful and help better address key clinical issues. In our future studies, we will continue to investigate the association between cysteine metabolism and the invasive and metastatic capabilities of trastuzumab-resistant HER2-positive breast cancer.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      Thank you for your valuable comments. We added statistical information to our figure legends, including Lines 849-850, 865-867, 881-882, 898-900, 910-911 and 923-924.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Thank you for your valuable comments. We have modified this figure to make it clearer. In Figure 3K, significance was determined by one-way ANOVA, and the comparisons shown are relative to the DMSO control. The results indicate that the combination of erastin or cysteine starvation with trastuzumab could increase lipid peroxidation, although trastuzumab monotherapy did not induce ferroptosis.

      Additionally, the combination of erastin and trastuzumab could result in more lipid peroxidation than erastin alone. Similar results were found for the combination of cysteine starvation and trastuzumab. These results show that targeting cysteine metabolism together with trastuzumab could have synergistic effects in inducing ferroptosis in trastuzumab-resistant HER2-positive breast cancer.

      Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines: JIMT1 as a representative trastuzumab-resistant cell line, and SKBR3 as a representative trastuzumab-sensitive cell line. As such, these findings could be cell-line specific and unrelated to trastuzumab sensitivity. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

      Thank you for your valuable comments. We agree with your suggestions. Your feedback will help enhance the quality of our research.

      (1) Our research was mainly conducted in JIMT1 (trastuzumab resistant) and SKBR3 (trastuzumab sensitive), and this is a limitation of our study. The experimental validation using different cell lines will make our research findings more persuasive. In our future research, we will continuously optimize experimental design and methods to make our findings more comprehensive.

      (2) The detection of ferroptosis in our research was mainly performed by evaluating the lipid peroxidation. Experiments measuring cell viability and rescuing effects would help provide more evidence.

      We used CCK8 assays to compare the viability of JIMT1 and SKBR3 cells at different erastin and RSL3 concentrations, as well as after different durations of cysteine starvation. JIMT1 was more sensitive to erastin and RSL3, but tolerant to cysteine starvation, which was consistent with the previous lipid peroxidation tests. These data are included in Figure S5C-E. We added the description in Lines 375-379.

      In addition, we performed experiments to explore the rescuing effects of the ferroptosis inhibitor Fer-1. Fer-1 could suppress the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provides more evidence that cysteine metabolism plays a vital role in modulating ferroptosis in HER2-positive breast cancer. These data are included in Figure S5G and S5H. We added the description in Lines 387-391.

      (3) In xenograft experiments, the cysteine starvation was performed by feeding cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Line 236-237 in Methods.

      We agree with your opinion on the limitations of erastin in in vivo experiments. We tried to optimize drug dissolution and other conditions by referring to the relevant literature. We will continue to improve our experimental design and methods.

      (4) Epigenetic modifications have been recognized as crucial factors in the development of drug resistance. An increasing number of studies have emphasized the importance of epigenetic changes in regulating the abnormal expression of oncogenes and tumor suppressor genes related to drug resistance. The role of epigenetic changes in the development of trastuzumab resistance in HER2-positive breast cancer is still being explored. We investigated the dysregulation of histone modifications and DNA methylation in trastuzumab-resistant HER2-positive breast cancer. Our findings indicated that targeting H3K4me3 and DNA methylation could decrease SLC7A11 expression and induce ferroptosis. This provides more evidence for exploring trastuzumab resistance mechanisms. We have provided a detailed discussion in Lines 598-607.

      We would like to extend our appreciation for your constructive suggestions and continue to improve our research in future experiments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Line 334: it would be helpful to clarify that JIMT1 cells are trastuzumab-resistant while SKBR3 cells are trastuzumab sensitive, especially for those not familiar with breast cancer cell lines.

      Thank you for your valuable recommendations. We added the description of trastuzumab-sensitive SKBR3 and trastuzumab-resistant JIMT1 in Lines 334-335.

      (2) Figure 3: the concentrations of erastin and RSL3 should be indicated.

      Thank you for your valuable recommendations. In Figure 3, the concentration of erastin was 10 μM and that of RSL3 was 1 μM. We added these details to the figure legends in Lines 872-873.

      (3) Figure 3: lipid peroxidation does not necessarily mean ferroptosis. Cell viability data and rescuing effects of ferroptosis inhibitors should be shown.

      Thank you for your valuable recommendations. As mentioned above, we used CCK8 assays to compare the viability of JIMT1 and SKBR3 cells at different erastin and RSL3 concentrations, as well as after different durations of cysteine starvation. Consistent with the lipid peroxidation tests, JIMT1 was more sensitive to erastin and RSL3, but tolerant to cysteine starvation. These data are included in Figure S5C-E. We added the description in Lines 375-379.

      As described above, we also performed experiments to explore the rescuing effects of the ferroptosis inhibitor Fer-1. Fer-1 could suppress the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provides more evidence that cysteine metabolism plays a vital role in modulating ferroptosis in HER2-positive breast cancer. These data are included in Figure S5G and S5H. We added the description in Lines 387-391.

      (4) Figure 3H: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. For cysteine starvation in cell culture, we used cystine/cysteine-deficient DMEM (BIOTREE) with 1% penicillin-streptomycin at 37℃ and 5% CO2. We added details of this medium in Lines 141-143 in Methods.

      (5) Figure 4: the meaning of "H" should be clarified.

      Thank you for your valuable recommendations. “H” denotes trastuzumab. We clarified the meaning of “H” in the figure legends on Line 898.

      (6) Figure 4B & 4C: the data of "H" group and "Erastin" group are inconsistent.

      Thank you for your valuable recommendations. In the in vivo experiments, tumor volume changes were analyzed using a paired approach, comparing the tumor size of each individual mouse before and after treatment. We recognize the confusion this caused and have added more details about our in vivo experiments on Line 240 in Methods and Lines 892-893 in the figure legends.
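The paired approach described in this response can be sketched as follows. This is only an illustration under stated assumptions: the function name `paired_fold_changes` and all volumes are hypothetical and are not taken from the study. It shows how a group can have smaller final tumors yet a larger per-mouse change, which is one way an endpoint panel and a paired-change panel might appear to disagree.

```python
# Minimal sketch of a paired (per-mouse) tumor volume comparison.
# All names and numbers are hypothetical; none come from the manuscript.

def paired_fold_changes(before, after):
    """Return after/before volume ratios, one per mouse (same order)."""
    if len(before) != len(after):
        raise ValueError("each mouse needs a before and an after measurement")
    return [a / b for b, a in zip(before, after)]

def mean(values):
    return sum(values) / len(values)

# Hypothetical volumes in mm^3; group labels follow the figure ("H" = trastuzumab).
h_before, h_after = [100, 120, 110], [200, 240, 220]
erastin_before, erastin_after = [200, 220, 210], [300, 330, 315]

h_fold = mean(paired_fold_changes(h_before, h_after))
erastin_fold = mean(paired_fold_changes(erastin_before, erastin_after))

# Endpoint view: the "H" tumors are smaller on average at the end.
print(mean(h_after), mean(erastin_after))
# Paired view: the "H" tumors grew more relative to their own baseline.
print(h_fold, erastin_fold)
```

In this invented example the two views rank the groups in opposite order simply because the baseline volumes differ, so reporting baseline sizes alongside both panels would let readers reconcile them.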

      (7) Figure 4: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. We performed cysteine starvation by utilizing cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Line 236-237 in Methods.

      We have also corrected some grammatical errors in the manuscript, and we would like to extend our great appreciation to all editors and reviewers for their invaluable contributions.

    1. eLife Assessment

      This study addresses the structural basis of voltage-activation of BK channels using atomistic simulations of several microseconds, to assess conformational changes that underlie both voltage-sensing and gating of the pore. Simulated effects of voltage on the movement of charged amino acids appear solid as they are generally consistent qualitatively and quantitatively with previous experimental and structural results, providing a potentially valuable way to calculate the contribution of individual charges to voltage-sensitivity. Simulations of conformational changes and interactions associated with channel opening and K+ conduction are likely incomplete owing to the timescale of the simulation and theoretical limitations in simulating K+ and water movement, but nonetheless provide helpful initial predictions and a framework for future improvement. This paper will likely be of interest to ion channel biologists and biophysicists focused on voltage-dependent channel gating mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides new insight into the non-canonical voltage-gating mechanism of BK channels through prolonged (10 μs) MD simulations of the Slo1 transmembrane domain conformation and K+ conduction in response to high imposed voltages (300, 750 mV). The results support previous conclusions based on functional and structural data and MD simulations that the voltage-sensor domain (VSD) of Slo1 undergoes limited conformational changes compared to Kv channels, and predict gating charge movement comparable in magnitude to experimental results. The gating charge calculations further indicate that R213 and R210 in S4 are the main contributors owing to their large side chain movements and the presence of a locally focused electric field, consistent with recent experimental and MD simulation results by Carrasquel-Ursulaez et al., 2022. Most interestingly, changes in pore conformation and K+ conduction driven by VSD activation are resolved, providing information regarding changes in VSD/pore interaction through S4/S5/S6 segments proposed to underlie electromechanical coupling.

      Strengths:

      Include that the prolonged timescale and high voltage of the simulation allow apparent equilibration in the voltage-sensor domain (VSD) conformational changes and at least partial opening of the pore. The study extends the results of previous MD simulations of VSD activation by providing quantitative estimates of gating charge movement, showing how the electric field distribution across the VSD is altered in resting and activated states, and testing the hypothesis that R213 and R210 are the primary gating charges by steered MD simulations. The ability to estimate gating charge contributions of individual residues in the WT channel is useful as a comparison to experimental studies based on mutagenesis which have yielded conflicting results that could reflect perturbations in structure. Use of dynamic community analysis to identify coupling pathways and information flow for VSD-pore (electromechanical) coupling, as well as analysis of state-dependent S4/S5/S6 interactions that could mediate coupling, provides useful predictions extending beyond what has been experimentally tested.

      Weaknesses:

      Include that a truncated channel (lacking the C-terminal gating ring) was used for simulations, which is known to have reduced single channel conductance and reduced electromechanical coupling compared to the full-length channel. In addition, as VSD activation in BK channels is much faster than opening, the timescale of simulations was likely insufficient to achieve a fully open state, as supported by differences in the degree of pore expansion in replicate simulations, which are also smaller than observed in Ca-bound open structures of the full-length channel. Taken together, these limitations suggest that the analysis regarding coupling pathways and interactions is incomplete. In addition, while the simulations convincingly demonstrate voltage-dependent channel opening as evidenced by pore expansion, and conduction of K+ and water through the pore, single channel conductance is underestimated by at least an order of magnitude, as in previous studies of other K+ channels. These quantitative discrepancies suggest that MD simulations may not yet be sufficiently advanced to provide insight into mechanisms underlying the extraordinarily large conductance of BK channels.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript addresses the structural basis of voltage-activation of BK channels using computational approaches. Although a number of experimental studies using gating current and patch-clamp recording have analyzed voltage-activation in terms of observed charge movements and the apparent energetic coupling between voltage-sensor movement and channel opening, the structural changes that underlie this phenomenon have been unclear. The present studies use a reduced molecular system comprising the transmembrane portion of the BK channel (i.e., the cytosolic domain was deleted), embedded in a POPC membrane, with either 0 or 750 mV applied across the membrane. This system enabled acquisition of long simulations of 10 microseconds, to permit tracking of conformational changes of the channel. The authors' principal findings were that the side chains of R210 and R213 rapidly moved toward the extracellular side of the membrane (by 8 - 10 Å), with greater displacements than any of the other charged transmembrane residues. These movements appeared tightly coupled to the movement of the pore-lining helix, pore hydration, and ion permeation. The authors estimate that R210 and R213 contribute 0.25 and 0.19 elementary charges per residue to the gating current, which is roughly consistent with estimates based on electrophysiological measurements that used the full-length channel.

      Strengths:

      The methodologies used in this work are sound, and these studies certainly contribute to our understanding of voltage-gating of BK channels. An intriguing observation is the strongly coupled movement of the S4, S5, and S6 helices that appear to underlie voltage-dependent opening. Based on Figures 2a-d, the substantial movements of the R210 and R213 side chains occur nearly simultaneously with the S6 movement (between 4 - 5 μs of simulation time). This seems to provide support for a "helix-packing" mechanism of voltage gating in the so-called "non-domain-swapped" voltage-gated K channels.

      Weaknesses:

      The main limitation is that these studies used a truncated version of the BK channel, and there are likely to be differences in VSD-pore coupling in the context of the full-length channels that will not be resolved in the present work. Nonetheless, the authors provide a strong rationale for their use of the truncated channel, and the results presented will provide a good starting point for future computational studies of this channel.

    1. eLife Assessment

      This study presents a valuable investigation into cell-specific microstructural development in the neonatal rat brain using diffusion-weighted magnetic resonance spectroscopy. The evidence supporting the core claims is solid, with innovative in vivo data acquisition and modeling, although some conclusions would benefit from stronger validation and methodological justification. The work will be of interest to researchers studying brain development and biophysical imaging methods.

    2. Reviewer #1 (Public review):

      In this work, Ligneul and coauthors implemented diffusion-weighted MRS in young rats to follow longitudinally and in vivo the microstructural changes occurring during brain development. Diffusion-weighted MRS is here instrumental in assessing microstructure in a cell-specific manner, as opposed to the claimed gold-standard (manganese-enhanced MRI) that can only probe changes in brain volume. Differential microstructure and complexification of the cerebellum and the thalamus during rat brain development were observed non-invasively. In particular, lower metabolite ADCs were measured with increasing age in both brain regions, reflecting increasing cellular restriction with brain maturation. Higher sphere (representing cell bodies) fractions for neuronal metabolites (total NAA, glutamate) and total creatine and taurine were estimated in the cerebellum compared to the thalamus, reflecting the unique structure of the cerebellar granular layer with a high density of cell bodies. Decreasing sphere fraction with age was observed in the cerebellum, reflecting the development of the dendritic tree of Purkinje cells and Bergmann glia. From morphometric analyses, the authors could probe non-monotonic branching evolution in the cerebellum, matching 3D representations of Purkinje cell expansion and complexification with age. Finally, the authors highlighted taurine as a potential new marker of cerebellar development.

      From a technical standpoint, this work clearly demonstrates the potential of diffusion-weighted MRS for probing microstructure changes of the developing brain non-invasively, paving the way for its application in pathological cases. Ligneul and coauthors also show that diffusion-weighted MRS acquisitions in neonates are feasible, despite the known technical challenges of such measurements, even in adult rats. They also provide all necessary resources to reproduce and build upon their work, which is highly valuable for the community.

      From a biological standpoint, claims are well supported by the microstructure parameters derived from advanced biophysical modelling of the diffusion MRS data.

      Specific strengths:

      (1) The interpretation of dMRS data in terms of cell-specific microstructure through advanced biophysical modelling (e.g. the sphere fraction, modelling the fraction of cell bodies versus neuronal or astrocytic processes) is a strong asset of the study, going beyond the more commonly used signal representation metrics such as the apparent diffusion coefficient, which lacks specificity to biological phenomena.

      (2) The fairly good data quality despite the complexity of the experimental framework should be praised: diffusion-weighted MRS was acquired in two brain regions (although not in the same animals) and longitudinally, in neonates, including data at high b-values and multiple diffusion times, which altogether constitutes a large-scale dataset of high value for the diffusion-weighted MRS community.

      (3) The authors have shared publicly data and codes used for processing and fitting, which will allow one to reproduce or extend the scope of this work to disease populations, and which goes in line with the current effort of the MR(S) community for data sharing.

      Specific weaknesses:

      Ligneul and coauthors have convincingly addressed and included my comments in their revised manuscript.

      I believe the following conceptual concerns, which are inherent to the nature of the study and do not require further adjustments of the manuscript, remain:

      (1) Metabolite compartmentation in one cell type or the other has often been challenged and is currently impossible to validate in vivo. Here, Ligneul and coauthors did not use this assumption a priori and supported their claims also with non-MR literature (e.g. for taurine), but the interpretation of results in that direction should be made with care.

      (2) Longitudinal MR studies of the developing brain make it difficult to extract parameters with an "absolute" meaning. Indirect assumptions used to derive such parameters may change with age and become confounding factors (brain structure, cell distribution, concentrations normalizing metabolites (here macromolecules), relaxation times...). While findings of the manuscript are convincing and supported with literature, the true underlying nature of such changes might be difficult to access.

      (3) Diffusion MRI in addition to diffusion MRS would have been complementary and beneficial to validate some of the signal contributions, but was unfeasible in the time constraints of experiments on young animals.

    3. Reviewer #2 (Public review):

      This second revision has partially addressed criticisms previously raised; however, substantial inadequacies, particularly concerning rigorous validation and model justification, remain unresolved. While recognizing evident strength, novelty, and technical complexity of this work, the authors have yet to fully resolve key major concerns explicitly pointed out during revision in a satisfactory manner. As currently written, the manuscript does not yet provide sufficiently robust validation, methodological rigour, or clarity required for complete acceptance in a top-tier scientific journal.

      Summary of Authors' Aim:

      In this revised version, the authors aimed to address prior reviewer critiques harshly pinpointing the need for greater clarity in the manuscript's logical flow, rigorous external validation, clearer explanation of methodological normalization choices, and deeper elaboration of diffusion MRI method relevance and potential translation. The authors present a diffusion-weighted MRS approach paired with complex biophysical modelling to elucidate differential developmental trajectories of cellular structures in cerebellum and thalamus in rat neonates, providing a novel, non-invasive avenue for monitoring cellular microstructure.

      Major Comments:

      Rigorous Validation (Reviewer #1 - point R1.1, Reviewer #2 - point R2.2):

      The major concern previously raised and reiterated here is the insufficient external cross-validation of the dMRS-derived interpretations about cellular changes, including the particularly speculative interpretation that taurine undergoes compartment switching between neuronal and glial compartments in the thalamus. The authors acknowledge this important shortcoming (R1.1, R2.2) but attempt to mitigate these concerns merely through additional contextual comparisons from existing literature (page 23, lines 877-878, Figure S11, Table S2). While better contextualization is welcome, the modified manuscript still falls notably short of the level of rigour necessary to validate such striking switches in compartmentalization. To justify claims of metabolites changing cellular compartments, explicit verification against independent molecular/histological data, ideally with additional immunohistochemical staining for cellular markers (e.g., glial fibrillary acidic protein, NeuN), is necessary. The mere presence of literature correlations (such as the reported visual comparisons to morphometric reconstructions, page 24, lines 883-884) does not constitute rigorous validation at the required standard for high-impact publication. The revised manuscript remains fundamentally weakened without such validation. To properly improve, the authors must consider incorporating independent ex vivo experiments or, if this is no longer feasible, extensively temper their compartment-switching claims, acknowledging explicitly and prominently the speculative nature of current interpretations.

      Normalization of Metabolite Concentrations (Reviewer #1 - point R1.3):

      The authors clearly responded to a reviewer's request for justification of metabolite normalisation to macromolecular concentrations (page 13, lines 493-503, Figure S2). However, the rationale provided remains only partially convincing. While the authors appropriately acknowledge the unusual nature of their methodological choice and possible confounding factors, they opt to supplement rather than substitute this approach with a more standard method (normalisation by water) in the main body of the manuscript. The additional supplementary Figure S2 is helpful, yet the conclusions derived with macromolecular normalisation still remain potentially confounded by age-dependent macromolecular changes (Tkac et al., 2003). The justification given in the revised manuscript remains vague, unsatisfactory, and somewhat contradictory: the authors accept that macromolecules likely change with age, yet largely overlook this effect. At the least, the comparison between normalisation by macromolecules and by water should be explicitly discussed in the main text, and conclusions drawn from macromolecular normalisation must be cautiously framed.

      Choice and Justification of Biophysical Model (Reviewer #1 - point R1.4):

      The reviewers questioned model assumptions, particularly ignoring macroscopic anisotropy effects due to white matter presence, myelination, and fibre orientation dispersion in the cerebellar voxel. Authors provided newly included DTI data and acknowledged this limitation explicitly (R1.4, Figure S8, page 25, lines 921-924). However, the addition of these poor-quality DTI data with limited interpretability paradoxically weakens rather than strengthens the manuscript as a whole, since the authors now present unclear supplementary results with little additional interpretative value. Recognizing poor data quality in this scenario, although intellectually honest, does not substantially increase the current robustness of their chosen model nor improve justification. To address this fully, either higher-quality data should be collected to robustly probe anisotropy or fibre dispersion effects, or the authors must much further restrict their interpretations in view of this clear limitation. Currently, the solution proposed is incomplete and insufficient to clarify the consequences of their chosen model.

      Logical Flow and Clarity (Reviewer #2 - points R2.1 and R2.3):

      The authors attempted to respond to reviewer comments on logical flow and accessibility (page 3, introduction restructuring). While the manuscript readability has improved, the introduction and discussion remain overly intricate, and at times, detail-oriented without clear links into central claims. In particular, the biological rationale for choosing the specific metabolite markers (especially tCho, Ins, Tau, etc.) and their known relevance must be further streamlined and simplified to increase accessibility and directness. Although some helpful restructuring was carried out, further careful paragraph-level revision for logical flow and readability remains necessary.

      Translation to Human Studies (Reviewer #2 - point R2.4):

      The authors have extended contextual discussion on translational potential regarding taurine as a developmental marker in humans (pages 24-25, lines 906-917). However, mention remains vague and cursory, without presenting sufficiently solid arguments nor drawing from human developmental studies adequately. Translational potential must be assessed within the realistic limitations inherent in clinical translation of MRS studies, particularly given the technical complexities clearly identified even in preclinical studies of this paper. Discussion remains relatively superficial, and if retained, must be expanded to fully discuss realistic human translational hurdles and requirements.

    4. Author response:

      The following is the authors’ response to the original reviews

      Summary of revisions:

      Thanks to the careful review and comments from the reviewers, we restructured the introduction and the discussion to improve clarity and better contextualise findings. We notably discuss further the f<sub>sphere</sub> decrease observations in the cerebellum and the Tau-specific findings (Tau being a possible marker for Purkinje cells development and Tau switching compartment in the thalamus). We added material in Supplementary Information to support these discussion points. We added a figure to show the metabolic profiles normalised by water or by macromolecules and a figure and table related to a rough approximation of f<sub>sphere</sub>, leaning on existing literature. We report the DTI results for thoroughness.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Ligneul and coauthors implemented diffusion-weighted MRS in young rats to follow longitudinally and in vivo the microstructural changes occurring during brain development. Diffusion-weighted MRS is here instrumental in assessing microstructure in a cell-specific manner, as opposed to the claimed gold-standard (manganese-enhanced MRI) that can only probe changes in brain volume. Differential microstructure and complexification of the cerebellum and the thalamus during rat brain development were observed non-invasively. In particular, lower metabolite ADCs were measured with increasing age in both brain regions, reflecting increasing cellular restriction with brain maturation. Higher sphere (representing cell bodies) fractions for neuronal metabolites (total NAA, glutamate) and total creatine and taurine were estimated in the cerebellum compared to the thalamus, reflecting the unique structure of the cerebellar granular layer with a high density of cell bodies. Decreasing sphere fraction with age was observed in the cerebellum, reflecting the development of the dendritic tree of Purkinje cells and Bergmann glia. From morphometric analyses, the authors could probe non-monotonic branching evolution in the cerebellum, matching 3D representations of Purkinje cell expansion and complexification with age. Finally, the authors highlighted taurine as a potential new marker of cerebellar development.

      From a technical standpoint, this work clearly demonstrates the potential of diffusion-weighted MRS for probing microstructure changes of the developing brain non-invasively, paving the way for its application in pathological cases. Ligneul and coauthors also show that diffusion-weighted MRS acquisitions in neonates are feasible, despite the known technical challenges of such measurements, even in adult rats. They also provide all necessary resources to reproduce and build upon their work, which is highly valuable for the community.

      From a biological standpoint, claims are well supported by the microstructure parameters derived from advanced biophysical modelling of the diffusion MRS data. The assumption of metabolite compartmentation, forming the basis of cell-specific microstructure interpretation of dMRS data, remains debated and should be considered with care (Rae, Neurochem Res, 2014, https://doi.org/10.1007/s11064-013-1199-5). External cross-validation of some of the authors' claims, in particular taurine in the thalamus switching from neurons to astrocytes during brain development, would be a highly valuable addition to this study.

      R1.1: We understand the reviewer's concerns. Metabolic compartmentation is not a one-to-one correspondence. Although we interpret the results in the light of metabolic compartmentation, our results are not driven by this assumption. We could not perform a direct cross-validation of the taurine switch in the thalamus, but we now clarify in the discussion why the dMRS results themselves indicate a switch, and we integrate our results better with existing literature on taurine. We now discuss this in more detail for the cerebellar results too.

      Specific strengths:

      (1) The interpretation of dMRS data in terms of cell-specific microstructure through advanced biophysical modelling (e.g. the sphere fraction, modelling the fraction of cell bodies versus neuronal or astrocytic processes) is a strong asset of the study, going beyond the more commonly used signal representation metrics such as the apparent diffusion coefficient, which lacks specificity to biological phenomena.

      (2) The fairly good data quality despite the complexity of the experimental framework should be praised: diffusion-weighted MRS was acquired in two brain regions (although not in the same animals) and longitudinally, in neonates, including data at high b-values and multiple diffusion times, which altogether constitutes a large-scale dataset of high value for the diffusion-weighted MRS community.

      (3) The authors have shared publicly data and codes used for processing and fitting, which will allow one to reproduce or extend the scope of this work to disease populations, and which goes in line with the current effort of the MR(S) community for data sharing.

      Specific weaknesses:

      (1) This work lacks an introduction and a discussion about diffusion MRI, which is already a validated technique to assess brain development non-invasively. Although water lacks cell-specificity compared to metabolites, several studies have reported a decrease in water ADC and increased fractional anisotropy with brain maturation, associated with the myelination process and decreased water content (overview in Hüppi, Chapt. 30 of "Diffusion MRI: Theory, Methods, and Applications", Oxford University Press, 2010). Interestingly, the same observations are found in this work (decreased ADC with age for most metabolites in both brain regions), which should have been commented on. Moreover, the authors could have reported water diffusion properties in addition to metabolites', as I believe the water signal, used for coil combination and/or eddy current corrections, is usually naturally acquired during diffusion-weighted MRS scans.

      R1.2: Thank you for these helpful suggestions. We have now improved our introduction of the various modalities, and we contextualise the study in light of previous DTI findings, as suggested by the reviewer. We agree with the reviewer that the comparison with previous human DTI is relevant, and we now mention it at the beginning of the discussion. However, the very different nature of the dMRS signal compared to dMRI (intracellular and absence of exchange for metabolites) prevents us from drawing any strong conclusions.

      (2) It is unclear why the authors have normalized metabolite concentrations (measured from low b-values diffusion-weighted MRS spectra) to the macromolecule concentrations. First, it is not specified whether in vivo macromolecules were acquired at each age or just at one time point. Second, such ratios are not standard practice in the MRS community so this choice should have been explained. Third, the macromolecule content was reported to change with age (Tkac et al., Magn Reson Med, 2003), therefore a change in metabolite to macromolecule ratio with age cannot be interpreted unequivocally.

      R1.3: We agree with the reviewer that this needed further explanations. We now clarify in the Results section “Metabolic profile changes with age” the reasoning behind choosing macromolecules for normalisation. We also added in the Supplementary Information the metabolite concentrations change with age when normalising by water, and a direct comparison with MM normalisation (Figure S2).
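
      The ambiguity raised in point (2) can be illustrated with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function name `apparent_ratio_change` and all numerical values are hypothetical and are not taken from the study. It shows that a metabolite/macromolecule ratio can change with age even when the metabolite itself is constant, simply because the reference changes.

```python
def apparent_ratio_change(metabolite_change, reference_change):
    """Fractional change in a metabolite/reference ratio.

    Both arguments are fractional changes between two ages
    (e.g. 0.25 means +25%). Hypothetical helper for illustration only.
    """
    return (1.0 + metabolite_change) / (1.0 + reference_change) - 1.0


# Hypothetical scenario 1: metabolite constant, macromolecule reference +25%.
only_reference = apparent_ratio_change(0.0, 0.25)

# Hypothetical scenario 2: metabolite and reference both +25%.
both_change = apparent_ratio_change(0.25, 0.25)

print(f"reference-only change: ratio changes by {only_reference:+.0%}")
print(f"matched changes:       ratio changes by {both_change:+.0%}")
```

      In the first scenario the ratio falls although the metabolite is unchanged, and in the second a real metabolite increase is hidden. This is why a comparison against a second reference such as water helps to separate the two effects.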

      (3) Some discussion is missing about the choice of the analytical biophysical model (although a few are compared in Supplementary Materials), in particular: is a model of macroscopic anisotropy relevant in cerebellum, made of a large fraction of oriented white matter tracts, and does the model remain valid at different ages given white matter maturation and the ongoing myelination process?

      R1.4: We agree with the reviewer that this is a valid concern. We actually acquired some standard DTI at the end of the acquisition sessions (where possible) having in mind the fibre dispersion estimation. However, data could not be acquired in all animals, and the data quality was poor (see Figure S8, the experimental conditions would have required further optimisation). We now add a couple of sentences at the beginning and at the end of the discussion to address this limitation, and we include the DTI data in Supplementary Information.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to non-invasively track neuronal development in rat neonates, which they achieved with notable success. However, the direct relationship between the results and broader conclusions regarding developmental biology and potential human implications is somewhat overstretched without further validation.

      Strengths:

      If adequately revised and validated, this work could have a significant impact on the field, providing a non-invasive tool for longitudinal studies of brain development and neurodevelopmental disorders in preclinical settings.

      Weaknesses:

      (1) Consistency and Logical Flow:

      The manuscript suffers from a lack of strategic flow in some sections. Specifically, transitions between major findings and methodological discussions need refinement to ensure a logical progression of ideas. For example, the jump from the introduction of developmental trajectories and the technicalities of MRS (Magnetic Resonance Spectroscopy) processing on page 3 could benefit from a bridging paragraph that explicitly states the study's hypotheses based on existing literature gaps.

      R2.1: Thank you for this general feedback (along with your point (3)) that helped us restructure the introduction and the discussion to improve the clarity and flow.

      (2) Scientific Rigour:

      While the novel application of diffusion-weighted MRS is commendable, there's a notable gap in the rigorous validation of this approach against gold-standard histological or molecular techniques. Particularly, the assertions regarding the sphere fraction and morphological changes inferred from biophysical modelling mandate direct validation to solidify the claims made. A study comparing these in vivo findings with ex vivo confirmation in at least a subset of samples would significantly enhance the reliability of these conclusions.

      R2.2: We agree with the reviewer that this would have been a great addition to the manuscript. Although we could not run new experiments to address these flaws, we now discuss the results more quantitatively, leaning on existing literature (addition of Figure S11 and Table S2). This helps us understand the results around Tau in both regions better, and illustrate the R<sub>sphere</sub> trend.

      (3) Clarity and Novelty:

      - The manuscript often delves deeply into technical specifics at the expense of accessibility to readers not deeply familiar with MRS technology. The introduction and discussions would benefit from a clearer elucidation of why these specific metabolite markers were chosen and their known relevance to neuronal and glial cells, placing this in the context of what is novel compared to existing literature.

      - The novelty aspect could be reinforced by a more structured discussion on how this method could change the current understanding or practices within neurodevelopmental research, compared to the current state of the art.

      R2.3: See answer to (1). By restructuring the introduction and the discussion, we hope to have addressed this point. We now discuss how these findings compare to the state of the art (notably added comparison with dMRI research). Along with the next comment, we better discuss potential implications of these findings for neurodevelopmental research.

      (4) Completeness:

      - The Discussion section requires expansion to offer a more comprehensive interpretation of how these findings impact the broader field of neurodevelopment and psychiatric disorders. Specifically, the implications for human studies or clinical translation are touched upon but not fully explored.

      - Further, while supplementary material provides necessary detail on methodology, key findings from these analyses should be summarized and discussed in the main text to ensure the manuscript stands complete on its own.

      R2.4: Thank you for these helpful suggestions. We now integrate the findings better into the existing literature. We notably discuss how the results might translate to humans.

      (5) Grammar, Style, Orthography:

      There are sporadic grammatical and typographical errors throughout the text which, while minor, detract from the overall readability. For example, inconsistencies in metabolite abbreviations (e.g., tCr vs Cr+PCr) should be standardized.

      R2.5: Thank you for the careful review. This has been corrected.

      (6) References and Additional Context:

      The current reference list is extensive but lacks integration into the narrative. Direct comparisons with existing studies, especially those with conflicting or supportive findings, are scant. More dedicated effort to contextualize this work within the existing body of knowledge would be beneficial.

      R2.6: Because the nature of this work is novel, it is difficult to find directly conflicting/similar works. However, we now integrate the findings into the broader literature.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Thank you for the careful review, we have addressed most of the minor comments, except for the last one, which we discuss below.

      - Some figures appear blurred in the printed PDF

      - Introduction: "constrained and hindered by cell membranes," - maybe use "restricted" instead of "constrained", like everywhere else in the text

      - Introduction: "(typically ~8cm3 vs ~8mm3 in dMRI in humans)" - here I suggest to put the rat brain sizes instead to help the reader understand how small the voxel was at P5 in this study, thus explaining the challenges

      - Fig 1 - numbers 1 and 2 on panel A,B should be clarified and they do not match 1 and 2 on panel C, which is confusing

      - Fig 2 - I am guessing the large dots are the mean and small are individual data points? Please clarify

      - Please specify "Relative CRLB" rather than just "CRLB", in supp. mat as well

      - Fig 3 - title of panel B, I would change "signal" into "concentration"

      - Fig 3 - end of caption: "and levelled to get Signal(tCr,P30)/Signal(MM,P30)=8", I think "in the thalamus" is missing

      - The results section "Biophysical modelling underlines different developmental trajectories of cell microstructure between the cerebellum and the thalamus" is sometimes imprecise, e.g.: "Cerebellum: The sphere fraction and the radius estimated from tNAA diffusion properties vary with age." but the tNAA sphere fraction seems to vary more with age in the thalamus according to Table 1; "Cerebellum: fsphere decreases from 0.63 (P10) to 0.41 (P30), but R is stable": this is for tCr, I presume

      - Table 1 - "pvalues" please add "before multiple comparison correction"

      - Figure 5 - Panel B, the L-segment subpanel is unclear -which metabolites is it referring to? Why does Tau have a * in panel A?

      - Update Ref 37 to the journal version

      - Methods: "A STELASER (Ligneul et al., MRM 2017) sequence", add numbered reference instead

      - Please specify that the DIVE toolbox uses Gaussian phase distribution approximation, it is important for the dMRS reader given that your diffusion gradient length is long and cannot be neglected, and that the SGP approximation does not apply.

      The Gaussian phase distribution approximation and the SGP approximation are two different concepts. The gradient duration δ (7 ms) is short compared to the gradient separation Δ (100 ms), but it could still be considered too long for the SGP approximation to hold. However, the gradient duration is accounted for in DIVE in any case.
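
      As a rough illustration of how a finite gradient duration enters the diffusion weighting, the sketch below evaluates the standard Stejskal-Tanner expression for ideal rectangular pulses and free (Gaussian) diffusion, b = (γGδ)²(Δ - δ/3), using the δ = 7 ms and Δ = 100 ms quoted above. The gradient amplitude and all function names are hypothetical and chosen only for illustration; this is not the DIVE implementation and it does not model restricted diffusion.

```python
GAMMA_PROTON = 2.6752218744e8  # proton gyromagnetic ratio, rad s^-1 T^-1


def effective_diffusion_time_ms(delta_ms, Delta_ms):
    """Effective diffusion time for rectangular pulses: Delta - delta/3."""
    return Delta_ms - delta_ms / 3.0


def b_value_s_per_mm2(g_mT_per_m, delta_ms, Delta_ms, finite_pulse=True):
    """Stejskal-Tanner b-value in s/mm^2.

    finite_pulse=False drops the delta/3 term, i.e. the idealised
    short-gradient-pulse (SGP) limit.
    """
    g = g_mT_per_m * 1e-3        # T/m
    delta = delta_ms * 1e-3      # s
    Delta = Delta_ms * 1e-3      # s
    t_d = Delta - delta / 3.0 if finite_pulse else Delta
    b_si = (GAMMA_PROTON * g * delta) ** 2 * t_d   # s/m^2
    return b_si * 1e-6                              # s/mm^2


# Hypothetical gradient amplitude, for illustration only.
G_HYPOTHETICAL_MT_PER_M = 300.0

b_finite = b_value_s_per_mm2(G_HYPOTHETICAL_MT_PER_M, 7.0, 100.0)
b_sgp = b_value_s_per_mm2(G_HYPOTHETICAL_MT_PER_M, 7.0, 100.0, finite_pulse=False)
correction = b_finite / b_sgp  # equals 1 - delta / (3 * Delta)

print(f"effective diffusion time: {effective_diffusion_time_ms(7.0, 100.0):.2f} ms")
print(f"b (finite pulse) / b (SGP limit): {correction:.4f}")
```

      Under these assumptions the finite pulse width changes the free-diffusion weighting by only a few percent. The error incurred by the SGP approximation under restricted diffusion is a separate question that this sketch does not address.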

    1. eLife Assessment

      This valuable study demonstrates that silencing of inhibitory interneurons in zebra finch HVC, a premotor nucleus critical for song production, disrupts song. However, song naturally recovers in a way that is surprisingly independent of LMAN, a distinct premotor nucleus required for normal song plasticity. The authors provide solid evidence that disruption is associated with microglial activation, activation of MHCI, synaptic changes, and altered neural dynamics in HVC. However, the manuscript would benefit from a clearer narrative structure, contextualization of the microglial results, and quantitative analyses to fully characterize song syntax and recovery after LMAN lesions.

    2. Reviewer #1 (Public review):

      Summary:

      This study by Torok et al. takes a creative approach to studying circuit perturbations in a sensorimotor region for vocalization control, in a songbird species, the zebra finch. By expressing the light chain of tetanus toxin in neurons in a sensorimotor region HVC, the authors constrain neural firing and study the resulting degradation and then recovery of song, after a protracted (> 70-day) period. Recording data suggest a form of synaptic homeostasis emergent in both HVC and RA as a result of the profound loss of (inhibitory?) tone in HVC. The methods to analyze changes in song are particularly strong here, using dimension reduction and visualization techniques. Single-cell sequencing data showed accompanying changes in microglia abundance, as well as several other markers that were not observed in control viral injections. LFP analyses in birds during the tetanus onset phase showed clear dysregulation of typical voltage deflections and spectral power, each of which showed recovery in parallel with song recovery. Lastly, the authors present data indicating that the anterior forebrain region LMAN is not critical for the song degradation process, pointing instead to the direct relationship between HVC and RA in song plasticity in adults. The methods are generally well established, but my main concerns regard the validation of the viral construct, the lack of direct confirmation of tetanus toxin on inhibitory neurons or E/I balance in HVC, and a missed opportunity to look at song syllable sequence degradation and recovery.

      Strengths:

      The species under investigation is the premier model for the neural basis of vocal learning, and the telencephalic brain regions investigated are well mapped out for their control of vocal learning behavior. The methods for electrophysiology recording and analysis, song analysis, scRNAseq, and in situ hybridization pose no concern as they are well established for this group of co-authors.

      Weaknesses:

      The introduction lays out a case for pursuing long-term E/I imbalances, vis-à-vis transient perturbations that have shown effects on the behavior. However, the rationale is not clearly stated. Why should the reader care that "prolonged E/I imbalances" may occur? Do they occur naturally or in some disease states (as alluded to in the first paragraph)? Without this rationale, the reader is left with an impression that the experiments were done because of a technical capability rather than a conceptual thrust.

      The cited works for the statement the "AAV viral vector expressing TeNT under the human dlx promoter, which is selective for HVC inhibitory interneurons" (reference 5 Kosche et al., 2016; and reference 10 Vallentin et al 2016) do not substantiate the targeting of this dlx5 promoter for interneurons in zebra finch HVC. Neither of these cited studies used viral vectors, and so this is a misattribution of the dlx5 promoter as targeting HVC inhibitory interneurons. However, the original development of this enhancer by Gord Fishell and others did have solid expression in HVC (Dimidschstein et al., 2016, Nature Neuroscience), and the enhancer was used to successfully target inhibitory neurons in nearby nidopallium NCM (Spool et al., 2022, Curr Biol). Citing these two studies would improve the standing of this viral approach. Nevertheless, the specific construct used here is not the same as the published studies mentioned above (AAV9-dlx-TeNT). The authors therefore need to show expression of the virus using some histological confirmation to cement the idea that they are indeed targeting inhibitory interneurons with this manipulation. The methods statement "a single injection (~100 nL) in the center of HVC was sufficient to label enough cells" is not convincing in the absence of quantified photomicrographs.

      The authors present no physiological confirmation of TeNT on E/I balance directly, and so we don't have a clear picture of how/whether HVC interneurons are physiologically altered by this manipulation. That said, the Npix recordings show that there was a tremendous increase in gamma power following TeNT manipulation, which subsides as the protracted song recovery unfolds. This finding is somewhat counterintuitive, given that gamma oscillations are typically driven by inhibitory neurons in many systems (including songbird pallium) while the TeNT manipulation is purported to cause *reductions* in inhibitory neurotransmitter release within HVC. Some interpretation of these incongruent results would be useful in the Discussion.

      The degradation and recovery of song is based mainly on the measures of duration of syllables and inter-syllable intervals, but HVC is also a key locus for song syllable sequence coding. The supplementary figures show some changes in sequences. It would improve the interpretation of both the degradation and recovery of the song to know whether syllable sequences (iiiABCCDDEF) truly recovered or were morphed in some way (e.g., iiiCDDDBEF). The PCA analyses (that the authors conducted) for these two potential outcomes would likely be very similar, but the actual songs would differ greatly under these two scenarios in terms of syllable sequence. From the representative spectrograms, it appears that the song syllable sequence does indeed recover well in these examples (perhaps less so in Supplementary Figure 3). A simple Markov-chain analysis of the syllable sequences across birds in the study would provide important confirmation of these insights.
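
      The Markov-chain analysis suggested above could be sketched roughly as follows. This is a minimal illustration, not the authors' or the reviewer's method: it assumes each song has already been labelled as a string with one character per syllable, estimates first-order transition probabilities, and compares two transition tables with a simple summed absolute difference. The function names and the distance measure are hypothetical choices, and the example sequences are the two given in the comment above.

```python
from collections import Counter, defaultdict


def transition_probabilities(songs):
    """Estimate first-order P(next syllable | current syllable) from labelled songs."""
    counts = defaultdict(Counter)
    for song in songs:
        # Count every adjacent pair of syllables within a song.
        for current, nxt in zip(song, song[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[current] = {s: n / total for s, n in next_counts.items()}
    return probs


def transition_distance(p, q):
    """Sum of absolute differences between two transition tables (0 = identical)."""
    total = 0.0
    for a in set(p) | set(q):
        row_p, row_q = p.get(a, {}), q.get(a, {})
        for b in set(row_p) | set(row_q):
            total += abs(row_p.get(b, 0.0) - row_q.get(b, 0.0))
    return total


baseline = transition_probabilities(["iiiABCCDDEF"])
recovered = transition_probabilities(["iiiABCCDDEF"])  # sequence truly recovered
morphed = transition_probabilities(["iiiCDDDBEF"])     # sequence altered

print(transition_distance(baseline, recovered))
print(transition_distance(baseline, morphed))
```

      A song whose syllable order is restored gives a distance of zero from baseline, whereas a morphed order gives a positive distance even when syllable durations look similar. In practice the tables would be pooled over many song renditions per bird and time point.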

    3. Reviewer #2 (Public review):

      This article addresses the question of how complex behavior is maintained despite perturbations in underlying motor circuits. Using zebra finch song production as a model system, the authors employ a genetic approach to perturb activity in GABAergic neurons within the vocal control nucleus HVC. Specifically, they use AAV to deliver the tetanus toxin light chain (TeNT) under the interneuron-specific DLX promoter, with the goal of silencing interneurons. This manipulation causes rapid degradation of song, followed by recovery over several weeks.

      The authors characterize the recovery using a combination of transcriptomic analysis, electrophysiology, and lesion studies. Notably, the recovery does not require LMAN, which is typically considered critical for vocal learning and plasticity. The authors speculate that homeostatic mechanisms within the motor pathway - potentially involving microglial remodeling - may mediate this recovery.

      The strength of the study lies in the striking behavioral effects - both degradation and recovery - resulting from a specific circuit perturbation, and the use of complementary approaches (gene expression, neurophysiology, behavior, and lesions) to link circuit changes to behavior. The approach is creative, and the findings are intriguing. More detailed comments are provided below that may help enhance the manuscript's value to the community.

      (1) In Figure 1b, the authors show changes in the relative abundance of cell types following TeNT expression in HVC. The most prominent change, as noted by the authors, is an increase in microglia. However, there are also apparent changes in the proportions of other cell types - particularly decreases in neurons and radial glia. How do the authors interpret the observed reductions in GABAergic and glutamatergic cells, as well as radial glia? Are these decreases statistically significant? Given the magnitude of these changes, could they reflect sampling differences (e.g., inclusion of tissue outside HVC) or neuronal cell death? Alternatively, is it possible that the absolute number of mature neurons remains constant, and increases in other cell types shift the relative proportions? The authors should clarify how to interpret the Y-axis of this plot. It appears to reflect relative abundance rather than absolute cell numbers, which has important implications for interpretation.

      (2) The authors appear to define their own cell type clusters and labels, rather than using standard classifications (e.g., Colquitt et al. 2021; Colquitt et al. 2023). This makes cross-study comparisons difficult. For example, Colquitt describes four classes of putative immature neurons (pre2-pre4, GABA-pre). In contrast, the authors refer to "neuroblasts" in Figure 1b. Are these equivalent to pre2-pre4 and/or to "GABA-pre"? What about "migrating neuroblasts" in Supplementary Figure 11? The authors could consider using the standard nomenclature, or if they disagree with that classification, explain why an alternative scheme is warranted.

      (3) The transcriptomic data are underexplored. Many genes appear differentially expressed (e.g., in Figure 1c), however, the main text contains little discussion of differential gene expression beyond MHC I and B2M. It would be useful to discuss whether transcriptomic data support or rule out any other specific mechanistic hypotheses for recovery.

      (4) The authors attribute increased microglial markers to interneuron silencing rather than inflammation from viral injection, based on control virus results (lines 143-146). However, is it plausible that TeNT expression itself, or batch variability, could drive differences in inflammation? The authors could address these alternatives with additional evidence or discussion.
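
      The concern in point (1) about relative versus absolute abundance can be made concrete with a small arithmetic sketch. All cell counts below are invented for illustration and are not taken from the manuscript. The point is only that a rise in one cell type lowers the relative abundance of every other type even when their absolute numbers are unchanged.

```python
def relative_abundance(counts):
    """Convert absolute cell counts into fractions of the total."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical counts: neuron number is identical in both conditions,
# only the microglia count increases.
control = {"neurons": 600, "microglia": 50, "other": 350}
perturbed = {"neurons": 600, "microglia": 300, "other": 350}

control_fractions = relative_abundance(control)
perturbed_fractions = relative_abundance(perturbed)

for cell_type in control:
    print(cell_type,
          round(control_fractions[cell_type], 3),
          round(perturbed_fractions[cell_type], 3))
```

      In this toy case the neuronal fraction falls from 0.60 to 0.48 with no neuronal loss, which is why the interpretation of the Y-axis matters.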

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates at behavioral and mechanistic levels the recovery of zebra finch song production after a genetically targeted insult to HVC, a vocal premotor nucleus known to generate stereotyped neural sequences that drive the correspondingly stereotyped song. This study is a close follow-up to past work, published in Nature Neuroscience last year (Wang et al, 2024), in which custom lentiviruses were used to deliver a persistently active sodium channel, NaChBac, or TeNT to block synaptic release, specifically to the excitatory projection neurons in HVC. In this past work, these manipulations resulted in rapid degradation of song, followed by a slow recovery that, remarkably, did not require practice. Song recovery was associated with synaptic remodeling that appeared to homeostatically bring the affected neurons back to a normal firing regime. This past paper was important because it clearly demonstrated behaviorally and mechanistically how neural plasticity can restore a learned behavior without practice, showing that dominant reinforcement learning models of birdsong are not the full story.

      This past work sets the context for the current paper, which instead targets the inhibitory neuronal population in HVC for silencing via viral-mediated expression of TeNT. Again, this sophisticated targeting of HVC interneurons resulted in rapid degradation of song, followed by a much slower but seemingly full recovery.

      Strengths:

      Overall, this paper has several strengths. First, it provides yet another convincing example of non-canonical vocal learning in the zebra finch because LMAN (a nucleus required for trial and error song learning) is not required for song recovery. Second, its targeting of interneurons clarifies the extent to which inhibition in HVC is essential for vocal patterning (not surprising but important to show). Third, by using RNAseq of HVC at the time of peak song disruption, it zeroes in on specific genetic/cellular activations associated with a lack of inhibition (e.g., microglial activation and MHC1 expression), opening up new avenues for future study. Using in vivo electrophysiology it also characterizes some gross circuit-level abnormalities in HVC-RA transmission and during sleep.

      Weaknesses:

      Yet the paper also has several areas for improvement, primarily:

      Main issues

      (1) Narrative-level confusion, a mix of results, many hanging threads

      The arc of this paper is very hard to follow; new experiments arise without a clear setup or connection to past ones. Concepts jump around unpredictably. The reading experience would be dramatically improved if there were a clear single line of logic going through the entire paper, which could be accomplished by inserting a paragraph at the end of the intro section that walks the reader step-by-step through what they are going to see. I don't recommend this for all papers - but this paper requires it, in my opinion, because we have such an unusual combination of experimental approaches, outcomes, and data formats (behavior, RNA seq, targeted tests of microglial activation in the setting of adult impairment and song development, electrophysiology during sleep). It's very difficult for me to tie this all together into a crisp narrative that sticks with me days after reading the paper. Instead, it feels like some disconnected factoids. Examples:

      a) Characterization of degradation and slow recovery (much slower than targeting of projection neurons from past work (Wang et al, 2024)).

      b) Activation of microglia and MHC1 during the degraded period; microglia return to normal at recovery.

      c) Developmental profile of microglia expression.

      e) Sleep replay in HVC is perturbed during the degraded state. Mostly returns to normal following recovery, but *some* aspects are still abnormal.

      f) Detailed ephys analysis of HVC excitability and RA suppression, invoking ideas that HVC drives RA inhibition.

      g) LMAN lesions do not block degradation or recovery.

      There are at least three threads of this paper - it therefore reads like three different papers stitched together into one - united only by the method of HVC interneuron targeting. In my view, a pretty major overhaul is required, even if it means cutting out specific details and figures that distract from the paper's message (for example there is a whole sub-section analyzing HVC impact on RA that vaguely invokes ideas of HVC engagement of RA

      (2) Interpretation of microglia is confusing and unresolved

      Microglia activation is measured at peak song disruption, and returns to normal following recovery. To test if this phenomenon is associated with learning or degradation, the authors measure microglia during development.

      "The increased inhibitory tone in HVC and the number of microglia could induce synaptic changes that contribute to degraded song production. Alternatively, the rise in microglia could be part of the recovery response to produce synaptic changes needed to regain the song following perturbation."

      This is a great if/then statement on how to interpret the microglial activation at the core of the paper. But it remains unresolved. Is there a causal experiment that could distinguish these possibilities?

      (3) The quantification of song dynamics during the recovery process in LMAN lesioned birds is required to support claims. Perhaps the most interesting claim of the paper - that recovery happens without LMAN - is not sufficiently supported by data analyses. This is a major problem.

      The same analysis used in the LMAN-intact degradation/recovery dataset should be used for the LMAN dataset. At present, there is no quantification, only example spectrograms. Also, Supplementary Figure 4 and Supplementary Figure 5 are identical, suggesting a lack of proofreading in this part of the manuscript. For example, the reader cannot even ascertain if the key aspect of song degradation - the production of exceedingly long syllables - is occurring in the LMAN lesioned animals.

    1. eLife Assessment

      The manuscript represents a fundamental advance in designing peptide inhibitors targeting Cdc20, a key activator and substrate-recognition subunit of the APC/C ubiquitin ligase. Supported by compelling biophysical and cellular evidence, the study lays a strong foundation for future developments in degron-based therapeutics. The revised manuscript has been strengthened by additional clarifications and data that address prior reviewer concerns. The work provides a robust framework for developing tools to manipulate protein degradation and will be of broad interest to researchers in protein engineering, cell cycle regulation, and targeted protein degradation.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C-Cdc20 ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or applying peptides as portable motifs to achieve targeted degradation. Both of which are impactful. The additional points provided by the authors in response to reviewers further strengthened the manuscript and enhanced its clarity.

      Weaknesses:

      None, the authors have addressed all my comments, and I have no additional suggestions.

    3. Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Comments on revisions:

      I am satisfied with the changes in response to the first round of review.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C-Cdc20 ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation. Both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility - we have clarified this statement (page 11) to say that we reverted to Ala, as it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section has been clarified (page 16). The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      The section now reads:

      “We therefore assume that this is the reason for the lack of observed density in this region of the peptides D20 and D21 (Fig. S3E and S3F, respectively). We believe that it causes a reduction in binding affinities of all peptides in crystallo, given the evidence from SPR highlighting a role of position 7 in the interaction (Table 1). Interestingly, the observed electron density of the peptide correlates with Cdc20 binding affinity: D21 and D20, having the highest affinities, display the clearest electron density allowing six amino acids to be modeled, whereas D7 shows relatively poor density permitting modelling of only four residues. For D19, the lack of density observed likely reflects its intrinsically weaker affinity compared to the other peptides, in addition to losing the interactions from position 7 due to crystal packing.”

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by honing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      We have added the following text to the Results section “Design of D-box peptides” (page 10):

      “We focused on D-box peptides, as there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study that tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated (Qin et al. 2017). They observed that, whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study (Hartooni et al. 2022) of binding affinities of different degron peptides concluded that KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.”

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutation of degron SLiMs in Acm1 that weakens interaction with the APC/C has the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100X increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator. Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor, Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of Cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in the affinity of a peptide by adding cooperativity to the interaction of the D-box with the co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.
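
      To put the 100-fold figure in energetic terms, the short sketch below converts a fold change in affinity into a binding free-energy difference. It assumes simple two-state binding at 25 °C; the function name and the temperature are illustrative choices, not values taken from the manuscript or from Hartooni et al.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold: float, temp_k: float = 298.15) -> float:
    """Free-energy change (kcal/mol) for a fold increase in binding affinity.

    A fold increase in affinity (i.e. a fold decrease in Kd) corresponds to
    ddG = -R * T * ln(fold); negative values mean tighter binding.
    The default temperature (25 degrees C) is an assumption for illustration.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return -R_KCAL * temp_k * math.log(fold)


# A 100-fold gain in affinity is worth roughly -2.7 kcal/mol at 25 degrees C.
print(round(ddg_from_fold_change(100.0), 2))
```

      On this rough scale, a contact that is very weak in isolation could still shift the overall affinity substantially when it adds cooperatively to the co-activator interaction.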

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation.  Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008,  Enquist-Newman et al. 2008,  Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) On page 12 (towards the end), the author stated D10 contained an A3P mutation, they meant P3A right? 'To test this hypothesis, we proceeded to synthesise D10, a derivative of D4 containing an A3P single point mutation.'

      We thank the reviewer for spotting this typo, which we have corrected.

      (2) Have the authors considered other orthogonal approaches to cross-examine/validate binding affinities? That said, I do not think extra experiments are necessary.

      We did not explore further orthogonal approaches due to the challenges of producing sufficient amounts of the Cdc20 protein. Due to the low affinities of many peptides for Cdc20, many techniques would have required more protein than we were able to produce. We believe that the qualitative TSA combined with the SPR is sufficient to convince the readers; indeed there is a correlation between SPR-determined binding affinities and the thermal shifts. For the natural amino acid-containing peptides (Table 1), D19 has the highest affinity and causes the largest thermal shift in the Cdc20 melting temperature, D10 has the lowest affinity and causes the smallest thermal shift, and D1, D3, D4, and D5 all rank in the middle by both techniques. For those peptides containing unnatural amino acids (Table 2), higher affinities are again reflected in larger thermal shifts.

      Reviewer #2 (Recommendations for the authors):

      The data seem fine to me. I would appreciate a little more detail on the points mentioned in the public review. Also a thorough reread, maybe by a disinterested party as there are various typos that could be corrected - all in all an excellent clear paper that encompasses a lot of work.

      A colleague has carefully checked the manuscript, and typos have been corrected.

    1. eLife Assessment

      This is an interesting study that adds useful new data addressing how different DAG pools influence cellular signaling. The study dissects how the enzyme Dip2 modulates the minor lipid signaling DAG pool, which is distinct from the lipid metabolism DAG pool utilized in membrane production. Overall the analysis is solid and broadly supports the claims.

    2. Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via various deletions such as lro1, dga1, and pah1 deletion impacts DAG and stress signaling. Overall this is an interesting study that adds new data to how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deletion of Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction) or by Lro1 or Dga1 deletion, would broaden the scope of the study.

      Comments on revisions:

      The revision addresses several of the concerns raised previously. Most importantly, it softens several conclusions and more clearly delineates the limitations of the study. The study has yet to address how Dip2 and Pkc1 crosstalk, but new text addresses this limitation. There is also more analysis of Dip2 localization in other conditions where cell DAG pools are elevated (i.e. a LRO1 and DGA1 double KO, as well as a PAH1 KO). Loss of these proteins elevates ER DAG, but Dip2 remains mitochondrially associated. This may imply DAG specificity, or that global changes to DAG pools do not impact Dip2 import into mitochondria.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      Weaknesses: More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      (2) Does Dip2 colocalize with Plc1 or Pkc1? Does Dip2 reach the plasma membrane upon Plc activation?

      Comments on revisions:

      The authors have addressed my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via various deletions such as lro1, dga1, and pah1 deletion impacts DAG and stress signaling. Overall this is an interesting study that adds new data to how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      We thank the reviewer for finding this study of interest and appreciating our multi-pronged approach to prove our hypothesis that a distinct pool of DAGs regulated by Dip2 activates PKC signalling.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deletion of Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction) or by Lro1 or Dga1 deletion, would broaden the scope of the study.

      We thank the reviewer for the suggestion to trace the localization of Dip2 in the absence of various DAG-acting enzymes. To address this, we generated Dip2-GFP knock-in (KI) in Δpah1, Δlro1 and Δdga1 strains, confirming successful integration by western blotting using an anti-GFP antibody. We then performed microscopy to examine the localization of Dip2. Since Dip2 is a mitochondria-vacuole contact site protein that predominantly localizes to mitochondria (approximately 60% of Dip2 puncta localize to mitochondria) (Mondal et al. 2022), we co-stained the cells with MitoTracker red to visualize mitochondria.

      Consistent with our previous findings, Dip2 colocalizes with MitoTracker red in WT (Figure 3-figure supplement 2A). As suggested by the reviewer, we deleted PAH1, which converts phosphatidic acid to DAGs and is also known to work at the nucleus-vacuole junction. On examining whether the absence of PAH1 influences the localization of Dip2, we found no change in Dip2’s spatial organization. This could also be due to the lack of any observable change in the DAG species on deleting PAH1, as noted in our lipidomic studies (Figure 4-figure supplement 2A). These observations suggest that in a homeostatic condition, Pah1 does not affect the DAG pool acted upon by Dip2 and therefore has no influence on Dip2’s subcellular localization. These data have been incorporated in the revised manuscript (line no. 286-289) and Figure 4-figure supplement 2D-E.

      Similarly, we probed for the localization of Dip2 in LRO1 and DGA1 knockout strains. These enzymes are responsible for converting bulk DAGs to TAGs. We have previously shown that Dip2 is selective for only C36:0 and C36:1 and does not act on the bulk DAGs (Mondal et al. 2022). Both Lro1 and Dga1 are endoplasmic reticulum (ER) resident proteins and the bulk DAG accumulation in their knockouts is shown to be in the ER (Li et al. 2020), not influencing the mitochondrial DAG pool. On tracing Dip2’s localization in these knockouts, we found that Dip2 remains in the mitochondria (Figure 3-figure supplement 2, Figure 4-figure supplement 2D,E). These results suggest that Dip2 localization is not influenced by bulk DAG accumulation, reinforcing its specificity toward selective DAGs, which are likely to be present at mitochondria and mitochondria-vacuole contact sites. We have added these data in the revised manuscript (line no. 240-246) with Figure 3-figure supplement 2.

      Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that the Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      We would like to thank the reviewer for the positive comments on our work and finding the study novel and interesting.

      Weaknesses:

      More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      We thank the reviewer for the insightful comments. We were unable to include C36:1 DAG in our in vitro DAG binding assays because it is not commercially available. We have now explicitly mentioned it in the revised manuscript (Line no. 186).

      We agree with the reviewer that the activation of PKC by C36:0 and C36:1 DAGs is a critical conclusion of our work. While we understand that there is no obvious structural explanation as to how the DAG-binding C1 domain of PKC attains acyl chain specificity for DAGs, our conclusion that yeast Pkc1 is selective for C36:0 and C36:1 DAGs is supported by a combination of robust in vitro and in vivo data:

      (1) In Vitro Evidence: The liposome binding assays demonstrate that the Pkc1 C1 domain binds only to the selective DAG and does not interact with bulk DAGs.

      (2) In Vivo Evidence: Lipidomic analyses of wild-type cells subjected to cell wall stress reveal increased levels of C36:0 and C36:1 DAGs, while levels of bulk DAGs remain unaffected.

      These findings collectively indicate that Pkc1 neither binds nor is activated by bulk DAGs, reinforcing its specificity for C36:0 and C36:1 DAGs.

      Moreover, establishing the structural basis of this selectivity would require a specific DAG-bound C1 domain structure of Pkc1, which is difficult to obtain owing to the flexibility of the longer acyl chains present in C36:0 and C36:1 DAGs. In addition, capturing the full-length Pkc1 structure, which might provide deeper insights, has been challenging for several other groups. We also hypothesize that DAG selectivity by Pkc1 is more of a membrane phenomenon, wherein these DAGs might create a specific microdomain or form a particular curvature that is sensed by Pkc1. Investigating this would require extensive structural and biophysical studies that are beyond the scope of the current work but are planned for future research.

      (2) Does Dip2 colocalize with Plc1 or Pkc1?

      As shown in our previous study (Mondal et al. 2022) and in the above section (Figure 3-figure supplement 2A-B), Dip2 predominantly localizes to the mitochondria. Pkc1, on the other hand, is known to be found in the cytosol, plasma membrane and bud site (Andrews and Stark 2000). We also checked the localization of Pkc1 in cells co-stained with MitoTracker red and observed no significant overlap between the two signals, confirming that Pkc1 does not colocalize with Dip2 (Author response image 1).

      Author response image 1.

      Live cell microscopy for tracing Pkc1 localization. (A) Microscopy image panel showing the DIC image (left), fluorescence for Pkc1 tagged with GFP, MitoTracker red staining of mitochondria, and the merged image for both fluorophores (right). Scale bar represents 5 µm. (B) Line scan of the fluorescence intensity of Pkc1-GFP and MitoTracker red across the line shown in the merged panel.

      Moreover, as suggested by the reviewer, we also checked the localization of Plc1 and found that Plc1 is present in the cytosol and shows partial colocalization with the mitochondria (Figure 4-figure supplement 3A-B). As some puncta of Dip2 also colocalize with the vacuoles, we checked whether Plc1 follows the same localization pattern. We co-stained Plc1-GFP with FM4-64, a vacuolar membrane dye, and observed that Plc1 partially localizes to vacuoles as well (Figure 4-figure supplement 3C-D). This is consistent with a previous study in which Plc1 was found in a subcellular fractionation of isolated yeast vacuoles and total cell lysate (Jun, Fratti, and Wickner 2004). We also checked whether Plc1, similar to Dip2, localizes to the mitochondria-vacuole contact site by using tri-colour imaging with FM4-64 for vacuole, DAPI for mitochondria and GFP-tagged Plc1. We were not able to trace Dip2 and Plc1 simultaneously as we could not generate a strain endogenously tagged with two different colours even after several attempts. However, from our observations, we can conclude that Plc1 partially localizes to mitochondria and vacuole and might be locally producing the selective DAGs to be acted upon by Dip2. We have incorporated these data in the revised manuscript (line no. 301-304) with Figure 4-figure supplement 3.

      For probing the localization of Dip2 upon Plc1 activation, we used cell wall stress, a condition inducing Plc1 activation for selective DAG production (this study). Under this condition, we probed the localization of Dip2 by fluorescence microscopy and found that Dip2 does not move to the plasma membrane but remains localized to mitochondria (Figure 1-figure supplement 3). This result has been added in the revised manuscript (line no. 153-160) with Figure 1-figure supplement 3.

      This raises intriguing questions regarding the spatial regulation of Pkc1 by Dip2. Since Dip2’s localization remains unaffected, it needs further exploration whether the selective DAGs, presumably at the mitochondria, move to the plasma membrane for Pkc1 activation or whether Pkc1 translocates to the mitochondria. Addressing these possibilities will require a combination of genetic approaches, organellar lipidomics, and advanced microscopy, which we aim to explore in future studies.

      References:

      Andrews, P. D., and M. J. Stark. 2000. “Dynamic, Rho1p-Dependent Localization of Pkc1p to Sites of Polarized Growth.” Journal of Cell Science 113 (Pt 15): 2685–93. doi:10.1242/jcs.113.15.2685.

      Jun, Youngsoo, Rutilio A. Fratti, and William Wickner. 2004. “Diacylglycerol and Its Formation by Phospholipase C Regulate Rab- and SNARE-Dependent Yeast Vacuole Fusion.” Journal of Biological Chemistry 279(51): 53186–95. doi:10.1074/jbc.M411363200.

      Li, Dan, Shu-Gao Yang, Cheng-Wen He, Zheng-Tan Zhang, Yongheng Liang, Hui Li, Jing Zhu, et al. 2020. “Excess Diacylglycerol at the Endoplasmic Reticulum Disrupts Endomembrane Homeostasis and Autophagy.” BMC Biology 18(1): 107. doi:10.1186/s12915-020-00837-w.

      Mondal, Sudipta, Priyadarshan Kinatukara, Shubham Singh, Sakshi Shambhavi, Gajanan S Patil, Noopur Dubey, Salam Herojeet Singh, et al. 2022. “DIP2 Is a Unique Regulator of Diacylglycerol Lipid Homeostasis in Eukaryotes.” eLife 11: e77665. doi:10.7554/eLife.77665.

    1. eLife Assessment

      The authors show that an automated approach using artificial neural networks, which focuses on behaviourally relevant dimensions, can predict human similarity data up to a certain level of granularity. This study has the potential to be a valuable contribution to the broader field of cognitive computational neuroscience, as it provides a tool for the automated collection of similarity judgments under certain conditions. However, as of now, the significance of this method is somewhat limited because of its inability to generalise beyond between-category distinctions and the limited model evaluation. In terms of broader implications, the degree to which this work provides insights into DNN-brain alignment and a better understanding of the functional organisation of the visual system is supported by incomplete evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses the challenge of understanding and capturing the similarity among large numbers of visual images. The authors show that an automated approach using artificial neural networks that focuses upon the embedding of similarity through behaviorally relevant dimensions can predict human similarity data up to a certain level of granularity.

      Strengths:

      The manuscript starts with a very useful introduction that sets the stage with an insightful Figure 1. The methods are state of the art and well thought out, and the data are compelling. The authors demonstrate the added value of their approach in several directions, resulting in a manuscript that is highly relevant for different domains. The authors also explore its limitations (e.g., granularity).

      Weaknesses:

      Although this manuscript and the work it describes are already of high quality, I see several ways in which it could be further improved. Below I rank these suggestions tentatively in order of importance.

      Predictions obtain correlations above 0.80, often close to correlations of 0.90. The performance of DimPred is not trivial, given how much better it performs relative to classic RSA and feature reweighting. Yet, the ceiling is not sufficiently characterized. What is the noise ceiling in the main and additional similarity sets that are used? If the noise ceiling is higher than the prediction correlations, then can the authors try to find the stimulus pairs for which the approach systematically fails to capture similarity? Or is the mismatch very distributed across the full stimulus set?

      Also, in the section on pp. 8-9, it is crucial to provide information on the noise ceiling of the various datasets.
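
      As an illustration of the kind of estimate being requested, the sketch below computes a split-half reliability for a similarity dataset, with Spearman-Brown correction. It assumes that a full similarity matrix is available for each participant, which may not hold for datasets collected with sparse triplet judgments; the function names and array layout are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def _upper(mat):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    rows, cols = np.triu_indices(mat.shape[0], k=1)
    return mat[rows, cols]


def split_half_noise_ceiling(subject_rsms, n_splits=100, seed=0):
    """Spearman-Brown corrected split-half reliability of a group similarity matrix.

    subject_rsms: array of shape (n_subjects, n_items, n_items), one similarity
    matrix per participant (an assumed data layout, for illustration only).
    """
    rsms = np.asarray(subject_rsms, dtype=float)
    n_sub = rsms.shape[0]
    if n_sub < 2:
        raise ValueError("need at least two participants")
    rng = np.random.default_rng(seed)
    correlations = []
    for _ in range(n_splits):
        order = rng.permutation(n_sub)
        half_a = rsms[order[: n_sub // 2]].mean(axis=0)
        half_b = rsms[order[n_sub // 2:]].mean(axis=0)
        correlations.append(np.corrcoef(_upper(half_a), _upper(half_b))[0, 1])
    r_half = float(np.mean(correlations))
    # Spearman-Brown: project half-sample reliability to the full sample.
    return 2.0 * r_half / (1.0 + r_half)
```

      Comparing such a value with the reported prediction correlations would indicate how much explainable variance remains; other estimators (for example, leave-one-subject-out bounds) are equally defensible.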

      This consideration of noise ceiling brings me to another consideration. Arguments have been made that a focus on overall prediction accuracy might mask important differences in underlying processes that can be demonstrated in more specific, experimental situations (Bowers et al., 2023). Can the authors exclude the possibility that their automatic approach would fail dramatically in specifically engineered situations? Some examples can be found in the 2024 challenge of the BrainScore platform. How can future users of this approach know whether they are in such a situation or not?

      The authors demonstrated one limitation of the DimPred approach: it struggles to capture fine-grained similarity among highly similar stimuli. The implications of this finding were not clear to me from the Abstract and other summaries, because they do not sufficiently highlight that in this case DimPred performs much worse than simpler approaches such as feature reweighting, and even worse than classic RSA. I would discuss this outcome in more detail. With hindsight, this problem might not be so surprising given that DimPred relies upon an embedding with a few tens of dimensions that mostly capture between-category differences. To me, this seems like a more fundamental limitation than a mere problem of granularity or lack of data, as suggested in the abstract.

      The DimPred approach is based on the dimensions of a similarity embedding derived from human behavior. Which matters here: (i) that DimPred is based upon an approach that tries to capture latent dimensions, or (ii) that these dimensions are behaviorally relevant? There are a lot of dimension-focused approaches. Generic ones are PCA, MDS, etc. More domain-specific approaches in cognitive neuroscience include the following: (i) for two-dimensional shape representations, good results have been obtained with image-computable dimensions of various levels of complexity (Morgenstern et al., 2021, PLOS Comput. Biol.); (ii) another dimension-focused approach has focused upon identifying dimensions that are universal across networks and human representations (Chen & Bonner, 2024, arXiv). Would such generic or more specific approaches work as well as DimPred?

    3. Reviewer #2 (Public review):

      In this paper, the authors successfully incorporated the 49 dimensions found in a human similarity judgment task to better train DNNs to perform accurate human-like object similarity judgments. The results of the model performance are impressive but I am not totally convinced that the present modeling approach may bring new insights regarding the mental and neural representations of visual objects in the human brain. I have a few thoughts that I would like the authors to consider.

      (1) Can the authors provide a detailed description of what these off-the-shelf DNNs are trained on? For models trained on visual images only, because semantic information was never present during training, it is not surprising they fail to capture such information, even with additional DimPred training. For the CLIP models, because visual-semantic associations were included during training, it again comes as no surprise that these models can do better even without DimPred training. Similarly, the results for homogeneous image sets are not particularly surprising. In this regard, I find that the paper reports many obvious results. Better motivations should be given to justify why particular models and analyses were chosen, what predictions can be made, and how the results may be informative beyond what we already know.

      (2) I am curious as to what DimPred training is doing exactly. If you create an arbitrary similarity structure (i.e., not the one derived from human similarity judgment) by, e.g., shuffling the values during training or creating 49 arbitrary dimensions, can the models be trained to follow this new arbitrary structure? In other words, do the models intrinsically contain a human-like structure, but we just have to find the right parameters to align them with the human structure or do we actually impose/force the human similarity structure onto the model with DimPred training?

      Is it also an issue that you are including more parameters during DimPred training and that increased parameters alone can increase performance?
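
      One way to frame the control raised in point (2) is sketched below: fit the same regularised linear map from network features to dimension scores twice, once with the true image-to-embedding assignment and once with the assignment permuted across images, and compare held-out performance. This is a minimal sketch under assumed array shapes; the ridge estimator, the function names, and the train/test split are illustrative and are not the authors' DimPred implementation.

```python
import numpy as np


def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Closed-form ridge regression with mean-centred inputs and targets."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean(axis=0)
    xc, yc = x_train - x_mean, y_train - y_mean
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    return (x_test - x_mean) @ weights + y_mean


def heldout_r(features, dims, n_train, alpha=1.0):
    """Mean per-dimension Pearson r between predicted and true held-out scores."""
    pred = ridge_fit_predict(features[:n_train], dims[:n_train],
                             features[n_train:], alpha)
    true = dims[n_train:]
    rs = [np.corrcoef(pred[:, k], true[:, k])[0, 1] for k in range(true.shape[1])]
    return float(np.mean(rs))


def shuffled_control(features, dims, n_train, seed=0, alpha=1.0):
    """Same fit after permuting which image each embedding row belongs to.

    The permutation keeps the marginal statistics of every dimension but
    destroys its relationship to the images.
    """
    rng = np.random.default_rng(seed)
    shuffled = dims[rng.permutation(dims.shape[0])]
    return heldout_r(features, shuffled, n_train, alpha)
```

      If held-out performance for the permuted targets stays near zero while the true targets are predicted well, the gain cannot be attributed to the added parameters alone; it would instead reflect structure already present in the network features.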

      (3) There is very little information on how Figure 8 is generated. I couldn't find in the Methods any detailed descriptions of how the values were calculated. Are results from both the category-insensitive and category-sensitive embedding obtained from the same OpenCLIP-RN50x64? Figure 8 reports the relative improvement. What do the raw activation maps look like for the category-insensitive and category-sensitive embedding? I am surprised that the improvement is seen primarily in the early visual cortex (EVC) and higher visual areas but not more extensively in association areas sensitive to semantics. Why should EVC show such large improvements, given that category information is stored elsewhere?

      Related to this point, how do other DNN models account for human brain fMRI responses in the present study? Many prior studies have documented the similarities and differences between DNN and human fMRI visual object representations. Do category-sensitive CLIP models outperform other DNN models? It is important to report the full results. Even though category-sensitive CLIP models outperform category-insensitive CLIP ones, if the overall model performance is low compared to the other DNNs, the results would not be very meaningful/impressive. I am just wondering if, in the process of achieving better human-like similarity judgment performance, these models lose some of the ability to account for visual object representations in the human ventral visual cortex.

      (4) I am wondering how precisely the present results may yield new insights into the mental and neural representations of visual objects in the human brain. Prior human studies have already identified 49 dimensions that can capture human similarity judgment. Beyond predicting performance for new pairs of objects, how would the present modeling approach help us understand more about the human brain? The authors discussed this, but I am not sure the arguments are convincing.

    4. Reviewer #3 (Public review):

      Summary:

      The authors compare how well their automatic dimension prediction approach (DimPred) can support similarity judgements and compare it to more standard RSA approaches. The authors show that the DimPred approach does better when assessing out-of-sample heterogeneous image sets, but worse for out-of-sample homogeneous image sets. DimPred also does better at predicting brain-behaviour correspondences compared to an alternative approach. The work appears to be well done, but I'm left unsure what conclusions the authors are drawing.

      In the abstract, the authors write: "Together, our results demonstrate that current neural networks carry information sufficient for capturing broadly-sampled similarity scores, offering a pathway towards the automated collection of similarity scores for natural images". If that is the main claim, then they have done a reasonable job supporting this conclusion. However the importance of automating this process for broadly-sampled object categories is not made so clear.

      But the authors also highlight the importance that similarity judgements have had for theories of cognition and brain; in the first paragraph of the paper they write: "Similarity judgments allow us to improve our understanding of a variety of cognitive processes, including object recognition, categorization, decision making, and semantic memory [6-13]. In addition, they offer a convenient means for relating mental representations to representations in the human brain [14,15] and other domains [16,17]". The fact that the authors also assess how well a CLIP model using DimPred can predict brain activation suggests that their work is not just about automating similarity judgements, but also about showing that ANNs are more similar to brains than previously assessed.

      My main concern is with regard to the claim that DimPred is revealing better similarities between ANNs and brains (a claim that the authors may not be making, but this should be clarified). The fact that predictions are poor for homogeneous images is problematic for this claim, and I expect their DimPred scores would be very poor under many conditions, such as when applied to line drawings of objects, or a variety of additional out-of-sample stimuli that are easily identified by humans. The fact that so many different models get such similar prediction scores (Fig 3) also raises questions as to the inferences you can make about ANN-brain similarity based on the results. Do the authors want to claim that CLIP models are more like brains?

      With regard to the brain prediction results, why is the DimPred approach doing so much better in V1? I would not think the 49 interpretable categories are encoded in V1, and the ability to predict would likely reflect a confound rather than V1 encoding these categories (e.g., if a category was "things that are burning" then the DNN might predict V1 activation based on the encoding of colour).

      In addition, more information is needed on the baseline model, as it is hard to appreciate whether we should be impressed by the better performance of DimPred based on what is provided: "As a baseline, we fit a voxel encoding model of all 49 dimensions. Since dimension scores were available only for one image per category36, for the baseline model, we used the same value for each image of the same category and estimated predictive performance using cross-validation". Is it surprising that predictions are not good with one image per category? Is this a reasonable comparison?

      Relatedly, what was the ability of the baseline model to predict? (I don't think that information was provided). Did the authors attempt to predict outside the visual brain areas? What would it mean if predictions were still better there?

      Minor points:

      The authors write: "Please note that, for simplicity, we refer to the similarity matrix derived from this embedding as "ground-truth", even though this is only a predicted similarity". Given this, it does not seem a good idea to use "ground truth" as this clarification will be lost in future work citing this article.

      It would be good to have the 49 interpretable dimensions listed in the supplemental materials rather than having to go to the original paper.

      Strengths:

      The experiments seem well done.

      Weaknesses:

      It is not clear what claims are being made.

    5. Author response:

      We wish to express our gratitude to the reviewers for their insightful and constructive comments on the initial version of our manuscript. We greatly value their observations and have every intention of addressing their remarks in a thorough and constructive manner. Based on the editors’ and reviewers’ feedback, we realize that it was not entirely clear that we intended this work primarily to be a resource and not yield strong insights into DNN-human alignment. Since our method also covers the broad range of natural objects - as used in the vast majority of studies on object processing - we also feel we did not sufficiently highlight the breadth of the tool. Based on the editors’ assessment, our explorations into the limits of the method - which we saw as a strength, not a weakness of our work - perhaps overshadowed the otherwise broad applicability somewhat. We hope to clarify this in the revised manuscript. Beyond these general points, we would like to address the following four points:

      • Where feasible, we intend to undertake additional analyses and refine existing ones. For instance, we plan to provide noise ceilings for all datasets where such calculations are possible, and we plan to give careful consideration to implementing a permutation or label-shuffling test to explore some of the ideas shared by the reviewers.

      • We plan to discuss more thoroughly several topics raised by the reviewers (e.g., how our approach might contend with different experimental situations, such as when using line drawings as stimuli).

      • We aim to enhance the clarity of our manuscript throughout. This will include refining the wording of our abstract and offering a more detailed explanation of the methods employed in the fMRI analyses.

      • We plan to elaborate further on our line of reasoning by addressing potential sources of misunderstanding—such as clarifying what we mean by a “lack of data” and providing greater detail regarding the nature of the 49-dimensional embedding.
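
      The label-shuffling test mentioned in the list above can be illustrated with a minimal sketch. It is not the authors' analysis: the function names (`pearson`, `permutation_p_value`), the toy values, and the choice of Pearson correlation as the test statistic are all assumptions made for illustration. The idea is to compare the observed correlation between predicted and measured similarity scores against correlations obtained after randomly shuffling the measured scores.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(predicted, observed, n_permutations=1000, seed=0):
    """One-sided permutation test: how often does a shuffled pairing
    reach a correlation at least as high as the real pairing?"""
    rng = random.Random(seed)           # fixed seed for reproducibility
    observed_r = pearson(predicted, observed)
    shuffled = list(observed)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)           # break the pairing between the two lists
        if pearson(predicted, shuffled) >= observed_r:
            hits += 1
    # The +1 terms count the unshuffled data as one permutation,
    # so the estimated p-value can never be exactly zero.
    return observed_r, (hits + 1) / (n_permutations + 1)


# Hypothetical similarity scores for ten object pairs (illustrative values only).
predicted = [0.10, 0.25, 0.30, 0.45, 0.50, 0.65, 0.70, 0.85, 0.90, 0.95]
observed = [0.12, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.88, 0.97]
r, p = permutation_p_value(predicted, observed)
```

      For full similarity matrices, the entries are not independent, so one would typically shuffle object labels (permuting rows and columns together) rather than individual entries; the sketch above shuffles a flat list only to keep the example short.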

    1. eLife Assessment

      This study presents a valuable finding on the direct cytotoxic effects of DuoHexaBody-CD37 in diffuse large B-cell lymphoma, mediated via SHP-1 activation and antibody clustering, independent of complement. The evidence supporting this mechanism is incomplete, with additional work needed to clarify SHP-1's role, the contribution of Fc receptor crosslinking, and the biological relevance across normal and malignant B cells. As the findings are based primarily on in vitro models, further validation would be required to support broader translational conclusions.

    2. Joint Public Reviews:

      In this study, the authors suggest that DuoHexaBody-CD37, a biparatopic CD37-targeting antibody, can induce direct cytotoxicity in diffuse large B-cell lymphoma (DLBCL) cells through antibody clustering and SHP-1 activation, independent of complement. They further propose that DuoHexaBody-CD37 inhibits cytokine-mediated pro-survival signalling, suggesting a broader role for CD37-directed therapy in disrupting tumour supportive signalling networks.

      A strength of the study is the systematic in vitro characterisation of signalling responses to DuoHexaBody-CD37 across both malignant and normal B-cells. The inclusion of phosphoproteomic profiling and mutant constructs provides mechanistic detail, and the findings may be of interest to researchers working on antibody therapeutics in lymphoma.

      However, the evidence supporting key mechanistic processes - particularly the role of SHP-1 in mediating cytotoxicity and the requirement for Fc receptor crosslinking - is incomplete and would benefit from further functional validation. While CD37 has been explored previously as a therapeutic target, this study does add mechanistic insight into direct cytotoxicity and cytokine modulation. Nevertheless, the exclusive reliance on in vitro systems makes the translational relevance unclear.

      Overall, the study provides valuable insight into CD37-mediated signalling in lymphoma cells, but the evidence remains incomplete to support broader conclusions about therapeutic impact.

    3. Author response:

      The evidence supporting this mechanism is incomplete, with additional work needed to clarify SHP-1's role, the contribution of Fc receptor crosslinking, and the biological relevance across normal and malignant B cells. 

      We will address these points by:

      - including SHP-1 inhibitors in the DuoHexaBody-CD37 cytotoxicity experiments to address the role of SHP-1

      - investigating which Fc receptors are involved in the crosslinking by using FcR blocking antibodies and/or purified fixed effector cells that express different Fc receptors in the DuoHexaBody-CD37 cytotoxicity experiments

      - studying the effect of DuoHexaBody-CD37 on normal B cells

      As the findings are based primarily on in vitro models, further validation would be required to support broader translational conclusions.

      We would like to refer to previous studies that showed potent cytotoxicity of DuoHexaBody-CD37 in vivo, including xenograft and PDX lymphoma models supporting broader translational conclusions:

      Oostindie et al. Blood Cancer Journal (2020) 10:30 https://doi.org/10.1038/s41408-020-0292-7

    1. eLife Assessment

      This important study uses standard single-cell RNA-seq analyses combined with methods from the social sciences to assess heterogeneity in gene expression in Drosophila imaginal wing disc cells treated with 4000 rads of ionizing radiation. The use of this methodology from social sciences is novel in Drosophila. A cell cycle based clustering approach allows them to identify a subpopulation of cells that is disproportionately responsible for much of the radiation-induced gene expression. Their convincing analyses reveal genes that are expressed regionally after irradiation, including ligands and transcription factors that have been associated with regeneration, as well as others whose roles in response to irradiation are unknown. This paper would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyze transcription in single cells before and after 4000 rads of ionizing radiation. They use Seurat v5 for their analyses, which allows them to show that most of the genes cluster along the proximal-distal axis. Due to the high heterogeneity in the transcripts, they use the Herfindahl-Hirschman index (HHI) from economics, which measures market concentration. Using the HHI, they find that genes involved in several processes (like cell death, response to ROS, DNA damage response (DDR)) are relatively similar across clusters. However, ligands activating the JAK/STAT, Pvr, and JNK pathways and transcription factors Ets21C and dysf are upregulated regionally. The JAK/STAT ligands Upd1,2,3 require p53 for their upregulation after irradiation, but the normal expression of Upd1 in unirradiated discs is p53-independent. This analysis also identified a cluster of cells that expressed tribbles, encoding a factor that downregulates mitosis-promoting String and Twine, that appears to be G2/M arrested and expressed numerous genes involved in apoptosis, DDR, the aforementioned ligands, and TFs. As such, the tribbles-high cluster contains much of the heterogeneity.
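
      To make the HHI concrete, the sketch below computes it for a single gene from its expression in each cluster. This is only an illustration under stated assumptions: the function name `hhi`, the toy expression values, and the use of fractional shares (giving a range from 1/N to 1 for N clusters) are not taken from the manuscript, and the authors' exact normalization may differ. In economics the index is often computed from percentage shares instead, giving values up to 10,000.

```python
def hhi(expression_by_cluster):
    """Herfindahl-Hirschman index of one gene's expression across clusters.

    Each cluster's share of the gene's total expression is squared and the
    squares are summed. Evenly spread expression gives 1/N for N clusters;
    expression confined to a single cluster gives 1.
    """
    total = sum(expression_by_cluster)
    if total <= 0:
        raise ValueError("total expression must be positive")
    shares = [value / total for value in expression_by_cluster]
    return sum(share * share for share in shares)


# Hypothetical per-cluster expression values for two genes (illustrative only).
uniform_gene = hhi([5.0, 5.0, 5.0, 5.0])        # expressed equally in 4 clusters
concentrated_gene = hhi([20.0, 0.0, 0.0, 0.0])  # expressed in one cluster only
```

      In this toy case the evenly expressed gene scores 0.25 (the minimum for four clusters) and the single-cluster gene scores 1, which is the sense in which a higher HHI indicates more regionally concentrated expression.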

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      (2) Some of the figures are unclear.

    3. Reviewer #2 (Public review):

      This manuscript investigates the question of cellular heterogeneity using the response of Drosophila wing imaginal discs to ionizing radiation as a model system. A key advance here is the focus on quantitatively expressing various measures of heterogeneity, leveraging single-cell RNAseq approaches. To achieve this goal, the manuscript creatively uses a metric from the social sciences called the HHI to quantify the spatial heterogeneity of expression of individual genes across the identified cell clusters. Inter- and intra-regional levels of heterogeneity are revealed. Some highlights include the identification of spatial heterogeneity in the expression of ligands and transcription factors after IR. Expression of some of these genes shows dependence on p53. An intriguing finding, made possible by using an alternative clustering method focusing on cell cycle progression, was the identification of a high-trbl subset of cells characterized by concordant expression of multiple apoptosis, DNA damage repair, ROS-related genes, certain ligands, and transcription factors, collectively representing HIX genes. This high-trbl set of cells may correspond to an IR-induced G2/M arrested cell state.

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

    4. Reviewer #3 (Public review):

      Summary:

      Cruz and colleagues report a single-cell RNA sequencing analysis of irradiated Drosophila larval wing discs. This is a pioneering study because prior analyses used bulk RNAseq analysis, so differences at single-cell resolution were not discernible. To quantify heterogeneity in gene expression, the authors make clever use of a metric used to study market concentration, the Herfindahl-Hirschman Index. They make several important observations, including region-specific gene expression coupled with heterogeneity within each region and the identification of a cell population (high Trbl) that seems disproportionately responsible for radiation-induced gene expression.

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

    5. Author response:

      We thank the reviewers for their comments and for their constructive suggestions. We intend to submit a revised manuscript where we address the comments made in the Public Reviews as well as in the Recommendations for the Authors.

      One of our most interesting findings, as noted by the reviewers, was the discovery of a small subpopulation of cells likely arrested in G2 that accounts for a disproportionate amount of radiation-induced gene expression. In addition to the responses indicated below, we are planning to include additional “wet lab” experiments in the revised manuscript that address the properties of this seemingly important subpopulation of cells.

      Reviewer 1:

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Thank you for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      (2) Some of the figures are unclear.

      In the revised manuscript, we will provide a more detailed quantitative analysis. For each condition, we analyzed 4 - 9 discs.

      We assume that the reviewer is referring to panels in Figure 1. We will review these images and, if necessary, repeat the experiments or choose alternative images that appear clearer.

      Reviewer 2:

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      We intend to include more “wet lab” experiments in our revised manuscript to address the identity and properties of the high-trbl cells that we have identified using the clustering approach based on cell-cycle gene expression.

      Reviewer 3:

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Thank you.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. The generation of single-cell RNAseq data from irradiated cells is tricky. Many cells have already died. Even those that do not incorporate propidium iodide are likely in early stages of apoptosis or are physiologically unhealthy and likely made it through our FACS filters. Indeed, in irradiated samples up to 57% of sequenced cells were not included in our analysis since their RNA content seemed to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues as well as the time course of events and how our analysis captures RNA levels only at a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Clusters 0, 1, and 2 likely contain cells in other stages of the cell cycle, including early G1. Other studies indicate that more than 70% of cells are expected to have a 4C DNA content 4 h after irradiation at 4000 Rad. The high-trbl cluster only accounts for 18% of cells. Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs. We are mining the gene expression patterns in these clusters with the goal of estimating their location in the cell cycle and will include those data in the revised manuscript.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread. We intend to present additional data that address this point and also a more thorough discussion in the revised manuscript.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways: by location in the disc and by cell-cycle state. We will attempt to look for co-localization as suggested by the reviewer.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. We are investigating other possibilities such as the maturity of discs.

    1. eLife Assessment

      The article presents important findings describing the role of IL27 in maintaining HSCs at steady state, and in emergency haematopoiesis in response to T. gondii by limiting the inflammatory monocyte outcomes. However, the evidence is still incomplete, as not enough evidence is provided to support that IL27 only acts at the level of HSCs and not downstream. This study will be of interest to immunologists and hematologists, as well as infectious disease researchers.

    2. Reviewer #1 (Public review):

      In the manuscript, Aldridge and colleagues investigate the role of IL-27 in regulating hematopoiesis during T. gondii infection. Using loss-of-function approaches, reporter mice, and the generation of serial chimeric mice, they elegantly demonstrate that IL-27 induction plays a critical role in modulating bone marrow myelopoiesis and monocyte generation to the infection site. The study is well-designed, with clear experimental approaches that effectively address the mechanisms by which IL-27 regulates bone marrow myelopoiesis and prevents HSC exhaustion.

    3. Reviewer #2 (Public review):

      Summary:

      Aldridge et al. aim to demonstrate the role of IL27 in limiting emergency myelopoiesis in response to Toxoplasma gondii infection by acting directly at the level of early haematopoietic progenitors.

      They used different mouse genetic models, such as HSC lineage tracing, IL27 and IL27R-deficient mice, to show that:

      (1) HSCs actively participate in emergency myelopoiesis during Toxoplasma gondii infection.

      (2) The absence of IL27 and IL27R increases monocyte progenitors and monocytes, mainly inflammatory monocytes CCR2hi.

      (3) At steady state, loss of IL27 impairs HSC fitness as competitive transplantation shows long-term engraftment deficiency of IL27 BM cells. This impairment is exacerbated after infection.

      (4) IL27 is produced by various BM and other tissue cells at steady state, and its expression increases with infection, mainly by increasing the number of monocytes producing it.

      Although it is indisputable that IL27 has a role in emergency myelopoiesis by limiting the number of pro-inflammatory monocytes in response to infection, the authors' claim that it acts only on HSCs and not on more committed progenitors (CMP, GMP, MP) is not supported by the quality of the data presented here, as described below in the weakness section. In addition, this study highlights a role for IL27 during infection, but does not focus on trained immunity, which is the focus of the targeted eLife issue.

      Weaknesses:

      (1) In Figure 4, MFI quantification is required. This figure also shows the expression level (FACS and RNA) in progenitors (GMP and CMP, GP, MP), which is quite similar to that of HSC at this level, so it is really surprising that CMP does not respond at all to IL27 (S5C).

      (2) Total BM was used to test the direct effect of IL27 on HSC. There could be an indirect effect from other more mature BM cells, even if they show lower receptor expression than HSC. This should be done on different sorted populations to prove the direct effect of IL27 on HSC. The authors need to look more closely at some STAT-dependent genes or STAT itself in different sorted cell populations, not just Irgm1. It is also known that STAT is associated with increased HSC proliferation in response to IFN, which is the opposite of what is observed here.

      (3) The decrease in HSC fitness in IL27R KO at steady state could be an indirect effect of the increase in proinflammatory monocytes contributing to high levels of inflammatory cytokines in the BM and thus chronic HSC activation that is enhanced in response to infection. What is the pro-inflammatory cytokine profile of the BM of IL27 or IL27R deficient mice and of mixed chimera mice?

      (4) Furthermore, the Ki67/BrdU FACS profile in Figure 7 is doubtful, as several published studies show that KSL cells are not predominantly quiescent as shown here, but that about 50% are Ki67-. This is also inconsistent with the increase of HSC observed in Figure 1. Quantification of total BrdU+ HSC and other progenitors is also important to quantify all cells that have proliferated during infection. As the repopulation of IL27-deficient BM is also lower in the absence of infection, the proliferation of HSC in IL27R KO mice in the absence of infection is also important.

      (5) The immunofluorescence in Figure 3 shows a high level of background and it is difficult to see the GFP and tomato positive cells. In this sense, the number of HSCs quantified as Procr+ (more than 8000 on a single BM section) is inconsistent with the total number of HSCs that a BM can contain (i.e., around 6000 per BM as quantified in Figure 1).

      (6) The addition of arrows to the figure will help to visualise positive cells. It is also not clear why the authors normalised the GFP+ cells to the tomato+ cells in Figure 3D.

      (7) Furthermore, even if monocytes represent a high proportion of IL27-producing cells, they are only 50% of the cells at 5dpi, as shown in Figure 3 and S4. Without other monocyte markers, line 307 is incorrect.

      (8) How do the authors explain that in Figure 1, 5-10% of labelled precursors and monocytes can give 100% of monocytes? This would mean that only labelled HSC can differentiate into PEC monocytes.

    1. eLife Assessment

      The granularity with which neural activity in the sensorimotor cortex of mice corresponds to voluntary forelimb motion is a key open question. This paper provides convincing evidence for the encoding of low-level features like joint angles and represents an important step forward toward understanding the cortical origins of limb control signals.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the encoding of forelimb movement parameters using a reach-to-grasp task in mice. The authors use a modified version of the water-reaching paradigm developed by Galinanes and Huber. Two-photon calcium imaging was then performed with GCaMP6f to measure activity across both the contralateral caudal forelimb area (CFA) and the forelimb portion of primary somatosensory cortex (fS1) as mice perform the reaching behavior. Established methods were used to extract the activity of imaged neurons in layer 2/3, including methods for deconvolving the calcium indicator's response function from fluorescence time series. Video-based limb tracking was performed to track the positions of several sites on the forelimb during reaching and extract numerous low-level (joint angle) and high-level (reach direction) parameters. The authors find substantial encoding of parameters for both the proximal and distal parts of the limb across both CFA and fS1, with individual neurons showing heterogeneous parameter encoding. Limb movement can be decoded similarly well from both CFA and fS1, though CFA activity enables decoding of reach direction earlier and for a more extended duration than fS1 activity. Collectively, these results indicate involvement of a broadly distributed sensorimotor region in mouse cortex in determining low-level features of limb movement during reach-to-grasp.

      Strengths:

      The technical approach is of very high quality. In particular, the decoding methods are well designed and rigorous. The use of partial correlations to distinguish correlation between cortical activity and either proximal or distal limb parameters or either low- or high-level movement parameters was very nice. The limb tracking was also of extremely high quality, and critical here to revealing the richness of distal limb movement during task performance.
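
      The logic of the partial-correlation approach can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the function names, the first-order formula for a single control variable, and the toy signals (`activity`, `proximal`, `distal`) are assumptions chosen so that the effect is easy to see. Two signals that each follow a shared third variable look strongly correlated, but the correlation disappears once the shared variable is controlled for.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: a hypothetical distal-limb parameter plus two independent
# fluctuations that are uncorrelated with it and with each other.
distal = [0, 1, 2, 3, 4, 5, 6, 7]
noise_a = [1, -1, -1, 1, 1, -1, -1, 1]
noise_b = [1, -1, 1, -1, -1, 1, -1, 1]

# Both the neural activity and the proximal parameter follow the distal one.
activity = [d + a for d, a in zip(distal, noise_a)]
proximal = [d + b for d, b in zip(distal, noise_b)]

raw_r = pearson(activity, proximal)
partial_r = partial_correlation(activity, proximal, distal)
```

      Here the raw correlation between `activity` and `proximal` is high, while the partial correlation given `distal` is essentially zero, showing how an apparent proximal relationship can be fully explained by a correlated distal parameter. As the review notes elsewhere, such an analysis can only control for parameters that were actually measured.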

      The task itself also reflects an important extension of the original work by Galinanes and Huber. The demonstration of a clear, trackable grasp component in a paradigm where mice will perform hundreds of trials per day expands the experimental opportunities for the field. This is an exciting development.

      The findings here are important and the support for them is solid. The work represents an important step forward toward understanding the cortical origins of limb control signals. One can imagine numerous extensions of this work to address basic questions that have not been reachable in other model systems.

      Collectively, these strengths made this manuscript a pleasure to read and review.

      Weaknesses:

      In the last section of the results, the authors purport to examine the representation of "higher-level target-related signals," using the decoding of reach direction. While I think the authors are careful in their phrasing here, I think they should be more explicit about what these signals could be reflecting. The "signals" here that are used to decode direction could relate to anything - low-level signals related to limb or postural muscles, or true high-level commands that dictate only what movement downstream motor centers should execute, rather than the muscle commands that dictate how. One could imagine using a partial correlation-type approach again here to extract a signal uncorrelated with all the measured low-level parameters, but there would still be all the unmeasured ones. Again, I think it is still ok to call these "high-level signals," but I think some explicit discussion of what these signals could reflect is necessary.

      Related to this, I think the manuscript in general does not do an adequate job of explicitly raising the important caveats in interpreting parametric correlations in motor system signals, like those raised by Todorov, 2000. The authors do an expert job of handling the correlations, using PCA to extract uncorrelated components and using the partial correlation approach. However, more clarity about the range of possible signal types the recorded activity could reflect seems necessary.
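
      To make concrete what a partial correlation of this kind measures, here is a minimal sketch under stated assumptions; it is not the authors' analysis code. It regresses a set of covariates out of two variables by ordinary least squares and correlates the residuals. The function name `partial_correlation` and the synthetic variables `proximal`, `distal`, and `activity` are hypothetical and only illustrate the idea.

      ```python
      import numpy as np

      def partial_correlation(x, y, covariates):
          """Correlation between x and y after linearly regressing out covariates."""
          x = np.asarray(x, dtype=float)
          y = np.asarray(y, dtype=float)
          z = np.asarray(covariates, dtype=float)
          if z.ndim == 1:
              z = z[:, None]
          # Add an intercept column so that the residuals are mean-centred.
          design = np.column_stack([np.ones(len(x)), z])
          # Residuals of x and y after least-squares projection onto the covariates.
          res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
          res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
          return float(np.corrcoef(res_x, res_y)[0, 1])

      # Synthetic illustration (hypothetical data, not taken from the study).
      rng = np.random.default_rng(0)
      proximal = rng.normal(size=2000)                        # e.g. a proximal joint angle
      distal = 0.8 * proximal + 0.6 * rng.normal(size=2000)   # correlated with proximal
      activity = proximal + 0.1 * rng.normal(size=2000)       # driven by proximal only

      raw_r = float(np.corrcoef(activity, distal)[0, 1])
      partial_r = partial_correlation(activity, distal, proximal)
      print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
      ```

      In this toy case the raw correlation between `activity` and `distal` is large only because both depend on `proximal`; the partial correlation removes that shared dependence. The caveat raised above still applies: only measured covariates can be regressed out, so unmeasured parameters may still drive any remaining correlation.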

      The manuscript could also do a better job of clarifying relevant similarities and differences between the rodent and primate systems, especially given the claims about the rodent being a "first-class" system for examining the cellular and circuit basis of motor control, which I certainly agree with. Interspecies similarities and differences could be better addressed both in the Introduction, where results from both rodents and primates are intermixed (second paragraph), and in the Discussion, where more clarity on how results here agree and disagree with those from primates would be helpful. For example, the ratio of corticospinal projections targeting sensory and motor divisions of the spinal cord differs substantially between rodents and primates. As another example, the relatively high physical proximity between the typical neurons in mouse M1 and S1 compared to primates seems likely to yoke their activity together to a greater extent. There is also the relatively large extent of fS1 from which forelimb movements can be elicited through intracortical microstimulation at current levels similar to those for evoking movement from M1. All of these seem relevant in the context of findings that activity in mouse M1 and S1 are similar.

      In addition, there are a number of other issues related to the interpretation of findings here that are not adequately addressed. These are described in the Recommendations for improvement.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Grier, Salimian, and Kaufman characterize the relationship between the activity of neurons in sensorimotor cortex and forelimb kinematics in mice performing a reach-to-grasp task. First, they train animals to reach to two cued targets to retrieve water reward, measure limb motion with high resolution, and characterize the stereotyped kinematics of the shoulder, elbow, wrist, and digits. Next, they find that inactivation of the caudal forelimb motor area severely impairs coordination of the limb and prevents successful performance of the task. They then use calcium imaging to measure the activity of neurons in motor and somatosensory cortex, and demonstrate that fine details of limb kinematics can be decoded with high fidelity from this activity. Finally, they show reach direction (left vs right target) can be decoded earlier in the trial from motor than from somatosensory cortex.

      Strengths:

      In my opinion, this manuscript is technically outstanding and really sets a new bar for motor systems neurophysiology in the mouse. The writing and figures are clear, and the claims are supported by the data. This study is timely, as there has been a recent trend towards recording large numbers of neurons across the brain in relatively uncontrolled tasks and inferring a widespread but coarse encoding of high-level task variables. The central finding here, that sensorimotor cortical activity reflects fine details of forelimb movement, argues against the resurgent idea of cortical equipotentiality, and in favor of a high degree of specificity in the responses of individual neurons and of the specialization of cortical areas.

      Weaknesses:

      It would be helpful for the authors to be more explicit about which models of mouse cortical function their results support or rule out, and how their findings break new conceptual ground.

    1. eLife Assessment

      This useful study reports a method to detect and analyze a novel post-translational modification, lysine acetoacetylation (Kacac), finding it regulates protein metabolism pathways. The study unveils epigenetic modifiers involved in placing this mark, including key histone acetyltransferases such as p300, and concomitant HDACs, which remove the mark. Proteomic and bioinformatics analysis identified many human proteins with Kacac sites, potentially suggesting broad effects on cellular processes and disease mechanisms. While the data presented are solid, the functional validation of the sites would add significantly to the manuscript's description of this modification; the study will be of interest to those studying protein and metabolic regulation.

    2. Reviewer #1 (Public review):

      Summary

      Lysine acetoacetylation (Kacac) is a recently discovered histone post-translational modification (PTM) connected to ketone body metabolism. This research outlines a chemo-immunological method for detecting Kacac, eliminating the requirement for creating new antibodies. The study demonstrates that acetoacetate acts as the precursor for Kacac, which is catalyzed by the acyltransferases GCN5, p300, and PCAF, and removed by the deacetylase HDAC3. Acetoacetyl-CoA synthetase (AACS) is identified as a central regulator of Kacac levels in cells. A proteomic analysis revealed 139 Kacac sites across 85 human proteins, showing the modification's extensive influence on various cellular functions. Additional bioinformatics and RNA sequencing data suggest a relationship between Kacac and other PTMs, such as lysine β-hydroxybutyrylation (Kbhb), in regulating biological pathways. The findings underscore Kacac's role in histone and non-histone protein regulation, providing a foundation for future research into the roles of ketone bodies in metabolic regulation and disease processes.

      Strengths

      (1) The study developed an innovative chemo-immunological method for detecting lysine acetoacetylation. It offers a reliable way to detect Kacac specifically using commercially available antibodies.

      (2) The authors performed comprehensive proteomic profiling and identified unique Kacac sites on 85 human proteins. This detailed landscape of lysine acetoacetylation points to possible roles in cellular processes.

      (3) The functional characterization of enzymes examines the acetoacetyltransferase activity of key enzymes such as GCN5, p300, and PCAF. This provides a deeper understanding of their function in cellular regulation and histone modifications.

      (4) The analysis of how acetyl-CoA and acetoacetyl-CoA affect histone acetylation reveals differential regulation of acylations in mammalian cells, which contributes to the understanding of metabolic-epigenetic crosstalk.

      (5) The study examined acetoacetylation levels and patterns in experiments that combined acetohydroxamic acid or lovastatin treatment with lithium acetoacetate, providing insights into the regulation of SCOT and HMGCR activities.

      Weakness

      (1) Functional validation of the biological relevance of the identified acetoacetylation sites is limited. The study needs functional validation experiments to support robust conclusions about the implications of these modifications for cellular processes and protein function. For example, examining the functional implications of the identified acetoacetylation sites on histone proteins would aid the interpretation of the results.

      (2) The authors could have compared acetoacetylation patterns between healthy cells and disease models such as cancer cells to investigate potential dysregulation of acetoacetylation in pathological conditions, which could provide insights into the function of this PTM in disease progression and pathogenesis.

      (3) Time-course experiments following acetoacetate treatment could be performed to capture the kinetics of acetoacetylation, thereby providing a mechanistic understanding of the PTM changes and their regulation.

      (4) The discussion section provides a critical analysis of the results in the context of existing literature and offers insights into acetoacetylation's broader implications in histone modification. However, the study could also discuss how the overlap of other post-translational modifications with Kacac sites affects protein function.

      Impact

      The authors successfully identified novel acetoacetylation sites on proteins, expanding the understanding of this post-translational modification. The authors conducted experiments to validate the functional significance of acetoacetylation by studying its impact on histone modifications and cellular functions.

    3. Reviewer #2 (Public review):

      In the manuscript by Fu et al., the authors developed a chemo-immunological method for the reliable detection of Kacac, a novel post-translational modification, and demonstrated that acetoacetate and AACS serve as key regulators of cellular Kacac levels. Furthermore, the authors identified the enzymatic addition of the Kacac mark by acyltransferases GCN5, p300, and PCAF, as well as its removal by deacetylase HDAC3. These findings indicate that AACS utilizes acetoacetate to generate acetoacetyl-CoA in the cytosol, which is subsequently transferred into the nucleus for histone Kacac modification. A comprehensive proteomic analysis has identified 139 Kacac sites on 85 human proteins. Bioinformatics analysis of Kacac substrates and RNA-seq data reveals the broad impacts of Kacac on diverse cellular processes and various pathophysiological conditions. This study provides valuable additional insights into the investigation of Kacac and would serve as a helpful resource for future physiological or pathological research.

      The following concerns should be addressed:

      (1) A detailed explanation is needed for selecting H2B (1-26) K25 sites over other acetylation sites when evaluating the feasibility of the chemo-immunological method.

      (2) In Figure 2(B), the addition of acetoacetate and NaBH4 resulted in an increase in Kbhb levels. Specifically, please investigate whether acetoacetylation is primarily mediated by acetoacetyl-CoA and whether acetoacetate can be converted into a precursor of β-hydroxybutyryl (bhb-CoA) within cells. Additional experiments should be included to support these conclusions.

      (3) In Figure 2(E), the amount of pan-Kbhb decreased upon acetoacetate treatment when SCOT or AACS was added, whereas this decrease was not observed with NaBH4 treatment. What could be the underlying reason for this phenomenon?

      (4) The paper demonstrates that p300, PCAF, and GCN5 exhibit significant acetoacetyltransferase activity and discusses the predicted binding modes of HATs (primarily PCAF and GCN5) with acetoacetyl-CoA. To validate the accuracy of these predicted binding models, it is recommended that the authors design experiments such as constructing and expressing protein mutants, to assess changes in enzymatic activity through western blot analysis.

      (5) HDAC3 shows strong de-acetoacetylation activity compared to its de-acetylation activity. Specific experiments should be added to verify the molecular docking results. The use of HPLC is recommended, in order to demonstrate that HDAC3 acts as an eraser of acetoacetylation and to support the above conclusions. If feasible, mutating critical amino acids on HDAC3 (e.g., His134, Cys145) and subsequently analyzing the HDAC3 mutants via HPLC and western blot can further substantiate the findings.

      (6) The resolution of the figures needs to be addressed in order to ensure clarity and readability.

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents a timely and significant contribution to the study of lysine acetoacetylation (Kacac). The authors successfully demonstrate a novel and practical chemo-immunological method using the reducing reagent NaBH4 to transform Kacac into lysine β-hydroxybutyrylation (Kbhb).

      Strengths:

      This innovative approach enables simultaneous investigation of Kacac and Kbhb, showcasing their potential in advancing our understanding of post-translational modifications and their roles in cellular metabolism and disease.

      Weaknesses:

      The paper's main weaknesses are the lack of SDS-PAGE analysis to confirm HAT purity and loading consistency, and the absence of cellular validation of the in vitro findings through knockdown experiments. These gaps weaken the evidence supporting the conclusions.

    1. eLife Assessment

      This paper investigates how isoform II of transcription factor RUNX2 promotes cell survival and proliferation in oral squamous cell carcinoma cell lines. The authors used gain and loss of function techniques to provide convincing evidence showing that RUNX2 isoform silencing led to cell death via several mechanisms including apoptosis and ferroptosis that was partially suppressed through RUNX2 regulation of PRDX2 expression. The study provides valuable insight into the underlying mechanism by which RUNX2 acts in oral squamous cell carcinoma.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigated the role of RUNT-related transcription factor 2 (RUNX2) in oral squamous carcinoma (OSCC) growth and resistance to ferroptosis. They found that RUNX2 suppresses ferroptosis through transcriptional regulation of peroxiredoxin-2. They further explored the upstream positive regulator of RUNX2, HOXA10, and found that the HOXA10/RUNX2/PRDX2 axis protects OSCC from ferroptosis.

      Strengths:

      The study is well designed and provides a novel mechanism of HOXA10/RUNX2/PRDX2 control of ferroptosis in OSCC.

      Weaknesses:

      According to the data presented (Figure 2F, Figure 3F and G, Figure 5D, and Figure 6E and F), apoptosis seems to be affected to the same extent as ferroptosis by the HOXA10/RUNX2/PRDX2 axis, which raises a question about the authors' specific focus on ferroptosis in this study. The authors should adapt the title and the abstract so that they reflect the full data, namely that the HOXA10/RUNX2/PRDX2 axis controls cell death, including ferroptosis and apoptosis, in OSCC.

      Comments on revisions:

      The revised manuscript has been well improved, and I'm satisfied with the authors' response to my comments.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper investigates how isoform II of transcription factor RUNX2 promotes cell survival and proliferation in oral squamous cell carcinoma cell lines. The authors used gain and loss of function techniques to provide incomplete evidence showing that RUNX2 isoform silencing led to cell death via several mechanisms including ferroptosis that was partially suppressed through RUNX2 regulation of PRDX2 expression. The study provides useful insight into the underlying mechanism by which RUNX2 acts in oral squamous cell carcinoma, but the conclusions of the authors should be revised to acknowledge that ferroptosis is not the only cause of cell death.

      We appreciate the editor’s positive comments on our work and the valuable suggestions provided by the reviewers. We did find that RUNX2 isoform II knockdown or HOXA10 knockdown could also lead to apoptosis. We have revised our title as follows: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”. In addition, we have also revised our conclusions in the abstract as follows: “OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.” We have added more experiments to better support our conclusions. Please see the following responses to the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, authors investigated the role of RUNT-related transcription factor 2 (RUNX2) in oral squamous carcinoma (OSCC) growth and resistance to ferroptosis. They found that RUNX2 suppresses ferroptosis through transcriptional regulation of peroxiredoxin-2. They further explored the upstream positive regulator of RUNX2, HOXA10 and found that HOXA10/RUNX2/PRDX2 axis protects OSCC from ferroptosis.

      Strengths:

      The study is well designed and provides a novel mechanism of HOXA10/RUNX2/PRDX2 control of ferroptosis in OSCC.

      Weaknesses:

      According to the data presented in (Figure 2F, Figure 3F and G, Figure 5D and Figure 6E and F), apoptosis seems to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis, which raises questions on the authors' specific focus on ferroptosis in this study. Reasonably, authors should adapt the title and the abstract in a way that recapitulates the whole data, which is HOXA10/RUNX2/PRDX2 axis control of cell death, including ferroptosis and apoptosis in OSCC.

      We are really grateful for your comments. We agree that these figures do show that isoform II-knockdown or HOXA10-knockdown could induce apoptosis. We have adapted the title and abstract as follows:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”

      In addition, we have performed the rescue experiment showing that PRDX2 overexpression rescues the apoptosis induced by isoform II-knockdown (Figure 4-figure supplement 4) or HOXA10-knockdown (Figure 7-figure supplement 2).

      We have added descriptions of these experiments to the Results sections “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis and apoptosis through RUNX2 isoform II” as follows: “In addition, we found that PRDX2 overexpression could partially reduce the increased apoptosis caused by isoform II-knockdown (Figure 4-figure supplement 4).” “PRDX2 overexpression also could rescue the increased cellular apoptosis caused by HOXA10 knockdown (Figure 7-figure supplement 2).”

      Comments:

      In the description of the Results section related to Figure 3E, the authors wrote "In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). These results suggest that RUNX2 isoform II may suppress ferroptosis." The interpretation provided here is not clear to the reviewer. How can shrunken mitochondria and vanished cristae be linked to ferroptosis?

      We apologize for the inaccurate description. Ferroptotic cells usually exhibit shrunken mitochondria, reduced or absent cristae, and increased membrane density (Dixon et al., 2012). However, the presence of shrunken mitochondria or vanished cristae does not guarantee that ferroptosis has occurred in the cells. Other evidence, such as the increased ROS production and lipid peroxidation accumulation in cells with RUNX2 isoform II-knockdown, must also be evaluated, as we show in Figure 3A and 3B. Furthermore, isoform II overexpression suppressed ROS production (Figure 3C) and lipid peroxidation (Figure 3D). We have revised our interpretation as follows: “In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). This phenomenon along with the above results of ROS production and lipid peroxidation accumulation assays suggests that RUNX2 isoform II may suppress ferroptosis.”

      Dixon, S. J., Lemberg, K. M., Lamprecht, M. R., Skouta, R., Zaitsev, E. M., Gleason, C. E., . . . Stockwell, B. R. (2012). Ferroptosis: an iron-dependent form of nonapoptotic cell death. Cell, 149(5), 1060-1072. doi:10.1016/j.cell.2012.03.042 PMID:22632970

      The electron microscopy images show more elongated mitochondria in the RUNX2 isoform II-KO cells than in RUNX2 isoform II positive cells, which might result from the fusion of mitochondria. These images should be complemented with fluorescent mitochondrial staining of these cells.

      We do find that the TEM images of RUNX2 isoform II-knockdown cells show more elongated mitochondria. The mitochondria undergo cycles of fission and fusion, known as mitochondrial dynamics, which in turn leads to changes in mitochondrial length. Through examining factors related to mitochondrial dynamics, we find that isoform II knockdown could decrease the expression levels of FIS1 (Fission, Mitochondrial 1) (Figure 3-figure supplement 2B) which mediates the fission of mitochondria. Therefore, we speculate that the elongated mitochondria in the isoform II-knockdown cells may be due to the decrease in mitochondrial fission through inhibiting FIS1 expression.

      In addition, we tried our best to perform fluorescent staining of mitochondria to observe mitochondrial morphology. However, owing to the quality of the probes and the fluorescence microscope, our mitochondrial fluorescence images were not satisfactory. We therefore captured more electron microscopy images, measured the length of the mitochondria, and performed statistical analyses. We find that isoform II-knockdown cells show significantly more mitochondrial elongation than the control cells (Author response image 1 and Figure 3-figure supplement 2A). Therefore, we believe the conclusion that isoform II knockdown promotes mitochondrial elongation is relatively reliable.

      Author response image 1.

      The new electron microscopy images of RUNX2 isoform II-knockdown cells. RSL3 (a ferroptosis activator) served as a positive control. Scale bar: 1 μm. The calculation and statistical analysis of mitochondrial elongation were added in Figure 3-figure supplement 2A.

      What is the oxygen consumption rate in RUNX2 KO cells?

      We have performed a new mitochondrial stress assay to analyze the oxygen consumption rate (OCR). We find that RUNX2 isoform II-knockdown can decrease OCR in OSCC cells. This result has been added to Figure 3-figure supplement 3A and B. It is consistent with our observation of damaged mitochondrial morphology in the cells with RUNX2 isoform II knockdown.

      The increase in cell proliferation after RUNX2 overexpression in Figure 2A is not convincing. Are there any differences in their migration or invasion capacity?

      We agree that overexpression of isoform II didn’t dramatically enhance OSCC cell proliferation. We consider that it may be due to the existing high level of isoform II in OSCC cells. We have performed wound-healing assay and transwell assay to analyze the migration or invasion capacity of cells with RUNX2 isoform II or isoform I overexpression. We find that isoform II overexpression has no effect on the migration and invasion in OSCC cells (Figure 2-figure supplement 2). This phenomenon suggests that further increasing isoform II cannot improve the migration or invasion capacity of OSCC cells. However, isoform I overexpression suppresses the migration and invasion of cancer cells (Figure 2-figure supplement 2), indicating that the upregulation of isoform I, which is downregulated in OSCC cells, may inhibit tumorigenesis. In addition, we found that the expression level of isoform I was lower in TCGA OSCC patients than that in normal controls (Figure 1D), and patients with higher isoform I showed longer overall survival (Figure 1-figure supplement 1). These results support that isoform I may inhibit tumorigenesis in OSCC cells.

      The in vivo study shows 50% reduction in primary tumor growth after RUNX2 inhibition by shRNA in CAL 27 xenografts, but only one shRNA is shown. Is this one shRNA clone? At least 2 shRNA clones should be used.

      In this in vivo primary tumor growth experiment, we used a CAL 27 stable cell line transfected with an shRNA against RUNX2 isoform II (shisoform II-1). We agree that at least two shRNAs should be used. In this revision, we performed another tumor growth experiment with CAL 27 cells stably transfected with a new shRNA targeting a different region of isoform II (shisoform II-2). As in the previous experiment, CAL 27 cells stably transfected with this new shRNA showed significantly lower tumor growth and weight in nude mice than those transfected with non-specific control shRNA (Figure 2-figure supplement 4A-D).

      Apoptosis and necroptosis seem to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis. This is evident from experiments in Figure 3E, F and from Figure 6E, F and Figure 3G. Either Fer-1, Z-VAD, or Nec-1 used alone, were not able to fully restore cell proliferation to control cell level, which implies an additive effect of ferroptosis, apoptosis and necrosis. The author should verify potential additive or synergistic effect of the combination of Fer-1 and Z-VAD in these assays after si-RUNX2 in Figure 3 F and G and after si-HOX assays.

      We sincerely appreciate your valuable comments. We have performed a new assay to analyze the potential additive or synergistic effect of the combination of Fer-1 and Z-VAD after RUNX2 isoform II (si-II) or HOXA10 (si-HOX) knockdown. We find that the combination of Fer-1 and Z-VAD is more effective in rescuing cell proliferation than Fer-1 or Z-VAD alone (Figure 3-figure supplement 6 and Figure 6-figure supplement 4).

      What is the effect of PRDX2 or HOXA10 depletion on tumor growth?

      We have performed a new xenograft tumor formation assay in nude mice to analyze the effect of PRDX2-knockdown on tumor growth. We found that CAL 27 cells stably transfected with shRNAs against PRDX2 showed significantly reduced tumor growth and weight than those transfected with non-specific control shRNA in nude mice (Figure 4-figure supplement 2A-D). Regarding the effect of HOXA10 depletion on tumor growth, please allow us to cite a study (Guo et al., 2018) which demonstrated that HOXA10 knockout in Fadu cells (a cell line of pharyngeal squamous cell carcinoma) could inhibit tumor growth. 

      We have added these results to the section of “RUNX2 isoform II promotes the expression of PRDX2” as follows: “In line with the inhibitory effect of isoform II-knockdown on tumor growth, CAL 27 cells stably transfected with anti-PRDX2 shRNAs showed notably reduced tumor growth and weight than those transfected with non-specific control shRNA in nude mice (Figure 4-figure supplement 2A-D).”.

      Guo, L. M., Ding, G. F., Xu, W., Ge, H., Jiang, Y., Chen, X. J., & Lu, Y. (2018). MiR-135a-5p represses proliferation of HNSCC by targeting HOXA10. Cancer Biol Ther, 19(11), 973-983. doi:10.1080/15384047.2018.1450112 PMID:29580143

      What is the clinical relevance of HOXA10 in OSCC patients?

      In Figure 5-figure supplement 1B, we have shown that the expression levels of HOXA10 in TCGA OSCC patients were also significantly higher than those in normal controls. In this revision, we further find that patients with higher HOXA10 show significantly shorter overall survival in the TCGA OSCC dataset (Figure 5-figure supplement 2C). In addition, we have also analyzed the expression of HOXA10 in our clinical OSCC and adjacent normal tissues, and found that the HOXA10 expression level in OSCC tissues is significantly higher than that in normal controls (Figure 5-figure supplement 2A and B), which is consistent with the results from the TCGA OSCC dataset.

      We have revised our writing in the result “HOXA10 is required for RUNX2 isoform II expression and cell proliferation in OSCC” as follows: “Similarly, HOXA10 expression level of our clinical OSCC tissues is significantly higher than that of adjacent normal tissues (Figure 5-figure supplement 2A and B). Moreover, TCGA OSCC patients with higher expression levels of HOXA10 showed shorter overall survival (Figure 5-figure supplement 2C).”

      Reviewing editor (Public Review):

      This paper reports the role of the Isoform II of RUNX2 in activating PRDX2 expression to suppress ferroptosis in oral squamous cell carcinoma (OSCC).

      The following major issues should be addressed.

      A major postulate of this study is the specific role of RUNX2 isoform II compared to isoform I.

      Figure 1F shows association between patient survival and Iso II expression, but nothing is shown for Iso I, this should be added, in addition the number of patients at risk in each category should be shown.

      We sincerely appreciate your valuable comments. We have added the survival curve of isoform I (exon 2.1) in the new Figure 1-figure supplement 1. In contrast to isoform II, patients with higher isoform I showed longer overall survival. The numbers of patients at risk in each category have been added to Figure 1F and Figure 1-figure supplement 1.

      The authors test Iso I and Iso II overexpression in CAL27 or SCC-9 model cell lines. In Fig. 2A in CAL27, the overexpression of Iso II is much stronger than Iso I so it seems premature to draw any conclusions. More importantly, however, no Iso I silencing is shown in either of the cell lines or in the xenografted tumours. This is absolutely essential for the authors' hypothesis and should be tested using shRNA in cells and xenografted tumours.

      Thank you for your valuable comments. We agree that the overexpression of isoform I is much stronger than isoform II in CAL 27 cells in Fig. 2A-B. We have repeated the experiment, and the repeat shows similar overexpression levels of isoforms II and I (Figure 2A-figure supplement 1). This repeat experiment also shows that overexpression of FLAG-tagged isoform II significantly promoted the proliferation of OSCC cells. We tried our best to knock down isoform I. However, the isoform I-specific sequence is only 317 nt long. We designed four anti-isoform I siRNAs, and unfortunately found that none of them could knock down isoform I efficiently (see Author response image 2). Therefore, we currently cannot knock down isoform I. However, we have tested the overexpression of isoform I. We find that isoform I overexpression inhibits the migration and invasion of cancer cells (Figure 2-figure supplement 2). In addition, we have shown that isoform II overexpression enhanced cell proliferation compared with isoform I overexpression in OSCC cells (Figure 2A). Therefore, we consider that isoform I is not essential for OSCC cell proliferation and tumorigenesis, and we mainly focus on isoform II in this study.

      Author response image 2.

      The knockdown efficiency of RUNX2 isoform I (anti-isoform I, si-I-1, si-I-2, si-I-3, si-I-4) in OSCC cells was analyzed by RT-PCR; 18S rRNA served as a loading control. The sequences of siRNAs are as follows: 5’ GGCCACUUCGCUAACUUGU 3’ (si-I-1), 5’ GUUCCAAAGACUCCGGCAA 3’ (si-I-2), 5’ UGGCUGUUGUGAUGCGUAU 3’ (si-I-3), and 5’ CGGCAGUCGGCCUCAUCAA 3’ (si-I-4).

      A major conclusion of this study is that Iso II expression suppresses ferroptosis. To support this idea, the authors use the inhibitor Ferrostatin-1 (Fer-1). While Fer-1 typically does not lead to a 100% rescue, here the effect is only marginal and, as shown in Figures 3F and G, only marginally better than Z-VAD or Necrostatin 1. These data do not support the idea that the major cause of cell death is ferroptosis. Instead, Iso II silencing leads to cell death through different pathways. The authors should acknowledge this and rephrase the conclusion of the paper accordingly. Moreover, the authors consistently confound cell proliferation with cell death.

      We agree that RUNX2 isoform II knockdown could also induce apoptosis. We have revised the description in the title and abstract as follows:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”.

      We apologize for confusing cell proliferation with cell death. We have checked the whole manuscript and corrected the mistakes.

      In Fig. 4A the authors investigate GPX1 expression, whereas GPX4 is often the key ferroptosis regulator; this has to be tested. This is important as the authors also test the effect of the GPX4 inhibitor RSL3; however, the authors do not determine IC50 values of the different cell lines with or without Iso II overexpression or silencing or compared to other RSL3 sensitive or resistant cells. Without this information, no conclusions can be drawn.

      We greatly appreciate the reviewer’s comments. We have performed a new experiment to analyze the effect of isoform II on GPX4 expression. We find that isoform II knockdown decreases the expression of GPX4 mRNA and protein (Figure 4-figure supplement 1A and B), and conversely that isoform II overexpression promotes GPX4 expression (Figure 4-figure supplement 1C and D), which is consistent with the inhibition of ferroptosis by RUNX2 isoform II. Knockdown of HOXA10, an upstream positive regulator of RUNX2 isoform II, also inhibited the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).

      We also performed a new experiment to determine the IC50 values of RSL3 in cells with or without isoform II overexpression or silencing. We find that isoform II overexpression elevates the IC50 of RSL3 (Figure 3-figure supplement 8A), whereas isoform II knockdown decreases it (Figure 3-figure supplement 8B).
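
      For readers unfamiliar with how an IC50 is read off a dose-response series, the following is a minimal sketch, not the authors' method. It assumes viability is expressed as a fraction of the untreated control and estimates the IC50 by linear interpolation on a log10 dose axis. The function name `estimate_ic50` and all dose and viability values are hypothetical and chosen only for illustration.

```python
import math


def estimate_ic50(doses, viabilities):
    """Estimate IC50 by linear interpolation on a log10(dose) axis.

    doses: increasing, strictly positive concentrations (e.g. in uM).
    viabilities: fraction of untreated control (1.0 = 100 %) at each dose.
    Returns None if the series never crosses 50 % viability.
    """
    points = list(zip(doses, viabilities))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        # Find the first pair of adjacent doses that brackets 50 % viability.
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None


# Hypothetical RSL3 dose-response values, for illustration only.
doses_um = [0.01, 0.1, 1.0, 10.0]
control = [0.95, 0.80, 0.30, 0.05]
overexpression = [0.97, 0.90, 0.60, 0.10]

ic50_control = estimate_ic50(doses_um, control)
ic50_overexpression = estimate_ic50(doses_um, overexpression)
print(f"control: {ic50_control:.3f} uM, overexpression: {ic50_overexpression:.3f} uM")
```

      In this toy series the curve for the overexpressing cells crosses 50% viability at a higher dose, which is the direction of shift the response describes. A full analysis would more commonly fit a sigmoidal (four-parameter logistic) model to replicate measurements.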

      We have added the description of these experiments in the Results sections “RUNX2 isoform II suppresses ferroptosis”, “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis through RUNX2 isoform II” as follows:

      RUNX2 isoform II suppresses ferroptosis: “Isoform II overexpression could elevate the IC50 values of RSL3 (Figure 3-figure supplement 8A), in contrast, isoform II-knockdown decreased the IC50 values of RSL3 (Figure 3-figure supplement 8B).”.

      RUNX2 isoform II promotes the expression of PRDX2: “Firstly, we found that RUNX2 isoform II-knockdown or overexpression could downregulate or upregulate the expression of GPX4 mRNA and protein, respectively (Figure 4-figure supplement 1A-D). In addition to the GPX4, we found that PRDX2 is the most significantly down-regulated gene upon isoform II-knockdown in CAL 27 (Figure 4A).”.

      HOXA10 inhibits ferroptosis through RUNX2 isoform II: “In addition, HOXA10-knockdown could suppress the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).”.

      In summary, while the authors show that RUNX2 Iso II expression enhances cell survival, the idea that cell death is principally via ferroptosis is not fully established by the data. The authors should modify their conclusions accordingly.

      We agree that RUNX2 isoform II could enhance cell survival by suppressing both ferroptosis and apoptosis. We have revised the description in the title and abstract as follows:

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”

    1. eLife Assessment

      This important study offers a molecular characterization of neurons and glia in the adult nervous system of the fruit fly Drosophila melanogaster. The study focuses on the progeny of a specific set of neural stem cells that contribute to the central complex, a conserved brain region that plays key roles in sensorimotor integration. The data are convincing and collected using validated methodology, generating an invaluable resource for future studies. The study will be of interest to developmental neurobiologists.

    2. Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glia clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glia and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, a mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neurons and glial clusters is quite interesting, however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

      Comments on revisions:

      The authors have addressed my recommendations.

    3. Reviewer #2 (Public review):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

      Comments on revisions:

      We appreciate the authors' efforts to address some of the comments. While these revisions have improved the clarity of certain sections, some of the larger concerns remain unaddressed. Specifically, the manuscript still lacks the additional analyses that would allow for more specific conclusions, rather than the general observations currently presented. Although the revisions have certainly made the text clearer, the core issue of needing more detailed analysis to draw more concrete conclusions still stands.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glia clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glia and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Thanks for the clear and accurate summary of our findings.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Thank you for the positive comments.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, a mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neurons and glial clusters is quite interesting, however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

      We agree that we do not provide mechanistic or functional insights. Our goal was to produce hypothesis-generating datasets that our lab and others can use to direct functional or mechanistic studies.

      Reviewer #2 (Public review):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      Thank you for the positive comments.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

      Thank you for the positive comments. We too hope that future research will benefit from this dataset.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) In Figures 1E and 4A, the T1 and T2 glia subsets reveal sub-clusters for several cell types as seen by the distribution of points on the UMAP. This observation is never validated or discussed. Do these sub-clusters represent true differences in identities or are they artifacts of the single-nucleus preparation? For Figure 1E, it is not clear whether specific sub-clusters (see Ensheathing-4 vs Ensheathing-5 and Astrocyte-2 vs. Astrocyte-6) are differentially enriched between the T1 and T2 lineages. The existence of these sub-clusters must be discussed or dismissed.  

      We agree that this needs to be addressed more clearly in the manuscript and have made text changes in the Results and Discussion sections to clarify. We note that a recent glial cell atlas (Lago-Baldaia et al., 2023: PMID: 37862379) of the developing fly VNC and optic lobes found sub-clusters that mapped to the same subtype annotations. Interestingly, Lago-Baldaia and colleagues found that the transcriptional diversity of glia cell types did not match the morphological diversity of glia validated in vivo. See text changes below.

      Lines 131-133: “Similar to a previous glial cell atlas (Lago-Baldaia et al., 2023) we found some glial subtypes (astrocytes, ensheathing, and subperineurial) mapped to multiple clusters (Figure 1E, 1F).”

      Lines 206-208: “In line with our T1+T2 atlas and previous glia cell atlas (Lago-Baldaia et al., 2023), some subtypes mapped to several subclusters including ensheathing, astrocytes, and chiasm (Figure 4A-B).”

      Lines 397-401: “Similar to a recent glial cell atlas (Lago-Baldaia et al., 2023), we found glial subtypes like astrocytes, ensheathing, and subperineurial glia mapped to several sub-clusters (Figure 1E-F). It remains unclear if these sub-clusters with the same cell type annotation represent distinct glial identities or different transcriptional states within these populations.”

      (2) The authors present evidence for sex-specific neuronal and glia subtypes and find differential expression of specific yolk proteins and long non-coding RNAs. However, whether any of these differences are driven by other canonical sex-specific genes such as Fruitless (Fru) or Double-sex (Dbx) has not been reported or discussed. The authors must re-analyze their data for these genes and claim whether they have any contribution to sex-specific sub-clusters.

      Thank you for pointing this out. We have made text changes and clarifications to highlight the expression of other canonical sex-specific genes. Fru was enriched in male nuclei as expected. Interestingly, dbx was enriched in female nuclei. It remains to be determined whether these genes drive the sex-specific changes.

      Lines 224-226: “Additionally, female nuclei were enriched for dbx (Supp Table 8). Male glial nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5C; Supp Table 8) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 237-239: “Male nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5G; Supp Table 9) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 428-431: “We found the expected differential expression of yolk proteins (yp1, yp2, yp3) enriched in female nuclei and the long non-coding RNAs rox1/2 and fru enriched in male neuronal nuclei (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997; Warren et al., 1979). Interestingly, we found dbx to be enriched in both glial and neuronal female nuclei.”

      Lines 433-435: “It remains to be determined if these genes are driving these sex-specific differences in glia and neurons.”

      (3) In Figure 6C, it is unclear whether the Ms-2A-LexA-expressing neurons of clusters 157 and 160 project to two different neuropils or share projections to both neuropils. However, it is not explicitly shown in the immunostaining data whether indeed there are two populations to begin with. The authors must check for cluster 157 and 160 specific markers (such as Dh44 and ple) and test whether they appear mutually exclusively in the Ms-2A-LexA-expressing neurons. The same reasoning would apply to the data shown in Figures 6D and 6E, where the authors must test whether the NPF and AstA expressing cells are indeed neurons from clusters 100 and 128, using orthogonal cluster markers to conclude that they are similar (or the same) neurons.

      We changed the focus of the paragraph to confirm that these neurons indeed derive from type II neuroblasts and that they target the central complex. Although the lack of reagents prevents us from testing the identity of each of these neurons, the staining still allows meaningful interpretation and supports our ideas about neuropeptidergic cells in the central complex. We state this limitation of the experiment explicitly to avoid unsupported conclusions.

      Minor points

      (1) Line 115 - "cluster that represents optic lobe neurons". How was this cluster identified?

      We reexamined the most significantly enriched genes in this cluster (cluster 124) and found that they are Rh2, ninaC, trpl, and other phototransduction-related genes (Supplemental Table 1). We reassigned the identity of this cluster as ocelli, which also express photoreceptor genes but cannot be easily removed during dissection. We modified the text as follows:

      "We used known markers (Croset et al., 2018; Davie et al., 2018; Supp Table 2) to identify distinct cell types in the central brain, including glia, mushroom body neurons, olfactory projection neurons, clock neurons, Poxn+ neurons, serotonergic neurons, dopaminergic neurons, octopaminergic neurons, corazonergic neurons, hemocytes, and ocelli (Figure 1B, Supp. Table 1)."

      (2) As the separation in Figure 1B is not obvious, annotated cell type clusters must be re-colored instead of being labelled as the exact dots are indistinguishable. This would especially be helpful for OCTY, SER, OPN, and CLK clusters.

      (3) Cluster labels in Figure 1C are barely visible and the font size must be increased for the reader. Recoloring the cluster identities and attaching a legend would again help in this case.

      We recolored the atlas in Figure 1B, 1C and 1C’ and increased the font size in Figure 1C’.

      (4) For Figure 4A, clusters should be labelled on the UMAP along with the legend as it is difficult for the reader to match identities using Seurat colors. The same is true for the UMAPs in Figure 5A.

      Yes, we agree that labeling would improve readability and have done so for UMAPs in Figure 4A and 5A-A’’.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Epiney et al. present a single-nucleus sequencing analysis of adult central brain neurons and glia. Through the use of an ingenious permanent labeling technique, they are able to trace the progeny of T2 neuroblasts, which contribute significantly to the formation of the central complex. This transcriptomic dataset is the first of its kind and will likely serve as a valuable resource for future studies.

      The authors further explore this dataset through several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. However, the approach used to identify the identity of each neuron cluster could be more clearly articulated, and some of the authors' conclusions are more generalized - either already well-established or lacking sufficient support.

      Detailed comments:

      Abstract - "Our data support the hypothesis that each transcriptional cluster represents one or a few closely related neuron subtypes." - Is this a novel finding? If so, it would be helpful if the authors could explain why this is the case more clearly.

      Our results are not generally novel, and many single cell/single nuclei RNA-seq papers have been published (more citations added to Introduction). Our work is novel in that we analyze Type 1 and Type 2 neuroblasts in the central brain.

      Line 53 - In the introduction the authors should also reference other single-cell studies done in the Drosophila brain.

      Done.

      Line 59 - There are some typos here. The authors could also mention type zero.

      Both done.

      Figure 1 and Sup Table 1 - Authors show in sup table 1 the top cell markers by cluster but there is no correspondence between cluster number and identity. The authors do not say which known markers were used to give the identity to each cluster.

      We have added the cell identity to Supplemental Table 1. For unknown cells, we left the column blank. We have also added Supplemental Table 2 to show the markers we used to assign identities to the clusters.

      Supplementary Tables - For each table, more detailed information should be provided regarding what is being compared and the methods used for these comparisons.

      We have added the methods we used in Seurat to generate each individual table.

      Line 138 - Differential gene expression analysis between T1 and T2 glial progeny did not show differences across any glial cell types (Supp Table 4). - Was this comparison done per cluster? Is differential gene expression of top markers, which are anyway the genes that define each glial cell type, enough for this type of analysis?

      Yes, we performed the differential expression analysis using all genes (i.e., not just marker defining) at a cluster-by-cluster resolution with results in Supplemental Table 4. We have edited the text to make this clarification.

      Lines 139-141: “Differential gene expression analysis for all genes between T1 and T2 glial progeny did not show differences across any glial cell types or clusters (Supp Table 4).”

      Line 146 - We identified T1-derived neurons by excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters. - Bioinformatically, correct?

      Yes. We clarified the sentence as follows:

      "We identified T1-derived neurons by bioinformatically excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters."
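
      The exclusion step described in this sentence can be sketched as a simple filter. This is an illustrative reading under stated assumptions, not the authors' pipeline: nuclei are assumed to be plain dictionaries of marker counts, a nucleus is treated as T2-derived only if all three lineage markers are detected, and glial clusters are assumed to have been identified beforehand by repo expression. The names `is_t2_derived`, `select_t1_neurons`, and the toy nuclei are hypothetical.

```python
T2_MARKERS = ("FLP", "GFP", "RFP")  # lineage reporters named in the response


def is_t2_derived(nucleus, min_count=1):
    """Treat a nucleus as T2-derived only if it co-expresses all three markers."""
    counts = nucleus["counts"]
    return all(counts.get(marker, 0) >= min_count for marker in T2_MARKERS)


def select_t1_neurons(nuclei, glial_clusters):
    """Keep nuclei that are neither T2-derived nor members of a glial cluster."""
    return [
        n for n in nuclei
        if not is_t2_derived(n) and n["cluster"] not in glial_clusters
    ]


# Toy nuclei with made-up counts and cluster IDs, for illustration only.
nuclei = [
    {"id": "n1", "cluster": 3, "counts": {"FLP": 2, "GFP": 5, "RFP": 1}},  # T2-derived: dropped
    {"id": "n2", "cluster": 7, "counts": {"GFP": 1}},                      # one marker only: kept
    {"id": "n3", "cluster": 12, "counts": {"repo": 9}},                    # glial cluster: dropped
    {"id": "n4", "cluster": 7, "counts": {}},                              # kept
]
glial_clusters = {12}  # assumed to be the repo+ clusters

t1_neuron_ids = [n["id"] for n in select_t1_neurons(nuclei, glial_clusters)]
print(t1_neuron_ids)
```

      Whether a nucleus expressing only one or two of the markers should count as T2-derived is a design choice that the sentence above does not settle. The sketch takes "co-expressing" literally.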

      Line 156 - We found that each cluster strongly expressed a unique combination of genes. - As they are grouped by seurat in different clusters, why is this surprising?

      Line 175 - "top 10 significantly enriched genes gathered from each T2 neuron cluster" - can these lists be included?

      Yes, they are grouped by Seurat. We toned down the sentence and now refer to each combination of genes as cluster markers. We modified the sentence as follows:

      Each unique combination of enriched genes could be referred to as cluster markers.

      Line 211- How did the authors identify sex-biased clusters? How did the authors separate the samples/cells by sex? Was it done bioinformatically by the expression of certain genes? If so, which?

      We collected male and female nuclei separately. We have added text in the methods section as follows:

      "Equal amounts of male and female central brains (excluding optic lobes) were dissected at room temperature within 1 hour. The samples were flash-frozen in liquid nitrogen and stored separately at -80°C.

      In the first round, we pooled male and female brains together to select GFP+ nuclei and used particle-templated instant partitions to capture single nuclei and generate a cDNA library (Fluent BioSciences, Watertown, MA). In the second and third rounds, RFP+ nuclei from male and female brains were collected separately. The split-pool method was then used to generate barcoded cDNA libraries from each individual nucleus."
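
      Because male and female nuclei were collected separately, each nucleus carries a sex label, and sex bias per cluster can be summarized as a normalized fraction. The sketch below is one plausible way to do this, not the authors' analysis. It assumes both sexes are present and corrects for unequal numbers of nuclei per sex before comparing. The function name and the toy assignments are hypothetical.

```python
from collections import Counter


def male_fraction_by_cluster(assignments):
    """Return, per cluster, the male share after normalizing for library size.

    assignments: list of (cluster_id, sex) tuples with sex in {"M", "F"}.
    A value of 0.5 means no bias; values near 1.0 or 0.0 indicate
    male-biased or female-biased clusters respectively.
    """
    totals = Counter(sex for _, sex in assignments)  # nuclei per sex
    per_cluster = Counter(assignments)               # nuclei per (cluster, sex)
    result = {}
    for cluster in sorted({c for c, _ in assignments}):
        male = per_cluster[(cluster, "M")] / totals["M"]
        female = per_cluster[(cluster, "F")] / totals["F"]
        result[cluster] = male / (male + female)
    return result


# Toy cluster assignments, for illustration only.
assignments = [
    (1, "M"), (1, "M"), (1, "M"), (1, "F"),
    (2, "M"), (2, "F"), (2, "F"), (2, "F"),
]
bias = male_fraction_by_cluster(assignments)
print(bias)
```

      A real analysis would add a statistical test and replicate structure before calling a cluster sex-biased.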

      Are there sex-specific differences in genes in glia other than genes that were previously known to be sex-specific?

      We report the comprehensive list of sex-specific differences in gene expression for both glia and neurons in Supp tables 8 and 9.

      Line 237 - When the authors mention "We conclude that male and female adult T2 neurons have sex-specific differences in gene expression within the same neuronal subtype" does this mean that these neurons are the same in male and in female brains, but they additionally specifically express sex-specific genes?

      Yes, we report that males and females contain the same neurons, as defined by their transcriptional profiles. It remains to be seen whether these sex-specific differences change how the same neuronal subtypes function in males versus females. We have added text to the Discussion to expand on this thought.

      Lines 437-441: “It remains to be determined if these genes are driving sex-specific differences within glial and neuronal subtypes. These genes may reflect sex-specific differences in the adult central brain and may provide insight into how behavioral circuits are linked to sex-specific behaviors. Future work should aim to characterize and test these genes.”

      Line 250 - The idea behind these sections "What is the relationship between neuropeptide expression and cluster identity?" "relation between cluster and morphology" lacks clarity. As clusters are defined based on principal component analysis, and the genes used to define a cluster are dependent on this method, there is no assumption that each cluster represents only one type of neuron or that it should include only neurons expressing the same neurotransmitter genes. Even if some clusters consist of a single neuron type, this should not be generalized to all clusters (and vice-versa).

      Correct, we cannot determine from the transcriptome data whether distinct clusters will have different morphology. We have changed the focus of the question to address that we are confirming they come from type 2 and that they target the central complex while comparing to known cells that express the neuropeptide.

      Line 265 - We first assayed the neuronal morphology of Ms+ neurons - why did the authors choose these neurons?

      Resolved in main text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 268 - "Currently we can't determine whether Ms+ neurons in clusters 157 and 160 project to different CX neuropils, or whether neurons from both clusters share projections into both neuropils. " - The purpose of this point is unclear.

      Resolved in text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 279 - This analysis could be explored further.

      Thank you for your feedback. As the comment was somewhat broad, we were unsure of the specific revisions needed and have therefore left the text unchanged.

      Line 301 - The text regarding this section, and the description and details of respective figures should be proofread to ensure clarity.

      Done.

      Line 386 - Alternatively, co-expression may be due to background from RNAs released during dissociation. - RNA in soup could be bioinformatically analysed.

      Correct. We opted to delete this sentence since our split-pool based method does not create background RNA expression. Additionally, the analysis is performed on scaled expression >2, and any background RNA is unlikely to yield such high expression.

      Discussion - Some of the conclusions are a bit too general, suggesting that the results might be meaningful, but also acknowledging the possibility of artifacts. If the authors could refine this, it would strengthen the manuscript.

      We are sorry but we are uncertain what you are asking; we don't know what you want us to refine. Our apologies for the misunderstanding.

    1. eLife Assessment

      This manuscript presents an important contribution to the field of single-cell transcriptomic analysis in cancer by introducing a novel computational framework, SCellBOW, which applies embedding techniques from natural language processing to model phenotypic heterogeneity in tumors. The revised version includes new validation experiments and significant clarifications that provide convincing evidence for the method's utility. The authors have benchmarked SCellBOW across diverse datasets, including glioblastoma, breast, and metastatic prostate cancer, and have demonstrated its superior performance compared to existing state-of-the-art methods.

    2. Reviewer #2 (Public review):

      Summary:

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Weaknesses:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissues.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      This review evaluates the SCellBOW framework, which applies phenotype algebra to obtain vectors from cancer subclusters or user-defined subclusters.

      Strengths:

      SCellBOW employs an innovative application of NLP-inspired techniques to analyze scRNA-seq data, facilitating the identification and visualization of phenotypically divergent cell subpopulations. The framework demonstrates robustness in accurately representing various cell types across multiple datasets, highlighting its versatility and utility in different biological contexts. By simulating the impact of specific malignant subpopulations on disease prognosis, SCellBOW provides valuable insights into the relative risk and aggressiveness of cancer subpopulations, which is crucial for personalized therapeutic strategies. The identification of a previously unknown and aggressive AR−/NElow subpopulation in metastatic prostate cancer underscores the potential of SCellBOW in uncovering clinically significant findings.

      Major concerns:

      The reliance on bulk RNA-seq data as a reference raises concerns about potentially misleading results due to the presence of RNA expression from immune cells in the TME. It is unclear if SCellBOW adequately addresses this issue, which could affect the accuracy of the cancer subcluster vectors.

      We appreciate the reviewer's concerns. To address the concern about potentially misleading results due to the TME when using bulk RNA-seq data as a reference:

      a. We account for systematic biases between the single-cell and bulk transcriptomics readouts by creating pseudo-bulk profiles for single-cell clusters, enabling more accurate comparisons [Section Materials and methods, Data preparation for phenotype algebra].

      b. We encode expressions into word vectors and co-embed them together. By doing this, we mitigate any possibility of systematic differences in the embedding. It is imperative that we subject both single-cell and bulk data to the same treatment because, otherwise, it would be difficult to perform algebraic operations on them [Section Materials and methods, Generating vectors for phenotype algebra].

      c. In our new analysis of the tumor microenvironment, we have shown that SCellBOW effectively differentiates between malignant and non-malignant cells, confirming that it is not biased by the immune cell composition in the bulk RNA-seq data [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].
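
      As a rough illustration of point (a), a pseudo-bulk profile can be built by summing the raw counts of all cells in a cluster and rescaling to a common library size. The sketch below assumes counts-per-million scaling and uses a hypothetical helper, `pseudo_bulk`, with made-up counts and cluster names; it is not taken from the SCellBOW code base, and the normalisation used there may differ.

```python
from collections import defaultdict


def pseudo_bulk(counts, cluster_labels):
    """Sum per-cell counts within each cluster and scale to counts per million.

    counts: list of per-cell count vectors (one list of gene counts per cell).
    cluster_labels: cluster id for each cell, in the same order as `counts`.
    Returns a dict mapping cluster id -> pseudo-bulk profile (list of floats).
    """
    n_genes = len(counts[0])
    totals = defaultdict(lambda: [0.0] * n_genes)
    for cell, label in zip(counts, cluster_labels):
        for gene_index, value in enumerate(cell):
            totals[label][gene_index] += value

    profiles = {}
    for label, summed in totals.items():
        library_size = sum(summed)
        # Guard against clusters with zero total counts.
        scale = 1e6 / library_size if library_size > 0 else 0.0
        profiles[label] = [value * scale for value in summed]
    return profiles


# Toy data: three cells, three genes, two hypothetical clusters.
cells = [[10, 0, 5], [8, 2, 5], [0, 20, 10]]
labels = ["CL0", "CL0", "CL1"]
profiles = pseudo_bulk(cells, labels)
```

      Each resulting profile sums to one million, so cluster-level and bulk-level vectors sit on the same scale before any embedding step.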

      The method of extracting vectors in phenotype algebra appears to be a straightforward subtraction operation. This simplicity might limit its efficiency in excluding associations with phenotypes from specific subpopulations, potentially leading to inaccurate interpretations of the data.

      Thanks for this excellent query. Vector algebra operations are not performed in the gene expression space (i.e., on gene expression vectors associated with tumor samples); rather, we process the single-cell and bulk expression profiles through multiple steps (pseudo-bulk vector generation for single-cell clusters, mapping gene expression values to word frequencies as better understood by the Doc2vec neural networks, etc.) to ensure that their embeddings are consistent and capture intricate phenotypic information. We have demonstrated this through rigorous validation of the clusters yielded on various types of healthy and diseased samples. Furthermore, we have demonstrated the consistency of the vector algebra operations on known cancer subtypes in breast cancer, glioblastoma, and prostate cancer. We have clarified this further in the text. [Section Materials and methods, ‘Generating vectors for phenotype algebra’, ‘Survival risk attribution’].
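
      To make the idea of mapping gene expression values to word frequencies more concrete, the sketch below shows one simple way such a mapping could work: each gene name is repeated in a "document" in proportion to its expression. The function `expression_to_document`, the `scale` parameter, and the example values are all hypothetical; the response does not specify the scaling SCellBOW actually uses.

```python
def expression_to_document(expression, scale=1.0):
    """Convert {gene: expression} into a list of tokens.

    Each gene name is repeated round(value * scale) times, so that highly
    expressed genes appear as frequent "words" in the resulting document.
    Genes are visited in alphabetical order to keep the output deterministic.
    """
    document = []
    for gene, value in sorted(expression.items()):
        repeats = int(round(value * scale))
        document.extend([gene] * repeats)
    return document


# Illustrative (made-up) normalised expression values for one cell.
doc = expression_to_document({"AR": 3.2, "KLK3": 1.0, "SYP": 0.2})
```

      A document of this kind can then be passed to any document-embedding model; genes whose scaled expression rounds to zero simply do not appear.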

      The review would benefit from additional validation studies to assess the effectiveness of SCellBOW in distinguishing between cancerous and non-cancerous signals, particularly in heterogeneous tumor environments.

      We thank the reviewer for advising this additional validation. While our study primarily focused on signals from malignant cells, we have now considered the impact of the tumor microenvironment. We observed that the predicted risk score increases when the immune component is subtracted from the tumor, suggesting that tumor aggressiveness increases in the absence of immune components. Importantly, the aggressiveness ranking of tumor subtypes (NE > ARAL > ARAH) remained consistent, confirming that SCellBOW effectively preserves subtype-specific risk stratification [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].

      Further clarification on how SCellBOW handles mixed-cell populations within bulk RNA-seq data would strengthen the evaluation of its applicability and reliability in diverse research settings.

      We really appreciate the reviewer’s observation. We clarify that rather than relying on absolute gene expression values, SCellBOW maps bulk RNA-seq data into an embedding space, where we extract the latent representation of the tumor. This process effectively masks the influence of mixed-cell populations, reducing biases introduced by immune or stromal components. Furthermore, phenotype algebra operates within this embedding space by comparing cosine similarities between latent representations of bulk and pseudo-bulk datasets, rather than using direct gene expression values. This allows SCellBOW to capture biologically meaningful relationships and infer tumor-specific signals effectively, even in the presence of heterogeneous cell populations. Our benchmarking across diverse cancer types confirms its effectiveness [Section Results, ‘SCellBOW enables pseudo-grading of metastatic prostate cancer tumor microenvironment’, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].
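
      The operation described here can be sketched in a few lines: subtract a cluster embedding from the tumor embedding, then compare the residual with a reference embedding by cosine similarity. All vectors below are made-up three-dimensional toys and the helper names are hypothetical; real embeddings are higher-dimensional and are learned by the model, so this is only an illustration of the arithmetic.

```python
import math


def subtract(u, v):
    """Element-wise difference u - v of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


tumor = [1.0, 2.0, 0.0]      # hypothetical embedding of the total tumor
cluster = [0.0, 2.0, 0.0]    # hypothetical embedding of one cell cluster
reference = [1.0, 0.0, 0.0]  # hypothetical embedding of a bulk sample

residual = subtract(tumor, cluster)           # "total tumor - cluster"
similarity = cosine_similarity(residual, reference)
```

      Because cosine similarity depends only on direction, the comparison is insensitive to the overall magnitude of the vectors.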

      Reviewer #2 (Public Review):

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single-cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Major concerns:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissues.

      (1) We completely agree with the reviewer’s view. Our selection of glioblastoma and breast cancer for this study was primarily driven by the focus on extensively studied and well-defined cancer types. To demonstrate the effectiveness of our model, we tested it on advanced prostate cancer, which currently lacks clinically relevant and well-recognized stratification methods. This application to metastatic prostate cancer serves as a proof of concept, illustrating our model's potential to provide valuable insights into cancer types where established stratification approaches are limited or absent.

      (2) Regarding the application of our tool to spatial transcriptomics, we have already analyzed data from Digital Spatial Profiling (DSP). The article is already quite complex and involved, and we are afraid the inclusion of spatial transcriptomics may amount to a significant extension of the method. To this end, although we will discuss the future possibilities, we will skip the method validity check on spatial transcriptomics data.

      Reviewer #2 (Recommendations For The Authors):

      (1) "SCellBOW adapts the popular document-embedding model Doc2vec for single-cell latent representation learning, which can be used for downstream analysis...": Using only simple gene frequency might overlook the dependent relationships between genes, potentially compromising the biological significance. This could be discussed further.

      This is an excellent point raised by the reviewer. We acknowledge that using only simple gene frequency may overlook dependent relationships between genes, potentially compromising biological significance. To address this, we have now evaluated SCellBOW on the specific task of phenotype algebra and demonstrated its effectiveness in capturing meaningful biological relationships that are overlooked by simple gene frequency. We have added the results of this comparison, which show that gene expression data alone are insufficient for accurate risk stratification [Section Overall discussion, Supplementary Note 7, Supplementary Fig. 8i-k].

      (2) "While existing methods effectively reveal the subpopulations, they are insufficient in associating malignant risk with specific cellular subpopulations identified from scRNA-seq data....": Perhaps I missed it in the methods section, but how does SCellBOW compare to simply performing pseudobulk analysis on separate cell clusters, treating them as bulk RNA-seq, and then associating the signatures with disease prognosis?

      This is an insightful point, and we appreciate the opportunity to clarify it.

      (1) While pseudobulk analysis on separate cell clusters, followed by associating their signatures with disease prognosis, is a common approach, SCellBOW achieves this without requiring a priori knowledge of prognostic biomarkers to determine whether a subpopulation is aggressive.

      (2) Moreover, pseudobulk analysis aggregates gene expression across cells, which can potentially mask intra-cluster heterogeneity, thereby obscuring important signatures associated with disease prognosis. In contrast, the latent representation in SCellBOW captures the semantic meaning of disease aggressiveness, allowing for a more nuanced and biologically meaningful risk assessment.

      (3) "The proposed approach, SCellBOW, can effectively capture the heterogeneity and risk associated with each phenotype, enabling the identification and assessment of malignant cell subtypes in tumors directly from scRNA-seq gene expression profiles, thereby eliminating the need for marker genes...": Have the author compared the resulting group with well-known markers and do they overlap?

      We appreciate this thoughtful question. While SCellBOW does not rely on predefined marker genes for clustering or risk stratification, we have systematically evaluated whether the resulting subpopulations align with well-known markers. To assess this, we compared SCellBOW-derived clusters with established marker-based annotations across multiple datasets. We observed a significant overlap between SCellBOW clusters and canonical marker-defined cell types in various cancers, including GBM, BRCA, and mCRPC.

      (4) "We constructed three use cases leveraging publicly available scRNA-seq datasets...": The three training and testing datasets are all from healthy tissue. How about in tumor tissue? i.e., Could SCellBOW also identify better cell clusters in tumor datasets?

      We appreciate the reviewer’s inquiry. For benchmarking and method validation, we primarily selected normal tissue datasets as they are heavily annotated and well-characterized. Our goal was to extensively evaluate SCellBOW across different clustering metrics, including ARI, NMI, and SI, which required datasets with reliable ground truth. Tumor datasets, in contrast, often lack confirmatory ground truth, making direct benchmarking more challenging. However, to assess SCellBOW’s applicability in tumor settings, we performed downstream analyses on tumor scRNA-seq datasets using phenotype algebra. Our results demonstrate that SCellBOW effectively identifies distinct cell clusters, including malignant and non-malignant populations, reinforcing its applicability in tumor settings [Section Results, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].
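
      For readers unfamiliar with the clustering metrics named above, the following is a minimal standard-library sketch of the adjusted Rand index (ARI), which is why annotated ground-truth labels are needed for this kind of benchmarking. The function name and toy labels are illustrative; in practice an established implementation would normally be used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pairs of items that share a cluster in both labelings, and in each one.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = pairs_true * pairs_pred / total_pairs
    maximum = 0.5 * (pairs_true + pairs_pred)
    if maximum == expected:
        return 1.0  # both partitions are trivial (all-in-one or all singletons)
    return (pairs_both - expected) / (maximum - expected)


# Same partition with swapped label names versus a maximally discordant one.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
different = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

      The index is invariant to how clusters are named and is corrected for chance, so a random assignment scores close to zero rather than some positive baseline.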

      Minor issues:

      (1) Labels of subplots within the manuscript/figures should be revised to ensure correct order (missing Figures 3a-d, 4b before 4a, etc).

      We thank the reviewer for pointing this out. We have corrected the figure labels and ensured that all subplots follow the correct order, aligning with the manuscript.

      (2) "reaffirmed the clinically known aggressiveness order, i.e., CLA >-MES >-PRO, where CLA succeeds the rest of the subtypes in aggressiveness48 (Figures 4c, d)...": "Fig. 4c, d" should be "Fig. 4e, f". Also please put Figure 4a before 4b. Overall, the order of Figure 4 needs to be revised to match the order in the manuscript. The same applies to Figure 6.

      We have corrected the figure reference to Fig. 4e, f and revised the order of Figure 4 to maintain consistency with the manuscript.

      (3) "Our results showed that SCellBOW learned latent representation of single-cells accurately captures the 'semantics' associated with cellular phenotypes and allows algebraic operations such as '+' and '-'." Figure 5f (SCellBOW performances on mCRPC) should also be cited here since Supplementary Figure 6 contains three datasets (GBM, BRCA, mCRPC) while in Figure 4 only GBM and BRCA were shown?

      We thank the reviewer for this suggestion. We have now cited Figure 5f in this section to ensure that all datasets, including mCRPC, are appropriately referenced.

      (4) Under the subheading "SCellBOW facilitates survival-risk attribution of tumor subpopulations", the lines starting with "We refer to this as phenotype algebra. We utilized this ability to find an association between the embedding vectors, representing total tumor - a specific malignant cell cluster with tumor aggressiveness..." could be shortened, especially the re-introduction of phenotype algebra, since the authors have already discussed it previously (under "overview of SCellBOW").

      We appreciate the feedback and have condensed this section to avoid redundancy while maintaining clarity in connecting phenotype algebra to survival-risk attribution.

      (5) "Most CD4+ T cells map to CL0 and CL9 (here, CL is used as an abbreviation for cluster) (Figure 3f)..." "(here, CL is used as an abbreviation for cluster)" this note could be moved forward to SF2 since CL is first introduced in SF2.

      We thank the reviewer for the suggestion. We have moved the definition of CL (cluster) to Supplementary Figure 2 (SF2), where it is first introduced, for improved clarity.

    1. eLife Assessment

      The submission by Praveen and colleagues reports important findings describing the structure of genetic and colour variation in its native range for the globally invasive weed Lantana camara. Whilst the importance of the research question and the scale of the sampling is appreciated, the analysis, which is currently incomplete, requires further tests to support the claims made by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

    3. Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the output of the modelling analysis is largely qualitative and cannot be directly compared with the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-established results, and such findings themselves are not novel, although they could become interesting if they were quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNP data would be more interesting. For example, with ABC-based methods, the authors could infer the split time between the subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result would support the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses. Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate. Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation. Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested. Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively? I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

    4. Author response:

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have addressed several of the concerns in the responses below and are currently working on additional analyses to further strengthen the study. These results will be incorporated into the final version of the research paper.

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.
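
      The claim that a few generations of selfing suffice to produce inbred lines follows from the standard recursion for the inbreeding coefficient under mixed mating, F(t+1) = s(1 + F(t))/2, where s is the selfing rate. The sketch below iterates this recursion; the function names are illustrative, and the model assumes a single neutral locus in a large population, which is a simplification rather than a description of Lantana camara specifically.

```python
def inbreeding_trajectory(selfing_rate, generations, f0=0.0):
    """Iterate F(t+1) = s * (1 + F(t)) / 2 and return [F(0), ..., F(generations)]."""
    f = f0
    trajectory = [f]
    for _ in range(generations):
        f = selfing_rate * (1.0 + f) / 2.0
        trajectory.append(f)
    return trajectory


def equilibrium_inbreeding(selfing_rate):
    """Fixed point of the recursion above: F = s / (2 - s)."""
    return selfing_rate / (2.0 - selfing_rate)


# Complete selfing starting from an outbred (for example, hybrid) individual.
full_selfing = inbreeding_trajectory(1.0, 5)
# Heterozygosity remaining relative to the starting generation is 1 - F.
remaining_heterozygosity = [1.0 - f for f in full_selfing]
```

      Under complete selfing the remaining heterozygosity halves every generation, so only about 3% is left after five generations in this idealised model.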

      While a recent bottleneck may have increased inbreeding, the strong and consistent genetic structuring we observe within populations is more indicative of predominant self-fertilization. To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are often influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has been subject to other evolutionary forces, such as a genetic bottleneck and expansion due to its invasive nature, the influence of the mating system on the observed genetic patterns is remarkably strong, resulting in a clear and consistent genetic structure across populations.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the output of the modelling analysis is largely qualitative and cannot be directly compared with the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-established results, and such findings themselves are not novel, although they could become interesting if they were quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNP data would be more interesting. For example, with ABC-based methods, the authors could infer the split time between the subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result would support the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location—similar to what we observed in Lantana camara—can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation.

      We thank the reviewer for the suggestion regarding the use of simulations based on genomic data from Lantana and for explaining the importance of it. We are currently conducting demographic simulations using genomic data from Lantana to estimate divergence times between the different flower colour variants. We believe this analysis will offer deeper insights and provide further clarity on the points raised by the reviewers.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Hardy-Weinberg Equilibrium (HWE) filtering is a commonly used step in SNP filtering analysis to exclude loci potentially under selection, thereby enriching for neutral variants and minimizing bias in downstream analyses. To ensure that our results are not influenced by selection-driven SNPs, we conducted the analysis both with and without applying the HWE filter. Notably, the number of SNPs retained did not drop significantly after filtering, and the overall patterns observed remained consistent across both approaches.
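      The interaction between selfing and HWE filtering discussed in this exchange can be illustrated with a small deterministic sketch. It is not taken from the manuscript: the function names, the allele frequency (0.3), and the selfing rate (0.9) are illustrative assumptions, and the model ignores drift, selection, and mutation. It uses the standard partial-selfing recursion, in which selfed heterozygotes produce 50% heterozygous offspring and outcrossed offspring are heterozygous at the HWE rate 2pq, to show how far heterozygosity at a neutral locus is expected to fall below the HWE expectation.

```python
def equilibrium_inbreeding(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing."""
    return selfing_rate / (2.0 - selfing_rate)


def heterozygosity_trajectory(p: float, selfing_rate: float, generations: int) -> list:
    """Iterate H' = (s / 2) * H + (1 - s) * 2pq, starting from HWE proportions.

    p is the (constant) allele frequency at a neutral biallelic locus.
    """
    hwe = 2.0 * p * (1.0 - p)      # heterozygosity expected under random mating
    h = hwe
    trajectory = [h]
    for _ in range(generations):
        # Selfed heterozygotes halve each generation; outcrossed offspring follow HWE.
        h = 0.5 * selfing_rate * h + (1.0 - selfing_rate) * hwe
        trajectory.append(h)
    return trajectory


if __name__ == "__main__":
    p, s = 0.3, 0.9                # illustrative values, not estimates for Lantana
    traj = heterozygosity_trajectory(p, s, generations=50)
    print(f"HWE expectation:        {2 * p * (1 - p):.4f}")
    print(f"after 50 generations:   {traj[-1]:.4f}")
    print(f"equilibrium F = s/(2-s): {equilibrium_inbreeding(s):.4f}")
```

      Under these assumptions the heterozygote deficit affects every locus, neutral or not, which is why an HWE filter applied to a highly selfing population may behave differently from one applied to a randomly mating population.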

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate. Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      The aim of the SLiM simulation was to demonstrate that the extreme genetic structuring observed in Lantana camara can plausibly arise in natural systems under predominant self-fertilization. For the simulation, we used mutation and recombination rates estimated for Arabidopsis thaliana, as these parameters are currently unknown for Lantana. The details of this will be added in the revised version, and thanks to the reviewer for pointing this out. While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilization alone. The impact of the selfing on the mutation rate is not incorporated in the simulations now. We will look into the details of this.

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We recognize that one of the key improvements needed for the manuscript is to provide experimental evidence supporting self-fertilization. To address this, we conducted a bagging experiment on Lantana camara inflorescences to prevent insect visitation and eliminate insect-mediated cross-fertilization. The results showed no significant difference in seed set between bagged and open-pollinated inflorescences, indicating that Lantana is predominantly self-fertilizing in India. This finding is consistent with our genetic data and will be included in the revised version of the manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively? I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The different flower colour variants are visually distinguishable. Our classification of these variants is not based on the colour of individual flowers at a single time point, but rather on the overall colour change pattern across the inflorescence over time. In other words, the temporal aspect of colour change has been considered in our grouping. For example, in the “yellow-pink” variant, flowers begin as yellow when young and gradually turn pink as they age. Importantly, variants that follow this pattern do not transition to an orange type at any stage, which distinguishes them from other colour types. The varieties that don't change colours are named based on the single flower colour like “orange”.

    1. eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. In this latest version, the authors addressed the benchmarking of the novel method, but the absence of quantitative comparisons to state-of-the-art methods still makes this study incomplete. Based on the validation approaches shown, one can neither ultimately judge whether the method will be an advance over previous work nor whether the approach will be generally applicable.

    2. Reviewer #1 (Public review):

      The authors present tviblindi, an algorithm to infer cell development trajectories from single-cell molecular data. The paper is well-written and the algorithm is conceptually interesting. However, the validation is incomplete as the comparison against existing trajectory inference methods is weak: although the lack of a proper benchmark was pointed out as the main weakness of the original version of the manuscript, the revised version still only contains qualitative comparisons against state-of-the-art methods.

      Both Reviewer 2 and I pointed out that the lack of a proper benchmark against state-of-the-art methods on a wider variety of datasets (including scRNA-seq data) was a major weakness of the original version of the manuscript. In response to this criticism, the authors have now done the following:

      - They ran various competitor methods on the datasets that were already used for the previous version of the manuscript.

      - They ran tviblindi and two of the competitors on two public scRNA-seq datasets.

      - For all datasets, they qualitatively assessed the trajectories computed by tviblindi and its competitors and argued that tviblindi's trajectories better reflect the biological signal in the data.

      - The results of all of these additional analyses are reported in the supplement, which has now become very lengthy (88 pages).

      In my opinion, this is insufficient to establish that tviblindi is comparable or even superior to the state of the art in the field. To show that this is the case, the authors would have to carry out a systematic benchmark study which relies on quantitative evaluation metrics rather than on qualitative interpretations of trajectories. As method developers, we are all susceptible to confirmation bias when comparing our new algorithms to the state of the art. To avoid this pitfall, reporting quantitative performance metrics is required. At the moment, the only quantitative metric reported by the authors is runtime, which is insufficient.

      Moreover, the results of a benchmark study should be reported in the main manuscript, not in the supplement. When presenting a new algorithm in a field as crowded as trajectory inference, a benchmark against the state of the art serves to establish trust in the new algorithm and to provide the readers with a rationale to use it for their research. For this, the results of the benchmark have to be presented prominently and should not be hidden in the supplement.

      A second major criticism raised in Reviewer 2's review of the original version of the manuscript is that tviblindi invites cherry picking due to its inherently interactive design. In response to this, the authors now argue at length that "the data-driven expert interpretation approach of tviblindi" (quote from Section 2.2.2) is a strength rather than a weakness. If we concede for the sake of the argument that tviblindi's "expert interpretation approach" is indeed a strength of the method (although I tend to agree with Reviewer 2 that it is rather a limitation), usability for biologists becomes critical. However, given the current implementation of tviblindi, its usability is far from optimal. The authors do not provide tviblindi as a web interface that is directly usable for domain experts without programming experience, and not even as a package that is installable via a widely used package manager such as conda. Instead, they implemented tviblindi as an R package with a Shiny GUI that can either run in a Docker container or requires the installation of several dependencies. I therefore strongly doubt that many biologists will be able or willing to run tviblindi, which substantially limits the value of its "expert interpretation approach". Moreover, tviblindi does not support Apple silicon, which also prevented me from testing the tool.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. Both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although this was not sufficiently highlighted in the main text, which has now been improved.

      Our manuscript contains benchmarking against a challenging synthetic dataset in Figure 1; furthermore, both the synthetic dataset and the real-world thymus dataset have been analyzed in parallel using currently available TI tools (as detailed in the Supplementary Note). Other single-cell datasets (single-cell RNA-seq) were added in response to the reviewers' comments.

      One of the reviewers correctly points out that tviblindi goes against the philosophy of automated trajectory inference. This is correct; we believe that a new class of methods, complementary to fully automated approaches, is needed to explore datasets with unknown biology. tviblindi is meant to be a representative of this class of methods—a semi-automated framework that builds on features inferred from the data in an unbiased and mathematically well-founded fashion (pseudotime, homology classes, suitable low-dimensional representation), which can be used in concert with expert knowledge to generate hypotheses about the underlying dynamics at an appropriate level of detail for the particular trajectory or biological process.

      We would also like to mention that the algorithm and the workflow are not the sole results of the paper. We have thoroughly characterized human thymocyte development, where, in addition to expected biological endpoints, we found and characterized an unexpected activated thymic T-reg endpoint.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.
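      Steps (i) and (ii) of this summary can be made concrete with a toy sketch. This is not tviblindi's implementation: the function names, the tiny hand-written graph, and the use of an unweighted simple random walk are all assumptions made for illustration, and step (iii), the persistent-homology clustering, is omitted. The sketch computes the expected number of steps a random walk started at the origin needs to first reach each node, orients every edge towards larger pseudotime, and samples walks on the resulting directed acyclic graph.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def hitting_time_pseudotime(adj, origin):
    """Expected steps for a simple random walk from `origin` to first reach each node.

    adj maps each node to its neighbour list; the graph must be connected.
    """
    nodes = sorted(adj)
    tau = {origin: 0.0}
    for target in nodes:
        if target == origin:
            continue
        others = [v for v in nodes if v != target]
        idx = {v: i for i, v in enumerate(others)}
        # h(v) = 1 + sum_w P(v, w) h(w) for v != target, with h(target) = 0.
        a = [[0.0] * len(others) for _ in others]
        for v in others:
            a[idx[v]][idx[v]] = 1.0
            for w in adj[v]:
                if w != target:
                    a[idx[v]][idx[w]] -= 1.0 / len(adj[v])
        h = solve_linear(a, [1.0] * len(others))
        tau[target] = h[idx[origin]]
    return tau


def orient_edges(adj, tau):
    """Keep only edges pointing towards strictly larger pseudotime (yields a DAG)."""
    return {v: [w for w in adj[v] if tau[w] > tau[v]] for v in adj}


def sample_walk(dag, origin, rng):
    """Follow random outgoing edges until a node without successors is reached."""
    walk = [origin]
    while dag[walk[-1]]:
        walk.append(rng.choice(dag[walk[-1]]))
    return walk


if __name__ == "__main__":
    # Toy Y-shaped graph: 0-1-2, then two branches 2-3-4 and 2-5.
    adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
    tau = hitting_time_pseudotime(adj, origin=0)
    dag = orient_edges(adj, tau)
    rng = random.Random(0)
    for _ in range(5):
        print(sample_walk(dag, 0, rng))
```

      Because pseudotime strictly increases along every retained edge, each sampled walk must terminate, here at one of the two branch tips. Solving one linear system per target node is only viable for toy graphs; it is used here to keep the definition visible, not as a suggestion for real data.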

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point-by-point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary Note section 8.2: Comparison to state-of-the-art methods), namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Also, in the meantime we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (Bakardjieva M, et al. Tviblindi algorithm identifies branching developmental trajectories of human B-cell development and describes abnormalities in RAG-1 and WAS patients. Eur J Immunol. 2024 Dec;54(12):e2451004. doi: 10.1002/eji.202451004.).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      In order to satisfy both reader types: the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We improved the Results text to better point the reader to the mathematical foundations in the Supplementary Note.  

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we applied the framework successfully on imputed 2000 dimensional data. Furthermore we successfully used tviblindi to investigate bone marrow atlas scRNA-Seq dataset Zhang et al. (2024) and atlas of mouse gastrulation Pijuan-Sala et al. (2019). The idea behind tviblindi is to be able to work without the necessity to use non-linear dimensionality reduction techniques, which reduce the dimensionality to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand the use of (linear) dimensionality reduction techniques which effectively suppress noise in the data such as PCA is a good practice (see also response to reviewer 2). We have emphasized this in the revised version and added the results of the corresponding analysis (see Supplementary note, section 9).

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of the topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k, however, does affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We have expanded the “sensitivity to hyperparameters” section 8.1, also in response to reviewer 2.
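      For readers unfamiliar with the statistic quoted here, the following is a minimal sketch of how a Spearman rank correlation between two pseudotime vectors could be computed. The function names and the example values are illustrative assumptions, not the authors' code or data; in practice the two inputs would be the pseudotime estimates obtained with the two values of k.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical pseudotime values for five cells under two settings of k.
    pseudotime_k10 = [0.0, 1.2, 3.5, 7.9, 8.4]
    pseudotime_k50 = [0.0, 1.0, 3.9, 8.8, 8.1]
    print(spearman(pseudotime_k10, pseudotime_k50))
```

      A coefficient close to 1 means that the two settings order the cells almost identically, which says nothing about whether the number of candidate endpoints is also stable.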

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.

      We thank the reviewer for the feedback and suggestions, which we have accommodated. We respond point-by-point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data. This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, as it allows for user bias, which may lead to overfitting to a specific dataset.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models. 

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution on a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high-dimensional point cloud. At all times tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the KNN-graph, etc.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., published in Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, published in Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data, and comparison to other methods than the very many trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work (especially due to its inability to handle the noise typical in real-world data, as was even demonstrated in the little bit of application to real-world data provided).

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Beyond the general lack of benchmarking, there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias-driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as of how to ensure data-driven inference and avoid overfitting, would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved. And we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but it is an (unbiased) representation of what the data look like based on the diffusion model. It is a property of the data (or of the panel design), which has to be interpreted properly rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality) other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data driven (although not fully automated). Anecdotally in this particular case this branching was discovered by a bioinformatician, who knew nothing about the presence of beta-selection in the data.  

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of a structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory) is a challenge for TI algorithms and none of tested methods gave a correct result. More importantly there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default). In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We have improved the discussion of the robustness in the current version.  

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data, where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as other sources of noise (e.g. ambient RNA), introduces a very large degree of noise, so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason; otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and to provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can directly operate in high-dimensional space (thousands of dimensions), and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We have added this analysis to the study (Supplementary Note, section 9).

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We have added a corresponding comment into the Supplementary note.  

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline" subsection).

      We thank the reviewer for the suggestion and we have accommodated it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      (2) Regarding the discussion of Figure 4, where the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality: this can be a checkpoint, as the authors interpret using their expert knowledge, or it may be a false discovery of the tool. The authors could consider running other algorithms on those cells to see which tracks they identify and whether the directionality matches that of tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle 3 they failed to discover the negative selection branch (Monocle 3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms. 

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet. 
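
      The k-NN graph route mentioned above can be illustrated with a minimal sketch. It is not tviblindi's implementation: the function names `knn_graph` and `random_walk` are hypothetical, the graph is built by brute force on plain Euclidean distance rather than a weighted multimodal distance, and the rule that a walk may only step to a neighbour with larger pseudotime is a simplifying assumption.

```python
import math
import random

def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean).

    Brute force, O(n^2); fine for a toy example only.
    """
    graph = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph.append(others[:k])
    return graph

def random_walk(graph, pseudotime, start, rng, max_steps=1000):
    """Walk along k-NN edges, stepping only to neighbours with larger pseudotime.

    ASSUMPTION: this forward-only rule is a simplification for illustration.
    """
    path = [start]
    for _ in range(max_steps):
        current = path[-1]
        forward = [j for j in graph[current] if pseudotime[j] > pseudotime[current]]
        if not forward:
            break  # no later neighbour: the walk stops at a candidate endpoint
        path.append(rng.choice(forward))
    return path

# Toy data: four cells on a line, pseudotime equal to position.
cells = [(0.0,), (1.0,), (2.0,), (3.0,)]
graph = knn_graph(cells, k=2)
walk = random_walk(graph, [0.0, 1.0, 2.0, 3.0], start=0, rng=random.Random(0))
```

      Any neighbour graph could be substituted for `knn_graph` here, which is the point of the remark above; the missing piece for multimodal data is the triangulation, which this sketch does not attempt.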

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses:

      -  Benchmark against existing trajectory inference methods.

      -  Benchmark on scRNA-seq data or an explicit statement that, unlike existing methods, tviblindi is not designed for such data.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      -  Systematic evaluation of the effects of hyper-parameters on the performance of tviblindi (as mentioned above, there is at least one hyper-parameter, the number k to construct the k-NN graphs).

      This is described in Supplementary Note section 8.1

      Recommendations for improving the writing and presentation:

      -  The GitHub link to the algorithm which is currently hidden in the Methods should be moved to the abstract and/or a dedicated section on code availability.

      -  The presentation of the persistent homology approach used for random walk clustering should be improved (see public comment above).

      This is described extensively in the Supplementary Note.

      -  A very minor point (can be ignored by the authors): consider renaming the algorithm. At least for me, it's extremely difficult to remember.

      We chose to keep the original name.

      Minor corrections to the text and figures:

      -  Labels and legend texts are too small in almost all figures.

      Reviewer #2 (Recommendations For The Authors):  

      (1) On page 3: "(2) Analysis is performed in the original high-dimensional space avoiding artifacts of dimensionality reduction." In mass cytometry data where there is no issue of dropouts, one may choose proteins such that they are not correlated with each other making dimensionality reduction techniques less relevant. But in the context of an unbiased assay such as single-cell RNA-sequencing (scRNA-seq), one measures all the genes in a cell so dimensionality reduction can help resolve the redundancy in the feature space due to correlated/co-regulated gene expression patterns. This assumption forms the basis of most methods in scRNA-seq. More importantly, in scRNA-seq data the dropouts and ambient molecules in mRNA counts result in so much noise that modeling cells in the full gene expression space is highly problematic. So the authors are requested to discuss in detail how they would propose to deal with noise in scRNA-seq data.

      On this note, the authors mention in Supplementary Note 9 (Analysis of human thymus single-cell RNA-seq data): "Imputed data are used as the input for the trajectory inference, scaled counts (no imputation) are shown in line plots". The line plots indicate the gene expression trends along the obtained pseudotime. The authors use MAGIC to impute the data, and we request the authors to mention this in the Methods section (currently one must look through the code on Supplementary Note 1.3 to find this). Data imputation in single-cell RNA-seq data are intended to enable quantification of individual gene expression distribution or pairwise gene associations. But when all the genes in an imputed data are used for visualization, clustering or trajectory inference, the averaging effect will compound and result in severely smoothed data that misses important differences between cell states. Especially, in the case of MAGIC, which uses a transition matrix raised to a power, it is over-smoothing of the data to use a transition matrix smoothed data to obtain another transition matrix to calculate the hitting time (or simulate random walks). Second, the authors' proposal to use scaled counts to study gene trends cannot be generalized to other settings due to drop out issue. Given the few genes (and only one branch) that are highlighted in Figure 7D-G and Figure 31 in Supplementary Note, it is hard to say if scaling raw values would pick up meaningful biology robustly here for other branches.

      We recommend that this data be reanalyzed with non-imputed data used for trajectory inference and imputed gene expression used for line plots.

      As stated above in the public review, we reanalyzed the scRNA Seq data using a more standard approach (first 50 principal components). We have also analyzed two additional scRNA Seq datasets (Section 1 and section 10 of Supplementary Note)

      On the same note, the authors use Seurat's CellCycleScoring to obtain the cell cycle phase of each cell and later use ScaleData to regress it out. While we agree that it is valuable to remove cell cycle effects from the data for trajectory inference (and this has been done previously in other methods), the regression approach employed in Seurat's ScaleData is not appropriate. It is an aggressive approach that severely changes the expression patterns of many genes and can result in new artifacts (false positives) in the data. We recommend that the authors explore this further and consider more principled alternatives such as fscLVM (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1334-8).

      Cell cycle correction is an open problem (Heumos, Nat Rev Genetics, 2023).

      Here we use an (arguably aggressive) approach to make the presentation more straightforward. The cells we are interested in here (end #6) are not dividing, and the regression does not change the conclusion drawn in the paper.

      (2) The figures provided are so low in resolution that it is practically impossible to correctly interpret many of the conclusions and references made in the figures (especially Figure 3 in the main text).

      The resolution of the figures was improved.

      (3) There are many aspects of the method that enable easy user biases and can lead to substantial overfitting of the data.

      a. On page 7: "The topology of the point cloud representing human T-cell development is more complex ... and does not offer a clear cutoff for the choice of significant sparse regions. Interactive selection allows the user to vary the resolution and to investigate specific sparse regions in the data iteratively." This implies that the method enables user biases to be introduced into the data analysis. While perhaps useful for exploration, quantitative trajectory assessment using such approach can be faulty when the user (A) may not know the underlying dynamics (B) forces preconceived notion of trajectory.

      The authors should consider making the trajectory inference approach less dependent on interactive user input and show that the trajectory results are robust to any choices the user may make. It may also help if the authors provide an effective guide and mention clearly what issues could result due to the use of such thresholds.

      As explained in the response in public reviews, tviblindi is not designed as a fully automated TI tool, but as a data driven framework for exploratory analysis of unknown data. 

      There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.  To specifically address the points raised by the reviewer:

      “(A) may not know the underlying dynamics” - tviblindi is designed to perform exploratory analysis of the unknown underlying dynamics. We showcase in the study how this can be performed and we highlight possible cases which can be resolved expertly (spurious connections (doublets), different scales of resolution (beta selection)). Crucially, compared to other TI methods, tviblindi offers a clear mechanism to discover, focus on, and resolve these issues, which would (and do) contaminate the trajectories discovered fully automatically by the tested methods (cf. the beta selection, or the development of plasmacytoid dendritic cells (PDCs); Supplementary Note, section 10.1).

      “(B) forces preconceived notion of trajectory” - user interaction in tviblindi does not force a preconceived notion of the trajectory. The random walks are simulated before the interactive step in an unbiased manner. During the interactive step the user adjusts trajectory specific resolution - incorrect choice of the resolution may result in either merging distinct trajectories into one or over separating the trajectories (which is arguably much less serious). However the interactive step is designed to deal with exactly this kind of challenge. We showcase (e.g. beta selection, or PDCs development) how to address the issue - tviblindi allows us to investigate deeper structure in any considered trajectory.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools. It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner, including pseudotime, homology classes, and appropriate low-dimensional representations. These can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      b. In Figure 4, the authors discuss the trajectory of cells emanating from CD3 negative double positive stage and entering apoptotic phase and mention tviblindi may give "the false impression that cells may pass through an apoptotic phase into a later developmental stage" and propose that the interactive version of tviblindi can help user zoom into (increase resolution) this phenomenon and identify that there are in fact two trajectories in one. Given this, how do the other trajectories in the data change if a user manually adjusts the resolution? A quantification of the robustness is important. Also, it appears that a more careful data clean up could avoid such pitfalls where the algorithm infers trajectory based on mixed phenotype and the user would not have to manually adjust the resolution to obtain clear biological conclusion. We note that the original publication of this data did such "data clean up" using simple diffusion map based dimensionality reduction which the authors boast they avoid. There is a reason for this dimensionality reduction (distinguishing signal from noise), even in CyTOF data, let alone its importance in single cell data.

      The reviewer is concerned about two different but intertwined issues that we wish to untangle here. First, data clean-up is typically done on the premise that dead cells are irrelevant and are a source of false signals. In the case of the thymocytes in the human thymus this premise is not true. Apoptotic cells are a legitimate (actually dominant) fate of the development and thus need to be represented in the TI dataset. Their biological behavior is however complex as they stop expressing proteins and thus lose their surface markers gradually, as dictated by the particular protein degradation kinetics. So can we clean up dead and dying cells better? Yes, but we don't want to do it since we would lose cells we want to analyze. Second, do trajectories change when we zoom into the data? No, only the level of detail presented visually changes. Since we calculate 5000 trajectories in the dataset, we need to aggregate them already for the hierarchical clustering visualization. Note that Figure 4, panel A highlights 159 trajectories selected in V. group. Zooming in means that the hierarchy of trajectories within V. group is revealed (panel D, groups V.a and V.b) and can be interpreted on the vaevictis and lineplot graphs (panels E, F).

      c. In the discussion, the authors write "[tviblindi] allows the selection and grouping of similar random walks into trajectories based on visual interaction with the data". This counters the idea of automated trajectory inference and can lead to severe overfitting.

      As explained in reply to Q3, our aim was NOT to create a fully automated trajectory inference tool. Even more, in our experience we realized that all current tools are taking this fully  automated approach with a search for an “ideal” set of hyperparameters. This, in our experience,  leads to a “blackbox” tool that is difficult to interpret for the expert in the biological field. To respond to this need we designed a modular approach where the results of the TI are presented and the expert can interact with them to focus the visualization and to derive interpretation. Our interactive concept is based on 15 years of experience with the data analysis in flow cytometry, where neither manual gating nor full automation is the ultimate solution but smart integration of both approaches eventually wins the game.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools.  It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner. These features include pseudotime, homology classes, and appropriate low-dimensional representations. These features can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      d. The authors provide some comment on the robustness to the relaxation parameter for witness complex construction in Supplementary Note Section 8.1.2 but it is limited given the importance of this parameter and a more thorough investigation is recommended. We request the authors to provide concrete examples with figures of how changing alpha2 parameter leads to simplicial complexes of different sizes and an assessment of contexts in which the parameter is robust and when not (in both simulated and publicly available real data). Of note, giving the users a proper guide for parameter choice based on these examples and offering them ways to quantify robustness of their results may also be valuable.

      Section 8 in Supplementary Note was extended as requested.

      e. The authors are requested for an assessment of possible short-circuits (e.g. cells of two distantly related phenotypes that get connected erroneously in the trajectory) in the data, and how their approach based on persistent homology deals with it.

      If a short circuit results in a (spurious) alternative trajectory, the persistent homology approach allows us to distinguish it from genuine trajectories that do not follow the short circuit. This prevents contamination of the inferred evolution by erroneous connections. The ability to distinguish and separate distinct trajectories with the same fate is a major strength of this approach (e.g., the trajectory through doublets or the trajectories around checkpoints in thymocytes’ evolution).

      (4) The authors propose vaevictis as a new visualization tool and show its performance compared to the standard UMAP algorithm on a simulated data set (Figure 1 in Supplementary Notes). We recommend a more comprehensive comparison between the two algorithms on a wide array of publicly available single-cell datasets. As well as comparison to other popular dimensionality reduction approaches like force directed layouts, which are the most widely used tool specifically to visualize trajectories.

      We added Section 10 to Supplementary Note that presents multiple comparisons of this kind. It is important to note that tviblindi works independently of visualization and any preferred visualization can be used in the interactive phase (multiple visualisation methods are implemented).

      (5) In Supplementary Note 8.2, the authors compare tviblindi against the other methods. We recommend the authors to quantify the comparison or expand on their assessments in real biological data. For example, in comparison against Palantir and VIA the authors mention "... discovers candidate endpoints in the biological dataset but lacks toolbox to interrogate subtle features such as complex branching" and "fails to discover subtle features (such as Beta selection)" respectively. We recommend the authors to make these comparisons more precise or provide quantification. While the added benefit of interactive sessions of tviblindi may make it more user friendly, the way tviblindi appears to enable analysis of subtle features (e.g. Figure 1H) should be possible in Palantir or VIA as well.

      We extended the comparisons and presented them in Section 8 and 10 in Supplementary Note.  

      (6) The notion of using random walk simulations to identify terminal (and initial states) has been previously used in single-cell data (CellRank algorithm: https://www.nature.com/articles/s41592-021-01346-6). We request the authors to compare their approach to CellRank.

      We compared our algorithm to the CellRank successor CellRank 2 (see section 8.2, Supplementary Note)

      (7) The notion of using persistent homology to discover trajectories has been previously used in single-cell data (https://pubmed.ncbi.nlm.nih.gov/28459448/). We request a comparison to this approach.

      The proposed algorithm was not able to accommodate the large datasets we used.

      scTDA (Rizvi, Camara et al. Nat. Biotechnol. 2017) has not been updated for 6 years. It is not suited for complex atlas-sized datasets both in terms of performance and utility, with its limited visualization tools. It also lacks capabilities to analyze individual trajectories.

      (8) In Figure 3B, the authors visualize the endpoints and simulated random walks using the connectome. There is no edge from start to the apoptotic cells here. It is not clear why? If they are not relevant based on random walks, can the user remove them from analysis? Same for the small group of pink cells below initial point.

      The connectome is a fully automated approach (similar to PAGA) which gives a basic overview of the data. It is not expected to be able to compete with the interactive pipeline of tviblindi for the same reasons as the fully automated methods (difficult to predict the effect of hyperparameters).

      (9) In Supplementary Figure 3, in relation to "Variants of trajectories including selection processes" the authors mention that there is a spurious connection between CD4 single-positive cells and the doublet set of cells. The authors mention that the presence of dividing cells makes it difficult to remove the doublets. We request the authors to discuss why. For example, the authors seem to have cell cycle markers (e.g. Ki67, pH3, Cyclin) and one would think that coupled with DNA intercalator 191/193Ir one could further clean up the data. Can the authors employ alternative toolkits such as doublet detection methods?

      To address this issue, we do remove doublets with illegitimate cell barcodes (e.g. we remove any pair of cells from two differently barcoded samples that presents with a double barcode). Although there are computational doublet removal approaches for mass cytometry (Bagwell, Cytometry A 2020), mostly applied to peripheral blood samples (where cell division is not present under steady-state immune system conditions), these are not well suited for situations where dividing cells occur (Rybakowska P, Comput Struct Biotechnol J. 2021), which is the case for our thymocyte samples. Furthermore, there are other situations where doublet formation is not an accident, but rather a biological response (Burel JG, Cytometry A 2020). Thus, the doublet cell problem is similar to the apoptotic cell problem discussed earlier.

      We could remove cells with the double DNA signal, but this would remove not only accidental doublets but also the legitimate (dividing) cells. So the question is how to remove the illegitimate doublets but not the legitimate?

      Of note, the trajectory going through doublets does not affect the interpretation of other trajectories as it is readily discriminated by persistent homology and thus random walks passing through this (spurious) trajectory do not contaminate the markers’ evolution inferred for legitimate trajectories.

      We therefore prefer to remove only the barcode-illegitimate doublets and keep all others in the analysis, using the expert analysis step to also identify (using the cell cycle markers plus other features) the artificially formed doublets and thus the spurious connections.

      (10) The authors should discuss how the gene expression trend plots are made (e.g. how are the expression averaged? Rolling mean?).

      The development of those markers is shown as a line plot connecting the average values of a specific marker within a pseudotime segment. By default, the pseudotime values are divided into uniform segments (each containing the same number of points) whose number can be changed in the GUI. To focus on either early or late stages of the development, the segment division can be adjusted in GUI. See section 6 of the Supplementary Note.
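
      The segment averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the tviblindi code: the function name `segment_means` and its parameters are hypothetical, and each segment is plotted at its mean pseudotime, which the response does not specify.

```python
from statistics import mean

def segment_means(pseudotime, marker, n_segments=4):
    """Average a marker within equal-count pseudotime segments.

    Returns one (mean pseudotime, mean marker value) pair per segment;
    joining these pairs in order gives the line plot.
    """
    # Order cells by pseudotime.
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    n = len(order)
    points = []
    for s in range(n_segments):
        # Integer boundaries give segments whose sizes differ by at most one cell.
        lo, hi = s * n // n_segments, (s + 1) * n // n_segments
        idx = order[lo:hi]
        if not idx:
            continue  # more segments requested than cells available
        points.append((
            mean(pseudotime[i] for i in idx),  # ASSUMPTION: x position of the segment
            mean(marker[i] for i in idx),
        ))
    return points

# Toy data: four cells, two segments of two cells each.
line = segment_means([0.4, 0.1, 0.3, 0.2], [4.0, 1.0, 3.0, 2.0], n_segments=2)
```

      Splitting by cell count rather than by pseudotime width keeps every point on the line supported by a similar number of cells, at the cost of uneven spacing along the pseudotime axis.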

      Reviewer #3 (Recommendations For The Authors):

      The overall figures quality needs to be improved. For example, I can barely see the text in Figure 3c.

      The resolution of the figures was improved.

    1. eLife Assessment

      This study provides a valuable and comprehensive dataset on transcription factor binding in Pseudomonas aeruginosa, along with analyses of its regulatory network, key virulence and metabolic regulators, and a pangenomic examination of transcription factors. Utilizing large-scale ChIP-seq and multi-omics integration, the research convincingly supports the hierarchical regulatory structures and offers insights into virulence mechanisms. While further experimental validation is needed, this publicly accessible PATF_Net database enhances its utility for researchers investigating this significant pathogen associated with hospital infections and antibiotic resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcriptional factors (TFs) in Pseudomonas aeruginosa PAO1. They have built a global TF-DNA binding landscape and elucidated binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs, and co-association modules. Since P. aeruginosa is a well known pathogen, the authors thus identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors also explored the TF conservation and functional evolution through pan-genome and phylogenetic analyses. For the easy searching by other researchers, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, the integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided an informative analytical framework for presenting the TFs, where a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepened our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers especially for the P. aeruginosa field.
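
      For readers unfamiliar with the hierarchy index mentioned in point (2), a minimal sketch follows. The review does not state the formula, so the definition used here, h = (O - I) / (O + I) with O the out-degree and I the in-degree of a TF in the TF-TF network, is an assumption based on common usage in regulatory network studies; the cutoff of 1/3 and all function names are hypothetical.

```python
from collections import Counter

def hierarchy_index(out_degree, in_degree):
    """ASSUMED definition: h = (O - I) / (O + I), ranging from -1 to 1."""
    total = out_degree + in_degree
    return 0.0 if total == 0 else (out_degree - in_degree) / total

def level(h, cutoff=1 / 3):
    """Map h to a level; the cutoff is a hypothetical choice for illustration."""
    if h > cutoff:
        return "top"
    if h < -cutoff:
        return "bottom"
    return "middle"

def classify_network(edges):
    """Classify every node of a TF-TF network given (regulator, target) pairs.

    Edges should be restricted to TF-TF interactions; non-TF targets would
    otherwise all land at the bottom level.
    """
    out_deg = Counter(regulator for regulator, _ in edges)
    in_deg = Counter(target for _, target in edges)
    nodes = set(out_deg) | set(in_deg)
    return {n: level(hierarchy_index(out_deg[n], in_deg[n])) for n in nodes}

# Toy network: A regulates B and C, B regulates C.
levels = classify_network([("A", "B"), ("A", "C"), ("B", "C")])
```

      In the toy network, A only regulates, C is only regulated, and B does both equally, so they fall into the top, bottom, and middle levels respectively.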

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach for validation is checking the mutations of these TFs from clinical strains of P. aeruginosa, where chronically adapted isolates often gain mutations in virulence regulators.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain underevaluated. But this could be improved in the future.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

    3. Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Weaknesses:

      Drawbacks of the study include 1) challenges interpreting binding site data obtained from TF overexpression due to unknown activity state of the TFs on the measured conditions, 2) limited practical value of the presented TRN topological analysis, and 3) lack of independent experimental validation of the proposed master regulators of virulence and metabolism.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work done by Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcription factors of Pseudomonas aeruginosa PAO1. The authors utilized ChIP-seq to profile TFs binding site information across the genome, demonstrating diverse regulatory relationships among them via hierarchical networks with three levels. They further constructed thirteen ternary regulatory motifs in small subs and co-association atlas with 7 core associated clusters. The study also uncovered 24 virulence-related master regulators. The pan-genome analysis uncovered both the conservation and evolution of TFs with P. aeruginosa complex and related species. Furthermore, they established a web-based database combining both existing and novel data from HT-SELEX and ChIP-seq to provide TF binding site information. This study offered valuable insights into studying transcription regulatory networks in P. aeruginosa and other microbes.

      Strengths:

      The results are presented with clarity, supported by well-organized figures and tables that not only illustrate the study's findings but also enhance the understanding of complex data patterns.

      Thank you for your valuable feedback on our paper exploring the transcription regulatory networks in P. aeruginosa.

      Weaknesses:

      The results of this manuscript are mainly presented in systematic figures and tables. Some of the results need to be discussed as an illustration how readers can utilize these datasets.

      We appreciate the valuable suggestion about enhancing the practical aspects of our manuscript. We have expanded the discussion section to include more detailed explanations of how these datasets can be utilized in practical applications. 

      Reviewer #2 (Public review):

      In this work, the authors comprehensively describe the transcriptional regulatory network of Pseudomonas aeruginosa through the analysis of transcription factor binding characteristics. They reveal the hierarchical structure of the network through ChIP-seq, categorizing transcription factors into top-, middle-, and bottom-level, and reveal a diverse set of relationships among the transcription factors. Additionally, the authors conduct a pangenome analysis across the Pseudomonas aeruginosa species complex as well as other species to study the evolution of transcription factors. Moreover, the authors present a database with new and existing data to enable the storage and search of transcription factor binding sites. The findings of this study broaden our knowledge on the transcriptome of P. aeruginosa. This study sheds light on the complex interconnections between various cellular functions that contribute to the pathogenicity of P. aeruginosa, along with the associated regulatory mechanisms. Certain findings, such as the regulatory tendencies of DNA-binding domain-types, provide valuable insights on the possible functions of uncharacterized transcription factors and new functions of those that have already been characterized. The techniques used hold great potential for discovery of transcription factor functions in understudied organisms as well.

      The study would benefit from a more clear discussion on the implications of various findings, such as binding preferences, regulatory preferences, and the link between regulatory crosstalk and virulence. Additionally, the pangenome analysis would be furthered through a discussion of the divergence of the transcription factors of P. aeruginosa PAO1 across species in relation to the findings on the hierarchical structure of the transcriptional regulatory network.

      Thank you for your positive feedback and suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      (1) It appears that many TFs are conserved among bacteria, archaebacteria, fungi, plants, and animals. Does this mean that these bacterial TFs could be the ancestors of TFs in fungi, plants, and animals? If we fetch these TFs out and build an evolutionary tree, can we visualize the three kingdoms as well?

      Thank you for this comment. While many TFs are conserved across bacteria, archaea, fungi, plants, and animals, this conservation does not necessarily imply a direct ancestral relationship. Instead, it may reflect the fundamental importance of certain domains and regulatory mechanisms, which could have arisen from a common ancestral system or through convergent evolution. If we fetch TF PA2032 out to build an evolutionary tree by setting PAO1 as the root, we can visualize these kingdoms in a tree. We added this content in the revised manuscript. Please see Figure S7D and Lines 404-411.

      “The phylogenetic tree of PA2032 across bacteria, archaea, fungi, plants, and animals, with PAO1 as the root, revealed that the bacterial TFs (purple) show a high degree of conservation within prokaryotes, suggesting a fundamental role in core regulatory processes. In contrast, eukaryotic TFs (fungi, plants, and animals) form distinct clades with longer branch lengths, indicating significant divergence and specialization during eukaryotic evolution. These findings suggest that while this TF is conserved across domains of life, its functional roles and regulatory mechanisms have undergone substantial diversification in eukaryotes.”

      (2) Can the authors indicate how the findings of this study could be employed in designing the next generation of antimicrobial agents?

      Thank you for this important suggestion. We have provided this content in the discussion part. Please see Lines 481-492.

      “The extensive datasets generated in this study offer valuable insights into understanding and targeting P. aeruginosa pathogenicity. The genome-wide binding profiles can be systematically analyzed through our hierarchical regulatory network framework to decode complex virulence mechanisms. The virulence-related master regulators and core regulatory clusters identified in this study highlighted key nodes of transcriptional control. Understanding these regulatory relationships is particularly valuable for identifying targets whose modulation would significantly impact virulence while accounting for potential compensatory mechanisms. This knowledge base thus provides a foundation for developing targeted approaches to combat P. aeruginosa infections, moving beyond traditional antibiotic strategies toward more sophisticated interventions based on regulatory network manipulation.”

      Minor:

      (1) Lines 178-180: It would strengthen the discussion to include a few additional references that support the claims made in this section, providing a more comprehensive context for the readers.

      Yes. We have added more citations (1-5) (No. 1-5 in the references at the end of the rebuttal) to support the claims. Please see Line 182.

      (2) Line 198: You mention 'seven' motifs containing toggle switches, but Fig.3 actually displays eight motifs. Please revise this discrepancy to ensure consistency between the text and the figure.

      Yes. We have revised the wording to “eight”. Please see Line 200.

      (3) Figure 3A: Consider adding a diagram or legend that represents the colors associated with each DNA-binding domain (DBD) family.

      Thank you for your suggestion. The colors of DBD were aligned with the legend in Figure S3. We have added it in Figure 3A.

      Reviewer #2 (Recommendations for the authors):

      Line 21: The use of the abbreviation 'TF' should be done at the first instance of 'transcription factor'.

      Yes. We have revised it. Please see Line 21.

      Line 74: The purpose of this paragraph is slightly unclear. It is recommended that appropriate modifications are made.

      We are sorry for the confusion. The purpose of this paragraph was to introduce the major virulence pathways in P. aeruginosa and mention the important role of TRN in these pathways. We have modified it to make it clearer. Please see Lines 74-75.

      “P. aeruginosa employs diverse virulence pathways to establish successful infection, with QS being one of the major mechanisms involving the expression of many virulence genes.”

      Line 113: How were these 172 TFs selected?

      Thank you for raising this question. In a previous study, we performed HT-SELEX to characterize the DNA-binding motifs of all TFs in P. aeruginosa PAO1, successfully identifying binding sequences for 182 TFs. To further elucidate the binding landscapes of the rest, we performed ChIP-seq on the remaining TFs (172 TFs in total with high-quality ChIP-seq libraries). Please see Lines 100-101 in the revised manuscript.

      Line 119: Defining the other features, namely Downstream and IncludeFeature, would be helpful.

      Thank you for your suggestion. We have added definitions for all peak annotations in the legend. Please see Lines 569-574.

      “Annotation heatmap of all peak distribution with 6 locations: Upstream, where the peak is located entirely upstream of the gene; Downstream, where the peak is positioned completely downstream of the gene; Inside, where the peak is entirely contained within the gene body; OverlapStart, where the peak overlaps with the 5' end of the gene; OverlapEnd, where the peak overlaps with the 3' end of the gene; and IncludeFeature, where the peak completely encompasses the gene.”
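      The six peak locations defined in this legend can be illustrated with a short sketch. This is only one possible reading of the definitions, not the authors' annotation pipeline: the function name `classify_peak`, the use of closed integer intervals, the strand handling, and the tie-break for a peak that coincides exactly with the gene (treated here as IncludeFeature) are all assumptions made for illustration.

```python
def classify_peak(peak_start, peak_end, gene_start, gene_end, strand="+"):
    """Assign one of six location labels to a peak relative to a gene.

    Hypothetical helper: coordinates are closed intervals on one chromosome,
    and a peak identical to the gene is labelled IncludeFeature (assumption).
    """
    if peak_start > peak_end or gene_start > gene_end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    if strand == "-":
        # Mirror the coordinates so the gene's 5' end is always on the left.
        peak_start, peak_end = -peak_end, -peak_start
        gene_start, gene_end = -gene_end, -gene_start

    if peak_end < gene_start:
        return "Upstream"          # peak lies entirely before the 5' end
    if peak_start > gene_end:
        return "Downstream"        # peak lies entirely after the 3' end
    if peak_start <= gene_start and peak_end >= gene_end:
        return "IncludeFeature"    # peak completely encompasses the gene
    if peak_start >= gene_start and peak_end <= gene_end:
        return "Inside"            # peak sits within the gene body
    if peak_start < gene_start:
        return "OverlapStart"      # peak spans the 5' end only
    return "OverlapEnd"            # peak spans the 3' end only
```

      For a gene at positions 100-200 on the forward strand, a peak at 80-120 would be labelled OverlapStart; on the reverse strand, the same peak would be labelled OverlapEnd, because the 5' end is then at position 200.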

      Line 129: The distribution type of AraC-type TFs is unclear - it is mentioned that AraC has a 'broad distribution', but it is later stated that it has a 'narrow distribution'.

      We are sorry for this mistake, and we have revised the example for “broad distribution”, which is Cor_CI instead of AraC. Please see Lines 132-135.

      Line 161: 'h value' here may need to be modified to 'absolute h value'.

      Yes. We have revised it. Please see Line 164.

      Line 502: "s The DNA" needs to be corrected.

      Yes. We have revised it. Please see Line 514.

      Line 515: It would be helpful to readers if the reference used for these pathways was cited.

      Yes. We have added the review reference (Shao et al, 2023) related to these pathways (6) (the 6th reference at the end of the rebuttal). Please see Line 527.

      Line 558: "Translation start site" needs to be corrected to "Transcription start site"

      The “TSS” here does indeed denote “Translation start site”.

      Line 593: "Virulent" pathways needs to be corrected to "virulence" pathways.

      Yes. We have revised it. Please see Line 609.

      Line 604: The type of categorization based on which the proportion of genes is displayed needs to be mentioned.

      Yes, we agree. We have added the type of categorization in the legend. Please see Lines 621-627.

      “Figure 6. Conservation and variability of TFs in PAO1. (A). The pie chart shows the proportions of genes categorized by their presence across P. aeruginosa strains for all genes. (B). The pie chart shows the distribution of TFs identified from PAO1 across different conservation categories. (C). The bar plot of the proportion for non-core TFs. Genes are categorized based on their presence frequency across P. aeruginosa strains: Core genes (present in 99% ~ 100% strains), Soft core genes (present in 95% ~ 99% strains), Shell genes (present in 15% ~ 95% strains), and Cloud genes (present in 0% ~ 15% strains).”
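      The presence-frequency categories in this legend translate directly into a threshold rule, sketched below. The legend leaves the handling of the shared boundary values (15%, 95%, 99%) open, so assigning each boundary to the more conserved category is an assumption, as is the function name `pangenome_category`.

```python
def pangenome_category(n_present, n_strains):
    """Classify a gene by the share of strains that carry it.

    Hypothetical helper. Boundary values go to the more conserved category
    (assumption): >= 99% Core, >= 95% Soft core, >= 15% Shell, else Cloud.
    Integer arithmetic avoids floating-point surprises at the thresholds.
    """
    if n_strains <= 0 or not 0 <= n_present <= n_strains:
        raise ValueError("need 0 <= n_present <= n_strains and n_strains > 0")

    if n_present * 100 >= 99 * n_strains:
        return "Core"
    if n_present * 100 >= 95 * n_strains:
        return "Soft core"
    if n_present * 100 >= 15 * n_strains:
        return "Shell"
    return "Cloud"
```

      Under these assumptions, a TF found in 970 of 1,000 strains would be counted as a soft core gene, and one found in 140 of 1,000 strains as a cloud gene.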

      References:

      (1) Liang H, Deng X, Li X, Ye Y, Wu M. 2014. Molecular mechanisms of master regulator VqsM mediating quorum-sensing and antibiotic resistance in Pseudomonas aeruginosa. Nucleic acids research 42:10307-10320.

      (2) Jones CJ, Ryder CR, Mann EE, Wozniak DJ. 2013. AmrZ modulates Pseudomonas aeruginosa biofilm architecture by directly repressing transcription of the psl operon. Journal of bacteriology 195:1637-1644.

      (3) Hickman JW, Harwood CS. 2008. Identification of FleQ from Pseudomonas aeruginosa as ac‐di‐GMP‐responsive transcription factor. Molecular microbiology 69:376-389.

      (4) Déziel E, Gopalan S, Tampakaki AP, Lépine F, Padfield KE, Saucier M, Xiao G, Rahme LG. 2005. The contribution of MvfR to Pseudomonas aeruginosa pathogenesis and quorum sensing circuitry regulation: multiple quorum sensing‐regulated genes are modulated without affecting lasRI, rhlRI or the production of N‐acyl‐L‐homoserine lactones. Molecular microbiology 55:998-1014.

      (5) Lizewski SE, Lundberg DS, Schurr MJ. 2002. The transcriptional regulator AlgR is essential for Pseudomonas aeruginosa pathogenesis. Infection and immunity 70:6083-6093.

      (6) Shao X, Yao C, Ding Y, Hu H, Qian G, He M, Deng X. 2023. The transcriptional regulators of virulence for Pseudomonas aeruginosa: Therapeutic opportunity and preventive potential of its clinical infections. Genes & Diseases 10:2049-2063.

    1. eLife Assessment

      In contrast with mammals, measures of cochlear tuning in budgerigars do not match the frequency dependence of behavioral tuning. Earlier behavioral data in the budgerigar had shown good selectivity at around 3-4 kHz, but it was unknown whether this unusual selectivity arose in the inner ear or was a more central adaptation. The authors measured both auditory-nerve tuning curves and stimulus-frequency otoacoustic emissions and found fairly normal-looking cochlear tuning in the budgerigar. These important findings imply that any behavioral/perceptual differences in frequency selectivity are likely more central in origin. These solid new data also provide significant support for the utility of otoacoustic estimates of cochlear tuning.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies and the resulting limitations are discussed.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning were obtained by assuming that the tuning ratio estimated in chicken applies also to the budgerigar. Possible complications arising from an avian apical-basal transition analogous to that found in mammals are discussed.

    3. Reviewer #2 (Public review):

      Summary:

      Earlier behavioral data in the budgerigar have suggested frequency selectivity that was different from that in many other avian species, showing particularly good selectivity at around 3-4 kHz. It was unknown whether this unusual selectivity was determined in the inner ear, or whether it was a more central adaptation. The results using direct auditory-nerve tuning curves and less invasive stimulus-frequency otoacoustic emissions suggest fairly normal-looking cochlear tuning in the budgerigar, implying that any behavioral/perceptual differences in frequency selectivity are likely more central in origin.

      Strengths:

      - The study presents novel data in budgerigar, comparing the bandwidths of auditory-nerve tuning curves with the latencies of stimulus-frequency otoacoustic emissions (SFOAEs), which are thought to reflect the sharpness of cochlear tuning.
      - Using a conversion factor taken from previous data in the chicken to avoid circularity of reasoning, the study shows quite good correspondence between the non-invasive estimates obtained from SFOAEs and the tuning obtained from auditory-nerve fibers. Similarities between budgerigar and chicken are harder to ascertain with the way the data are presented.

      Weaknesses:

      - The comparison of SFOAEs and auditory-nerve tuning curves in the most interesting regions (beyond 3.5 kHz, where some perceptual anomalies seem to occur in some previous data) relies on an extrapolation of the data from the chicken.
      - No new behavioral data are presented, so the comparisons made in the paper are between studies separated by decades. None of the behavioral studies cited used the more current techniques that have been claimed to provide a behavioral estimate of cochlear tuning.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Previous studies in mammals and other vertebrates have shown that a measure based on the latency derived from stimulus-frequency otoacoustic emissions provides a reasonable, non-invasive estimate of cochlear tuning. This valuable study confirms that finding in a new species, the budgerigar, and provides convincing support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored primarily in mammals. The study's remaining claims of a mismatch between behavioral frequency selectivity and cochlear tuning are based on old behavioral data collected in an extreme frequency region at the edge of the limits of hearing. Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, and the evidence for these claims is weak.

      We appreciate the detailed summary of our paper by the editors highlighting its strengths. As described in the following responses, we added additional evidence to the Introduction supporting that budgerigars have (1) unusual behavioral frequency tuning compared to other bird species and (2) unusual behavioral tuning results in budgerigars are not readily explainable by the audiogram. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars. Moreover, that the behavioral data are “old” seems not particularly relevant considering that the same behavioral methods are still widely used in animal research, as elaborated upon in the responses below. We suggest the term “previously published” to clarify the behavioral data used in our analyses.

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning now avoid possible circularity and are obtained by assuming that the tuning ratio estimated in chicken applies also to the budgerigar.

      Weaknesses:

      * In mammals, accurate prediction of neural Q_ERB from otoacoustic N_SFOAE involves the application of species-invariance of the tuning ratio combined with an attempt to compensate for possible species differences in the location of the so-called apical-basal transition (for a review, see Shera & Charaziak, Cochlear frequency tuning and otoacoustic emissions. Cold Spring Harb Perspect Med 2019; 9:pii a033498. doi: 10.1101/cshperspect.a033498; in particular, the text near Eq. 2 and the value of CFa|b).

      Despite this history, the manuscript makes no mention of the apical-basal transition, its possible role in birds, or why it was ignored in the present analysis. As but one result, the comparative discussion of the tuning ratio (paragraph beginning on line 383) is incomplete and potentially misleading. Although the paragraph highlights differences in the tuning ratio across groups, perhaps these differences simply reflect differences in the value of CFa|b. For example, if the cochlea of the budgerigar is assumed to be entirely "apical" in character (so that CFa|b is around 7-8 kHz), then the budgerigar tuning ratios appear to align remarkably well with those previously obtained in mammals (see Shera et al 2010, Fig 9).

      We added sections on the apical-basal transition to the Results and Discussion, including how this concept might apply in budgerigars and other birds.

      * For the most part, the authors take previous behavioral results in budgerigar at face value, attributing the discrepant behavioral results to hypothesized "central specializations for the processing of masked signals". But before going down this easy road, the manuscript would be stronger if the authors discussed potential issues that might affect the reliability of the previous behavioral literature. For example, the ANF data show that thresholds rise rapidly above about 5 kHz. Might the apparent broadening of the behavioral filters arise as a consequence of off-frequency listening due to the need to increase signal levels at these frequencies? Or perhaps there are other issues. Inquiring readers would appreciate an informed discussion.

      This is a good point, also raised by reviewer 2, that declining audibility above 4 kHz could impact behavioral tuning estimates. On the other hand, other bird species with highly similar audiograms to budgerigars show conventional behavioral tuning that increases in sharpness relatively slowly and monotonically for higher frequencies. Thus, the unusual pattern of behavioral tuning in budgerigars is not fully explainable by the audiogram. We added a section to the Introduction highlighting these points.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes two new sets of data involving budgerigar hearing: 1) auditory-nerve tuning curves (ANTCs), which are considered the 'gold standard' measure of cochlear tuning, and 2) stimulus-frequency otoacoustic emissions (SFOAEs), which are a more indirect measure (requiring some assumptions and transformations to infer cochlear tuning) but which are non-invasive, making them easier to obtain and suitable for use in all species, including humans. By using a tuning ratio (relating ANTC bandwidths and SFOAE delay) derived from another bird species (chicken), the authors show that the tuning estimates from the two methods are in reasonable agreement with each other over the range of hearing tested (280 Hz to 5.65 kHz for the ANTCs), and both show a slow monotonic increase in cochlear tuning quality over that range, as expected. These new results are then compared with (much) older existing behavioral estimates of frequency selectivity in the same species.
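      The tuning-ratio approach summarized above can be sketched in a few lines. The sketch assumes the convention used in the mammalian literature cited in this review, in which the tuning ratio is r = Q_ERB / N_SFOAE and N_SFOAE is the SFOAE delay expressed in stimulus periods. In practice r varies with frequency, so a single scalar is a simplification; the function names are hypothetical and the numbers in the usage note are invented, not measured values.

```python
def sfoae_delay_in_periods(delay_s, freq_hz):
    """Express an SFOAE phase-gradient delay as a number of stimulus periods."""
    return delay_s * freq_hz


def tuning_ratio(q_erb_ref, n_sfoae_ref):
    """Tuning ratio r = Q_ERB / N_SFOAE, measured in a reference species."""
    if n_sfoae_ref <= 0:
        raise ValueError("N_SFOAE must be positive")
    return q_erb_ref / n_sfoae_ref


def predict_q_erb(r, n_sfoae_target):
    """Predict neural Q_ERB in a target species from its SFOAE delay,
    assuming the reference-species tuning ratio carries over unchanged."""
    return r * n_sfoae_target
```

      With made-up values, a reference species with Q_ERB = 4 and N_SFOAE = 8 at some frequency gives r = 0.5; a target-species delay of 10 periods at that frequency would then predict Q_ERB = 5. Taking r from a different species (here, chicken) is what keeps the comparison with the budgerigar auditory-nerve data from being circular.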

      Strengths:

      This topic is of interest, because there are some indications from the older behavioral literature that budgerigars have a region of best tuning, which the current authors refer to as an 'acoustic fovea', at around 4 kHz, but that beyond 5 kHz the tuning degrades. Earlier work has speculated that the source could be cochlear or higher (e.g., Okanoya and Dooling, 1987). The current study appears to rule out a cochlear source to this phenomenon.

      Weaknesses:

      The conclusions are rendered questionable by two major problems.

      The first problem is that the study does not provide new behavioral data, but instead relies on decades-old estimates that used techniques dating back to the 1970s, which have been found to be flawed in various ways. The behavioral techniques that have been developed more recently in the human psychophysical literature have avoided these well-documented confounds, such as nonlinear suppression effects (e.g., Houtgast, https://doi.org/10.1121/1.1913048; Shannon, https://doi.org/10.1121/1.381007; Moore, https://doi.org/10.1121/1.381752), perceptual confusion between pure-tone maskers and targets (e.g., Neff, https://doi.org/10.1121/1.393678), beats and distortion products produced by interactions between simultaneous maskers and targets (e.g., Patterson, https://doi.org/10.1121/1.380914), unjustified assumptions and empirical difficulties associated with critical band and critical ratio measures (Patterson, https://doi.org/10.1121/1.380914), and 'off-frequency listening' phenomena (O'Loughlin and Moore, https://doi.org/10.1121/1.385691). More recent studies, tailored to mimic to the extent possible the techniques used in ANTCs, have provided reasonably accurate estimates of cochlear tuning, as measured with ANTCs and SFOAEs (Shera et al., 2003, 2010; Sumner et al., 2010). No such measures yet exist in budgerigars, and this study does not provide any. So the study fails to provide valid behavioral data to support the claims made.

      We appreciate the reviewer’s efforts in summarizing and critiquing our study. We feel that the budgerigar data collected by the Dooling and Saunders labs remain essentially valid today. The methods used in these behavioral studies are rigorous and remain widely used in animal research (e.g., critical bands and ratios: Yost & Shofner, 2009; King et al., 2015; simultaneous masking: Burton et al., 2018). The methods are based on the same power-spectrum-model assumptions of auditory masking as even the most recent and elaborate human psychophysical procedures. We therefore believe that it remains highly relevant to test and report whether these methods can accurately predict cochlear tuning. More importantly, while forward-masking behavioral results are hypothesized to more accurately predict cochlear tuning in humans (Shera et al., 2002; Joris et al., 2011; Sumner et al., 2018), evidence from nonhumans is controversial. For example, one study showed a closer match between forward-masking results and auditory-nerve tuning (ferret: Sumner et al., 2018), whereas several others showed a close match for simultaneous masking results (e.g., guinea pig, chinchilla, macaque; reviewed by Ruggero & Temchin, 2005; see Joris et al., 2011 for macaque auditory-nerve tuning). Moreover, forward- and simultaneous-masking results can often be equated with a simple scaling factor (e.g., Sumner et al., 2018). Given no consensus on an optimal behavioral method, and seemingly limited potential for the “wrong” method to fundamentally transform the shape of the behavioral tuning quality function, it seems reasonable to accept previously published behavioral tuning estimates as valid while also discussing limitations and remaining open to alternative interpretations. We added these points to the Discussion and added clarification throughout as to the specific behavioral approaches used.

      The second, and more critical, problem can be observed by considering the frequencies at which the old behavioral data indicate a worsening of tuning. From the summary shown in the present Fig. 2, the conclusion that behavioral frequency selectivity worsens again at higher frequencies is based on four data points, all with probe frequencies between 5 and 6 kHz. Comparing this frequency range with the absolute thresholds shown in Fig. 3 (as well as from older budgerigar data) shows it to be on the steep upper edge of the hearing range. Thus, we are dealing not so much with a fovea as the point where hearing starts to end. The point that anomalous tuning measures are found at the edge of hearing in the budgerigar has been made before: Saunders et al. (1978) state in the last sentence of their paper that "the size of the CB rapidly increases above 4.0 kHz and this may be related to the fact that the behavioral audibility curve, above 4.0 kHz, loses sensitivity at the rate of 55 dB per octave."

      Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, in humans as well as in other species. The few attempts to measure human frequency selectivity at that upper edge have resulted in quite messy data and unclear conclusions (e.g., Buus et al., 1986, https://doi.org/10.1007/978-1-4613-2247-4_37). Indeed, the only study to my knowledge to have systematically tested human frequency selectivity in the extended high frequency range (> 12 kHz) seems to suggest a substantial broadening, relative to the earlier estimates at lower frequencies, by as much as a factor of 2 in some individuals (Yasin and Plack, 2005; https://doi.org/10.1121/1.2035594) - in other words by a similar amount as suggested by the budgerigar data. The possible divergence of different measures at the extreme end of hearing could be due to any number of factors that are hard to control and calibrate, given the steep rate of threshold change, leading to uncontrolled off-frequency listening potential, the higher sound levels needed to exceed threshold, as well as contributions from middle-ear filtering. As a side note, in the original ANTC data presented in this study, there are actually very few tuning curves at or above 5 kHz, which are the ones critical to the argument being forwarded here. To my eye, all the estimates above 5 kHz in Fig. 3 fall below the trend line, potentially also in line with poorer selectivity going along with poorer sensitivity as hearing disappears beyond 6 kHz.

      This is an excellent point, also raised by reviewer 1, that declining audibility above 4 kHz could influence behavioral tuning measures. While we acknowledge this possibility, declining audibility cannot fully explain the unusual pattern of behavioral frequency tuning in budgerigars considering that other bird species with the same audiogram phenotype show conventional tuning patterns. We added these points to the Introduction and Fig. 1B. We also added clarification throughout that it is not just the shape of the tuning function that is noteworthy in budgerigars, but also the extreme slope in the 1-3.5 kHz region. Behavioral tuning quality in budgerigars increases by 5.3 dB/octave in this range (i.e., nearly doubling with each octave increase in frequency), vs. 1.8 dB/octave in humans, 2.5 dB/octave in ferret, 1.1 dB/octave in macaque, and 1.9 dB/octave in starling. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars.

      The basic question posed in the current study title and abstract seems a little convoluted (why would you expect a behavioral measure to reflect cochlear mechanics more accurately than a cochlear-based emissions measure?). A more intuitive (and likely more interesting) way of framing the question would be "What is the neural/mechanical source of a behaviorally observed acoustic fovea?" Unfortunately, this question does not lend itself to being answered in the budgerigar, as that 'fovea' turns out to be just the turning point at the end of the hearing range. There is probably a reason why no other study has referred to this as an acoustic fovea in the budgerigar.

      Overall, a safe interpretation of the data is that hearing starts to change (and becomes harder to measure) at the very upper frequency edge, and not just in budgerigars. Thus, it is difficult to draw any clear conclusions from the current work, other than that the relations between ANTC and SFOAEs estimates of tuning are consistent in budgerigar, as they are in most (all?) other species that have been tested so far.

      We removed the term fovea from the paper. See above for our argument that unusual behavioral tuning in budgerigars is not simply or fully explainable by the audiogram.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Line 34. As far as I could tell, no other study has referred to this region in budgerigar as an acoustic fovea. Probably for good reason (see above). This wording should probably be avoided.

      We removed the term.

      Line 35. Describing 3.5-4 kHz as 'mid-frequencies' is a stretch. 4 kHz is actually the corner frequency, above which hearing degrades.

      We added a more detailed and accurate description of the tuning pattern.

      Lines 89-91. This seems a nice statement of the problem, and to my mind makes for a much better rationale for the study.

      Line 255. "mixed effect" should be "mixed effects".

      We made the correction.

      Line 380. Kuhn and Saunders didn't measure high enough to detect any changes in tuning.

      We removed the reference here.

    1. eLife Assessment

      This important study provides compelling evidence for the evolutionary diversification and conserved NFκB-inducing function of RHIM-containing RIP kinase proteins across animal lineages, combining thorough bioinformatic analysis with functional assays in human cells. The findings are of broad interest to immunologists and evolutionary biologists, though some novel observations would benefit from deeper conceptual integration.

    2. Reviewer #2 (Public review):

      Summary:

      By combining bioinformatic and experimental approaches, the authors address the question of why several vertebrate lineages lack specific genes of the necroptosis pathway, or those that regulate the interplay between apoptosis and necroptosis. The lack of such genes was already known from previous publications, but the current manuscript provides a more in-depth analysis and also uses experiments in human cells to address the question of functionality of the remaining genes and pathways. A particular focus is placed on RIPK3/RIPK1 and their dual roles in inducing NFkB and/or necroptosis.

      Strengths:

      The well-documented bioinformatic analyses provide a comprehensive data basis of the presence/absence of RIP kinases, other RHIM proteins, apoptosis signaling proteins (FADD, CASP8, CASP10) and some other genes involved in these pathways. Several of these genes are known to be missing in certain animal lineages, which raises the question of why their canonical binding partners are present in these species. By expressing several such proteins (both wildtype and mutants destroying particular interaction regions) in human cells, the authors succeed in establishing a general role of RIPK3 and RIPK1 in NFkB activation. This function appears to be better conserved and more universal than the necroptotic function of the RHIM proteins. The authors also scrutinize the importance of the kinase function and RHIM integrity for these separate functionalities.

      Weaknesses:

      A weakness of the presented study is the experimental restriction to human HEK293 cells. There are several situations where the functionality of proteins from distant organisms (like lampreys or even mussels) in human cells is not necessarily indicative of their function in the native context. In some cases, these problems are addressed by co-expressing potential interaction partners, but not all of these experiments are really informative. However, I agree with the authors that it is not possible to perform all the experiments in native cells, and that comparing all proteins in the same (human) cell type allows for a better comparison.

      The conclusions drawn by the authors are supported by convincing evidence. I have no doubts that this study will be very useful for future studies addressing the evolution of necroptosis and its regulation by NFkB and apoptosis.

    3. Reviewer #3 (Public review):

      In this study, the authors employ both computational and experimental methods to reveal functional conservation of RIP family kinases and associated proteins in animals, with particular focus on mammals and other major groups of vertebrates. The bioinformatic part of the work involves genomic data from diverse animal groups, providing insightful data on loss and duplication patterns for RIP and other necroptosis-related genes, and positive selection signals for RIPK1/3 genes in certain mammalian clades. These findings are then extensively used for selecting species and RHIM tetrad candidates for further experiments, in which the authors demonstrate different modes of functional conservation for RIPK proteins in necroptosis and NF-kB signaling across vertebrate species.

      As the only major drawback, I would mention several important findings which the authors make in the course of their research but do not pursue further in the experimental part of the paper. These include:

      • An additional copy for RIPK2 (RIPK2B) found in monotremes and non-mammalian vertebrates and its functions;
      • The entire diversity of RHIM functional tetrad variants; of particular interest here are IQFG and IQLG tetrads specific for bats, which are known to harbor human-affecting viruses and were demonstrated to have their RIPK1/3 genes under positive selection in this study;
      • Functions and involvement of RIPK3 protein in NF-kB pathway in lampreys;
      • The mode of NF-kB activation in non-mammalian species retaining ZBP1 copies.

      Further elucidation of some or all of these points in the experimental part would facilitate conceptualizing the paper's numerous findings, which otherwise might appear insufficiently scrutinized. On the other hand, I agree that at least some of them require separate studies. Given the importance of the results presented in this paper, I believe these points will be further addressed in future works.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript titled "Evolutionary and Functional Analyses Reveal a Role for the RHIM in Tuning RIPK3 Activity Across Vertebrates" by Fay et al. explores the function of RIPK gene family members across a wide range of vertebrate and invertebrate species through a combination of phylogenomics and functional studies. By overexpressing these genes in human cell lines, the authors examine their capacity to activate NF-κB and induce cell death. The methods employed are appropriate, with a thorough analysis of gene loss, positive selection, and functionality. While the study is well-executed and comprehensive, its broader relevance remains limited, appealing mainly to specialists in this specific field of research. It misses the opportunity to extract broader insights that could extend the understanding of these genes beyond evolutionary conservation, particularly by employing evolutionary approaches to explore more generalizable functions.

      Major comments:

      The main issue I encounter is distinguishing between what is novel in this study and what has been previously demonstrated. What new insights have been gained here that are of broader relevance? The discussion, which would be a good place to do so, is very speculative and has little to do with the actual results. Throughout the manuscript, there is little explanation of the study's importance beyond the fact that it was possible to conduct it. Is the evolutionary analysis being used to advance our understanding of gene function, or is the focus merely on how these genes behave across different species? The former would be exciting, while the latter feels less impactful.

      We thank the reviewer for the positive feedback. With regard to the major comment, we have now made changes throughout the revised manuscript to highlight the novel insights that emerge from our work, as well as the importance of using evolutionary and functional analyses to understand gene function. 

      Reviewer #2 (Public review):

      Summary:

      By combining bioinformatical and experimental approaches, the authors address the question of why several vertebrate lineages lack specific genes of the necroptosis pathway or those that regulate the interplay between apoptosis and necroptosis. The lack of such genes was already known from previous publications, but the current manuscript provides a more in-depth analysis and also uses experiments in human cells to address the question of the functionality of the remaining genes and pathways. A particular focus is placed on RIPK3/RIPK1 and their dual roles in inducing NFkB and/or necroptosis.

      Strengths:

      The well-documented bioinformatical analyses provide a comprehensive data basis of the presence/absence of RIP-kinases, other RHIM proteins, apoptosis signaling proteins (FADD, CASP8, CASP10), and some other genes involved in these pathways. Several of these genes are known to be missing in certain animal lineages, which raises the question of why their canonical binding partners are present in these species. By expressing several such proteins (both wildtype and mutants destroying particular interaction regions) in human cells, the authors succeed in establishing a general role of RIPK3 and RIPK1 in NFkB activation. This function appears to be better conserved and more universal than the necroptotic function of the RHIM proteins. The authors also scrutinize the importance of the kinase function and RHIM integrity for these separate functionalities.

      Weaknesses:

      A major weakness of the presented study is the experimental restriction to human HEK293 cells. There are several situations where the functionality of proteins from distant organisms (like lampreys or even mussels) in human cells is not necessarily indicative of their function in the native context. In some cases, these problems are addressed by co-expressing potential interaction partners, but not all of these experiments are really informative.

      A second weakness is that the manuscript addresses some interesting effects only superficially. By using host cells that are deleted for certain signaling components, a more focussed hypothesis could have been tested.

      Thus, while the aim of the study is mostly met, it could have been a bit more ambitious. The limited conclusions drawn by the authors are supported by convincing evidence. I have no doubts that this study will be very useful for future studies addressing the evolution of necroptosis and its regulation by NFkB and apoptosis.

      We thank the reviewer for the positive feedback. We agree that our study is limited by using HEK293 cells. However, we do not have appropriate cell lines for all species analyzed and therefore wished to use a single system to test all effects. As the reviewer points out, we do co-express when possible, and are careful in the manuscript not to overextend our conclusions. We, like the reviewer, believe that many of the intriguing findings in this study, which was intended to cover a broad range of species, will be useful for more in-depth studies in a given species.

      Reviewer #3 (Public review):

      This important study provides insights into the functional diversification of RIP family kinase proteins in vertebrate animals. The provided results, which combine bioinformatic and experimental analyses, will be of interest to specialists in both immunology and evolutionary biology. However, the computational part of the methodology is insufficiently covered in the paper and the experimental results would benefit from including data for additional species.

      We thank the reviewer for the positive feedback. As described below, we have now addressed the concerns about the description of the computational methods.

      (1) In the Methods section concerning gene loss analysis, the authors refer to the 'Phylogenetic analysis' section for details of RIPK sequence acquisition and alignment procedure. This section is missing from the manuscript as provided. In its absence, it is hard for the reviewer to provide relevant comments on gene presence/absence analysis.

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (2) In the same section, the authors state that gene sequences were filtered and grouped based on the initial gene tree pattern (lines 448-449). How exactly did the authors filter the non-RIP kinases and other irrelevant homologs from the gene trees? Did they consider the reciprocal best (BLAST) hit approach or similar approaches for orthology inference? Did they also encounter potential pseudogenes of genes marked as missing in Figure 1C? Will the gene trees mentioned be available as supplementary files?

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (3) The authors state the presence of an additional RIPK2 paralog in non-therian vertebrates. The ramifications of this paralog loss in therians are not discussed in the text, although RIPK2 is also involved in NF-kB activation. In addition, the RIPK2B gene loss pattern is relegated from Figure 1C to Supplementary Figure 4, despite posing comparable interest to the reader.

      We are also intrigued by the RIPK2/RIPK2B data and felt it important to include our findings here; however, we do not have functional data for RIPK2B at this point and feel it is better suited for a separate study. We therefore focused both the title and the main figures on RIPK3, for which we have functional data.

      (4) The authors present evidence for (repeated) positive selection in both RIPK1 and RIPK3 in bats; however, neither bat RIPK1/3 orthologs nor bat-specific RHIM tetrad variants (IQFG, IQLG) are considered in the experimental part of the work.

      We included a tetrad variant (VQFG) that is found in bats and multiple other species. We wanted to test a wide range of variant amino acids, so testing both IQFG (found only in bats) and VQFG (found in bats and multiple other diverse species) was not of high importance.

      (5) The authors present gene presence/absence patterns for zebra mussels as an outgroup of vertebrate species analyzed. From the evolutionary perspective, adding results for a closer invertebrate group, such as lancelets, tunicates, or echinoderms, would be beneficial for reconstructing the evolutionary progression of RIPK-mediated immune functions in animals.

      In our initial analyses, we searched for RIPK-like proteins in cnidarians, arthropods, nematodes, amoeba, and spiralia, with only spiralia species containing proteins with substantial homology to vertebrate RIPK1 proteins, as defined by a homologous N-terminal kinase domain and C-terminal RHIM and death domain. We have expanded this analysis to include lancelets, tunicates, and echinoderms and found several lancelet species with RIPK1-like proteins. These data have been added to the manuscript.

      (6) In the broader sense, the list of non-mammalian species included in the study is not explained or substantiated in the text. What was the rationale behind selecting lizards, turtles, and lampreys for experimental assays? Why was turtle RIPK3 but not turtle RIPK1CT protein used for functional tests? Which results do the authors expect to observe if amphibian or teleost RIPK1/3 are included in the analysis, especially those with divergent tetrad variants?

      We have added additional text to define our rationale for selecting which species were tested. 

      (7) For lamprey RIPK3, the observed NF-kB activity levels still remain lower than those of mammalian and reptilian orthologs even after catalytic tetrad modification. In the same way, switching human RIPK3 catalytic tetrad to that of lamprey does not result in NF-kB activation. What are the potential reasons for the observed difference? Does it mean that lamprey's RIPK3 functions in NF-kB activation are, at least partially, delegated to RIPK1?

      The function of lamprey RIPK3 is intriguing, albeit unknown. The reduced activation in human cells may be due to an incompatibility between lamprey RIPK3 and human NF-kB machinery, or it may not function in NF-kB at all. Considering that lamprey do not have other components of the known mammalian necroptosis pathway, it is unclear what function RIPK3 would serve in these species. It is possible lamprey may have a necroptosis pathway that is RIPK3-dependent but distinct from the mammalian pathway. It is an interesting question for future study. 

      (8) In lines 386-388, the authors state that 'only non-mammalian RIPK1CT proteins required the RHIM for maximal NF-kB activation', which is corroborated by results in Figure 4B. The authors further associate this finding with a lack of ZBP1 in the respective species (lines 388-389). However, non-squamate reptiles seem to retain ZBP1, as suggested by Supplementary Table 1. Given that, do the authors expect to observe RHIM-independent (maximal) NF-kB activation in turtles and crocodilians or respective RIPK1CT-transfected cells?

      While turtles and crocodiles do retain ZBP1, it is still unclear if they are able to activate ZBP1/RIPK3/MLKL-dependent necroptosis similar to mammals, especially given the divergence in the turtle ZBP1 RHIMs seen in Figure 4C. Future studies will be needed to further test our hypotheses and to continue to characterize innate immune function and evolution across a range of vertebrate species. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) The title is somewhat restrictive, as it only mentions RIPK3, despite the manuscript covering a broader range of RIPKs and associated proteins.

      We agree that a title that encompasses both the breadth of our study and the depth with which we analyzed RIPK3 would be ideal. However, we were unable to come up with a succinct title that conveyed both points appropriately, so opted for one that focused on our RIPK3 insights.

      (2) Several supplementary figures contain valuable information that could be incorporated into the main figures for greater clarity and emphasis.

      We agree that many interesting pieces of data are in the supplement. We felt it was important to include those data in the manuscript, but also wanted to keep the main manuscript figures as focused as possible.  

      Reviewer #2 (Recommendations for the authors):

      (1) I do not fully agree with the claim that caspase-8 is absent from fish. I briefly repeated this part of the analysis and found several fish proteins that cluster with caspase-8 rather than caspase-10 or cFLIP. From the method section, it does not really become clear how the Casp8/Casp10/cFLIP decision was made, and particularly, how cases were addressed where genes predate the caspase-8/caspase-10 split. To name just a few examples, the authors might check uniprot:A0A444UA91, W5MXS4, or A0A8X8BKJ8 for being fish Caspase-8 candidates.

      We thank the reviewer for their critical analysis. CASP8 and CASP10 are very similar proteins in humans. We are distinguishing between the two based on vertebrate phylogeny with outgroup proteins (CASP2, CASP9, and CFLAR, see tree in Author response image 1 below) to help define the CASP8/CASP10 clade. Once we isolate CASP8/10, we build an additional tree to distinguish CASP8 and CASP10. Using this method, all fish CASP8/10-like proteins cluster with the mammalian CASP10 clade rather than the CASP8 clade, despite many fish proteins being annotated as CASP8 or CASP8-like. We do acknowledge that, because of the similarities between CASP8 and CASP10, there are likely proteins that can fall in either clade depending on which outgroups are included. To this end, we have updated our gene loss figure to only denote whether a species has no CASP8/10, a single CASP8/10 protein, or both CASP8 and CASP10. We have also updated our methods to better define how we completed our analyses. 

      Author response image 1.

      (2) While analyzing which RIPK3 protein causes cell death (lines 188ff), the underlying assumption is that the heterologous RIPK3 proteins can interact with human MLKL and activate it by phosphorylation. No attempts are being made to check if MLKL actually gets phosphorylated, and this issue is also not discussed. In Figure 2C, cell death is either measured by RIPK3 overexpression alone or by the additional overexpression of ZBP1 and MLKL. However, it is not shown if in all cases all the transfected proteins are expressed at a comparable level, or if the observed cell death might be caused by MLKL/ZBP1 overexpression alone.

      Cell death is dependent on expression of ZBP1, MLKL, and RIPK3, as shown in Supplementary Figure 6. We have attempted to detect phospho-MLKL via western blot. However, in these overexpression assays, we are able to detect phospho-MLKL in the presence of RIPK3 and MLKL alone, independent of activation of cell death. In fact, we see reduced phospho-MLKL and reduced expression of MLKL overall when ZBP1, MLKL, and RIPK3 are added, presumably due to cell death induced in these conditions (see blot in Author response image 2 below). We therefore felt these data were of limited use here.

      Author response image 2.

      (3) The manuscript describes a well-documented bioinformatical analysis and acknowledges the body of earlier published work on necroptosis evolution and associated gene losses. However, when discussing the RHIM-related aspects, the authors do not mention previous publications on RHIM conservation in invertebrates and even fungal proteins such as Het-S. They also fail to mention/discuss the amyloid-forming properties of RHIMs, which I consider crucial for understanding the function of RHIM-containing proteins.

      We thank the reviewer for their insight. We have added additional points on both RHIM conservation and amyloid formation.

      (4) Related to the above issue: In lines 226ff, the induction of NFkB by RIPK3 overexpression is described. While RIPK3 from other mammals requires endogenous (human) RIPK1 to be present, lizard and turtle RIPK3 do not require human RIPK1 but *do* require functional RHIMs. It is not checked (or at least discussed) if RHIM amyloid formation is required, nor if the RHIM of the heterologous RIPK3 might act through interaction with endogenous (human) RIPK3.

      We and others (PMID: 29073079) did not detect RIPK3 protein in HEK293T cells. This, combined with the requirement for exogenous RIPK3 to activate cell death, indicates that endogenous RIPK3 is not contributing to these assays.

      (5) In lines 275ff, the authors observe that RIPK1s from other mammalian species do not require the RHIM for NFkB activation, while RIPK1 from non-mammalian species do require the RHIM. I wonder why the (in my opinion) most obvious explanation is not addressed: Maybe the mammalian RIPK1 proteins are similar enough to the human one so that they can signal on their own, while the more distant RIPK1 cannot and thus require human RIPK1 (associated via RHIMs) for NFkB activation? Since the authors used RIPK1-deficient cells in previous experiments, wouldn't it make sense to test them here, too?

      It is intriguing that the more diverged RIPK1 species require the RHIM for NF-kB signaling. In Supplementary Figure 12, we do test the mammalian and non-mammalian proteins in RIPK1 KO cells, and all proteins are able to activate NF-kB. So while non-mammalian RIPK1 signaling is dependent on the RHIM, it is independent of endogenous RIPK1.

      Minor comments:

      (1) In the legend of Figure 1, there is a typo "heat amp".

      This typo has now been corrected.

      (2) In Figure 3A, the term "FUBAR" is not explained at all.

      FUBAR has now been defined in the methods section.

      Reviewer #3 (Recommendations for the authors):

      A few typos and graph inconsistencies have been encountered in the course of the manuscript, e.g.:

      (1) Line 168: 'heat amp' -> 'heat map'.

      (2) Lines 290-291: 'known mediate' -> 'known to mediate' (?)

      We thank the reviewer for catching these mistakes. They have been corrected. 

      (3) Supplementary Figure 12: Are human RIPK1 results presented in both 'mammalian' and 'non-mammalian' parts of the figure? If so, why do human data differ between the graphs?

      Mammalian and non-mammalian data were collected in separate experiments with human RIPK1 used as a control for both. The human data shown in the two graphs represent two separate experiments.

    1. eLife Assessment

      This important study investigates the molecular mechanisms by which the p53 isoforms Δ133p53α and Δ160p53α exert dominant-negative effects on full-length p53 (FLp53). Through a combination of chromatin immunoprecipitation, transcriptional reporter assays, subcellular localization analyses, and protein aggregation experiments, the authors provide solid evidence that these N-terminally truncated isoforms promote co-aggregation with FLp53, disrupting its transcriptional activity and cellular distribution. The revised manuscript successfully addresses prior reviewer concerns, and the findings are well supported by the experimental data.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors have provided a mechanism by which the presence of truncated P53 can inactivate the function of full-length P53 protein. The authors propose that this happens by sequestration of full-length P53 by truncated P53. The experiments performed in the study are well described.

      Significance:

      The work is significant, since it provides more mechanistic insight into how wild-type full-length P53 could be inactivated in the presence of truncated isoforms. This might offer new opportunities to recover P53 function as a treatment strategy against cancer.

      Comments on latest version:

      The authors have made significant effort to address my concerns using the system available to them. I find the justifications provided in the rebuttal letter and the revised figures satisfactory. My initial concerns regarding the overexpression system have been largely addressed. However, the experimental system used by the authors lacks the means to measure the effect on endogenous p53, which remains a limitation.

    3. Reviewer #2 (Public review):

      Summary:

      The revised manuscript by Zhao and colleagues presents a novel and compelling investigation into the p53 isoforms, Δ133p53 and Δ160p53, which are implicated in aggressive cancer phenotypes. The primary goal of this study was to elucidate how these isoforms exert a dominant-negative impact on the activity of full-length p53 (FLp53). The authors demonstrate that the Δ133p53 and Δ160p53 isoforms display impaired binding to p53-regulated promoters. Their findings suggest that the dominant-negative effects observed are primarily due to the co-aggregation of FLp53 with Δ133p53 and Δ160p53.

      Overall, the study is innovative, thoroughly executed, and supported by robust data analysis. The authors have effectively addressed the reviewers' criticisms and incorporated their suggestions in this revised manuscript.

      Significance:

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the co-aggregation of FLp53 with Δ133p53 and Δ160p53.

    4. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have provided a mechanism by which the presence of truncated P53 can inactivate the function of full-length P53 protein. The authors propose that this happens by sequestration of full-length P53 by truncated P53.

      The experiments performed in the study are well described.

      My area of expertise is molecular biology/gene expression, and I have tried to provide suggestions within my area of expertise. The study has been done mainly with an overexpression system, and I have included a few comments that I think can be helpful for understanding the effect of truncated P53 on the endogenous wild-type full-length protein. In this reviewer's view, performing experiments along these lines would add value to the observations.

      Major comments:

      (1) It is not clear what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms. Using a P53 antibody that can detect endogenous wild-type P53, can the authors check whether endogenous full-length P53 protein is also aggregated? It is hard to differentiate if aggregation of full-length P53 happens only in an overexpression scenario, where much more of both proteins is expressed. Under normal physiological conditions, P53 expression is usually low and tightly controlled, and it is induced under altered cellular conditions such as DNA damage. So, it is important to understand the physiological relevance of such aggregation, which would be possible if the authors could investigate the effect on endogenous full-length P53 following overexpression of mutant isoforms.

      Thank you very much for your insightful comments.

      (1) To address “what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms,” we employed a human A549 cell line expressing endogenous wild-type p53 under DNA damage conditions such as etoposide treatment (1). We chose the A549 cell line because, like H1299, it is a lung cancer cell line (www.atcc.org). For comparison, we also transfected the cells with 2 μg of V5-tagged plasmids encoding FLp53 and its isoforms Δ133p53 and Δ160p53. As shown in Author response image 1A, lanes 1 and 2, endogenous p53 expression remained undetectable in A549 cells despite etoposide treatment, which limits our ability to assess the effects of the isoforms on the endogenous wild-type FLp53. We could, however, detect the V5-tagged FLp53 expressed from the plasmid using the anti-V5 (rabbit) as well as the anti-DO-1 (mouse) antibody (Author response image 1). The latter detects both endogenous wild-type p53 and the V5-tagged FLp53 since the antibody epitope is within the N-terminus (aa 20-25). This result supports the reviewer’s comment regarding the low level of expression of endogenous p53 that is insufficient for detection in our experiments.

      In summary, in line with the reviewer’s comment that ‘under normal physiological conditions p53 expression is usually low,’ we could not detect p53 with an anti-DO-1 antibody. Thus, we proceeded with V5/FLAG-tagged p53 for detection of the effects of the isoforms on p53 stability and function. We also found that protein expression in H1299 cells was more easily detectable than in A549 cells (compare Author response image 1A and B). Thus, we decided to continue with the H1299 cells (p53-null), which would serve as a more suitable model system for this study.

      (2) We agree with the reviewer that ‘It is hard to differentiate if aggregation of full-length p53 happens only in overexpression scenario’. However, it is not impossible to imagine that such aggregation of FLp53 happens under conditions when p53 and its isoforms are over-expressed in the cell. Although the exact physiological context is not known and is beyond the scope of the current work, our results indicate that at higher expression, p53 isoforms drive aggregation of FLp53. Given the challenges of detecting endogenous FLp53, we had to rely on the results obtained with plasmid-mediated expression of p53 and its isoforms in p53-null cells.

      Author response image 1.

      Comparative analysis of protein expression in A549 and H1299 cells. (A) A549 cells (p53 wild-type) were treated with etoposide to induce endogenous wild-type p53 expression. To assess the effects of FLp53 and its isoforms Δ133p53 and Δ160p53 on endogenous wild-type p53 aggregation, A549 cells were transfected with 2 μg of V5-tagged p53 expression plasmids, with or without etoposide (20 μM for 8 h) treatment. Western blot analysis was done with the anti-V5 (rabbit) antibody to detect V5-tagged proteins and the anti-DO-1 (mouse) antibody; the latter detects both endogenous wild-type p53 and V5-tagged FLp53. The merged image corresponds to the overlay between the V5 and DO-1 antibody signals. (B) H1299 cells (p53-null) were transfected with 2 μg of V5-tagged p53 expression plasmids or the empty vector control pcDNA3.1. Western blot analysis was done with the anti-V5 (mouse) antibody.

      (2) Can the presence of mutant P53 isoforms cause functional impairment of wild-type full-length endogenous P53? That could be tested using a ChIP assay similar to the one the authors performed, but instead of using an antibody against the tagged protein, the authors could check endogenous P53 enrichment at a gene promoter such as P21 following overexpression of mutant isoforms. Introducing a condition such as DNA damage in such an experiment might help, since endogenous P53 is then induced and more prone to bind P53 targets such as P21.

      Thank you very much for your valuable comments and suggestions. To investigate the potential functional impairment of endogenous wild-type p53 by p53 isoforms, we initially utilized A549 cells (p53 wild-type), aiming to monitor endogenous wild-type p53 expression following DNA damage. However, as mentioned and demonstrated in Author response image 1, endogenous p53 expression was too low to be detected under these conditions, making the ChIP assay for analyzing endogenous p53 activity unfeasible. Thus, we decided to utilize plasmid-based expression of FLp53 and focus on the potential functional impairment induced by the isoforms.

      (3) Along similar lines, the authors described:

      "To test this hypothesis, we escalated the ratio of FLp53 to isoforms to 1:10. As expected, the activity of all four promoters decreased significantly at this ratio (Figure 4A-D). Notably, Δ160p53 showed a more potent inhibitory effect than Δ133p53 at the 1:5 ratio on all promoters except for the p21 promoter, where their impacts were similar (Figure 4E-H). However, at the 1:10 ratio, Δ133p53 and Δ160p53 had similar effects on all transactivation except for the MDM2 promoter (Figure 4E-H)."

      Again, in this assay the authors used ratios of 1:5 to 1:10 of full-length versus mutant. How do the authors justify this result in the more relevant context where one allele is wild type (functional P53) and the other allele is mutated (truncated, can induce aggregation)? In this case one would expect a 1:1 ratio of full-length versus mutant protein, unless other regulation induces expression of mutant isoforms more than the wild-type full-length protein. Discussing along these lines might provide more physiological relevance to the observed data.

      Thank you for raising this point regarding the physiological relevance of the ratios used in our study.

      (1) In the revised manuscript (lines 193-195), we added in this direction that “The elevated Δ133p53 protein modulates p53 target genes such as miR‑34a and p21, facilitating cancer development(2, 3). To mimic conditions where isoforms are upregulated relative to FLp53, we increased the ratios to 1:5 and 1:10.” This approach aims to simulate scenarios where isoforms accumulate at higher levels than FLp53, which may be relevant in specific contexts, as also elaborated above.

      (2) Regarding the assumption that one allele is wild-type and the other encodes an isoform, this is not valid in most contexts. First, human cells have two copies of the TP53 gene (one from each parent). Second, the TP53 gene has two distinct promoters: the proximal promoter (P1) primarily regulates FLp53 and ∆40p53, whereas the second promoter (P2) regulates ∆133p53 and ∆160p53(4, 5). Additionally, ∆133TP53 is a p53 target gene(6, 7) and the expression of Δ133p53 and FLp53 is dynamic in response to various stimuli. Third, the expression of p53 isoforms is regulated at multiple levels, including transcriptional, post-transcriptional, translational, and post-translational processing(8). Moreover, different degradation mechanisms modify the protein level of p53 isoforms and FLp53(8). These regulatory mechanisms respond to various stimuli, and therefore, the 1:1 ratio of FLp53 to ∆133p53 or ∆160p53 may be valid only under certain physiological conditions. In line with this, varied expression levels of FLp53 and its isoforms, including ∆133p53 and ∆160p53, have been reported in several studies(3, 4, 9, 10).

      (3) In our study, using the pcDNA 3.1 vector under the human cytomegalovirus (CMV) promoter, we observed moderately higher expression levels of ∆133p53 and ∆160p53 relative to FLp53 (Author response image 1B). This overexpression scenario provides a model for studying conditions where isoform accumulation might surpass physiological levels, impacting FLp53 function. By employing elevated ratios of these isoforms to FLp53, we aim to investigate the potential effects of isoform accumulation on FLp53.

      (4) Finally, does this altered function of full-length P53 (preferably the endogenous protein) in the presence of truncated P53 have any phenotypic consequence for the cells (if the authors choose a cell type that has wild-type, functional P53)? An assay such as apoptosis or cell-cycle analysis could help to visualize this.

      Thank you for your insightful comments. In the experiment with A549 cells (p53 wild-type), endogenous p53 levels were too low to be detected, even after DNA damage induction. The evaluation of the function of endogenous p53 in the presence of isoforms is hindered, as mentioned above. In the revised manuscript, we utilized H1299 cells with overexpressed proteins for apoptosis studies using the Caspase-Glo® 3/7 assay (Figure 7). This has been shown in the Results section (lines 254-269). “The Δ133p53 and Δ160p53 proteins block pro-apoptotic function of FLp53.

      One of the physiological read-outs of FLp53 is its ability to induce apoptotic cell death(11). To investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on FLp53-induced apoptosis, we measured caspase-3 and -7 activities in H1299 cells expressing different p53 isoforms (Figure 7). Caspase activation is a key biochemical event in apoptosis, with the activation of effector caspases (caspase-3 and -7) ultimately leading to apoptosis(12). The caspase-3 and -7 activities induced by FLp53 expression were approximately 2.5 times higher than those of the control vector (Figure 7). Co-expression of FLp53 and the isoforms Δ133p53 or Δ160p53 at a ratio of 1:5 significantly diminished the apoptotic activity of FLp53 (Figure 7). This result aligns well with our reporter gene assay, which demonstrated that elevated expression of Δ133p53 and Δ160p53 impaired the expression of apoptosis-inducing genes BAX and PUMA (Figure 4G and H). Moreover, a reduction in the apoptotic activity of FLp53 was observed irrespective of whether Δ133p53 or Δ160p53 protein was expressed with or without a FLAG tag (Figure 7). This result, therefore, also suggests that the FLAG tag does not affect the apoptotic activity or other physiological functions of FLp53 and its isoforms. Overall, the overexpression of p53 isoforms Δ133p53 and Δ160p53 significantly attenuates FLp53-induced apoptosis, independent of the protein tagging with the FLAG antibody epitope.”

      Referees cross-commenting

      I think the comments from the other reviewers are very much reasonable and logical.

      In particular, all three reviewers have indicated that a better way to visualize the aggregation of full-length wild-type P53 by truncated P53 would add more value to the observation (reviewer 1 suggested looking at endogenous P53, reviewer 2 suggested a fluorescent tag, and reviewer 3 raised a concern about the FLAG tag).

      Thank you for these comments. The endogenous p53 protein was undetectable in A549 cells induced by etoposide (Figure R1A). Therefore, we conducted experiments using FLAG/V5-tagged FLp53.  To avoid any potential side effects of the FLAG tag on p53 aggregation, we introduced untagged p53 isoforms in the H1299 cells and performed subcellular fractionation. Our revised results, consistent with previous FLAG-tagged p53 isoforms findings, demonstrate that co-expression of untagged isoforms with FLAG-tagged FLp53 significantly induced the aggregation of FLAG-FLp53, while no aggregation was observed when FLAG-tagged FLp53 was expressed alone (Supplementary Figure 6). These results clearly indicate that the FLAG tag itself does not contribute to protein aggregation. 

      Additionally, we utilized the A11 antibody to detect protein aggregation, providing additional validation (Figure 8 from Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137). Given that the fluorescent proteins (~30 kDa) are substantially bigger than the tags used here (~1 kDa) and may influence oligomerization (especially GFP), stability, localization, and function of p53 and its isoforms, we avoided conducting these vital experiments with such artificial large fusions. 

      Reviewer #1 (Significance):

      The work is significant, since it provides more mechanistic insight into how wild-type full-length P53 could be inactivated in the presence of truncated isoforms. This might offer new opportunities to recover P53 function as a treatment strategy against cancer.

      Thank you for your insightful comments. We appreciate your recognition of the significance of our work in providing mechanistic insights into how wild-type FLp53 can be inactivated by truncated isoforms. We agree that these findings have potential for exploring new strategies to restore p53 function as a therapeutic approach against cancer. 

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      This study is innovative, well-executed, and supported by thorough data analysis. However, the authors should address the following points:

      (1) Introduction on Aggregation and Co-aggregation: Given that the focus of the study is on the aggregation and co-aggregation of the isoforms, the introduction should include a dedicated paragraph discussing this issue. There are several original research articles and reviews that could be cited to provide context.

      Thank you very much for the valuable comments. We have added the following paragraph in the revised manuscript (lines 74-82): “Protein aggregation has become a central focus of modern biology research and has documented implications in various diseases, including cancer(13, 14, 15). Protein aggregates can be of different types ranging from amorphous aggregates to highly structured amyloid or fibrillar aggregates, each with different physiological implications. In the case of p53, whether protein aggregation, and in particular, co-aggregation with large N-terminal deletion isoforms, plays a mechanistic role in its inactivation remains underexplored. Interestingly, the Δ133p53β isoform has been shown to aggregate in several human cancer cell lines(16). Additionally, the Δ40p53α isoform exhibits a high aggregation tendency in endometrial cancer cells(17). Although no direct evidence exists for Δ160p53 yet, these findings imply that p53 isoform aggregation may play a major role in their mechanisms of action.”

      (2) Antibody Use for Aggregation: To strengthen the evidence for aggregation, the authors should consider using antibodies that specifically bind to aggregates.

      Thank you for your insightful suggestion. We addressed protein aggregation using the A11 antibody, which specifically recognizes amyloid-like protein aggregates. We analyzed insoluble nuclear pellet samples prepared under identical conditions as described in Figure 6B. To confirm the presence of p53 proteins, we employed the anti-p53 M19 antibody (Santa Cruz, Cat No. sc-1312) to detect bands corresponding to FLp53 and its isoforms Δ133p53 and Δ160p53. The monomer FLp53 was not detected (Figure 8, lower panel, Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137), which may be attributed to the lower binding affinity of the anti-p53 M19 antibody to it. These samples were also immunoprecipitated using the A11 antibody (Thermo Fisher Scientific, Cat No. AHB0052) to detect aggregated proteins. Interestingly, FLp53 and its isoforms, Δ133p53 and Δ160p53, were clearly visible with the A11 antibody when co-expressed at a 1:5 ratio, suggesting that they underwent co-aggregation. However, no FLp53 aggregates were observed when it was expressed alone (Author response image 2). These results support the conclusion in our manuscript that Δ133p53 and Δ160p53 drive FLp53 aggregation.

      Author response image 2.

      Induction of FLp53 Aggregation by p53 Isoforms Δ133p53 and Δ160p53. H1299 cells were transfected with FLAG-tagged FLp53 and V5-tagged Δ133p53 or Δ160p53 at a 1:5 ratio. The cells were subjected to subcellular fractionation, and the resulting insoluble nuclear pellet was resuspended in RIPA buffer. The samples were heated at 95°C until the pellet was completely dissolved, and then analyzed by Western blotting. Immunoprecipitation was performed using the A11 antibody, which specifically recognizes amyloid protein aggregates, and the anti-p53 M19 antibody, which detects FLp53 as well as its isoforms Δ133p53 and Δ160p53.

      (3) Fluorescence Microscopy: Live-cell fluorescence microscopy could be employed to enhance visualization by labeling FLp53 and the isoforms with different fluorescent markers (e.g., EGFP and mCherry tags).

      We appreciate the suggestion to use live-cell fluorescence microscopy with EGFP and mCherry tags to visualize FLp53 and its isoforms. While we understand the advantages of live-cell imaging with EGFP/mCherry tags, we refrained from making such fusions because GFP and related protein tags are very large (~30 kDa) relative to the p53 isoform variants (~30 kDa). Other studies have shown that EGFP and mCherry fusions can alter protein oligomerization, solubility and aggregation(18, 19). Moreover, most fluorescent proteins are prone to dimerization (e.g., EGFP) or form obligate tetramers (DsRed)(20, 21, 22), potentially interfering with the oligomerization and aggregation properties of p53 isoforms, particularly Δ133p53 and Δ160p53.

      Instead, we utilized FLAG- or V5-tag-based immunofluorescence microscopy, a well-established and widely accepted method for visualizing p53 proteins. This method provided precise localization and reliable quantitative data, and we believe it is both appropriate and sufficient for addressing the research question of the current study.

      Reviewer #2 (Significance):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      We sincerely thank the reviewer for the thoughtful and positive comments on our manuscript and for highlighting the significance of our findings on the p53 isoforms, Δ133p53 and Δ160p53. 

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript entitled "Δ133p53 and Δ160p53 isoforms of the tumor suppressor protein p53 exert dominant-negative effect primarily by coaggregation", the authors suggest that the Δ133p53 and Δ160p53 isoforms have high aggregation propensity and that by co-aggregating with canonical p53 (FLp53), they sequester it away from DNA, thus exerting a dominant-negative effect over it.

      First, the authors should make it clear throughout the manuscript, including the title, that they are investigating Δ133p53α and Δ160p53α since there are 3 Δ133p53 isoforms (α, β, γ), and 3 Δ160p53 isoforms (α, β, γ).

      Thank you for your suggestion. We understand the importance of clearly specifying the isoforms under study. Following your suggestion, we have added α in the title, abstract, and introduction and added the following statement in the Introduction (lines 57-59): “For convenience and simplicity, we have written Δ133p53 and Δ160p53 to represent the α isoforms (Δ133p53α and Δ160p53α) throughout this manuscript.” 

      One concern is that the authors only consider and explore the Δ133p53α and Δ160p53α isoforms as exclusively oncogenic and FLp53 dominant-negative, without discussing evidence of different activities. Indeed, other manuscripts have shown that Δ133p53α is non-oncogenic and non-mutagenic, does not antagonize every FLp53 function, and is sometimes associated with good prognosis. To cite a few examples:

      (1) Hofstetter G. et al. D133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. Br. J. Cancer 2011, 105, 1593-1599.

      (2) Bischof, K. et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Sci. Rep. 2019, 9, 5244.

      (3) Knezović F. et al. The role of p53 isoforms' expression and p53 mutation status in renal cell cancer prognosis. Urol. Oncol. 2019, 37, 578.e1-578.e10.

      (4) Gong, L. et al. p53 isoform D113p53/D133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell Res. 2015, 25, 351-369.

      (5) Gong, L. et al. p53 isoform D133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Sci. Rep. 2016, 6, 37281.

      (6) Horikawa, I. et al. D133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell Death Differ. 2017, 24, 1017-1028.

      (7) Gong, L. p53 coordinates with D133p53 isoform to promote cell survival under low-level oxidative stress. J. Mol. Cell Biol. 2016, 8, 88-90.

      Thank you very much for your comment and for highlighting these important studies. 

      We agree that Δ133p53 isoforms exhibit complex biological functions, with both oncogenic and non-oncogenic potentials. However, our aim here was primarily to reveal the molecular mechanism for the dominant-negative effects exerted by the Δ133p53α and Δ160p53α isoforms on FLp53, for which the Δ133p53α and Δ160p53α isoforms are suitable model systems. Exploring the oncogenic potential of the isoforms is beyond the scope of the current study, and we have not claimed anywhere that we are reporting it. We have carefully revised the manuscript and replaced the respective terms, e.g. ‘pro-oncogenic activity’ with ‘dominant-negative effect’, in relevant places (e.g. line 90). We have now also added a paragraph with suitable references that introduces the oncogenic and non-oncogenic roles of the p53 isoforms.

      After reviewing the papers you cited, we are not sure that they establish an oncogenic or non-oncogenic role of the Δ133p53α isoform in different cancer cases. Although our study is not about the oncogenic potential of the isoforms, we have summarized the key findings below:

      (1) Hofstetter et al., 2011: Demonstrated that Δ133p53α expression improved recurrence-free and overall survival in p53-mutant advanced serous ovarian cancer, suggesting a potential protective role in this context.

      (2) Bischof et al., 2019: Found that Δ133p53 mRNA can improve overall survival in high-grade serous ovarian cancers. However, out of 31 patients, only 5 belong to the TP53 wild-type group, while the others carry TP53 mutations.

      (3) Knezović et al., 2019: Reported downregulation of Δ133p53 in renal cell carcinoma tissues with wild-type p53 compared to normal adjacent tissue, indicating a potential non-oncogenic role, but not conclusively demonstrating it.

      (4) Gong et al., 2015: Showed that Δ133p53 antagonizes p53-mediated apoptosis and promotes DNA double-strand break repair by upregulating RAD51, LIG4, and RAD52 independently of FLp53.

      (5) Gong et al., 2016: Demonstrated that overexpression of Δ133p53 promotes the efficiency of cell reprogramming through its anti-apoptotic function and by promoting DNA DSB repair. The authors hypothesize that this mechanism involves increased RAD51 foci formation and decreased γH2AX foci formation and chromosome aberrations in induced pluripotent stem (iPS) cells, independent of FLp53.

      (6) Horikawa et al., 2017: Indicated that induced pluripotent stem cells derived from fibroblasts that overexpress Δ133p53 formed noncancerous tumors in mice compared to induced pluripotent stem cells derived from fibroblasts with complete p53 inhibition. Thus, Δ133p53 overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but it still compromises certain p53-mediated tumor-suppressing pathways. “Overexpressed Δ133p53 prevented FL-p53 from binding to the regulatory regions of p21WAF1 and miR-34a promoters, providing a mechanistic basis for its dominant-negative inhibition of a subset of p53 target genes.”

      (7) Gong, 2016: Suggested that Δ133p53 promotes cell survival under low-level oxidative stress, but its role under different stress conditions remains uncertain.

      We have revised the Introduction to provide a more balanced discussion of Δ133p53’s dual role (lines 62-73):

      “The Δ133p53 isoform exhibits complex biological functions, with both oncogenic and non-oncogenic potentials. Recent studies demonstrate the non-oncogenic yet context-dependent role of the Δ133p53 isoform in cancer development. Δ133p53 expression has been reported to correlate with improved survival in patients with TP53 mutations(23, 24), where it promotes cell survival in a non-oncogenic manner(25, 26), especially under low oxidative stress(27). Alternatively, other recent evidence emphasizes the notable oncogenic functions of Δ133p53, as it can inhibit p53-dependent apoptosis by directly interacting with FLp53(4, 6). The oncogenic function of the newly identified Δ160p53 isoform is less known, although it is associated with p53 mutation-driven tumorigenesis(28) and with melanoma cell aggressiveness(10). Whether or not the Δ160p53 isoform also impedes FLp53 function in a similar way as Δ133p53 is an open question. However, these p53 isoforms can certainly compromise p53-mediated tumor suppression by interfering with FLp53 binding to target genes such as p21 and miR-34a(2, 29) through a dominant-negative effect, although the exact mechanism is not known.”

      On the figures presented in this manuscript, I have three major concerns:

      (1) Most results in the manuscript rely on the overexpression of the FLAG-tagged or V5-tagged isoforms. The validation of these constructs depends entirely on Supplementary figure 3, which the authors claim "rules out the possibility that the FLAG epitope might contribute to this aggregation." However, I am not entirely convinced by that conclusion. Indeed, the ratio between the "regular" isoform and the aggregates is much higher in the FLAG-tagged constructs than in the V5-tagged constructs. We can visualize the aggregates easily in the FLAG-tagged experiment, but the imaging clearly had to be overexposed (given the white coloring demonstrating saturation of the main bands) to visualize them in the V5-tagged experiments. Therefore, I am not convinced that an effect of the FLAG tag can be ruled out, and more convincing data should be added.

      Thank you for raising this important concern. We have carefully considered your comments and have made several revisions to clarify and strengthen our conclusions.

      First, to address the potential influence of the FLAG and V5 tags on p53 isoform aggregation, we have revised Figure 2 and removed the previous Supplementary Figure 3, where non-specific antibody bindings and higher molecular weight aggregates were not clearly interpretable. In the revised Figure 2, we have removed these potential aggregates, improving the clarity and accuracy of the data.

      To further rule out any tag-related artifacts, we conducted a co-immunoprecipitation assay with FLAG-tagged FLp53 and untagged Δ133p53 and Δ160p53 isoforms. The results (now shown in the new Supplementary Figure 3) completely agree with our previous result with FLAG-tagged and V5-tagged Δ133p53 and Δ160p53 isoforms and show interaction between the partners. This indicates that the FLAG/V5 tags do not influence or interfere with the interaction between FLp53 and the isoforms. We have still used FLAG-tagged FLp53, as the endogenous p53 was undetectable and the FLAG-tagged FLp53 did not aggregate alone.

      In the revised paper, we added the following sentences (Lines 146-152): “To rule out the possibility that the observed interactions between FLp53 and its isoforms Δ133p53 and Δ160p53 were artifacts caused by the FLAG and V5 antibody epitope tags, we co-expressed FLAG-tagged FLp53 with untagged Δ133p53 and Δ160p53. Immunoprecipitation assays demonstrated that FLAG-tagged FLp53 could indeed interact with the untagged Δ133p53 and Δ160p53 isoforms (Supplementary Figure 3, lanes 3 and 4), confirming formation of hetero-oligomers between FLp53 and its isoforms. These findings demonstrate that Δ133p53 and Δ160p53 can oligomerize with FLp53 and with each other.”

      Additionally, we performed subcellular fractionation experiments to compare the aggregation and localization of FLAG-tagged FLp53 when co-expressed either with V5-tagged or untagged Δ133p53/Δ160p53. In these experiments, the untagged isoforms also induced FLp53 aggregation, mirroring our previous results with the tagged isoforms (Supplementary Figure 5). We’ve added this result in the revised manuscript (lines 236-245): “To exclude the possibility that FLAG or V5 tags contribute to protein aggregation, we also conducted subcellular fractionation of H1299 cells expressing FLAG-tagged FLp53 along with untagged Δ133p53 or Δ160p53 at a 1:5 ratio. The results showed (Supplementary Figure 6) a similar distribution of FLp53 across cytoplasmic, nuclear, and insoluble nuclear fractions as in the case of tagged Δ133p53 or Δ160p53 (Figure 6A to D). Notably, the aggregation of untagged Δ133p53 or Δ160p53 markedly promoted the aggregation of FLAG-tagged FLp53 (Supplementary Figure 6B and D), demonstrating that the antibody epitope tags themselves do not contribute to protein aggregation.” 

      We’ve also discussed this in the Discussion section (lines 349-356): “In our study, we primarily utilized an overexpression strategy involving FLAG/V5-tagged proteins to investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on the function of FLp53. To address concerns regarding potential overexpression artifacts, we performed the co-immunoprecipitation (Supplementary Figure 6) and caspase-3 and -7 activity (Figure 7) experiments with untagged Δ133p53 and Δ160p53. In both experimental systems, the untagged proteins behaved very similarly to the FLAG/V5 antibody epitope-containing proteins (Figures 6 and 7 and Supplementary Figure 6). Hence, the C-terminal tagging of FLp53 or its isoforms does not alter the biochemical and physiological functions of these proteins.”

      In summary, the revised data set and newly added experiments provide strong evidence that neither the FLAG nor the V5 tag contributes to the observed p53 isoform aggregation.

      (2) The authors demonstrate that to visualize the dominant-negative effect, Δ133p53α and Δ160p53α must be "present in a higher proportion than FLp53 in the tetramer" and that they need at least a 1:5 transfection ratio, since the 1:1 ratio shows no effect. However, in almost every cell type, FLp53 is expressed at far higher levels than the isoforms, which makes it very unlikely that such stoichiometry is reached under physiological conditions and makes me wonder whether this mechanism occurs naturally at endogenous levels. This limitation should at least be discussed.

      Thank you for your insightful comment. Evidence suggests that the expression levels of these isoforms, such as Δ133p53, can be significantly elevated relative to FLp53 under certain physiological conditions(3, 4, 9). For example, in some breast tumors, Δ133p53 mRNA is expressed at much higher levels than FLp53, suggesting a distinct expression profile of p53 isoforms compared to normal breast tissue(4). Similarly, in non-small cell lung cancer and the A549 lung cancer cell line, the expression level of the Δ133p53 transcript is significantly elevated compared to non-cancerous cells(3). Moreover, in specific cholangiocarcinoma cell lines, the Δ133p53/TAp53 expression ratio has been reported to increase to as high as 3:1(9). These observations indicate that the dominant-negative effect of the Δ133p53 isoform on FLp53 can occur under certain pathological conditions in which the relative amounts of FLp53 and the isoforms vary widely. Since data on the Δ160p53 isoform are scarce, we infer that the long N-terminal truncated isoforms may share a similar mechanism.

      (3) Figure 5C: I am concerned by the subcellular location of the Δ133p53α and Δ160p53α as they are commonly considered nuclear and not cytoplasmic as shown here, particularly since they retain the 3 nuclear localization sequences like the FLp53 (Bourdon JC et al. 2005; Mondal A et al. 2018; Horikawa I et al, 2017; Joruiz S. et al, 2024). However, Δ133p53α can form cytoplasmic speckles (Horikawa I et al, 2017) when it colocalizes with autophagy markers for its degradation.

      The authors should discuss this issue. Could this discrepancy be due to the high overexpression level of these isoforms? A co-staining with autophagy markers (p62, LC3B) would rule out (or confirm) activation of autophagy due to the overwhelming expression of the isoform.

      Thank you for your thoughtful comments. We have thoroughly reviewed all the papers you recommended (Bourdon JC et al., 2005; Mondal A et al., 2018; Horikawa I et al., 2017; Joruiz S. et al., 2024)(4, 29, 30, 31). Among these, only the study by Bourdon JC et al. (2005) provided data regarding the localization of Δ133p53(4). Interestingly, their findings align with our observations: in Figure 8 of Jean-Christophe Bourdon et al. (Genes Dev. 2005;19:2122-2137), the protein does not exhibit predominantly nuclear localization. The discrepancy may be caused by a potentially confusing statement in that paper(4).

      The localization of p53 is governed by multiple factors, including its nuclear import and export(32). The isoforms Δ133p53 and Δ160p53 contain three nuclear localization sequences (NLS)(4). However, the isoforms may have been trapped in the cytoplasm through aggregation, which could mask the NLS and thereby prevent nuclear import.

      Further, we acknowledge that Δ133p53 co-aggregates with the autophagy substrate p62/SQSTM1 and the autophagosome component LC3B in the cytoplasm during its autophagic degradation in replicative senescence(33). We agree that high overexpression of these aggregation-prone proteins may induce endoplasmic reticulum (ER) stress and activate autophagy(34). This could explain the cytoplasmic localization in our experiments. However, it is also critical to consider that we observed aggregates in both the cytoplasm and the nucleus (Figures 6B and E and Supplementary Figure 6B). While cytoplasmic localization may involve autophagy-related mechanisms, the nuclear aggregates likely arise from intrinsic isoform properties, such as altered protein folding, independent of autophagy. These dual localizations reflect the complex behavior of Δ133p53 and Δ160p53 isoforms under our experimental conditions.

      In the revised manuscript, we discussed this in Discussion (lines 328-335): “Moreover, the observed cytoplasmic isoform aggregates may reflect autophagy-related degradation, as suggested by the co-localization of Δ133p53 with autophagy substrate p62/SQSTM1 and autophagosome component LC3B(33). High overexpression of these aggregation-prone proteins could induce endoplasmic reticulum stress and activate autophagy(34). Interestingly, we also observed nuclear aggregation of these isoforms (Figure 6B and E and Supplementary Figure 6B), suggesting that distinct mechanisms, such as intrinsic properties of the isoforms, may govern their localization and behavior within the nucleus. This dual localization underscores the complexity of Δ133p53 and Δ160p53 behavior in cellular systems.”

      Minor concerns:

      -  Figure 1A: the initiation of the "Δ140p53" is shown instead of "Δ40p53"

      Thank you! Figure 1A has been corrected in the revised paper.

      -  Figure 2A: I would like to see the images cropped a bit higher, so the cut does not happen just above the aggregate bands

      Thank you for this suggestion. We’ve changed the image, and the new Figure 2 is shown in the revised paper.

      -  Figure 3C: what ratio of FLp53/Delta isoform was used?

      We have added the ratio in the figure legend of Figure 3C (lines 845-846) “Relative DNA-binding of the FLp53-FLAG protein to the p53-target gene promoters in the presence of the V5-tagged protein Δ133p53 or Δ160p53 at a 1: 1 ratio.”

      -  Figure 3C suggests that the "dominant-negative" effect is mostly senescencespecific as it does not affect apoptosis target genes, which is consistent with Horikawa et al, 2017 and Gong et al, 2016 cited above. Furthermore, since these two references and the others from Gong et al. show that Δ133p53α increases DNA repair genes, it would be interesting to look at RAD51, RAD52 or Lig4, and maybe also induce stress.

      Thank you for your thoughtful comments and suggestions. In Figure 3C, the presence of Δ133p53 or Δ160p53 only significantly reduced the binding of FLp53 to the p21 promoter. However, isoforms Δ133p53 and Δ160p53 demonstrated a significant loss of DNA-binding activity at all four promoters: p21, MDM2, and apoptosis target genes BAX and PUMA (Figure 3B). This result suggests that Δ133p53 and Δ160p53 have the potential to influence FLp53 function due to their ability to form hetero-oligomers with FLp53 or their intrinsic tendency to aggregate. To further investigate this, we increased the isoform to FLp53 ratio in Figure 4, which demonstrate that the isoforms Δ133p53 and Δ160p53 exert dominant-negative effects on the function of FLp53. 

      These results demonstrate that the isoforms can compromise p53-mediated pathways, consistent with Horikawa et al. (2017), which showed that Δ133p53α overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but still affects specific tumor-suppressing pathways. Furthermore, as noted by Gong et al. (2016), Δ133p53’s anti-apoptotic function under certain conditions is independent of FLp53 and unrelated to its dominant-negative effects.

      We appreciate your suggestion to investigate DNA repair genes such as RAD51, RAD52, or Lig4, especially under stress conditions. While these targets are intriguing and relevant, we believe that our current investigation of p53 targets in this manuscript sufficiently supports our conclusions regarding the dominant-negative effect. Further exploration of additional p53 target genes, including those involved in DNA repair, will be an important focus of our future studies.

      - Figure 5A and B: directly comparing the level of FLp53 expressed in cytoplasm or nucleus to the level of Δ133p53α and Δ160p53α expressed in cytoplasm or nucleus does not mean much since these are overexpressed proteins and therefore depend on the level of expression. The authors should rather compare the ratio of cytoplasmic/nuclear FLp53 to the ratio of cytoplasmic/nuclear Δ133p53α and Δ160p53α.

      Thank you very much for this valuable suggestion. Figure 5B has been recreated in the revised paper, and lines 214-215 now read: “The cytoplasm-to-nucleus ratio of Δ133p53 and Δ160p53 was approximately 1.5-fold higher than that of FLp53 (Figure 5B).”
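
      The ratio-based comparison requested by the reviewer can be sketched as below. This is only an illustration, not the authors' quantification pipeline: the function names and the signal values are hypothetical placeholders chosen so that the arithmetic is easy to follow, and they are not data from the manuscript. The point is that each protein is first normalized to its own nuclear signal, so differences in overall expression level cancel out before the isoform is compared with FLp53.

```python
def cyto_nuc_ratio(cytoplasmic: float, nuclear: float) -> float:
    """Cytoplasm-to-nucleus ratio of one protein from its two fraction signals."""
    if nuclear <= 0:
        raise ValueError("nuclear signal must be positive")
    return cytoplasmic / nuclear


def fold_over_reference(ratio: float, reference_ratio: float) -> float:
    """How many times larger a ratio is than the reference (here FLp53) ratio."""
    if reference_ratio <= 0:
        raise ValueError("reference ratio must be positive")
    return ratio / reference_ratio


# Hypothetical band intensities in arbitrary units: (cytoplasmic, nuclear).
# These numbers are assumptions for illustration, NOT values from the paper.
signals = {
    "FLp53": (20.0, 80.0),
    "isoform": (30.0, 80.0),
}

# Normalize each protein to itself first, then compare the ratios.
ratios = {name: cyto_nuc_ratio(c, n) for name, (c, n) in signals.items()}
fold = fold_over_reference(ratios["isoform"], ratios["FLp53"])

print(f"FLp53 C/N ratio:   {ratios['FLp53']:.3f}")
print(f"isoform C/N ratio: {ratios['isoform']:.3f}")
print(f"fold over FLp53:   {fold:.2f}")
```

      Because both ratios are unitless, the fold value does not depend on how strongly either construct is expressed, which is the concern the reviewer raised about comparing raw levels.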

      Referees cross-commenting

      I agree that the system needs to be improved to be more physiological.

      Just to be precise, the D133 and D160 isoforms are not truncated mutants; they are naturally occurring isoforms expressed in almost every normal human cell type from an internal promoter within the TP53 gene.

      Using overexpression always raises concerns, but in this case I am even more careful because the isoforms are almost always less expressed than FLp53, and here they have to push expression to 5 to 10 times that of FLp53 to see the effect, which makes me fear an artifact due to the overwhelming overexpression (which even seems to change the normal localization of the protein).

      To visualize the endogenous proteins, they will have to change cell line as the H1299 they used are p53 null.

      Thank you for these comments. We’ve addressed the motivation for overexpression in the responses above. We needed to use the plasmid constructs in the p53-null cells to detect the proteins, but the expression level was certainly not ‘overwhelmingly high’.

      First, we tried the A549 cells (p53 wild-type) under DNA damage conditions, but the endogenous p53 protein was undetectable. Second, several studies reported increased Δ133p53 levels compared to wild-type p53, with implications for tumor development(2, 3, 4, 9). Third, the apoptosis activity of H1299 cells overexpressing p53 proteins was analyzed in the revised manuscript (Figure 7). The apoptotic activity induced by FLp53 expression was approximately 2.5 times higher than that of the control vector under identical plasmid DNA transfection conditions (Figure 7). These results rule out the possibility that the plasmid-based expression of p53 and its isoforms introduced artifacts in the results. We’ve discussed this in the Results section (lines 254-269).

      Reviewer #3 (Significance):

      Overall, the paper is interesting particularly considering the range of techniques used which is the main strength.

      The main limitation to me is the lack of contradictory discussion as all argumentation presents Δ133p53α and Δ160p53α exclusively as oncogenic and strictly FLp53 dominant-negative when, particularly for Δ133p53α, a quite extensive literature suggests a not so clear-cut activity.

      The aggregation mechanism is reported for the first time for Δ133p53α and Δ160p53α, although it was already published for Δ40p53α, Δ133p53β or in mutant p53.

      This manuscript would be a good basic research addition to the p53 field to provide insight in the mechanism for some activities of some p53 isoforms.

      My field of expertise is the p53 isoforms, which I have been working on for 11 years in cancer and neuro-degenerative diseases.

      Thank you very much for your positive and critical comments. We’ve included a fair discussion on the oncogenic and non-oncogenic function of Δ133p53 in the Introduction following your suggestion (lines 62-73). 

      References

      (1) Pitolli C, Wang Y, Candi E, Shi Y, Melino G, Amelio I. p53-Mediated Tumor Suppression: DNA-Damage Response and Alternative Mechanisms. Cancers 11 (2019).

      (2) Fujita K, et al. p53 isoforms Delta133p53 and p53beta are endogenous regulators of replicative cellular senescence. Nature cell biology 11, 1135-1142 (2009).

      (3) Fragou A, et al. Increased Δ133p53 mRNA in lung carcinoma corresponds with reduction of p21 expression. Molecular medicine reports 15, 1455-1460 (2017).

      (4) Bourdon JC, et al. p53 isoforms can regulate p53 transcriptional activity. Genes & development 19, 2122-2137 (2005).

      (5) Ghosh A, Stewart D, Matlashewski G. Regulation of human p53 activity and cell localization by alternative splicing. Molecular and cellular biology 24, 7987-7997 (2004).

      (6) Aoubala M, et al. p53 directly transactivates Δ133p53α, regulating cell fate outcome in response to DNA damage. Cell death and differentiation 18, 248-258 (2011).

      (7) Marcel V, et al. p53 regulates the transcription of its Delta133p53 isoform through specific response elements contained within the TP53 P2 internal promoter. Oncogene 29, 2691-2700 (2010).

      (8) Zhao L, Sanyal S. p53 Isoforms as Cancer Biomarkers and Therapeutic Targets. Cancers 14 (2022).

      (9) Nutthasirikul N, Limpaiboon T, Leelayuwat C, Patrakitkomjorn S, Jearanaikoon P. Ratio disruption of the ∆133p53 and TAp53 isoform equilibrium correlates with poor clinical outcome in intrahepatic cholangiocarcinoma. International journal of oncology 42, 1181-1188 (2013).

      (10) Tadijan A, et al. Altered Expression of Shorter p53 Family Isoforms Can Impact Melanoma Aggressiveness. Cancers 13 (2021).

      (11) Aubrey BJ, Kelly GL, Janic A, Herold MJ, Strasser A. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell death and differentiation 25, 104-113 (2018).

      (12) Ghorbani N, Yaghubi R, Davoodi J, Pahlavan S. How does caspases regulation play role in cell decisions? apoptosis and beyond. Molecular and cellular biochemistry 479, 1599-1613 (2024).

      (13) Petronilho EC, et al. Oncogenic p53 triggers amyloid aggregation of p63 and p73 liquid droplets. Communications chemistry 7, 207 (2024).

      (14) Forget KJ, Tremblay G, Roucou X. p53 Aggregates penetrate cells and induce the coaggregation of intracellular p53. PloS one 8, e69242 (2013).

      (15) Farmer KM, Ghag G, Puangmalai N, Montalbano M, Bhatt N, Kayed R. P53 aggregation, interactions with tau, and impaired DNA damage response in Alzheimer's disease. Acta neuropathologica communications 8, 132 (2020).

      (16) Arsic N, et al. Δ133p53β isoform pro-invasive activity is regulated through an aggregation-dependent mechanism in cancer cells. Nature communications 12, 5463 (2021).

      (17) Melo Dos Santos N, et al. Loss of the p53 transactivation domain results in high amyloid aggregation of the Δ40p53 isoform in endometrial carcinoma cells. The Journal of biological chemistry 294, 9430-9439 (2019).

      (18) Mestrom L, et al. Artificial Fusion of mCherry Enhances Trehalose Transferase Solubility and Stability. Applied and environmental microbiology 85 (2019).

      (19) Kaba SA, Nene V, Musoke AJ, Vlak JM, van Oers MM. Fusion to green fluorescent protein improves expression levels of Theileria parva sporozoite surface antigen p67 in insect cells. Parasitology 125, 497-505 (2002).

      (20) Snapp EL, et al. Formation of stacked ER cisternae by low affinity protein interactions. The Journal of cell biology 163, 257-269 (2003).

      (21) Jain RK, Joyce PB, Molinete M, Halban PA, Gorr SU. Oligomerization of green fluorescent protein in the secretory pathway of endocrine cells. The Biochemical journal 360, 645-649 (2001).

      (22) Campbell RE, et al. A monomeric red fluorescent protein. Proceedings of the National Academy of Sciences of the United States of America 99, 7877-7882 (2002).

      (23) Hofstetter G, et al. Δ133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. British journal of cancer 105, 1593-1599 (2011).

      (24) Bischof K, et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Scientific reports 9, 5244 (2019).

      (25) Gong L, et al. p53 isoform Δ113p53/Δ133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell research 25, 351-369 (2015).

      (26) Gong L, et al. p53 isoform Δ133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Scientific reports 6, 37281 (2016).

      (27) Gong L, Pan X, Yuan ZM, Peng J, Chen J. p53 coordinates with Δ133p53 isoform to promote cell survival under low-level oxidative stress. Journal of molecular cell biology 8, 88-90 (2016).

      (28) Candeias MM, Hagiwara M, Matsuda M. Cancer-specific mutations in p53 induce the translation of Δ160p53 promoting tumorigenesis. EMBO reports 17, 1542-1551 (2016).

      (29) Horikawa I, et al. Δ133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell death and differentiation 24, 1017-1028 (2017).

      (30) Mondal AM, et al. Δ133p53α, a natural p53 isoform, contributes to conditional reprogramming and long-term proliferation of primary epithelial cells. Cell death & disease 9, 750 (2018).

      (31) Joruiz SM, Von Muhlinen N, Horikawa I, Gilbert MR, Harris CC. Distinct functions of wild-type and R273H mutant Δ133p53α differentially regulate glioblastoma aggressiveness and therapy-induced senescence. Cell death & disease 15, 454 (2024).

      (32) O'Brate A, Giannakakou P. The importance of p53 location: nuclear or cytoplasmic zip code? Drug resistance updates : reviews and commentaries in antimicrobial and anticancer chemotherapy 6, 313-322 (2003).

      (33) Horikawa I, et al. Autophagic degradation of the inhibitory p53 isoform Δ133p53α as a regulatory mechanism for p53-mediated senescence. Nature communications 5, 4706 (2014).

      (34) Lee H, et al. IRE1 plays an essential role in ER stress-mediated aggregation of mutant huntingtin via the inhibition of autophagy flux. Human molecular genetics 21, 101-114 (2012).

    1. eLife Assessment

      This important study advances our understanding of maladaptive innate immune training. The experimental evidence supporting the conclusions is convincing and the expert reviewers strongly endorse the manuscript. The work will be of high interest to both researchers in the trained immunity field and clinician scientists.

    2. Reviewer #1 (Public review):

      Summary:

      The concept that trained immunity, as defined, can be beneficial to subsequent immune challenges is important in the broad context of health and disease. The significance of this manuscript is the finding that trained immunity is actually a two-edged sword, herein, detrimental in the context of LPS-induced Acute Lung Injury that is mediated by AMs.

      Strengths:

      Several lines of evidence in different mouse models support this conclusion. The postulation that differences in immune responses in individuals are linked to differences in the mycobiome and consequent β-glucan makeup is provocative.

      Weaknesses:

      However, the findings that the authors state are relevant to sepsis are actually confined to a specific lung injury model and not classically-defined sepsis, the ontogeny of the reprogrammed AMs is uncertain, and links in the proposed signaling pathways need to be strengthened.

      Comments on the latest version:

      The manuscript is improved with further clarifications and additional experimentation. My prior concerns are addressed.

    3. Reviewer #2 (Public review):

      Summary:

      Prével et al. present an in vivo study in which they reveal an interesting aspect of β-glucan, a known inducer of enhanced immune responses termed trained immunity in sterile inflammation. The authors show that β-glucan can reprogram alveolar macrophages (AMs) in the lungs through neutrophils and IFNγ signaling and independent of Dectin1. This reprogramming occurs at both transcriptional and metabolic levels. After β-glucan training, LPS-induced sterile inflammation exacerbated acute lung injury via enhanced immunopathology. These findings highlight a new aspect of β-glucan's role in trained immunity and its potential detrimental effects when enhanced pathogen clearance is not required.

      Strengths:

      - This manuscript is well-written and effectively conveys its message.

      - The authors provide important evidence that β-glucan training is not solely beneficial but depending on the context can also enhance immunopathology. This will be important to the field for two reasons. It shows again that trained immunity can also be harmful. Jentho et al. 2021 had already provided further evidence for this aspect. And it highlights anew that LPS application is an insufficient infection model.

      Original weaknesses noted:

      - Only a little physiological data from the in vivo models is provided.

      - Effects in histology appear to be rather weak.

      Comments on latest version:

      The authors have revised the new version according to my suggestions or responded sufficiently to my requests, with one exception. I recommend renaming TNF as explained by Grimstad in JAMA Dermatol. 2016;152(5):557.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The concept that trained immunity, as defined, can be beneficial to subsequent immune challenges is important in the broad context of health and disease. The significance of this manuscript is the finding that trained immunity is actually a two-edged sword, herein, detrimental in the context of LPS-induced Acute Lung Injury that is mediated by AMs.

      Strengths:

      Several lines of evidence in different mouse models support this conclusion. The postulation that differences in immune responses in individuals are linked to differences in the mycobiome and consequent β-glucan makeup is provocative.

      Weaknesses:

      The findings that the authors state are relevant to sepsis are actually confined to a specific lung injury model and not classically-defined sepsis. In addition, the ontogeny of the reprogrammed AMs is uncertain. Links in the proposed signaling pathways need to be strengthened.

      Reviewer #2 (Public review):

      Summary:

      Prével et al. present an in vivo study in which they reveal an interesting aspect of β-glucan, a known inducer of enhanced immune responses termed trained immunity in sterile inflammation. The authors show that β-glucan can reprogram alveolar macrophages (AMs) in the lungs through neutrophils and IFNγ signaling and independent of Dectin1. This reprogramming occurs at both transcriptional and metabolic levels. After β-glucan training, LPS-induced sterile inflammation exacerbated acute lung injury via enhanced immunopathology. These findings highlight a new aspect of β-glucan's role in trained immunity and its potential detrimental effects when enhanced pathogen clearance is not required.

      Strengths:

      (1) This manuscript is well-written and effectively conveys its message.

      (2) The authors provide important evidence that β-glucan training is not solely beneficial but, depending on the context, can also enhance immunopathology. This will be important to the field for two reasons. It shows again that trained immunity can also be harmful. Jentho et al. 2021 have already provided further evidence for this aspect. And it highlights anew that LPS application is an insufficient infection model.

      Weaknesses:

      (1) Only a little physiological data is provided by the in vivo models.

      (2) The effects in histology appear to be rather weak.

      Reviewer #1 (Recommendations for the authors):

      The opening paragraph in the introduction focuses on sepsis. This is misleading since this manuscript does not address sepsis but rather intranasal-administered LPS-induced acute lung injury.

      We are in total agreement with the reviewer and have modified the introduction to focus on acute lung injury, with clinical relevance more closely associated with TLR4-mediated acute lung injury and lung inflammation.

      The authors make definitive statements that AMs originate from fetal liver monocytes. However, it is well known that the ontogeny of AMs is complex and AMs can be populated, in part, from peripheral monocytes. The ontogeny of reprogrammed AMs was not addressed in this study but they may come from monocyte-derived AMs following β-glucan training (transfer of AMs into Csf2rb KO mice does not prove the contrary). In this regard, do, for example, the percentages of CD11b+ AMs change? More phenotyping of the control and reprogrammed AMs would enhance the interpretation of the findings.

      The reviewer is correct that the ontogeny of AMs can be heterogeneous, especially following a pulmonary challenge. In β-glucan-treated mice, Figure 1I shows no changes in the frequency or number of AMs in the BAL. As the reviewer suggested, we repeated this experiment and incorporated more markers for AMs. New Supplementary Figure 1C shows the expression of CD11b on AMs (CD11c+SiglecF+) from control and β-glucan-treated mice. While the frequency increases with LPS administration, we show no difference between control and β-glucan groups, suggesting that β-glucan does not induce the expansion of monocyte-derived AMs. Additionally, in New Supplementary Figure 1D, we show the expression of AM-associated markers in order to better delineate their phenotype. We observed no differences in MHCII, CD169, CD64 and F4/80 in β-glucan-treated mice, but an increase in CD80+ AMs following β-glucan, suggesting enhanced activation and corroborating their proinflammatory phenotype. Collectively, these data indicate that while the frequency and number of either yolk-sac or BM-derived AMs are unchanged in the β-glucan-treated mice, the activation of AMs is enhanced after the systemic treatment with β-glucan.

      The abstract seems to overpromise a bit. First, it mentions trained immunity and HSCs, but they don't seem to formally address either in the context of this model (there is reprogramming as assessed by transcriptome and metabolic analyses which is suggestive as stated by the authors, but do the changes overlap significantly with classically trained immunity?), and second, it links phenotypes together in a pathway(s) that they haven't actually interrogated - although they look at transcripts and do a seahorse assay they don't actually confirm that any of those findings are related to the increased response to LPS in vivo. The long discussion with all the caveats highlights these limitations, all relegated to future studies.

      We thank the reviewer for this comment. In response, we have revised the abstract to more accurately highlight the key findings of this study. Specifically, we introduced the concept of central trained immunity to describe the phenomena commonly observed with β-glucan treatment, contrasting it with the peripheral trained immunity detailed in the manuscript.

      The use of Csf2rb-/- mice to complement the clodronate approach is interesting (this approach has been used in the past with influenza virus). In addition to lacking AMs, these mice develop pulmonary alveolar proteinosis. Do the authors have histopathology from these mice in the current model? They mention PAP in the discussion.

      Pulmonary alveolar proteinosis (PAP) typically develops in Csf2rb-/- mice from 12 weeks of age onwards (Stanley et al., Proc Natl Acad Sci USA, 1994). However, in our model, mice were euthanized at 6 weeks, ensuring that pulmonary function and structure remained intact. A hallmark of PAP is the accumulation of protein, primarily surfactant, in BAL. To investigate this, we measured BAL protein concentration and observed no differences at baseline (Figure 2F). These findings were further supported by the absence of differences in BAL proinflammatory cytokine concentrations (Figure 2H).

      A question about their BAL technique: in the control mice without glucan/LPS stimulation, only 40% of BAL cells are AMs [and the total number of AMs (range of <10^3 to 2-3 x 10^4) is at least 5-fold lower than typically seen in BALs from healthy mice (10^5)], and there didn't seem to be many PMNs either. Are 60% of the BAL cells lymphocytes/RBCs? Is it possible that overall AM numbers are changing, but CD11c/SiglecF-positive cell numbers stay the same (only assessed 2 markers)? More phenotyping would help.

      We appreciate the reviewer’s comment and would like to clarify that alveolar macrophages (AMs) are presented in the manuscript as a frequency of viable cells rather than as a frequency of CD45+ cells, to ensure consistency throughout the study. The remaining cells in the samples are likely epithelial cells and lymphocytes, as red blood cells are lysed during sample processing. For additional context, we now provide data showing AMs as a percentage of CD45+ cells: AMs account for 80–90% of leukocytes. Furthermore, in New Supplementary Figure 1D, we highlight the expression of AM-associated markers to better define their phenotype. We observed no differences in MHCII, CD169, CD64, or F4/80 expression in β-glucan-treated mice. However, there was an increase in CD80+ AMs, indicating enhanced activation and corroborating their proinflammatory phenotype.

      Author response image 1.

      AMs as percentage of CD45+ cells. Mice were treated with β-glucan for seven days. We show CD11c+SiglecF+ cells in the bronchoalveolar lavage (BAL) as a percentage of CD45+ cells (n=5).
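
      The effect of the gating denominator described in this response can be illustrated with a short calculation. This is a sketch under stated assumptions only: the event counts are hypothetical and were picked to show how the same AM count yields a low percentage of viable cells and a high percentage of CD45+ cells. They are not measurements from the study.

```python
def percent_of(subset_count: int, parent_count: int) -> float:
    """Percentage of a parent gate that falls inside a subset gate."""
    if parent_count <= 0:
        raise ValueError("parent gate must contain at least one event")
    if not 0 <= subset_count <= parent_count:
        raise ValueError("subset must be between 0 and the parent count")
    return 100.0 * subset_count / parent_count


# Hypothetical flow cytometry event counts for one BAL sample.
# These are assumptions for illustration, NOT data from the manuscript.
viable_cells = 10_000   # all live events (leukocytes, epithelial cells, ...)
cd45_positive = 4_700   # leukocytes within the viable gate
alveolar_macs = 4_000   # CD11c+SiglecF+ events within the CD45+ gate

am_of_viable = percent_of(alveolar_macs, viable_cells)
am_of_cd45 = percent_of(alveolar_macs, cd45_positive)

print(f"AMs as % of viable cells: {am_of_viable:.1f}")
print(f"AMs as % of CD45+ cells:  {am_of_cd45:.1f}")
```

      The AM count is identical in both lines; only the parent gate changes, which is why a value near 40% of viable cells and a value in the 80–90% range of CD45+ cells can describe the same sample.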

      Line 130-131. TNF is decreased and not pointed out.

      In the poly(I:C) model, the difference in BAL TNF concentration between naïve and trained mice is not statistically significant owing to the high variability of the data. The reviewer is correct that TNFα does not appear to reflect Poly(I:C)-mediated ALI. We have included this point in the revised manuscript (Lines 146-148).

      Reviewer #2 (Recommendations for the authors):

      Suggestions:

      (1) The authors provide evidence for enhanced ALI via different techniques, e.g. histology, vascular leakage, immune cell composition in BAL etc. It would be interesting to see whether there were any changes in the disease severity of ALI. If possible the authors could provide data for survival, temperature, weight, and/or glucose in the different groups.

      Mice are extremely resistant to the pulmonary LPS model. We have previously assessed the lethality of our LPS model, and all mice survive even with an increased intranasal LPS dose of 200 μg (Pernet et al, Nature, 2023). To address the reviewer's concerns, we next assessed morbidity by monitoring weight loss following LPS challenge and showed that β-glucan-treated mice exhibit a delayed recovery time after 4 days LPS treatment (New Supplementary Figure 1B).

      (2) The authors show that β-glucan-mediated training enhances ALI. Conversely, the opposite, decreased immunopathology, should be observed if an LPS tolerance model were used. I am wondering whether this has already been performed, given that the (LPS/immune) tolerance field is already older than the training field. If not, I suggest incorporating this feature in their discussion.

      Thank you for this insightful comment. While LPS has long been recognized to induce tolerance, studies have also shown that intranasal exposure to ambient levels of LPS can induce alveolar macrophage (AM) training via type I interferon signaling (Zahalka et al., Mucosal Immunol, 2022). In contrast, Mason et al. demonstrated that systemic LPS stimulation induces tolerance through TNF-α signaling, resulting in diminished AM phagocytosis and superoxide production. This leads to reduced neutrophil recruitment and impaired bacterial clearance in a Pseudomonas aeruginosa pneumonia model (J Infect Dis, 1997). Furthermore, we recently reported that systemic administration of β-glucan induces central trained immunity, generating a distinct subset of regulatory neutrophils that promote disease tolerance against influenza viral infection (Khan et al., Nat Immunol, 2025). These findings highlight the complex and context-dependent interplay between training and tolerance. We have expanded on this point in the discussion section of the revised manuscript (Lines 289-297).

      (3) The finding that trained immunity can exert not only beneficial effects but also enhance immunopathology is interesting and should be further explored. Already Jentho et al. (PNAS 2021) have shown that upon sterile inflammation as imposed by LPS, (heme) training can lead to enhanced mortality. This might be a relevant trade-off in trained immunity since no beneficial resistance effect by pathogen killing can be obtained. It would be interesting to see, in their model, whether heme would also enhance ALI after intranasal LPS application. Or at least, can the authors discuss this finding more, also in relation to the already published evidence?

      Thank you for raising this interesting point, which is indeed relevant to our study. Jentho et al. demonstrated that training by heme can be beneficial in combating infectious challenges but can have deleterious effects in the context of sterile inflammation. The concept of endogenous training agents like heme, with their diverse effects on immune cells, aligns well with our β-glucan model, particularly given the high prevalence of fungal agents in the microbiome.

      While investigating the effects of heme on alveolar macrophages would certainly be intriguing, Jentho and colleagues have already reported the maladaptive effects of heme, such as tissue damage, during sterile LPS-induced inflammation. As such, these findings might be redundant in the context of our model. However, we have drawn a relevant parallel and expanded on this discussion in the revised manuscript (Lines 382-385).

      (4) It is not clear how the histologies were evaluated. This is a field of great subjectivity. The authors should describe it in more detail. The best option would have been a blinded observer. Was this done?

      Histology samples were evaluated according to ATS 2011 guidelines regarding “Features and measurements of experimental acute lung injury in animals” by a blinded pathologist. We have specified this in the methods of the revised manuscript.

      Minor:

      (1) Line 108 and ff. Please change to TNF, not TNFa

      Since we used an ELISA specific for TNF-α rather than general TNF, it is more accurate to refer to it as TNF-α.

      (2) Line 513 and ff. Please use Greek letters when appropriate, e.g. IFN-γ not IFNg.

      Thank you for pointing out these mistakes; we have rectified them in the text.

    1. eLife Assessment

      In this preregistered study, Kunkel and colleagues set out to compare the magnitude and duration of placebo versus nocebo effects in healthy volunteers, and also to examine the different factors contributing to these effects. The authors follow a rigorous methodology in a within-subjects design, taking into consideration standard conventions for manipulation of expectations, and using an appropriate sham condition. They present compelling evidence of long-lasting placebo and nocebo effects, with nocebo responses demonstrating consistently greater strength. These valuable results have the potential for a great impact in the field of experimental and clinical pain.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning,

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

    3. Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase (i.e., 10 trials per condition). This trial number was chosen to ensure comparability with previous studies employing similar designs and research questions (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We will acknowledge this limitation and future direction in the revised manuscript. 

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify that participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We will add this information to the revised version of the manuscript. 

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We will address this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session. 

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that it is unfortunate that 25 participants were conditioned with an unbalanced number of trials per condition during the conditioning session. In the revised version of the manuscript, we will include additional analyses to demonstrate that this imbalance did not systematically bias the results and that the findings observed during the test phase remain robust despite this error.  

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the large sample size, methodological rigor, and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We apologize that the registration number alone does not directly lead to the preregistration of this study. We thank the reviewer for pointing this out and will include a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for this comment and suggestion. In the revised version, we will restructure the manuscript and include more detailed information about the key experimental procedures and design at the beginning of the Results section to enhance clarity and improve the interpretability of the reported findings.

    1. eLife Assessment

      This paper describes the study of the evolution of the N-terminal domain of the MSH6 mismatch repair protein in regard to the presence or absence of histone reader domains. While the presence of the histone reader domains was previously known, the phylogenetic analysis of these domains performed here establishing their insertion through convergent evolution is important, definitively done, and establishes an interesting feature of the MSH6 family of proteins. The work is convincing but the presentation of the structural features of MSH6 could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that the MSH6 family of mismatch repair proteins contains an unstructured N-terminal domain that contains either a PWWP domain, a Tudor domain or neither and that the interaction of the histone reader domains with the appropriate histone H3 modification enhances mismatch repair, and hence reduces mutation rates in coding regions to some extent. However, the elimination of the MSH6-histone modification probably does not completely eliminate mismatch repair, although the published papers on this point do not seem definitive.

      In this study, the authors perform a detailed phylogenetic analysis of the presence of the PWWP and Tudor domains in MSH6 proteins across the tree of life. They observe that there are basically three classes of organisms that contain either a PWWP domain, a Tudor domain, or neither. On the basis of their analysis, they suggest that this represents convergent evolution of the independent acquisition of histone reader domains and that key amino acid residues in the reader domains are selected for.

      Strengths:

      The phylogenetic aspects of the work seem well done and the basic evolutionary conclusions of the work are well supported. The basic evolutionary conclusions are interesting and there is little to criticize from my perspective.

      Weaknesses:

      A major concern about this paper is that the authors fail to put their work into the proper context of what is already known about the N-terminus of MSH6. Further, their structural studies, which are really structural illustrations, are misleading, often incorrect, and not always helpful in addition to having been published before.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Monroe JG and colleagues show a compelling case of convergent evolution in the fusion between an important mismatch repair protein (MSH6) and histone reader domains across the tree of life. These fused MSH6 readers have been shown to be important for the recruitment of MSH6 to exon-rich genome locations, therefore improving the efficiency of reducing mutation rates in coding regions.

      Comparative genomic analyses performed here revealed independent instances of MSH6 fusion with histone readers in plants and metazoa with several instances of putative loss (or gain) across the phylogeny. The work also unveiled instances of MSH6 fusion with putatively interesting domains in fungi which might be worth exploring in the future.

      The authors also show potential signatures of purifying selection in functional amino acids of MSH6 histone readers.

      Overall the approach is adequate for the questions proposed to be answered, the analyses are rigorous and support the authors' claims.

      DNA repair genes are essential to maintain genome stability and fidelity, and alterations in these pathways have been associated with hypermutation phenotypes, for instance in human cancer, sometimes with implications for treatment resistance. This is an important work that contributes to our understanding of the evolutionary consequences of the evolution of epigenome-targeted DNA repair.

      Strengths:

      The methods used are adequate for the questions and support the results. The search for MSH6 fusions was rigorous and conservative, which strengthens the significance of the claims on the evolutionary history of these fusion events.

      Weaknesses:

      I did not identify any major weaknesses, but please see my suggestions/recommendations.

    4. Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Convergent evolution of epigenome recruited DNA repair across the Tree of Life", Monroe et al. investigate bioinformatically how some important mechanisms of epigenome-targeted DNA repair evolved at the tree of life scale. They provide a clear example of convergent evolution of these mechanisms between animals and plants, investigating more than 4000 eukaryotic genomes, and uncovering a significant association between gain/retention of such mechanisms with genome size and high intron content, that at least partially explains the evolutionary patterns observed within major eukaryotic lineages.

      Strengths:

      The manuscript is well written, clear, and understandable, and has potentially broad interest. It provides a thorough analysis of the evolution of MSH6-related DNA repair mechanisms using more than 4000 eukaryotic genomes, a pretty impressive number allowing to identify both large-scale (i.e. kingdoms) as well as shorter-scale (i.e. phyla, orders) evolutionary patterns. Moreover, despite providing no experimental validation, it investigates with a sufficient degree of depth, a potential relationship between gain/retention of epigenome recruited DNA repair mediated by MSH6 and genomic, as well as life-history (population size, body mass, lifespan), traits. In particular, it provides convincing evidence for a causative effect between genome size/intron content and the presence/absence of this mechanism. Moreover, it stimulates further scientific investigation and biological questions to be addressed, such as the conservation of epigenomes across the tree of life, the existence of potential trade-offs in gain/retention vs. loss of such mechanisms, and the relationship between these processes, mutation rate heterogeneity, and evolvability.

      Weaknesses:

      Despite the interesting and necessary insights provided on (1) the evolution of DNA repair mechanisms, and (2) the convergent evolution of molecular mechanisms, this bioinformatic study emanates from studies in humans and Arabidopsis already showing signs of potential convergent evolution in aspects of epigenome-recruited DNA repair. For this, this study, although bioinformatically remarkably thorough, does not come as a surprise, potentially lowering its novelty.

      What could have increased further its impact, interest, and novelty could have been a more comprehensive understanding of the causative processes leading to gain/retention vs. loss of MSH6-related epigenetic recruitment mechanisms. The authors provide interesting associations with life-history traits (yet not significant), and significant links with genome size and intron content only at the theoretical level. For the first aspect, the analyses could have expanded toward other life-history traits. For the second, maybe it could have been even possible to tackle experimentally some of the generated questions, functionally in some models, or deepened using specific case studies.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that the MSH6 family of mismatch repair proteins contains an unstructured N-terminal domain that contains either a PWWP domain, a Tudor domain or neither and that the interaction of the histone reader domains with the appropriate histone H3 modification enhances mismatch repair, and hence reduces mutation rates in coding regions to some extent. However, the elimination of the MSH6-histone modification probably does not completely eliminate mismatch repair, although the published papers on this point do not seem definitive.

      In this study, the authors perform a detailed phylogenetic analysis of the presence of the PWWP and Tudor domains in MSH6 proteins across the tree of life. They observe that there are basically three classes of organisms that contain either a PWWP domain, a Tudor domain, or neither. On the basis of their analysis, they suggest that this represents convergent evolution of the independent acquisition of histone reader domains and that key amino acid residues in the reader domains are selected for.

      Strengths:

      The phylogenetic aspects of the work seem well done and the basic evolutionary conclusions of the work are well supported. The basic evolutionary conclusions are interesting and there is little to criticize from my perspective.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      A major concern about this paper is that the authors fail to put their work into the proper context of what is already known about the N-terminus of MSH6. Further, their structural studies, which are really structural illustrations, are misleading, often incorrect, and not always helpful in addition to having been published before.

      Thank you for the helpful suggestions on this front. We agree that some of the structural visualizations were oversimplified and apologize for the lack of clarity. Notably, we did not annotate the presence of putative or known short PCNA-interacting protein (PIP) motifs, which have been found in the disordered N-terminal linker of MSH6 proteins. Indeed, while not directly related to our investigation of the origins of histone readers, the PIP motifs are an interesting and functionally important feature of MSH6 structural biology, especially because they may facilitate DNA repair processes more generally. In the revised manuscript, we aim to improve the scholarship on this topic and clarify the presence and importance of this motif for MSH6 function, as well as what is known about the structural biology of the MSH6 N-terminus more broadly. We will add annotations of the PIP motif and will also improve structural prediction by visualizing MSH6 structure in its dimerized form with MSH2, for a more accurate estimate of its folding in vivo. We hope that these changes, together with the other valuable suggestions, will enhance the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this work, Monroe JG and colleagues show a compelling case of convergent evolution in the fusion between an important mismatch repair protein (MSH6) and histone reader domains across the tree of life. These fused MSH6 readers have been shown to be important for the recruitment of MSH6 to exon-rich genome locations, therefore improving the efficiency of reducing mutation rates in coding regions.

      Comparative genomic analyses performed here revealed independent instances of MSH6 fusion with histone readers in plants and metazoa with several instances of putative loss (or gain) across the phylogeny. The work also unveiled instances of MSH6 fusion with putatively interesting domains in fungi which might be worth exploring in the future.

      The authors also show potential signatures of purifying selection in functional amino acids of MSH6 histone readers.

      Overall the approach is adequate for the questions proposed to be answered, the analyses are rigorous and support the authors' claims.

      DNA repair genes are essential to maintain genome stability and fidelity, and alterations in these pathways have been associated with hypermutation phenotypes, for instance in human cancer, sometimes with implications for treatment resistance. This is an important work that contributes to our understanding of the evolutionary consequences of the evolution of epigenome-targeted DNA repair.

      Strengths:

      The methods used are adequate for the questions and support the results. The search for MSH6 fusions was rigorous and conservative, which strengthens the significance of the claims on the evolutionary history of these fusion events.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      I did not identify any major weaknesses, but please see my suggestions/recommendations.

      Thank you, we will also address your suggestions, which provide valuable recommendations for improving the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Convergent evolution of epigenome recruited DNA repair across the Tree of Life", Monroe et al. investigate bioinformatically how some important mechanisms of epigenome-targeted DNA repair evolved at the tree of life scale. They provide a clear example of convergent evolution of these mechanisms between animals and plants, investigating more than 4000 eukaryotic genomes, and uncovering a significant association between gain/retention of such mechanisms with genome size and high intron content, that at least partially explains the evolutionary patterns observed within major eukaryotic lineages.

      Strengths:

      The manuscript is well written, clear, and understandable, and has potentially broad interest. It provides a thorough analysis of the evolution of MSH6-related DNA repair mechanisms using more than 4000 eukaryotic genomes, a pretty impressive number allowing to identify both large-scale (i.e. kingdoms) as well as shorter-scale (i.e. phyla, orders) evolutionary patterns. Moreover, despite providing no experimental validation, it investigates with a sufficient degree of depth, a potential relationship between gain/retention of epigenome recruited DNA repair mediated by MSH6 and genomic, as well as life-history (population size, body mass, lifespan), traits. In particular, it provides convincing evidence for a causative effect between genome size/intron content and the presence/absence of this mechanism. Moreover, it stimulates further scientific investigation and biological questions to be addressed, such as the conservation of epigenomes across the tree of life, the existence of potential trade-offs in gain/retention vs. loss of such mechanisms, and the relationship between these processes, mutation rate heterogeneity, and evolvability.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      Despite the interesting and necessary insights provided on (1) the evolution of DNA repair mechanisms, and (2) the convergent evolution of molecular mechanisms, this bioinformatic study emanates from studies in humans and Arabidopsis already showing signs of potential convergent evolution in aspects of epigenome-recruited DNA repair. For this, this study, although bioinformatically remarkably thorough, does not come as a surprise, potentially lowering its novelty.

      What could have increased further its impact, interest, and novelty could have been a more comprehensive understanding of the causative processes leading to gain/retention vs. loss of MSH6-related epigenetic recruitment mechanisms. The authors provide interesting associations with life-history traits (yet not significant), and significant links with genome size and intron content only at the theoretical level. For the first aspect, the analyses could have expanded toward other life-history traits. For the second, maybe it could have been even possible to tackle experimentally some of the generated questions, functionally in some models, or deepened using specific case studies.

      We agree that this work expands on recent experimental work in humans and Arabidopsis on the function of histone readers in MSH6, PWWP and Tudor, respectively. However, the evolution of these fusions remained a significant knowledge gap, limiting the degree to which functional work could be translated to other organisms. This study definitively characterized the evolutionary history of MSH6 histone readers and lays the groundwork for future investigations in diverse species. We agree that more causal inference would be valuable to understand the evolutionary pressures acting on MSH6 histone reader presence/absence. Indeed, we prioritized the conservative approach of testing hypotheses with strict phylogenetically constrained contrasts. While we observed highly significant associations between histone readers and genomic traits like intron content, associations with life history traits were only significant before accounting for phylogeny. It is possible that this is due to a lack of power because such traits are only available in limited taxa. In the revised manuscript, we aim to clarify potential causes, outline future experimental work beyond the scope of this individual study, and argue that this work highlights the need to catalog trait diversity at broader phylogenetic scales. We also address other valuable suggestions in the revised manuscript.

    1. eLife Assessment

      This study provides valuable insights into how auditory stimuli influence the temporal dynamics of visual perception by modulating brain rhythms (oscillations) in the alpha band. The authors present convincing evidence that auditory input induces a drop in visual alpha frequency, increasing the time window for audio-visual integration, and subsequently shifting the predictive role from prestimulus alpha frequency to alpha phase. The conclusions are well-supported by the combination of psychophysics, electrophysiological recordings (EEG), non-invasive brain stimulation (tACS), and computational modelling.

    2. Reviewer #1 (Public review):

      Summary:

      Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. 
      While faster frequency rates broadly provide a higher probability of detecting 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e. slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model suggests that alpha oscillatory phase might predict how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.
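
      As a rough illustration of the frequency argument in this summary (a sketch only, not the authors' computational model): if one alpha cycle is taken as the approximate temporal window, as the discrete sampling hypothesis is usually read, its duration is simply 1000 / f milliseconds. The function name below is hypothetical, and the two frequencies are the tACS targets mentioned above.

```python
def alpha_cycle_ms(frequency_hz: float) -> float:
    """Return the duration of one oscillatory cycle in milliseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


# tACS frequencies named in the review: 8 Hz (slow alpha) and 13 Hz (fast alpha).
for f in (8.0, 13.0):
    print(f"{f:.0f} Hz -> {alpha_cycle_ms(f):.1f} ms per cycle")
```

      On this simplified reading, a cycle lasts 125 ms at 8 Hz and roughly 77 ms at 13 Hz, which is consistent with the reported direction of the tACS effect (a wider window under slower stimulation, a narrower one under faster stimulation).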

      Strengths:

      The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results thereof not only integrate, but depict an overarching mechanistic scenario elegantly framing phase-resetting dynamics into the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses on brain oscillations (either via frequency sliding and phase opposition sum) mostly appear to be methodologically sound, and very-well supported by tACS results. Under this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the pieces of evidence gathered on empirical data, returning an appealing account on how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.

      Weaknesses:

      Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.

      Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.

      Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.

      Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time).

      Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms.

      Relatedly, I would also suggest that the authors run a few more simulations to explore the parameter space and assay to what quantitative extent the model still holds (e.g., allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms).

      As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney (2022), as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as too small a sample size (Samaha & Romei, 2024).

    3. Reviewer #2 (Public review):

      Summary:

      The authors used a visual flash discrimination task in which two flashes are presented one after another with different inter-stimulus intervals. Participants either perceive one flash or two flashes. The authors show that the simultaneous presence of an auditory input extends the temporal window of integration, meaning that two flashes presented shortly after one another are more likely to be perceived as a single flash. Auditory inputs are accompanied by a reduction in alpha frequency over visual areas. Prestimulus alpha frequency predicts perceptual outcomes in the absence of auditory stimuli, whereas prestimulus alpha phase becomes the dominant predictor when auditory input is present. A computational model based on phase-resetting theory supports these findings. Additionally, a transcranial stimulation experiment confirms the causal role of alpha frequency in unimodal visual perception but not in cross-modal contexts.

      Strengths:

      The authors elegantly combined several approaches, from behavior to computational modeling and EEG, to provide a comprehensive overview of the mechanisms involved in visual integration in the presence or absence of auditory input. The methods used are state-of-the-art, and the authors attempted to address possible pitfalls.

      Weaknesses:

      The use of Bayesian statistics could further strengthen the paper, especially given that a few p-values are close to the significance threshold (lines 162 & 258), but they are interpreted differently in different cases (absence of effect vs. trend).

      Overall, these results provide new insights into the role of alpha oscillations in visual processing and offer an interesting perspective on the current debate regarding the roles of alpha phase and frequency in visual perception. More generally, they contribute to our understanding of the neural dynamics of multisensory integration.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated the impact of an auditory stimulus on visual integration at the behavioral, electrophysiological, and mechanistic levels. Although the role of alpha brain oscillations on visual perception has been widely studied, how the brain dynamics in the visual cortices are influenced by a cross-modal stimulus remains ill-defined. The authors demonstrated that auditory stimulation systematically induced a drop in visual alpha frequency, increasing the time window for audio-visual integration, while in the unimodal condition, visual integration was modulated by small variations within the alpha frequency range. In addition, they only found a role of the phase of alpha brain oscillations on visual perception in the cross-modal condition. Based on the perceptual cycles' theory framework, the authors developed a model allowing them to describe their results according to a phase resetting induced by the auditory stimulation. These results showed that the influence of well-known brain dynamics on one modality can be disrupted by another modality. They provided insights into the importance of investigating cross-modal brain dynamics, and an interesting model that extends the perceptual cycle framework.

      Strengths:

      The results are supported by a combination of various, established experimental and analysis approaches (e.g., two-flash fusion task, psychometric curves, phase opposition), ensuring strong methodological bases and allowing direct comparisons with related findings in the literature.

      The model the authors proposed is an extension and an improvement of the perceptual cycle's framework. Interestingly, this model could then be tested in other experimental approaches.

      Weaknesses:

      There is an increasing number of studies in cognitive neuroscience showing the importance of considering inter-individual variability. The individual alpha frequency (IAF) varied from 8 to 13 Hz with a huge variability across participants, and studies have shown that the IAF influenced visual perception. Investigating inter-individual variations of the IAF in the reported results would be of great interest, especially for the model.

      Although the use of non-invasive brain stimulation to infer causality is a method of great interest, the use of tACS in the presented work is not optimal. Rather than inducing alpha oscillations in the visual cortices, using tACS to activate the auditory cortex in place of the actual auditory stimulation would have been of greater interest.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. 
      While faster frequency rates broadly provide a higher probability of detecting 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e., slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model suggests that alpha oscillatory phase might predict how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.

      Strengths:

      The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results of which not only integrate but also depict an overarching mechanistic scenario, elegantly framing phase-resetting dynamics within the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses of brain oscillations (via either frequency sliding or phase opposition sum) mostly appear to be methodologically sound and are very well supported by the tACS results. From this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the evidence gathered from the empirical data, returning an appealing account of how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.

      Weaknesses:

      Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.

      We thank the reviewer for highlighting this important point. In our study, all explicit trials consistently presented two flashes. We will clearly state this detail in the Methods section to avoid any further confusion.

      Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.

      We thank the reviewer for raising this point. First, we would like to clarify that our results do not suggest that the frequency effect is absent in the F2B1 condition; rather, it is relatively attenuated compared to the F2 condition. If the sample size were the primary issue, we would expect to observe a null effect in both conditions. Instead, the stronger frequency modulation in F2 confirms that the sound-induced modulation is present, albeit reduced in the audiovisual context. In our revised manuscript, we will explicitly note that our claim is not that there is no frequency effect in F2B1 but that the effect is weaker relative to F2, and we will also acknowledge the potential limitations associated with sample size and the lack of individualized frequency targeting.

      Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.

      We thank the reviewer for pointing out the potential bias toward reporting “two flashes” in the bistable trials. Because our experimental design involves presenting two flashes in both explicit and bistable trials, a slight tendency to report two flashes may naturally arise, especially at threshold levels determined during pretesting. We believe, however, that this bias does not undermine our primary findings. Our psychophysical procedure is designed to align the inter-stimulus interval with each participant’s fusion threshold, aiming for a near 50/50 split between “one-flash” and “two-flash” reports. However, given that two flashes are always presented, participants may be predisposed to report two flashes when uncertain. This reflects a plausible perceptual bias inherent in the bistable design, rather than a systematic flaw. Importantly, this tendency appears at comparable levels in both the F2 and F2B1 conditions, indicating that it does not selectively affect any particular condition. In the revised manuscript, we will include additional descriptive statistics, such as means and standard deviations, to demonstrate that the observed bias remains within an acceptable range and does not compromise our core conclusions regarding the modulatory effect of auditory input on visual integration.

      Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time).

      We appreciate the reviewer’s concern regarding this issue. First, the sustained decrease in posterior alpha frequency observed in our study—persisting for approximately 280 ms—substantially exceeds the typical duration of an auditory evoked potential (generally 50–200 ms) (Näätänen and Picton, 1987). This extended period of modulation suggests that it is not merely a transient evoked response.

      Second, our analysis of alpha power further supports this interpretation. A purely evoked response is usually accompanied by a corresponding increase in signal power; however, our results show no such power increase when comparing the F2B1 condition with the F2 condition.

      Moreover, the observed increase in alpha phase resetting—as measured by inter-trial phase coherence (ITC)—does not significantly correlate with changes in alpha power. This dissociation further indicates that the auditory-induced effects are unlikely to be driven solely by evoked potentials, but are more consistent with a reorganization of the intrinsic neural oscillatory activity.

      Together, these lines of evidence strongly support the view that the auditory-induced decrease in alpha frequency reflects true changes in ongoing oscillatory dynamics, rather than being merely a transient evoked response.
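
      As an editorial illustration only (not the authors' analysis code), the inter-trial phase coherence (ITC) measure referred to in this response can be sketched as the length of the mean unit phase vector across trials. The function name and the toy phase values below are assumptions made for the example; real analyses would extract phases from time-frequency decomposed EEG.

```python
import cmath
import math


def inter_trial_coherence(phases):
    """Length of the mean unit phase vector across trials.

    Values near 1 mean the phases cluster across trials (strong reset);
    values near 0 mean the phases are spread around the cycle.
    """
    if not phases:
        raise ValueError("need at least one trial phase")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical phases (radians) at one time-frequency point, one per trial.
aligned = [0.3] * 8                               # identical phase on every trial
spread = [2 * math.pi * k / 8 for k in range(8)]  # evenly spaced around the cycle

print(round(inter_trial_coherence(aligned), 3))
print(round(inter_trial_coherence(spread), 3))
```

      Because this quantity ignores amplitude, it can rise without any accompanying change in power, which is the dissociation the response relies on.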

      Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms.

      We appreciate the reviewer’s insightful comment regarding the potential influence of flash-induced phase resetting on the temporal extent of the IAF differences. We acknowledge that the observation—that wider ISIs are associated with a longer period of IAF differences—hints at a non-linear or even super-additive interaction between auditory- and flash-induced phase resetting mechanisms.

      However, the primary focus of our current study is on how auditory stimuli affect alpha oscillatory dynamics. Our experimental design and computational model were specifically optimized to capture auditory-induced phase resetting. Incorporating the additional influence of flash-induced effects would require a significantly more refined experimental framework and a more complex modeling approach. This added complexity could obscure the interpretation of our main findings, which are centered on auditory influences.

      In the revised manuscript, we will address this intriguing possibility in the Discussion section. We will acknowledge that while the data hint at a potential visual contribution, our present model deliberately isolates auditory-induced phase resetting to maintain clarity. We also propose that future research, with more precise experimental designs and enhanced modeling techniques, is necessary to fully disentangle and capture the interplay between auditory and flash-induced phase resetting mechanisms.

      Relatedly, I would also suggest that the authors run a few more simulations to explore the parameter space and assay to what quantitative extent the model still holds (e.g., allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms).

      We appreciate the reviewer’s suggestion to further explore our model’s parameter space. In response, we will conduct additional simulations that incorporate variability in alpha frequency—sampling randomly between 8 and 13 Hz—and examine alternative phase delays (e.g., around 10 and 50 ms). By systematically adjusting these parameters, we can more thoroughly evaluate the model’s robustness and delineate its boundaries under a broader range of neurophysiological conditions. We will present these results in the revised manuscript and discuss how they inform our understanding of alpha-driven visual integration in cross-modal contexts.
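
      To make the frequency part of the proposed sweep concrete, here is a deliberately simplified toy, not the authors' computational model. It assumes a bare discrete-sampling rule: one alpha cycle lasts 1000/f ms, the first flash arrives at a uniformly random phase, and two flashes are reported only if they fall in different cycles, which gives a probability of min(1, ISI/cycle). The function names, the ISI values, and the uniform 8 to 13 Hz draw are assumptions for illustration; the phase-delay parameter mentioned by the reviewer is not modelled here.

```python
import random


def two_flash_probability(isi_ms, alpha_hz):
    """P(two flashes land in different alpha cycles) for a random onset phase."""
    cycle_ms = 1000.0 / alpha_hz
    return min(1.0, isi_ms / cycle_ms)


def sweep(isi_ms, n_draws=1000, low_hz=8.0, high_hz=13.0, seed=0):
    """Average the prediction over alpha frequencies drawn uniformly from the range."""
    rng = random.Random(seed)  # seeded so the sweep is reproducible
    draws = [rng.uniform(low_hz, high_hz) for _ in range(n_draws)]
    return sum(two_flash_probability(isi_ms, f) for f in draws) / n_draws


# Hypothetical ISIs in milliseconds; compare the range edges with the averaged sweep.
for isi in (30, 50, 70):
    slow = two_flash_probability(isi, 8.0)
    fast = two_flash_probability(isi, 13.0)
    print(isi, round(slow, 3), round(fast, 3), round(sweep(isi), 3))
```

      Under this toy rule a faster alpha rhythm always yields a higher two-flash probability at a fixed ISI, so the averaged sweep necessarily lies between the 8 Hz and 13 Hz predictions.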

      As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney (2022), as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as too small a sample size (Samaha & Romei, 2024).

      We appreciate the reviewer’s suggestion and will revise our claims accordingly. In the revised manuscript, we will clarify that while our study demonstrates a mechanism by which alpha oscillations influence audiovisual integration in certain paradigms, this does not mean that our findings fully reconcile all conflicting results in the literature. We will emphasize that our mechanism may help explain why alpha frequency plays a critical role in some experimental settings, but that factors such as sample size, task parameters, and experimental design differences likely contribute to the divergent results observed across studies. Accordingly, we acknowledge that further research with larger samples and more refined methodologies is necessary to fully reconcile these discrepancies. This more cautious interpretation will be clearly discussed in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors used a visual flash discrimination task in which two flashes are presented one after another with different inter-stimulus intervals. Participants either perceive one flash or two flashes. The authors show that the simultaneous presence of an auditory input extends the temporal window of integration, meaning that two flashes presented shortly after one another are more likely to be perceived as a single flash. Auditory inputs are accompanied by a reduction in alpha frequency over visual areas. Prestimulus alpha frequency predicts perceptual outcomes in the absence of auditory stimuli, whereas prestimulus alpha phase becomes the dominant predictor when auditory input is present. A computational model based on phase-resetting theory supports these findings. Additionally, a transcranial stimulation experiment confirms the causal role of alpha frequency in unimodal visual perception but not in cross-modal contexts.

      Strengths:

      The authors elegantly combined several approaches, from behavior to computational modeling and EEG, to provide a comprehensive overview of the mechanisms involved in visual integration in the presence or absence of auditory input. The methods used are state-of-the-art, and the authors attempted to address possible pitfalls.

      Weaknesses:

      The use of Bayesian statistics could further strengthen the paper, especially given that a few p-values are close to the significance threshold (lines 162 & 258), but they are interpreted differently in different cases (absence of effect vs. trend).

      We appreciate the reviewer’s suggestion regarding the use of Bayesian statistics. We agree that a Bayesian framework can offer valuable complementary insights to our analysis by helping to distinguish whether a marginal p-value represents a trend or truly indicates the absence of an effect. To enhance the robustness of our conclusions, we will incorporate supplemental Bayesian analyses in the revised manuscript.

      Overall, these results provide new insights into the role of alpha oscillations in visual processing and offer an interesting perspective on the current debate regarding the roles of alpha phase and frequency in visual perception. More generally, they contribute to our understanding of the neural dynamics of multisensory integration.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the impact of an auditory stimulus on visual integration at the behavioral, electrophysiological, and mechanistic levels. Although the role of alpha brain oscillations on visual perception has been widely studied, how the brain dynamics in the visual cortices are influenced by a cross-modal stimulus remains ill-defined. The authors demonstrated that auditory stimulation systematically induced a drop in visual alpha frequency, increasing the time window for audio-visual integration, while in the unimodal condition, visual integration was modulated by small variations within the alpha frequency range. In addition, they only found a role of the phase of alpha brain oscillations on visual perception in the cross-modal condition. Based on the perceptual cycles' theory framework, the authors developed a model allowing them to describe their results according to a phase resetting induced by the auditory stimulation. These results showed that the influence of well-known brain dynamics on one modality can be disrupted by another modality. They provided insights into the importance of investigating cross-modal brain dynamics, and an interesting model that extends the perceptual cycle framework.

      Strengths:

      The results are supported by a combination of various, established experimental and analysis approaches (e.g., two-flash fusion task, psychometric curves, phase opposition), ensuring strong methodological bases and allowing direct comparisons with related findings in the literature.

      The model the authors proposed is an extension and an improvement of the perceptual cycle's framework. Interestingly, this model could then be tested in other experimental approaches.

      Weaknesses:

      There is an increasing number of studies in cognitive neuroscience showing the importance of considering inter-individual variability. The individual alpha frequency (IAF) varied from 8 to 13 Hz with a huge variability across participants, and studies have shown that the IAF influenced visual perception. Investigating inter-individual variations of the IAF in the reported results would be of great interest, especially for the model.

      We appreciate the reviewer’s valuable feedback regarding the importance of inter-individual variability in alpha frequency. In our current study, we have already addressed participant-level variability in our neural data by performing inter-subject correlation analyses, investigating whether individual reductions in alpha frequency correlate with broader temporal integration windows at the behavioral level.

      Moreover, our computational model incorporates physiologically realistic distributions for key parameters, including frequency and amplitude, which captures some degree of individual variability. Nevertheless, we acknowledge that a more targeted examination of how different IAF values specifically affect the model’s predictions would be highly valuable. In response, we will expand our simulations to systematically explore a range of IAF values and assess their impact on temporal integration windows and related measures of audiovisual processing. These additional analyses will help clarify the role of inter-individual variability in alpha frequency and further strengthen the mechanistic account offered by our model. We will detail these enhancements and discuss their implications in the revised manuscript.

      Although the use of non-invasive brain stimulation to infer causality is a method of great interest, the use of tACS in the presented work is not optimal. Rather than inducing alpha oscillations in the visual cortices, using tACS to activate the auditory cortex in place of the actual auditory stimulation would have been of greater interest.

      We appreciate the reviewer’s suggestion and acknowledge that non-invasive brain stimulation offers promising avenues for inferring causality. In our study, our primary hypothesis focused on the role of occipital alpha oscillations in defining the temporal window for visual integration, and accordingly we targeted visual cortex in our tACS protocol.

      We recognize that stimulating the auditory cortex could provide additional insights into auditory contributions to phase resetting. However, accurately targeting the auditory cortex with tACS presents technical challenges. The auditory cortex is located deeper within the temporal lobe, and factors such as variable skull thickness and complex current spread make it difficult to reliably modulate its neural activity compared to the more superficial visual areas. Indeed, recent studies have demonstrated that tACS-induced electric fields in the temporal regions tend to be weaker and less focal—for example, Huang et al. (2017) and Opitz et al. (2016) highlight the limitations in achieving robust stimulation of deeper or anatomically complex brain regions using conventional tACS approaches.

      Given these considerations, while we agree that future investigations could benefit from exploring auditory cortex stimulation—either as an alternative or as a complementary approach—the present study remains focused on visual alpha modulation, where our protocol is well validated and yields reliable results. In the revised manuscript, we will clearly discuss these issues and acknowledge the potential, yet technically challenging, possibility of stimulating the auditory cortex in future work to further disentangle the contributions of auditory and visual inputs to cross-modal integration.

    1. eLife assessment

      In this useful study, the authors tested the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes using modeling and behavioral experiments, claiming that bumblebees rely most on ground-views for homing. However, due to a lack of analysis of the bees' behavior during training and a lack of information as to how the homing behavior of bees develops over time, the evidence supporting their claims is currently incomplete. Moreover, there was concern that the experimental environment was not representative of natural scenes, thus limiting the findings of the study.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented. 

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

    3. Reviewer #2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views";

      line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";

      line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

    4. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views";

      line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";

      line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not towards the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features that were moved between training and testing. (Neither the camera nor the wall was moved between training and testing.) We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.

      As we wanted to investigate whether bees would use ground views or bird's-eye views to home in a dense environment, we think that catchment volumes would provide information that is qualitatively similar to, though quantitatively more detailed than, that from catchment slices. Our approach of using catchment slices is sufficient to predict whether ground or bird's-eye views perform better in leading to the nest, and we will, therefore, not include further computations of catchment volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, it is not always clear where exactly bees learn, or whether they learn continuously by weighting their visual experience according to their positions and orientations. This makes it difficult to categorise these flights accurately into learning and return flights. Additionally, homing models remain vague about the learning mechanisms at play during the learning flights. Therefore, we believe that continuous effort must be made to understand bees' learning and homing ability. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will provide these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations, focusing on those aspects that are at the centre of the research question. Our approach here was motivated by the fact that, in nature, bumblebees have to find their inconspicuous nest hole in what are often highly dense environments, ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama, and of UV light, wind, or other confounding factors inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.
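
      For readers unfamiliar with the term, a rotational image difference function can be sketched as follows. This is only an illustrative toy and not the authors' model: the image size, the RMS pixel difference as the comparison measure, and the function names `rotational_idf` and `best_heading` are all assumptions made for the example. A remembered panoramic snapshot is compared with the current view at every azimuthal rotation, and the rotation with the smallest difference gives the best-matching heading.

```python
import numpy as np

def rotational_idf(snapshot, view):
    """RMS pixel difference between `snapshot` and `view` for every
    azimuthal (column-wise) rotation of `view`."""
    n_azimuth = view.shape[1]
    diffs = np.empty(n_azimuth)
    for shift in range(n_azimuth):
        rotated = np.roll(view, shift, axis=1)  # rotate the panorama by `shift` columns
        diffs[shift] = np.sqrt(np.mean((rotated - snapshot) ** 2))
    return diffs

def best_heading(snapshot, view):
    """Return the rotation (in degrees) that minimises the image
    difference, together with that minimum difference."""
    diffs = rotational_idf(snapshot, view)
    shift = int(np.argmin(diffs))
    return shift * 360.0 / view.shape[1], diffs[shift]

# Hypothetical 10 x 36 panoramic image (10 degrees of azimuth per column).
rng = np.random.default_rng(0)
snapshot = rng.random((10, 36))
# The same view seen after turning by 5 columns (50 degrees).
view = np.roll(snapshot, -5, axis=1)

angle, residual = best_heading(snapshot, view)
```

      In this toy case the current view is an exact rotated copy of the snapshot, so the minimum difference is zero. Away from the snapshot location the minimum grows, and how quickly it grows with distance determines the extent of the catchment area discussed above.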

    1. eLife Assessment

      This important study investigates whether neural prediction of words can be measured through pre-activation of neural network word representations in the brain; solid evidence is provided that neural network representations of neighboring words are correlated in natural language. Therefore, it is crucial to differentiate between neural activity that predicts the upcoming word and neural activity that encodes the current words - information that can be used to predict the upcoming word. The study is of potential interest to researchers investigating language encoding in the brain or in large language models. Additional discussions are needed regarding the distinction between prediction and stimulus dependency and potential methods to distinguish them.

    2. Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

    3. Reviewer #2 (Public review):

      Summary:

      At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

    4. Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies, specifically the auto-correlation in the stimuli, rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      We thank the reviewer for their comments and address both points.

      Conceptually, there is a key difference between encoding predictions, i.e. pre-activations of future words, versus encoding stimulus dependencies. The speech acoustics provide a useful control case: they encode the stimulus (and therefore stimulus dependencies) but do not predict. When we apply the encoding analysis to the acoustics (i.e. when we estimate the acoustics pre-onset from post-onset words), we observe the “hallmarks of prediction” – yet, clearly, the acoustics aren't "predicting" the next word.

      This reveals the methodological issue: if the brain were just passively filtering the stimulus (akin to a speech spectrogram), these "prediction hallmarks" would still appear in the acoustics encoding results, despite no actual prediction taking place. Therefore, one necessary criterion for concluding pre-activation from pre-stimulus neural encoding is that the pre-stimulus encoding performance is better on neural data than on the stimulus itself. This would show that the pre-onset neural signal contains additional predictive information about the next word beyond that of the stimulus (e.g. acoustics) itself. We will make this point more prominent in the revision.

      Regarding the regression: different weights are estimated per time point in a time-resolved regression. This allows for modeling of unfolding responses over time, but also for the learning of stimulus dependencies.
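
      The following sketch illustrates how a time-resolved regression can pick up stimulus dependencies. It is a toy under stated assumptions and not the analysis pipeline of the paper: the "embeddings" are synthetic and follow an AR(1) process, the "signal" is a passive readout of the current word only (so it cannot predict anything), and the names `ridge_fit` and `time_resolved_encoding` are hypothetical. Because a separate weight vector is fitted for each time point, the features of the upcoming word still explain the signal before that word's onset.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights for one target vector."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def time_resolved_encoding(X, Y, alpha=1.0):
    """Fit a separate weight vector per time point (column of Y).

    X: (n_words, d) word features.
    Y: (n_words, n_timepoints) signal at fixed offsets relative to word onset.
    Returns the held-out Pearson correlation for each time point.
    """
    half = X.shape[0] // 2
    scores = np.empty(Y.shape[1])
    for t in range(Y.shape[1]):
        w = ridge_fit(X[:half], Y[:half, t], alpha)  # weights specific to offset t
        scores[t] = np.corrcoef(X[half:] @ w, Y[half:, t])[0, 1]
    return scores

# Synthetic stimulus (assumption): embeddings with AR(1) dependencies between words.
rng = np.random.default_rng(0)
n_words, d, rho = 4000, 5, 0.8
X = np.zeros((n_words, d))
X[0] = rng.standard_normal(d)
for i in range(1, n_words):
    X[i] = rho * X[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(d)

# Passive signal (assumption): encodes only the current word, plus noise.
signal = X.sum(axis=1) + rng.standard_normal(n_words)

# Column 0: signal one word BEFORE onset; column 1: signal at the word itself.
Y = np.column_stack([signal[:-1], signal[1:]])
pre_score, post_score = time_resolved_encoding(X[1:], Y)
```

      In this toy setting `pre_score` is clearly above zero even though the signal contains no prediction, which is the pattern the acoustics control is meant to expose. Comparing the same pre-onset score between neural data and such a passive control is one way to apply the criterion described above.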

      To sum up, the difference between encoding dependencies and predictions is at the core of our work. We appreciate this was not clear in the initial version and we will make this much clearer in the revision, conceptually and methodologically.

      Reviewer #2 (Public review):

      Summary:

      At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe the limitation we highlight extends beyond the specific method of residualizing features. Rather, it points to a fundamental problem: adjusting the features (X matrix) alone cannot address stimulus dependencies that persist in the signal (y matrix), as we demonstrate by using a different signal (acoustics) that encodes no predictions. While removing dependencies from the signal would be more thorough, this would also eliminate the effect of interest. We view this as a fundamental limitation of the encoding analysis approach combined with the experimental design, rather than something that can be resolved analytically. We will perform additional analyses to test this premise and elaborate on this point in our revision.
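
      One mechanism by which adjusting the features alone can fall short is sketched below. This is a synthetic illustration and not the authors' analysis: it assumes that the dependencies live in a latent variable `z`, that the features `X` are only a noisy view of `z`, and that the passive signal is driven by `z` directly. Regressing the previous word's features out of the current word's features then leaves part of the dependency in place, so the residualized features still explain the pre-onset signal.

```python
import numpy as np

def heldout_corr(X, y, alpha=1.0):
    """Ridge-regress y on X using the first half of the data and return
    the Pearson correlation of the prediction on the second half."""
    half, d = X.shape[0] // 2, X.shape[1]
    w = np.linalg.solve(X[:half].T @ X[:half] + alpha * np.eye(d),
                        X[:half].T @ y[:half])
    return np.corrcoef(X[half:] @ w, y[half:])[0, 1]

rng = np.random.default_rng(1)
n, d, rho = 4000, 5, 0.8

# Latent stimulus structure (assumption): AR(1) dependencies between words.
z = np.zeros((n, d))
z[0] = rng.standard_normal(d)
for i in range(1, n):
    z[i] = rho * z[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(d)

X = z + rng.standard_normal((n, d))              # features: noisy view of the latent
signal = z.sum(axis=1) + rng.standard_normal(n)  # passive signal, no prediction

X_cur, X_prev = X[1:], X[:-1]
y_pre = signal[:-1]  # signal one word before the current word's onset

# Residualize: remove everything in X_cur that is linearly explained by X_prev.
A, *_ = np.linalg.lstsq(X_prev, X_cur, rcond=None)
X_resid = X_cur - X_prev @ A

raw_score = heldout_corr(X_cur, y_pre)
resid_score = heldout_corr(X_resid, y_pre)
```

      In this toy case `resid_score` is smaller than `raw_score` but remains above zero, because the residualization only removes what the previous features capture, not the dependencies carried by the signal itself. Whether the same mechanism accounts for the results in real data is a separate empirical question.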

      Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies, specifically the auto-correlation in the stimuli, rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for their comments.

      We want to address a key point of confusion regarding the procedure of regressing out embedding dependencies. While Goldstein et al. showed that neural encoding results persist after their control analysis (as we also did in our supplementary Figure S3), this does not lessen the concern surrounding stimulus dependencies. Our analyses demonstrate that even after such residualization, the "hallmarks of prediction" remain encodable in the speech acoustics, a control system that, by definition, cannot predict upcoming words. Therefore, the hallmarks of prediction can be fully explained by stimulus dependencies. This persistence in the acoustics strengthens rather than lessens our concerns about dependencies.

      This connects to a broader methodological point: our key evidence comes from analyzing the stimulus material itself as a control system. By comparing results from encoding neural responses to those from a system that encodes the stimulus, and therefore its dependencies, but cannot predict the upcoming input (like acoustics), we can establish proper criteria for concluding that the brain engages in prediction. Notably, the Goldstein dataset was not available when we conducted this research. However, for the revision we will perform additional analyses to make a more direct comparison.

      Finally, our focus was not to definitively test whether the brain predicts upcoming words, but rather to establish rigorous methodological and epistemological criteria for making such claims. We will elaborate on this crucial distinction in our revision and more prominently feature our central argument about the limitations of current evidence for neural prediction.

    1. eLife Assessment

      This study offers valuable insights into brain responses to words in the visual cortex of blind and sighted individuals. However, the evidence supporting the authors' claims remains incomplete, and the conclusions would benefit from a more comprehensive characterization of the conceptual properties of the word stimuli. This work will be of broad interest to cognitive neuroscientists, psycholinguists, and neurologists investigating meaning representation in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      This fMRI study shows that two regions of the visual cortex (BA18 and BA19) of blind and sighted individuals carry information about the physical similarity of objects denoted by words. This effect was found for written words (Braille in blind, visual in sighted) but not spoken words. The evidence complements earlier studies reporting physical similarity effects in the occipitotemporal cortex of blind and sighted individuals (e.g., Peelen et al., 2014).

      Strengths:

      The study addresses an important question in the fields of neural plasticity and visual cortex organization. The study is generally well-conducted and the findings are clearly presented.

      Weaknesses:

      While the evidence is statistically strong, it is currently incomplete because of missing control analyses (see below). The framing of the results, as arguing against the pluripotent cortex account, is not entirely convincing as it was not clear that the study addressed the key predictions of that account.

      Main comments:

      (1) The study is framed as a test of Bedny's "cognitively pluripotent cortex" proposal (2017) that attributes the increased visual cortex response to linguistic stimuli in blind individuals to high-level cognitive functions. Key evidence for this account came from studies showing increased responses in blind visual cortex to certain grammatical manipulations and to solving mathematical equations. The current study did not include such manipulations. Instead, the current study focused on the representation of objects denoted by single words. Bedny's account did not make a strong argument that the physical similarity of word referents should be differently represented in blind and sighted individuals - if it did, please state this explicitly. Indeed, evidence that (some regions of) the visual cortex represent objects similarly in blind and sighted individuals does not seem incompatible with it.

      (2) Throughout the manuscript (including the abstract) it was not clear what was meant with "visual cortex" or "visual areas"; whether this refers to early visual cortex (V1/BA17) or to visual cortex more generally (e.g., BA17-BA19, occipitotemporal cortex (MT, etc)). This is important for the theoretical arguments and for the interpretation of the results. If visual cortex = BA17, the current results point to potentially important differences between blind and sighted individuals, with the physical similarity of objects only observed in the visual cortex of the blind. If visual cortex is meant to include areas beyond BA17, the blind and sighted show similarities in the current study, although such similarities have been observed before using similar research approaches.

      (3) Related to the point above, the abstract does not accurately describe the results, as it only describes the similarities between blind and sighted but not the differences. The study revealed differences between groups, particularly in BA17 - primary visual cortex. The differences between the groups are also illustrated by the strikingly different searchlight results in the two groups separately (Figure S6). These differences do not reach significance in a whole-brain-corrected contrast, but that likely reflects a lack of power (particularly for a between-group contrast).

      (4) Results were found for written words but not spoken words (Figure S9). This is somewhat surprising considering that the visual cortex was more strongly activated for written words in the sighted, with this activation presumably not adding any information about the physical properties of word referents. Together with the widespread significance of clusters correlating with the physical similarity matrix (Figure 6), this raises the possibility of a confound. It would be good to ensure that this is not the case, e.g., you could create similarity matrices based on word length, word visual similarity (e.g., overlap in letters), and word frequency, and correlate these matrices with the physical similarity matrix to ensure that these correlations are not positive (or if they are, partial it out).

      (5) The study included a task manipulation, with participants either judging physical or conceptual properties. This task manipulation is a central aspect of the design but does not feature anywhere in the results, and is also not discussed or introduced in the text. It would be interesting to know whether the results depend on the property (physical/conceptual) being task-relevant. But more importantly, a potential concern is that the responses in the task (given for each object using a two-response button box) correlate with physical or conceptual similarity and that this explains the fMRI findings. For example, two objects that are elongated would both receive a "yes" button press when participants answer the question "is this elongated"; these objects would also be rated as physically similar. This may apply more to physical than conceptual similarity. To exclude this possibility, the responses need to be analysed and included in the fMRI analyses, either as a regressor in the GLM or as another matrix to be partialed out at the final stage of analysis.

      (6) Many of the blind participants had some residual vision (9/20 had light perception, 2/20 had contour perception); this could possibly have prevented the reorganization of visual cortex.
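      The confound check suggested in the comment on written versus spoken words (similarity matrices for word length and letter overlap, correlated with or partialed out of the physical similarity matrix) could be sketched as follows. This is an illustrative sketch only: the word list is hypothetical, the function names are assumptions, and the physical and neural matrices would come from the study's own data.

```python
import numpy as np

def upper(matrix):
    """Vectorize the upper triangle (excluding the diagonal) of a square matrix."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def length_dissimilarity(words):
    """Absolute difference in number of letters for every word pair."""
    lens = np.array([len(w) for w in words], dtype=float)
    return np.abs(lens[:, None] - lens[None, :])

def letter_dissimilarity(words):
    """1 - Jaccard overlap of the letter sets of every word pair."""
    n = len(words)
    d = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            sa, sb = set(words[a]), set(words[b])
            d[a, b] = 1.0 - len(sa & sb) / len(sa | sb)
    return d

def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after regressing out the covariates."""
    Z = np.column_stack([np.ones(len(x))] + list(covariates))
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical stimuli; the real study used its own word list.
words = ["ball", "banana", "hammer", "coin"]
confound_vectors = [upper(length_dissimilarity(words)),
                    upper(letter_dissimilarity(words))]
# With real data one would then compute, for example:
# partial_corr(upper(neural_rdm), upper(physical_rdm), confound_vectors)
```

      Rank-based (Spearman) correlations are common in representational similarity analysis; Pearson correlation is used here only to keep the sketch short.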

    3. Reviewer #2 (Public review):

      Summary:

      The authors show, through rigorous and extensive analyses, that the visual cortex in both congenitally blind and sighted participants represented differences between individual words presented across sensory modalities. In both groups, the activation patterns for words in the visual cortex reflected physical, but not conceptual similarity between word referents. This suggests a similar representation of words in both groups, one derived from vision-oriented mechanisms, and does not reflect significant functional reorganization in blindness.

      Strengths:

      The theoretical question is sound, as is the analysis approach. The authors' literature discussion is thorough, and the writing is clear.

      Weaknesses:

      I have only minor concerns left open.

      (1) In the representational connectivity analysis, what is the average value across the brain? The authors compare the representational correlation across brain regions to the average value, but the average itself is not reported.

      (2) Can the authors add a map showing the representational connectivity values across the brain in addition to the bar plot? It would make it easier to see what networks show similar neural representation to the visual cortex.

      (3) Are the participants in the behavioral experiment from which the physical and conceptual similarity between word referents were collected matching in age or education with the fMRI participants?

      (4) Although there are no group differences in the correlation of the physical similarity, I think it is important to acknowledge that the effect is only significant at the searchlight level in the blind early visual cortex (Figure S6).

    4. Reviewer #3 (Public review):

      Summary:

      This study examines semantic processing in the visual cortex of both congenitally blind and sighted individuals using fMRI and multivariate pattern analysis (MVPA). The key finding is that the visual cortex in both groups encodes the physical properties of word referents, rather than their conceptual similarities. These results suggest that the same representational mechanisms operate in both the blind and sighted brain.

      Strengths:

      (1) The findings contribute to a broader understanding of cortical reorganization and provide evidence for top-down processing of word referents, even in the absence of visual experience.

      (2) The experiment incorporates both spoken and written word presentations (Braille for blind participants), ensuring that the results are not confounded by modality effects.

      (3) The study employs a rigorous methodological approach, combining multivariate and univariate analyses to strengthen the validity of its findings.

      (4) The paper is well-structured and clearly written, making it easy to follow.

      Weaknesses:

      (1) The word stimuli consist of only 20 nouns referring to concrete entities. However, in the behavioral experiment, participants rated the physical and conceptual similarity of only 30 word pairs, which represents just a subset of all possible word pair combinations. The average similarity ratings across subjects were then used to construct stimuli similarity matrices, which were correlated with the fMRI similarity matrices in the MVPA analysis. What is the rationale for presenting only a small subset of all possible word pair combinations to participants? Additionally, the instruction to rate the "conceptual similarity" of word pairs seems somewhat ambiguous. Would "conceptual similarity" correlate with "physical similarity"? Instead of subjective ratings, why not use cosine similarity scores from pretrained language models to construct the "conceptual similarity" matrices? This approach could provide a more objective and reproducible measure of conceptual similarity.

      (2) There are only six questions each for assessing the physical and conceptual properties of the words in the fMRI experiment. Most of the physical property questions focus on shape-related attributes (e.g., round, angular, elongated, symmetrical), while the conceptual properties are limited to three pairs of antonyms (living/non-living, natural/manufactured, pleasant/unpleasant). These aspects seem insufficient to comprehensively characterize the physical and conceptual properties of the nouns. What was the rationale behind selecting only these six questions? Could this limited set of attributes introduce bias in how the neural representations in the visual cortex are interpreted?

      (3) Two of the blind participants are right-handed, and two may have some form of contour vision. What was the rationale for including these participants? In addition, the sample size for blind participants is relatively small (N = 20). Does the sample size provide sufficient justification for the main conclusion that the visual cortex in both blind and sighted groups represents the physical properties of word referents? Additionally, could individual differences among blind participants impact the results, and were any analyses conducted to account for such variability?

      (4) I appreciate the authors' effort to integrate both univariate and multivariate approaches in their analyses. However, the results appear somewhat contradictory: The MVPA results suggest similar neural representations of word referents in the visual cortex for both blind and sighted participants. However, the univariate analyses indicate higher activation in the visual cortex of blind participants. How can these two findings be reconciled? The authors attributed the increased activation in the visual cortex of blind participants to their "enhanced excitability", but what exactly does "excitability" mean in this context? Could this increased activation instead reflect an alternative neural strategy for processing semantic information in the blind brain? If so, how does this align with the claim that similar representational mechanisms exist in both blind and sighted individuals?

      (5) The authors interpret their findings to suggest that the visual cortex can represent the physical properties of words even without visual experience, attributing this to top-down modulation from higher cognitive regions, which then backprojects to the visual cortex. However, it is unclear why only physical properties, and not conceptual properties, are backprojected. If higher cognitive regions modulate the visual cortex in a top-down manner, wouldn't both physical and conceptual attributes be expected to influence its activity? Could the authors clarify the mechanism that selectively supports physical property encoding over conceptual representation?
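      The suggestion in comment (1) to build the "conceptual similarity" matrix from cosine similarity scores could look roughly like this. It is a minimal sketch: the function name is an assumption, and the input vectors are assumed to be one embedding per word, obtained from whichever pretrained language model is chosen.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    unit = v / norms            # scale every embedding to unit length
    return unit @ unit.T        # dot products of unit vectors are cosines

# Placeholder vectors standing in for word embeddings (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
similarity = cosine_similarity_matrix(toy_embeddings)
```

      The resulting matrix has one row and column per word and could be correlated with the neural similarity matrices in the same way as the rating-based matrices.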

    1. eLife Assessment

      In this important study, the authors advance our understanding of copper uptake by chalkophores and their targeted metalloproteins in Mycobacterium tuberculosis. These convincing data demonstrate that chalkophore-acquired copper is solely incorporated into the Mtb bcc:aa3 copper-iron respiratory oxidase under low copper conditions, and that chalkophore-mediated protection of the respiratory chain is critical to Mtb virulence. These findings may be leveraged for drug discovery and will be of broad interest to those studying bacterial pathogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      It is known that the nrp operon is induced by copper deprivation and encodes the synthesis of chalkophores. The authors carried out a genetic analysis that revealed transcriptional differences for WT and Mtb∆nrp when exposed to the copper chelator tetrathiomolybdate (TTM). The authors found that copper chelation results in upregulation of genes in the chalkophore cluster as well as genes involved in the respiratory chain, including components of the heme-dependent oxidase CytBD and subunits of the bcc:aa3 heme-copper oxidase. Utilizing several knockout variants and inhibitors, the authors showed that copper starvation survival requires chalkophore synthesis and that copper starvation results in dysfunctional bcc:aa3 oxidase. By monitoring oxygen consumption, they go on to show that copper deprivation inhibits respiration through the bcc:aa3 oxidase. Lastly, the authors compare virulence of WT Mtb, Mtb∆nrp and MtbΔnrpΔcydAB strains in mouse spleen and lung. The Mtb∆nrp strain showed mild attenuation, but virulence in MtbΔnrpΔcydAB was severely attenuated and complementation with the chalkophore biosynthetic pathway restored Mtb virulence. These results suggest that chalkophore-mediated protection of the respiratory chain is critical to Mtb virulence, and that redundant respiratory oxidases within Mtb provide respiratory chain flexibility that may promote host adaptation.

      This new information about Mtb biology may be leveraged for drug discovery, highlighting that the Mtb respiratory pathway is a promising drug target, where one may target the Mtb chalkophore biosynthetic pathway in conjunction with CytBD, to obliterate Mtb.

      Strengths: Overall, the paper is very clear and well written, with thorough and well-thought-out experimentation.

      No weaknesses.

      Comments on revisions:

      The authors have addressed all the reviewers' comments.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript that clearly demonstrates that the nrp-encoded diisonitrile chalkophore is necessary for function of the bcc-aa3 oxidase supercomplex under low copper conditions. In addition, the study demonstrates the chalkophore is important early during infection when copper sequestration is employed by the host as a method of nutritional immunity.

      Strengths:

      The authors use genetic approaches, including single and double mutants of chalkophore biosynthesis and both the Mtb oxidases. They use copper chelators to restrict copper in vitro. A strength of the work was the use of a synthesized Mtb chalkophore analogue to show chemical complementation of the mutant nrp locus. Oxphos metabolic activity was measured by oxygen consumption and ATP levels. Importantly, the study demonstrated that the chalkophore, especially in a strain lacking the secondary oxidase, was necessary for early infection and ruled out a role for adaptive immunity in the chalkophore-lacking Mtb by use of SCID mice. It is interesting that after two weeks of infection and onset of adaptive immunity the chalkophore is not required, which is consistent with the host environment switching from copper restriction to copper overload in phagosomes.

      Weaknesses:

      None noted

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the group of Glickman expand on their previous studies on the function of chalkophores during growth of and infection by Mycobacterium tuberculosis. Previously, the group had shown that chalkophores, which are metallophores specific for the scavenging of copper, are induced by M. tuberculosis under copper deprivation conditions. Here, they show that chalkophores, under copper limiting conditions, are essential for the uptake of copper and maturation of a terminal oxidase, the heme-copper oxidase, cytochrome bcc:aa3. As M. tuberculosis has two redundant terminal oxidases, growth of and infection by M. tuberculosis is only moderated if both the chalkophores and the second terminal oxidase, cytochrome bd, are inhibited.

      Strengths:

      A strength of this work is that the lab-culture experiments are complemented with mice infection models, providing strong indications that host-inflicted copper deprivation is a condition that M. tuberculosis has adapted to for virulence.

      Weaknesses:

      Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the copper-containing cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.

      Comments on revisions:

      I thank the authors for carefully addressing my suggestion to the original submission and congratulate them on their work.

    5. Author response:

      The following is the authors’ response to the original reviews

      Response to public reviews:

      We thank the reviewers for their careful evaluation of our manuscript and appreciate the suggestions for improvement. We will outline our planned revisions in response to these reviews.

      Reviewer 2: “The one exception is the claim that "maintenance of respiration is the only cellular target of chalkophore mediated copper acquisition." While under the in vitro conditions tested this does appear to be the case; however, it can't be ruled out that the chalkophore is important in other situations. In particular, for maintenance of the periplasmic superoxide dismutase, SodC, which is the other M. tuberculosis enzyme known to require copper.”

      And

      Reviewer 3: “Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the copper-containing cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.”

      Both comments concern the question of whether the bcc:aa3 respiratory oxidase supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that bcc:aa3 is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (lacking a subunit of bcc:aa3) is growth impaired, copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 oxidase is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrests growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting the bcc:aa3 oxidase, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added this caveat to the discussion of the revised version of this manuscript.

      Response to Reviewers Recommendations for the authors:

      Reviewing Editor Comments:

      In addition to the specific recommendations below, there was consensus that the conclusions/discussion should contextualize that the results cannot exclude that in other conditions (such as in infection), enzymes other than cytochrome bcc:aa3 receive copper from the chalkophore system.

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, the authors mention that the nrp operon is only present in pathogenic Mtb and Mycobacterium marinum but not non-pathogenic mycobacteria. Is the nrp operon present in other pathogenic mycobacteria such as M. leprae, M. avium or M. abscessus?

      Bhatt et al (PMID 30381350) presented an analysis of the distribution of nrp gene clusters in mycobacteria and concluded that M. bovis, M. leprae and M. canettii clearly encode nrp genes. M. marinum has been shown to have a functional chalkophore biosynthetic cluster, but the presence of this system in other mycobacteria awaits experimental validation. We have added the Bhatt reference to this sentence in the introduction.

      (2) Figure 1A - it would be helpful if the genes were grouped and labeled as per their purpose (for example, CytBD components, bcc:aa3 components). While these are described in the text, the genes belonging to the chalkophore cluster are not defined in the text, and are thus not easily identified in the figure.

      The order of genes in the heatmap is determined by unsupervised clustering as indicated by the dendrogram to the left of the heatmap. To highlight chalkophore and CytBD genes, we have added color coding to the gene names and explained this color coding in the legend. 

      (3) Figure 2B/2C - it is interesting that complementation of ΔnrpΔcydAB with cydABCD does not rescue growth to Δnrp levels. Is there an explanation for this? 

      AND

      (4) Figure 2C - BCS is not introduced in the text for this figure nor are the results described - which seems like an oversight. It is interesting that BCS treatment does have a full rescue with cydABCD complementation, while TTM treatment does not. Is there an explanation for this?

      We thank the reviewer for raising this issue. We have attempted several different complementation constructs, including CydAB alone and different promoters, to address the partial complementation in question. However, we do not have an adequate explanation for this partial complementation. As the reviewer notes, the partial complementation is only evident with TTM, not BCS. However, we cannot speculate on the reason for this difference at present.  We have added a note to the text in the results section noting this difference. 

      (5) Figure 2F - is there a reason for the change in TTM concentrations (50 μM TTM vs 10 μM TTM)? Is the concentration for Q203 in both single treatment and combinatory tests 100nM?  

      We have clarified the 100 nM Q203 concentration in the figure legend. To avoid confusion, we have removed the 50 µM TTM condition from panel F because the growth inhibition phenotype of 10 µM is shown in panel E and is the comparator for the combined TTM/Q203 condition in panel F.

      (6) Figure 3A - I assume d0 = day 0, d3 = day 3. This should be defined.

      We have modified the legend to clarify these abbreviations. 

      (7) Figure 4B - as complementation of nrp for ΔnrpΔcydAB returns levels back to WT, I assume there is no attenuation with ΔcydAB alone? Clarification would be appreciated.

      The mouse phenotype of M. tuberculosis ΔcydAB is reported here:

      https://www.pnas.org/doi/10.1073/pnas.1706139114#sec-1. This paper is reference 22 of the manuscript and was noted in the discussion.

      Reviewer #2 (Recommendations for the authors):

      In vitro conditions that require SodC could reveal a role for the chalkophore (i.e., exposure to extracellular or periplasmic superoxide stress under low iron conditions). Some minor confusion exists with the terminology around the two oxidases found in M. tuberculosis. The bcc:aa3 oxidase is a supercomplex between the reductase and oxidase complexes. This point should be clarified in the introduction as the term supercomplex isn't used until later in line 194 and without definition. Referring to the bcc:aa3 supercomplex as an oxidase is fine but is sometimes confusing, especially when mentioning that the target of Q203 is the oxidase, as it targets the reductase portion of the supercomplex.

      We thank the reviewer for this point. We have modified the text to refer to the supercomplex at first mention and modified subsequent mentions to be clearer. 

      In the RNA preparation section boxes appear in several places where spaces should be.

      We do not see these boxes so we suspect this is a conversion error of some type. 

      Reviewer #3 (Recommendations for the authors):

      The authors have very carefully performed their studies and their main conclusions are amply supported by the data. The manuscript is also very clearly written, and easily accessible to a broad audience interested in both bioinorganic chemistry and mycobacteria. I have two recommendations:

      (1) I agree that the evidence shows that chalkophores provide copper to cytochrome bcc:aa3. Under lab-culture conditions, it could well be that, when cytochrome bd is deleted or inhibited, cytochrome bcc:aa3 is rate limiting. Under lab-culture conditions, it is also clear that only the expression of a select number of enzymes is affected. However, this does not mean that cytochrome bcc:aa3 is the ONLY enzyme that receives copper from chalkophores. Thus, under infection conditions, other copper enzymes might be important. For instance, M. tuberculosis expresses a Cu-Zn superoxide dismutase. In summary, perhaps the authors would consider changing the wording of statements such as that in Figure 2E and the conclusions drawn in the discussion.

      This comment concerns the question of whether the bcc:aa3 respiratory supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that the supercomplex is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (lacking a subunit of the bcc:aa3 supercomplex) is growth impaired, copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 supercomplex is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrests growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting bcc:aa3, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added the following to the discussion: “Although chalkophore mediated protection of the bcc:aa3 supercomplex is an important virulence function, we cannot exclude the possibility that additional copper dependent enzymes use chalkophore delivered copper during infection.”

      (2) There is a difference between copper-uptake (e.g. by chalkophores) and the maturation of metallo-enzymes. A short paragraph discussing knowledge from other bacteria in this area would help understand the role of chalkophores (e.g. see 10.1128/mBio.00065-18 or 10.1111/mmi.14701). This could possibly be extended with a genome analysis to check which other proteins are present in M. tuberculosis.

      We thank the reviewer for this point. We agree that our data does not distinguish between 1) a generic role for the chalkophore in copper uptake, with the ultimate candidate metalloenzyme rendered dysfunctional by copper loss, and 2) the chalkophore being an intrinsic part of the cytochrome maturation pathway and interacting directly with the target enzymes. We have added this point to the discussion but have not otherwise added the suggested full discussion of metalloenzyme maturation as we believe this discussion is beyond the scope of our data. 

      Finally, can I suggest the labels d0 and d3 are made clearer in Figure 3A (and defined in the legend).

      We have modified the legend to be clearer.

    1. eLife Assessment

      This is a valuable study that tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts level of investment in the behavior. The evidence that food-washing is deliberate is compelling and the evidence that individual investment in the behavior varies is solid. Overall, the paper should be of interest to researchers interested in foraging behavior, cognition, and primate evolution.

    2. Reviewer #1 (Public review):

      In this paper, the authors had 2 aims:

      (1) Measure macaques' aversion to sand and see if its removal is intentional, as sand is likely an unpleasant sensation that causes tooth damage.

      (2) Show that or see if monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and see if this was related to individual traits such as sex and rank, and behavioral technique.

      They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.

      The authors' conclusions were that they verified a long-standing assumption that monkeys have an aversion to sand as it contains many potentially damaging fine grained silicates, and that removing it via brushing or washing is intentional.

      They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.

      High and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake, despite the long-term cumulative costs of sand.

      This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.

      Strengths:

      The field experiment seemed well designed, and their quantification of the physical and mineral properties of quartz particles (relative to human detection thresholds) seemed good relative to their Feret diameter and particle circularity (to a reviewer that is not an expert in sand). The *Rank Determination* and *Measuring Sand* sections were clear.

      In achieving Aim 1, the authors validated a commonly interpreted, but unmeasured function, of macaque and primate behavior-- a key study/finding in primate food processing and cultural transmission research.

      I commend their approach in trying to develop a quantitative model to generate predictions to compare to empirical data for their second aim. This is something others should strive for.

      I really appreciated the historical context of this paper in the introduction and found it very enjoyable and easy to read.

      I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We thank the editors and Reviewers 1 and 3 for their thoughtful consideration of our manuscript. The present revision is submitted to address comments raised concerning rank determinations and the following sentence in the editorial assessment:

      The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design.

      Close reading of this sentence reveals two parallel threads. The first can be read as “…evidence for variable rank is incomplete given the limitations of the experimental design,” whereas the second can be read as “…evidence for adaptive investment and fitness is incomplete given the limitations of the experimental design.” The first alludes to a critique of our methods, while the second alludes to points of discussion unrelated to our experimental design. Unpacking this sentence is important because it casts the totality of our paper as “incomplete,” a word of consequence for early-career scholars because it prevents indexing in Web of Science.

      For clarity, we will refer to these topics as Thread 1 and Thread 2 in the following response.

      Thread 1 seems rooted in a comment made by Reviewer 1, which is reproduced below:

      I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theft, scrounging, policing, aggression, or other distractions might occur-- where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

      We are grateful for this perspective of Reviewer 1, but it puts us in an uncomfortable position. We must respond rather forcefully because of its influence on the above assessment. A problem with R1’s comment is that it uses the word “foraging” (a behavior we did not study) instead of “cleaning” (the behavior we did study). Still, we can substitute the latter word for the former to get the gist of it.

      R1 criticizes our methods as a prelude for imagining the behaviors of our study animals, a form of conjecture. R1 correctly supposes a positive relationship between the number of animals and the intensity of competition for a limited food resource, a well-known phenomenon; and, yes, the food in each trial was decidedly limited, being fixed at nine cucumber slices. But R1 incorrectly presumes rank effects on cleaning under conditions of intense food competition. When the number of monkeys participating in a trial exceeded the number of feeding stations (n = 3), we saw little or no cleaning effort, either brushing or washing. So, rank effects on cleaning are immaterial under these conditions. As our study goals were narrowly focused on detecting individual propensities, or choices, as a function of rank, we limited our analysis to trials involving three monkeys or fewer. In retrospect, we admit that we should have provided better justification for our choice of trials, so we’ve edited one of our sentences:

      Original sentence 

      Formerly lines 219-220: To minimize the potential confounding effects of dominance interactions, we analyzed trials with ≤ 3 monkeys.

      Revised sentence

      Current lines 219-224: We excluded trials from analysis if the number of participating monkeys exceeded the number of feeding stations, as these conditions produced high levels of feeding competition with scant cleaning behavior. Such conditions effectively erased individual variation in sand removal, the topic motivating our experiment. Accordingly, we analyzed trials with ≤ 3 monkeys, putting 937 food-handling bouts into the GLMM statistical models, which included data on individual rank, sex, and sand treatment.
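
      The exclusion rule described in the revised sentence can be sketched as a simple filter. This is a minimal illustration only, not the authors' analysis code: the `Bout` record and its field names are hypothetical, and the only value taken from the response is the number of feeding stations (n = 3).

```python
from dataclasses import dataclass

# Stated in the response: each trial offered three feeding stations.
N_FEEDING_STATIONS = 3


@dataclass
class Bout:
    """One food-handling bout (all field names are hypothetical)."""
    trial_id: int        # identifier of the trial the bout belongs to
    monkey_id: str       # identifier of the individual handling the food
    n_participants: int  # number of monkeys taking part in that trial


def bouts_for_analysis(bouts, max_participants=N_FEEDING_STATIONS):
    """Keep bouts from trials where participants do not exceed feeding stations."""
    return [b for b in bouts if b.n_participants <= max_participants]
```

      Applied to the full data set, a filter of this kind would retain only the bouts from trials with ≤ 3 monkeys, which the response reports as 937 food-handling bouts.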

      R1’s final criticism – “I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal” – seems to imply that rank data were collected during our experiment. On the contrary, we determined ranks from five years of focal follows preceding the experiment, achieving the very standard that R1 describes as ideal. The relevant text appeared on lines 165-169 in version 2.0:

      To determine the rank-order of adults, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). In some cases, these data were supplemented with ad libitum observations. This protocol existed during five years (2013-2018) of continual observations before we conducted our experiment in July-August 2018. 

      Naturally, we were puzzled by R1’s dismissal of our methods, as well as R1’s conclusion, reached without evidence, that “[the] reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.” It is an unsubstantiated assertion with no definition of robustness, making it difficult for anyone to objectively assess the quality of our data.

      We detect in R1’s words some unfamiliarity with the social organization of our study species, which is fair enough. To better orient readers to the dominance hierarchy of Macaca fascicularis, and to boost reader confidence in the volume and quality of our rank data, we have added several sentences to this section of the manuscript, lines 169-183:

      Macaques form multi-male multi-female (polygynandrous) social groups with individual dominance hierarchies. In M. fascicularis, the hierarchy is strictly linear and extremely steep, meaning aggression is unidirectional (de Waal, 1977; van Noordwijk and van Schaik, 2001) with profound asymmetries in outcomes for individuals of adjacent ranks (Balasubramaniam et al., 2012). Further, the dominance hierarchies of philopatric females are stable and predictable. Daughters follow the pattern of youngest ascendancy, ranking just below their mothers with few known exceptions among older sisters (de Waal, 1977; van Noordwijk and van Schaik, 1999). Taken together, these species traits are conducive to unequivocal rank determinations. 

      To determine the rank-order of adults in our study group, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). These data were supplemented with ad libitum observations and all rank determinations were updated monthly, and when males immigrated or emigrated. This protocol predates our experiment in July-August 2018, representing 970 hr of focal data during five years of systematic study (2013-2018). 

      Thread 2 criticizes our evidence for adaptive investment and fitness, describing it as a limitation of our experimental design. Accordingly, the totality of our experiment was classified as “incomplete.” Yet, our experiment was never designed to collect such evidence, and we make no claims of having it. Rather, we discussed potential fitness consequences to highlight the broader significance of our study, connecting it to diverse bodies of literature, from evolutionary theory to paleoanthropology. Our intent was to follow the conventions of scientific writing; to put our results into conversation with the wider literature and set an agenda for future research.

      On reflection, Thread 2 seems to pivot around something as arbitrary as structure. Previously, our results and discussion were combined under a single section header (“Results and Discussion”), a stylistic choice to economize words. Our manuscript is a Short Report, which is limited to 1,500 words of main text. But this level of concision proved counterproductive. It blurred our results and discussion in the minds of readers. Indeed, Reviewer 3 described it as “misleading,” a barbed word that accomplishes the same act attributed to us. To counter this perspective, we have simply partitioned our Results (now “Experimental Results”) and Discussion to draw a sharper distinction between the two components of our paper.

    1. eLife Assessment

      This important study identifies a mechanism by which caspases are activated in a non-lethal context to induce functional modulation in Drosophila olfactory receptor neurons. To deliver, the authors generated a new reporter of caspases, used TurboID to identify proteins proximal of the Drosophila executioner caspases Drice, and then focused on Fasciclin 3 as a mediator. The experimental results and the main conclusions are convincing. This substantial body of work will be of interest to researchers across fields, from neuroscience of olfaction to development and cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While removal of Fasciclin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to activate non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results, for the most part, are well controlled and justified.

      Weaknesses:

      The behavioral results shown in Fig. 6 need more explanation and clarification (more details below). As currently shown, the results of Fig. 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

      Comments on revisions:

      The authors have adequately addressed my comments.

    3. Reviewer #2 (Public review):

      In this revised version of the study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques from western blot to behavioral experiments and also that they generated several reagents, from fly lines to antibodies. In this revised version of the manuscript the authors addressed the concerns raised by this reviewer in a very thorough way. This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

      Comments on revisions:

      I would like to thank the authors for fully addressing my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While the removal of Fasciclin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to activate non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results for the most part are well controlled and justified.

      Weaknesses:

      The behavioral results shown in Figure 6 need more explanation and clarification (more details below). As currently shown, the results of Figure 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a technical perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript with a particular focus on Figure 6, as well as the overall presentation of the figure and its description in the legends, in accordance with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewer #2 (Public review):

      In this study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques from western blot to behavioral experiments and also that they generated several reagents, from fly lines to antibodies.

      This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a methodological perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript in line with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewing Editor comments:

      I am pleased to let you know that our reviewers found the results in your paper important and the evidence compelling. There are a few minor comments and a point was raised regarding figure 6 for which further details were asked. Please see the reviewer's comments. We are looking forward to receiving an updated version of your very interesting paper.

      We are grateful to you and the reviewers for dedicating time to review our manuscript and for providing insightful comments and suggestions. We have revised our manuscript in line with the reviewers' feedback. The major revision involves clarifying the two-choice preference assay presented in Figure 6. Details of these revisions are provided in our point-by-point responses to the reviewers’ comments below. The new and extensively modified sections of text are highlighted in blue. We have introduced new panels (Figures 1D, 3D, 6B, and 6C) and made modifications to Figure 6A. The previous Figure 1D has been relocated to Figure 1–figure supplement 1B. Additionally, our detailed responses to the reviewers’ comments are also highlighted in blue within the point-by-point response section. With all concerns and suggestions from the Editor and reviewers addressed, our conclusion—that executioner caspase is proximal to Fasciclin 3 which facilitates non-lethal activation in Drosophila olfactory receptor neurons—is now more robustly supported. We are confident that our revised manuscript makes a significant contribution to the fields of caspase function and neurobiology. We remain hopeful that the reviewers will find it suitable for publication in eLife.

      Reviewer #1 (Recommendations for the authors):

      The main comment here is related to Figure 6, which needs to be better explained. First, if the results in Figure 6B and C are conducted with young flies, why is the preference index close to 0? Aren't these young flies more attracted to ACV? Second, what are the results with Dronc-RNAi and DroncDN alone? These should be shown to more accurately assess the outcome of Fas3G expression with and without Dronc inhibition. Third, if Fas3G overexpression induces non-lethal caspase activation and a behavioral change, why does Dronc inhibition enhance (and not suppress) this behavioral change?

      We sincerely thank the reviewer for the comment. We used one-week-old young flies for the two-choice preference assay. We found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (New Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (New Figure 6B). Since our hypothesis is that non-lethal caspase activation suppresses attraction behavior, and that inhibiting caspase activation could enhance attraction, we used the milder experimental condition for subsequent analyses.

      In the original manuscript, we did not test Dronc inhibition alone because caspase activation is rarely observed in young flies (as demonstrated in Figure 3C, New Figure 3D, etc), suggesting that Dronc inhibition during this stage would not affect behavior. This hypothesis is further supported by previous research showing that inhibition of caspase activity in aged flies restores attraction behavior but has no effect in young flies (Chihara et al., 2014). To validate this hypothesis, we conducted the two-choice preference assay again, including caspase activity inhibition by Dronc<sup>DN</sup> expression alone. As expected, Dronc inhibition alone did not alter behavior in young flies (New Figure 6C).

      We also observed that Fas3G overexpression promotes a weak, though not statistically significant, enhancement in attraction behavior. Importantly, simultaneous inhibition of caspase activity further enhanced attraction behavior (New Figure 6C). These results suggest that Fas3G overexpression has a dual function: one aspect promotes attraction behavior, while the other induces non-lethal caspase activation. In this context, non-lethal caspase activation appears to counteract the behavioral response, acting as a regulatory brake. To address the reviewer’s comments comprehensively, we included the New Figure 6B and replaced the original Figure 6B and C with New Figure 6C. Additionally, we revised the manuscript text as follows:

      Using a two-choice preference assay with ACV (Figure 6A), we found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (Figure 6B). Under the milder experimental condition, we first confirmed that inhibition of caspase activity through expressing Dronc<sup>DN</sup> did not affect attraction behavior in young adults (Figure 6C), consistent with a previous report (Chihara et al., 2014). We then observed that the overexpression of Fas3G, which activates caspases, did not impair attraction behavior. Instead, it rather appeared to enhance the tendency for attraction behavior (Figure 6C), suggesting that Fas3G promotes attraction behavior. Finally, we found that inhibiting Fas3G overexpression-facilitated non-lethal caspase activation by expressing Dronc<sup>DN</sup> strongly promoted attraction to ACV (Figure 6C). Overall, these results suggest that Fas3G overexpression has a dual function: it enhances attraction behavior while also triggering non-lethal caspase activation, which counteracts the behavioral response, functioning as a regulatory brake without causing cell death.
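
      For readers unfamiliar with the metric, the preference index referred to above can be illustrated with a short sketch. The response does not state the formula the authors used, so the definition below (flies in the ACV trap minus flies in the control trap, divided by the total trapped) is an assumption based on a common convention for two-choice trap assays, and the function and argument names are hypothetical.

```python
def preference_index(n_acv, n_control):
    """Assumed two-choice preference index: (ACV - control) / (ACV + control).

    Ranges from -1 (all trapped flies in the control trap) to +1 (all in the
    ACV trap); a value near 0 indicates no clear preference.
    """
    total = n_acv + n_control
    if total == 0:
        raise ValueError("no flies were trapped, so the index is undefined")
    return (n_acv - n_control) / total
```

      Under this assumed definition, a value close to 0 with a positive trend, as described for the milder condition, would correspond to only slightly more flies entering the ACV trap than the control trap.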

      Other minor comments are below:

      The authors should clarify that while they refer to their caspases reporters as "non-lethal caspase reporters", these are caspase reporters in general and can report both lethal and non-lethal caspase activation. Of course, the only surviving cells are those that experience non-lethal caspase activation.

      We thank the reviewer for pointing this out. This reporter can monitor caspase activation with high sensitivity only if the cell is capable of transcribing and translating the reporter proteins following cleavage of the probe, most likely in living cells. However, as mentioned, using the term “non-lethal reporter” is not accurate, as additional experiments are required to determine whether caspase activation leads to cell death. Therefore, we removed the term “non-lethal” and referred to this reporter simply as a highly sensitive caspase reporter in the revised manuscript.

      Some of the figure panels could be better described in the legends (e.g. Figure 1E, 1F, 4E, 4F).

      We thank the reviewer for the comment. We have included additional explanations in the figure legends throughout the manuscript.

      In Figure 3C, the OL and AL regions should be marked in the figure as done in Figure 1C.

      We thank the reviewer for the comment. We have marked OL and AL regions in Figure 3C and Figure 2A as in Figure 1C.

      In Figures 4A and B, the authors should rearrange the order of the x-axis to reflect the order that appears in the text (Dronc first).

      We thank the reviewer for the comment. We have rearranged the order of labels on the X-axis to reflect the order that appears in the text.

      In Figure 6B, do the colors imply anything? If so, it should be explained. 

      We thank the reviewer for pointing this out. We intended to use the colors where the light blue bars represent Fas3G overexpression, while the red dots indicate caspase-activated conditions. In the New Figure 6C, we used light blue dots for Fas3G overexpression and red bars for caspase-activated conditions. We have added an explanation in the figure legend. In addition, we have removed the colors in Figure 4B and have added an explanation in the figure legend in Figure 4D.  

      Reviewer #2 (Recommendations for the authors):

      (1) For the methods section make a table for the lines, the way they are listed is not the most easy to read.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3.

      (2) Lines 420 to 573, not sure why this is here, this information should be in the figure or figure legend, or make a table if necessary.

      We thank the reviewer for the comment. We have listed the detailed genotypes corresponding to each figure in Table S4.

      (3) Blocking with donkey serum, do you get better results than bovine?

      We have not conducted tests with bovine serum for immunohistochemistry. Donkey serum was used throughout the manuscript.

      (4) The Methods section is very thorough and complete but I recommend the use of tables to organize some of the reagents used.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3 and the detailed genotypes corresponding to each figure in Table S4.

      (5) Line 647 spells out LC-MS/MS.

      We thank the reviewer for pointing this out. We have provided the full spelling as “liquid chromatography-tandem mass spectrometry”.

      (6) Line 808 spells out ACV (apple cider vinegar) and MQ (MilliQ water).

      We thank the reviewer for pointing this out. We have provided the full spelling as suggested.

      (7) Figure 1D. Why do you use only females? 

      We thank the reviewer for pointing this out. In the original manuscript, we analyzed female flies by crossing each Gal4 strain with UAS-Drice-RNAi; Drice::V5::TurboID virgin females. In this case, because Pebbled-Gal4 is located on the X chromosome, we could only use female flies for the analysis. To address this, we examined the expression pattern in male flies by crossing each Gal4 virgin female with UAS-Drice-RNAi; Drice::V5::TurboID males. As expected, Drice expression is also mostly depleted when using the ORN-specific Gal4 driver, Pebbled-Gal4, suggesting that Drice expression is predominantly observed in ORNs in males as well. We have added New Figure 1D to present the male data. The original Figure 1D, which presents female data, has been relocated to Figure 1–figure supplement 1B.

      (8) Figure 1D. Be clear about the LN driver used here in the figure.

      We thank the reviewer for pointing this out. We used Orb<sup>0449</sup>-Gal4 driver (#63325, Bloomington Drosophila Stock Center), which has been previously characterized as an LN-specific Gal4 driver (Wu et al., 2017). Accordingly, we have revised “LN-Gal4” to “Orb<sup>0449</sup>-Gal4” throughout the manuscript.

      (9) Figure 1 and Supplementary Figure 1 images are very good. I would recommend the use of a different color palette, to help visualization for colorblind readers (such as this reviewer).

      We apologize for any inconvenience caused. We chose the green/magenta color pair because these are complementary colors, which generally provide better contrast compared to other color pairs. Therefore, we have decided to continue using this pair. To enhance readability, we have intensified the magenta signal in the New Figure 1D and Figure 1–figure supplement 1B. We retained the original magenta signal levels in Figure 1C and Figure 1–figure supplement 1A to avoid oversaturation. Instead, we have kept the Streptavidin-only signal images alongside the color merged images for clarity. We hope these adjustments improve the visualization and help you better interpret the figures.

      (10) Based on Supplementary Figure 1 and based on the fact that Figures 1B and 1C use males, why not used also males for Figure 1D?

      Please refer to our reply to comment #7. We have now included the results for males in the New Figure 1D, which show a similar expression pattern to that observed in females. The results for females originally shown in Figure 1D have been relocated to Figure 1–figure supplement 1B.

      (11) Why were the old versus young flies used for Figure 3 raised at 29C? Why not let the animals age at 25C? The use of 29C throughout the manuscript is not clear.

      We thank the reviewer for pointing this out. Most of the UAS fly strains used in this study, including a Fas3G overexpression line, are UASz lines, which exhibit relatively low expression levels compared to UASt lines (DeLuca and Spradling, 2018). Since the Gal4/UAS system is temperature-dependent (Duffy, 2002), we performed most of the experiments at 29°C to enhance gene expression.

      For the aging experiments, we chose to rear flies at 29°C because higher temperatures accelerate aging, including neuronal aging (Okenve-Ramos et al., 2024), allowing for faster experimentation, and 29°C is within the ecologically relevant range of temperatures for Drosophila melanogaster (Soto-Yéber et al., 2018). Additionally, we confirmed that a subset of olfactory receptor neurons undergo aging-dependent caspase activation at both 29°C and 25°C, as shown in New Figure 3D.

      (12) Why not use an Or42b specific GAL 4 for the aging experiment? What are the odorants that are detected by this ORN? Are any of the odorants behaviorally relevant compounds?

      We thank the reviewer for pointing this out. While the exact odorant detected by Or42b neurons has not been fully determined, these neurons innervate the DM1 region in the antennal lobe, which is activated by ACV. Additionally, Or42b neurons have been shown to be required for attraction behavior to ACV (Semmelhack and Wang, 2009), supporting the relevance of ACV for the behavioral experiment. We used Or42b-Gal4 to confirm that Or42b neurons undergo aging-dependent caspase activation, which is detectable using the MASCaT system (New Figure 3D). Furthermore, we verified that these neurons exhibit aging-dependent caspase activation at both 25°C and 29°C (New Figure 3D).

      (13) Make the panel lettering in all the figures bigger or bold.

      We thank the reviewer for pointing this out. We have increased the size of the panel lettering and made it bold throughout the figures to improve the readability.

      (14) Line 806. MilliQ water.

      We thank the reviewer for pointing this out. We have ensured that “MilliQ water” is consistently spelled this way throughout the manuscript.

      (15) Figure 6. The authors need to be more clear on the experimental conditions. At what time of the day was this experiment performed? Was the experiment run in DD? Were the flies young or old?

      We thank the reviewer for pointing this out. We performed the assay using one-week-old young flies under constant dark conditions during both the starvation period and the assay. We have added a detailed explanation in the Methods section. For clarity, we have also revised Figure 6A to provide a more detailed explanation of the experimental setup.

      References

      Chihara T, Kitabayashi A, Morimoto M, Takeuchi K-I, Masuyama K, Tonoki A, Davis RL, Wang JW, Miura M. 2014. Caspase inhibition in select olfactory neurons restores innate attraction behavior in aged Drosophila. PLoS Genet 10:e1004437.

      DeLuca SZ, Spradling AC. 2018. Efficient expression of genes in the Drosophila germline using a UAS promoter free of interference by Hsp70 piRNAs. Genetics 209:381–387.

      Duffy JB. 2002. GAL4 system in Drosophila: a fly geneticist’s Swiss army knife. Genesis 34:1–15.

      Okenve-Ramos P, Gosling R, Chojnowska-Monga M, Gupta K, Shields S, Alhadyian H, Collie C, Gregory E, Sanchez-Soriano N. 2024. Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22:e3002504.

      Semmelhack JL, Wang JW. 2009. Select Drosophila glomeruli mediate innate olfactory attraction and aversion. Nature 459:218–223.

      Soto-Yéber L, Soto-Ortiz J, Godoy P, Godoy-Herrera R. 2018. The behavior of adult Drosophila in the wild. PLoS One 13:e0209917.

      Wu B, Li J, Chou Y-H, Luginbuhl D, Luo L. 2017. Fibroblast growth factor signaling instructs ensheathing glia wrapping of Drosophila olfactory glomeruli. Proc Natl Acad Sci U S A 114:7505–7512.

    1. eLife Assessment

      In this useful study, the authors tested the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes using modeling and behavioral experiments, claiming that bumblebees rely most on ground-views for homing. However, due to a lack of analysis of the bees' behavior during training and a lack of information as to how the homing behavior of bees develops over time, the evidence supporting their claims is currently incomplete. Moreover, there was concern that the experimental environment was not representative of natural scenes, thus limiting the findings of the study.

    2. Reviewer #2 (Public review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exit the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information to the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows to record the flight behaviour of bees in great spatial and temporal detail and in principle also to reconstruct the visual information available to the bees throughout the arena.

      Modelling: The revised manuscript now presents the results of modelling that includes information potentially available to the bees from the arena wall and in particular from the top edge of the arena.

      As I predicted, this increases the width of rotational image difference functions and therefore provides directional guidance over a larger range of misalignments. However, the authors dismiss the modelling results based on such reconstructed views, which more realistically describe the information available to the bumblebees, because (line 291ff): "Further simulations with a rendered arena wall led to worse results because the agent was mainly led to the centre of the arena (Fig. S17, Fig. S18-21)".

      What the modelling in Fig. 17 actually shows is that the agent is led more or less exactly to the 'entry points' to the arena chosen by the real bees (Fig. 4). The authors ignore this and in their rebuttal state that 'We hypothesised that the arena wall and object location created ambiguity'. The problem here is that you don't remove potential 'ambiguity' for real bees by ignoring information they are unlikely to ignore.

      Behavioural analysis: The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Fig. 17.

      Basically both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      The revised manuscript does not address my concerns. The rebuttal states that a detailed analysis of learning and return flights was 'outside the scope of this particular study', that their experimental design 'does not require the entire history of the bee's trajectory to be tested', that 'the entire flight history...will require...effort...conceptually' and that it would be 'difficult to test a hypothesis'.

      These responses clarify the frustrating problem with this study: The authors are more concerned with testing hypotheses than with trying to understand how bumblebees learn to cope with a situation which constrains their learning choreography and confronts them with the one fundamental problem view-based homing has: repetitive scene elements.

      Homing is an experience-dependent process and to understand what cues the bees used to navigate this set-up requires an analysis of the whole learning process. For instance, it may well be that the B+G+ bees initially did enter the array from above, but subsequently learnt a more efficient route into the array, by simply entering it from the side, followed by 'unguided' searching.

      General: The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses unnatural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Fig. 9-12) when more distant scene elements are excluded.

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved. A nice start would be to build camera-based 3D models of natural bumblebee nest entrance environments and analyse whether there are any particularly unusual challenges for the visual localization of the nest entrance.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small spatial scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest hole within a confined area. While many studies have focused on larger spatial scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing, especially in dense environments as we propose here.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      See the new discussion at lines 192-197

      We agree with your comment about the term "clutter". Therefore, we referred to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      See line 20; we also changed the wording throughout the manuscript and figures.

      Reviewer 1 (Recommendations): 

      The manuscript is well written, nicely designed experiments and well illustrated. I have a few comments below.

      It would be useful to discuss known data of learning flights in bumblebees, and the height or catchment area of their flights. This will allow the reader to compare your exp design to the natural learning flights.

      In our study, we first focused on demonstrating the ability to solve a homing task in a dense environment. As we observed the bees returning within the dense environment and not from above it (contrary to the model predictions), we investigated whether they flew above it during their first flights. The bees did indeed fly above, demonstrating their ability to ascend and descend within the constellation of objects (see Supplementary Material Fig. 22).

      In nature, the learning flight of bumblebees may cover several decametres, with the loops performed during these flights increasing with flight time (e.g. Osborne et al. 2013; Woodgate et al. 2016). A similar pattern can be observed on a smaller spatial scale (e.g. Philippides et al. 2013). Similar to the loops that extend over time, the bees gradually gain altitude (Lobecke et al., 2018). However, these observations come from studies where few conspicuous objects surround the nest entrance.

      Although our study focussed on the performance in goal finding in cluttered environments, we now also address the issue of learning flights in the discussion, as learning flights are the scaffolding of visual learning. We have already conducted several learning flight experiments to fill the knowledge gap mentioned above. These will allow us in a forthcoming paper to compare learning flights in this environment with the existing literature (Sonntag et al., 2024).

      We added a reference to this in the discussion (lines 218-219 and 269-272)

      Include bumblebee in the title rather than 'bees'.

      We adapted the title accordingly:

      “Switching perspective: Comparing ground-level and bird’s-eye views for bumblebees navigating dense environments”

      I found switching between bird-views and frog-views to explain bee-views slightly tricky to read. Why not use 'ground-views', which you already have in the title?

      We agree and adapted the wording in the manuscript according to this suggestion.

      I am not convinced there is evidence here to suggest the bees do not use view-based navigation, because of the following: In L66: unclear what were the views centred around, I assume it is the nest. Is 45cm above the ground the typical height gained by bumblebees during learning flight? The clutter seems to be used more as an obstacle that they are detouring to reach the goal, isn't it?

      Based on many previous studies, view-based navigation can be assumed to be one of the plausible mechanisms bees use for homing (Cartwright & Collett, 1987; Doussot et al., 2020; Lehrer & Collett, 1994; Philippides et al., 2013; Zeil, 2022). In our tests, when the dense environment was shifted to a different position in the flight arena, almost no bees searched at the real location of the nest entrance but at the fictive new location within the dense environment, indicating that the bees assumed the nest to be located within the dense environment, and therefore that vision played a crucial role for homing. We thus never meant to suggest that the bees were not using view-based navigation. We clarified this point in the revised manuscript.

      See lines 247-248, 250-259, added visual memory to schematic in Fig. 6

      In our model simulations, the memorised snapshots were centred around the nest. However, we found that a multi-snapshot model could not explain the behaviour of the bees. This led us to suggest that bees likely employ a combination of multiple mechanisms for navigation.

      We refined the paragraph about possible alternative homing mechanisms. See lines 218-263

      The height of learning flights has not been extensively investigated in previous studies, and typical heights are not well-documented in the literature. However, from our observations of the first outbound flights of bumblebees within the dense environment, we noted that they quickly increased their altitude and then flew above the objects. Since the objects had a height of 0.3 metres, we chose 0.45 metres as a height above the objects for our study.

      Furthermore, the nest is positioned within the arrangement of objects, making it a target the bees must actively find rather than detour around.

      I think a discussion to contrast your findings with Murray and Zeil 2017 will be useful. It was unclear to me whether the flight arena had UV availability, if it didn't, this could be a reason for the difference.

      We referred to this study in the discussion of the revised paper (see our response to the public review). Lines 192-197

      As in most lab studies on local homing, the bees did not have UV light available in the arena. Even without this, they were successful in finding their nest position during the tests. We clarified that in the revised manuscript. See lines 334-336

      Figure 2A, can you add a scale bar?

      We added a scale bar to the figure showing the dimensions of the arena. See Fig. 2

      The citation of figure orders is slightly off. We have Figure 5 after Figure 2, without citing Figures 3 and 4. Similarly for a few others.

      We carefully checked the order of cited figures and adapted them.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions: line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the dense environment but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing (neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we included model results with the arena wall in the supplements of the revised paper. See lines 290-293, Figures S17-21

      We agree that catchment volumes would provide quantitatively more detailed information than catchment slices. Nevertheless, since our goal was to investigate whether bees would use ground views or bird's-eye views to home in a dense environment, catchment slices, which provide qualitatively similar information to catchment volumes, are sufficient to predict whether ground or bird's-eye views perform better in leading to the nest. Therefore, we did not include further computations of catchment volumes. (ll. 296-297)

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17. Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments (Baddeley et al., 2012; Dittmar et al., 2010; Doussot et al., 2020; Möller, 2012; Wystrach et al., 2011, 2013; Zeil, 2012). A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Since we observed behavioural responses different from those suggested by the models, it becomes interesting to look at the flight history. If we had found an alignment between the model and the behaviour, looking at the history would have become much less interesting. Thus our results raise an interest in looking at the entire flight history, which will require effort not only in the recording procedure but also conceptually. At the moment, the underlying mechanisms of learning during outbound, inbound, exploration, or orientation flights remain elusive, which makes it difficult to test a hypothesis. A detailed description of the flights during the entire history of a bee would enable us to speculate about alternative models to the one tested in our study, but would remain limited in testing those.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the dense environment.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled laboratory conditions. Both field and laboratory research are necessary and should complement each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of the components of the environment for the behaviour through targeted variation of them. These results yield precious information to then guide future field-based experiments for validation.

      Our laboratory settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was based on the knowledge that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and now refer to the environment as a "dense environment."

      We changed the wording throughout the manuscript and figures.

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work conditions, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious assessments of catchment areas in the context of local homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.

      Reviewer 2 (Recommendations):

      (1) Clarify what is meant by modelling panoramic images at 1cm intervals (only?) along the x-axis of the arena.

      The panoramic images were taken along a grid with 0.5cm steps within the dense environment and 1cm steps in the rest of the arena. A previous study (Doussot et al., 2020) showed successful homing of multi-snapshot models in an environment of similar scale with a grid with 2cm steps. Therefore, we think that our scaling is sufficiently fine. We apologise for the missing information in the method section and added it to the revised manuscript. See lines 286-287
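
      The two-resolution sampling grid described in this response can be sketched as follows. This is only an illustrative sketch, not the authors' code: the 0.5cm and 1cm steps and the 1.5m arena diameter come from the text, whereas the circular shape, centre, and 0.2m radius of the dense environment, as well as the function name sampling_grid, are assumptions made for the example.

```python
import numpy as np

def sampling_grid(arena_radius=0.75, dense_centre=(0.0, 0.0), dense_radius=0.2,
                  coarse_step=0.01, fine_step=0.005):
    """Return an (N, 2) array of x/y positions in metres.

    Fine steps are used inside the (assumed circular) dense environment,
    coarse steps in the rest of the circular arena.
    """
    def disc_points(centre, radius, step):
        # Regular square lattice clipped to a disc around `centre`.
        n = int(np.floor(radius / step + 1e-9))
        offsets = np.arange(-n, n + 1) * step
        xx, yy = np.meshgrid(offsets, offsets)
        inside = xx**2 + yy**2 <= radius**2 + 1e-12
        return np.column_stack((xx[inside] + centre[0], yy[inside] + centre[1]))

    coarse = disc_points((0.0, 0.0), arena_radius, coarse_step)
    fine = disc_points(dense_centre, dense_radius, fine_step)

    # Drop coarse points that fall inside the dense region, which the
    # fine lattice already covers.
    dist = np.hypot(coarse[:, 0] - dense_centre[0], coarse[:, 1] - dense_centre[1])
    coarse = coarse[dist > dense_radius + 1e-9]
    return np.vstack((fine, coarse))

points = sampling_grid()
print(points.shape)  # one row per position at which a panoramic image would be rendered
```

      Each returned position would correspond to one rendered panoramic image; the dense-environment geometry should be replaced by the actual object layout.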

      (2) In Figures 9-12 what are the memory0 to memory7 locations and reference image orientations? Explain clearly which image comparisons generated the rotIDFs shown.

      Memory 0 to memory 7 are examples of the eight memorised snapshots, which are aligned in the nest direction and taken around the nest. In the rotIDFs shown, we took memory 0 as a reference image, and compared the 7 others by rotating them against memory 0. We clarified that in the revised manuscript.

      See revised figure caption in Fig. S9 – 16
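
      The comparison described in this response (memory 0 as the reference image, each other snapshot rotated against it) can be illustrated with a minimal sketch of a rotational image difference function. The sketch assumes that panoramic images are stored as 2D arrays (elevation by azimuth), that rotation about the vertical axis corresponds to a circular shift of the azimuth columns, and that the image difference is the root-mean-square pixel difference; the response does not state the metric used, and the function name rot_idf is hypothetical.

```python
import numpy as np

def rot_idf(reference, view):
    """Rotational image difference function between two panoramic images.

    Both images are 2D arrays (elevation x azimuth). Returns one RMS
    difference per azimuthal shift of `view` (in pixel columns).
    """
    reference = np.asarray(reference, dtype=float)
    view = np.asarray(view, dtype=float)
    if reference.shape != view.shape:
        raise ValueError("reference and view must have the same shape")
    n_azimuth = reference.shape[1]
    return np.array([
        np.sqrt(np.mean((np.roll(view, shift, axis=1) - reference) ** 2))
        for shift in range(n_azimuth)
    ])

# Hypothetical stand-ins for the eight memorised snapshots (random images here).
rng = np.random.default_rng(0)
memories = [rng.random((10, 36)) for _ in range(8)]

# Compare memory 1..7 against memory 0, as described for Fig. S9 - 16.
curves = [rot_idf(memories[0], m) for m in memories[1:]]
best_shifts = [int(np.argmin(c)) for c in curves]
```

      The shift that minimises each curve gives the rotation at which a snapshot best matches the reference; a broad, deep minimum indicates directional guidance over a wide range of misalignments.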

      (3) Figure 9 seems to compare 'bird's-eye', not 'frog's-eye' views.

      We apologise for that mistake and carefully double-checked the figure caption.

      See revised figure caption Fig. S9

      (4) Why do you need to invoke a PI vector (Figure 6) to explain your results?

      Since the bees were able to home in the dense environment without entering the object arrangement from above but from the side, image matching alone could not explain the bees’ behaviour. Therefore, we suggest, as a hypothesis for future studies, a combination of mechanisms such as a home vector. Other alternatives, perhaps without requiring a PI vector, may explain the bees’ behaviour, and we will welcome any future contributions from the scientific community.

      References

      Baddeley, B., Graham, P., Husbands, P., & Philippides, A. (2012). A Model of Ant Route Navigation Driven by Scene Familiarity. PLoS Computational Biology,8(1), e1002336. https://doi.org/10.1371/journal.pcbi.1002336

      Capaldi, E. A., Smith, A. D., Osborne, J. L., Farris, S. M., Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., & Riley, J. R. (2000). Ontogeny of orientation flight in the honeybee revealed by harmonic radar. Nature, 403. https://doi.org/10.1038/35000564

      Cartwright, B. A., & Collett, T. S. (1987). Landmark maps for honeybees. Biological Cybernetics, 57(1), 85–93. https://doi.org/10.1007/BF00318718

      Dittmar, L., Stürzl, W., Baird, E., Boeddeker, N., & Egelhaaf, M. (2010). Goal seeking in honeybees: Matching of optic flow snapshots? Journal of Experimental Biology, 213(17), 2913–2923. https://doi.org/10.1242/jeb.043737

      Doussot, C., Bertrand, O. J. N., & Egelhaaf, M. (2020). Visually guided homing of bumblebees in ambiguous situations: A behavioural and modelling study. PLoS Computational Biology, 16(10). https://doi.org/10.1371/journal.pcbi.1008272

      Lehrer, M., & Collett, T. S. (1994). Approaching and departing bees learn different cues to the distance of a landmark. Journal of Comparative Physiology A, 175(2), 171–177. https://doi.org/10.1007/BF00215113

      Lobecke, A., Kern, R., & Egelhaaf, M. (2018). Taking a goal-centred dynamic snapshot as a possibility for local homing in initially naïve bumblebees. Journal of Experimental Biology, 221(2), jeb168674. https://doi.org/10.1242/jeb.168674

      Möller, R. (2012). A model of ant navigation based on visual prediction. Journal of Theoretical Biology, 305, 118–130. https://doi.org/10.1016/j.jtbi.2012.04.022

      Murray, T., & Zeil, J. (2017). Quantifying navigational information: The catchment volumes of panoramic snapshots in outdoor scenes. PLOS ONE, 12(10), e0187226. https://doi.org/10.1371/journal.pone.0187226

      Osborne, J. L., Smith, A., Clark, S. J., Reynolds, D. R., Barron, M. C., Lim, K. S., & Reynolds, A. M. (2013). The ontogeny of bumblebee flight trajectories: From Naïve explorers to experienced foragers. PLoS ONE, 8(11). https://doi.org/10.1371/journal.pone.0078681

      Philippides, A., de Ibarra, N. H., Riabinina, O., & Collett, T. S. (2013). Bumblebee calligraphy: The design and control of flight motifs in the learning and return flights of Bombus terrestris. Journal of Experimental Biology, 216(6), 1093–1104. https://doi.org/10.1242/jeb.081455

      Sonntag, A., Lihoreau, M., Bertrand, O. J. N., & Egelhaaf, M. (2024). Bumblebees increase their learning flight altitude in dense environments. bioRxiv, 2024.10.14.618154. https://doi.org/10.1101/2024.10.14.618154

      Woodgate, J. L., Makinson, J. C., Lim, K. S., Reynolds, A. M., & Chittka, L. (2016). Life-long radar tracking of bumblebees. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0160333

      Wystrach, A., Mangan, M., Philippides, A., & Graham, P. (2013). Snapshots in ants? New interpretations of paradigmatic experiments. Journal of Experimental Biology, 216(10), 1766–1770. https://doi.org/10.1242/jeb.082941

      Wystrach, A., Schwarz, S., Schultheiss, P., Beugnon, G., & Cheng, K. (2011). Views, landmarks, and routes: How do desert ants negotiate an obstacle course? Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 197(2), 167–179. https://doi.org/10.1007/s00359-010-0597-2

      Zeil, J. (2012). Visual homing: An insect perspective. Current Opinion in Neurobiology, 22(2), 285–293. https://doi.org/10.1016/j.conb.2011.12.008

      Zeil, J. (2022). Visual navigation: Properties, acquisition and use of views. Journal of Comparative Physiology A. https://doi.org/10.1007/s00359-022-01599-2

    1. eLife Assessment

      These important findings detail the role of Pim1 and Pim2 in controlling the behaviour and activity of 'killer' T cells, a vital cell type within our immune system. The authors capitalized on high resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to provide compelling evidence for how the PIM1/2 kinases control TCR-driven activation and IL-2/IL-15-driven proliferation and differentiation into effector T cells. It's also noteworthy that Pim1/Pim2 impact is better revealed through quantitative proteomics than transcriptomics.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      The study focuses on PIM1 and 2 in CD8 T cell activation and differentiation. These two serine/threonine kinases belong to a large network of Serine/Threonine kinases that acts following engagement of the TCR and of cytokine receptors and phosphorylates proteins that control transcriptional, translational and metabolic programs that result in effector and memory T cell differentiation. The expression of PIM1 and PIM2 is induced by the T-cell receptor and several cytokine receptors. The present study capitalized on high-resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to decipher how the PIM1/2 kinases control TCR-driven activation and IL-2/IL-15-driven proliferation, and differentiation into effector T cells.

      Quantitative mass spectrometry-based proteomics analysis of naïve OT1 CD8 T cells stimulated with their cognate peptide showed that the PIM1 protein was induced within 3 hours of TCR engagement and its expression was sustained at least up to 24 hours. The kinetics of PIM2 expression was protracted as compared to that of PIM1. Such TCR-dependent expression of PIM1/2 correlated with the analysis of both Pim1 and Pim2 mRNA. In contrast, Pim3 mRNA was only expressed at very low levels and the PIM3 protein was not detected by mass spectrometry. Therefore, PIM1 and 2 are the major PIM kinases in recently activated T cells. Pim1/Pim2 double knockout (Pim dKO) mice were generated on a B6 background and found to have a lower number of splenocytes. No difference in TCR/CD28-driven proliferation was observed between WT and Pim dKO T cells over 3 days in culture. Quantitative proteomics of >7000 proteins further revealed no substantial quantitative or qualitative differences in protein content or proteome composition. Therefore, other signaling pathways can compensate for the lack of PIM kinases downstream of TCR activation.

      Considering that PIM1 and PIM2 kinase expression is regulated by IL-2 and IL-15, antigen-primed CD8 T cells were expanded in IL-15 to generate memory phenotype CD8 T cells or expanded in IL-2 to generate effector cytotoxic T lymphocytes (CTL). Analysis of the survival, proliferation, proteome, and transcriptome of Pim dKO CD8 T cells kept for 6 days in IL-15 showed that PIM1 and PIM2 are dispensable to drive the IL-15-mediated metabolic or differentiation programs of antigen-primed CD8 T cells. Moreover, Pim1/Pim2-deficiency had no impact on the ability of IL-2 to maintain CD8 T cell viability and proliferation. However, WT CTL downregulated expression of CD62L whereas the Pim dKO CTL sustained higher CD62L expression. Pim dKO CTL were also smaller and less granular than WT CTL. Comparison of the proteome of day 6 IL-2 cultured WT and Pim dKO CTL showed that the latter expressed lower levels of the glucose transporters, SLC2A1 and SLC2A3, of a number of proteins involved in fatty acid and cholesterol biosynthesis, and CTL effector proteins such as granzymes, perforin, IFNg and TNFa. Parallel transcriptomics analysis showed that the reduced expression of perforin and some granzymes correlated with a decrease in their mRNA whereas the decreased protein levels of granzymes B and A, and of the glucose transporters SLC2A1 and SLC2A3 did not correspond with decreased mRNA expression. Therefore, PIM kinases are likely required for IL-2 to maximally control protein synthesis in CD8 CTL. Along that line, the translational repressor PDCD4 was increased in Pim dKO CTL and pan-PIM kinase inhibitors caused a reduction in protein synthesis rates in IL-2 expanded CTL. Finally, the differences between Pim dKO and WT CTL in terms of CD62L expression meant that Pim dKO CTL, but not WT CTL, retained the capacity to home to secondary lymphoid organs. In conclusion, this thorough and solid study showed that the PIM1/2 kinases shape the effector CD8 T cell proteomes rather than transcriptomes and are important mediators of IL2-signalling and CD8 T cell trafficking.

      Weaknesses: None

      Comments on revisions:

      The authors have been able to provide in their rebuttal letter fair answers to most of the queries primarily raised by Reviewer 2 and they have incorporated the corresponding results in the revised text. It makes the paper stronger.

    3. Reviewer #2 (Public review):

      Summary:

      Using a suite of techniques (e.g., RNA seq, proteomics, and functional experiments ex vivo) this paper extensively focuses on the role of PIM1/2 kinases during CD8 T-cell activation and cytokine-driven (i.e., IL-2 or IL-15) differentiation. The authors' key finding is that PIM1/2 enhance protein synthesis in response to IL-2 stimulation, but not IL-15, in CD8+ T cells. Loss of PIM1/2 made T cells less 'effector-like', with lower granzyme and cytokine production, and a surface profile that maintained homing towards secondary lymphoid tissue. The cytokines the authors focus on are IL-15 and IL-2, which drive naïve CD8 T cells towards memory or effector states, respectively. Although PIM1/2 are upregulated in response to T-cell activation and cytokine stimulation (e.g., IL-15, and to a greater extent, IL-2), using T cells isolated from a global mouse genetic knockout background of PIM1/2, the authors find that PIM1/2 did not significantly influence T-cell activation, proliferation, or expression of anything in the proteome under anti-CD3/CD28 driven activation with/without cytokine (i.e., IL-15) stimulation ex vivo. This is perhaps somewhat surprising given PIM1/2 are upregulated, albeit to a small degree, in response to IL-15, and yet PIM1/2 did not seem to influence CD8+ T cell differentiation towards a memory state. Even more surprising is that IL-15 was previously shown to influence the metabolic programming of intestinal intraepithelial lymphocytes, suggesting cell-type specific effects from PIM kinases. What the authors went on to show, however, is that PIM1/2 KO altered CD8 T cell proteomes in response to IL-2. Using proteomics, they saw increased expression of homing receptors (i.e., L-selectin, CCR7), but reduced expression of metabolism-related proteins (e.g., GLUT1/3 & cholesterol biosynthesis) and effector-function related proteins (e.g., IFNy and granzymes). Rather neatly, by performing both RNA-seq and proteomics on the same IL-2 stimulated WT vs. PIM1/2 KO cells, the authors found that changes at the proteome level were not corroborated by differences in RNA, uncovering that PIM1/2 predominantly influence protein synthesis/translation. Effectively, PIM1/2 knockout reduced the differentiation of CD8+ T cells towards an effector state. In vivo adoptive transfer experiments showed that PIM1/2KO cells homed better to secondary lymphoid tissue, presumably owing to their heightened L-selectin expression (although this was not directly examined).

      Strengths:

      Overall, I think the paper is scientifically good, and I have no major qualms with the paper. The paper as it stands is solid, and while the experimental aim of this paper was quite specific/niche, it is overall a nice addition to our understanding of how serine/threonine kinases impact T cell state, tissue homing, and functionality. Of note, they hint towards a more general finding that kinases may have distinct behaviour in different T-cell subtypes/states. I particularly liked their use of matched RNA-seq and proteomics to first suggest that PIM1/2 kinases may predominantly influence translation (then going on to verify this via their protein translation experiment - although I must add this was only done using PIM kinase inhibitors not the PIM1/2KO cells). I also liked that they used small molecule inhibitors to acutely reduce PIM1/2 activity, which corroborated some of their mouse knockout findings - this experiment helps resolve any findings resulting from potential adaptation issues from the PIM1/2 global knockout in mice but also gives it a more translational link given the potential use of PIM kinase inhibitors in the clinic. The proteomics and RNA seq dataset may be of general use to the community, particularly for analysis of IL-15 or IL-2 stimulated CD8+ T cells.

      Weaknesses:

      None. My comments here have been addressed in the previous review.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their careful reading of our manuscript and their considered feedback. Please see our detailed response to reviewer comments inset below.

      In addition to requested modifications we have also uploaded the proteomics data from 2 of the experiments contained within the manuscript onto the Immunological Proteome Resource (ImmPRes) website: immpres.co.uk making the data available in an easy-to-use graphical format for interested readers to interrogate and explore. We have added the following text to the data availability section (lines 1085-1091) to indicate this:

      “An easy-to-use graphical interface for examining protein copy number expression from the 24-hour TCR WT and Pim dKO CD4 and CD8 T cell proteomics and IL-2 and IL-15 expanded WT and Pim dKO CD8 T cell proteomics datasets is also available on the Immunological Proteome Resource website: immpres.co.uk (Brenes et al., 2023) under the Cell type(s) selection: “T cell specific” and Dataset selection: “Pim1/2 regulated TCR proteomes” and “Pim1/2 regulated IL2 or IL15 CD8 T cell proteomes”.”

      We have also indicated this in the figure legends where proteomics datasets are first introduced (Figures 1, 2 and 4) with the text:

      “An interactive version of the proteomics expression data is available for exploration on the Immunological Proteome Resource website: immpres.co.uk”

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      The study focuses on PIM1 and 2 in CD8 T cell activation and differentiation. These two serine/threonine kinases belong to a large network of Serine/Threonine kinases that acts following engagement of the TCR and of cytokine receptors and phosphorylates proteins that control transcriptional, translational and metabolic programs that result in effector and memory T cell differentiation. The expression of PIM1 and PIM2 is induced by the T-cell receptor and several cytokine receptors. The present study capitalized on high-resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to decipher how the PIM1/2 kinases control TCR-driven activation and IL-2/IL-15-driven proliferation, and differentiation into effector T cells.

      Quantitative mass spectrometry-based proteomics analysis of naïve OT1 CD8 T cells stimulated with their cognate peptide showed that the PIM1 protein was induced within 3 hours of TCR engagement, and its expression was sustained at least up to 24 hours. The kinetics of PIM2 expression was protracted as compared to that of PIM1. Such TCR-dependent expression of PIM1/2 correlated with the analysis of both Pim1 and Pim2 mRNA. In contrast, Pim3 mRNA was only expressed at very low levels and the PIM3 protein was not detected by mass spectrometry. Therefore, PIM1 and 2 are the major PIM kinases in recently activated T cells. Pim1/Pim2 double knockout (Pim dKO) mice were generated on a B6 background and found to have a lower number of splenocytes. No difference in TCR/CD28-driven proliferation was observed between WT and Pim dKO T cells over 3 days in culture. Quantitative proteomics of >7000 proteins further revealed no substantial quantitative or qualitative differences in protein content or proteome composition. Therefore, other signaling pathways can compensate for the lack of PIM kinases downstream of TCR activation.

      Considering that PIM1 and PIM2 kinase expression is regulated by IL-2 and IL-15, antigen-primed CD8 T cells were expanded in IL-15 to generate memory phenotype CD8 T cells or expanded in IL-2 to generate effector cytotoxic T lymphocytes (CTL). Analysis of the survival, proliferation, proteome, and transcriptome of Pim dKO CD8 T cells kept for 6 days in IL-15 showed that PIM1 and PIM2 are dispensable to drive the IL-15-mediated metabolic or differentiation programs of antigen-primed CD8 T cells. Moreover, Pim1/Pim2-deficiency had no impact on the ability of IL-2 to maintain CD8 T cell viability and proliferation. However, WT CTL downregulated the expression of CD62L whereas the Pim dKO CTL sustained higher CD62L expression. Pim dKO CTL were also smaller and less granular than WT CTL. Comparison of the proteome of day 6 IL-2 cultured WT and Pim dKO CTL showed that the latter expressed lower levels of the glucose transporters, SLC2A1 and SLC2A3, of a number of proteins involved in fatty acid and cholesterol biosynthesis, and CTL effector proteins such as granzymes, perforin, IFNg, and TNFa. Parallel transcriptomics analysis showed that the reduced expression of perforin and some granzymes correlated with a decrease in their mRNA whereas the decreased protein levels of granzymes B and A, and the glucose transporters SLC2A1 and SLC2A3 did not correspond with decreased mRNA expression. Therefore, PIM kinases are likely required for IL-2 to maximally control protein synthesis in CD8 CTL. Along that line, the translational repressor PDCD4 was increased in Pim dKO CTL and pan-PIM kinase inhibitors caused a reduction in protein synthesis rates in IL-2-expanded CTL. Finally, the differences between Pim dKO and WT CTL in terms of CD62L expression meant that Pim dKO CTL, but not WT CTL, retained the capacity to home to secondary lymphoid organs. In conclusion, this thorough and solid study showed that the PIM1/2 kinases shape the effector CD8 T cell proteomes rather than transcriptomes and are important mediators of IL2-signalling and CD8 T cell trafficking.

      Weaknesses:

      None identified by this reviewer.

      Reviewer #2 (Public Review):

      Summary:

      Using a suite of techniques (e.g., RNA seq, proteomics, and functional experiments ex vivo) this paper extensively focuses on the role of PIM1/2 kinases during CD8 T-cell activation and cytokine-driven (i.e., IL-2 or IL-15) differentiation. The authors' key finding is that PIM1/2 enhances protein synthesis in response to IL-2 stimulation, but not IL-15, in CD8+ T cells. Loss of PIM1/2 made T cells less 'effector-like', with lower granzyme and cytokine production, and a surface profile that maintained homing towards secondary lymphoid tissue. The cytokines the authors focus on are IL-15 and IL-2, which drive naïve CD8 T cells towards memory or effector states, respectively. Although PIM1/2 are upregulated in response to T-cell activation and cytokine stimulation (e.g., IL-15, and to a greater extent, IL-2), using T cells isolated from a global mouse genetic knockout background of PIM1/2, the authors find that PIM1/2 did not significantly influence T-cell activation, proliferation, or expression of anything in the proteome under anti-CD3/CD28 driven activation with/without cytokine (i.e., IL-15) stimulation ex vivo. This is perhaps somewhat surprising given PIM1/2 is upregulated, albeit to a small degree, in response to IL-15, and yet PIM1/2 did not seem to influence CD8+ T cell differentiation towards a memory state. Even more surprising is that IL-15 was previously shown to influence the metabolic programming of intestinal intraepithelial lymphocytes, suggesting cell-type specific effects from PIM kinases. What the authors went on to show, however, is that PIM1/2 KO altered CD8 T cell proteomes in response to IL-2. Using proteomics, they saw increased expression of homing receptors (i.e., L-selectin, CCR7), but reduced expression of metabolism-related proteins (e.g., GLUT1/3 & cholesterol biosynthesis) and effector-function related proteins (e.g., IFNy and granzymes). Rather neatly, by performing both RNA-seq and proteomics on the same IL-2 stimulated WT vs. PIM1/2 KO cells, the authors found that changes at the proteome level were not corroborated by differences in RNA, uncovering that PIM1/2 predominantly influence protein synthesis/translation. Effectively, PIM1/2 knockout reduced the differentiation of CD8+ T cells towards an effector state. In vivo adoptive transfer experiments showed that PIM1/2KO cells homed better to secondary lymphoid tissue, presumably owing to their heightened L-selectin expression (although this was not directly examined).

      Strengths:

      Overall, I think the paper is scientifically good, and I have no major qualms with the paper. The paper as it stands is solid, and while the experimental aim of this paper was quite specific/niche, it is overall a nice addition to our understanding of how serine/threonine kinases impact T cell state, tissue homing, and functionality. Of note, they hint towards a more general finding that kinases may have distinct behaviour in different T-cell subtypes/states. I particularly liked their use of matched RNA-seq and proteomics to first suggest that PIM1/2 kinases may predominantly influence translation (then going on to verify this via their protein translation experiment - although I must add this was only done using PIM kinase inhibitors, not the PIM1/2KO cells). I also liked that they used small molecule inhibitors to acutely reduce PIM1/2 activity, which corroborated some of their mouse knockout findings - this experiment helps resolve any findings resulting from potential adaptation issues from the PIM1/2 global knockout in mice but also gives it a more translational link given the potential use of PIM kinase inhibitors in the clinic. The proteomics and RNA seq dataset may be of general use to the community, particularly for analysis of IL-15 or IL-2 stimulated CD8+ T cells.

      We thank the reviewer for their comments supporting the robustness and usefulness of our data.

      Weaknesses:

      It would be good to perform some experiments in human T cells too, given the ease of e.g., the small molecule inhibitor experiment.

      The suggestion to check PIM inhibitor effects in human T cells is a good one. We think an ideal experiment would be to use naïve cord blood derived CD4 and CD8 cells as a model to avoid the impact of variability in adult PBMC and to really look at what PIM kinases do as T cells first respond to antigen and cytokines. In this context there is good evidence that the signalling pathways used by antigen receptors or the cytokines IL-2 and IL-15 are not substantially different in mouse and human. We have also previously compared proteomes of mouse and human IL-2 expanded cytotoxic T cells and they are remarkably similar. As such we feel that mature mouse CD8 T cells are a genetically tractable model to use to probe the signalling pathways that control cytotoxic T cell function. To repeat the full set of experiments performed within this study with human T cells would represent 1 year of work by an experienced postdoctoral fellow.

      Unfortunately, the funding for the project has come to an end and there is no capacity to complete this work.

      Would also be good for the authors to include a few experiments where PIM1/2 have been transduced back into the PIM1/2 KO T cells, to see if this reverts any differences observed in response to IL-2 - although the reviewer notes that the timeline for altering primary T cells via lentivirus/CRISPR may be on the cusp of being practical such that functional experiments can be performed on day 6 after first stimulating T cells.

      A rescue experiment could indeed be informative, though it comes with challenges/caveats: both deleted proteins would need to be re-expressed at once, and the level of re-expressed PIM kinase would be difficult to control. This work using the Pim dKO mice was performed from 2019-2021 and was seriously impacted by the work restrictions during the COVID-19 pandemic. We had to curtail all mouse colonies to allow animal staff to work within the legal guidelines. We had to make choices and the Pim1/2 dKO colony was stopped because we felt we had generated very useful data from the work but could not justify continued maintenance of the colony at such a difficult time. As such we no longer have this mouse line to perform these rescue experiments.

      We have, however, performed a limited number of retroviral overexpression studies in WT IL-2-expanded CTL, where T cells were transfected after 24 hours activation and phenotype measured on day 6 of culture. We chose to leave these out of the initial manuscript as these were overexpression under conditions where PIM expression was already high, rather than a true test of the ability of PIM1 or PIM2 to rescue the Pim dKO phenotype. A more robust test would also have required doing these overexpression experiments in IL-15-expanded or cytokine-deprived CTL where PIM kinase expression is low; however, we ran out of time and funding to complete this work.

      We have provided Author response image 1 below from the experiments performed in the IL-2 CTL for interested readers. The limited experiments that were performed do support some key phenotypes observed with the Pim dKO mice or PIM inhibitors, finding that PIM1 or PIM2 overexpression was sufficient to increase S6 phosphorylation, and provided a small further increase in GzmB expression above the already very high levels in IL-2 expanded CTL.

      Author response image 1.

      PIM1 or PIM2 overexpression drives increased GzmB expression and S6 phosphorylation in WT IL-2 CTL. OT1 lymph node cell suspensions were activated for 24 hours with SIINFEKL peptide (10 ng/mL), IL-2 (20 ng/mL) and IL-12 (2 ng/mL) then transfected with retroviruses to drive expression of PIM1-GFP, PIM2-GFP fusion proteins or a GFP only control. T cells were split into fresh media and IL-2 daily and (A) GzmB expression and (B) S6 phosphorylation assessed by flow cytometry in GFP+ve vs GFP-ve CD8 T cells 5 days post-transfection (i.e. day 6 of culture). Histograms are representative of 2 independent experiments.

      Other experiments could also look at how PIM1/2 KO influences the differentiation of T cell populations/states during ex vivo stimulation of PBMCs or in vivo infection models using (high-dimensional) flow cytometry (rather than using bulk proteomics/RNA seq which only provide an overview of all cells combined).

      We did consider the idea of in vivo experiments with the Pim1/2 dKO mice but rejected this idea as the mice have lost PIM kinases in all tissues and so we would not be able to understand if any phenotype was CD8 T cell selective. To note, the Pim1/2 dKO mice are smaller than normal wild type mice (discussed further below) and clearly have complex phenotypes. An ideal experiment would be to make mice with floxed Pim1 and Pim2 alleles so that one could use cre recombinase to make a T cell-specific deletion and then study the impact of this in in vivo models. We did not have the budget or ethical approval to make these mice. Moreover, this study was carried out during the COVID pandemic when all animal experiments in the UK were severely restricted. So our objective was to get a molecular understanding of the consequences of losing these kinases for CD8 T cells, focusing on controlled in vitro systems. We felt that this would generate important data that would guide any subsequent experiments by other groups interested in these enzymes.

      We do accept the comment about bulk population proteomics. Unfortunately, single cell proteomics is still not an option at this point in time. High resolution multidimensional flow cytometry is a valuable technique but is limited to looking at only a few proteins for which good antibodies exist compared to the data one gets with high resolution proteomics.

      Alongside this, performing a PCA of bulk RNA seq/proteomes or Untreated vs. IL-2 vs. IL-15 of WT and PIM1/2 knockout T cells would help cement their argument in the discussion about PIM1/2 knockout cells being distinct from a memory phenotype.

      We thank the reviewer for this very good suggestion. We have now included PCAs for the RNAseq and proteomics datasets of IL-2 and IL-15 expanded WT vs Pim dKO CTL in Fig S5 and added the following text to the discussion section of the manuscript (lines 429-431):

      “… and PCA plots of IL-15 and IL-2 proteomics and RNAseq data show that Pim dKO IL-2 expanded CTL are still much more similar to IL-2 expanded WT CTL than to IL-15 expanded CTL (Fig S5)”.
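
      For readers unfamiliar with this kind of comparison, a minimal sketch of a PCA on a samples-by-features matrix is given below. It is an illustration only and not the analysis pipeline used for Fig S5: the data are synthetic, and the group labels, feature count, replicate number and effect sizes are all assumptions chosen so that the cytokine effect dominates the genotype effect. The helper names `pca_scores`, `make_synthetic_dataset` and `centroid_distance` are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)  # centre each feature (column)
    U, S, _ = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]


def make_synthetic_dataset(seed=0, n_features=50, n_reps=3):
    """Synthetic stand-in for log-scaled expression data (not real data)."""
    rng = np.random.default_rng(seed)
    cytokine_effect = rng.normal(0.0, 5.0, n_features)  # assumed large
    genotype_effect = rng.normal(0.0, 0.5, n_features)  # assumed small
    group_means = {
        "WT_IL2": np.zeros(n_features),
        "dKO_IL2": genotype_effect,
        "WT_IL15": cytokine_effect,
    }
    labels, rows = [], []
    for name, mean in group_means.items():
        for _ in range(n_reps):
            labels.append(name)
            rows.append(mean + rng.normal(0.0, 0.1, n_features))
    return np.array(labels), np.vstack(rows)


def centroid_distance(scores, labels, group_a, group_b):
    """Euclidean distance between two group centroids in PC space."""
    centre_a = scores[labels == group_a].mean(axis=0)
    centre_b = scores[labels == group_b].mean(axis=0)
    return float(np.linalg.norm(centre_a - centre_b))


labels, X = make_synthetic_dataset()
scores, explained = pca_scores(X)
d_genotype = centroid_distance(scores, labels, "dKO_IL2", "WT_IL2")
d_cytokine = centroid_distance(scores, labels, "dKO_IL2", "WT_IL15")
print(f"dKO IL-2 vs WT IL-2:  {d_genotype:.2f}")
print(f"dKO IL-2 vs WT IL-15: {d_cytokine:.2f}")
```

      In this synthetic setting the knockout samples sit closer to the wild-type IL-2 group than to the IL-15 group in PC space, which is the kind of pattern the quoted sentence describes. The result here follows from the assumed effect sizes, not from any measurement.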

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In panel B of Figure S1, are the smaller numbers of splenocytes found in dKO fully accounted for by a reduction in the numbers of T cells or also correspond to a reduction in B cell numbers? Are the thymus and lymph nodes showing the same trend?

      We’re happy to clarify this.

      Since we were focused on T cell phenotypes in the paper this is what we have plotted in this figure, however there is also a reduction in total number of B, NK and NKT cells in the Pim dKO mice (see James et al, Nat Commun, 2021 for additional subset percentages). We find that all immune subsets we have measured make up the same % of the spleen in Pim dKO vs WT mice (we show this for T cell subsets in what was formerly Fig S1C and is now Fig S1A), the total splenocyte count is just lower in the Pim dKO mice (which we show in what was formerly Fig S1B and is now Fig S1C). To note, the Pim dKO mice were smaller than their WT counterparts (though we have not formally weighed and quantified this) and we think this is likely the major factor leading to lower total splenocyte numbers.

      We have not checked the thymus so can’t comment on this. We can confirm that lymph nodes from Pim dKO mice had the same number and % CD4 and CD8 T cells as in WT.

      For our in vitro studies we have made sure to either use co-cultures or for single WT and Pim dKO cultures to equalise starting cell densities between wells to account for the difference in total splenocyte number. We have now clarified this point in the methods section lines 682-684

      “For generation of memory-like or effector cytotoxic T lymphocytes (CTL) from mice with polyclonal T cell repertoires, LN or spleen single cell suspensions at an equal density for WT and Pim dKO cultures (~1-3 million live cells/mL)….”

      Reviewer #2 (Recommendations For The Authors):

      Line 89-99 - PIM kinase expression is elevated in T cells in autoimmunity and inhibiting therefore may make some sense if PIM is enhancing T cell activity. Why then would you use an inhibitor in cancer settings? This needs better clarification for readers, with reference to T cells, particularly given this is an important justification for looking at PIM kinases in T cells.

      We thank the reviewer for highlighting the lack of clarity in our explanation here.

      PIM kinase inhibitors alone are proposed as anti-tumour therapies for select cancers to block tumour growth. However, so far these monotherapies haven’t been very effective in clinical trials, and combination treatment options with a number of strategies are being explored. There are two lines of logic for why PIM kinase inhibitors might combine well with, e.g., an anti-PD1 or adoptive T cell immunotherapy. 1) PIM kinase inhibition has been shown to reduce inhibitory/suppressive surface proteins (e.g. PDL1) and cytokine (e.g. TGFbeta) expression in tumour cells and macrophages in the tumour microenvironment. 2) Inhibiting glycolysis and increasing memory/stem-like phenotype has been identified as desirable for longer-lasting more potent anti-tumour T cell immunity. PIM kinase inhibition has been shown to reduce glycolytic function and increase several ‘stemness’ promoting transcription factors e.g. TCF7 in a previous study. Controlled murine cancer models have shown improvement in clearance with the combination of pan-Pim kinase inhibitors and anti-PD1/PDL1 treatments (Xin et al, Cancer Immunol Res, 2021 and Chatterjee et al, Clin Cancer Res 2019).

      It is worth noting, this is seemingly contradictory with other studies of Pim kinases in T cells that have generally found Pim1/2/3 deletion or inhibition in T cells to be suppressive of their function.

      We have clarified this reasoning/seeming conflict of results in the introductory text as follows (lines 90-101):

      “PIM kinase inhibitors have also entered clinical trials to treat some cancers (e.g. multiple myeloma, acute myeloid leukaemia, prostate cancer), and although they have not been effective as a monotherapy, there is interest in combining these with immunotherapies. This is due to studies showing PIM inhibition reducing expression of inhibitory molecules (e.g. PD-L1) on tumour cells and macrophages in the tumour microenvironment and a reported increase of stem-like properties in PIM-deficient T cells which could potentially drive longer lasting anti-cancer responses (Chatterjee et al., 2019; Xin et al., 2021; Clements and Warfel, 2022). However, PIM kinase inhibition has also generally been shown to be inhibitory for T cell activation, proliferation and effector activities (Fox et al., 2003; Mikkers et al., 2004; Jackson et al., 2021) and use of PIM kinase inhibitors could have the side effect of diminishing the anti-tumour T cell response.”  

      Line 93 - The use of 'some cancers' is rather vague and unscientific - please correct phrasing like this. The same goes for lines 54 and 77 (some kinases and some analyses).

      We have clarified the sentence in what is now Line 91 to include examples of some of the cancers that PIM kinase inhibitors have been explored for (see text correction in response to previous reviewer comment), which are predominantly haematological malignancies. The use of the phrase ‘some kinases’ and ‘some analyses’ in what are now Lines 52 and 75 is in our view appropriate as the subsequent sentence/(s) provide specific details on the kinases and analyses that are being referred to.

      Lines 146-147 - Could it be that rather than redundancies, PIM KO is simply not influential on TCR/CD28 signalling in general but influences other pathways in the T cell?

      We agree that the lack of PIM1/2 effect could also be because PIM targets downstream of TCR/CD28 are not influential and have clarified the text as follows (lines 156-161):

      “These experiments quantified expression of >7000 proteins but found no substantial quantitative or qualitative differences in protein content or proteome composition in activated WT versus Pim dKO CD4 and CD8 T cells (Fig 1G-H) (Table S1). Collectively these results indicate that PIM kinases do not play an important unique role in the signalling pathways used by the TCR and CD28 to control T cell activation.”

      Line 169 - Instead of specifying control - maybe put upregulate or downregulate for clarity.

      We have changed the text as per reviewer suggestion (see line 183)

      Line 182-183 - I would move the call out for Figure 2D to after the last call out for Figure 2C to make it more coherent for readers.

      We have changed the text as per reviewer suggestion (see lines 197-200)

      Line 190 - 14,000 RNA? total, unique? mRNA?

      These are predominantly mRNA, since a polyA enrichment was performed as part of the standard TruSeq stranded mRNA sample preparation process; however, a small number of lncRNA etc. were also detected in our RNA sequencing. We left these results in as part of the overall analysis since they may be of interest to others, but did not look into them further. We do mention the existence of the non-mRNA briefly in the subsequent sentence when discussing the total number of DE RNA that were classified as protein coding vs non-coding.

      We have edited this sentence as follows to more accurately reflect that the RNA being referred to is polyA+ (lines 205-207):

      “The RNAseq analysis quantified ~14,000 unique polyA+ mRNA and using a cut off of >1.5 fold-change and q-value <0.05 we saw that the abundance of 381 polyA+ RNA was modified by Pim1/Pim2-deficiency (Fig 2E) (Table S2A).”
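
      The thresholding rule quoted above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name is hypothetical, and it assumes the 1.5-fold cut-off is applied symmetrically to increases and decreases, which the response does not state explicitly.

```python
import math


def is_differentially_expressed(fold_change, q_value,
                                fc_cutoff=1.5, q_cutoff=0.05):
    """Return True if an RNA passes both cut-offs.

    fold_change is a ratio (e.g. Pim dKO / WT abundance), so 2.0 and 0.5
    are both 2-fold changes. ASSUMPTION: the cut-off is symmetric.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    # Compare on the log2 scale so up- and down-regulation are treated alike.
    abs_log2_fc = abs(math.log2(fold_change))
    return abs_log2_fc > math.log2(fc_cutoff) and q_value < q_cutoff


# Hypothetical records: (name, fold change, q-value)
records = [("rna_a", 2.0, 0.01), ("rna_b", 0.5, 0.01),
           ("rna_c", 1.2, 0.001), ("rna_d", 3.0, 0.20)]
hits = [name for name, fc, q in records
        if is_differentially_expressed(fc, q)]
```

      Here only the first two hypothetical records pass: the third fails the fold-change cut-off and the fourth fails the q-value cut-off.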

      Questions/points regarding figures:

      Figure 1 - Is PIM3 changed in expression with the knockout of PIM1/2 in mice? Although the RNA is low could there be some compensation here? The authors put a good amount of effort in to showing that mouse T cells do not exhibit differences from knocking out pim1/2 i.e., Efforts have been made to address this using activation markers and cell size, cytokines, and proliferation and proteomics of activated T cells. What do the resting T cells look like though? Although TCR signalling is not impacted, other pathways might be. Resting-state comparison may identify this.

      In all experiments Pim3 mRNA was only detected at very low levels and no PIM3 protein was detected by mass spectrometry in either wild type or PIM1/2 double KO TCR activated or cytokine expanded CD8 T cells (See Tables S1, S3, S4). There was similarly no change in Pim3 mRNA expression in RNAseq of IL-2 or IL-15 expanded CD8 T cells (See Tables S2, S6). While we have not confirmed this in resting state cells for all the conditions examined, there is no evidence that PIM3 compensates for PIM1/2 deficiency or that PIM3 is substantially expressed in T cells.

      Figure 1A&B - Does PIM kinase stay elevated when removing TCR stimulus? During egress from lymph node and trafficking to infection/tumour/autoimmune site, T cells experience a period of 'rest' from T-cell activation so is PIM upregulation stabilized, or does it just coincide with activation? This could be a crucial control given the rest of the study focuses on day 6 after initial activation (which includes 4 days of 'rest' from TCR stimulation). Nice resolution on early time course though.

      This is an interesting question. Unfortunately, we do not know how sensitive PIM kinases are to TCR stimulus withdrawal, as we have not tried removing the TCR stimulus during early activation and measuring PIM expression.

      Based on the data in Fig 2A there is a hint that 4 hours withdrawal of peptide stimulus may be enough to lose PIM1/2 expression (after ~36 hrs of TCR activation), however, we did not include a control condition where peptide is retained within the culture. Therefore, we cannot resolve this question from the current experimental data, as this difference could also be due to a further increase in PIMs in the cytokine treated conditions rather than a reduction in expression in the no cytokine condition. This ~36-hour time point is also at a stage where T cells have become more dependent on cytokines for their sustained signalling compared to TCR stimulus.

      It is worth noting that PIM kinases are thought to have fairly short mRNA and protein half lives (~5-20 min for PIM1 in primary cells, ~10 min – 1 hr for PIM2). This is consistent with previous observations that cytotoxic T cells need sustained IL-2/Jak signalling to sustain PIM kinase expression, e.g. in Rollings et al (2018) Sci Signaling, DOI:10.1126/scisignal.aap8112 . We would therefore expect that sustained signalling from some external signalling receptor whether this is TCR, costimulatory receptors or cytokines is required to drive Pim1/2 mRNA and protein expression.
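
      As a rough illustration of why such short half-lives imply a need for sustained signalling, the sketch below computes the fraction of a protein remaining after stimulus withdrawal. It assumes simple first-order decay with no new synthesis, which is a simplification for illustration and not something measured in this study; the function name is hypothetical.

```python
def fraction_remaining(elapsed_min, half_life_min):
    """Fraction of the starting amount left after elapsed_min minutes.

    ASSUMPTION: first-order decay and no new synthesis after withdrawal.
    """
    if half_life_min <= 0:
        raise ValueError("half_life_min must be positive")
    return 0.5 ** (elapsed_min / half_life_min)


# Four hours (240 min) without stimulus, using the upper ends of the
# half-life ranges quoted above (20 min for PIM1, 60 min for PIM2).
pim1_left = fraction_remaining(240, 20)   # 12 half-lives
pim2_left = fraction_remaining(240, 60)   # 4 half-lives
```

      Under these assumptions, even the longest quoted half-life leaves only a small fraction of the starting protein after 4 hours, which is in line with the expectation that continued receptor signalling is needed to maintain expression.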

      Figure 1D - the CD4 WT and Pim dKO plots are identical - presumably a copying error - please correct.

      We apologise for the copying error and have amended the manuscript to show the correct data. We thank the reviewer for noticing this mistake.

      In Figure 1H - there is one protein found significant - would be nice to mention what this is - for example, if this is a protein that influences TCR levels this could be quite important.

      The protein is Phosphoribosyl Pyrophosphate synthase 1 like 1 (Prps1l1).

      This was a low confidence quantification (based on only 2 peptides) with no known function in T cells. Based on what is known, this gene is predominantly expressed in the testis (though also detected in spleen, lung, liver). A whole-body KO mouse found no difference in male fertility. No further phenotype has been reported in this mouse. See: Wang et al (2018) Mol Reprod Dev, DOI: 10.1002/mrd.23053

      We have added the following text to the legend of Figure 1H to address this protein:

      “Phosphoribosyl Pyrophosphate synthase 1 like 1 (Prps1l1), was found to be higher in Pim dKO CD8 T cells, but was a low confidence quantification (based on only 2 unique peptides) with no known function in T cells.”

      Figure S1 - In your mouse model the reduction in CD4 T cells is quite dramatic in the spleen - is this reduced homing or reduced production of T cells through development?

      Could you quantify the percentage of CD45+ cells that are T cells from blood too? Would be good to have a more thorough analysis of this new mouse model.

      We apologise for the lack of clarity around the Pim dKO mouse phenotype. Something we didn’t mention previously due to a lack of a formal measurement is that the Pim dKO mice were typically smaller than their WT counterparts. This is likely the main reason for total splenocytes being lower in the Pim dKO mice - every organ is smaller. It is not a phenotype reported in Pim1/2 dKO mice on an FVB background, though has been reported in the Pim1/2/3 triple KO mouse before (see Mikkers et al, Mol Cell Biol 2004 doi: 10.1128/MCB.24.13.6104-6115.2004).

      The % cell type composition of the spleen is equivalent between WT and Pim dKO mice and as mentioned above, was controlled for when setting up of our in vitro cultures.

      We have revised the main text and changed the order of the panels in Fig S1 to make this caveat clearer as follows (lines 138-144):

      “There were normal proportions of peripheral T cells in spleens of Pim dKO mice (Fig S1A) similar to what has been reported previously in Pim dKO mice on an FVB/N genetic background (Mikkers et al., 2004), though the total number of T cells and splenocytes was lower than in age/sex matched wild-type (WT) mouse spleens (Fig S1B-C). This was not attributable to any one cell type (Fig S1A)(James et al., 2021) but was instead likely the result of these mice being smaller in size, a phenotype that has previously been reported in Pim1/2/3 triple KO mice (Mikkers et al., 2004).”

      Figure S1C - why are only 10-15% of the cells alive? Please refer to this experiment in the main text if you are going to include it in the supplementary figure.

      With regard to what was previously Fig S1C (now Fig S1A), we apologise for our confusing labelling. We were quoting these numbers as the percentage of live splenocytes (i.e. % of live cells). Typically ~80-90% of the total splenocytes were alive by the time we had processed, stained and analysed them by flow cytometry direct ex vivo. Of these, CD4 and CD8 T cells made up ~10-15% of the total live splenocytes (with most of the rest of the live cells being B cells).

      We have modified the axis to say “% of splenocytes” to make it clearer that this is what we are plotting.

      Figure S1 - Would be good to show that the T cells are truly deficient in PIM1/2 in your mice to be absolutely sure. You could just make a supplementary plot from your mass spec data.

      This is a good suggestion and we have now included this data as supplementary figure 2.

      To note, due to the Pim1 knockout mouse design this is not as simple as showing presence or absence of total PIM1 protein detection in this instance.

      To elaborate: the Pim1/Pim2 whole body KO mice used in this study were originally made by Prof Anton Berns’ lab (Pim1 KO = Laird et al Nucleic Acids Res, 1993, doi: 10.1093/nar/21.20.4750, with more detail on deletion construct in te Riele, H. et al, Nature,1990, DOI: 10.1038/348649a0; Pim2 KO = Mikkers et al, Mol Cell Biol, 2004, DOI: 10.1128/MCB.24.13.6104-6115.2004). They were given to Prof Victor Tybulewicz on an FVB/N background. He then backcrossed them onto the C57BL/6 background for > 10 generations then gave them to us to intercross into Pim1/2 dKO mice on a C57BL/6 background.

      The strategy for Pim1 deletion was as follows:

      A neomycin cassette was recombined into the Pim1 gene in exon 4 deleting 296 Pim1 nucleotides. More specifically, the 98th pim-1 codon (counted from the ATG start site = the translational starting point for the 34 kDa isoform of PIM1) was fused in frame by two extra codons (Ser, Leu) to the 5th neo codon (pKM109-90 was used). The 3'-end of neo included a polyadenylation signal. The cassette also contains the PyF101 enhancer (from piiMo +PyF101) to ensure expression of neo on homologous recombination in ES cells.

      Collectively this means that the PIM1 polypeptide is made prior to amino acid 98 of the 34 kDa isoform but not after this point. This deletes functional kinase activity in both the 34 kDa and 44 kDa PIM1 isoforms. Ablation of PIM1 kinase function using this KO was verified via kinase activity assay in Laird et al. Nucleic Acids Res 1993.

      The strategy to delete Pim2 was as follows:

      “For the Pim2 targeting construct, genomic BamHI fragments encompassing Pim2 exons 1, 2, and 3 were replaced with the hygromycin resistance gene (Pgp) controlled by the human PGK promoter.” (Mikkers et al Mol Cell Biol, 2004)

      The DDA mass spectrometry data collected in Fig 1 G-H and supplementary table 1 confirmed we do not detect peptides from after amino acid residue 98 in PIM1 (though we do detect peptides prior to this deletion point) and we do not detect peptides from the PIM2 protein in the Pim dKO mice. This confirms that no catalytically active PIM1/PIM2 proteins were made in these mice.

      We have added a supplementary figure S2 showing this and the following text (Lines 155-156):

      “Proteomics analysis confirmed that no catalytically active PIM1 and PIM2 protein were made in Pim dKO mice (Fig S2).”

      Figure 2A - I found the multiple arrows a little confusing - would just use arrows to indicate predicted MW of protein and stars to indicate non-specific. Why are there 3 bands/arrows for PIM2?  

      The arrows have now been removed. We now mention the PIM1 and PIM2 isoform sizes in the figure legend and have left the ladder markings on the blots to give an indication of protein sizes. There are 2 isoforms for PIM1 (34 and 44 kDa) in addition to the nonspecific band and 3 isoforms of PIM2 (40, 37, 34 kDa, though two of these isoform bands are fairly faint in this instance). These are all created via ribosome use of different translational start sites from a single Pim1 or Pim2 mRNA transcript.

      The following text has been added to the legend of Fig 2A:

      “Western blots of PIM1 (two isoforms of 44 and 34 kDa, non-specific band indicated by *), PIM2 (three isoforms of 40, 37 and 34 kDa) or pSTAT5 Y694 expression.”

      Figure 2A - why are the bands so faint for PIM1/2 (almost non-existent for PIM2 under no cytokine stim) here yet the protein expression seems abundant in Figure 1B upon stim without cytokines? Is this a sensitivity issue with WB vs proteomics? My apologies if I have missed something in the methods but please explain this discrepancy if not.

      There is differing sensitivity of western blotting versus proteomics, but this is not the reason for the discrepancy between the data in Fig 1B versus Fig 2A. These differences reflect that Fig 1B and Fig 2A contrast PIM levels in two different sets of conditions and that, while proteomics allows for an estimate of ‘absolute abundance’, Western blotting only shows relative expression between the conditions assessed.

      To expand on this… Fig 1B proteomics looks at naïve versus 24 hr aCD3/aCD28 TCR activated T cells. The western blot data in Fig 2A looks at T cells activated for 1.5 days with SIINFEKL peptide and then washed free of the media containing the TCR stimulus and cultured with no stimulus for 4 or 24 hours, and contrasts this with cells cultured with IL-2 or IL-15 for 4 or 24 hours. All Fig 2A can tell us is that cytokine stimuli increase and/or sustain PIM1 and PIM2 protein above the level seen in TCR activated cells which have not been cultured with cytokine for a given time period. Overexposure of the blot does reveal detectable PIM1 and PIM2 protein in the no cytokine condition after 4 hrs. Whether this is equivalent to the PIM level in the 24 hr TCR activated cells in Fig 1B is not resolvable from this experiment as we have not included a sample from a naïve or 24 hr TCR activated T cell to act as a point of reference.

      Figure 4F - Your proteomics data shows substantial downregulation in proteomics data for granzymes and ifny- possibly from normalization to maximise the differences in the graph - and yet your flow suggests there are only modest differences. Can you explain why a discrepancy in proteomics and flow data - perhaps presenting in a more representative manner (e.g., protein counts)?

      The heatmaps are scaled from ‘row max’ to ‘row min’ for copy number comparison on a linear scale and do indeed visually maximise differences in expression between conditions. This feature of these heatmaps is also what makes the lack of difference in GzmB and GzmA in the mRNA heatmap in Fig 5C quite notable.

      We have now included bar graphs of Granzymes A and B and IFNg protein copy number in Figure 4 (see new Fig 4G-H) to make clearer the magnitude of the effect on the major effector proteins involved in CTL killing function. It is worth noting that flow cytometry histograms from what was formerly Fig 4G (now Fig 4I) are on a log-scale so the shift in fluorescence does generally correspond well with the ~1.7-2.75-fold reduction in protein expression observed.
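
      To illustrate why row min/max scaling exaggerates modest differences, here is a minimal sketch. The copy numbers are hypothetical and the function name is ours; it is not the plotting code used for the figures.

```python
def scale_row(values):
    """Linearly rescale one heatmap row so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat row carries no contrast; map every cell to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical protein copy numbers for [WT, Pim dKO]
two_fold = scale_row([1000.0, 500.0])      # 2-fold reduction
hundred_fold = scale_row([1000.0, 10.0])   # 100-fold reduction
```

      Both rows scale to the same pair of extreme values, so a 2-fold and a 100-fold reduction are indistinguishable by colour alone. This is why plotting copy numbers directly, as in the added bar graphs, conveys the magnitude of the effect.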

      Figure 4G - did you use isotype controls for this flow experiment? Would help convince labelling has worked - particularly for low levels of IFNy production.

      We did not use isotype controls in these experiments, but we are using a well validated interferon gamma antibody and very careful colour panel/compensation controls to minimise background staining. The only way to be 100% confident that an antibody is selective is to use an interferon gamma null T cell, which we do not have. We do however know that the antibody we use gives flow cytometry data consistent with other orthogonal approaches to measure interferon gamma, e.g. ELISA and mass spectrometry.

      Figure 5M - why perform this with just the PIM kinase inhibitors? Can you do this readout for the WT vs. PIM1/2KO cells too? This would really support your claims for the paper about PIM influencing translation given the off-target effects of SMIs.

      Regrettably we have not done this particular experiment with the Pim dKO T cells. As mentioned above, due to this work being performed predominantly during the COVID19 pandemic we ultimately had to make the difficult decision to cease colony maintenance. When work restrictions were lifted we could not ethically or economically justify resurrecting a mouse colony for what was effectively one experiment, which is why we chose to test this key biological question with small molecule inhibitors instead.

      We appreciate that SMIs have off-target effects and this is why we used multiple pan-PIM kinase inhibitors for our SMI validation experiments. While the use of 2 different inhibitors still doesn’t completely negate the concern about possible off-target effects, our conclusions regarding PIM kinases and their impact on protein synthesis are not solely based on the inhibitor work but also on the decreased protein content of the PIM1/2 dKO T cells in the IL-2 CTL, and the data quantifying reductions in levels of many proteins but not their coding mRNA in PIM1/2 dKO T cells compared to controls.

    1. eLife Assessment

      This article presents a valuable genetic spatio-temporal analysis of malaria-infected individuals from four villages in a highly seasonal transmission setting in The Gambia, covering the period between December 2014 and May 2017. Evidence generated by the study's laboratory and data processing approaches is solid and helps to advance the understanding of malaria in The Gambia, particularly due to its longitudinal design and the inclusion of asymptomatic cases.

    2. Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods of intense transmission at the beginning of the rainy season are interspersed by long periods of low to no transmission. This raises several questions about how this transmission pattern impacts the spatiotemporal distribution of circulating parasite strains, how parasites persist during the dry season, and how asymptomatic infections contribute to maintaining transmission during the low/no transmission season.

      Combining molecular barcode genotyping using 101 bi-allelic SNPs and SNPs from Whole Genome Sequence (WGS) in a "consensus barcode", the authors aimed at measuring the relatedness between parasites at different spatial (i.e., individual, household, village, and region) and temporal (i.e., high, low, and the corresponding transitions) levels by assessing the fraction of the genome having a common ancestry (i.e. Identity-by-Descent (IBD)).

      By measuring the Complexity of Infection (COI) and parasite relatedness by IBD the authors show that a large fraction of infections is polygenomic and stable over time, resulting in a high recombinational diversity. Moreover, they show that transmission intensity increases during the transition from the dry to wet seasons. However, they find that there is a higher probability of finding similar genotypes within the same household, but this similarity rapidly disappears over time and is not observed between different villages. If there is no drug selection during the dry season, and if resistance results in a fitness cost, alleles associated with drug resistance may change in frequency. The authors looked at the frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps, kelch13, and mdr1), and found no evidence of changes in allele frequencies associated with seasonality. They also find chronic infections lasting from one month to one and a half years with no dependence on age or gender.

      This work makes use of genomic information and IBD analytic tools to show parasite relatedness from asymptomatic infections at different spatial and temporal scales, thus providing a better understanding of the transmission dynamics of malaria in highly seasonal environments.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes representing 101 bi-allelic SNPs) and 199 high-quality genome sequences to infer the fraction of the genome with shared Identity by Descent (IBD) (i.e. a metric of recombination rate) over several time points covering two years. The barcode and whole genome sequence combination allows full use of a large dataset, to confidently infer the relatedness of parasite isolates at various spatiotemporal scales and show the advantage of using genomic information for understanding malaria transmission dynamics.

      The authors aimed to establish how seasonal transmission cycles shape the spatiotemporal parasite population structure using metrics such as parasite genetic diversity, genetic relatedness, and frequency of drug resistance alleles, as well as the contribution of asymptomatic chronic carriers to sustained transmission. The results support their conclusions.

      Using a combination of molecular barcodes and available whole genome sequence datasets opens new opportunities to understand malaria transmission dynamics in different transmission settings. This allows for data analysis at different spatiotemporal granularities, having a practical utility for identifying malaria control targets and acquiring metrics to evaluate malaria control programs. The development of molecular barcodes using similar SNPs by different malaria control programs would be of great utility to compare and understand malaria transmission dynamics in different settings worldwide.

    3. Reviewer #3 (Public review):

      This study aimed to examine the impact of seasonality on the population genetics of malaria parasites. To achieve this, the researchers conducted a longitudinal study in a region with seasonal malaria transmission. Over a 2.5-year period, blood samples were collected from 1,516 participants across four villages in the Upper River Region of The Gambia. These samples were tested for malaria parasite infection, and the parasites from positive samples were genotyped using a genetic barcode and/or whole genome sequencing. Genetic relatedness analysis was then performed to explore the findings.

      The study identified three key findings:

      (1) The malaria parasite population undergoes continuous recombination, with no single genotype predominating, in contrast to viral populations;

      (2) Parasite relatedness is influenced by both spatial and temporal factors; and

      (3) The lowest genetic relatedness among parasites occurs during the transition from the low to high transmission seasons, which the authors linked to increased recombination during sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and self-explanatory. The methods are adequately described, providing a solid foundation for the findings. While there are no unexpected results, it is reassuring to see the anticipated outcomes supported by actual data. The conclusions are generally well-supported and the recommendation to target asymptomatic infections is logical and relevant.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Household clustering and seasonal genetic variation of Plasmodium falciparum at the community-level in The Gambia" presents a valuable genetic spatio-temporal analysis of malaria-infected individuals from four villages in The Gambia, covering the period between December 2014 and May 2017. The majority of samples were analyzed using a SNP barcode with the Spotmalaria panel, with a subset validated through WGS. Identity-by-descent (IBD) was calculated as a measure of genetic relatedness and spatio-temporal patterns of the proportion of highly related infections were investigated. Related clusters were detected at the household level, but only within a short time period.

      Strengths:

      This study offers a valuable dataset, particularly due to its longitudinal design and the inclusion of asymptomatic cases. The laboratory analysis using the Spotmalaria platform combined and supplemented with WGS is solid, and the authors show a linear correlation between the IBD values determined with both methods, although other studies have reported that at least 200 SNPs are required for IBD analysis. Data-analysis pipelines were created for (1) variant filtering for WGS and subsequent IBD analysis, and (2) creating a consensus barcode from the spot malaria panel and WGS data and subsequent SNP filtering and IBD analysis.

      Weaknesses:

      Further refining the data could enhance its impact on both the scientific community and malaria control efforts in The Gambia.

      (1) The manuscript would benefit from improved clarity and better explanation of results to help readers follow more easily. Despite familiarity with genotyping, WGS, and IBD analysis, I found myself needing to reread sections. While the figures are generally clear and well-presented, the text could be more digestible. The aims and objectives need clearer articulation, especially regarding the rationale for using both SNP barcode and WGS (is it to validate the approach with the barcode, or is it to have less missing data?). In several analyses, the purpose is not immediately obvious and could be clarified.

      The text of the manuscript has now been thoroughly revised. But please let us know if a specific section remains unclear.

      (2) Some key results are only mentioned briefly in the text without corresponding figures or tables in the main manuscript, referring only to supplementary figures, which are usually meant for additional detail, but not main results. For example, data on drug resistance markers should be included in a table or figure in the main manuscript.

      We agree with the reviewer's suggestion to move the prevalence of drug resistance markers from the supplementary figures (previously Figure S8) to the main manuscript (now Figure 5). If any other figure or table should be moved to the main manuscript, please let us know.

      (3) The study uses samples from 2 different studies. While these are conducted in the same villages, their study design is not the same, which should be addressed in the interpretation and discussion of the results. Between Dec 2014 and Sept 2016, sampling was conducted only in 2 villages and at less frequent intervals than between Oct 2016 to May 2017. The authors should assess how this might have impacted their temporal analysis and conclusions drawn. In addition, it should be clarified why and for exactly in which analysis the samples from Dec 2016 - May 2017 were excluded as this is a large proportion of your samples.

      We have clarified which set of samples was used in our Results (Lines 293-295, 316-319). While two villages were recruited halfway through the study, two villages (J and K, Figure 1C) consistently provided data for each transmission season. Importantly, our temporal analysis accounts for these differences by grouping paired barcodes based on their respective locations (Figure 3B). Despite variations in sampling frequency, we still observe a clear overall decline in relatedness between the ‘0-2 months’ and ‘2-5 months’ groups, both of which include barcodes from all four villages.

      (4) Based on which criteria were samples selected for WGS? Did the spatiotemporal spread of the WGS samples match the rest of the genotyped samples? I.e. were random samples selected from all times and places, or was it samples from specific times/places selected for WGS?

      All P. falciparum positive samples were sent for genotyping and whole genome sequencing, ensuring no selection bias. However, only samples with sufficient parasite DNA were successfully sequenced. We have updated the text (Lines 129-130) and added a supplementary figure (Figure S4) to show the sample collection broken down by type of data (barcode or genome). High quality genomes are distributed across all time points.

      (5) The manuscript would benefit from additional detail in the methods section.

      Please see our response in the section “Recommendation for the authors”.

      (6) Since the authors only do the genotype replacement and build consensus barcode for 199 samples, there is a bias between the samples with consensus barcode and those with only the genotyping barcode. How did this impact the analysis?

      While we acknowledge the potential for bias between samples with a consensus barcode (based on WGS) and those with genotyping-only barcodes, its impact is minimal. WGS does indeed produce a more accurate barcode compared to SNP genotyping, but any errors in the genotyping barcodes were mitigated by excluding loci that systematically mismatched with WGS data (see Figure S3). Additionally, the use of WGS improved the accuracy of 51% (216/425) of barcodes, which strengthens the overall quality and validity of our analysis.

      (7) The linear correlation between IBD-values of barcode vs genome is clear. However, since you do not use absolute values of IBD, but a classification of related (>=0.5 IBD) vs. unrelated (<0.5), it would be good to assess the agreement of this classification between the 2 barcodes. In Figure S6 there seem to be quite some samples that would be classified as unrelated by the consensus barcode, while they have IBD>0.5 in the Genome-IBD; in other words, the barcode seems to be underestimating relatedness.

      a. How sensitive is this correlation to the nr of SNPs in the barcode?

      We measured the agreement between the two classifications using specificity (0.997), sensitivity (0.841) and precision (0.843) described in the legend of Figure S8. To further demonstrate the good agreement between the two methods, we calculated a Cohen’s kappa value of 0.839 (Lines 226, 290), indicative of a strong agreement (McHugh 2012). As expected, the correlation between IBD values obtained by both methods improves (higher Cohen’s kappa and R<sup>2</sup>) as the cutoff for the minimal number of comparable and informative loci per barcode pair is raised (data not shown).
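
      The agreement statistics named in this response (sensitivity, specificity, precision and Cohen's kappa) can all be derived from a 2x2 table of related/unrelated calls. The following is a minimal sketch, assuming genome IBD is treated as the reference and the 0.5 cutoff is applied to both methods; the function name and the toy IBD values are hypothetical and are not the study's data.

```python
def agreement_metrics(truth, predicted):
    """Compare two boolean classifications (True = related pair)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of both methods.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


RELATED_IBD = 0.5  # cutoff quoted in the review: related when IBD >= 0.5

# Toy values for eight sample pairs (illustrative only).
genome_ibd = [0.95, 0.60, 0.55, 0.10, 0.05, 0.00, 0.02, 0.70]
barcode_ibd = [0.90, 0.45, 0.65, 0.20, 0.00, 0.00, 0.55, 0.80]

truth = [value >= RELATED_IBD for value in genome_ibd]
predicted = [value >= RELATED_IBD for value in barcode_ibd]
metrics = agreement_metrics(truth, predicted)
```

      The sketch does not guard against empty classes (for example, no related pairs at all), which would cause a division by zero.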

      (8) With the sole focus on IBD, a measure of genetic relatedness, some of the conclusions from the results are speculative.

      a. Why not include other measures such as genetic diversity, which  relates to allele frequency analysis at the population level (using, for example, nucleotide diversity)? IBD and the proportion of highly  related pairs are not a measure of genetic diversity. Please revise the  manuscript and figures accordingly.

      We agree that IBD is not a direct measure of genetic diversity, even though the two are related (Camponovo et al., 2023). More precisely, IBD is a measure of the level of inbreeding in the population (Taylor et al., 2019). We have updated our manuscript by replacing “genetic diversity” with “genetic relatedness” or “inbreeding/outcrossing” when appropriate. Nucleotide diversity would be relevant if we wanted to compare different settings, e.g. Africa vs Asia; however, this is not the case here.

      b. Additionally, define what you mean by "recombinatorial genetic  diversity" and explain how it relates to IBD and individual-level  relatedness.

      We considered the term ‘recombinatorial genetic diversity’ to be equivalent to the level of inbreeding in the population. Because this expression is rather uncommon, we decided to drop it from our manuscript and replace it with “inbreeding/outcrossing”.

      c. Recombination is one potential factor contributing to the loss of  relatedness over time. There are several other factors that could  contribute, such as mobility/gene flow, or study-specific limitations  such as low numbers of samples in the low transmission season and many  months apart from the high transmission samples.

      Indeed, the loss of relatedness could be attributed not only to the recombination of local cases but also to new parasites introduced by imported malaria cases. As we stated in our manuscript, previous studies have shown a limited effect of imported cases on maintaining transmission (Lines 72-74). Nevertheless, we cannot definitely exclude that imported cases have an effect on inbreeding levels, since we do not have access to genetic data of surrounding parasites at the time of the study. We updated the discussion accordingly (Lines 497-501).

      d. By including other measures such as linkage disequilibrium you could  further support the statements related to recombination driving the loss of relatedness.

      This commendable suggestion is actually part of an ongoing project focusing on the sharing of IBD fragments and how it correlates with linkage disequilibrium. However, we believe that this analysis would not fit in the scope of our manuscript which is really about spatio-temporal effects on parasite relatedness at a local scale.

      (9) While the authors conclude there is no seasonal pattern in the drug-resistant markers, one can observe a big fluctuation in the dhps haplotypes, which go down from 75% to 20% and then up and down again later. The authors should investigate this in more detail, as dhps is related to SP resistance, which could be important for seasonal malaria chemoprophylaxis, especially since the mutations in dhfr seem near-fixed in the population, indicating high levels of SP resistance at some of the time points.

      As the reviewer noted, the DHPS A437G haplotype appears to decrease in prevalence twice throughout our study: from the 2015 and 2016 high transmission seasons to the subsequent 2016 and 2017 low transmission seasons. Seasonal Malaria Chemoprophylaxis (SMC) was carried out in the area through the delivery of sulfadoxine–pyrimethamine plus amodiaquine to children 5 years old and younger during high transmission seasons. As DHPS A437G haplotype has been associated with resistance to sulfadoxine, its apparent increase in prevalence during high transmission seasons could be resulting from the selective pressure imposed on parasites. After SMC, the decrease in prevalence observed during low transmission seasons could be caused by a fitness cost of the mutation favouring wild-type parasites over resistant ones. We updated our manuscript to reflect this relevant observation (Lines 400-405).

      (10) I recommend that raw data from genotyping and WGS should be deposited in a public repository.

      Genotyping data is available in the supplementary table 4 (Table S4). Whole genome sequencing is accessible in a European Nucleotide Archive public repository with the identifiers provided in supplementary table 5 (Table S5). We added references to these tables in the manuscript (Lines 249-250).

      Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods of intense transmission at the beginning of the rainy season are interspersed with long periods of low to no transmission. This raises several questions about how this transmission pattern impacts the spatiotemporal distribution of circulating parasite strains. Knowledge of these dynamics may allow the identification of key units for targeted control strategies, the evaluation of the effect of selection/drift on parasite phenotypes (e.g., the emergence or loss of drug resistance genotypes), and the analysis, through the parasites' genetic nature, of the duration of chronic infections persisting during the dry season. Using a combination of barcodes and whole genome analysis, the authors try to answer these questions by making clever use of the different recombination rates, as measured through the proportion of genomes with identity-by-descent (IBD), to investigate the spatiotemporal relatedness of parasite strains at different spatial (i.e., individual, household, village, and region) and temporal (i.e., high, low, and the corresponding transitions) levels. The authors show that a large fraction of infections are polygenomic and stable over time, resulting in high recombinational diversity (Figure 2). Since the number of recombination events is expected to increase with time or with the number of mosquito bites, IBD allows them to investigate the connectivity between spatial levels and to measure the fraction of effective recombinational events over time. The authors demonstrate the epidemiological connectivity between villages by showing the presence of related genotypes, a higher probability of finding similar genotypes within the same household, and how parasite-relatedness gradually disappears over time (Figure 3). Moreover, they show that transmission intensity increases during the transition from dry to wet seasons (Figure 4). If there is no drug selection during the dry season and if resistance incurs a fitness cost, it is possible that alleles associated with drug resistance may change in frequency. The authors looked at the frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps, kelch13, and mdr1), and found no evidence of changes in allele frequencies associated with seasonality. They also find chronic infections lasting from one month to one and a half years with no dependence on age or gender.

      The use of genomic information and IBD analytic tools provides the  Control Program with important metrics for malaria control policies, for example, identifying target populations for malaria control and  evaluation of malaria control programs.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes  representing 101 bi-allelic SNPs) and 199 high-quality genome sequences  to infer the fraction of the genome with shared Identity by Descent  (IBD) (i.e. a metric of recombination rate) over several time points  covering two years. The barcode and whole genome sequence combination  allows full use of a large dataset, and to confidently infer the  relatedness of parasite isolates at various spatiotemporal scales.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate the impact of seasonality on malaria parasite population genetics. To achieve this, the researchers conducted a longitudinal study in a region characterized by seasonal malaria transmission. Over a 2.5-year period, blood samples were collected from 1,516 participants residing in four villages in the Upper River Region of The Gambia and tested for malaria parasite positivity. The parasites from the positive samples were genotyped using a genetic barcode and/or whole genome sequencing, followed by a genetic relatedness analysis.

      The study identified three key findings:

      (1) The parasite population continuously recombines, with no single genotype dominating, in contrast to viral populations;

      (2) The relatedness of parasites is influenced by both spatial and temporal distances; and

      (3) The lowest genetic relatedness among parasites occurs during the  transition from low to high transmission seasons. The authors suggest  that this latter finding reflects the increased recombination associated with sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and  self-explanatory. The methods are adequately described, providing a  solid foundation for the findings. While there are no unexpected  results, it is reassuring to see the anticipated outcomes supported by  actual data. The conclusions are generally well-supported; however, the  discussion on the burden of asymptomatic infections falls outside the  scope of the data, as no specific analysis was conducted on this aspect  and was not stated as part of the aims of the study. Nonetheless, the  recommendation to target asymptomatic infections is logical and  relevant.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript would benefit from additional detail in the methods section.

      a. Refer to Figure 1 when you describe the included studies and sample processing.

      We added the reference to Figure 1 (Line 131).

      b. While you describe each step in the pipeline, you do not specify the  tools, packages, or environment used (the GitHub link is also  non-functional). A graphic representation of the pipeline, with more  bioinformatic details than Supplementary Figure S1, would be helpful.  Add references to used tools and software created by others.

      The GitHub link has been updated and is now functional. Figure S1 is already dense, and adding more detail would undermine its purpose as an easily readable summary of our pipeline. Readers seeking an in-depth explanation of the pipeline will find it in the methods section. We are committed to crediting the authors of the tools that were essential to our analysis pipeline. The two most relevant tools that we used are hmmIBD and the Fws calculation, both of which are cited in the methods (Lines 148-152, 214-215).

      c. What changed in the genotyping protocol after May 2016? Does it not  lead to bias in the (temporal) analysis by leaving these loci in for  samples collected before May 2016 and making them 'unknown' for the  majority of samples collected after this date?

      These 21 SNPs all clustered in 1 of the 4 multiplexes used for molecular genotyping, which likely failed to produce accurate base calls. We updated the text to include this information (Lines 198-200).

      The rationale behind the discarding of these 21 SNPs for barcodes sampled after May 2016 was that they were consistently mismatching with the WGS SNPs, probably due to genotyping error as mentioned above. However, by replacing these unknown positions in the molecular barcodes with WGS SNPs, 141 samples did recover some of these 21 SNPs with the accurate base calls (Figure S3A). Additionally, we added an extra analysis to assess the agreement between barcodes and WGS data (Figure S3B).

      d. Related to this, how are unknown and mixed genotypes treated in the  binary matrix? How is the binary matrix coded? Is 0 the same as the  reference allele? So all the missing and mixed are treated as  references? How many missing and mixed alleles are there, how often does it occur and how does this impact the IBD analysis?

      We acknowledge that the details that we provided regarding the IBD analysis were confusing. hmmIBD requires a matrix that contains positive or null integers for each different allele at a given locus (all our loci were bi-allelic, thus only 0 and 1 were used) and -1 for missing data. In our case, we set missing and mixed alleles to -1, which were then ignored during the IBD estimation. The corresponding text was updated accordingly (Lines 173-175).
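
      The encoding described in this response can be sketched as follows. This is an illustration only: the single-letter barcode representation, the use of "X" and "N" for missing and mixed calls, and the choice of which allele maps to 0 are assumptions made for the example, not details confirmed by the authors.

```python
MISSING = -1  # value used for missing data, as stated in the response


def encode_call(call, allele_0, allele_1):
    """Map one barcode position to 0, 1, or -1 (missing or mixed)."""
    if call == allele_0:
        return 0
    if call == allele_1:
        return 1
    # Anything else (hypothetical "X" for missing, "N" for mixed) is ignored.
    return MISSING


def encode_barcode(barcode, alleles_0, alleles_1):
    """Encode a whole barcode, one integer per bi-allelic locus."""
    return [
        encode_call(call, a0, a1)
        for call, a0, a1 in zip(barcode, alleles_0, alleles_1)
    ]


# Four hypothetical bi-allelic loci.
alleles_0 = "ACGT"
alleles_1 = "GTAC"
encoded = encode_barcode("AXNC", alleles_0, alleles_1)
```

      Setting mixed calls to -1 rather than 0 is the correction the authors describe: a 0 would be read as a confident call of the first allele.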

      e. By excluding households with less than 5 comparisons, are you not preselecting households with high numbers of cases, and therefore higher likelihood of transmission within the household?

      All participants in each household were sampled at every collection time point. This sampling was unbiased with respect to the likelihood of transmission. Excluding pairs of households with fewer than 5 comparisons was necessary to ensure statistical robustness in our analyses. Moreover, this does not restrict the analysis to households with a high number of cases, because it is the total number of pairs between households that must be at least 5 (for instance, these pairs would pass the cutoff: household with 1 case vs household with 5 cases; household with 2 cases vs household with 3 cases).
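
      The two examples in this response follow from counting every cross-household pairing. A small sketch, assuming one isolate per case and that each isolate in one household is compared with each isolate in the other (the function names are hypothetical):

```python
MIN_COMPARISONS = 5  # cutoff stated in the response


def between_household_pairs(cases_a, cases_b):
    """Number of pairwise comparisons between two distinct households."""
    return cases_a * cases_b


def passes_cutoff(cases_a, cases_b, minimum=MIN_COMPARISONS):
    """True when the household pair yields enough comparisons to be kept."""
    return between_household_pairs(cases_a, cases_b) >= minimum


examples = {
    (1, 5): passes_cutoff(1, 5),  # 5 pairs
    (2, 3): passes_cutoff(2, 3),  # 6 pairs
    (2, 2): passes_cutoff(2, 2),  # 4 pairs
}
```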

      (2) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      See (6) in the Public Review.

      a. It would be good to get a better sense of the distribution of the nr  of SNPs in the barcode. The range is 30-89, and 30 SNPs for IBD is  really not that much.

      Adding the range of the number of available SNPs per barcode is indeed particularly relevant. We added a supplementary figure (Figure S5) showing the distribution of homozygous SNPs per barcode, showing that a very small minority of barcodes have only 30 SNPs available for IBD (average of 65, median of 64).

      b. Did you compare the nr of SNPs in the consensus vs. only genotyped  barcodes? Is there more missing data in the genotype-only barcodes?

      We added a supplementary figure (Figure S5) with the distribution of homozygous SNPs in consensus (216 samples) and molecular (209 samples) barcodes. Consensus barcodes have more homozygous SNPs (average 76, median 82) than molecular barcodes (average of 54, median of 53), showing the improvement resulting from using whole genome sequencing data.

      c. How was the cut-off/sample exclusion criteria of 30 SNPs in the barcode determined?

      As described above (Public review section 7.a.), we removed pairs of barcodes with less than 30 comparable loci (and 10 informative loci) because this led to a good agreement between IBD values obtained from barcodes and genomes while still retaining a majority of pairwise IBD values.

      d. Was there more/less IBD between sample pairs with a consensus barcode vs those with genotype-only barcodes?

      We separated pairwise IBD values into two groups: “within consensus” and “within molecular”. The percentage of related barcodes (IBD ≥ 0.5) was virtually identical between the “within consensus” (1.88 %) and “within molecular” (1.71 %) groups (χ<sup>2</sup> = 1.33, p value > 0.24).

      (3) Line 124: add a reference for the PCR method used.

      We have updated this information: varATS qPCR (Line 121).

      (4) Line 126, what is MN2100ff? Is this the catalogue number of the  cellulose columns? Please clarify and add manufacturer details.

      MN2100ff was a replacement for CF11. We added a link to the MalariaGen website describing the product and the procedure (Lines 124-125).

      (5) Line 143: Figure S7 is the first supplementary figure referenced. Change the order and make this Figure S1?

      The numbering of figures is now fixed.

      (6) Line 154: How many SNPs were in the vcf before filtering?

      There were 1,042,186 SNPs before filtering. This information was added to the methods (Line 168).

      (7) Line 156: Why is QUAL filtered at 10000? This seems extremely high.  (I could be mistaken, but often QUAL above 50 or so is already fine, why discard everything below 10000?). What is the range of QUAL scores in  your vcf?

      We used the QUAL > 10000 to make our analyses less computationally intensive while keeping enough relevant genetic information. We agree that keeping variants with extremely high values of QUAL is not relevant above a certain threshold as it translates into infinitesimally low probabilities (10<sup>-(QUAL/10)</sup>) of the variant calling being wrong. We then decided to use a minimal population minor allele frequency (MAF) of 0.01 to keep a variant as this will make the IBD calculation more accurate (Taylor et al., 2019). The variant filtering was carried out with the MAF > 0.01 filter, resulting in 27,577 filtered SNPs with a minimal QUAL of 132. With a cutoff of 3000 available SNPs, we retrieved all 199 genomes previously obtained with the QUAL > 10000 condition. The methods have been updated accordingly (Lines 166-170).
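
      The relation quoted in this response, an error probability of 10<sup>-(QUAL/10)</sup>, is the standard Phred scaling. A short sketch shows why a QUAL of 132 is already a stringent floor; the function names are hypothetical, and working in log10 avoids numerical underflow at very large QUAL values.

```python
def qual_to_log10_error(qual):
    """log10 of the probability that a variant call is wrong (Phred scale)."""
    return -qual / 10.0


def qual_to_error_probability(qual):
    """Error probability for moderate QUAL values."""
    return 10.0 ** qual_to_log10_error(qual)


# QUAL 30 corresponds to a 1-in-1000 chance of a wrong call.
p_q30 = qual_to_error_probability(30)

# Minimal QUAL retained after the MAF > 0.01 filter, per the response.
log10_p_q132 = qual_to_log10_error(132)

# Former cutoff: a probability far below what a double can represent,
# so it is only reported on the log scale here.
log10_p_q10000 = qual_to_log10_error(10000)
```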

      (8) Line 161-165: How did you handle the mixed alleles in the hmmIBD  analysis for the WGS data? Did you set them as 0 as you do later on for  the consensus barcode?

      Mixed alleles and missing data were ignored. This translated into a value of -1 for the hmmIBD matrix and not 0 as we incorrectly stated previously. We updated our manuscript with this correct information (Lines 173-175).

      (9) Line 168-171: How many SNPs do you have in the WGS dataset after all the filtering steps? If the aim of the IBD with WGS was to validate the IBD-analysis with the barcode, wouldn't it make sense to have at least  200 loci (as shown in Taylor et al to be required for hmmIBD) in the WGS data? What proportion of comparisons were there with only 100 pairs of  loci? This seems like really few SNPs from WGS data.

      There were 27,577 SNPs overall in the 199 high quality genomes. In our analysis, we make the distinction between comparable and informative loci. For two loci to be comparable, they both have to be homozygous. To be informative, they must be comparable and at least one of them must correspond to the minor allele in the population. We borrowed this term and definition from the hmmIBD software, which directly reports the number of informative loci per pair. By keeping pairs with at least 100 informative SNPs, we aimed to reduce the number of samples that appear artificially related because only population major alleles are being compared. Pairs of genomes had between 1073 and 27466 of these, well above the 200 loci recommended in Taylor et al. (2019). We added more details on comparable and informative sites (Lines 152-160).
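
      The distinction between comparable and informative loci can be made concrete with a short counting routine. This sketch follows the definitions given in the response, not the hmmIBD source code; the 0/1/-1 encoding and the per-locus list of minor alleles are assumptions made for illustration.

```python
MISSING = -1  # missing or mixed call


def count_loci(sample_a, sample_b, minor_alleles):
    """Return (comparable, informative) locus counts for one sample pair."""
    comparable = 0
    informative = 0
    for a, b, minor in zip(sample_a, sample_b, minor_alleles):
        if a == MISSING or b == MISSING:
            continue  # at least one sample lacks a homozygous call
        comparable += 1
        if a == minor or b == minor:
            informative += 1  # at least one sample carries the minor allele
    return comparable, informative


# Six hypothetical loci.
sample_a = [0, 1, -1, 0, 1, 0]
sample_b = [0, 0, 1, -1, 1, 0]
minor_alleles = [1, 1, 0, 1, 0, 1]
counts = count_loci(sample_a, sample_b, minor_alleles)
```

      In this toy pair, four loci are comparable but only one is informative, which illustrates why a pair sharing only major alleles carries little evidence of relatedness.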

      (10) Line 178: why remove the 12 loci that are absent from the WGS? Are  these loci also poorly genotyped in the spotmalaria panel?

      As our goal is to validate the reliability of molecular genotyped SNPs, these 12 loci have to be removed. This is especially important because we found a consistent discrepancy between genotyped and WGS-derived SNPs, and such a discrepancy cannot be tested for SNPs that are absent from the genomes.

      (11) Line 180-182: What do you mean by this sentence: "Genomic barcodes  are built using different cutoffs of within-sample MAF and aligned  against molecular barcodes from the same isolates." Is this the analysis presented in the supplementary figure and resulting in the cut-off of  MAF 0.2? Please clarify.

      A locus where both alleles are called can result from the presence of two distinct haploid genomes or from an error occurring during sequencing data acquisition or processing. To distinguish between the two, we empirically determined the cutoff of within-sample MAF above which the locus can be considered heterozygous and below which only the major allele is kept. The corresponding figure was indeed Figure S2 (referenced in the next sentence, Lines 192-195). We clarified our approach in the methods (Lines 190-192) and in the legends of Figure S2 and Figure S3.
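
      The within-sample MAF rule described in this response can be sketched as a per-locus call. Several details are assumptions made for illustration: MAF is computed here from read counts, the default cutoff of 0.2 is the value mentioned in the reviewer's question, a value exactly at the cutoff is treated as mixed, and "N" and "X" are placeholder symbols for mixed and missing calls.

```python
def call_locus(reads_0, reads_1, allele_0="A", allele_1="G", cutoff=0.2):
    """Call one bi-allelic locus from the read counts of its two alleles."""
    total = reads_0 + reads_1
    if total == 0:
        return "X"  # no coverage: missing
    within_sample_maf = min(reads_0, reads_1) / total
    if within_sample_maf >= cutoff:
        return "N"  # both alleles well supported: treated as mixed
    # Minor allele below the cutoff: keep only the major allele.
    return allele_0 if reads_0 > reads_1 else allele_1


calls = [
    call_locus(95, 5),   # minor allele at 5 %: major allele kept
    call_locus(60, 40),  # minor allele at 40 %: mixed
    call_locus(3, 97),   # minor allele at 3 %: major allele kept
    call_locus(0, 0),    # no reads: missing
]
```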

      (12) Line 191: How often was there a mismatch between WGS and SNP barcode?

      We added a panel (Figure S3B) showing the average agreement of each SNP between molecular genotyping and WGS. We highlighted the 21 discrepant SNPs showing a lower agreement only for samples collected after May 2016.

      (13) Line 201-204: This part is unclear (as above for the WGS): did you  include sample pairs with more than 10 paired loci? But isn't 10 loci  way too few to do IBD analysis?

      We included pairs of samples with at least 30 comparable loci and 10 informative paired loci (refer to our answer to comment 8 for the difference between the two). We added more details regarding comparable and informative sites (Lines 152-160). Indeed, using fewer than 200 loci leads to an IBD estimation that is on average off by 0.1 or more (Taylor et al., 2019). However we showed that the barcode relatedness classification based on a cutoff of IBD (related when above 0.5, unrelated otherwise) was close enough to our gold standard using genomes (each pair having more than 1000 comparable sites). Because we use this classification approach rather than the exact value of barcode-estimated IBD in our study, our 30 minimum comparable sites cutoff seems sufficient.
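
      The filtering and classification rule for barcode pairs stated in this response can be sketched as below. The thresholds (30 comparable loci, 10 informative loci, IBD of 0.5) come from the text; the function name and the use of `None` for excluded pairs are assumptions.

```python
MIN_COMPARABLE = 30   # minimum comparable loci per barcode pair
MIN_INFORMATIVE = 10  # minimum informative loci per barcode pair
RELATED_IBD = 0.5     # pairs at or above this IBD are classed as related


def classify_pair(ibd, comparable, informative):
    """Return 'related', 'unrelated', or None when the pair is excluded."""
    if comparable < MIN_COMPARABLE or informative < MIN_INFORMATIVE:
        return None  # too few loci for a trustworthy classification
    return "related" if ibd >= RELATED_IBD else "unrelated"


results = [
    classify_pair(0.82, comparable=64, informative=21),
    classify_pair(0.12, comparable=45, informative=12),
    classify_pair(0.90, comparable=28, informative=15),
    classify_pair(0.90, comparable=40, informative=9),
]
```

      Because only the binary class is used downstream, an IBD estimate that is off by about 0.1 changes the outcome only for pairs close to the 0.5 boundary.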

      (14) Lines 206-207: which program did you use to analyse Fws?

      We did not use any program; we computed Fws following the method of Manske et al. (2012).

      (15) Line 233: "we attempted parasite genotyping and whole genome  sequencing of 522 isolates over 16 time points" => This is confusing, you did not do WGS of 522 samples, only 199 as mentioned in the next  sentence.

      We attempted whole genome sequencing on 331 isolates and molecular genotyping on 442 isolates with 251 isolates common between the two methods. We updated our text to clarify this point (Lines 247-252).
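
      The figure of 522 isolates questioned by the reviewer is consistent with the counts in this response once the overlap between the two methods is subtracted. A quick check (the variable names are hypothetical):

```python
wgs_attempted = 331         # isolates sent for whole genome sequencing
genotyping_attempted = 442  # isolates sent for molecular genotyping
both_methods = 251          # isolates attempted with both methods

# Inclusion-exclusion: isolates attempted with at least one method.
total_isolates = wgs_attempted + genotyping_attempted - both_methods

wgs_only = wgs_attempted - both_methods
genotyping_only = genotyping_attempted - both_methods
```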

      (16) Lines 256-259: Add a range of proportions or some other summary  statistic in this section as you are only referring here to  supplementary figures to support these statements.

      The text has been updated (Lines 271-274).

      (17) Line 260: check the formatting of the reference "Collins22" as the rest of the document references are numbered.

      Fixed.

      (18) Figure 2/3:

      a. You could also inspect relatedness at the temporal level, by  adjusting the network figure where the color is village and shape is  time (month/year).

      Although visualising the effect of time on the parasite relatedness network would be a valuable addition, we did not find any intuitive and simple way of doing so. Using shapes to represent time might end up being more confusing than helpful, especially because the sampling was not done at fixed intervals.

      b. To further support the statement of clustering at the household  level, it might be useful to add a (supplementary) figure with the  network with household number/IDs as color or shape. In the network,  there seems to be a lot of relatedness within the villages and between  villages. Perhaps looking only at the distribution of the proportion of  highly related isolates is simplifying the data too much. Besides, there is no statistical difference between clustering at the household vs  within-village levels as indicated in Figure 3.

      Unfortunately, there are too many households (71 in Figure 2) to make a figure with one color or shape per household readable. The statistical test of the difference between the within household and within village relatedness yielded a p value above the cutoff of 0.05 (p value of 0.084). However, it is possible that the lack of significance arises from the relatively low number of data points available in the “within household” group. This is even more plausible considering the statistical difference of both “within household” and “within village” groups with “between village” group. Overall, our results indicate a decreasing parasite relatedness with spatial distance, and that more investigation would be needed to quantify the difference between “within household” and “within village” groups. 

      (19) Figure 4: Please add more description in the caption of this figure to help interpret what is displayed here. Figure 4A is hard to  interpret and does not seem to show more than is already shown in Figure 3A. What do the dots represent in Figure 4B? It is not clear what is  presented here.

      Compared to Figure 3A, Figure 4A enables the visualization of the relatedness between each individual pair of time points, which are later used in the comparison of relatedness between seasonal groups in Figure 4B. For this reason, we believe that Figure 4A should remain in the manuscript. However, we agree that the relationship between Figure 4A and Figure 4B is not intuitive in the way we presented it initially. For this reason, we added more details in the legend and modified Figure 4A to highlight the seasonal groups used in Figure 4B. 

      (20) Line 360-361: what did you do when haplotypes were not identical?

      We explained it in the methods section (Lines 144-146): in this case, only WGS haplotypes were kept.

      (21) Section chronic infections: it is important to mention that the  majority of chronic infections are individuals from the monthly  dry-season cohort.

      We added a statement about the 21 chronically infected individuals that were also part of the December 2016 – May 2017 monthly follow-up (Lines 423-426).

      (22) Lines 381-386: Did you investigate COI in these individuals? Could  it be co-circulating strains that you do not pick up at all times due to the consensus barcodes and discarding of mixed genotypes (and does not  necessarily show intra-host competition. That is speculation and should  perhaps not be in the results)?

      This is exactly what we think is happening. Due to the very nature of genotyping, only one strain may be observed at a time in the case of a co-infection, where distinct but related strains are simultaneously present in the host. The picked-up strain is typically the one with the highest relative abundance at the time of sampling. As the reviewer stated, fluctuation of strain abundance might not only be due to intra-host competition but also asynchronous development stages of the two strains. We added this observation to the manuscript (Lines 432-435).

      (23) Figure 6: highlight the samples where the barcode was not available in a different color to be able to see the difference between a non-matching barcode and missing data.

      We thank the reviewer for this great suggestion. Figure 6 now shows which barcodes are available, along with their level of relatedness to the dominant genotype of each continuous infection.

      (24) Improve the discussion by adding a clear summary of the main  findings and their implications, as well as study-specific limitations.

      The Discussion has been updated with a paragraph summarizing the primary results (Lines 451-457).

      (25) Line 445: "implying that the whole population had been replaced in just one year "

      a. What do you mean by replaced? Did other populations replace the existing populations? I am not sure the lack of IBD is enough to show that the population changed/was replaced. Perhaps it is more accurate to say that the same population evolved. Nevertheless, other measures such as genetic diversity and genetic differentiation or population structure would be more suitable to strengthen these conclusions.

      We agree that “replaced” was the wrong term in this case. We rather intended to describe how the numerous recombinations between malaria parasites completely reshaped the same initial population which gradually displayed lower levels of relatedness over time. We updated the manuscript accordingly (Lines 507-512).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 260: Remove Collins 22.

      Fixed.

      (2) Lines 270-274: 73 + 213 = 286 not 284; sum of percentages is equal to 101%.

      The numbers are correct: the 73 barcodes identical (IBD >= 0.9) to another barcode are a subset of the 213 related (IBD >= 0.5) to another barcode. However, we agree that this might be confusing, and we now consider barcodes to be related if they have an IBD between 0.5 and 0.9, excluding those with an IBD >= 0.9. The text has been updated (Lines 299-301).

      (3) Section: "Independence of seasonality and drug resistance markers prevalence".

      The text has been revised and the supplementary figure is now a main figure.

      (4) For readers unaware of malaria control policy in the Gambia it would be helpful to have more details on the specifics of anti-malarial drug  administration.

      We added the drugs used in SMC (sulfadoxine-pyrimethamine and amodiaquine) and the first line antimalarial treatment in use in The Gambia during our study (Coartem) (Lines 383-388).

      Reviewer #3 (Recommendations for the authors):

      (1) The abstract is not as clear as the authors' summary. For example, I found the sentence starting with "with 425 P. falciparum..." hard to  follow.

      The abstract has been updated.

      (2) It is better to consistently use "barcode genotyping "or "genotyping by barcode". Sometimes "molecular genotyping" is used instead of  "barcode genotyping"

      We have now replaced all occurrences of “barcode genotyping” with “molecular genotyping” or “molecular barcode genotyping”. We prefer to keep “molecular genotyping” as this lets us distinguish between the molecular and the genomic barcode.

      (3) The introduction is quite disjointed and does not provide a clear build-up to the gap in knowledge that the study is attempting to fill. Please revise.

      The Introduction has now been thoroughly revised.

      (4) Line 31 "with notable increase of parasite differentiation" is an interpretation and not an observation.

      We have modified that sentence (Lines 31-33).

      (5) Overall, the introduction requires substantial revision.

      The Introduction has now been thoroughly revised.

      (6) Line 70 "parasite population adapts..." I thought this required phenotypic analysis and not genetics?

      The idea is that a population of parasites may adapt to environmental conditions (such as seasonality) through selection of the fittest genotypes. For instance, antimalarial exposure selects parasites with specific mutations in drug resistance related genes, and this effect even appears to be transient (for example with chloroquine). As such, there is good reason to think that seasonality might have a similar effect on parasite genetics.

      (7) Line 129-130: the #442 is not reflected in the schematic Figure 1.

      This is an intentional choice to make the figure more synthetic. For this reason, we included the Figure S1, which provides more details on the data collection and analysis pipeline.

      (8) Line 242-243: "Made with natural earth". What is this?

      This is a statement acknowledging the use of Natural Earth data to produce the map presented in Figure 1A.

      (9) Line 260: "collins22", is this a reference?

      Fixed.

      (10) Line 269-70. Very hard to follow. Please revise.

      We changed the text (Lines 293-297).

      (11) Line 324: similarly... I think there is a typo here.

      We did not find any typo in this specific sentence. However, “Similarly to Figure 3” sounds maybe a bit off, so we changed it to “As in Figure 3” (Line 351).

      (12) Line 332-334: very hard to follow. Please revise. Again, the lower parasite relatedness during the transition from low to high was linked to recombination occurring in the mosquito but what about infection burden shifting to naive young children? Is there a role for host immunity in the observed reduction in parasite-relatedness during the transition period?

      This text has been rewritten (Lines 356-361).

      About the hypothesis of infection burden shifting to naïve young children, this question is difficult to address in The Gambia because children under 5 years old received Seasonal Malaria Chemoprophylaxis during the high transmission season. In older children (6-15 years old), the prevalence was similar to adults (Fogang et al., 2024).

      About the role of host immunity on parasite relatedness across time and space, our dataset is too small to divide it into different age groups. Further studies should address this very interesting question.

    1. eLife Assessment

      The authors show MRI relaxation time changes that are claimed to originate from cell membrane potential changes. This would be a substantial contribution if true because it may provide a mechanism whereby membrane potential changes could be inferred noninvasively. However, the membrane potential manipulations applied here are performed on a slow time scale and are known to induce cell swelling. Cell swelling has been previously shown to affect relaxation time. Experiments could be performed to rule out this hypothesis, but the authors have chosen not to perform these experiments. The study is therefore useful, but the evidence is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K+. The motivation has been revised to state that the results suggest a potential approach to non-invasively detect changes in membrane potential using MRI.

      Strengths:

      The authors argue that the use of various concentrations of KCL in the extracellular fluid depolarize or hyperpolarize the cell pellets used, and that this change in membrane potential is the driving force for the T2 (and T1-supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCL concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCL, the NaCL molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCL in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mv. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCL concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCL solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. The results are consistent with Stroman et al. Magn. Reson. in Med. 59:700-706 (increased T2 with KCL) as well as some other cited work. However, all those authors emphasize that cell swelling is the mechanism, not cell membrane potentials.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCL depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same timecourse as the KCL depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991)). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the degree of cell swelling or shrinkage. Measuring intracellular potential is not enough to clarify the mechanism.

      So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. And in fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling. The first line of the discussion still claims that T2 relaxation time and pool size ratio (PSR) can detect responses to membrane potential changes modulated by ionic solutions. However, in the absence of cell swelling controls, this cannot be stated.

      For this mechanism to be relevant to measuring neuronal activity directly or explaining techniques such as DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potentials.

      Comments on revisions:

      The manuscript is well written and my previous methodological concerns have been clarified as well. There are no flaws in the experiments, but the interpretation really depends on simultaneous measurements of cell volume and membrane potential, which have yet to be done.

    3. Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate a mechanism whereby magnetic resonance imaging (MRI) can reflect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions that are known to depolarize or hyperpolarize excitable cells. The authors specifically examine two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K+ and Ba2+ concentrations lead to T2 increases and PSR decreases that vary approximately linearly with parallel measurements of voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke T2 increases in rat brains, and that these changes are reversed when potassium is renormalized. Min et al. argue that their results suggest a basis for noninvasive functional imaging of cellular voltage signals. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven insufficient for neuroscientific or clinical applications. The current paper suggests that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate correlates of membrane potential if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using quantitative tests that include some controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      The major weakness of the paper is that it uses only slow correlational experiments to probe the relationship between MRI contrast and membrane potential. The authors do not examine effects on the subsecond time scale that is of greatest interest, and they do not adequately consider how biophysical factors with only loose relationship to electrophysiological variables could explain their imaging results. Notably, depolarizing ionic solutions that perturb membrane potential can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. In their revised manuscript, the authors acknowledge that cell swelling might contribute to the MRI signals they report, but they do nothing to probe the contributions or characteristics of such effects. If cell swelling accounted for the authors' MRI results, it would likely operate on a time scale far too slow to yield useful indications of membrane potential. Given these considerations and the absence of data demonstrating correspondence of electrophysiological measures with MRI readouts on a fast time scale, the paper fails to provide evidence that membrane potential changes can be meaningfully detected by MRI.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K<sup>+</sup>. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCL in the extracellular fluid depolarize or hyperpolarize the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1-supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCL concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCL, the NaCL molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCL in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mv. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCL concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCL solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      [Reviewer 1, Comment 1] While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCL) but with a different mechanism, that they provide definite proof for. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCL depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same time course as the KCL depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991)). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      [Reviewer 1, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study and also take into account the reviewer 2’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Finally, when [K<sup>+</sup>]-induced membrane potential changes are involved, there appear to be factors other than cell volume changes that influence T<sub>2</sub> changes. Our follow-up study shows that there are differences in volume changes for the same T<sub>2</sub> change in the following two different situations: pure osmotic volume changes versus [K<sup>+</sup>]-induced volume changes. For example, for the same T<sub>2</sub> change, the volume change for depolarization is greater than the volume change for hypoosmotic conditions. We will present these results at the upcoming ISMRM 2025 and are also preparing a manuscript to report them shortly.

      [Reviewer 1, Comment 2] So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.
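
      The kind of back-of-the-envelope model the reviewer describes can be sketched as follows. This is a minimal illustration only, not the reviewer's or the authors' model. It assumes that the intracellular relaxation rate is the sum of a free-water term and a macromolecular term, and that the macromolecular term scales inversely with cell volume (simple dilution). The function name and all numerical values are hypothetical placeholders, not data from the study.

```python
def intracellular_t2_after_swelling(t2_cell_ms, t2_free_water_ms, volume_change_fraction):
    """Estimate intracellular T2 (ms) after a fractional cell volume change.

    Assumption (illustrative): R2_cell = R2_water + R2_macromolecule, and the
    macromolecular term is diluted in proportion to the new cell volume.
    """
    r2_cell = 1.0 / t2_cell_ms
    r2_water = 1.0 / t2_free_water_ms
    r2_macromolecule = r2_cell - r2_water  # excess rate attributed to macromolecules
    r2_new = r2_water + r2_macromolecule / (1.0 + volume_change_fraction)
    return 1.0 / r2_new


# Hypothetical inputs: 80 ms intracellular T2, 2000 ms free-water T2.
for swelling in (0.0, 0.01, 0.05, 0.10):
    t2 = intracellular_t2_after_swelling(80.0, 2000.0, swelling)
    print(f"volume change {swelling:+.0%}: T2 = {t2:.2f} ms")
```

      Under these assumptions, swelling lengthens T2 and shrinkage shortens it; how large the effect is depends entirely on the placeholder inputs and would need measured compartment fractions and relaxation times.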

      [Reviewer 1, Response 2] In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neuron cells during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.
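
      To give a sense of scale for a nanometre-level diameter change, the purely geometric relation between diameter and volume can be sketched as below. The sketch assumes an idealized spherical cell; the 10 µm resting diameter and 1 nm change are hypothetical values chosen for illustration and are not taken from the cited studies.

```python
def fractional_volume_change(diameter_nm, delta_diameter_nm):
    """Fractional volume change of a sphere whose diameter changes by delta.

    Volume scales with the cube of the diameter, so
    dV/V = ((d + delta) / d) ** 3 - 1.
    """
    return ((diameter_nm + delta_diameter_nm) / diameter_nm) ** 3 - 1.0


# Hypothetical example: 10 um (10,000 nm) cell, 1 nm diameter increase.
change = fractional_volume_change(10_000.0, 1.0)
print(f"fractional volume change = {change:.2e}")
```

      For small changes the result is close to three times the fractional diameter change, which is the usual first-order approximation for a sphere.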

      As mentioned in Response 1 above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly mentioned as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T<sub>2</sub> and PSR) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in Response 1 above.

      There are a few smaller issues that should be addressed.

      [Reviewer 1, Comment 3] (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      [Reviewer 1, Response 3] We appreciate the reviewer’s suggestion regarding imaging sequences. In fact, we used dictionaries for fitting in vivo T<sub>2</sub> decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T<sub>2</sub> maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      [Reviewer 1, Comment 4] (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      [Reviewer 1, Response 4] The T<sub>2</sub> decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. These features are inherent to the pulse sequence rather than artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T<sub>2</sub> decay curve using the technique developed by McPhee and Wilman (2017).

      [Reviewer 1, Comment 5] (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      [Reviewer 1, Response 5] We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently remote from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We described the imaging slice more clearly in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We clarified this point in the revised manuscript to avoid any misunderstanding.
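
      The concern raised in Comment 5, that a single fitted T2 from a two-compartment sample is a blend of the two compartments, can be illustrated with a short simulation. This is a sketch under stated assumptions, not the authors' fitting procedure: it assumes two non-exchanging water pools, noiseless data, and a log-linear least-squares fit. The compartment T2 values, pool fraction, and echo-time range are hypothetical; only the count of 50 echo times follows the response above.

```python
import numpy as np


def biexponential_signal(te_ms, frac_a, t2_a_ms, t2_b_ms):
    """Signal from two non-exchanging water pools with fractions frac_a and 1 - frac_a."""
    te = np.asarray(te_ms, dtype=float)
    return frac_a * np.exp(-te / t2_a_ms) + (1.0 - frac_a) * np.exp(-te / t2_b_ms)


def fit_monoexponential_t2(te_ms, signal):
    """Apparent T2 (ms) from a straight-line fit of log(signal) against echo time."""
    slope, _intercept = np.polyfit(np.asarray(te_ms, dtype=float), np.log(signal), 1)
    return -1.0 / slope


# Hypothetical compartments: cells (pool a) and surrounding medium (pool b).
echo_times = np.linspace(10.0, 500.0, 50)
t2_cells, t2_medium = 80.0, 400.0
signal = biexponential_signal(echo_times, frac_a=0.7, t2_a_ms=t2_cells, t2_b_ms=t2_medium)
apparent_t2 = fit_monoexponential_t2(echo_times, signal)
print(f"apparent monoexponential T2 = {apparent_t2:.1f} ms")
```

      In this idealized setting the apparent T2 falls between the two compartment values, and its exact position depends on the pool fractions and on which echo times are sampled.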

      [Reviewer 1, Comment 6] (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCL solutions that I could find.

      [Reviewer 1, Response 6] As requested by the reviewer, we included the absolute values in the supplementary information.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K<sup>+</sup> and Ba<sup>2+</sup> concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests remarkably that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      [Reviewer 2, Comment 1] The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      [Reviewer 2, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we had already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of both Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in the Response 1 to Reviewer 1’s Comment 1 above.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and also consider the reviewer’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      [Reviewer 1, Comment 7] The manuscript is well written. One thing to emphasize early on is that the KCl depolarization is done in an equimolar (or isotonic) manner. I was not clear on this point until I got to the very end of the methods. This is a strength of the paper and should be presented earlier.

      [Reviewer 1, Response 7] In response to the reviewer’s suggestion, we have revised the manuscript to present the equimolar characteristic of our experiment earlier.

      [Reviewer 1, Comment 8] In terms of experiments, the relaxation time measurements are not well constructed. They should be done with a CPMG sequence with hundreds of echoes and properly curve fit. This is entirely possible on a Bruker spectrometer.

      [Reviewer 1, Response 8] As noted in our Response to Reviewer 1’s Comment 3, while a CPMG sequence with numerous echoes and straightforward curve fitting can be effective, it is less feasible for in vivo experiments. Our multi-echo spin-echo sequence was a balanced approach between spatial resolution, reasonable scan duration, and the need to localize signals within specific regions of interest.

      [Reviewer 1, Comment 9] Measurements of cell swelling should be done to determine the time course of the cell swelling. This could be with NMR (CPMG) or with light scattering. For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity.

      [Reviewer 1, Response 9] We acknowledge the importance of further research to strengthen the claims of this study through additional experiments such as cell volume recording. We will pursue this in future studies.

      As noted in our Response 2 to Reviewer 1’s Comment 2, this study does not address rapid membrane potential changes on the millisecond scale, and we acknowledge that establishing the precise timing of cell swelling is crucial for fully understanding the mechanisms of DIANA. Our current work demonstrates that MR parameters (e.g., T<sub>2</sub> and PSR) correlate strongly with membrane potential-modulating ionic environments, but it does not extend to millisecond-scale neural activation. We recognize the importance of further experiments, such as direct cell volume measurements, and plan to incorporate them in future studies to build on the insights gained from the present work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few comments, questions, and suggestions for improvement:

      [Reviewer 2, Comment 2] I could not find much information about the various incubation times and delays used for the authors' in vitro experiments. For each of the in vitro experiments in particular, how long were cells exposed to the stated ionic condition prior to imaging, and how long did the imaging take? Could this and any other relevant information about the experimental timing please be provided and added to the methods section?

      [Reviewer 2, Response 2] We have included the information about the preparation/incubation times in the revised manuscript. For the scan time, it was already stated in the original manuscript: 23 minutes for the single-echo spin-echo sequence and 23 minutes for the inversion-recovery multi-echo spin-echo, for a total of 46 minutes.

      [Reviewer 2, Comment 3] In what format were the cells used for patch clamping, and were any controls done to ensure that characteristics of these cells were the same as those pelleted and imaged in the MRI studies? How long were the incubation times with ionic solutions in the patch clamp experiment? This information should likewise be added to the paper.

      [Reviewer 2, Response 3] We have clarified in the revised manuscript that SH-SY5Y cells were patch clamp-measured in their adherent state. For the MRI experiments, on the other hand, the cells were dissociated from the culture plate and pelleted, so the experimental environments were not entirely identical. The patch clamp experiments involved a 20–30 minute incubation period with the ionic solutions. We have included this information in the revised manuscript.

      [Reviewer 2, Comment 4] Can the authors provide information about the mean cell size observed under each condition in their in vitro experiments?

      [Reviewer 2, Response 4] We did not directly quantify the mean cell size for each in vitro condition in this study, so we do not have corresponding data. However, we acknowledge that this information could provide valuable insights into potential mechanisms underlying the observed MR parameter changes. In future experiments, we plan to include direct cell-size measurements to further elucidate how changes in cell volume or hydration contribute to our MR findings.

      [Reviewer 2, Comment 5] The ionic challenges used both in vitro and in vivo could also have affected cell permeability, with corresponding effects that would be detectable in diffusion weighted imaging. Did the authors examine this or obtain any results that could reflect on contributions of permeability properties to the contrast effects they report?

      [Reviewer 2, Response 5] We did not perform diffusion-weighted imaging and therefore do not have direct data regarding changes in cell permeability. We agree that incorporating diffusion-weighted measurements could help distinguish whether the MR parameter changes are driven primarily by membrane potential shifts, cell volume changes, or variations in permeability properties. We will consider these approaches in our future studies.

      [Reviewer 2, Comment 6] Clearly, a faster stimulation method such as optogenetics, in combination with time-locked MRI readouts of the pelleted cells, would be more effective at demonstrating a useful relationship between cellular neurophysiology and MRI contrast in vitro. Can the authors present data from such an experiment? Is there any information they can present that documents the time course of observed responses in their experiments?

      [Reviewer 2, Response 6] In the current study, our methodology did not include time-resolved or dynamic measurements. While it may be possible to obtain indirect information about the temporal dynamics using T<sub>2</sub>-weighted or MT-weighted imaging, such an experiment was beyond the scope of this work. However, we agree that an optogenetic approach with time-locked MRI acquisitions could help directly link cell physiology to MRI contrast, and we will explore this in future studies.

      [Reviewer 2, Comment 7] The authors used a drug cocktail to suppress hemodynamic effects in the experiments of Figs. 5-6. What evidence is there that this cocktail successfully suppresses hemodynamic responses and that it also preserves physiological responses to the ionic challenges used in their experiments? Were analogous in vivo results also obtained in the absence of the cocktail?

      [Reviewer 2, Response 7] We appreciate the reviewer’s concern regarding pharmacological suppression of hemodynamic effects. Although each component is known to inhibit nitric oxide synthesis, we did not directly measure the degree of hemodynamic suppression in this study. In addition, we cannot definitively confirm that these agents preserved the physiological responses to the ionic challenges. We have clarified these points in the revised manuscript and identified them as limitations of the study.

      [Reviewer 2, Comment 8] Why weren't PSR results reported as part of the in vivo experimental results in Fig. 5? Does PSR continue to vary inversely to T2 in these experiments?

      [Reviewer 2, Response 8] In our current experimental setup, acquiring the T<sub>2</sub> map four times required 48 minutes, and extending the scan to include additional quantitative MT measurements for PSR would have significantly prolonged the scanning session. Given that these experiments were conducted on acutely craniotomized rats, maintaining stable physiological conditions for such a long period of time was challenging. Therefore, due to time constraints, we did not perform MT measurements and focused on T<sub>2</sub> mapping.

      [Reviewer 2, Comment 9] The authors have established in vivo optogenetic stimulation paradigms in their laboratory and used them in the Toi et al. DIANA study. Were T2 or PSR changes observed in vivo using standard T2 measurement or T2-weighted imaging methods that do not rely on the DIANA pulse sequence they originally applied?

      [Reviewer 2, Response 9] Our current T<sub>2</sub> mapping experiments utilized a standard multi-echo spin-echo sequence, rather than the DIANA pulse sequence employed in our previous work. In this respect, the T<sub>2</sub> changes we observed in vivo do not rely on the specialized DIANA methodology.

      [Reviewer 2, Comment 10] In the discussion section, the authors state that to their knowledge, theirs "is the first report that changes in membrane potential can be detected through MRI." This cannot be true, as their own Toi et al. Science paper previously claimed this, and a number of the studies cited on p.2 also claimed to detect close correlates of neuroelectric activity. This statement should be amended or revised.

      [Reviewer 2, Response 10] We appreciate the reviewer’s comment. We have revised the discussion section of the manuscript to reflect the points raised by the reviewer.

      [Reviewer 2, Comment 11] Because the current study does not actually demonstrate that changes in membrane potential can be detected by MRI, the authors should alter the title, abstract, and a number of relevant statements throughout the text to avoid implying that this has been shown. The title, for instance, could be changed to "Responses to depolarizing and hyperpolarizing ionic solutions measured by magnetic resonance imaging of excitable cells and rat brains," or something along these lines.

      [Reviewer 2, Response 11] We appreciate the reviewer’s suggestions. We have revised the title, abstract, and relevant statements of the manuscript to clarify that our findings show MR-detectable responses to ionic solutions that are expected to modulate membrane potential, rather than demonstrating direct detection of membrane potential changes by MRI.

      [Reviewer 2, Comment 12] The axes in Fig. 3 seem to be mislabeled. I think the horizontal axes are supposed to be membrane potential measured in mV.

      [Reviewer 2, Response 12] We thank the reviewer for finding this error. We have corrected the axis labels in Figure 3 to indicate membrane potential (in mV) on the horizontal axis.

      [Reviewer 2, Comment 13] Since neither the experiments in Jurkat cells (Fig. 4) nor the in vivo MRI tests (Fig. 5-6) appear to have been made in conjunction with membrane potential measurements, it seems like a stretch to refer to these experiments as involving manipulation of membrane potentials per se. Instead, the authors should refer to them as involving administration of stimuli expected to be depolarizing or hyperpolarizing. The "hyperpolarization" and "depolarization" labels of Fig. 4 similarly imply a result that has not actually been shown, and should ideally be changed.

      [Reviewer 2, Response 13] To avoid any misleading implication that membrane potential changes were directly measured in Jurkat cells or in vivo, we have revised the relevant text and figure labels.

      [Reviewer 2, Comment 14] The changes in T2 and PSR documented with various K<sup>+</sup> challenges to Jurkat cells in Fig. 4 seem to follow a step-function-like profile that differs from the results reported in SH-SY5Y cells. Can the authors explain what might have caused this difference?

      [Reviewer 2, Response 14] We currently do not have a definitive explanation for why Jurkat cells exhibit a step-function-like response to varying K<sup>+</sup> levels, whereas SH-SY5Y cells show a linear response to log [K<sup>+</sup>]. Experiments that include direct membrane potential measurements in Jurkat cells would help clarify whether this difference arises from genuinely different patterns of depolarization/hyperpolarization or from other factors. We have revised the manuscript to address this point.

    1. eLife Assessment

      A regression discontinuity analysis finds essentially no effect of 1 additional year of secondary education on brain structure in adulthood. This is a valuable finding that adds to the literature on the impact of education on brain health. While the finding is convincing on its own, as the analysis was pre-registered and very carefully conducted, the impact is limited as the manipulated variable only relates to a single additional year of education (remaining in education to 15 vs 16 years of age).

    2. Reviewer #2 (Public review):

      Summary:

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity analysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood.

      Strengths:

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative analyses).

      Weaknesses:

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome is measured long after the intervention. Thus a null finding here is completely consistent with educational attainment (EA) in fact having an impact on brain structure, where EA may reflect elements of training after secondary education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no.

    3. Reviewer #3 (Public review):

      Summary:

      This study investigates evidence for a hypothesised, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume and diffusivity.

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality.

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in secondary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships.

      Strengths:

      - A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples.
      - This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry and intended analyses, with only minor, justifiable changes in the final analysis.
      - The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education.
      - The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others.
      - The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area.

      Weaknesses:

      - This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year old British students born around 1957 who also participate in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario.
      - The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. However, other studies have not found specific issues that would invalidate ROSLA as a natural experiment.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      This fascinating manuscript studies the effect of education on brain structure through a natural experiment. Leveraging the UK BioBank, these authors study the causal effect of education using causal inference methodology that focuses on legislation for an additional mandatory year of education in a regression discontinuity design. 

      Strengths: 

      The methodological novelty and study design were viewed as strong, as was the import of the question under study. The evidence presented is solid. The work will be of broad interest to neuroscientists.

      Weaknesses: 

      There were several areas which might be strengthened by additional consideration from a methodological perspective.

      We sincerely thank the reviewer for the useful input, in particular, their recommendation to clarify RD and for catching some minor errors in the methods (such as taking the log of the Bayes factors). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The fuzzy local-linear regression discontinuity analysis would benefit from further description. 

      (2) In the description of the model, the terms "smoothness" and "continuity" appear to be used interchangeably. This should be adjusted to conform to mathematical definitions. 

      We have now added to our explanations of continuity regression discontinuity. In particular, we now explain “fuzzy”, and add emphasis on the two separate empirical approaches (continuity and local-randomization), along with fixing our use of “smoothness” and “continuity”.

      results:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      (3) The optimization of the smoother based on MSE would benefit from more explanation and consideration. How was the flexibility of the model taken into account in testing? Were there any concerns about post-selection inference? A sensitivity analysis across bandwidths is also necessary. Based on the model fit in Figure 1, results from a linear model should also be compared. 

      It is common in the RD literature to illustrate plots with higher-order polynomial fits while inference is based on linear (or at most quadratic) models (Cattaneo, Idrobo & Titiunik, 2019). We agree that this field-specific practice can be confusing to readers. Therefore, we have redone Figure 1 using local-linear fits better aligning with our analysis pipeline. Yet, it is still not a one-to-one alignment as point estimation and confidence are handled robustly while our plotting tools are simple linear fits. In addition, we updated Sup. Fig 3 and moved 3rd-order polynomial RD plots to Sup. Fig 4.

      Empirical RD has many branching analytical decisions (bandwidth, polynomial order, kernel) which can have large effects on the outcome. Fortunately, RD methodology is starting to become more standardized (Cattaneo & Titiunik, 2022, Annu. Rev. Econ.), as there have been indications of publication bias using these methods (Stommes, Aronow & Sävje, 2023, Research and Politics; this paper suggests that the cause is not researcher degrees of freedom but rather inappropriate inferential methods). While not necessarily ill-intended, researcher degrees of freedom and analytic flexibility are major contributors to publication bias. We (self) limited our analytic flexibility by using pre-registration (https://osf.io/rv38z).

      One of the most consequential analytic decisions in RD is the bandwidth size: there is no established practice, and appropriate bandwidths are context-specific and can be highly influential on the results. The choice of bandwidths can be framed as a ‘bias vs. variance trade-off’. As bandwidths increase, variance decreases since more subjects are added, yet bias (misspecification error/smoothing bias) also increases (as these subjects are further away and less similar). In our case, our assignment (running/forcing) variable is ‘date of birth in months’; therefore our smallest comparison would be individuals born in August 1957 (unaffected/no treatment) vs September 1957 (affected/treated). This comparison has the least bias (subjects are the most similar to each other), yet it comes at the expense of very few subjects (high variance in our estimate).
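      The trade-off described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis pipeline used in the study: the data are synthetic, the running variable is months from the cutoff (0 is the first treated month), the underlying curve `curvature * m * |m|` is an arbitrary smooth function chosen so that a linear fit is misspecified, and the function names `fit_line`, `rd_estimate` and `simulate` are hypothetical. The estimator is a plain local-linear fit with a uniform kernel, without the robust bias correction that packages such as rdrobust apply.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_estimate(xs, ys, h):
    """Local-linear RD estimate at cutoff 0 (uniform kernel, bandwidth h >= 2).

    Fits one line on each side of the cutoff using only points within h
    months, then takes the difference of the two fitted values at x = 0.
    """
    left = [(x, y) for x, y in zip(xs, ys) if -h <= x < 0]
    right = [(x, y) for x, y in zip(xs, ys) if 0 <= x < h]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


def simulate(n_per_month=50, months=60, tau=0.0, curvature=0.002,
             noise=1.0, seed=0):
    """Synthetic data: smooth nonlinear trend, jump tau at 0, Gaussian noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for m in range(-months, months):
        for _ in range(n_per_month):
            xs.append(m)
            ys.append(curvature * m * abs(m) + tau * (m >= 0)
                      + rng.gauss(0, noise))
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate()  # true jump is 0 by construction
    for h in (2, 6, 12, 24, 48):
        # Narrow windows: few subjects, noisy estimate.
        # Wide windows: more subjects, but the linear fit drifts from the curve.
        print(h, round(rd_estimate(xs, ys, h), 3))
```

      Rerunning the loop with different seeds gives a feel for how the spread of the narrow-bandwidth estimates compares with the systematic drift of the wide-bandwidth ones.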

      MSE-derived bandwidths attempt to solve this issue by offering an automatic method to choose an analysis bandwidth in RD. Specifically, this aims to minimize the MSE of the local polynomial RD point estimator, effectively choosing a bandwidth by balancing the ‘bias vs. variance trade-off’ (explained in detail in Section 4.4.2 of Cattaneo et al., 2019, pp. 45-51, “A practical introduction to regression discontinuity designs: foundations”). Yet, you are very correct in highlighting potential overfitting issues as they are “by construction invalid for inference” (Calonico, Cattaneo & Farrell, 2020, p. 192). Quoting from Cattaneo and Titiunik’s Annual Review of Economics from 2022:

      “Ignoring the misspecification bias can lead to substantial overrejection of the null hypothesis of no treatment effect. For example, back-of-the-envelope calculations show that a nominal 95% confidence interval would have an empirical coverage of about 80%.”

      Fortunately, modern RD analysis packages (such as rdrobust or RDHonest) calculate robust confidence intervals; for more details see Armstrong and Kolesár (2020). For a summary of MSE bandwidths, see the section “Why is it hard to estimate RD effects?” in Stommes and colleagues 2023 (https://arxiv.org/abs/2109.14526). For more in-depth handling, see the Cattaneo, Idrobo, and Titiunik primer (https://arxiv.org/abs/1911.09511).

      Lastly, with MSE-derived bandwidths sensitivity tests only make sense within a narrow window of the MSE-optimized bandwidth (5.5 Cattaneo et al., 2019 p 106 - 107). When a significant effect occurs, placebo cutoffs (artificially moving the cutoff) and donut-hole analysis are great sensitivity tests. Instead of testing our bandwidths, we decided to use an alternate RD framework (local randomization) in which we compare 1-month and 5-month windows. Across all analysis strategies, MRI modalities, and brain regions, we do not find any effects of the education policy change ROSLA on long-term neural outcomes.

      (4) In the Bayesian analysis, the authors deviated from their preregistered analytic plan. This whole section is a bit confusing in its current form - for example, point masses are not wide but rather narrow. Bayes factors are usually estimated; it is unclear how or why a prior was specified. What exactly is being modeled using a prior? Also, throughout - If the log was taken, as the methods seem to indicate for the Bayes factor, this should be mentioned in figures and reported estimates. 

      First, we would like to thank you for spotting that we incorrectly kept the log in the methods. We have fixed this and added the following sentence to the methods: 

      “Bayes factors are reported as BF<sub>10</sub> in support of the alternative hypothesis, we report Bayes factors under 1 as the multiplicative inverse (BF<sub>01</sub> = 1/BF)”

      All Bayesian analyses need to have a prior. In practice, this becomes an issue when you’re uncertain about 1) the location of the effect (directionality & center mass, defined by a location parameter), yet more importantly, the 2) confidence/certainty of the range-spread of possible effects (determined by a scale parameter). In normally distributed priors these two ‘beliefs’ are represented with a mean and a standard deviation (the latter impacts your confidence/certainty on the range of plausible parameter space). 

      Supplementary figure 6 illustrates several distributions (location = 0 for all) with varying scale parameters; when used as Bayesian priors this indicates differing levels of confidence in our certainty of the plausible parameter space. We illustrate our three reported, normally distributed priors centered at zero in blue with their differing scale parameters (sd = .5, 1 & 1.5).

      All of these five prior distributions have the same location parameter (i.e., 0) yet varying differences in the scale parameter – our confidence in the certainty of the plausible parameter space. At first glance it might seem like a flat/uniform prior (not represented) is a good idea – yet, this would put equal weight on the possibility of every estimate thereby giving the same probability mass to implausible values as plausible ones. A uniform prior would, for instance, encode the hypothesis that education causing a 1% increase in brain volume is just as plausible as it causing either a doubling or halving in brain volume. In human research, we roughly know a range of reasonable effect sizes and it is rare to see massive effects.

      A benefit of ‘weakly-informative’ priors is that they limit the range of plausible parameter values. The default prior in STAN (a popular Bayesian estimation program; https://mc-stan.org) is a normally distributed prior with a mean of zero and an SD of 2.5 (seen in orange in the figure; our initial preregistered prior). This large standard deviation easily permits positive and negative estimates putting minimal emphasis on zero. Contrast this to BayesFactor package’s (Morey R, Rouder J, 2023) default “wide” prior which is the Cauchy distribution (0, .7) illustrated in magenta (for more on the Cauchy see: https://distribution-explorer.github.io/continuous/cauchy.html). 

      These different defaults reflect differing Bayesian philosophical schools (‘estimate parameters’ vs ‘quantify evidence’ camps); if your goal is to accurately estimate a parameter it would be odd to have a strong null prior, yet (in our opinion) when estimating point-null BFs a wide default prior gives far too much evidence in support of the null. In point-null BF testing the Savage-Dickey density ratio is the ratio between the height of the prior at 0 and the height of the posterior at zero (see Figure under section “testing against point null 0”). This means BFs can be very prior sensitive (seen in SI tables 5 & 6). For this reason, we thought it made sense to do prior sensitivity testing. To ensure our conclusions in favor of the null were not caused solely by an overly wide prior (preregistered orange distribution), we decided to report the 3 narrower priors (blue ones).
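      The prior sensitivity of a point-null Bayes factor can be seen in a small conjugate example. This is a sketch under simplifying assumptions rather than the model fitted in the study: a single effect estimate with a known standard error (normal likelihood) and a zero-centred normal prior, so the posterior is available in closed form. The function name `savage_dickey_bf01` and the numbers in the loop are illustrative only; the prior scales are the four discussed above (0.5, 1, 1.5 and the preregistered 2.5).

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def savage_dickey_bf01(estimate, se, prior_sd):
    """BF01 for H0: effect = 0 under a Normal(0, prior_sd) prior.

    With a normal likelihood (estimate, se) the posterior is normal, and the
    Savage-Dickey ratio is posterior height at 0 divided by prior height at 0.
    BF10 is the reciprocal.
    """
    post_var = 1.0 / (1.0 / prior_sd ** 2 + 1.0 / se ** 2)
    post_mean = post_var * estimate / se ** 2
    posterior_at_zero = normal_pdf(0.0, post_mean, math.sqrt(post_var))
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    return posterior_at_zero / prior_at_zero


if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    for prior_sd in (0.5, 1.0, 1.5, 2.5):
        bf01 = savage_dickey_bf01(estimate=0.02, se=0.05, prior_sd=prior_sd)
        print(prior_sd, round(bf01, 2))
```

      Because the prior height at zero shrinks as the prior widens while the posterior is dominated by the data, the same estimate yields more apparent support for the null under a wider prior, which is the motivation for reporting several prior scales.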

      Alternative Bayesian null hypotheses testing methods such as using Bayes Factors to test against a null region and ‘region of practical equivalence testing’ are less prior sensitive, yet both methods demand the researcher (e.g. ‘us’) to decide on a minimal effect size of practical interest. Once a minimal effect size of interest is determined any effect within this boundary is taken as evidence in support of the null hypothesis.
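      As a rough illustration of the ‘region of practical equivalence’ idea, the sketch below computes how much posterior mass falls inside a symmetric interval around zero. It assumes a normal posterior, and the half-width of the region is a hypothetical value that the researcher would have to justify; neither is taken from the study.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))


def rope_mass(post_mean, post_sd, half_width):
    """Posterior probability that the effect lies in [-half_width, half_width]."""
    upper = normal_cdf(half_width, post_mean, post_sd)
    lower = normal_cdf(-half_width, post_mean, post_sd)
    return upper - lower


if __name__ == "__main__":
    # Hypothetical posterior and minimal effect size of interest.
    print(round(rope_mass(post_mean=0.02, post_sd=0.05, half_width=0.1), 3))
```

      The result depends directly on the chosen half-width, which is why these approaches shift the key decision from the prior scale to the minimal effect size of interest.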

      (5) It is unclear why a different method was employed for the August / September data analysis compared to the full-time series. 

      We used a local-randomization RD framework, an entirely different empirical framework than continuity methods (resulting in a different estimate). For an overview see the primer by Cattaneo, Idrobo & Titiunik 2023 (“A Practical Introduction to Regression Discontinuity Designs: Extensions”; https://arxiv.org/abs/2301.08958).

      A local randomization framework is optimal when the running variable is discrete (as in our case with DOB in months) (Cattaneo, Idrobo & Titiunik 2023). It makes stronger assumptions on exchangeability therefore a very narrow window around the cutoff needs to be used. See Figure 2.1 and 2.2 (in the Cattaneo, Idrobo & Titiunik 2023) for graphical illustrations of 1) a randomized experiment, 2) a continuity RD design, and 3) local-randomization RD. Using the full-time series in a local randomization analysis is not recommended as there is no control for differences between individuals as we move further away from the cutoff – making the estimated parameter highly endogenous.
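      The logic of a local randomization analysis can be sketched as a permutation test within a narrow window around the cutoff. This is an illustrative frequentist version, not the Bayesian analysis reported in the study; the function names are hypothetical, and `treated` stands for whether a participant was born on the affected side of the cutoff (e.g., September rather than August 1957).

```python
import random


def diff_in_means(y, treated):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    t = [v for v, d in zip(y, treated) if d]
    c = [v for v, d in zip(y, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def randomization_p_value(y, treated, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means.

    Under local randomization, treatment labels inside the window are treated
    as if randomly assigned, so shuffling them gives the null distribution.
    """
    rng = random.Random(seed)
    observed = diff_in_means(y, treated)
    labels = list(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(diff_in_means(y, labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

      The shuffle is only defensible when units on either side of the cutoff are plausibly exchangeable, which is why the window has to stay narrow.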

      We understand how it is confusing to have both a new framework and Bayesian methods (we could have chosen a fully frequentist approach) but using a different framework allows us to weigh up the aforementioned ‘bias vs variance tradeoff’ while Bayesian methods allow us to say something about the weight of evidence (for or against) our hypothesis.

      (6) Figure 1 - why not use model fits from those employed for hypothesis testing? 

      This is a great suggestion (ties into #3), we have now redone Figure 1.

      (7) The section on "correlational effect" might also benefit from additional analyses and clarifications. Indeed, the data come from the same randomized experiment for which minimum education requirements were adjusted. Was the only difference that the number of years of education was studied as opposed to the cohort? If so, would the results of this analysis be similar in another subsample of the UK Biobank for which there was no change in policy?

      We have clarified the methods section for the correlational/associational effect. This was the same subset of individuals for the local randomization analysis; all we did was change the independent variable from an exogenous dummy-coded ROSLA term (where half of the sample had the natural experiment) to a continuous (endogenous) educational attainment IV. 

      In principle, the results from the associational analysis should be exactly the same if we use other UK Biobank cohorts. To see if the association of educational attainment with the global neuroimaging cohorts was similar across sub-cohorts of new individuals, we conducted post hoc Bayesian analyses on eight more sub-cohorts of 10-month intervals, spaced 2 years apart from each other (Sup. Figure 7; each indicated by a different color). Four of these sub-cohorts predate ROSLA, while the other four are after ROSLA. Educational attainment is slowly increasing across the cohorts of individuals born from 1949 until 1965; intriguingly the effect of ROSLA is visually evident in the distributions of educational attainment (Sup. Figure 7). Also, as seen in the cohorts predating ROSLA more and more individuals were (already) choosing to stay in education past 15 years of age (see cohort 1949 vs 1955 in Sup. Figure 7).

      Sup. Figure 8 illustrates boxplots of the educational attainment posterior of the eight sub-cohorts in addition to our original analysis (s1957) using a normally distributed prior with a mean of 0 and a sd of 1. Total surface area shows a remarkably replicable association with educational attainment. Yet, it is evident the “extremely strong” association we found for CSF was a statistical fluke – as the posterior of other cohorts (bar our initial test) crosses zero. The conclusions for the other global neuroimaging covariates where we concluded ‘no associational effect’ seem to hold across cohorts.

      We have now added methods, deviation from preregistration, and the following excerpt to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity analysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood. 

      Strengths: 

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative analyses). 

      Weaknesses: 

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome is measured long after the intervention. Thus a null finding here is completely consistent with educational attainment (EA) in fact having an impact on brain structure, where EA may reflect elements of training after secondary education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no. 

      The work also does not address the impact of the UK Biobank's well-known healthy volunteer bias (Fry et al., 2017) which is yet further magnified in the imaging extension study (Littlejohns et al., 2020). Under-representation of people with low EA will dilute the effects of EA and impact the interpretation of these results. 

      References: 

      Fry, A., Littlejohns, T. J., Sudlow, C., Doherty, N., Adamska, L., Sprosen, T., Collins, R., & Allen, N. E. (2017). Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology, 186(9), 1026-1034. https://doi.org/10.1093/aje/kwx246 

      Littlejohns, T. J., Holliday, J., Gibson, L. M., Garratt, S., Oesingmann, N., Alfaro-Almagro, F., Bell, J. D., Boultwood, C., Collins, R., Conroy, M. C., Crabtree, N., Doherty, N., Frangi, A. F., Harvey, N. C., Leeson, P., Miller, K. L., Neubauer, S., Petersen, S. E., Sellors, J., ... Allen, N. E. (2020). The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nature Communications, 11(1), 2624. https://doi.org/10.1038/s41467-020-15948-9 

      We thank the reviewer for the positive comments and constructive feedback, in particular, their emphasis on volunteer bias in UKB (similar points were mentioned by Reviewer 3). We have now addressed these limitations with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We also highlighted it both in the results and methods.

      We appreciate that one year of education may seem modest compared to the entire educational trajectory, but as an intervention, we disagree that one year of education is ‘a very modest manipulation’. It is arguably one of the largest positive manipulations in childhood development we can administer. If we were to translate a year of education into the language of a (cognitive) intervention, it is clear that the manipulation, at least in terms of hours, days, and weeks, is substantial. Prior work on structural plasticity (e.g., motor, spatial & cognitive training) has involved substantially more limited manipulations in time, intensity, and extent. There is even (limited) evidence of localized persistent long-term structural changes (Woollett & Maguire, 2011, Curr. Biol.).

      We have now also highlighted the limited generalizability of our findings since we estimate a ‘local’ average treatment effect. It is possible higher education (college, university, vocational schools, etc.) could impact brain structure, yet we see no theoretical reason why it would while secondary wouldn’t. Moreover, higher education is even trickier to research empirically due to heightened self and administrative selection pressures. While we cannot discount this possibility, the impacts of endogenous factors such as genetics and socioeconomic status are most likely heightened. That being said, higher education offers exciting possibilities to compare more domain-specific processes (e.g., by comparing a philosophy student to a mathematics student). Causality could be tested in European systems with point entry into field-specific programs – allowing comparison of students who just missed entry criteria into one topic and settled for another.

      Regarding the amount of time following the manipulation, as we highlight in our discussion this is both a weakness and a strength. Viewed from a developmental neuroplasticity lens it would have been nice to have imaging immediately following the manipulation. Yet, from an aging perspective, our design has increased power to detect an effect.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors assert there is no strong causal evidence for EA on brain structure. This overlooks work from Mendelian Randomisation, e.g. this careful work: https://pubmed.ncbi.nlm.nih.gov/36310536/ ... evidence from (good quality) MR studies should be considered. 

      We thank the reviewer for highlighting this well-done Mendelian randomization study. We have now added this citation and removed previous claims on the “lack of causal evidence existing”. We refrain from discussing Mendelian randomization, as it would need to be accompanied by a nuanced discussion on the strong limitations regarding EduYears-PGS in Mendelian randomization designs.

      (2) Tukey/Boxplot is a good name for your identification of outliers but your treatment of outliers has a well-recognized name that is missing: Winsorisation. Please add this term to your description to help the reader more quickly understand what was done. 

      Thanks, we have now added the term winsorized.
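
      For readers unfamiliar with the term, a minimal sketch of winsorizing at Tukey (boxplot) fences follows. The multiplier k = 1.5 is the conventional boxplot default and is an assumption here, as are the function name and quartile method; the exact settings used in the paper are not stated in this response.

```python
from statistics import quantiles


def winsorize_tukey(values, k=1.5):
    """Clamp values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] to the fences.

    k=1.5 is the usual boxplot convention (an assumption, not taken from the paper).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Outliers are pulled in to the nearest fence rather than dropped.
    return [min(max(v, lower), upper) for v in values]
```

      Unlike trimming, this keeps every observation in the sample, so the sample size on either side of the cutoff is unchanged.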

      (3) Nowhere is it plainly stated that "fuzzy" means that you allow for imperfect compliance with the exposure, i.e. some children born before the cut-off stayed in school until 16, and some born after the cut-off left school before 16. For those unfamiliar with RD it would be very helpful to explain this at or near the first reference of the term "fuzzy". 

      We have now clarified the term ‘fuzzy’ to the results and methods:

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”
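
      The practical consequence of a fuzzy design can be sketched with a Wald-type ratio: the jump in the outcome at the cutoff (the intention-to-treat effect) divided by the jump in treatment uptake (the first stage). This is a simplified illustration inside a narrow window that ignores any trend in the running variable; the function and variable names are hypothetical and it is not necessarily the estimator used in the paper.

```python
from statistics import mean


def fuzzy_rd_wald(assigned, treated, outcome):
    """Return (itt, first_stage, late) from unit-level data near the cutoff.

    assigned: 1 if born on/after the cutoff (ROSLA applies), else 0
    treated:  1 if the person actually stayed in school until 16, else 0
    outcome:  the outcome of interest (e.g. a brain measure)
    """
    y1 = [y for z, y in zip(assigned, outcome) if z == 1]
    y0 = [y for z, y in zip(assigned, outcome) if z == 0]
    d1 = [d for z, d in zip(assigned, treated) if z == 1]
    d0 = [d for z, d in zip(assigned, treated) if z == 0]

    itt = mean(y1) - mean(y0)          # jump in the outcome at the cutoff
    first_stage = mean(d1) - mean(d0)  # jump in uptake, i.e. the share of compliers
    if first_stage == 0:
        raise ValueError("no jump in treatment uptake; the ratio is undefined")
    return itt, first_stage, itt / first_stage
```

      With a first stage of about 0.10, as estimated in the UK Biobank sample, any effect among compliers appears in the raw comparison at roughly one tenth of its size.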

      (4) Supplementary Figure 2 never states what the percentage actually measures. What exactly does each dot represent? Is it based on UK Biobank subjects with a given birth month? If so clarify. 

      Fixed!

      Reviewer #3 (Public review): 

      Summary: 

      This study investigates evidence for a hypothesized, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume, and diffusivity. 

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year-olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality. 

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in secondary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships. 

      Strengths: 

      (1) A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples. 

      (2) This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry, and intended analyses, with only minor, justifiable changes in the final analysis. 

      (3) The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education. 

      (4) The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others. 

      (5) The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area. 

      Weaknesses: 

      (1) This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year-old British students born around 1957 who also participated in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario. 

      (2) The authors address potential concerns about the validity of modelling assumptions and the sensitivity of the regression discontinuity design approach. However, the possibility of selection and cohort bias remains and is not discussed clearly in the paper. Other studies (e.g. Davies et al 2018, https://www.nature.com/articles/s41562-017-0279-y) have used the same policy intervention to study other health-related outcomes and have established ROSLA as a valid naturalistic experiment. Still, quoting Davies et al. (2018), "This assumes that the participants who reported leaving school at 15 years of age are a representative sample of the sub-population who left at 15 years of age. If this assumption does not hold, for example, if the sampled participants who left school at 15 years of age were healthier than those in the population, then the estimates could underestimate the differences between the groups.". Recent studies (Tyrrell 2021, Pirastu 2021) have shown that UK Biobank participants are on average healthier than the general population. Moreover, the imaging sub-group has an even stronger "healthy" bias (Lyall 2022). 

      (3) The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. Mentioned only briefly, the inclusion and exclusion of covariates in the model are not discussed in detail. Standard imaging confounds such as head motion and scanning site have been included but other factors (e.g. physical exercise, smoking, socioeconomic status, genetics, alcohol consumption, etc.) may also play a role. 

      We thank the reviewer for their numerous positive comments and have now attempted to address the first two limitations (generalizability and UKB bias) with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We further highlight this in the results section:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      Healthy volunteer bias can create two types of selection bias; crucially, participation itself can serve as a collider threatening internal validity (outlined in van Alten et al., 2024; https://academic.oup.com/ije/article/53/3/dyae054/7666749). Natural experimental designs are partially sheltered from this major limitation, as ‘volunteer bias’ would have to differentially impact individuals on one side of the cutoff and not the other – thereby breaking a primary design assumption of regression discontinuity. Substantial prior work (including this article) has not found any threats to the validity of the 1973 ROSLA (Clark & Royer 2010, 2013; Barcellos et al., 2018, 2023; Davies et al., 2018, 2023). While the Davies 2018 article did IP-weight with the UK Biobank sample, Barcellos and colleagues 2023 (and 2018) do not, highlighting the following “Although the sample is not nationally representative, our estimates have internal validity because there is no differential selection on the two sides of the September 1, 1957 cutoff – see Appendix A.”.

      The second (more acknowledged & arguably less problematic) type of selection bias results in threats to external validity (aka generalizability). As highlighted in your first point; this is a large limitation with every natural experimental design, yet in our case, this is further amplified by the UK Biobank’s healthy volunteer bias. We have now attempted to highlight this limitation in the discussion passage above.

      Point 3 – the inability to fully confirm design validity – is again, another inherent limitation of a natural experimental approach. That being said, extensive prior work has tested different predetermined covariates in the 1973 ROSLA (cited within), and to our knowledge, no issues have been found. The 1973 ROSLA seems to be one of the better natural experiments around (there was also a concerted effort to have an ‘effective’ additional year; see Clark & Royer 2010). For these reasons, we stuck with only testing the variables we wanted to use to increase precision (also offering new neuroimaging covariates that didn’t exist in the literature base). One additional benefit of ROSLA was that the cutoff was decided years later on a variable that happened (date of birth) in the past – making it particularly hard for adolescents to alter their assignments.

      Reviewer #3 (Recommendations for the authors): 

      (1) FMRIB's preprocessing pipeline is mentioned. Does this include deconfounding of brain measures? Particularly, were measures deconfounded for age before the main analysis? 

      This is such a crucial point that we triple-checked: brain imaging phenotypes were not corrected for age (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf) – large effects of age can be seen in the global metrics; older individuals have less surface area, thinner cortices, less brain volume (corrected for head size), more CSF volume (corrected for head size), more white matter hyperintensities, and worse FA values. Figure 1 shows these large age effects, which are controlled for in our continuity-based RD analysis.

      One’s date of birth (DOB) of course does not match perfectly to their age; this is why we included the covariate ‘visit date’. This interplay can now be seen in our updated SI Figure 1 (recommended in #3), which shows the distributions of visit date, DOB, and age of scan.

      In a valid RD design covariates should not be necessary (as they should be balanced on either side of the cutoff), yet the inclusion of covariates does increase precision to detect effects. We tested this assumption, finding the effect of ‘visit date’ and its quadratic term to be not related to ROSLA (Sup. Table 1). This adds further evidence (specific to the UK Biobank sample) to the existing body of work showing the 1973 ROSLA policy change to not violate any design assumptions. Threats to internal validity would more than likely increase endogeneity and result in ‘false positive causal effects’ (which is not what we find).

      (2) Despite the large overall sample size, I am wondering whether the effective number of samples is sufficient to detect a potentially subtle effect that is further attenuated by the long time interval before scanning. As stated, for the optimised bandwidth window (DoB 20 to 35 months around cut-off), N is about 5000. Does this mean that effectively about 250 (10%) out of about 2500 participants born after the cut-off were leaving school at 16 rather than 15 because of ROSLA? For the local randomisation analysis, this becomes about N=10 (10% out of 100). Could a power analysis show that these cohort sizes are large enough to detect a reasonably large effect? 

      This is a very valid point, one which we were grappling with while the paper was out for review. We now draw attention to this in the results and highlight this as a limitation in the discussion. While UKB’s non-representativeness limits our power (10% affected rather than 25% in the general population), it is still a very large sample. Our sample size is more in line with standard neuroimaging studies than with large cohort studies. 
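
      The arithmetic behind this concern can be sketched as follows. This is a rough frequentist approximation for illustration only, not a power analysis from the paper: it assumes equal group sizes on each side of the cutoff, an outcome standardized to unit variance, and a simple two-group comparison of means. The function names and the effect sizes a reader might plug in are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def expected_compliers(n_after_cutoff, complier_share):
    """Number of people whose schooling changed because of the policy."""
    return n_after_cutoff * complier_share


def approx_power(complier_effect_sd, complier_share, n_per_group, alpha=0.05):
    """Approximate two-sided power for detecting the diluted (intention-to-treat) effect.

    complier_effect_sd: assumed effect among compliers, in outcome SD units
    complier_share:     first-stage jump (e.g. 0.10 in the imaging sample)
    n_per_group:        sample size on each side of the cutoff
    """
    nd = NormalDist()
    # Only compliers are affected, so the group-level difference shrinks.
    itt_effect = complier_effect_sd * complier_share
    z_crit = nd.inv_cdf(1 - alpha / 2)
    noncentrality = itt_effect * sqrt(n_per_group / 2)
    return nd.cdf(noncentrality - z_crit) + nd.cdf(-noncentrality - z_crit)
```

      For example, `expected_compliers(2500, 0.10)` reproduces the reviewer's figure of about 250 affected participants, and `approx_power` can be evaluated for any assumed complier effect to see how the 10% versus 25% first stage changes detectability.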

      The novelty of our study is its causal design. While we could very precisely measure an effect of some phenotype (variable X) in 40,000 individuals, this effect is probably not what we think we are measuring. Without IP-weighting it could even have a different sign. But more importantly, it is not variable X – it is the thousands of things (unmeasured confounders) that lead an individual to have more or less of variable X. The larger the sample, the easier it is for small unmeasured confounders to reach significance (Big data paradox) – this in no way invalidates large samples; rather, our thinking and how we handle large samples will hopefully shift toward a more causal lens.

      (3) Supplementary Figure 1: A similar raincloud plot of date of birth would be instructive to visualise the distribution of subjects born before and after the 1957 cut-off. 

      Great idea! We have done this in Sup Fig. 1 for both visit date and DOB.

      (4) p.9: Not sure about "extreme evidence", very strong would probably be sufficient. 

      As preregistered, we interpreted Bayes Factors using Jeffreys’ criteria. ‘Extreme evidence’ is only used once and it is about finding an associational effect of educational attainment on CSF (BF10 > 100). Upon Reviewer 1’s recommendation 7, we analysed eight replication samples (Sup. Figure 7 & 8) and have now added the following passage to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”
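
      As a reading aid for the Bayes factor labels discussed above, here is a small sketch that maps BF10 to a verbal category. Only the ‘extreme’ band (BF10 > 100) is confirmed by this response; the remaining cut-points (3, 10, 30) follow a commonly used adaptation of Jeffreys' scheme and are an assumption, as is the function name.

```python
def label_bf10(bf10):
    """Return (strength_label, favoured_hypothesis) for a Bayes factor BF10.

    Cut-points other than 100 are assumed from a widely used Jeffreys-style scheme.
    """
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    # Evidence for H0 is read off the reciprocal, BF01 = 1 / BF10.
    favours = "H1" if bf10 >= 1 else "H0"
    strength = bf10 if bf10 >= 1 else 1 / bf10
    if strength > 100:
        label = "extreme"
    elif strength > 30:
        label = "very strong"
    elif strength > 10:
        label = "strong"
    elif strength > 3:
        label = "moderate"
    elif strength > 1:
        label = "anecdotal"
    else:
        label = "no evidence"
    return label, favours
```

      A single sample landing in the top band does not guarantee replication, which is the point the eight additional cohorts make for the CSF association.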

      (5) The code would benefit from a bit of clean-up and additional documentation. In its current state, it is not easy to use, e.g. in a replication study. 

      We have now further added documentation to our code; including a readme describing what each script does. The analysis pipeline used is not ideal for replications as the package used for continuity-based RD (RDHonest) initially could not handle covariates – therefore we manually corrected our variables after a discussion with Prof Kolesár (https://github.com/kolesarm/RDHonest/issues/7). 

      Prof Kolesár added this functionality recently and future work should use the latest version of the package as it can correct for covariates. We have a new preprint examining the effect of 1972 ROSLA on telomere length in the UK Biobank using the latest package version of RDHonest (https://www.biorxiv.org/content/10.1101/2025.01.17.633604v1). To ensure maximum availability of such innovations, we will ensure the most up-to-date version of this script becomes available on this GitHub link (https://github.com/njudd/EduTelomere).

    1. eLife Assessment

      The work presented is important for our understanding of the development of the cardiac conduction system and its regulation by T-box transcription factors. The conclusions are supported by convincing data. Overall this is an excellent study that advances our understanding of cardiac biology and has implications beyond the immediate field of study.

    2. Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First, the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with MinK-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment). In addition, the optical mapping dataset has alternative interpretations that are not excluded or thoroughly discussed.

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented mostly fit within the existing paradigm of established functions of Tbx3 and Tbx5. The authors present some evidence to support the claim that VCS cells adopt a working myocardial phenotype in the absence of Tbx3 and Tbx5, but some key experiments that could more definitively test this model were not performed, reducing the degree to which the data support the conclusions.

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model

      (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line

      (3) Well-powered and convincing assessments of mortality and physiological phenotypes

      (4) Isolation of genetically modified VCS cells using flow.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, which seem like distinct roles in VCS patterning.

      (2) More direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from single Tbx5 KO is presented for comparison to the double KO. I understand that single Tbx5 VCS KO mutants have been evaluated in previous publications but I think in order to evaluate the claims presented here, it would be important to do a direct comparison using the same assays and conditions.

      (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.

      (4) From the optical mapping data, it is difficult to distinguish between the presence of (1) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (2) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      Impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS. Acknowledging that the present work is a strong start, some additional studies not included in the present manuscript will be needed for this new mouse model to decisively advance the field of VCS transcriptional biology.

    3. Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes on chromosome 5, preventing the generation of loss-of-function strategies to delete both genes simultaneously. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterised the cardiac phenotype associated with cardiac conduction system (CCS)-specific combined deletion of Tbx5 and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen-inducible Cre recombination (MinK-creERT) at 6 weeks after birth. Their analysis of 8-9-week-old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QRS intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterising VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analysed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutants and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lies in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart; however, whether the CCS tissues are present or not was not investigated. Lineage tracing analysis using the reporter lines included in the crosses described in the study will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot will demonstrate that indeed the cells that were specified as VCS in the adult heart become working myocardium in the absence of Tbx3 and Tbx5 function.

      Comments on revisions:

      I would like to thank the authors for their revised manuscript and for their corrections based on the suggestions from the 3 reviewers. Although I would have preferred to see some of the additional experiments suggested by any of the reviewers to improve the robustness and depth of the study integrated in the revised version of the manuscript, I acknowledge that the authors may prefer to develop them as follow-up studies. So, looking forward to seeing the follow-up study unravelling the detailed molecular regulation controlled by Tbx3/Tbx5 during the formation and maintenance of the ventricular cardiac conduction system.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Strengths:

      Great genetic model. Phenotypic consequences at the organ and organismal levels are well investigated. The requirement of both Tbx3 and Tbx5 for maintaining VCS cell state has been demonstrated.

      We thank Reviewer #1 for acknowledging the effort involved in generating and characterizing the Tbx3/Tbx5 double conditional knockout mouse model and for highlighting the significance of this work in elucidating the role of these transcription factors in maintaining the functional and transcriptional identity of the ventricular conduction system. 

      Weaknesses:

      The actual cell state of the Tbx3/Tbx5 deficient conducting cells was not investigated in detail, and therefore, these cells could well only partially convert to working cardiomyocytes, and may, in reality, acquire a unique state.

      We agree with Reviewer #1 that the Tbx3/Tbx5 double mutant ventricular conduction myocardial cells may only partially convert to working cardiomyocytes or may acquire a unique state.  The transcriptional state of the double mutant VCS cells was investigated by bulk profiling of key genes associated with specific conduction and non-conduction cardiac regions, including fast conduction, slow conduction, or working myocardium. Neither the bulk transcriptional approaches nor the optical mapping approaches we employed capture single-cell data; in both cases, the data represents aggregated signals from multiple cells (1, 2). Single cell approaches for transcriptional profiling and cellular electrophysiology would clarify this concern and are appropriate for future studies. 

      (1) O’Shea C, Nashitha Kabri S, Holmes AP, Lei M, Fabritz L, Rajpoot K, Pavlovic D (2020) Cardiac optical mapping – State-of-the-art and future challenges. The International Journal of Biochemistry & Cell Biology 126:105804. doi: 10.1016/j.biocel.2020.105804.

      (2) Efimov IR, Nikolski VP, and Salama G (2004) Optical Imaging of the Heart. Circulation Research 95:21-33. doi: 10.1161/01.RES.0000130529.18016.35.

      Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive the adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First, the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with MinK-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      We appreciate Reviewer #2’s detailed summary of the study’s aims, methodologies, and findings, as well as their thoughtful suggestions for further analysis. We are grateful for their recognition of our genetic model’s novelty and robustness.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function.

      We agree with this reviewer’s assessment of the assertions in our manuscript.  The novel combined Tbx5/Tbx3 double mutant model does not further explore the TBX5/TBX3 ratio concept, which we previously examined in detail (1). Instead, as the Reviewer notes, this manuscript focuses on testing a model that the coordinated activity of Tbx3 and Tbx5 defines specialized ventricular conduction identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model.

      (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line.

      (3) Well-powered and convincing assessments of mortality and physiological phenotypes.

      (4) Isolation of genetically modified VCS cells using flow.

      We thank Reviewer #2 for acknowledging the listed strengths of our study.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates the expression of VCS genes, which seem like distinct roles in VCS patterning. However, the authors move between different descriptions of the functional relationship and epistatic relationship between these factors, including terms like "cooperative", "coordinated", and "distinct" at various points. In a similar vein, sometimes terms like "reversion" are used to describe how VCS cells change after Tbx3/Tbx5 conditional knockout, and other times "transcriptional shift" and at other times "reprogramming". But these are all different concepts. The lack of a clear and consistent terminology for describing the phenomena observed makes the overarching claims of the manuscript more difficult to evaluate.

      We distinguish prior work on the “long-standing and well-supported model”, which rests on investigation of the roles of Tbx5 and Tbx3 independently, from this work, which examines the coordinated role of Tbx5 and Tbx3. Prior work demonstrated that Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, consistent with the reviewer’s suggestion of their distinct roles in VCS patterning. However, the current study is the first to evaluate the combined role of Tbx3 and Tbx5 in distinguishing specialized conduction identity from working myocardium.

      We appreciate Reviewer #2’s feedback regarding the need for consistent terminology when describing the impact of the double Tbx3 and Tbx5 mutant. We will edit the manuscript to replace terms like “reversion” with “transcriptional shift” or “transformation” when describing the observed phenotype, and we will use “coordination” to describe the combined role of Tbx5 and Tbx3 in maintaining VCS-specific identity.

      (2) A more direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from a single Tbx5 KO is presented for comparison to the double KO.

      We thank Reviewer #2 for the suggestion that a direct comparison between Tbx5 single conditional knockout and Tbx3/Tbx5 double conditional knockout models may help isolate the specific contribution of Tbx3 deletion in addition to Tbx5 deletion.

      Previous studies have assessed the effect of single Tbx5 CKO in the VCS of murine hearts (1, 3, 5). Arnolds et al. demonstrated that the removal of Tbx5 from the adult ventricular conduction system results in VCS slowing, including prolonged PR and QRS intervals, prolongation of the His duration and His-ventricular (HV) interval (3).

      Furthermore, Burnicka-Turek et al. demonstrated that the single conditional knockout of Tbx5 in the adult VCS caused a shift toward a pacemaker cell state, with ectopic beats and inappropriate automaticity (1). Whole-cell patch clamping of VCS-specific Tbx5 deficient cells revealed action potentials characterized by a slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization - features characteristic of nodal action potentials rather than typical VCS action potentials (3). These observations were interpreted as uncovering nodal potential of the VCS in the absence of Tbx5. Based on the role of Tbx3 in CCS specification (2), we hypothesized that the nodal state of the VCS uncovered in the absence of Tbx5 was enabled by maintained Tbx3 expression. This motivated us to generate the double Tbx5/Tbx3 knockout model to examine the state of the VCS in the absence of both T-box TFs. In the current study, we demonstrate that the VCS-specific deletion of Tbx3 and Tbx5 results in the loss of fast electrical impulse propagation in the VCS, similar to that observed in the single Tbx5 mutant. However, unlike the Tbx5 single mutant, the Tbx3/Tbx5 double deletion does not cause a gain of pacemaker cell state in the VCS. Instead, the physiological data suggests a transition toward non-conduction working myocardial physiology. This conclusion is supported by the presence of only a single upstroke in the optical action potential (OAP) recorded from the His bundle region and VCS cells in Tbx3/Tbx5 double conditional knockout mice. The electrical properties of VCS cells in the double knockout are functionally indistinguishable from those of ventricular working myocardial cells. As a result, ventricular impulse propagation is significantly slowed, resembling activation through exogenous pacing rather than the rapid conduction typically associated with the VCS. We will edit the text of the manuscript to more carefully distinguish the observations between these models, as suggested.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265. PMID: 15289437.

      (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.

      And

      the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment).

      We appreciate Reviewer #2’s suggestion to expand the gene expression analysis in Tbx3/Tbx5-deficient VCS cells by including other specific genes and comparisons with “native”/actual working ventricular myocardial cells and broadening the gene panel. In this study, we evaluated core cardiac conduction system markers, revealing a loss of conduction system-specific gene expression in the double mutant VCS. Furthermore, we evaluated key working myocardial markers normally excluded from the conduction system, Gja1 and Smpx, revealing a shift towards a working myocardial state in the double mutant VCS (Figure 4). We agree that a more comprehensive analysis, such as transcriptome-wide approaches, would offer greater clarity on the extent and specificity of the observed shift from conduction to non-conduction identity. These approaches are appropriate directions for future studies.

      (4) From the optical mapping data, it is difficult to distinguish between the presence of (a) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (b) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      And

      In addition, the optical mapping dataset is incomplete and has alternative interpretations that are not excluded or thoroughly discussed.

      We agree with Reviewer #2 that the resolution of the optical mapping experiment may be insufficient to precisely localize the conduction block due to the limited signal strength from the VCS. It is possible that the region defined as the His Bundle also includes portions of the right bundle branch. Our control mice show VCS OAP upstrokes consistent with those reported by Tamaddon et al. (2000) using Di-4-ANEPPS (1). We appreciate the Reviewer’s attention to alternative interpretations, and we will incorporate these caveats into the manuscript text. 

      (1) Tamaddon HS, Vaidya D, Simon AM, Paul DL, Jalife J, Morley GE (2000) High-resolution optical mapping of the right bundle branch in connexin40 knockout mice reveals slow conduction in the specialized conduction system. Circulation Research 87:929-36. doi: 10.1161/01.res.87.10.929. 

      Impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS but do not, as presented, constitute a decisive advance.

      And

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented fit well within the existing paradigm of established functions of Tbx3 and Tbx5 in the VCS and in that sense do not decisively advance the field. Moreover, the authors' claims about the implications of the data are not always strongly supported by the data presented and do not fully explore alternative possibilities.

      We appreciate Reviewer #2’s acknowledgment of the elegance and novelty of the mouse model we generated. However, we respectfully disagree with their assessment that this work merely corroborates existing models without providing a decisive advance. Previous studies have investigated single Tbx5 or Tbx3 gene knockouts in depth and established the T-box ratio model for distinguishing fast VCS from slow nodal conduction identity (1) that the reviewer alludes to in earlier comments. In contrast, this study aimed to explore a different model, that the combined effects of Tbx5 and Tbx3 distinguish adult VCS identity from non-conduction working myocardium. The coordinated Tbx3 and Tbx5 role in conduction system identity remained untested due to the lack of a mouse model that allowed their simultaneous removal. The very model the reviewer recognizes as “novel and elegantly constructed” has allowed the examination of the coordinated role of Tbx5 and Tbx3 for the first time. While we acknowledge the opportunity for additional depth of investigation of this model in future studies, the data we present provide consistent experimental support for the coordinated requirement of both Tbx5 and Tbx3 for ventricular cardiac conduction system identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes on chromosome 5, preventing the generation of loss-of-function strategies to delete both genes simultaneously. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterized the cardiac phenotype associated with the cardiac conduction system (CCS)-specific combined deletion of Tbx5 and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen-inducible Cre recombination (MinK-creERT) at 6 weeks after birth. Their analysis of 8-9-week-old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QRS intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterizing VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analyzed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutants and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      We appreciate the Reviewer’s comments regarding the originality and utility of our model and the strengths of our methodological approach. The Reviewer’s appreciation of the molecular and morphological analyses as well as their constructive feedback is highly valuable.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lies in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart, however, whether the CCS tissues are present or not was not investigated. Lineage tracing analysis using the reporter lines included in the crosses described in the study will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot, will demonstrate that indeed the cells that were specified as VCS in the adult heart, become working myocardium in the absence of Tbx3 and Tbx5 function.

      We appreciate the reviewer’s concern regarding the morphology of the cardiac conduction system in the Tbx3/Tbx5 double conditional knockout model. We did not observe any structural abnormalities, as the Reviewer notes. We agree with their suggestion for using Genetic Inducible Fate Mapping to mark cardiac conduction cells expressing MinKCre. In fact, we utilized this approach to isolate VCS cells for transcriptional profiling. Specifically, we combined the tamoxifen-inducible MinKCreERT allele with the Cre-dependent R26Eyfp reporter allele to label MinKCre-expressing cells in both control VCS and VCS-specific double Tbx3/Tbx5 knockouts. EYFP-positive cells were isolated for transcriptional studies, ensuring that our analysis exclusively targeted conduction system-lineage marked cells. The ability to isolate MinKCre-marked cells from both controls and Tbx5/Tbx3 double mutants indicates that VCS cells persisted in the double knockout. Nonetheless, the suggestion for in-vivo marking by Genetic Inducible Fate Mapping and morphologic analysis is a valuable recommendation for future studies. 

      Reviewer #1 (Recommendations for the authors):

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Previous work suggested the prediction that VCS-specific genetic ablation of both TBX3 and TBX5 would transform fast-conducting adult VCS into cells resembling working myocardium, eliminating specialized CCS fate. The current study suggests that this prediction is at least to some extent accurate.

      We appreciate Reviewer #1’s summary and recognition of our study. As the Reviewer notes, the simultaneous deletion of Tbx3 and Tbx5 in the mature ventricular conduction system (VCS) suggests a conversion of VCS to "ordinary" ventricular working myocytes. To our knowledge, this represents a novel observation and experimental model that uniquely captures the combined roles of these essential T-box transcription factors. We believe that this model offers a valuable platform for further investigation into the transcriptional mechanisms underlying conduction system specialization.

      (1) The huge effort made to generate the DKO model contrasts with the limited efforts made to study the mechanism. Conditional deficiency of Tbx3 and Tbx5 creates an artificial situation that is useful for addressing fundamental mechanistic questions. The authors provide a rather superficial analysis of the changes in the VCS upon deletion of these two critically important factors and do not provide really novel insights into their requirement/function in the VCS gene regulatory network and epigenetic state. So to what extent do VCS cardiomyocytes (CMs) from Tbx3/5 DKO mice resemble "simple" working myocardium? To what extent do these cells acquire the working myocardial (epigenetic) state, do these cells have an epigenetic memory of the Tbx3/Tbx5+ history, is the enhancer usage between the modified VCS CMs and the working CMs similar or not, etc.? The assumption that the authors' data indicate that the DKO VCS CMs simply acquire a ventricular working "fate" is unlikely. Following this reasoning, the reverse experiment to induce Tbx3 and Tbx5 expression in working CMs would result in complete conversion to VCS CMs, which is also unlikely.

      To answer such questions, transcriptomic and epigenetic state analysis, electrophysiologic analysis (e.g. patch-clamp), cell/subcellular level analysis, etc. would be required, as well as a comparison of the changed state of the DKO VCS CMs to that of working CMs.

      This initial study focused on generating the Tbx3:Tbx5 double-conditional knockout model and characterizing the resulting physiological and molecular changes within the VCS. We analyzed transcriptomic markers of fast conduction (VCS), slow conduction (nodal), and non-conduction (working myocardium). Additionally, we applied optical mapping to evaluate the physiological consequences of the double knockout, which allowed a calculated AP of the VCS to be generated. We agree that a more in-depth mechanistic investigation of the VCS transformation upon Tbx3/Tbx5 deletion by transcriptomic or cellular electrophysiology could provide a deeper understanding of the precise transcriptional/epigenetic state of the VCS in the double knockout and clarify whether there is a partial or complete conversion of VCS cells to a simple working myocardial phenotype. The suggestions by the reviewer will be considered for future studies.

      (2) Tbx3 stimulates BMP-TGFb signaling (e.g. positive loop between Tbx3-Bmp2), which in turn stimulates EMT and modulates the behavior of endocardial and mesenchymal cells. Did the authors investigate the impact of Tbx3/5 DKO on non-CM cells in and around the VCS? (see also comment 1). The insulation of the AVB for example could be a Tbx3/5 non cell autonomous target.

      We appreciate the Reviewer’s suggestion to examine the impact of Tbx3/Tbx5 deletion on non-CM cells surrounding the VCS. While this is an intriguing avenue for future exploration, it falls outside the scope of the current study, which focused on the cardiomyocyte-specific roles of Tbx3 and Tbx5 in maintaining adult VCS identity.

      (3) The MinK-Cre line used (from the Moskowitz lab) also recombines in the AVN (Arnolds et al 2011). The authors do not mention changes in the AVN, and systematically call the line VCS specific (which refers to the AVB, BB, PVCS I assume). This could also impact the PR interval. Please address.

      The MinK-Cre line recombines in the atrioventricular bundle (AVB) and bundle branches (BB). It also recombines in cardiomyocytes adjacent to the atrioventricular node (AVN). We previously interpreted these cells as the penetrating portion of the His bundle into the AVN. This line recombines in few, if any, physiologic nodal cells. We also assessed nodal conduction parameters by invasive electrophysiologic (EP) studies. Our data showed that non-VCS parameters, including sinus node recovery time, AV node recovery time, and atrial and ventricular effective refractory periods, remained within normal ranges in Tbx3:Tbx5-deficient mice (please see Figure 2I). These findings indicate that AVN function is preserved in the VCS-specific double knockout, reinforcing the specificity of the observed conduction defects to the ventricular conduction system.

      (4) Did the authors also investigate the electrophysiological changes in the (EGFP+) DKO VCS CMs? Would these resemble the properties of ventricular working CMs, or would they still show some VCS properties? (see also comment 1).

      We performed electrophysiologic analysis of the double knockout by optical mapping. Optical mapping provides tissue-level resolution, capturing the functional behavior of clusters of thousands of cells simultaneously, rather than individual cells. While this technique does not achieve single-cell resolution, it allows for a comprehensive assessment of electrophysiological changes across the VCS region. Single-cell electrophysiology is a good idea for future studies.

      (5) Throughout the manuscript, the authors use "patterning" and "fate", which are applicable to development and differentiation, not to the situation where a gene is removed from fully differentiated cells in an adult organism resulting in a change of these cells. Perhaps more appropriate are "state" change and the requirement for "homeostasis/maintenance" of state.

      We appreciate the Reviewer’s concern regarding the terminology used to describe changes in VCS cell identity. To ensure precision and uniformity, we replaced terms such as “fate” and “patterning” with “state” or “maintenance” to reflect the shift in cellular characteristics in a fully differentiated adult tissue context. 

      Minor:

      (1) Please provide all data points in bar graphs.

      We have incorporated individual data points into the bar graphs as suggested, ensuring enhanced transparency and clarity in the data presentation.

      (2) Formally, gene expression levels between samples are not normally distributed. The Welch t-test used here assumes a normal distribution. Therefore, nonparametric tests should be used.

      We appreciate Reviewer #1’s consideration of the appropriate statistical approach to the qPCR data and clarify our statistical approach here. Normality within each experimental group was assessed using the Shapiro-Wilk test. Between-group comparisons were conducted using Welch t-test, and multiple comparisons were corrected using the Benjamini & Hochberg method to control the false discovery rate (FDR) (71). If a significant difference was detected between two groups (t-test FDR < 0.05) but normality was rejected in any of the compared groups (Shapiro-Wilk P < 0.05), a non-parametric Wilcoxon rank-sum test was used for verification. A significant group-mean difference was confirmed at one-tailed Wilcoxon P≤0.05 (detailed in Supplementary Data Set I). Furthermore, we have updated the qRT-PCR information in each figure and their respective legends as follows. Statistical analysis was performed using R version 4.2.0. We have included a new Supplementary Data Set I, detailing the statistical analysis of qRT-PCR data. Additionally, we have revised the Methods/Statistics section to detail the applied statistical analysis. 
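
      As an illustration only, the multiple-testing step and the verification rule described in this response can be sketched in a few lines of Python. This is not the authors' R code: the function names `benjamini_hochberg` and `needs_wilcoxon_check`, the `alpha` parameter, and the example p-values are all hypothetical, and the Shapiro-Wilk, Welch, and Wilcoxon tests themselves are assumed to have been run elsewhere.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def needs_wilcoxon_check(fdr, shapiro_pvalues, alpha=0.05):
    """Flag a comparison for non-parametric verification.

    Mirrors the rule stated in the response: the Welch t-test is
    significant after FDR correction, but normality is rejected
    (Shapiro-Wilk P < alpha) in at least one of the compared groups.
    """
    return fdr < alpha and any(p < alpha for p in shapiro_pvalues)


# Hypothetical Welch t-test p-values for four genes (illustrative only).
welch_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(welch_p)
```

      A comparison flagged by `needs_wilcoxon_check` would then be re-tested with the rank-sum test, as described above; comparisons that are not flagged keep the Welch result.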

      (3) Some of the panels of figures are tiny and cannot be evaluated. For example, in Figure 1B the actual data (expression of Tbx3/5) is impossible to see.

      We appreciate the Reviewer’s observation and have revised the figures to improve visual clarity and ensure that the presented data are easily interpretable by readers.

      Reviewer #2 (Recommendations for the authors):

      Additional Experiments, Data, Analysis:

      (1) Comparisons between both single knockouts and double knockouts at the phenotypic level are needed. In some instances, the data is shown (e.g., mortality and EKG) but direct statistical comparison is not performed. In other instances (optical mapping and gene expression), data with single knockouts are not shown. If combined VCS Tbx3/Tbx5 deletion does not change the phenotype of the VCS Tbx5 single deletion, this should be explicitly stated and discussed.

      We appreciate Reviewer #2’s suggestion to compare the phenotypic outcomes of the Tbx3 and Tbx5 single conditional knockout models with those observed in the Tbx3/Tbx5 double conditional knockout model. We have expanded the discussion section of our manuscript to incorporate a more detailed comparison between the double Tbx3/Tbx5 model and the single Tbx5 and Tbx3 models [1-5], highlighting the distinct phenotypic outcomes of the single and double knockouts.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265.

      (2) Genome-wide expression analysis including working myocardium would provide stronger evidence for interconversion of cell states. Ideally, this would include single knockouts.

      We agree that a genome-wide expression analysis, including a direct comparison with working myocardium, would provide more comprehensive insights into cell state transitions in Tbx3:Tbx5-deficient VCS cells. Additionally, incorporating single knockout models into such analyses would further clarify the distinct and cooperative contributions of Tbx3 and Tbx5 to maintaining VCS identity. This is a good suggestion for future studies.

      (3) This may not be essential to support the authors' claims, but the addition of epigenetic data from single and double KO VCS using ATAC-seq (which can be performed with relatively small numbers of cells) could provide stronger evidence for cell state changes of the kind hypothesized by the authors.

      We agree that epigenetic data such as ATAC-seq would complement transcriptional analyses and provide insight into chromatin states that underlie the observed cellular reprogramming. This is a good suggestion for follow-up studies to further characterize the molecular state of Tbx3:Tbx5-deficient VCS cells.

      (4) Additional clarification of the optical mapping experiments to exclude alternative interpretations like focal right bundle branch block and to include single knockouts for comparison - if the Tbx5 single KO looks the same as the double KO that would be very important to know and would directly affect interpretation of the experiment.

      Right septal optical mapping preparation involved removing the right ventricular free wall to directly image the right ventricular septum, which contains the VCS. In a healthy mouse, there are two peak components of the optical action potential upstroke, the first peak due to the activation of the VCS and the second due to the activation of the ventricular cardiomyocytes. Importantly, in Tbx3:Tbx5 double-conditional knockout mice, the first peak was absent, rather than delayed, indicating loss of fast conduction through the VCS. This absence suggests a shift in VCS cells toward a ventricular working myocardial phenotype, rather than a regional conduction block or delayed propagation through a structurally intact VCS.
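
      The two-component upstroke described above can be illustrated with a small synthetic example. The sketch below is not the authors' optical mapping analysis: it builds hypothetical traces from logistic steps and counts peaks in the first difference of the signal, with all amplitudes, midpoints, widths, and the 20% threshold chosen arbitrarily for illustration.

```python
import math


def upstroke(times, components):
    """Sum of logistic steps; each component is (amplitude, midpoint, width)."""
    return [
        sum(a / (1.0 + math.exp(-(t - t0) / w)) for a, t0, w in components)
        for t in times
    ]


def count_upstroke_peaks(signal, rel_threshold=0.2):
    """Count local maxima of the first difference above a relative threshold."""
    deriv = [b - a for a, b in zip(signal, signal[1:])]
    cutoff = rel_threshold * max(deriv)
    peaks = 0
    for i in range(1, len(deriv) - 1):
        is_local_max = deriv[i] > deriv[i - 1] and deriv[i] >= deriv[i + 1]
        if is_local_max and deriv[i] > cutoff:
            peaks += 1
    return peaks


times = list(range(100))  # arbitrary sample index, not real time units
# Hypothetical control-like trace: an early small component (VCS activation)
# followed by a later, larger component (working myocardium activation).
control = upstroke(times, [(0.3, 20.5, 3.0), (0.7, 60.5, 3.0)])
# Hypothetical knockout-like trace: only the later component remains.
knockout_like = upstroke(times, [(1.0, 60.5, 3.0)])

control_peaks = count_upstroke_peaks(control)
knockout_peaks = count_upstroke_peaks(knockout_like)
```

      In this toy setting, an absent first component removes a peak from the derivative altogether, whereas a delayed component would still appear as a peak at a later sample index; that difference is the distinction the response draws between loss of fast conduction and delayed propagation.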

      Previous studies from our group have extensively characterized the effect of single Tbx5 knockout on the VCS in murine hearts [1, 2, 3]. Arnolds et al. demonstrated that VCS-specific Tbx5 deficiency results in significant slowing of VCS conduction, evidenced by prolonged PR and QRS intervals, along with lengthening of the atrio-Hisian interval, His duration, and Hisioventricular interval [1]. Although both single Tbx5 knockout and Tbx3:Tbx5 double knockout mice exhibit slowed conduction through the ventricular conduction system, our optical mapping studies reveal distinct differences in their electrophysiological phenotypes. Burnicka-Turek et al. showed that the single knockout of Tbx5 in the VCS leads to a shift toward a pacemaker cell state, evidenced by ectopic beats originating in the ventricles and inappropriate automaticity [3]. During spontaneous beats, electrical impulses propagated retrogradely, from the ventricles to the atria [3]. Whole-cell patch-clamp recordings confirmed that Tbx5-deficient VCS cells displayed action potentials resembling those of pacemaker cells, characterized by slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization [3]. In contrast, our current study of the VCS-specific Tbx3:Tbx5 double knockout demonstrates a loss of VCS-specific fast conduction. Optical mapping demonstrated the absence of the initial upstroke corresponding to VCS activation in the His bundle region, indicating a shift in the VCS cells toward a ventricular working myocardium state. This loss of fast conduction properties highlights a fundamental distinction between single and double knockouts, suggesting that both Tbx3 and Tbx5 are required to maintain VCS identity and function.

      (1) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (2) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265.

      (3) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460.

      Methods:

      (1) Additional methods on FACS are required. The methods section references a paper from 2004 (reference 67) that describes the flow sorting of embryonic cardiomyocytes. However, flow cytometric isolation of intact adult cardiomyocytes, which the authors describe in the present work, is a distinct technique and generally requires special equipment. These need to be described in more detail to be fully replicable.

      We thank Reviewer #2 for highlighting the need to provide additional details regarding our flow cytometric isolation of adult VCS cardiomyocytes. While we referenced earlier methods, we agree that isolating adult cardiomyocytes requires specialized approaches. Therefore, we revised the Methods section to include a detailed description of the equipment, procedures, and adaptations specific to isolating intact adult VCS cells to ensure full replicability.

      Minor Corrections:

      (1) Figure 1D. Please add a statistical test for mortality between the double conditional KO and the Tbx5 conditional KO.

      We have revised Figure 1D to include the statistical test comparing mortality between the Tbx3:Tbx5 double conditional knockout and the Tbx5 conditional knockout cohorts.

      (2) Figure 2A, 2I, 3A: Please include all individual data points not just a bar graph with error bars.

      We have added all individual data points to the bar graphs as recommended, enhancing the transparency and clarity of the data presentation.

      (3) Figure 2A: Please consider separate graphs for PR and QRS with appropriately scaled Y-axis so differences are easier to see.

      We appreciate Reviewer #2’s suggestion and fully agree with it. As a result, we have revised Figure 2A to include separate graphs for PR and QRS intervals, each with appropriately scaled Y-axes. This adjustment enhanced both the readability and the clarity of the observed differences.

      (4) Figure 3 G-K: The figure would be easier to interpret for the reader if genotypes were shown in the figure not just in the legend.

      We agree with Reviewer #2’s suggestion and have revised Figure 3 accordingly by adding genotype labels directly to the histological sections in Panels G-K. This update improves clarity, making the data easier for readers to interpret without needing to refer to the figure legend.

      (5) Figure 4A, C: Are vertical axes mislabeled? They say, "CON VCS and TBX5OE VCS". Please double-check axis labels and data on the graph.

      We appreciate the Reviewer bringing the mislabeling of the vertical axis in Figure 4 to our attention. We have corrected the labeling errors and ensured consistency between the graph and the underlying data.

      (6) Legend to Supplementary Figure 6. Says "Tbx3:Tbx3" instead of "Tbx3:Tbx5".

      We thank Reviewer #2 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

      (7) Discussion. The authors write, "In Tbx3:Tbx5 double VCS knockout, we observed repression of fast VCS markers and also repression of Pan-CCS markers transcribed throughout the entire CCS." The term 'repression' has a specific connotation with transcription regulators that is likely not intended in this context so perhaps 'reduced expression' would be better here?

      We agree with Reviewer #2 and have replaced “repression” with “reduced expression” throughout the text (the revised sentence is quoted below).

      “In the Tbx3:Tbx5 double VCS knockout, we observed a reduction in the expression of both fast VCS markers and Pan-CCS markers transcribed throughout the entire CCS.”

      (8) Discussion, the authors write, "This study combined with prior literature (1, 7, 11, 15, 26, 53, 54) indicates that the presence of both Tbx3 and Tbx5 is necessary for the specification of the adult VCS (Figure 7)." Since this work presents data from an adult conditional deletion, it's not clear how it informs our understanding of the specification, which occurs during development. Perhaps "maintenance of VCS fate" would be more appropriate here?

      We agree with Reviewer #2 that the term “maintenance of VCS fate” is more appropriate in the context of our study. Accordingly, we have updated the text to reflect this terminology.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 2B: It is hard to see the IF images. What is the cardiac structure studied? Maybe a dashed line and a label to define the region and the structure represented will help. As the authors have described that the crosses used contain a reporter allele (R26-EYFP), a clearer way to show these results would be to include images of the lineage-traced cells with the reporter, not only to identify the CCS structure analyzed, but also to demonstrate that the deletion is specific to the MinK-creERT expression in the CCS.

      We appreciate the Reviewer’s suggestion to improve the clarity of Figure 2B by delineating the cardiac structures analyzed. In response, we have added dashed lines and labels to highlight the regions of interest within the IF images. Unfortunately, we were unable to capture high-quality EYFP fluorescence images for these sections. However, to address this concern, we microdissected the region shown in the IF images and performed FACS to isolate EYFP-positive cells from this specific area. These sorted cells were subsequently used for qPCR analysis, which confirmed the presence of Tbx3 and Tbx5 in control samples and the successful deletion of both genes in the double-conditional knockout samples (Figure 2C, middle panel). We believe this approach provides robust evidence for the specificity of the MinK-CreERT expression in the CCS and the efficiency of gene deletion in the targeted region.

      (2) 3G-K: The authors describe the absence of morphological defects in the tissue sections of adult hearts from the different genotypes analyzed. Although this reviewer agrees that there seem to be no major defects in the general cardiac morphology of these animals, the higher magnification images suggest some tissue differences at the level of the AVN especially in the double HET, double HOMO, and the Tbx3 HOMO. Is that due to the section plane used? If so, more appropriate and comparable sections must be provided. Again, as the crosses used by the authors contain a reporter allele (R26-EYFP), it is required that the authors show that the CCS cells, where deletions are induced, are still present in equivalent areas in the mutants and that they remain in similar numbers only failing to maintain their specification into CCS due to Tbx3 and Tbx5 loss of function.

      This analysis will reinforce the authors' claims on the role of Tbx5/Tbx3 in this process.

      We thank the reviewer for their thorough assessment and thoughtful feedback on our histological analysis. The higher magnification images in Figure 3G-K do not specifically present the AVN. These sections primarily represent areas of the ventricular conduction system (VCS), particularly the His bundle and bundle branches, rather than the AVN itself. We do not believe that the observed morphological differences are related to AVN tissue, and there were no functional deficits attributable to the AVN in the double knockout. Furthermore, the MinK-Cre allele used in this study does not recombine in the AVN proper. We agree that confirming the presence of CCS cells in equivalent regions across different genotypes is crucial. Our approach using FACS-based isolation of EYFP-positive cells from the VCS, followed by qPCR analysis, provides evidence that these cells remain present in double conditional knockouts, although they fail to maintain their specialized gene expression profile. This reinforces our conclusion that Tbx3 and Tbx5 are essential for maintaining the molecular identity of CCS cells, rather than their physical presence.

      (3) Figure 4: The authors performed molecular analysis by qPCR and WB in Tbx5/Tbx3 double mutants to demonstrate that CCS cells lose the expression of CCS genes and express working myocardium genes. Could this be further demonstrated by ISH, HCR, or IF together with lineage tracing to provide evidence that these changes are located where the CCS tissues are in the control embryos? Analysis of 2 or 3 of these markers of each type on tissue sections would be enough.

      We thank the Reviewer for their insightful suggestion regarding additional validation of our molecular findings through ISH, HCR, or IF combined with lineage tracing. However, we would like to clarify that the molecular analyses we performed by qPCR and WB were conducted on EYFP-positive cells that were specifically isolated from the ventricular conduction system (VCS) region of both control and double conditional knockout (dCKO) mice. These EYFP-positive cells were obtained through fluorescence-activated cell sorting (FACS), ensuring that our analyses were confined to the targeted VCS population. Alternate approaches are appropriate for future studies to investigate the precise genomic and molecular nature of the transformation observed in the double knockout.

      (4) Discussion: in the discussion section the authors conclude that the combined role of Tbx5/Tbx3 is critical for the specification of the adult VCS. However, as the Tbx5/Tbx3 loss of function conditions are only induced in adult animals 6 weeks old, would it be more appropriate that their function is the maintenance of the VCS cell fate and that if not present these cells return to the working myocardium fate? If the authors believe that these genes are involved in the induction of VCS specification in adults, then they need to demonstrate that, before the loss of function induction at 6 weeks, these cells are not yet specified as adult VCS.

      We appreciate the Reviewer’s clarification regarding terminology. We agree that our study focuses on adult-specific conditional deletion and thus reflects the maintenance, rather than the specification, of VCS cell fate. Accordingly, we have revised the text to explicitly state that Tbx3 and Tbx5 are critical for maintaining VCS identity in adult mice, and that their loss leads to a shift toward a working myocardial fate.

      Minor:

      (1) There is no consistency in the way the quantitative data is shown in graphs. There are some graphs showing only bars, other dot plots, and other a combination of both. The authors must homogenise the representation of quantitative data showing the different data points in dot plots and not in bar graphs.

      We have standardized the quantitative data presentation across all figures by including individual data points in bar graphs, ensuring enhanced transparency and clarity.

      (2) Figure 3: The labels defining the genotypes corresponding to the different histological sections of adult hearts (Panels G-K) are missing. Panels J and K are not referenced in the text.

      We thank Reviewer #3 for highlighting these omissions. We have added the genotype labels to the histological sections in Panels G-K of Figure 3 to ensure clarity. Furthermore, we have now referenced Panels J and K in the results and in the supplementary material (the revised text is quoted below).

      “Histological examination of all four chambers demonstrated no discernible differences between VCS-specific Tbx3:Tbx5 double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and control (Tbx3<sup>+/+</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) mice, nor between the double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and single-knockout models for either Tbx3 (Tbx3<sup>fl/fl</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) or Tbx5 (Tbx3<sup>+/+</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>). Ventricular muscle appeared normal without hypertrophy or myofibrillar disarray and no fibrosis was present (Figure 3G, 3I, 3J, and 3K, respectively).”

      “Additionally, we confirmed the absence of histological and structural abnormalities in these mice, aligning with previous findings (Figures 3A, 3F versus 3B, and 3K versus 3G, respectively)(1, 11).”

      (3) Typo: Supplementary Figure 6. Tbx3:Tbx3 double-conditional knockout: it should say Tbx5:Tbx3 double-conditional knockout.

      We thank Reviewer #3 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

    1. eLife Assessment

      This manuscript makes important contributions to our understanding of cell polarization dynamics by demonstrating how compensatory regulatory and spatial mechanisms enhance the robustness of polarization patterns. By integrating a computational pipeline with comparisons to experimental data, the authors provide convincing evidence that stability and asymmetry in reaction-diffusion networks are crucial for polarization in C. elegans zygotes. Their findings offer novel insights into essential biological processes such as cell migration, division, and symmetry breaking. Future theoretical and experimental work could refine the model by addressing its acknowledged limitations.

    2. Joint Public Review:

      In this manuscript, the authors aim to evaluate the robustness of stable asymmetric polarization patterns by analyzing both a minimal 2-node network and a more biologically realistic 5-node network based on the C. elegans polarization system. They introduce a computational pipeline for systematically exploring reaction-diffusion network dynamics. Their study highlights the limitations of the widely used 2-node antagonistic network, demonstrating its susceptibility to simple modifications that disrupt polarization. However, they show that polarization stability can be restored by combining multiple regulatory mechanisms, and that spatially varying kinetic parameters can fine-tune the interface position. The authors further investigate the 5-node network of C. elegans, identifying key parameters that enhance its robustness against perturbations. Their findings provide novel insights into the mechanisms that ensure stable polarization in biological systems.

      The major strengths of this work lie in its rigorous computational approach and the clarity of its findings. The authors demonstrate that the widely used 2-node antagonistic network is highly sensitive to parameter changes, requiring precise fine-tuning to maintain stable polarization. However, they show that stability can be restored through compensatory modifications, which expand the range of parameter sets supporting polarization. By further exploring spatial parameter variations, the authors reveal how compensatory adjustments can stabilize polarization patterns, offering insights into potential biological mechanisms regulating interface localization.
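
      To make the 2-node antagonistic network concrete, the following Python sketch simulates a generic one-dimensional mutual-antagonism reaction-diffusion system. It is a toy model under stated assumptions, not the authors' model or pipeline: the rate law (constant supply `k_on`, linear loss `k_off`, quadratic antagonism `k_ant`), all parameter values, and the function name `simulate_two_node` are hypothetical, and the cytoplasmic pool is treated as an unlimited constant source rather than a conserved mass.

```python
def simulate_two_node(n=50, steps=2000, dt=0.01, dx=1.0, d_mem=1.0,
                      k_on=1.0, k_off=1.0, k_ant=10.0):
    """Explicit-Euler sketch of two mutually antagonistic membrane species.

    Assumed (hypothetical) equations:
      dA/dt = D * d2A/dx2 + k_on - k_off * A - k_ant * P**2 * A
      dP/dt = D * d2P/dx2 + k_on - k_off * P - k_ant * A**2 * P
    """
    half = n // 2
    # Pre-polarized start: A on the left half, P on the right half.
    a = [1.0] * half + [0.0] * (n - half)
    p = [0.0] * half + [1.0] * (n - half)

    def laplacian(u, i):
        # No-flux boundaries: mirror the edge value.
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        return (left - 2.0 * u[i] + right) / dx ** 2

    for _ in range(steps):
        new_a = [
            a[i] + dt * (d_mem * laplacian(a, i) + k_on - k_off * a[i]
                         - k_ant * p[i] ** 2 * a[i])
            for i in range(n)
        ]
        new_p = [
            p[i] + dt * (d_mem * laplacian(p, i) + k_on - k_off * p[i]
                         - k_ant * a[i] ** 2 * p[i])
            for i in range(n)
        ]
        a, p = new_a, new_p
    return a, p


# Run with symmetric parameters; A should stay high on the left, P on the right.
a, p = simulate_two_node()
```

      With identical parameters for both species, the interface in this toy model is held in place by symmetry. In bistable systems of this kind, giving the two species unequal rates would generally be expected to make the interface drift, which is one way to picture the fine-tuning sensitivity described in the review.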

      Extending their analysis to the C. elegans polarization network, the authors construct a 5-node model grounded in an extensive literature review. Their computational pipeline identifies key parameters that enhance robustness, and their model successfully replicates experimental observations, even in mutant conditions. Notably, among 34 possible network structures, only the naturally evolved 5-node network with mutual inhibition between specific components maintains stable polarization, highlighting its evolutionary optimization. This work significantly advances our understanding of polarization maintenance and provides a valuable framework for future in silico experiments.

      Despite its strengths, the study has some limitations related to simplifying assumptions. The model neglects cortical flows and the role of actomyosin dynamics, which are known to be crucial during the establishment phase of polarization in the C. elegans zygote. While the authors focus on the maintenance phase, the absence of these biomechanical effects may limit the model's applicability to the full polarization process. Additionally, the assumption of infinitely fast cytoplasmic diffusion disregards potential effects of cytoplasmic flows on the stability of molecular distributions. Experimental measurements suggest that cytoplasmic diffusion coefficients are only an order of magnitude higher than membrane diffusion coefficients, meaning that finite diffusion combined with cytoplasmic flows could influence polarization stability. Although the authors acknowledge and discuss these limitations, incorporating these effects in future models could provide a more complete picture of the polarization dynamics in C. elegans embryos.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This manuscript makes valuable contributions to our understanding of cell polarisation dynamics and its underlying mechanisms. Through the development of a computational pipeline, the authors provide solid evidence that compensatory actions, whether regulatory or spatial, are essential for the robustness of the polarisation pattern. However, a more comprehensive validation against experimental data and a proper estimation of model parameters are required for further characterization and predictions in natural systems, such as the C. elegans embryo.

      We sincerely thank the editor(s) for their pertinent assessment. We have carefully considered the constructive recommendations and made the necessary revisions in the manuscript, which are also detailed in this response letter. We have implemented most of the revisions requested by the reviewers. For the few requests we did not fully accept, we have provided justifications. The corresponding revisions in both the Manuscript and Supplementary Information are highlighted with a yellow background. To provide a more comprehensive validation against experimental data and model parameters used for characterizing and predicting natural systems, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results effectively demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements.

      Joint Public Review:

      The polarisation phenomenon describes how proteins within a signalling network segregate into different spatial domains. This phenomenon holds fundamental importance in biology, contributing to various cellular processes such as cell migration, cell division, and symmetry breaking in embryonic morphogenesis. In this manuscript, the authors assess the robustness of stable asymmetric patterns using both a previously proposed minimal model of a 2-node network and a more realistic 5-node network based on the C. elegans cell polarisation network, which exhibits anterior-posterior asymmetry. They introduce a computational pipeline for numerically exploring the dynamics of a given reaction-diffusion network and evaluate the stability of a polarisation pattern. Typically, the establishment of polarisation requires the mutual inhibition of two groups of proteins, forming a 2-node antagonistic network. Through a reaction-diffusion formulation, the authors initially demonstrate that the widely-used 2-node antagonistic network for creating polarised patterns fails to maintain the polarised pattern in the face of simple modifications. However, the collapsed polarisation can be restored by combining two or more opposing regulations. The position of the interface can be adjusted with spatially varied kinetic parameters. Furthermore, the authors show that the 5-node network utilised by C. elegans is the most stable for maintaining polarisation against parameter changes, identifying key parameters that impact the position of the interface.

      We sincerely thank the editor(s) for the pertinent summary!

      While the results offer novel and insightful perspectives on the network's robustness for cell polarisation, the manuscript lacks comprehensive validation against experimental data, justified node-node network interactions, and proper estimation of model parameters (based on quantitative measurements or molecular intensity distributions). These limitations significantly restrict the utility of the model in making meaningful predictions or advancing our understanding of cell polarisation and pattern formation in natural systems, such as the C. elegans embryo.

      We sincerely thank the editor(s) for the comment!

      To provide a more comprehensive validation against experimental data and model parameters, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These meaningful predictions effectively demonstrate the utility of our model’s network structure and parameters in advancing our understanding of cell polarisation and pattern formation in natural systems, exemplified by the C. elegans embryo.

      We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node network interactions” and the “proper estimation of model parameters (based on quantitative measurements or molecular intensity distributions)”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Some of these interactions have not yet been refined, which could require years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and have discussed it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions in the C. elegans embryo.

      The study extends its significance by examining how cells maintain pattern stability amid spatial parameter variations, which are common in natural systems due to extracellular and intracellular fluctuations. The authors found that in the 2-node network, varying individual parameters spatially disrupt the pattern, but stability is restored with compensatory variations. Additionally, the polarisation interface stabilises around the step transition between parameter values, making its localisation tunable. This suggests a potential biological mechanism where localisation might be regulated through signalling perception.

      We sincerely thank the editor(s) for the pertinent review!

      Focusing on the C. elegans cell polarisation network, the authors propose a 5-node network based on an exhaustive literature review, summarised in a supplementary table. Using their computational pipeline, they identify several parameter sets capable of achieving stable polarisation and claim that their model replicates experimental behaviour, even when simulating mutants. They also found that among 34 possible network structures, the wild-type network with mutual inhibition is the only one that proves viable in the computational pipeline. Compared with previous studies, which typically considered only 2- or 3-node networks, this analysis provides a more complete and realistic picture of the signalling network behind polarisation in the C. elegans embryo. In particular, the model for C. elegans cell polarisation paves the way for further in silico experiments to investigate the role of the network structure over the polarisation dynamics. The authors suggest that the natural 5-node network of C. elegans is optimised for maintaining cell polarisation, demonstrating the elegance of evolution in finding the optimal network structure to achieve certain functions.

      We sincerely thank the editor(s) for the pertinent review!

      Noteworthy limitations are also found in this work. To simplify the model for numerical exploration, the authors assume several reactions have equivalent dynamics, reducing the parameter space to three independent dimensions. While the authors briefly acknowledge this limitation in the "Discussion and Conclusion" section, further analysis might be required to understand the implications. For instance, it is not clear how the results depend on the particular choice of parameters. The authors showed that adding additional regulation might disrupt the polarised pattern, with the conclusion apparently depending on the strength of the regulation. Even for the 5-node wild-type network, which is the most robust, adding a strong enough self-activation of [A], as done in the 2-node network, will probably cause the polarised pattern to collapse as well.

      We sincerely thank the editor(s) for the comment!

      We have now thoroughly expanded our acknowledgment of the model’s limitations in 2. Results and 3. Discussion and conclusion. To rule out the possibility that the equivalent dynamics assumption undermines our conclusions, we have added simulations showing that the stability of the cell polarization pattern does not depend on the exact strength of each regulation, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions under the simplified conditions in which the parameter space is reduced to three independent dimensions.
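
      The Monte Carlo sampling mentioned above can be pictured with a minimal sketch. It is not the authors' pipeline: the log-uniform range, the function names `sample_parameter_set` and `sample_many`, and the dictionary keys are assumptions made purely for illustration. Only the parameter symbols come from the response.

```python
import math
import random

# Parameter symbols named in the response; the keys are illustrative labels.
PARAMETER_NAMES = ["gamma", "alpha", "k1", "k2", "q1", "q2", "Xc"]


def sample_parameter_set(rng, low=1e-2, high=1e2):
    """Draw one value per parameter, log-uniformly in [low, high].

    The range is a placeholder assumption, not a value from the manuscript.
    """
    log_low, log_high = math.log10(low), math.log10(high)
    return {name: 10 ** rng.uniform(log_low, log_high) for name in PARAMETER_NAMES}


def sample_many(n_sets, seed=0):
    """Return n_sets independent parameter sets from a seeded generator."""
    rng = random.Random(seed)
    return [sample_parameter_set(rng) for _ in range(n_sets)]


parameter_sets = sample_many(1000)
```

      In a screen of this kind, each sampled set would then be passed to the reaction-diffusion simulation and kept only if the resulting pattern meets the stability criteria.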

      Additionally, the authors utilise parameter values that are unrealistic, fail to provide units for some of them, and assume unknown parameter values without justification. The model appears to have non-dimensionalised length but not time, resulting in a mix of dimensional and non-dimensional variables that can be confusing. Furthermore, they assume equal values for Hill coefficients and many parameters associated with activation and inhibition pathways, while setting inhibition intensity parameters to 1. These arbitrary choices raise concerns about the fidelity of the proposed model in representing the real system, as their selected values could potentially differ by many orders of magnitude from the actual parameters.

      We sincerely thank the editor(s) for the comment!

      We apologize for the confusion. The non-dimensionalised parameter values are adopted from previous theoretical research [Seirin-Lee et al., Cells, 2020], which in turn draws on the experimental measurements in [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011]. With the in silico time set to 2 sec per step, we have now added a Supplemental Text explaining how the units are removed during non-dimensionalization. This demonstrates that the non-dimensionalized parameters derived in this paper take realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system.
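
      As a rough illustration of how units can drop out once a time step is fixed, the sketch below scales a rate and a diffusion coefficient by the stated 2 sec step. This is one common scaling, not necessarily the one used in the Supplemental Text; the length scale `length_um`, the function names, and the numerical inputs are all placeholder assumptions.

```python
DT_SECONDS = 2.0  # in silico time step stated in the response


def dimensionless_rate(rate_per_second, dt=DT_SECONDS):
    """Scale a rate (1/s) by the time step: k* = k * dt."""
    return rate_per_second * dt


def dimensionless_diffusion(d_um2_per_s, length_um, dt=DT_SECONDS):
    """Scale a diffusion coefficient (um^2/s): D* = D * dt / L^2."""
    return d_um2_per_s * dt / length_um ** 2


# Placeholder inputs, chosen only for illustration (not manuscript values).
example_off_rate = dimensionless_rate(0.005)            # input in 1/s
example_diffusion = dimensionless_diffusion(0.3, 50.0)  # inputs in um^2/s, um
```

      Comparing scaled values of this kind with the dimensionless parameters used in a simulation is one way to check that they fall within the same order of magnitude.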

      The assumption of “equal values for Hill coefficients and many parameters associated with activation and inhibition pathways” serves to reduce the parameter space and keep the computational cost affordable. Fixing Hill coefficients [Seirin-Lee et al., J. Theor. Biol., 2015; Seirin-Lee, Bull. Math. Biol., 2021] and unifying parameter values across pathways are widely used strategies for controlling computational cost in network research on both cell polarization [Marée et al., Bull. Math. Biol., 2006; Goehring et al., Science, 2011; Trong et al., New J. Phys., 2014] and other biological topics (e.g., plasmid transfer in microbial communities [Wang et al., Nat. Commun., 2020]). Nevertheless, to rule out the possibility that the equivalent dynamics assumption undermines our conclusions, we have added simulations showing that the stability of the cell polarization pattern does not depend on the exact parameter values associated with activation and inhibition pathways, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions under the simplified conditions in which the parameter space is reduced to three independent dimensions.

      To confirm the fidelity of the proposed model in representing the real system, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results effectively demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements.

      It is worth noting that, although a strict match between numerical and realistic parameter values with consistent units is always helpful, a lot of notable pure numerical studies successfully unveil principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012]. These studies suggest that numerical analysis in biological systems remains powerful, even when comprehensive experimental data from prior research are not fully available.

      The definition of stability and its evaluation in the proposed pipeline might also be too narrow. Throughout the paper, the authors discuss the stability of the polarised pattern, checked by an exhaustive search of the parameter space where the system reaches a steady state with a polarised pattern instead of a homogeneous pattern. It is not clear if the stability is related to the linear stability analysis of the reaction terms, as conducted in Goehring et al. (Science, 2011), which could indicate if a homogeneous state exists and whether it is stable or unstable. The stability test is performed through a pipeline procedure where they always start from a polarised pattern described by their model and observe how it evolves over time. It is unclear if the conclusions depend on the chosen initial conditions. Particularly, it is unclear what would happen if the initial distribution of posterior molecules is not exactly symmetric with respect to the anterior molecules, or if the initial polarisation is not strong.

      We sincerely thank the editor(s) for the comment!

      The definition of stability and its evaluation in the proposed pipeline consider two criteria: 1. The pattern is polarized; 2. The pattern is stable. The simulations, figures, and videos that follow (Fig. 1-6; Fig. S1-S5; Fig. S7-S9; Movie S1-S5) demonstrate that the chosen parameters and networks capture the cell polarization dynamics well in both the stable and unstable states.
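
      The two criteria can be made concrete with a small sketch that checks them on one-dimensional membrane concentration profiles. It is an illustration under stated assumptions rather than the pipeline's actual test: the function names, the contrast threshold `min_contrast`, the tolerance `tol`, and the synthetic tanh profiles are all hypothetical.

```python
import numpy as np


def is_polarized(anterior, posterior, min_contrast=0.5):
    """Criterion 1: each species dominates its own half of the membrane.

    min_contrast is a hypothetical threshold on the normalised difference.
    """
    half = anterior.size // 2
    scale = max(anterior.max(), posterior.max())
    if scale <= 0:
        return False
    front = (anterior[:half].mean() - posterior[:half].mean()) / scale
    back = (posterior[half:].mean() - anterior[half:].mean()) / scale
    return bool(front > min_contrast and back > min_contrast)


def is_stable(previous, current, tol=1e-6):
    """Criterion 2: the profile no longer changes between two snapshots."""
    return bool(np.max(np.abs(current - previous)) < tol)


def passes_both_criteria(a_prev, p_prev, a_now, p_now):
    """Combine the two criteria for a pair of consecutive snapshots."""
    unchanged = is_stable(a_prev, a_now) and is_stable(p_prev, p_now)
    return is_polarized(a_now, p_now) and unchanged


# Synthetic profiles used only to exercise the checks.
x = np.linspace(-1.0, 1.0, 201)
anterior = 0.5 * (1.0 - np.tanh(x / 0.1))   # high on the anterior side
posterior = 0.5 * (1.0 + np.tanh(x / 0.1))  # high on the posterior side
uniform = np.full_like(x, 0.5)              # homogeneous, unpolarized state
```

      A homogeneous profile fails the first check, and a profile whose interface is still moving between snapshots fails the second, so only a stationary polarized pattern passes both.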

      We have now added new simulations with alternative initial conditions. They demonstrate the necessity of a polarized initial pattern set up independently of the reaction-diffusion network during the establishment phase, probably through additional mechanisms such as active actomyosin contractility and flow [Cuenca et al., Development, 2003; Gross et al., Nat. Phys., 2019]. Our conclusions (i.e., single-sided self-regulation, single-sided additional regulation, and unequal system parameters cause the stable polarized pattern to collapse) have little dependence on the chosen initial conditions as long as the asymmetric initial patterns can set up a stable polarized pattern. Some of the simulations intuitively show that our conclusions still hold if the initial distribution of posterior molecules is not exactly symmetric with respect to the anterior molecules, or if the initial polarisation is not strong (Fig. S4 and Fig. S9).

      Regarding the biological interpretation and relevance of the model, it overlooks some important aspects of the C. elegans polarisation system. The authors focus solely on a reaction-diffusion formulation to reproduce the polarisation pattern. However, the polarisation of the C. elegans zygote consists of two distinct phases: establishment and maintenance, with actomyosin dynamics playing a crucial role in both phases (see Munro et al., Dev Cell 2004; Shivas & Skop, MBoC 2012; Liu et al., Dev Biol 2010; Wang et al., Nat Cell Biol 2017). Both myosin and actin are crucial to maintaining the localisation of PAR proteins during cell polarisation, yet the authors neglect cortical flows during the establishment phase and any effects driven by myosin and actin in their model, failing to capture the system's complexity. How this affects the proposed model and conclusions about the establishment of the polarisation pattern needs careful discussion. Additionally, they assume that diffusion in the cytoplasm is infinitely fast and that cytoplasmic flows do not play any role in cell polarity. Finite cytoplasmic diffusion combined with cytoplasmic flows could compromise the stability of the anterior-posterior molecular distributions. The authors claim that cytoplasmic diffusion coefficients are two orders of magnitude higher than membrane diffusion coefficients, but they seem to differ by only one order of magnitude (Petrášek et al., Biophys. J. 2008). The strength of cytoplasmic flows has been quantified by a few studies, including Cheeks et al., Curr Biol 2004.

      We sincerely thank the editor(s) for the comment!

      Indeed, previous research highlighted the importance of convective cortical flow in orchestrating the localisation of PAR proteins during the establishment phase of polarisation formation [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin, 2008; Goehring, Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. We have now clarified the distinction between the establishment and maintenance phases in 1. Introduction, emphasized our research focus on the reaction-diffusion dynamics during the maintenance phase in 2. Results, and provided a discussion of the omitted actomyosin dynamics to foster a more comprehensive understanding in the future in 3. Discussion and conclusion. The effect of the establishment phase is studied as the initial condition for the cell polarization simulation solely governed by reaction-diffusion dynamics, with new simulations demonstrating the necessity of a polarized initial pattern set up independently of the reaction-diffusion network during the establishment phase, probably through additional mechanisms such as the active actomyosin contractility and flow [Cuenca et al., Development, 2003; Gross et al., Nat. Phys., 2019].

      Cytoplasmic and membrane diffusion coefficients differ by two orders of magnitude according to previous experimental measurements on PAR-2 and PAR-6 [Goehring et al., J. Cell Biol., 2011; Lim et al., Cell Rep., 2021]. Many previous C. elegans cell polarization models have incorporated a mass-conservation model combined with finite cytoplasmic diffusion, but this model description can lead to a reversed spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], contradicting experimental observation [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that the infinite cytoplasmic diffusion, without precise experiment-based parameter assignment or accounting for other hidden biological processes (e.g., protein production and degradation), may be inappropriate in modeling the real spatial concentration distributions distinguished between the cell membrane and cytosol. To address this issue, some theoretical research incorporated protein production and degradation into the model to obtain a consistent spatial concentration distribution between the cell membrane and cytosol [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein diffusion, production, and degradation are essential for providing a more realistic representation of cellular dynamics and enhancing the model's predictive power.

      We have now acknowledged the possibly overlooked aspects of the C. elegans polarisation system and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. Those aspects include, but are not limited to, issues involving “neglect cortical flows” and “diffusion in the cytoplasm is infinitely fast”. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness. The meaningful predictions for five experimental groups and eight perturbed conditions in the C. elegans embryo support the biological interpretation and relevance of the model.

      Although the authors compare their model predictions to experimental observations, particularly in reproducing mutant behaviours, they do not explicitly show or discuss these comparisons in detail. Diffusion coefficients and off-rates for some PAR proteins have been measured (Goehring et al., JCB 2011), but the authors seem to use parameter values that differ by many orders of magnitude, perhaps due to applied scaling. To ensure meaningful predictions, whether their proposed model captures the extensive published data should be evaluated. Various cellular/genetic perturbations have been studied to understand their effects on anterior-posterior boundary positioning. Testing these perturbations' responses in the model would be important. For example, comparing the intensity distribution of PAR-6 and PAR-2 with measurements during the maintenance phase by Goehring et al., JCB 2011, or comparing the normalised intensity of PAR-3 and PKC-3 from the model with those measured by Wang et al., Nat Cell Biol 2017, during establishment and maintenance phases (in both wild-type and cdc-42 (RNAi) zygotes) could provide insightful validation. Additionally, in the presence of active CDC-42, it has been observed that PAR-6 extends further into the posterior side (Aceto et al., Dev Biol 2006). Conducting such validation tests is essential to convince readers that the model accurately represents the actual system and provides insights into pattern formation during cell polarisation.

      We sincerely thank the editor(s) for the comment!

      To provide more comprehensive validations and refinements to ensure the model accurately represents biological systems, we extensively reproduced the qualitative and semi-quantitative phenomena in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total from published data, comprising eight perturbed conditions and using wild-type as the reference. We have also explicitly shown the comparison between model predictions and experimental observations (including the reproduction of mutant behaviors) in detail, by describing how “cell polarization pattern characteristics in simulation” respond to various cellular/genetic perturbations (Section 2.5; Fig. 5; Fig. S7 and S8). The original and new validation tests can convince readers that the model accurately represents the actual system and provides insights into pattern formation during cell polarisation.

      The diffusion coefficients for anterior and posterior molecular species were assigned according to previous experimental and theoretical research [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. The off-rates are assigned uniformly by searching for viable parameter sets that can set up a network with cell polarization pattern stability. We have now added simulations showing that the cell polarization pattern stability and its response to network structure and parameter perturbation do not depend on the exact parameter values (including diffusion coefficients and off-rates), provided the parameter values on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions under the simplified conditions in which the parameter space is reduced to three independent dimensions.

      With the in silico time set to 2 sec per step, we have now added a Supplemental Text explaining how the units are removed during non-dimensionalization. This demonstrates that the non-dimensionalized parameters derived in this paper take realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system. We agree that full experimental measurements of biological information are essential for establishing a more usable model and have discussed this thoroughly in 3. Discussion and conclusion.

      A clear justification, with references, for each network interaction between nodes in the five-node model is needed. Some of the activatory/inhibitory signals proposed by the authors have not been demonstrated ( e.g. CDC-42 directly inhibiting CHIN-1). Table S2 provided by the authors is insufficient to justify each node-node interaction, requiring additional explanations. (See the review by Gubieda et al., Phil. Trans. R. Soc. B 2020, for a similar node network that differs from the authors' model.) Additionally, the intensity distributions of cortical PAR-3 and PKC-3 seem to vary significantly during both establishment and maintenance phases (Wang et al., Nat Cell Biol 2017), yet the authors consider the PAR-3/PAR-6/PKC-3 as a single complex. The choices in the model should be justified, as the presence or absence of clustering of these PAR proteins can be crucial during cell polarisation (Wang et al., Nat Cell Biol 2017; Dawes & Munro, Biophys J 2011).

      We sincerely thank the editor(s) for the comment!

      We have now acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “each network interaction between nodes” and “consider the PAR-3/PAR-6/PKC-3 as a single complex”, the former of which relies on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Some of these interactions have not yet been refined, which could require years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and have discussed it thoroughly in 3. Discussion and conclusion.

      Consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each individual molecular species. We agree that a more comprehensive model, capable of resolving the individual localization patterns of these anterior PAR proteins, would be a valuable future direction. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions in the C. elegans embryo.

      In summary, the authors successfully demonstrate the importance of compensatory actions in maintaining polarisation robustness. Their computational pipeline offers valuable insights into the dynamics of reaction-diffusion networks. However, the lack of detailed experimental validation and realistic parameter estimation limits the model's applicability to real biological systems. While the study provides a solid foundation, further work is needed to fully characterise and validate the model in natural contexts. This work has the potential to significantly impact the field by providing a new perspective on the robustness of cell polarisation networks.

      We sincerely thank the editor(s) for the pertinent summary!

      To provide a more comprehensive validation against experimental data and model parameters, three more groups of the qualitative and semi-quantitative phenomenon regarding CDC-42 are reproduced based on previously published experiments (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total, comprising eight perturbed conditions and using wild-type as the reference.

      With the in silico time set as 2 sec per step, we have now added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameters in this paper achieve realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system. Together with the reproduction of five experimental groups (eight perturbed conditions with wild-type as the reference), the model’s applicability to real biological systems in natural contexts is fully characterised and validated.

      The computational pipeline developed could be a valuable tool for further in silico experiments, allowing researchers to explore the dynamics of more complex networks. To maximise its utility, the model needs comprehensive validation and refinement to ensure it accurately represents biological systems. Addressing these limitations, particularly the need for more detailed experimental validation and realistic parameter choices, will enhance the model's predictive power and its applicability to understanding cell polarisation in natural systems.

      We sincerely thank the editor(s) for the comment!

      To provide more comprehensive validations and refinements to ensure the model accurately represents biological systems, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total from published data, comprising eight perturbed conditions and using wild-type as the reference. We have also explicitly shown the comparison between model predictions and experimental observations (including the reproduction of mutant behaviors) in detail, by describing how “cell polarization pattern characteristics in simulation” respond to various cellular/genetic perturbations (Section 2.5; Fig. 5; Fig. S7 and S8).

      With the in silico time set as 2 sec per step, now we have added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameter in this paper achieves realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system. Together with the reproduction of five experimental groups (eight perturbed conditions with wild-type as the reference), the model's predictive power and its applicability to understanding cell polarisation in natural systems are enhanced.

      We have now added simulations showing that the stability of the cell polarization pattern and its response to network structure and parameter perturbation do not depend on the exact parameter values (incl. diffusion coefficients, basal off-rates and inhibition intensity), provided the parameter values on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      Recommendations for the Authors:

      (1) Parameterisation and Model Validation: The authors utilise parameter values that lack realism and fail to provide units for some of them, which can lead to confusion. For instance, the length of the cell is set to 0.5 without clear justification, raising questions about the scale used. Additionally, there's a mix of dimensional and non-dimensional variables, potentially complicating interpretation. Furthermore, arbitrary choices such as equal Hill coefficients and setting inhibition intensity parameters to 1 raise concerns about model fidelity. To ensure meaningful predictions, the authors should validate their model against extensive published data, including cellular/genetic perturbations. For example, comparing intensity distributions of PAR proteins measured during maintenance phases by Goehring et al., JCB 2011, and those obtained from the model could provide valuable validation. Similarly, comparisons with data from Wang et al., Nat Cell Biol 2017, on wild-type and cdc-42 (RNAi) zygotes, as well as observations from Aceto et al., Dev Biol 2006, on PAR-6 extension in the presence of active CDC-42, would strengthen the model's validity. Such validation tests are essential for convincing readers that the model accurately represents the actual system and can provide insights into pattern formation during cell polarisation.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      We have now added a new section, Parameter Nondimensionalization and Order of Magnitude Consistency, to the Supplemental Text. In this section, we describe how we adopted the parameter nondimensionalization and value assignments from previous works [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. We list four examples (i.e., evolution time, membrane diffusion coefficient, basal off-rate, and inhibition intensity) to show the consistency in order of magnitude between numerical and realistic values.

      The assumption of “equal Hill coefficients” reduces the parameter space to keep the computational cost affordable. Fixing Hill coefficients is a widely used strategy in network research [Seirin-Lee et al., J. Theor. Biol., 2015; Seirin-Lee, Bull. Math. Biol., 2021]. Besides, setting the inhibition intensity parameters to 1 serves to determine a numerical scale. We have now added simulations showing that the cell polarization pattern stability does not depend on the exact parameter values associated with activation and inhibition pathways, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) for all nodes and regulations in the simple 2-node network and the C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      To confirm the fidelity of the proposed model in representing the real system, we reproduced the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements.

      It is worth noting that, although a strict match between numerical and realistic parameter values with consistent units is always helpful, many notable purely numerical studies have successfully unveiled principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012]. These studies suggest that numerical analysis of biological systems remains powerful, even when comprehensive experimental data from prior research are not fully available.

      (2) Parameter Changes: It is not clear how the parameters change as more complicated networks are explored, and how this affects the comparison between the simple and complete model. Clarification on this point would be beneficial.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      The computational pipeline in Section 2.1 is generalized for all reaction-diffusion networks, including the simple and complete ones studied in this paper. The parameter changes comprise two parts: 1. The mutual activation in the anterior (none for the simple 2-node network and q<sub>2</sub> for the complete 5-node network); 2. The viable parameter sets (122 sets for the simple 2-node network and 602 sets for the complete 5-node network). Now we have explicitly clarified those differences:

      Those differences do not affect the comparison between the simple and complete models. We have now added comprehensive comparisons between the simple and complete models regarding: 1. How they respond consistently to alternative initial conditions (Fig. S2). 2. How they respond consistently to alternative single modifications (Fig. S4 and S9), even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) are assigned various values for all nodes and regulations (Fig. S5).

      (3) Exploration of Model Parameter Space: In the two-node dual antagonistic model, the authors observe that the cell polarisation pattern is unstable for different systems (Fig. 1). However, it remains uncertain whether this instability holds true for the entire model parameter space. Have the authors thoroughly screened the full model parameter space to support their statements? It would be beneficial for the authors to provide clarification on the extent of their exploration of the model parameter space to ensure the robustness of their conclusions.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      The trade-off between the parameter space considered and the computational cost is a long-standing challenge in network studies, as there are always numerous combinations of network nodes, edges, and parameters [Ma et al., Cell, 2009; Chau et al., Cell, 2012]. The computational pipeline in Section 2.1, generalized for all reaction-diffusion networks, uses two strategies to limit the computational cost and set up a basic network reference: 1. Dimension Reduction (Strategy 1): unifying the parameter values for different nodes and different edges within the same regulatory type, which reduces the number of independent parameters to 3; 2. Parameter Space Confinement (Strategy 2): enumerating the dimensionless parameter set on a three-dimensional (3D) grid confined by γ∈[0,0.05] in steps ∆γ = 0.001, k<sub>1</sub>∈[0,5] in steps ∆k<sub>1</sub> = 0.05,  and  in steps .
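
      The grid enumeration in Strategy 2 can be illustrated with a minimal sketch. It builds only the two axes whose ranges and step sizes are stated above (γ and k<sub>1</sub>); the third axis is passed in as a placeholder argument, and the helper names `axis` and `enumerate_grid` are hypothetical, not taken from the authors' pipeline.

```python
from itertools import product


def axis(start, stop, step):
    """Evenly spaced grid values from start to stop inclusive.

    Values are built from integer indices (start + i * step) so that
    floating-point error does not accumulate along the axis.
    """
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


# The two axes specified in the text.
gamma_axis = axis(0.0, 0.05, 0.001)   # gamma in [0, 0.05], step 0.001
k1_axis = axis(0.0, 5.0, 0.05)        # k1 in [0, 5], step 0.05


def enumerate_grid(third_axis):
    """Yield every (gamma, k1, third) combination on the 3D grid.

    `third_axis` is a placeholder: its range and step must be supplied
    by the caller.
    """
    return product(gamma_axis, k1_axis, third_axis)
```

      With these step sizes the γ axis has 51 values and the k<sub>1</sub> axis has 101, so every value on the third axis adds 5,151 candidate parameter sets to be simulated.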

      In the early stage of our project, we tried to explore “the entire model parameter space” as indicated by the reviewer. We first used the Monte Carlo method to search for parameter solutions in an open parameter space, with all parameter values allowed to differ. However, such a process is highly random and computationally expensive: after months of searching for viable parameter sets, we were still unable to profile the continuous viable parameter space, because the probability of finding a viable parameter set is no higher than 0.02%. We can now clearly see that the viable parameter space is a thin curved surface on which all parameters have to satisfy a critical balance (Fig. 3a, b, Fig. 5e, f). This is why we adopt a typical dimension-reduction strategy used in network research on both cell polarization [Marée et al., Bull. Math. Biol., 2006; Goehring et al., Science, 2011; Trong et al., New J. Phys., 2014] and other biological topics (e.g., plasmid transfer in microbial communities [Wang et al., Nat. Commun., 2020]), i.e., unifying the parameter values for different nodes and different edges within the same regulatory type.
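
      The cost of the random search can be made concrete with a back-of-the-envelope estimate. The sketch below assumes independent draws, each succeeding with the same probability; the function name `expected_draws` is hypothetical, and the only number taken from the text is the 0.02% upper bound.

```python
def expected_draws(n_viable_sets, hit_probability):
    """Expected number of independent random draws needed to collect
    n_viable_sets viable parameter sets when each draw is viable with
    probability hit_probability (mean of a negative binomial)."""
    if not 0.0 < hit_probability <= 1.0:
        raise ValueError("hit_probability must be in (0, 1]")
    return n_viable_sets / hit_probability


# Upper bound on the hit probability quoted in the text: 0.02% = 2e-4.
P_HIT_UPPER_BOUND = 0.0002

# Because 0.02% is an upper bound, this is a lower bound on the cost:
# at least about 5,000 full simulations per viable set found.
draws_per_hit = expected_draws(1, P_HIT_UPPER_BOUND)
```

      Each draw is a full reaction-diffusion simulation, which is why a random search that must also trace out a thin surface in parameter space quickly becomes impractical.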

      Additionally, because the curved surface of the viable parameter space can be extended indefinitely as long as the parameter balance is achieved (Fig. 3a, b, Fig. 5e, f), it is neither possible nor necessary to explore “the entire model parameter space”. Setting up a confined parameter region near the origin for parameter enumeration helps profile the continuous viable parameter space, which is sufficient for presenting the central conclusion of this paper: the network structure and parameters need to satisfy a balance for stable cell polarization.

      To support a comprehensive study considering all kinds of reference and perturbed networks, we have maximized the parameter domain size by exhausting all the computational resources we can access, including 400-500 Intel(R) Core(TM) E5-2670v2 and Gold 6132 CPUs on the server (High-Performance Computing Platform at Peking University) and 5 Intel(R) Core(TM) i9-14900HX CPUs on personal computers.

      To verify that the instability holds when the model parameter space is extended, we have added a comprehensive comparison between the simple and complete models showing that their instability occurs consistently even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) are assigned various values for all nodes and regulations, searched by the Monte Carlo method (Fig. S5).

      (4) Sensitivity of Numerical Solutions to Initial Conditions: Are the numerical solutions in both models sensitive to the chosen initial condition? What results do the models provide if uniform initial distributions were utilised instead?

      We sincerely thank the editor(s) and referee(s) for the comments!

      To investigate both the simple network and the realistic network consisting of various node numbers and regulatory pathways [Goehring et al., Science, 2011; Lang et al., Development, 2017], we propose a computational pipeline for numerical exploration of the dynamics of a given reaction-diffusion network, specifically targeting the maintenance phase of stable cell polarization after its initial establishment [Motegi et al., Nat. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020].

      Now we have added new simulations and explanations for the sensitivity of numerical solutions to initial conditions. For both models, a uniform initial distribution leads to a homogeneous pattern while a Gaussian noise distribution leads to a multipolar pattern. In contrast, an initial polarized distribution (even with shifts in transition planes, weak polarization, or asymmetric curve shapes between the two molecular species) can maintain cell polarization reliably.

      (5) Initial Conditions and Stability Tests: In Figure 1, the authors discuss the stability of the basic two-node network (a) upon modifications in (b-d). The stability test is performed through a pipeline procedure in which they always start from a polarised pattern described by Equation (4) and observe how the pattern evolves over time. It would be beneficial to explore whether the stability test depends on this specific initial condition. For instance, what would happen if the posterior molecules have an initial distribution of 1/(1+e^(-10x)), which is not exactly symmetric with respect to the anterior molecules' distribution of 1-1/(1+e^(-20x))? Additionally, if the initial polarisation is not as strong, for example, with the anterior molecules having a distribution of 10-1/(1+e^(-20x)) and the posterior molecules having a distribution of 9+1/(1+e^(-20x)), how would this affect the results?

      We sincerely thank the editor(s) and referee(s) for the constructive advice!

      We have now added comprehensive comparisons between the simple and complete models showing that they respond consistently to alternative initial conditions (Fig. S4, Fig. S9). A successful cell polarization pattern requires an initial polarized pattern, but its subsequent stability and response to perturbation depend very little on the specific form of that initial pattern. All the conditions mentioned by the reviewer have been included.
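
      For concreteness, the alternative initial profiles proposed in comment (5) can be written out explicitly. This is a minimal sketch: the four formulas are those quoted by the reviewer, while the function names, the contrast measure, and the evaluation points x = ±0.5 are assumptions made for illustration only.

```python
import math


def logistic(x, steepness):
    """Logistic step 1 / (1 + e^(-steepness * x)), rising from 0 to 1 across x = 0."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


# Initial profiles quoted in reviewer comment (5).
def anterior_reference(x):
    return 1.0 - logistic(x, 20)       # 1 - 1/(1+e^(-20x))


def posterior_asymmetric(x):
    return logistic(x, 10)             # 1/(1+e^(-10x)), shallower interface


def anterior_weak(x):
    return 10.0 - logistic(x, 20)      # 10 - 1/(1+e^(-20x))


def posterior_weak(x):
    return 9.0 + logistic(x, 20)       # 9 + 1/(1+e^(-20x))


def polarization_contrast(profile, x_left=-0.5, x_right=0.5):
    """Hypothetical summary measure: (high - low) / (high + low) between
    the two ends of an assumed domain [x_left, x_right]."""
    a, b = profile(x_left), profile(x_right)
    high, low = max(a, b), min(a, b)
    return (high - low) / (high + low)
```

      Under these assumptions the reference anterior profile has a contrast close to 1, whereas the weakly polarized profile has a contrast of roughly 0.05, because its gradient of height 1 sits on a uniform background of 9. In every case the interface (the point where the logistic term equals 0.5) stays at x = 0.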

      (6) Stability Analysis: Throughout the paper, the authors discuss the stability of the polarised pattern. The stability is checked by an exhaustive search of the parameter space, ensuring the system reaches a steady state with a polarised pattern instead of a homogeneous pattern. It would be beneficial to explore if this stability is related to a linear stability analysis of the model parameters, similar to what was conducted in Reference [18], which can determine if a homogeneous state exists and whether it is stable or unstable. Including such an analysis could provide deeper insights into the system's stability and validate its robustness.

      We sincerely thank the editor(s) and referee(s) for the comments!

      We agree that linear stability analysis can potentially offer additional insights into polarized pattern behavior. However, this approach often requires the aid of numerical solutions and is therefore not entirely independent [Goehring et al., Science, 2011]. Over the past decade, numerical simulations have consistently proven to be a reliable and sufficient approach for studying network dynamics, spanning from C. elegans cell polarization [Tostevin et al., Biophys. J, 2008; Blanchoud et al., Biophys. J, 2015; Seirin-Lee, Dev. Growth Differ., 2020] to topics in metazoans [Chau et al., Cell, 2012; Qiao et al., eLife, 2022; Sokolowski et al., arXiv, 2023]. Numerous purely numerical studies have successfully unveiled principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012], independent of additional mathematical analysis. Thus, we leverage our numerical framework to address the cell polarization problems in this paper.
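
      For reference, the linear stability analysis that the reviewer refers to has the following generic textbook form for a one-dimensional reaction-diffusion system. This is a sketch under stated assumptions, not the model equations of this paper: the state vector u, reaction term f, and diagonal diffusion matrix D are placeholders.

```latex
\begin{align}
  \partial_t \mathbf{u} &= D\,\partial_x^2 \mathbf{u} + \mathbf{f}(\mathbf{u}),
    & \mathbf{f}(\mathbf{u}^*) &= \mathbf{0}
    \quad \text{(homogeneous steady state } \mathbf{u}^* \text{)} \\
  \mathbf{u}(x,t) &= \mathbf{u}^* + \delta\mathbf{u}\, e^{\sigma t + i k x}
    & \Rightarrow \quad \sigma\,\delta\mathbf{u} &= \left(J - k^2 D\right)\delta\mathbf{u},
    \qquad J = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\right|_{\mathbf{u}^*}
\end{align}
```

      The homogeneous state is linearly stable if every eigenvalue σ(k) of J − k²D has a negative real part for all admissible wavenumbers k, and unstable otherwise. Locating u* and evaluating J for a nonlinear network generally requires numerical root-finding, which is the dependence on numerical solutions noted above.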

      To confirm the reliability of the stability checked by an exhaustive search of the parameter space, we now reproduce the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we reproduce five experimental groups in total (two acting on LGL-1 and three acting on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      To confirm the robustness of our conclusions regarding the system's stability, we now add comprehensive comparisons between the simple and complete models regarding: 1. How they respond consistently to alternative initial conditions (Fig. S4; Fig. S9). 2. How they respond consistently to alternative single modifications, even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>]) are assigned various values for all nodes and regulations (Fig. S5).

      (7) Interface Position Determination: In Figure 4, the authors demonstrate that by using a spatially varied parameter, the position of the interface can be tuned. Particularly, the interface is almost located at the step where the parameter has a sharp jump. However, in the case of a homogeneous parameter (e.g., Figure 4(a)), the system also reaches a stable polarised pattern with the interface located in the middle (x = 0), similar to Figure 4(b), even though the homogeneous parameter does not contain any positional information of the interface. It would be helpful to clarify the difference between Figure 4(a) and Figure 4(b) in terms of the interface position determination.

      We sincerely thank the editor(s) and referee(s) for the comments!

      The case of a homogeneous parameter (e.g., Fig. 4a), in which the system also reaches a stable polarised pattern with the interface located in the middle (x = 0), is just a reference adopted from Fig. 1a to show that the inhomogeneous positional information in Fig. 4b can achieve a similar stable polarised pattern.

      We now clarify the interface position determination in Section 2.4 to improve readability. Moreover, the interface is marked with a grey dashed line in all the patterns in Fig. 4 and Fig. 6 to highlight the importance of inhomogeneous parameters for interface localization.

      (8) Presented Comparison with Experimental Observations: The comparison with experimental observations lacks clarity. It isn't clear that the model "faithfully recapitulates" the experimental observations (lines 369-370). We recommend discussing and showing these comparisons more carefully, highlighting the expectations and similarities.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we remove the word “faithfully” and highlight the expectations and similarities of each experimental group by describing “cell polarization pattern characteristics in simulation: …”.

      (9) Validation of Model with Experimental Data: Given the extensive number of model parameters and the uncertainty of their values, it is essential for the authors to validate their model by comparing their results with experimental data. While C. elegans polarisation has been extensively studied, the authors have yet to utilise existing data for parameter estimation and model validation. Doing so would considerably strengthen their study.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      To utilise existing data for parameter estimation, we now add a new section, Parameter Nondimensionalization and Order of Magnitude Consistency, to the Supplemental Text. In this section, we describe how we adopted the parameter nondimensionalization and value assignments from previous works [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. We list four examples (i.e., evolution time, membrane diffusion coefficient, basal off-rate, and inhibition intensity) to show the consistency in order of magnitude between numerical and realistic values.

      To utilise existing data for model validation, we now reproduce the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we reproduce five experimental groups in total (two acting on LGL-1 and three acting on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      We have also acknowledged the limitations of the current cell polarization model and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving the “extensive number of model parameters” and the “uncertainty of their values”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time are not fully available from prior research. Some of these interactions have not yet been characterised in detail, and doing so could require years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more utilizable model and have discussed it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions with wild-type as a reference in the C. elegans embryo.

      (10) Enhancing Model Accuracy by Considering Cortical Flows: The authors are encouraged to include cortical flows in their cell polarisation model, as these flows are known to be pivotal in the process. Although the current model successfully predicts cell polarisation without accounting for cortical flows, research has demonstrated their significant role in polarisation formation. By incorporating cortical flows, the model would provide a more thorough and precise representation of the biological process. Furthermore, previous studies, such as those by Goehring et al. (References 17 and 18), highlight the importance of convective actin flow in initiating polarisation. It would be valuable for the authors to address the contribution of convection with actin flow to the establishment of the polarisation pattern. The polarisation of the C. elegans zygote progresses through two distinct phases: establishment and maintenance, both heavily influenced by actomyosin dynamics. Works by Munro et al. (Dev Cell 2004), Shivas & Skop (MBoC 2012), Liu et al. (Dev. Biol. 2010), and Wang et al. (Nat Cell Biol 2017) underscore the critical roles of myosin and actin in orchestrating the localisation of PAR proteins during cell polarisation. To enhance the fidelity of their model, we recommend that the authors either integrate cortical flows and consider the effects driven by myosin and actin, or provide a discussion on the repercussions of omitting these dynamics.

      We sincerely thank the editor(s) and referee(s) for the comment!

      Indeed, previous research highlighted the importance of convective cortical flow in orchestrating the localisation of PAR proteins during the establishment phase of polarisation formation [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin et al., Biophys J, 2008; Goehring et al, Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. Now we clarify the distinction between the establishment and maintenance phases, emphasize our research focus on the reaction-diffusion dynamics during the maintenance phase, and provide a discussion of these omitted dynamics to foster a more comprehensive understanding in the future, as suggested.

      (11) Further Justification of Network Interactions: The authors should provide additional explanations, supported by empirical evidence, for the network interactions assumed in their model. This includes both node-node interactions and the rationale behind protein complex formations. Some of the proposed interactions lack empirical validation, as noted in studies such as Gubieda et al., Phil. Trans. R. Soc. B 2020. Additionally, discrepancies in protein intensity distributions, as observed in Wang et al., Nat Cell Biol 2017, should be addressed, particularly concerning the consideration of the PAR-3/PAR-6/PKC-3 complex as a single entity. Justifying these choices is crucial for ensuring the model's credibility and alignment with experimental findings.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each individual molecular species.

      We have now acknowledged the limitations of the current cell polarization model and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node interactions” and “discrepancies in protein intensity distributions”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time are not fully available from prior research. Some of these interactions have not yet been characterised in detail, and doing so could require years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more utilizable model and have discussed it thoroughly in 3. Discussion and conclusion.

      To ensure the model's credibility and alignment with experimental findings, we now reproduce the qualitative and semi-quantitative phenomena in three more previously published experimental groups (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we have now reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      (12) Further Justification of Node-Node Network Interactions: The authors should provide further justification for the node-node network interactions assumed in their study. To the best of our knowledge, some of the node-node interactions proposed have not yet been empirically demonstrated. Providing additional explanations for these interactions would enhance the credibility of the model and ensure its alignment with empirical evidence.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      We have now acknowledged the limitations of the current cell polarization model and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node network interactions”, which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time are not fully available from prior research. Some of these interactions have not yet been characterised in detail, and doing so could require years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more utilizable model and have discussed it thoroughly in 3. Discussion and conclusion.

      To enhance the credibility of the model and ensure its alignment with empirical evidence, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      (13) Justification for Network Interactions and Protein Complexes: The authors must provide clear justifications, supported by references, for each network interaction between nodes in the five-node model. Some of the activatory/inhibitory signals proposed lack empirical validation, such as CDC-42 directly inhibiting CHIN-1. The provided Table S2 is insufficient to justify these interactions, necessitating additional explanations. Reviewing relevant literature, such as the work by Gubieda et al., Phil. Trans. R. Soc. B 2020, may offer insights into similar node networks. Furthermore, the authors should address discrepancies in protein intensity distributions, as observed in studies like Wang et al., Nat Cell Biol 2017. Specifically, the authors consider the PAR-3/PAR-6/PKC-3 complex as a single entity despite potential differences in their distributions. Justification for this choice is essential, particularly considering the importance of clustering dynamics during cell polarisation, as demonstrated by Wang et al., Nat Cell Biol 2017, and Dawes & Munro, Biophys J 2011.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of the individual molecular species. In addition, the inhibition of CHIN-1 by CDC-42, which recruits cytoplasmic PAR-6/PKC-3 to form a complex, may act indirectly to restrict CHIN-1 localization through phosphorylation [Sailer et al., Dev. Cell, 2015; Lang et al., Development, 2017].

      We now acknowledge the limitations of the current cell polarization model and provide, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “each network interaction between nodes in the five-node model” and “discrepancies in protein intensity distributions”, both of which rely on experimental measurements of biological information. However, comprehensive experimental data on every molecular species, their interactions, and each species’ intensity distribution in space and time are not fully available from prior research. Some of these interactions have not yet been refined, and refining them may require years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agree that such information is essential for establishing a more usable model and discuss it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions with wild-type as a reference in the C. elegans embryo.

      (14) Incorporating Cytoplasmic Dynamics into the Model: The authors assume infinite cytoplasmic diffusion and neglect the role of cytoplasmic flows in cell polarity, which may oversimplify the model. Finite cytoplasmic diffusion combined with flows could potentially compromise the stability of anterior-posterior molecular distributions, affecting the accuracy of the model's predictions. The authors claim a significant difference between cytoplasmic and membrane diffusion coefficients, but the actual disparity seems smaller based on data from Petrášek et al., Biophys. J. 2008. For example, cytosolic diffusion coefficients for NMY-2 and PAR-2 differ by less than one order of magnitude. Additionally, the strength of cytoplasmic flows, as quantified by studies such as Cheeks et al., Curr Biol 2004, should be considered when assessing the impact of cytoplasmic dynamics on polarity stability. Incorporating finite cytoplasmic diffusion and cytoplasmic flows into the model could provide a more realistic representation of cellular dynamics and enhance the model's predictive power.

      We sincerely thank the editor(s) and referee(s) for the comment!

      Cytoplasmic and membrane diffusion coefficients differ by two orders of magnitude according to previous experimental measurements on PAR-2 and PAR-6 [Goehring et al., J. Cell Biol., 2011; Lim et al., Cell Rep., 2021]. Many previous C. elegans cell polarization models have combined a mass-conservation model with finite cytoplasmic diffusion, but this model description can lead to a reversed spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], contradicting experimental observations [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that finite cytoplasmic diffusion, without precise experiment-based parameter assignment or accounting for other hidden biological processes (e.g., protein production and degradation), may be inappropriate for modeling the real spatial concentration distributions, which differ between the cell membrane and cytosol. To address this issue, some theoretical studies incorporated protein production and degradation into their models to obtain consistent spatial concentration distributions between the cell membrane and cytosol [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein diffusion, production, and degradation are essential for providing a more realistic representation of cellular dynamics and enhancing the model's predictive power.

      Cytoplasmic flows indeed play a non-negligible role in cell polarity during the establishment phase [Kravtsova et al., Bull. Math. Biol., 2014], creating a spatial gradient of actomyosin contractility and directing PAR-3/PKC-3/PAR-6 to the anterior membrane by cortical flow [Rose et al., WormBook, 2014; Lang et al., Development, 2017]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Geßele et al., Nat. Commun., 2020]. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin, 2008; Goehring, Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. We now emphasize that our research focuses on the reaction-diffusion dynamics during the maintenance phase, so the dynamics between NMY-2 and PAR-2 are not included. We have also provided a discussion of the simplified cytoplasmic diffusion and omitted cytoplasmic flows to foster a more comprehensive understanding in the future.

      (15) Explanation of Lethality References: On page 13, the authors mention lethality without adequately explaining why they are drawing connections with lethality experimental data.

      We sincerely thank the editor(s) and referee(s) for the comment!

      It is well known that loss of cell polarity in the C. elegans zygote leads to symmetric cell division, which results in a more symmetric allocation of molecular-to-cellular contents between daughter cells; this in turn causes abnormal cell size, cell cycle length, and cell fate in the daughter cells, followed by embryonic lethality [Beatty et al., Development, 2010; Beatty et al., Development, 2013; Rodriguez et al., Dev. Cell, 2017; Jankele et al., eLife, 2021]. We now explain in Section 2.5 why we draw connections with the experimental lethality data.

      (16) Improved Abstract: "...However, polarity can be restored through a combination of two modifications that have opposing effects..." This sentence could be revised for better clarity. For example, the authors could consider rephrasing it as follows: "...However, polarity restoration can be achieved by combining two modifications with opposing effects...".

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we revise the abstract as follows:

      “Abstract – However, polarity restoration can be achieved by combining two modifications with opposing effects.”

      (17) Conservation of Mass in Network Models: Is conservation of mass satisfied in their network models?

      We sincerely thank the editor(s) and referee(s) for the comment!

      While previous experiments provide evidence for near-constant protein mass during the establishment phase [Goehring et al., Science, 2011], whether this is consistent until the end of maintenance is unclear.

      Many previous C. elegans cell polarization models have assumed mass conservation on the cell membrane and in the cell cytosol; this model description can lead to a reversed spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], contradicting experimental observations [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that mass conservation may be inappropriate for modeling the real spatial concentration distributions, which differ between the cell membrane and cytosol. To address this issue, some theoretical studies incorporated protein production and degradation into their models instead of assuming mass conservation [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein mass are essential for constructing a more accurate model.

      Given the absence of a universally accepted model in agreement with experimental observation, we adopted the assumption that the concentration of molecules in the cytosol (not the total mass on the cell membrane and in the cell cytosol) is spatially homogeneous and temporally constant, an assumption that has also been used previously [Kravtsova et al., Bull. Math. Biol., 2014]. In the context of this well-mixed constant cytoplasmic concentration, our model successfully reproduced the cell polarization phenotype in wild-type and eight perturbed conditions (Section 2.5; Fig. S7; Fig. S8), supporting the validity of this simplified, yet effective, model. We have now provided a discussion of the protein mass assumption to foster a more comprehensive understanding in the future.

      (18) Comparison of Network Structures: In Figure 1c, the authors demonstrate that the symmetric two-node network is susceptible to single-sided additional regulation. They considered four subtypes of modifications, depending on whether [L] is in the anterior or posterior and whether [A] and [L] are mutually activating or inhibiting. What is the difference between the structure where [L] is in the anterior and in the posterior? Upon comparing the time evolution of the left panel ([L] is sided with [P]) and the right panel ([L] is sided with [A]), the difference is so tiny that they are almost indistinguishable. It might be beneficial for the authors to provide a clearer explanation of the differences between these network structures to aid in understanding their implications.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      The difference between the structures where [L] is in the anterior and posterior is the initial spatial concentration distribution of [L], which is polarized to have a higher concentration in the anterior and posterior respectively. The time evolution of the left panel ([L] is sided with [P]) and the right panel ([L] is sided with [A]) is almost indistinguishable because the perturbation from [L] is slight (less than over one order of magnitude) compared to the predominant [A]~[P] interaction ( for [A]~[P] mutual inhibition while for [A]~[L] mutual inhibition and for [A]~[L] mutual activation), highlighting the response of the cell polarization pattern. To aid readers in understanding the implications, we have added [L] and plotted the spatial concentration distribution of all three molecular species at t = 0, 100, 200, 300, 400, and 500 in Fig. S3, where the differences between the [L] distributions in the left and right panels are clearly visible.

      (19) Figure Reference: In line 308, Fig. 4a is referenced when explaining the loss of pattern stability by modifying an individual parameter, but this is not shown in that panel. Please update the panel or adjust the reference in the main text.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Fig. 4 focuses on the regulatable shift of the zero-velocity interface by modifying a pair of individual parameters, not on the loss (or recovery) of pattern stability, which has been analyzed as a focus in Fig. 1, Fig. 2, and Fig. 3. Fig. 4a is actually from the same simulation as the one in Fig. 1a, which has spatially uniform parameters used as a reference in Fig. 4. The individual parameter modification in other subfigures of Fig. 4 shows how the zero-velocity interface is shifted in a regulatable manner always in the context of pattern stability. Now we update the panel, adjust the reference, add one more paragraph, and improve the wording to clarify how the analyses in Fig. 4 are carried out on top of the pattern stability already studied.

      (20) Viable Parameter Sets: In line 355, the number of viable parameter sets (602) is not very informative by itself. We suggest reporting the fraction or percentage of sets tested that resulted in viable results instead. This applies similarly to lines 411 and 468.

      We sincerely thank the editor(s) and referee(s) for the constructive comment!

      Now the fraction/percentage of parameter sets tested that resulted in viable results are added everywhere the number appears.

      (21) Perturbation Experiments: In lines 358-359, "the perturbation experiments" implies that those considered are the only possible ones. Please rephrase to clarify.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we rephrase three paragraphs to clarify why the perturbation experiments involved with [L] and [C] are considered instead of other possible ones.

      (22) Figure 2S: This figure is unclear. The caption states that panel (a) shows the "final concentration distribution," but only a line is shown. If "distribution" refers to spatial distribution, please clarify which parameters are shown.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we clarify the “spatial concentration distribution” and which parameters are shown in the figure caption.

      (23) Figure 5 and 6 Captions: The captions for Figures 5 and 6 could benefit from clarification for better understanding.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify the details in the captions of Fig. 5 and Fig. 6 for better understanding.

      (24) Figure 5 Legend: The legend on the bottom right corner of Figure 5 is unclear. Please specify to which panel it refers.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify to which the legend on the bottom right corner of Fig. 5 refers.

      (25) L and A~C Interactions: In paragraphs 405-418, please explain why the L and A~C interactions are removed for the comparison instead of others.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we add a separate paragraph and a supplemental figure to explain why the L and A~C interactions are removed for the comparison instead of others.

      (26) Network Structures in Figure S3: From the "34 possible network structures" considered in Figure S3 (lines 440-441), why are the "null cases" (L disconnected from the network) relevant? Shouldn't only 32 networks be considered?

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now the two “null cases” are removed.

      (27) Figure S3 Caption: The caption must state that the position of the nodes (left or right) implies the polarisation pattern. Additionally, with the current size of the figure, the dashed lines are extremely hard to differentiate from the continuous lines.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we state that the position of the nodes (left or right) implies the polarization pattern. Additionally, we have modified the figure size and dashed lines so that the dashed lines are adequately distinguishable from the continuous lines.

      (28) Equation #7: It is confusing to use P as the number of independent simulations when P is also one of the variables/species in the network. Please consider using different notation.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we replace the P in current Equation #8 with Q and the P in current Equation #10 with W.

      (29) Use of "Detailed Balance": The authors used the term "detailed balance" to describe the intricate balance between the two groups of proteins when forming a polarised pattern. However, "detailed balance" is a term with a specific meaning in thermodynamics. Breaking detailed balance is a feature of nonequilibrium systems, and the polarisation phenomenon is evidently a nonequilibrium process. Using the term "detailed balance" may cause confusion, especially for readers with a physics background. It might be advisable to reconsider the terminology to avoid potential confusion and ensure clarity for readers.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      To avoid potential confusion and ensure clarity for readers, now we replace “detailed balance” with “balance”, “required balance”, or “interplay” regarding different contexts.

      (30) Terminology: The word "molecule" is used where "molecular species" would be more appropriate, e.g., lines 456 and 551. Please revise these instances.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we replace all instances of “molecule” with “molecular species” as suggested.

      (31) Section 2.5: This section is confusing. It isn't clear where the "method outlined" (line 464) is nor what "span an iso-velocity surface at vanishing speed" means in line 470. The sentence in lines 486-488, "An expression similar to Eq. 8 enables quantitative prediction...", is too vague. Please clarify these points and specify what the "similar expression" is and where it can be found.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify these points and specify the terms as suggested.

      (32) Software Mention: The software is only mentioned in the abstract and conclusions. It should also be mentioned where the computational pipeline is described, and the instructions available in the supplementary information need to be referenced in the main text.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we mention the software where the computational pipeline is described and reference the instructions available in the Supplemental Text.

      (33) Supplementary Material References: Several parts of the supplementary material are never referenced in the main text, including Figure S1, Movies S3-S4, and the Instructions for PolarSim. Please reference these in the main text to clarify their relevance and how they fit with the manuscript's narrative.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we add all the missing references for supplementary materials to the main text properly.

    1. eLife Assessment

      This important study uses the delay line axon model in the chick brainstem auditory circuit to examine the interactions between oligodendrocytes and axons in the formation of internodal distances. This is a significant and actively studied topic, and the authors have used this preparation to support the hypothesis that regional heterogeneity in oligodendrocytes underlies the observed variation in internodal length. In a solid series of experiments, the authors have used enhanced tetanus neurotoxin light chains, a genetically encoded silencing tool, to inhibit vesicular release from axons and support the hypothesis that regional heterogeneity among oligodendrocytes may underlie the biased nodal spacing pattern in the sound localization circuit.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenological. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier, although again, the authors have only looked during early development. This is quite different from this reviewer's original thoughts, namely that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point, or argue for or against it based on results in adult birds.

      Significance:

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. This seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that this is limited to development.

    3. Reviewer #2 (Public review):

      Summary:

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major comments:

      (1) It is essential that the authors validate the efficiency of TeNT to prove that vesicular release is indeed inhibited, to be able to make any claims about the effect of vesicular release on oligodendrogenesis/myelination.

      (2) Related to 1, can the authors clarify if their TeNT expression system results in the whole tract being silenced? It appears from Fig. 6 that their approach leads to sparse expression of TeNT in individual neurons, which enables them to measure myelination parameters. Can the authors discuss how silencing a single axon can lead to a regional effect in oligodendrocyte number?

      (3) The authors need to fully revise their statistical analyses throughout and supply additional information that is needed to assess if their analyses are adequate:

      (3.1) the authors use a variety of statistical tests and it is not always obvious why they chose a particular test. For example, in Fig. 2G they chose a Kruskal-Wallis test instead of a two-way ANOVA or Mann-Whitney U test, which are much more common in the field. What is the rationale for the test choice?

      (3.2) in some cases, the choice of test appears wholly inappropriate. For example, in Fig. 3H-K, an unpaired t-test is inappropriate if the two regions were analysed in the same samples. In Fig. 5, was a t-test used for comparisons between multiple groups in the same dataset? If so, an ANOVA may be more appropriate.

      (3.3) in some cases, the authors do not mention which test was used (Fig 3: E-G no test indicated, despite asterisks; G/L/M - which regression test was used? What does r indicate?)

      (3.4) more concerningly, throughout the results, data may have been pseudo-replicated. t-tests and ANOVAs assume that each observation in a dataset is independent of the other observations. In figures 1-4 and 6 there is a very large "n" number, but the authors do not indicate what this corresponds to. This leaves it open to interpretation, and the large values suggest that the number of nodes, internodal segments, or cells may have been used. These are not independent experimental units, and should be averaged per independent biological replicate - i.e. per animal (N).

      (3.5) related to the pseudo-replication issue, can the authors include individual datapoints in graphs for full transparency, per biological replicate, in addition or as an alternative to bar graphs (e.g. Fig. 5 and 6).

      (4) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the bird auditory circuit?

      (5) Provided the authors amend their statistical analyses, and assuming significant differences remain as shown, the study shows a correlation (but not causation) between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). Therefore, the authors should either include such experiments, or revise some of their phrasing to soften their claims and conclusions. For example, the word "determine" in the title could be replaced by "correlate with" for a more accurate representation of the work. Similar sentences throughout the main text should be amended.

      (6) The authors fail to introduce, or discuss, very pertinent prior studies, in particular to contextualize their findings with:

      (6.1) known neuron-autonomous modes of node formation prior to myelination, e.g. Zonta et al (PMID 18573915); Vagionitis et al (PMID 35172135); Freeman et al (PMID 25561543)

      (6.2) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, e.g. Mensch et al (PMID 25849985)

      (6.3) known correlation of myelin length and thickness with axonal diameter, e.g. Murray & Blakemore (PMID 7012280); Ibrahim et al (PMID 8583214); Hildebrand et al (PMID 8441812).

      (6.4) regional heterogeneity in the oligodendrocyte transcriptome (page 9, studies summarized in PMID 36313617)
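
      The averaging requested in comment (3.4), and the paired comparison implied by comment (3.2), can be sketched as follows. This is only an illustrative sketch, not the authors' analysis pipeline: the function names (`per_animal_means`, `paired_differences`), the animal labels, and all numerical values are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(observations):
    """Collapse (animal_id, value) pairs to one mean value per animal."""
    grouped = defaultdict(list)
    for animal_id, value in observations:
        grouped[animal_id].append(value)
    return {animal_id: mean(values) for animal_id, values in grouped.items()}


def paired_differences(region_a, region_b):
    """Per-animal difference (a - b) for animals measured in both regions."""
    means_a = per_animal_means(region_a)
    means_b = per_animal_means(region_b)
    shared = sorted(set(means_a) & set(means_b))
    return {animal_id: means_a[animal_id] - means_b[animal_id] for animal_id in shared}


# Hypothetical internodal lengths in micrometres; NOT data from the manuscript.
tract = [("chick1", 200.0), ("chick1", 220.0), ("chick2", 180.0),
         ("chick2", 190.0), ("chick3", 210.0)]
nl_region = [("chick1", 60.0), ("chick1", 70.0), ("chick2", 50.0),
             ("chick3", 55.0), ("chick3", 65.0)]

tract_means = per_animal_means(tract)         # N = animals, not segments
diffs = paired_differences(tract, nl_region)  # one paired value per animal

print(len(tract), "segments ->", len(tract_means), "animals")
print(diffs)
```

      The per-animal means (or the per-animal differences, when both regions come from the same animals) would then be the values passed to the statistical test, so that N reflects independent biological replicates rather than the number of nodes or internodes.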

      Significance:

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing.

      Major comments:

      (1) The authors neglect the possibility that nodal clusters may form prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal clusters plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      (2) The claim that axonal diameter is constant along the axonal length needs to be demonstrated at the EM level. This would also make it possible to measure regional differences in the thickness of the myelin sheath and the number of myelin wraps.

      (3) The claim that the difference in internodal length is explained by heterogeneity in the sources of oligodendrocytes is not convincing. Oligodendrocytes that are a priori of the same origin form shorter internodes when remyelinating after a demyelination event.

      Significance:

      The authors suggest that the difference in internodal length is attributable to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only does the length of myelin internodes differ, but the unmyelinated areas between two internodes along the same axon may also vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

    5. Author response:

      General Statements

      We sincerely appreciate the constructive comments from the reviewers, which have significantly enhanced the clarity and rigor of our manuscript. Most of their suggestions have already been incorporated into the revised version. Additionally, we are conducting an additional experiment to further substantiate our conclusions, and preliminary data seem to support our findings.

      As pointed out by Reviewer #1, the regulation of neural circuit function by oligodendrocytes is currently a highly significant and actively studied topic. Our study demonstrates that regional heterogeneity in oligodendrocytes underlies the microsecond-level computational processes in the sound localization circuit. We believe this work represents a substantial contribution to the field.

      Description of the planned revisions

      • Evaluation of node formation along axons sparsely expressing eTeNT (related to Reviewer #2: comment 1)

      Based on the approximately 90% expression efficiency of A3V-eTeNT in NM neurons, we interpreted that vesicular release from NM axons was largely inhibited in the NL region, leading to the suppression of oligodendrogenesis and the subsequent emergence of unmyelinated segments. However, the effects of eTeNT on myelination are likely diverse, and a possibility remains that eTeNT directly disrupted axon-oligodendrocyte interactions, preventing oligodendrocytes from myelinating the axons expressing eTeNT.

      To test this possibility, we have initiated an additional experiment to evaluate formation of nodes along axons, while expressing eTeNT sparsely by electroporation. Preliminary results indicated that unmyelinated segments did not increase, supporting our original conclusion. After completion of the experiment, we will include the findings as a Supplementary Figure associated with Figure 6, which will provide a clearer understanding of how eTeNT influences myelination.

      Description of the revisions that have already been incorporated in the transferred manuscript

      • Revised terminology from "nodal distribution" to "nodal spacing" throughout the manuscript. (Reviewer #1: comment 1)

      • Emphasized that our analyses were focused on the main trunk of NM axons (Reviewer #1: comment 2). We explicitly stated throughout the manuscript that we analyzed the main trunk of NM axons and made it clear that our findings do not contradict those of Seidl et al. (J Neurosci 2010), which showed similar axon diameters in the midline and ventral NL regions (page 7, line 7).

      • Added an explanation on the maturation of the sound localization circuit (Reviewer #1: comment 3). We explained that chickens have a high ability for sound localization at hatching, emphasizing that the sound localization circuit is almost fully developed by E21 (page 4, line 12).

      • Emphasized the diverse effects of neuronal activity on oligodendrocytes (page 10, line 18) (Reviewer #1: comment 4)

      • Added details on the efficiency of A3V-eTeNT expression in NM neurons to the Results section (page 8, line 5) (Reviewer #2: comment 1)  

      • Made it clear in the legend for Figure 6D that the analysis was conducted under conditions in which most of the axons were labeled by A3V-eTeNT (page 31, line 9) (Reviewer #2: comment 2)

      • Clarified the rationale for statistical test selection (Reviewer #2: comment 3.1)

      • Reanalyzed all statistical data with appropriate methods using R (Reviewer #2: comment 3.2)

      • Clearly indicated which statistical tests were used in each figure (Reviewer #2: comment 3.3)

      • Clarified what n represents and the N used in each experiment (Reviewer #2: comment 3.4)

      • Added individual data points to bar graphs in Figures 5 and 6 (Reviewer #2: comment 3.5)

      • Emphasized the importance of comparing the ITD circuit with that of rodents (page 11, line 32) (Reviewer #2: comment 4) 

      • Softened the expressions related to "determine" (Reviewer #2: comment 5)

      Our study demonstrates that regional differences in the intrinsic properties of oligodendrocytes are the prominent determinant of nodal spacing patterns. However, we acknowledge that this does not establish a direct causation. Accordingly, relevant expressions have been revised throughout the manuscript.

      • Added references (Reviewer #2: comment 6)

      • Corrected units in Figure 1G (Reviewer #2: comment 7)

      • Added discussion about the involvement of pre-nodal clusters in the regional differences in nodal spacing (page 9, line 35) (Reviewer #3: comment 1).

      Related to this issue, we have added new data to Figure 6I.

      • Discussed the possibility that the developmental origin and/or the pericellular microenvironment of OPCs contributed to the regional heterogeneity of oligodendrocytes (page 9, line 21) (Reviewer #3: comment 3).

      • Added references used in the response to reviewers into the main text.

      • Corrected the data error in Figure 6G, H

      • Corrected the dataset in Figure 3E

      We limited the data in Figure 3E–G to those measuring both myelin length and diameter simultaneously.

      Description of analyses that authors prefer not to carry out

      • Analysis in adult chickens (Reviewer #1: comment 3,4)

      The chick brainstem auditory circuit is nearly fully developed by E21, and we have also demonstrated that nodal spacing increases by approximately 20% while maintaining regional differences up to P9. Therefore, our study covers the period from pre-myelination to after functional maturation, and we think that there is little need to analyze older animals.

      • Functional evaluation of the efficiency of eTeNT suppression (Reviewer #2: comment 1)

      It is technically challenging to quantitatively assess the inhibition of vesicular release by eTeNT in NM axons given that multiple synapses from different NM axons converge onto postsynaptic neurons. In addition, previous studies have already validated the efficacy of this construct in multiple species. Therefore, we will not electrophysiologically evaluate the extent of vesicular release inhibition by eTeNT in this study. Instead, we have provided clear evidence that A3V-eTeNT is expressed efficiently and leads to notable phenotypic changes, such as the inhibition of oligodendrogenesis (page 8, line 5).

      • Replacing figures with data averaged per animal (Reviewer #2: comment 3.4)

      Our study focuses on the distribution of morphological characteristics at the single-cell level rather than solely on group means. Averaging measurements per animal could obscure this cellular heterogeneity and potentially misrepresent our findings. Given that data distributions in our plots show clear distinctions, we believe that averaging per biological replicate is not essential in this case. If requested, we will be happy to provide the outputs of PlotsOfDifferences as supplementary source data files, similar to those used in eLife publications, for each figure.

      • Additional experiments to manipulate oligodendrocyte density (Reviewer #2: comment 5)

      We have already demonstrated that A3V-eTeNT reduces oligodendrocyte density in the NL region, and some of the arguments in our study are based on this result. Therefore, we think that further experiments are not necessary.

      • Verification of the presence of pre-nodal clusters (Reviewer #3: comment 1)

      We investigated the presence of pre-nodal clusters on NM axons, but we could not identify them in the immunohistochemistry of AnkG. As the occurrence of pre-nodal clusters varies depending on neuronal type, we consider that pre-nodal clusters are not prominent in the NM axons and that further experimental validation would not be necessary. Instead, we have added a discussion on the possibility that pre-nodal clusters contribute to regional differences in nodal spacing along NM axons (page 9, line 35).

      • Axon diameter measurements using EM (Reviewer #3: comment 2)

      This experiment was already done by Seidl et al. (2010), and hence, we do not think it necessary to repeat it. We believe that the relative differences in axon diameter between the regions could be adequately assessed using the optical approach with membrane-targeted GFP.

    1. eLife Assessment

      The authors use a multidisciplinary approach to provide a valuable link between Beta-alanine and S. Typhimurium (STM) infection and virulence. The work shows how Beta-alanine synthesis mediates zinc homeostasis regulation, possibly contributing to virulence. The work is convincing as it adds to the existing knowledge of metabolic flexibility displayed by STM during infection. However, the authors need to address some lingering concerns.

    2. Reviewer #1 (Public review):

      Summary:

      Ma & Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes within the nutrient-poor intracellular niche of the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays important roles in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies, and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates regulation of zinc homeostasis in Salmonella.

      Strengths and weaknesses:

      The results and model are adequately supported by the authors' data. Further work will need to be performed to learn whether the Zn2+ functions as proposed in their mechanism. By performing a small set of confirmatory experiments in S. Typhi, the authors provide some evidence of relevance to human infections.

      Impact:

      This work adds to the body of literature on the metabolic flexibility of Salmonella during infection that enable pathogenesis.

    3. Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, known in the Salmonella field as the SCV, or Salmonella-containing vacuole. The SCV is a tight-fitting vacuole in which nutrient acquisition is a key factor for Salmonella. Among many nutrients, the authors focused on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors performed in vitro RAW macrophage infection assays and in vivo mouse infection assays to examine the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      The authors made a couple of knockouts in Salmonella and performed transcriptomics to understand the global gene expression pattern.

      Weaknesses:

      Transport of Beta-alanine to SCV is not yet elucidated. Is it possible to determine whether the Zn transporter is involved in B-alanine transport?

      Beta-alanine can also be shuttled to form carnosine along with histidine. If beta-alanine is channelled to make more carnosine, then the virulence phenotypes may be very different.

      Some amino acid transporters can be knocked out to see if beta-alanine uptake is perturbed. For example, ArgT transports arginine, and its mutation perturbs the uptake of beta-alanine. What is the beta-alanine concentration in the SCV? SCVs can be purified at different time points, and the beta-alanine concentration can be measured.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes within the nutrient-poor intracellular niche of the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics is clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and its roles in bacterial infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular content of Zn2+ in STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading, as well as the constructive comments and advice regarding our manuscript. We have revised the manuscript based on your comments and suggestions.

      You are correct that this work has not thoroughly investigated the mechanisms underlying the roles of β-alanine, panD, and zinc in impacting Salmonella infection. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Meanwhile, we concur that additional, unknown mechanisms are involved in the virulence regulation by β-alanine in Salmonella. Our findings indicate that the double mutant ΔpanDΔznuA, which can neither synthesize β-alanine nor take up zinc, is more attenuated than the single mutant ΔznuA (Figures 5D and 6B). This suggests that the contribution of β-alanine to Salmonella's virulence is partially dependent on zinc acquisition. We have revised the related descriptions throughout the manuscript for clarity (lines 31, 304, 341, 1056, 1068).

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice on our manuscript and are delighted to accept them. Salmonella Typhimurium causes systemic disease in mice, which is similar to the symptoms of typhoid fever in humans and has been widely used to explore the pathogenesis of Salmonella. Based on your comment, we have now performed additional experiments to confirm several key points of our findings in another typical Salmonella serovar, Salmonella enterica serovar Typhi, which is a human-limited serovar and the cause of typhoid fever in humans (PLoS Pathog. 2012, 8(10):e1002933).

      We constructed the panD mutant strain (ΔpanD) in the S. Typhi strain Ty2 and subsequently compared the replication of ΔpanD with that of the Ty2 wild-type in the human THP-1 monocyte-like cell line (ATCC TIB-22) using gentamicin protection assays. The results showed that the replication of ΔpanD in THP-1 cells was reduced by 2.6-fold at 20 h post-infection compared to the Ty2 wild-type strain (P < 0.01) (Figure 2-figure supplement 3), suggesting that panD also facilitates S. Typhi replication in human macrophages and may be involved in the systemic infection of S. Typhi in humans. This result has been included in the revised manuscript (lines 203-210).

      Based on these results, we speculate that PanD may serve as a potential target for treating Salmonella infection.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28. Latin phrases like de novo should be italicized.

      Thank you for your careful review. We have revised the manuscript thoroughly (Lines 28, 65, 77, 106, 171, 173, 214, 1002, 1023, 1078).

      (2) Line 45. 'survival' typo.

      We have corrected it in the revised manuscript (Line 45).

      (3) Line 57. What evidence or prior work supports the SCV of macrophages in a nutrient-poor environment? Citation needed.

      The relevant reference has now been added (lines 62-63).

      (4) Lines 65-68. If an 'increasing number of studies have focused' on this topic, please cite them here.

      The relevant reference has now been added (lines 72-73).

      (5) Lines 69-71. Citations are needed for these claims.

      The relevant reference has now been added (lines 76-77, 79-80).

      (6) Line 76-77. Citation needed for this claim.

      The relevant reference has now been added (lines 84, 86).

      (7) Line 116-122, and Figure 1C, and Figure 1 legend. An important claim in this work is that the amino acid content of the macrophage cytoplasm is different +/- STM infection. The authors need to explain this result more carefully and define their acronyms. What is VIP, Log2 FC, etc.? What do the colors in Figure 1C mean? They are not defined. If possible, it would be more approachable to list these as molar concentrations, weight/cell, or number of molecules/cell. The authors should calculate an effect size for each of these data to help assess if the differences are meaningful. Without this information, and a clearer explanation of what these data are, it is difficult to evaluate the authors' claim that "8 [amino acids] showed significant differences in abundance."

      Thank you for the comment. The full names of VIP (Variable Importance in the Projection) and FC (fold change) have been included in the revised manuscript. In Figure 1C of the original manuscript, pink represents the content of amino acids that increased following Salmonella infection, whereas blue signifies the content of amino acids that decreased after Salmonella infection.

      Based on your suggestion, we have revised Figure 1C (now Figure 1C, D in the revised manuscript) and the content of amino acids is now expressed as weight per cell (ng/10^7 cells). The legend has been updated accordingly (lines 9931-997).

      (8) Line 134-138. Additional controls are required for this experiment. By adding a nutrient (B-Ala) you have increased the nutrient availability and growth potential of the bacteria. This may not relate to anything special to B-Ala. Perhaps the addition of another amino acid, or sugar, would have a similar impact. Further, this result would be more compelling if the authors demonstrated a dose-dependent effect of B-Ala addition.

      Thank you for the comment. To further confirm that host-derived β-alanine can promote intracellular Salmonella replication, we have added varying concentrations of β-alanine (0.5, 1, 2, and 4 mM) to the culture medium (RPMI) of RAW264.7 cells. Subsequently, we infected these cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations indicate that the addition of 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the increase in Salmonella intracellular replication was dose-dependent, as illustrated in the revised Figure 1E. These findings suggest that host-derived β-alanine facilitates Salmonella replication inside macrophages. We have included these results in the revised manuscript (lines 141-149).

      (9) Lines 181-184, and Figure 2E. In addition to the fold-change replication data, here and elsewhere the authors should provide raw CFU counts for data transparency.

      Thank you for bringing this to our attention. In this work, we have utilized “fold intracellular replication (20 h intracellular bacterial CFU / 2 h intracellular bacterial CFU)” to illustrate the differences in intracellular replication of different Salmonella strains in macrophages. The term “fold intracellular replication” is commonly employed in recently published reports (e.g., FEMS Microbiol Lett. 2024, 9;371:fnae067; mBio. 2024, 15(7):e0112824; Front Microbiol. 2024, 14:1340143). To ensure data transparency, we have included the raw CFU counts in the source data file.
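      As a minimal sketch of the metric described in this response, the ratio can be computed as below. The function names and the CFU counts are hypothetical and chosen only for illustration; they are not values from the manuscript or its source data.

```python
def fold_replication(cfu_2h: float, cfu_20h: float) -> float:
    """Fold intracellular replication: CFU at 20 h divided by CFU at 2 h."""
    if cfu_2h <= 0:
        raise ValueError("2 h CFU count must be positive")
    return cfu_20h / cfu_2h


def fold_reduction(reference_fold: float, mutant_fold: float) -> float:
    """How many times lower the mutant's fold replication is than the reference's."""
    if mutant_fold <= 0:
        raise ValueError("mutant fold replication must be positive")
    return reference_fold / mutant_fold


# Hypothetical CFU counts (illustrative only, not data from the study).
wt_fold = fold_replication(cfu_2h=1.0e4, cfu_20h=2.6e5)
mutant_fold = fold_replication(cfu_2h=1.0e4, cfu_20h=1.0e5)

print(f"WT fold replication: {wt_fold:.1f}")
print(f"Mutant fold replication: {mutant_fold:.1f}")
print(f"Fold reduction in mutant: {fold_reduction(wt_fold, mutant_fold):.1f}")
```

      Because the ratio normalizes each strain to its own 2 h count, reporting the raw CFU values alongside it, as the authors do in the source data file, lets readers check whether a difference arises from uptake at 2 h or from growth by 20 h.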

      (10) Line 197. Why employ i.p. injection of STM? As a non-typhoidal serovar, STM infection is enteric, and so i.p. injection seems very artificial if the goal is to understand the role B-Ala synthesis in disease.

      Thank you for the comment. Salmonella can induce gastroenteritis or systemic infection, which are associated with its capacity to invade intestinal epithelial cells and replicate within macrophages, respectively. In this study, using gentamicin protection assays and immunofluorescence analysis, we demonstrated that β-alanine is crucial for Salmonella replication inside macrophages. Since replication in macrophages is a key determinant of systemic Salmonella infection, we hypothesized that β-alanine also affects Salmonella systemic infection in vivo. Intraperitoneal (i.p.) injection enables Salmonella to disseminate directly to systemic sites via the lymphatic system and bloodstream, bypassing the need for intestinal invasion (Microbiol Res. 2023, 275:127460; Int Immunopharmacol. 2016, 31:233-8). Thus, we conducted the mouse infection assays via i.p. injection to ascertain whether β-alanine affects systemic Salmonella infection. We have included the description in the revised manuscript to enhance clarity (lines 217-221).

      Whether β-alanine influences Salmonella invasion of intestinal epithelial cells and intestinal colonization has not been investigated in this work; this issue will be explored in our future studies.

      (11) Line 207-214 and Figure 3. If the hypothesis is that B-Ala mediates STM survival/virulence through enhancing metabolism in the SCV and intracellular niche, why did the authors not investigate/enumerate STM in this niche in their in vivo studies?

      Thank you for the comment. Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (ΔpanD), and complemented strain (cpanD) within the macrophages of the mouse liver. The findings indicated that the number of ΔpanD in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of ΔpanD increased the bacterial count in each liver macrophage to the level of WT (refer to Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      (12) Figure 4B - the down genes label is cut off.

      Thank you for your careful review. We have corrected it in the revised Figure 4B.

      (13) Line 260-265. SPI-2 needs to be defined and introduced, as do other terms here, to make the work approachable to non-STM specialists.

      The introduction of SPI-2 has been added to the revised manuscript (lines 290-292).

      (14) Line 300-301. Additional experiments are needed to support the claim that "data indicate that β-alanine promotes in vivo virulence of Salmonella, partially by increasing the expression of zinc transporter genes." Gene up- or down-regulation does not necessarily have any meaningful impact on function or activity. The authors here need an assay that confirms that the function of znuA is disrupted, such as examining the cell Zn2+ content in vivo at different levels of B-Ala exposure and/or panD activity. Moreover, more Zn2+ is not necessarily beneficial for STM, at levels too high zinc can exert cell toxicity. So, the authors have a correlation but no data supporting this mechanism explains their observations of virulence and infection. How much Zn2+ is ideal for STM growth?

      Thank you for the comment. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, the efficient acquisition of zinc may play a crucial role in the survival and replication of Salmonella within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella utilizes the high-affinity ZnuABC zinc transporter to maximize zinc availability within host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages.

      You are correct that more zinc is not necessarily beneficial for Salmonella, as excessive zinc can inhibit the growth of Salmonella. Considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentrations during Salmonella's growth within macrophages. We have included a discussion on this matter in the revised manuscript (lines 459-466).

      (15) Figure 6B. Related to the above, these data would be more compelling with higher n and a dose-dependent response demonstrated for Zn2+ addition. This is a central point of the manuscript, and effectively what the authors propose as the underlying mechanism, and it should be more robustly substantiated.

      Thank you for the comment. As stated in the previous response, we were unable to directly assess the bacterial zinc concentration during Salmonella growth within macrophages. Instead, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. Moreover, considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentrations during Salmonella's growth within macrophages.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. To further investigate the mechanisms by which β-alanine, panD, and zinc influence Salmonella infection, we have conducted additional experiments as suggested. For instance, we examined the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD). This approach indirectly reflects zinc acquisition by intracellular Salmonella, as it is challenging to isolate sufficient amounts of the bacteria from infected cells or tissues for zinc concentration measurement. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared to that in WT-infected counterparts (Figures 5E and 6A). This suggests that the panD gene and β-alanine are crucial for Salmonella to absorb zinc from host cells. This new information has been included in the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, efficient zinc uptake could be crucial for Salmonella survival and replication within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella exploits the high-affinity ZnuABC zinc transporter to maximize zinc availability in host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages. We have addressed this issue in the revised manuscript (lines 459-466).

      Reviewer #2 (Recommendations for the authors):

      A few general clarifications and suggested experiments:

      (1) Metabolome analysis: Salmonella can itself produce b-alanine. Given that it is isolated from infected cells, where Salmonella has both scavenged b-alanine from the host cytosol and produced it, it is confusing that b-alanine levels went down in the metabolome analysis.

      Thank you for the comment. Targeted metabolic profiling was conducted as outlined in a recently published paper by our group (Nat Commun. 2021, 12(1):879). To prevent delays and changes in metabolite concentrations during the separation of bacterial contents from macrophages, we determined the combined metabolite concentrations directly from infected cells and Salmonella. We observed that each Salmonella cell contained only 0.01%-0.02% of the concentration of each corresponding combined metabolite. Approximately 94% of the infected macrophages contained no more than ten bacteria at 8 hours post-infection, confirming that the combined metabolites were predominantly from the host. We have included an explanation of this issue in the Methods section (lines 557-560).

      (2) What is the basal level of b-alanine produced by macrophages? How was 1 mM conc. chosen?

      According to our results, the content of β-alanine in uninfected RAW264.7 cells is 26-33 μM/10⁷ cells (700-900 ng/10⁷ cells). The 1 mM concentration was chosen based on a published report (Appl Microbiol Biotechnol. 2004, 65(5):576-82).

      Additionally, we supplemented the culture medium (RPMI) of RAW264.7 cells with 0.5, 1, 2, and 4 mM β-alanine and subsequently infected the cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Supplementation with 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the addition of β-alanine to the infected cells resulted in a dose-dependent increase in Salmonella intracellular replication, as depicted in Figure 1E. These findings further support the notion that host-derived β-alanine facilitates Salmonella replication within macrophages. These data have been incorporated into the revised manuscript (lines 141-149).

      (3) The antimicrobial activity of macrophages preventing the growth of intracellular Salmonella will primarily be governed by genes such as GBPs, defensins, nitric oxide, etc. The expression of these genes should be tested rather than cytokines which are secreted with little effect on intracellular Salmonella.

      Thank you for the suggestion. We have investigated the levels of ROS (reactive oxygen species) and RNS (reactive nitrogen species) in Salmonella-infected RAW264.7 cells, both in the presence and absence of 1 mM β-alanine. The results indicated that β-alanine did not affect the ROS and RNS levels in RAW264.7 cells (Figure 1-figure supplement 1), suggesting that β-alanine does not influence the antimicrobial activity of macrophages. We have included these results in the revised manuscript (lines 150-153).

      (4) For animal experiments, how many times was the experiment repeated? Can the animal experiment be done with b-alanine supplementation and panD mutant? Can the liver be stained to detect the bacteria?

      Thank you for the comment.

      i) Mouse infection assays were conducted twice, with at least 2 mice (n ≥ 2) in each injection group. The combined data from the two experiments were used for statistical analysis. This information has been added to the revised manuscript (lines 678-681).

      ii) As suggested, mice infected with the panD mutant (Δ_panD_) were administered β-alanine (500 mg/kg/day; Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial burden in the liver and spleen and the body weight of the infected mice were measured. The results indicated that administering β-alanine to mice did not affect the bacterial burden of Δ_panD_ in the liver and spleen, nor did it influence the body weight of the infected mice (please refer to Author response image 1 below). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      Author response image 1.

      iii) Using immunofluorescence staining, we quantified the number of Salmonella wild-type (WT), panD mutant (Δ_panD_), and complemented strain (c_panD_) bacteria within the macrophages of the mouse liver. The number of Δ_panD_ bacteria in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and complementation of Δ_panD_ restored the bacterial count in each liver macrophage to the WT level (Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting because it lives within a compact compartment, known in the field as the Salmonella-containing vacuole (SCV). The SCV is a tight-fitting vacuole in which nutrient acquisition is a key factor for Salmonella. Among many nutrients, the authors focused on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors performed in vitro RAW macrophage infection assays and in vivo mouse infection assays to examine the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It has been reported that β-alanine is transported into eukaryotic cells via the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Acta Physiol (Oxf). 2015, 213(1):191-212; Am J Physiol Cell Physiol. 2020, 318(4):C777-C786; Biochim Biophys Acta. 1994, 1194(1):44-52).

      (2) It is not clear how the beta-alanine from the cytosol of the macrophage enters the SCV.

      According to published reports, translocation of SPI2 effector proteins induces the formation of specific tubular membrane compartments that extend from the SCV, known as Salmonella-induced filaments (SIFs) (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology (Reading). 2012, 158(Pt 5):1147-1161). The membranes and lumens of both SIFs and SCVs form a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). We hypothesize that β-alanine may enter SCVs from the cytoplasm of macrophages via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to identify the transporter of β-alanine in Salmonella, but we found that the CycA transporter, which transports β-alanine in Escherichia coli, does not function in the same manner in Salmonella, despite Salmonella being closely related to E. coli.

      BasC is a bacterial LAT (L-amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of genera such as Pseudomonas, Acinetobacter, and Aeromonas. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the comment. Our findings indicated that β-alanine levels were reduced in Salmonella-infected RAW264.7 cells. Furthermore, the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells significantly enhanced Salmonella replication, suggesting that the intracellular Salmonella utilize host-derived β-alanine for their growth. However, to date, we have not identified the transporter responsible for the uptake of exogenous β-alanine into the Salmonella cytosol.

      Moreover, we have discovered that the replication of the Salmonella panD mutant within macrophages and its virulence in mice are significantly reduced compared to the wild type (WT), indicating that the de novo synthesis of β-alanine is crucial for Salmonella's intracellular replication and virulence.

      These results indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages.

      Reviewer #3 (Recommendations for the authors):

      Cite this paper from 1985, which discusses the role of beta-alanine in Salmonella infection: West TP, Traut TW, Shanley MS, O'Donovan GA. A Salmonella typhimurium strain defective in uracil catabolism and beta-alanine synthesis. J Gen Microbiol. 1985, 131(5):1083-90. doi: 10.1099/00221287-131-5-1083.

      We have now cited this paper in the revised manuscript (lines 82-83).

      (2) BasC can be important for beta-alanine transport. The CycA transporter was not found to be involved in beta-alanine transport. However, it is important to find out which transporter is required for the uptake of beta-alanine.

      Thank you for pointing it out. We agree that it is important to determine which transporter is necessary for the uptake of β-alanine in Salmonella. BasC is a bacterial LAT (L-amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of genera such as Pseudomonas, Acinetobacter, and Aeromonas. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (3) Because bacteria are quite stringent with their energy resources, it is unlikely that they will use de novo synthesis if host resources are available. Only if host resources are depleted would they turn on the de novo synthesis involving panD. What is the fold-replication of the panD mutant in the presence of exogenously added beta-alanine?

      Thank you for the comment. The addition of 1 to 4 mM of β-alanine increased the replication of the panD mutant (Δ_panD_) in RAW264.7 cells by 1.7- to 3.1-fold. This increase in Salmonella intracellular replication was dose-dependent, as shown in Figure 2H of the revised manuscript, further illustrating that host-derived β-alanine promotes Salmonella replication inside macrophages.

      We agree that bacteria are quite stringent with their energy resources. The results of this work indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages. We speculate that Salmonella relies on a large amount of β-alanine to efficiently replicate in macrophages, thereby highlighting the importance of β-alanine for Salmonella intracellular growth. We have discussed this issue in the revised manuscript (lines 392-396).

      (4) The 100% survival of animals infected with the panD mutant is a bit of a concern. What happens when beta-alanine is fed to mice infected with the panD mutant?

      Thank you for the comment. As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine (500 mg/kg/day, as reported in Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial load in the liver and spleen, as well as the body weight of the infected mice, were measured. The results indicated that administering β-alanine did not affect the bacterial load of Δ_panD_ in the liver and spleen nor did it influence the body weight of the infected mice (refer to Author response image 1). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      (5) How does beta-alanine from the macrophage cytosol enter the SCV?

      Thank you for pointing it out. According to published reports, the translocation of SPI2 effectors triggers the formation of specialized tubular membrane compartments, known as Salmonella-induced filaments (SIFs), which extend from the SCV (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology. 2012, 158:1147-1161). The membranes and lumens of SIFs and SCVs create a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). Consequently, it is plausible that β-alanine enters SCVs from the macrophage cytoplasm via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (6) It would be essential to dissect the role of exogenous beta-alanine and the use of de novo synthesized beta-alanine.

      We agree that it is essential to dissect the role of exogenous β-alanine and the use of de novo synthesized β-alanine. Our results indicate that Salmonella-infected macrophages exhibited lower levels of β-alanine compared to mock-infected macrophages. Furthermore, β-alanine supplementation in the cell medium enhanced Salmonella replication within macrophages in a dose-dependent manner, revealing that Salmonella utilizes host-derived β-alanine to promote intracellular replication. Additionally, a deficiency in the biosynthesis of β-alanine, resulting from mutation of the rate-limiting gene panD, led to reduced Salmonella replication in macrophages and systemic infection in mice. This suggests that Salmonella also employs bacterial-derived β-alanine to enhance intracellular replication and pathogenicity.

      We sought to identify the main transporters responsible for β-alanine uptake in Salmonella. Unfortunately, we have not yet found the transporter. We will address this issue in our future work.

    1. eLife Assessment

      By taking advantage of noise in gene expression, this important study introduces a new approach for detecting directed causal interactions between two genes without perturbing either. The main theoretical result is supported by a proof. Preliminary simulations and experiments on small circuits are solid, but further investigations are needed to demonstrate the broad applicability and scalability of the method.

    2. Reviewer #2 (Public Review):

      Summary:

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.

      The authors benchmark their approach experimentally in several synthetic circuits. In 4 positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in 2 of the 4 positive control circuits. The authors constructed 16 negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter, or simply the cellular growth rate. The proposed method detected a causal effect in two of the 16 negative controls, which the authors argue is perhaps not a false positive, but due to an unexpected causal effect. Overall, the data support the potential value of the proposed approach.

      Strengths:

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

      The authors have improved the clarity and completeness of their proof compared to a previous version of the manuscript.

      Limitations:

      The authors themselves clearly outline the primary limitations of the study: The experimental benchmark is a proof of principle, and limited to synthetic circuits involving a handful of genes expressed on plasmids in E. coli. As acknowledged in the Discussion, negative controls were chosen based on the absence of known interactions, rather than perturbation experiments. Further work is needed to establish that this technique applies to other organisms and to biological networks involving a wider variety of genes and cellular functions. It seems to me that this paper's objective is not to delineate the technique's practical domain of validity, but rather to motivate this future work, and I think it succeeds in that.

      Might your new "Proposed additional tests" subsection be better housed under Discussion rather than Results?

      I may have missed this, but it doesn't look like you ran simulation benchmarks of your bootstrap-based test for checking whether the normalized covariances are equal. It would be useful to see in simulations how the true and false positive rates of that test vary with the usual suspects like sample size and noise strengths.

      It looks like you estimated the uncertainty for eta_xz and eta_yz separately. Can you get the joint distribution? If you can do that, my intuition is you might be able to improve the power of the test (and maybe detect positive control #3?). For instance, if you can get your bootstraps for eta_xz and eta_yz together, could you just use a paired t-test to check for equality of means?
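
      The joint-resampling idea raised in this comment can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes that eta_ab denotes Cov(a, b) divided by the product of the means, and it swaps the suggested paired t-test for a percentile interval on the paired bootstrap difference eta_xz - eta_yz. The function names and the synthetic data are hypothetical.

```python
import random
import statistics


def normalized_cov(a, b):
    """Cov(a, b) / (mean(a) * mean(b)), using population moments (assumed definition of eta)."""
    mean_a = statistics.fmean(a)
    mean_b = statistics.fmean(b)
    cov = statistics.fmean((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return cov / (mean_a * mean_b)


def paired_bootstrap_diff(x, y, z, n_boot=2000, seed=0):
    """Bootstrap distribution of eta_xz - eta_yz.

    Each replicate resamples whole cells (x_i, y_i, z_i) with replacement, so both
    statistics are computed on the same resampled cells and their sampling errors
    stay paired instead of being estimated separately.
    """
    rng = random.Random(seed)
    n = len(x)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        zs = [z[i] for i in idx]
        diffs.append(normalized_cov(xs, zs) - normalized_cov(ys, zs))
    return diffs


def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap replicates."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical null data: x, y, z share an upstream factor u, and x has no effect on z.
    rng = random.Random(1)
    x, y, z = [], [], []
    for _ in range(500):
        u = rng.lognormvariate(0.0, 0.3)
        x.append(u * rng.lognormvariate(0.0, 0.2))
        y.append(u * rng.lognormvariate(0.0, 0.2))
        z.append(u * rng.lognormvariate(0.0, 0.2))
    print(percentile_interval(paired_bootstrap_diff(x, y, z)))
```

      Under these assumptions, an interval that excludes zero would be read as a violation of the invariant. Whether pairing actually improves power for the experimental circuits would still need to be checked in simulation.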

      The proof is a lot better, and it's great that you nailed down the requirement on the decay of beta, but the proof is still confusing in some places:

      - On pg 29, it says "That is, dividing the right equation in Eq. 5.8 with alpha, we write the ..." but the next equation doesn't obviously have anything to do with Eq. 5.8, and instead (I think) it comes from Eq 5.5. This could be clarified.

      - Later on page 29, you write "We now evoke the requirement that the averages xt and yt are stationary", but then you just repeat Eq. 5.11 and set it to zero. Clearly you needed the limit condition to set Eq. 5.11 to zero, but it's not clear what you're using stationarity for. I mean, if you needed stationarity for 5.11 presumably you would have referenced it at that step.

      It could be helpful for readers if you could spell out the practical implications of the theorem's assumptions (other than the no-causality requirement) by discussing examples of setups where it would or wouldn't hold.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. 

      We thank the referee for this summary of our work. 

      Strengths: 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. 

      We thank the referee for highlighting the potential value of our proposed method.

      Weaknesses: 

      The practical utility of this method may not be straightforward and potentially be quite difficult to execute. Additionally, further investigations are needed to provide evidence of the broad applicability of the method to naturally occurring systems and its scalability beyond the simple circuit in which it is experimentally demonstrated. 

      We agree with these two points and have rewritten the manuscript, in particular highlighting the considerable future work that remains to be done to establish the broad applicability and scalability of our method.

      In the rewritten manuscript we explicitly spell out potential practical issues and we explicitly state that our presented proof-of-principle feasibility study does not guarantee that our method will successfully work in systems beyond the narrowly sampled test circuits. This helps readers to clearly distinguish what we claim to have done from what remains to be done. The rewritten parts and additional clarifications are:

      Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 10).

      Reviewer #2 (Public Review): 

      Summary: 

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality. 

      The authors benchmark their approach experimentally in several synthetic circuits. In four positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in two or three of the positive control circuits. The authors constructed sixteen negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter or simply the cellular growth rate. The proposed method detected a causal effect in one of the eight negative controls, which the authors argue is not a false positive, but due to an unexpected causal effect. Overall, the data support the practical usefulness of the proposed approach. 

      We thank the referee for their summary of our work.

      Strengths: 

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations. 

      We thank the referee for summarizing the value of our work. 

      Caveats: 

      The term "causally" is used in the main-text statement of the central theorem (Eq 2) without a definition of this term. This makes it difficult to fully understand the statement of the paper's central theorem without diving into the supplement.  

      We thank the referee for this suggestion. In the revised manuscript we now define causal effects right before the statement of the main theorem of the main text (p. 2). We have also added a definition of the causal network arrows in the caption of Fig. 1 to help readers better understand our central claim.

      The basic argument of theorem 1 appears to rely on establishing that x(t) and y(t) are independent of their initial conditions. Yet, there appear to be some scenarios where this property breaks down: 

      (1) Theorem 1 does not seem to hold in the edge case where R=beta=W=0, meaning that the components of interest do not vary with time, or perhaps vary in time only due to measurement noise. In this case x(t), y(t), and z(t) depend on x(0), y(0), and z(0). Since the distributions of x(0), y(0), and z(0) are unspecified, a counterexample to the theorem may be readily constructed by manipulating the covariance matrix of x(0), y(0), and z(0). 

      (2) A similar problem may occur when transition probabilities decay with time. For example, suppose that again R=0 and X is degraded by a protease (B), but this protease is subject to its own first-order degradation. The deterministic version of this situation can be written, for example, dx/dt=-bx and db/dt=-b. In this system, x(t) approaches x(0)exp(-b(0)) for large t. Thus, as above, x(t) depends on x(0). If similar dynamics apply to the Y and Z genes, we can make all genes depend on their initial conditions, thus producing a pathology analogous to the above example.
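
      The deterministic system in example (2) can be checked numerically. The sketch below is illustrative only and uses hypothetical function names: it compares the closed-form solution of dx/dt=-bx, db/dt=-b with a forward-Euler integration, to show that x(t) settles at x(0)exp(-b(0)) rather than losing memory of x(0).

```python
import math


def protease(b0, t):
    """Closed-form solution of db/dt = -b with b(0) = b0."""
    return b0 * math.exp(-t)


def x_closed_form(x0, b0, t):
    """Closed-form solution of dx/dt = -b(t) * x with b(t) = b0 * exp(-t).

    Integrating b(t) from 0 to t gives b0 * (1 - exp(-t)), so
    x(t) = x0 * exp(-b0 * (1 - exp(-t))), which tends to x0 * exp(-b0).
    """
    return x0 * math.exp(-b0 * (1.0 - math.exp(-t)))


def x_euler(x0, b0, t_end, dt=1e-3):
    """Forward-Euler integration of the coupled system, as an independent check."""
    x, b = x0, b0
    for _ in range(int(round(t_end / dt))):
        x, b = x + dt * (-b * x), b + dt * (-b)
    return x


if __name__ == "__main__":
    for x0 in (1.0, 2.0):
        print(x0, x_closed_form(x0, 1.0, 50.0), x_euler(x0, 1.0, 50.0))
```

      Doubling x(0) doubles the long-time value of x(t), which is the dependence on initial conditions that the referee describes.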

      The reviewer does not know when such examples may occur in (bio)physical systems. Nevertheless, since one of the advantages of mathematics is the ability to correctly identify the domain of validity for a claim, the present work would be strengthened by "building a fence" around these edge cases, either by identifying the comprehensive set of such edge cases and explicitly prohibiting them in a stated assumption set, or by pointing out how the existing assumptions already exclude them.  

      We thank the referee for bringing to our attention these edge cases that indeed violate our theorem as stated. In the revised manuscript we have “built a fence” around these edge cases by adding two requirements to the premise of our theorem: First, we have added the requirement that the degradation rate does not decay to zero for any possible realization. That is, if beta(t) is the degradation rate of X and Y for a particular cell over time, then the time average of beta(t) over all time must be non-zero. Second, we have added the requirement that the system has evolved for enough time such that the dual reporter averages <x> and <y>, along with the covariances Cov(x, z_{k}) and Cov(y, z_{k}), have reached a time-independent stationary state.

      With these requirements, no assumptions need to be made about the initial conditions of the system, because any differences in the initial conditions will decay away as the system reaches stationarity. For instance, the referee’s example (1) is not possible with these requirements because beta(t) can no longer remain zero. Additionally, example (2) is no longer possible because the time average of the degradation rate would be zero, which is no longer allowed (i.e., (1/T) ∫_0^T b(0) exp(-t) dt → 0 as T → ∞).
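
      For completeness, the time-average limit for example (2) can be written out explicitly. This is a worked step added for illustration, using b(t) = b(0) exp(-t) from the referee's example; the comparison case of a constant rate b_0 is an assumption introduced here only for contrast.

```latex
% Time average of the decaying degradation rate in example (2):
\[
\lim_{T\to\infty}\frac{1}{T}\int_0^T b(0)\,e^{-t}\,dt
  \;=\; \lim_{T\to\infty}\frac{b(0)\,\bigl(1-e^{-T}\bigr)}{T}
  \;=\; 0 .
\]
% By contrast, a constant rate b(t) = b_0 > 0 has a non-zero time average:
\[
\lim_{T\to\infty}\frac{1}{T}\int_0^T b_0\,dt \;=\; b_0 \;>\; 0 .
\]
```

      The first case is excluded by the added requirement on the degradation rate, whereas the second satisfies it.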

      Note that adding the condition that degradation cannot decay to exactly zero does not reduce the biological applicability of the theorem. But, as the referee correctly points out, any mathematical theorem needs to be accurately stated and stand on its own regardless of whether biological systems could realize particular edge cases. Also note that the requirement that the cellular ensemble has reached a time-independent distribution of cell-to-cell variability can be (approximately) experimentally verified by taking snapshots of ensemble variability at two sufficiently separated moments in time.

      In response to the referee’s comment, we have added the above requirements when stating the theorem in the main text. We have also added the requirement of non-decay of the degradation rate to the definition of the system in SI Sec. 4, along with the stationarity requirement in theorem 1 in SI Sec 5. We have also added mathematical details to the proof of the invariant in SI Sec 5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. They propose and experimentally demonstrate the utility of this idea with a synthetic reporter system in bacteria. 

      The paper is well written and clearly outlines the principle and the mathematical invariant relationship, both in giving the reader an intuitive understanding of why the relationship must be true and in the mathematical derivation of the proof of Theorem 1.

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. However, the practical utility of this method may not be straightforward and potentially be quite difficult to execute. We think this work could offer a platform to advance the field of network inference, but would encourage the authors to address the following comments. 

      We thank the reviewer for the positive comments on readability, summarizing the value of our work, as well as the critical comments below that helped us improve the manuscript.

      Major comments: 

      (1) Although the invariant identity seems theoretically sound, the data from synthetic engineered circuits in this manuscript do not support that the invariant holds for natural causal relations between genes in wild-type cells. In all the positive control synthetic circuits (numbers 1 to 4) the target gene Z (i.e., RFP) was always on the plasmid, and in circuit #4 there was an additional endogenous copy. The authors recapitulate the X-to-Z causality in circuits 1, 2, and 3 but not 4. Ultimately, the utility of this method lies in the ability to capture causality from endogenous correlations; this observation suggests that the method might not be useful for that task.

      We thank the referee for their careful reading of our synthetic circuits and sincerely apologize for an error in our description of circuit #4 in the schematic of Table S2 of the supplement. We incorrectly stated that this circuit contained a chromosomally expressed RFP. In fact, in circuit #4 RFP was only on the plasmid just like in the circuits #1-3. We have corrected the schematic in the revised manuscript and have verified that the other circuits are correctly depicted.

      In the revised manuscript, we now explicitly spell out that all our “positive control” test cases had the genes of interest expressed on plasmids, and that we have not shown that our method successfully detected causal interactions in a chromosomally encoded gene regulatory circuit, see additional statements in Sec. “Causally connected genes that break the invariant” on p. 6. 

      In the absence of any explicit experimental evidence, it is then important to consider whether chromosomally encoded circuits are expected to cause problems for our method which is based on a fluctuation test. Due to plasmid copy number fluctuations, X and Z will fluctuate significantly more when expressed on plasmids than when expressed chromosomally. However, because this additional variability is shared between X and Z it does not help our analysis which relies on stochastic differences in X and Z expression due to “intrinsic noise” effects downstream of copy number fluctuations. The additional “extrinsic noise” fluctuations due to plasmid copy number variability would wash out violations of Eq. (2) rather than amplify them. If anything, we thus expect our test cases to have been harder to analyze than endogenous fluctuations. This theoretical expectation is indeed borne out by numerical test cases presented in the revised supplement where plasmid copy fluctuations severely reduced the violations of Eq. 2, see new additional SI Sec. 15. 

      Additionally, the case of the outlier circuit (number 12) suggests that exogenous expression of certain genes may lead to an imbalance of natural stoichiometry and lead to indirect effects on target genes which can be misinterpreted as causal relations. Knocking out the endogenous copy may potentially ameliorate this issue but that remains to be tested. 

      We agree with the referee that the expression of exogenous genetic reporters can potentially affect cellular physiology and lead to undesired effects. In the revised manuscript we now explicitly spell out that the metabolic burden or the phototoxicity of introducing fluorescent proteins could in principle cause artificial interactions that do not correspond to the natural gene regulatory network, see Sec. “Proposed additional tests” on p. 8.

      However, it is also important to consider that the test circuit #12 represents a synthetic circuit with genes that were expressed at extremely high levels (discussed in 3rd paragraph of Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8), which led to the presumed cellular burden. Arguably, natural systems would not typically exhibit such high expression levels, but importantly even if they did, our method does not necessarily rely on fluorescently tagged proteins but can, in principle, also be applied to other methods such as transcript counting through sequencing or in-situ hybridization of fluorescent probes.  

      Ultimately, the value of this manuscript will be greatly elevated if the authors successfully demonstrate the recapitulation of some known naturally existing causal and non-causal relations. For this, the authors can choose any endogenous gene Z that is causally controlled by gene X. The gene X can be on the exogenous plasmid along with the reporter and the shared promoter. Same for another gene Z' which is not causally controlled by gene X. Potentially a knockout of endogenous X may be required but it might depend  on what genes are chosen. 

      If the authors think the above experiments are outside the scope of this manuscript, they should at least address these issues and comment on how this method could be effectively used by other labs to deduce causal relations between their favorite genes. 

      Because a full analysis of naturally occurring gene interactions was beyond the scope of our work, we agree with the referee’s suggestion to add a section to discuss the limitations of our experimental results. In the revised manuscript we reiterate that additional investigations are needed to show that the method works to detect causal interactions between endogenous genes, see Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 9). In the original manuscript we explicitly spelled out how other researchers can potentially carry out this further work in the subsections titled “Transcriptional dual reporters” (p. 3) and “Translational dual reporters” (p. 3). In the revised manuscript, we have added a section “Proposed additional tests” (p. 8) in which we propose an experiment analogous to the one proposed by the referee above, involving an endogenous gene circuit found in E. coli, as an example to test our invariant. 

      (2) For a theoretical exposition that is convincing, we suggest the authors simulate a larger network (for instance, a network with >10 nodes), like the one shown schematically in Figure 1, and demonstrate that the invariant relationship holds for the causally disconnected entities, but is violated for the causally related entities. It would also be interesting to see if any quantification for the causal distance between "X" and the different causally related entities could be inferred.  

      We thank the referee for this suggestion. We have added SI Sec. 14 where we present simulation results of a larger network with 10 nodes. We find that all of the components not affected by X satisfy Eq. (2) as they must. However, it is important to consider that we have analytically proven the invariant of Eq. (2) for all possible systems. It provably applies equally to networks with 5, 100, or 10,000 components. The main purpose of the simulations presented in Fig. (2) is to illustrate our results and to show that correlation coefficients do not satisfy such an invariant. However, they are not used as a proof of our mathematical statements.

      We thank the referee for the interesting suggestion of quantifying a “causal distance”. Unfortunately, the degree to which Eq. (2) is violated cannot directly equate to an absolute measure for the “causal distance” of an interaction. This is because both the strength of the interaction and the size of the stochastic fluctuations in X affect the degree to which Eq. (2) is violated. The distance from the line should thus be interpreted as a lower bound on the causal effect from X to Z because we do not know the magnitude of stochastic effects inherent to the expression of the dual reporters X and Y. While the dual reporters X and Y are identically regulated, they will differ due to stochastic fluctuations. Propagation of these fluctuations from X to Z are what creates an asymmetry between the normalized covariances. In the most extreme example, if X and Y do not exhibit any stochastic fluctuations we have x(t)=y(t) for all times and Eq. (2) will not be violated even in the presence of a strong causal link from X to Z.

      However, it might be possible to infer a relative causal distance to compare causal interactions within cells.

      That is, in a given network, the normalized covariances between X, Y and two other components of interest Z1, Z2 that are affected by X can be compared. If the asymmetry between (η_xz1, η_yz1) is larger than the asymmetry between (η_xz2, η_yz2), then we might be able to conclude that X affects Z1 with a stronger interaction than the interaction from X to Z2, because here the intrinsic fluctuations in X are the same in both cases. 

      In response to the referee’s comment and to test the idea of a relative causal distance, we have simulated a larger network made of 10 components. In this network, X affects a cascade of components called Z8, Z9, and Z10, see the additional SI Sec. 14. Here the idea of a causal distance can be defined as the distance down the cascade: Z8 is closest to X and so has the largest causal strength, whereas Z10 has the weakest. Indeed, simulating this system we find that the asymmetry between η_xz8 and η_yz8 is the largest whereas that between η_xz10 and η_yz10 is the smallest. We also find that all of the components not affected by X have normalized covariances that satisfy Eq. (2). This result suggests that the relative causal distance or strength in a network could potentially be estimated from the degree of the violations of Eq. (2). 

      However, we note that these are preliminary results. In the case of the specific regulatory cascade now considered in SI Sec. 14, the idea of a causal distance can be well defined. Once feedback is introduced into the system, this definition may no longer make sense. For instance, consider the same network that we simulate in SI Sec. 14, but where the most downstream component in the cascade, Z10, feeds back and affects X and Y. In such a circuit it is unclear whether Z8 or Z10 is “causally closer” to X. A more thorough theoretical analysis, equipped with a more universal quantitative definition for causal distance or strength, would be needed to deduce what information can be inferred from the relative distances in the violations of Eq. (2). While this defines an interesting research question, answering it goes beyond the scope of the current manuscript. 
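
      As a hedged illustration of the behaviour described in this response (and not a reproduction of the simulations in SI Sec. 14), the sketch below computes exact stationary covariances for a small linear toy network by solving the Lyapunov equation, which is exact when all propensities are linear. The network, its parameter values, the function name `stationary_eta`, and the definition of the normalized covariance as η_ab = Cov(a,b)/(<a><b>) are assumptions made for illustration only.

```python
import numpy as np

def stationary_eta(g_u, g_x, a=10.0, b_u=1.0, lam=5.0, alpha=2.0,
                   beta=1.0, delta=0.5):
    """Return (eta_xz, eta_yz) for a hypothetical linear birth-death network.

    Species order: u (shared upstream signal), x and y (dual reporters),
    z (target). X is made at rate lam*u and Y at rate alpha*lam*u, so the
    two production rates are proportional. Z is made at rate g_u*u + g_x*x,
    so g_x > 0 means that X causally affects Z.
    """
    # Jacobian of the mean dynamics d<n>/dt = A<n> + const
    A = np.array([
        [-b_u,         0.0,   0.0,    0.0],
        [lam,         -beta,  0.0,    0.0],
        [alpha * lam,  0.0,  -beta,   0.0],
        [g_u,          g_x,   0.0,  -delta],
    ])
    const = np.array([a, 0.0, 0.0, 0.0])
    mean = np.linalg.solve(A, -const)

    # Each reaction changes one species by +/-1, and at stationarity the
    # birth flux equals the death flux, so D is diagonal with 2*death flux.
    D = np.diag(2.0 * np.array([b_u, beta, beta, delta]) * mean)

    # Solve the Lyapunov equation A C + C A^T + D = 0 for the covariance C.
    n = A.shape[0]
    K = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    C = np.linalg.solve(K, -D.flatten()).reshape(n, n)

    eta = C / np.outer(mean, mean)   # normalized covariances
    return eta[1, 3], eta[2, 3]      # (eta_xz, eta_yz)

if __name__ == "__main__":
    print("no X -> Z link:", stationary_eta(g_u=1.0, g_x=0.0))
    print("X -> Z link:   ", stationary_eta(g_u=1.0, g_x=1.0))
```

      In this toy model the two normalized covariances coincide when g_x = 0, and η_xz exceeds η_yz when g_x > 0. The size of the gap depends on both the coupling strength and the abundance of X, in line with the caveat above that the asymmetry is not an absolute measure of causal strength.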

      Minor comments: 

      - The method relies on the gene X and the reporter Y having the same control which would result in similar dynamics. The authors do not quantitatively compare the YFP and CFP expression if this indeed holds for the synthetic circuits. It would be useful to know how much deviation between the two can be tolerated while not affecting the outcome. 

      We thank the referee for their comment. The invariant of Eq. (2) is indeed guaranteed to hold only when the transcription rate of Y is proportional to that of X. How much levels of X and Y covary depends on the stochastic effects intrinsic to the expression of the dual reporters as well as how similar the transcriptional control of X and Y is. The stochastic difference between X and Y is exactly what we exploit. 

      However, in the limit of high YFP and CFP levels, intrinsic fluctuations that cause stochastic expression differences between X and Y become negligible and we can directly infer from time traces whether they are indeed tightly co-regulated. Below, we show two single-cell traces taken with our experimental setup in which the YFP and CFP fluorescence trajectories are almost exactly proportional. Both of these traces are from circuit #10 as defined in Table S4. 

      Author response image 1.

      We chose the above traces because they showed the highest correlation between YFP and CFP levels. Other traces for lower expression levels have lower correlations due to effects of intrinsic noise (see Tables S2-S4). However, the existence of one trace in which YFP is almost perfectly proportional to CFP throughout can only occur if the YFP and CFP genes are under the same control. And, since the control of YFP and CFP genes in all of our synthetic circuits are identical (with the same promoters and plasmid positions), these data strongly suggest that our dual reporters are tightly co-regulated in all the synthetic circuits. Moreover, the negative control experiments presented in Fig. 3E provide a natural consistency check that the YFP and CFP are under the same control and satisfy Eq. (1).

      We agree that it would be useful to know how much the X and Y production rates can differ for Eq. (2) to hold. Importantly, our proven theorem already allows for the rates to differ by an unspecified proportionality constant. In response to the referee’s comment we have derived a more general condition under which our approach holds. In the newly added SI Sec. 7 we prove that Eq. (2) holds also when rates differ as long as the difference is stochastic in nature with an average of zero. We also prove that Eq. (2) holds in the face of multiplicative noise that is independent of the X and Y production rates.

      However, the production rates of X and Y cannot differ in all ways. Some types of differences between the X and Y production rates can lead to deviations of Eq. (2) even when there is no causal interaction. To highlight this, we added the results of simulations of a toy model in which the X and Y production rates differ by an additive noise term that does not average to zero, see Fig. S19B of the newly added SI Sec. 7.

      - The invariant should potentially hold true for any biological species that are causally related e.g. protein-protein interactions. Also, this method could potentially find many applications in eukaryotic cells. Although it's outside the scope of current work to experimentally demonstrate such applications, the authors should comment on experimental strategies to apply this method to overcome potential pitfalls (e.g. presence of enhancers in eukaryotic cells). 

      We thank the referee for this suggestion. We agree that there are potential pitfalls that could come into effect when our proposed approach is applied to more complex systems such as eukaryotic gene expression. In response to the referee’s comment, we have added an explicit discussion of these potential pitfalls in the discussion section “Limitations of this study” (see p. 10). 

      In particular, in eukaryotes there are many genes in which promoter sequences may not be the sole factor determining transcription rates. Other factors that can be involved in gene regulation include the presence of enhancers, epigenetic modifications, and bursts in gene expression, to name a few. We thus propose a few strategies, which include positioning the passive reporter at a locus similar to that of the gene of interest, measuring the gene regulation activities of the gene of interest and its passive reporter using a separate method, and exploiting the invariant with a third gene, where it is known there is no causal interaction, as a consistency check. In addition, we include in the SI a new section SI Sec. 8 which shows that the invariant holds in the face of many types of bursty gene expression dynamics.

      However, the above is not a comprehensive list. Some of the issues the referee mentions are serious and may not be straightforward to overcome. We now spell this out explicitly in the revised manuscript (p. 10). 

      - In the legend of Fig. 1, the sentence "Data points here are for..." is missing a few words, or needs to be rephrased. 

      We thank the referee for this comment. We have rewritten the figure caption, which now reads “Data points are numerical simulations of specific example networks (see SI for details) to illustrate the analytically proven theorem of Eq. 2.”

      - Fig. 2 talks about the uncertainties associated with each point on the scatter plots. However, it is difficult to understand the quantification in such a plot. It would be great to have a plot quantifying the uncertainties in the invariant relation for the different topologies studied, specifically in order to understand if one topology is consistently deviating more from the x=y line than the other topologies studied here.  

      We thank the referee for this suggestion. In the supplement of the revised manuscript we have added supplemental Figs. S3, S4, and S5 to separately quantify the uncertainty of the different processes plotted in Fig. 2 and have added a new section (SI Sec. 11) to discuss the processes simulated in Fig. 2 in more detail. In short, each simulated process generated roughly 5% of outliers or fewer when considering 95% confidence intervals (with the max percentage deviation being 5.01% for process 5, see Fig. S5). These outliers were then simulated over a larger number of simulations to reduce the sampling error, which resulted in 0% of outliers (see Sec. “Confidence intervals for finite sampling error” in Materials and Methods on p. 11). Some simulated processes generated larger percentage errors in the normalized covariances than others, but this is expected as different processes have different dynamics which will result in different degrees of sampling of the underlying distributions.

      Note that the invariant of Eq. 2 is analytically proven for all tested topologies as none of the topologies include a causal effect from X to Z. Any deviation of the numerical data from the straight line prediction of Eq. 2 (right column in Fig. 2C) is due to the finite sampling of a stochastic process to estimate the true covariance from the sample covariance. Any given parameter set was simulated several times, which allowed us to estimate the sampling error from differences between repeated samples. In the additional SI figures we now quantify this error for the different topologies. 

      In addition to the above changes we want to highlight that the purpose of the simulations presented in Fig. (2) is not to prove our statements or explore the behavior of different topologies. The purpose of the data presented in the right column of Fig. 2C is to illustrate the theoretical invariant and act as a numerical sanity check of our analytically proven result. In contrast, the data in the left column of Fig 2C illustrates that the correlations do not satisfy an invariant like Eq. 2 which applies to covariances but not correlations.  

      - The legend for Fig. 3 seems to end abruptly. There likely needs to be more.  

      We thank the referee for catching this mistake. We have corrected the accidentally truncated figure caption of Fig. 3.

      - There is a typo in equation (5.3) on page 23 of supplementary material, there should be x instead of y in the degradation equation of x. 

      We thank the referee for catching this mistake which has been corrected in the revised manuscript.

      - In the supplemental material, to understand the unexpected novel discovery of causality, Figure S5 is presented. However, this doesn't give the context for other negative controls designed, and the effect of rfp dynamics (which can be seen in the plots both in the main paper and the supplement) in the growth rate of cells in those constructs. As a baseline, it would be nice to have those figures.  

      We thank the referee for this suggestion. We have now included representative RFP traces with the growth rates for other negative control circuits, see Fig. S10. In addition, we have now included the cross correlation functions between RFP and growth rate in these negative control circuits, see Fig. S10A. While in all cases, RFP and growth rate are negatively correlated, the outlier circuit exhibits the largest negative correlation.

      The comparison suggested by the referee thus highlights that – in isolation – a negative correlation between RFP and growth rate is only weak evidence for our hypothesized causal interaction because negative correlations can result from the effect of growth rate affecting volume dilution and thus RFP concentration. Crucially, we thus additionally considered the overall variability of growth rate and found the outlier circuit has the largest growth rate variability, which is indicative of something that is affecting the growth rate of those cells, see Fig. S10B. To compare the magnitude of RFP variability against other strains requires constraining the comparison group to other synthetic circuits that have RFP located on the chromosome rather than a plasmid. This is why we compare the CV of the outlier with the CV of circuit #5, which corresponds to the “regular” repressilator (i.e., the outlier circuit without the endogenous lacI gene). As an additional comparison, we computed the CV for a strain of E. coli that does not contain a synthetic plasmid at all, but still contains the RFP gene on the chromosome. We find the CVs in the outlier circuit to be larger than in these two additional circuits, suggesting that the outlier circuit causes additional fluctuations in the RFP and growth rate. We now spell this out explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).

      The referee is correct that the above arguments are only circumstantial evidence, but they do show that the data is consistent with a plausible explanation of the hypothesized causal interaction. Our main evidence for an RpoS mediated stress response that explains the deviations from Eq. 2 in the outlier circuit is the perturbation experiment in which the deviation disappears for the RpoS knockout strain. We now spell out this argument explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).
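
      The two summary statistics used in this argument, the coefficient of variation (CV) of a trace and the cross-correlation function between two traces, can be sketched as follows. This is a minimal illustration that assumes evenly sampled single-cell traces of equal length. The function names and the particular estimator (population standard deviation, normalization over the whole trace) are assumptions and not necessarily the procedure used for Fig. S10.

```python
import numpy as np

def coefficient_of_variation(trace):
    """CV = standard deviation divided by mean (population std assumed)."""
    trace = np.asarray(trace, dtype=float)
    return trace.std() / trace.mean()

def cross_correlation(a, b, max_lag):
    """Normalized cross-correlation of a(t) with b(t + lag).

    Returns a dict mapping each integer lag in [-max_lag, max_lag] to the
    mean product of the standardized traces over their overlapping part.
    At lag 0 this equals the Pearson correlation coefficient.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            result[lag] = float(np.mean(a[:n - lag] * b[lag:]))
        else:
            result[lag] = float(np.mean(a[-lag:] * b[:n + lag]))
    return result

if __name__ == "__main__":
    # Hypothetical traces: "growth" is perfectly anti-correlated with "rfp".
    rfp = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0])
    growth = 10.0 - rfp
    print(coefficient_of_variation(rfp))
    print(cross_correlation(rfp, growth, max_lag=2))
```

      As the response notes, a negative value at zero lag is on its own only weak evidence of a causal effect, since dilution by growth can produce it as well.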

      Reviewer #2 (Recommendations For The Authors): 

      The proof of theorem 1 relies on an earlier result, lemma 1. Lemma 1 only guarantees the existence of a "dummy" system that satisfies the separation requirement and preserves the dynamics of X and Y. However, in principle, it may be possible to maintain the dynamics of X and Y while still changing the relationship between Cov(X,Zk) and Cov(Y,Zk). This could occur if the dynamics of Zk differ in a particular way between the original system and the dummy system. So lemma 1 needs to be a little stronger- it needs  to mention that the dynamics of Zk are preserved, or something along these lines. The proof of lemma 1 appears to contain the necessary ingredients for what is actually needed, but this should be clarified. 

      We agree with the referee that this is an important distinction. Lemma 1 does in fact guarantee that any component Zk that is not affected by X and Y will have the same dynamics in the “dummy” system. However, as the referee points out, this is not stated in the lemma statement nor in the proof of the lemma. In response to the referee’s comment, we have made it clear in the lemma statement that the Zk dynamics are preserved in the “dummy” system, and we have also added details to the proof to show that this is the case, see Lemma 1 on p. 27 of the SI. 

      Readers who are familiar with chemical reaction diagrams, but not birth-death process diagrams may waste some time trying to interpret Equation 1 as a chemical reaction diagram with some sort of rate constant as a label on each arrow (I did this). It may be helpful to either provide a self-contained definition of the notation used, or mention a source where the necessary definitions can be found. 

      We agree with the referee. In the revised manuscript we have added a description of the notation used below Equation 1 of the main text, see p. 2. The notational overloading of the “arrow notation” is a perennial problem in the field and we thank the referee for reminding us of the need to clarify what the arrows mean in our diagrams.

      It would be helpful if the authors could propose a rule for deciding whether dependence is detected or not. As it stands presently, the output of the approach seems to be a chart like that in Figure 3D where you show eta_xz and eta_yz with confidence interval bars and the reader must visually assess whether the points more-or-less fall on the line of unity. It would be better to have some systematic procedure for making a "yes or no" call as to whether a causal link was detected or not. Having a systematic detection rule would allow you to make a call as to whether dependence in circuit 3 was detected or not. It would also allow you or a future effort to evaluate the true positive rate of the approach in simulated settings. 

      We thank the referee for this suggestion. In the revised manuscript we have added an explicit rule for detecting causality using the invariant of Eq. (2). Specifically, Eq. (2) can be re-written as r = 1 where r is the covariability ratio r = etaXZ/etaYZ. In that case, given 95% confidence intervals for the experimentally determined covariability ratio r, we say that there is a causal interaction if the confidence intervals do not overlap with the value of r = 1. 

      This corresponds to a null hypothesis test at the 2.5% significance level. The reason that it is at 2.5% significance and not 5% significance is as follows. Let’s say we measure a covariability ratio of r_m, and the 95% confidence interval is [r_m - e_m, r_m + e_m] for some error e_m. Without loss of generality, let’s say that r_m > 1 (the same applies if r_m < 1). This means that Prob(r < r_m - e_m) = 2.5% and Prob(r > r_m + e_m) = 2.5%, where r is the actual value of the covariability ratio. Under the null hypothesis that there is no causal interaction, we set r = 1. However, we now have Prob(1 > r_m + e_m) = 0, because we know that r_m > 1 and so we must have r_m + e_m > 1. The probability that the value of 1 falls outside the error bars is therefore 2.5% under the null hypothesis. 

      This proposed rule is the same rule that we used to detect statistical outliers in our simulations, where we found a “false positive” rate of 2.3% over 6522 simulated systems due to statistical sampling error (as discussed in the Materials and Methods section). In response to the referee’s suggestion, we have added the section “A rule for detecting causality in the face of measurement uncertainty” (p. 4). We also apply the rule to the experimental data and find that the rule detects 2/4 causal interactions in Fig. 3D. We have clarified this in the Fig. 3D caption, in the main text, and we have added a figure in the SI (Fig. S2) where we apply the null hypothesis test on the measured covariability ratios. 

      Note that whether the third interaction is “detected” or not depends on the cut-off value used. We picked the most common 95% rule to be consistent with the traditional statistical approaches. With this rule one of the data points lies right at the cusp of detection, but ultimately falls into the “undetected” category if a strictly binary answer is sought under the above rule. 
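
      The detection rule discussed in this response can be sketched as follows: estimate the covariability ratio r, attach a 95% confidence interval, and flag a causal link only when that interval excludes r = 1. This is a minimal sketch under stated assumptions. The normalized covariance is taken to be Cov(a,b)/(<a><b>), and the confidence interval comes from a simple percentile bootstrap over independent per-cell measurements, which is an illustrative choice and not necessarily how the authors computed their intervals. All function names are hypothetical.

```python
import numpy as np

def normalized_cov(a, b):
    """eta_ab = Cov(a, b) / (<a><b>) (assumed definition)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.mean((a - a.mean()) * (b - b.mean())) / (a.mean() * b.mean())

def covariability_ratio(x, y, z):
    """r = eta_xz / eta_yz; the invariant corresponds to r = 1."""
    return normalized_cov(x, z) / normalized_cov(y, z)

def detect_causal_link(x, y, z, n_boot=2000, level=0.95, seed=0):
    """Return (detected, (lo, hi)) using a percentile bootstrap interval.

    detected is True when the confidence interval for r excludes 1.
    Resampling treats each index as one independent observation.
    """
    rng = np.random.default_rng(seed)
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    n = len(x)
    ratios = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        ratios[i] = covariability_ratio(x[idx], y[idx], z[idx])
    lo, hi = np.quantile(ratios, [(1 - level) / 2, (1 + level) / 2])
    detected = not (lo <= 1.0 <= hi)
    return detected, (float(lo), float(hi))
```

      Changing `level` moves the cut-off, which is why a data point near the cusp of detection can switch category, as noted above. For time-series data with correlated samples, resampling whole traces rather than individual time points would be the more cautious choice.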

      It would be helpful to mention what happens when the abundance of a species hits zero. Specifically, there are two ways to interpret the arrow from X to X+d with a W on top: 

      Interpretation (1): 

      P(X+d | X) = W if X+d ≥ 0
      P(X+d | X) = 0 if X_i+d_i < 0 for at least one i 

      Interpretation (2): 

      P(X+d | X) = W regardless of whether X+d < 0
      W = 0 whenever X_i < d_i for at least one i 

      Interpretation (1) corresponds to a graph where the states are indexed on the non-negative integers. Interpretation (2) corresponds to a graph where the states are indexed on the integers (positive or negative), and W is responsible for enforcing the non-negativity of mass. I believe you need the second interpretation because the first interpretation leads to problems with your definition of causality. For example, consider the reaction: 

      (Na, K) -- 0.1 --> (Na-1, K+1) 

      This could occur if Na and K are the intracellular concentrations of sodium and potassium ions in a cell that has an ATP-driven sodium-potassium exchanger whose rate is limited by the frequency with which extracellular potassium ions happen to flow by. Per the definition of causality found in the appendix, Na has no causal effect on K since Na does not show up in the reaction rate term. However, under interpretation (1), Na clearly has a causal effect on K according to a reasonable definition of causality because if Na=0, then the reaction cannot proceed, whereas if Na>0 then it can. However, under interpretation (2), the reaction above cannot exist and so this scenario is excluded. 

      We thank the referee for this comment that helped us clarify the meaning of arrows with propensities. In short, interpretation (2) corresponds to the definition of our stochastic systems. This is consistent with the standard notation used for the chemical master equation. As the referee points out, because molecular abundances cannot be negative, any biochemical system must then have the property that the propensity of a reaction must be equal to zero when the system is in a state in which an occurrence of that reaction would take one of the abundances to negative numbers. Stochastic networks that do not have this property cannot correspond to biochemical reaction networks.

      In the revised manuscript, we now spell this out explicitly to avoid any confusion, see SI page 25.

      Furthermore, we now discuss the referee’s example in which the rate of exchanging Na for K through an ion exchanger is approximately independent of the intracellular Na concentration. Because molecular abundances cannot become negative, the rate cannot be truly constant: at low concentrations it must decrease until it becomes exactly zero for zero molecules. 

      Importantly, agreement with Eq. (2) does not imply that there is no causal effect from X to Zk. It is the deviation from Eq. (2) that implies the existence of a causal effect from X to Zk. Therefore, although the referee’s example above would constitute a causal interaction in our framework, it would not lead to a deviation from Eq. (2) because the fluctuations in Na (which we exploit) do not propagate to K. From a practical point of view, our method thus detects whether changing X over the observed range affects the production and degradation rates of Zk. 
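
      The requirement that a propensity must vanish in any state from which the reaction would produce a negative abundance can be checked mechanically. The sketch below is a hypothetical helper, not part of the manuscript: it scans a small grid of states for the referee's Na/K example and reports the states where a positive propensity would drive an abundance below zero. The names `violating_states`, `constant_rate`, and `gated_rate`, and the grid size, are assumptions made for illustration.

```python
from itertools import product

def violating_states(propensity, jump, max_count=5):
    """List states where the reaction could fire into negative abundances.

    propensity: function mapping a state tuple to a reaction rate.
    jump: tuple of integer changes applied to the state when the reaction fires.
    Only states with every abundance in 0..max_count are scanned.
    """
    bad = []
    for state in product(range(max_count + 1), repeat=len(jump)):
        after = [n + d for n, d in zip(state, jump)]
        if min(after) < 0 and propensity(state) > 0:
            bad.append(state)
    return bad

# Referee's example: (Na, K) -> (Na - 1, K + 1)
JUMP = (-1, +1)

def constant_rate(state):
    """Rate 0.1 regardless of Na: not a valid biochemical propensity."""
    return 0.1

def gated_rate(state):
    """Rate 0.1 whenever at least one Na is present, and exactly 0 otherwise."""
    na, _k = state
    return 0.1 if na >= 1 else 0.0

if __name__ == "__main__":
    print(violating_states(constant_rate, JUMP))  # states with Na = 0
    print(violating_states(gated_rate, JUMP))     # no violations on this grid
```

      The gated rate depends on Na only at the boundary Na = 0, so fluctuations of Na within a range that stays above zero would not propagate to K, which matches the point made above about what the method can detect.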

      In the course of setting up the negative control benchmark circuits, a perturbation-based causal validation would be nice. For instance, first, verify that X does not affect Z by intervening on X (e.g. changing its copy number or putting it under the control of an inducible promoter), and ensuring that Z's activity is not affected by such interventions upon X. This approach would help to adjudicate questions of whether the negative control circuits actually have an unknown causal link. The existing benchmark is already reasonably solid in my view, and I do not know how feasible this would be with the authors' setup, but I think that a perturbation-based validation could in principle be the gold standard benchmark.  

      We agree that additional perturbation-based validation tests on all of the negative control circuits would indeed improve the evidence that our method worked as advertised. While such experiments are indeed beyond the scope of our current work we now explicitly point out the benefits of such additional controls in the revised Discussion.

      Below is a series of comments about typography, mostly about section 4 of the supplement. 

      We thank the referee for their careful reading and highlighting those mistakes.

      At the bottom of page 21, Z_aff is defined as the set of components that are affected by X. However, later Z_aff seems to refer to components affected by X or Y. For instance, in the proof of lemma 1, it is written "However, because a is part of z_aff, the {ak} variables must be affected by X and/or Y." 

      We thank the referee for catching this mistake. We have changed the definition of Z_aff throughout the supplement to refer to components affected by X or Y. If it can be experimentally ensured that Y is a passive reporter (i.e., it does not affect other components in the cell), then the theorem can only be violated if X affects Z. 

      In the equation following Eq 5.2, W_k and d_k should be W_i and d_i ?  

      Yes, the referee is correct. In the revised manuscript we have corrected W_k and d_k to W_i and d_i. 

      In Eq 5.3 in the lower-left transition diagram, I think a "y" should be an "x". 

      Yes, the referee is correct. In the revised manuscript  we have fixed this typo.

      In the master equation above Eq 5.5, the "R" terms for the y reactions are missing the alpha term, and I think two of the beta terms need to be multiplied by x and y respectively.  

      The referee is correct. In the revised manuscript  we have fixed this typo.

      The notation of Eq 5.8, where z_k(t) is the conditional expectation of z_kt, is strange and difficult to follow. Why does z_k(t) not get a bar over it like its counterparts for x, y, R, and beta? The bars, although not a perfect solution, do help.  

      We agree with the referee’s comment and have added further explanations to define the averages in question, see SI p. 28. In short, when we condition on the history of the components not affected by X or Y, we in effect condition on the time trajectories of z_{k} (when it is part of the components not affected by X and/or Y) and beta (since it only depends on the components not affected by X or Y). We thus previously did not include the bars when taking the averages of these components in the conditional space because the conditioning in effect sets their time-trajectories (so they become deterministic functions of time). In the revised manuscript we now also denote these conditional expectations with bars and we have added comments to the proof to clarify their definition.

      I think it would be helpful to show how the relationship <x>=<y>/alpha is obtained from Eq 5.5.  

      We agree with this suggestion and have added the derivations, see Eqs. (5.9) - (5.13) in the revised SI. 

      In the main text, the legend of Fig 3 cuts off mid-sentence.  

      We thank the referee for catching this mistake which has been fixed in the revised manuscript.

    1. eLife Assessment

      This important study provides compelling data from in vitro models and patient-derived samples to demonstrate how modulation of GSK3 activity can reprogram macrophages, revealing potential therapeutic applications in inflammatory diseases such as severe COVID-19. The study stands out for its clear and systematic presentation, convincing experimental approach, and the relevance of its findings to the field of immunology.

    2. Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study, though some minor points could be addressed for clarity and completeness, as outlined in the recommendations below.

      Many thanks for these comments. Please find below the response to the specific recommendations.

      Recommendations for the authors:

      (1) In lines 263-266, the term "MoMac-VERSE" and its associated clusters are introduced without sufficient explanation. The authors should provide additional clarification on what these clusters represent and how they were derived.

      We have revised the text according to the reviewer's suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (2) In line 283, the reference labeled "2227" appears incorrect. It seems to be a formatting issue, and it might refer to references 22-27. Please verify and correct.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (3) In line 353, the reference is incorrect. Please review and ensure that all references are properly cited throughout the manuscript.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (4) In line 368, one of the patient samples shows a decreased IL-10 response after CHIR treatment. The authors should acknowledge the heterogeneity in the primary cell responses and adjust the conclusion accordingly to reflect this variability.

      We have modified the text following the reviewer's comment, and acknowledge the heterogeneity in the production of IL-10 after GSK3 inhibition in the three analyzed samples. The modified text now states: "Consistent with these findings, CHIR-AMØ exhibited higher expression of MAFB (Figure 6F) whose increase correlated with an augmented secretion of Legumain, CCL2 and IL-10 (Figure 6G), although the latter was only seen in two samples, probably reflecting heterogeneity in primary cell responses."

      (5) Figure 7B: the UMAP shows 4 populations, but according to the visualization in the sup fig 3, there should be many more clusters. How do the authors explain this? Are these patient-specific clusters? Also, IMs can be separated into at least subpopulations. Can the authors plot also bona fide macrophage markers expressed by all subpopulations?

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15 %]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.
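
      The filtering thresholds quoted above can be sketched as a per-cell predicate. This is a minimal illustration in plain Python, not the authors' Seurat pipeline; the `CellQC` record, its field names and the `passes_qc` helper are hypothetical, and the example cells are invented purely to exercise each threshold.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    n_feature: int   # number of detected genes (nFeature), hypothetical field name
    n_count: int     # number of UMIs (nCount), hypothetical field name
    pct_mito: float  # percentage of mitochondrial reads, hypothetical field name


def passes_qc(cell: CellQC) -> bool:
    """Apply the thresholds stated in the response:
    200 < nFeature < 6000, nCount > 1000, mitochondrial genes < 15 %."""
    return (
        200 < cell.n_feature < 6000
        and cell.n_count > 1000
        and cell.pct_mito < 15.0
    )


# Invented example cells, one per failure mode.
cells = [
    CellQC(1500, 4000, 5.0),   # passes all three filters
    CellQC(150, 4000, 5.0),    # too few detected genes
    CellQC(1500, 800, 5.0),    # too few UMIs
    CellQC(1500, 4000, 20.0),  # mitochondrial fraction too high
]
kept = [c for c in cells if passes_qc(c)]
```

      Keeping the three criteria in one predicate makes it easy to report how many of the starting cells each threshold removes.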

      Addressing the first question, UMAPs in old Figure 7B and old Supplementary Figure 3B had a different number of clusters because old Figure 7B was derived from old Supplementary Figure 3B after grouping macrophage clusters according to the expression of previously defined markers and to limit the weight of donor-specific clusters. Specifically, the macrophage clusters from old Figure 7B were re-grouped according to the differential expression of:

      - FCN1 (including cluster 4, 7 and 12 from Figure 7B): Infiltrating monocytes.

      - FABP4 and TYMS-negative (including clusters 0, 2, 5 and 13 from Figure 7B), or MARCO and INHBA (cluster 9 from Figure 7B) or PPARG (cluster 11 from Figure 7B): Alveolar macrophages (AMØ).

      - TYMS, MKI67, TOP2A and NUSAP1 (cluster 15 from Figure 7B): Proliferating AMØ.

      - LYVE1 or RNASE1 or LGMN (including clusters 1, 3, 6, 8, 10 and 14 from Figure 7B): Interstitial Macrophages (IMØ).
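
      The re-grouping listed above amounts to a lookup from the original cluster number to a population label. The sketch below only restates that list as a Python dictionary; the names `GROUPS`, `CLUSTER_TO_LABEL` and `relabel` are hypothetical and do not come from the authors' code.

```python
# Population -> original cluster numbers, exactly as listed in the response.
GROUPS = {
    "Infiltrating monocytes": [4, 7, 12],
    "AMØ": [0, 2, 5, 13, 9, 11],
    "Proliferating AMØ": [15],
    "IMØ": [1, 3, 6, 8, 10, 14],
}

# Invert the mapping so that each cluster number points to its population.
CLUSTER_TO_LABEL = {
    cluster: label for label, clusters in GROUPS.items() for cluster in clusters
}


def relabel(cluster_ids):
    """Translate a sequence of cluster numbers into population labels."""
    return [CLUSTER_TO_LABEL[c] for c in cluster_ids]
```

      Written this way, it is easy to check that the sixteen clusters (0 to 15) are each assigned to exactly one population.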

      As the reviewer suggested, this type of UMAP plot yielded a large number of donor-specific clusters. To avoid such a misleading representation, we have now plotted UMAPs after running scVI reduction in every case. The new plots are now shown in new Figure 7A, new Figure 7B, new Supplementary Figure 3 (containing the information of the 21310 single-cell transcriptomes from GSE128033) and the novel Supplementary Figure 4 (with the information of the single-cell transcriptomes from human lung macrophages from GSE128033).

      Finally, to address the last issue, we have now plotted the expression of genes used for macrophage definition (CD163, FABP4, LYVE1, FCN1), as well as proliferation-associated genes (TYMS, MKI67, TOP2A, NUSAP1) and other bona fide macrophage marker genes (SPI1, FOLR2) in Supplementary Figure 4C.

      (6) statistics should be indicated in every figure legend and for every subfigure where applicable.

      We have now included the specific statistical procedure applied for each Figure and panel.

      Reviewer #2 (Public review):

      The study by Rios and colleagues provides the scientific community with a compelling exploration of macrophage plasticity and its potential as a therapeutic target. By focusing on the GSK3-MAFB axis, the authors present a strong case for macrophage reprogramming as a strategy to combat inflammatory and fibrotic diseases, including severe COVID-19. Using a robust and comprehensive methodology, the study conducts broad transcriptomic and functional analyses and offers valuable mechanistic insights while highlighting its clinical relevance.

      Strengths:

      Well performed and analyzed

      Weaknesses:

      Additional analyses, including mechanistic studies, would increase the value of the study

      In an effort to address the comment of the reviewer, we have performed more detailed analysis of the kinetics and dose-response effects of GSK3 inhibition, which are now provided as new Supplementary Figure 3A.

      Regarding additional mechanistic studies, we decided to explore the relationship between inactive GSK3β and MAFB levels at the early stages of M-CSF- or GM-CSF-driven monocyte-to-macrophage differentiation. These experiments, performed in three independent monocyte preparations, indicated that, 48 hours into differentiation, M-CSF promoted a large increase in MAFB expression and a slight (albeit significant) rise in inactive GSK3β (p-Ser9-GSK3β) (compared to either untreated or GM-CSF-treated monocytes), further supporting the macrophage re-programming effect of GSK3. However, since the M-CSF-promoted increase in MAFB levels was much more robust than the enhancement in inactive GSK3β, we hypothesize that proteasomal degradation of MAFB might also differ between M-CSF- (M-MØ) and GM-CSF-dependent (GM-MØ) monocyte-derived macrophages.

      Author response image 1.

      Total GSK3β, p-Ser9-GSK3β and MAFB levels in three preparations of freshly purified monocytes either unstimulated (-) or stimulated with M-CSF (10 ng/ml) or GM-CSF (1,000 U/ml) at different time points, as determined by Western blot (upper panel). Vinculin protein levels were determined as protein loading control. Mean ± SEM of the GSK3β/Vinculin, p-Ser9-GSK3β/Vinculin, and MAFB/Vinculin protein ratios from the three independent experiments are shown (lower panel) (paired Student’s t test: *, p<0.05; ****, p<0.001).

      Based on this finding, we then determined proteasome activity in fully differentiated M-CSF- and GM-CSF-dependent monocyte-derived macrophages. Use of the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) in M-MØ and GM-MØ, either untreated or exposed to the proteasome inhibitor MG132, revealed that immunoproteasomal and proteasomal activity is significantly stronger in GM-MØ than in M-MØ, as demonstrated in assays for chymotrypsin-like (ANW) and branched amino acid preferring (PAL) activity (immunoproteasome), and trypsin-like (KQL) activity (both proteasome and immunoproteasome). This result suggested that, indeed, immunoproteasomal activity might contribute to the differential expression of MAFB in M-MØ and GM-MØ.

      Author response image 2.

      Immunoproteasome activity in M-MØ and GM-MØ, either untreated or exposed to MG132, as determined using the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) on the three indicated peptides (upper panel). Mean ± SEM of three independent experiments are shown (paired Student’s t test: *, p<0.05) (lower panel).

      Consequently, we next set up experiments to assess whether the proteasome inhibitor MG132 was capable of enhancing the expression of MAFB-dependent genes in GM-MØ. Preliminary results of GM-MØ exposure to MG132 for 6 hours indicated an increase in the expression of MAFB protein and the MAFB-dependent genes LGMN and IL10, as well as a reduction in the expression of the GM-MØ-specific gene CD1C.

      Author response image 3.

      A. Schematic representation of the exposure of MG132 to GM-MØ for 6 hours. B. MAFB protein levels in four independent preparations of GM-MØ exposed to either DMSO (DMSO-GM-MØ) or the proteasome inhibitor MG132 (MG132-GM-MØ) for 6 hours, as determined by Western blot (left panel). GAPDH protein levels were determined as protein loading control. Mean ± SEM of the MAFB/GAPDH protein ratios from the four independent experiments are shown (right panel) (paired Student’s t test: ***, p<0.005). C. Relative mRNA levels of the indicated genes in DMSO-GM-MØ and MG132-GM-MØ, as determined by RT-PCR on seven independent samples (paired Student’s t test: ***, p<0.005; ****, p<0.001).

      Unfortunately, this proteasome inhibitor (MG132) caused a great reduction in cell viability after 6-8 hours. Since a similar decrease in cell viability was observed upon analysis with the ONX-0914 immunoproteasome inhibitor, we could not proceed any further with this approach.

      Given the reviewer's suggestion to add mechanistic insights to the manuscript, we are now providing these results (and the corresponding figures) only for the reviewer's information and to make clear our attempts to comply with his/her request.

      Recommendations for the authors:

      The results are of interest, and only some minor issues need to be addressed to strengthen the conclusions of the study.

      We gratefully thank the reviewer for his/her comments. 

      (1) This study employs a single dose of 10 μM of the GSK3 inhibitor CHIR-99021 for 48 hours, which is reasonable for in vitro studies. However, further investigation into the effect of different doses and exposure times could provide additional insight into optimal dosing and durability of reprogramming effects. In addition, would alternative GSK3 inhibitors have comparable effects?

      Following the reviewer's suggestion, we have performed a kinetics and dose-response analysis of the effects of CHIR-99021, using MAFB protein levels as a readout. This experiment is now shown in new Supplementary Figure 1A, which replaces the old Supplementary Figure 1A panel, where a shorter kinetics was presented. The results of this new experiment indicate a maximal effect at 10 µM CHIR-99021, and that the effect of the inhibitor becomes maximal 24-48 hours after treatment. The text has been modified accordingly, and it now states: "Kinetics and dose-response analysis of the effects of CHIR-99021 on MAFB expression showed that maximal protein levels were achieved after a 24-48 hour exposure to 10µM CHIR-99021 (Supplementary Figure 1A), conditions that were used hereafter."

      Regarding the use of alternative GSK3 inhibitors, we had already provided that information in Supplementary Figure 1B, where the effects of SB-216763 (10 µM) or LiCl (10 mM) were evaluated. The huge reversal of the Tyr<sup>216</sup>/Ser<sup>9</sup> GSK3β phosphorylation ratio observed with CHIR-99021 was not seen with other GSK3 inhibitors, as indicated in the text. In any event, we believe that the relevance of this result with SB-216763 or LiCl is minimized by the results generated after siRNA-mediated GSK3 knockdown (shown in Figure 4), that completely reproduced the effects seen with CHIR-99021.

      (2) Why in the "reanalysis of single cell RNAseq data" section, the authors use Seurat v5 (R) but then change to python, and the other way around?

      As indicated in the documentation for Integrative Analysis in Seurat v5 (https://satijalab.org/seurat/articles/seurat5_integration), scVIIntegration requires the reticulate package, which allows us to run a Python environment within R.

      (3) When the authors refer to the clusters enriched in MoMacVERSE, they use the labels of the clusters (for example #2 or #3). I would suggest using the annotations described in the original paper, to link it to the bibliography published through the labels established in the paper.

      We have revised the text according to the reviewer's suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (4) In line 309. Is there any significance on the "having a stronger effect"?

      We apologize for the misleading sentence. The phrase has been modified for better clarity, and the text now states: "Like CHIR-99021, silencing of both GSK3A and GSK3B augmented the expression of MAFB, with the simultaneous silencing of both GSK3A and GSK3B genes having a stronger effect (Figure 4B), and modulated the expression of 329 genes (Figure 4C,D)."

      (5) In line 337, "(22)(27)", are these references?

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (6) In the single-cell reanalysis, could you please provide integration QC plots? It would be interesting to have them in the paper.

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15 %]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      As requested by the reviewer, we are now providing the QC plots for the re-analysis in the new Supplementary Figures 3 and 4.

    1. eLife Assessment

      This important work presents a stochastic branching process model of tumour-immune coevolution, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. The authors then use this model to investigate how tumour-immune interactions influence tumour outcome and the summary statistics of bulk and single-cell sequencing data from a tumour. The evidence is currently incomplete: statistical comparisons between the observed mutational burden distribution and theoretical predictions in the absence of immune selection should be carried out. Conclusions should be tested extensively for robustness/sensitivity to parameters.

    2. Reviewer #1 (Public review):

      Summary:

      Tumor-immune co-evolution is an important, understudied topic with, as the authors noted, a general dearth of good models in this space. The authors have made important progress by introducing a stochastic branching process model of antigenicity/immunogenicity and measuring the proportion of simulated tumors that go extinct. The model is extensively explored, and the authors provide some nice theoretical results in addition to simulated results.

      Major comments

      The text in lines 183-191 is intuitively and nicely explained. However, I am not sure all of it follows from the figure panels in Figure 2. For example, the authors refer to a mutation that has a large immunogenicity, but it's not shown how many mutations, or the relative size of the mutations in Figure 2. The same comment holds true for the claim that spikes also arise for mutations with low antigenicity.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors developed a model of tumour-immune dynamics, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this model to investigate how tumour-immune interactions influence tumour outcome and summary statistics of sequencing data.

      Strengths:

      This novel modeling framework addresses an important and timely topic. The authors consider the useful question of how bulk and single-cell sequencing may provide insights into the tumour-immune interactions and selection processes.

      Weaknesses:

      One set of conclusions presented in the paper is the presence of cyclic dynamics between effector/cancer cells, antigenicity, and immunogenicity. However, these conclusions are supported in the manuscript by two sample trajectories of stochastic simulations, and these provide mixed support for the conclusions (i.e. the phasing asynchrony described in the text does not seem to apply to Figure 2C). Similarly, the authors also find immune selection effects on the shape of the mutational burden in Figure 5 D/H using a qualitative comparison between the distributions and theoretical predictions in the absence of immune response. However, the discrepancy appears quite small in panel D, and there are no quantitative comparisons provided to evaluate the significance. An analysis of the robustness of all the conclusions to parameter variation is missing. Lastly, the role of the Appendix results in the main messages of the paper is unclear.

    1. eLife Assessment

      This valuable study shows that locomotion-related modulations in the mouse visual cortex are not uniform but primarily affect neurons in muscarinic receptor-negative patches, which receive projections from specific cortical areas. While the evidence is mostly solid, some uncertainties remain regarding the link between anatomical data and functional measurements. The study should be of interest to neuroscientists interested in state modulation of cortical function.

    2. Reviewer #1 (Public review):

      Processing in the primary visual cortex (V1) of mice is not only based on sensory inputs but also strongly modulated by locomotion. In this study, Meier et al. ask whether neurons that are modulated by locomotion form clusters in V1. Their work is based on previous studies from their lab establishing a modularity in the organization of primary visual cortex based on M2-muscarinic-acetylcholine-receptor-positive patches and interpatches (Ji et al. 2015, D'Souza et al. 2019). In these studies, they have highlighted the clustering of specific visual pathways and inhibition. In the current study, they extend this modularity to motor inputs, confirming a clustering of locomotion-modulated neurons and also showing that these clusters overlap with the M2-negative interpatches of layer 1. Finally, they establish a blueprint for visual processing streams in V1, segregating projections to and from lateral visual areas (LM, AL, and RL) from projections to and from the medial areas, including the visual area PM, the retrosplenial cortex (RSP), and the secondary motor area (MOs).

      Conceptually, this study provides an important finding in the organization of locomotion-related signaling in primary visual cortex, which clearly has substantial implications for sensory processing in visual cortex. While the anatomical data are solid, the link to physiology is incomplete. In conclusion, there are numerous issues that leave the main findings in some doubt, so the authors have some work to do before I find this story convincing.

      Major issues:

      (1) The major results in this study rely on proper quantification of neuronal responses during resting and running. Recently, it has been reported that hemodynamic occlusion can strongly influence measurements of fluorescent changes using two-photon imaging (Yogesh et al. 2025, doi.org/10.1101/2024.10.29.620650). Since it is unclear whether there is an inherent bias in vasculature and hemodynamic occlusion in M2 patches and interpatches, a quantification of the effect of hemodynamic occlusion would be necessary. This control would ideally be done using mice with GFP expression to test if there is still a clustering of locomotion-modulated neurons that overlaps with M2-negative interpatches. Alternatively, the authors should at the very least quantify the vascularization in M2 patches and interpatches.

      (2) To assess the effects, the authors use a correlation analysis for many of their findings (e.g., Figures 2b,c, 4j,k, ...). This, however, is inappropriate to assess the significance of the results. I suggest redoing all statistics with hierarchical bootstrap sampling (Saravanan et al. 2020, PMID: 33644783) or similar.
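
      To make the suggestion concrete, a hierarchical bootstrap can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact procedure of Saravanan et al. and not the authors' analysis: the data are assumed to be nested as animals containing neurons, the function names are hypothetical, and the example values are invented.

```python
import random
from statistics import mean


def hierarchical_bootstrap_means(data, n_boot=1000, seed=0):
    """data: dict mapping an animal id to a list of per-neuron values.
    Each bootstrap replicate resamples animals with replacement, then
    resamples neurons with replacement within every sampled animal,
    and records the mean of the pooled values."""
    rng = random.Random(seed)
    animals = list(data)
    boot = []
    for _ in range(n_boot):
        values = []
        for animal in rng.choices(animals, k=len(animals)):
            cells = data[animal]
            values.extend(rng.choices(cells, k=len(cells)))
        boot.append(mean(values))
    return boot


def prob_greater(boot_a, boot_b):
    """Fraction of replicate pairs in which the mean of group A exceeds that of group B."""
    wins = sum(a > b for a in boot_a for b in boot_b)
    return wins / (len(boot_a) * len(boot_b))


# Invented example: per-neuron values for two groups recorded in two mice.
interpatch = {"mouse1": [0.30, 0.40, 0.35], "mouse2": [0.50, 0.45]}
patch = {"mouse1": [0.00, 0.10], "mouse2": [0.05, 0.10, 0.02]}
boot_inter = hierarchical_bootstrap_means(interpatch, n_boot=200, seed=1)
boot_patch = hierarchical_bootstrap_means(patch, n_boot=200, seed=2)
p_inter_gt_patch = prob_greater(boot_inter, boot_patch)
```

      Resampling at the animal level first means that the spread of the bootstrap means reflects between-animal variability, rather than treating every neuron as an independent sample.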

      (3) The authors use two different measures to assess whether and to what extent a neuron is locomotion sensitive, the LMI and "locomotion-responsive". While the LMI is defined based on recording in the light and dark (Figure 2), the "locomotion-responsiveness" is defined only in the dark (Figure 3a,c,d). The link between the two measures should be clarified.

      a) Additionally, Figure 2b shows higher average LMI for interpatches, but the locomotion-responsive fraction is similar in interpatches and patches (relative number of pairs in Figure 3c and Figure 3d). How do the authors explain this discrepancy?

      b) How is the LMI calculated - based on the average or the maximum response over stimuli? One particular stimulus? If the LMI is defined for each stimulus separately, what is plotted in Figure 2b?

      (4) In the last panels of Figures 4-7, the authors analyze the alignment of cell bodies with the M2 patches. While in superficial layers it might be straightforward to align the cell body locations with the M2 patches and interpatches in layer 1, this alignment does not appear to be trivial for deeper layers. The authors should provide additional material to convince the reader of the proper alignment.

      (5) Related to point 4 above - Given the importance of a proper alignment of M2 patches with the in vivo imaging, the in vivo - ex vivo alignment should be more convincing than Figure 1 C-E. Measuring M2 patches in vivo (as the authors have tried to do) would have provided more solid evidence. Have the authors tried to remove the dura for their in vivo imaging to increase signal-to-noise? In any case, more examples of proper alignment are necessary.

      (6) The authors state that locomotion selectively affects M2-/M2- pairs based on Figure 3c. However, to make this claim, there should be a significant difference between the correlation of stimulus-driven noise of M2-/M2- locomotion-responsive pairs and M2-/M2- locomotion-unresponsive pairs, AND no significant difference in the same analysis for M2+/M2+ pairs (i.e., testing the differences between the bars in Figure 3c and Figure 3d).

    3. Reviewer #2 (Public review):

      Summary:

      Meier et al. explore the variability of locomotion-related modulations in mouse area V1. They present 4 major findings: V1 L2/3 neurons beneath M2- interpatches are more strongly locomotion-modulated than those beneath M2+ patches, while V1 L2/3 neurons beneath M2+ patches are more strongly orientation tuned. They then use viral tracing to examine the relationship of M2- interpatches and M2+ patches with inputs from and outputs to HVOs, MO, RSP, and LP, and find evidence for different closed-loop subnetworks within L1; these relationships, however, are more complicated for cell bodies in L2/3. Finally, they also describe an overlap between M2- interpatches and SOM+ dendrites/axons.

      Strengths:

      The strength of the manuscript is the detailed anatomical quantification of closed-loop connectivity, and the description of the organizing principles of M2- interpatches and M2+ patches.

      Weaknesses:

      The major weakness of the manuscript is the lack of a direct connection between the functional and the anatomical data, and the somewhat puzzling effects observed in the analysis of noise correlations. The former issue might be alleviated by modelling, where the authors could explore the space of possibilities that could explain the functional data based on the anatomical connectivity. Some control analyses could be done, for the comparison of noise correlations.

    4. Reviewer #3 (Public review):

      The authors build on the large body of their previous research, which showed that the mouse primary visual cortex is organised into two types of clusters, M2+ and M2-, which exhibit distinct input patterns from thalamus and higher visual cortical areas and distinct visual tuning preferences. The current study reveals like-to-like projections from within-cluster neurons to the areas that provide feedback projections and, furthermore, that neurons in the M2- clusters are more strongly affected by non-visual signals about the locomotion of the animal.

      The study adds fundamental insights to our understanding of the principles of cortical organisation and computation, specifically how the cortex integrates sensory and action-related signals.

      While the tracing data are very convincing, data analysis should be strengthened to support the claims:

      (1) The locomotion modulation index (LMI) compares the mean activity during running and not running but does not seem to account for differences between visual stimuli, so that the LMI could be influenced by the neuron's visual tuning rather than its sensitivity to locomotion, e.g. if the mouse was running more when the neuron's preferred stimulus was presented. Trials should first be averaged per stimulus, and then across stimuli. Alternatively, only the preferred stimulus could be considered.

      The significance test (unpaired t-test) suffers from the same flaw. Instead an ANOVA (with stimulus parameter as factor) would resolve the problem, or testing whether fitting the data with two tuning curves (one per locomotion state) or a single curve results in a lower error (using cross-validation).

      Given that there is evidence that specific visual stimuli can induce more or less running in mice, this issue is very important to account for behavioural differences across stimuli.

      (2) All bars in Figure 2b show a lower LMI than the reported mean LMI of 0.19. This should be checked.

      (3) Correlation tests: Pearson correlation is only meaningful when applied to continuous data. A more suitable test for discrete data like the M2 patch quantile is a rank test like Kendall's coefficient of rank correlation. This applies to data in Figure 2b,c, 4j,k, Figure 2 - Supplement 2,1a, etc.

      (4) How OSI was determined should be clarified. Specifically, were R_pref and R_ortho the mean responses to the two opposite movement directions? Similarly, how was the half-width at half-maximum of orientation determined? From the fits in Figure 2a, it looks like the widths of both Gaussians can be different.

      (5) The correlation measures in Figure 3 would greatly benefit from additional analyses to help interpretation of the results.

      a) Correlations between neurons typically increase with increasing firing rates (e.g., de la Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A. 2007. Correlation between neural spike trains increases with firing rate. Nature 448:802-6. doi:10.1038/nature06028). Could the higher correlations in M2+ pairs (Figure 3a) be explained by higher firing rates in M2+ compared to M2- neurons?

      b) To determine correlations in Figure 3a, trials during locomotion and stationarity were pooled. As locomotion impacts the firing rate of the neurons, it would be helpful to separate correlations between the two states, locomotion vs stationarity, so the measures reflect something closer to "noise correlations" rather than tuning to locomotion.

      c) Similarly, in Figure 3b, I wonder whether the large correlations in M2- pairs are driven by locomotion rather than functional connectivity. As suggested in b, a better test of noise correlations would be to account for locomotion, i.e., separate trials by stimulus identity and locomotion state. To prevent conditions with few trials from having greater weight in the overall noise correlations, I suggest the authors first z-score responses per condition, then determine noise correlations across all trials (as explained in Renart et al., 2010).

      d) Correlations in Figure 3a,b should be tested with an ANOVA and a control for multiple tests.

      (6) In plots like Figure 4j-l, it would be very informative to show individual measures (per ROI and mouse) in addition to mean +- SEM. As the counts are low (<10) it wouldn't obstruct the plot.

      (7) The caption of Figure 4l says that most retrogradely labelled cells are located in L2/3. However, the plot only shows data from L2/3 and a single section of L4, so one cannot compare it to other layers. Can the authors corroborate the claim with data from other layers?

      (8) Methods: The authors should provide more details on the visual stimuli: What was the background on which gratings were presented? How long was the inter-stimulus interval? What was presented during the inter-stimulus interval? How large were gratings used to map tuning to SF, TF, and orientation?
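
      The concern in point (1) can be illustrated with a minimal sketch. It assumes the LMI is defined as (run - still) / (run + still), which the review does not state, and it uses made-up responses for a hypothetical neuron that is visually tuned but has no locomotion sensitivity. The function names are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def lmi(run, still):
    # Assumed definition of the locomotion modulation index.
    return (run - still) / (run + still)


def pooled_lmi(trials):
    # Pools all trials per locomotion state, ignoring stimulus identity.
    run = mean(r for _, running, r in trials if running)
    still = mean(r for _, running, r in trials if not running)
    return lmi(run, still)


def stimulus_balanced_lmi(trials):
    # Averages per (stimulus, state) first, then across stimuli.
    by_cond = defaultdict(list)
    for stim, running, resp in trials:
        by_cond[(stim, running)].append(resp)
    stims = {s for s, _ in by_cond}
    # Keep only stimuli that were observed in both locomotion states.
    usable = [s for s in stims if (s, True) in by_cond and (s, False) in by_cond]
    run = mean(mean(by_cond[(s, True)]) for s in usable)
    still = mean(mean(by_cond[(s, False)]) for s in usable)
    return lmi(run, still)


# Hypothetical neuron: responds 10 to its preferred stimulus and 2 otherwise,
# regardless of running. The mouse runs more during the preferred stimulus.
trials = (
    [("pref", True, 10)] * 3 + [("pref", False, 10)]
    + [("nonpref", True, 2)] + [("nonpref", False, 2)] * 3
)

print(pooled_lmi(trials))             # positive, driven purely by tuning
print(stimulus_balanced_lmi(trials))  # zero once stimuli are balanced
```

      In this toy case the pooled index is 1/3 even though locomotion never changes the response, whereas averaging per stimulus first gives 0.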

    1. eLife Assessment

      The findings are important and intriguing, with theoretical or practical implications beyond a single subfield. The computational methods employed are clever and sophisticated and the strength of evidence is convincing. Many of the methodological concerns raised after the first round of review were addressed in the revised version, although all three reviewers also highlighted that the exploratory nature of the paper and the lack of clarity regarding the hypotheses make it hard to assess the impact of the results on existing theories.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use a sophisticated task design and Bayesian computational modeling to test their hypothesis that information generalization (operationalized as a combination of self-insertion and social contagion) in social situations is disrupted in Borderline Personality Disorder. Their main finding relates to the observation that two different models best fit the two tested groups: While the model assuming both self-insertion and social contagion to be present when estimating others' social value preferences fit the control group best, a model assuming neither of these processes provided the best fit to BPD participants.

      Strengths:

      The revisions have substantially strengthened the paper and the manuscript is much clearer and easier to follow now. The strengths of the presented work lie in the sophisticated task design and the thorough investigation of their theory by use of mechanistic computational models to elucidate social decision-making and learning processes in BPD.

      Weaknesses:

      Some critical concerns remain after the first revision, particularly regarding the use of causal language and the clarity of the hypotheses and results, specified in the points below.

      (1) The authors frequently refer to their predictions and theory as being causal, both in the manuscript and in their response to reviewers. However, causal inference requires careful experimental design, not just statistical prediction. For example, the claim that "algorithmic differences between those with BPD and matched healthy controls" are "causal" in my opinion is not warranted by the data, as the study does not employ experimental manipulations or interventions which might predictably affect parameter values. Even if model parameters can be seen as valid proxies to latent mechanisms, this does not automatically mean that such mechanisms cause the clinical distinction between BPD and CON, they could plausibly also refer to the effects of therapy or medication. I recommend that such causal language, also implicit to expressions like "parameter influences on explicit intentional attributions", is toned down throughout the manuscript.

      (2) Although the authors have now outlined the study's aims much more clearly, there is still a lack of clarity with respect to the authors' specific hypotheses. I understand that their primary predictions about disruptions to self-other generalisation processes underlying BPD are embedded in the four main models that are tested, but it is still unclear what specific hypotheses the authors had about group differences with respect to the tested models. I recommend the authors specify this in the introduction rather than referring to prior work where the same hypotheses may have been mentioned.

      (3) Caveats should also be added about the exploratory nature of the many parameter group comparisons. If there are any predictions about group differences that can be made based on prior literature, the authors should make such links clear.

      (4) I'm not sure I understand why the authors, after adding multiple comparison correction, now list two kinds of p-values. To me, this is misleading and defeats the point of multiple comparison corrections; I therefore recommend they report the FDR-adjusted p-values only. Likewise, if a corrected p-value is greater than 0.05, this should not be interpreted as a result.

      (5) Can the authors please elaborate why the algorithm proposed to be employed by BPD is more 'entropic', especially given both their self-priors and posteriors about partners' preferences tended to be more precise than the ones used by CON? As far as I understand, there's nothing in the data to suggest BPD predictions should be more uncertain. In fact, this leads me to wonder, similarly to what another reviewer has already suggested, whether BPD participants generate self-referential priors over others in the same way CON participants do, they are just less favourable (i.e., in relation to oneself, but always less prosocial) - I think there is currently no model that would incorporate this possibility? It should at least be possible to explore this by checking if there is any statistical relationship between the estimated θ_ppt^m and p(θ_par | D^0).

      "To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired trait mentalising" - I don't understand what the authors mean by this, can they please elaborate and add some explanation to the main text?
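
      Returning to point (4) above, the gap between raw and FDR-adjusted p-values can be shown with a short sketch. It assumes the Benjamini-Hochberg procedure, which the review does not name, and the p-values are invented for illustration.

```python
def bh_adjust(pvalues):
    # Benjamini-Hochberg adjusted p-values, returned in the input order.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical uncorrected p-values
adj = bh_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw={p:.3f}  adjusted={q:.3f}  significant={q < 0.05}")
```

      Here three raw p-values fall below 0.05 but only one survives adjustment, which is why reporting both kinds side by side can mislead.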

    3. Reviewer #2 (Public review):

      Summary:

      The paper investigates social decision-making, and how this changes after observing the behaviour of other people, in borderline personality disorder. The paper employs a task including three phases, the first where participants make decisions on how to allocate rewards to oneself and to a virtual partner, the second where they observe the same task performed by someone else, and a third phase equivalent to phase one, but with a new partner. Using sophisticated computational modelling to analyse choice data, the study reports that borderline participants (versus controls) are more certain about their preferences in phase one, used more neutral priors and are less flexible during phase two, and are less influenced by partners in phase three.

      Strengths:

      The topic is interesting and important, and the findings are potentially intriguing. The computational methods employed are clever and sophisticated, at the cutting edge of research in the field.

      Weaknesses:

      The paper is not based on specific empirical hypotheses formulated at the outset, but, rather, it uses an exploratory approach. Indeed, the task is not chosen in order to tackle specific empirical hypotheses. This, in my view, is a limitation since the introduction reads a bit vague and it is not always clear which gaps in the literature the paper aims to fill. As a further consequence, it is not always clear how the findings speak to previous theories on the topic.

    4. Reviewer #3 (Public review):

      In this paper, the authors use a three-phase economic game to examine the tendency to engage in prosocial versus competitive exchanges with three anonymous partners. In particular, they consider individual differences in the tendency to infer about others' tendencies based on one's preferences and to update one's preferences based on observations of others' behavior. The study includes a sample of individuals diagnosed with borderline personality disorder and a matched sample of psychiatrically healthy control participants.

      On the whole, the experimental design is well-suited to the questions and the computational model analyses are thorough, including modern model-fitting procedures. I particularly appreciated the clear exposition regarding model parameterization and the descriptive Table 2 for qualitative model comparison. In the revised manuscript, the authors now provide a more thorough treatment of examining group differences in computational parameters given that the best-fitting model differed by group. They also examine the connection of their task and findings to related research focusing on self-other representation and mentalization (e.g., Story et al., 2024).

      The authors note that the task does not encourage competition and instead captures individual differences in the motivation to allocate rewards to oneself and others in an interdependent setting. The paper could have been strengthened by clarifying how the Social Value Orientation framework can be used to interpret the motivations and behavior of BPD versus CON participants on the task. Although the authors note that their approach makes "clear and transparent a priori predictions," the paper could be improved by providing a clear and consolidated statement of these predictions so that the results could be interpreted vis-a-vis any a priori hypotheses.

      Finally, the authors have amended their individual difference analyses to examine psychometric measures such as the CTQ alongside computational model parameter estimate differences. I appreciate that these analyses are described as exploratory. The approach of using a partial correlation network with bootstrapping (and permutation) was interesting, but the logic of the analysis was not clearly stated. In particular, there are large group (Table 1: CON vs. BPD) differences in the measures introduced into this network. As a result, it is hard to understand whether any partial correlations are driven primarily by mean differences in severity (correlations tend to be inflated in extreme groups designs due to the absence of observation in middle of scales forming each bivariate distribution). I would have found these exploratory analyses more revealing if group membership was controlled for.
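
      The extreme-groups concern can be made concrete with a small sketch. It uses invented scores for two hypothetical groups and controls for group membership by centring each measure within group, which is equivalent to a partial correlation given a binary group indicator. None of the names or numbers come from the manuscript.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    # Plain Pearson correlation coefficient.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_within_group(values, groups):
    # Subtract each group's mean, i.e. residualise on group membership.
    means = {
        g: mean(v for v, gg in zip(values, groups) if gg == g)
        for g in set(groups)
    }
    return [v - means[g] for v, g in zip(values, groups)]


def group_adjusted_r(x, y, groups):
    return pearson(
        center_within_group(x, groups), center_within_group(y, groups)
    )


# Hypothetical data: no association within either group, but both measures
# are much higher in the second group.
groups = ["CON"] * 4 + ["BPD"] * 4
x = [1, 2, 3, 4, 11, 12, 13, 14]
y = [2, 4, 1, 3, 12, 14, 11, 13]

print(pearson(x, y))                   # large, driven by the group mean shift
print(group_adjusted_r(x, y, groups))  # zero once group is controlled for
```

      In this constructed example the pooled correlation is about 0.95 while the group-adjusted correlation is 0, showing how mean differences alone can produce a strong apparent association.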

    5. Author response:

      The following is the authors’ response to the original reviews

      Response to the Editors’ Comments

      Thank you for this summary of the reviews and recommendations for corrections. We respond to each in turn, and have documented each correction with specific examples contained within our response to reviewers below.

      ‘They all recommend to clarify the link between hypotheses and analyses, ground them more clearly in, and conduct critical comparisons with existing literature, and address a potential multiple comparison problem.’

      We have restructured our introduction to include the relevant literature outlined by the reviewers, and to more clearly ground the goals of our model and broader analysis. We have additionally corrected for multiple comparisons within our exploratory associative analyses. We have additionally signposted exploratory tests more clearly.

      ‘Furthermore, R1 also recommends to include a formal external validation of how the model parameters relate to participant behaviour, to correct an unjustified claim of causality between childhood adversity and separation of self, and to clarify role of therapy received by patients.’

      We have now tempered our language in the abstract which unintentionally implied causality in the associative analysis between childhood trauma and other-to-self generalisation. To note, in the sense that our models provide causal explanations for behaviour across all three phases of the task, we argue that our model comparison provides some causal evidence for algorithmic biases within the BPD phenotype. We have included further details of the exclusion and inclusion criteria of the BPD participants within the methods.

      ‘R2 specifically recommends to clarify, in the introduction, the specific aim of the paper, what is known already, and the approach to addressing it.’

      We have more thoroughly outlined the current state of the art concerning behavioural and computational approaches to self insertion and social contagion, in health and within BPD. We have linked these more clearly to the aims of the work.

      ‘R2 also makes various additional recommendations regarding clarification of missing information about model comparison, fit statistics and group comparison of parameters from different models.’

      Our model comparison approach and algorithm are outlined within the original paper for Hierarchical Bayesian Model comparison (Piray et al., 2019). We have outlined the concepts of this approach in the methods. We have now additionally improved clarity by placing descriptions of this approach more obviously in the results, and added points of greater detail in the methods, such as which statistics for comparison we extracted on the group and individual level.

      In addition, in response to the need for greater comparison of parameters from different models, we have also hierarchically force-fitted the full suite of models (M1-M4) to all participants. We report all group differences from each model individually – assuming their explanation of the data - in Table S2. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. Finally, we show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      ‘R3 additionally recommends to clarify the clinical and cognitive process relevance of the experiment, and to consider the importance of the Phase 2 findings.’

      We have now included greater reference to the assumptions in the social value orientation paradigm we use in the introduction. We have also responded to the specific point about the shift in central tendencies in phase 2 from the BPD group, noting that, while BPD participants do indeed get more relatively competitive vs. CON participants, they remain strikingly neutral with respect to the overall statespace. Importantly, model M4 does not preclude more competitive distributions existing.

      ‘Critically, they also share a concern about analyzing parameter estimates fit separately to two groups, when the best-fitting model is not shared. They propose to resolve this by considering a model that can encompass the full dynamics of the entire sample.’

      We have hierarchically force-fitted the full suite of models (M1-M4) to all participants to allow for comparison between parameters within each model assumption. We report all group differences from each model individually – assuming their explanation of the data - in Table S2 and Table S3. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. We also show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      Within model M1 and M2, the parameters quantify the degree to which participants believe their partner to be different from themselves. Under M1 and M2 model assumptions, BPD participants have meaningfully larger values of these parameters versus CON (Fig S10), which supports the notion that a new central tendency may be more parsimonious in phase 2 (as in the case of the optimal model for BPD, M4). We also show strong correlations across models between these parameters under M1 and M2, and the shift in central tendencies of beliefs between phase 1 and 2 under M3 and M4. This supports our primary comparison, and shows that even under non-dominant model assumptions, parameters demonstrate that BPD participants expect their partner’s relative reward preferences to be vastly different from themselves versus CON.

      ‘A final important point concerns the psychometric individual difference analyses which seem to be conducted on the full sample without considering the group structure.’

      We have now more clearly focused our psychometric analysis. We control for multiple comparisons, and compare parameters across the same model (M3) when assessing the relationship between paranoia, trauma, trait mentalising, and social contagion. We have relegated all other exploratory analyses to the supplementary material and noted where p values survive correction using False Discovery Rate.

      Reviewer 1:

      ‘The manuscript's primary weakness relates to the number of comparisons conducted and a lack of clarity in how those comparisons relate to the authors' hypotheses. The authors specify a primary prediction about disruption to information generalization in social decision making & learning processes, and it is clear from the text how their 4 main models are supposed to test this hypothesis. With regards to any further analyses however (such as the correlations between multiple clinical scales and eight different model parameters, but also individual parameter comparisons between groups), this is less clear. I recommend the authors clearly link each test to a hypothesis by specifying, for each analysis, what their specific expectations for conducted comparisons are, so a reader can assess whether the results are/aren't in line with predictions. The number of conducted tests relating to a specific hypothesis also determines whether multiple comparison corrections are warranted or not. If comparisons are exploratory in nature, this should be explicitly stated.’

      We have now corrected for multiple comparisons when examining the relationship between psychometric findings and parameters, using partial correlations and bootstrapping for robustness. These latter analyses were indeed not preregistered, and so we have more clearly signposted that these tests were exploratory. We chose to focus on the influence of psychometrics of interest on social contagion under model M3 given that this model explained a reasonable minority of behaviour in each group. We have now fully edited this section in the main text in response, and relegated all other correlations to the supplementary materials.

      ‘Furthermore, the authors present some measures for external validation of the models, including comparison between reaction times and belief shifts, and correlations between model predicted accuracy and behavioural accuracy/total scores. However it would be great to see some more formal external validation of how the model parameters relate to participant behaviour, e.g., the correlation between the number of pro-social choices and ß-values, or the correlation between the change in absolute number of pro-social choices and the change in ß. From comparing the behavioural and computational results it looks like they would correlate highly, but it would be nice to see this formally confirmed.’

      We have included this further examination within the Generative Accuracy and Recovery section:

      ‘We also assessed the relationship (Pearson rs) between modelled participant preference parameters in phase 1 and actual choice behaviour: was negatively correlated with prosocial versus competitive choices (r=-0.77, p<0.001) and individualistic versus competitive choices (r=-0.59, p<0.001); was positively correlated with individualistic versus competitive choices (r=0.53, p<0.001) and negatively correlated with prosocial versus individualistic choices (r=-0.69, p<0.001).’

      ‘The statement in the abstract that 'Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity disrupts this through separation of internalised beliefs' makes an unjustified claim of causality between childhood adversity and separation of self - and other beliefs, although the authors only present correlations. I recommend this should be rephrased to reflect the correlational nature of the results.’

      Sorry – this was unfortunate wording: we did not intend to imply causation with our second clause in the sentence mentioned. We have amended the language to make it clear this relationship is associative:

      ‘Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity is associated with separation of internalised beliefs, and makes clear causal predictions about the mechanisms of social information generalisation under uncertainty.’

      ‘Currently, from the discussion the findings seem relevant in explaining certain aberrant social learning and -decision making processes in BPD. However, I would like to see a more thorough discussion about the practical relevance of their findings in light of their observation of comparable prediction accuracy between the two groups.’

      We have included a new paragraph in the discussion to address this:

      ‘Notably, despite differing strategies, those with BPD achieved similar accuracy to CON participants in predicting their partners. All participants were more concerned with relative versus absolute reward; only those with BPD changed their strategy based on this focus. Practically this difference in BPD is captured either through disintegrated priors with a new median (M4) or very noisy, but integrated priors over partners (M1) if we assume M1 can account for the full population. In either case, the algorithm underlying the computational goal for BPD participants is far higher in entropy and emphasises a less stable or reliable process of inference. In future work, it would be important to assess this mechanism alongside momentary assessments of mood to understand whether more entropic learning processes contribute to distressing mood fluctuation.’

      ‘Relatedly, the authors mention that a primary focus of mentalization based therapy for BPD is 'restoring a stable sense of self' and 'differentiating the self from the other'. These goals are very reminiscent of the findings of the current study that individuals with BPD show lower uncertainty over their own and relative reward preferences, and that they are less susceptible to social contagion. Could the observed group differences therefore be a result of therapy rather than adverse early life experiences?’

      This is something that we wish to explore in further work. While verbal and model descriptions appear parsimonious, this is not straightforward. As we see, clinical observation and phenomenological dynamics may not necessarily match in an intuitive way to parameters of interest. It may be that compartmentalisation of self and other – as we see in BPD participants within our data – may counter-intuitively express as a less stable self. The evolutionary mechanisms that make social insertion and contagion enduring may also be the same that foster trust and learning.

      ‘Regarding partner similarity: It was unclear to me why the authors chose partners that were 50% similar when it would be at least equally interesting to investigate self-insertion and social contagion with those that are more than 50% different to ourselves? Do the authors have any assumptions or even data that shows the results still hold for situations with lower than 50% similarity?’

      While our task algorithm had a high probability to match individuals who were approximately 50% different with respect to their observed behaviour, there was variation either side of this value. The value of 50% median difference was chosen for two reasons: 1. We wanted to ensure participants had to learn about their partner to some degree relative to their own preferences and 2. we did not want to induce extreme over or under familiarity given the (now replicated) relationship between participant-partner similarity and intentional attributions (see below). Nevertheless, we did have some variation around the 50% median. Figure 3A in the top left panel demonstrates this fluctuation in participant-partner similarity and the figure legend further described this distribution (mean = 49%, sd = 12%). In future work we want to more closely manipulate the median similarity between participants and partners to understand how this facilitates or inhibits learning and generalisation.

      There is some analysis of the relationship between degrees of similarity and behaviour. In the third paragraph of page 15 we report the influence of participant-partner similarity on reaction times. In prior work (Barnby et al., 2022; Cognition) we had shown that similarity was associated with reduced attributions of harm about a partner, irrespective of their true parameters (e.g. whether they were prosocial/competitive). We replicate this previous finding with a double dissociation illustrated in Figure 4, showing that greater discrepancies in participant-partner prosociality increases explicit harmful intent attributions (but not self-interest), and discrepancies in participant-partner individualism reduces explicit self-interest attributions (but not harmful intent). We have made these clearer in our results structure, and included FDR correction values for multiple comparisons.

      The methods section is rather dense and at least I found it difficult to keep track of the many different findings. I recommend the authors reduce the density by moving some of the secondary analyses in the supplementary materials, or alternatively, to provide an overall summary of all presented findings at the end of the Results section.

      We have now moved several of our exploratory findings into the supplementary materials, notably the analysis of participant-partner similarity on reaction times (Fig S9), as well as the uncorrected correlation between parameters (Fig S7).

      Fig 2C) and Discussion p. 21: What do the authors mean by 'more sensitive updates'? more sensitive to what?

      We have now edited the wording to specify ‘more belief updating’ rather than ‘sensitive’ to be clearer in our language.

      P14 bottom: please specify what is meant by axial differences.

      We have changed this to ‘preference type’ rather than using the term ‘axial’.

      It may be helpful to have Supplementary Figure 1 in the main text.

      Thank you for this suggestion. Given the volume of information in the main text we hope that it is acceptable for Figure S1 to remain in the supplementary materials.

      Figure 3D bottom panel: what is the difference between left and right plots? Should one of them be alpha not beta?

      The left and right plots are of the change in standard deviation (left) and central tendency (right) of participant preference change between phase 1 and 3. This is currently noted in the figure legend, but we have added some text to be clearer that this is over prosocial-competitive beliefs specifically. We chose to use this belief as an example given the centrality of prosocial-competitive beliefs in the learning process in Figure 2. We also noticed a small labelling error in the bottom panels of 3D which should have noted that each plot was either with respect to the precision or mean-shift in beliefs during phase 3.

      ‘The relationship between uncertainty over the self and uncertainty over the other with respect to the change in the precision (left) and median-shift (right) in phase 3 prosocial-competitive beliefs.’

      Supplementary Figure 4: The prior presented does not look neutral to me, but rather right-leaning, so competitive, and therefore does indeed look like it was influenced by the self-model? If I am mistaken please could the authors explain why.

      This example distribution is taken from a single BPD participant. In this case, indeed, the prior is somewhat right-shifted. However, on a group level, priors over the partner were closely centred around 0 (see reported statistics in paragraph 2 under the heading ‘Phase 2 – BPD Participants Use Disintegrated and Neutral Priors’). However, we understand how this may come across as misleading. For clarity we have expanded upon Figure S4 to include the phase 1 and prior phase 2 distributions for the entire BPD population for both prosocial and individualistic beliefs. This further demonstrates that those with BPD held surprisingly neutral beliefs over the expectations about their partners’ prosociality, but had minor shifts between their own individualistic preferences and the expected individualistic preferences of their partners. This is also visible in Figure S2.

      Reviewer 2:

      ‘There are two major weaknesses. First, the paper lacks focus and clarity. The introduction is rather vague and, after reading it, I remained confused about the paper's aims. Rather than relying on specific predictions, the analysis is exploratory. This implies that it is hard to keep track, and to understand the significance, of the many findings that are reported.’

      Thank you for this opportunity to be clearer in our framing of the paper. While the model makes specific causal predictions with respect to behavioural dynamics conditional on algorithmic differences, our other analyses were indeed exploratory. We did not preregister this work but now, given the intriguing findings, we intend to preregister our future analyses.

      We have made our introduction clearer with respect to the aims of the paper:

      ‘Our present work sought to achieve two primary goals: 1. Extend prior causal computational theories to formalise the interrelation between self-insertion and social contagion within an economic paradigm, the Intentions Game and 2., Test how a diagnosis of BPD may relate to deficits in these forms of generalisation. We propose a computational theory with testable predictions to begin addressing this question. To foreshadow our results, we found that healthy participants employ a mixed process of self-insertion and contagion to predict and align with the beliefs of their partners. In contrast, individuals with BPD exhibit distinct, disintegrated representations of self and other, despite showing similar average accuracy in their learning about partners. Our model and data suggest that the previously observed computational characteristics in BPD, such as reduced self-anchoring during ambiguous learning and a relative impermeability of the self, arise from the failure of information about others to transfer to and inform the self. By integrating separate computational findings, we provide a foundational model and a concise, dynamic paradigm to investigate uncertainty, generalization, and regulation in social interactions.’

      ‘Second, although the computational approach employed is clever and sophisticated, there is important information missing about model comparison which ultimately makes some of the results hard to assess from the perspective of the reader.’

      Our model comparison employed state-of-the-art random-effects Bayesian model comparison (Piray et al., 2019; PLOS Comp. Biol.). It initially fits each individual to each model using Laplace approximation, and subsequently ‘races’ each model against each other on the group level and individual level through hierarchical constraints and random-effect considerations. We included this in the methods but have now expanded on the description of how we compared models:

      In the results -

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’

      We added to our existing description in the methods –

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019). During fitting we added a small noise floor to distributions (2.22e-16) before normalisation for numerical stability. Parameters were estimated using the HBI in untransformed space drawing from broad priors ($\mu_M = 0$, $\sigma^2_M = 6.5$; where $M = \{M1, M2, M3, M4\}$). This process was run independently for each group. Parameters were transformed into model-relevant space for analysis. All models and hierarchical fitting were implemented in Matlab (Version R2022B). All other analyses were conducted in R (version 4.3.3; arm64 build) running on Mac OS (Ventura 13.0). We extracted individual and group level responsibilities, as well as the protected exceedance probability to assess model dominance per group.’
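
      The noise floor described in the methods text above can be illustrated with a minimal sketch. It assumes a discretised distribution held as a list of bin values; the function name `normalise_with_floor` and the example values are hypothetical, and the sketch is written in Python rather than the Matlab used for the actual fitting. The quoted constant is approximately double-precision machine epsilon.

```python
import math

NOISE_FLOOR = 2.22e-16  # value quoted in the methods text

def normalise_with_floor(density, floor=NOISE_FLOOR):
    """Add a small constant to every bin, then rescale so the bins sum to 1."""
    padded = [d + floor for d in density]
    total = sum(padded)
    return [v / total for v in padded]

# Illustrative distribution with empty bins (not data from the study).
# Without the floor, taking log(0) for the empty bins would fail.
p = normalise_with_floor([0.0, 0.5, 0.5, 0.0])
log_p = [math.log(v) for v in p]  # finite for every bin
```

      The design choice is that every bin keeps a strictly positive probability, so log-likelihoods stay finite while the shape of the distribution is essentially unchanged.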

      (1) P3, third paragraph: please define self-insertion

      We have now more clearly defined this in the prior paragraph when introducing concepts.

      ‘To reduce uncertainty about others, theories of the relational self (Anderson & Chen, 2002) suggest that people have available to them an extensive and well-grounded representation of themselves, leading to a readily accessible initial belief (Allport, 1924; Krueger & Clement, 1994) that can be projected or integrated when learning about others (self-insertion).’

      (2) Introduction: the specific aim of the paper should be clarified - at the moment, it is rather vague. The authors write: "However, critical questions remain: How do humans adjudicate between self-insertion and contagion during interaction to manage interpersonal generalization? Does the uncertainty in self-other beliefs affect their generalizability? How can disruptions in interpersonal exchange during sensitive developmental periods (e.g., childhood maltreatment) inform models of psychiatric disorders?". Which of these questions is the focus of the paper? And how does the paper aim at addressing it?

      (3) Relatedly, from the introduction it is not clear whether the goal is to develop a theory of self-insertion and social contagion and test it empirically, or whether it is to study these processes in BPD, or both (or something else). Clarifying which specific question(s) is addressed is important (also clarifying what we already know about that specific question, and how the paper aims at elucidating that specific question).

      We have now included the specific aims of the paper. We note this in the above response to the reviewer's general comments.

      (4) "Computational models have probed social processes in BPD, linking the BPD phenotype to a potential over-reliance on social versus internal cues (Henco et al., 2020), 'splitting' of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others' irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Previous studies have typically overlooked how self and other are represented in tandem, prompting further investigation into why any of these BPD phenotypes manifest." Not clear what the link between the first and second sentence is. Does it mean that previous computational models have focused exclusively on how other people are represented in BPD, and not on how the self is represented? Please spell this out.

      Thank you for the opportunity to be clearer in our language. We have now spelled out our point more precisely, and included some extra relevant literature helpfully pointed out by another reviewer.

      ‘Computational models have probed social processes in BPD, although almost exclusively during observational learning. The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize  ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      (5) P5, first paragraph. The description of the task used in phase 1 should be more detailed. The essential information for understanding the task is missing.

      We have updated this section to point toward Figure 1 and the Methods where the details of the task are more clearly outlined. We hope that it is acceptable not to explain the full task at this point for brevity and to not interrupt the flow of the results.

      ‘Detailed descriptions of the task can be found in the methods section and Figure 1.’

      (6) P5, second paragraph: briefly state how the Psychometric data were acquired (e.g., self-report).

      We have now clarified this in the text.

      ‘All participants also self-reported their trait paranoia, childhood trauma, trust beliefs, and trait mentalizing (see methods).’

      (7) "For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices". Not sure what criteria are used for distinguishing between individualistic and competitive - they look the same?

      Sorry. This paragraph was not clear that the issue is that the interpretation of the choice depends on both members of the pair of options. Here, in one pair {(self=5,other=5) vs (self=10,other=5)}, it is highly pro-social for the self to choose (5,5), sacrificing 5 points for the sake of equality. In the second pair {(self=10,other=10) vs (self=10,other=5)}, it is highly competitive to choose (10,5), denying the other 5 points at no benefit to the self. We have clarified this:

      ‘We analyzed the ‘types’ of choices participants made in each phase (Supplementary Table 1). The interpretation of a participant’s choice depends on both values in a choice. For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices. There were 12 of each pair in phases 1 and 3 (individualistic vs. prosocial; prosocial vs. competitive; individualistic vs. competitive).’  

      (8) "In phase 1, both CON and BPD participants made prosocial choices over competitive choices with similar frequency (CON=9.67[3.62]; BPD=9.60[3.57])" please report t-test - the same applies also various times below.

      We have now included the t-test statistics with each instance.

      ‘In phase 3, both CON and BPD participants continued to make equally frequent prosocial versus competitive choices (CON=9.15[3.91]; BPD=9.38[3.31]; t=-0.54, p=0.59); CON participants continued to make significantly less prosocial versus individualistic choices (CON=2.03[3.45]; BPD=3.78 [4.16]; t=2.31, p=0.02). Both groups chose equally frequent individualistic versus competitive choices (CON=10.91[2.40]; BPD=10.18[2.72]; t=-0.49, p=0.62).’

      (9) P 9: "Models M2 and M3 allow for either self-insertion or social contagion to occur independently" what's the difference between M2 and M3?

      Model M2 hypothesises that participants use their own self representation as priors when learning about the other in phase 2, but are not influenced by their partner. M3 hypothesises that participants form an uncoupled prior (no self-insertion) about their partner in phase 2, and their choices in phase 3 are influenced by observing their partner in phase 2 (social contagion). In Figure 1 we illustrate the difference between M2 and M3. In Table 1 we specifically report the parameterisation differences between M2 and M3. We have also now included a correlational analysis of parameters between models to demonstrate the relationship between model parameters of equivalent value between models (Fig S11). We have also force fitted all models (M1-M4) to the data independently and reported group differences within each (see Table S2 and Table S3).

      (10) P 9, last paragraph: I did not understand the description of the Beta model.

      The beta model is outlined in detail in Table 1. We have also clarified the description of the beta model on page 9:

      ‘The ‘Beta model’ is equivalent to M1 in its causal architecture (both self-insertion and social contagion are hypothesized to occur) but differs in richness: it accommodates the possibility that participants might only consider a single dimension of relative reward allocation, which is typically emphasized in previous studies (e.g., Hula et al., 2018).’

      (11) P 9: I wonder whether one could think about more intuitive labels for the models, rather than M1, M2 etc.. This is just a suggestion, as I am not sure a short label would be feasible here.

      Thank you for this suggestion. We apologise that the labels are not very intuitive. The problem is that given the various terms we use to explain the different processes of generalisation that might occur between self and other, and given that each model is a different combination of each, we felt that numbering them was a lesser evil. We hope that the reader will be able to reference both Figure 1 and Table 1 to get a good feel for how the models and their causal implications differ.

      (12) Model comparison: the information about what was done for model comparison is scant, and little about fit statistics is reported. At the moment, it is hard for a reader to assess the results of the model comparison analysis.

      Model comparison and fitting were conducted using simultaneous hierarchical fitting and random-effects comparison. This is employed through the HBI package (Piray et al., 2019) where the assumptions and fitting procedures are outlined in great detail. In short, our comparison allows for individual and group-level hierarchical fitting and comparison. This overcomes the issue of interdependence between and within model fitting within a population, which is often estimated separately.

      We have outlined this in the methods, although appreciate we do not touch upon it until the reader reaches that point. We have added a clarification statement on page 9 to rectify this:

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’

      (13) P 14, first paragraph: "BPD participants were also more certain about both types of preference" what are the two types of preferences?

      The two types of preferences are relative (prosocial-competitive) and absolute (individualistic) reward utility. These are expressed as β and α respectively. We have expanded the sentence in question to make this clearer:

      ‘BPD participants were also more certain about both self-preferences for absolute and relative reward ( = -0.89, 95%HDI: -1.01, -0.75; = -0.32, 95%HDI: -0.60, -0.04) versus CON participants (Figure 2B).’

      (14) "Parameter Associations with Reported Trauma, Paranoia, and Attributed Intent" the results reported here are intriguing, but not fully convincing as there is the problem of multiple comparisons. The combinations between parameters and scales are rather numerous. I suggest to correct for multiple comparisons and to flag only the findings that survive correction.

      We have now corrected this and controlled for multiple comparisons through partial correlation analysis, bootstrapping assessment for robustness, permutation testing, and False Discovery Rate correction. We only report those that survive bootstrapping and permutation testing, reporting both corrected (p[fdr]) and uncorrected (p) significance.
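
      As an illustration of the kind of FDR correction mentioned above, the following is a minimal sketch assuming the Benjamini-Hochberg procedure; the response does not state which FDR procedure was used, and the analyses themselves were run in R. The function name `benjamini_hochberg` and the p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.001, 0.02, 0.03, 0.20]  # illustrative values only
p_fdr = benjamini_hochberg(raw)
survivors = [p for p in p_fdr if p < 0.05]
```

      Reporting `raw` alongside `p_fdr` mirrors the convention of giving both uncorrected and corrected significance.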

      (15) Results page 14 and page 15. The authors compare the various parameters between groups. I would assume that these parameters come from M1 for controls and from M4 for BPD? Please clarify if this is indeed the case. If it is the case, I am not sure this is appropriate. To my knowledge, it is appropriate to compare parameters between groups only if the same model is fit to both groups. If two different models are fit to each group, then the parameters are not comparable, as the parameters have, so to speak, different "meaning" in the two models. Now, I want to stress that my knowledge on this matter may be limited, and that the authors' approach may be sound. However, to be reassured that the approach is indeed sound, I would appreciate a clarification on this point and a reference to relevant sources about this approach.

      This is an important point. First, we confirmed all our main conclusions about parameter differences using the maximal model M1 to fit all the participants. We added Supplementary Table 2 to report the outcome of this analysis. Second, we did the same for parameters across all models M1-M4, fitting each to participants without comparison. This is particularly relevant for M3, since at least a minority of participants of both groups were best explained by this model. We report these analyses in Fig S11.

      Since the M4 is nested within M1, we argue that this comparison is still meaningful, and note explanations in the text for why the effects noted between groups may occur given the differences in their causal meaning, for example in the results under phase 2 analyses:

      ‘Belief updating in phase 2 was less flexible in BPD participants. Median change in beliefs (from priors to posteriors) about a partner’s preferences was lower versus CON ( = -5.53, 95%HDI: -7.20, -4.00; = -10.02, 95%HDI: -12.81, -7.30). Posterior beliefs about partner were more precise in BPD versus CON ( = -0.94, 95%HDI: -1.50, -0.45;  = -0.70, 95%HDI: -1.20, -0.25).  This is unsurprising given the disintegrated priors of the BPD group in M4, meaning they need to ‘travel less’ in state space. Nevertheless, even under assumptions of M1 and M2 for both groups, BPD showed smaller posterior median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      (16) "We built and tested a theory of interpersonal generalization in a population of matched participants" this sentence seems to be unwarranted, as there is no theory in the paper (actually, as it is now, the paper looks rather exploratory)

      We thank the reviewer for their perspective. Formal models can be used as a theoretical statement on the causal algorithmic process underlying decision making and choice behaviour; the development of formal models is an essential theoretical tool for precision and falsification (Haslbeck et al., 2022). In this sense, we have built several competing formal theories that test, using causal architectures, whether the latent distribution(s) that generate one’s choices generalise into one’s predictions about another person, and simultaneously whether one’s latent distribution(s) that represent beliefs about another person are used to inform future choices.

      Reviewer 3:

      ‘My broad question about the experiment (in terms of its clinical and cognitive process relevance): Does the task encourage competition or give participants a reason to take advantage of others? I don't think it does, so it would be useful to clarify the normative account for prosociality in the introduction (e.g., some of Robin Dunbar's work).’

      We agree that our paradigm does not encourage competition. We use a reward structure that requires participants to exceed a particular threshold before earning rewards, but there is no competitive element to this, in that points earned or not earned by partners have no bearing on the outcomes for the participant. This is important given the consideration of recursive properties that arise through mixed-motive games; we wanted to focus purely on observational learning in phase 2, and repercussion-free choices made by participants in phase 1 and 3, meaning the choices of participants, and the decisions of a partner, are theoretically in line with self-preferences irrespective of the judgement of others. We have included a clearer statement of the structure of this type of task, and more clearly cited the origin for its structure (Murphy & Ackerman, 2011):

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential social value economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes motivational variation in joint reward allocation.’

      Given the introduction's structure as it stands, we felt providing another paragraph on the normative assumptions of such a game was outside the scope of this article.

      ‘The finding that individuals with BPD do not engage in self-other generalization on this task of social intentions is novel and potentially clinically relevant. The authors find that BPD participants' tendency to be prosocial when splitting points with a partner does not transfer into their expectations of how a partner will treat them in a task where they are the passive recipient of points chosen by the partner. In the discussion, the authors reasonably focus on model differences between groups (Bayesian model comparison), yet I thought this finding -- BPD participants not assuming prosocial tendencies in phase 2 while CON participants did -- merited greater attention. Although the BPD group was close to 0 on the \beta prior in Phase 2, their difference from CON is still in the direction of being more mistrustful (or at least not assuming prosociality). This may line up with broader clinical literature on mistrustfulness and attributions of malevolence in the BPD literature (e.g., a 1992 paper by Nigg et al. in Journal of Abnormal Psychology). My broad point is to consider further the Phase 2 findings in terms of the clinical interpretation of the shift in \beta relative to controls.’

      This is an important point that we contextualize within the parameterisation of our utility model. While the shift toward 0 in the BPD participants is indeed more competitive, as the reviewer notes, it is surprisingly centred closely around 0, with only a slight bias to be prosocial (mean = -0.47;  = -6.10, 95%HDI: -7.60, -4.60). Charitably we might argue that BPD participants are expecting more competitive preferences from their partner. Even so, given their variance around their priors in phase 2, they are uncertain or unconfident about this. We take a more conservative approach in the paper and say that given the tight proximity to 0 and the variance of their group priors, they are likely to be ‘hedging their bets’ on whether their partner is going to be prosocial or competitive. While the movement from phase 1 to 2 is indeed in the competitive direction, it still lands in neutral territory. Model M4 does not preclude central tendencies at the start of Phase 2 being more in the competitive direction.

      ‘First, the authors note that they have "proposed a theory with testable predictions" (p. 4 but also elsewhere) but they do not state any clear predictions in the introduction, nor do they consider what sort of patterns will be observed in the BPD group in view of extant clinical and computational literature. Rather, the paper seems to be somewhat exploratory, largely looking at group differences (BPD vs. CON) on all of the shared computational parameters and additional indices such as belief updating and reaction times. Given this, I would suggest that the authors make stronger connections between extant research on intention representation in BPD and their framework (model and paradigm). In particular, the authors do not address related findings from Ereira (2020) and Story (2024) finding that in a false belief task that BPD participants *overgeneralize* from self to other. A critical comparison of this work to the present study, including an examination of how the two tasks differ in the processes they measure, is important.’

      Thank you for this opportunity to include more of the important work that has preceded the present manuscript. Prior work has tended to focus on either descriptive explanations of self-other generalisation (e.g. through the use of RW type models) or has focused on observational learning instability in absence of a causal model from where initial self-other beliefs may arise. While the prior work cited by the reviewer [Ereira (2020; Nat. Comms.) and Story (2024; Trans. Psych.)] does examine the inter-trial updating between self-other, it does not integrate a self model into a self’s belief about an other prior to observation. Rather, it focuses almost exclusively on prediction error ‘leakage’ generated during learning about individual reward (i.e. one sided reward). These findings are important, but lie in a slightly different domain. They also do not cut against ours, and in fact, we argue in the discussion that the sort of learning instability described above and splitting (as we cite from Story ea. 2024; Psych. Rev.) may result from a lack of self anchoring typical of CON participants. Nevertheless we agree these works provide an important premise to contrast and set the groundwork for our present analysis and have included them in the framing of our introduction, as well as contrasting them to our data in the discussion.

      In the introduction:

      ‘The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize  ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      In the discussion:

      ‘Disruptions in self-to-other generalization provide an explanation for previous computational findings related to task-based mentalizing in BPD. Studies tracking observational mentalizing reveal that individuals with BPD, compared to those without, place greater emphasis on social over internal reward cues when learning (Henco et al., 2020; Fineberg et al., 2018). Those with BPD have been shown to exhibit reduced belief adaptation (Siegel et al., 2020) along with ‘splitting’ of latent social representations (Story et al., 2024a). BPD is also shown to be associated with overgeneralisation in self-to-other belief updates about individual outcomes when using a one-sided reward structure (where participant responses had no bearing on outcomes for the partner; Story et al., 2024b). Our analyses show that those with BPD are equal to controls in their generalisation of absolute reward (outcomes that only affect one player) but disintegrate beliefs about relative reward (outcomes that affect both players) through adoption of a new, neutral belief. We interpret this together in two ways: 1. There is a strong concern about social relativity when those with BPD form beliefs about others, 2. The absence of constrained self-insertion about relative outcomes may predispose to brittle or ‘split’ beliefs. In other words, those with BPD assume ambiguity about the social relativity preferences of another (i.e. how prosocial or punitive) and are quicker to settle on an explanation to resolve this. Although self-insertion may be counter-intuitive to rational belief formation, it has important implications for sustaining adaptive, trusting social bonds via information moderation.’

      ‘In addition, perhaps it is fairer to note more explicitly the exploratory nature of this work. Although the analyses are thorough, many of them are not argued for a priori (e.g., rate of belief updating in Figure 2C) and the reader amasses many individual findings that need to be synthesized.’

      We have now noted the primary goals of our work in the introduction, and have included caveats about the exploratory nature of our analyses. We would note that our model is in effect a causal combination of prior work cited within the introduction (Barnby et al., 2022; Moutoussis et al., 2016). This renders our computational models in effect a causal theory to test, although we agree that our dissection of the results are exploratory. We have more clearly signposted this:

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes innate motivational variation in joint reward allocation.’

      ‘Second, in the discussion, the authors are too quick to generalize to broad clinical phenomena in BPD that are not directly connected to the task at hand. For example, on p. 22: "Those with a diagnosis of BPD also show reduced permeability in generalising from other to self. While prior research has predominantly focused on how those with BPD use information to form impressions, it has not typically examined whether these impressions affect the self." Here, it's not self-representation per se (typically, identity or one's view of oneself), but instead cooperation and prosocial tendencies in an economic context. It is important to clarify what clinical phenomena may be closely related to the task and which are more distal and perhaps should not be approached here.’

      Thank you for this important point. We agree that social value orientation, and particularly in this economically-assessed form, is but one aspect of the self, and we did not test any others. A version of the social contagion phenomena is also present in other aspects of the self in intertemporal (Moutoussis et al., 2016), economic (Suzuki et al., 2016) and moral preferences (Yu et al., 2021). It would be most interesting to attempt to correlate the degrees of insertion and contagion across the different tasks.

      We take seriously the wider concern that behaviour in our tasks based on economic preferences may not have clinical validity. This issue is central in the whole field of computational psychiatry, much of which is based on generalizing from tasks like ours, and discussing correlations with psychometric measures. We hope that it is acceptable to leave such discussions to the many reviews on computational psychiatry (Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). Here, we have just put a caveat in the discussion:

      ‘Finally, a limitation may be that behaviour in tasks based on economic preferences may not have clinical validity. This issue is central to the field of computational psychiatry, much of which is based on generalising from tasks like that within this paper and discussing correlations with psychometric measures. Extrapolating  economic tasks into the real world has been the topic of discussion for the many reviews on computational psychiatry (e.g. Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). We note a strength of this work is the use of model comparison to understand causal algorithmic differences between those with BPD and matched healthy controls. Nevertheless, we wish to further pursue how latent characteristics captured in our models may directly relate to real-world affective change.’

      ‘On a more technical level, I had two primary concerns. First, although the authors consider alternative models within a hierarchical Bayesian framework, some challenges arise when one analyzes parameter estimates fit separately to two groups, particularly when the best-fitting model is not shared. In particular, although the authors conduct a model confusion analysis, they do not as far as I could tell (and apologies if I missed it) demonstrate that the dynamics of one model are nested within the other. Given that M4 has free parameters governing the expectations on the absolute and relative reward preferences in Phase 2, is it necessarily the case that the shared parameters between M1 and M4 can be interpreted on the same scale? Relatedly, group-specific model fitting has virtues when one believes there to be two distinct populations, but there is also a risk of overfitting potentially irrelevant sample characteristics when parameters are fit group by group.

      To resolve these issues, I saw one straightforward solution (though in modeling, my experience is that what seems straightforward on first glance may not be so upon further investigation). M1 assumes that participants' own preferences (posterior central tendency) in Phase 1 directly transfer to priors in Phase 2, but presumably the degree of transfer could vary somewhat without meriting an entirely new model (i.e., the authors currently place this question in terms of model selection, not within-model parameter variation). I would suggest that the authors consider a model parameterization fit to the full dataset (both groups) that contains free parameters capturing the *deviations* in the priors relative to the preceding phase's posterior. That is, the free parameters $\bar{\alpha}_{par}^m$ and $\bar{\beta}_{par}^m$ govern the central tendency of the Phase 2 prior parameter distributions directly, but could be reparametrized as deviations from Phase 1 $\theta^m_{ppt}$ parameters in an additive form. This allows for a single model to be fit to all participants that encompasses the dynamics of interest such that between-group parameter comparisons are not biased by the strong assumptions imposed by M1 (that phase 1 preferences and phase 2 observations directly transfer to priors). In the case of controls, we would expect these deviation parameters to be centred on 0 insofar as the current M1 fit them best, whereas BPD participants should have significant deviations from earlier-phase posteriors (e.g., the shift in $\beta$ toward prior neutrality in phase 2 compared to one's own prosociality in phase 1). I think it's still valid for the authors to argue for stronger model constraints for Bayesian model comparison, as they do now, but inferences regarding parameter estimates should ideally be based on a model that can encompass the full dynamics of the entire sample, with simpler dynamics (like posterior -> prior transfer) being captured by near-zero parameter estimates.’
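
      One way to read the additive reparametrization suggested above is sketched below. The deviation parameters $\delta_{\alpha}$ and $\delta_{\beta}$, and the hatted symbols for the Phase 1 posterior central tendencies, are notation assumed here for illustration only; they do not appear in the manuscript.

```latex
% Sketch under assumed notation: \hat{\alpha}^m_{ppt}, \hat{\beta}^m_{ppt} denote the
% Phase 1 posterior central tendencies (components of \theta^m_{ppt});
% \delta_\alpha, \delta_\beta are hypothetical free deviation parameters.
\begin{align}
  \bar{\alpha}^m_{par} &= \hat{\alpha}^m_{ppt} + \delta_{\alpha}, \\
  \bar{\beta}^m_{par}  &= \hat{\beta}^m_{ppt}  + \delta_{\beta}.
\end{align}
% Setting \delta_\alpha = \delta_\beta = 0 recovers direct posterior-to-prior
% transfer (the M1 assumption); non-zero values allow a shifted Phase 2 prior.
```

      Under this reading, estimates of the deviation parameters near zero would correspond to the direct transfer assumed by M1, while clearly non-zero estimates would indicate a new central tendency in Phase 2.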

      Thank you for the chance to be clearer in our modelling. In particular, the suggestion to include a model that can be fit to all participants with the equivalent of partial social insertion, to check whether the results stand, can be accomplished through our existing models. That is, the parameter that governs the flexibility over beliefs in phase 2 under models M1 (dominant for CON participants) and M2 parameterises the degree to which participants think their partner may be different from themselves. Thus, forcibly fitting M1 and M2 hierarchically to all participants, and then separately to BPD and CON participants, can quantify the issue raised: if BPD participants indeed distinguish partners as sufficiently different from themselves to warrant a new central tendency, this flexibility parameter should be quantitatively higher in BPD vs CON participants under M1 and M2.

      We therefore tested this, reporting the distributional differences in this parameter between BPD and CON participants under M1, both when fitted together as a population and as separate groups. Because it is higher for BPD participants under both conditions for M1 and M2, this supports our claim and adds context for the comparison: the flexibility may be large enough in BPD that a new central tendency to anchor beliefs is a more parsimonious explanation.

      We cross-checked this result by assessing the discrepancy between the participant’s and assumed partner’s central tendencies for both prosocial and individualistic preferences via the best-fitting model M4 for the BPD group. We thereby examined whether belief disintegration is uniform across preferences (relative vs absolute reward) or whether one tendency was shifted dramatically more than another. We found that beliefs over prosocial-competitive preferences were dramatically shifted, whereas those over individualistic preferences were not.

      We have added the following to the main text results to explain this:

      Model Comparison:

      ‘We found that CON participants were best fit at the group level by M1 (Frequency = 0.59, Protected Exceedance Probability = 0.98), whereas BPD participants were best fit by M4 (Frequency = 0.54, Protected Exceedance Probability = 0.86; Figure 2A). We first analyse the results of these separate fits. Later, in order to assuage concerns about drawing inferences from different models, we examined the relationships between the relevant parameters when we forced all participants to be fit to each of the models (in a hierarchical manner, separated by group). In sum, our model comparison is supported by convergence in parameter values when comparisons are meaningful. We refer to both types of analysis below.’

      Phase 1:

      ‘These differences were replicated when considering parameters between groups when we fit all participants to the same models (M1-M4; see Table S2).’

      Phase 2:

      ‘To check that these conclusions about self-insertion did not depend on the different models, we found that only under M1 and M2 were consistently larger in BPD versus CON. This supports the notion that new central tendencies for BPD participants in phase 2 were required, driven by expectations about a partner’s relative reward. (see Fig S10 & Table S2). and parameters under assumptions of M1 and M2 were strongly correlated with median change in belief between phase 1 and 2 under M3 and M4, suggesting convergence in outcome (Fig S11).’

      ‘Furthermore, even under assumptions of M1-M4 for both groups, BPD showed smaller posterior median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      ‘Assessing this same relationship under M1- and M2-only assumptions reveals a replication of this group effect for absolute reward, but the effect is reversed for relative reward (see Table S3). This accords with the context of each model, where under M1 and M2, BPD participants had larger phase 2 prior flexibility over relative reward (leading to larger initial surprise), which was better accounted for by a new central tendency under M4 during model comparison. When comparing both groups under M1-M4, informational surprise over absolute reward was consistently restricted in BPD (Table S3), suggesting a diminished weight of this preference when forming beliefs about another.’

      Phase 3

      ‘In the dominant model for the BPD group, M4, participants are not influenced in their phase 3 choices following exposure to their partner in phase 2. To further confirm this, we also analysed absolute change in median participant beliefs between phase 1 and 3 under the assumption that M1 or M3 was the dominant model for both groups (both allow for contagion to occur). This analysis aligns with our primary model comparison using M1 for CON and M4 for BPD (Figure 2C). CON participants altered their median beliefs between phase 1 and 3 more than BPD participants (M1: linear estimate = 0.67, 95%CI: 0.16, 1.19; t = 2.57, p = 0.011; M3: linear estimate = 1.75, 95%CI: 0.73, 2.79; t = 3.36, p < 0.001). Relative reward was overall more susceptible to contagion versus absolute reward (M1: linear estimate = 1.40, 95%CI: 0.88, 1.92; t = 5.34, p < 0.001; M3: linear estimate = 2.60, 95%CI: 1.57, 3.63; t = 4.98, p < 0.001). There was an interaction between group and belief type under M3 but not M1 (M3: linear estimate = 2.13, 95%CI: 0.09, 4.18; t = 2.06, p = 0.041). There was only a main effect of belief type on precision under M3 (linear estimate = 0.47, 95%CI: 0.07, 0.87; t = 2.34, p = 0.02); relative reward preferences became more precise across the board. Derived model estimates of preference change between phase 1 and 3 strongly correlated between M1 and M3 along both belief types (see Table S2 and Fig S11).’

      ‘My second concern pertains to the psychometric individual difference analyses. These were not clearly justified in the introduction, though I agree that they could offer potentially meaningful insight into which scales may be most related to model parameters of interest. So, perhaps these should be earmarked as exploratory and/or more clearly argued for. Crucially, however, these analyses appear to have been conducted on the full sample without considering the group structure. Indeed, many of the scales on which there are sizable group differences are also those that show correlations with psychometric scales. So, in essence, it is unclear whether most of these analyses are simply recapitulating the between-group tests reported earlier in the paper or offer additional insights. I think it's hard to have one's cake and eat it, too, in this regard and would suggest the authors review Preacher et al. 2005, Psychological Methods for additional detail. One solution might be to always include group as a binary covariate in the symptom dimension-parameter analyses, essentially partialing the correlations for group status. I remain skeptical regarding whether there is additional signal in these analyses, but such controls could convince the reader. Nevertheless, without such adjustments, I would caution against any transdiagnostic interpretations such as this one in the Highlights: "Higher reported childhood trauma, paranoia, and poorer trait mentalizing all diminish other-to-self information transfer irrespective of diagnosis." Since many of these analyses relate to scales on which the groups differ, the transdiagnostic relevance remains to be demonstrated.’
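
      The reviewer's suggestion of partialing correlations for group status can be illustrated with a minimal sketch: regress a binary group indicator out of both variables and correlate the residuals. The function name and the synthetic data below are hypothetical and serve only to show why a pooled correlation can merely recapitulate a group difference; this is not the analysis reported in the manuscript.

```python
import numpy as np

def partial_corr_given_group(x, y, group):
    """Correlation of x and y after regressing a binary group indicator out of both."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    g = np.asarray(group, dtype=float)
    design = np.column_stack([np.ones_like(g), g])  # intercept + group indicator
    # Residuals of each variable after a least-squares fit on the design matrix
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Synthetic illustration (assumed data): both measures differ by group
# but are unrelated within each group.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 200)
x = 2.0 * group + rng.normal(size=400)
y = 2.0 * group + rng.normal(size=400)

raw_r = float(np.corrcoef(x, y)[0, 1])         # inflated by the group difference
adj_r = partial_corr_given_group(x, y, group)  # close to zero once group is partialed out
```

      In this toy case the pooled correlation is sizeable while the group-adjusted one is near zero, which is the pattern the reviewer cautions could underlie an apparently transdiagnostic association.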

      We have restructured the psychometric section to ensure transparency and clarity in our analysis. Namely, in response to these comments and those of the other reviewers, we have opted to remove the parameter analyses that aimed to cross-correlate psychometric scores with latent parameters from different models: as the reviewer points out, we do not have parity between dominant models for each group to warrant this, and fitting the same model to both groups artificially makes the parameters qualitatively different. Instead, we have opted to focus on social contagion, or rather restrictions on it, between phases 1 and 3 as explained by M3. This provides us with an opportunity to examine social contagion at the whole-population level, isolated from self-insertion biases. We performed bootstrapping (1000 reps) and permutation testing (1000 reps) to assess the stability and significance of each edge in the partial correlation network, and then applied FDR correction (p[fdr]), thus controlling for multiple comparisons. We note that while we focused on M3 to isolate the effect across the population, social contagion across both relative and absolute reward under M3 strongly correlated with social contagion under M1 (see Fig S11).

      ‘We explored whether social contagion may be restricted as a result of trauma, paranoia, and less effective trait mentalizing under the assumption of M3 for all participants (where everyone is able to be influenced by their partner). To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). We conducted partial correlation analysis to estimate relationships conditional on all other associations and retained all that survived bootstrapping (1000 reps), permutation testing (1000 reps), and subsequent FDR correction. Persecution and CTQ scores were both moderately associated with MZQ scores (RGPTSB r = 0.41, 95%CI: 0.23, 0.60, p = 0.004, p[fdr] = 0.043; CTQ r = 0.354, 95%CI: 0.13, 0.56, p = 0.019, p[fdr] = 0.02). MZQ scores were in turn moderately and negatively associated with shifts in prosocial-competitive preferences between phase 1 and 3 (r = -0.26, 95%CI: -0.46, -0.06, p = 0.026, p[fdr] = 0.043). CTQ scores were also directly and negatively associated with shifts in individualistic preferences (r = -0.24, 95%CI: -0.44, -0.13, p = 0.052, p[fdr] = 0.065). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired mentalising (Figure 4A).’
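
      As a rough illustration of two of the steps named above (permutation testing and FDR correction), the sketch below computes a two-sided permutation p-value for a simple Pearson correlation and applies the Benjamini-Hochberg adjustment. It is a simplified stand-in under stated assumptions: the authors' pipeline operates on edges of a partial correlation network and its exact implementation is not described here, and the function names are hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    # Null distribution: shuffle y to break any association with x
    null = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1])
                     for _ in range(n_perm)])
    # Add-one correction so the p-value is never exactly zero
    return float((1 + np.sum(null >= observed)) / (n_perm + 1))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# Toy usage with made-up numbers
p_adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
xs = np.arange(30, dtype=float)
p_perm = permutation_pvalue(xs, xs + np.sin(xs), n_perm=1000, seed=1)
```

      The add-one correction in the permutation p-value is a common convention, chosen here so that a finite number of permutations never reports p = 0.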

      (1) As far as I could tell, the authors didn't provide an explanation of this finding on page 5: "However, CON participants made significantly fewer prosocial choices when individualistic choices were available" While one shouldn't be forced to interpret every finding, the paper is already in that direction and I found this finding to be potentially relevant to the BPD-control comparison.

      Thank you for this observation. This sentence reports the fact that CON participants were effectively more selfish than BPD participants. This is captured by the lower value reported in Figure 2, and suggests that CON participants were more focused on absolute value – acting in a more ‘economically rational’ manner – versus BPD participants. This fits in with our fourth paragraph of the discussion where we discuss prior work that demonstrates a heightened social focus in those with BPD. Indeed, the finding the reviewer highlights further emphasises the point that those with BPD are much more sensitive to, and motivated to choose, options concerning relative reward than are CON participants. The text in the discussion reads:

      ‘We also observe this in self-generated participant choice behaviour, where CON participants were more concerned over absolute reward versus their BPD counterparts, suggesting a heightened focus on relative vs. absolute reward in those with BPD.’

      (2) The adaptive algorithm for adjusting partner behavior in Phase 2 was clever and effective. Did the authors conduct a manipulation check to demonstrate that the matching resulted in approximately 50% difference between one's behavior in Phase 1 and the partner in Phase 2? Perhaps Supplementary Figure suffices, but I wondered about a simpler metric.

      Thanks for this point. We highlight this in Figure 3B and within the same figure legend, although we appreciate the panel is quite small and may be missed. We have now highlighted this manipulation check more clearly in the behavioural analysis section of the main text:

      ‘Server matching between participant and partner in phase 2 was successful, with participants being approximately 50% different to their partners with respect to the choices each would have made on each trial in phase 2 (mean similarity=0.49, SD=0.12).’

      (3) The resolution of point-range plots in Figure 4 was grainy. Perhaps it's not so in the separate figure file, but I'd suggest checking.

      Apologies. We have now updated and reorganised the figure to improve clarity.

      (4) p. 21: Suggest changing to "different" as opposed to "opposite" since the strategies are not truly opposing: "but employed opposite strategies."

      We have amended this.

      (5) p. 21: I found this sentence unclear, particularly the idea of "similar updating regime." I'd suggest clarifying: "In phase 2, CON participants exhibited greater belief sensitivity to new information during observational learning, eventually adopting a similar updating regime to those with BPD."

      We have clarified this statement:

      ‘In observational learning in phase 2, CON participants initially updated their beliefs in response to new information more quickly than those with BPD, but eventually converged to a similar rate of updating.’

      (6) p. 23: The content regarding psychosis seemed out of place, particularly as the concluding remark. I'd suggest keeping the focus on the clinical population under investigation. If you'd like to mention the paradigm's relevance to psychosis (which I think could be omitted), perhaps include this as a future direction when describing the paradigm's strengths above.

      We agree the paragraph is somewhat speculative. We have omitted it in aid of keeping the messaging succinct and to the point.

      (7) p. 24: Was BPD diagnosis assessed using unstructured clinical interview? Although psychosis was exclusionary, what about recent manic or hypomanic episodes or Bipolar diagnosis? A bit more detail about BPD sample ascertainment would be useful, including any instruments used to make a diagnosis and information about whether you measured inter-rater agreement.

      Participants diagnosed with BPD were recruited from specialist personality disorder services across various London NHS mental health trusts. The diagnosis of BPD was established by trained assessors at the clinical services and confirmed using the Structured Clinical Interview for DSM-IV (SCID-II) (First et al., 1997). Individuals with a history of psychotic episodes, severe learning disability or neurological illness/trauma were excluded. We have now included this extra detail within our methods in the paper:

      ‘The majority of BPD participants were recruited through referrals by psychiatrists, psychotherapists, and trainee clinical psychologists within personality disorder services across 9 NHS Foundation Trusts in London, and 3 NHS Foundation Trusts across England (Devon, Merseyside, Cambridgeshire). Four BPD participants were also recruited by self-referral through the UCLH website, where the study was advertised. To be included in the study, all participants needed to have, or meet criteria for, a primary diagnosis of BPD (or emotionally-unstable personality disorder or complex emotional needs) based on a professional clinical assessment conducted by the referring NHS trust (for self-referrals, the presence of a recent diagnosis was ascertained through thorough discussion with the participant, whereby two of the four also provided clinical notes). The patient participants also had to be under the care of the referring trust or have a general practitioner whose details they were willing to provide. Individuals with psychotic or mood disorders, recent acute psychotic episodes, severe learning disability, or current or past neurological disorders were not eligible for participation and were therefore not referred by the clinical trusts.’

    1. eLife Assessment

      This work provides important insights into mucosal antibody responses against SARS-CoV-2 following intranasal immunization by characterizing a large number of monoclonal antibodies at both mucosal and non-mucosal sites. The evidence supporting the claims is solid. The demonstrated in vitro antiviral activity of antibodies characterized provides a rationale for developing mucosal vaccines, especially if confirmed in vivo and benchmarked against antibodies generated following intramuscular vaccination.

    2. Reviewer #2 (Public review):

      Summary:

      This research demonstrates the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notably, immunization with Wuhan S protein generates a weak response to the omicron variant.

      Strengths:

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses.

      Comments on Revision:

      I have re-reviewed the paper and responses to my and other reviewers' comments. I feel the authors have adequately addressed my and other reviewer's comments.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Despite evidence suggesting the benefits of neutralizing mucosa-derived IgA in the upper airway in protection against the SARS-CoV-2 virus, all currently approved vaccines are administered intramuscularly, which mainly induces systemic IgG. Waki et al. aimed to characterize the benefits of intranasal vaccination at the molecular level by isolating B cell clones from nasal tissue. The authors found that Spike-specific plasma cells isolated from the spleen of vaccinated mice showed significant clonal overlap with Spike-specific plasma cells isolated from nasal tissue. Interestingly, they could not detect any spike-specific plasma cells in the bone marrow or Peyer's patches, indicating that these nose-derived cells did not necessarily home to and reside in these locations, although the Peyer's patch is not a typical plasma cell niche - rather the lamina propria of the gut would have been a better place to look. Furthermore, they found that multimerization improves the antibody/antigen binding when the antibody is of low or intermediate affinity, but that high-affinity monomeric antibodies do not benefit from multimerization. Lastly, the authors used a competitive ELISA assay to show that multimerization could improve the neutralizing capacity of these antibodies.

      The strength of this paper is the cloning of multiple IgA from the nasal mucosae (n=99) and the periphery (n=114) post-SARS-CoV-2 i.n. vaccination to examine the clonal relationship of this IgA with other sites, including the spleen. This analysis provides novel insights into the nature of the mucosal antibody response at the site where the host would encounter the virus, and whether this IgA response disseminates to other tissues.

      There were also some weaknesses: 

      (1) The finding that multimerization improves binding and neutralization is not surprising as this was observed before by Wang and Nussenzweig for anti-SARS-CoV-2 IgA (authors should cite Enhanced SARS-CoV-2 neutralization by dimeric IgA. Wang et al., Sci. Transl. Med 2021, 13:3abf1555). 

      We have cited the paper, and the relevant sentence has been modified as follows (line 51-53); Recent studies have demonstrated that multimeric IgA is more effective and provides greater cross-protection than IgG and M-IgA (Okuya et al., 2020b) (Asahi et al., 2002) (Dhakal et al., 2018) (Asahi-Ozaki et al., 2004) (Wang et al., 2021).

      In addition, as far as I can tell we cannot ascertain the purity of fractions from the size exclusion chromatography thus I wasn't sure whether the input material used in Fig. 4 was a mixed population of dimer/trimer/tetramer?  

      The S-IgAs used in the SPR analysis in Fig. 4 consist of a mixture of dimers, trimers, and tetramers. The observed values indicate the average affinity of the S-IgAs. Please refer to the revised version (line 278-280).

      (2) The flow cytometric assessment of the IgA+ clones from the nasal mucosae was difficult to interpret (Fig. 1B). It was hard for me to tell what they were gating on and subsequently analyzing without an IgA-negative population for reference. 

      We have updated FACS plots to illustrate the presence of IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in Fig. 1B legend. Please find the relevant statements (line 115-120).

      (3) While the i.n. study itself is large and challenging, it would have been interesting to compare an i.m. route and examine the breadth of SARS-CoV-2 variant S1 binding for IgGs as in Fig. 2A. Are the IgA responses derived from the mucosae of greater breadth than systemic IgG responses? Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      I appreciate your consideration. Recent reports indicate that some M-IgA monomers possess neutralizing activity that is equivalent to or less than that of IgGs. However, the opposite phenomenon has also been observed. These results suggest that the Fc does not merely correlate with the degree of increase in antibody reactivity or functionality. We believe the discrepancies in previous studies are due to variations in the binding modes between the epitope and paratope of each antibody clone. Nevertheless, oligomerization enhances the functionality of most monomeric antibody clones, suggesting that the multivalent S-IgA enables a mode of action that is challenging to achieve with a monomeric antibody. Please refer to the revised version (line 399-403).

      Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      We have summarized the characteristics of the four types of nasal IgAs in Fig.7 and in the Discussion. Please refer to the revised version (line 405-422).

      Reviewer #2 (Public Review): 

      Summary: 

      This research demonstrates the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notably, immunization with Wuhan S protein generates a weak response to the omicron variant. 

      Strengths: 

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses. 

      Weaknesses: 

      The data presentation needs clarity, and results show mAb ability to inhibit SARS-CoV2 in vitro. How IgA functions in vivo is uncertain. 

      We conducted an additional experiment using a hamster model and confirmed that S-IgAs can protect against SARS-CoV-2 infection. Please refer to the revised version (line 349-373 and 431-438).

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 1A shows antibody titers in nasal lavage fluid and serum of mice post intranasal vaccination with SARS-CoV-2 Spike protein. The Y-axis of this figure is labeled as "U/mg" however these units are not clearly defined. 

      The antibody titers are expressed as optical density (OD450) value per total protein in nasal lavage fluids or serum. Please find the relevant statements (line 113-114).

      Furthermore, what do antibody titers in the nasal lavage fluid and serum look like post-intramuscular vaccination with the same vaccine and dose? Comparison of titers to the intramuscular route as well as to the PBS control would make this data more impactful. 

      We appreciate your consideration. We have not conducted experiments comparing the effects of intramuscular and intranasal administration using the same dosage and adjuvant. Cholera toxin has primarily been used as an adjuvant for nasal immunization, but it is seldom applied for intramuscular injection. We are interested in its impact on the immune compartment when using cholera toxin as an adjuvant for intramuscular injection. We plan to conduct further experiments in the future.

      Lastly, in Figure 1B, the detection of nasal IgG is not shown even though the authors assess nasally-derived IgG in the spleen further into the study.  

      Since the number of lymphocytes that can be collected from the nasal mucosa is limited, there is an insufficient capacity to isolate IgG+ plasma cells after collecting IgA+ plasma cells. Therefore, conducting such an experiment on mice is technically challenging. A larger animal, such as rats, will be necessary to perform this experiment. Further investigation is needed to determine whether antigen-specific IgG+ plasma cells, sharing V-(D)-J with nasal IgA, can be detected in the nasal mucosa.

      (2) There appears to be something amiss with the IgA stain. It is smushed up against the X-axis. Better flow cytometry profiles should be shown. Likewise in Supplemental Fig. 1A, their IgA stain appears to not be working. This must be addressed using positive and negative controls. 

      We have updated the FACS plots to show the IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in the Fig. 1B legend. Please find the relevant statements on line 115-120.

      (3) We do not know the purity of the samples that were subjected to SPR and since the legend of Fig. 4 is partially incorrect, it was difficult to know how this experiment was done. 

      The S-IgA used in the SPR analysis shown in Figure 4 is a mixture of dimers, trimers, and tetramers, and the observed values are believed to reflect the affinity of the S-IgA in the nasal mucosa. Please refer to the revised version (line 278-280).

      (4) Fig. 5 results need to compare with some of the well-characterized mAb (IgG) to understand the biological significance of these neutralizing titres. 

      We have summarized the characteristics of the four types of nasal IgA in Fig. 7 and in the Discussion. Please refer to the revised version (line 405-422).

      Communication of results: 

      (1) Authors could improve the communication of their results by introducing the vaccination protocol in the results section accompanied by a diagram of the vaccination strategy (nature of the Ag, route, and frequency). This could be Fig. 1A .  

      A schematic diagram of the vaccination protocol is presented in Fig.1.

      (2) Care should be taken with some of the terminology. Intranasal is the accepted term but authors sometimes use "internasal". The term "immunosuppression" on page 2 could be misleading as it means something different to other audiences. The distinction when speaking about "protection from harmful pathogens" should be made between protection against infection (ie sterilizing immunity) vs protection against disease (ie morbidity and mortality). Instead of "nose", one should say "nasal". Nose-related could be rephrased as "potentially nasal-derived". P.5, line 2 didn't make sense: "IgG+ plasma cells that express nose-related IgA"...

      In many places, Spike is missing its "e".  

      We have made the correction accordingly.

      (3) Page 3: The lumping of the human and animal SARS-CoV-2 intranasal studies together is a bit misleading. Very little has worked for intranasal vaccination against SARS-CoV-2 in humans at this point in time (although hopefully that will change soon!). Authors should specify which studies were done in animals and which were done in humans. 

      The manuscript has been revised to include two citations on line 73-75 (Ewer et al., 2021 and Zhu et al., 2023).

      (4) What is ER-tracker? It comes out of nowhere and should be explained why it was used to the reader (as well as why they used the other markers) to sort for Spike-specific PC. 

      ER-Tracker is a fluorescent dye that is highly selective for the endoplasmic reticulum of living cells. Because plasma cells have an expanded endoplasmic reticulum for properly folding and secreting large quantities of antibodies, using ER-Tracker along with anti-CD138 facilitates the isolation of plasma cells from lymphocytes without the need for additional antibodies. Please refer to the revised version for details (line 130-134).

    1. eLife Assessment

      This study uses C. elegans to investigate how the Calcium/Calmodulin-dependent kinase CMK-1 regulates adaptation to thermo-nociceptive stimuli. The authors use compelling approaches to identify Calcineurin as a phosphorylation target of CMK-1 and to investigate the relationship between CMK-1 and Calcineurin using gain and loss of function genetic and pharmacological methods. The findings of this study are valuable as they show that CMK-1 and Calcineurin act in separate neurons in an antagonistic and complex manner to regulate thermo-nociceptive adaptation, and these results may be relevant for understanding some chronic human pain conditions.

    2. Reviewer #1 (Public review):

      Summary:

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring rate of heat evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.

      Conclusions: cmk-1 and tax-6 act in separate habituation processes primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.

      Strengths:

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      A major concern with this manuscript was the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 does occasionally use the words 'habituation' or 'habituation-like' (10 times), however it uses 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused or conflated, however they are very different; sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus, therefore it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite (at least part of) the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is actually being studied. The authors have accepted this distinction and now correctly call the process adaptation.

      While there was originally some discrepancy between the two in vitro phosphorylation experiments and the in silico predictions, the revision has cleared up the issues.

      Figure 3-S1: This model has been adjusted to more closely fit the data.

      The authors have expanded the discussion about the significance of the sites of cmk-1 and tax-6 function in the neural circuit.